
FreeBSD jail with NFSv4 share causes system to hang


jinie

I've been battling this all weekend, and I'm no nearer to a solution.

I'm running Emby in a jail on FreeBSD 12.0 (the jail is 12.0 as well), and Emby is installed from packages (pkg install emby-server).

 

It seems Emby, or maybe ffmpeg, is doing "something" that FreeBSD's NFS implementation doesn't like.

I have an NFSv4 share on a Debian host, mounted through iocage on jail startup. I've been experimenting with different options, but these are the ones I currently use:

 

    rw,hard,nfsv4,minorversion=1,allgssname,gssname=host,sec=krb5p
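For context, that's just the options string; the full iocage fstab entry looks roughly like the sketch below (hostname and paths are placeholders, not my actual ones):

    # sketch of the jail's fstab entry; replace host and paths with your own
    debian-host:/export/media /mnt/path nfs rw,hard,nfsv4,minorversion=1,allgssname,gssname=host,sec=krb5p 0 0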

 

The share mounts fine, and I can browse it and copy/move/list files all day long - until Emby starts its indexing process. Then the system flat out hangs, and all network-related operations halt.

Nothing works until I run "umount -fN /mnt/path", after which everything goes back to the way it was.

I have other machines (mostly Debian) that work with the shares without problems, and on the same FreeBSD box I have a Plex jail that accesses the same share (different mount point) and works just fine.

 

I've tried local NFS locks, and I've disabled "automatic watching" for the media libraries, but nothing helps.

Sometimes it will run for an hour or so, but mostly it just dies after 5 minutes.

 

Has anyone seen, and more importantly solved, something similar?


jinie

I should add that I also have a Debian host running Emby (through Docker) against the same shares, and that one works well, which is why I'm hunting for something FreeBSD-specific.

I've trawled Google for a few days for similar problems. The Plex install was a test, since it puts the machine through similar workloads, but that one works perfectly. Sadly, my money goes to Emby :)

 

The Debian server exports a ~20TB volume that is itself backed by MergerFS.


makarai


 

TBH: I'm probably not experienced enough in the FreeBSD and networking world. However, I had some similar problems using FreeBSD and NFS that I fixed with the nolock option.


unhooked

I'm serving from FreeBSD, but generally I use some combination of spongy, soft, nolock, intr. Also not using Kerberos.

 

Historically, if anything caused an interruption to a hard mount, it would just hang forever.

My Linux box (Jetson Nano) is doing:

 

192.168.0.11:/Shared       /Shared      nfs auto,x-systemd.automount,nofail,nolock,intr,tcp,actimeo=1800 0 0
 

I'd have to reboot my desktop to be sure, but I believe I'm using nolock,bg,intr at the moment on 12-stable.

 

Back in the day, hard mounts and locking were always bad ju-ju.


sluggo45

I'd try a soft mount from the Emby server to your Debian host and see if the problem still occurs.

 

You can also try a hard mount (like you have now) with the intr option.

 

Given the nature of Emby scans and your description, my guess is you have a locking problem.

 

 

However, I had some similar problems using FreeBSD and NFS that I fixed with the nolock option.

 

 

"nolock" isn't valid with NFS4. Another thing to try though is use NFS3 (append vers=3) assuming the NFS host supports both. Then you can use nolock.


unhooked

 


 

"nolock" isn't valid with NFS4. Another thing to try though is use NFS3 (append vers=3) assuming the NFS host supports both. Then you can use nolock.

 

Hmm, nolock seems to cause the mount to fall back to v3 with client-side locking. I never bothered to check.

 

192.168.0.11:/Shared on /Shared type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.11,mountvers=3,mountport=744,mountproto=tcp,local_lock=all,addr=192.168.0.11)

 


jinie

I ended up with hard and no intr, despite knowing full well the can of worms that brings with it.

The following is from mount_nfs(8):

 

BUGS
     Since nfsv4 performs open/lock operations that have their ordering
     strictly enforced by the server, the options intr and soft cannot be
     safely used.  hard nfsv4 mounts are strongly recommended.
 

I think I may finally have solved it, though.

I stumbled upon the following message in dmesg:

 

    sonewconn: pcb 0xfffff802486ed988: Listen queue overflow: 193 already in queue awaiting acceptance (6 occurrences)
 

It was not a message I had seen before installing Plex, but I figured I'd give it a shot, so I set somaxconn=1024 and restarted the Emby jail, and it's been chugging along ever since.
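For reference, I believe the relevant FreeBSD tunable is kern.ipc.somaxconn, so the change looks something like this:

    # raise the listen queue limit at runtime
    sysctl kern.ipc.somaxconn=1024
    # make it permanent across reboots
    echo 'kern.ipc.somaxconn=1024' >> /etc/sysctl.conf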

Looking at the output from netstat, I can see at least part of the reason:

 

Proto Recv-Q Send-Q Local Address          Foreign Address        (state)    
tcp4       0    276 192.168.1.241.689      192.168.1.5.2049       ESTABLISHED
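That output comes from plain netstat; something like the following will show the connections to the NFS server (2049 being the NFS port):

    netstat -an | grep 2049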

 


sluggo45

Good call. Max connections was my next suggestion.

 

If I'm not mistaken, Emby's library scan involves a lot more reads than Plex's at the moment. Some of the newer Emby Server betas have been addressing it.


makarai

Is "somaxconn=1024" also a mount option? When I do a volume mount from my Docker host to a FreeNAS mount, I always run into issues similar to those described in the first post.


jinie

Is "somaxconn=1024" also a mount option? When I do a volume mount from my Docker host to a FreeNAS mount, I always run into issues similar to those described in the first post.

It’s a kernel tunable. (https://www.freebsd.org/doc/en/books/handbook/configtuning-kernel-limits.html)

 

You can set it with sysctl, and to make it permanent you put it in /etc/sysctl.conf.

 

I don't know how stuff works in FreeNAS, but I bet there's "somewhere" in the UI where you can set it.

 

Edit: I just re-read "Docker host"; the above assumes the problem lies on the FreeBSD end. The Linux equivalent is "/proc/sys/net/core/somaxconn", which can also be set via sysctl net.core.somaxconn and persisted in /etc/sysctl.conf.
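A quick sketch of the Linux side, assuming the same value as on FreeBSD:

    # raise the accept queue limit at runtime
    sysctl -w net.core.somaxconn=1024
    # make it permanent
    echo 'net.core.somaxconn=1024' >> /etc/sysctl.conf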


jinie

If I'm not mistaken, Emby's library scan involves a lot more reads than Plex's at the moment. Some of the newer Emby Server betas have been addressing it.

 

I tried the latest Emby Server beta on FreeBSD, and it has the same problem. The number of reads could be a lot lower without me noticing it, though. The default somaxconn setting is 128, so even if the number of reads were reduced by 50%, it would probably still exceed that.

The really weird thing, though, is that my Debian box has the same somaxconn setting but works just fine with Emby. I guess it depends on a lot of factors.


jinie

Still not quite satisfied (scans ran, but the Recv-Q kept building up), I kept digging at this and might have found another part of the problem.

 

On the Linux server I set "net.ipv4.tcp_tw_reuse = 1" (sysctl), which seems to keep the Recv-Q consistently close to 0 during scans.

I just scanned a 25k-file media folder, and not once did the Recv-Q go over 12.
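For completeness, this is how that can be set on the Debian side (same value as above):

    # allow reuse of TIME_WAIT sockets for new outgoing connections
    sysctl -w net.ipv4.tcp_tw_reuse=1
    # persist across reboots
    echo 'net.ipv4.tcp_tw_reuse=1' >> /etc/sysctl.conf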


jinie

I've been looking into this some more, and another part of the problem seems to be the nginx reverse proxy I keep in front of Emby.

I still get the occasional stale NFS mount, but a more frequent problem, which manifests itself the same way, is actually the send buffer from Emby to nginx filling up.

It's quite easy to provoke: fire up nginx in one jail and Emby in another, and as soon as the Emby service has started, start "spamming" the manage-server/home page.

 

If I let it rest for a few minutes before spamming it, it appears much more stable, so I'm guessing that by spamming it early (in the "Emby is starting up, please wait" phase), Emby tries to deliver data to nginx, which tries to deliver it synchronously to the client, but the client has already moved on, so the buffer never gets emptied.

 

I've enabled the following in my nginx config for Emby, and it seems to have helped:

 

    proxy_buffering on;
    proxy_buffer_size 4k;
    proxy_buffers 8 32k;

 

Scratch that. It just made the problem worse after a while. Instead, I've completely disabled buffering:

 

    proxy_buffering off;
    client_max_body_size 0;
    proxy_http_version 1.1;
    proxy_request_buffering off;


jinie

Final chapter.

 

I still haven't gotten completely rid of the random freezes, but they happen much less frequently now (they might be gone, but I'm not saying those famous last words!). It has gotten to a state where Emby is usable for days before locking up.

 

While I still haven't found the reason they happen, I managed to minimize them by removing "minorversion=1" and downgrading Kerberos security from "sec=krb5p" to "sec=krb5".
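With the line from my first post, that means going from

    rw,hard,nfsv4,minorversion=1,allgssname,gssname=host,sec=krb5p

to

    rw,hard,nfsv4,allgssname,gssname=host,sec=krb5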

My Emby jail with these options has "survived" for 3 days now without locking up, while my Piwigo jail with the old options still locks up whenever there is a lot of disk (NFS) access.

 

I also tried sec=krb5i, but that locks up as well. Downgrading to "sec=krb5" means the data is no longer encrypted, so I assume the problem is caused by some incompatibility between Debian's and FreeBSD's GSS daemons: with plain "krb5", the GSS daemon is only used for authentication/authorization instead of having the entire network stream pass through it. It could be down to FreeBSD using Heimdal while Linux uses the MIT version of Kerberos.

 

I assume I could enable "minorversion=1" again, but if it ain't broken...


unhooked


If this is for a home network, I'd just completely remove krb (unless you need it for something else, that is). Without the proper infrastructure it's just another host/point to attack, and it really doesn't offer anything if you're only doing NFS to serve media. Personally, I just use firewall rules and host-based access.


jinie

 

It is for a home network, but Kerberos solves more than just encryption. I have multiple hosts all using storage from the same storage server via NFS.

The servers live on different networks. There is an internal "DMZ" and an external (internet-facing) DMZ, and all traffic is routed through the firewall/IDS, with only ports 88 (Kerberos) and 2049 (NFS) open.

While the external-facing hosts only have read-only access, things like my surveillance system have write access to save motion captures to the server.

 

With NFSv3 I'd need a lot more ports open, which would expose even more services to attack, and running NFSv4 (or v3) with sec=sys would require me to synchronize user IDs across macOS, Linux, and FreeBSD, including service users, or face the wrath of the package maintainers when upgrading packages.

 

But I have been evaluating just throwing Emby on my internal host through Docker and accessing it via VPN instead. The only thing holding me back is the amount of bandwidth required on the VPN endpoint. I've tried it with my old L2TP/IPSec setup on an EdgeRouter 4, but that didn't have nearly enough bandwidth, despite having hardware-accelerated IPSec. I've also been experimenting with WireGuard, but the results haven't exactly been great.

My current IKEv2/IPSec solution might be up to the task; I easily get 300 Mbit/s over it, but I haven't had time to experiment with it yet.

 

I mostly use my Emby installation remotely when I'm travelling and stuck in some hotel that only has "local" channels. The problem with VPN is that most VPN setups don't take kindly to RFC 1918 addresses (e.g. 192.168.0.0/16) overlapping between the source and destination networks.

 

As for security, I already do GeoIP blocking in the firewall to allow only select countries access to my server, and I open up new countries temporarily when travelling. The IDS does a fine job of catching most random "bad" traffic and blocks it at the gate. Even with only one country allowed through the firewall, it's amazing how much junk arrives at my router every second of the day.

