puithove 208 Posted March 1, 2015 Share Posted March 1, 2015 (edited) I'm seeing an issue in the past couple updates of the server where the service is unresponsive after idling for a while - meaning, running fine when I go to bed, but in the morning doesn't respond to clients or to the web console. Page loading for the web console will just hang until the browser times out. My Kodi clients will attempt to load a media file (I'm using the built-in HTTP streaming, not SMB) but just hang on the loading indicator. A simple restart of the service brings it back to life. When in this state, I see this continually repeating every couple seconds (this is from the console log collected by journald) Mar 01 07:14:55 big1 mediabrowser-server[24304]: getifaddrs() failed Mar 01 07:14:55 big1 mediabrowser-server[24304]: System.SystemException Mar 01 07:14:55 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 Mar 01 07:14:55 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 Mar 01 07:15:11 big1 mediabrowser-server[24304]: UnhandledException Mar 01 07:15:11 big1 mediabrowser-server[24304]: getifaddrs() failed Mar 01 07:15:11 big1 mediabrowser-server[24304]: System.SystemException Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 Mar 01 07:15:11 big1 mediabrowser-server[24304]: UnhandledException Mar 01 07:15:11 big1 mediabrowser-server[24304]: getifaddrs() failed Mar 01 07:15:11 big1 mediabrowser-server[24304]: System.SystemException Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 Mar 01 07:15:17 big1 mediabrowser-server[24304]: UnhandledException I also see occasionally this mixed inbetween (may or may not be related): Mar 01 07:14:54 big1 mediabrowser-server[24304]: many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files Mar 01 07:14:55 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too manUnhandledException Mar 01 07:14:55 big1 mediabrowser-server[24304]: getifaddrs() failed Mar 01 07:14:55 big1 mediabrowser-server[24304]: System.SystemException Mar 01 07:14:55 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 Mar 01 07:14:55 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 Mar 01 07:15:11 big1 mediabrowser-server[24304]: UnhandledException Mar 01 07:15:11 big1 mediabrowser-server[24304]: getifaddrs() failed Mar 01 07:15:11 big1 mediabrowser-server[24304]: System.SystemException Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 Mar 01 07:15:11 big1 mediabrowser-server[24304]: UnhandledException Mar 01 07:15:11 big1 mediabrowser-server[24304]: getifaddrs() failed Mar 01 07:15:11 big1 mediabrowser-server[24304]: System.SystemException Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0 I will say that this machine has a somewhat unusual networking setup because I run several VMs on it using KVM. I have 4 NIC ports in a bond, then that is part of a bridge which is the active network interface that the host OS uses as well as where the virtual NICs for the VMs are bridged in. Bridge interface has both IPv4 & IPv6 addresses statically assigned. Mediabrowser server is running on the host OS directly. I was thinking because of this and because the error seems to indicate an issue with getting interface info, it would be good to tell MB to use a certain interface only, but I don't see that option anywhere. Either way, this didn't seem to be an issue until something in the last couple updates (or maybe a kernel update or something). Any ideas for something to try? Edited March 1, 2015 by puithove Link to comment Share on other sites More sharing options...
puithove 208 Posted March 1, 2015 Author Share Posted March 1, 2015 (edited) Running on Arch Linux with the following versions Mediabrowser server: 3.0.5518.7 Mono: 3.12.0 Kernel: 3.18.6 Edited March 1, 2015 by puithove Link to comment Share on other sites More sharing options...
thefirstofthe300 291 Posted March 1, 2015 Share Posted March 1, 2015 (edited) This might be over my head but the first thing I would recommend is using downgrade from the AUR to install mono 3.10. Mono 3.12 has had some known issues with the server due to a bug in mono. Edited March 1, 2015 by DaBungalow Link to comment Share on other sites More sharing options...
puithove 208 Posted March 2, 2015 Author Share Posted March 2, 2015 Just had the issue when I went to start a video now. Was working fine several hours ago. Had mono 3.10 in my pacman cache, so have downgraded. Will try that and see how things are tomorrow. thanks. Link to comment Share on other sites More sharing options...
puithove 208 Posted March 2, 2015 Author Share Posted March 2, 2015 Hmmmm.... interesting. That might be it. All good this morning. I'll leave it that way for another day or so. If still good, I'll verify by switching back to 3.12. Link to comment Share on other sites More sharing options...
puithove 208 Posted March 2, 2015 Author Share Posted March 2, 2015 Nope, unfortunately it seems to have fallen down again the same way while I was at work. No change. Link to comment Share on other sites More sharing options...
xelar 4 Posted March 2, 2015 Share Posted March 2, 2015 Hi, I'm on Debian Jessy with 3.16 kernel, running mono 3.10. I get the same error messages, but it happens quite randomly. Sometimes the server runs fine for a couple of days, sometimes it crashes after one day. Link to comment Share on other sites More sharing options...
thefirstofthe300 291 Posted March 3, 2015 Share Posted March 3, 2015 Can you post a server log? http://mediabrowser.tv/community/index.php?/topic/739-how-to-report-a-problem/ Link to comment Share on other sites More sharing options...
puithove 208 Posted March 3, 2015 Author Share Posted March 3, 2015 Can you post a server log? http://mediabrowser.tv/community/index.php?/topic/739-how-to-report-a-problem/ Sure. This is the log that runs up until I checked it this morning. It was having the issue when I restarted the service at 6:36 (just a few minutes ago). The last entry in the log was at 5:35 http://pastebin.com/WigibfCz Link to comment Share on other sites More sharing options...
thefirstofthe300 291 Posted March 3, 2015 Share Posted March 3, 2015 Considering that server log was nothing but errors, I would say something is definitely wrong. Those errors I have never seen before though so this is a bit out of my depth. @@Luke Link to comment Share on other sites More sharing options...
Luke 37099 Posted March 3, 2015 Share Posted March 3, 2015 i don't see any problems there. those are all errors the server can recover from and continue on. Link to comment Share on other sites More sharing options...
Jackrats 4 Posted March 3, 2015 Share Posted March 3, 2015 I've now had 2 instances of MBS becoming unhappy with "Too many files open." Interestingly, when it happens, I can still get to the WebServer. I would have thought that the new connection would introduce an additional filehandle, but alas it responds there happily. But playing from Roku fails in this state and there are entries in the WebUI's notifications that scheduled functions are failing due to the filehandle exhaustion, such as: Cache file cleanup failed. 3 days ago Too many open files Check for plugin updates failed. 3 days ago Error: ConnectFailure (Too many open files) The couple times it happened, I did a cursory look at lsof and was amazed at the line count for MBS related open files being > 125000 files! I thought I'd poke around today while in the working state and saw 112000 items, but after looking closer, I realize that every instance of a file, library, binary, socket, etc for every MBS thread is reported there. As the filehandles are shared across all threads of the process, this was actually a very misleading count. My /proc/xxx/fd for the MBS PID shows presently only 74 FD's in use. Ulimit on my system is default at 1024 with the kernel hard limit at 4096. The next time I see a failure, I'll investigate a bit closer to see where the extra 900+ FD's have gone. Some additional details: My MBS is also running on a KVM guest running CentOS 7 on a CentOS 7 host, with a simple bridge on the host network. However, I don't see the getifaddrs noted by the OP as likely being relevant to the root cause, but rather a side effect error of not having any more FD's available to the process. Timing wise, I feel like this issue started after bringing in the Roku thumbnail functionality. Based on what logs I do have available, I upgraded to the BIF capable release on Feb 10 (3.0.5518.1-Beta.31.1.noarch) and subsequently started the thumbnail creation process. On Feb 16, I installed lsof, which I did when I first ran into the FD exhaustion. Snipped a long block that basically says: Nothing in the log files stands out, the "Too many open files" message starts out of the blue when nothing else seems to be going on. This second instance was while running the 3.0.5518.4-Beta.32.1.noarch.rpm. At this point, I think my best hope of pinning this down further is to investigate what's actually open the next time it occurs, but now I'm ready for it this time! Anyone else running into it, you'll probably want to start with: 1. Determine the PID for MediaBrowserServer ("ps -ef | grep MediaBrowserServer") 2. See what the open file descriptors are for ("ls -l /proc/PIDFROMSTEP1/fd") That will show what each FD is for. Hopefully the leak will be on a filesystem file and not a socket. At present time I have 41 files and 35 sockets open. If you see a thousand'ish socket entries, pick some of the higher numbered ones (first part of the output after the date) and use "lsof" to see what they are exactly. For example: [root@mbserver mediabrowser]# ls -l /proc/32452/fd ... snip ... lrwx------. 1 MediaBrowserServer media 64 Mar 3 11:44 63 -> socket:[12357592] That shows FD 63 is a socket identified by 12357592. [root@mbserver mediabrowser]# lsof | grep 12357592 Threadpoo 32452 15530 MediaBrowserServer 63u IPv4 12357592 0t0 TCP mbserver.mydomain.com:8096->roku.mydomain.com:46629 (ESTABLISHED) You'll actually see a ton of entries, one for each thread spawned by MBS. But they will all have the same details on the right as far as where they go and what ports are used. It might also be interesting to see how many threads are spawned. My instance currently has 452 threads. This seems particularly high to me. I just restarted by MBS service and now have 16. Link to comment Share on other sites More sharing options...
puithove 208 Posted March 4, 2015 Author Share Posted March 4, 2015 FWIW - I haven't installed the Roku thumbs plugin. I'll do a little digging on the files when it happens again (and if the wife isn't trying to watch tv at the time). Right now things are working, but the open file count does seem high. Output from lsof shows lots of NETLINK sockets like this (just a small snippet): Threadpoo 27211 29824 mediabrowser 211u sock 0,7 0t0 9684816 protocol: NETLINK Threadpoo 27211 29824 mediabrowser 212u sock 0,7 0t0 9668292 protocol: NETLINK Threadpoo 27211 29824 mediabrowser 213u sock 0,7 0t0 9686518 protocol: NETLINK Threadpoo 27211 29824 mediabrowser 214u sock 0,7 0t0 9684827 protocol: NETLINK Threadpoo 27211 29824 mediabrowser 215u sock 0,7 0t0 9723198 protocol: NETLINK Threadpoo 27211 29824 mediabrowser 216u sock 0,7 0t0 9686536 protocol: NETLINK Threadpoo 27211 29824 mediabrowser 217u sock 0,7 0t0 9721231 protocol: NETLINK Threadpoo 27211 29824 mediabrowser 218u sock 0,7 0t0 9684866 protocol: NETLINK And the fd from proc (also just a snippet): lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 06:57 210 -> socket:[9648782] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 06:57 211 -> socket:[9684816] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 06:57 212 -> socket:[9668292] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 06:57 213 -> socket:[9686518] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 06:57 214 -> socket:[9684827] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 06:57 215 -> socket:[9723198] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 06:57 216 -> socket:[9686536] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 06:57 217 -> socket:[9721231] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 06:57 218 -> socket:[9684866] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 07:00 219 -> socket:[9671237] The number here is growing as I'm sitting watching it. Looks like about 1 every 15-20 secs or so. Which actually, now that I think about it, that's reflected in the timestamps on the file: lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 07:03 220 -> socket:[9686684] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 07:03 221 -> socket:[9724578] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 07:03 222 -> socket:[9686716] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 07:03 223 -> socket:[9668351] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 07:03 224 -> socket:[9687435] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 07:04 225 -> socket:[9671312] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 07:05 226 -> socket:[9724614] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 07:05 227 -> socket:[9724618] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 07:06 228 -> socket:[9687663] lrwx------ 1 mediabrowser mediabrowser 64 Mar 4 07:06 229 -> socket:[9724654] Link to comment Share on other sites More sharing options...
puithove 208 Posted March 4, 2015 Author Share Posted March 4, 2015 After restarting of course the count in /fd for the new process is drastically lower. It seems to continue the same behavior of adding sockets a couple minutes after starting up. Link to comment Share on other sites More sharing options...
Jackrats 4 Posted March 4, 2015 Share Posted March 4, 2015 Did you by any chance look at "lsof" output to see where those sockets are going? Link to comment Share on other sites More sharing options...
puithove 208 Posted March 5, 2015 Author Share Posted March 5, 2015 As of this morning it's in the hung state again. Mediabrowser has 1024 open files under /proc/[pid]/fd So I did take a couple examples from there to get the lsof output. Examples: lrwx------ 1 mediabrowser mediabrowser 64 Mar 5 05:59 998 -> socket:[11688527] lrwx------ 1 mediabrowser mediabrowser 64 Mar 5 05:59 999 -> socket:[11693484] lsof|grep 11693484 mono 23963 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 10271 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK IO\x20Thr 23963 12789 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 14553 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 14731 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 16495 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 22438 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 23814 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Finalizer 23963 23964 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Timer-Sch 23963 23965 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 23966 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 23967 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 23969 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK mono 23963 24053 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK IO\x20Thr 23963 24054 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 24456 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 24888 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK IO\x20Thr 23963 26333 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 26594 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK Threadpoo 23963 29018 mediabrowser 999u sock 0,7 0t0 11693484 protocol: NETLINK lsof|grep 11688527 mono 23963 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 10271 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK IO\x20Thr 23963 12789 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 14553 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 14731 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 16495 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 22438 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 23814 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Finalizer 23963 23964 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Timer-Sch 23963 23965 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 23966 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 23967 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 23969 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK mono 23963 24053 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK IO\x20Thr 23963 24054 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 24456 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 24888 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK IO\x20Thr 23963 26333 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 26594 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Threadpoo 23963 29018 mediabrowser 998u sock 0,7 0t0 11688527 protocol: NETLINK Link to comment Share on other sites More sharing options...
Jackrats 4 Posted March 5, 2015 Share Posted March 5, 2015 Of course, it would be a NETLINK socket, which I am admittedly clueless about. Looks like it is a mechanism to communicate with the kernel for things like routing table updates, multicast traffic, and some other things. My guess would be it's being used for multicast for UPNP functions. Which sort of makes sense from the one instance that I was able to dig into my logs for before they rolled -- the first complainer in the logs was UPnP messages failing to go out. My system currently only has 1 NETLINK socket open. I'm trying to figure out how to dig more into the NETLINK socket, but on the next instance it might be worth looking at /proc/net/netlink: # grep PIDOFMBS /proc/link/netlink Not sure if that will yield anything useful or not yet, but it's the most I've found so far for looking into NETLINK sockets. Also trying to see if GDB might give anything useful on them. Link to comment Share on other sites More sharing options...
Jackrats 4 Posted March 5, 2015 Share Posted March 5, 2015 So, actually, in reading further on netlink, grep'ing to the PID won't give the goods. The PID is only used as the reference for the first netlink socket for a given process: nl_pid is the unicast address of netlink socket. It's always 0 if the destination is in the kernel. For a user-space process, nl_pid is usually the PID of the process owning the destination socket. However, nl_pid identifies a netlink socket, not a process. If a process owns several netlink sockets, then nl_pid can be equal to the process ID only for at most one socket. There are two ways to assign nl_pid to a netlink socket. If the application sets nl_pid before calling bind(2), then it is up to the application to make sure that nl_pid is unique. If the application sets it to 0, the kernel takes care of assigning it. The kernel assigns the process ID to the first netlink socket the process opens and assigns a unique nl_pid to every netlink socket that the process subsequently creates. So, it's probably worth just cat'ing the entire /proc/net/netlink file. My system currently only has 38 entries, and my MBS currently only has 1 NETLINK socket open. The "Groups" column appears to decode to the type, here mine is 11 for the NETLINK_CONNECTOR type. The columns don't line up well. [root@mbserver mediabrowser]# cat /proc/net/netlink sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode ffff88013971f800 0 32485 00000011 0 0 0000000000000000 2 0 9807871 Also, looks like I typo'ed in my previous port -- it's /proc/net/netlink not /proc/link/netlink. Link to comment Share on other sites More sharing options...
puithove 208 Posted March 6, 2015 Author Share Posted March 6, 2015 I didn't know anything about NETLINK either when looking at it this morning, but did a quick google and turned up basically what you mention above. I'll try to look a bit more when I can. Interestingly I do remember seeing in my logs the complaints about UPNP as well, so that's definitely worth looking a little further into. Link to comment Share on other sites More sharing options...
Jackrats 4 Posted March 6, 2015 Share Posted March 6, 2015 It looks like the "ss" command that comes with the iproute package also shows some info on netlink sockets: ss -a -f netlink Link to comment Share on other sites More sharing options...
puithove 208 Posted March 8, 2015 Author Share Posted March 8, 2015 Well this is interesting. Restarted the mediabrowser service yesterday because it was unresponsive. Took a look at the file descriptors right after the restart and it was running around 50 open files/sockets. This morning, all is well, and I looked at the count again - it's still right at 50. The only thing I can think of that would be different is that I did a full library refresh yesterday afternoon (for completely separate reasons - it seemed a lot of my metadata & images had gone missing). Looking at the server log now, the majority of the errors shown in the log I posted previously seem to be gone. I'm not yet ready to say that's the cure here, but others having the issue might want to try and see if that helps. For each of my sections, in Metadata Manager, I did an Advanced Refresh, Refresh all data / Replace Images. Then after that was complete, ran a refresh in the auto box sets plugin. Link to comment Share on other sites More sharing options...
puithove 208 Posted March 9, 2015 Author Share Posted March 9, 2015 Still running, haven't had to restart it. Open files still at 50. Link to comment Share on other sites More sharing options...
Luke 37099 Posted March 9, 2015 Share Posted March 9, 2015 Well done Link to comment Share on other sites More sharing options...
puithove 208 Posted March 14, 2015 Author Share Posted March 14, 2015 So after having to reboot the server, this issue is back. Seems that "good" run may have been a fluke, and I probably should have restarted the service to check it more thoroughly. Same symptoms as before - number of open sockets slow grows (leaks) until it reaches the limit of open files (1024). Oddly enough, I did again observe quite a few items in my library that were missing images like before. These are items that I KNOW had images downloaded before, had gone missing before, and were re-downloaded when I did the last refresh. I've refreshed everything again, but the socket leak issue is still happening this time. Can't say that it's related, but definitely odd. Link to comment Share on other sites More sharing options...
Jackrats 4 Posted March 14, 2015 Share Posted March 14, 2015 Were those items missing images after restarting the service (ie post restart to free up the used filehandles)? Or did you notice them missing while MB3 was in the "Too many open filehandles" state? Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now