
entire VM machine crashing now



Posted

Hi, over the past 2 days my server, and I mean the ENTIRE server, has crashed to the point where I can't log in to my hypervisor and a hard reset is required. (Emby runs in a VM, alongside the storage server, router, etc.) I have attached the most recent log from before it crashed. I'm hoping someone can tell me it's not Emby so I can move on to the next piece of the puzzle. The weird thing is that it has happened twice, and both times the server was transcoding a video.

Emby server log 161106 rev2.txt

Posted

Server log looks OK, nothing out of the ordinary to me.

Posted

Thank you. I also noticed my NTP config got out of whack, and the Emby server and basically my entire network were showing a time about 13 hours off from the actual time. Could that have screwed something up, since I imagine the client had the right time?

Posted

Last question. When Emby is transcoding, is it normal for it to make around 100 files per viewing?

 

  58673c0e9e1a5_embytranscodingfiles.png

Posted

The time probably won't make a difference, and yes, for some apps we split the video into segments when transcoding.

Posted

Thanks Luke. I guess I'll just wait and see if I can diagnose it when it happens again. I moved the transcoding folder to a new disk so it wouldn't be writing to the same disk that the rest of the VMs write to; hopefully that frees something up. Otherwise I'm going to have a little bit of an issue lol. Happy new year!

Posted

Hey Luke, so I'm able to reproduce this issue on a regular basis now. Streaming anything over a "long distance" crashes the server. I can stream 6 things at once with transcoding with no issue, in the house and out of the house, but as soon as I stream to one location that's geographically further away, the server suddenly crashes in the middle of a stream... I don't see anything in the logs, so should I turn on debug logging? It's a pretty big issue now as it crashes the entire host, not just the Emby service.

 

Sent from my VS987 using Tapatalk

puithove
Posted

Curious what hypervisor you're using.

Posted

VMware ESXi 6

 


puithove
Posted

Also, to be blunt - it should be near impossible for an application-specific problem inside a fully virtualized OS to be able to crash the host OS.  My off-the-cuff thought would be that something else is going on.  Not an issue with Emby server crashing everything, more like it creating load on the box that ends up triggering the actual issue that happens at a higher level.

 

First thought would be a hardware issue... like under the stress of transcoding, chip temp spikes to the point where you have a fault.  Especially if you've overclocked or anything.  But I've seen something kinda similar to what you're talking about under Qemu/KVM that ended up being an issue with how CPU instruction sets were handled.

puithove
Posted


 

Yea, so if you're crashing ESXi, you should probably be looking at something other than Emby as being the cause.

 

Try a different way to generate load in that VM instead of transcoding.  Something like mprime's stress test might be good.

 

Then spin up a new VM on the same host and see if you can do it there.
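
If mprime isn't handy, a rough CPU load generator along the same lines can be sketched in a few lines of Python. This is only an illustration, assuming Python 3 is installed in the VM; the function names `burn` and `stress` and the 10-second default are made up for this sketch, and it is much gentler on the hardware than a real torture test like mprime.

```python
import multiprocessing as mp
import sys
import time
from typing import List, Optional


def burn(seconds: float) -> int:
    """Keep one core busy with arithmetic for roughly `seconds`.

    Returns the number of loop iterations completed.
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    x = 0.0001
    while time.monotonic() < deadline:
        # Cheap floating-point work; the result is irrelevant, only the load matters.
        x = (x * x + 0.5) % 1.0
        iterations += 1
    return iterations


def stress(seconds: float, workers: Optional[int] = None) -> List[int]:
    """Run one burn() per worker process (default: one per CPU core)."""
    workers = workers or mp.cpu_count()
    with mp.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)


if __name__ == "__main__":
    # Duration in seconds comes from the first argument; 10 is an arbitrary default.
    duration = float(sys.argv[1]) if len(sys.argv) > 1 else 10.0
    counts = stress(duration)
    print(f"{len(counts)} workers finished, {sum(counts)} iterations total")
```

Running it for several minutes in the existing VM and then in a fresh VM on the same host would show whether plain CPU load, without any transcoding, is enough to take the host down.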

puithove
Posted

Does the console actually crash, or are things just hung up where it drops off the network?  Are you still able to ping the management network address?

 

Are you doing anything wacky on the ESXi side like NIC teaming?  Is the VM's OS disk stored on local storage, or is that on the storage server going over iSCSI / NFS?
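
To pin down exactly when the host drops off the network, a small watcher can be left running on another machine in the same subnet. The sketch below is an assumption-laden illustration: it uses a TCP connect to port 443 (where ESXi serves its management interface) rather than an ICMP ping, and the names `is_reachable` and `watch` are made up for this example.

```python
import socket
import time
from datetime import datetime
from typing import List, Tuple


def is_reachable(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def watch(host: str, port: int = 443, interval: float = 5.0,
          checks: int = 12) -> List[Tuple[str, bool]]:
    """Probe `checks` times and record a timestamp whenever the state changes."""
    changes: List[Tuple[str, bool]] = []
    last = None
    for i in range(checks):
        state = is_reachable(host, port)
        if state != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host}:{port} {'up' if state else 'DOWN'}")
            changes.append((stamp, state))
            last = state
        if i < checks - 1:
            time.sleep(interval)
    return changes
```

For example, `watch("192.168.1.10", checks=720)` (the address is a placeholder for the management IP) would probe for about an hour and print a timestamped line each time the host goes down or comes back, which can then be lined up against when a remote stream was playing.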

Posted


Console becomes unresponsive at the machine. I haven't tried to trigger an NMI yet; that will be next. I don't think it's a hardware issue because it's only been happening when I think the ping is higher. Bandwidth is fine between the client and server... My pfSense VM dies too, so my whole network goes down. Regardless, I'm in the same subnet as the host and still cannot ping it, so the ESXi host is definitely hung up...

 

As for load, I did a load test by transcoding 4 things at once and direct streaming 2, yesterday...

 

The host has 2 SSDs installed as local datastores. The VMs are all installed on SSD 1, and Emby has a second datastore on SSD 2 for transcoding temp files. That's the only thing on SSD 2... The host OS (ESXi) is installed on a SanDisk USB drive plugged into the internal USB port on the Supermicro X10SL7-F motherboard.

 

Also using 32 GB of ECC DDR3-1600 memory.

 


Posted


No NIC teaming. It's been working fine for so long; it just started crashing the host about a week ago. When issues came up in the past, the Emby service would hang, but never the whole host.

 


puithove
Posted

Interesting.  Anything in the host's logs from vmkernel?
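
On ESXi the vmkernel log normally lives at /var/log/vmkernel.log. Once a copy has been pulled off the host, a quick filter can surface lines worth reading first. The following is a minimal sketch, not an official tool: the keyword list is a guess at what might be interesting, and the names `suspect_lines` and `scan_file` are invented for this example.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed keywords; adjust to whatever turns out to be relevant.
KEYWORDS = ("error", "warning", "fail", "timeout")


def suspect_lines(lines: Iterable[str],
                  keywords: Tuple[str, ...] = KEYWORDS) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for lines containing any keyword, ignoring case."""
    lowered = tuple(k.lower() for k in keywords)
    for number, line in enumerate(lines, start=1):
        text = line.lower()
        if any(k in text for k in lowered):
            yield number, line.rstrip("\n")


def scan_file(path: str,
              keywords: Tuple[str, ...] = KEYWORDS) -> List[Tuple[int, str]]:
    """Scan a copied log file and return every matching line with its number."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(suspect_lines(handle, keywords))
```

Calling `scan_file("vmkernel.log")` on the copied file returns the matching lines with their line numbers, so the entries just before the last hang are easy to find.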

Posted


Well, I haven't actually checked any of the host's logs because I haven't been entirely sure how to do that haha

 


Posted

I appreciate all the help you have given me so far in diagnosing the issue. Weirdly, I have not been able to reproduce it over the past 48 hours.

 

I went and bought a new set of 32 GB ECC RAM and a new mechanical hard drive (in case I need to move the hypervisor off the USB thumb drive), and I've been waiting 2 days for the host to crash, but it hasn't done it yet. It's like a watched pot that never boils. Last night I had the 2 devices that are far away geographically try to crash the server by playing live TV for over an hour, and nothing broke, so I can't even reproduce the issue at this time.

 

I did make one change, however. Although I gave the Emby VM a 27% share of my VM processing power and 16 GB of reserved RAM, I never gave it any reserved CPU. So I changed the CPU reservation to 3000 MHz in case it's hanging when the CPU scheduler goes to queue up the VM's instructions. If it crashes again, I am going to scale the router back from 4 CPU cores to 2, because I barely ever break 10% CPU load even when I have the whole world funneling through it haha.

 

These kinds of sort-of-random issues are the worst. I am going to monitor this closely and tweak as necessary. Thanks again for all your help, and I will definitely report back as soon as I know more.

puithove
Posted

Yea, that can be frustrating.  Good luck.
