funkypenguin 0 Posted October 17, 2018

Hey guys,

I'm trying to debug the occasional (but horrible) system crash I get with nVidia NVRM Xid 31s (https://devtalk.nvidia.com/default/topic/1042835/linux/nvidia-docker-based-host-hangs-when-gpu-memory-exceeded-with-ffmpeg-transcodes/post/5289940/?offset=10#5290457 and https://trac.ffmpeg.org/ticket/7012), and I'd like to try altering the ffmpeg compile options used for the docker container.

I've spent some time poking around through https://github.com/MediaBrowser/Emby.Build, but I haven't found how the emby-specific ffmpeg binary is added to the image:

    /bin/ffmpeg -v
    ffmpeg version 4.0.2-emby_2018_09_13 Copyright (c) 2000-2018 the FFmpeg developers
    built with gcc 6.3.0 (crosstool-NG crosstool-ng-1.23.0)

Any suggestions on how I can tweak the compile options here?

Thanks!
D
Luke 42077 Posted October 17, 2018
We don't have a documented process for this. What did you want to change?
funkypenguin 0 Author Posted October 17, 2018
Well, I'm not really sure TBH. It seems unlikely that multiple generations of nVidia drivers would carry such a significant bug, and only one other ffmpeg user has reported a similar problem. It seems that the ffmpeg bundled with the container might be the issue, so I thought I'd start by removing the unused options (I only care about nvenc for my GPU arch, for example), and go from there.
funkypenguin 0 Author Posted October 17, 2018
The best summary is this:

    root@docker1:# cat nvidia-bug-report.log | grep Xid
    Oct 16 17:14:02 docker1 kernel: NVRM: Xid (PCI:0000:03:00): 31, Ch 00000030, engmask 00008100, intr 10000000
    Oct 16 17:14:02 docker1 kernel: NVRM: Xid (PCI:0000:03:00): 68, CCMDs 00000030 0000c2b0
    [17264.379746] NVRM: Xid (PCI:0000:03:00): 31, Ch 00000030, engmask 00008100, intr 10000000
    [17264.535490] NVRM: Xid (PCI:0000:03:00): 68, CCMDs 00000030 0000c2b0
    root@docker1:#

But there's more detail in the devtalk.nvidia thread here: https://devtalk.nvidia.com/default/topic/1042835/linux/nvidia-docker-based-host-hangs-when-gpu-memory-exceeded-with-ffmpeg-transcodes/post/5289940/?offset=10#5290457
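For anyone triaging similar logs, the Xid lines can be tallied by code with a few lines of Python. This is a minimal sketch, not NVIDIA tooling: the regular expression is inferred only from the log lines quoted in the post above, and the names `XidEvent`, `parse_xid`, and `count_xids` are made up for illustration.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional


class XidEvent(NamedTuple):
    pci: str     # PCI address as printed by the driver, e.g. "0000:03:00"
    code: int    # Xid number, e.g. 31
    detail: str  # remainder of the message after the Xid number


# Matches the "NVRM: Xid (PCI:<addr>): <code>, <detail>" part of a kernel log
# line. The shape is an assumption based on the lines quoted in this thread.
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+), (.*)")


def parse_xid(line: str) -> Optional[XidEvent]:
    """Return the Xid event found in a log line, or None if there is none."""
    m = XID_RE.search(line)
    if m is None:
        return None
    return XidEvent(m.group(1), int(m.group(2)), m.group(3).strip())


def count_xids(lines: Iterable[str]) -> Dict[int, int]:
    """Count how many times each Xid code appears in the given lines."""
    counts: Dict[int, int] = {}
    for line in lines:
        event = parse_xid(line)
        if event is not None:
            counts[event.code] = counts.get(event.code, 0) + 1
    return counts
```

Feeding it the lines of nvidia-bug-report.log (for example `count_xids(open("nvidia-bug-report.log", errors="replace"))`) gives a per-code tally, which makes it easier to see whether Xid 31 always arrives together with Xid 68.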
Luke 42077 Posted October 17, 2018
Can you attach the ffmpeg log from Emby? Thanks.
funkypenguin 0 Author Posted October 17, 2018
Attaching 2 files: one is the ffmpeg log from Emby, which coincides with one of the [defunct] ffmpeg processes at the time of the fault; the other is my (failed) attempt to recreate the fault by re-executing the same ffmpeg command while exec'd into the container (the transcode seemed to work perfectly when I retried it).
failed_transcode.log
attempt_rerun_failed_transcode.log
Luke 42077 Posted October 18, 2018
Have you verified you can transcode in Emby with the CPU?
funkypenguin 0 Author Posted October 18, 2018
Yes, I'm 100% certain I can transcode with the CPU.
Luke 42077 Posted October 18, 2018
Meaning you've tried it? Your failed log is interesting in that there's no error message. It appears to just be stalled for whatever reason.
funkypenguin 0 Author Posted October 18, 2018
Meaning I routinely turn off GPU transcoding and use CPU transcoding. I agree the transcode seems to just "stall", and my theory is that I'm maxing out the GPU RAM (which aligns with the Xid 31/memory page error), but instead of erroring (as I'd hoped it would), ffmpeg causes all I/O to hang, which requires a hard reboot to recover from (same as in https://trac.ffmpeg.org/ticket/7012). But this is all conjecture, which is why I'm hoping to be able to build custom ffmpeg binaries and run further tests.
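One way to test the GPU-RAM theory is to log memory use while transcodes run and see whether a hang follows a near-full reading. The sketch below assumes `nvidia-smi` is on the PATH and relies on its `--query-gpu` CSV output; the function names and the 90% threshold are arbitrary choices for illustration, not anything Emby or NVIDIA ships.

```python
import subprocess
from typing import List, Tuple

# nvidia-smi prints one "used, total" line per GPU (values in MiB) for this query.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text: str) -> List[Tuple[int, int]]:
    """Parse "used, total" lines into (used_mib, total_mib) tuples.

    Assumes every field is a plain integer; a field such as "[N/A]"
    would raise ValueError here.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpus_near_limit(rows: List[Tuple[int, int]], threshold: float = 0.9) -> List[int]:
    """Return the indexes of GPUs whose used/total ratio is at or above threshold."""
    return [
        index
        for index, (used, total) in enumerate(rows)
        if total > 0 and used / total >= threshold
    ]


def sample_gpu_memory() -> List[Tuple[int, int]]:
    """Run nvidia-smi once and return the parsed memory readings."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Calling `gpus_near_limit(sample_gpu_memory())` from a loop with a short sleep, and appending each reading to a log file, may leave enough of a trail to compare against the Xid timestamps after a reboot.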
Luke 42077 Posted October 18, 2018
Well, at the beginning of the server log you can see the startup command line. So in theory you could run the server with an alternate command line that points to different ffmpeg and ffprobe executables.
funkypenguin 0 Author Posted October 18, 2018
I’ve been working on that, but ran into library / dependency issues. Should I be trying to compile for Alpine or SUSE? I'm just working my way backwards, and I'm curious how to compile the most compatible version.
Luke 42077 Posted October 18, 2018
I'll have to check with our developer, but I think it's Alpine based.
funkypenguin 0 Author Posted October 24, 2018
Hey Luke,

Any response from the developer? I've tried compiling for Alpine, Tumbleweed, and Ubuntu, but I still run into libc dependency issues. It'd help if I knew how the latest images were compiled; they don't seem to be built from https://github.com/MediaBrowser/Emby.Build/tree/master/docker-containers/base anymore.

Thanks!
D
Luke 42077 Posted October 25, 2018
They're not based on either of them:

    FROM ${ARCH}/busybox:latest
funkypenguin 0 Author Posted October 26, 2018
I'm 10 days in now, and _so_ close to solving this. Can you share how you compiled / sourced the ffmpeg that's bundled with the container? I've learned that statically compiling ffmpeg against nvidia drivers is a non-trivial task!
funkypenguin 0 Author Posted November 26, 2018
Hey guys,

Any hints here? I feel like I've almost got this. The problem is that the container seems to be running on musl and not glibc, so the ffmpeg I'm compiling doesn't work properly. Can you point me to how the ffmpeg binaries / libraries end up in the container?

Thanks!
D
Luke 42077 Posted November 26, 2018
Have you checked this out? https://developer.nvidia.com/ffmpeg We followed their guide there. This was also helpful: https://gist.github.com/Brainiarc7/988473b79fd5c8f0db54b92ebb47387a
funkypenguin 0 Author Posted November 26, 2018
Yes, and you'll see some comments from me on the gist thread in Oct. I've even built myself a Dockerfile which produces a static version (https://github.com/funkypenguin/ffmpeg-static).

I guess my question is just this: what's the underlying OS used to build the ffmpeg in the container? When I build a version which works on Ubuntu 16.04, it fails to run in the container due to a missing libc.

Thanks!
D
funkypenguin 0 Author Posted November 27, 2018
Here's an example. Maybe I've missed something fundamental.

I built ffmpeg in nvidia's ubuntu1804 cuda container:

    root@f7d037ae7fb2:/# /root/ffmpeg-build-static-binaries/bin/ffmpeg
    ffmpeg version N-92418-gee47ac97d7 Copyright (c) 2000-2018 the FFmpeg developers
    built with gcc 7 (Ubuntu 7.3.0-27ubuntu1~18.04)
    configuration: --pkg-config-flags=--static --prefix=/root/ffmpeg-build-static-binaries --bindir=/root/ffmpeg-build-static-binaries/bin --extra-cflags='-I /root/ffmpeg-build-static-binaries/include -I /usr/local/cuda/include/' --extra-ldflags='-L /root/ffmpeg-build-static-binaries/lib -L /usr/local/cuda/lib64/' --extra-libs=-lpthread --enable-cuda-sdk --enable-cuvid --enable-libnpp --enable-gpl --enable-libass --enable-libfdk-aac --enable-vaapi --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-nonfree --enable-nvenc
    libavutil      56. 23.101 / 56. 23.101
    libavcodec     58. 39.100 / 58. 39.100
    libavformat    58. 22.100 / 58. 22.100
    libavdevice    58.  6.100 / 58.  6.100
    libavfilter     7. 43.100 /  7. 43.100
    libswscale      5.  4.100 /  5.  4.100
    libswresample   3.  4.100 /  3.  4.100
    libpostproc    55.  4.100 / 55.  4.100
    Hyper fast Audio and Video encoder
    usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...

    Use -h to get full help or, even better, run 'man ffmpeg'
    root@f7d037ae7fb2:/#

But when I take that binary and try to run it within the emby container, I get this:

    [root@kvm ~]# docker run -t --rm -i -v /tmp:/tmp --entrypoint=/bin/ash emby/embyserver
    / # /tmp/ffmpeg
    /bin/ash: /tmp/ffmpeg: not found
    / #

So I tried to show which libraries/dependencies were missing:

    / # LD_TRACE_LOADED_OBJECTS=1 /tmp/ffmpeg
    /bin/ash: /tmp/ffmpeg: not found
    / #

The "not found" error (even though the file clearly exists) seems to be tied to gcc or stdlib dependencies.

Can you give me any further hints re how the ffmpeg is compiled, or how it gets into the emby container based on busybox?

Thanks!
D

Edited November 27, 2018 by funkypenguin
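A likely explanation for "not found" on a file that exists: when the kernel cannot find the dynamic loader named in a binary's ELF PT_INTERP program header, exec fails with ENOENT, and the shell reports that error against the binary itself. A glibc build on x86-64 normally names /lib64/ld-linux-x86-64.so.2, which a busybox or musl based image would not be expected to ship. The sketch below reads that header so the loader path can be checked against the container's filesystem. It handles 64-bit ELF only, and `elf_interpreter` is a hypothetical helper name, not part of any library.

```python
import struct
from typing import Optional

PT_INTERP = 3  # program header type that names the dynamic loader


def elf_interpreter(data: bytes) -> Optional[str]:
    """Return the PT_INTERP path of a 64-bit ELF image.

    Returns None when the image has no PT_INTERP entry, as with a fully
    static build. Raises ValueError for non-ELF or 32-bit input.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2:  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        raise ValueError("only 64-bit ELF is handled in this sketch")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian

    # ELF64 header: e_phoff at byte 32, e_phentsize and e_phnum at byte 54.
    (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)

    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, base)
        if p_type == PT_INTERP:
            # ELF64 program header: p_offset at +8, p_filesz at +32.
            (p_offset,) = struct.unpack_from(endian + "Q", data, base + 8)
            (p_filesz,) = struct.unpack_from(endian + "Q", data, base + 32)
            raw = data[p_offset:p_offset + p_filesz]
            return raw.split(b"\x00", 1)[0].decode("ascii")
    return None
```

For example, `elf_interpreter(open("/tmp/ffmpeg", "rb").read())` returns the loader path the binary expects. If that path does not exist inside the container, the binary cannot start there regardless of which shared libraries are present, which would also explain why `LD_TRACE_LOADED_OBJECTS=1` produced the same error.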
Luke 42077 Posted November 27, 2018
In case it helps, we are using Arch Linux.
funkypenguin 0 Author Posted November 27, 2018
Do you use Arch for the OS of the docker container too? I tried dropping the ffmpeg from Arch Linux into the latest stable container, and I get the same libc error. Is the Dockerfile used to build the latest image publicly available somewhere? I just want to drop in a replacement ffmpeg build, in the hopes that it'll prevent my Xid crash issue.
Luke 42077 Posted November 27, 2018
"Do you use arch for the OS of the docker container too?"
Yes.