softworkz 3326 Posted February 15, 2019 (edited)
In the slower case, a scaling filter was applied that wasn't applied with hardware acceleration (hwa) enabled and wasn't even necessary. This is probably a small bug in the old version, or the file being played wasn't analyzed correctly (or not yet), which made Emby add the scaling to limit the image size. Unfortunately, you didn't use the same file for both logs. You may try this comparison with 4.0 and see what happens, but please keep using the same file for testing if you want to post more logs for comparison.
Edited February 15, 2019 by softworkz
ken-ji 0 Posted February 19, 2019 Author
OK. I tried it with the same file as in the previous 3.5.3 log, and there seems to be a 5 fps disparity. I guess there's no real solution for me at this point other than changing processors. Thanks for being patient with me.
ken-ji 0 Posted February 28, 2019 Author (edited)
I had a bizarre idea and took a look at the ffmpeg command-line options, and I noticed this:
-filter_complex "[0:1]format=nv12|vaapi,hwupload,scale_vaapi,hwmap=mode=read+write+direct,format=nv12...
I was wondering whether we could allow copies to be made rather than forcing direct mode:
-filter_complex "[0:1]format=nv12|vaapi,hwupload,scale_vaapi,hwmap,format=nv12
It seems to run considerably faster in this case:
(read+write) frame= 550 fps= 62 q=-0.0 Lsize=N/A time=00:00:23.38 bitrate=N/A speed=2.
vs
(read+write+direct) frame= 546 fps= 22 q=-0.0 size=N/A time=00:00:23.33 bitrate=N/A speed=0.941x
Is there any way we can force or tweak these options? I'd like to see whether there are any compatibility issues...
Edited February 28, 2019 by ken-ji
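To compare runs like the two above without reading the numbers off by eye, the progress lines can be parsed. The following is only a minimal Python sketch: the function name parse_progress is made up for illustration, and it assumes the progress line has the same shape as the two lines quoted in this post. A truncated speed value such as "speed=2." is reported as None rather than guessed.

```python
import re


def parse_progress(line):
    """Pull frame count, fps and speed out of one ffmpeg progress line.

    Hypothetical helper for illustration; fields that are missing or
    truncated in the pasted line come back as None.
    """
    frame = re.search(r"frame=\s*(\d+)", line)
    fps = re.search(r"fps=\s*([\d.]+)", line)
    # Require the trailing "x" so a cut-off value like "speed=2." is ignored.
    speed = re.search(r"speed=\s*([\d.]+)x", line)
    return {
        "frame": int(frame.group(1)) if frame else None,
        "fps": float(fps.group(1)) if fps else None,
        "speed": float(speed.group(1)) if speed else None,
    }


# The two progress lines quoted in the post.
copy_mode = "frame= 550 fps= 62 q=-0.0 Lsize=N/A time=00:00:23.38 bitrate=N/A speed=2."
direct_mode = "frame= 546 fps= 22 q=-0.0 size=N/A time=00:00:23.33 bitrate=N/A speed=0.941x"

a = parse_progress(copy_mode)
b = parse_progress(direct_mode)
ratio = a["fps"] / b["fps"]
print(f"read+write is about {ratio:.1f}x the fps of read+write+direct")
```

With the figures from the post, this reports read+write at roughly 2.8 times the frame rate of read+write+direct.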
Luke 37025 Posted February 28, 2019
@softworkz
softworkz 3326 Posted February 28, 2019
Could you please post the full command line? I need to see the context.
softworkz 3326 Posted February 28, 2019
Generally speaking, there is room for improvement, specifically regarding filter chain creation. But there is an enormous number of variations that we need to cover, and certain things may work in one case but not in another. The good news is that this is an area where improvements are planned for the near future.
ken-ji 0 Posted March 1, 2019 Author
/bin/ffmpeg -init_hw_device vaapi=vad0:/dev/dri/renderD128 -filter_hw_device vad0 -f matroska -i file:"/mnt/user/Media/Anime Series/Mahouka Koukou no Rettousei/[Doki] Mahouka Koukou no Rettousei - 16 (1920x1080 Hi10P BD FLAC) [F97267AB].mkv" -threads 0 -map 0:1 -map 0:2 -c:v:0 h264_vaapi -copyts -filter_complex "[0:1]format=nv12|vaapi,hwupload,scale_vaapi,hwmap=mode=read+write+direct,format=nv12,subtitles='/mnt/user/Media/Anime Series/Mahouka Koukou no Rettousei/[Doki] Mahouka Koukou no Rettousei - 16 (1920x1080 Hi10P BD FLAC) [F97267AB].mkv:si=0':force_style='FontName=Droid Sans Fallback':fontsdir='/config/fonts',hwmap" -b:v:0 4451487 -maxrate 4451487 -bufsize 8902974 -profile high -level 4.1 -look_ahead 0 -force_key_frames "expr:if(isnan(prev_forced_t),eq(t,t),gte(t,prev_forced_t+3))" -vsync -1 -codec:a:0 aac -strict experimental -metadata:s:a:0 language=jpn -disposition:a:0 default -ac:a:0 2 -ab:a:0 192000 -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3 -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "/transcoding/transcoding-temp/c73fab6ff193645686d40416929f3d27.m3u8" -y "/transcoding/transcoding-temp/c73fab6ff193645686d40416929f3d27%d.ts"
softworkz 3326 Posted March 1, 2019
Thanks for the line. So the hwmap is there to allow subtitle burn-in. The direct mode is actually meant to avoid copying frames between system and GPU memory. Before we look into this any further: in your second example, without the direct mode option, have you watched the output file and checked whether the subtitles are actually burnt into the video?
ken-ji 0 Posted March 1, 2019 Author
Yes. Otherwise I wouldn't have reported it.
ken-ji 0 Posted December 21, 2019 Author
@Luke @softworkz Just wondering where we are with tweaking this ffmpeg hwmap option, which seems to be the cause of the poor performance of subtitle burn-in. I will admit my testing is limited to an Intel GPU via Quick Sync, but wouldn't it make sense to disable the direct hwmap option on Intel GPUs, or to allow it to be disabled so we can see the effects?
Luke 37025 Posted December 21, 2019
Hi, yes, there is room for improvement, and we are working on it. Thanks for the feedback.
softworkz 3326 Posted December 22, 2019
What do you mean by "disable the direct hwmap"? What alternative would you suggest? These are the possible variants I can think of:
* Download/overlay/upload: We could download the video frames from the GPU, perform the overlay, then upload them back for encoding. Last time I tested (and it is also obvious), this is slower than hwmap because all frames are copied from GPU to CPU memory and back to GPU memory after processing (overlay).
* HWMAP: This avoids copying the video frames by mapping GPU memory to CPU memory. This variant is only possible when the GPU uses shared system memory (e.g. onboard graphics), where the memory is physically the same (and when the GPU supports that special mode). The subtitles are burnt in by the subtitles filter as if it were overlaying local video frames.
* Have a subtitles filter that creates the text as a video of transparent images and uploads that to the GPU, then do the overlay in hardware. As nice as that sounds, afaik the subtitles filter does not support this, because it relies on having the original video frames for synchronizing the timing, so it wouldn't work when overlaying onto an empty video. (That approach might work for graphical subtitles, though.)
* Create a modified subtitles filter that can work with and synchronize with hardware frames. Sounds easy, but it will require significant work.
If you have any better idea...
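The first two variants in the list differ only in which filters surround the subtitles filter. As a rough illustration, here is a minimal Python sketch that builds both chains as strings. The function name burn_in_chain and the file name are hypothetical. The hwmap chain is modelled on the scale_vaapi/hwmap chains quoted in this thread; the download/upload chain is an assumption about what such a chain could look like, not the chain Emby actually generates. All filter names used (scale_vaapi, hwmap, hwdownload, hwupload, format, subtitles) are existing ffmpeg filters.

```python
def burn_in_chain(subtitle_path, variant, direct=True):
    """Build a subtitle burn-in filter chain for VAAPI frames (sketch only).

    variant: "hwmap" maps GPU frames into CPU memory,
             "download_upload" copies them to system memory and back.
    No escaping of special characters in subtitle_path is attempted.
    """
    subs = f"subtitles='{subtitle_path}':si=0"
    if variant == "hwmap":
        mode = "read+write+direct" if direct else "read+write"
        # Map to system memory, draw the subtitles, map back to the GPU.
        steps = ["scale_vaapi", f"hwmap=mode={mode}", "format=nv12", subs, "hwmap"]
    elif variant == "download_upload":
        # Assumed layout: copy frames down, draw the subtitles, copy them up.
        # hwupload needs a hardware device to be available to the filter graph.
        steps = ["scale_vaapi", "hwdownload", "format=nv12", subs,
                 "format=nv12", "hwupload"]
    else:
        raise ValueError(f"unknown variant: {variant}")
    return ",".join(steps)


# "input.mkv" is a placeholder file name.
print(burn_in_chain("input.mkv", "hwmap"))
print(burn_in_chain("input.mkv", "hwmap", direct=False))
print(burn_in_chain("input.mkv", "download_upload"))
```

The third and fourth variants would need changes to the subtitles filter itself, so they cannot be expressed by rearranging existing filters this way.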
ken-ji 0 Posted December 23, 2019 Author (edited)
@softworkz I meant this actual command:
/bin/ffmpeg -loglevel +timing -y -copyts -start_at_zero -f matroska,webm -hwaccel:0 vaapi -hwaccel_device:0 /dev/dri/renderD128 -hwaccel_output_format:0 vaapi -i "/mnt/user/Media/Anime/R/R-15/Show/[Kira-Fansub] R-15 - 01 (BD 1080p h264 FLAC) [2A134FA2].mkv" -filter_complex "[0:0]scale_vaapi,hwmap=mode=read+write+direct,format=nv12,subtitles='/mnt/user/Media/Anime/R/R-15/Show/[Kira-Fansub] R-15 - 01 (BD 1080p h264 FLAC) [2A134FA2].mkv':si=0:force_style='FontName=Droid Sans Fallback':fontsdir='/config/fonts',hwmap" -map 0:0 -map 0:1 -sn -c:v:0 h264_vaapi -b:v:0 4476127 -g:v:0 72 -maxrate:v:0 4476127 -bufsize:v:0 8952254 -sc_threshold:v:0 0 -keyint_min:v:0 72 -profile:v:0 high -level:v:0 4.1 -c:a:0 aac -ab:a:0 192000 -ac:a:0 2 -metadata:s:a:0 language=jpn -disposition:a:0 default -max_delay 5000000 -avoid_negative_ts disabled -f segment -map_metadata -1 -map_chapters -1 -segment_format mpegts -segment_list /transcode/transcoding-temp/9ac694d4d41f324c4c78dd2383f0bb2c.m3u8 -segment_list_type m3u8 -segment_time 3 -segment_start_number 0 -individual_header_trailer 0 -segment_write_temp 1 "/transcode/transcoding-temp/9ac694d4d41f324c4c78dd2383f0bb2c%d.ts"
I would like to be able to run it like this:
/bin/ffmpeg -loglevel +timing -y -copyts -start_at_zero -f matroska,webm -hwaccel:0 vaapi -hwaccel_device:0 /dev/dri/renderD128 -hwaccel_output_format:0 vaapi -i "/mnt/user/Media/Anime/R/R-15/Show/[Kira-Fansub] R-15 - 01 (BD 1080p h264 FLAC) [2A134FA2].mkv" -filter_complex "[0:0]scale_vaapi,hwmap=mode=read+write,format=nv12,subtitles='/mnt/user/Media/Anime/R/R-15/Show/[Kira-Fansub] R-15 - 01 (BD 1080p h264 FLAC) [2A134FA2].mkv':si=0:fontsdir='/config/fonts',hwmap" -map 0:0 -map 0:1 -sn -c:v:0 h264_vaapi -b:v:0 4476127 -g:v:0 72 -maxrate:v:0 4476127 -bufsize:v:0 8952254 -sc_threshold:v:0 0 -keyint_min:v:0 72 -profile:v:0 high -level:v:0 4.1 -c:a:0 aac -ab:a:0 192000 -ac:a:0 2 -metadata:s:a:0 language=jpn -disposition:a:0 default -max_delay 5000000 -avoid_negative_ts disabled -f segment -map_metadata -1 -map_chapters -1 -segment_format mpegts -segment_list /transcode/transcoding-temp/9ac694d4d41f324c4c78dd2383f0bb2c.m3u8 -segment_list_type m3u8 -segment_time 3 -segment_start_number 0 -individual_header_trailer 0 -segment_write_temp 1 "/transcode/transcoding-temp/9ac694d4d41f324c4c78dd2383f0bb2c%d.ts"
I removed the direct option from the hwmap and left it at read+write, and the whole encode goes from about 18 fps to 80+ fps. I also disabled the annoying forcing of Droid Sans Fallback as the only font. This seems to work well for my specific use case. I actually have no idea why omitting direct lets the whole transcode run faster than with direct enabled. Maybe it's because I'm running in a Docker container? I guess this is the same as the HWMAP option you mentioned?
Edited December 23, 2019 by ken-ji
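The two changes described in this post (dropping direct from the hwmap mode and removing the forced font style) are plain text edits to the -filter_complex argument. The following is a minimal Python sketch of that rewrite, for experimenting outside Emby. The function names are hypothetical, the sample graph uses a shortened placeholder path, and the regular expressions only cover the option layout seen in the command above.

```python
import re


def drop_direct(filter_graph):
    """Remove '+direct' from any hwmap mode list, leaving other flags alone."""
    return re.sub(r"(hwmap=mode=[a-z+]*?)\+direct\b", r"\1", filter_graph)


def drop_force_style(filter_graph):
    """Remove the ':force_style=...' option of the subtitles filter."""
    return re.sub(r":force_style='[^']*'", "", filter_graph)


# Same structure as the filter graph in the post, with a placeholder path.
graph = (
    "[0:0]scale_vaapi,hwmap=mode=read+write+direct,format=nv12,"
    "subtitles='in.mkv':si=0:force_style='FontName=Droid Sans Fallback'"
    ":fontsdir='/config/fonts',hwmap"
)

rewritten = drop_force_style(drop_direct(graph))
print(rewritten)
```

Applied to the sample, this yields the same filter graph structure as the second command in the post: hwmap=mode=read+write and no force_style option.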
softworkz 3326 Posted December 24, 2019
Thanks for your reply. This is quite interesting. The 'direct' option is meant to avoid copying and to fail if that is not possible, so it's quite unexpected that it causes a slowdown. We will investigate and test this further on various systems and in various situations. It's a very good hint! We are about to rework the whole hardware filter chaining anyway, so you'll see some progress in the beta channel during the next few weeks. Thanks a lot for pointing this out!
softworkz
softworkz 3326 Posted December 24, 2019
What you're experiencing could very well be caused by running inside Docker, where it may not be possible to get truly direct access to the system memory.
ken-ji 0 Posted January 22, 2020 Author
@softworkz I saw the Diagnostics plugin for 4.4.0.8 and tried it out. The option that interested me most is the parameter adjustment, as it allowed me to disable the direct hwmapping, which mitigates the slow performance we were talking about with the Docker version. Do we have a timeline for
* disabling the direct hwmapping for Docker versions
* disabling the forced font styling of Droid Sans Fallback
I also noticed, only in this version, that when transcoding is done, the client (Emby for Fire TV 1.5.73a) still displays the soft subtitles along with the burned-in subs.
softworkz 3326 Posted January 22, 2020
> The option that interested me the most is the parameter adjustment as it allowed me to disable the direct hwmapping - which mitigates the slow performance we were talking about with the docker version. Do we have a timeline for * disabling the direct hwmapping for docker versions * disabling the forced font styling of Droid Sans Fallback
Those are in fact good candidates for testing the alternatives via diagnostic options.
> and I noticed only in this version, that if transcoding is done, the client (Emby for Fire TV 1.5.73a) the subtitles would still soft display along with the burned in subs.
Do you mean after you activated "force subtitle burn-in" in the diagnostic options?
ken-ji 0 Posted January 23, 2020 Author
> Do you mean after you activated "force subtitle burn-in" in the diagnostic options?
Actually, I didn't enable force subtitle burn-in. I only disabled "Allow subtitle extraction on the fly".
Luke 37025 Posted January 23, 2020
That will cause a lot of subtitle burn-in, so make sure you actually need that.
ken-ji 0 Posted January 24, 2020 Author
@Luke Yes, I know it causes a lot of subtitle burn-in, which I kind of prefer, as a lot of the stuff I watch has ASS subtitles and I prefer the advanced formatting. Treating ASS as simple subtitles causes awkward scenarios, like two or more people talking and only seeing one person's subs, or having a lot of small translated text on screen and seeing the whole screen filled with subtitles (no positioning or font styling). Speaking of fonts, can we make the forcing of the font to Droid Sans Fallback a configurable option (i.e. allow us to add a font file and use that, or turn the setting off altogether)?
Luke 37025 Posted January 24, 2020
We can't yet allow turning it off, as it will fail on some platforms. But hopefully we can work those things out down the line.
softworkz 3326 Posted February 13, 2020
@ken-ji Please try the latest beta (.13). It uses hwupload and hwdownload instead of hwmap.
ken-ji 0 Posted February 13, 2020 Author
I've given it a try on the browser client, remotely. It seems a lot of my stuff is now transcoded in software (probably because of the subtitles), but I was able to try watching a 4K HEVC file with image subtitles; it transcoded using hardware and seemed just as fast as hwmap:read+write. Seems like we are going in the right direction.
softworkz 3326 Posted February 13, 2020
> I've given it a try on the browser client, remotely. though seems a lot of my stuff is now transcoded using software (probably because of the subtitles)
You mean because of the browser client?
> I was able to try watching a 4k HEVC file with image subtitles and it transcoded using hardware and seemed just as fast as hwmap:read+write Seems like we are going the right direction.
What do you mean by "just as fast"? Earlier you said that hwmap would be slow on Docker and that we should use hwdownload instead, which is what we're doing right now (at least temporarily, for testing how that compares).
ken-ji 0 Posted February 13, 2020 Author
@softworkz
> You mean because of the browser client?
I'm currently busy and unable to test with a client device like a Roku or Fire TV.
> What do you mean by "just as fast". Earlier you said that hwmap would be slow on Docker and that we should use hwdownload instead, which is what we're doing right now (at least temporarily for testing how that compares).
Sorry if I wasn't clear. I meant it's running pretty fast on the few videos I tried (HW transcoding): about 60 fps, which is more than the ~20 fps we got before when HW transcoding with hwmap direct. I mentioned before that hwmap was faster in Docker containers if you omitted the direct option, hence my answer that it was "as fast as hwmap:read+write".