Scale subtitle stream to target dimension instead of the original dimension

transcoding subtitle 4K

34 replies to this topic

#1 alyssa0326rr OFFLINE  

alyssa0326rr

    Newbie

  • Members
  • 3 posts
  • Local time: 05:36 AM

Posted 18 July 2018 - 12:10 AM

I ran into a situation where my server can transcode 4K HEVC video to 1080p smoothly (40~80 fps), but not when subtitles are enabled (15 fps).

After investigating, I found that the difference in the ffmpeg command is the filters.

 

The original filter is 

 -filter_complex '[0:8]scale=3840:1604:force_original_aspect_ratio=decrease[sub];[0:0][sub]overlay,scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2' 

 

I tried to work out what this means, and figured out that the subtitle stream is first scaled to a 4K-size stream and overlaid on the original video, and the result is finally scaled to 1080p.

 

Then I tried to change this filter to 

 -filter_complex '[0:8]scale=1920:802:force_original_aspect_ratio=decrease[sub];[0:0]scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2[video];[video][sub]overlay'

 

This first scales the original video and the subtitle stream to 1080p, then overlays them.

I reran the command and it works; fps is around 37.
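
A rough way to see why the reordered graph can be faster is to compare how many pixels the overlay touches per frame in each case. The sketch below uses the frame sizes from the two filter strings above; treating cost as proportional to pixel count is a simplifying assumption, not a measurement, and the function names are made up for illustration.

```python
# Frame sizes taken from the two filter strings above.
ORIGINAL_OVERLAY = (3840, 1604)   # subtitle scaled up, overlay done at source size
REORDERED_OVERLAY = (1920, 802)   # video and subtitle scaled down before overlay


def pixels(size):
    """Number of pixels in a (width, height) frame."""
    width, height = size
    return width * height


def overlay_cost_ratio(before, after):
    """How many times more pixels the overlay touches per frame in `before`.

    Assumes cost is proportional to pixel count, which is only a rough model.
    """
    return pixels(before) / pixels(after)


ratio = overlay_cost_ratio(ORIGINAL_OVERLAY, REORDERED_OVERLAY)
print(f"overlay at source size touches {ratio:.1f}x as many pixels per frame")
```

The subtitle upscale in the original graph also produces a full 4K frame, so the real saving may be larger than this overlay-only estimate suggests.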

 

I checked the output ts files, and they contain the subtitles I want.

 

Do you think this change is feasible?

Thanks.

 



#2 Luke OFFLINE  

Luke

    System Architect

  • Administrators
  • 137728 posts
  • Local time: 05:36 PM

Posted 18 July 2018 - 12:12 AM

Hi there, thanks for the suggestion! @Waldonnis, what's your take?



#3 Waldonnis OFFLINE  

Waldonnis

    Advanced Member

  • Members
  • 652 posts
  • Local time: 05:36 PM

Posted 18 July 2018 - 04:38 PM

Interesting idea and I can't think of any reason not to do this.  In fact, I thought prior to this post that that sub overlay generation was always done to the target frame dimensions rather than the source's...guess I was wrong.  I would assume that the same scale expression should be used for both the sub overlay and the source, but aside from that, I don't see any bad side-effects to this off-hand and the results should be the same.

 

Good catch!



#4 Luke OFFLINE  

Posted 18 July 2018 - 05:11 PM

Well, I guess one issue would be that we're going to have to know the exact output size, whereas before we would just give ffmpeg max width/height values and let it figure it all out. So that's potentially going to be tricky with anamorphic content, no?
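
To illustrate what "knowing the exact output size" would involve, here is a minimal sketch that evaluates the scale expression from the original post outside of ffmpeg. It relies on the scale filter's documented definition `dar = (iw / ih) * sar`. The function name is hypothetical, the DVD figures (720x480 with a 32:27 sample aspect ratio) are an assumed example rather than a file from this thread, and exact fractions are used where ffmpeg evaluates in floating point, so borderline cases could round differently.

```python
from fractions import Fraction
from math import trunc


def scaled_size(iw, ih, sar=Fraction(1), max_width=1920):
    """Evaluate the thread's scale expression for one input.

    w = trunc(min(max(iw, ih*dar), max_width) / 2) * 2
    h = trunc(w / dar / 2) * 2
    """
    dar = Fraction(iw, ih) * sar  # display aspect ratio, as ffmpeg defines it
    limited = min(max(Fraction(iw), ih * dar), Fraction(max_width))
    ow = trunc(limited / 2) * 2   # round down to an even width
    oh = trunc(ow / dar / 2) * 2  # height follows from the display aspect ratio
    return ow, oh


# Square-pixel 4K source from the original post.
print(scaled_size(3840, 1604))
# Hypothetical anamorphic DVD: stored as 720x480, displayed as 16:9.
print(scaled_size(720, 480, sar=Fraction(32, 27)))
```

For the anamorphic case the output width comes from `ih*dar`, not from the stored width, which is the part the caller would have to reproduce if it supplied subtitle dimensions itself.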



#5 Waldonnis OFFLINE  

Posted 18 July 2018 - 05:38 PM

Hmmm, it might be.  I hadn't considered anamorphic (doh).  I'll think about it and see if I have an anamorphic sample lying around with a subtitle track.  I'm wondering if the scale2ref filter would come in handy here (using it against the already-scaled video pad), but I'd have to try it.


Edited by Waldonnis, 18 July 2018 - 05:39 PM.


#6 Luke OFFLINE  

Posted 18 July 2018 - 05:45 PM

Another question is what happens if we're not scaling.

#7 Waldonnis OFFLINE  

Posted 18 July 2018 - 06:03 PM

That question entered my mind too, and it's why I was thinking about scale2ref.  It would theoretically allow the rendered subtitle frames to be scaled to whatever the specified pad's dimensions are...even if the other pad isn't being scaled (I've done this with watermark/graphic image overlays before, but not with subtitles).

 

Thinking more about it, there are some potential quirks that I want to check out (interaction with libass rendering, etc).  Pretty sure I found an anamorphic DVD to rip/play with and have tons of UHDs and BRDs (and can convert almost any sub format to pretty much any other sub format), so I'll see what I can figure out.



#8 Luke OFFLINE  

Posted 18 July 2018 - 09:36 PM

Great, thank you.

#9 Waldonnis OFFLINE  

Posted 18 July 2018 - 10:27 PM

Did a quick preliminary test with an anamorphic DVD (so, VOBSUBs) just to see if scale2ref worked as I figured it would with easy picture-based subs.  I still have yet to test a bunch of other sub types (and rip some UHD/BRD samples to test with, which is time consuming), so this is just preliminary.  I just wanted to confirm scale2ref's behaviour with picture-based subtitles, as I already knew that it worked for other inputs like standard images (which are basically the same thing as picture subs anyway).

 

My original file had subs that were already the same dimensions as the video (720p), so I created two files: one upscaled the video to 1080p and one downscaled to 320x"-1" (without touching the sub frame size, so they were still 720p) to see what happened with a larger/smaller native subtitle frame compared to the main video.  In both cases, I had to scale the subtitles to fit the existing dimensions of the video, so both operations simulate not scaling the video at all...and in both cases, scale2ref worked very well.  Example command line (0:0 is the video's stream ID and 0:2 is the sub's stream ID):

ffmpeg -i input.mkv -filter_complex "[0:2][0:0]scale2ref[sub][ref];[ref][sub]overlay[v]" -map "[v]" -map 0:a -c:v libx264 -c:a copy -y output.mkv

I also did a closer simulation to what Emby would do if we scaled the video up/down and it also worked equally well without having to supply any frame dimensions for the subtitle stream's scaling.  Example:

ffmpeg -i input.mkv -filter_complex "[0:0]scale=320:-1[video];[0:2][video]scale2ref[sub][ref];[ref][sub]overlay[v]" -map "[v]" -map 0:a -c:v libx264 -c:a copy -y output.mkv

Of course, mappings and pad names are flexible, but the thing to note is that I scale the video first, then base the sub scaling on scale's output.  In the case where video isn't scaled, I just fed it the stream ID of the video instead of an output pad name.  This would probably work with any existing picture subs, but I still need to play with PGS and a few other formats to be sure.  scale2ref has additional options available (and I think it supports at least some of scale's options as well), but I doubt they'd be needed for such a simple operation.
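
The two command lines above differ only in whether the video is scaled before scale2ref sees it. A minimal sketch of a helper that produces both filter strings follows; the function name and parameters are hypothetical, and it only builds text, it does not run ffmpeg. The subtitle pad is listed first because scale2ref scales its first input to the size of its second (reference) input.

```python
def burn_in_filter(video_pad, sub_pad, video_scale=None):
    """Build a filter_complex string that overlays picture subs on the video.

    video_pad / sub_pad are stream specifiers such as "0:0" and "0:2".
    video_scale is the argument for the scale filter, or None for no scaling.
    """
    chains = []
    ref_pad = video_pad
    if video_scale is not None:
        # Scale the video first, then size the subtitles from the result.
        chains.append(f"[{video_pad}]scale={video_scale}[video]")
        ref_pad = "video"
    # First input is resized (subs), second input is the reference (video).
    chains.append(f"[{sub_pad}][{ref_pad}]scale2ref[sub][ref]")
    chains.append("[ref][sub]overlay[v]")
    return ";".join(chains)


print(burn_in_filter("0:0", "0:2"))
print(burn_in_filter("0:0", "0:2", video_scale="320:-1"))
```

With `video_scale=None` the reference is the raw stream specifier, which matches the "video isn't scaled" case described above.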

 

I'd be interested in seeing how this compares to the suggested command line when it comes to transcoding rate.  I'd assume it would be similar, but my only test file so far is too "easy" to really show much of a difference.  I suspect it'll be easier to compare once I get a UHD file copied over.



#10 Waldonnis OFFLINE  

Posted 18 July 2018 - 10:34 PM

Oh, and since you're using the subtitles filter for text-based subs, I doubt you'd even need to bother with any of this for text-based subs.  Those are written using libass directly onto the frame in the filtergraph, so scaling that output should handle everything (obviously).
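
As a rough illustration of the text-based case, the sketch below builds a simple filter chain that renders the subtitles with the subtitles filter and then scales the combined frame. The helper name is hypothetical, the file name and scale argument are placeholders, and it assumes the file name needs no filtergraph escaping; whether Emby orders the filters this way is not stated in the thread.

```python
def text_sub_filter(sub_file, stream_index, video_scale):
    """Render text subtitles onto the frame, then scale the combined result.

    Assumes `sub_file` contains no characters that need escaping in a
    filtergraph (colons, quotes, backslashes and so on).
    """
    return f"subtitles=filename={sub_file}:si={stream_index},scale={video_scale}"


# Placeholder values: first subtitle stream of input.mkv, 1920 wide output.
print(text_sub_filter("input.mkv", 0, "1920:-2"))
```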

 

I don't deal with subtitles very often (nor overlays) and tend to forget techniques that I don't use frequently, so this has been a nice reason to refresh/relearn stuff and add it to my notes  :P



#11 alyssa0326rr OFFLINE  

Posted 18 July 2018 - 11:07 PM

I tested my ffmpeg command with scale2ref:

 -filter_complex '[0:0]scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2[video];[0:8][video]scale2ref[sub][ref];[ref][sub]overlay' 

It works, and the position of the subtitles is accurate in the output video.

 

 

One observation is that the encoding is done by the CPU.

 

My platform is an Intel G4560, and Intel Quick Sync is selected in Emby.



#12 Waldonnis OFFLINE  

Posted 18 July 2018 - 11:13 PM

Nothing in the filter string determines the encoder, so I'm not sure what's going on.  In my command line examples, I use libx264 because my iGPU is deactivated at the moment (unplugged) and I tend to use libx264 out of habit when testing.  If you're using my command line to test, just change -c:v libx264 to -c:v h264_qsv to see if that works.  If not, then post the full command line and I can figure out what's going on.
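
To make the point concrete, here is a small sketch that assembles the earlier test command as an argument list with the encoder as a parameter. The function name is hypothetical and the file names are the placeholders from the earlier example; the filter string is passed through untouched, so swapping libx264 for h264_qsv changes exactly one argument.

```python
def transcode_args(filter_complex, video_encoder="libx264"):
    """Assemble the earlier test command line as an argument list."""
    return [
        "ffmpeg", "-i", "input.mkv",
        "-filter_complex", filter_complex,
        "-map", "[v]", "-map", "0:a",
        "-c:v", video_encoder,   # the only place the video encoder is chosen
        "-c:a", "copy",
        "-y", "output.mkv",
    ]


FILTER = "[0:2][0:0]scale2ref[sub][ref];[ref][sub]overlay[v]"
software = transcode_args(FILTER)
quick_sync = transcode_args(FILTER, video_encoder="h264_qsv")

# List every position where the two command lines differ.
changed = [(a, b) for a, b in zip(software, quick_sync) if a != b]
print(changed)
```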



#13 Luke OFFLINE  

Posted 18 July 2018 - 11:37 PM

@Waldonnis, you've got two samples there, with and without scaling. Can the second one be used in all situations, for example by simply omitting the scale portion if there is no scaling? Thanks.



#14 Waldonnis OFFLINE  

Posted 19 July 2018 - 12:15 AM

Not sure I get the question, but I'll take a stab at it.  If neither the video nor the sub stream needs to be scaled, you can just use the overlay filter using the stream IDs as input pads, like so:

ffmpeg -i input.mkv -filter_complex "[0:v][0:2]overlay[v]" -map "[v]" -map 0:a -c:v libx264 -c:a copy -y output.mkv

That being said, I just realised I misread the original filter string a bit and also thought of a really bad catch with scale2ref (videos cropped to remove letter/pillarbox bars), so strike the scale2ref idea.  Note to self: don't think too hard when tired.

 

I'll revisit the original filter string after some sleep to see what improvements can be made....shouldn't be too hard with a less-frazzled brain.



#15 Luke OFFLINE  

Posted 19 July 2018 - 12:21 AM

Ok. The other thing to consider is the case of external sub/idx. I had all of this working nicely with your scale2ref examples, but with external sub/idx, although the transcoding was successful, there were no visible subs.

ffmpeg.exe -f matroska,webm -i file:"test.mkv" -canvas_size 1920:1080 -i "test.idx" -threads 0 -map 0:0 -map 0:1 -map 1:0 -sn -codec:v:0 libx264 -filter_complex "[0:0][1:0]scale2ref[sub][ref];[ref][sub]overlay[0:0]" -pix_fmt yuv420p -preset veryfast -crf 23 -maxrate 13570827 -bufsize 27141654 -profile:v high -level 4.1 -x264opts:0 subme=0:me_range=4:rc_lookahead=10:me=dia:no_chroma_me:8x8dct=0:partitions=none -force_key_frames "expr:if(isnan(prev_forced_t),eq(t,t),gte(t,prev_forced_t+3))" -copyts -vsync -1 -codec:a:0 libmp3lame -ac 2 -ab 384000 -af "volume=2" -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "7ddcdf635c2fcc96b9930936381e31ca.m3u8" -y "7ddcdf635c2fcc96b9930936381e31ca%d.ts"


#16 Waldonnis OFFLINE  

Posted 19 July 2018 - 05:36 PM

Okay, a bit fresher now  B)  I do have some questions before diving into this further about what the expected behaviour would be in situations where videos are cropped to remove padding/letterboxing (so their DAR isn't 16:9 or 4:3 any longer).  I mostly just want to make sure I'm not overthinking things here.

 

This is tricky to word in a way that can be easily communicated, but I'll give it a shot.  If a movie is cropped to say 2.35:1 but the sub stream's dimensions are still 16:9, is the goal to scale the 16:9 sub frame to fit within the 2.35:1 video area (thereby rendering the captions in the video area) or just render them as-is?  I can see some problems if subs were originally intentionally rendered in places where the letterbox padding would've been since no video exists there...but I can also see problems if it's scaled since it would throw off any positioning of the text elements not to mention that the text would be smaller/aliased due to scaling.  Basically, it's the difference between scaling to fit width/height (whichever is smaller) while preserving the sub AR...or not scaling at all and risking cropped-out subtitles if they were originally rendered outside of the cropped AR.  I've rarely seen subs that were rendered to display in the padding, so this may not be a big concern overall, but worth asking about since there are ways to mitigate this type of thing.

 

If the intent is to always render the entire sub frame within the dimensions of the cropped video even if the ARs of the video and subs don't match, then scale2ref could work with the proper option to preserve the original subs' AR (to prevent stretching).  If you want to render the subs with their original AR and don't want to scale, then you'd have to intentionally add the padding back during the transcode to restore the subs' AR, which is pretty easy.
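
The "add the padding back" option can be sketched as a small calculation that produces an argument string for ffmpeg's pad filter (`pad=width:height:x:y`). The function name is hypothetical, and the 1920x816 video with a 1920x1080 subtitle frame is an assumed example of a 2.35:1 crop, not a file from this thread. The sketch assumes square pixels and a subtitle frame that is taller than the cropped video.

```python
def pad_to_aspect(video_w, video_h, sub_w, sub_h):
    """Return a pad filter string that restores the subtitle frame's aspect ratio.

    Keeps the video width and adds equal bars above and below the picture.
    """
    padded_h = video_w * sub_h // sub_w
    padded_h -= padded_h % 2              # keep the padded height even
    y_offset = (padded_h - video_h) // 2  # bar height above the picture
    return f"pad={video_w}:{padded_h}:0:{y_offset}"


# Hypothetical 2.35:1 crop with an untouched 16:9 subtitle stream.
print(pad_to_aspect(1920, 816, 1920, 1080))
```

After padding like this, the video frame and subtitle frame have matching dimensions again, so a plain overlay would place the captions where they were originally authored.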

 

My gut is to say that rendering them as-is and hoping they end up in-frame is the way to go (mostly because they did all kinds of weird crap in the DVD days and most subs are rendered to display over the video area anyway), but wanted to get an idea of what was currently happening v. what you expected to happen.



#17 Luke OFFLINE  

Posted 19 July 2018 - 05:39 PM

Of these two choices, which way do our current command lines go?



#18 Waldonnis OFFLINE  

Posted 19 July 2018 - 10:10 PM

I don't have a log handy to tell, but from memory of previous logs and the filter in the OP, it looks like you're scaling the sub frame to the video's dimensions if needed while keeping the sub frame's AR.  This should result in guaranteeing that the entire sub frame fits within the video's size even if it's cropped (it's shrinking the frame to fit within the now-smaller video frame).  This is likely the safest option in case the text was originally rendered to display in a padding area (so it doesn't get cropped out).  Aliasing and positional changes shouldn't be too drastic either, so I can't argue with the decision to do that.  What I don't know is if there are cases where you just use the overlay filter without scaling anything and what those cases are (not gonna whip up samples just to test all of the variations here since the code should reflect that, so any logs or a pointer to the proper spot in the code would be welcome).

 

If the behaviour is always to scale subs to the video's size regardless and it's always scaling the sub frame as I outlined above, then scale2ref is back on the table.  It would allow you to not care about whether or not video is being scaled since it would base scaling on whatever reference frame it's given.  I haven't tested it, but I'd guess it would also require the force_original_aspect_ratio filter option like the scale filter in the OP does so the sub frame's AR is preserved....something I didn't do in my examples above (scale2ref is pretty much just like scale, but uses a reference to determine the dimensions rather than having to specify them, then outputs both the reference and the scaled second frame).  If there are times when you just use the overlay filter rather than doing any scaling of the video and/or sub streams, then nothing would change for those cases (except maybe in 2160p cases, as noted below).

 

This probably hasn't been much of a problem before since DVD- and BRD-sourced subs are often the same dimensions as the video and the lower resolution (compared to 2160p) meant scaling up/down wasn't as heavy of an operation when you needed to do it.  The UHD-sourced PGS streams I've checked so far have all been 1080p, so they'd require upscaling for 2160p overlay and would likely not require any scaling at all in the OP's situation.  Since this kind of dimension discrepancy was previously uncommon but may be more likely in the future, maybe the better path is to rethink when/how to resize subtitles...possibly such that only overlay is used unless the dimensions of the sub and video streams differ.  In such cases, just scale2ref the sub streams to either the rescaled video or the source video stream specifier (whichever is applicable).  You could even probably use the null filter if you wanted to use the same pad name for the unchanged video stream that you would've used if you were scaling the video....so that you could use the same scale2ref syntax for both rescaled and unscaled video situations (something like "[0:0]null[ref]" so that you could always use a pad named [ref] as input for scale2ref; again, untested by me, but it should work).  You probably get the idea, but I can whip up an example if needed.
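
A minimal sketch of that idea, under the same "untested against ffmpeg" caveat: the video is always exposed as a pad named [ref] (via null when it is not scaled), and scale2ref is only inserted when the subtitle frame size differs from the final video size. The function name, parameters, and the [base] pad name are hypothetical, the caller is assumed to already know both sizes, and force_original_aspect_ratio is left out for brevity.

```python
def plan_filter(video_pad, sub_pad, sub_size, final_video_size, video_scale=None):
    """Build a filter_complex string using a fixed [ref] pad for the video.

    sub_size and final_video_size are (width, height) tuples.
    video_scale is the scale filter argument, or None if the video is untouched.
    """
    if video_scale is None:
        # null passes frames through unchanged; it only gives the pad a name.
        video_chain = f"[{video_pad}]null[ref]"
    else:
        video_chain = f"[{video_pad}]scale={video_scale}[ref]"

    if sub_size == final_video_size:
        # Sizes already match, so a plain overlay is enough.
        return f"{video_chain};[ref][{sub_pad}]overlay[v]"

    # Sizes differ: resize the subtitle frames to the reference video.
    return f"{video_chain};[{sub_pad}][ref]scale2ref[sub][base];[base][sub]overlay[v]"


print(plan_filter("0:0", "0:2", (1920, 1080), (1920, 1080)))
print(plan_filter("0:0", "0:8", (1920, 1080), (1920, 802), video_scale="1920:-2"))
```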


Edited by Waldonnis, 19 July 2018 - 10:12 PM.


#19 Luke OFFLINE  

Posted 19 July 2018 - 10:35 PM

When would the resolution of embedded subtitles not match the resolution of the video stream?



#20 Waldonnis OFFLINE  

Posted 20 July 2018 - 12:20 AM

The most common example would be if someone re-encoded a BRD rip and cropped the letterbox area out leaving just the video in its actual AR rather than the padded 16x9 frame on the disc...and doing this without cropping the sub stream too (nobody crops the sub stream, in my experience...they just copy it).  For example, one of my files started out with 16x9 1080p dimensions, but I cropped all of the letterbox bar area out and re-encoded it to just include the 2.40:1 AR of the film itself (so the final dimensions are 1920x810).  The PGS sub stream, however, is still 16x9 1920x1080 (1080p).  If the subtitle text would ordinarily be rendered in the area that I had cropped out, then not scaling would mean some/all of the caption text would be rendered off screen, so I would need to scale down the subtitles so the entire sub frame fits within the smaller vertical area of the 2.40:1 AR, 810-pixels of video.
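
For the cropped example above, the size the subtitle frame ends up at when it is shrunk to fit while keeping its aspect ratio can be worked out directly. The sketch below is a plain calculation with a hypothetical function name; it is meant to mirror the intent of force_original_aspect_ratio=decrease, and ffmpeg's own rounding may differ slightly for sizes that do not divide evenly.

```python
from fractions import Fraction


def fit_inside(src_w, src_h, box_w, box_h):
    """Largest size with the source's aspect ratio that fits inside the box."""
    factor = min(Fraction(box_w, src_w), Fraction(box_h, src_h))
    return int(src_w * factor), int(src_h * factor)


# 1080p PGS frame fitted into the 1920x810 cropped video from the example.
sub_w, sub_h = fit_inside(1920, 1080, 1920, 810)
x_offset = (1920 - sub_w) // 2  # horizontal offset if the result is centred
print(sub_w, sub_h, x_offset)
```

The fitted frame is narrower than the video, so the captions shrink and shift toward the centre, which is the positioning trade-off discussed earlier in the thread.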

 

One of my other files still contains the letterbox bars and is still 1920x1080 16:9 (same as the subtitle stream), so there's no risk of rendering off-screen; I don't have to scale the subtitle frame at all, and a simple overlay would work fine.

 

In the case of the UHDs I've seen so far (maybe all?), the PGS subtitle streams have all had a quarter of the pixel area of the video's 2160p frame (so, the subs are 1080p while the video is 2160p, half the width and half the height, but both are the same AR).  If I rendered a 1080p frame on top of a 2160p video, I'd end up with a caption frame that is small compared to the video and "floats" somewhere within the much larger video frame (likely smack dab in the centre).  It's entirely possible that ffmpeg could autoscale in these situations, but I've never tried it and wouldn't expect it to do so.

 

In both situations (cropped input video and UHD), the sub stream's dimensions would differ from the source video's...and scaling would be appropriate for different reasons (for cropped, because we might render offscreen and with UHD because the sub frame would be too small).  In the case of my "still 1080p" video example above, no scaling is needed because the dimensions do match between the streams.

 

I haven't thought about how anamorphic would fit into this, but I'd expect you could choose to always use scale2ref and if the presentation seems odd during testing, changing the force_original_aspect_ratio option will probably compensate for it.  Honestly, I have no idea how picture-based subs are rendered with anamorphic content, so that's more of a guess.  I do know that the subs on my anamorphic test file have no SAR/DAR (obviously), so they normally are either stretched to match the DAR-derived dimensions or are rendered as-is without any scaling (no idea which is done on players since it's something I've never had to think about).  Either way, it's easy to reproduce that behaviour once we figure out what it's supposed to be.


Edited by Waldonnis, 20 July 2018 - 12:22 AM.





