Scale subtitle stream to target dimension instead of the original dimension


alyssa0326rr

I ran into a situation where my server can transcode 4K HEVC video to 1080p smoothly (40~80 fps), but not when subtitles are enabled (15 fps).

After investigating, I found that the difference between the ffmpeg commands is in the filters.

 

The original filter is 

 -filter_complex '[0:8]scale=3840:1604:force_original_aspect_ratio=decrease;[0:0]overlay,scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2' 

 

I tried to work out what this means, and figured out that the subtitle stream is first scaled to a 4K-sized stream and overlaid on the original video, and the result is finally scaled down to 1080p.

 

Then I tried to change this filter to 

 -filter_complex '[0:8]scale=1920:802:force_original_aspect_ratio=decrease;[0:0]scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2;overlay'

 

which first scales the original video and the subtitle stream to 1080p, and then overlays them.

I reran the command and it works; the fps is around 37.
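
A rough way to see why the reordering could help is to compare how many pixels the overlay step touches per frame in each ordering. This is only a sketch: the frame sizes are taken from the two filter strings above, and pixel count is a crude proxy that ignores everything else in the pipeline (decoding, encoding, format conversion).

```python
# Rough per-frame workload comparison for the two filter orderings.
# Frame sizes come from the filter strings in this post; pixel count is
# only a proxy for the real cost of the overlay step.

def pixels(width: int, height: int) -> int:
    """Number of pixels in one frame."""
    return width * height

# Original ordering: subtitle scaled to source size, overlay at source size.
overlay_at_source = pixels(3840, 1604)

# Modified ordering: both inputs scaled first, overlay at target size.
overlay_at_target = pixels(1920, 802)

ratio = overlay_at_source / overlay_at_target
print(f"overlay at source size: {overlay_at_source} px/frame")
print(f"overlay at target size: {overlay_at_target} px/frame")
print(f"ratio: {ratio:.1f}x")
```

By this measure the overlay in the original ordering works on four times as many pixels per frame, which is at least consistent with the fps difference reported above.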

 

I checked the output ts files, and they contain the subtitles I want.

 

Do you think this is possible to do?

Thanks.

 


Waldonnis

Interesting idea and I can't think of any reason not to do this.  In fact, I thought prior to this post that that sub overlay generation was always done to the target frame dimensions rather than the source's...guess I was wrong.  I would assume that the same scale expression should be used for both the sub overlay and the source, but aside from that, I don't see any bad side-effects to this off-hand and the results should be the same.

 

Good catch!

Link to comment
Share on other sites

Well, I guess one issue would be that we're going to have to know the exact output size, whereas before we would just give ffmpeg max width/height values and let it figure it all out. So that's potentially going to be tricky with anamorphic content, no?
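
If the source width, height and sample aspect ratio are known ahead of time (say from a probe of the file), the scale expression from the first post could in principle be evaluated outside ffmpeg to get the exact output size. A minimal sketch, assuming those three values are available; `target_size` is a hypothetical helper, and the 32/27 SAR below is just an illustrative value for a 720x480 anamorphic source. It uses exact fractions, whereas ffmpeg evaluates expressions in floating point, so edge cases may not match exactly.

```python
from fractions import Fraction
from math import trunc

def target_size(iw, ih, sar=Fraction(1), max_w=1920):
    """Evaluate scale=trunc(min(max(iw,ih*dar),max_w)/2)*2:trunc(ow/dar/2)*2.

    Hypothetical helper: exact rational arithmetic is used here, which is an
    assumption of this sketch and not how ffmpeg itself evaluates expressions.
    """
    dar = Fraction(iw, ih) * sar                      # display aspect ratio
    ow = trunc(Fraction(min(max(iw, ih * dar), max_w)) / 2) * 2
    oh = trunc(Fraction(ow) / dar / 2) * 2
    return ow, oh

print(target_size(3840, 1604))                        # square-pixel 4K source
print(target_size(720, 480, sar=Fraction(32, 27)))    # assumed anamorphic source
```

For the 3840x1604 source this gives 1920x802, matching the dimensions used in the modified filter in the first post.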


Waldonnis

Hmmm, it might be.  I hadn't considered anamorphic (doh).  I'll think about it and see if I have an anamorphic sample laying around with a subtitle track.  I'm wondering if the scale2ref filter would come in handy here (using it against the already-scaled video pad), but I'd have to try it.

Edited by Waldonnis

Waldonnis

Another question is what happens if we're not scaling.

 

That question entered my mind and why I was thinking about scale2ref.  It would theoretically allow the rendered subtitle frames to be scaled to whatever the specified pad's dimensions are...even if the other pad isn't being scaled (I've done this with watermark/graphic image overlays before, but not with subtitles).

 

Thinking more about it, there are some potential quirks that I want to check out (interaction with libass rendering, etc).  Pretty sure I found an anamorphic DVD to rip/play with and have tons of UHDs and BRDs (and can convert almost any sub format to pretty much any other sub format), so I'll see what I can figure out.


Waldonnis

Did a quick preliminary test with an anamorphic DVD (so, VOBSUBs) just to see if scale2ref worked as I figured it would with easy picture-based subs.  I still have yet to test a bunch of other sub types (and rip some UHD/BRD samples to test with, which is time consuming), so this is just preliminary.  I just wanted to confirm scale2ref's behaviour with picture-based subtitles, as I already knew that it worked for other inputs like standard images (which are basically the same thing as picture subs anyway).

 

My original file had subs that were already the same dimensions as the video (720p), so I created two files: one upscaled the video to 1080p and one downscaled to 320x"-1" (without touching the sub frame size, so they were still 720p) to see what happened with a larger/smaller native subtitle frame compared to the main video.  In both cases, I had to scale the subtitles to fit the existing dimensions of the video, so both operations simulate not scaling the video at all...and in both cases, scale2ref worked very well.  Example command line (0:0 is the video's stream ID and 0:2 is the sub's stream ID):

ffmpeg -i input.mkv -filter_complex "[0:2][0:0]scale2ref[sub][ref];[ref][sub]overlay[v]" -map "[v]" -map 0:a -c:v libx264 -c:a copy -y output.mkv

I also did a closer simulation to what Emby would do if we scaled the video up/down and it also worked equally well without having to supply any frame dimensions for the subtitle stream's scaling.  Example:

ffmpeg -i input.mkv -filter_complex "[0:0]scale=320:-1[video];[0:2][video]scale2ref[sub][ref];[ref][sub]overlay[v]" -map "[v]" -map 0:a -c:v libx264 -c:a copy -y output.mkv

Of course, mappings and pad names are flexible, but the thing to note is that I scale the video first, then base the sub scaling on scale's output.  In the case where video isn't scaled, I just fed it the stream ID of the video instead of an output pad name.  This would probably work with any existing picture subs, but I still need to play with PGS and a few other formats to be sure.  scale2ref has additional options available (and I think it supports at least some of scale's options as well), but I doubt they'd be needed for such a simple operation.
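
The two filter strings above differ only in whether the video is scaled first, so they could be generated from one place. A small sketch of that idea; `sub_overlay_filter` is a hypothetical helper (not anything from Emby's code), and the pad names are the ones used in the examples above.

```python
from typing import Optional

def sub_overlay_filter(video_id: str, sub_id: str,
                       video_scale: Optional[str] = None) -> str:
    """Build a -filter_complex string that overlays picture subs on the video.

    Hypothetical helper for illustration only.
    """
    if video_scale is None:
        # Unscaled video: the stream specifier itself is the reference input.
        prefix = ""
        ref_in = f"[{video_id}]"
    else:
        # Scaled video: scale first, then use scale's output pad as reference.
        prefix = f"[{video_id}]scale={video_scale}[video];"
        ref_in = "[video]"
    # The subtitle stream is the first scale2ref input, the reference the second.
    return f"{prefix}[{sub_id}]{ref_in}scale2ref[sub][ref];[ref][sub]overlay[v]"

print(sub_overlay_filter("0:0", "0:2"))
print(sub_overlay_filter("0:0", "0:2", video_scale="320:-1"))
```

The two printed strings reproduce the filter graphs from the two example command lines.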

 

I'd be interested in seeing how this compares to the suggested command line when it comes to transcoding rate.  I'd assume it would be similar, but my only test file so far is too "easy" to really show much of a difference.  I suspect it'll be easier to compare once I get a UHD file copied over.


Waldonnis

Oh, and since you're using the subtitles filter for text-based subs, I doubt you'd even need to bother with any of this for text-based subs.  Those are written using libass directly onto the frame in the filtergraph, so scaling that output should handle everything (obviously).

 

I don't deal with subtitles very often (nor overlays) and tend to forget techniques that I don't use frequently, so this has been a nice reason to refresh/relearn stuff and add it to my notes  :P


alyssa0326rr

I tested my ffmpeg command with scale2ref:

 -filter_complex '[0:0]scale=trunc(min(max(iw\,ih*dar)\,1920)/2)*2:trunc(ow/dar/2)*2[video];[0:8][video]scale2ref[sub][ref];[ref][sub]overlay' 

and it works.

The position of the subtitles is accurate in the output video.

 

 

One observation is that the encoding is done by the CPU.

 

My platform is an Intel G4560, and intel_quick_sync is set in Emby.


Waldonnis

Nothing in the filter string determines the encoder, so I'm not sure what's going on.  In my command line examples, I use libx264 because my iGPU is deactivated at the moment (unplugged) and tend to use libx264 out of habit when testing.  If you're using my command line to test, just change -c:v libx264 to -c:v h264_qsv to see if that works.  If not, then post the full command line and I can figure out what's going on.

Link to comment
Share on other sites

@Waldonnis, you've got two samples there, with and without scaling. Can the second one be used in all situations, for example by simply omitting the scale portion if there is no scaling? Thanks.


Waldonnis

Not sure I get the question, but I'll take a stab at it.  If neither the video nor the sub stream needs to be scaled, you can just use the overlay filter with the stream IDs as input pads, like so:

ffmpeg -i input.mkv -filter_complex "[0:v][0:2]overlay[v]" -map "[v]" -map 0:a -c:v libx264 -c:a copy -y output.mkv

That being said, I just realised I misread the original filter string a bit and also thought of a really bad catch with scale2ref (videos cropped to remove letter/pillarbox bars), so strike the scale2ref idea.  Note to self: don't think too hard when tired.

 

I'll revisit the original filter string after some sleep to see what improvements can be made....shouldn't be too hard with a less-frazzled brain.

Link to comment
Share on other sites

OK. The other thing to consider is the case of external sub/idx. I had all of this working nicely with your scale2ref examples, but with external sub/idx, although the transcoding was successful, there were no visible subs.

ffmpeg.exe -f matroska,webm -i file:"test.mkv" -canvas_size 1920:1080 -i "test.idx" -threads 0 -map 0:0 -map 0:1 -map 1:0 -sn -codec:v:0 libx264 -filter_complex "[0:0][1:0]scale2ref[sub][ref];[ref][sub]overlay[0:0]" -pix_fmt yuv420p -preset veryfast -crf 23 -maxrate 13570827 -bufsize 27141654 -profile:v high -level 4.1 -x264opts:0 subme=0:me_range=4:rc_lookahead=10:me=dia:no_chroma_me:8x8dct=0:partitions=none -force_key_frames "expr:if(isnan(prev_forced_t),eq(t,t),gte(t,prev_forced_t+3))" -copyts -vsync -1 -codec:a:0 libmp3lame -ac 2 -ab 384000 -af "volume=2" -f segment -max_delay 5000000 -avoid_negative_ts disabled -map_metadata -1 -map_chapters -1 -start_at_zero -segment_time 3  -individual_header_trailer 0 -segment_format mpegts -segment_list_type m3u8 -segment_start_number 0 -segment_list "7ddcdf635c2fcc96b9930936381e31ca.m3u8" -y "7ddcdf635c2fcc96b9930936381e31ca%d.ts"
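
One thing that may be worth checking in the command above is the input order of scale2ref. In the scale2ref examples in the ffmpeg filter documentation, the first input is the stream that gets scaled and the second is the reference. The earlier examples in this thread put the subtitle stream first, while this command puts the video ([0:0]) first and the subtitle ([1:0]) second, so the pad labelled [sub] would actually carry the video. Whether that explains the missing subs has not been verified here. The sketch below just pulls the two input labels out of a filter string so the order is easy to see; `scale2ref_inputs` is a hypothetical helper.

```python
import re

def scale2ref_inputs(filtergraph: str):
    """Return the (main, reference) input labels of the first scale2ref.

    Hypothetical helper: it only handles the simple case of two labelled
    inputs written directly before the filter name.
    """
    match = re.search(r"\[([^\]]+)\]\[([^\]]+)\]scale2ref", filtergraph)
    if match is None:
        raise ValueError("no scale2ref with two labelled inputs found")
    return match.group(1), match.group(2)

# Filter string from the external sub/idx command above.
posted = "[0:0][1:0]scale2ref[sub][ref];[ref][sub]overlay[0:0]"
# Filter string from the earlier embedded-subtitle example.
earlier = "[0:2][0:0]scale2ref[sub][ref];[ref][sub]overlay[v]"

print("posted: ", scale2ref_inputs(posted))
print("earlier:", scale2ref_inputs(earlier))
```

If the order does turn out to matter here, swapping the two input labels so the subtitle stream comes first would be the thing to try.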

Waldonnis

Okay, a bit fresher now  B)  I do have some questions before diving into this further about what the expected behaviour would be in situations where videos are cropped to remove padding/letterboxing (so their DAR isn't 16:9 or 4:3 any longer).  I mostly just want to make sure I'm not overthinking things here.

 

This is tricky to word in a way that can be easily communicated, but I'll give it a shot.  If a movie is cropped to say 2.35:1 but the sub stream's dimensions are still 16:9, is the goal to scale the 16:9 sub frame to fit within the 2.35:1 video area (thereby rendering the captions in the video area) or just render them as-is?  I can see some problems if subs were originally intentionally rendered in places where the letterbox padding would've been since no video exists there...but I can also see problems if it's scaled since it would throw off any positioning of the text elements not to mention that the text would be smaller/aliased due to scaling.  Basically, it's the difference between scaling to fit width/height (whichever is smaller) while preserving the sub AR...or not scaling at all and risking cropped-out subtitles if they were originally rendered outside of the cropped AR.  I've rarely seen subs that were rendered to display in the padding, so this may not be a big concern overall, but worth asking about since there are ways to mitigate this type of thing.

 

If the intent is to always render the entire sub frame within the dimensions of the cropped video even if the ARs of the video and subs don't match, then scale2ref could work with the proper option to preserve the original subs' AR (to prevent stretching).  If you want to render the subs with their original AR and don't want to scale, then you'd have to intentionally add the padding back during the transcode to restore the subs' AR, which is pretty easy.

 

My gut is to say that rendering them as-is and hoping they end up in-frame is the way to go (mostly because they did all kinds of weird crap in the DVD days and most subs are rendered to display over the video area anyway), but wanted to get an idea of what was currently happening v. what you expected to happen.


Waldonnis

Of these two choices, which way do our current command lines go?

 

I don't have a log handy to tell, but from memory of previous logs and the filter in the OP, it looks like you're scaling the sub frame to the video's dimensions if needed while keeping the sub frame's AR.  This should result in guaranteeing that the entire sub frame fits within the video's size even if it's cropped (it's shrinking the frame to fit within the now-smaller video frame).  This is likely the safest option in case the text was originally rendered to display in a padding area (so it doesn't get cropped out).  Aliasing and positional changes shouldn't be too drastic either, so I can't argue with the decision to do that.  What I don't know is if there are cases where you just use the overlay filter without scaling anything and what those cases are (not gonna whip up samples just to test all of the variations here since the code should reflect that, so any logs or a pointer to the proper spot in the code would be welcome).

 

If the behaviour is always to scale subs to the video's size regardless and it's always scaling the sub frame as I outlined above, then scale2ref is back on the table.  It would allow you to not care about whether or not video is being scaled since it would base scaling on whatever reference frame it's given.  I haven't tested it, but I'd guess it would also require the force_original_aspect_ratio filter option like the scale filter in the OP does so the sub frame's AR is preserved....something I didn't do in my examples above (scale2ref is pretty much just like scale, but uses a reference to determine the dimensions rather than having to specify them, then outputs both the reference and the scaled second frame).  If there are times when you just use the overlay filter rather than doing any scaling of the video and/or sub streams, then nothing would change for those cases (except maybe in 2160p cases, as noted below).

 

This probably hasn't been much of a problem before since DVD- and BRD-sourced subs are often the same dimensions as the video and the lower resolution (compared to 2160p) meant scaling up/down wasn't as heavy of an operation when you needed to do it.  The UHD-sourced PGS streams I've checked so far have all been 1080p, so they'd require upscaling for 2160p overlay and would likely not require any scaling at all in the OP's situation.  Since this kind of dimension discrepancy was previously uncommon but may be more likely in the future, maybe the better path is to rethink when/how to resize subtitles...possibly such that only overlay is used unless the dimensions of the sub and video streams differ.  In such cases, just scale2ref the sub streams to either the rescaled video or the source video stream specifier (whichever is applicable).  You could even probably use the null filter if you wanted to use the same pad name for the unchanged video stream that you would've used if you were scaling the video....so that you could use the same scale2ref syntax for both rescaled and unscaled video situations (something like "[0:0]null[ref]" so that you could always use a pad named [ref] as input for scale2ref; again, untested by me, but it should work).  You probably get the idea, but I can whip up an example if needed.
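
A sketch of the decision described above: plain overlay when nothing needs scaling and the dimensions match, otherwise scale2ref against a pad named [ref] that is produced either by scale or by null. `build_filter` and the [vid] pad name are made up for illustration, the generated graphs are untested against ffmpeg, and preserving the sub frame's AR is left out to keep it short.

```python
from typing import Optional, Tuple

Size = Tuple[int, int]

def build_filter(video_id: str, sub_id: str, video_size: Size, sub_size: Size,
                 video_scale: Optional[str] = None) -> str:
    """Pick plain overlay or scale2ref depending on whether sizes differ.

    Hypothetical helper; the generated graphs have not been run through ffmpeg.
    """
    if video_scale is None and video_size == sub_size:
        # Same dimensions and no video scaling: overlay alone is enough.
        return f"[{video_id}][{sub_id}]overlay"
    # Always produce a pad named [ref]; null is a pass-through when not scaling.
    video_op = f"scale={video_scale}" if video_scale else "null"
    return (f"[{video_id}]{video_op}[ref];"
            f"[{sub_id}][ref]scale2ref[sub][vid];[vid][sub]overlay")

print(build_filter("0:0", "0:2", (1920, 1080), (1920, 1080)))
print(build_filter("0:0", "0:2", (3840, 2160), (1920, 1080)))
print(build_filter("0:0", "0:2", (3840, 2160), (1920, 1080), video_scale="1920:-2"))
```

The second and third strings differ only in the first filter, which is the point of the null trick: the scale2ref and overlay portion stays identical whether or not the video is rescaled.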

Edited by Waldonnis
Link to comment
Share on other sites

when would the resolution of embedded subtitles not match the resolution of the video stream?


Waldonnis

The most common example would be if someone re-encoded a BRD rip and cropped the letterbox area out leaving just the video in its actual AR rather than the padded 16x9 frame on the disc...and doing this without cropping the sub stream too (nobody crops the sub stream, in my experience...they just copy it).  For example, one of my files started out with 16x9 1080p dimensions, but I cropped all of the letterbox bar area out and re-encoded it to just include the 2.40:1 AR of the film itself (so the final dimensions are 1920x810).  The PGS sub stream, however, is still 16x9 1920x1080 (1080p).  If the subtitle text would ordinarily be rendered in the area that I had cropped out, then not scaling would mean some/all of the caption text would be rendered off screen, so I would need to scale down the subtitles so the entire sub frame fits within the smaller vertical area of the 2.40:1 AR, 810-pixels of video.

 

One of my other files still contains the letterbox bars and is still 1920x1080 16:9 (same as the subtitle stream), so there's no risk of rendering off-screen; I don't have to scale the subtitle frame at all, and a simple overlay would work fine.

 

In the case of the UHDs I've seen so far (maybe all?), the PGS subtitle streams have all had dimensions that were 25% of the area of the video's 2160p frame (half the width and half the height: the subs are 1080p while the video is 2160p, but both are the same AR).  If I rendered a 1080p frame on top of a 2160p video, I'd end up with a caption frame that is small compared to the video and "floats" somewhere within the much larger video frame (likely smack dab in the centre).  It's entirely possible that ffmpeg could autoscale in these situations, but I've never tried it and wouldn't expect it to do so.

 

In both situations (cropped input video and UHD), the sub stream's dimensions would differ from the source video's...and scaling would be appropriate for different reasons (for cropped, because we might render offscreen and with UHD because the sub frame would be too small).  In the case of my "still 1080p" video example above, no scaling is needed because the dimensions do match between the streams.
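
The three cases above (matching frames, cropped video, UHD video with 1080p subs) can be told apart from the frame dimensions alone. A minimal sketch using the sizes mentioned in this post; `compare_frames` and its return labels are hypothetical.

```python
def compare_frames(video, sub):
    """Classify how a sub frame relates to the video frame it is overlaid on.

    Hypothetical helper; `video` and `sub` are (width, height) tuples.
    """
    vw, vh = video
    sw, sh = sub
    if (sw, sh) == (vw, vh):
        return ("match", None)
    if sw > vw or sh > vh:
        # Columns/rows of the sub frame that do not fit inside the video frame.
        return ("overflow", (max(sw - vw, 0), max(sh - vh, 0)))
    # Sub frame is smaller: report the fraction of the video area it covers.
    return ("undersized", (sw * sh) / (vw * vh))

print(compare_frames((1920, 1080), (1920, 1080)))  # letterboxed 1080p file
print(compare_frames((1920, 810), (1920, 1080)))   # cropped 2.40:1 file
print(compare_frames((3840, 2160), (1920, 1080)))  # UHD video, 1080p subs
```

For the cropped file, 270 rows of the sub frame have nowhere to go without scaling; for the UHD file the sub frame covers a quarter of the video area.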

 

I haven't thought about how anamorphic would fit into this, but I'd expect you could choose to always use scale2ref and if the presentation seems odd during testing, changing the force_original_aspect_ratio option will probably compensate for it.  Honestly, I have no idea how picture-based subs are rendered with anamorphic content, so that's more of a guess.  I do know that the subs on my anamorphic test file have no SAR/DAR (obviously), so they normally are either stretched to match the DAR-derived dimensions or are rendered as-is without any scaling (no idea which is done on players since it's something I've never had to think about).  Either way, it's easy to reproduce that behaviour once we figure out what it's supposed to be.

Edited by Waldonnis
Link to comment
Share on other sites

can you give an example of how force_original_aspect_ratio would be added to your scale2ref commands, and why not just always use it?


Waldonnis

Sure, but it'll have to wait until tomorrow.  I try to post copy-and-pasted command lines I've actually used/tested to avoid typos, so I want to do that first...but I need to start an HEVC encode for the night in a few minutes and need to start closing everything running on the machine now.

 

As for why not always using it, do you mean force_original_aspect_ratio or the scale2ref?  If you mean scale2ref, sure, you could use it since it wouldn't do jack if no scaling was required.  My tendency is to avoid including things in command lines that aren't useful, hence why I wouldn't personally include a scale-type filter that does nothing (programmer in me always says don't burn cycles on stuff that does nothing)....but it's not like it's a "costly" operation in the grand scheme so there's no harm in doing so that I can think of.

 

If instead you mean if you could use force_original_aspect_ratio all of the time...you'd probably want to, but the value may change if you need to stretch the sub frame for some reason rather than keeping its normal AR (e.g. in case subs are stretched to match an anamorphic DAR width which is likely wider than the sub's native dimensions/AR).  If it turns out that subs on anamorphic videos are normally just rendered without any "stretching", then the force_original_aspect_ratio would never need to change since we'd always want to maintain the sub frame's AR when scaling.

Link to comment
Share on other sites

Thanks that would be great, and yea I was asking about force_original_aspect_ratio. Thanks.


Waldonnis

Bah, force_original_aspect_ratio apparently doesn't have the effect on scale2ref that I expected, so I had to preserve the sub frame's AR the hard way.  Note: this is really rough still and the expressions probably need to be rounded or truncated to avoid floating point errors, but it seems to work no matter what I've thrown at it so far:

ffmpeg -canvas_size 1920:1080 -i input.mkv -filter_complex "null[video];[0:2][video]scale2ref=ih*mdar:ih[sub][ref];[ref][sub]overlay=(W-w)/2:(H-h)/2" -c:v libx264 -c:a copy output.mkv

A little explanation:

  • null[video]
    This is just creating a "passthrough" pad so I can create a named pad for the video input.  If you're going to scale the video for any reason or do some other operation on the video, this can be replaced by that operation as long as you keep the output pad name the same as the input pad name being fed to scale2ref.  Really, the names don't matter as long as you can understand which one goes to which (you could use something like [foo] if you really wanted, but it may be too vague and make it harder to figure out what it is if you have to fix something with it in the future).
  • [0:2][video]scale2ref=ih*mdar:ih[sub][ref]
    Takes the subtitle ([0:2]) and reference video ([video]), then scales the subtitle frame while keeping the sub frame's aspect ratio.  Width is determined by multiplying the height of the video by the DAR of the subtitle frame, and height is the height of the video.  The filter then outputs the scaled sub frames and video frames to [sub] and [ref] respectively.
  • [ref][sub]overlay=(W-w)/2:(H-h)/2[v]
    Pretty standard overlay operation, but since the sub frame may not be the same AR as the video, we centre the sub frame on the video frame.
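
To make the arithmetic concrete, here is the same calculation done by hand for the cropped 1920x810 example mentioned earlier in the thread. It is a sketch under assumptions: `fit_height` is a hypothetical helper, the sub frame is assumed to have square pixels (so its DAR is just width/height), and exact fractions are used where ffmpeg would use floating point.

```python
from fractions import Fraction

def fit_height(ref_w, ref_h, sub_w, sub_h):
    """Mimic scale2ref=ih*mdar:ih followed by overlay=(W-w)/2:(H-h)/2.

    Hypothetical helper; assumes the sub frame has square pixels.
    """
    mdar = Fraction(sub_w, sub_h)   # DAR of the sub frame (the "main" input)
    w = int(ref_h * mdar)           # ih*mdar: video height times sub DAR
    h = ref_h                       # ih: video height
    x = (ref_w - w) // 2            # (W-w)/2: centre horizontally
    y = (ref_h - h) // 2            # (H-h)/2: centre vertically
    return (w, h), (x, y)

# 1920x1080 sub frame over a video cropped to 1920x810.
print(fit_height(1920, 810, 1920, 1080))
# 1920x1080 sub frame over a 3840x2160 video.
print(fit_height(3840, 2160, 1920, 1080))
```

In the cropped case the sub frame shrinks to 1440x810 and is placed 240 pixels in from the left; in the UHD case it is scaled up to fill the whole frame.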

If you try it and get a ton of "Changing frame properties on the fly..." warnings, make sure canvas_size is set to the sub stream's frame dimensions and those should go away.  I'll run some tests in a bit to see if truncation is needed on the expressions, but the above should be enough for local testing in the meantime to see if I missed something.  Oh, and if you have problems with external IDXs again, post an ffmpeg report and I'll look at it (don't have any IDX's laying around and haven't converted/copied one yet).

 

There is one case I still need to test: 4:3 videos that are cropped to remove pillarbox padding.  I doubt the current scale2ref expression would work out if the original subs had text that was rendered into the padding area.  Should be easy enough to fix, but I haven't done the math on it yet and need to dig up a sample to verify.
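
For what it's worth, the pillarbox case can be worked through on paper. The sketch below compares the height-based expression with a "fit inside both dimensions" alternative; the 1440x1080 cropped size is an assumed example, both helpers are hypothetical, and the alternative has not been tried in ffmpeg.

```python
from fractions import Fraction

def fit_height(ref_w, ref_h, sub_w, sub_h):
    """Size/offset from scale2ref=ih*mdar:ih plus centred overlay."""
    mdar = Fraction(sub_w, sub_h)
    w, h = int(ref_h * mdar), ref_h
    return (w, h), ((ref_w - w) // 2, (ref_h - h) // 2)

def fit_inside(ref_w, ref_h, sub_w, sub_h):
    """Alternative: keep the sub AR but never exceed either video dimension.

    Roughly w=min(iw, ih*mdar), h=min(ih, iw/mdar) in scale2ref terms;
    that expression is untested and only an assumption of this sketch.
    """
    mdar = Fraction(sub_w, sub_h)
    w = min(ref_w, int(ref_h * mdar))
    h = min(ref_h, int(ref_w / mdar))
    return (w, h), ((ref_w - w) // 2, (ref_h - h) // 2)

# Assumed example: 4:3 video cropped to 1440x1080, sub frame still 1920x1080.
print(fit_height(1440, 1080, 1920, 1080))
print(fit_inside(1440, 1080, 1920, 1080))
```

With the height-based expression the sub frame stays 1920 wide and hangs 240 pixels off each side of the 1440-wide video, which fits the concern above; the fit-inside variant shrinks it to 1440x810 and centres it vertically instead.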
