coffee47 3 Posted November 5, 2025

Regarding my latest forum posts. So... I have found a solution. I guess I got a little too impatient waiting for proper server-side bob-deinterlacing, so I went ahead and built a tiny proof-of-concept plugin that hooks into Emby’s FFmpeg pipeline and forces yadif_cuda (or similar filters) to output at 50 FPS. (Currently NVIDIA only, sorry.)

Before anyone gets excited — this is absolutely not production-ready. It uses pure .NET reflection and pointer swaps to intercept internal Emby classes at runtime, so it’s more of a hacky POC than a real feature. Still, it proves that the concept works perfectly — so maybe it’s time to expose some of the transcoding components (like ways to modify the filter pipeline) through the public Emby SDK, so developers can implement these features properly and safely.

I’ll share the source (with no support whatsoever) for anyone who wants to play around or experiment, but again: use it at your own risk! Also, please keep in mind that Emby will never show the true 50 FPS in the "nerd statistics" or in the Diagnostics plugin, as the override happens at a later stage. However, one can clearly notice the difference.

Cheers,
Coffee

Plugin.cs
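For those curious, the core of the hack boils down to a reflection patch of this shape. This is a stripped-down sketch, not the actual Plugin.cs; the class and field names are placeholders, since Emby's internals aren't public, and whether the filter chain even exists as a plain string at that stage is an assumption on my part:

```csharp
using System;
using System.Reflection;

// Stripped-down sketch of the reflection trick. The real plugin targets internal
// Emby types that are not named here; "filterBuilder" stands in for whatever object
// ends up holding the assembled FFmpeg filter string.
public static class BobDeinterlaceHack
{
    public static void ForceBobOutput(object filterBuilder, string stringFieldName)
    {
        var field = filterBuilder.GetType().GetField(
            stringFieldName,
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);

        if (field == null || field.FieldType != typeof(string))
            return; // nothing to patch - it's a hack, so fail silently

        var chain = field.GetValue(filterBuilder) as string;
        if (string.IsNullOrEmpty(chain) || !chain.Contains("yadif_cuda"))
            return;

        // yadif_cuda with mode=send_field emits one frame per field (50 fps from 50i),
        // instead of the default send_frame which keeps the original frame rate.
        var patched = chain
            .Replace("yadif_cuda=0", "yadif_cuda=1")
            .Replace("yadif_cuda=mode=send_frame", "yadif_cuda=mode=send_field");

        field.SetValue(filterBuilder, patched);
    }
}
```

The real Plugin.cs has to chase a few pointers before it gets to anything patchable, which is exactly why I call it a hack.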
softworkz 5065 Posted November 6, 2025 (edited)

1 hour ago, coffee47 said:
So... I have found a solution. I guess I got a little too impatient waiting for proper server-side bob-deinterlacing. [..] so maybe it’s time to expose some of the transcoding components (like ways to mess with the Filter-pipeline) through the public Emby SDK, so developers can implement these features properly and safely.

When I came to Emby, many, many years ago, hardware transcoding was a kind of lottery game. The percentage might not be accurate, and it differed from user to user of course, but if 8 out of 10 hw transcodings succeeded, you were already lucky. I have several thousand ffmpeg logs that were posted by users, and it was a never-ending story trying to fight them down. Eventually, I made a proposal for re-implementing transcoding from scratch with a whole new logic, which was subsequently introduced in several steps (I'm skipping some ugly details). In the course of this I named two or three things that I said needed to be dropped in order to achieve reliable, deterministic transcoding that always matches exactly the intended transcoding plan. It wasn't an easy sell, but eventually the promise could be fulfilled. It's been the result of many individual changes and building blocks, but this was the only functional removal (the others could be compensated somehow). Since then, only every now and then somebody came to ask about this, so it never got much traction.

1 hour ago, coffee47 said:
plugin that hooks into Emby’s FFmpeg pipeline and forces yadif_cuda (or similar filters) to output at 50 FPS [..] Still, it proves that the concept works perfectly

Which concept does this "prove"? Do you think that we don't know that it's possible to configure the filter like this and that we would have been just too lazy to enable that switch?

Even before the story above - more than a decade ago - it happened like that: people came and named this FFmpeg option and that FFmpeg option quite regularly, and Luke wanted to make all users happy, so he added many of those requested parameters, one after another - until it was no longer possible to manage and separate all the countless cases and combinations of cases, which multiply up to insane figures.

A core concept of the new transcoding unit is predictability: When you look at the hardware detection log, you can see that this allows us to know exactly which hw encoders, decoders and (partially) filters are available and which capabilities they offer. When you look into the ffmpeg logs, you can see a filter chain printed even before ffmpeg has started, and this is almost always accurate (unless it errors). We generally have equality in transcoding, independent of the hardware on which it runs (or sw otherwise). So we know that we reliably produce the same results when running on different hardware or when switching between hardware and sw during a single playback.
(Tone mapping is an exception, and bandwidths can differ; the relevant part, though, is that it's playable without disruption.)

Oh, and I almost forgot the most important thing: when we add this feature, it would have to work successfully in combination with all other transcoding operations. I haven't looked into it for a long time, but as far as I remember:
- BOB deinterlacing is not available for all types of transcoding pipelines (sw + all hw), so a uniform behavior cannot be achieved
- BOB deinterlacing (at least one of the variants that support it) behaves unpredictably: for certain files it works, for others not, and we didn't find out what it depends on. Without predictability of the output, this doesn't fly.

These were the main blockers AFAIR. Of course it can be looked into. But it requires resources - definitely days, possibly more. What you have done is something like 2 minutes of that time - or better: that's not the kind of work that is required. (Yet I appreciate your effort and determination to hack this in.)

Edited November 6, 2025 by softworkz
coffee47 3 Posted November 6, 2025 (Author)

38 minutes ago, softworkz said:
When I came to Emby, many, many years ago, hardware transcoding was a kind of lottery game. [..] What you have done is something like 2 minutes of that time - or better: that's not the kind of work that is required. (Yet I appreciate your effort and determination to hack this in.)
Hey @softworkz,

Haha, fair point — I definitely didn’t mean to imply that anyone at Emby didn’t know how to turn that switch on. You’re absolutely right — what I did was more of a “let’s see if this still blows up” weekend experiment than a real engineering effort. I develop software for a living, just like you, so I totally understand your point and the reasoning behind the deterministic transcoding pipeline — predictability and consistency across hardware types is key for a company delivering a stable product.

But can we talk about the origin of Emby? An open source project that anyone could tailor to their specific needs (not that everyone should, or has the technical knowledge to in the first place, but it was possible). AND… I see that Emby is probably THE most customizable media solution currently out there. I had LOTS of fun setting up my instance at home, and I really don’t say that about a lot of products. I’m someone who is very adamant about my stuff running perfectly. Not good - perfect. I’ve always felt that Emby is still a safe haven for the average tinkerer, even being closed source now.

The deinterlacer is one of the things I couldn’t sleep on (yes, for real: watching Live TV through Emby in bed caused me to get up again and start writing my first post in this forum, specifically requesting that feature. So I’m very sorry for stepping over the line and being so adamant about it in my previous posts). I tried it on all platforms I had at hand and never got a satisfactory result watching soccer/football. Over the next few days I caught myself staring at other people’s TVs, seeing how smoothly sports and even news felt. Of course these STBs/receivers have hardware deinterlacers built deeply into their SoCs, and we will never get that performance on a consumer-grade graphics card.

Look at YADIF: it’s only a CPU program adapted and ported to run on a GPU (yadif_cuda). It was never originally meant to. Yet Emby uses it today as its main deinterlacer on NVIDIA hardware, whereas (in my opinion) better options exist. A lot has changed over the past years. NVIDIA, for example, has (to my knowledge) three different ways to deinterlace (yadif_cuda, bwdif_cuda and CUVID deinterlacing), the latter even becoming the “standard” approach as it already happens in the decoding process, eliminating the need for a filter entirely. Each has its own settings and modes, including either weaving, bobbing or some kind of motion compensation mimicking the bob mode. Even YADIF does it (but just isn’t very good at it imho).

Some people think that interlaced TV was a necessity because of CRT TVs (which is also true). But the main reason (still today) is to save costs: it drastically reduces bandwidth, and all STB/receiver manufacturers already interpolate the missing information almost perfectly. Hence we still send out interlaced signals even though most devices are natively progressive, and we even achieve a smoother look on sports because we have more frames to display (interpolated or not). And today (to my knowledge) all major HW accel providers, including AMD, Intel and of course NVIDIA, provide some way of doing that. Software as well.

That’s also why I approached it as a quick proof-of-concept rather than something you could safely ship (which you probably wouldn’t approve as a plugin because of the dirty pointer substitution).
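To make the three NVIDIA paths I mentioned a bit more concrete, here is roughly how each would be selected on the FFmpeg side. The filter and option names are real FFmpeg ones; whether and how Emby would actually wire them up is purely my assumption:

```csharp
// The three NVIDIA deinterlacing paths, expressed as FFmpeg arguments.
// Filter/option names are real FFmpeg options; the concrete values are illustrative.
public static class NvidiaDeintVariants
{
    // 1) yadif_cuda filter: mode=send_field gives bob output, one frame per field (2x rate)
    public const string YadifCudaBob = "-vf yadif_cuda=mode=send_field";

    // 2) bwdif_cuda filter (newer FFmpeg builds): same bob semantics, usually considered
    //    better quality than yadif
    public const string BwdifCudaBob = "-vf bwdif_cuda=mode=send_field";

    // 3) CUVID decoder deinterlacing: done inside the decoder's fixed-function block,
    //    no filter needed; keeping the second field doubles the output rate
    public const string CuvidBob = "-c:v h264_cuvid -deint bob -drop_second_field 0";
}
```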
Still, it’s nice to confirm that the known behavior technically can be reintroduced without ffmpeg losing its mind immediately. My hope was simply to demonstrate that a developer-facing hook layer (or a few exposed transcoding classes in the SDK) could open the door for experimental features like this — safely and sandboxed, without having to resort to reflection hacks. And of course, officially unsupported by you. If that kind of extension point ever becomes part of the SDK, I’d be happy to help test or even bake this into something clean.

So… the main problem I see is that you have different ambitions for Emby than some power users like me or the TO (thread opener) who really need this kind of feature (that most people don’t even know exists). Either in the form of a (cleanly crafted) plugin, or a switch with a big red label on it: “Flick this on and we’ll not answer your tickets, period.”

Don’t get me wrong, this is just the last 2% missing to make Emby THE solution to all my media needs. WebStreams being the latest contribution toward that.
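For completeness, this is the kind of extension point I am imagining. Nothing like it exists in the current SDK; the interface name and signature below are entirely made up for illustration:

```csharp
// Purely hypothetical sketch of a filter-chain extension point; no such interface
// exists in the Emby SDK today. Names and signature are invented for illustration.
public interface IFilterChainInterceptor
{
    // Called after the server has planned the filter chain, just before ffmpeg is
    // launched. Returning the input unchanged keeps the default behavior.
    string OnFilterChainPlanned(string plannedFilterChain, string hardwareContext);
}

public sealed class BobDeinterlaceInterceptor : IFilterChainInterceptor
{
    public string OnFilterChainPlanned(string plannedFilterChain, string hardwareContext)
    {
        // Only touch the NVIDIA/CUDA path, and only when the planned chain deinterlaces.
        if (hardwareContext == "cuda" && plannedFilterChain.Contains("yadif_cuda"))
            return plannedFilterChain.Replace("mode=send_frame", "mode=send_field");

        return plannedFilterChain;
    }
}
```

A plugin registering something like this would obviously be the first suspect whenever transcoding breaks, which is exactly why it would have to live behind that big red "unsupported" label.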
Solution: softworkz 5065 Posted November 6, 2025 (edited)

On 11/6/2025 at 2:52 AM, coffee47 said:
what I did was more of a “let’s see if this still blows up” weekend experiment than a real engineering effort. I develop software for a living, just like you.

I realized that - hence the longer reply.

On 11/6/2025 at 2:52 AM, coffee47 said:
Look at YADIF, it’s only a CPU program adapted and ported to run on a GPU (yadif_cuda). It was never originally meant to. Yet Emby uses it today as their main deinterlacer on NVIDIA hardware whereas (in my opinion) better options exist.

You are right. There are better options now. Though, for internal reasons, I haven't actively worked on transcoding during the past 2+ years, so many things have become a bit rusty indeed.

On 11/6/2025 at 2:52 AM, coffee47 said:
three different ways to deinterlace (yadif_cuda, bwdif_cuda and CUVID deinterlacing), the latter even becoming the “standard” approach as it already happens in the decoding process, eliminating the need of a filter entirely.

CUVID is not really the "standard" way. It is rather avoided generally, because CUVID - as opposed to NVDEC - does its own parsing of the video streams, and that is in almost all aspects inferior to ffmpeg's parser (used with NVDEC). The CUVID scaling and deinterlacing is implemented by a fixed-function block which isn't accessible otherwise, but it is said not to be of the best quality (scaling at least; I don't know about the deint quality). The appeal of using it is rather when 3D/CUDA resources are limited - as in the case of games. As a game developer, you'll probably be very thankful at times that video scaling/deint does not take resources away from 3D operations. We do not use it - one reason is that we sometimes do other things before scaling, which isn't possible with this kind of scaling.

On 11/6/2025 at 2:52 AM, coffee47 said:
My hope was simply to demonstrate that a developer-facing hook layer (or a few exposed transcoding classes in the SDK) could open the door for experimental features like this — safely and sandboxed, without having to resort to reflection hacks. And of course, officially unsupported by you. If that kind of extension point ever becomes part of the SDK, I’d be happy to help test or even bake this into something clean.

The problem with that is that it would probably give too many details away. For example, we have an FFmpeg lib with an object model comprised of more than a thousand classes, where every single ffmpeg option is reflected with strong typing, enums, etc. Commands are only built through this model, without ever working with strings. Not that it can't be looked at in some way regardless, but as long as it's private, no competitor can copy or replicate it - which would not be so clear anymore if it were part of a public API. Well, if certain things hadn't happened, it might be open source today...

On 11/6/2025 at 2:52 AM, coffee47 said:
Of course these STBs/Receivers do have hardware deinterlacers built deeply into their SoCs and we will never get that performance on a consumer grade graphics card.

In the context of TVnext, I came to the conclusion that, ideally, deinterlacing should happen at the client side, where the video is shown - as much as possible. That's a better way for several reasons, and the chance of hw deinterlacing getting applied (e.g. in the case of SmartTV Emby apps) is always preferable, for just the reason you're giving.

Edited November 7, 2025 by softworkz