
One small step for ffmpeg... one giant leap for Emby! (Looking for C developer(s) to help with transcode throttling)


jluce50


CBers

Yeah, that's already been determined and a new build put up.

I'm not getting notifications, as I had to go back to the beta MBS to get transcoding to work.

 

Would be nice if people acknowledged posts and replied with a comment about a release.

 

Thanks for your reply Eric.


techywarrior

 

I would personally tend to agree with jabbera, but I trust that Luke has a much broader perspective as to what MB users in general want. That said, I would think that either solution will greatly help NAS users. jabbera had a good point about not wasting CPU cycles by pausing when lowering priority would give all of the same performance benefits when it comes to multiple transcodes (and might even be preferable on low-powered systems).

There are other benefits to pausing the thread though. For one, it allows the CPU to enter an idle state and greatly reduce power usage until it has to throttle up again. It also prevents unnecessary transcoding of data that is never viewed, which reduces both CPU usage and power draw and frees up CPU time for other transcodes that may be happening.


FredipusRex

So, something like this could theoretically add pause/unpause:

ffmpeg.c first diff

FilterGraph **filtergraphs;
int                     nb_filtergraphs;
++
++int         paused = 0;
#if HAVE_TERMIOS_H

 

ffmpeg.c second diff

static int transcode_step(void)
{
    OutputStream *ost;
    InputStream *ist;
    int ret;
++
++    if (paused)
++    {
++        av_usleep(10000);
++        return 0;
++    }

    ost = choose_output();

 

ffmpeg.c third diff in procedure static int check_keyboard_interaction(int64_t cur_time):

 

    if(cur_time - last_time >= 100000 && !run_as_daemon){
        key = read_key();
        last_time = cur_time;
    }else
        key = -1;
++
++    if (key != -1) paused = 0;      // Any valid key unpauses - backward compatibility
++    if (key == 'p') paused = 1;     // Pauses transcoding
++    if (key == 'u') paused = 0;     // Dedicated key for unpausing transcoding
    if (key == 'q')
        return AVERROR_EXIT;

...

    if (key == '?'){
        fprintf(stderr, "key function\n"
                        "? show this help\n"
                        "+ increase verbosity\n"
                        "- decrease verbosity\n"
                        "c Send command to first matching filter supporting it\n"
                        "C Send/Que command to all matching filters\n"
                        "D cycle through available debug modes\n"
                        "h dump packets/hex press to cycle through the 3 states\n"
++                        "p pause transcoding\n"
                        "q quit\n"
++                        "u unpause transcoding\n"
                        "s Show QP histogram\n"
        );



 

These changes are based off the current ffmpeg trunk here: https://github.com/FFmpeg/FFmpeg/blob/master/ffmpeg.c
 

Edit: I modified the sleep time as the existing code will only read a key every 100000 time units so there's no point in sleeping less.

Second edit: Modified the help display to include pause/unpause keys.
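
To make the effect of the three key checks easier to reason about, the sketch below pulls them into a standalone function. The name next_paused is hypothetical and not part of ffmpeg; the body just repeats the checks from the third diff in the same order.

```c
/* Mirrors the order of the three checks added to check_keyboard_interaction().
 * key is the value returned by read_key(), or -1 when no key was read.
 * Returns the new value of the paused flag. */
int next_paused(int paused, int key)
{
    if (key != -1) paused = 0;   /* any key that was read resumes */
    if (key == 'p') paused = 1;  /* 'p' pauses */
    if (key == 'u') paused = 0;  /* 'u' resumes explicitly */
    return paused;
}
```

Because the "any valid key" check runs first, every key other than 'p' resumes a paused transcode, including '?' and the verbosity keys. When no key is read (-1), the flag keeps its previous value.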

Edited by FredipusRex

jluce50

So, something like this could theoretically add pause/unpause:

I would love to give this a shot, but I'm having a hell of a time getting ffmpeg to build in Windows. I've followed countless guides and tutorials and it's just one problem after another.


denethor

Yeah, that's already been determined and a new build put up.

 

And it looks like it works fine. The problem with multiple streams seems solved. Below is the CPU usage while transcoding 2 simultaneous streams: 1080p@5Mbit on Safari 8.0.3 and 720p@4Mbit on iOS 8.02 Safari. My CPU clocks down to 1.26 GHz, which is low enough to let it cool down. Both source files were >25Mbit full Blu-ray remuxes.

 

[Screenshot of CPU usage: 54f4d18b98b66_throtling.png]

Edited by denethor

FredipusRex

I would love to give this a shot, but I'm having a hell of a time getting ffmpeg to build in Windows. I've followed countless guides and tutorials and it's just one problem after another.

Might want to talk to @mjb2000, who has been building a custom ffmpeg for GPU Transcoding (look at the thread a few down from this one).


jluce50

ok, i think i've got that fixed. give it a shot again in the build going up. thanks.

I can also confirm that it seems to be working great now!


jabbera

There are other benefits to pausing the thread though. For one, it allows the CPU to enter an idle state and greatly reduce power usage until it has to throttle up again. It also prevents unnecessary transcoding of data that is never viewed, which reduces both CPU usage and power draw and frees up CPU time for other transcodes that may be happening.

I agree there are benefits, but if you are trying to support the largest number of simultaneous users, pausing is not the most efficient mechanism. Eager transcoding with idle time is.


gillmacca

Been following this thread, and have just installed the latest dev build, and can say it has helped me greatly.

I have my media on a readynas NV+ (MBS on laptop), and use a nowtv box. Before, when fast-forwarding, I would have to wait a few seconds for MBS to transcode, but now the video starts immediately after a fast-forward.

Many thanks to all you guys involved in this.


I agree there are benefits, but if you are trying to support the largest number of simultaneous users, pausing is not the most efficient mechanism. Eager transcoding with idle time is.

 

we can always measure and compare. we're not married to any specific approach. 


techywarrior

I agree there are benefits, but if you are trying to support the largest number of simultaneous users, pausing is not the most efficient mechanism. Eager transcoding with idle time is.

 

But this would be bad for a lot of other instances (slower CPUs/NAS boxes, people concerned with heat and power). I'm also not convinced that there would be a benefit for multiple users with your method over the pause method but I suppose, as Luke said, measurements can be taken.

 

If there is a difference then we may need to have both options. But that is going to be confusing to the user (and I don't even know how to explain the option)


FredipusRex

If there is a difference then we may need to have both options. But that is going to be confusing to the user (and I don't even know how to explain the option)

I'd simply list it as Power Saving/Low CPU for the pause technique and Maximum Streaming/High CPU for the idle technique. Have some text that explains that the first option is more energy efficient and good for less capable servers while the second will support more streams on more powerful CPUs at the cost of more energy use.


techywarrior

I don't think labeling one as max streaming is good though. It makes it sound like the other option is limiting in some way. Also, I'm still not convinced that your method will support more streams. I think that we really need to get some tests/numbers and see how much of a difference there is really going to be.

 

I suppose that really needs to be the next step once everything can be made functional. Then we can figure out what to do.


gillmacca

Using the latest dev build, I am noticing random skipping when playing video files. In a typical episode, it is happening 2 or 3 times. Not sure of the exact length of the skip, but I would guess about half a second or so


I agree there are benefits, but if you are trying to support the largest number of simultaneous users, pausing is not the most efficient mechanism. Eager transcoding with idle time is.

I'm not sure how you can say this.

 

Being able to control TOTAL ffmpeg usage is the only way to maximize the total streams in use at any given time. Priorities are going to work differently across different operating systems and of course will work differently if running in a VM vs. at the operating system level (talking total CPU slices here). A setting of Low or Below Normal may or may not keep ffmpeg doing something, depending on what else is running at the same time.

Here's a fictitious example. Let's say your computer can handle transcoding with the superfast preset at 5X speed with one transcode going. With 5 going at the same time it might only do 4.5X speed, as the disks are jumping around trying to read data from different spots (the old sequential read vs. random read disk performance). Now let's also assume that on low priority it's still trying to do some work and overall can do 0.3X speed per process. So if you had 5 transcodes going, you might not be able to transcode at 5X overall, which is what is needed. Some processes will be farther ahead than where they need to be and others behind. MB3 will still have to manage this by kicking the process to NORMAL or ABOVE NORMAL to get it moving and to stay ahead of the client. Meanwhile other ffmpeg processes are still working in the background on tasks that aren't important at that moment in time. When you are right up against the ceiling of how many clients can be supported at one time, the priority-based method will get in the way and cause contention.

Now on the other hand, if you can clearly pause 4 of the 5 processes and only have 1 going at any time, you reduce the disk contention and the server can fully control which stream is getting the FULL attention it needs at that point in time. That's not to say that 2, 3 or 4 ffmpeg processes can't be run at the same time if the server is smart enough to manage this, but there will be a limit on most systems to how many ffmpeg processes are optimal at any one time due to disk overhead. The point is that by being able to completely pause/resume the process, the server gets to be in charge, and it will ultimately be much easier to tune over time as it controls everything rather than leaving part of it to the operating system to manage in the background.

 

Just my 2 cents,

Carlo


jabbera

I agree with you except we don't currently exercise total control and there is no limit to the number of ffmpeg processes that can be run at once. We pause once we buffer enough, then resume once the buffer is too low. That's it. In your scenario we could end up running 5 processes at once currently. If we wrote an ffmpeg scheduler, which is what you are talking about, we could probably get the ideal performance.
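
The pause-when-buffered, resume-when-low behaviour described here is a simple hysteresis loop. A rough sketch, with hypothetical threshold values and names (the thread does not say what the server actually uses):

```c
#include <stdbool.h>

/* Hypothetical thresholds, in seconds of transcoded video ahead of playback. */
#define PAUSE_ABOVE_SECONDS  120.0
#define RESUME_BELOW_SECONDS  60.0

/* Decides whether one ffmpeg process should be running, given whether it is
 * running now and how far ahead of the client it has transcoded. */
bool should_run(bool running, double buffered_seconds)
{
    if (running && buffered_seconds >= PAUSE_ABOVE_SECONDS)
        return false;   /* enough buffered: pause */
    if (!running && buffered_seconds <= RESUME_BELOW_SECONDS)
        return true;    /* buffer running low: resume */
    return running;     /* in between: keep the current state */
}
```

Each process is judged on its own buffer only, which is why nothing in this scheme limits how many ffmpeg processes run at once.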

 

I'm just talking about an easy solution comparable to the amount of work that has been done thus far. Take your example above (5X for a single stream, 4.5X total for 5 streams at once) with idle transcoding as I'm picturing it, for a 30-minute show.

 

If client A starts watching and B, C, D and E all start 6 minutes later, all 5 will be able to watch the show at once, because client A needs no more transcoding after 6 minutes: the whole file is done (30 minutes at 5X). With the current scheduler in your example, we would only have 4.5X available, and everyone would start to buffer after B, C, D and E join, as we would need to transcode all 5 streams at once.
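
The arithmetic behind this example, as a small sketch. The helper names are invented for illustration, and the model is deliberately crude: it assumes each viewer who is being transcoded just in time needs about 1X of transcode speed.

```c
/* Wall-clock minutes needed to transcode a whole file at a given speed
 * multiple (5.0 means five times real time). */
double minutes_to_finish(double runtime_minutes, double speed)
{
    return runtime_minutes / speed;
}

/* Crude capacity check: n streams transcoded just in time need about n times
 * real-time speed in total. Returns 1 if the aggregate speed covers that. */
int sustainable(int active_streams, double aggregate_speed)
{
    return aggregate_speed >= (double)active_streams;
}
```

With eager transcoding, minutes_to_finish(30.0, 5.0) gives 6, so client A is finished by the time the other four join and only 4 streams compete for the 4.5X. With pause/resume, A still counts as an active stream, and 5 streams do not fit into 4.5X.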

Edited by jabbera

Maudite

Everyone is making very good points about numerous scenarios. There are also numerous kinds of system configurations and uses that would affect how we may want transcoding to perform.

There are those who run off a NAS system vs. a direct install on a high-powered server-type machine vs. running in a VM.

Then there are the differing workloads: perhaps only one stream at a time from someone who likes to get up for 10-minute breaks, vs. a complete TV service replacement with multiple streams in the household, vs. many family and friends with remote access, so streams in the double digits at one time.

There was mention of disk contention. I don't think that is an issue for those in the high-powered scenarios, with many people running RAID systems on 6Gb/s disks.

 

 

The point I'm trying to make is, I think there are different use cases for the community and the option to configure how the transcoding is done would be very beneficial.  Yes perhaps the default method would be the "pause" method for those basic installs for newcomers to MBS, but there certainly are power users out there as well.

 

Now, I for one continue to grow my RAID and use of MBS, so a more powerful configuration would be most beneficial. I believe someone had mentioned working on a way of building a transcode farm, as well as leveraging GPUs. This would be the scenario I would be most interested in. I run my server within a VM, and it can handle multiple streams no problem. However, if I could take the workload out to numerous systems and onto GPUs available on physical machines, well, that would be a glorious day.

 

Keep up the good work everyone.


jluce50

 

Now, I for one continue to grow my RAID and use of MBS, so a more powerful configuration would be most beneficial.  I believe someone had mentioned working on a way of building a transcode farm, as well as leveraging GPUs as well.  This would be the scenario I would be most interested in.  I run my server within a VM, and it can handle multiple streams no problem.  However, if I could take the workload out to numerous systems and onto GPUs available on physical machines, well that would be a glorious day.

 

 

I take it you've seen this thread? 

 

http://mediabrowser.tv/community/index.php?/topic/10723-gpu-transcoding-intel-quicksync-and-nvidia-nvenc/


jabbera


I totally agree. I hadn't thought of the low power usage since it isn't really something I'm concerned about. I basically have a small data center in my basement. I think there is going to be room for multiple "schedulers" for lack of a better term. Some will favor certain types of use cases. Due to the design it's relatively easy to swap them out at runtime so it can be a preference. 

 

The point of GPUs becomes even more interesting when talking about "schedulers" since some only support a certain number of streams simultaneously. If you are using gpu transcoding the algorithm will be totally different most likely. This is a spot to evolve and really make MB shine over competitors.


I agree with you except we don't currently exercise total control and there is no limit to the number of ffmpeg processes that can be run at once. We pause once we buffer enough, then resume once the buffer is too low. That's it. In your scenario we could end up running 5 processes at once currently. If we wrote an ffmpeg scheduler, which is what you are talking about, we could probably get the ideal performance.

 

But that's my point. MB3 is fully controlling exactly which ffmpeg processes need to run at any one time, be it 1, 2, 3 or 10 processes. It's only running the processes that need to run in order to stay ahead of the client. This way, if someone fast-forwards 20 minutes into a movie, we aren't wasting the CPU on something that is never used.

However, as has been pointed out, and I think it is TRUE, it would be good to have an option to allow transcodes to keep going without throttling back. They would still be managed in an orderly fashion, however. As already pointed out, if you have 4 clients transcoding but can do 5X transcode speed, then you could theoretically only support one additional client. BUT if all the CPU were used and it could fall back to keeping everyone as far ahead as possible, then as the first stream completes you could actually handle 2 additional streams.

Some of you might at first think this is just like how it previously worked, but it's not. This would/could be managed so it would constantly focus its attention on the person with the least buffer available vs. just starting each process and letting it go until it's done. In that case one could be 45 minutes ahead and still going while the newest client is fighting to stay 1 minute ahead.

 

So I'd propose a config option to enable background processing where the client with the least buffer is worked on with spare CPU cycles. <-- better worded of course.
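
One way to picture the proposed option is a selection step that always hands spare CPU to the session with the least buffer. This is only an illustration under assumed names and units (seconds of video buffered ahead of each client), not how MB3 schedules anything today:

```c
#include <stddef.h>

/* Returns the index of the session with the smallest buffer, or -1 when
 * there are no sessions. On a tie, the earliest session wins. */
int least_buffered(const double *buffered_seconds, size_t count)
{
    if (count == 0)
        return -1;

    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (buffered_seconds[i] < buffered_seconds[best])
            best = i;
    }
    return (int)best;
}
```

A server loop could call this on every tick, resume the chosen session and keep the others paused, so a client that is 45 minutes ahead never takes CPU from one that is 1 minute ahead.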

 

Yep, tricky to say the least.

 

One thing we know is that we'll end up with something great.  We are exploring new territory and there are numerous ways to accomplish this stuff.  No one way is "best" for all people and that's why we are getting all these great ideas thrown out here in the thread.  It's obvious we are really thinking the process through from various perspectives!

 

Yeah, that's me working on the transcode farm, with numerous people working on the GPU front. I've sort of put the "farm" on hold for a week or two to see how this is going to work without bugging Luke. But I'll need to get with Luke for an API or similar that the "farm" can use to manage priorities of ffmpeg sessions so it too can throttle, pause and resume based on what MB3 needs to happen. I can already do this, but I just need to be able to check in with MB3 every X amount of time to see what needs pausing/resuming.

 

One thing for sure is that we are going to have a powerful setup shortly. Imagine being able to pick up a $300 NUC that could be used as part of the distributed "farm" with an i5 or i7 CPU and GPU. A NUC could probably be used both as a client for your TV and a transcoder at the same time to help with streaming to friends. The sky is going to be the limit, but let's get past phase 1 first. :)

Edited by cayars

jluce50

@Luke - I did notice that if a client stops watching while the ffmpeg process is suspended, the process is never cleaned up. It just stays out there in a suspended state until (I'm assuming) a reboot.


ah yes, because it's not going to be able to respond to the q key to close. so that will need to be looked at.
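
A possible way around this on a POSIX host is to resume the process before asking it to exit, then reap it. This is a sketch under assumptions: stop_transcode is a hypothetical helper, it uses signals rather than the q key the server sends, and a Windows host would need different calls.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Ends a child transcode that may currently be suspended. A stopped process
 * cannot act on a quit request, so it is resumed first.
 * Returns 0 once the child has been reaped, -1 on error. */
int stop_transcode(pid_t pid)
{
    int status;

    if (kill(pid, SIGCONT) != 0)   /* harmless if it was already running */
        return -1;
    if (kill(pid, SIGTERM) != 0)   /* then ask it to exit */
        return -1;
    if (waitpid(pid, &status, 0) != pid)   /* reap it so no zombie is left */
        return -1;
    return 0;
}
```

The same ordering would apply to the key-based approach: resume first, then send q, so the process is actually running when the quit request arrives.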


denethor

@Luke - I did notice that if a client stops watching while the ffmpeg process is suspended, the process is never cleaned up. It just stays out there in a suspended state until (I'm assuming) a reboot.

Or you can simply kill those processes instead of rebooting. I also confirmed the issue.
Edited by denethor
