Need/Want help to make GPU transcoding useful


Carlo

scanner50

Man, you guys rock. Just FYI that I have unlimited OneDrive storage, so I'll be another to offer hosting any test vids you guys may need.

 

Also, I'm sort of an odd-use case for Server as I'm currently testing it on an i3-4020Y Dell Venue 11 Pro tablet. I tried it over on mjb2000's thread and it didn't work for me at the time, but if you like I might be able to devote some time for testing as sort of an outlier scenario when things get rolling a bit more here.


Hi,

 

OK, some initial results, below. Note that I have to run on two different machines (QuickSync in one, NVENC in the other). Let me know what else you'd like to see ... and some notes below that are interesting (like QS failing under ffmpeg, but not QSTranscode). It also looks like NVENC is working, but there is some room for it to get better yet.

 

54e29fe96aa30_results.jpg

 

Thoughts?

 

BTW, is there a better way to get FPS from ffmpeg, rather than just watching it and guessing?


CBers

BTW, is there a better way to get FPS from ffmpeg, rather than just watching it and guessing?

In a normal MB3 transcode, the FPS rate is written to the ffmpeg log file.
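
A minimal sketch of pulling that figure out of a log, assuming the log contains ffmpeg's usual progress lines of the form `frame=  250 fps= 48 q=28.0 ...`. The function name and the sample log text are made up for illustration; the exact layout of the MB3 log file is not shown in this thread.

```python
import re

# Matches the "fps=" field of an ffmpeg progress line,
# e.g. "frame=  250 fps= 48 q=28.0 size=..."
_FPS_RE = re.compile(r"\bfps=\s*([0-9]+(?:\.[0-9]+)?)")


def last_reported_fps(log_text):
    """Return the last fps value found in the log text, or None if there is none."""
    matches = _FPS_RE.findall(log_text)
    return float(matches[-1]) if matches else None


# Hypothetical log excerpt, for illustration only.
sample_log = (
    "frame=  120 fps= 40 q=28.0 size=     512kB time=00:00:05.00 bitrate= 838.9kbits/s\n"
    "frame=  250 fps= 48 q=28.0 size=    1024kB time=00:00:10.41 bitrate= 805.8kbits/s\n"
)
print(last_reported_fps(sample_log))
```

The last progress line is the one to read, since earlier ones only reflect the speed up to that point in the job.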


Arrmo,  good stuff.

 

Could you also try q264 (one of the links given) and let me know what kind of FPS you get with that.

 

Could you by any chance also give the command lines used for each program when you ran them?  This makes sure we compare apples to apples.  For example, on your initial ffmpeg 264 job, were you using the SUPERFAST preset, which is the thing we are trying to be faster than?

 

When I run the ffmpeg version of QS on two different machines I almost always get the device error/busy message you noted.  I wish there were a couple different sources of ffmpeg with QS built in but I can't find any. 

 

Right now for QuickSync I'd say the leading candidates to use will be QSTranscode and q264.  We'll have to try putting them through their paces with different bitrate settings and changing resolutions and things that we would typically do with ffmpeg.  We are also going to have to verify we can USE THE FILE while it's being transcoded, as this would be a deal breaker if we have to wait for the file to complete processing before using it.

 

This is something we should probably check right away and make sure both audio and video are being processed as the file is being written.  If audio is muxed in after the fact or anything like that it won't work for us.

 

Carlo


BTW, is there a better way to get FPS from ffmpeg, rather than just watching it and guessing?

 

We could just record the time it takes to complete overall then do our own math based on the total number of frames in the source. Or we ignore this for now and just record how long each program takes to process the file regardless of the reported FPS given by the program itself.  This is probably the best thing to do right now.  Since we are using the same source file then just comparing time to complete is good for now.
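
A rough sketch of that math, assuming we time each program from launch to exit and already know the frame count of the source. The helper names and the example numbers are hypothetical, not measurements from this thread.

```python
import subprocess
import time


def timed_run(command):
    """Run a command (a list of arguments) and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def average_fps(total_frames, elapsed_seconds):
    """Average encode speed: frames in the source divided by seconds to finish."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_frames / elapsed_seconds


# Hypothetical numbers: a 3600-frame clip that took 24 seconds to transcode.
print(average_fps(3600, 24.0))
```

Since every program gets the same source file, comparing the `timed_run` results directly works just as well as comparing the derived FPS.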

 

Carlo


Could you also try q264 (one of the links given) and let me know what kind of FPS you get with that.

 

You bet! Is the codec exactly that, "q264"? Or is this a different application? Sorry if I missed the link ... :(

 

 

Could you by any chance also give the command lines used for each program when you ran them?  This makes sure we compare apples to apples.  For example, on your initial ffmpeg 264 job, were you using the SUPERFAST preset, which is the thing we are trying to be faster than?

 

Here you go - note that I did not use presets for now ... lots of tuning we can do yet! I figured to not make this overly complex to start, but also the presets don't seem to work with the GPU approaches right now?

 

ffmpeg.exe -i "Transformers - Revenge of the Fallen - Teaser.mp4" -codec:v libx264  -b:v 3145728 -codec:a aac -b:a 65536 -strict -2 "Transformers - Revenge of the Fallen - Teaser - ffmpeg - libx264.mp4"
ffmpeg.exe -i "Transformers - Revenge of the Fallen - Teaser.mp4" -codec:v h264_qsv -b:v 3145728 -codec:a aac -b:a 65536 -strict -2 "Transformers - Revenge of the Fallen - Teaser - ffmpeg - h264_qsv.mp4"
qstranscode.exe -i "Transformers - Revenge of the Fallen - Teaser.mp4" -o "Transformers - Revenge of the Fallen - Teaser - qstranscode.mp4" -h264 -aac -w 1280 -h 720 -b 3072 -u 7 -ab 65536
 
ffmpeg.exe -i "Transformers - Revenge of the Fallen - Teaser.mp4" -codec:v libx264  -b:v 3145728 -codec:a aac -b:a 65536 -strict -2 "Transformers - Revenge of the Fallen - Teaser - ffmpeg - libx264.mp4"
ffmpeg.exe -i "Transformers - Revenge of the Fallen - Teaser.mp4" -codec:v libnvenc -b:v 3145728 -codec:a aac -b:a 65536 -strict -2 "Transformers - Revenge of the Fallen - Teaser - ffmpeg - libnvenc.mp4"
 
 

When I run the ffmpeg version of QS on two different machines I almost always get the device error/busy message you noted.  I wish there were a couple different sources of ffmpeg with QS built in but I can't find any. 

 

I do note that this doesn't happen with a lower resolution / bitrate input file. Add one to our test case list?

 

And QSTranscode is the other option I see - it seems to work quite well. I'd really like to find another / reference version of NVENC, but no luck there so far.

 

 

Right now for QuickSync I'd say the leading candidates to use will be QSTranscode and q264.  We'll have to try putting them through their paces with different bitrate settings and changing resolutions and things that we would typically do with ffmpeg.  We are also going to have to verify we can USE THE FILE while it's being transcoded, as this would be a deal breaker if we have to wait for the file to complete processing before using it.

 

Agreed! Do we have the option to use these applications (vs. ffmpeg)?

 
 

This is something we should probably check right away and make sure both audio and video are being processed as the file is being written.  If audio is muxed in after the fact or anything like that it won't work for us.

 

Also agreed. I can try to play back the file while encoding is going on ... OK?

 


FYI, I just found out that HandBrake includes QS now! I ran the encode quickly, it worked, and an interesting output from the log,

work: average encoding speed for job is 349.336853 fps

 

So we have another source for QS! Now, to find one for NVENC ...


And the file I played with earlier (720x388, 1.5 Mbps) just transcoded down to 1 Mbps at 1290 fps! ... :)

 

Perhaps HandBrake is an option as well?


arrmo,

 

Message 6 contains samples of ffmpeg with the profiles that can be used.  We are most interested in speeds faster than SUPERFAST since that is what Media Browser will use by default.

 

I found an older version of ffmpeg.exe with QS in it but haven't tested it yet.  There is also a binary at https://github.com/drocon11/ffmpeg-qsv

 

I think there may also be a version of libav (avconv.exe) that can do QS encoding.  This might be the best, as libav is more or less a fork/branch of ffmpeg (development split and went in two different directions). Much of what is presently in ffmpeg comes from libav, so it's worth looking at.

 

Yep, I know Handbrake supports QuickSync and I currently use it on my PVR machine. I've never had a problem with any of the files either. 

 

QSTranscode and Handbrake both publish their source codes so they could be possibilities depending on how the files are built in real-time.

 

If we can get ffmpeg or libav to work then MB3 probably won't need any changes.  If we use source from QSTranscode or Handbrake we either need to change MB3 or these programs to mimic output similar to ffmpeg.  Both jobs are probably a chore but we'll worry about that after figuring out the fastest best solution based on time of encoding and quality of output.

 

 

You are getting much faster times than I am from QS.  We probably need to set up some specific tests. Example only: take a 10 Mbit 1080p source down to 720p with a max bitrate of 3 Mbit.  Then do the same with ffmpeg on SUPERFAST (all that really matters) and compare FPS and quality to see if we are in the ballpark.
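
As a sketch only, the software baseline for that test could be built like this. It assumes libx264 with `-preset superfast`, a fixed 1280x720 scale filter and a 3 Mbit/s cap; the audio options are copied from the command lines posted earlier in the thread. The file names, the function name and the buffer size are placeholders, not settings anyone here has agreed on.

```python
def baseline_command(source, output):
    """Build the libx264 SUPERFAST reference command for the proposed test."""
    return [
        "ffmpeg", "-i", source,
        "-codec:v", "libx264", "-preset", "superfast",
        "-vf", "scale=1280:720",                           # 1080p source down to 720p
        "-b:v", "3M", "-maxrate", "3M", "-bufsize", "6M",  # cap video at 3 Mbit/s
        "-codec:a", "aac", "-b:a", "65536", "-strict", "-2",
        output,
    ]


# Placeholder file names, for illustration only.
print(" ".join(baseline_command("source-1080p.mp4", "baseline-720p.mp4")))
```

Each GPU candidate would then be run with the same source, output size and bitrate so that only the encoder differs.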

 

Carlo

Makes sense, and we're on the same page. I will test the file above with ffmpeg-qsv (and also q264, if you can clarify that one).

 

I agree with you - let's worry about MB after, first how to convert the files. To that point though - isn't our first focus to get ffmpeg from MB working (with both QS and NVENC, as they are currently both struggling)?

 

Thanks! 


Check it out here: www.tetrachromesoftware.com

 

Carlo

 

PS We can use help from multiple people so if you can help us test just let us know.  It's not hard.


Got it - installed and ran it, and updated some results from above (my error, I missed video rescaling on some command lines). Here is the update ...

 

54e403adc42aa_results.jpg

 

Conclusion is as above though ... we need to get GPU encoding working first, then compare speeds really. And to your question - all but one fail to play back while encoding.

 

Thoughts?

 

 


1) My hunch is that q264 may just be handling the video and then using ffmpeg to mux audio and video together.  That won't work for us.

2) I got the same results with the "play while encoding" test with a couple of them.

 

What is strange is that the present MB3 uses ffmpeg which showed it couldn't play while encoding and we know we presently use it.  It's probably not a valid test as technically it's not being played but being "streamed" which can make a big difference.  So I think as long as video and audio are both being handled at the same time we are ok.
 

3) Of the above the QSTranscode appears to be the one that is doing things correctly or maybe put another way, has what we are looking for.  By that it's clear it's working fastest and also has the lowest CPU use.

4) Would you mind adding Handbrake CLI to the above since that is a possibility we could use.

 

5) I noticed the FPS is higher this time for libx264 than previous times.  Is this because it's using the SUPERFAST preset?

 

Good stuff,

Carlo


Hi,

 

Good idea - numbering works easier to track this ... :-). Some thoughts,

  1. Agree with you, and this is what it says it's doing ... so q264 off the table.
  2. Also agree with you, had the same concerns / thoughts. We need to figure out how to test this properly.
  3. Yep, agree here also ... but also consider HandBrake? In any case though - isn't the question if MB can work with something other than ffmpeg? And on that ... we need to know if MB is real-time monitoring the log file, or if it only needs the video file to exist - agreed?
  4. Yep, I can add HandBrake (have the results, just not in table).
  5. Nope, because of the output resolution. I haven't played with presets at all yet (as I mentioned, lots of dimensions we can look at here)

To me, #3 is key. Does MB mandate / require ffmpeg, or do we have the ability to use another CLI program (just change the command)? That would help here, in terms of flexibility ... agreed?

 

Thanks!

... Russell


OK, a rather interesting new finding. I was able to get the NVIDIA NVENC SDK samples to build on my machine (under Visual Studio). Ran the same file through this ... 520 fps on my GPU, with ~ 14% load (and 60% CPU load). I'll add it to the picture, but this is by far the best I have seen on my GPU, and it does say that libnvenc has some room to go (that's good though!).

 

The video is not playable for some reason, but I think that's due to limitations of the sample. Will continue to look though.


FYI, with some help from Brainiarc7 I was able to get the latest version of ffmpeg to build - and am trying it out locally. It supports NVENC - but still video issues (more command line related using it with MB I think), and it runs ~ 10x slower (for now) than the samples from NVIDIA.


Ouch, that sucks.  I think the architecture of ffmpeg isn't the best for GPU type stuff.


Not sure ffmpeg limits it - if I copy (which still muxes, etc., right?), the frame rates are much higher. Some testing below

 

  1. Different file than above, as it's too short for some of this testing (numbers gone too quick ... :(). Format of results is codec for video / audio.
    • copy / copy: 10,000 fps (so codec is the limitation, agreed?)
    • copy / aac: 585 fps
    • copy / ac3: 1900 fps (so aac is more load than ac3)
    • nvenc / copy: 335 fps (note, for all nvenc results, CPU is ~ 90%, GPU ~ 6%)
    • nvenc / ac3: 290 fps (so video is the limitation ... expected based on 1900 fps above)
    • nvenc (with -preset hp) / copy: 360 fps (slight improvement, high performance preset)
    • nvenc (with -preset hp, resize to 720x400, same as the source) / copy: 360 fps (so no impact if no resize)
    • nvenc (with -preset hp, resize to 720x388) / copy: 193 fps
    • nvenc (with -preset hp, resize to 1280x720) / copy: 89 fps => so resize has a significant impact, agreed? This is the area to focus on first?
  2. The Transformers file you shared,
    • copy / copy: too fast (file done without a reading ... this is the reason for the file I used above - it's longer)
    • nvenc / copy: 95 fps (103 fps with -preset hp)
    • nvenc / copy (resize to 1280x720): 59 fps (64 fps with -preset hp) => so again, scaling is the place to focus?
  3. The Transformers file, but with NvTranscoder (sample from NVIDIA, only takes care of the video, no audio),
    • with or without resize the results are the same, ~ 525 fps! And the CPU / GPU load is about the same => so there is potential here, and demuxing / muxing is not the issue ... agreed?
    • For fun, I also ran the lower resolution file (#1 above, without resize ... so 720x400 video, 5 Mbps) ... 970 fps ... :)

Thoughts? Feel free to disagree with my conclusions above! But to me, this says the potential is there - agreed?
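
To put a number on the resize cost, here is a small sketch that divides the no-resize rate by the resized rate for a few of the ffmpeg nvenc results listed above. The fps values are taken from that list; the helper name and the labels are only for illustration.

```python
def slowdown(fps_without_resize, fps_with_resize):
    """How many times slower the encode runs once scaling is added."""
    return fps_without_resize / fps_with_resize


# (label, fps without resize, fps with resize), from the ffmpeg nvenc runs above.
results = [
    ("720x400 file, -preset hp, resize to 720x388", 360, 193),
    ("720x400 file, -preset hp, resize to 1280x720", 360, 89),
    ("Transformers file, -preset hp, resize to 1280x720", 103, 64),
]

for label, base, resized in results:
    print(f"{label}: {slowdown(base, resized):.2f}x slower")
```

NvTranscoder showed no such drop, which is consistent with the scaling step in the ffmpeg path being the place to look first.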

 

 


Makes sense, thanks!

 

FYI, I'm running a version of ffmpeg that I built from the master (git) version on Github ... so current, but that doesn't of course mean that these items are in there. The results above are for my custom version.

 

Note that they have made some changes that may / will impact MB (as I tried to stuff my version in, and it doesn't work) ... for example, codec is now nvenc (not libnvenc), presets are hp, ll, etc.
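
One possible workaround, sketched under the assumption that the only blocking difference is the codec name: a small wrapper that rewrites the argument list before handing it to the new ffmpeg build. The mapping comes from the rename noted above; the wrapper itself is hypothetical, is not part of MB, and leaves presets and other options untouched.

```python
# Old codec name -> name used by the newer build (per the note above).
RENAMED_CODECS = {"libnvenc": "nvenc"}

# ffmpeg flags whose next argument is a video codec name.
CODEC_FLAGS = {"-codec:v", "-c:v", "-vcodec"}


def translate_args(args):
    """Return a copy of args with renamed video codecs substituted."""
    translated = []
    previous = None
    for arg in args:
        # Only rewrite the value that directly follows a codec flag,
        # so file names containing "libnvenc" are left alone.
        if previous in CODEC_FLAGS:
            arg = RENAMED_CODECS.get(arg, arg)
        translated.append(arg)
        previous = arg
    return translated


old = ["-i", "in.mp4", "-codec:v", "libnvenc", "-b:v", "3145728", "out.mp4"]
print(translate_args(old))
```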

 

Thanks again.


FYI,

 

I took a change of pace to keep things moving.  I commented in another thread about this but wanted to comment here also.

 

One of the things I said I wanted to do besides the GPU transcoding was to enable distributed transcoding.  So if you have multiple computers on your network then you would have the ability to run some of the ffmpeg jobs on these computers instead of just the MB3 server.

 

I have a working prototype of this with a 2nd computer doing all the ffmpeg jobs.  I need to add intelligence and other stuff, but just getting this to work in prototype functionality is a very good thing.  So at some point in the near future the MB3 server will be able to wake up other computers on the LAN and send them ffmpeg transcode (or bif - rokus trick) jobs.  This will be useful for real time viewing, syncing or bif generations.  Basically anything using ffmpeg.

 

Of course once we have GPU encoding working then this will work on the MB3 server or any other LAN computer too!

 

Good stuff,

 

Carlo 


thefirstofthe300


I can totally see this enabling one server library to have a ton of people streaming from it if you can offload transcoding to other PCs.

 

One server to rule them all and with the transcoders stream them. :)


We have long thought of the idea of allowing users to set up slave machines just for encoding purposes, or offloading a scheduled task. Eventually we may decide to circle back to it.


Yea, it's a great idea IMHO and if we can combine this down the road with GPU transcodes we'll have a hot solution.  Hopefully with any luck you won't have to circle back to it as the community will do it for you.

 

I'm concentrating on Windows at the moment since this is what I personally need, but I'm trying to keep things as generic as possible so it can easily be ported to different operating systems or used in a mixed environment, e.g. NAS and Windows or Linux computers. The communications layer is built on top of named pipes, which are more or less generic across operating systems. The communication could easily be replaced with sockets or WCF or anything else, as it can be built modular.

 

Right now it's just an early prototype that is rather "hard coded" but can easily be expanded.  What I really need to figure out (and get feedback from others) is the best way to evaluate which computers are available to transcode.

 

E.g., watch CPU use on each computer and assign the newest transcodes to the lowest CPU %. Or better yet, and kind of where I was going, is to be able to set up a config file of available clients along with a CPU benchmark score for each computer.  Then by being able to "sample" or watch the CPU % use of any computer at any given time we should be able to calculate the usefulness of that machine.  Then depending on priorities in the "queue" jobs can be delegated appropriately.
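
A rough sketch of that idea, assuming "usefulness" is simply the benchmark score scaled by the idle share of the CPU. The formula, the class and the example machines are placeholders for discussion, not how the prototype works today (the prototype itself is C#).

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    benchmark_score: float  # from the config file of available clients
    cpu_percent: float      # most recent CPU use sample, 0-100


def spare_capacity(worker):
    """Benchmark score scaled by the share of the CPU that is currently idle."""
    return worker.benchmark_score * (1.0 - worker.cpu_percent / 100.0)


def pick_worker(workers):
    """Choose the machine with the most spare capacity for the next job."""
    if not workers:
        raise ValueError("no workers available")
    return max(workers, key=spare_capacity)


# Hypothetical machines and scores, for illustration only.
workers = [
    Worker("mb3-server", 8000, 90.0),
    Worker("htpc", 3000, 10.0),
    Worker("laptop", 2000, 0.0),
]
print(pick_worker(workers).name)
```

With these made-up numbers the busy server loses to the mostly idle HTPC even though the server has the higher benchmark score, which is the behavior described above. Queue priorities would sit on top of this as a separate step.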

 

Not an easy task by any means but not the hardest either.  We just need to put some thought into how it should work, how priorities should work. What functions should be able to use it (sync, BIF, transcodes, etc)

 

I've got rough ideas but will solicit feedback from others to come up with a robust implementation.  It will be open source or can be rolled into MB3 for sure but shouldn't need any changes to MB3 source since it's really just an ffmpeg replacement (intercept/proxy). At least until we roll GPU into it.  Then we "might" need some internal changes but we'll cross that bridge when we come to it and it's a bit of a ways off.

 

Personally, I don't want Luke or Ebr to get involved during the early stages as they have enough on their plates.  But if there are any other good C# programmers in the ranks who are interested in something like this shoot me a PM and let me know. We can surely collaborate and work on it together.

 

Carlo

