QSV - UHD770 vs iRISXe - Transcoding Performance


rbjtech

Hi All,

I got a new laptop recently with an i5-1245U CPU (2 P-cores, 8 E-cores), and it has the Intel Iris Xe iGPU.

I wanted to compare its transcoding performance against my Emby server's i7-12700K with a UHD 770 iGPU.

The test was a single 240-second transcode, starting 10 minutes into a 25 Mbit/s 4K HDR HEVC file, down to 8 Mbit/s 1080p SDR H.264 with tone mapping. Audio was just copied to keep it out of the equation.

To my surprise - it's identical (~200fps!) - suggesting it's using exactly the same enc/dec silicon inside the chips. (maybe?)

I used the emby ffmpeg build and config (modified to write the file, not stream it)

The ffmpeg logs/output (+ the syntax I ran) is below for those interested.

 

irisXe.txt

uhd770.txt

 

I'm excited to see how the dGPUs (Xe MAX) from Intel (A380 etc.) perform now - as they will have dedicated memory - so maybe they will be even faster - plus of course AV1 encoding and Deep Link integration with existing iGPUs ...

Anyway - that's it - it's not a great test as it's just one sample - but just thought I'd report on my findings.

@softworkz

 

softworkz

4 hours ago, rbjtech said:

To my surprise - it's identical (~200fps!) - suggesting it's using exactly the same enc/dec silicon inside the chips. (maybe?)

I think it is just a branding difference. When you click on the question mark on the CPU specs page, you get:

[Screenshot: image.png]

[Screenshot: image.png]

 

softworkz

5 hours ago, rbjtech said:

I got a new laptop recently with an i5-1245U CPU (2 P-cores, 8 E-cores), and it has the Intel Iris Xe iGPU.

I suppose it's not one of those laptops which have an additional discrete GPU (Xe Max)?

5 hours ago, rbjtech said:

I'm excited to see how the dGPUs (Xe MAX) from Intel (A380 etc.) perform now - as they will have dedicated memory - so maybe they will be even faster

Xe Max ("DG1") were the first dGPUs, and they had only moderate success.

The ones you mention (like A350, A380, etc.) are from the new generation ("DG2"), codenamed "Alchemist" and sold under the "Arc" brand.

I'm excited as well. I'll get one in the next few days.. 🙂 


rbjtech
1 hour ago, softworkz said:

I think it is just a branding difference. When you click on the question mark on the CPU specs page, you get:

[Screenshot: image.png]

[Screenshot: image.png]

 

I think you are right - it's just a branding difference - that would also explain the identical performance ;)

I looked into getting an A380 earlier this week (the Asus one), but I would have to import it from the US, so I'll just wait for them to appear in the UK, hopefully not at silly inflated prices. They may be a fantastic alternative to Nvidia cards for offloading transcoding on older CPUs. Need to keep that quiet until I get one - haha ..


RanmaCanada

I usually do mention that all CPUs per generation SHOULD have the same ASICs.  I am happy that we have confirmation that the Wikipedia entry for Quick Sync is not lying.

softworkz

Yes, but there can be different counts of them per CPU, they can run at different clock rates, have memory access of different width and/or speed, and a different number of shaders, which means that you cannot assume that you'll get the same performance, no matter which CPU you choose from a certain generation.


rbjtech
4 hours ago, softworkz said:

Yes, but there can be different counts of them per CPU, they can run at different clock rates, have memory access of different width and/or speed, and a different number of shaders, which means that you cannot assume that you'll get the same performance, no matter which CPU you choose from a certain generation.

From a 3D perspective yes, but from an ASIC perspective, as per Ranma's thoughts, I believe they are the same regardless of the CPU specification within the same generation.

edit - Doh! - I've actually just remembered that I built my other half a new PC a few months ago as well - that is an i3-12100 - that has a UHD 730 in it - so I'll run exactly the same test on that and I highly suspect I'll get exactly the same fps again.

I'll follow up shortly ..

 

softworkz

9 hours ago, rbjtech said:

From a 3D perspective yes, but from an ASIC perspective, as per Ranma's thoughts, I believe they are the same regardless of the CPU specification within the same generation.

Hehe, if only it were that simple...

...but it's more complicated by orders of magnitude.

First of all - as mentioned above - the number of ASICs can be different. For example, look at this document: https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol04-configurations.pdf => Page 5: GT2 and GT3 configurations

Second: there are different ways to do decoding, encoding and processing: https://github.com/intel/media-driver#components-and-features
Some of them are using shader kernels, which means that the amount of available shaders is also relevant for media acceleration.

softworkz

9 hours ago, rbjtech said:

I've actually just remembered that I built my other half a new PC a few months ago as well - that is an i3-12100 - that has a UHD 730 in it - so I'll run exactly the same test on that and I highly suspect I'll get exactly the same fps again.

You should also compare two parallel transcoding operations on each system.


rbjtech
2 minutes ago, softworkz said:

You should also compare two parallel transcoding operations on each system.

I just ran two parallel transcodes on the Iris Xe as I had it open. It now hits 100% on the GPU; a single transcode only hit about 70%.

2 at once dropped to 128 fps each (256 fps overall)

3 at once dropped to 85 fps each (255 fps overall)

4 at once dropped to 64 fps each (256 fps overall)

So it seems to scale very well, considering it's reading the same source file and writing to the same NVMe drive.

I guess if this was extrapolated out, I'd get back to my ~10 x 4K 24 fps 'streams' on any flavour of UHD 7xx/Iris Xe - which is mighty impressive for an iGPU on a laptop!

During those 4 transcodes, the CPU was at 30% (@ 1.47 GHz).
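
As a rough check of the numbers above, the sketch below sums the per-stream fps for each level of parallelism and divides by 24 fps to estimate how many real-time streams that throughput would cover. The per-stream figures are the ones from this post; the function names are made up for illustration, and the estimate assumes throughput stays flat as more streams are added, which four parallel runs alone cannot prove.

```python
# Per-stream fps measured on the Iris Xe for 2, 3 and 4 parallel transcodes
# (figures taken from the post above).
PER_STREAM_FPS = {2: 128, 3: 85, 4: 64}


def aggregate_fps(parallel: int, per_stream_fps: float) -> float:
    """Total frames per second across all parallel transcodes."""
    return parallel * per_stream_fps


def realtime_streams(total_fps: float, playback_fps: float = 24.0) -> int:
    """Whole number of real-time streams the total throughput could feed.

    Assumption: throughput divides evenly between streams.
    """
    return int(total_fps // playback_fps)


if __name__ == "__main__":
    for n, fps in PER_STREAM_FPS.items():
        total = aggregate_fps(n, fps)
        print(f"{n} parallel: {total:.0f} fps total, "
              f"~{realtime_streams(total)} streams at 24 fps")
```

All three runs land at roughly 255-256 fps in total, which is where the "~10 streams" figure comes from.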


  • 1 month later...
embylad892746
On 8/24/2022 at 6:21 PM, softworkz said:

I suppose it's not one of those laptops which have an additional discrete GPU (Xe Max)?

Xe Max ("DG1") were the first dGPUs, and they had only moderate success.

The ones you mention (like A350, A380, etc.) are from the new generation ("DG2"), codenamed "Alchemist" and sold under the "Arc" brand.

I'm excited as well. I'll get one in the next few days.. 🙂 

@softworkz any updates you can give on this?! How did the A380 perform? E.g. simultaneous 4K transcodes? I would be super interested in buying an A380 if it could surpass even the UHD 770, which I'm assuming it must.


  • 2 months later...
embylad892746
On 10/1/2022 at 11:09 AM, softworkz said:

I'm still waiting...

Still waiting? Can you share (or have you already shared) your results for the A380? I'm really curious.


VirulentPip
23 hours ago, embylad892746 said:

Still waiting? Can you share (or have you already shared) your results for the A380? I'm really curious.

The link shared above shows my results of the A380. 


  • 8 months later...
bcm00re
On 8/25/2022 at 3:26 AM, rbjtech said:

I've actually just remembered that I built my other half a new PC a few months ago as well - that is an i3-12100 - that has a UHD 730 in it - so I'll run exactly the same test on that and I highly suspect I'll get exactly the same fps again.

I'll follow up shortly ..

 

Did you ever compare performance of your i5-1245U to the i3-12100?


rbjtech
4 hours ago, bcm00re said:

Did you ever compare performance of your i5-1245U to the i3-12100?

Not directly, no, but the iGPU ASICs (the bit that does the transcoding) in the same generation of CPU are almost identical in performance terms. The frequency and number of EUs are largely irrelevant as they are used for 3D work, not media pipelines.

softworkz

14 hours ago, rbjtech said:

Not directly, no, but the iGPU ASICs (the bit that does the transcoding) in the same generation of CPU are almost identical in performance terms. The frequency and number of EUs are largely irrelevant as they are used for 3D work, not media pipelines.

Oh, that's not correct. Shaders are used by several encoders. 

Please take a look at the following table: https://github.com/intel/media-driver#decodingencoding-features

"Es" means that shaders are used and Emby prefers this mode over the VDEnc modes ("E").

The table above is only for VAAPI, so don't get confused by the newer CPU gens only showing "E". QuickSync/OneVPL is a different story and these include their own (more capable) shader kernels for encoding.


rbjtech
13 hours ago, softworkz said:

Oh, that's not correct. Shaders are used by several encoders. 

Please take a look at the following table: https://github.com/intel/media-driver#decodingencoding-features

"Es" means that shaders are used and Emby prefers this mode over the VDEnc modes ("E").

The table above is only for VAAPI, so don't get confused by the newer CPU gens only showing "E". QuickSync/OneVPL is a different story and these include their own (more capable) shader kernels for encoding.

Ah that's interesting - thank you for the correction.

So the EUs are used for some codecs only?

UHD770(DG1) - uses them for HEVC - mostly (if I'm reading the table correctly)

ARC (DG2) - does not use them, or uses its own dedicated pipeline... ?

---

So - if that is the case, then technically the laptop in the original post should encode/decode HEVC faster .. but .. the number of EUs may not be a factor .. it may just use 'some' of the EUs - in this case it's 32 vs 80 - but the UHD770 runs at a faster clock ...
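
For reference, here are the raw numbers behind that comparison, using the EU counts and max dynamic frequencies from the two spec sheets quoted at the end of this post. Treating "EU count times clock" as a throughput figure is only a crude assumption for illustration: it is not something measured in this thread, and it says nothing about the fixed-function blocks.

```python
# EU counts and max dynamic frequencies from the two quoted spec sheets.
GPUS = {
    "Iris Xe (i5-1245U)": {"eus": 80, "max_ghz": 1.20},
    "UHD 770 (i7-12700K)": {"eus": 32, "max_ghz": 1.50},
}


def eu_clock_product(eus: int, max_ghz: float) -> float:
    """Crude proxy for peak shader throughput (assumption: scales with EUs x clock)."""
    return eus * max_ghz


if __name__ == "__main__":
    for name, spec in GPUS.items():
        print(f"{name}: {eu_clock_product(**spec):.1f} EU*GHz")
```

On paper the Iris Xe comes out at about twice the UHD 770 by this measure, yet both hit ~200 fps in the original test, so shader capacity was evidently not the limit for that workload.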

When I get a chance, I'll set up a more detailed experiment as I'm still on the fence with ARC (DG2)..

Thanks for the Info @softworkz - always insightful responses 👍

  

Quote

Processor Graphics

Processor Graphics: Intel® Iris® Xe Graphics eligible
Graphics Max Dynamic Frequency: 1.20 GHz
Graphics Output: eDP 1.4b, DP 1.4a, HDMI 2.1
Execution Units: 80
Max Resolution (HDMI): 4096 x 2304 @ 60Hz
Max Resolution (DP)‡: 7680 x 4320 @ 60Hz
Max Resolution (eDP - Integrated Flat Panel)‡: 4096 x 2304 @ 120Hz
DirectX* Support: 12.1
OpenGL* Support: 4.6
OpenCL* Support: 3.0
Multi-Format Codec Engines: 2
Intel® Quick Sync Video: Yes
# of Displays Supported: 4
Device ID: 0x46A8

 

Quote

Processor Graphics

Processor Graphics: Intel® UHD Graphics 770
Graphics Base Frequency: 300 MHz
Graphics Max Dynamic Frequency: 1.50 GHz
Graphics Output: eDP 1.4b, DP 1.4a, HDMI 2.1
Execution Units: 32
Max Resolution (HDMI): 4096 x 2160 @ 60Hz
Max Resolution (DP)‡: 7680 x 4320 @ 60Hz
Max Resolution (eDP - Integrated Flat Panel)‡: 5120 x 3200 @ 120Hz
DirectX* Support: 12
OpenGL* Support: 4.5
OpenCL* Support: 3.0
Multi-Format Codec Engines: 2
Intel® Quick Sync Video: Yes
Intel® Clear Video HD Technology: Yes
# of Displays Supported: 4
Device ID: 0x4680

 

softworkz

7 hours ago, rbjtech said:

Ah that's interesting - thank you for the correction.

So the EUs are used for some codecs only?

UHD770(DG1) - uses them for HEVC - mostly (if I'm reading the table correctly)

ARC (DG2) - does not use them, or uses its own dedicated pipeline... ?

As mentioned above, this is all about VAAPI, not QuickSync (MSDK/OneVPL). The latter has its own shader kernels, as those encoders are much more capable.

This leads to the following order, with capability (encoder features and quality) increasing from left to right:

     Fixed function block (VDEnc) >>> VAAPI EU Kernels >>> QuickSync Kernels 

VDEnc is intended to be efficient and resource saving. Examples:

  • for not affecting game performance when streaming screen casts
  • for being able to do long video conferences from a laptop

It's not the best choice in terms of quality, that's why Emby doesn't use it by default (with QuickSync encoders).
But you can activate it in the H.264 QSV Encoder parameters:

  • Check the "Low Power Encoding" option

7 hours ago, rbjtech said:

So - if that is the case, then technically the laptop in the original post should encode/decode HEVC faster .. but .. the number of EUs may not be a factor .. it may just use 'some' of the EUs - in this case it's 32 vs 80 - but the UHD770 runs at a faster clock ...

It's not only (or not necessarily) a question of faster vs. slower. You need to take quality into account. VDEnc is fast and simple but it's limited in features and quality.

I don't really know much about how many shader units are used or how flexibly they are assigned. I would suspect that it also depends on the output image size. Typically the frames are divided into smaller sections for parallelism; their sizes depend on the encoder logic and aren't chosen arbitrarily. This means that for a larger frame size, more shader units could possibly be used.
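
To illustrate why a larger frame could keep more shader units busy, here is a small sketch that counts 16x16 blocks (the H.264 macroblock size) per frame. How the driver actually maps blocks to EUs is not established anywhere in this thread, so the default block size and the helper name are assumptions for illustration only.

```python
import math


def blocks_per_frame(width: int, height: int, block: int = 16) -> int:
    """Number of block x block tiles needed to cover a frame.

    Partial tiles at the right and bottom edges are rounded up.
    """
    return math.ceil(width / block) * math.ceil(height / block)


if __name__ == "__main__":
    for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
        print(f"{name}: {blocks_per_frame(w, h)} blocks")
```

A 4K frame has roughly four times as many independent blocks as a 1080p frame, so there is simply more work that could be spread across shader units in parallel.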

What also must not be forgotten with regards to encoding performance on different CPU models is clock rate (I'm not sure whether VDEnc blocks have different clock rates on different models, but I would assume so), and lastly, there's also memory bandwidth which depends on CPU clock rate and the installed RAM's supported rates (or dual vs. single-channel)..

Many factors come into play. Also, transcoding very often involves CPU-bound tasks of many different kinds.
Hence, my overall conclusion (wrt Emby): The CPU model matters a lot! You cannot say an i3 would be as suitable as an i7 just because it has the same graphics specs on paper...


  • 2 weeks later...
bcm00re
On 8/24/2022 at 5:58 AM, rbjtech said:

The test was a single 240-second transcode, starting 10 minutes into a 25 Mbit/s 4K HDR HEVC file, down to 8 Mbit/s 1080p SDR H.264 with tone mapping. Audio was just copied to keep it out of the equation.

To my surprise - it's identical (~200fps!) - suggesting it's using exactly the same enc/dec silicon inside the chips.

I wonder if the use of relatively short clips skewed your results.  I did some testing last night and saw some pretty notable differences.   See my thread in this same Hardware forum for more details.
