
Hardware accelerated tone mapping not working on Intel ARC A380




tl;dr: hardware-accelerated tone mapping does not work with Emby on an Intel ARC A380. Regular hardware acceleration (tone mapping disabled) works. The problem appears to be specific to the Emby build of ffmpeg, since tone mapping works with both the native Ubuntu 24.10 build and the Jellyfin build of ffmpeg.

First, I am not sure if this is a supported setup, so I will understand if you can't help because I am running Emby in a VM under Proxmox. However, I believe it should work (and it does work with Jellyfin), so please bear with me for a bit here.

Setup:

  • Bare metal: Supermicro SYS-6028R-TR (2x Intel Xeon E5-2690 v3)
  • GPU: ASRock A380 LP 6G (updated to the latest firmware with Windows driver 32.0.101.6297); I also tried an ASRock Challenger A380 and saw exactly the same behavior
  • OS: Proxmox 8.2.7 (6.8.12-1-pve), no i915 driver/modules installed (GPU passed through to the VM, not Docker/LXC)
  • VM: Ubuntu 24.10 (6.11.0-9-generic), Ubuntu 24.04.1 (6.8.0-49-generic), and Debian 12 (6.11.5-1~bpo12+1) all behave the same
  • Emby: 4.8.10.0 installed from .deb

The GuC and HuC firmware loaded successfully during VM startup:

root@arcubuntu2410:~# dmesg | grep "firmware i915"
[    6.907034] i915 0000:01:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_08.bin (v2.8)
[    7.086485] i915 0000:01:00.0: [drm] GT0: GuC firmware i915/dg2_guc_70.bin version 70.29.2
[    7.086490] i915 0000:01:00.0: [drm] GT0: HuC firmware i915/dg2_huc_gsc.bin version 7.10.16

root@arcubuntu2410:~# cat /sys/kernel/debug/dri/0/gt0/uc/huc_info
HuC firmware: i915/dg2_huc_gsc.bin
        status: RUNNING
        version: found 7.10.16
        uCode: 0 bytes
        RSA: 0 bytes
HuC status: 0x00164001

root@arcubuntu2410:~# cat /sys/kernel/debug/dri/0/gt0/uc/guc_info
GuC firmware: i915/dg2_guc_70.bin
        status: RUNNING
        version: found 70.29.2
        uCode: 368896 bytes
        RSA: 384 bytes
GuC status 0x800300ec:
        Bootrom status = 0x76
        uKernel status = 0x0
        MIA Core status = 0x3
Scratch registers:
         0:     0x0
         1:     0xb03d7
         2:     0x42c800
         3:     0x4
         4:     0x40
         5:     0x3a0
         6:     0x56a50005
         7:     0x0
         8:     0x0
         9:     0x0
        10:     0x0
        11:     0x0
        12:     0x0
        13:     0x0
        14:     0x0
        15:     0x0

GuC logging stats:
        Relay full count: 0
        DEBUG:  flush count          0, overflow count          0
        CRASH:  flush count          0, overflow count          0
        CAPTURE:        flush count          0, overflow count          0
CT enabled
H2G Space: 2080
Head: 503
Tail: 503
G2H Space: 12284
Head: 97
Tail: 97
GuC Submission API Version: 1.13.4
GuC Number Outstanding Submission G2H: 0
GuC tasklet count: 0
Requests in GuC submit tasklet:

Global scheduling policies:
  DPC promote time   = 500000
  Max num work items = 15
  Flags              = 0
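
The same firmware check can be scripted so it is easy to repeat on each test VM. This is only a minimal sketch: it assumes the debugfs layout shown in the dumps above (reading it requires root), and the function name uc_status is made up for illustration.

```shell
#!/bin/sh
# uc_status FILE: print the value of the first "status:" line in a
# huc_info or guc_info debugfs dump (for example "RUNNING").
uc_status() {
    awk '$1 == "status:" { print $2; exit }' "$1"
}

# Report both microcontrollers. The paths follow the dumps above and
# are skipped silently when they are not readable (non-root shell,
# different card index, debugfs not mounted).
for uc in huc guc; do
    f="/sys/kernel/debug/dri/0/gt0/uc/${uc}_info"
    if [ -r "$f" ]; then
        echo "$uc: $(uc_status "$f")"
    fi
done
```

If either line prints something other than RUNNING, the problem is below ffmpeg and not specific to the Emby build.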

 

The error in the ffmpeg-transcode log file is "Failed to get number of OpenCL platforms: -1001.". I can replicate that by running /opt/emby-server/bin/emby-ffmpeg directly as the emby user:

image.png.86ec10578fa7150520ea46486e7b1a31.png
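
For reference, -1001 is CL_PLATFORM_NOT_FOUND_KHR, the code an OpenCL ICD loader returns when it finds no registered platforms. A loader normally discovers drivers through the .icd files in /etc/OpenCL/vendors, so one thing worth comparing is which ICDs are registered there and readable by the emby user. This is a minimal sketch: list_icds is a hypothetical helper name, and whether the loader used by Emby's ffmpeg reads this directory at all is an assumption I have not verified.

```shell
#!/bin/sh
# list_icds [DIR]: print each .icd file in DIR together with the driver
# library named on its first line. DIR defaults to the standard
# location used by OpenCL ICD loaders on Linux.
list_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    for icd in "$dir"/*.icd; do
        # An unmatched glob stays literal, so skip anything that is
        # not a regular file.
        [ -f "$icd" ] || continue
        printf '%s -> %s\n' "$(basename "$icd")" "$(head -n 1 "$icd")"
    done
}

list_icds
```

Running it once as root and once as the emby user should show whether file permissions hide the ICD from Emby.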

However, the native ffmpeg package from Ubuntu 24.10 seems to find and initialize the GPU just fine:

image.png.1140de05bfab85e78fe1e5998763b930.png

 

Jellyfin also seems to handle transcoding perfectly (including tone mapping) in an Ubuntu VM:


image.png.444d9fbe4ef88ba93598dc0a11de0345.png

image.png.e14e3eae2c98ed780c126250eb2af10a.png

image.png.911a57407d1acb71a74985ce25c3124b.png

 

It seems like the ffmpeg build used by Emby fails to initialize OpenCL on the ARC, while the native Ubuntu and Jellyfin builds succeed.
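
One way to narrow this down is to compare which OpenCL loader library each ffmpeg binary is linked against. The sketch below is a rough check under assumptions: opencl_libs is a hypothetical helper name, and if emby-ffmpeg turns out to be a wrapper script, or loads OpenCL only at runtime, ldd will print nothing useful for it.

```shell
#!/bin/sh
# opencl_libs: read `ldd` output on stdin and keep only the lines that
# mention an OpenCL loader library. Returns success even with no match.
opencl_libs() {
    grep -i 'libopencl' || true
}

# Compare the two builds. The Emby path is the one used in this post;
# "ffmpeg" is whatever the distribution package put on PATH.
for bin in /opt/emby-server/bin/emby-ffmpeg "$(command -v ffmpeg || true)"; do
    [ -x "$bin" ] || continue
    echo "== $bin"
    ldd "$bin" 2>/dev/null | opencl_libs
done
```

If the two binaries resolve libOpenCL from different locations, that would point at the loader and its ICD search path, not at the GPU driver itself.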

Here is some other system info that might help. Let me know if you need anything else.

arc@arcubuntu2410:~$ ls -alR /dev/dri/
/dev/dri/:
total 0
drwxr-xr-x  3 root root        100 Nov 20 17:27 .
drwxr-xr-x 21 root root       4300 Nov 20 17:27 ..
drwxr-xr-x  2 root root         80 Nov 20 17:27 by-path
crw-rw----  1 root video  226,   0 Nov 20 17:27 card0
crw-rw----  1 root render 226, 128 Nov 20 17:27 renderD128

/dev/dri/by-path:
total 0
drwxr-xr-x 2 root root  80 Nov 20 17:27 .
drwxr-xr-x 3 root root 100 Nov 20 17:27 ..
lrwxrwxrwx 1 root root   8 Nov 20 17:27 pci-0000:01:00.0-card -> ../card0
lrwxrwxrwx 1 root root  13 Nov 20 17:27 pci-0000:01:00.0-render -> ../renderD128

emby@arcubuntu2410:~$ clinfo
Number of platforms                               1
  Platform Name                                   Intel(R) OpenCL Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 3.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_extended_bit_ops cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_linkonce_odr cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_khr_external_memory cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_bfloat16_conversions cl_intel_create_buffer_with_properties cl_intel_subgroup_local_block_io cl_intel_subgroup_matrix_multiply_accumulate cl_intel_subgroup_split_matrix_multiply_accumulate cl_khr_integer_dot_product cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Platform Extensions with Version                cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_extended_bit_ops                                          0x400000 (1.0.0)
                                                  cl_khr_suggested_local_work_size                                 0x400000 (1.0.0)
                                                  cl_intel_split_work_group_barrier                                0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_linkonce_odr                                        0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory                                   0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_ext_float_atomics                                             0x400000 (1.0.0)
                                                  cl_khr_external_memory                                             0x9001 (0.9.1)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_intel_bfloat16_conversions                                    0x400000 (1.0.0)
                                                  cl_intel_create_buffer_with_properties                           0x400000 (1.0.0)
                                                  cl_intel_subgroup_local_block_io                                 0x400000 (1.0.0)
                                                  cl_intel_subgroup_matrix_multiply_accumulate                     0x400000 (1.0.0)
                                                  cl_intel_subgroup_split_matrix_multiply_accumulate               0x400000 (1.0.0)
                                                  cl_khr_integer_dot_product                                       0x800000 (2.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_khr_gl_depth_images                                           0x400000 (1.0.0)
                                                  cl_khr_gl_event                                                  0x400000 (1.0.0)
                                                  cl_khr_gl_msaa_sharing                                           0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             INTEL
  Platform Host timer resolution                  1ns
  Platform External memory handle types           DMA buffer

  Platform Name                                   Intel(R) OpenCL Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Arc(TM) A380 Graphics
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 3.0 NEO
  Device UUID                                     8680a556-0500-0000-0100-000000000000
  Driver UUID                                     32342e33-352e-3033-3038-373200000000
  Valid Device LUID                               No
  Device LUID                                     f095-0036fc7f0000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  24.35.030872
  Device OpenCL C Version                         OpenCL C 1.2
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_read_write_images                                     0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_all_devices                              0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_work_group_collective_functions                       0xc00000 (3.0.0)
                                                  __opencl_c_subgroups                                             0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp32_global_atomic_add                            0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp32_local_atomic_add                             0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp32_global_atomic_min_max                        0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp32_local_atomic_min_max                         0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp16_global_atomic_load_store                     0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp16_local_atomic_load_store                      0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp16_global_atomic_min_max                        0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp16_local_atomic_min_max                         0xc00000 (3.0.0)
                                                  __opencl_c_integer_dot_product_input_4x8bit                      0xc00000 (3.0.0)
                                                  __opencl_c_integer_dot_product_input_4x8bit_packed               0xc00000 (3.0.0)
  Latest conformance test passed                  v2024-02-27-00
  Device Type                                     GPU
  Device PCI bus info (KHR)                       PCI-E, 0000:01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               128
  Max clock frequency                             2450MHz
  Device IP (Intel)                               0x30e0005 (12.224.5)
  Device ID (Intel)                               22181
  Slices (Intel)                                  1
  Sub-slices per slice (Intel)                    8
  EUs per sub-slice (Intel)                       16
  Threads per EU (Intel)                          8
  Feature capabilities (Intel)                    DP4A, DPAS
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple (device)     64
  Preferred work group size multiple (kernel)     64
  Max sub-groups per work group                   128
  Sub-group sizes (Intel)                         8, 16, 32
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 1 / 1
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  External memory handle types                    DMA buffer
  Global memory size                              6064541696 (5.648GiB)
  Error Correction support                        No
  Max memory allocation                           3032270848 (2.824GiB)
  Unified memory for Host and Device              No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Unified Shared Memory (USM)                     (cl_intel_unified_shared_memory)
  Host USM capabilities (Intel)                   USM access
  Device USM capabilities (Intel)                 USM access, USM atomic access
  Single-Device USM caps (Intel)                  USM access, USM atomic access
  Cross-Device USM caps (Intel)                   USM access, USM atomic access
  Shared System USM caps (Intel)                  (n/a)
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
  Atomic fence capabilities                       relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             3032270848 (2.824GiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        4194304 (4MiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            189516928 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 pixels
    Max 2D image size                             16384x16384 pixels
    Max planar YUV image size                     16384x16128 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     8
  Max constant buffer size                        3032270848 (2.824GiB)
  Generic address space support                   Yes
  Max size of kernel argument                     2048 (2KiB)
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Device queue families                           ccs                                                              (1)
                                                                                                 Queue properties  Out-of-order execution, Profiling
                                                                                                     Capabilities  create single-queue events, create cross-queue events
                                                  bcs                                                              (1)
                                                                                                 Queue properties  Out-of-order execution, Profiling
                                                                                                     Capabilities  create single-queue events, create cross-queue events
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      52ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       Yes
    Work-group collective functions               Yes
    Sub-group independent forward progress        No
    IL version                                    SPIR-V_1.3 SPIR-V_1.2 SPIR-V_1.1 SPIR-V_1.0
    ILs with version                              SPIR-V                                                           0x403000 (1.3.0)
                                                  SPIR-V                                                           0x402000 (1.2.0)
                                                  SPIR-V                                                           0x401000 (1.1.0)
                                                  SPIR-V                                                           0x400000 (1.0.0)
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   (n/a)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_extended_bit_ops cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_linkonce_odr cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_khr_external_memory cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_bfloat16_conversions cl_intel_create_buffer_with_properties cl_intel_subgroup_local_block_io cl_intel_subgroup_matrix_multiply_accumulate cl_intel_subgroup_split_matrix_multiply_accumulate cl_khr_integer_dot_product cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_extended_bit_ops                                          0x400000 (1.0.0)
                                                  cl_khr_suggested_local_work_size                                 0x400000 (1.0.0)
                                                  cl_intel_split_work_group_barrier                                0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_linkonce_odr                                        0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory                                   0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_ext_float_atomics                                             0x400000 (1.0.0)
                                                  cl_khr_external_memory                                             0x9001 (0.9.1)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_intel_bfloat16_conversions                                    0x400000 (1.0.0)
                                                  cl_intel_create_buffer_with_properties                           0x400000 (1.0.0)
                                                  cl_intel_subgroup_local_block_io                                 0x400000 (1.0.0)
                                                  cl_intel_subgroup_matrix_multiply_accumulate                     0x400000 (1.0.0)
                                                  cl_intel_subgroup_split_matrix_multiply_accumulate               0x400000 (1.0.0)
                                                  cl_khr_integer_dot_product                                       0x800000 (2.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_khr_gl_depth_images                                           0x400000 (1.0.0)
                                                  cl_khr_gl_event                                                  0x400000 (1.0.0)
                                                  cl_khr_gl_msaa_sharing                                           0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel(R) OpenCL Graphics
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [INTEL]
  clCreateContext(NULL, ...) [default]            Success [INTEL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Arc(TM) A380 Graphics
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Arc(TM) A380 Graphics
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Arc(TM) A380 Graphics

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.2
  ICD loader Profile                              OpenCL 3.0
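
For anyone comparing setups: the ICD loader reported above discovers OpenCL drivers through small `.icd` files, conventionally under `/etc/OpenCL/vendors` on Linux. Below is a minimal Python sketch that lists those files and the library each one names. The function name `list_icds` is made up for illustration, and whether the Emby build of ffmpeg uses the same loader or search path is an assumption, not something confirmed in this thread.

```python
from pathlib import Path

# Assumption: the conventional Linux directory read by OpenCL ICD loaders.
DEFAULT_VENDOR_DIR = Path("/etc/OpenCL/vendors")


def list_icds(vendor_dir=DEFAULT_VENDOR_DIR):
    """Return (icd_file_name, library_entry) pairs for every *.icd file found."""
    vendor_dir = Path(vendor_dir)
    if not vendor_dir.is_dir():
        return []
    entries = []
    for icd in sorted(vendor_dir.glob("*.icd")):
        lines = icd.read_text(errors="replace").splitlines()
        # An .icd file normally holds a single line: a library name or path.
        library = next((ln.strip() for ln in lines if ln.strip()), "")
        entries.append((icd.name, library))
    return entries


if __name__ == "__main__":
    found = list_icds()
    if not found:
        print(f"no .icd files under {DEFAULT_VENDOR_DIR}")
    for name, library in found:
        print(f"{name}: {library}")
```

Running it as the emby user and as a regular user and comparing the output is one way to see whether both accounts can read the same driver registrations.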

 

Edited by DAVe3283
Fix screenshots that didn't upload right
Posted

Hi there, please attach the emby server, ffmpeg and hardware detection log files. Thanks !

Posted (edited)

Looks like a tone mapping issue. Try with VAAPI tone mapping enabled, and then without.

It looks like you also disabled software tone mapping.

See how that works out.

Edited by Neminem
Posted

No dice, still kicks to CPU transcode if hardware tone mapping is enabled.

Hopefully I am setting things right. Transcoding settings for VAAPI:

Spoiler

image.thumb.png.b994bea316952451c6978ad62f4863c3.png

Then basically switch VAAPI off and QuickSync on for the other test.

Tone Mapping settings are the same for both:

Spoiler

image.png.92d0a7e2f1b32d7d5884273444dd7455.png

VAAPI: ffmpeg-transcode-aaaa5ec3-fdd5-482d-b5c7-abacc52ff602_1.txt
QuickSync: ffmpeg-transcode-ad9b0a21-6fc2-4642-874d-c23a06ca601e_1.txt
embyserver.txt

 

Posted

Intel Arc drivers are bugged on Windows and Ubuntu and won't work with tone mapping.

Plex and Jellyfin made a workaround somehow.
On Linux (non-Ubuntu) you can avoid the bug by not using kernel 6.8.
On Windows you can fix it by using an old driver (I forgot the version).

Sadly, I don't think there is much you can do unless Intel fixes their mess or the Emby team implements the same workaround as Plex and Jellyfin.

I'm using an Arc card myself and I'm dreading the day Unraid updates their kernel. Who knows if the card will still work.
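
To check which kernel series a machine is on, here is a minimal Python sketch. It only tests the claim in this post that the 6.8 series is the one to avoid; the original poster reports the same failure on both 6.8 and 6.11 kernels, so treat the result as a data point, not a diagnosis. The helper names are hypothetical.

```python
import platform
import re


def kernel_major_minor(release):
    """Extract (major, minor) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_kernel_6_8(release):
    """True when the release string belongs to the 6.8 kernel series."""
    return kernel_major_minor(release) == (6, 8)


if __name__ == "__main__":
    release = platform.release()
    label = "6.8 series" if is_kernel_6_8(release) else "not 6.8"
    print(f"{release}: {label}")
```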

  • 3 weeks later...
rotational467
Posted (edited)

I just added an A310 to my Emby server and have been messing with this, initially because HandBrake refused to see the QSV engine. On Ubuntu 24.04, going from the 6.8 kernel to 6.11 (OEM kernel) did not help. The card firmware was flashed to the latest version on a Windows box first. The 24.3.4 iHD drivers, latest firmware, compute packages, etc. from Intel's repo are installed.

Neither the latest ffmpeg nightly nor HandBrake 1.9.0 (both built from source) can see the QSV engine for encoding. ffmpeg via VAAPI does work, and that looks like what Jellyfin is doing based on the screenshot above. It works on Emby as well.

@DAVe3283 You need to set Tone Mapping Method to "Disabled" under Intel Quick Sync. The VAAPI setting is correct.

 

hwtonemapping_arc.jpg

Edit: I don't know what Emby secret sauce is able to access QSV on the Arc for encoding; I can't get anything else to do it.

Edited by rotational467
Posted
On 11/21/2024 at 7:49 PM, DAVe3283 said:

tl;dr hardware accelerated tone mapping does not work with Emby on an Intel ARC A380. Regular hardware acceleration (tonemapping disabled) works. This problem appears unique to the Emby build of ffmpeg, as it works on the native Ubuntu 24.10 and Jellyfin builds of ffmpeg.

First, I am not sure if this is a supported setup, so I will understand if you can't help due to me running Emby in a VM under Proxmox. However, I believe it should work (and does work with Jellyfin), so please bear with me for a bit here.

Setup:

  • Bare metal: Supermicro SYS-6028R-TR (2x Intel Xeon E5-2690 v3)
  • GPU: ASRock A380 LP 6G (updated to latest firmware with Windows driver 32.0.101.6297), also tried with a ASRock Challenger A380 but see the exact same behavior
  • OS: Proxmox 8.2.7 (6.8.12-1-pve), no i915 driver/modules installed (GPU passed through to VM, not docker/LXC)
  • VM: Ubuntu 24.10 (6.11.0-9-generic), Ubuntu 24.04.1 (6.8.0-49-generic), and Debian 12 (6.11.5-1~bpo12+1) all behave the same
  • Emby: 4.8.10.0 installed from .deb

The GuC and HuC firmware loaded successfully during VM startup:

root@arcubuntu2410:~# dmesg | grep "firmware i915"
[    6.907034] i915 0000:01:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_08.bin (v2.8)
[    7.086485] i915 0000:01:00.0: [drm] GT0: GuC firmware i915/dg2_guc_70.bin version 70.29.2
[    7.086490] i915 0000:01:00.0: [drm] GT0: HuC firmware i915/dg2_huc_gsc.bin version 7.10.16

root@arcubuntu2410:~# cat /sys/kernel/debug/dri/0/gt0/uc/huc_info
HuC firmware: i915/dg2_huc_gsc.bin
        status: RUNNING
        version: found 7.10.16
        uCode: 0 bytes
        RSA: 0 bytes
HuC status: 0x00164001

root@arcubuntu2410:~# cat /sys/kernel/debug/dri/0/gt0/uc/guc_info
GuC firmware: i915/dg2_guc_70.bin
        status: RUNNING
        version: found 70.29.2
        uCode: 368896 bytes
        RSA: 384 bytes
GuC status 0x800300ec:
        Bootrom status = 0x76
        uKernel status = 0x0
        MIA Core status = 0x3
Scratch registers:
         0:     0x0
         1:     0xb03d7
         2:     0x42c800
         3:     0x4
         4:     0x40
         5:     0x3a0
         6:     0x56a50005
         7:     0x0
         8:     0x0
         9:     0x0
        10:     0x0
        11:     0x0
        12:     0x0
        13:     0x0
        14:     0x0
        15:     0x0

GuC logging stats:
        Relay full count: 0
        DEBUG:  flush count          0, overflow count          0
        CRASH:  flush count          0, overflow count          0
        CAPTURE:        flush count          0, overflow count          0
CT enabled
H2G Space: 2080
Head: 503
Tail: 503
G2H Space: 12284
Head: 97
Tail: 97
GuC Submission API Version: 1.13.4
GuC Number Outstanding Submission G2H: 0
GuC tasklet count: 0
Requests in GuC submit tasklet:

Global scheduling policies:
  DPC promote time   = 500000
  Max num work items = 15
  Flags              = 0

 

The error in the ffmpeg-transcode log file is "Failed to get number of OpenCL platforms: -1001.". I can replicate that by running /opt/emby-server/bin/emby-ffmpeg directly as the emby user:

image.png.86ec10578fa7150520ea46486e7b1a31.png
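
For reference, -1001 is `CL_PLATFORM_NOT_FOUND_KHR` from the `cl_khr_icd` extension: the loader found no OpenCL platform at all, which is different from finding the platform and then failing on the device. A small Python sketch for pulling the code out of a log line like the one above follows. The table covers only a handful of common codes, and `decode_opencl_error` is a hypothetical helper name.

```python
import re

# A small subset of OpenCL status codes; not an exhaustive table.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -32: "CL_INVALID_PLATFORM",
    -33: "CL_INVALID_DEVICE",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",
}


def decode_opencl_error(log_line):
    """Return the symbolic name for a trailing ': <code>' in a log line.

    Returns None when the line does not end in a numeric code.
    """
    match = re.search(r":\s*(-?\d+)\.?\s*$", log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return OPENCL_ERRORS.get(code, f"unknown ({code})")


if __name__ == "__main__":
    line = "Failed to get number of OpenCL platforms: -1001."
    print(decode_opencl_error(line))
```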

However, running the native ffmpeg package from Ubuntu 24.10 seems to find & initialize the GPU just fine:

image.png.1140de05bfab85e78fe1e5998763b930.png

 

Jellyfin seems to handle transcode perfectly (including tonemapping) in an Ubuntu VM:


image.png.444d9fbe4ef88ba93598dc0a11de0345.png

image.png.e14e3eae2c98ed780c126250eb2af10a.png

image.png.911a57407d1acb71a74985ce25c3124b.png

 

It seems like the ffmpeg build used by Emby is not initializing the ARC, while the native Ubuntu and Jellyfin builds do.

Here is some other system info that might help. Let me know if you need anything else.

arc@arcubuntu2410:~$ ls -alR /dev/dri/
/dev/dri/:
total 0
drwxr-xr-x  3 root root        100 Nov 20 17:27 .
drwxr-xr-x 21 root root       4300 Nov 20 17:27 ..
drwxr-xr-x  2 root root         80 Nov 20 17:27 by-path
crw-rw----  1 root video  226,   0 Nov 20 17:27 card0
crw-rw----  1 root render 226, 128 Nov 20 17:27 renderD128

/dev/dri/by-path:
total 0
drwxr-xr-x 2 root root  80 Nov 20 17:27 .
drwxr-xr-x 3 root root 100 Nov 20 17:27 ..
lrwxrwxrwx 1 root root   8 Nov 20 17:27 pci-0000:01:00.0-card -> ../card0
lrwxrwxrwx 1 root root  13 Nov 20 17:27 pci-0000:01:00.0-render -> ../renderD128

emby@arcubuntu2410:~$ clinfo
Number of platforms                               1
  Platform Name                                   Intel(R) OpenCL Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 3.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_extended_bit_ops cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_linkonce_odr cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_khr_external_memory cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_bfloat16_conversions cl_intel_create_buffer_with_properties cl_intel_subgroup_local_block_io cl_intel_subgroup_matrix_multiply_accumulate cl_intel_subgroup_split_matrix_multiply_accumulate cl_khr_integer_dot_product cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Platform Extensions with Version                cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_extended_bit_ops                                          0x400000 (1.0.0)
                                                  cl_khr_suggested_local_work_size                                 0x400000 (1.0.0)
                                                  cl_intel_split_work_group_barrier                                0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_linkonce_odr                                        0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory                                   0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_ext_float_atomics                                             0x400000 (1.0.0)
                                                  cl_khr_external_memory                                             0x9001 (0.9.1)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_intel_bfloat16_conversions                                    0x400000 (1.0.0)
                                                  cl_intel_create_buffer_with_properties                           0x400000 (1.0.0)
                                                  cl_intel_subgroup_local_block_io                                 0x400000 (1.0.0)
                                                  cl_intel_subgroup_matrix_multiply_accumulate                     0x400000 (1.0.0)
                                                  cl_intel_subgroup_split_matrix_multiply_accumulate               0x400000 (1.0.0)
                                                  cl_khr_integer_dot_product                                       0x800000 (2.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_khr_gl_depth_images                                           0x400000 (1.0.0)
                                                  cl_khr_gl_event                                                  0x400000 (1.0.0)
                                                  cl_khr_gl_msaa_sharing                                           0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             INTEL
  Platform Host timer resolution                  1ns
  Platform External memory handle types           DMA buffer

  Platform Name                                   Intel(R) OpenCL Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Arc(TM) A380 Graphics
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 3.0 NEO
  Device UUID                                     8680a556-0500-0000-0100-000000000000
  Driver UUID                                     32342e33-352e-3033-3038-373200000000
  Valid Device LUID                               No
  Device LUID                                     f095-0036fc7f0000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  24.35.030872
  Device OpenCL C Version                         OpenCL C 1.2
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_read_write_images                                     0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_all_devices                              0xc00000 (3.0.0)
                                                  __opencl_c_atomic_scope_device                                   0xc00000 (3.0.0)
                                                  __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_program_scope_global_variables                        0xc00000 (3.0.0)
                                                  __opencl_c_work_group_collective_functions                       0xc00000 (3.0.0)
                                                  __opencl_c_subgroups                                             0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp32_global_atomic_add                            0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp32_local_atomic_add                             0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp32_global_atomic_min_max                        0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp32_local_atomic_min_max                         0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp16_global_atomic_load_store                     0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp16_local_atomic_load_store                      0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp16_global_atomic_min_max                        0xc00000 (3.0.0)
                                                  __opencl_c_ext_fp16_local_atomic_min_max                         0xc00000 (3.0.0)
                                                  __opencl_c_integer_dot_product_input_4x8bit                      0xc00000 (3.0.0)
                                                  __opencl_c_integer_dot_product_input_4x8bit_packed               0xc00000 (3.0.0)
  Latest conformance test passed                  v2024-02-27-00
  Device Type                                     GPU
  Device PCI bus info (KHR)                       PCI-E, 0000:01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               128
  Max clock frequency                             2450MHz
  Device IP (Intel)                               0x30e0005 (12.224.5)
  Device ID (Intel)                               22181
  Slices (Intel)                                  1
  Sub-slices per slice (Intel)                    8
  EUs per sub-slice (Intel)                       16
  Threads per EU (Intel)                          8
  Feature capabilities (Intel)                    DP4A, DPAS
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple (device)     64
  Preferred work group size multiple (kernel)     64
  Max sub-groups per work group                   128
  Sub-group sizes (Intel)                         8, 16, 32
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 1 / 1
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  External memory handle types                    DMA buffer
  Global memory size                              6064541696 (5.648GiB)
  Error Correction support                        No
  Max memory allocation                           3032270848 (2.824GiB)
  Unified memory for Host and Device              No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Unified Shared Memory (USM)                     (cl_intel_unified_shared_memory)
  Host USM capabilities (Intel)                   USM access
  Device USM capabilities (Intel)                 USM access, USM atomic access
  Single-Device USM caps (Intel)                  USM access, USM atomic access
  Cross-Device USM caps (Intel)                   USM access, USM atomic access
  Shared System USM caps (Intel)                  (n/a)
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         64 bytes
  Atomic memory capabilities                      relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
  Atomic fence capabilities                       relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             3032270848 (2.824GiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        4194304 (4MiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            189516928 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 pixels
    Max 2D image size                             16384x16384 pixels
    Max planar YUV image size                     16384x16128 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     8
  Max constant buffer size                        3032270848 (2.824GiB)
  Generic address space support                   Yes
  Max size of kernel argument                     2048 (2KiB)
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Device queue families                           ccs                                                              (1)
                                                                                                 Queue properties  Out-of-order execution, Profiling
                                                                                                     Capabilities  create single-queue events, create cross-queue events
                                                  bcs                                                              (1)
                                                                                                 Queue properties  Out-of-order execution, Profiling
                                                                                                     Capabilities  create single-queue events, create cross-queue events
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      52ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       Yes
    Work-group collective functions               Yes
    Sub-group independent forward progress        No
    IL version                                    SPIR-V_1.3 SPIR-V_1.2 SPIR-V_1.1 SPIR-V_1.0
    ILs with version                              SPIR-V                                                           0x403000 (1.3.0)
                                                  SPIR-V                                                           0x402000 (1.2.0)
                                                  SPIR-V                                                           0x401000 (1.1.0)
                                                  SPIR-V                                                           0x400000 (1.0.0)
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   (n/a)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_extended_bit_ops cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_linkonce_odr cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_khr_external_memory cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_bfloat16_conversions cl_intel_create_buffer_with_properties cl_intel_subgroup_local_block_io cl_intel_subgroup_matrix_multiply_accumulate cl_intel_subgroup_split_matrix_multiply_accumulate cl_khr_integer_dot_product cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_intel_command_queue_families                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups                                               0x400000 (1.0.0)
                                                  cl_intel_required_subgroup_size                                  0x400000 (1.0.0)
                                                  cl_intel_subgroups_short                                         0x400000 (1.0.0)
                                                  cl_khr_spir                                                      0x400000 (1.0.0)
                                                  cl_intel_accelerator                                             0x400000 (1.0.0)
                                                  cl_intel_driver_diagnostics                                      0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_throttle_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_intel_subgroups_char                                          0x400000 (1.0.0)
                                                  cl_intel_subgroups_long                                          0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_intel_mem_force_host_memory                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_intel_device_attribute_query                                  0x400000 (1.0.0)
                                                  cl_khr_extended_bit_ops                                          0x400000 (1.0.0)
                                                  cl_khr_suggested_local_work_size                                 0x400000 (1.0.0)
                                                  cl_intel_split_work_group_barrier                                0x400000 (1.0.0)
                                                  cl_intel_spirv_media_block_io                                    0x400000 (1.0.0)
                                                  cl_intel_spirv_subgroups                                         0x400000 (1.0.0)
                                                  cl_khr_spirv_linkonce_odr                                        0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_intel_unified_shared_memory                                   0x400000 (1.0.0)
                                                  cl_khr_mipmap_image                                              0x400000 (1.0.0)
                                                  cl_khr_mipmap_image_writes                                       0x400000 (1.0.0)
                                                  cl_ext_float_atomics                                             0x400000 (1.0.0)
                                                  cl_khr_external_memory                                             0x9001 (0.9.1)
                                                  cl_intel_planar_yuv                                              0x400000 (1.0.0)
                                                  cl_intel_packed_yuv                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_intel_media_block_io                                          0x400000 (1.0.0)
                                                  cl_intel_bfloat16_conversions                                    0x400000 (1.0.0)
                                                  cl_intel_create_buffer_with_properties                           0x400000 (1.0.0)
                                                  cl_intel_subgroup_local_block_io                                 0x400000 (1.0.0)
                                                  cl_intel_subgroup_matrix_multiply_accumulate                     0x400000 (1.0.0)
                                                  cl_intel_subgroup_split_matrix_multiply_accumulate               0x400000 (1.0.0)
                                                  cl_khr_integer_dot_product                                       0x800000 (2.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_khr_gl_depth_images                                           0x400000 (1.0.0)
                                                  cl_khr_gl_event                                                  0x400000 (1.0.0)
                                                  cl_khr_gl_msaa_sharing                                           0x400000 (1.0.0)
                                                  cl_intel_va_api_media_sharing                                    0x400000 (1.0.0)
                                                  cl_intel_sharing_format_query                                    0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel(R) OpenCL Graphics
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [INTEL]
  clCreateContext(NULL, ...) [default]            Success [INTEL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Arc(TM) A380 Graphics
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Arc(TM) A380 Graphics
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel(R) OpenCL Graphics
    Device Name                                   Intel(R) Arc(TM) A380 Graphics

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.2
  ICD loader Profile                              OpenCL 3.0

 

You need to go to a lower kernel like 6.5.x to get it to work.

Posted
5 hours ago, rotational467 said:

I just added an A310 to my emby server and have been messing with this initially because of Handbrake refusing to see the qsv engine.  On Ubuntu 24.04, going from the 6.8 kernel to 6.11 (oem kernel) did not help.  Card firmware was flashed to latest on a Windows box first.  24.3.4 iHD drivers, latest firmwares, compute packages etc. from Intel's repo installed.

Neither latest ffmpeg nightly nor handbrake 1.9.0 (both built from source) can see the qsv engine for encoding.  ffmpeg via VAAPI does work, and that looks like what jellyfin is doing based on the screenshot above.  It works on Emby as well.

@DAVe3283 You need to set Tone Mapping Method to "Disabled" under Intel Quick Sync.  The VAAPI setting is correct.

 

hwtonemapping_arc.jpg

edit - I don't know what emby secret sauce is able to access qsv on the arc for encoding, I can't get anything else to do it.

At least the difference between QSV and VAAPI should be almost nonexistent.
So as long as VAAPI works I wouldn't worry one bit.

Posted
13 hours ago, rotational467 said:

I just added an A310 to my emby server and have been messing with this initially because of Handbrake refusing to see the qsv engine.  On Ubuntu 24.04, going from the 6.8 kernel to 6.11 (oem kernel) did not help.  Card firmware was flashed to latest on a Windows box first.  24.3.4 iHD drivers, latest firmwares, compute packages etc. from Intel's repo installed.

Neither latest ffmpeg nightly nor handbrake 1.9.0 (both built from source) can see the qsv engine for encoding.  ffmpeg via VAAPI does work, and that looks like what jellyfin is doing based on the screenshot above.  It works on Emby as well.

@DAVe3283 You need to set Tone Mapping Method to "Disabled" under Intel Quick Sync.  The VAAPI setting is correct.

 

hwtonemapping_arc.jpg

edit - I don't know what emby secret sauce is able to access qsv on the arc for encoding, I can't get anything else to do it.

How do you get this detailed transcoding overview? I'd like to see this too, to debug my setup better.

Posted
14 minutes ago, Twistator said:

How do you get this detailed transcoding overview? I'd like to see this too, to debug my setup better.

Hi, that's in the diagnostics plugin.

Posted
13 hours ago, Luke said:

Hi, that's in the diagnostics plugin.

@Luke sorry to need spoon-feeding here, but where is this enabled? I have the diagnostics plugin installed but I don't see this option.

Posted
1 hour ago, GWTPqZp6b said:

@Luke sorry to need spoon-feeding here, but where is this enabled? I have the diagnostics plugin installed but I don't see this option.

It's the detailed view for User Sessions in the Advanced section.

  • 1 month later...
Posted (edited)

Hello, was there any development on this subject? I'm having the same issue as described in the original post, tested with the current stable version and the .35 beta release.

 

HW transcoding works fine for both VAAPI and QuickSync, but without tone mapping. Tone mapping only works with the software method; neither the OpenCL nor the VAAPI hardware method works.

Running emby in Proxmox lxc, with 6.8 kernel

Edited by pear235
Posted (edited)

It's kernel 6.8 and newer that are the issue.  For now at least, install kernel 6.5 and it'll work with HW tone mapping.  

Edited by jhoff80
Posted
1 hour ago, jhoff80 said:

It's kernel 6.8 and newer that are the issue.  For now at least, install kernel 6.5 and it'll work with HW tone mapping.  

Kernels newer than 6.8 should work afaik.

Posted
26 minutes ago, yocker said:

Kernels newer than 6.8 should work afaik.

They don't.  At least not in my experience in the exact same situation as the OP.  (A310 on Proxmox using kernel 6.11.)  The only thing that worked was 6.5.

Posted
1 hour ago, jhoff80 said:

They don't.  At least not in my experience in the exact same situation as the OP.  (A310 on Proxmox using kernel 6.11.)  The only thing that worked was 6.5.

Okay, will mark that down. I just went from info I found from googling.
Nice to have it tested and confirmed.

Might be Linux distro dependent though. I know that on Unraid the problem can be fixed with an extra command, but it doesn't seem to work in Ubuntu.
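
For anyone wanting to check which side of that line a box is on, here is a minimal sketch. It assumes the 6.8 cutoff reported in this thread (6.5 worked, 6.8 and 6.11 did not), which is anecdotal and may well be distro dependent. The function name kernel_is_affected is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 (true) when the given kernel release is
# 6.8 or newer, the range reported as problematic in this thread.
kernel_is_affected() {
    release=$1
    # Keep only the leading major.minor, e.g. "6.8.12-1-pve" -> "6.8"
    major_minor=$(printf '%s\n' "$release" | sed -E 's/^([0-9]+\.[0-9]+).*/\1/')
    # sort -V orders version strings; if "6.8" comes out first (or ties),
    # the kernel is at least 6.8
    lowest=$(printf '%s\n%s\n' "6.8" "$major_minor" | sort -V | head -n 1)
    [ "$lowest" = "6.8" ]
}

if kernel_is_affected "$(uname -r)"; then
    echo "Kernel $(uname -r) is 6.8 or newer: HW tone mapping may need a workaround"
else
    echo "Kernel $(uname -r) is older than 6.8"
fi
```

Run it inside the VM or on the LXC host, whichever kernel the GPU driver actually runs on.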

Posted (edited)

I was actually able to get it to work on A310 / kernel 6.8 / LXC inside Proxmox by adding these parameters to the emby-server service file (the location can be taken from "systemctl status emby-server"; I'm not at home now, so I can't check it and don't remember it precisely).

Environment="NEOReadDebugKeys=1"
Environment="OverrideGpuAddressSpace=48"

I added those two under [Service] in the service file, then ran "systemctl daemon-reload", then rebooted for good measure.  Now I have all QuickSync options enabled in the transcoding tab, for both decoding and encoding, as well as OpenCL tone mapping for both QSV and VAAPI in the tone mapping tab.

Edited by pear235
Posted (edited)
3 hours ago, pear235 said:

I was actually able to get it to work on A310 / kernel 6.8 / LXC inside Proxmox by adding these parameters to the emby-server service file (the location can be taken from "systemctl status emby-server"; I'm not at home now, so I can't check it and don't remember it precisely).

Environment="NEOReadDebugKeys=1"
Environment="OverrideGpuAddressSpace=48"

I added those two under [Service] in the service file, then ran "systemctl daemon-reload", then rebooted for good measure.  Now I have all QuickSync options enabled in the transcoding tab, for both decoding and encoding, as well as OpenCL tone mapping for both QSV and VAAPI in the tone mapping tab.

Thank you!! I added these 2 ENV variables as well and now Tone Mapping via Intel OpenCL seems to work!

Just as a friendly note, you shouldn't edit the supplied emby-server.service file, because that can get overwritten on updates. What you wanna do is either create a copy with the edits here:

/etc/systemd/system/emby-server.service

OR (what I did) create a drop in dir for the emby service:

mkdir /etc/systemd/system/emby-server.service.d
nano /etc/systemd/system/emby-server.service.d/tonemap.conf

 

[Service]
Environment="NEOReadDebugKeys=1"
Environment="OverrideGpuAddressSpace=48"

After that:

systemctl daemon-reload
systemctl restart emby-server

If you do systemctl status emby-server after that, it will list the tonemap.conf file as a drop-in it used to amend your service definition.
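
The drop-in steps above can also be scripted. This is only a rough sketch, assuming the same unit name (emby-server) and file name (tonemap.conf) as in this post. The function name write_tonemap_dropin is hypothetical, and the target directory is a parameter so it can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write the drop-in described above into DIR.
# DIR defaults to the drop-in directory for emby-server.service.
write_tonemap_dropin() {
    dropin_dir=${1:-/etc/systemd/system/emby-server.service.d}
    mkdir -p "$dropin_dir" || return 1
    # Quoted heredoc so the content is written verbatim
    cat > "$dropin_dir/tonemap.conf" <<'EOF'
[Service]
Environment="NEOReadDebugKeys=1"
Environment="OverrideGpuAddressSpace=48"
EOF
}

# On a real server, as root:
#   write_tonemap_dropin
#   systemctl daemon-reload
#   systemctl restart emby-server
#   systemctl show emby-server --property=Environment
```

The last command prints the environment systemd will pass to the service, so both variables should show up there if the drop-in was picked up.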

 

Edited by Twistator
Posted

You can also do

Quote

sudo systemctl edit emby-server.service

And it will create /etc/systemd/system/emby-server.service.d/override.conf for you. Just put in your extra lines in between, as it indicates.

Posted

This fixed it for me too, both on kernel 6.8 and kernel 6.11.

Posted (edited)

Hi Team, getting a very similar issue when running Emby in Proxmox on a NUC8i7HNK (Hades Canyon) with an Intel CPU and an AMD GPU (AMD Radeon RX Vega M GL Graphics)

Same issue: emby-ffmpeg is not able to load OpenCL, but the normal ffmpeg is.

I've tried the fix proposed here (setting NEOReadDebugKeys / OverrideGpuAddressSpace) but it doesn't do anything.

Since I don't understand where these values come from, would I need to adjust them for AMD?

Edit: I've also downgraded my proxmox kernel to 6.5.13-5-pve and the issue is still there.
Thanks

Edited by gcorgnet
Posted
On 11/22/2024 at 9:09 PM, yocker said:

Intel Arc card drivers are bugged on Windows and Ubuntu and won't work with tone mapping.

Plex and Jellyfin made a workaround somehow.
On Linux (non-Ubuntu) you can avoid the bug by not using kernel 6.8.
On Windows you can fix it by using an old driver (forgot the version).

I sadly don't think there is much you can do unless Intel fixes their mess or the Emby team makes the same workaround as Plex and Jellyfin.

I'm using an Arc card myself and I'm dreading the day Unraid updates their kernel. Who knows if the card will still work.

Thank you for that information! Did you ever figure out what driver I should "downgrade" to on Windows (using a mini PC with an Intel 125H)? For some reason VAAPI tone mapping is not available in the Emby settings; I'm not sure if there is something wrong with my setup or if it's not supported on my device for some reason. I've been tone mapping my media with HandBrake because I haven't been able to with Emby (same issue as OP).

Posted
2 hours ago, Grimx00 said:

Thank you for that information! Did you ever figure out what driver I should "downgrade" to on Windows (using a mini PC with an Intel 125H)? For some reason VAAPI tone mapping is not available in the Emby settings; I'm not sure if there is something wrong with my setup or if it's not supported on my device for some reason. I've been tone mapping my media with HandBrake because I haven't been able to with Emby (same issue as OP).

I can't remember what version it is, sorry.
Try looking through the post history here on the Emby forum and you might find it.

To be 100% sure: do you mean that VAAPI isn't available and that's the problem, or that tone mapping isn't working with QSV and you don't have VAAPI to fall back to?

That said, when I tested with my mini PC with a 13700H in it, I had no problems transcoding with Emby on Windows.
Install the various Intel drivers (e.g. chipset drivers), install the Intel GFX drivers, install Emby, set up Emby to use QSV plus tone mapping, and off it went.
 
