DAVe3283 7 Posted November 22, 2024 (edited)

tl;dr: hardware-accelerated tone mapping does not work with Emby on an Intel ARC A380. Regular hardware acceleration (tone mapping disabled) works. This problem appears unique to the Emby build of ffmpeg, as it works on the native Ubuntu 24.10 and Jellyfin builds of ffmpeg.

First, I am not sure if this is a supported setup, so I will understand if you can't help due to me running Emby in a VM under Proxmox. However, I believe it should work (and does work with Jellyfin), so please bear with me for a bit here.

Setup:
Bare metal: Supermicro SYS-6028R-TR (2x Intel Xeon E5-2690 v3)
GPU: ASRock A380 LP 6G (updated to latest firmware with Windows driver 32.0.101.6297); also tried with an ASRock Challenger A380 but saw the exact same behavior
OS: Proxmox 8.2.7 (6.8.12-1-pve), no i915 driver/modules installed (GPU passed through to VM, not docker/LXC)
VM: Ubuntu 24.10 (6.11.0-9-generic), Ubuntu 24.04.1 (6.8.0-49-generic), and Debian 12 (6.11.5-1~bpo12+1) all behave the same
Emby: 4.8.10.0 installed from .deb

The GuC and HuC firmware loaded successfully during VM startup:

root@arcubuntu2410:~# dmesg | grep "firmware i915"
[ 6.907034] i915 0000:01:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_08.bin (v2.8)
[ 7.086485] i915 0000:01:00.0: [drm] GT0: GuC firmware i915/dg2_guc_70.bin version 70.29.2
[ 7.086490] i915 0000:01:00.0: [drm] GT0: HuC firmware i915/dg2_huc_gsc.bin version 7.10.16

root@arcubuntu2410:~# cat /sys/kernel/debug/dri/0/gt0/uc/huc_info
HuC firmware: i915/dg2_huc_gsc.bin
status: RUNNING
version: found 7.10.16
uCode: 0 bytes
RSA: 0 bytes
HuC status: 0x00164001

root@arcubuntu2410:~# cat /sys/kernel/debug/dri/0/gt0/uc/guc_info
GuC firmware: i915/dg2_guc_70.bin
status: RUNNING
version: found 70.29.2
uCode: 368896 bytes
RSA: 384 bytes
GuC status 0x800300ec:
Bootrom status = 0x76
uKernel status = 0x0
MIA Core status = 0x3
Scratch registers:
0: 0x0
1: 0xb03d7
2: 0x42c800
3: 0x4
4: 0x40
5: 0x3a0
6: 0x56a50005
7: 0x0
8: 0x0
9: 0x0
10: 0x0
11: 0x0
12: 0x0
13: 0x0
14: 0x0
15: 0x0
GuC logging stats:
Relay full count: 0
DEBUG: flush count 0, overflow count 0
CRASH: flush count 0, overflow count 0
CAPTURE: flush count 0, overflow count 0
CT enabled
H2G Space: 2080
Head: 503
Tail: 503
G2H Space: 12284
Head: 97
Tail: 97
GuC Submission API Version: 1.13.4
GuC Number Outstanding Submission G2H: 0
GuC tasklet count: 0
Requests in GuC submit tasklet:
Global scheduling policies:
DPC promote time = 500000
Max num work items = 15
Flags = 0

The error in the ffmpeg-transcode log file is "Failed to get number of OpenCL platforms: -1001.". I can replicate that by running /opt/emby-server/bin/emby-ffmpeg directly as the emby user:

However, running the native ffmpeg package from Ubuntu 24.10 seems to find & initialize the GPU just fine:

Jellyfin seems to handle transcode perfectly (including tone mapping) in an Ubuntu VM:

It seems like the ffmpeg build used by Emby is not initializing the ARC, while the native Ubuntu and Jellyfin builds do.

Here is some other system info that might help. Let me know if you need anything else.

arc@arcubuntu2410:~$ ls -alR /dev/dri/
/dev/dri/:
total 0
drwxr-xr-x 3 root root 100 Nov 20 17:27 .
drwxr-xr-x 21 root root 4300 Nov 20 17:27 ..
drwxr-xr-x 2 root root 80 Nov 20 17:27 by-path
crw-rw---- 1 root video 226, 0 Nov 20 17:27 card0
crw-rw---- 1 root render 226, 128 Nov 20 17:27 renderD128

/dev/dri/by-path:
total 0
drwxr-xr-x 2 root root 80 Nov 20 17:27 .
drwxr-xr-x 3 root root 100 Nov 20 17:27 ..
lrwxrwxrwx 1 root root 8 Nov 20 17:27 pci-0000:01:00.0-card -> ../card0
lrwxrwxrwx 1 root root 13 Nov 20 17:27 pci-0000:01:00.0-render -> ../renderD128

emby@arcubuntu2410:~$ clinfo
Number of platforms 1
Platform Name Intel(R) OpenCL Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_extended_bit_ops cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_linkonce_odr cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_khr_external_memory cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_bfloat16_conversions cl_intel_create_buffer_with_properties cl_intel_subgroup_local_block_io cl_intel_subgroup_matrix_multiply_accumulate cl_intel_subgroup_split_matrix_multiply_accumulate cl_khr_integer_dot_product cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing
cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info Platform Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0) cl_khr_device_uuid 0x400000 (1.0.0) cl_khr_fp16 0x400000 (1.0.0) cl_khr_global_int32_base_atomics 0x400000 (1.0.0) cl_khr_global_int32_extended_atomics 0x400000 (1.0.0) cl_khr_icd 0x400000 (1.0.0) cl_khr_local_int32_base_atomics 0x400000 (1.0.0) cl_khr_local_int32_extended_atomics 0x400000 (1.0.0) cl_intel_command_queue_families 0x400000 (1.0.0) cl_intel_subgroups 0x400000 (1.0.0) cl_intel_required_subgroup_size 0x400000 (1.0.0) cl_intel_subgroups_short 0x400000 (1.0.0) cl_khr_spir 0x400000 (1.0.0) cl_intel_accelerator 0x400000 (1.0.0) cl_intel_driver_diagnostics 0x400000 (1.0.0) cl_khr_priority_hints 0x400000 (1.0.0) cl_khr_throttle_hints 0x400000 (1.0.0) cl_khr_create_command_queue 0x400000 (1.0.0) cl_intel_subgroups_char 0x400000 (1.0.0) cl_intel_subgroups_long 0x400000 (1.0.0) cl_khr_il_program 0x400000 (1.0.0) cl_intel_mem_force_host_memory 0x400000 (1.0.0) cl_khr_subgroup_extended_types 0x400000 (1.0.0) cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0) cl_khr_subgroup_ballot 0x400000 (1.0.0) cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0) cl_khr_subgroup_shuffle 0x400000 (1.0.0) cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0) cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0) cl_intel_device_attribute_query 0x400000 (1.0.0) cl_khr_extended_bit_ops 0x400000 (1.0.0) cl_khr_suggested_local_work_size 0x400000 (1.0.0) cl_intel_split_work_group_barrier 0x400000 (1.0.0) cl_intel_spirv_media_block_io 0x400000 (1.0.0) cl_intel_spirv_subgroups 0x400000 (1.0.0) cl_khr_spirv_linkonce_odr 0x400000 (1.0.0) cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0) cl_intel_unified_shared_memory 0x400000 (1.0.0) cl_khr_mipmap_image 0x400000 (1.0.0) cl_khr_mipmap_image_writes 0x400000 (1.0.0) cl_ext_float_atomics 0x400000 (1.0.0) cl_khr_external_memory 0x9001 (0.9.1) cl_intel_planar_yuv 0x400000 (1.0.0) 
cl_intel_packed_yuv 0x400000 (1.0.0) cl_khr_int64_base_atomics 0x400000 (1.0.0) cl_khr_int64_extended_atomics 0x400000 (1.0.0) cl_khr_image2d_from_buffer 0x400000 (1.0.0) cl_khr_depth_images 0x400000 (1.0.0) cl_khr_3d_image_writes 0x400000 (1.0.0) cl_intel_media_block_io 0x400000 (1.0.0) cl_intel_bfloat16_conversions 0x400000 (1.0.0) cl_intel_create_buffer_with_properties 0x400000 (1.0.0) cl_intel_subgroup_local_block_io 0x400000 (1.0.0) cl_intel_subgroup_matrix_multiply_accumulate 0x400000 (1.0.0) cl_intel_subgroup_split_matrix_multiply_accumulate 0x400000 (1.0.0) cl_khr_integer_dot_product 0x800000 (2.0.0) cl_khr_gl_sharing 0x400000 (1.0.0) cl_khr_gl_depth_images 0x400000 (1.0.0) cl_khr_gl_event 0x400000 (1.0.0) cl_khr_gl_msaa_sharing 0x400000 (1.0.0) cl_intel_va_api_media_sharing 0x400000 (1.0.0) cl_intel_sharing_format_query 0x400000 (1.0.0) cl_khr_pci_bus_info 0x400000 (1.0.0) Platform Numeric Version 0xc00000 (3.0.0) Platform Extensions function suffix INTEL Platform Host timer resolution 1ns Platform External memory handle types DMA buffer Platform Name Intel(R) OpenCL Graphics Number of devices 1 Device Name Intel(R) Arc(TM) A380 Graphics Device Vendor Intel(R) Corporation Device Vendor ID 0x8086 Device Version OpenCL 3.0 NEO Device UUID 8680a556-0500-0000-0100-000000000000 Driver UUID 32342e33-352e-3033-3038-373200000000 Valid Device LUID No Device LUID f095-0036fc7f0000 Device Node Mask 0 Device Numeric Version 0xc00000 (3.0.0) Driver Version 24.35.030872 Device OpenCL C Version OpenCL C 1.2 Device OpenCL C all versions OpenCL C 0x400000 (1.0.0) OpenCL C 0x401000 (1.1.0) OpenCL C 0x402000 (1.2.0) OpenCL C 0xc00000 (3.0.0) Device OpenCL C features __opencl_c_int64 0xc00000 (3.0.0) __opencl_c_3d_image_writes 0xc00000 (3.0.0) __opencl_c_images 0xc00000 (3.0.0) __opencl_c_read_write_images 0xc00000 (3.0.0) __opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0) __opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0) __opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0) 
__opencl_c_atomic_scope_device 0xc00000 (3.0.0) __opencl_c_generic_address_space 0xc00000 (3.0.0) __opencl_c_program_scope_global_variables 0xc00000 (3.0.0) __opencl_c_work_group_collective_functions 0xc00000 (3.0.0) __opencl_c_subgroups 0xc00000 (3.0.0) __opencl_c_ext_fp32_global_atomic_add 0xc00000 (3.0.0) __opencl_c_ext_fp32_local_atomic_add 0xc00000 (3.0.0) __opencl_c_ext_fp32_global_atomic_min_max 0xc00000 (3.0.0) __opencl_c_ext_fp32_local_atomic_min_max 0xc00000 (3.0.0) __opencl_c_ext_fp16_global_atomic_load_store 0xc00000 (3.0.0) __opencl_c_ext_fp16_local_atomic_load_store 0xc00000 (3.0.0) __opencl_c_ext_fp16_global_atomic_min_max 0xc00000 (3.0.0) __opencl_c_ext_fp16_local_atomic_min_max 0xc00000 (3.0.0) __opencl_c_integer_dot_product_input_4x8bit 0xc00000 (3.0.0) __opencl_c_integer_dot_product_input_4x8bit_packed 0xc00000 (3.0.0) Latest conformance test passed v2024-02-27-00 Device Type GPU Device PCI bus info (KHR) PCI-E, 0000:01:00.0 Device Profile FULL_PROFILE Device Available Yes Compiler Available Yes Linker Available Yes Max compute units 128 Max clock frequency 2450MHz Device IP (Intel) 0x30e0005 (12.224.5) Device ID (Intel) 22181 Slices (Intel) 1 Sub-slices per slice (Intel) 8 EUs per sub-slice (Intel) 16 Threads per EU (Intel) 8 Feature capabilities (Intel) DP4A, DPAS Device Partition (core) Max number of sub-devices 0 Supported partition types None Supported affinity domains (n/a) Max work item dimensions 3 Max work item sizes 1024x1024x1024 Max work group size 1024 Preferred work group size multiple (device) 64 Preferred work group size multiple (kernel) 64 Max sub-groups per work group 128 Sub-group sizes (Intel) 8, 16, 32 Preferred / native vector sizes char 16 / 16 short 8 / 8 int 4 / 4 long 1 / 1 half 8 / 8 (cl_khr_fp16) float 1 / 1 double 0 / 0 (n/a) Half-precision Floating-point support (cl_khr_fp16) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support 
is emulated in software No Single-precision Floating-point support (core) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations Yes Double-precision Floating-point support (n/a) Address bits 64, Little-Endian External memory handle types DMA buffer Global memory size 6064541696 (5.648GiB) Error Correction support No Max memory allocation 3032270848 (2.824GiB) Unified memory for Host and Device No Shared Virtual Memory (SVM) capabilities (core) Coarse-grained buffer sharing Yes Fine-grained buffer sharing No Fine-grained system sharing No Atomics No Unified Shared Memory (USM) (cl_intel_unified_shared_memory) Host USM capabilities (Intel) USM access Device USM capabilities (Intel) USM access, USM atomic access Single-Device USM caps (Intel) USM access, USM atomic access Cross-Device USM caps (Intel) USM access, USM atomic access Shared System USM caps (Intel) (n/a) Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Preferred alignment for atomics SVM 64 bytes Global 64 bytes Local 64 bytes Atomic memory capabilities relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope Atomic fence capabilities relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope Max size for global variable 65536 (64KiB) Preferred total size of global vars 3032270848 (2.824GiB) Global Memory cache type Read/Write Global Memory cache size 4194304 (4MiB) Global Memory cache line size 64 bytes Image support Yes Max number of samplers per kernel 16 Max size for 1D images from buffer 189516928 pixels Max 1D or 2D image array size 2048 images Base address alignment for 2D image buffers 4 bytes Pitch alignment for 2D image buffers 4 pixels Max 2D image size 16384x16384 pixels Max planar YUV image 
size 16384x16128 pixels Max 3D image size 16384x16384x2048 pixels Max number of read image args 128 Max number of write image args 128 Max number of read/write image args 128 Pipe support No Max number of pipe args 0 Max active pipe reservations 0 Max pipe packet size 0 Local memory type Local Local memory size 65536 (64KiB) Max number of constant args 8 Max constant buffer size 3032270848 (2.824GiB) Generic address space support Yes Max size of kernel argument 2048 (2KiB) Queue properties (on host) Out-of-order execution Yes Profiling Yes Device enqueue capabilities (n/a) Queue properties (on device) Out-of-order execution No Profiling No Preferred size 0 Max size 0 Max queues on device 0 Max events on device 0 Device queue families ccs (1) Queue properties Out-of-order execution, Profiling Capabilities create single-queue events, create cross-queue events bcs (1) Queue properties Out-of-order execution, Profiling Capabilities create single-queue events, create cross-queue events Prefer user sync for interop Yes Profiling timer resolution 52ns Execution capabilities Run OpenCL kernels Yes Run native kernels No Non-uniform work-groups Yes Work-group collective functions Yes Sub-group independent forward progress No IL version SPIR-V_1.3 SPIR-V_1.2 SPIR-V_1.1 SPIR-V_1.0 ILs with version SPIR-V 0x403000 (1.3.0) SPIR-V 0x402000 (1.2.0) SPIR-V 0x401000 (1.1.0) SPIR-V 0x400000 (1.0.0) SPIR versions 1.2 printf() buffer size 4194304 (4MiB) Built-in kernels (n/a) Built-in kernels with version (n/a) Device Extensions cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue 
cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_extended_bit_ops cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_linkonce_odr cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_khr_external_memory cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_bfloat16_conversions cl_intel_create_buffer_with_properties cl_intel_subgroup_local_block_io cl_intel_subgroup_matrix_multiply_accumulate cl_intel_subgroup_split_matrix_multiply_accumulate cl_khr_integer_dot_product cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info Device Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0) cl_khr_device_uuid 0x400000 (1.0.0) cl_khr_fp16 0x400000 (1.0.0) cl_khr_global_int32_base_atomics 0x400000 (1.0.0) cl_khr_global_int32_extended_atomics 0x400000 (1.0.0) cl_khr_icd 0x400000 (1.0.0) cl_khr_local_int32_base_atomics 0x400000 (1.0.0) cl_khr_local_int32_extended_atomics 0x400000 (1.0.0) cl_intel_command_queue_families 0x400000 (1.0.0) cl_intel_subgroups 0x400000 (1.0.0) cl_intel_required_subgroup_size 0x400000 (1.0.0) cl_intel_subgroups_short 0x400000 (1.0.0) cl_khr_spir 0x400000 (1.0.0) cl_intel_accelerator 0x400000 (1.0.0) cl_intel_driver_diagnostics 0x400000 (1.0.0) cl_khr_priority_hints 0x400000 (1.0.0) cl_khr_throttle_hints 0x400000 (1.0.0) 
cl_khr_create_command_queue 0x400000 (1.0.0) cl_intel_subgroups_char 0x400000 (1.0.0) cl_intel_subgroups_long 0x400000 (1.0.0) cl_khr_il_program 0x400000 (1.0.0) cl_intel_mem_force_host_memory 0x400000 (1.0.0) cl_khr_subgroup_extended_types 0x400000 (1.0.0) cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0) cl_khr_subgroup_ballot 0x400000 (1.0.0) cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0) cl_khr_subgroup_shuffle 0x400000 (1.0.0) cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0) cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0) cl_intel_device_attribute_query 0x400000 (1.0.0) cl_khr_extended_bit_ops 0x400000 (1.0.0) cl_khr_suggested_local_work_size 0x400000 (1.0.0) cl_intel_split_work_group_barrier 0x400000 (1.0.0) cl_intel_spirv_media_block_io 0x400000 (1.0.0) cl_intel_spirv_subgroups 0x400000 (1.0.0) cl_khr_spirv_linkonce_odr 0x400000 (1.0.0) cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0) cl_intel_unified_shared_memory 0x400000 (1.0.0) cl_khr_mipmap_image 0x400000 (1.0.0) cl_khr_mipmap_image_writes 0x400000 (1.0.0) cl_ext_float_atomics 0x400000 (1.0.0) cl_khr_external_memory 0x9001 (0.9.1) cl_intel_planar_yuv 0x400000 (1.0.0) cl_intel_packed_yuv 0x400000 (1.0.0) cl_khr_int64_base_atomics 0x400000 (1.0.0) cl_khr_int64_extended_atomics 0x400000 (1.0.0) cl_khr_image2d_from_buffer 0x400000 (1.0.0) cl_khr_depth_images 0x400000 (1.0.0) cl_khr_3d_image_writes 0x400000 (1.0.0) cl_intel_media_block_io 0x400000 (1.0.0) cl_intel_bfloat16_conversions 0x400000 (1.0.0) cl_intel_create_buffer_with_properties 0x400000 (1.0.0) cl_intel_subgroup_local_block_io 0x400000 (1.0.0) cl_intel_subgroup_matrix_multiply_accumulate 0x400000 (1.0.0) cl_intel_subgroup_split_matrix_multiply_accumulate 0x400000 (1.0.0) cl_khr_integer_dot_product 0x800000 (2.0.0) cl_khr_gl_sharing 0x400000 (1.0.0) cl_khr_gl_depth_images 0x400000 (1.0.0) cl_khr_gl_event 0x400000 (1.0.0) cl_khr_gl_msaa_sharing 0x400000 (1.0.0) cl_intel_va_api_media_sharing 0x400000 (1.0.0) 
cl_intel_sharing_format_query 0x400000 (1.0.0)
cl_khr_pci_bus_info 0x400000 (1.0.0)

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel(R) OpenCL Graphics
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [INTEL]
clCreateContext(NULL, ...) [default] Success [INTEL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Intel(R) OpenCL Graphics
Device Name Intel(R) Arc(TM) A380 Graphics
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel(R) OpenCL Graphics
Device Name Intel(R) Arc(TM) A380 Graphics
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel(R) OpenCL Graphics
Device Name Intel(R) Arc(TM) A380 Graphics

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.3.2
ICD loader Profile OpenCL 3.0

Edited November 22, 2024 by DAVe3283: Fix screenshots that didn't upload right
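A note for readers hitting the same message: -1001 is CL_PLATFORM_NOT_FOUND_KHR from the cl_khr_icd extension, the status an OpenCL ICD loader returns when it finds no registered vendor driver. Since clinfo does find the Intel platform as the emby user, it may be worth comparing which ICD registrations the system loader sees. The Python sketch below assumes the conventional Linux registry directory /etc/OpenCL/vendors; whether the loader bundled with emby-ffmpeg reads that directory is not confirmed anywhere in this thread. The helper names describe_cl_error and list_icd_registrations are hypothetical.

```python
import glob
import os

# A few OpenCL status codes; -1001 is defined by the cl_khr_icd extension.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -32: "CL_INVALID_PLATFORM",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",
}


def describe_cl_error(code):
    """Return the symbolic name for a known OpenCL status code."""
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL status {code}")


def list_icd_registrations(vendors_dir="/etc/OpenCL/vendors"):
    """Return (icd_file, library, exists) for each *.icd file in vendors_dir.

    exists is True/False when the file names an absolute library path and
    None when it names a bare library, which the dynamic linker resolves
    at load time (so it cannot be checked with a simple path test).
    """
    results = []
    for icd_file in sorted(glob.glob(os.path.join(vendors_dir, "*.icd"))):
        with open(icd_file, encoding="utf-8") as handle:
            library = handle.readline().strip()
        exists = os.path.exists(library) if os.path.isabs(library) else None
        results.append((icd_file, library, exists))
    return results


if __name__ == "__main__":
    print(describe_cl_error(-1001))
    for icd_file, library, exists in list_icd_registrations():
        print(f"{icd_file} -> {library} (exists: {exists})")
```

An empty list here would point at a missing registration on the system side; a populated list alongside the -1001 error would instead suggest that the failing binary is looking somewhere else, which is only a hypothesis at this point.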
Luke 42077 Posted November 22, 2024

Hi there, please attach the Emby server, ffmpeg, and hardware detection log files. Thanks!
DAVe3283 7 (Author) Posted November 22, 2024

Oh dangit, I had those on there and somehow messed up all the screenshots and attachments. Sorry about that, here you go:

hardware_detection-63867819052.txt
ffmpeg-transcode-ca80ba25-a615-4afe-b1ac-da490c71220d_1.txt
embyserver.txt
Neminem 1518 Posted November 22, 2024 (edited)

Looks like a tone mapping issue. Try with VAAPI tone mapping and without. It looks like you also disabled software tone mapping. See how that works out.

Edited November 22, 2024 by Neminem
DAVe3283 7 (Author) Posted November 23, 2024

No dice, still kicks to CPU transcode if hardware tone mapping is enabled. Hopefully I am setting things right.

Transcoding settings for VAAPI:

Then basically switch VAAPI off and QuickSync on for the other test.

Tone Mapping settings are the same for both:

VAAPI: ffmpeg-transcode-aaaa5ec3-fdd5-482d-b5c7-abacc52ff602_1.txt
QuickSync: ffmpeg-transcode-ad9b0a21-6fc2-4642-874d-c23a06ca601e_1.txt
embyserver.txt
yocker 1247 Posted November 23, 2024

Intel Arc card drivers are bugged on Windows and Ubuntu and won't work with tone mapping. Plex and Jellyfin made a workaround somehow. On Linux (non-Ubuntu) you can avoid the bug by not using kernel 6.8. On Windows you can fix it by using an old driver (I forgot the version). I sadly don't think there is much you can do unless Intel fixes their mess or the Emby team makes the same workaround as Plex and Jellyfin. I'm using an Arc card myself and I'm dreading the day Unraid updates their kernel. Who knows if the card will still work.
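To see which kernel series a given VM or host is actually running, in light of the 6.8 remark above, a small Python sketch follows. The helpers kernel_series and is_kernel_series are hypothetical names. The link between 6.8 and the bug comes only from this post, and elsewhere in the thread the same behavior is reported on 6.11 as well, so a match is a hint at most, not a diagnosis.

```python
import platform
import re


def kernel_series(release):
    """Extract (major, minor) from a release string such as '6.8.0-49-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_kernel_series(release, major, minor):
    """Return True if the release string belongs to the major.minor series."""
    return kernel_series(release) == (major, minor)


if __name__ == "__main__":
    # platform.release() returns the running kernel's release string on Linux.
    release = platform.release()
    print(release, "is in the 6.8 series:", is_kernel_series(release, 6, 8))
```

The release strings quoted in the first post (for example 6.8.0-49-generic and 6.11.0-9-generic) can be passed to is_kernel_series directly to compare setups without logging in to each VM.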
rotational467 43 Posted December 9, 2024 (edited)

I just added an A310 to my Emby server and have been messing with this, initially because HandBrake refused to see the QSV engine. On Ubuntu 24.04, going from the 6.8 kernel to 6.11 (OEM kernel) did not help. Card firmware was flashed to the latest on a Windows box first. The 24.3.4 iHD drivers, latest firmware, compute packages, etc. from Intel's repo are installed. Neither the latest ffmpeg nightly nor HandBrake 1.9.0 (both built from source) can see the QSV engine for encoding. ffmpeg via VAAPI does work, and that looks like what Jellyfin is doing based on the screenshot above. It works on Emby as well.

@DAVe3283 You need to set Tone Mapping Method to "Disabled" under Intel Quick Sync. The VAAPI setting is correct.

Edit: I don't know what Emby secret sauce is able to access QSV on the Arc for encoding; I can't get anything else to do it.

Edited December 9, 2024 by rotational467
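Since VAAPI is reported to work, one way to try hardware tone mapping outside Emby is a stock ffmpeg run with the tonemap_vaapi filter. The Python sketch below only assembles such a command line. It is an assumption about a reasonable manual test, not the command Emby or Jellyfin generates. The file names are placeholders, the render node path is the one shown in the ls output earlier in the thread, build_vaapi_tonemap_command is a hypothetical helper, and the ffmpeg build in use must include VAAPI support.

```python
import shlex


def build_vaapi_tonemap_command(src, dst, render_node="/dev/dri/renderD128"):
    """Build an ffmpeg argv that decodes, tone maps, and encodes via VAAPI."""
    # tonemap_vaapi converts HDR10 input to SDR; here the output is tagged
    # as 8-bit NV12 with BT.709 transfer, matrix, and primaries.
    video_filter = "tonemap_vaapi=format=nv12:t=bt709:m=bt709:p=bt709"
    return [
        "ffmpeg",
        "-hwaccel", "vaapi",
        "-hwaccel_device", render_node,
        "-hwaccel_output_format", "vaapi",  # keep decoded frames on the GPU
        "-i", src,
        "-vf", video_filter,
        "-c:v", "h264_vaapi",
        "-c:a", "copy",
        dst,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command for copy and paste; nothing is executed.
    print(shlex.join(build_vaapi_tonemap_command("hdr_sample.mkv", "sdr_test.mkv")))
```

Returning an argument list instead of a single string keeps the command safe to hand to subprocess.run without shell quoting issues, while shlex.join gives a printable form for manual testing.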
guunter 49 Posted December 9, 2024

On 11/21/2024 at 7:49 PM, DAVe3283 said:

tl;dr: hardware-accelerated tone mapping does not work with Emby on an Intel ARC A380. Regular hardware acceleration (tone mapping disabled) works. This problem appears unique to the Emby build of ffmpeg, as it works on the native Ubuntu 24.10 and Jellyfin builds of ffmpeg.

First, I am not sure if this is a supported setup, so I will understand if you can't help due to me running Emby in a VM under Proxmox. However, I believe it should work (and does work with Jellyfin), so please bear with me for a bit here.

Setup:
Bare metal: Supermicro SYS-6028R-TR (2x Intel Xeon E5-2690 v3)
GPU: ASRock A380 LP 6G (updated to latest firmware with Windows driver 32.0.101.6297); also tried with an ASRock Challenger A380 but saw the exact same behavior
OS: Proxmox 8.2.7 (6.8.12-1-pve), no i915 driver/modules installed (GPU passed through to VM, not docker/LXC)
VM: Ubuntu 24.10 (6.11.0-9-generic), Ubuntu 24.04.1 (6.8.0-49-generic), and Debian 12 (6.11.5-1~bpo12+1) all behave the same
Emby: 4.8.10.0 installed from .deb

The GuC and HuC firmware loaded successfully during VM startup:

root@arcubuntu2410:~# dmesg | grep "firmware i915"
[ 6.907034] i915 0000:01:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_08.bin (v2.8)
[ 7.086485] i915 0000:01:00.0: [drm] GT0: GuC firmware i915/dg2_guc_70.bin version 70.29.2
[ 7.086490] i915 0000:01:00.0: [drm] GT0: HuC firmware i915/dg2_huc_gsc.bin version 7.10.16

root@arcubuntu2410:~# cat /sys/kernel/debug/dri/0/gt0/uc/huc_info
HuC firmware: i915/dg2_huc_gsc.bin
status: RUNNING
version: found 7.10.16
uCode: 0 bytes
RSA: 0 bytes
HuC status: 0x00164001

root@arcubuntu2410:~# cat /sys/kernel/debug/dri/0/gt0/uc/guc_info
GuC firmware: i915/dg2_guc_70.bin
status: RUNNING
version: found 70.29.2
uCode: 368896 bytes
RSA: 384 bytes
GuC status 0x800300ec:
Bootrom status = 0x76
uKernel status = 0x0
MIA Core status = 0x3
Scratch registers:
0: 0x0
1: 0xb03d7
2: 0x42c800
3: 0x4
4: 0x40
5: 0x3a0
6: 0x56a50005
7: 0x0
8: 0x0
9: 0x0
10: 0x0
11: 0x0
12: 0x0
13: 0x0
14: 0x0
15: 0x0
GuC logging stats:
Relay full count: 0
DEBUG: flush count 0, overflow count 0
CRASH: flush count 0, overflow count 0
CAPTURE: flush count 0, overflow count 0
CT enabled
H2G Space: 2080
Head: 503
Tail: 503
G2H Space: 12284
Head: 97
Tail: 97
GuC Submission API Version: 1.13.4
GuC Number Outstanding Submission G2H: 0
GuC tasklet count: 0
Requests in GuC submit tasklet:
Global scheduling policies:
DPC promote time = 500000
Max num work items = 15
Flags = 0

The error in the ffmpeg-transcode log file is "Failed to get number of OpenCL platforms: -1001.". I can replicate that by running /opt/emby-server/bin/emby-ffmpeg directly as the emby user:

However, running the native ffmpeg package from Ubuntu 24.10 seems to find & initialize the GPU just fine:

Jellyfin seems to handle transcode perfectly (including tone mapping) in an Ubuntu VM:

It seems like the ffmpeg build used by Emby is not initializing the ARC, while the native Ubuntu and Jellyfin builds do.

Here is some other system info that might help. Let me know if you need anything else.

arc@arcubuntu2410:~$ ls -alR /dev/dri/
/dev/dri/:
total 0
drwxr-xr-x 3 root root 100 Nov 20 17:27 .
drwxr-xr-x 21 root root 4300 Nov 20 17:27 ..
drwxr-xr-x 2 root root 80 Nov 20 17:27 by-path
crw-rw---- 1 root video 226, 0 Nov 20 17:27 card0
crw-rw---- 1 root render 226, 128 Nov 20 17:27 renderD128

/dev/dri/by-path:
total 0
drwxr-xr-x 2 root root 80 Nov 20 17:27 .
drwxr-xr-x 3 root root 100 Nov 20 17:27 ..
lrwxrwxrwx 1 root root 8 Nov 20 17:27 pci-0000:01:00.0-card -> ../card0
lrwxrwxrwx 1 root root 13 Nov 20 17:27 pci-0000:01:00.0-render -> ../renderD128

emby@arcubuntu2410:~$ clinfo
Number of platforms 1
Platform Name Intel(R) OpenCL Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_extended_bit_ops cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_linkonce_odr cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_khr_external_memory cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_bfloat16_conversions cl_intel_create_buffer_with_properties cl_intel_subgroup_local_block_io cl_intel_subgroup_matrix_multiply_accumulate cl_intel_subgroup_split_matrix_multiply_accumulate cl_khr_integer_dot_product cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing
cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info Platform Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0) cl_khr_device_uuid 0x400000 (1.0.0) cl_khr_fp16 0x400000 (1.0.0) cl_khr_global_int32_base_atomics 0x400000 (1.0.0) cl_khr_global_int32_extended_atomics 0x400000 (1.0.0) cl_khr_icd 0x400000 (1.0.0) cl_khr_local_int32_base_atomics 0x400000 (1.0.0) cl_khr_local_int32_extended_atomics 0x400000 (1.0.0) cl_intel_command_queue_families 0x400000 (1.0.0) cl_intel_subgroups 0x400000 (1.0.0) cl_intel_required_subgroup_size 0x400000 (1.0.0) cl_intel_subgroups_short 0x400000 (1.0.0) cl_khr_spir 0x400000 (1.0.0) cl_intel_accelerator 0x400000 (1.0.0) cl_intel_driver_diagnostics 0x400000 (1.0.0) cl_khr_priority_hints 0x400000 (1.0.0) cl_khr_throttle_hints 0x400000 (1.0.0) cl_khr_create_command_queue 0x400000 (1.0.0) cl_intel_subgroups_char 0x400000 (1.0.0) cl_intel_subgroups_long 0x400000 (1.0.0) cl_khr_il_program 0x400000 (1.0.0) cl_intel_mem_force_host_memory 0x400000 (1.0.0) cl_khr_subgroup_extended_types 0x400000 (1.0.0) cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0) cl_khr_subgroup_ballot 0x400000 (1.0.0) cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0) cl_khr_subgroup_shuffle 0x400000 (1.0.0) cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0) cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0) cl_intel_device_attribute_query 0x400000 (1.0.0) cl_khr_extended_bit_ops 0x400000 (1.0.0) cl_khr_suggested_local_work_size 0x400000 (1.0.0) cl_intel_split_work_group_barrier 0x400000 (1.0.0) cl_intel_spirv_media_block_io 0x400000 (1.0.0) cl_intel_spirv_subgroups 0x400000 (1.0.0) cl_khr_spirv_linkonce_odr 0x400000 (1.0.0) cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0) cl_intel_unified_shared_memory 0x400000 (1.0.0) cl_khr_mipmap_image 0x400000 (1.0.0) cl_khr_mipmap_image_writes 0x400000 (1.0.0) cl_ext_float_atomics 0x400000 (1.0.0) cl_khr_external_memory 0x9001 (0.9.1) cl_intel_planar_yuv 0x400000 (1.0.0) 
cl_intel_packed_yuv 0x400000 (1.0.0) cl_khr_int64_base_atomics 0x400000 (1.0.0) cl_khr_int64_extended_atomics 0x400000 (1.0.0) cl_khr_image2d_from_buffer 0x400000 (1.0.0) cl_khr_depth_images 0x400000 (1.0.0) cl_khr_3d_image_writes 0x400000 (1.0.0) cl_intel_media_block_io 0x400000 (1.0.0) cl_intel_bfloat16_conversions 0x400000 (1.0.0) cl_intel_create_buffer_with_properties 0x400000 (1.0.0) cl_intel_subgroup_local_block_io 0x400000 (1.0.0) cl_intel_subgroup_matrix_multiply_accumulate 0x400000 (1.0.0) cl_intel_subgroup_split_matrix_multiply_accumulate 0x400000 (1.0.0) cl_khr_integer_dot_product 0x800000 (2.0.0) cl_khr_gl_sharing 0x400000 (1.0.0) cl_khr_gl_depth_images 0x400000 (1.0.0) cl_khr_gl_event 0x400000 (1.0.0) cl_khr_gl_msaa_sharing 0x400000 (1.0.0) cl_intel_va_api_media_sharing 0x400000 (1.0.0) cl_intel_sharing_format_query 0x400000 (1.0.0) cl_khr_pci_bus_info 0x400000 (1.0.0) Platform Numeric Version 0xc00000 (3.0.0) Platform Extensions function suffix INTEL Platform Host timer resolution 1ns Platform External memory handle types DMA buffer Platform Name Intel(R) OpenCL Graphics Number of devices 1 Device Name Intel(R) Arc(TM) A380 Graphics Device Vendor Intel(R) Corporation Device Vendor ID 0x8086 Device Version OpenCL 3.0 NEO Device UUID 8680a556-0500-0000-0100-000000000000 Driver UUID 32342e33-352e-3033-3038-373200000000 Valid Device LUID No Device LUID f095-0036fc7f0000 Device Node Mask 0 Device Numeric Version 0xc00000 (3.0.0) Driver Version 24.35.030872 Device OpenCL C Version OpenCL C 1.2 Device OpenCL C all versions OpenCL C 0x400000 (1.0.0) OpenCL C 0x401000 (1.1.0) OpenCL C 0x402000 (1.2.0) OpenCL C 0xc00000 (3.0.0) Device OpenCL C features __opencl_c_int64 0xc00000 (3.0.0) __opencl_c_3d_image_writes 0xc00000 (3.0.0) __opencl_c_images 0xc00000 (3.0.0) __opencl_c_read_write_images 0xc00000 (3.0.0) __opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0) __opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0) __opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0) 
__opencl_c_atomic_scope_device 0xc00000 (3.0.0) __opencl_c_generic_address_space 0xc00000 (3.0.0) __opencl_c_program_scope_global_variables 0xc00000 (3.0.0) __opencl_c_work_group_collective_functions 0xc00000 (3.0.0) __opencl_c_subgroups 0xc00000 (3.0.0) __opencl_c_ext_fp32_global_atomic_add 0xc00000 (3.0.0) __opencl_c_ext_fp32_local_atomic_add 0xc00000 (3.0.0) __opencl_c_ext_fp32_global_atomic_min_max 0xc00000 (3.0.0) __opencl_c_ext_fp32_local_atomic_min_max 0xc00000 (3.0.0) __opencl_c_ext_fp16_global_atomic_load_store 0xc00000 (3.0.0) __opencl_c_ext_fp16_local_atomic_load_store 0xc00000 (3.0.0) __opencl_c_ext_fp16_global_atomic_min_max 0xc00000 (3.0.0) __opencl_c_ext_fp16_local_atomic_min_max 0xc00000 (3.0.0) __opencl_c_integer_dot_product_input_4x8bit 0xc00000 (3.0.0) __opencl_c_integer_dot_product_input_4x8bit_packed 0xc00000 (3.0.0) Latest conformance test passed v2024-02-27-00 Device Type GPU Device PCI bus info (KHR) PCI-E, 0000:01:00.0 Device Profile FULL_PROFILE Device Available Yes Compiler Available Yes Linker Available Yes Max compute units 128 Max clock frequency 2450MHz Device IP (Intel) 0x30e0005 (12.224.5) Device ID (Intel) 22181 Slices (Intel) 1 Sub-slices per slice (Intel) 8 EUs per sub-slice (Intel) 16 Threads per EU (Intel) 8 Feature capabilities (Intel) DP4A, DPAS Device Partition (core) Max number of sub-devices 0 Supported partition types None Supported affinity domains (n/a) Max work item dimensions 3 Max work item sizes 1024x1024x1024 Max work group size 1024 Preferred work group size multiple (device) 64 Preferred work group size multiple (kernel) 64 Max sub-groups per work group 128 Sub-group sizes (Intel) 8, 16, 32 Preferred / native vector sizes char 16 / 16 short 8 / 8 int 4 / 4 long 1 / 1 half 8 / 8 (cl_khr_fp16) float 1 / 1 double 0 / 0 (n/a) Half-precision Floating-point support (cl_khr_fp16) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support 
is emulated in software No Single-precision Floating-point support (core) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations Yes Double-precision Floating-point support (n/a) Address bits 64, Little-Endian External memory handle types DMA buffer Global memory size 6064541696 (5.648GiB) Error Correction support No Max memory allocation 3032270848 (2.824GiB) Unified memory for Host and Device No Shared Virtual Memory (SVM) capabilities (core) Coarse-grained buffer sharing Yes Fine-grained buffer sharing No Fine-grained system sharing No Atomics No Unified Shared Memory (USM) (cl_intel_unified_shared_memory) Host USM capabilities (Intel) USM access Device USM capabilities (Intel) USM access, USM atomic access Single-Device USM caps (Intel) USM access, USM atomic access Cross-Device USM caps (Intel) USM access, USM atomic access Shared System USM caps (Intel) (n/a) Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Preferred alignment for atomics SVM 64 bytes Global 64 bytes Local 64 bytes Atomic memory capabilities relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope Atomic fence capabilities relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope Max size for global variable 65536 (64KiB) Preferred total size of global vars 3032270848 (2.824GiB) Global Memory cache type Read/Write Global Memory cache size 4194304 (4MiB) Global Memory cache line size 64 bytes Image support Yes Max number of samplers per kernel 16 Max size for 1D images from buffer 189516928 pixels Max 1D or 2D image array size 2048 images Base address alignment for 2D image buffers 4 bytes Pitch alignment for 2D image buffers 4 pixels Max 2D image size 16384x16384 pixels Max planar YUV image 
size 16384x16128 pixels Max 3D image size 16384x16384x2048 pixels Max number of read image args 128 Max number of write image args 128 Max number of read/write image args 128 Pipe support No Max number of pipe args 0 Max active pipe reservations 0 Max pipe packet size 0 Local memory type Local Local memory size 65536 (64KiB) Max number of constant args 8 Max constant buffer size 3032270848 (2.824GiB) Generic address space support Yes Max size of kernel argument 2048 (2KiB) Queue properties (on host) Out-of-order execution Yes Profiling Yes Device enqueue capabilities (n/a) Queue properties (on device) Out-of-order execution No Profiling No Preferred size 0 Max size 0 Max queues on device 0 Max events on device 0 Device queue families ccs (1) Queue properties Out-of-order execution, Profiling Capabilities create single-queue events, create cross-queue events bcs (1) Queue properties Out-of-order execution, Profiling Capabilities create single-queue events, create cross-queue events Prefer user sync for interop Yes Profiling timer resolution 52ns Execution capabilities Run OpenCL kernels Yes Run native kernels No Non-uniform work-groups Yes Work-group collective functions Yes Sub-group independent forward progress No IL version SPIR-V_1.3 SPIR-V_1.2 SPIR-V_1.1 SPIR-V_1.0 ILs with version SPIR-V 0x403000 (1.3.0) SPIR-V 0x402000 (1.2.0) SPIR-V 0x401000 (1.1.0) SPIR-V 0x400000 (1.0.0) SPIR versions 1.2 printf() buffer size 4194304 (4MiB) Built-in kernels (n/a) Built-in kernels with version (n/a) Device Extensions cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue 
cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_extended_bit_ops cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_linkonce_odr cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_khr_external_memory cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_bfloat16_conversions cl_intel_create_buffer_with_properties cl_intel_subgroup_local_block_io cl_intel_subgroup_matrix_multiply_accumulate cl_intel_subgroup_split_matrix_multiply_accumulate cl_khr_integer_dot_product cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info Device Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0) cl_khr_device_uuid 0x400000 (1.0.0) cl_khr_fp16 0x400000 (1.0.0) cl_khr_global_int32_base_atomics 0x400000 (1.0.0) cl_khr_global_int32_extended_atomics 0x400000 (1.0.0) cl_khr_icd 0x400000 (1.0.0) cl_khr_local_int32_base_atomics 0x400000 (1.0.0) cl_khr_local_int32_extended_atomics 0x400000 (1.0.0) cl_intel_command_queue_families 0x400000 (1.0.0) cl_intel_subgroups 0x400000 (1.0.0) cl_intel_required_subgroup_size 0x400000 (1.0.0) cl_intel_subgroups_short 0x400000 (1.0.0) cl_khr_spir 0x400000 (1.0.0) cl_intel_accelerator 0x400000 (1.0.0) cl_intel_driver_diagnostics 0x400000 (1.0.0) cl_khr_priority_hints 0x400000 (1.0.0) cl_khr_throttle_hints 0x400000 (1.0.0) 
cl_khr_create_command_queue 0x400000 (1.0.0) cl_intel_subgroups_char 0x400000 (1.0.0) cl_intel_subgroups_long 0x400000 (1.0.0) cl_khr_il_program 0x400000 (1.0.0) cl_intel_mem_force_host_memory 0x400000 (1.0.0) cl_khr_subgroup_extended_types 0x400000 (1.0.0) cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0) cl_khr_subgroup_ballot 0x400000 (1.0.0) cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0) cl_khr_subgroup_shuffle 0x400000 (1.0.0) cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0) cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0) cl_intel_device_attribute_query 0x400000 (1.0.0) cl_khr_extended_bit_ops 0x400000 (1.0.0) cl_khr_suggested_local_work_size 0x400000 (1.0.0) cl_intel_split_work_group_barrier 0x400000 (1.0.0) cl_intel_spirv_media_block_io 0x400000 (1.0.0) cl_intel_spirv_subgroups 0x400000 (1.0.0) cl_khr_spirv_linkonce_odr 0x400000 (1.0.0) cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0) cl_intel_unified_shared_memory 0x400000 (1.0.0) cl_khr_mipmap_image 0x400000 (1.0.0) cl_khr_mipmap_image_writes 0x400000 (1.0.0) cl_ext_float_atomics 0x400000 (1.0.0) cl_khr_external_memory 0x9001 (0.9.1) cl_intel_planar_yuv 0x400000 (1.0.0) cl_intel_packed_yuv 0x400000 (1.0.0) cl_khr_int64_base_atomics 0x400000 (1.0.0) cl_khr_int64_extended_atomics 0x400000 (1.0.0) cl_khr_image2d_from_buffer 0x400000 (1.0.0) cl_khr_depth_images 0x400000 (1.0.0) cl_khr_3d_image_writes 0x400000 (1.0.0) cl_intel_media_block_io 0x400000 (1.0.0) cl_intel_bfloat16_conversions 0x400000 (1.0.0) cl_intel_create_buffer_with_properties 0x400000 (1.0.0) cl_intel_subgroup_local_block_io 0x400000 (1.0.0) cl_intel_subgroup_matrix_multiply_accumulate 0x400000 (1.0.0) cl_intel_subgroup_split_matrix_multiply_accumulate 0x400000 (1.0.0) cl_khr_integer_dot_product 0x800000 (2.0.0) cl_khr_gl_sharing 0x400000 (1.0.0) cl_khr_gl_depth_images 0x400000 (1.0.0) cl_khr_gl_event 0x400000 (1.0.0) cl_khr_gl_msaa_sharing 0x400000 (1.0.0) cl_intel_va_api_media_sharing 0x400000 (1.0.0) 
cl_intel_sharing_format_query 0x400000 (1.0.0) cl_khr_pci_bus_info 0x400000 (1.0.0) NULL platform behavior clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel(R) OpenCL Graphics clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [INTEL] clCreateContext(NULL, ...) [default] Success [INTEL] clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1) Platform Name Intel(R) OpenCL Graphics Device Name Intel(R) Arc(TM) A380 Graphics clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1) Platform Name Intel(R) OpenCL Graphics Device Name Intel(R) Arc(TM) A380 Graphics clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1) Platform Name Intel(R) OpenCL Graphics Device Name Intel(R) Arc(TM) A380 Graphics ICD loader properties ICD loader Name OpenCL ICD Loader ICD loader Vendor OCL Icd free software ICD loader Version 2.3.2 ICD loader Profile OpenCL 3.0

You need to go to a lower kernel, like 6.5.x, to get it to work.
yocker Posted December 9, 2024

5 hours ago, rotational467 said:

I just added an A310 to my Emby server and have been messing with this, initially because of HandBrake refusing to see the QSV engine. On Ubuntu 24.04, going from the 6.8 kernel to 6.11 (OEM kernel) did not help. Card firmware was flashed to the latest on a Windows box first. 24.3.4 iHD drivers, latest firmware, compute packages etc. from Intel's repo are installed. Neither the latest ffmpeg nightly nor HandBrake 1.9.0 (both built from source) can see the QSV engine for encoding. ffmpeg via VAAPI does work, and that looks like what Jellyfin is doing based on the screenshot above. It works on Emby as well. @DAVe3283 You need to set Tone Mapping Method to "Disabled" under Intel Quick Sync. The VAAPI setting is correct. edit - I don't know what Emby secret sauce is able to access QSV on the Arc for encoding; I can't get anything else to do it.

At least the difference between QSV and VAAPI should be almost nonexistent. So as long as VAAPI works, I wouldn't worry one bit.
Twistator Posted December 9, 2024

13 hours ago, rotational467 said:

I just added an A310 to my Emby server and have been messing with this, initially because of HandBrake refusing to see the QSV engine. On Ubuntu 24.04, going from the 6.8 kernel to 6.11 (OEM kernel) did not help. Card firmware was flashed to the latest on a Windows box first. 24.3.4 iHD drivers, latest firmware, compute packages etc. from Intel's repo are installed. Neither the latest ffmpeg nightly nor HandBrake 1.9.0 (both built from source) can see the QSV engine for encoding. ffmpeg via VAAPI does work, and that looks like what Jellyfin is doing based on the screenshot above. It works on Emby as well. @DAVe3283 You need to set Tone Mapping Method to "Disabled" under Intel Quick Sync. The VAAPI setting is correct. edit - I don't know what Emby secret sauce is able to access QSV on the Arc for encoding; I can't get anything else to do it.

How do you get this detailed transcoding overview? I'd like to see this too, to debug my setup better.
Luke Posted December 9, 2024

14 minutes ago, Twistator said:

How do you get this detailed transcoding overview? I'd like to see this too, to debug my setup better.

Hi, that's in the diagnostics plugin.
GWTPqZp6b Posted December 10, 2024

13 hours ago, Luke said:

Hi, that's in the diagnostics plugin.

@Luke sorry to need spoon-feeding here, but where is this enabled? I have the diagnostics plugin installed but I don't see this option.
guunter Posted December 10, 2024

1 hour ago, GWTPqZp6b said:

@Luke sorry to need spoon-feeding here, but where is this enabled? I have the diagnostics plugin installed but I don't see this option.

It's the detailed view for User Sessions in the Advanced section.
pear235 Posted January 20, 2025

Hello, was there any development on this subject? I'm having the same issue as described in the original post, tested with the current stable version and the .35 beta release. HW transcoding works fine for both VAAPI and Quick Sync, but without tone mapping. Tone mapping only works with the software method; neither OpenCL nor VAAPI works in hardware. I'm running Emby in a Proxmox LXC with the 6.8 kernel.

Edited January 20, 2025 by pear235
jhoff80 Posted January 22, 2025

It's kernel 6.8 and newer that are the issue. For now at least, install kernel 6.5 and it'll work with HW tone mapping.

Edited January 22, 2025 by jhoff80
yocker Posted January 22, 2025

1 hour ago, jhoff80 said:

It's kernel 6.8 and newer that are the issue. For now at least, install kernel 6.5 and it'll work with HW tone mapping.

Kernels newer than 6.8 should work, afaik.
jhoff80 Posted January 22, 2025

26 minutes ago, yocker said:

Kernels newer than 6.8 should work, afaik.

They don't. At least not in my experience in the exact same situation as the OP (A310 on Proxmox using kernel 6.11). The only thing that worked was 6.5.
yocker Posted January 22, 2025

1 hour ago, jhoff80 said:

They don't. At least not in my experience in the exact same situation as the OP (A310 on Proxmox using kernel 6.11). The only thing that worked was 6.5.

Okay, I will mark that down; I just went from info I found from googling. Nice to have it tested and confirmed. It might be Linux distro dependent though. I know that on Unraid the problem can be fixed with an extra command, but it doesn't seem to work in Ubuntu.
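For anyone wanting to check which side of that 6.8 line a machine is on, here is a rough sketch. The 6.8 threshold is only what was reported in this thread, not a confirmed root cause, and the helper name kernel_at_least is made up for illustration. It relies on `sort -V`, which GNU coreutils provides.

```shell
# kernel_at_least MIN [VERSION]: succeed if VERSION (default: the running
# kernel) is at or above MIN, comparing major.minor only.
# Hypothetical helper, not part of Emby or any distro tooling.
kernel_at_least() {
    min="$1"
    ver="${2:-$(uname -r)}"
    # Keep only the leading "major.minor", e.g. "6.11.0-9-generic" -> "6.11".
    ver=$(printf '%s\n' "$ver" | sed -E 's/^([0-9]+\.[0-9]+).*/\1/')
    # Version-sort both values; if MIN sorts first (or they are equal),
    # the kernel is at or above MIN.
    lowest=$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}

if kernel_at_least 6.8; then
    echo "Kernel $(uname -r) is 6.8 or newer (the range reported as affected in this thread)"
else
    echo "Kernel $(uname -r) is older than 6.8"
fi
```

A second argument lets you test a version string without booting it, e.g. `kernel_at_least 6.8 6.5.13-5-pve`.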
pear235 Posted January 23, 2025

I was actually able to get it to work on A310 / kernel 6.8 / LXC inside Proxmox by adding these parameters to the embyservice service file (the location can be taken from "systemctl status embyservice"; I'm not at home now, so I can't check it and don't remember it precisely):

Environment="NEOReadDebugKeys=1"
Environment="OverrideGpuAddressSpace=48"

I added those two under [Service] in the service file, then ran "systemctl daemon-reload", then rebooted for good measure. Now I have all Quick Sync options enabled in the transcoding tab, for both decoding and encoding, as well as OpenCL tone mapping for both QSV and VAAPI in the tone mapping tab.

Edited January 23, 2025 by pear235
Twistator Posted January 23, 2025

3 hours ago, pear235 said:

I was actually able to get it to work on A310 / kernel 6.8 / LXC inside Proxmox by adding these parameters to the embyservice service file (the location can be taken from "systemctl status embyservice"; I'm not at home now, so I can't check it and don't remember it precisely):

Environment="NEOReadDebugKeys=1"
Environment="OverrideGpuAddressSpace=48"

I added those two under [Service] in the service file, then ran "systemctl daemon-reload", then rebooted for good measure. Now I have all Quick Sync options enabled in the transcoding tab, for both decoding and encoding, as well as OpenCL tone mapping for both QSV and VAAPI in the tone mapping tab.

Thank you!! I added these 2 ENV variables as well and now tone mapping via Intel OpenCL seems to work!

Just as a friendly note, you shouldn't edit the supplied emby-server.service file, because it can get overwritten on updates. What you want to do is either create a copy with the edits here:

/etc/systemd/system/emby-server.service

OR (what I did) create a drop-in dir for the Emby service:

mkdir /etc/systemd/system/emby-server.service.d
nano /etc/systemd/system/emby-server.service.d/tonemap.conf

with this content:

[Service]
Environment="NEOReadDebugKeys=1"
Environment="OverrideGpuAddressSpace=48"

After that:

systemctl daemon-reload
systemctl restart emby-server

If you run systemctl status emby-server after that, it will list the tonemap.conf file as a drop-in it used to amend your service definition.

Edited January 23, 2025 by Twistator
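The drop-in steps above can be sketched as a small script. This is only a sketch: the function name write_dropin is made up for illustration, and the file name tonemap.conf, the directory, and the two variables are taken from the posts above. The function writes into whatever directory it is given, so it can be tried against a scratch directory first.

```shell
# write_dropin DIR: create DIR and write tonemap.conf into it with the two
# Environment lines from this thread. Hypothetical helper for illustration.
write_dropin() {
    dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/tonemap.conf" <<'EOF'
[Service]
Environment="NEOReadDebugKeys=1"
Environment="OverrideGpuAddressSpace=48"
EOF
}

# Real use, as root (not run here):
#   write_dropin /etc/systemd/system/emby-server.service.d
#   systemctl daemon-reload
#   systemctl restart emby-server
```

Writing a separate drop-in file rather than editing the packaged unit keeps the change in place across package updates, which is the point made in the post above.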
Lessaj Posted January 23, 2025

You can also do

sudo systemctl edit emby-server.service

and it will create /etc/systemd/system/emby-server.service.d/override.conf for you. Just put your extra lines in between the markers, as it indicates.
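Whichever way the drop-in gets created, one way to confirm that systemd actually picked up the variables is to look at the unit's Environment property. A minimal sketch, assuming the unit is named emby-server as in the posts above; check_env is a made-up helper that just inspects the text it receives on stdin.

```shell
# check_env: read the output of `systemctl show --property=Environment` on
# stdin and succeed only if both variables from this thread are present.
# Hypothetical helper for illustration.
check_env() {
    env_line=$(cat)
    case "$env_line" in
        *NEOReadDebugKeys=1*) ;;
        *) return 1 ;;
    esac
    case "$env_line" in
        *OverrideGpuAddressSpace=48*) ;;
        *) return 1 ;;
    esac
}

# Usage on the server (needs systemd; not run here):
#   systemctl show emby-server --property=Environment | check_env \
#       && echo "both variables are set for emby-server"
```

This only shows that the service is started with the variables; whether tone mapping then works still has to be checked in Emby itself.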
jhoff80 Posted January 24, 2025

This fixed it for me too, both on kernel 6.8 and kernel 6.11.
gcorgnet Posted January 29, 2025

Hi team, I'm getting a very similar issue when running Emby in Proxmox on a NUC8i7HNK (Hades Canyon) with an Intel CPU and an AMD GPU (AMD Radeon RX Vega M GL Graphics). It's the same issue: emby-ffmpeg is not able to load OpenCL, but the normal ffmpeg is.

I've tried the fix proposed here (setting NEOReadDebugKeys / OverrideGpuAddressSpace) but it doesn't do anything. Since I don't understand where these values come from, would I need to adjust them for AMD?

Edit: I've also downgraded my Proxmox kernel to 6.5.13-5-pve and the issue is still there.

Thanks

Edited January 29, 2025 by gcorgnet
Grimx00 Posted February 1, 2025

On 11/22/2024 at 9:09 PM, yocker said:

Intel Arc card drivers are bugged on Windows and Ubuntu and won't work with tone mapping. Plex and Jellyfin made a workaround somehow. On Linux (non-Ubuntu) you can avoid the bug by not using kernel 6.8. On Windows you can fix it by using an old driver (forgot the version). I sadly don't think there is much you can do unless Intel fixes their mess or the Emby team makes the same workaround as Plex and Jellyfin. I'm using an Arc card myself and I'm dreading the day Unraid updates their kernel. Who knows if the card will still work.

Thank you for that information! Did you ever figure out what driver I should "downgrade" to on Windows (using a mini PC with an Intel 125H)? For some reason VAAPI tone mapping is not available in the Emby settings; I'm not sure if there is something wrong with my setup or if it's not supported on my device for some reason. I've been tone mapping my media with HandBrake because I haven't been able to with Emby (same issue as OP).
yocker Posted February 1, 2025

2 hours ago, Grimx00 said:

Thank you for that information! Did you ever figure out what driver I should "downgrade" to on Windows (using a mini PC with an Intel 125H)? For some reason VAAPI tone mapping is not available in the Emby settings; I'm not sure if there is something wrong with my setup or if it's not supported on my device for some reason. I've been tone mapping my media with HandBrake because I haven't been able to with Emby (same issue as OP).

I can't remember what version it is, sorry. Try looking through the post history here on the Emby forum and you might find it.

To be 100% sure: do you mean that VAAPI not being available is the problem, or that tone mapping isn't working with QSV and you don't have VAAPI to fall back to?

That said, when I tested with my mini PC with a 13700H in it, I had no problems transcoding with Emby on Windows. Install the various Intel drivers (e.g. chipset drivers), install the Intel GFX drivers, install Emby, set Emby up to use QSV plus tone mapping, and off it went.