Posts

vulkan video decoding: anv status update

After hacking the Intel media-driver and ffmpeg I managed to work out how the anv hardware mostly works now for h264 decoding. I've pushed a branch [1] and a MR[2] to mesa. The basics of h264 decoding are working great on gen9 and compatible hardware. I've tested it on my one Lenovo WhiskeyLake laptop. I have ported the code to hasvk as well, and once we get moving on this I'll polish that up and check we can h264 decode on IVB/HSW devices. The one feature I know is missing is status reporting, radv can't support that from what I can work out due to firmware, but anv should be able to so I might dig into that a bit. [1] https://gitlab.freedesktop.org/airlied/mesa/-/tree/anv-vulkan-video-decode [2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20782

vulkan video decoding: av1 (yes av1) status update

Needless to say h264/5 weren't my real goals in life for video decoding. Lynne and myself decided to see what we could do to drive AV1 decode forward by creating our own extensions called VK_MESA_video_decode_av1. This is a radv only extension so far, and may expose some peculiarities of AMD hardware/firmware. Lynne's blog entry[1] has all the gory details, so go read that first. (really read it first). Now that you've read and understood all that, I'll just rant here a bit. Figuring out the DPB management and hw frame ref and curr_pic_idx fields was a bit of a nightmare. I spent a few days hacking up a lot of wrong things before landing on the thing we agreed was the least wrong which was having the ffmpeg code allocate a frame index in the same fashion as the vaapi radeon implementation did. I had another hacky solution that involved overloading the slotIndex value to mean something that wasn't DPB slot index, but it wasn't really any better. I think there may

vulkan video encoding: radv update

After the video decode stuff was fairly nailed down, Lynne from ffmpeg nerdsniped^Wtalked me into looking at h264 encoding. The AMD VCN encoder engine is a very different interface to the decode engine and required a lot of code porting from the radeon vaapi driver. Prior to Xmas I burned a few days on typing that all in, and yesterday I finished typing and moved to debugging the pile of trash I'd just typed in. Lynne meanwhile had written the initial ffmpeg side implementation, and today we threw them at each other, and polished off a lot of sharp edges. We were rewarded with valid encoded frames. The code at this point is only doing I-frame encoding, we will work on P/B frames when we get a chance. There are also a bunch of hacks and workarounds for API/hw mismatches, that I need to consult with Vulkan spec and AMD teams about, but we have a good starting point to move forward from. I'll also be offline for a few days on holidays so I'm not sure it will get much further u

vulkan video decoding: anv status

After cleaning up the radv stuff I decided to go back and dig into the anv support for H264. The current status of this work is in a branch[1]. This work is all against the current EXT decode beta extensions in the spec. This contains an initial implementation of H264 Intel GPUs that anv supports. I've only tested it on Kabylake equivalents so far. It decodes some of the basic streams I've thrown at it from ffmpeg. Now this isn't as far along as the AMD implementation, but I'm also not sure I'm programming the hardware correctly. The Windows DXVA API has 2 ways to decode H264, short and long. I believe but I'm not 100% sure the current Vulkan API is quite close to "short", but the only Intel implementations I've found source for are for "long". I've bridged this gap by writing a slice header parser in mesa, but I think the hw might be capable of taking over that task, and I could in theory dump a bunch of code. But the programming gu

vulkan video decoding: radv status

I've been working the past couple of weeks with an ffmpeg developer (Lynne) doing Vulkan video decode bringup on radv. The current status of this work is in a branch[1]. This work is all against the current EXT decode beta extensions in the spec. Khronos has released the final specs for these extensions. This work is rebased onto the final KHR form and is in a merge request for radv[2]. This contains an initial implementation of H264 and H265 decoding for AMD GPUs from TONGA to NAVI2x. It passes the basic conformance tests but fails some of the more complicated ones, but it has decoded the streams we've been throwing at it using ffmpeg. Building: git clone https://gitlab.freedesktop.org/airlied/mesa git checkout radv-vulkan-video-prelim-decode mkdir build meson build -Dvulkan-beta=true -Dvulkan-drivers=amd -Dvideo-codecs=h264dec,h265dec --prefix=<prefix> cd build ninja ninja install Running: export VK_ICD_FILENAMES=<prefix>/share/vulkan/icd.d/radeon_icd.x86_64.json

LPC 2022 GPU BOF (user console and cgroups)

 At LPC 2022 we had a BoF session for GPUs with two topics. Moving to userspace consoles: Currently most mainline distros still use the kernel console, provided by the VT subsystem. We'd like to move to CONFIG_VT=n as the console and vt subsystem have historically been a source of bugs but are also nasty places for locking etc. It also can be the cause of oops going missing when it takes out the panic path with locking bugs stopping other paths from completely processing the oops (like pstore or serial).  The session started discussing what things would like. Lennart gave a great summary of the work David did a few years ago and the train of thought involved. Once you think through all the paths and things you want supported, you realise the best user console is going to be one that supports emojis and non-Latin scripts. This probably means you want a lightweight wayland compositor running a fullscreen VTE based terminal. Working back from the consequences of this means you probabl

LPC 2022 Accelerators BOF outcomes summary

 At Linux Plumbers Conference 2022, we held a BoF session around accelerators. This is a summary made from memory and notes taken by John Hubbard. We started with defining categories of accelerator devices. 1. single shot data processors, submit one off jobs to a device. (simpler image processors) 2. single-user, single task offload devices (ML training devices) 3. multi-app devices (GPU, ML/inference execution engines) One of the main points made is that common device frameworks are normally about targeting a common userspace (e.g. mesa for GPUs). Since a common userspace doesn't exist for accelerators, this presents a problem of what sort of common things can be targetted. Discussion about tensorflow, pytorch as being the userspace, but also camera image processing and OpenCL. OpenXLA was also named as a userspace API that might be of interest to use as a target for implementations.  There was a discussion on what to call the subsystem and where to place it in the tree. It was ag