Posts

LPC 2022 GPU BOF (user console and cgroups)

At LPC 2022 we had a BoF session for GPUs with two topics.

Moving to userspace consoles: currently most mainline distros still use the kernel console, provided by the VT subsystem. We'd like to move to CONFIG_VT=n, as the console and VT subsystem have historically been a source of bugs and are also nasty places for locking etc. They can also cause an oops to go missing when they take out the panic path, with locking bugs stopping other paths (like pstore or serial) from completely processing the oops.

The session started by discussing what things would look like. Lennart gave a great summary of the work David did a few years ago and the train of thought involved. Once you think through all the paths and things you want supported, you realise the best user console is going to be one that supports emojis and non-Latin scripts. This probably means you want a lightweight wayland compositor running a fullscreen VTE-based terminal. Working back from the consequences of this means you probably…

LPC 2022 Accelerators BOF outcomes summary

At Linux Plumbers Conference 2022, we held a BoF session around accelerators. This is a summary made from memory and notes taken by John Hubbard.

We started by defining categories of accelerator devices:

1. single-shot data processors, which submit one-off jobs to a device (simpler image processors)
2. single-user, single-task offload devices (ML training devices)
3. multi-app devices (GPUs, ML/inference execution engines)

One of the main points made is that common device frameworks are normally about targeting a common userspace (e.g. mesa for GPUs). Since a common userspace doesn't exist for accelerators, this presents the problem of what sort of common things can be targeted. There was discussion about tensorflow and pytorch being the userspace, but also camera image processing and OpenCL. OpenXLA was also named as a userspace API that might be of interest as a target for implementations.

There was a discussion on what to call the subsystem and where to place it in the tree. It was agreed…
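Purely as an illustration (nothing like this was settled at the BoF), the three categories could be captured as a C enum; the type and the names here are hypothetical:

    /* Hypothetical illustration only: nothing like this was agreed at
     * the BoF. One way to express the three accelerator categories. */
    enum accel_device_class {
        ACCEL_CLASS_SINGLE_SHOT,   /* one-off job submission: simpler image processors */
        ACCEL_CLASS_SINGLE_USER,   /* single-user, single-task offload: ML training */
        ACCEL_CLASS_MULTI_APP,     /* multi-application: GPUs, inference engines */
    };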

lavapipe Vulkan 1.3 conformant

The software Vulkan renderer in Mesa, lavapipe, achieved official Vulkan 1.3 conformance. The official entry in the table is here. We can now remove the nonconformant warning from the driver. Thanks to everyone involved!

lavapipe Vulkan 1.2 conformant

The software Vulkan renderer in Mesa, lavapipe, achieved official Vulkan 1.2 conformance. The non-obvious entry in the table is here. Thanks to all the Mesa team who helped achieve this. Shout-outs to Mike of Zink fame, who drove a bunch of pieces over the line, and to Roland, who helped review some of the funkier changes.

We will be submitting 1.3 conformance soon, just a few things to iron out.

optimizing llvmpipe vertex/fragment processing

Around 2 years ago, while I was working on tessellation support for llvmpipe and running the heaven benchmark on my Ryzen, I noticed that heaven, despite running slowly, wasn't saturating all the cores. I dug in a bit and found that llvmpipe, despite threading the rasterization, fragment shading and blending stages, never did anything else while those were happening. I dug into the code, as I clearly remembered seeing a concept of a "scene" that all the primitives were binned into and then dispatched from. It turned out the "scene" was always executed synchronously.

At the time I wrote support to allow multiple scenes to exist, so that while one scene was executing, the vertex shading and binning for the next scene could run and be queued up. For heaven at the time I saw some places where it would build 36 scenes. However heaven was still 1 fps with tess, and regressions in other areas were rampant, so I mostly left the changes in a branch. The reasons so many things were…
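The overlap idea is essentially a producer/consumer pipeline: the vertex/bin stage produces scenes into a bounded queue while rasterization consumes them. Here is a minimal standalone sketch of that idea; it is not llvmpipe's actual code, and the names, queue depth and pthread plumbing are all my own for illustration:

    /* Minimal sketch of the "multiple scenes in flight" idea: one thread
     * does vertex shading + binning and queues completed scenes, another
     * thread rasterizes them. Not llvmpipe's actual code. */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_SCENES 4                 /* scenes allowed in flight at once */

    struct scene { int id; };            /* binned primitive lists would live here */

    static struct scene queue[MAX_SCENES];
    static int head, tail, count;
    static bool done;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;

    /* Producer side: vertex shading/binning builds the next scene even
     * while earlier scenes are still being rasterized. */
    static void queue_scene(struct scene s)
    {
        pthread_mutex_lock(&lock);
        while (count == MAX_SCENES)
            pthread_cond_wait(&not_full, &lock);
        queue[tail] = s;
        tail = (tail + 1) % MAX_SCENES;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }

    /* Consumer side: rasterization/fragment shading/blending of scenes. */
    static void *rasterize_scenes(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (count == 0 && !done)
                pthread_cond_wait(&not_empty, &lock);
            if (count == 0) {            /* done and fully drained */
                pthread_mutex_unlock(&lock);
                return NULL;
            }
            struct scene s = queue[head];
            head = (head + 1) % MAX_SCENES;
            count--;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&lock);

            printf("rasterizing scene %d\n", s.id);   /* stand-in for real work */
        }
    }

    int main(void)
    {
        pthread_t raster;
        pthread_create(&raster, NULL, rasterize_scenes, NULL);

        for (int i = 0; i < 36; i++)     /* e.g. the 36 scenes heaven built */
            queue_scene((struct scene){ .id = i });

        pthread_mutex_lock(&lock);
        done = true;
        pthread_cond_broadcast(&not_empty);
        pthread_mutex_unlock(&lock);

        pthread_join(raster, NULL);
        return 0;
    }

The bounded queue is the point: scene building for the next frame proceeds while the current scene is still being rasterized, which is what lets several scenes stack up in flight.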

video decode: crossing the streams

I was interested in how much work a vaapi-on-top-of-vulkan-video proof of concept would be. My main reason for being interested is actually video encoding: there is no good vulkan video encoding demo yet, and I'm not experienced enough in the area to write one, but I can hack stuff. I think it is probably easier to hack vaapi encode onto vulkan video encode than to write a demo app myself.

With that in mind, I decided to see what decode would look like first. I talked to Mike B (the most famous zink author) before he left for holidays, then I ignored everything he told me and wrote a super hack. This morning I convinced zink vaapi on top of anv, with iris GL doing the presents in mpv, to show me some useful frames of video. However, zink vaapi on anv with zink GL is failing miserably (well, green jellyfish). I'm not sure how much more I'll push on the decode side at this stage; I really wanted it to validate the driver-side code, and I've found a few bugs in there already. The WIP hack…

h264: more AMD hw worked on

Previously I mentioned having AMD VCN h264 support. Today I added initial support for the older UVD engine[1]. This is found on chips from Vega back to SI. I've only tested it on my Vega so far.

I also worked out the "correct" answer to how to send the reset command properly; however, the nvidia player I'm using as a demo doesn't do things that way yet, so I've forked it for now[2]. The answer is to use vkCmdControlVideoCodingKHR to send a reset the first time a session is used. However, I can't see how the app is meant to know this is necessary, but I've asked the appropriate people.

The initial anv branch I mentioned last week is now here[3].

[1] https://gitlab.freedesktop.org/airlied/mesa/-/commits/radv-vulkan-video-uvd-h264
[2] https://github.com/airlied/vk_video_samples/tree/radv-fixes
[3] https://gitlab.freedesktop.org/airlied/mesa/-/tree/anv-vulkan-video-prelim-decode
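For reference, here is a minimal sketch of what issuing that reset looks like; it is not the player's actual code, the command buffer and VkVideoBeginCodingInfoKHR are assumed to be set up elsewhere, and the video extensions are currently provisional, hence VK_ENABLE_BETA_EXTENSIONS:

    /* Sketch only: issuing a session reset via vkCmdControlVideoCodingKHR
     * inside a begin/end video coding scope, the first time a session is
     * used. cmd_buf and begin_info come from the app's own setup; the KHR
     * entry points may also need resolving via vkGetDeviceProcAddr. */
    #define VK_ENABLE_BETA_EXTENSIONS
    #include <vulkan/vulkan.h>

    static void reset_video_session(VkCommandBuffer cmd_buf,
                                    const VkVideoBeginCodingInfoKHR *begin_info)
    {
        vkCmdBeginVideoCodingKHR(cmd_buf, begin_info);

        /* The reset itself. */
        const VkVideoCodingControlInfoKHR control = {
            .sType = VK_STRUCTURE_TYPE_VIDEO_CODING_CONTROL_INFO_KHR,
            .flags = VK_VIDEO_CODING_CONTROL_RESET_BIT_KHR,
        };
        vkCmdControlVideoCodingKHR(cmd_buf, &control);

        const VkVideoEndCodingInfoKHR end_info = {
            .sType = VK_STRUCTURE_TYPE_VIDEO_END_CODING_INFO_KHR,
        };
        vkCmdEndVideoCodingKHR(cmd_buf, &end_info);
    }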