lavapipe and sparse memory bindings: part two

Thanks for all the suggestions, on here, on twitter and on mastodon; anyone who noted I could use a single fd and avoid all the pain was correct! I hacked up an ever-growing ftruncate/madvise memfd and it seemed to work fine. In order to use it for sparse I have to use it for all device memory allocations in lavapipe, which means if I push forward I probably have to prove to myself that it works and scales a bit better. I suspect layering some of the pb bufmgr code on top of an ever-growing fd might work, or maybe just having multiple 2GB buffers might be enough.

Not sure how best to do shaderResourceResidency. userfaultfd might be somewhat useful; mapping with PROT_NONE and then using write(2) to get a -EFAULT is also promising, but I'm not sure how best to avoid segfaults for reads/writes to PROT_NONE regions.

Once I got that going, though, I ran headfirst into something that should have been obvious to me, but I hadn't thought through: llvmpipe allocates all its textures l

lavapipe and sparse memory bindings

Mike nerdsniped me into wondering how hard sparse memory support would be in lavapipe. The answer is, unfortunately, extremely.

Sparse binding essentially allows creating a vulkan buffer/image of a certain size, then plugging in chunks of memory to back it, in page-size-multiple chunks. This works great with GPU APIs, where the interfaces were designed for it, but it's actually hard to pull off on the CPU. Currently lavapipe allocates memory with an aligned malloc. It allocates objects with no backing, and non-sparse bindings connect objects to the malloced memory. With sparse objects, however, object creation should allocate a chunk of virtual memory space, and sparse binding should then bind allocated device memory into that virtual memory space. Except Linux has no interfaces for doing this without using a file descriptor: you can't mmap a chunk of anonymous memory that you allocated with malloc to another location. So if I malloc backing memory A at 0x1234000, but the virtual memory I've

Fedora 38 LLVM vs Team Fortress 2 (TF2)

F38 just released, and I'm seeing a bunch of people complain that TF2 dies on AMD or other platforms when lavapipe is installed. Who's at fault? I've no real idea. How to fix it? I've no real idea. What's happening?

AMD OpenGL drivers use LLVM as the backend compiler. Fedora 38 updated to LLVM 16. LLVM 16 is built with C++17 by default. C++17 introduces new "operator new/delete" interfaces[1]. TF2 ships with its own implementation of tcmalloc; tcmalloc expects to replace all the new/delete interfaces, but the version in TF2 must not support, or has incorrect support for, the new aligned interfaces. What happens is when TF2 probes OpenGL and LLVM is loaded, when DenseMap initializes, one "new" path fails to go into tcmalloc, but the "delete" path does, and this causes tcmalloc to explode with "src/] Attempt to free invalid pointer". Fixing it? I'll talk to Valve and see if we can work out something, LLVM 16

nouveau/gsp + kernel module firmware selection for initramfs generation

There are plans for nouveau to support using the NVIDIA-supplied GSP firmware in order to support new hardware going forward. The nouveau project doesn't have any input or control over the firmware. NVIDIA have made no promises around a stable ABI or firmware versioning. The current status quo is that NVIDIA will release versioned, signed GSP firmwares as part of their driver distribution packages, version-locked to their proprietary drivers (open source and binary). They are working towards allowing these firmwares to be redistributed in linux-firmware. The NVIDIA firmwares are quite large.

The nouveau project will control the selection of which versions of the released firmwares are supported by the driver; it's likely a newer firmware will only be pulled into linux-firmware for:

- new hardware support (a new GPU family or GPU)
- a security fix in the firmware
- a new feature that is required to be supported

This should at least limit the number of firmwares in the linu

vulkan video vp9 decode - radv update

While going over the AV1 work, a few people commented on the lack of VP9, and a few said it would be an easier place to start, etc. Daniel Almeida at Collabora took a first pass at writing the spec up, and I decided to go ahead and take it to a working-demo level. Lynne was busy, and they'd already said it should take an afternoon, so I decided to have a go at writing the ffmpeg side for it as well as finishing off Daniel's radv code. About 2 mins before I finished for the weekend on Friday, I got a single frame to decode, and this morning I finished off the rest to get at least 2 test videos I downloaded to work. Branches are at [1] and [2]. There is only 8-bit support so far, and I suspect some cleaning up is required. [1] [2]

vulkan video: status update (anv + radv)

Okay, just a short status update.

radv H264/H265 decode: The radv h264/h265 support has been merged to the mesa main branch. It is still behind the RADV_PERFTEST=video_decode flag, and should work for the basics from VI/GFX8+. It still has not passed all the CTS tests.

anv H264 decode: The anv h264 decode support has been merged to the mesa main branch. It has been tested from Skylake up to DG2. It has no enable flag; just make sure to build with h264dec video-codec support. It passes all current CTS tests.

hasvk H264 decode: I ported the anv h264 decoder to hasvk, the vulkan driver for Ivybridge/Haswell. This is in a draft MR ( HASVK H264 ). I haven't given this much testing yet, but it has worked in the past. I'll get to testing it before trying to get it merged.

radv AV1 decode: I created an MR for spec discussion ( radv av1 ). I've also cleaned up the radv AV1 decode code.

anv AV1 decode: I've started on anv AV1 decode support for DG2. I've gotten one very simple frame to decode. I wil

vulkan video decoding: anv status update

After hacking on the Intel media-driver and ffmpeg, I managed to work out how the anv hardware mostly works now for h264 decoding. I've pushed a branch [1] and an MR [2] to mesa. The basics of h264 decoding are working great on gen9 and compatible hardware. I've tested it on my one Lenovo WhiskeyLake laptop. I have ported the code to hasvk as well, and once we get moving on this I'll polish that up and check that we can h264 decode on IVB/HSW devices. The one feature I know is missing is status reporting; radv can't support that, from what I can work out, due to firmware, but anv should be able to, so I might dig into that a bit. [1] [2]