lavapipe Vulkan 1.2 conformant

The software Vulkan renderer in Mesa, lavapipe, achieved official Vulkan 1.2 conformance. The non obvious entry in the table is  here . Thanks to all the Mesa team who helped achieve this, Shout outs to Mike of Zink fame who drove a bunch of pieces over the line, Roland who helped review some of the funkier changes.  We will be submitting 1.3 conformance soon, just a few things to iron out.

optimizing llvmpipe vertex/fragment processing.

Around 2 years ago while I was working on tessellation support for llvmpipe, and running the heaven benchmark on my Ryzen, I noticed that heaven despite running slowly wasn't saturating all the cores. I dug in a bit, and found that llvmpipe despite threading rasterization, fragment shading and blending stages, never did anything else while those were happening. I dug into the code as I clearly remembered seeing a concept of a "scene" where all the primitives were binned into and then dispatched. It turned out the "scene" was always executed synchronously. At the time I wrote support to allow multiple scenes to exist, so while one scene was executing the vertex shading and binning for the next scene could execute, and it would be queued up. For heaven at the time I saw some places where it would build 36 scenes. However heaven was still 1fps with tess, and regressions in other areas were rampant, and I mostly left them in a branch. The reasons so many things were

video decode: crossing the streams

I was interested in how much work a vaapi on top of vulkan video proof of concept would be. My main reason for being interested is actually video encoding, there is no good vulkan video encoding demo yet, and I'm not experienced enough in the area to write one, but I can hack stuff. I think it is probably easier to hack a vaapi encode to vulkan video encode than write a demo app myself. With that in mind I decided to see what decode would look like first. I talked to Mike B (most famous zink author) before he left for holidays, then I ignored everything he told me and wrote a super hack. This morning I convinced zink vaapi on top anv with iris GL doing the presents in mpv to show me some useful frames of video. However zink vaapi on anv with zink GL is failing miserably (well green jellyfish). I'm not sure how much more I'll push on the decode side at this stage, I really wanted it to validate the driver side code, and I've found a few bugs in there already. The WIP hac

h264: more AMD hw worked on

Previously I mentioned having AMD VCN h264 support. Today I added initial support for the older UVD engine[1]. This is found on chips from Vega back to SI. I've only tested it on my Vega so far. I also worked out the "correct" answer to the how to I send the reset command correctly, however the nvidia player I'm using as a demo doesn't do things that way yet, so I've forked it for now[2]. The answer is to use vkCmdControlVideoCodingKHR to send a reset the first type a session is used. However I can't see how the app is meant to know this is necessary, but I've asked the appropriate people. The initial anv branch I mentioned last week is now here[3]. [1] [2] [3]

h264 video decoding: i-frames strike back

Last week I mentioned I had the basics of h264 decode using the proposed vulkan video on radv. This week I attempted to do the same thing with Intel's Mesa vulkan driver "anv". Now I'd previously unsuccessfully tried to get vaapi on crocus working but got sidetracked back into other projects. The Intel h264 decoder hasn't changed a lot between ivb/hsw/gen8/gen9 era. I ported what I had from crocus to anv and started trying to get something to decode on my WhiskeyLake. I wrote the code pretty early on, figured out all the things I had to send the hardware. The first anv side bridge to cross was Vulkan is doing H264 Picture level decode API, so it means you get handed the encoded slice data. However to program the Intel hw you need to decode the slice header. I wrote a slice header decoder in some common code. The other thing you need to give the intel hw is a number of bits of slice header, which in some encoding schemes is rounded to bytes and in some isn't. S

What do you know about video decoding/encoding?

A few weeks ago I watched Victor's excellent talk on Vulkan Video . This made me question my skills in this area. I'm pretty vague on video processing hardware, I really have no understanding of H264 or any of the standards. I've been loosely following the Vulkan video group inside of Khronos, but I can't say I've understood it or been useful. radeonsi has a gallium vaapi driver, that talks to firmware driver encoder on the hardware, surely copying what it is programming can't be that hard. I got an mpv/vaapi setup running and tested some videos on that setup just to get comfortable. I looked at what sort of data was being pushed about. The thing is the firmware is doing all the work here, the driver is mostly just responsible for taking semi-parsed h264 bitstream data structures and giving them in memory buffers to the fw API. Then the resulting decoded image should be magically in a buffer. I then got the demo nvidia video decoder application mentioned in Vict

crocus misrendering of the week

 I've been chasing a crocus misrendering bug show in a qt trace. The bottom image is crocus vs 965 on top. This only happened on Gen4->5, so Ironlake and GM45 were my test machines. I burned a lot of time trying to work this out. I trimmed the traces down, dumped a stupendous amount of batchbuffers, turned off UBO push constants, dump all the index and vertex buffers, tried some RGBx changes, but nothing was rushing to hit me, except that the vertex shaders produced were different. However they were different for many reasons, due to the optimization pipelines the mesa state tracker runs vs the 965 driver. Inputs and UBO loads were in different places so there was a lot of noise in the shaders. I ported the trace to a piglit GL application so I could easier hack on the shaders and GL, with that I trimmed it down even further (even if I did burn some time on a misplace */+ typo). Using the ported app, I removed all uniform buffer loads and then split the vertex shader in half (it