A little while back I took to wondering why one particular demo from the Sascha Willems vulkan demos was a lot slower on radv compared to amdgpu-pro. Like half the speed slow.
I internally titled this my "no fps left behind" project.
The deferred demo, does an offscreen rendering to 3 2048x2048 color attachments and one 2048x2048 D32S8 depth attachment. It then does a rendering using those down to as 1280x720 screen image.
Bas identifed the first cause was probably the fact we were doing clear color eliminations on the offscreen surfaces when we didn't need to. AMD GPU have a delta-color compression feature, and with certain clear values you don't need to do the clear color eliminations step. This brought me back from about 1/2 the FPS to about 3/4, however it took me quite a while to figure out where the rest of the FPS were hiding.
I took a few diversions in my testing, I pulled in some experimental patches to allow the depth buffer to be texture cache compatible, …
I recently acquired an r7 360 (BONAIRE) and spent some time getting radv stable and passing the same set of conformance tests that VI and Polaris pass.
The main missing thing was 10-bit integer format clamping for a bug in the SI/CIK fragment shader output hardware, where it truncates instead of clamps. The other missing piece was code for handling f16->f32 conversions according to the vulkan spec that I'd previously fixed for VI.
I also looked at a trace from amdgpu-pro and noticed it was using a ds_swizzle for the derivative calculations which avoided accessing LDS memory. I wrote support to use this path for radv/radeonsi since LLVM supported the intrinsic for a while now.
With these fixed CIK is pretty much in the same place as VI/Polaris.
I then plugged in my SI (Tahiti), and got lots of GPU hangs and crashes. I fixed a number of SI specific bugs (tiling and MSAA handling, stencil tiling). However even with those fixed I was getting random hangs, and a bunch of people on a…