virgl caps - oops I messed.up

When I designed virgl I added a capability system to pass some info about the host GL to the guest driver along the lines of gallium caps. The design was at the virtio GPU level you have a number of capsets each of which has a max version and max size.

The virgl capset is capset 1 with max version 1 and size 308 bytes.

Until now we've happily been using version 1 at 308 bytes. Recently we decided we wanted to have a v2 at 380 bytes, and the world fell apart.

It turned out there is a bug in the guest kernel driver, it asks the host for a list of capsets and allows guest userspace to retrieve from it. The guest userspace has it's own copy of the struct.

The flow is:
Guest mesa driver gives kernel a caps struct to fill out for capset 1.
Kernel driver asks the host over virtio for latest capset 1 info, max size, version.
Host gives it the max_size, version for capset 1.
Kernel driver asks host to fill out malloced memory of the max_size with the
caps struct.
Kernel driver copies the returned caps struct to userspace, using the size of the returned host struct.

The bug is the last line, it uses the size of the returned host struct which ends up corrupting the guest in the scenario where the host has a capset 1 v2, size 380, but the host is still running old userspace which understands capset v1, size 308.

The 380 bytes gets memcpy over the 308 byte struct and boom.

Now we can fix the kernel to not do this, but we can't upgrade every kernel in an existing VM. So if we allow the virglrenderer process to expose a v2 all older sw will explode unless it is also upgraded which isn't really something you want in a VM world.

I came up with some virglrenderer workarounds, but due to another bug where qemu doesn't reset virglrenderer when it should, there was no way to make it reliable, and things like kexec old kernel from new kernel would blow up.

I decided in the end to bite the bullet and just make capset 2 be a repaired one. Unfortunately this needs patches in all 4 components before it can be used.

1) virglrenderer needs to expose capset 2 with the new version/size to qemu.
2) qemu needs to allow the virtio-gpu to transfer capset 2 as a virgl capset to the host.
3) The kernel on the host needs fixing to make sure we copy the minimum of the host caps and the guest caps into the guest userspace driver, then it needs to
provide a way that guest userspace knows the fixed version is in place.
4) The guest userspace needs to check if the guest kernel has the fix, and then query capset 2 first, and fallback to querying capset 1.

After talking to a few other devs in virgl land, they pointed out we could probably just never add a new version of capset 2, and grow the struct endlessly.

The guest driver would fill out the struct it wants to use with it's copy of default minimum values.
It would then call the kernel ioctl to copy over the host caps. The kernel ioctl would copy the minimum size of the host caps and the guest caps.

In this case if the host has a 400 byte capset 2, and the guest still only has 380 byte capset 2, the new fields from the host won't get copied into the guest struct
and it will be fine.

If the guest has the 400 byte capset 2, but the host only has the 380 byte capset 2, the guest would preinit the extra 20 bytes with it's default values (0 or whatever) and the kernel would only copy 380 bytes into the start of the 400 bytes and leave the extra bytes alone.

Now I just have to got write the patches and confirm it all.

Thanks to Stephane at google for creating the patch that showed how broken it was, and to others in the virgl community who noticed how badly it broke old guests! Now to go write the patches...

Comments

Popular posts from this blog

On Rust, Linux, developers, maintainers

radv: vulkan av1 video decode status

anv: vulkan av1 decode status