lavapipe and sparse memory bindings

Mike nerdsniped me into wondering how hard sparse memory support would be in lavapipe.

The answer is unfortunately extremely.

Sparse binding essentially allows creating a Vulkan buffer/image of a certain size, then plugging in memory to back it, in chunks that are multiples of the page size.

This works great with GPU APIs, where it was designed in from the start, but it's actually hard to pull off on the CPU.

Currently lavapipe allocates memory with an aligned malloc. Objects are created with no backing, and non-sparse bindings connect them to the malloced memory.

However, with sparse objects, object creation should allocate a chunk of virtual address space, and sparse binding should then bind allocated device memory into that space. Except Linux has no interface for doing this without using a file descriptor.

You can't mmap a chunk of anonymous memory that you allocated with malloc to another location. So if I malloc backing memory A at 0x1234000, but the virtual memory I've used for the object is at 0x4321000, there's no nice way to get the memory from the malloc to be available at the new location (unless I missed an API).

However, you can do it with file descriptors. You can mmap a PROT_NONE area for the sparse object, allocate the backing memory as file descriptors, then mmap regions of those file descriptors into the correct places.

But there are limits on file descriptors: by default you get a soft limit of 1024 and a hard limit of 4096, which is woefully low for this. Also, *all* device memory allocations would need to be fd backed, not just the ones that end up being used in sparse bindings.

Vulkan has a limit, maxMemoryAllocationCount, that could be used for this, but setting it to the fd limit is a problem because some fds are already in use by the application and by normal operations in general. Reporting 4096 for it is probably going to explode if you only have 3900 of them left.
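A rough sketch of what a more cautious number might look like, assuming Linux with `/proc` mounted: take the soft limit, subtract the fds already open, and keep some headroom. The function name `fd_budget` and the `headroom` and `cap` parameters are invented for illustration, with `cap` defaulting to the 4096 figure above.

```python
import os
import resource


def fd_budget(headroom=256, cap=4096):
    """Estimate how many fd-backed allocations could be advertised right now."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return cap
    # Count fds currently open in this process. The listing itself briefly
    # uses one fd, so this overcounts by one, which errs on the safe side.
    in_use = len(os.listdir("/proc/self/fd"))
    return max(0, min(cap, soft - in_use - headroom))
```

This is only a snapshot taken at query time. The application can open more files afterwards, so the reported value can still turn out to be too optimistic, which is exactly the problem described above.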

Also the sparse CTS tests don't respect the maxMemoryAllocationCount anyways :-)

I shall think on this a bit more, please let me know if anyone has any good ideas!


  1. Do you need to keep the fd open after you mmap it into the target region?

  2. You can't fix it in software. You should not fix it in software.
    It's an infrastructure problem; the only way to fix it is to force downstream distributions to raise their outdated limits on NOFILE.
    Databases need a lot more than 1024 file descriptors under any kind of load. Their installation instructions demand an increase in nofile, and they crash when the limit is reached.
    KDE Plasma needs more than 1024 file descriptors. It behaves quite funny when that limit is reached, and debugging it was quite a pain for me.
    Steam games need more than 1024 open file descriptors.

    Crash MESA/LAVAPIPE when the number of allowed file descriptors is less than 2^14.
    Write loud logs EVERYWHERE.
    Document the need in five different places.
    Force downstream distros to get with the times and bump nofile in limits.conf or DefaultLimitNOFILE in systemd.
    Deliver us from evil.

  3. Interesting. I don't know much about Vulkan (my graphics API knowledge is mostly OpenGL, and mostly from the user side rather than the driver side), but based on your description, I'd think you could keep a _single_ fd (or a small pool, if that helps with performance in some way) opened with shm_open and allocate out of that using internal bookkeeping. Use ftruncate to extend or contract from the end, and madvise with MADV_REMOVE to deallocate from the middle—I've never done the latter personally, but the man page says it punches an appropriate hole in the backing store and is supported by tmpfs. Does that sound workable, or am I missing something?

