LPC 2022 GPU BOF (user console and cgroups)
At LPC 2022 we had a BoF session for GPUs with two topics.
Moving to userspace consoles:
Currently most mainline distros still use the kernel console, provided by the VT subsystem. We'd like to move to CONFIG_VT=n as the console and vt subsystem have historically been a source of bugs but are also nasty places for locking etc. It also can be the cause of oops going missing when it takes out the panic path with locking bugs stopping other paths from completely processing the oops (like pstore or serial).
The session started discussing what things would like. Lennart gave a great summary of the work David did a few years ago and the train of thought involved.
Once you think through all the paths and things you want supported, you realise the best user console is going to be one that supports emojis and non-Latin scripts. This probably means you want a lightweight wayland compositor running a fullscreen VTE based terminal. Working back from the consequences of this means you probably aren't going to want this in systemd, and it should be a separate development.
The other area discussed was around the requirements for a panic/emergency kernel console, likely called drmlog, this would just be something to output to the display whenever the kernel panics or during boot before the user console is loaded.
cgroups for GPU
This has been an ongoing topic, where vendors often suggest cgroup patches, and on review told they are too vendor specific and to simplify and try again, never to try again. We went over the problem space of both memory allocation accounting/enforcement and processing restrictions. We all agreed the local memory accounting was probably the easiest place to start doing something. We also agreed that accounting should be implemented and solid before we start digging into enforcement. We had a discussion about where CPU memory should be accounted, especially around integrated vs discrete graphics, since users might care about the application memory usage apart from the GPU objects usage which would be CPU on integrated and GPU on discrete. I believe a follow-up hallway discussion also covered a bunch of this with upstream cgroups maintainer.