QtWayland 6.6 Brings Robustness Through Compositor Handoffs

Every release has a killer feature. Qt 6.6 features the opposite - staying alive. This blog post describes work to make Qt clients more robust and seemlessly migrate between compositors, providing resistance against compositor crashes and more.

Prologue

Right now if you restart pulseaudio your sound might cut out, restart NetworkManager and you lose your wifi, restart an X11 window manager and your decorations disappear.

But within a second it's all back to normal exactly where you left off with everything recovering fine.

This isn't true for display servers. If X11 restarts you're back at the login prompt. All drafts lost, games unsaved, work wasted.

Past

For X11 this was unfixable; clients relied on memory stored by the Xserver, they made synchronous calls that were expected to return values, and multiple clients talked to multple clients.

This was a real problem in my early days of Linux, X11 would lock up frequently enough that most distributions had a shortcut key to restart the server and send you back to the login prompt.

It's less of an issue now as X11 has been in a lengthy period of feature freeze.

Present

Wayland makes it possible to fix all this. Memory allocations are client side, all operations are async, all protocols are designed that the compositor always has complete control.

Yet the current user-facing state is considerably worse:

  • Compositors and displays servers are now the same process, doubling the space for errors

  • Compositors are typically extensible with 3rd party plugins and scripts

  • The wayland security model means the compositor absorbs even more functions from global shortcuts to screencasting and input-method handling

  • The wayland ecosystem is not in a period of feature freeze with wayland protocols constantly evolving to cover missing features and new ideas.

    Even if there was a perfect compositor:

  • 40% of kwin's crash bug reports are either upstream or downstream causes

  • The current compositor developer experience is limited with developers having to relogin and restart their apps and development setup just to test their latest changes.

Plan

The solution for this? Instead of exiting when the compositor closes, simply...don't!

If we could connect to a new compositor we just need to send the right amount of information to bring it in sync and notify the application code of any changes to bring this in sync.

For Qt applications all this information is handled in the backend, in the Wayland Qt Platform Abstraction (QPA).

Qt already has to handle screens and input devices being removed, clipboards being revoked and drag and drops cancelled. Supporting a whole reset isn't introducing any new work, we just have to trigger all of these actions at once, then reconnect to the newly restored compositor and restore our contents.

Applications already have to support all of these events too as well as handle callbacks to redraw buffers. There's no changes needed at an application code level, it's all done as transparently as possible.

Handling OpenGL is a challenge, right now we don't have a way to keep that alive. Fortunately we put in lots of effort previously in supporting GPU resets throughout the Qt stack. The QtWayland backend fakes to the applications that a GPU reset has occured through the Qt abstractions.

For the majority of applications including all QtQuick users this happens automatically and flawlessly, building on the work that we put in previously.

Whilst the overall concepts might sound invasive, the final merge-request to support all of this for all Qt applications was fewer lines than supporting middle-click paste.

Almost no work needs doing on the compositor side. For a compositor there's no difference between a new client, and a client that was running previously reconnecting. The only big change we made within Kwin is having a helper process so the whole process can be seemless and race-free.

Proof

Path Forward

This post is about Qt, but the world is bigger than that. Not only does this technique work here, but we have pending patches for GTK, SDL and even XWayland, with key parts of SDL merged but disabled already.

The challenge for the other toolkits is we can't use the same OpenGL trick as Qt. They either lack GPU reset handling either at a toolkit level or within the applications.

Supporting OpenGL requires new infrastructure. We've tried from the start to get some infrastructure in place to allow this.It was clear that proposing library changes for client reconnection support was an uphill battle whilst being unproven in the wild.

After Qt6 is rolled out to a wider audience and shown to work reliably, we'll refocus efforts on pushing upstream changes to the other platforms.

Potential Perks

This isn't just about crashes. If support becomes mainstream there will be additional opportunities:

  • We can easily upgrade to new features without having to log out and back. This is one of the reasons why kwin development is happening at such a rapid pace in recent months. We're not having to test in fake nested sessions or waste time logging in and setting up between changes.

  • We can support multihead, running different compositors per group of outputs and move seemlessly between them.

  • It's feasible to switch between compositors at runtime. With the application handling the reconnect logic, they can easily handle the case where compositor feature sets vary.

  • Checkpoint restore in userspace, being able to suspend your application to disk and then pick up where you left off like nothing happened. It could never work for X applications, but with wayland reconnect support we can make it work.

Participating

More information about the long term goal this can be found at the kwin wiki, or each out to me directly if you want to add support.

Discuss this at discuss.kde.org.