The reality of Wayland input methods in 2022

Fri, Sep 9 2022 19:45:42 KST

Hello.
Lately I’ve been very interested in Wayfire, and I have implemented Wayland input methods in Nimf.
In 2017, I implemented Wayland’s official input method unstable v1 protocol.
In 2022, I implemented wlroots’ non-standard input method unstable v2 protocol and also reimplemented v1.
In the process, I realized how problematic Wayland is, and I wrote this article to describe the reality of Wayland.

Wayland’s overall problem

Wayland is a project that defines protocols. However, only a few of its protocols are stable, and little progress has been made in 14 years. I searched for a book on Wayland programming to study it, but no such book seems to have been published.
The Wayland protocols are poor and therefore unsuitable for a desktop environment. Although many developers have pointed this out, the Wayland developers do not seem to intend to improve them. As a result, various non-standard protocols, such as those from wlroots, have emerged, fragmenting the Wayland protocols. This means that an app that works in Sway may not work in GNOME, and an app that works in GNOME may not work in KDE.

Problems with the Wayland input method

Fragmentation of input method protocols

In Wayland, text input is done through interprocess communication, much like XIM:

              text input protocol                   input method protocol
Application --- communication 1 --- Display server --- communication 2 --- Input method (IME)

In Wayland, the display server is called a compositor. Display servers include Sway, Wayfire, etc.

In communication 1, the text input protocol is used; it exists in three versions: v1, v2, and v3.
In communication 2, the input method protocol is used; it exists in two versions: v1 and v2. None of these versions is backward compatible with the others. Therefore, to input multilingual text, the application and the display server must use exactly the same text input version, and the display server and the input method (IME) must use exactly the same input method version.
To put it simply: if the input method implements both input method protocols v1 and v2, the application implements text input v1, and the display server implements text input v2, you cannot input multilingual text, because the application and the display server do not share a protocol version.
As another example, if text input v1 is implemented in the application and display server, input method v2 is implemented in the display server, and input method v1 is implemented in the input method, you cannot input multilingual text because the protocol does not match between the display server and the input method.
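
The matching rule above can be expressed as a small model. This is a toy Python sketch, not code that talks to a real compositor: each component is reduced to the set of protocol versions it implements, and `can_input` (a hypothetical name) reports whether both links share at least one version.

```python
def link_ok(side_a, side_b):
    """Versions implemented on both sides of one link (empty set: no link)."""
    return side_a & side_b


def can_input(app_ti, server_ti, server_im, ime_im):
    """Multilingual input needs a common version on BOTH links:
    text input (app <-> display server) and
    input method (display server <-> IME)."""
    return bool(link_ok(app_ti, server_ti)) and bool(link_ok(server_im, ime_im))


# First example: app has text input v1, server has text input v2.
print(can_input({"v1"}, {"v2"}, {"v1", "v2"}, {"v1", "v2"}))  # False
# Second example: text input matches, but input method v2 (server) vs v1 (IME).
print(can_input({"v1"}, {"v1"}, {"v2"}, {"v1"}))  # False
# Both links share at least one version.
print(can_input({"v3"}, {"v1", "v3"}, {"v2"}, {"v1", "v2"}))  # True
```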

Repeat key problem

When a key is held down, the X display server repeatedly sends key events (XEvents) to the input method. The Wayland display server, however, provides only raw key events, so either the input method or the Wayland display server must generate the repeats itself.

1. Possibility of different repeat key logic for different Wayland display servers

Display servers may differ in which keys they repeat, and which repeats they stop, when several keys are pressed at the same time.

2. Possibility of different repeat key logic for different input methods

Input methods may differ in which keys they repeat, and which repeats they stop, when several keys are pressed at the same time.

3. Discontinuity between repeat timer on Wayland display server and repeat timer on input method

If you press and hold the backspace key while composing Hangul, the input method’s repeat timer starts, and after a certain delay the backspace key repeats and deletes the graphemes being composed. During this time the input method consumes the backspace key. Once all the graphemes have been deleted, the input method no longer consumes backspace, so it stops repeating it and forwards the key to the Wayland display server. At that moment, the display server’s own repeat timer starts, and only after another delay does backspace repeat again and delete committed Hangul. This feels odd to users: the graphemes being composed are deleted at a steady rate, deletion then pauses for the delay, and only then does it resume on the Hangul. It is also not the behavior users are accustomed to.

One workaround would be for the input method to forward the original key press event to the Wayland display server and then synthesize a key release event and forward that too. But this creates another problem: apps that detect key presses or key releases will not work as intended.
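
The timing problem can be reproduced with a toy timeline. The sketch below assumes, hypothetically, that the input method and the display server use the same repeat delay and interval, and that the forwarded key press deletes one committed character immediately; `simulate` and its parameters are illustrative names, not part of any protocol.

```python
def simulate(preedit, committed, delay_ms, interval_ms):
    """Return (time_ms, deleted_from) pairs for one long backspace press.

    Toy model: each side deletes once on press, again after delay_ms,
    then every interval_ms, and the two repeat timers are independent.
    """
    events = []
    t = 0
    # Phase 1: the input method consumes backspace with its own timer.
    for i in range(preedit):
        events.append((t, "preedit"))
        t += delay_ms if i == 0 else interval_ms
    # Phase 2: the preedit is empty, so the key is forwarded and the
    # display server's timer starts over from the initial delay.
    for i in range(committed):
        events.append((t, "committed"))
        t += delay_ms if i == 0 else interval_ms
    return events


events = simulate(preedit=3, committed=3, delay_ms=500, interval_ms=50)
gaps = [b[0] - a[0] for a, b in zip(events, events[1:])]
print(gaps)  # [500, 50, 50, 500, 50]
```

The second 500 ms gap is the pause users notice when deletion moves from the graphemes being composed to committed Hangul.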

To solve the above problems, the Wayland display server needs to provide repeat key events to the input method instead of raw key events. For example, the Wayland protocol needs to provide something like an XKeyEvent to the input method.
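
A sketch of the proposed alternative, assuming a hypothetical event type loosely modeled on XKeyEvent: the display server owns the only repeat timer and marks synthetic repeats, and the input method merely consumes or forwards each event. `KeyEvent`, `server_repeats`, and `ime_handle` are invented for illustration, and the same toy timing values are reused.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    # Hypothetical event shape: the display server marks repeats, so the
    # input method never needs a repeat timer of its own.
    time_ms: int
    key: str
    is_repeat: bool


def server_repeats(key, delay_ms, interval_ms, count):
    """The single repeat timer, owned by the display server (hypothetical)."""
    t = 0
    for i in range(count):
        yield KeyEvent(t, key, is_repeat=i > 0)
        t += delay_ms if i == 0 else interval_ms


def ime_handle(events, preedit):
    """Consume backspace while the preedit is non-empty, otherwise forward it."""
    log = []
    for ev in events:
        if preedit > 0:
            preedit -= 1
            log.append((ev.time_ms, "preedit"))
        else:
            log.append((ev.time_ms, "forwarded"))
    return log


log = ime_handle(server_repeats("BackSpace", 500, 50, 6), preedit=3)
print(log)
```

With a single timer, the hand-off from the preedit to forwarded keys at 600 ms continues at the normal 50 ms interval instead of restarting the delay.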

There is no way to know where the key event occurred.

When inputting Chinese, Japanese, or Hanja characters, a window to select a candidate from the candidate list is required. This window is called the candidate window. The input method needs to pop up a candidate window in an appropriate location, but there is no way to know where the key event occurred.

Problems with text input protocol

The protocol has no backward compatibility, and its interfaces are named incorrectly. An interface is meant to expose functionality under a common name, but because version suffixes such as v1 and v2 are attached to the interface name, code that uses the interface treats each version as a completely separate interface. Therefore, each version must be implemented separately. For example, one would normally expect that if a display server (compositor) implements v2, applications implementing v1 could still communicate with it. However, Wayland’s text input protocol has no backward compatibility, and since the version is part of the interface name, the versions are not compatible at all.

<interface name="zwp_text_input_v1" version="1">
<interface name="zwp_text_input_v3" version="1">

That is how the interfaces are named now. If the protocol had been designed properly,

<interface name="zwp_text_input" version="1">
<interface name="zwp_text_input" version="3">

they would be named like this.
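
The effect of putting the version into the name can be shown with a toy registry lookup. In simplified form, this mirrors how a Wayland client binds a global: it looks the interface up by name and uses a version no higher than the one advertised. `bind` and the dictionaries are hypothetical, and the unsuffixed name `zwp_text_input` is the naming proposed above, not an existing interface.

```python
def bind(advertised, name, client_version):
    """Toy registry lookup: match the interface by exact name, then use the
    lower of the client's version and the advertised version."""
    if name not in advertised:
        return None  # the two sides cannot talk at all
    return min(client_version, advertised[name])


# Current naming: the version is part of the name.
server_now = {"zwp_text_input_v3": 1}
print(bind(server_now, "zwp_text_input_v1", 1))  # None

# Proposed naming: one name, with the version carried separately.
server_proposed = {"zwp_text_input": 3}
print(bind(server_proposed, "zwp_text_input", 1))  # 1
```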

These issues are acknowledged in the protocol description, and the protocol has remained in the unstable stage from 2015 to the present.

Problems with input method protocol v1

Weston is the reference implementation of a Wayland display server (compositor), and it implements the input method v1 protocol.

Input method protocol v1 has the following messages. According to the protocol description:

activate corresponds to create-ic and
deactivate corresponds to destroy-ic and
enter corresponds to focus-in and
leave corresponds to focus-out.

However, when I tested with weston-editor on Weston, clicking another text entry while typing in weston-editor produced “deactivate” and “activate” instead of “leave” and “enter”. Weston’s implementation differs from what the protocol describes. I tried testing other apps to see how they behave, but the Wayland ecosystem is so limited that I have not yet found any other app that can be used to test input method protocol v1.
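
The practical consequence can be sketched with a toy handler that follows the mapping in the protocol description. It assumes, for illustration, that each click produces the “deactivate”/“activate” pair described above and that a newly created input context starts in English; the class and function names are hypothetical.

```python
class InputMethodV1Model:
    """Toy handler following the mapping in the protocol description."""

    def __init__(self):
        self.contexts = {}  # entry -> input mode
        self.focused = None

    def handle(self, event, entry):
        if event == "activate":      # create-ic: a fresh context starts in English
            self.contexts[entry] = "english"
        elif event == "deactivate":  # destroy-ic: the context and its mode are gone
            self.contexts.pop(entry, None)
        elif event == "enter":       # focus-in
            self.focused = entry
        elif event == "leave":       # focus-out
            self.focused = None


def mode_after_switch(events):
    """Set up entries A and B, put A in Hangul mode, replay events, report A's mode."""
    im = InputMethodV1Model()
    for entry in ("A", "B"):
        im.handle("activate", entry)
    im.handle("enter", "A")
    im.contexts["A"] = "hangul"  # the user switches entry A to Hangul
    for event, entry in events:
        im.handle(event, entry)
    return im.contexts.get("A")


# Switching A -> B -> A, as the description implies vs. as Weston behaves.
described = [("leave", "A"), ("enter", "B"), ("leave", "B"), ("enter", "A")]
observed = [("deactivate", "A"), ("activate", "B"), ("deactivate", "B"), ("activate", "A")]
print(mode_after_switch(described))  # hangul
print(mode_after_switch(observed))   # english
```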

Problems with input method protocol v2

1. The protocol names are misleading.

v1 is the official Wayland protocol, and
v2 is a non-standard protocol from wlroots.

Despite the version numbers, v2 is not a successor to v1: the two are not compatible at all and are entirely different input method protocols.

2. There is no forwarding functionality.

Forwarding means sending a received event back to its sender. It is needed so that key events not handled by the input method can be sent back to the Wayland display server.

For example, some keys do not need to be converted into other characters by the input method, so when the input method passes an unhandled key event back to the Wayland display server, the display server handles it appropriately. This is the customary practice. However, since v2 has no way to forward key events, they must be forwarded through a separate protocol, zwp_virtual_keyboard_v1.
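
A minimal sketch of the consume-or-forward decision, assuming a hypothetical key handler that treats single letters as composable and hands everything else back. The `forward` callback stands in for whatever channel the protocol offers; with input method v2 it would have to be backed by a zwp_virtual_keyboard_v1 object, which this sketch does not model.

```python
from typing import Callable, List, Tuple


def make_key_handler(forward: Callable[[str], None]) -> Tuple[Callable[[str], bool], List[str]]:
    """Toy input method key handler (hypothetical). Keys it converts are
    consumed; everything else is forwarded back to the display server."""
    preedit: List[str] = []

    def on_key(key: str) -> bool:
        if key.isalpha() and len(key) == 1:
            preedit.append(key)  # pretend this key is composed into Hangul
            return True          # consumed: the app never sees the raw key
        forward(key)             # unhandled: hand it back to the display server
        return False

    return on_key, preedit


forwarded: List[str] = []
on_key, preedit = make_key_handler(forwarded.append)
for k in ["g", "k", "s", "F5", "Tab"]:
    on_key(k)
print(preedit, forwarded)
```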

3. Nothing provides create-ic and destroy-ic functionality.

Here, ic stands for input context. The protocol description says:

activate
     * Notification that a text input focused on this seat requested
     * the input method to be activated.
deactivate
     * Notification that no focused text input currently needs an
     * active input method on this seat.

From this description, we can see that

activate corresponds to focus-in and
deactivate corresponds to focus-out.

There are focus-in and focus-out functionalities, but nothing for create-ic and destroy-ic. As a result, it is impossible to maintain a separate input context for each application. Here are some examples. After typing Chinese in App1, clicking App2 and typing Korean there, and then clicking App1 again, App1 should be back in the Chinese input state, but it is in the Korean input state. Similarly, if you type Korean in App3, click another window, and then click back into App3, it should be in the Hangul input state, but it is in the English input state. Also, because Window1 and Window2 share one input context, if you click Window2 while composing Hangul in Window1, the last character may be output in Window2. In short, the v2 protocol has design flaws.
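
The first example can be modeled with two toy input methods. `SharedContextIME` ignores the `client` argument, standing in for an input method that, with only activate and deactivate, cannot tell applications apart; `PerClientIME` shows what create-ic and destroy-ic would allow. Both classes are hypothetical.

```python
class SharedContextIME:
    """What activate/deactivate alone allow: one context for every client."""

    def __init__(self):
        self.mode = "english"

    def set_mode(self, client, mode):
        self.mode = mode  # client is ignored: there is only one context

    def mode_of(self, client):
        return self.mode


class PerClientIME:
    """What create-ic/destroy-ic would enable: one context per client."""

    def __init__(self):
        self.modes = {}

    def create_ic(self, client):
        self.modes[client] = "english"

    def destroy_ic(self, client):
        self.modes.pop(client, None)

    def set_mode(self, client, mode):
        self.modes[client] = mode

    def mode_of(self, client):
        return self.modes[client]


def scenario(ime):
    ime.set_mode("App1", "chinese")  # type Chinese in App1
    ime.set_mode("App2", "korean")   # click App2 and type Korean
    return ime.mode_of("App1")       # click App1 again


per_client = PerClientIME()
for app in ("App1", "App2"):
    per_client.create_ic(app)
print(scenario(SharedContextIME()))  # korean
print(scenario(per_client))          # chinese
```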

Conclusion

Many people say that X is slow because it communicates through a protocol, and that it is outdated, buggy, and difficult to maintain. However, from what I know, Wayland also communicates through a protocol, and it is weaker than X in functionality, so it is not suitable for desktop use. Although it has been 14 years since its release, there has been little progress in protocol development, and the implementations have many bugs. Moreover, the implementations perform no better than X, their quality is below expectations, they are unstable, and their development maturity is low. Also, Wayland input method v1 and wlroots input method v2 are a technical and functional regression from XIM, which appeared in the 1990s.
Simply put, Wayland in 2022 is more technologically obsolete than X. Many people have been interested in Wayland and have cheered it on, but Wayland seems to have no future. How can no book on Wayland programming have been published in 14 years? If Linux distributions accelerate the migration to Wayland, users will abandon the Linux desktop. I look forward to a new display server that replaces Wayland.
Thank you