Writing from right-to-left in a left-to-right digital world

For most beginners’ purposes, specialized DH tools (like Transkribus, Voyant Tools, or any specialized database) are already a major boost to productivity and insight. Going beyond them usually means gaining some experience in programming. Getting started is fairly easy, because we rarely need resources tailored to our own field and can piggyback on the software industry at large. There are plenty of resources for learning to program, and plenty of tools to help you do it. Some of these tools, such as code editors, are indispensable. I am thinking of the omnipresent Visual Studio Code.

But you will quickly find out that VS Code was designed for structured data encoded in Latin script, preferably English. Using right-to-left (RTL) scripts in it is going to be a considerable pain.

In 2016, users were already opening tickets to point out that RTL support is sorely needed, but nothing has been done about it. Admittedly, the earliest ticket is a bit vague in its purpose: what, exactly, needs to be done? But instead of opening up a conversation about this, later tickets have been closed as ‘duplicates’ of the early one. So the first ticket is too vague to be actionable, the later ones are set aside as duplicates, and the result is that nothing gets done.

Among the simplest requests is to have fully RTL lines aligned to the right.
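To make the request concrete, here is a minimal sketch of how a tool could decide which lines are "fully RTL" and should be right-aligned. It uses a simplified form of the "first strong character" rule from the Unicode Bidirectional Algorithm (rules P2 and P3). The function name `base_direction` is hypothetical and is not something VS Code offers. The sketch ignores directional isolates, which the full rule takes into account.

```python
import unicodedata

def base_direction(line, default="ltr"):
    """Guess a line's base direction from its first strongly directional character.

    Simplified sketch: a real implementation of the Unicode rules would also
    skip over text enclosed in directional isolates.
    """
    for ch in line:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-type and Arabic-type letters
            return "rtl"
        if bidi == "L":           # Latin and other left-to-right letters
            return "ltr"
    return default                # only digits, spaces, punctuation, or empty

# A line starting with an Arabic-script word would be right-aligned,
# a line starting with a Latin-script word would stay left-aligned.
print(base_direction("\u0633\u0644\u0627\u0645 babakfp"))
print(base_direction("babakfp \u0633\u0644\u0627\u0645"))
```

The Arabic letters are written as escape sequences on purpose, so that the example itself does not suffer from the rendering problems discussed here.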

A very annoying and more complicated issue is punctuation. Periods are rendered at the wrong end of sentences, and brackets are displayed incorrectly and, on top of that, mess up the word order. This is well explained in this issue.
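The period ends up at the wrong end because punctuation has no direction of its own: at the end of an RTL sentence inside a left-to-right line, it takes the direction of the line rather than of the sentence. A workaround sometimes used in plain text is to add invisible Unicode control characters. The sketch below shows two of them. The helper names are hypothetical, whether a given editor honours these characters is not guaranteed, and the control characters become part of the file, which may be unwanted in data meant for further processing.

```python
RLM = "\u200F"  # RIGHT-TO-LEFT MARK: invisible, strongly right-to-left
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE: opens an isolated RTL run
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes the isolated run

def anchor_trailing_punctuation(sentence):
    """Append RLM so trailing punctuation sits between two RTL characters."""
    return sentence + RLM

def isolate_rtl(text):
    """Wrap text in RLI ... PDI so it is laid out as a right-to-left unit."""
    return f"{RLI}{text}{PDI}"

# "salam." in Arabic script, written as escapes to avoid display problems.
sentence = "\u0633\u0644\u0627\u0645."
print(repr(anchor_trailing_punctuation(sentence)))
print(repr(isolate_rtl(sentence)))
```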

A further complication arises when we move beyond punctuation and fully mix RTL scripts with LTR scripts, for example by including an English word inside an Arabic or Persian sentence. This is described here. Following the conversation, one can see how easily it is dismissed as a ‘duplicate’ and therefore deemed not worth looking into by the team of developers.

A related issue is that once you have a line with mixed RTL and LTR text, cursor placement becomes unpredictable, which is especially frustrating when you want to paste new text.

Meanwhile, the author of a newer ticket, babakfp, has outlined in greater detail what the issue is and what a good solution to it would be. A key issue is the following:

This is how VS Code wrongly displays the input:

wrong rendering – image courtesy of babakfp, https://babakfp.ir

This is how it should look:

proposed correct rendering – image courtesy of babakfp, https://babakfp.ir

Notice that this is an issue of rendering within the code editor. It is not a matter of typing the text in the wrong order, nor of missing support for Arabic script or a faulty text encoding. The user would have typed:

sin lam alif mim space b a b a k f p space kha waw beh yeh question-mark 

With such irregular behaviour, manipulating large quantities of text becomes frustrating at the very least, and nearly impossible at worst.
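That the problem lies in rendering and not in the stored text can be checked outside the editor. The following sketch writes the typed sequence above as escape sequences and lists each character in stored order with its Unicode bidirectional class. Two details are assumptions on my part, since the transliteration does not specify them: that "yeh" is the Persian yeh (U+06CC) and that the question mark is the Arabic one (U+061F).

```python
import unicodedata

# The typed sequence in logical order, written as escapes so that no editor
# can reorder it on screen. Assumptions: "yeh" is the Persian yeh (U+06CC)
# and the question mark is the Arabic one (U+061F).
TYPED = "\u0633\u0644\u0627\u0645 babakfp \u062E\u0648\u0628\u06CC\u061F"

def describe(text):
    """Return (code point, bidi class) for each character, in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]

for code_point, bidi_class in describe(TYPED):
    print(code_point, bidi_class)
```

The stored order is exactly the typing order: four Arabic-script letters (class `AL`), a space, seven Latin letters (class `L`), a space, and the final Arabic-script word. Any reordering the reader sees is produced by the program displaying the text.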

Nathan Gibson worked on this issue a few years ago, getting a sense of how big the problem is (conclusion: it’s big, as it exists in many frequently used tools). OpenITI has done its best to circumvent the problem by implementing its own markup system (mARkdown), which allows for all kinds of tagging without interfering with the RTL/LTR word order. For TEI, I have seen people separate the English-based TEI tags from the RTL text by placing them on separate lines, which avoids issues in the editor while the output remains the same, similar to how the above example separates the tags between square brackets from the line of text. There is definitely an uptick in serious scholarly interest in the issue, both from historians and DH specialists, but actual development seems slow (or I have missed it).
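The TEI workaround of keeping tags and RTL text on separate lines can be sketched in a few lines. This is only an illustration under assumptions, not an established tool: the function name `tags_on_own_lines` is hypothetical, the pattern for tags is deliberately naive (it does not handle comments or CDATA sections), and the added line breaks count as whitespace, so the output is only equivalent where whitespace between tags and text does not matter.

```python
import re

# Naive pattern for a single XML tag; a real project would use an XML parser.
TAG = re.compile(r"(<[^>]+>)")

def tags_on_own_lines(xml_fragment):
    """Put every tag and every stretch of text on a line of its own."""
    parts = TAG.split(xml_fragment)          # the capture group keeps the tags
    return "\n".join(p.strip() for p in parts if p.strip())

# Arabic-script words written as escapes; the tag names are ordinary TEI.
fragment = "<p>\u0633\u0644\u0627\u0645 <persName>\u0628\u0627\u0628\u06A9</persName> \u062E\u0648\u0628\u06CC</p>"
print(tags_on_own_lines(fragment))
```

After this step every line is either purely LTR (a tag) or purely RTL (text), so the editor has no mixed-direction line to reorder.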

The issue of right-to-left scripts trying to exist in a left-to-right digital world is multifaceted. This article is merely a very basic introduction to it, in which I have used only Arabic and Persian as examples. I wanted to show that even your first steps outside of DH tools, when you try to develop your own solutions, will confront you with fundamental problems that stem from the digital world being thoroughly anglophone and Latin-script-based. It is regrettable that no progress has been made on the particular issue of support within Visual Studio Code. Relatively few people have stood up for this issue, while a very large group of users is undoubtedly experiencing the pain of it. And even in these fairly small discussions, people are reprimanded for writing in their own, native, RTL language and script. Perhaps there is a task for us as a scholarly community to give weight and support to initiatives for better right-to-left support in a tool such as VS Code, and for the use of RTL scripts in general.
