Questions tagged [unicode]
For questions about supporting or implementing Unicode in a programming language, such as identifiers and string implementations. General Unicode questions are off-topic.
11 questions
10 votes, 5 answers, 1k views
What are the considerations when modifying cases in a text?
By "cases" I mean uppercase, lowercase, and titlecase.
It seems many languages assume a one-to-one correspondence between uppercase and lowercase letters, if the script that ...
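Python's `str` methods show that the mapping is not one-to-one. They apply the full Unicode case mappings, including the multi-character ones from SpecialCasing.txt. The sketch below uses a hypothetical helper, `case_forms`. Results depend on the Unicode database bundled with the interpreter.

```python
def case_forms(ch: str) -> tuple[str, str, str]:
    """Return the (lower, upper, title) full case mappings of a string."""
    return ch.lower(), ch.upper(), ch.title()

# Characters whose case mappings are not one-to-one.
samples = {
    "\u00df": "LATIN SMALL LETTER SHARP S",             # uppercases to "SS"
    "\u01c6": "LATIN SMALL LETTER DZ WITH CARON",       # titlecase differs from uppercase
    "\u0130": "LATIN CAPITAL LETTER I WITH DOT ABOVE",  # lowercases to two code points
}

for ch, name in samples.items():
    lower, upper, title = case_forms(ch)
    print(f"{name}: lower={lower!r} upper={upper!r} title={title!r}")
```

`"SS".lower()` gives `"ss"`, not `"ß"`, so uppercasing cannot be reversed in general. Titlecase is a third distinct form for digraph letters such as U+01C6.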
20 votes, 5 answers, 4k views
UTF-32 worth it for the string data type?
I've heard of Truffle strings from GraalVM, which store a string in one of several encodings, perhaps similar to what Python (not CPython) uses for representing strings, though searching a ...
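CPython's flexible string representation (PEP 393) is one well-documented way of storing strings in different encodings. It picks a fixed width for each string based on the string's largest code point. The hypothetical `narrowest_width` function below mirrors only that selection rule. It is not how Truffle strings work internally.

```python
def narrowest_width(s: str) -> int:
    """Bytes per code point needed to store s as a fixed-width array.

    Follows the rule of CPython's PEP 393 representation: 1 byte if every
    code point fits in Latin-1, 2 if every code point is in the Basic
    Multilingual Plane, otherwise 4 (effectively UTF-32).
    """
    top = max(map(ord, s), default=0)
    if top <= 0xFF:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4

for text in ["hello", "h\u00e9llo", "\u043f\u0440\u0438\u0432\u0435\u0442", "hi \U0001F600"]:
    width = narrowest_width(text)
    print(f"{text!r}: {len(text)} code points x {width} B = {len(text) * width} B, "
          f"UTF-8 = {len(text.encode('utf-8'))} B")
```

Under this scheme, a string pays for 4-byte storage only if it actually contains a code point outside the BMP, such as an emoji.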
3 votes, 1 answer, 376 views
What are the tradeoffs around supporting Unicode keywords and tokens?
Is the mass adoption of Unicode tokens as operators in general-purpose programming languages a good idea? How acceptable is such a language to ordinary users and developers?
Background
I want to ...
4 votes, 2 answers, 264 views
Supporting reasonably efficient high-level indexing for strings
I'm designing a language that I intend to be implemented on the .NET platform. As I understand it, the native string representation uses UTF-16, storing an array of ...
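A UTF-16 string, such as .NET's `System.String`, is indexed by 16-bit code unit, whereas a "high-level" index usually counts code points. Any code point above U+FFFF takes two units (a surrogate pair). The sketch below uses a hypothetical helper, `utf16_offset`. It shows why converting between the two indices needs a linear scan unless some auxiliary index is kept.

```python
def utf16_offset(s: str, cp_index: int) -> int:
    """Convert a code point index into s to a UTF-16 code unit offset.

    Code points above U+FFFF occupy two UTF-16 code units, so the offset
    is found by scanning every code point before cp_index (O(n)).
    """
    if not 0 <= cp_index <= len(s):
        raise IndexError(cp_index)
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s[:cp_index])

text = "a\U0001F600b"                          # 'a', GRINNING FACE, 'b'
print(len(text))                               # 3 code points
print(len(text.encode("utf-16-le")) // 2)      # 4 UTF-16 code units
print(utf16_offset(text, 2))                   # 3: where 'b' starts in UTF-16
```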
12 votes, 4 answers, 564 views
Prior art on modeling characters of variable lengths
In some encodings such as UTF-8, characters are of variable length in bytes. It's a bit like a tagged union, but the exact size could be computed, so the next element in a string could follow ...
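In UTF-8, the size of each character can be computed from its first byte, because the lead byte's high bits encode the sequence length. The minimal sketch below steps through a buffer this way. The function names are illustrative, and the code assumes the input is already valid UTF-8: continuation bytes are not checked.

```python
def utf8_sequence_length(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:             # 0xxxxxxx: ASCII
        return 1
    if 0xC2 <= lead <= 0xDF:    # 110xxxxx (0xC0 and 0xC1 would be overlong)
        return 2
    if 0xE0 <= lead <= 0xEF:    # 1110xxxx
        return 3
    if 0xF0 <= lead <= 0xF4:    # 11110xxx, capped so the result is <= U+10FFFF
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02X}")

def char_boundaries(data: bytes) -> list[int]:
    """Byte offsets at which each encoded character starts."""
    offsets, i = [], 0
    while i < len(data):
        offsets.append(i)
        i += utf8_sequence_length(data[i])   # jump straight to the next character
    return offsets

encoded = "a\u00e9\u20ac\U0001F600".encode("utf-8")  # 1-, 2-, 3- and 4-byte characters
print(char_boundaries(encoded))                       # [0, 1, 3, 6]
```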
30 votes, 6 answers, 2k views
How have modern language designs dealt with Unicode strings?
Languages developed over the last fifteen years or so were created in an era of ubiquitous Unicode, and so could design their core string types accordingly. There are a lot of new issues that ...
10 votes, 3 answers, 458 views
What vertical whitespace should be supported?
As a follow-up to PLDI's first question, about horizontal whitespace, what vertical whitespace (newlines) should be supported? I know of at least the following:
Line feed (...
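Python's `str.splitlines()` offers one concrete data point. Its documentation lists the characters below as line boundaries, and CR LF counts as a single boundary. This set is deliberately broader than the line endings (LF, CR LF, CR) that Python's own tokenizer accepts in source code. The sketch exercises each character. The names are the usual Unicode names or aliases, typed in by hand.

```python
# Line boundaries documented for Python's str.splitlines().
BREAKS = {
    "\n": "LINE FEED",
    "\r": "CARRIAGE RETURN",
    "\x0b": "LINE TABULATION",
    "\x0c": "FORM FEED",
    "\x1c": "FILE SEPARATOR",
    "\x1d": "GROUP SEPARATOR",
    "\x1e": "RECORD SEPARATOR",
    "\x85": "NEXT LINE",
    "\u2028": "LINE SEPARATOR",
    "\u2029": "PARAGRAPH SEPARATOR",
}

for ch, name in BREAKS.items():
    print(f"U+{ord(ch):04X} {name}: {f'a{ch}b'.splitlines()}")

# CR LF is treated as one break, so this yields two lines, not three.
print("a\r\nb".splitlines())
```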
15 votes, 7 answers, 733 views
What Unicode character categories should be allowed in identifiers?
When implementing Unicode identifiers, I'm never sure what characters should be allowed. For example, this is the list of categories:
Control: C
Letters: ...
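General categories can be compared with an existing identifier rule. Python's `str.isidentifier()` follows PEP 3131, which is based on the XID_Start and XID_Continue properties of UAX #31. Under that rule, the first character must be XID_Start or `_`, and the remaining characters must be XID_Continue. Those properties are derived mostly from general categories: letters (L*) and Nl for the start, plus Mn, Mc, Nd and Pc for continuation. The word list below is arbitrary.

```python
import unicodedata

words = ["caf\u00e9", "\u03c0", "x1", "_tmp", "1x", "a-b"]

for word in words:
    cats = " ".join(unicodedata.category(ch) for ch in word)
    print(f"{word!r:8} isidentifier={word.isidentifier()!s:5} categories: {cats}")
```

`"1x"` fails because Nd (decimal digit) may continue an identifier but cannot start one. `"a-b"` fails because `-` is Pd, a punctuation category outside XID_Continue.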
6 votes, 1 answer, 178 views
How should programming languages handle updates to Unicode?
Occasionally, a new version of Unicode comes out, with new characters. This could introduce, for example, new whitespace characters, which may pose a problem for existing programs, for a number of ...
24 votes, 5 answers, 2k views
What options in Unicode are there for balanced pairs of brackets like `(){}[]<>`?
My esolang involves the use of balanced brackets `(){}[]<>` to denote different scopes. However, unlike practical languages, all pairs of identical brackets ...
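A first pass at finding candidate brackets is to collect every character whose general category is Ps (open punctuation) or Pe (close punctuation). The sketch below does that with Python's `unicodedata` module; the helper name is hypothetical. This is only an approximation. Some Ps characters, such as low-9 quotation marks, have no Pe partner. Pairings are defined separately in the UCD file BidiBrackets.txt, which `unicodedata` does not expose.

```python
import sys
import unicodedata

def chars_in_category(category: str) -> list[str]:
    """Every code point whose Unicode general category equals `category`."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == category]

openers = chars_in_category("Ps")   # open punctuation
closers = chars_in_category("Pe")   # close punctuation
print(len(openers), len(closers))
print("".join(openers[:20]))

# '<' and '>' are math symbols (Sm), so a Ps/Pe scan does not find them.
print(unicodedata.category("<"), unicodedata.category(">"))
```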
22 votes, 3 answers, 867 views
Which horizontal whitespace should be supported?
There seem to be three approaches to horizontal whitespace for separating tokens:
Only tab and space: Bash, C, D, Dart, Go, Java, Lua, OCaml, Python, PHP, Perl, Ruby, Rust, Scala, Swift (also ...
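A language that goes beyond tab and space could, for example, use the Unicode general category Zs (space separator) as its candidate set. This is an assumption made for illustration only. The sketch compares that set with the tab-and-space-only approach. The exact Zs membership depends on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`).

```python
import sys
import unicodedata

# Two candidate definitions of horizontal whitespace between tokens.
TAB_AND_SPACE = {" ", "\t"}
SPACE_SEPARATORS = {chr(cp) for cp in range(sys.maxunicode + 1)
                    if unicodedata.category(chr(cp)) == "Zs"}

print("Unicode", unicodedata.unidata_version)
for ch in sorted(SPACE_SEPARATORS):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Tab is a control character (Cc), so "all of Zs" alone would exclude it.
print(unicodedata.category("\t"), "\t" in SPACE_SEPARATORS)
```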