Questions tagged [unicode]

For questions about supporting or implementing Unicode in a programming language, such as identifiers and string implementations. General Unicode questions are off-topic.

10 votes
5 answers
1k views

By "cases" I mean uppercase, lowercase, and titlecase. It seems many languages assume that there is a one-to-one correspondence between uppercase letters and lowercase letters, if the script that ...
Dannyu NDos
  • 1,485
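As a small illustration of why the one-to-one assumption in the excerpt above fails, here is a minimal sketch using Python's built-in `str` methods, which apply full Unicode case mappings. The helper name `case_forms` is hypothetical and exists only for this example.

```python
def case_forms(text: str) -> dict:
    """Return the upper, lower, and title forms of text (illustrative helper)."""
    return {
        "upper": text.upper(),
        "lower": text.lower(),
        "title": text.title(),
    }

# U+00DF LATIN SMALL LETTER SHARP S uppercases to two characters, "SS",
# so upper() followed by lower() does not return the original string.
sharp_s = "\u00df"
round_trip = sharp_s.upper().lower()

# U+01C6 (dz with caron digraph) has three distinct forms:
# lowercase U+01C6, titlecase U+01C5, and uppercase U+01C4.
dz_forms = case_forms("\u01c6")
```

The length of a string can change under case conversion, which is one reason a per-character `toUpper(char) -> char` API cannot express every mapping.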
20 votes
5 answers
4k views

I've heard of Truffle strings from GraalVM, which store a string in one of multiple encodings, which might perhaps be like what Python (not CPython) uses for representing strings, though searching a ...
Hydroper
  • 407
3 votes
1 answer
376 views

Is the mass adoption of Unicode tokens as operators in general-purpose programming languages a good idea? How acceptable is such a language to ordinary users and developers? Background: I want to ...
Aster
  • 3,508
4 votes
2 answers
264 views

I'm designing a language that I intend to be implemented on the .NET platform. To my understanding, the native string representation uses UTF-16, by storing an array of ...
Karl Knechtel
12 votes
4 answers
564 views

In some encodings such as UTF-8, characters are of variable length in bytes. It's a bit like a tagged union, but the exact size could be computed, so the next element in a string could follow ...
user23013
  • 3,314
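To make the "exact size could be computed" point in the excerpt above concrete, here is a minimal sketch of how the length of a UTF-8 sequence follows from its lead byte, which lets a reader step from one element to the next. The function names `utf8_sequence_length` and `iter_code_points` are hypothetical, and the sketch assumes the input is already well-formed UTF-8.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with `lead` occupies."""
    if lead < 0x80:
        return 1              # 0xxxxxxx: ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2              # 110xxxxx (0xC0 and 0xC1 would be overlong)
    if 0xE0 <= lead <= 0xEF:
        return 3              # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4              # 11110xxx, capped at U+10FFFF
    raise ValueError(f"not a valid UTF-8 lead byte: {lead:#04x}")


def iter_code_points(data: bytes):
    """Yield each code point of `data` as a one-character str."""
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        # decode() also validates the continuation bytes of this sequence.
        yield data[i:i + n].decode("utf-8")
        i += n


encoded = "a\u00e9\u20ac\U0001F600".encode("utf-8")
chars = list(iter_code_points(encoded))
```

Because the size is encoded in the first byte, forward iteration needs no separate length table, but indexing the n-th code point still requires a scan from the start.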
30 votes
6 answers
2k views

Languages developed over the last fifteen years or so have been within the era where Unicode is ubiquitous, and so could design their core string types accordingly. There are a lot of new issues that ...
Michael Homer
  • 15.6k
10 votes
3 answers
458 views

As a followup to PLDI's first question, about horizontal whitespace, what vertical whitespace (newlines) should be supported? I know of at least the following: Line feed (...
rydwolf
  • 4,870
15 votes
7 answers
733 views

When implementing Unicode identifiers, I'm never sure what characters should be allowed. For example, this is the list of categories: Control: C Letters: ...
rydwolf
  • 4,870
6 votes
1 answer
178 views

Occasionally, a new version of Unicode comes out, with new characters. This could introduce, for example, new whitespace characters, which may pose a problem for existing programs, for a number of ...
rydwolf
  • 4,870
24 votes
5 answers
2k views

My esolang involves the use of balanced brackets (){}[]<> to denote different scopes. However, unlike practical languages, all pairs of identical brackets ...
bigyihsuan
  • 1,861
22 votes
3 answers
867 views

There seem to be three approaches to horizontal whitespace for separating tokens: Only tab and space: Bash, C, D, Dart, Go, Java, Lua, OCaml, Python, PHP, Perl, Ruby, Rust, Scala, Swift (also ...
Adám
  • 3,317