Questions tagged [unicode]

Unicode is intended to be a universal character set for describing all the characters required for written text incorporating all writing systems, technical symbols and punctuation.

9 votes
7 answers
3k views

I frequently encounter recommendations to specifically keep to ASCII characters in field and function names in documentation, even though non-ASCII (modern Unicode) generally works perfectly. An ...
Michael Macha
6 votes
0 answers
790 views

I am wondering how to take these Hieroglyphs and make them into Unicode. I read through the Tesseract docs on how to create training data, but it seems largely tailored toward "traditional" ...
Lance Pollard
2 votes
3 answers
154 views

For example, in Vietnamese, there are Unicode characters like "â", "ê", "ô", "ư", etc. To type them from the keyboard, I need to type aa, ee, oo, w, then a program ...
Ooker
  • 335
10 votes
0 answers
268 views

A valid sequence of code points can begin with one or more combining marks, which form a grapheme cluster that has no base glyph. I'm unsure how that should be handled, if at all. For example, consider ...
Wes
  • 872
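
As an illustration of the situation this excerpt describes, the following is a minimal Python sketch (not taken from the question) that detects combining marks at the start of a string. It uses the standard `unicodedata` module; the helper name `leading_marks` is hypothetical, and treating every general category beginning with `M` as a "combining mark" is an assumption made for the example.

```python
import unicodedata


def leading_marks(text: str) -> int:
    """Count how many characters at the start of text are marks (Mn, Mc, Me)."""
    count = 0
    for ch in text:
        # General categories Mn, Mc and Me all start with "M".
        if not unicodedata.category(ch).startswith("M"):
            break
        count += 1
    return count


# U+0301 COMBINING ACUTE ACCENT placed first, with no base character before it.
orphaned = "\u0301abc"
print(leading_marks(orphaned))   # 1
print(leading_marks("abc"))      # 0
print(unicodedata.category("\u0301"), unicodedata.combining("\u0301"))  # Mn 230
```

Whether such a leading mark should be rejected, kept, or attached to an inserted placeholder base is the open question in the excerpt; the sketch only shows one way to notice it.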
1 vote
1 answer
87 views

I've been reading Unicode's core specification (see https://www.unicode.org/versions/latest/). I mostly understood what the text was explaining in section 2.1 Architectural Context until it started ...
lonious
  • 121
9 votes
3 answers
737 views

People often get excited about JuliaLang supporting Unicode function names. But it's not new at all; it's just that the Julia community decided that it was sometimes appropriate, and built tooling to ...
Frames Catherine White
5 votes
1 answer
428 views

When you encode a code point to code units using UTF-8, if the code point fits in 7 bits, the most significant bit is set to zero, which tells you it is a character stored in 1 ...
codepersonnel49
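
The bit pattern this excerpt describes can be inspected directly. The following is a minimal Python sketch, not part of the original question, that relies only on the built-in `str.encode`; the helper name `utf8_bits` is hypothetical.

```python
def utf8_bits(ch: str) -> list:
    """Return the UTF-8 code units of one character as 8-bit binary strings."""
    return [format(byte, "08b") for byte in ch.encode("utf-8")]


# A code point that fits in 7 bits is stored in one byte whose top bit is 0.
print(utf8_bits("A"))        # ['01000001']

# Larger code points use a lead byte (110xxxxx, 1110xxxx, ...) followed by
# continuation bytes, each of which starts with the bits 10.
print(utf8_bits("\u00e9"))   # U+00E9: ['11000011', '10101001']
print(utf8_bits("\u20ac"))   # U+20AC: ['11100010', '10000010', '10101100']
```

The number of leading 1 bits in the first byte gives the length of the sequence, which is how a decoder can tell a single-byte character from the start of a multi-byte one.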
2 votes
1 answer
443 views

I am developing against a file spec that lists the data type for certain fields as CHAR(<length>). The spec is for a fixed-width flat file. In most cases, possible values to populate the fields ...
mathewb
  • 137
3 votes
4 answers
5k views

From what it sounds like, a 64-bit processor means aligning to 64 bits, which means if you have Unicode UTF-8 stored in there, each 8-bit chunk would take up 64 bits of space. That doesn't really make ...
Lance Pollard
0 votes
2 answers
555 views

My main goal is described here. How can Microsoft Word or WordPad or other word-processing software render fonts when these fonts seem not to follow the same rules? How do they render characters ...
HKhoshdel
0 votes
2 answers
2k views

I am wondering why Unicode encoding is necessary in JavaScript. I am looking at utf8.js as an example. I am also looking at the UTF-8 spec, but am not really following the different pieces of data. ...
Lance Pollard
0 votes
1 answer
1k views

In general, a character is represented in 1 byte, i.e. 8 bits. I believe this is true for all text editors and even for databases like Oracle. 1 byte can represent 2^8 = 256 characters. My question is when ...
user3198603
  • 1,896
50 votes
4 answers
46k views

Our line-of-business software allows the user to save certain data as CSV. Since there are a lot of different formats (all called "CSV") in use in the wild, we are trying to decide what the ...
Heinzi
  • 9,868
8 votes
1 answer
4k views

I used to think that the BOM is optional for UTF-8, but mandatory for UTF-16 and UTF-32. But then I read the following (in this article): Let's look just at the ones that Notepad supports. ...
user9002947
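
For reference while reading this excerpt, the following minimal Python sketch (not from the question) prints the byte order marks defined in the standard `codecs` module and shows how a few of Python's codecs treat them. It describes Python's behavior only and makes no claim about what Notepad or the cited article does.

```python
import codecs

# Byte order marks as exposed by the standard library.
boms = {
    "UTF-8": codecs.BOM_UTF8,
    "UTF-16 LE": codecs.BOM_UTF16_LE,
    "UTF-16 BE": codecs.BOM_UTF16_BE,
    "UTF-32 LE": codecs.BOM_UTF32_LE,
    "UTF-32 BE": codecs.BOM_UTF32_BE,
}
for name, bom in boms.items():
    print(f"{name:10} {bom.hex(' ')}")

# "utf-8-sig" writes the BOM when encoding and strips it when decoding;
# plain "utf-8" does neither, so the BOM survives as U+FEFF.
data = "hi".encode("utf-8-sig")
print(data)                              # b'\xef\xbb\xbfhi'
print(repr(data.decode("utf-8")))        # '\ufeffhi'
print(repr(data.decode("utf-8-sig")))    # 'hi'

# Codecs with an explicit byte order, such as "utf-16-le", emit no BOM.
print("hi".encode("utf-16-le"))          # b'h\x00i\x00'
```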
6 votes
3 answers
3k views

(Not entirely sure whether this should go in the information-security StackExchange instead; feel free to move it there if that's where it belongs.) Unicode has many, many instances of pairs or ...
Vikki
  • 179
