Newest 'unicode' Questions - Software Engineering Stack Exchange

9 votes

7 answers

3k views

Are there historical problems with non-ASCII identifier characters in code?

I frequently encounter recommendations to specifically keep to ASCII characters in field and function names in documentation, even though non-ASCII (modern Unicode) generally works perfectly. An ...

Michael Macha

396

asked Jan 29, 2022 at 16:29
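One concrete, language-specific pitfall is that Python 3 normalizes identifiers to NFKC while parsing (PEP 3131). A name typed with fullwidth letters therefore silently becomes the ASCII name. The minimal sketch below shows this. The variable names are illustrative only, and other languages may handle this differently.

```python
import unicodedata

# "ｆｏｏ" spelled with FULLWIDTH LATIN SMALL LETTERs (U+FF46, U+FF4F, U+FF4F)
source = "\uff46\uff4f\uff4f = 42"

namespace = {}
exec(source, namespace)

# The parser stored the binding under the NFKC-normalized name "foo" ...
print(namespace["foo"])                                      # 42
# ... and the fullwidth spelling never appears as a key.
print("\uff46\uff4f\uff4f" in namespace)                     # False
print(unicodedata.normalize("NFKC", "\uff46\uff4f\uff4f"))   # foo
```

String-based lookups such as `getattr(obj, "ｆｏｏ")` are not normalized, which is one way such mismatches surface at runtime.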

6 votes

0 answers

790 views

How to OCR and/or recreate lines of Egyptian Hieroglyphs in Unicode/HTML?

I am wondering how to take these Hieroglyphs and make them into Unicode. I read through the Tesseract docs on how to create training data, but it seems largely tailored toward "traditional" ...

Lance Pollard

2,787

asked Jul 22, 2020 at 16:01
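This addresses the Unicode side of the question, not the OCR side. The Egyptian Hieroglyphs block (starting at U+13000) names its characters after Gardiner's sign list. For example, U+13000 is EGYPTIAN HIEROGLYPH A001. If a recognizer emits Gardiner codes, a lookup like the hypothetical `gardiner_to_char` helper below could turn them into characters. It assumes plain codes such as "A1" and ignores variant suffixes.

```python
import re
import unicodedata

def gardiner_to_char(code: str) -> str:
    """Map a plain Gardiner code like "A1" to its Unicode character (hypothetical helper)."""
    m = re.fullmatch(r"([A-Za-z]{1,2})(\d{1,3})", code)
    if m is None:
        raise ValueError(f"unsupported Gardiner code: {code!r}")
    letters, number = m.groups()
    # Unicode names zero-pad the sign number to three digits: A1 -> A001
    name = f"EGYPTIAN HIEROGLYPH {letters.upper()}{int(number):03d}"
    return unicodedata.lookup(name)  # raises KeyError if no character has that name

ch = gardiner_to_char("A1")
print(f"U+{ord(ch):04X}", unicodedata.name(ch))  # U+13000 EGYPTIAN HIEROGLYPH A001
```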

2 votes

3 answers

154 views

What is the name of the type of program to produce Unicode characters from ASCII combinations?

For example, in Vietnamese, there are Unicode characters like "â", "ê", "ô", "ư", etc. To type them from the keyboard, I need to type aa, ee, oo, w, then a program ...

Ooker

335

asked Jul 18, 2020 at 12:39
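The scheme in the question, where "aa" becomes "â" and so on, is the Vietnamese Telex convention. Input methods implement it by rewriting key sequences. The minimal sketch below covers only the four rules named in the question. The rule table and the function name `telex` are hypothetical.

```python
# Hypothetical rule table: only the key sequences mentioned in the question.
RULES = {"aa": "â", "ee": "ê", "oo": "ô", "w": "ư"}

def telex(keys: str) -> str:
    """Convert a typed key sequence into Vietnamese letters (simplified Telex)."""
    out = []
    i = 0
    while i < len(keys):
        pair = keys[i:i + 2]
        if len(pair) == 2 and pair in RULES:   # two-key rules take priority
            out.append(RULES[pair])
            i += 2
        elif keys[i] in RULES:                 # single-key rules, e.g. "w"
            out.append(RULES[keys[i]])
            i += 1
        else:                                  # ordinary key: pass it through
            out.append(keys[i])
            i += 1
    return "".join(out)

print(telex("tooi"))   # tôi
print(telex("vieet"))  # viêt
```

Real Telex input methods do considerably more, including tone-mark keys and further letter rules such as "dd" for "đ". They also intercept keystrokes as they arrive instead of converting finished strings.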

10 votes

0 answers

268 views

Is there any guideline from Unicode on how to deal with graphemes that have no base character?

A valid sequence of code points can begin with one or more combining marks, which form a grapheme cluster that has no base glyph. I'm unsure how that should be handled, if at all. For example, consider ...

Wes

872

asked Jun 17, 2020 at 19:08
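The Unicode Standard calls a run of combining marks with no preceding base a *defective combining character sequence*. The minimal Python sketch below detects that case. The display policy (prefixing U+25CC DOTTED CIRCLE, as the Unicode code charts do when showing combining marks) is an illustrative choice, not a rule quoted from the standard. Both function names are hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"

def starts_with_combining_mark(s: str) -> bool:
    """True if the first code point is a mark (general category Mn, Mc or Me)."""
    return bool(s) and unicodedata.category(s[0]).startswith("M")

def for_display(s: str) -> str:
    """Hypothetical policy: give a base-less leading mark a visible base."""
    return DOTTED_CIRCLE + s if starts_with_combining_mark(s) else s

print(starts_with_combining_mark("\u0301a"))  # True: COMBINING ACUTE ACCENT comes first
print(starts_with_combining_mark("a\u0301"))  # False: "a" is the base
```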

1 vote

1 answer

87 views

Layout Behavior of Characters (question about unicode standard)

I've been reading Unicode's core specification (see https://www.unicode.org/versions/latest/). I mostly understood what the text was explaining in section 2.1 Architectural Context until it started ...

lonious

121

asked Feb 15, 2020 at 23:45

9 votes

3 answers

737 views

What was the first language to allow Unicode in function names?

People often get excited about JuliaLang supporting Unicode function names. But it's not new at all; it's just that the Julia community decided that it was sometimes appropriate, and built tooling to ...

Frames Catherine White

942

asked Dec 25, 2019 at 0:00

5 votes

1 answer

428 views

UTF-8 questions

When you encode a code point to code units based on UTF-8, if the code point fits in 7 bits, the most significant bit is set to zero so that it tells you it is a character which is stored in 1 ...

codepersonnel49

69

asked Nov 15, 2019 at 22:03
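The bit layout the question describes can be written out directly. Below is a minimal Python sketch of a UTF-8 encoder for a single code point. The function name `utf8_encode` is hypothetical, and it is checked against Python's built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 using the standard bit patterns."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable scalar value: {cp:#x}")
    if cp < 0x80:                      # 7 bits:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 21 bits: 11110xxx then three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in "A\u00e9\u20ac\U0001F600":
    print(f"U+{ord(ch):04X}", utf8_encode(ord(ch)).hex(" "))
```

The leading byte's high bits announce the sequence length. Every continuation byte starts with `10`, so a decoder can resynchronize in the middle of a stream.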

2 votes

1 answer

443 views

Differentiating Between ASCII and Unicode in File Spec

I am developing against a file spec that lists the data type for certain fields as CHAR(<length>). The spec is for a fixed-width flat file. In most cases, possible values to populate the fields ...

mathewb

137

asked Aug 22, 2018 at 17:11
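If the spec's CHAR(<length>) counts bytes, ASCII and UTF-8 agree only for ASCII values. For example, "é" is one character but two UTF-8 bytes. The hypothetical helper below pads by bytes, which makes the difference explicit. It assumes an ASCII-compatible encoding, since padding with the byte `b" "` would be wrong for UTF-16.

```python
def fixed_width_field(value: str, width: int, encoding: str = "ascii") -> bytes:
    """Encode `value` and space-pad it to exactly `width` bytes (hypothetical helper).

    Assumes an ASCII-compatible encoding (ASCII, Latin-1, UTF-8) so that
    b" " is a valid one-byte space.
    """
    data = value.encode(encoding)  # UnicodeEncodeError if the encoding can't represent value
    if len(data) > width:
        raise ValueError(f"{value!r} needs {len(data)} bytes; field holds {width}")
    return data + b" " * (width - len(data))

print(fixed_width_field("abc", 5))              # b'abc  '
print(len("\u00e9"), len("\u00e9".encode()))    # 1 2
print(fixed_width_field("\u00e9", 3, "utf-8"))  # b'\xc3\xa9 '
```

Raising an error instead of truncating avoids cutting a multi-byte UTF-8 sequence in half.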

3 votes

4 answers

5k views

How to align on both word size and cache lines in x86

From what it sounds like, a 64-bit processor means aligning to 64 bits, which means if you have Unicode UTF-8 stored in there, each 8-bit chunk would take up 64 bits of space. That doesn't really make ...

Lance Pollard

2,787

asked Aug 22, 2018 at 16:35
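Python's `struct` module offers a quick way to check the premise. It reports sizes with native C alignment (`@`) or with no alignment (`=`). Byte-sized fields are not each widened to the word size. Padding only appears before a field whose own alignment requires it.

```python
import struct

eight_bytes = struct.calcsize("@8B")  # eight 1-byte fields, native alignment
packed      = struct.calcsize("=BQ")  # 1 byte + 8-byte integer, no alignment
aligned     = struct.calcsize("@BQ")  # same two fields, native alignment

print(eight_bytes)  # 8: consecutive bytes are packed, no per-byte padding
print(packed)       # 9
print(aligned)      # 16 on typical x86-64 builds: 7 padding bytes before the Q
```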

0 votes

2 answers

555 views

How does MS Word render different fonts?

My main goal is described here. How can Microsoft Word, WordPad, or other word-processing software render fonts when these fonts seem not to follow the same rules? How do they render characters ...

HKhoshdel

11

asked Aug 8, 2018 at 6:26

0 votes

2 answers

2k views

Why Unicode Encoding/Decoding is Necessary in JavaScript

I am wondering why Unicode encoding is necessary in JavaScript. I am looking at utf8.js as an example. I am also looking at the UTF-8 spec, but am not really following the different pieces of data. ...

Lance Pollard

2,787

asked Jul 23, 2018 at 21:45
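JavaScript strings are sequences of UTF-16 code units, while files and network protocols usually carry bytes, most often UTF-8. That mismatch is presumably the gap a library such as utf8.js fills. The small Python comparison below shows three ways of counting a string's length. The helper names are hypothetical.

```python
def utf16_units(s: str) -> int:
    """UTF-16 code units: what JavaScript's String length property counts."""
    return len(s.encode("utf-16-le")) // 2

def utf8_bytes(s: str) -> int:
    """Bytes needed once the string is encoded as UTF-8 for transport or storage."""
    return len(s.encode("utf-8"))

for s in ["abc", "\u00e9", "\u20ac", "\U0001F600"]:
    # Python's len() counts code points
    print(f"{s!r}: code points={len(s)}, UTF-16 units={utf16_units(s)}, UTF-8 bytes={utf8_bytes(s)}")
```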

0 votes

1 answer

1k views

Does Java take 2 bytes to represent a character?

In general, a character is represented in 1 byte, i.e. 8 bits. I believe this is true for all text editors, and even for databases like Oracle. 1 byte can represent 2^8 = 256 characters. My question is when ...

user3198603

1,896

asked Jul 6, 2018 at 14:31
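Java's `char` is a 16-bit UTF-16 code unit, so most common characters take 2 bytes as a `char`. Code points above U+FFFF need two `char`s, called a surrogate pair. The arithmetic is sketched in Python below. The function name `to_utf16_units` is hypothetical.

```python
from typing import List

def to_utf16_units(cp: int) -> List[int]:
    """Split a code point into the 16-bit units a Java char[] would hold."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x10000:
        return [cp]                  # BMP: a single char
    v = cp - 0x10000                 # 20 bits remain
    high = 0xD800 + (v >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]

print([hex(u) for u in to_utf16_units(0x41)])     # ['0x41']
print([hex(u) for u in to_utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```

Since Java 9, `String` may store Latin-1-only content at one byte per character internally (compact strings), but `char` itself is still 16 bits.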

50 votes

4 answers

46k views

Should UTF-8 CSV files contain a BOM (byte order mark)?

Our line-of-business software allows the user to save certain data as CSV. Since there are a lot of different formats (all called "CSV") in use in the wild, we are trying to decide what the ...

Heinzi

9,868

asked Jun 18, 2018 at 7:36
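The sketch below shows what is at stake at the byte level, using Python's `utf-8-sig` codec. A reader that decodes as plain UTF-8 sees the BOM as U+FEFF glued to the first header field. A BOM-aware reader strips it. The sample row is made up.

```python
import codecs

row = "name,city\nZo\u00eb,Z\u00fcrich\n"
with_bom = row.encode("utf-8-sig")   # prepends EF BB BF
without_bom = row.encode("utf-8")

print(with_bom[:3] == codecs.BOM_UTF8)           # True
# "utf-8-sig" strips a leading BOM if present and also accepts BOM-less input
print(with_bom.decode("utf-8-sig") == row)       # True
print(without_bom.decode("utf-8-sig") == row)    # True
# Plain "utf-8" keeps the BOM as U+FEFF at the start of the first field
print(repr(with_bom.decode("utf-8").split(",")[0]))  # '\ufeffname'
```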

8 votes

1 answer

4k views

Is the BOM optional for UTF-16 and UTF-32?

I used to think that the BOM is optional for UTF-8, but mandatory for UTF-16 and UTF-32. But then I read the following (in this article): Let's look just at the ones that Notepad supports. ...

user9002947

249

asked Apr 28, 2018 at 5:11
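In the Unicode standard, the UTF-16LE and UTF-16BE labels carry no BOM. Plain "UTF-16" may start with one. Without a BOM or a higher-level protocol, "UTF-16" is read as big-endian. The minimal sketch below shows a reader following that rule. The function name is hypothetical. Extending it to UTF-32 would require checking the UTF-32 BOMs first, because the UTF-32LE BOM begins with the same two bytes as the UTF-16LE one.

```python
import codecs

def decode_utf16(data: bytes) -> str:
    """Decode bytes labeled "UTF-16": honor a BOM, else assume big-endian."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    return data.decode("utf-16-be")

print(decode_utf16(b"\xff\xfeh\x00i\x00"))  # hi  (little-endian, with BOM)
print(decode_utf16(b"\xfe\xff\x00h\x00i"))  # hi  (big-endian, with BOM)
print(decode_utf16(b"\x00h\x00i"))          # hi  (no BOM: big-endian default)

# Python's endian-specific codecs never write a BOM:
print("hi".encode("utf-16-le"))             # b'h\x00i\x00'
```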

6 votes

3 answers

3k views

Why does Unicode have separate codepoints for characters with identical glyphs?

(Not entirely sure whether this should go in the information-security StackExchange instead; feel free to move it there if that's where it belongs.) Unicode has many, many instances of pairs or ...

Vikki

179

asked Apr 4, 2018 at 22:32
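Separate code points preserve each character's identity, including its script, case mapping and sorting behavior. That same separation is why lookalikes are a security concern. The short Python sketch below lists a few lookalikes and adds a deliberately crude mixed-script check. `crude_scripts` is a hypothetical heuristic based on character names. Real implementations should use the Script property (UAX #24) and the confusable data in UTS #39.

```python
import unicodedata

lookalikes = ["A", "\u0391", "\u0410"]  # Latin A, Greek Alpha, Cyrillic A
for ch in lookalikes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0041 LATIN CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA
# U+0410 CYRILLIC CAPITAL LETTER A

# Normalization does not merge them: they are distinct letters, not variants.
print(all(unicodedata.normalize("NFKC", ch) == ch for ch in lookalikes))  # True

def crude_scripts(s: str) -> set:
    """Crude heuristic: first word of each letter's name (NOT the real Script property)."""
    return {unicodedata.name(ch).split()[0] for ch in s if ch.isalpha()}

print(crude_scripts("paypal"))       # {'LATIN'}
print(crude_scripts("p\u0430ypal"))  # Cyrillic small a hidden among Latin letters
```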
