Questions tagged [utf-8]
For questions about the character encoding for Unicode.
21 questions
-1
votes
1
answer
550
views
Should a Java project use UTF-16? [closed]
Java, by default, uses UTF-16 to represent characters in the String data type.
I inherited a JavaFX project which currently has some Strings in UTF-8 and others in UTF-16. This is causing bugs (in pop-...
5
votes
1
answer
428
views
UTF-8 questions
When you encode a code point to code units in UTF-8, if the code point fits in 7 bits, the most significant bit is set to zero, which tells you it is a character stored on 1 ...
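The rule in this excerpt (a leading 0 bit marks a one-byte character) can be sketched as a small classifier for the first byte of a sequence. This is an illustrative sketch only: `utf8_sequence_length` is a hypothetical name, and the function inspects only the prefix bits, so it does not reject overlong or out-of-range lead bytes the way a full validator would.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 sequence occupies, judged from its first byte."""
    if (lead_byte & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx: 7-bit code point
        return 1
    if (lead_byte & 0b1110_0000) == 0b1100_0000:  # 110xxxxx
        return 2
    if (lead_byte & 0b1111_0000) == 0b1110_0000:  # 1110xxxx
        return 3
    if (lead_byte & 0b1111_1000) == 0b1111_0000:  # 11110xxx
        return 4
    # 10xxxxxx is a continuation byte; anything else is not a valid lead byte.
    raise ValueError(f"not a UTF-8 lead byte: {lead_byte:#04x}")


for ch in ("A", "é", "€", "😀"):
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), utf8_sequence_length(encoded[0]))
```

For each sample character the length derived from the first byte matches the length of the encoding Python produces.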
4
votes
3
answers
2k
views
Are international UTF-8 e-mail addresses a thing or not?
RFC 6530 defines the necessary steps for "international e-mail" (i.e., especially for UTF-8 e-mail addresses). Apparently Google adopted the RFC back in 2014 (source). Still, most validators ...
0
votes
2
answers
2k
views
Why Unicode Encoding/Decoding is Necessary in JavaScript
I am wondering why Unicode encoding is necessary in JavaScript. I am looking at utf8.js as an example. I am also looking at the UTF-8 spec, but am not really following the different pieces of data. ...
8
votes
1
answer
615
views
Do C++'s iterator categories forbid writing a UTF-8 iterator adapter?
I've been working on a UTF-8 iterator adapter. By which, I mean an adapter that turns an iterator to a char or unsigned char sequence into an iterator to a char32_t sequence. My work here was inspired ...
2
votes
2
answers
900
views
Is there a good single-byte delimiter for use with UTF-8 strings that isn't a null terminator?
I'm looking for a quick way to split strings containing individual JSON payloads. Currently, I'm using newlines and searching for the newline ASCII character, but I figure if I start using UTF-8 this ...
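As a hedged illustration of the newline approach described here: in UTF-8, every byte of a multi-byte sequence has its high bit set, so the byte 0x0A can only ever be an actual newline. The sketch below assumes each payload is compact JSON with no raw newlines inside it; `split_payloads` is a hypothetical helper name.

```python
def split_payloads(stream: bytes) -> list:
    """Split a newline-delimited UTF-8 byte stream into decoded text payloads."""
    # Splitting on the raw byte is safe: 0x0A never occurs inside a
    # multi-byte UTF-8 sequence, whose bytes are all >= 0x80.
    return [chunk.decode("utf-8") for chunk in stream.split(b"\n") if chunk]


payloads = ['{"name": "Zoë"}', '{"city": "東京"}', '{"emoji": "😀"}']
stream = "\n".join(payloads).encode("utf-8")
print(split_payloads(stream) == payloads)
```

The same reasoning applies to any other ASCII byte used as a delimiter, as long as it cannot appear unescaped inside a payload.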
89
votes
5
answers
10k
views
Would UTF-8 be able to support the inclusion of a vast alien language with millions of new characters?
In the event an alien invasion occurred and we were forced to support their languages in all of our existing computer systems, is UTF-8 designed in a way to allow for their possibly vast amount of ...
-2
votes
2
answers
351
views
Does UTF-16 have some kind of separator in it?
The second sentence of UTF-16's Wikipedia entry states that it is a variable-length encoding.
But where is the separator between a 16-bit character and 32-bit encoding? I know a lot of characters ...
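A minimal sketch of how a decoder can tell the two cases apart without any separator: UTF-16 reserves the range 0xD800 to 0xDBFF for the first unit of a two-unit (surrogate) pair, so the value of the first 16-bit unit is enough. The helper names `utf16_units` and `utf16_unit_count` are hypothetical, chosen for illustration.

```python
def utf16_units(text: str) -> list:
    """Return the 16-bit code units of text in UTF-16 (big-endian, no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def utf16_unit_count(first_unit: int) -> int:
    """Return 2 if first_unit opens a surrogate pair, otherwise 1."""
    if 0xD800 <= first_unit <= 0xDBFF:  # high (lead) surrogate range
        return 2
    return 1


for ch in ("A", "€", "😀"):
    units = utf16_units(ch)
    print(ch, [hex(u) for u in units], utf16_unit_count(units[0]))
```

Because no ordinary character is ever assigned a value in the surrogate ranges, a lone 16-bit character can never be mistaken for half of a pair.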
70
votes
6
answers
49k
views
Should Latin-1 be used over UTF-8 when it comes to database configuration?
We are using MySQL at the company I work for, and we build both client-facing and internal applications using Ruby on Rails.
When I started working here, I ran into a problem that I had never ...
2
votes
2
answers
1k
views
How to detect client character encoding?
I wrote a telnet server in C, but I have trouble sending accented characters (é, è, à ...). The character encoding differs between telnet clients (...
21
votes
4
answers
4k
views
Why does UTF-8 waste several bits in its encoding
According to the Wikipedia article, UTF-8 has this format:
First code point   Last code point   Bytes used   Byte 1     Byte 2   Byte 3   Byte 4
U+0000             U+007F            1            0xxxxxxx
U+0080             U+...
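To make the byte layout in this table concrete, here is a hand-written encoder sketch. The excerpt is cut off after the first row, so the ranges for two, three, and four bytes below come from the standard UTF-8 definition (RFC 3629) rather than from the question text; `encode_utf8` is a hypothetical name, and the result is checked against Python's built-in encoder.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by filling the x bits by hand."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode scalar value")
    if code_point <= 0x7F:        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:      # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}", encode_utf8(cp).hex(), encode_utf8(cp) == chr(cp).encode("utf-8"))
```

The fixed prefix bits in each byte are the ones that look "wasted" at first glance; they are what lets a reader identify lead and continuation bytes from any position in the stream.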
2
votes
2
answers
6k
views
When is it beneficial to not use utf-8? [duplicate]
When is it beneficial to use encodings other than UTF-8? Aside from dealing with pre-Unicode documents, that is. And more importantly, why isn't UTF-8 the default in most languages? That is, why do I ...
1
vote
1
answer
3k
views
UTF-16 Pitfalls, Chinese
I'm going to be writing an application that is pure HTML5 and JS with an MVC.net back-end.
We have .resx files that are getting compiled to .js files for resources in the html5 application. The ...
171
votes
6
answers
856k
views
How to detect the encoding of a file?
I have some SQL script files on Windows 7.
When opened with Notepad++, in the "Encoding" menu some of them are reported to have an encoding of "UCS-2 Little Endian" and some of ...
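One common first step for this kind of detection is to look for a byte order mark (BOM) at the start of the file. The sketch below is an assumption about how such a check might look, not a description of what Notepad++ does; `guess_encoding_from_bom` is a hypothetical name, and the BOM constants are from Python's standard `codecs` module. A file without a BOM returns `None`, in which case only heuristics remain.

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding_from_bom(data: bytes):
    """Return an encoding name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


sample = codecs.BOM_UTF16_LE + "SELECT 1;".encode("utf-16-le")
print(guess_encoding_from_bom(sample))
print(guess_encoding_from_bom(b"SELECT 1;"))
```

In practice only the first few bytes of the file need to be read, for example `open(path, "rb").read(4)`. A UTF-16-LE file whose first character is U+0000 would be misreported as UTF-32-LE by this check, which is one reason a BOM is a hint and not proof.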
585
votes
1
answer
76k
views
Is the use of "utf8=✓" preferable to "utf8=true"?
I have recently seen a few URIs containing the query parameter "utf8=✓". My first impression (after thinking "mmm, looks cool") was that this could be used to detect a broken character encoding.
So, ...
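To illustrate the idea in this excerpt, the sketch below percent-encodes the check mark (U+2713) as UTF-8 and then decodes it twice: once as UTF-8 and once as Latin-1. How a particular server actually uses the parameter is not stated in the excerpt, so treat this purely as a demonstration of why a non-ASCII marker could reveal a wrong decoding.

```python
from urllib.parse import quote, unquote

check = "\u2713"  # the check mark character

# Percent-encode the character using its UTF-8 bytes (E2 9C 93).
as_utf8 = quote(check, encoding="utf-8")

# Decoding with the same encoding restores the character ...
round_trip = unquote(as_utf8, encoding="utf-8")

# ... while decoding the same bytes as Latin-1 yields three unrelated characters.
misread = unquote(as_utf8, encoding="latin-1")

print(as_utf8)
print(round_trip == check)
print(misread == check, len(misread))
```

A plain ASCII value such as `true` decodes identically under both encodings, so it could not expose this kind of mismatch.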