
Questions tagged [utf-8]

For questions about UTF-8, a variable-length character encoding for Unicode.

-1 votes
1 answer
550 views

Java, by default, uses UTF-16 to represent characters in the String data type. I inherited a JavaFX project which currently has some Strings in UTF-8 and others in UTF-16. This is causing bugs (in pop-...
chilliefiber
5 votes
1 answer
428 views

When you encode a code point into code units using UTF-8, if the code point fits in 7 bits, the most significant bit is set to zero, which tells you it is a character stored in 1 ...
codepersonnel49
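
The bit-prefix scheme this question describes can be sketched as a byte classifier. This is a minimal Python sketch; `classify_utf8_byte` is a hypothetical helper name, and it checks only the high-bit patterns, not every rule for well-formed UTF-8:

```python
def classify_utf8_byte(b: int) -> str:
    """Describe the role one byte plays in UTF-8, judged only by its high bits.

    Bytes that match a pattern but never occur in valid UTF-8 (0xC0, 0xC1,
    0xF5 to 0xF7) are not rejected here.
    """
    if (b & 0b1000_0000) == 0:            # 0xxxxxxx: a complete 1-byte (ASCII) character
        return "1-byte character"
    if (b & 0b1100_0000) == 0b1000_0000:  # 10xxxxxx: continuation of a multi-byte sequence
        return "continuation"
    if (b & 0b1110_0000) == 0b1100_0000:  # 110xxxxx: starts a 2-byte sequence
        return "lead of 2"
    if (b & 0b1111_0000) == 0b1110_0000:  # 1110xxxx: starts a 3-byte sequence
        return "lead of 3"
    if (b & 0b1111_1000) == 0b1111_0000:  # 11110xxx: starts a 4-byte sequence
        return "lead of 4"
    return "invalid"                      # 11111xxx: never used by UTF-8


for byte in "aé€".encode("utf-8"):
    print(f"{byte:08b}  {classify_utf8_byte(byte)}")
```

Because continuation bytes have their own prefix (10), a decoder that lands in the middle of a character can always tell it is not at a character boundary.
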
4 votes
3 answers
2k views

RFC 6530 defines the necessary steps for "international e-mail" (i.e., especially for UTF-8 e-mail addresses). Apparently Google adopted the RFC back in 2014 (source). Still, most validators ...
D.R. • 241
0 votes
2 answers
2k views

I am wondering why Unicode encoding is necessary in JavaScript. I am looking at utf8.js as an example. I am also looking at the UTF-8 spec, but am not really following the different pieces of data. ...
Lance Pollard
8 votes
1 answer
615 views

I've been working on a UTF-8 iterator adapter. By which, I mean an adapter that turns an iterator to a char or unsigned char sequence into an iterator to a char32_t sequence. My work here was inspired ...
Nicol Bolas • 12.1k
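
The idea of an adapter from a byte sequence to a code-point sequence can be sketched in Python as a generator. This is a hypothetical `utf8_code_points` helper, not the asker's C++ adapter; it raises `ValueError` on malformed input (bad lead bytes, truncated sequences, overlong forms, surrogates, values above U+10FFFF) rather than substituting U+FFFD, which is only one of several reasonable error policies:

```python
from typing import Iterable, Iterator


def utf8_code_points(data: Iterable[int]) -> Iterator[int]:
    """Lazily turn an iterable of byte values into Unicode code points."""
    it = iter(data)
    for lead in it:
        if lead < 0x80:                   # ASCII: the byte is the code point
            yield lead
            continue
        if 0xC2 <= lead <= 0xDF:          # 2-byte sequence (0xC0/0xC1 would be overlong)
            length, cp, minimum = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:        # 3-byte sequence
            length, cp, minimum = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF4:        # 4-byte sequence
            length, cp, minimum = 4, lead & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X}")
        for _ in range(length - 1):
            cont = next(it, None)         # pull continuation bytes from the same iterator
            if cont is None or (cont & 0xC0) != 0x80:
                raise ValueError("truncated or malformed sequence")
            cp = (cp << 6) | (cont & 0x3F)  # append 6 payload bits
        if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point U+{cp:04X}")
        yield cp


print("".join(chr(cp) for cp in utf8_code_points(b"caf\xc3\xa9")))  # café
```

As with an iterator adapter, nothing is decoded until the consumer asks for the next code point.
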
2 votes
2 answers
900 views

I'm looking for a quick way to split strings containing individual JSON payloads. Currently, I'm using newlines and searching for the newline ASCII character, but I figure if I start using UTF-8 this ...
Mikey A. Leonetti
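
Splitting on the raw newline byte stays safe with UTF-8: every byte of a multi-byte sequence has its high bit set, so the byte 0x0A can only ever mean an actual newline. A minimal Python sketch, assuming each JSON payload is serialized on a single line (JSON forbids raw newlines inside strings, so they can only appear between tokens in pretty-printed output); `split_json_lines` is a hypothetical name:

```python
import json


def split_json_lines(buffer: bytes) -> list:
    """Parse newline-delimited JSON payloads from a UTF-8 byte buffer."""
    payloads = []
    # Safe on raw bytes: 0x0A never appears inside a multi-byte UTF-8 sequence,
    # whose bytes all have the high bit set.
    for line in buffer.split(b"\n"):
        if line.strip():  # skip empty lines, e.g. after a trailing newline
            payloads.append(json.loads(line.decode("utf-8")))
    return payloads


buf = '{"name": "Zoë"}\n{"emoji": "😀"}\n'.encode("utf-8")
print(split_json_lines(buf))
```
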
89 votes
5 answers
10k views

In the event an alien invasion occurred and we were forced to support their languages in all of our existing computer systems, is UTF-8 designed in a way to allow for their possibly vast amount of ...
Qix - MONICA WAS MISTREATED
-2 votes
2 answers
351 views

From UTF-16's Wikipedia entry, the second sentence states that it's a variable-length encoding. But where is the separator between a 16-bit character and a 32-bit encoding? I know a lot of characters ...
Arrow • 127
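
UTF-16 needs no separator because the value of each 16-bit unit says what it is: a unit in 0xD800 to 0xDBFF (a high surrogate) must be followed by one in 0xDC00 to 0xDFFF (a low surrogate), and together the pair encodes a code point above U+FFFF; every other unit is a complete character on its own. A minimal Python sketch with hypothetical helper names:

```python
import struct


def utf16_units(text: str) -> list:
    """Return the 16-bit code units of text encoded as UTF-16 big-endian (no BOM)."""
    raw = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(raw) // 2}H", raw))


def decode_utf16_units(units: list) -> list:
    """Turn 16-bit code units back into code points, joining surrogate pairs."""
    code_points, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at index {i}")
            low = units[i + 1]
            # Each surrogate carries 10 bits of (code point - 0x10000).
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            code_points.append(u)  # any other unit is a whole character by itself
            i += 1
    return code_points


print([hex(u) for u in utf16_units("A😀")])  # ['0x41', '0xd83d', '0xde00']
```
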
70 votes
6 answers
49k views

We are using MySQL at the company I work for, and we build both client-facing and internal applications using Ruby on Rails. When I started working here, I ran into a problem that I had never ...
Ten Bitcomb • 1,174
2 votes
2 answers
1k views

I programmed a telnet server in C, but I have a problem sending accented characters (é, è, à, ...). The character encoding is different between the telnet clients (...
ipStack • 121
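
Accented letters are where these encodings actually disagree: UTF-8, Latin-1 and old DOS code pages all agree on ASCII, but é, è and à map to different bytes in each. The sketch below is in Python for brevity (the byte values are the same for a C server); `show_bytes` is a hypothetical helper, and CP850 is just one example of a legacy console code page. Which encoding a particular telnet client expects is an assumption the server has to configure or negotiate:

```python
def show_bytes(text: str, encoding: str) -> str:
    """Hex dump of text encoded with the given codec."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


# The same three letters become different bytes in each encoding.
for enc in ("utf-8", "latin-1", "cp850"):
    print(f"{enc:8} {show_bytes('éèà', enc)}")

# A client that decodes the server's UTF-8 output as Latin-1 shows mojibake
# (the last character here is an invisible no-break space, U+00A0).
print("éèà".encode("utf-8").decode("latin-1"))
```
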
21 votes
4 answers
4k views

According to the Wikipedia article, UTF-8 has this format:

First code point | Last code point | Bytes used | Byte 1 | Byte 2 | Byte 3 | Byte 4
U+0000 | U+007F | 1 | 0xxxxxxx | | |
U+0080 | U+...
qbt937 • 321
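
The table translates directly into an encoder that fills the x bits of each pattern from the code point. The excerpt is cut off after the first row; the remaining ranges used below (up to U+07FF, U+FFFF and U+10FFFF) are those of the UTF-8 definition in RFC 3629. `encode_utf8` is a hypothetical name, and the result can be checked against Python's built-in codec:

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point by filling the x bits of the UTF-8 byte patterns."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:    # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:   # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


for ch in "Aé€😀":
    print(f"U+{ord(ch):04X} -> {encode_utf8(ord(ch)).hex(' ')}")
```
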
2 votes
2 answers
6k views

When is it beneficial to use encodings other than UTF-8? Aside from dealing with pre-Unicode documents, that is. And more importantly, why isn't UTF-8 the default in most languages? That is, why do I ...
Electric Coffee
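
One case where another encoding can be more compact: characters in U+0800 to U+FFFF, which include most CJK text, take 3 bytes in UTF-8 but 2 in UTF-16, while ASCII takes 1 byte in UTF-8 and 2 in UTF-16. A small Python comparison (`encoded_sizes` is a hypothetical helper); real documents that mix ASCII markup with text will land somewhere in between:

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 size, UTF-16 size) in bytes; utf-16-le is used so no BOM is counted."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for sample in ("Hello, world", "こんにちは世界"):
    utf8, utf16 = encoded_sizes(sample)
    print(f"{len(sample):2} chars  UTF-8: {utf8:2} bytes  UTF-16: {utf16:2} bytes")
```
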
1 vote
1 answer
3k views

I'm going to be writing an application that is pure HTML5 and JS and MVC.net back-end. We have .resx files that are getting compiled to .js files for resources in the html5 application. The ...
maxfridbe • 351
171 votes
6 answers
856k views

I have some SQL script files on Windows 7. When opened with Notepad++, in the "Encoding" menu some of them are reported to have an encoding of "UCS-2 Little Endian" and some of ...
Marcel • 3,172
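
Files that Notepad++ reports as "UCS-2 Little Endian" typically begin with the UTF-16LE byte order mark FF FE, whereas UTF-8 files may start with EF BB BF or have no BOM at all. A minimal Python sketch of BOM sniffing; `sniff_bom` and `read_text` are hypothetical helpers, and falling back to UTF-8 when no BOM is present is an assumption, since a BOM-less file could also be in a legacy code page. UCS-2 text without surrogates decodes identically as UTF-16:

```python
import codecs

# Checked longest-first so the UTF-32 LE BOM (FF FE 00 00) is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def read_text(path: str, fallback: str = "utf-8") -> str:
    """Decode a file using its BOM if present, otherwise the assumed fallback encoding."""
    with open(path, "rb") as f:
        data = f.read()
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or fallback)
```
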
585 votes
1 answer
76k views

I have recently seen a few URIs containing the query parameter "utf8=✓". My first impression (after thinking "mmm, looks cool") was that this could be used to detect a broken character encoding. So, ...
Gary • 24.4k
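
On the wire, a UTF-8 form submission sends the check mark (U+2713) as the three percent-encoded bytes %E2%9C%93. The Python sketch below shows that encoding, and what a server would see if it wrongly decoded those bytes as Windows-1252, which is the kind of visible corruption the asker suggests such a parameter could reveal:

```python
from urllib.parse import parse_qs, urlencode

# A UTF-8 form submission sends U+2713 as three percent-encoded bytes.
query = urlencode({"utf8": "✓"})
print(query)            # utf8=%E2%9C%93
print(parse_qs(query))  # {'utf8': ['✓']}

# If the server wrongly decodes those same bytes as Windows-1252,
# the value no longer equals "✓", which is easy to detect.
misread = "✓".encode("utf-8").decode("cp1252")
print(misread == "✓", misread)
```
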
