
I have a printer and an SDK to work with it in Java. The printer works well with English letters and digits, but it does not print special symbols like 'ä' or 'ê' correctly.

I suppose I need to convert the string to the charset used by the printer. But I don't know what that charset is, and I have no chance to find out right now.

I run test printing with this simple code:

for (int i = 0; i < 256; i++) {
    byte[] a = new byte[1];
    a[0] = (byte) i;
    printer.print((i + " ").getBytes()); // print the code value as a label
    printer.print(a);                    // print the raw byte itself
    printer.newLine();
}

And now I know that 'ä' has code 132 and 'ê' has code 136.

How can I find the charset name when I know which code corresponds to which symbol?

  • "I don't know what is the charset [of the printer]". Create a list of candidate charsets and then print a test paragraph containing accented characters using each of the candidate charsets. Then observe the printed output to determine which of the candidates is most likely to be correct. Commented Dec 18, 2016 at 22:31
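The trial-print approach suggested in this comment can be sketched as follows. The candidate list below is an assumption to adjust for your printer (DOS code pages and Western European charsets are common in receipt printers), and `printer.print` is the SDK call from the question, shown commented out so the sketch runs standalone:

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

public class CandidatePrintTest {
    public static void main(String[] args) {
        // Assumed candidates; extend or trim for your hardware.
        List<String> candidates = Arrays.asList("IBM437", "IBM850", "windows-1252", "ISO-8859-1");
        String sample = "äêöüéè"; // accented test characters

        for (String name : candidates) {
            byte[] encoded = sample.getBytes(Charset.forName(name));
            // printer.print(encoded);   // hypothetical SDK call from the question
            // printer.newLine();
            System.out.println(name + " -> " + Arrays.toString(encoded));
        }
    }
}
```

Print the same sample once per candidate, label each line with the charset name, and the paragraph that comes out readable tells you which candidate matches the printer's firmware.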

2 Answers


Characters are one of the most confusing things in computer science. This, and the fact that computer science largely started in English-speaking countries that adopted ASCII (the American Standard Code for Information Interchange), is responsible for the miserable existence characters lead inside a computer. Chinese text and emoji were not a consideration back then. So while "byte == letter" is still treated as synonymous by most programming languages and programmers, they are actually two different concepts. Apple introduced the concept of glyphs in its APIs a long time ago, but people are still lazy, and that will certainly not change in the near future. Unicode tried to remedy the lack of glyph variety you can represent in a "character string", so it introduced multi-byte characters and opened a can of worms. Instead of making things easier, the situation got worse. People and programming languages mix it all up, and everyone who has ever dealt with code conversion ends up whining along with all the other poor creatures.

How could you find out the encoding if you don't know it? Make a statistical analysis of the text. First decide whether it is Unicode or not. If not, make a good guess at the language the original text is written in (which is itself hard), and then decode it with the code page typical for that language.
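In this particular question there is one more clue than pure guessing allows for: the asker already knows that byte 132 prints as 'ä'. Under that assumption, any charset whose decoding maps 132 to 'ä' is a candidate; a minimal sketch that scans the JRE's installed charsets:

```java
import java.nio.charset.Charset;

public class GuessFromKnownByte {
    public static void main(String[] args) {
        byte[] data = { (byte) 132 }; // byte observed to print as 'ä'

        for (Charset cs : Charset.availableCharsets().values()) {
            // Decoding never throws here; unmappable bytes become U+FFFD.
            String decoded = new String(data, cs);
            if ("ä".equals(decoded)) {
                System.out.println(cs.name()); // candidate charset
            }
        }
    }
}
```

This still yields several candidates (a number of DOS code pages share that mapping), which is why the "right information" from the printer vendor remains the only definitive answer.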

TL;DR You can't. You need to be supplied with the right information.

  • Multibyte character sets weren't "invented" by Unicode. They existed long before, e.g. in China and Japan. Commented Dec 18, 2016 at 22:11
  • @JörgWMittag Maybe (I'm no historian), but they only became popular with Unicode. We can't blame China/Japan for the "String disaster". Commented Dec 19, 2016 at 5:44

First I get all supported charsets and filter them by a symbol the printer prints correctly (the English 'A', code 65):

SortedMap<String, Charset> charsets = Charset.availableCharsets();
for (Map.Entry<String, Charset> entry : charsets.entrySet()) {
    try {
        String symbol = "A";
        byte[] bytes = symbol.getBytes(entry.getKey());
        if (((int) bytes[0]) == 65) {
            print(entry.getKey()); // charset encodes 'A' as 65, keep it as a candidate
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

This gives a short list of suitable charsets. Then I filter that list by the problem symbols and find the necessary charset.
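The second filtering step can be sketched with the codes observed in the question (132 for 'ä', 136 for 'ê'); which charsets survive depends on what your JRE ships:

```java
import java.nio.charset.Charset;

public class FilterByProblemSymbols {
    public static void main(String[] args) {
        for (Charset cs : Charset.availableCharsets().values()) {
            if (!cs.canEncode()) {
                continue; // some charsets (e.g. auto-detect ones) cannot encode
            }
            byte[] a = "ä".getBytes(cs);
            byte[] e = "ê".getBytes(cs);
            // Keep single-byte encodings matching both observed codes.
            if (a.length == 1 && (a[0] & 0xFF) == 132
                    && e.length == 1 && (e[0] & 0xFF) == 136) {
                System.out.println(cs.name());
            }
        }
    }
}
```

On a typical JDK this narrows the list to a handful of DOS code pages such as IBM437 and IBM850, which share those mappings; a third distinguishing character would be needed to tell them apart.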

  • Uhm. What's that? A solution to your problem? Commented Dec 18, 2016 at 16:18
  • @ThomasKilian yes, I found the needed charset this way. Commented Dec 18, 2016 at 16:26
  • Good luck with that. Commented Dec 18, 2016 at 16:37
