How do I properly encode strings in Java? I'm trying to encode the letter Ü in UTF-8 and I'm getting garbage results: d093d19a instead of C39C. What could be the problem?

package org.example;

import java.io.*;
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Hex;

public class Main {
    public static String convertStringToHex(String str) {
        char[] chars = Hex.encodeHex(str.getBytes(StandardCharsets.UTF_8));

        return String.valueOf(chars);
    }

    public static String convertHexToString(String hex) {

        String result = "";
        try {
            byte[] bytes = Hex.decodeHex(hex);
            result = new String(bytes, StandardCharsets.UTF_8);
        } catch (DecoderException e) {
            // Keep the original exception as the cause so the stack trace is not lost.
            throw new IllegalArgumentException("Invalid Hex format!", e);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("encoder Ü: " + convertStringToHex("Ü"));
        System.out.println("decoder Ü error: " + convertHexToString("d093d19a"));
        System.out.println("decoder Ü: " + convertHexToString("C39C"));
    }
}
Output:
encoder Ü: d093d19a
decoder Ü error: Ü
decoder Ü: ?

I work on Windows. I know that Windows uses its own encoding; can this affect the final result?

  • Edit the question to add the output formatted as a code block. Commented Sep 16, 2024 at 12:02
  • For capital U with diaeresis (umlaut), try \u00DC instead of Ü in the Java code. It might be that the editor uses a different encoding than the compiler, which would cause this error. Commented Sep 16, 2024 at 12:34
  • Also, since Java 17, HexFormat exists; for several techniques see Baeldung's baeldung.com/java-byte-arrays-hex-strings Commented Sep 16, 2024 at 12:41
  • Try it in PowerShell, first doing [Console]::InputEncoding = [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding Commented Sep 16, 2024 at 13:09
  • You have a non-UTF-8 encoding in your editor and in your output display. Your convertHexToString("C39C") definitely creates the string "\u00DC" irrespective of your Java default charset. So, since you can't display "Ü", you're not using the same encoding in your output window as Java is using to output it. Commented Sep 16, 2024 at 14:29
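
To illustrate what the comments describe, here is a minimal sketch. It assumes the source file was saved as UTF-8 but read by the compiler as windows-1251; that charset is a guess, chosen because it reproduces exactly the d093d19a from the question. The class name MojibakeDemo and the helpers toHex and misread are made up for this example, and HexFormat needs Java 17 or later.

```java
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class MojibakeDemo {

    // Hex of the UTF-8 bytes of a string (HexFormat emits lowercase digits).
    public static String toHex(String s) {
        return HexFormat.of().formatHex(s.getBytes(StandardCharsets.UTF_8));
    }

    // Simulates a compiler reading UTF-8 source bytes with the wrong charset.
    public static String misread(String s, Charset assumed) {
        return new String(s.getBytes(StandardCharsets.UTF_8), assumed);
    }

    public static void main(String[] args) {
        // The \u00DC escape is pure ASCII, so the source file encoding cannot break it.
        String u = "\u00DC";
        System.out.println(toHex(u)); // c39c

        // Assumption: the compiler decoded the file as windows-1251.
        // Bytes C3 9C then become the two Cyrillic letters U+0413 and U+045A.
        String broken = misread(u, Charset.forName("windows-1251"));
        System.out.println(toHex(broken)); // d093d19a

        // Decoding C39C always yields U+00DC; whether it shows up as a glyph
        // depends on the console. Writing UTF-8 explicitly only helps if the
        // console itself is set to UTF-8.
        String decoded = new String(HexFormat.of().parseHex("C39C"), StandardCharsets.UTF_8);
        PrintStream out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        out.println(decoded);
    }
}
```

If this matches what is happening, compiling with javac -encoding UTF-8 (or setting the equivalent option in the IDE or build tool) should make the literal "Ü" encode to c39c as well. The "?" in the last output line of the question would then be a separate console display problem, not a decoding error.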
