How to write a CSV file in UTF-8 with Apache Commons CSV?

I am trying to generate a CSV file with the following code. Files.newBufferedWriter() encodes text as UTF-8 by default, but when I open the generated file in Excel, I see garbled characters.

I create CSVPrinter like this:

CSVPrinter csvPrinter = new CSVPrinter(Files.newBufferedWriter(Paths.get(filePath)), CSVFormat.EXCEL);

Next, I print the headers:

csvPrinter.printRecord(headers);

Then, in a loop, I print the values to the writer like this:

csvPrinter.printRecord("value1", "valu2", ...);

I also tried uploading the file to an online CSV lint validator, and it reports that I am using ASCII-8BIT instead of UTF-8. What did I do wrong?

  • ASCII characters are encoded the same way in UTF-8 as they are in ASCII. Your code only uses ASCII characters, so there's no way to distinguish between ASCII and UTF-8 when looking at the file. Commented Jul 19, 2019 at 12:27
  • Instead of CSVFormat.EXCEL, try using CSVFormat.RFC4180. Commented Jul 19, 2019 at 12:28
  • @Deadpool doesn't help :/ Commented Jul 19, 2019 at 12:32
  • something like this CSVPrinter printer = new CSVPrinter(new PrintWriter("nlp.csv", "UTF-8"), CSVFormat.EXCEL.withDelimiter("|".charAt(0))); @DenisStephanov Commented Jul 19, 2019 at 12:33
  • @Deadpool it still doesn't work. Commented Jul 19, 2019 at 12:39
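
To illustrate the first comment above: a minimal sketch, using only the JDK, of why a validator cannot tell ASCII from UTF-8 when the content is pure ASCII. The class and method names are hypothetical, and the sample strings are made up for the demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiVsUtf8 {
    /** Returns true when the text encodes to identical bytes in US-ASCII and UTF-8. */
    static boolean sameBytesInAsciiAndUtf8(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        // Pure ASCII content: both encodings produce the same bytes
        System.out.println(sameBytesInAsciiAndUtf8("value1,value2")); // true
        // A non-ASCII character (é) takes two bytes in UTF-8 and cannot be
        // represented in US-ASCII, so the byte sequences differ
        System.out.println(sameBytesInAsciiAndUtf8("caf\u00e9,value2")); // false
    }
}
```

So a file containing only values like "value1" is valid in both encodings at once, and a report of an ASCII-like encoding does not by itself mean the writer is misconfigured.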

1 Answer

Microsoft software tends to assume a windows-125x code page (such as windows-1252) or UTF-16LE, unless the content starts with a byte order mark (BOM), which the software then uses to identify the charset. Try adding a byte order mark at the start of your file:

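try (BufferedWriter writer = Files.newBufferedWriter(Paths.get(filePath))) {
    // Write the byte order mark first so Excel can detect UTF-8
    writer.write('\ufeff');

    // Pass the format explicitly, as in the question
    CSVPrinter csvPrinter = new CSVPrinter(writer, CSVFormat.EXCEL);

    //...
}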
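
If you want to confirm that the BOM really ended up in the file, you can look at its first three bytes: U+FEFF encodes in UTF-8 as EF BB BF. Below is a rough sketch that uses only the JDK. It writes the rows by hand instead of going through CSVPrinter, and the class name, method names, sample rows and temp file are all assumptions made for the illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BomCheck {
    /** Writes a BOM followed by a tiny hand-written CSV body (no CSVPrinter here). */
    static void writeWithBom(Path path) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            writer.write('\ufeff');           // byte order mark
            writer.write("Name,Age\r\n");     // header row
            writer.write("Zo\u00eb,30\r\n");  // row with a non-ASCII character
        }
    }

    /** Returns true when the bytes begin with the UTF-8 BOM (EF BB BF). */
    static boolean startsWithUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    /** Writes a temp file with a BOM, reads it back, and reports whether the BOM is there. */
    static boolean demo() {
        try {
            Path path = Files.createTempFile("bom-check", ".csv");
            try {
                writeWithBom(path);
                return startsWithUtf8Bom(Files.readAllBytes(path));
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```

The same `startsWithUtf8Bom` check can be pointed at the file your CSVPrinter code produces, which separates "the BOM is missing" from "the viewer ignores it".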

3 Comments

This may also be done as a header CSVFormat.EXCEL.withHeader('\ufeff' + "Name", "Age") so we can have CSVPrinter as part of the try.
Does this solution still work on Ubuntu with the byte order mark? Any idea?
@RezguiBahaEddinne This will work on any system. UTF-8 is universal. However, reading the file in Ubuntu will depend on the tools you use. In my experience, many editors are smart enough to recognize a BOM, but text processing tools often are not.
