I'm working on a Kotlin project where I need to take some incoming data and output a CSV-formatted string. I decided to try the Apache Commons CSV library to build the CSV output. It mostly works, but it handles double quotes in the incoming data in a way I didn't expect. If I understand the rules for CSV formatting correctly, a double quote in a string should be escaped with an additional double quote. So if my input string is "foo, the output string should be ""foo. However, when I build my output string using CSVFormat, I instead get the output """foo.

Why am I getting an extra double quote in the output? Is this a bug or expected behavior? Is there any way to work around this without completely disabling quoting?

For reference, my function responsible for creating the CSV string looks like:

fun Rosters.toCsvString(includeHeader: Boolean): String {

    val writer = StringWriter()
    val builder = CSVFormat.DEFAULT.builder().setQuoteMode(QuoteMode.MINIMAL)
    if(includeHeader){
        builder.setHeader(Rosters.Header::class.java)
    }
    val printer = CSVPrinter(writer, builder.get())
    return writer.use {
        printer.printRecords(this.rosters.map { it.values() })
        writer.toString()
    }

}

If I run a test against that function:

    @Test
    fun `Test convertRosterToCsvWithSpecialChars`()
    {
        val expected = """
EPPN,FirstName,LastName,Email,ClassStanding,StudentID,Degree,College,Department,SuitableRole
""foo014"",Foo,Bar,[email protected],,9200#8210,,,,ROLE_TENANT_ADVISOR,

""".trimMargin()

        val worker1 = Roster("\"foo014\"","Foo","Bar","[email protected]",
                    studentId = "9200#8210", suitableRole = "ROLE_TENANT_ADVISOR")

        val incomingRosters = Rosters(listOf( worker1))
        val actual = incomingRosters.toCsvString(true)
        assertThat(actual).isEqualToNormalizingNewlines(expected)
    }

I get the error:

Expecting actual:
  "EPPN,FirstName,LastName,Email,ClassStanding,StudentID,Degree,College,Department,SuitableRole
"""foo014""",Foo,Bar,[email protected],,9200#8210,,,,ROLE_TENANT_ADVISOR
"
to be equal to:
  "EPPN,FirstName,LastName,Email,ClassStanding,StudentID,Degree,College,Department,SuitableRole
""foo014"",Foo,Bar,[email protected],,9200#8210,,,,ROLE_TENANT_ADVISOR,
"
when ignoring newline differences ('\r\n' == '\n')
  • btw you don't need .trimMargin(). Commented Jun 9 at 17:36
  • Should """foo really be """foo"? Commented Jun 10 at 0:07

1 Answer

In most CSV dialects, a double quote inside a quoted field is escaped by doubling it. You're using CSVFormat.DEFAULT, which is based on RFC 4180. RFC 4180, section 2, rule 7 states:

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.

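So the problem is not in your production code, nor in the library, but in your expected value. The field "foo014" contains double quotes, so the whole field has to be enclosed in double quotes, and each embedded quote is doubled. One enclosing quote plus the two-character escaped quote gives three quotes on each side: """foo014""". Your expected value ""foo014"" doubles the quotes but leaves out the enclosing pair. As posted, the expected row also ends with a trailing comma that the actual output does not have, so that would need fixing as well.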
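
To make the rule concrete, here is a minimal sketch of RFC 4180 style escaping in plain Kotlin. The function name escapeCsvField is hypothetical and this is not how Commons CSV is implemented (its QuoteMode.MINIMAL has additional cases that trigger quoting); it only models the "enclose, then double embedded quotes" rule quoted above.

```kotlin
// Hypothetical helper for illustration only; not part of Apache Commons CSV.
// Quotes a field only when it contains a character that RFC 4180 requires
// to be enclosed: a double quote, a comma, or a line break.
fun escapeCsvField(field: String): String {
    val needsQuoting = field.any { it == '"' || it == ',' || it == '\n' || it == '\r' }
    if (!needsQuoting) return field
    // Double every embedded quote, then wrap the whole field in quotes.
    return "\"" + field.replace("\"", "\"\"") + "\""
}

fun main() {
    println(escapeCsvField("\"foo"))       // """foo"
    println(escapeCsvField("\"foo014\""))  // """foo014"""
    println(escapeCsvField("Foo"))         // Foo
}
```

The second call reproduces the value from your failing test: the three quotes on each side are one enclosing quote plus one escaped (doubled) quote.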
