I have a dataset of words in an array like:

arr: Array[org.apache.spark.sql.Row] = Array([conversionevents], [elements], [pageviews], [productviews], [registrations], [sitepromotionclicks])

When I map over the array and print each element, I get:

val v = arr.map(x => println(x.toString))

[conversionevents]
[elements]
[pageviews]
[productviews]
[registrations]
[sitepromotionclicks]

I would like to convert this data to strings to get rid of the square brackets [].

  • The brackets should come from Row.toString. You could extract the first element of each row to get the string out of it. I'm not sure what you plan for rows with multiple entries. Commented Jul 27, 2018 at 9:24
  • @lambda.xy.x could you be more explicit? Commented Jul 27, 2018 at 9:26
  • I cannot test the code because I don't have Spark installed, but you have an array of rows. If you print arr(0), the brackets will be included even for a single element. That means you need to look at the documentation of Row. At the top of that page there is also an example of how to use the Row interface with Scala. Commented Jul 27, 2018 at 9:49

1 Answer

As mentioned in the question, the data is an Array[org.apache.spark.sql.Row] with only one element in each Row, so the simplest solution is to take the first element of each row:

scala> arr.map(x => x(0))
//res1: Array[Any] = Array(conversionevents, elements, pageviews, productviews, registrations, sitepromotionclicks)

Since the question asks for strings without the square brackets [], call toString on each extracted element:

scala> arr.map(x => x(0).toString)
//res2: Array[String] = Array(conversionevents, elements, pageviews, productviews, registrations, sitepromotionclicks)

But if some rows hold more than one value, for example

//arr: Array[org.apache.spark.sql.Row] = Array([conversionevents,test1], [elements], [pageviews,test21,test22], [productviews])

the solution above keeps only the first value of each row and drops the rest:

val result = arr.map(x => x(0))
//result: Array[Any] = Array(conversionevents, elements, pageviews, productviews)

A more general solution is to use flatMap and toSeq, which flattens every value of every row into one array:

val result = arr.flatMap(x => x.toSeq)
//result: Array[Any] = Array(conversionevents, test1, elements, pageviews, test21, test22, productviews)

And of course, if you want the values as String, you can call toString on each of them:

val result = arr.flatMap(x => x.toSeq.map(_.toString))
//result: Array[String] = Array(conversionevents, test1, elements, pageviews, test21, test22, productviews)
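
If one string per row is preferred over a flattened array, Row.mkString joins the values of a row with a separator and adds no brackets. The following is only a minimal sketch, assuming spark-sql is on the classpath (as it is in spark-shell); the object RowsToStrings and the method joined are hypothetical names introduced for illustration.

```scala
import org.apache.spark.sql.Row

object RowsToStrings {
  // Hypothetical helper: builds one String per Row by joining the row's
  // values with `sep`. Unlike Row.toString, no surrounding [] is added.
  def joined(rows: Array[Row], sep: String = ","): Array[String] =
    rows.map(_.mkString(sep))
}

// Usage with rows of different lengths:
// RowsToStrings.joined(Array(Row("conversionevents", "test1"), Row("elements")))
// gives Array("conversionevents,test1", "elements")
```

This keeps the row boundaries, which flatMap discards, at the cost of having to split the string again if the individual values are needed later.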

I hope the answer is helpful


5 Comments

I'd skip the .toString during the map. It is better to keep the exact class around, not only a string representation (e.g. 1 != "1" but 1.toString == "1".toString; after the conversion you can't see the difference anymore). When the element is printed, toString is called automatically anyway.
@lambda.xy.x thanks for the comment. toString is because OP wants output in string.
What is the difference if I use getString(0) instead of toString?
@hisi Yes, you can definitely use that.
getString(0) is the better choice - you should get an exception if the column is not a String. You should also check if the row is non-empty before you call getString(0) / row(0).
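
The checks suggested in the last comment could be sketched as follows. This is a hedged example, not the commenters' code: it assumes spark-sql is on the classpath, and SafeFirstColumn and firstString are hypothetical names. Instead of catching the exception that getString(0) raises for a non-String column, it pattern matches on row.get(0), which also covers a null value.

```scala
import org.apache.spark.sql.Row

object SafeFirstColumn {
  // Hypothetical helper: returns Some(value) only when the row is non-empty
  // and column 0 actually holds a String; otherwise returns None.
  def firstString(row: Row): Option[String] =
    if (row.length == 0) None          // empty row: there is no column 0 to read
    else row.get(0) match {
      case s: String => Some(s)        // exact type check, no toString conversion
      case _         => None           // null or a non-String value (e.g. an Int)
    }
}
```

With the array from the question, arr.flatMap(r => SafeFirstColumn.firstString(r).toSeq) would keep only the rows whose first column is a String, so a value such as 1 is not silently turned into "1".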
