There are many threads on splitting a string column into multiple columns, but none of the examples work for me.
Consider the following:
df = spark.createDataFrame(sc.parallelize([['1', 'SN4.F01C04-AM428.1_31']]), ["col1", "col2"])
+----+--------------------+
|col1|                col2|
+----+--------------------+
|   1|SN4.F01C04-AM428....|
+----+--------------------+
What I tried:
from pyspark.sql import functions as F
display(df.select(F.split(df.col2, '.', 1).alias('s')))
+--------------------+
|                   s|
+--------------------+
|[SN4.F01C04-AM428...|
+--------------------+
Expected:
expected = spark.createDataFrame(sc.parallelize([['1', 'SN4', 'F01C04-AM428', '1_31']]), ["col1", "col2", "col3", "col4"])
+----+----+------------+----+
|col1|col2|        col3|col4|
+----+----+------------+----+
|   1| SN4|F01C04-AM428|1_31|
+----+----+------------+----+
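A possible explanation, offered as a sketch rather than a confirmed fix: `F.split` treats its second argument as a regular expression, so an unescaped `.` matches any character instead of a literal dot, and a positive `limit` of 1 caps the result array at a single element, which would leave the whole string in one piece. The code below escapes the dot and drops the limit. The names `PATTERN`, `split_parts` and `split_col2` are hypothetical helpers introduced only for illustration. `split_parts` checks the pattern in plain Python with `re`, which should agree with Spark's Java regex for this simple case.

```python
import re

# Escaped dot: matches a literal ".", not "any character".
PATTERN = r"\."


def split_parts(value):
    """Plain-Python check of the pattern (hypothetical helper, no Spark needed)."""
    return re.split(PATTERN, value)


def split_col2(df):
    """Hypothetical helper: split col2 on literal dots into col2, col3 and col4.

    Assumes every value in col2 contains exactly two dots, as in the example row.
    """
    # Imported lazily so the pattern check above can run without Spark installed.
    from pyspark.sql import functions as F

    # No limit argument, so the array keeps every piece.
    parts = F.split(df["col2"], PATTERN)
    return df.select(
        "col1",
        parts.getItem(0).alias("col2"),
        parts.getItem(1).alias("col3"),
        parts.getItem(2).alias("col4"),
    )
```

With the `df` defined above, `display(split_col2(df))` would be the call to compare against `expected`. Rows with fewer than two dots would presumably get nulls in the trailing columns, so that case may need separate handling.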