When I'm reading some json payloads PySpark is changing the data even if I read it as a StringType and I want this as a String because I don't want to have each field as a column at this step. I just want to get this payload as String as it is in payload/source file
Locally I'm using Spark 3.3 in Jupiter Notebook with Glue 4 image PySpark version: 3.3.0+amzn.1.dev0
Here my payload/source (test.txt):
{"payload":{"points":1220000000}}
{"payload":{"count":1550554545.0}}
{"payload":{"points":125888002540.0, "count":1550554545.0}}
{"payload":{"name": "Roger", "count":55154111.0}}
Here my code:
path = "/home/glue_user/workspace/jupyter_workspace/test/test.txt"
schema = StructType([StructField('payload', StringType(), True)])
my_df = spark.read.schema(schema).option("inferSchema", "false").json(path)
my_df.show(truncate=False)
Here the result where PySpark is setting the float number in scientific notation, even when I read it as String.
+------------------------------------------------+
|payload |
+------------------------------------------------+
|{"points":1220000000} |
|{"count":1.550554545E9} |
|{"points":1.2588800254E11,"count":1.550554545E9}|
|{"name":"Roger","count":5.5154111E7} |
+------------------------------------------------+
Why I can't simply have my data as it is? Why the final result is changed into my string field and receive this scientific notation? i.e:
"count":1550554545.0
"count":1.550554545E9