
I am very new to Spark and am trying to find the max value in an array of strings, but I keep getting errors. I tried a couple of things like creating a DataFrame, splitting, and using lit, but ran into further errors. Can anyone please help me?

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import max
from pyspark.sql.types import StructType, StructField, StringType,IntegerType,TimestampType,ArrayType
from datetime import datetime

new_array: list = ['00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '17', '18', '19', '20', '22']

df = max(new_array) #error in this line
df.show()
df.printSchema()

Error:

Invalid argument, not a string or column: ['00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '17', '18', '19', '20', '22'] of type <class 'list'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
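
Note: the max imported from pyspark.sql.functions shadows Python's built-in max, so this call goes to the PySpark aggregate, which only accepts a column or column name. A minimal sketch of sidestepping the shadowing, assuming the values stay in a plain Python list, would be:

import builtins

new_array = ['00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '17', '18', '19', '20', '22']

# Python's built-in max works directly on a plain list and compares the strings
# lexicographically; it is spelled via builtins because the pyspark import shadows the name.
print(builtins.max(new_array))  # '22'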

Thanks a lot in advance.

1 Answer


Just wanted to share an update: I was able to get the desired results with the code below.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

new_array: list = ['00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '17', '18', '19', '20', '22']

# Build a one-column DataFrame with one row per list element
df = spark.createDataFrame([(i,) for i in new_array], ["new_array"])
df.select(F.max(df.new_array)).show()

# Aggregate with PySpark's column-wise max and pull the value out of the Row
max_val = df.select(F.max(df.new_array).alias("maxval")).first()["maxval"]
print(max_val)

Output:

+--------------+
|max(new_array)|
+--------------+
|            22|
+--------------+

22
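
The error message also points at another route that skips building a row-per-element DataFrame: wrap the Python list as a single array column of literals with F.array and F.lit, then take F.array_max over it. A minimal sketch, assuming Spark 2.4+ for array_max:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

new_array = ['00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '17', '18', '19', '20', '22']

# Turn the Python list into one array column of literals, then take its max on a one-row DataFrame.
row = spark.range(1).select(
    F.array_max(F.array(*[F.lit(x) for x in new_array])).alias("maxval")
).first()
print(row["maxval"])  # '22' (strings compare lexicographically)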
