Different ways to extract the weekday from a timestamp:
Number
0 = Monday, 1 = Tuesday, ..., 6 = Sunday:
Spark 3.5+ F.weekday('col_name')
Spark 2.4+ F.expr("weekday(col_name)")
1 = Sunday, 2 = Monday, ..., 7 = Saturday:
Spark 2.3+ F.dayofweek('col_name')
1 = Monday, 2 = Tuesday, ..., 7 = Sunday:
Spark 3.5+ F.extract(F.lit('dow_iso'), 'col_name')
Spark 3.0+ F.expr("extract('dow_iso', col_name)")
Aligned day-of-week within a month (1-7, counted in 7-day blocks starting on the 1st of the month, so not the calendar weekday):
Spark 3.0+ F.date_format('col_name', 'F')
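The numeric conventions above differ only by fixed offsets. Below is a minimal plain-Python sketch, not Spark code. It uses a hypothetical helper `spark_style_numbers` to reproduce the `v1`-`v4` values shown in the example outputs. `date.weekday()` follows Spark's `weekday` numbering and `date.isoweekday()` follows `dow_iso`. The `dayofweek` value and the aligned `F` value are derived with simple arithmetic.

```python
from datetime import date


def spark_style_numbers(d: date) -> dict:
    """Hypothetical helper: mimic Spark's numeric weekday results in plain Python."""
    iso = d.isoweekday()  # 1 = Monday, ..., 7 = Sunday
    return {
        "weekday": d.weekday(),            # like weekday: 0 = Monday, ..., 6 = Sunday
        "dayofweek": iso % 7 + 1,          # like dayofweek: 1 = Sunday, 2 = Monday, ..., 7 = Saturday
        "dow_iso": iso,                    # like extract('dow_iso', ...): 1 = Monday, ..., 7 = Sunday
        "aligned_F": (d.day - 1) % 7 + 1,  # like date_format(..., 'F'): position in 7-day blocks from the 1st
    }


print(spark_style_numbers(date(2018, 12, 31)))  # 2018-12-31 is a Monday
print(spark_style_numbers(date(1995, 7, 1)))    # 1995-07-01 is a Saturday
```

The same offsets work on Spark columns. For example, `(F.dayofweek('col_name') + 5) % 7` gives the `weekday` numbering and `(F.dayofweek('col_name') + 5) % 7 + 1` gives the `dow_iso` numbering, which helps on versions that only have `dayofweek`.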
Abbreviation (Mon, Tue, ...)
Spark 4.0+ F.dayname('col_name')
Spark 1.5+ F.date_format('col_name', 'E')
Full name (Monday, Tuesday, ...)
Spark 1.5+ F.date_format('col_name', 'EEEE')
Other locales (e.g., zh = Chinese)
F.to_csv(F.struct(F.to_date('col_name')), {'dateFormat': 'E', 'locale': 'zh'})
General example
```python
from pyspark.sql import functions as F

# Requires Spark 4.0+ because of F.dayname (v5); F.weekday and F.extract need Spark 3.5+
df = spark.range(1).selectExpr("timestamp'2018-12-31' col_name")
df = df.withColumns({
    'v1': F.weekday('col_name'),
    'v2': F.dayofweek('col_name'),
    'v3': F.extract(F.lit('dow_iso'), 'col_name'),
    'v4': F.date_format('col_name', 'F'),
    'v5': F.dayname('col_name'),
    'v6': F.date_format('col_name', 'E'),
    'v7': F.date_format('col_name', 'EEEE'),
    'v8': F.to_csv(F.struct(F.to_date('col_name')), {'dateFormat': 'E', 'locale': 'zh'}),
})
df.show()
# +-------------------+---+---+---+---+---+---+------+----+
# |           col_name| v1| v2| v3| v4| v5| v6|    v7|  v8|
# +-------------------+---+---+---+---+---+---+------+----+
# |2018-12-31 00:00:00|  0|  2|  1|  3|Mon|Mon|Monday|周一|
# +-------------------+---+---+---+---+---+---+------+----+
```
The answer
Since the OP has a string (not a timestamp), we first add a couple of lines that extract the timestamp text and parse it into a timestamp:
```python
ts_string = F.regexp_extract('value', r'^.*\[(\d\d/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) -\d{4}]', 1)
ts_parsed = F.to_timestamp(ts_string, 'dd/MMM/yyyy:HH:mm:ss')
```
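Outside Spark, the same regex and format can be sanity-checked with Python's `re` and `datetime.strptime`. This is a sketch only and is not part of the Spark solution. It assumes that Java's `dd/MMM/yyyy:HH:mm:ss` corresponds to strptime's `%d/%b/%Y:%H:%M:%S`. It also assumes that `%b` runs under the default C/English locale so that `Jul` parses.

```python
import re
from datetime import datetime

# Same pattern as the regexp_extract call above; group 1 is the timestamp without the offset
LOG_TS = re.compile(r'^.*\[(\d\d/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) -\d{4}]')

line = '[01/Jul/1995:00:00:01 -0400]'
ts_string = LOG_TS.match(line).group(1)

# %b parses English month abbreviations such as 'Jul' under the default C locale (assumption)
ts_parsed = datetime.strptime(ts_string, '%d/%b/%Y:%H:%M:%S')

print(ts_string, ts_parsed, ts_parsed.weekday())  # weekday(): 0 = Monday, ..., 6 = Sunday
```

Note that ` -\d{4}` only matches negative UTC offsets such as `-0400`, as in the OP's data; `[+-]\d{4}` would accept both signs. In both versions the offset is discarded, so the parsed value is the log's local time.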
With the parsed timestamp, any of the functions listed above can be applied:
```python
from pyspark.sql import functions as F

# Requires Spark 4.0+ because of F.dayname (v5); F.weekday and F.extract need Spark 3.5+
df = spark.createDataFrame([('[01/Jul/1995:00:00:01 -0400]',)], ['value'])
ts_string = F.regexp_extract('value', r'^.*\[(\d\d/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) -\d{4}]', 1)
ts_parsed = F.to_timestamp(ts_string, 'dd/MMM/yyyy:HH:mm:ss')
df = df.withColumns({
    'v1': F.weekday(ts_parsed),
    'v2': F.dayofweek(ts_parsed),
    'v3': F.extract(F.lit('dow_iso'), ts_parsed),
    'v4': F.date_format(ts_parsed, 'F'),
    'v5': F.dayname(ts_parsed),
    'v6': F.date_format(ts_parsed, 'E'),
    'v7': F.date_format(ts_parsed, 'EEEE'),
    'v8': F.to_csv(F.struct(F.to_date(ts_parsed)), {'dateFormat': 'E', 'locale': 'zh'}),
})
df.show()
# +--------------------+---+---+---+---+---+---+--------+----+
# |               value| v1| v2| v3| v4| v5| v6|      v7|  v8|
# +--------------------+---+---+---+---+---+---+--------+----+
# |[01/Jul/1995:00:0...|  5|  7|  6|  1|Sat|Sat|Saturday|周六|
# +--------------------+---+---+---+---+---+---+--------+----+
```