All Questions
1,036 questions
0 votes · 1 answer · 79 views
Monotonically increasing id order
The documentation of monotonically_increasing_id says:
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive.
So I assume there is some ordering ...
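A minimal sketch (using a small illustrative DataFrame built here just for the example) shows where that ordering comes from: the current implementation packs the partition index into the upper 31 bits and a per-partition record counter into the lower 33 bits, so IDs increase within each partition but are not consecutive:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative data: 8 rows spread over 3 partitions.
df = spark.range(8).repartition(3)

# Partition index in the upper bits, per-partition counter in the lower bits,
# hence increasing but not consecutive IDs.
df.withColumn("mono_id", F.monotonically_increasing_id()).orderBy("mono_id").show()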
-1 votes · 1 answer · 46 views
How to use LIMIT ALL with DataFrame
When using Spark SQL I can use LIMIT ALL to return all rows. Is there an equivalent when using the DataFrame API so that I can do something like df.limit("ALL")?
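There is no df.limit("ALL") in the DataFrame API; a hedged workaround is simply to skip the limit call, for example behind a small helper (the helper name is made up here):

from pyspark.sql import DataFrame

def limit_or_all(df: DataFrame, n=None) -> DataFrame:
    # SQL's LIMIT ALL is the same as applying no limit, so only call
    # .limit() when an actual row count is given.
    return df if n is None else df.limit(n)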
0 votes · 0 answers · 151 views
How to validate nested Spark DataFrame with Pandera?
Is it possible to validate a nested Spark DataFrame with pandera.pyspark? Below is an example with StructType, but the same question applies to ArrayType.
from pandera.pyspark import DataFrameModel, ...
0 votes · 0 answers · 30 views
Index Error when generating a Data Quality and Insights Report due to array column
I'm using AWS's Data Wrangler service to prepare some data to train a ML model.
I have a very simple CSV file which has 3 columns and 4 rows:
State,Current,History
1,2.045301,[2.045236##2.045129##2....
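One hedged workaround is to flatten the "##"-delimited History column into plain numeric columns before importing the CSV, since the report appears to choke on the array-like field; the column name and delimiter come from the excerpt, while the file names below are assumptions:

import pandas as pd

df = pd.read_csv("input.csv")  # file name is an assumption

# Split "[2.045236##2.045129##...]" into separate float columns.
history = (
    df["History"]
    .str.strip("[]")
    .str.split("##", expand=True)
    .astype(float)
    .add_prefix("History_")
)

df.drop(columns=["History"]).join(history).to_csv("flattened.csv", index=False)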
0 votes · 1 answer · 95 views
PySpark: Throwing error 'Column' object is not callable while using .count()
I'm working with a PySpark DataFrame and trying to count the number of null values in each column. I tried the following expression:
[col(c).isNull().count() for c in df.columns]
which throws the error:
----&...
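The error comes from calling .count() on a Column (isNull() returns a Column, not a count). A hedged sketch that counts nulls per column in one aggregation instead:

from pyspark.sql import functions as F

null_counts = df.select(
    # Count only the rows where the column is null.
    [F.count(F.when(F.col(c).isNull(), c)).alias(c) for c in df.columns]
)
null_counts.show()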
0 votes · 1 answer · 90 views
Insert column at specified position
How to insert a column at a specified position without listing all the existing column names?
I have this dataframe:
from pyspark.sql import functions as F
df = spark.range(1).select(
F.lit(11)....
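A minimal sketch, assuming the goal is just to splice the new column into df.columns at a given index (the helper name and position are illustrative):

from pyspark.sql import functions as F

def insert_col(df, pos, name, col):
    # Rebuild the select list around the new column instead of
    # listing every existing column by hand.
    cols = df.columns
    return df.select(*cols[:pos], col.alias(name), *cols[pos:])

df2 = insert_col(df, 1, "new_col", F.lit(99))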
-1 votes · 1 answer · 47 views
PySpark DataFrame not returning rows whose values have more than 8 digits
I have created a sample DataFrame in PySpark, and the ID column contains a few values with more than 8 digits. But it returns only the rows whose ID values have fewer than 8 digits. Can ...
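A likely cause (an assumption, since the schema isn't shown) is that the ID column was declared or inferred as IntegerType, which cannot hold values above 2,147,483,647; declaring it as LongType keeps the large IDs. A sketch with made-up sample rows:

from pyspark.sql.types import StructType, StructField, LongType, StringType

schema = StructType([
    StructField("ID", LongType(), True),      # LongType instead of IntegerType
    StructField("name", StringType(), True),
])

df = spark.createDataFrame(
    [(123456789012, "a"), (1234567, "b")],    # sample rows, purely illustrative
    schema=schema,
)
df.show()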
0 votes · 1 answer · 51 views
PySpark select after join raises ambiguity, but the column should only be present in one of the dataframes
I'm doing a join on two dataframes that come from the same original dataframe. Each then goes through some aggregations, and the selected columns differ except for the ones used in the join.
So ...
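When both sides descend from the same original DataFrame, Spark can see the same column lineage on both sides even if only one side still selects it. One hedged workaround is to alias each side and qualify every reference (the names agg_one, agg_two, key, and the metric columns are placeholders):

from pyspark.sql import functions as F

left = agg_one.alias("l")
right = agg_two.alias("r")

joined = left.join(right, on=F.col("l.key") == F.col("r.key"), how="inner")

# Qualify the selection so Spark knows which lineage each column belongs to.
result = joined.select(F.col("l.key"), F.col("l.metric_a"), F.col("r.metric_b"))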
1 vote · 1 answer · 77 views
How to apply an expression from a column to another column in pyspark dataframe?
I would like to know if it is possible to apply an expression stored in one column to another column.
for example, I have this table:
new_feed_dt | regex_to_apply | expr_to_apply
053021      | _(\d+)         | date_format(to_date(new_feed_dt, '...
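Spark cannot evaluate a different SQL expression per row directly, but if the set of distinct expressions is small, one hedged approach is to collect them and chain a WHEN branch per expression, applying each through F.expr (the derived column name is made up):

from pyspark.sql import functions as F

# Collect the distinct expression strings (assumes this set is small).
exprs = [r[0] for r in df.select("expr_to_apply").distinct().collect()]

applied = None
for e in exprs:
    cond = F.col("expr_to_apply") == e
    applied = F.when(cond, F.expr(e)) if applied is None else applied.when(cond, F.expr(e))

result = df.withColumn("derived", applied)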
2 votes · 1 answer · 58 views
Pyspark - Retrieve the value from the field dynamically specified in other field of the same data frame
I'm working with PySpark and have a challenging scenario where I need to dynamically retrieve the value of a field specified in another field of the same DataFrame. I then need to compare this ...
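A hedged sketch of one way to do this: build a chained WHEN over the candidate column names so each row reads the column its pointer field names. The pointer column field_name, the candidate list, and the output column name are all assumptions:

from functools import reduce
from pyspark.sql import functions as F

candidates = ["col_a", "col_b", "col_c"]   # columns the pointer can refer to

resolved = reduce(
    lambda acc, c: acc.when(F.col("field_name") == c, F.col(c)),
    candidates[1:],
    F.when(F.col("field_name") == candidates[0], F.col(candidates[0])),
)

df = df.withColumn("resolved_value", resolved)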
0 votes · 0 answers · 207 views
Pyspark DAGScheduler: Failed to update accumulator because of Pyspark UDF?
When I run a UDF in PySpark I see this on the console all the time. It hasn't failed any unit test yet, which makes me question whether this is something I need to attend to, but this is my first time ...
1 vote · 0 answers · 36 views
Create a sparse vector from a PySpark dataframe maintaining the index
I have a PySpark df like this:
+--------------------+-------+----------+----------+----------+----------+--------+
| user_id|game_id|3mon_views|3mon_carts|3mon_trans|3mon_views| dt|
+---...
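A hedged sketch of one interpretation: build one SparseVector per user_id, using game_id as the position in the vector and 3mon_views as the value at that position (whether game_id really is a small non-negative integer index is an assumption):

from pyspark.ml.linalg import SparseVector, VectorUDT
from pyspark.sql import functions as F

size = df.agg(F.max("game_id")).first()[0] + 1   # vector length, assuming integer ids

@F.udf(returnType=VectorUDT())
def to_sparse(pairs):
    # pairs is a list of (game_id, 3mon_views) structs; SparseVector needs sorted indices.
    pairs = sorted((p["game_id"], float(p["3mon_views"])) for p in pairs)
    return SparseVector(size, [i for i, _ in pairs], [v for _, v in pairs])

vectors = (
    df.groupBy("user_id")
      .agg(F.collect_list(F.struct("game_id", "3mon_views")).alias("pairs"))
      .withColumn("features", to_sparse("pairs"))
)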
0 votes · 2 answers · 69 views
PySpark equivalent of Spark sliding() function
I have a multiline flat file which I wish to convert to a 4-column DataFrame (or RDD array) via PySpark. The Spark Scala code is:
#from pyspark.sql import SparkSession # Scala ...
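PySpark has no rdd.sliding(); a hedged equivalent, assuming every record spans exactly four consecutive lines (and with an illustrative file path), is to number the lines and regroup them four at a time:

lines = spark.sparkContext.textFile("multiline.txt")

records = (
    lines.zipWithIndex()
         .map(lambda x: (x[1] // 4, (x[1] % 4, x[0])))   # (record number, (position, line))
         .groupByKey()
         .map(lambda kv: tuple(v for _, v in sorted(kv[1])))
)

df = records.toDF(["col1", "col2", "col3", "col4"])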
2 votes · 2 answers · 107 views
Aggregate (sum) consecutive rows where the number of consecutive rows is defined in a dataframe column
Initial Dataframe:
Every "id" has the same "range" value, I have to execute the following aggregation:
grouping on column "id" a dynamic range of consecutive rows (col &...
0 votes · 2 answers · 42 views
How to change a row's value based on the value of the previous row, in a dataframe ordered by date for each unique id?
I need insight into how to do this in Spark:
My dataframe is this
|ID | DATE | State
|X | 20-01-2023 | N
|X | 21-01-2023 | S
|X | 22-01-2023 | S
|X | 23-01-2023 | ...
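A hedged sketch of the typical approach: look at the previous row's State per ID (ordered by DATE) with lag, and rewrite the current row when a condition on that previous value holds; the specific rule below is only an example, not taken from the question:

from pyspark.sql import Window, functions as F

w = Window.partitionBy("ID").orderBy(F.to_date("DATE", "dd-MM-yyyy"))

result = df.withColumn(
    "State",
    F.when(F.lag("State").over(w) == "N", F.lit("S"))   # example rule, an assumption
     .otherwise(F.col("State")),
)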