All Questions
Tagged with pyspark python-3.x
1,483 questions
1
vote
1
answer
56
views
How to add double quotes to all columns in my DataFrame and save to CSV
I need help with something related to DataFrames.
I need to save a CSV file where every column value contains double quotes at the beginning and at the end.
This DataFrame is created after reading ...
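A minimal sketch of one way to do this, assuming the DataFrame is already loaded: the CSV writer's quoteAll option wraps every field in the quote character.

# Hedged sketch: "df" and the output path are assumptions.
df.write \
    .option("quote", '"') \
    .option("quoteAll", True) \
    .option("header", True) \
    .csv("output_dir")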
0
votes
2
answers
107
views
For each row in a DataFrame, how to extract elements from an array?
I'm working with a third-party dataset that includes location data. I'm trying to extract the Longitude and Latitude coordinates from the location column. As stated in their doc:
The location column ...
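If the location column is an array type, a sketch along these lines extracts the two coordinates; the element order is an assumption to check against the dataset's doc:

from pyspark.sql import functions as F

# Hedged sketch: getItem pulls a single element out of an array column.
df = df.withColumn("longitude", F.col("location").getItem(0)) \
       .withColumn("latitude", F.col("location").getItem(1))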
0
votes
0
answers
61
views
ThreadPoolExecutor for Parallelism
I have PySpark code that makes a few POST API calls to an external system. For each row in the input DataFrame, I need to trigger a POST API request (using Python code) to create an entry in an external ...
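A common shape for this is to collect the rows to the driver and fan the POST calls out through a thread pool; a minimal sketch, where the endpoint URL and payload are hypothetical and the DataFrame is assumed small enough to collect:

from concurrent.futures import ThreadPoolExecutor
import requests

API_URL = "https://example.com/api/entries"  # hypothetical endpoint

def post_row(row):
    # One POST per row; the status code is returned for inspection.
    return requests.post(API_URL, json=row.asDict()).status_code

rows = df.collect()  # assumes the rows fit on the driver
with ThreadPoolExecutor(max_workers=8) as pool:
    statuses = list(pool.map(post_row, rows))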
0
votes
0
answers
94
views
Python worker exited unexpectedly (crashed)
print(sc.parallelize([1, 2, 3, 4]).map(lambda x: x * x).collect())
Error: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.: org.apache.spark.SparkException: Job ...
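A frequent cause of this crash is a mismatch between the driver's Python and the workers' Python; a hedged sketch that pins both to the current interpreter before the SparkContext is created (the mismatch being the cause here is an assumption):

import os, sys

os.environ["PYSPARK_PYTHON"] = sys.executable         # interpreter for worker processes
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable  # interpreter for the driver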
0
votes
1
answer
35
views
Unable to set SPARK_HOME in tox.ini
PySpark is installed with pip in a virtual environment.
Following the tox documentation, setenv is placed under the [testenv] section.
Here is the code that I have in the tox.ini file:
...
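A minimal sketch of what that setenv section can look like when pyspark is pip-installed into the test env; the {envsitepackagesdir}/pyspark path is an assumption about where pip puts the package:

[testenv]
deps =
    pyspark
setenv =
    SPARK_HOME = {envsitepackagesdir}/pyspark
commands =
    pytest {posargs}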
0
votes
1
answer
41
views
How to create multiple files using a text file as a template and a DataFrame
I need to create multiple files containing Python functions, using a text file as a template. My template will contain something like:
mytemplate.txt
# text used as a template for a Python file
def D_{var_1}_O(...
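A minimal sketch of the generation loop, assuming a var_1 column in the DataFrame and str.format-style placeholders like the {var_1} shown above; the output file names are hypothetical:

with open("mytemplate.txt") as f:
    template = f.read()

for row in df.select("var_1").distinct().collect():
    rendered = template.format(var_1=row["var_1"])  # fill the {var_1} placeholder
    with open(f"D_{row['var_1']}_O.py", "w") as out:
        out.write(rendered)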
0
votes
0
answers
47
views
How to force a transitive dependency version during dependency build using `pdm`
We want to install a specific version of pyspark (==2.4.7). The issue is that this specific version needs a pypandoc < 1.8. Moreover, pyspark must be built on installation. We pin it explicitly in ...
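One hedged possibility, assuming a pdm version that supports resolution overrides in pyproject.toml (whether such overrides reach build-time requirements is something to verify against the pdm docs):

[project]
dependencies = ["pyspark==2.4.7"]

[tool.pdm.resolution.overrides]
pypandoc = "<1.8"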
1
vote
0
answers
40
views
How to push data from AWS S3 to a DynamoDB table using AWS Glue, API Gateway, and Lambda
# This is my Glue Spark code
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
...
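For the Glue-to-DynamoDB step specifically, a hedged sketch using the Glue DynamoDB connection type; the DynamicFrame dyf and the table name are assumptions:

# Hedged sketch: writes a DynamicFrame to DynamoDB via the Glue connector.
glueContext.write_dynamic_frame_from_options(
    frame=dyf,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.output.tableName": "my_table",
        "dynamodb.throughput.write.percent": "0.5",
    },
)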
0
votes
0
answers
75
views
py4j.protocol.Py4JJavaError: An error occurred while calling o41.saveAsTable (this error occurred while running the code below)
from pyspark.sql import *
if __name__ == "__main__":
    spark = SparkSession.builder.appName("helloSpark2").master("local[3]") \
        .enableHiveSupport() \
...
0
votes
2
answers
67
views
How to efficiently read multiple CSV files in PySpark, skipping leading rows and a footer?
I have several CSV files with an inconsistent number of data rows without a header row and I want to read these files into a single PySpark DataFrame. The structure of the CSV files is as follows:
...
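Vanilla Spark's CSV reader has no skip-rows or skip-footer option, so one hedged approach indexes each file's lines, drops assumed head and tail counts, and parses the remainder (csv() also accepts an RDD of strings):

from functools import reduce

SKIP_TOP, SKIP_BOTTOM = 2, 1  # assumed per-file counts

def read_one(path):
    # Index every line so leading and trailing rows can be dropped per file.
    lines = spark.sparkContext.textFile(path).zipWithIndex()
    total = lines.count()
    body = lines.filter(lambda p: SKIP_TOP <= p[1] < total - SKIP_BOTTOM) \
                .map(lambda p: p[0])
    return spark.read.csv(body)

paths = ["file1.csv", "file2.csv"]  # hypothetical paths
df = reduce(lambda a, b: a.unionByName(b), [read_one(p) for p in paths])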
1
vote
0
answers
20
views
pyspark.sql.dataframe.DataFrame.foreach with a lambda is able to print but not able to append to a list
The PySpark code below is able to print the values,
but obj.append does not seem to have any effect.
Environment
24/09/17 14:46:12 INFO SparkContext: Running Spark version 3.5.2 ...
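The behaviour described is expected in Spark: foreach runs the lambda in executor processes, so appends to a driver-side list never make it back. A minimal sketch of the contrast, with collect() as the usual alternative:

obj = []
df.foreach(lambda row: obj.append(row))  # appends happen in worker processes and are discarded

obj = df.collect()  # materializes the rows on the driver instead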
0
votes
2
answers
74
views
How to convert a CSV table structure to JSON using Python?
Today I have a challenge at school: convert a CSV file to a JSON file.
This CSV has a table structure (in this example it contains the information from an Oracle table). So I have to ...
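For the plain-Python route, a minimal standard-library sketch; the file names are hypothetical and the first CSV row is assumed to hold the column names:

import csv, json

with open("table.csv", newline="") as f:
    rows = list(csv.DictReader(f))  # each row becomes a dict keyed by header

with open("table.json", "w") as f:
    json.dump(rows, f, indent=2)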
0
votes
0
answers
102
views
Py4JJavaError in a Jupyter notebook when using a simple .count() with PySpark
While using the following code:
import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
from pyspark.sql.types import Row
from datetime ...
0
votes
0
answers
365
views
PySpark python issue: Py4JJavaError: An error occurred while calling o91.showString. Python worker exited unexpectedly (crashed)
I am new to Apache Spark and am currently learning to use PySpark. I am having problems just getting started: during installation I also set the environment path according to the guide ...
0
votes
0
answers
128
views
Compare two PySpark DataFrames using datacompy 0.13.2
I have two PySpark DataFrames with 6 columns and 50,000 rows each.
comparison = SparkSQLCompare(
spark,
df1,
df2,
join_columns=['col1', 'col2', 'col3', 'col4', '...
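Assuming SparkSQLCompare exposes the same reporting methods as datacompy's pandas Compare class, the usual next step looks like this (a hedged sketch):

print(comparison.report())  # human-readable summary of matches and mismatches

if not comparison.matches():
    mismatches = comparison.all_mismatch()  # rows that differ on non-join columns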