
All Questions

Tagged with pyspark
1 vote
1 answer
56 views

How to add double quotes to all columns in my dataframe and save to CSV

I need help with something related to dataframes: I need to save a CSV file where every column value is wrapped in double quotes at the beginning and at the end. This dataframe is created after reading ...
Julio
  • 551
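A minimal plain-Python sketch of the quoting behaviour being asked about (the PySpark CSV writer exposes a similar `quoteAll` option, e.g. `df.write.option("quoteAll", True).csv(path)`); the sample rows are hypothetical:

```python
import csv
import io

# Sample rows standing in for the dataframe contents (hypothetical data).
rows = [["id", "name"], ["1", "alpha"], ["2", "beta"]]

buf = io.StringIO()
# QUOTE_ALL wraps every field in double quotes, regardless of content.
writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(rows)

print(buf.getvalue())
# "id","name"
# "1","alpha"
# "2","beta"
```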
0 votes
2 answers
107 views

For each row in a dataframe, how to extract elements from an array?

I'm working with a third party dataset that includes location data. I'm trying to extract the Longitude and Latitude coordinates from the location column. As stated in their doc: The location column ...
MyNameHere
0 votes
0 answers
61 views

ThreadPoolExecutor for Parallelism

I have PySpark code which makes a few POST API calls to an external system. For each row in the input dataframe, I need to trigger a POST API request (using Python code) to create an entry in an external ...
steve
  • 315
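One common pattern for this (a sketch, not an accepted answer): collect the rows to the driver and fan out the POST calls with `ThreadPoolExecutor`. The `post_row` function here is a hypothetical stand-in for the real HTTP call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for the external POST call; in real code this
# would be e.g. requests.post(url, json=payload).
def post_row(row):
    return {"id": row["id"], "status": 201}

# Rows collected to the driver, e.g. [r.asDict() for r in df.collect()].
rows = [{"id": i} for i in range(10)]

results = []
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(post_row, r) for r in rows]
    for fut in as_completed(futures):
        results.append(fut.result())

print(len(results))  # 10
```

For large dataframes, `df.foreachPartition` with one HTTP session per partition avoids collecting everything to the driver.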
0 votes
0 answers
94 views

Python worker exited unexpectedly (crashed)

print(sc.parallelize([1, 2, 3, 4]).map(lambda x:x*x).collect()) Error: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.: org.apache.spark.SparkException: Job ...
Girinath
0 votes
1 answer
35 views

unable to set SPARK_HOME in tox.ini

Pyspark is installed using pip install in a virtual environment. Following the tox documentation, setenv is placed under the `[testenv]` section. Here is the code that I have in the tox.ini file ...
mskcc
  • 1
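A minimal sketch of what such a `tox.ini` typically looks like when pyspark is pip-installed into the test environment (the path uses tox's `{envsitepackagesdir}` substitution; the `deps` and `commands` entries are assumptions, not from the question):

```ini
[testenv]
setenv =
    SPARK_HOME = {envsitepackagesdir}/pyspark
deps =
    pyspark
commands =
    pytest {posargs}
```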
0 votes
1 answer
41 views

How to create multiple files using a text file as a template and a dataframe

I need to create multiple files that will be Python functions, using a text file as a template. My template will contain something like: mytemplate.txt #text as template to python file def D_{var_1}_O(...
Julio
  • 551
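A minimal sketch of one approach: render the template once per dataframe row with `str.format`. The `var_1` placeholder follows the question's excerpt; the row values and output file names are hypothetical:

```python
# Template text as in the question's mytemplate.txt excerpt.
template = "#text as template to python file\ndef D_{var_1}_O():\n    pass\n"

# Hypothetical sample rows, e.g. [r.asDict() for r in df.collect()].
rows = [{"var_1": "A"}, {"var_1": "B"}]

files = {}
for row in rows:
    name = "D_{var_1}_O.py".format(**row)   # one output file per row
    files[name] = template.format(**row)    # fill the placeholders

print(sorted(files))  # ['D_A_O.py', 'D_B_O.py']
```

In real code each entry of `files` would be written to disk with `open(name, "w")`.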
0 votes
0 answers
47 views

How to force a transitive dependency version during dependency build using `pdm`

We want to install a specific version of pyspark (==2.4.7). The issue is that this specific version needs pypandoc < 1.8. Moreover, pyspark must be built on installation. We pin it explicitly in ...
Oussama Ennafii
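For forcing a transitive dependency at resolution time, recent pdm versions support resolution overrides in `pyproject.toml`; a sketch (assuming pdm ≥ 2.x — note this constrains the resolved version, and build-time requirements of a source build may need separate handling):

```toml
[project]
dependencies = ["pyspark==2.4.7"]

[tool.pdm.resolution.overrides]
pypandoc = "<1.8"
```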
1 vote
0 answers
40 views

How to push data from AWS S3 to a DynamoDB table using AWS Glue, API Gateway, and Lambda

#This is my glue spark code import sys from awsglue.transforms import * from awsglue.utils import getResolvedOptions from pyspark.context import SparkContext from awsglue.context import GlueContext ...
sushmasree_rudroju
0 votes
0 answers
75 views

py4j.protocol.Py4JJavaError: An error occurred while calling o41.saveAsTable. This error occurred while running the code below

from pyspark.sql import * if __name__ =="__main__": spark =SparkSession.builder.appName("helloSpark2").master("local[3]") \ .enableHiveSupport() \ ...
shreya kadam
0 votes
2 answers
67 views

How to Efficiently Read Multiple CSV Files in PySpark, Skipping Rows and a Footer?

I have several CSV files, each with an inconsistent number of data rows and no header row, and I want to read these files into a single PySpark DataFrame. The structure of the CSV files is as follows: ...
Purushottam Nawale
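The core trimming logic can be sketched in plain Python (sample lines are hypothetical; in PySpark, one common approach is `rdd.zipWithIndex()` and filtering on the index within each file):

```python
# Drop the first `skip` rows and the last `footer` rows of a file's lines.
def trim_rows(lines, skip=2, footer=1):
    return lines[skip:len(lines) - footer]

# Hypothetical file contents: two junk leading rows and one footer row.
lines = ["junk1", "junk2", "a,1", "b,2", "c,3", "total,6"]
print(trim_rows(lines))  # ['a,1', 'b,2', 'c,3']
```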
1 vote
0 answers
20 views

Calling foreach with a lambda on a pyspark.sql.dataframe.DataFrame is able to print but not able to append to a list

The PySpark code below is able to print the values, but obj.append does not seem to have any effect. Environment 24/09/17 14:46:12 INFO SparkContext: Running Spark version 3.5.2 ...
Manoj
  • 83
0 votes
2 answers
74 views

How to convert a CSV table structure to JSON using Python?

Today I have a challenge at my school: convert a CSV file to a JSON file. This CSV has a table structure (meaning it contains the information from an Oracle table in this example). So I have to ...
Julio
  • 551
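A minimal standard-library sketch of the conversion (the CSV text here is hypothetical sample data): `csv.DictReader` turns each row into a dict, and `json.dumps` serializes the list of dicts.

```python
import csv
import io
import json

# Hypothetical sample; in real code, open the CSV file instead of a StringIO.
csv_text = "id,name\n1,alpha\n2,beta\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(json.dumps(rows))
# [{"id": "1", "name": "alpha"}, {"id": "2", "name": "beta"}]
```

Note that all values come out as strings; numeric columns need explicit conversion if the JSON should carry numbers.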
0 votes
0 answers
102 views

In a Jupyter notebook, while using PySpark, getting Py4JJavaError when using a simple .count

While using the following code: import pyspark from pyspark import SparkContext from pyspark.sql import SQLContext from pyspark.sql import SparkSession from pyspark.sql.types import Row from datetime ...
aemilius89
0 votes
0 answers
365 views

PySpark python issue: Py4JJavaError: An error occurred while calling o91.showString. Python worker exited unexpectedly (crashed)

I am new to Apache Spark and am currently learning to use PySpark. I am having problems while just learning this, during installation; I have also set the environment path according to the guide ...
Alfarezza
0 votes
0 answers
128 views

Compare two Pyspark Dataframes using datacompy 0.13.2

I have two pyspark dataframe with 6 columns and 50000 rows each. comparison = SparkSQLCompare( spark, df1, df2, join_columns=['col1', 'col2', 'col3', 'col4', '...
Sara
  • 191
