All Questions
Tagged with pyspark python-3.x
1,483 questions
1
vote
1
answer
56
views
How to add double quotes to all columns in my DataFrame and save to CSV
I need help with something related to DataFrames.
I need to save a CSV file where every column value contains double quotes at the beginning and at the end.
This DataFrame is created after reading ...
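A minimal sketch of one way to do this, assuming the DataFrame is already loaded: the CSV writer's quoteAll option wraps every field in the quote character.

# Hedged sketch: "df" and the output path are assumptions.
df.write \
    .option("quote", '"') \
    .option("quoteAll", True) \
    .option("header", True) \
    .csv("output_dir")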
0
votes
2
answers
107
views
For each row in a DataFrame, how to extract elements from an array?
I'm working with a third-party dataset that includes location data. I'm trying to extract the Longitude and Latitude coordinates from the location column. As stated in their doc:
The location column ...
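If the location column is an array type, a sketch along these lines extracts the two coordinates; the element order is an assumption to check against the dataset's doc:

from pyspark.sql import functions as F

# Hedged sketch: getItem pulls a single element out of an array column.
df = df.withColumn("longitude", F.col("location").getItem(0)) \
       .withColumn("latitude", F.col("location").getItem(1))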
0
votes
0
answers
61
views
ThreadPoolExecutor for Parallelism
I have PySpark code that makes a few POST API calls to an external system. For each row in the input DataFrame, I need to trigger a POST API request (using Python code) to create an entry in an external ...
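A common shape for this is to collect the rows to the driver and fan the POST calls out through a thread pool; a minimal sketch, where the endpoint URL and payload are hypothetical and the DataFrame is assumed small enough to collect:

from concurrent.futures import ThreadPoolExecutor
import requests

API_URL = "https://example.com/api/entries"  # hypothetical endpoint

def post_row(row):
    # One POST per row; the status code is returned for inspection.
    return requests.post(API_URL, json=row.asDict()).status_code

rows = df.collect()  # assumes the rows fit on the driver
with ThreadPoolExecutor(max_workers=8) as pool:
    statuses = list(pool.map(post_row, rows))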
0
votes
0
answers
94
views
Python worker exited unexpectedly (crashed)
print(sc.parallelize([1, 2, 3, 4]).map(lambda x: x * x).collect())
Error: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.: org.apache.spark.SparkException: Job ...
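A frequent cause of this crash is a mismatch between the driver's Python and the workers' Python; a hedged sketch that pins both to the current interpreter before the SparkContext is created (the mismatch being the cause here is an assumption):

import os, sys

os.environ["PYSPARK_PYTHON"] = sys.executable         # interpreter for worker processes
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable  # interpreter for the driver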
0
votes
1
answer
35
views
Unable to set SPARK_HOME in tox.ini
PySpark is installed with pip in a virtual environment.
Following the tox documentation, setenv is placed under the [testenv] section.
Here is the code that I have in the tox.ini file:
...
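A minimal sketch of what that setenv section can look like when pyspark is pip-installed into the test env; the {envsitepackagesdir}/pyspark path is an assumption about where pip puts the package:

[testenv]
deps =
    pyspark
setenv =
    SPARK_HOME = {envsitepackagesdir}/pyspark
commands =
    pytest {posargs}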
0
votes
1
answer
41
views
How to create multiple files using a text file as a template and a DataFrame
I need to create multiple files containing Python functions, using a text file as a template. My template will contain something like:
mytemplate.txt
# text used as a template for a Python file
def D_{var_1}_O(...
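A minimal sketch of the generation loop, assuming a var_1 column in the DataFrame and str.format-style placeholders like the {var_1} shown above; the output file names are hypothetical:

with open("mytemplate.txt") as f:
    template = f.read()

for row in df.select("var_1").distinct().collect():
    rendered = template.format(var_1=row["var_1"])  # fill the {var_1} placeholder
    with open(f"D_{row['var_1']}_O.py", "w") as out:
        out.write(rendered)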
0
votes
0
answers
47
views
How to force a transitive dependency version during dependency build using `pdm`
We want to install a specific version of pyspark (==2.4.7). The issue is that this specific version needs a pypandoc < 1.8. Moreover, pyspark must be built on installation. We pin it explicitly in ...
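One hedged possibility, assuming a pdm version that supports resolution overrides in pyproject.toml (whether such overrides reach build-time requirements is something to verify against the pdm docs):

[project]
dependencies = ["pyspark==2.4.7"]

[tool.pdm.resolution.overrides]
pypandoc = "<1.8"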
1
vote
0
answers
40
views
How to push data from AWS S3 to a DynamoDB table using AWS Glue, API Gateway, and Lambda
# This is my Glue Spark code
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
...
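For the Glue-to-DynamoDB step specifically, a hedged sketch using the Glue DynamoDB connection type; the DynamicFrame dyf and the table name are assumptions:

# Hedged sketch: writes a DynamicFrame to DynamoDB via the Glue connector.
glueContext.write_dynamic_frame_from_options(
    frame=dyf,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.output.tableName": "my_table",
        "dynamodb.throughput.write.percent": "0.5",
    },
)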
0
votes
0
answers
75
views
py4j.protocol.Py4JJavaError: An error occurred while calling o41.saveAsTable (this error occurred while running the code below)
from pyspark.sql import *
if __name__ == "__main__":
    spark = SparkSession.builder.appName("helloSpark2").master("local[3]") \
        .enableHiveSupport() \
...
0
votes
2
answers
67
views
How to efficiently read multiple CSV files in PySpark, skipping leading rows and a footer?
I have several CSV files with an inconsistent number of data rows without a header row and I want to read these files into a single PySpark DataFrame. The structure of the CSV files is as follows:
...
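Vanilla Spark's CSV reader has no skip-rows or skip-footer option, so one hedged approach indexes each file's lines, drops assumed head and tail counts, and parses the remainder (csv() also accepts an RDD of strings):

from functools import reduce

SKIP_TOP, SKIP_BOTTOM = 2, 1  # assumed per-file counts

def read_one(path):
    # Index every line so leading and trailing rows can be dropped per file.
    lines = spark.sparkContext.textFile(path).zipWithIndex()
    total = lines.count()
    body = lines.filter(lambda p: SKIP_TOP <= p[1] < total - SKIP_BOTTOM) \
                .map(lambda p: p[0])
    return spark.read.csv(body)

paths = ["file1.csv", "file2.csv"]  # hypothetical paths
df = reduce(lambda a, b: a.unionByName(b), [read_one(p) for p in paths])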
1
vote
0
answers
20
views
pyspark.sql.dataframe.DataFrame.foreach with a lambda is able to print but not able to append to a list
The PySpark code below is able to print the values,
but obj.append does not seem to have any effect.
Environment
24/09/17 14:46:12 INFO SparkContext: Running Spark version 3.5.2 ...
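The behaviour described is expected in Spark: foreach runs the lambda in executor processes, so appends to a driver-side list never make it back. A minimal sketch of the contrast, with collect() as the usual alternative:

obj = []
df.foreach(lambda row: obj.append(row))  # appends happen in worker processes and are discarded

obj = df.collect()  # materializes the rows on the driver instead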
0
votes
2
answers
74
views
How to convert a CSV table structure to JSON using Python?
Today I have a challenge at school: convert a CSV file to a JSON file.
This CSV has a table structure (in this example it contains the information from an Oracle table). So I have to ...
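For the plain-Python route, a minimal standard-library sketch; the file names are hypothetical and the first CSV row is assumed to hold the column names:

import csv, json

with open("table.csv", newline="") as f:
    rows = list(csv.DictReader(f))  # each row becomes a dict keyed by header

with open("table.json", "w") as f:
    json.dump(rows, f, indent=2)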
0
votes
0
answers
102
views
Py4JJavaError in a Jupyter notebook when using a simple .count() with PySpark
While using the following code:
import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
from pyspark.sql.types import Row
from datetime ...
0
votes
0
answers
365
views
PySpark python issue: Py4JJavaError: An error occurred while calling o91.showString. Python worker exited unexpectedly (crashed)
I am new to Apache Spark and am currently learning to use PySpark. I am having problems just getting started: during installation I also set the environment path according to the guide ...
0
votes
0
answers
128
views
Compare two PySpark DataFrames using datacompy 0.13.2
I have two PySpark DataFrames with 6 columns and 50,000 rows each.
comparison = SparkSQLCompare(
spark,
df1,
df2,
join_columns=['col1', 'col2', 'col3', 'col4', '...
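Assuming SparkSQLCompare exposes the same reporting methods as datacompy's pandas Compare class, the usual next step looks like this (a hedged sketch):

print(comparison.report())  # human-readable summary of matches and mismatches

if not comparison.matches():
    mismatches = comparison.all_mismatch()  # rows that differ on non-join columns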