All Questions
4 questions
1 vote · 1 answer · 2k views
Pyspark - df.cache().count() taking forever to run
I'm trying to force eager evaluation in PySpark, using the cache-and-count approach I read about online:
spark_df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
spark_df....
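A minimal sketch of the cache-then-count pattern the excerpt uses; the URL, query, and credentials below are illustrative placeholders, not values from the question. cache() is lazy by itself, so count() is the action that actually pulls every row over JDBC; if count() is slow, the read itself is usually the bottleneck, and the partitioning arguments that spark.read.jdbc accepts (column, lowerBound, upperBound, numPartitions) are one common way to parallelize it:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("eager-jdbc-read").getOrCreate()

# Illustrative placeholders, not values from the question
jdbcUrl = "jdbc:postgresql://dbhost:5432/mydb"
pushdown_query = "(SELECT id, payload FROM events) AS t"
connectionProperties = {"user": "reader", "password": "secret",
                        "driver": "org.postgresql.Driver"}

# column/lowerBound/upperBound/numPartitions split the read into 8 parallel
# JDBC queries instead of a single connection
spark_df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query,
                           column="id", lowerBound=1, upperBound=1000000,
                           numPartitions=8, properties=connectionProperties)

# cache() only marks the DataFrame; count() is the action that materializes it
spark_df.cache()
print(spark_df.count())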
1 vote · 0 answers · 39 views
bugs due to pyspark lazy evaluation [duplicate]
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("Ark API Stats")
sc = SparkContext(conf=conf)
a = sc.parallelize([1,2,3,4,5,6,7,8,9,10])
count = [2,4]
array = [a.filter(...
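Since the excerpt is truncated, the sketch below assumes the elided part builds filters in a comprehension over count, which is the classic shape of this bug: the lambda captures the loop variable itself, and because PySpark serializes the closure only when an action finally runs, every filter sees the variable's last value. Binding the value via a default argument is the usual fix:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("Ark API Stats")
sc = SparkContext(conf=conf)

a = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
count = [2, 4]

# Hypothetical reconstruction of the truncated filter expression.
# Buggy: both lambdas capture the variable c itself. By the time an action
# triggers serialization, c == 4, so both filters keep multiples of 4.
array = [a.filter(lambda x: x % c == 0) for c in count]

# Fix: c=c freezes the current value of c into each lambda's default argument.
array = [a.filter(lambda x, c=c: x % c == 0) for c in count]

print([rdd.collect() for rdd in array])
# [[2, 4, 6, 8, 10], [4, 8]]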
1 vote · 1 answer · 2k views
Pyspark lazy evaluation in loops too slow
First of all, I want to let you know that I am still very new to Spark and getting used to the lazy-evaluation concept.
Here is my issue:
I have two Spark DataFrames that I load by reading CSV.GZ ...
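Not from the question itself, but a common shape of this problem, assuming DataFrames are transformed repeatedly in a loop: each iteration extends the lazy plan, so every action replays everything back to the CSV.GZ read and iterations get progressively slower. Materializing intermediates with cache() plus an action truncates that recomputation (the file name and column below are illustrative):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("loop-lineage").getOrCreate()

# Illustrative input; note a .csv.gz file is read on a single task because
# gzip is not splittable, which compounds the slowness
df = spark.read.csv("data.csv.gz", header=True, inferSchema=True)

for i in range(10):
    df = df.withColumn("value", F.col("value") * 2)
    if i % 3 == 0:
        # cache() alone is lazy; the count() action populates the cache so
        # later iterations start from the materialized result instead of
        # replaying the whole plan from the file
        df = df.cache()
        df.count()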
0 votes · 2 answers · 653 views
RDD creation and variable binding
I have some very simple code:
def fun(x, n):
    return (x, n)

rdds = []
for i in range(2):
    rdd = sc.parallelize(range(5*i, 5*(i+1)))
    rdd = rdd.map(lambda x: fun(x, i))
    rdds.append(rdd)
a =...
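The usual diagnosis, sketched as a runnable fix (assuming an active SparkContext sc): every lambda in the loop captures the variable i itself, and since PySpark serializes the closures only when an action runs, each RDD is built with the loop's final value i == 1. A default argument captures the value at definition time instead:

def fun(x, n):
    return (x, n)

rdds = []
for i in range(2):
    rdd = sc.parallelize(range(5 * i, 5 * (i + 1)))
    # i=i binds the current value of i into the lambda, not the variable
    # itself, so each RDD keeps its own loop index
    rdd = rdd.map(lambda x, i=i: fun(x, i))
    rdds.append(rdd)

print([r.collect() for r in rdds])
# [[(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)],
#  [(5, 1), (6, 1), (7, 1), (8, 1), (9, 1)]]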