
I am using Spark 1.3.0.

I have a problem running a Python program in the Spark Python shell.

This is how I submit the job:

/bin/spark-submit progname.py

The error I get is:

NameError: name 'sc' is not defined

on the line where I use sc.

Any idea? Thanks in advance.

3 Answers

2
## Imports
from pyspark import SparkConf, SparkContext

## CONSTANTS
APP_NAME = "My Spark Application"

## OTHER FUNCTIONS/CLASSES

## Main functionality
def main(sc):
    rdd = sc.parallelize(range(1000), 10)
    print(rdd.mean())

if __name__ == "__main__":
    # Configure OPTIONS
    conf = SparkConf().setAppName(APP_NAME)
    conf = conf.setMaster("local[*]")
    # In a cluster this will be something like
    # "spark://ec2-0-17-03-078.compute-1.amazonaws.com:7077"
    sc = SparkContext(conf=conf)
    # Execute Main functionality
    main(sc)
  • I tried to copy-paste the above program and run it, but I am getting this error: IndentationError: expected an indented block. Sorry to trouble you more, but thank you so much for the help.
    – user5570383
    Commented Nov 17, 2015 at 7:58
  • Have you added a tab (or 4 spaces) of indentation after the if statement?
    – Xer
    Commented Nov 17, 2015 at 8:02
  • Yes sir. Now I am getting this error: zipimport.ZipImportError: can't decompress data; zlib not available
    – user5570383
    Commented Nov 17, 2015 at 8:26
  • Can you give me your email ID, if you don't mind? I shall send you the screenshots. Sorry for troubling you again.
    – user5570383
    Commented Nov 17, 2015 at 8:29
  • You will find the answer here: askubuntu.com/questions/661039/…
    – Xer
    Commented Nov 17, 2015 at 8:31
0
conf = pyspark.SparkConf()

This is how you should create a SparkConf object.

Further, you can use chaining to do things like setting the application name, etc.

conf = pyspark.SparkConf().setAppName("My_App_Name")

Then pass this config variable when creating the SparkContext.

-1

The first thing a Spark program must do is to create a SparkContext object, which tells Spark how to access a cluster. To create a SparkContext you first need to build a SparkConf object that contains information about your application.

conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
  • Sorry to ask again. Can you tell me how to build a SparkConf? In the terminal, or where? Thanks again.
    – user5570383
    Commented Nov 17, 2015 at 7:22
  • Create a SparkConf object with SparkConf(), which will load values from any spark.* system properties.
    – Xer
    Commented Nov 17, 2015 at 7:28
  • The appName parameter is a name for your application to show on the cluster UI. master is a Spark, Mesos or YARN cluster URL, or a special “local” string to run in local mode
    – Xer
    Commented Nov 17, 2015 at 7:30
  • I am sorry, and thank you for your help, but it is again throwing an error and I don't know how to resolve it; maybe I don't know the correct syntax. :( I wrote conf = SparkConf().setAppname("README.md").setMaster("/home/nikitha/Downloads/spark-1.5.0-bin-hadoop2.4") and sc = SparkContext(conf=conf) and textFile=sc.textFile("README.md"), then tried to run it with /bin/spark-submit progname.py, but the error is NameError: name 'SparkConf' is not defined
    – user5570383
    Commented Nov 17, 2015 at 7:48
  • Set the .master to "local" (a corrected sketch of the whole script follows these comments).
    – Xer
    Commented Nov 17, 2015 at 7:55