import os

from pyspark.sql import SparkSession

# Directory on DBFS that holds the uploaded file
DATA_DIR = "dbfs:/FileStore/shared_uploads/[email protected]"

path = os.path.join(DATA_DIR, "Alabama_pop_by_sex_and_age_2000_2010.xls")

# Request the spark-excel package; getOrCreate() reuses the notebook's
# existing session if there is one
spark = SparkSession.builder.appName('test')\
    .config("spark.jars.packages", "com.crealytics:spark-excel_2.12:3.5.1_0.20.4")\
    .getOrCreate()

# Read the workbook without a header row and let Spark infer column types
upbsa_df = spark.read.format("com.crealytics.spark.excel")\
    .option("header", "false")\
    .option("inferSchema", "true")\
    .load(path)

All I'm trying to do is read an Excel file with Spark. I have already installed the Maven package com.crealytics.spark.excel, but to no avail: it still raises this error. I've read this post and searched for the commons-io package in the Install libraries option of my Databricks cluster, as suggested by this post: java.lang.NoSuchMethodError: 'org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream$Builder org.apache.poi-poi-ooxml-5.2.4. But it shows only these packages:

(Screenshot: packages shown for Maven Central in the Install libraries dialog of my Databricks cluster)

I also tried installing com.twelvemonkeys.common:common-io:3.12.0 to see if it would work, since its name looked roughly the same.

My runtime is 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12)
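One thing that may be worth checking is whether the package coordinate lines up with the runtime. The spark-excel artifacts are named `spark-excel_<scala-version>:<spark-version>_<plugin-version>`, so the coordinate above targets Spark 3.5.1 while the cluster runs Spark 3.3.2. Below is a minimal sketch of that comparison in plain Python. The helper names `parse_spark_excel_coordinate` and `matches_runtime` are hypothetical, and treating "same major.minor Spark version" as the compatibility rule is my own assumption, not something the library guarantees.

```python
def parse_spark_excel_coordinate(coordinate):
    """Split 'group:artifact_scala:spark_plugin' into its parts (hypothetical helper)."""
    group, artifact, version = coordinate.split(":")
    name, scala_version = artifact.rsplit("_", 1)
    spark_version, plugin_version = version.split("_", 1)
    return {
        "group": group,
        "name": name,
        "scala": scala_version,
        "spark": spark_version,
        "plugin": plugin_version,
    }


def matches_runtime(coordinate, runtime_spark, runtime_scala):
    """Return True if the Scala version and the Spark major.minor both match.

    Assumption: comparing only major.minor is a reasonable first check.
    """
    parts = parse_spark_excel_coordinate(coordinate)
    same_scala = parts["scala"] == runtime_scala
    same_spark_line = parts["spark"].split(".")[:2] == runtime_spark.split(".")[:2]
    return same_scala and same_spark_line


COORDINATE = "com.crealytics:spark-excel_2.12:3.5.1_0.20.4"

# Runtime 12.2 LTS ships Spark 3.3.2 and Scala 2.12
print(matches_runtime(COORDINATE, "3.3.2", "2.12"))  # prints False
```

If the check comes back False, it may be the version mismatch rather than a missing commons-io library that needs attention. Which builds exist for a given Spark version would still have to be looked up on Maven Central.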

I really want to get this working so I can start building data pipelines, so any feedback would be appreciated.
