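import os
from pyspark.sql import SparkSession

# Directory in DBFS that holds the uploaded file
DATA_DIR = "dbfs:/FileStore/shared_uploads/[email protected]"
path = os.path.join(DATA_DIR, "Alabama_pop_by_sex_and_age_2000_2010.xls")

# On Databricks a SparkSession already exists, so getOrCreate() returns it and
# static settings such as spark.jars.packages set here are not applied;
# the library has to be installed on the cluster instead.
spark = SparkSession.builder.appName('test') \
    .config("spark.jars.packages", "com.crealytics:spark-excel_2.12:3.5.1_0.20.4") \
    .getOrCreate()

# Read the Excel file with spark-excel, treating the first row as data
# (header=false) and letting Spark infer the column types
upbsa_df = spark.read.format("com.crealytics.spark.excel") \
    .option("header", "false") \
    .option("inferSchema", "true") \
    .load(path)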

All I'm trying to do is read the Excel file with Spark. I have already installed the Maven package com.crealytics.spark.excel, but it still raises this error. I've tried reading this post ("java.lang.NoSuchMethodError: 'org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream$Builder org.apache.poi-poi-ooxml-5.2.4") and, as it suggests, searching for the commons-io package in the Install library option of my Databricks cluster. But it shows only these packages:

I also tried installing com.twelvemonkeys.common:common-io:3.12.0 to see if it would work, since I thought it had roughly the same name.
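
For what it's worth, com.twelvemonkeys.common:common-io is a separate TwelveMonkeys utility library; Apache Commons IO, which contains org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream, is published as commons-io:commons-io. A NoSuchMethodError like the one above usually means an older copy of a class is loaded than the one the caller (here POI) was compiled against. The sketch below is a hypothetical pure-Python helper that scans jar file names for Apache commons-io versions and compares the oldest against a minimum. The minimum of 2.13.0 is an assumption to verify against POI 5.2.4's published dependencies, and the jar naming pattern is also assumed.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumption: the commons-io release POI 5.2.4 needs for
# UnsynchronizedByteArrayOutputStream.builder(); verify before relying on it.
ASSUMED_MIN_COMMONS_IO = (2, 13, 0)

# "commons-io" followed by "-" or "_" separators and a dotted numeric version.
# The TwelveMonkeys artifact is named "common-io" (no "s"), so it never matches.
_COMMONS_IO_JAR = re.compile(r"commons-io[-_]+(\d+(?:\.\d+)+)")


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.11.0' into (2, 11, 0) so versions compare as tuples."""
    return tuple(int(part) for part in text.split("."))


def find_commons_io_versions(jar_names: Iterable[str]) -> List[Tuple[int, ...]]:
    """Return every Apache commons-io version found in the jar names, oldest first."""
    found = []
    for name in jar_names:
        match = _COMMONS_IO_JAR.search(name)
        if match:
            found.append(parse_version(match.group(1)))
    return sorted(found)


def commons_io_is_new_enough(
    jar_names: Iterable[str],
    minimum: Tuple[int, ...] = ASSUMED_MIN_COMMONS_IO,
) -> Optional[bool]:
    """None if no commons-io jar is found, else whether the oldest one meets `minimum`."""
    versions = find_commons_io_versions(jar_names)
    if not versions:
        return None
    # If several copies are on the classpath, the older one may win,
    # so judge by the oldest version found.
    return versions[0] >= minimum
```

On the cluster, the names could come from os.listdir() on the directory that holds the runtime's jars (for example /databricks/jars, which is an assumption about the runtime layout, not something confirmed here).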

My runtime is 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12).
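
One thing worth checking against this runtime is the spark-excel coordinate used in the builder config, com.crealytics:spark-excel_2.12:3.5.1_0.20.4. spark-excel versions appear to follow a "<Spark version>_<plugin version>" scheme, so this artifact would target Spark 3.5.1 while the runtime is Spark 3.3.2. The sketch below is a hypothetical pure-Python check under that assumption; treating a matching Scala version plus matching Spark major.minor as "compatible" is a simplification, not a rule from the library.

```python
import re
from typing import Tuple

# Assumption: spark-excel versions look like "<Spark version>_<plugin version>",
# e.g. "3.5.1_0.20.4" = plugin 0.20.4 built for Spark 3.5.1.
_SPARK_EXCEL = re.compile(
    r"^com\.crealytics:spark-excel_(?P<scala>\d+\.\d+):"
    r"(?P<spark>\d+\.\d+\.\d+)_(?P<plugin>\d+\.\d+\.\d+)$"
)


def parse_spark_excel(coordinate: str) -> Tuple[str, str, str]:
    """Split a spark-excel Maven coordinate into (scala, spark, plugin) versions."""
    match = _SPARK_EXCEL.match(coordinate)
    if match is None:
        raise ValueError(f"not a spark-excel coordinate: {coordinate!r}")
    return match.group("scala"), match.group("spark"), match.group("plugin")


def matches_runtime(coordinate: str, spark_version: str, scala_version: str) -> bool:
    """True if the Scala version and the Spark major.minor both match the runtime."""
    scala, spark, _plugin = parse_spark_excel(coordinate)
    same_minor = spark.split(".")[:2] == spark_version.split(".")[:2]
    return scala == scala_version and same_minor


if __name__ == "__main__":
    # Runtime values from 12.2 LTS: Spark 3.3.2, Scala 2.12
    print(matches_runtime("com.crealytics:spark-excel_2.12:3.5.1_0.20.4", "3.3.2", "2.12"))
```

In a notebook, spark.version returns the running Spark version, which could be passed in place of the hard-coded "3.3.2".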

I really want to get this working so I can start building data pipelines. Any feedback would be appreciated.