Questions tagged [apache-spark]
Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write, originally developed in the AMPLab at UC Berkeley.
239 questions
0
votes
0
answers
13
views
Java heap space error even though only 1.5 GB of 5.8 GB is used and the df size is 3 GB
Why am I getting java.lang.OutOfMemoryError: Java heap space even when I have plenty of memory?
My simple code creates a dataframe from the input data, so no ...
1
vote
2
answers
69
views
Stuck on recursively loading parquet files of varying size with Spark
I am using Spark on Scala via an Almond kernel for Jupyter to load several parquet files of varying size. I have a single worker with 10 cores and a memory allowance of 10 GB. When I execute the ...
0
votes
1
answer
104
views
Any interface/library that can take Python ML code and run it on a Spark cluster without learning PySpark?
I have been working with Python for machine learning and have a fair amount of code written in Python using libraries such as scikit-learn, pandas, and numpy. Recently, I’ve been faced with larger ...
0
votes
1
answer
214
views
Hadoop, Spark and Cloud
It seems Hadoop, Spark, and various cloud platforms offer facilities to store and analyze big data. There are some articles comparing Hadoop and Spark (for example, this article). There are also ...
0
votes
1
answer
371
views
IllegalArgumentException at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source) when training an ALS implementation of spark in scala
I was following this tutorial, trying to write a collaborative recommender system using the alternating least squares algorithm in Spark. I am using the MovieLens dataset, which can be found here.
My ...
1
vote
0
answers
74
views
Not able to read data from MongoDB for the schema below [closed]
I am trying to read a very complex JSON document from MongoDB. I have tried multiple approaches but with no luck. Sample schema below:
...
1
vote
1
answer
89
views
What is the difference between Data Modeling and Data Processing?
When discussing big data, it is sometimes mentioned that data modeling can be done using a tool like MapReduce, while data processing may be performed by Apache Spark. What is the difference ...
1
vote
0
answers
52
views
Working with massive data: what is the right approach?
Let's say I have a database with massive data (millions of rows).
Additionally, let's say 26 million rows are entered every day.
I want to build a fraud model to check these 26 million rows every day.
As ...
0
votes
1
answer
406
views
Group a spark dataframe by a starting event to an ending event
Given a series of events (with datetime) such as:
failed, failed, passed, failed, passed, passed
I want to retrieve the time from when it first "failed" to when it first "passed," ...
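Before reaching for Spark window functions, the interval logic itself can be sketched in plain Python (this is a minimal illustration of the "first failed to first passed" computation, not the asker's code; the sample timestamps are made up):

```python
from datetime import datetime

def first_fail_to_pass(events):
    """Given (timestamp, status) pairs sorted by time, return the interval
    from the first 'failed' event to the first 'passed' event that follows
    it, or None if either event is missing."""
    first_fail = None
    for ts, status in events:
        if status == "failed" and first_fail is None:
            first_fail = ts
        elif status == "passed" and first_fail is not None:
            return ts - first_fail
    return None

events = [
    (datetime(2020, 1, 1, 10, 0), "failed"),
    (datetime(2020, 1, 1, 10, 5), "failed"),
    (datetime(2020, 1, 1, 10, 20), "passed"),
    (datetime(2020, 1, 1, 10, 30), "failed"),
    (datetime(2020, 1, 1, 10, 40), "passed"),
]

print(first_fail_to_pass(events))  # 0:20:00
```

In Spark, the same idea would typically be expressed with a window ordered by the datetime column, but the sequential logic above is what any such query has to reproduce.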
0
votes
1
answer
545
views
Storage of N-dimensional matrices (tensors) as part of machine learning pipelines
I'm an infra person working on a storage product. I've been googling quite a bit to find an answer to the following question but have been unable to find one. Hence, I am attempting to ask it here.
I am ...
0
votes
0
answers
306
views
Generalized Additive Modeling Apache Spark implementation
Does Spark MLlib support Generalized Additive Modeling? How does one go about implementing GAM models in Spark?
I want to implement a GAM (generalized additive model) in Spark. Based on my ...
0
votes
1
answer
133
views
CREATE TABLE USING Oracle DATA_SOURCE
I am trying to create a table using Oracle as a data source via a Spark SQL query, but I am getting an error.
%sql
CREATE TABLE TEST
USING org.apache.spark.sql.jdbc
OPTIONS (
url "jdbc:oracle:thin:@...
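For reference, a complete statement of this form might look like the following sketch; the connection URL, credentials, and table names here are placeholders, not values from the question:

```sql
CREATE TABLE TEST
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1",  -- placeholder host and service name
  dbtable "SCHEMA_NAME.SOURCE_TABLE",              -- placeholder source table
  user "db_user",                                  -- placeholder credentials
  password "db_password",
  driver "oracle.jdbc.driver.OracleDriver"
)
```

The Oracle JDBC driver jar must be on the cluster's classpath for this to work.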
0
votes
0
answers
339
views
Creating table in Databricks using the table from Oracle
I am trying to create a table in Databricks using an object in an Oracle DB, but I am getting an error
...
1
vote
1
answer
308
views
Outlier Elimination in Spark With InterQuartileRange Results in Error
I have the following function that is supposed to calculate the outliers for a given dataset.
...
1
vote
1
answer
43
views
Would it be possible/practical to build a distributed deep learning engine by tapping into ordinary PCs' unused resources?
I started thinking about this in the context of Apple's new line of desktop CPUs with dedicated neural engines. From what I hear, these chips are quite adept at solving deep learning problems (as the ...