Questions tagged [apache-spark]

Apache Spark is an open source cluster computing system that aims to make data analytics fast, both fast to run and fast to write. It was originally developed in the AMPLab at UC Berkeley.

0 votes
0 answers
13 views

Why am I getting java.lang.OutOfMemoryError: Java heap space even when I have plenty of memory? I have simple code that creates a DataFrame from input data, so no ...
user453575457's user avatar
1 vote
2 answers
69 views

I am using Spark on Scala via an Almond kernel for Jupyter to load several parquet files of varying sizes. I have a single worker with 10 cores and a memory allowance of 10GB. When I execute the ...
Ícaro Lorran's user avatar
0 votes
1 answer
104 views

I have been working with Python for machine learning and have a fair amount of code written in Python using libraries such as scikit-learn, pandas, and numpy. Recently, I’ve been faced with larger ...
Mohith7548's user avatar
0 votes
1 answer
214 views

It seems Hadoop, Spark, and various cloud platforms offer facilities to store and analyze big data. There are some articles comparing Hadoop and Spark (for example, this article). There are also ...
Tara's user avatar
  • 1
0 votes
1 answer
371 views

I was following this tutorial, trying to write a collaborative recommender system using the alternating least squares algorithm in Spark. I am using the MovieLens dataset, which can be found here. My ...
ptushev's user avatar
  • 21
1 vote
0 answers
74 views

I am trying to read very complex JSON from MongoDB. I have tried multiple ways but had no luck. Sample schema below: ...
sai's user avatar
  • 11
1 vote
1 answer
89 views

When discussing big data, it is sometimes mentioned that data modeling can be done using a tool like MapReduce, while data processing may be performed by Apache Spark. What is the difference ...
Karl 17302's user avatar
1 vote
0 answers
52 views

Let's say I have a database with massive data (millions of rows). Additionally, let's say 26 million rows are entered every day. I want to build a fraud model to check these 26 million rows every day, as ...
MAS's user avatar
  • 45
0 votes
1 answer
406 views

Given a series of events (with datetime) such as: failed, failed, passed, failed, passed, passed I want to retrieve the time from when it first "failed" to when it first "passed," ...
Ronen's user avatar
  • 101
0 votes
1 answer
545 views

I'm an infra person working on a storage product. I've been googling quite a bit to find an answer to the following question but have been unable to do so. Hence, I am attempting to ask the question here. I am ...
user855's user avatar
  • 101
0 votes
0 answers
306 views

Does Spark MLlib support generalized additive modeling? How does one go about implementing GAMs in Spark? I want to implement a GAM (generalized additive model) in Spark. Based on my ...
Pavan's user avatar
  • 1
0 votes
1 answer
133 views

I am trying to create a table using Oracle as a data source via a Spark query, but I am getting an error. %sql CREATE TABLE TEST USING org.apache.spark.sql.jdbc OPTIONS ( url "jdbc:oracle:thin:@...
nikhil parmar's user avatar
0 votes
0 answers
339 views

I am trying to create a table in Databricks using an object in an Oracle DB, but I am getting an error ...
nikhil parmar's user avatar
1 vote
1 answer
308 views

I have the following function that is supposed to calculate the outlier for a given dataset. ...
joesan's user avatar
  • 219
1 vote
1 answer
43 views

I started thinking about this in the context of Apple's new line of desktop CPUs with dedicated neural engines. From what I hear, these chips are quite adept at solving deep learning problems (as the ...
AffableAmbler's user avatar