Tooling
0 votes · 0 replies · 27 views

My boss won't give me access to the backup of some data I need for work, because he mistakenly believes that the backup and the production system being backed up share credentials. I have got people setting up ...
asked by user4343712
Advice
0 votes · 5 replies · 121 views

I currently have Java 24 installed on my system and I use it for my personal projects. However, for my college work with Hadoop, I need to run it on Java 17. How can I set up Hadoop to use Java 17 ...
asked by Yash Sharma
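A common way to pin Hadoop to a specific JDK without changing the system default is to set JAVA_HOME in Hadoop's own environment file. A minimal sketch, assuming a Debian/Ubuntu-style OpenJDK 17 install path (adjust to wherever your JDK 17 actually lives):

```
# etc/hadoop/hadoop-env.sh
# Hadoop reads JAVA_HOME from this file, overriding the shell environment,
# so the system-wide Java 24 remains the default for everything else.
# The path below is an example; point it at your actual JDK 17 install.
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
```

This keeps the Java 24 toolchain untouched for personal projects while every Hadoop daemon and client launched through the Hadoop scripts uses JDK 17.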
0 votes · 0 answers · 79 views

We have been using the TDCH approach to load data from Hadoop into Teradata, but we now need to load a Teradata view from Hadoop CSV tables. I've tried a batch insert using TDCH, but that is failing as ...
asked by Vaishnavi Priya
1 vote · 2 answers · 118 views

I want to use compression in big-data processing, but there are two compression codecs. Does anyone know the difference?
asked by Angle Tom
2 votes · 1 answer · 53 views

I have an application using EKS in AWS that runs a Spark session that can run multiple workloads. In each workload, I need to access data from S3 in another AWS account, for which I have STS ...
asked by md12345
0 votes · 0 answers · 198 views

I keep running into this issue when running PySpark. I was able to connect to my database and retrieve data, but whenever I try to do operations like .show() or .count(), or when I try to save a Spark ...
asked by Siva Indukuri
0 votes · 1 answer · 171 views

I am running Apache Hive 4.0.0 inside Docker on Ubuntu 22.04. The container starts, but HiveServer2 never binds to the port. When I try to connect with Beeline: sudo docker exec -it hive4 beeline -u ...
asked by user31562336
0 votes · 3 answers · 350 views

I'm trying to read some files from S3 with PySpark 4.0.1 and the S3AFileSystem. The standard configuration using hadoop-aws 3.4.1 works, but it requires the AWS SDK Bundle. This single dependency is ...
asked by RobinFrcd
0 votes · 0 answers · 70 views

I have a Hive table emp1 with 100 partitions in text format. I want Spark to read the emp table partition by partition and write it to EMP2 in Parquet format. How do I achieve 1) 10 partitions read from ...
asked by Rishabh Joshi
0 votes · 1 answer · 82 views

Context: using distcp, I am trying to copy an HDFS directory, including its files, to a GCP bucket. I am using hadoop distcp -Dhadoop.security.credential.provider.path=jceks://$JCEKS_FILE hdfs://nameservice1/...
asked by Jhon
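For HDFS-to-GCS copies, distcp can write directly to a gs:// URI as long as the Cloud Storage connector is available. A hedged sketch, assuming the gcs-connector jar is on the Hadoop classpath and using example source/target paths (the bucket name and directory below are placeholders, not from the question):

```
# Sketch: distcp from HDFS to a GCS bucket.
# Assumes the Google Cloud Storage connector jar is on the Hadoop classpath;
# the source path and bucket name are illustrative.
hadoop distcp \
  -Dhadoop.security.credential.provider.path=jceks://$JCEKS_FILE \
  -Dfs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  hdfs://nameservice1/user/data/source \
  gs://my-bucket/target
```

If the gs scheme is not recognized, that usually means the connector jar or the fs.gs.impl mapping is missing from the client configuration.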
0 votes · 0 answers · 80 views

I'm trying to convert my PySpark script into an executable (.exe) file using PyInstaller. The script runs fine in Python, but after converting it to an EXE and executing it, I get the following error: '...
asked by userr
-1 votes · 1 answer · 186 views

I have 67 snapshots in a single table, but when I use CALL iceberg_catalog.system.expire_snapshots( table => 'iceberg_catalog.default.test_7', retain_last => 5 ); it doesn't delete any snapshots. ...
asked by Sơn Bùi
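One thing worth checking here (a common cause, not a confirmed diagnosis for this table): Iceberg's expire_snapshots only removes snapshots older than the older_than cutoff, which defaults to 5 days before the current time, and retain_last is a lower bound on how many recent snapshots to keep, not an instruction to delete the rest. If all 67 snapshots are newer than the default cutoff, the call removes nothing. A sketch passing an explicit cutoff (the timestamp below is an example value):

```
-- Sketch: expire snapshots older than an explicit cutoff, keeping at least 5.
-- The timestamp is illustrative; choose a cutoff appropriate for your table.
CALL iceberg_catalog.system.expire_snapshots(
  table       => 'iceberg_catalog.default.test_7',
  older_than  => TIMESTAMP '2025-01-01 00:00:00.000',
  retain_last => 5
);
```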
1 vote · 1 answer · 48 views

I am building a Hadoop cluster (version 3.3.6) with Docker Swarm. I have 3 machines: 1 runs the namenode, and all 3 run datanodes. After everything starts, I checked: the namenode is healthy, the datanodes are healthy, ...
asked by jcyan
0 votes · 2 answers · 113 views

I'm working on a Scala project using Spark (with Hive support in some tests) and running unit and integration tests via both IntelliJ and Maven Surefire. I have a shared test session setup like this: ...
asked by M06H
0 votes · 1 answer · 162 views

Hive 4.0.1 doesn't work because JAR files are not found. I want to use Hive integrated with Hadoop 3.4.1 to query data on Apache Spark. I typed ./hive/bin/hive and expected it to return >...
asked by vinhdiesal
