Questions tagged [hdfs]
For questions regarding the Hadoop distributed file system (HDFS) which is part of the Apache Hadoop project.
73 questions
2 votes, 1 answer, 206 views
HDFS + Using very large disks with HDFS
From my understanding, using 20-30 TB disks with HDFS can present some challenges, but it can also be managed effectively with proper configuration.
Using 20-30 TB disks with HDFS is possible, but it requires ...
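One challenge that grows with disk size is recovery time: when a disk fails, every block replica it held must be re-created on other DataNodes. A rough sketch, assuming re-replication runs at a steady, hypothetical cluster-wide rate (the 80% fill ratio and 500 MB/s below are illustrative values, not measurements):

```python
def rereplication_hours(disk_tb: float, fill_ratio: float, aggregate_mb_per_s: float) -> float:
    """Estimate hours to re-create the replicas that lived on one failed disk.

    Hypothetical model: re-replication proceeds at a steady cluster-wide rate.
    """
    lost_bytes = disk_tb * 1e12 * fill_ratio   # decimal TB of data that was on the disk
    rate_bytes = aggregate_mb_per_s * 1e6      # decimal MB/s -> bytes/s
    return lost_bytes / rate_bytes / 3600      # seconds -> hours


for size_tb in (4, 20, 30):
    hours = rereplication_hours(size_tb, fill_ratio=0.8, aggregate_mb_per_s=500)
    print(f"{size_tb:>2} TB disk, 80% full: ~{hours:.1f} h at 500 MB/s")
```

The point is the linear scaling: at the same rate, the blocks from a 30 TB disk stay under-replicated about 7.5 times longer than those from a 4 TB disk.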
1 vote, 0 answers, 174 views
Hadoop + slow block-receive warnings from data-node machines
We have a Hadoop cluster with 487 data-node machines (each data-node machine also runs the NodeManager service). All machines are physical DELL machines, and the OS is RHEL 7.9.
Each ...
0 votes, 1 answer, 150 views
Can a VM replace a physical machine?
We have 254 physical servers, all of them DELL R740.
The servers are part of a Hadoop cluster. Most of them hold the HDFS filesystem and run the DataNode and NodeManager services, part of ...
1 vote, 0 answers, 282 views
HDP cluster + journal nodes get out of sync
We have an HDP cluster, version 2.6.5.
When we look at the NameNode logs, we can see the following warning:
2023-02-20 15:56:37,731 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(...
-2 votes, 1 answer, 226 views
How does placing data in various racks help to exploit the fact that intra-rack aggregated bandwidth >= inter-rack bandwidth?
[Screenshot from the GFS research paper]
It says (my interpretation after reading the research paper and its reviews) that "inter-rack bandwidth is lower than aggregated intra-rack bandwidth (not sure what it means ...
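HDFS's default placement rule for replication factor 3, as described in the HDFS Architecture guide, applies the same trade-off: one replica on the writer's node (when the writer is a DataNode), the second on a node in a different rack, and the third on a different node in that same remote rack. The sketch below models only that rule on a hypothetical 3-rack topology (random choice, no load or free-space checks; not the actual HDFS code). The function names are illustrative. It counts how many hops of the write pipeline cross a rack boundary: replicas land on two racks, yet only one transfer uses inter-rack bandwidth.

```python
import random


def place_replicas(topology: dict[str, list[str]], writer_rack: str, writer_node: str,
                   rng: random.Random) -> list[tuple[str, str]]:
    """Pick 3 (rack, node) targets: 1st on the writer's node (writer assumed to be a
    DataNode), 2nd on a node in another rack, 3rd on a different node in that rack."""
    first = (writer_rack, writer_node)
    remote_rack = rng.choice([r for r in topology if r != writer_rack])
    second_node, third_node = rng.sample(topology[remote_rack], 2)
    return [first, (remote_rack, second_node), (remote_rack, third_node)]


def cross_rack_hops(pipeline: list[tuple[str, str]]) -> int:
    """Count pipeline hops (replica 1 -> 2 -> 3) whose endpoints are in different racks."""
    return sum(1 for a, b in zip(pipeline, pipeline[1:]) if a[0] != b[0])


# Hypothetical topology: 3 racks with 4 nodes each.
topology = {f"rack{r}": [f"rack{r}-node{n}" for n in range(4)] for r in range(3)}
replicas = place_replicas(topology, "rack0", "rack0-node1", random.Random(42))
print(replicas)
print("cross-rack hops in the write pipeline:", cross_rack_hops(replicas))
```

Losing either rack still leaves at least one replica, while the writer pays for only one inter-rack transfer per block.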
1 vote, 0 answers, 167 views
HDFS + how to disable the "du -sk" verification on data node disks
We are using an HDP cluster with 182 data node machines:
HDP version - 2.6.4
Ambari version - 2.6.1
We noticed the following behavior on the data node machines (it happens on all data-node machines and on ...
0 votes, 1 answer, 201 views
Hadoop recommissioning datanode
Do I need to delete all data from a datanode before recommissioning it, or does it not matter because the namenode will not pick up stale data from the datanode?
0 votes, 1 answer, 534 views
Change HDFS replication factor
I've changed the replication factor from 3 to 2 for some directories with the command:
hdfs dfs -setrep -R 2 /path/to/dir
but my HDFS free space is still the same. Should I do something else to free up space on my disks?
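The expected saving is simple arithmetic, because each block is stored once per replica. A minimal sketch with a hypothetical 10 TB directory, ignoring block checksum files and non-DFS usage. In HDFS the excess replicas are removed by the DataNodes after the NameNode schedules their deletion, so raw usage drops over time rather than at the moment setrep returns.

```python
TB = 10**12


def raw_bytes(logical_bytes: int, replication: int) -> int:
    """Raw DataNode disk space consumed: every block is stored `replication` times."""
    return logical_bytes * replication


def expected_savings(logical_bytes: int, old_rf: int, new_rf: int) -> int:
    """Raw space that should be released once the excess replicas are deleted."""
    return raw_bytes(logical_bytes, old_rf) - raw_bytes(logical_bytes, new_rf)


logical = 10 * TB  # hypothetical: 10 TB of data under /path/to/dir
saved = expected_savings(logical, old_rf=3, new_rf=2)
print(f"raw usage {raw_bytes(logical, 3) // TB} TB -> {raw_bytes(logical, 2) // TB} TB, "
      f"about {saved // TB} TB freed")
```

Going from 3 to 2 replicas frees one third of the raw space those files used.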
0 votes, 1 answer, 129 views
HDFS: how to free one particular disk
I have a cluster with 3 servers. Two of them have 2 TB disks and the other one has a 500 GB SSD. I am trying to use the balancer, but I still get 70% usage on the 2 TB disks and 99% on the 500 GB disk due to non-DFS files. ...
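The HDFS Balancer documentation describes a cluster as balanced when each DataNode's utilization differs from the whole cluster's utilization by no more than a threshold (10% by default). The sketch below models that check. It assumes that "used space" means DFS-used bytes only, and it uses hypothetical numbers shaped like the question. It shows how a node can be 99% full on disk while its DFS utilization stays within the threshold, leaving the balancer nothing to move. The Node class and out_of_balance helper are illustrative, not Hadoop APIs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_tb: float
    dfs_used_tb: float
    non_dfs_used_tb: float

    def dfs_utilization(self) -> float:
        return 100 * self.dfs_used_tb / self.capacity_tb

    def disk_utilization(self) -> float:
        return 100 * (self.dfs_used_tb + self.non_dfs_used_tb) / self.capacity_tb


def out_of_balance(nodes: list[Node], threshold_pct: float = 10.0) -> list[str]:
    """Nodes whose DFS utilization differs from the cluster's by more than the threshold."""
    cluster = 100 * sum(n.dfs_used_tb for n in nodes) / sum(n.capacity_tb for n in nodes)
    return [n.name for n in nodes if abs(n.dfs_utilization() - cluster) > threshold_pct]


# Hypothetical numbers: two 2 TB nodes and one 500 GB SSD node with non-DFS files.
nodes = [
    Node("hdd-1", 2.0, 1.4, 0.0),
    Node("hdd-2", 2.0, 1.4, 0.0),
    Node("ssd-1", 0.5, 0.3, 0.195),
]
for n in nodes:
    print(f"{n.name}: DFS {n.dfs_utilization():.0f}%, disk {n.disk_utilization():.0f}%")
print("nodes the balancer would act on:", out_of_balance(nodes))
```

Under these assumptions the SSD node is at 60% DFS utilization against a cluster figure of about 69%, inside the 10% band, even though its disk is 99% full.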
0 votes, 1 answer, 258 views
Hadoop cluster capacity planning: disks per data node
We are planning to build a Hadoop cluster with 12 data node machines,
with a replication factor of 3
and a DataNode failed-disk tolerance of 1.
The data node machines include the disks for HDFS.
Since we ...
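The arithmetic behind this kind of plan can be sketched as follows. The disks per node, disk size, and free-space headroom are hypothetical values, not taken from the question. Usable logical capacity is raw capacity minus headroom, divided by the replication factor. The failed-disk tolerance (in Hadoop, the dfs.datanode.failed.volumes.tolerated setting) is modeled as the worst case where every DataNode is running with that many dead disks. ClusterPlan is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    data_nodes: int
    disks_per_node: int
    disk_tb: float
    replication: int = 3
    failed_disks_tolerated: int = 1  # per DataNode, as in the question
    headroom: float = 0.25           # hypothetical fraction kept free for temp data/growth

    def raw_tb(self) -> float:
        return self.data_nodes * self.disks_per_node * self.disk_tb

    def usable_tb(self, failed_disks_per_node: int = 0) -> float:
        """Logical (pre-replication) capacity, optionally with failed disks on every node."""
        if failed_disks_per_node > self.failed_disks_tolerated:
            # beyond the tolerance a DataNode stops serving, so this model does not apply
            raise ValueError("more failed disks than the DataNode tolerates")
        healthy_tb = self.data_nodes * (self.disks_per_node - failed_disks_per_node) * self.disk_tb
        return healthy_tb * (1 - self.headroom) / self.replication


plan = ClusterPlan(data_nodes=12, disks_per_node=12, disk_tb=8.0)  # hypothetical layout
print(f"raw: {plan.raw_tb():.0f} TB")
print(f"usable, all disks healthy: {plan.usable_tb():.0f} TB")
print(f"usable, 1 failed disk per node: {plan.usable_tb(failed_disks_per_node=1):.0f} TB")
```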
0 votes, 1 answer, 301 views
Optimal RAID configuration for EC2 instance store used for HDFS
I'm trying to determine whether there is any practical advantage to configuring a RAID array on the instance store of three d2.2xlarge instances being used for HDFS. Initially I planned to just mount each ...
1 vote, 1 answer, 1k views
List all files in an HDFS directory
Due to an error in one component, files accumulated in HDFS and the number is huge, i.e. 2,123,516. I want to list all the files and copy their names into one file, but when I run the following ...
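A listing this large is best streamed rather than held in memory. A minimal sketch, assuming the input comes from hdfs dfs -ls -R in its usual 8-column format (permissions, replication, owner, group, size, date, time, path). The helper names file_paths and write_paths are hypothetical.

```python
import io
from typing import Iterable, Iterator, TextIO


def file_paths(ls_lines: Iterable[str]) -> Iterator[str]:
    """Yield regular-file paths from `hdfs dfs -ls -R` style output, one line at a time."""
    for line in ls_lines:
        if not line.startswith("-"):  # skip directories ("d...") and "Found N items" headers
            continue
        parts = line.rstrip("\n").split(maxsplit=7)  # keep spaces inside the path column
        if len(parts) == 8:
            yield parts[7]


def write_paths(ls_lines: Iterable[str], out: TextIO) -> int:
    """Stream file paths to `out`, one per line; return how many were written."""
    count = 0
    for path in file_paths(ls_lines):
        out.write(path + "\n")
        count += 1
    return count


sample = """drwxr-xr-x   - hdfs hadoop          0 2023-01-01 10:00 /data/in
-rw-r--r--   3 hdfs hadoop       1234 2023-01-01 10:01 /data/in/part-00000
-rw-r--r--   3 hdfs hadoop         56 2023-01-01 10:02 /data/in/my file.txt
"""
buf = io.StringIO()
n = write_paths(io.StringIO(sample), buf)
print(n, buf.getvalue().splitlines())
```

In real use one would pass sys.stdin as ls_lines and pipe the listing in, for example hdfs dfs -ls -R /some/dir | python list_files.py > names.txt, where the directory and script name are placeholders.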
0 votes, 1 answer, 1k views
AWS FSx for Lustre with S3 vs EMR (with EMRFS) for Spark jobs
We are currently using EMR for easy job submission for our Spark jobs.
Recently I came across the "FSx for Lustre + S3" solution, which is advertised as ideal for HPC situations.
EMRFS, however, is also ...
0 votes, 1 answer, 310 views
Is it possible to mix different RHEL OS versions in a Hadoop cluster?
We are using the following HDP cluster with Ambari.
List of nodes and their RHEL versions:
3 master machines (with NameNode and ResourceManager), installed on RHEL 7.2
312 data-node machines ...
0 votes, 1 answer, 2k views
HDFS block deletion speed: cause, expectations, tuning?
I have a small (testing) HDFS cluster which I use as snapshot backup space for Flink. Flink creates and deletes roughly 1000 (small) files per second. The namenode seems to handle this without ...
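Deleting a file removes every replica of each of its blocks, so the work on the DataNodes scales with files per second × blocks per file × replication factor. A sketch, assuming a hypothetical 3-DataNode test cluster with replication 3 and one block per small file. The 800 deletions/s processing rate is a made-up figure that only illustrates how a backlog builds when deletions arrive faster than DataNodes remove them. Both function names are illustrative.

```python
def replica_deletions_per_node(files_per_s: float, blocks_per_file: float,
                               replication: int, data_nodes: int) -> float:
    """Block replicas each DataNode must delete per second, assuming an even spread."""
    return files_per_s * blocks_per_file * replication / data_nodes


def pending_after(seconds: float, arrival_per_s: float, service_per_s: float) -> float:
    """Deletions still queued after `seconds` if they arrive faster than they are processed."""
    return max(0.0, (arrival_per_s - service_per_s) * seconds)


# Hypothetical test cluster: 3 DataNodes, replication 3, one block per small file.
need = replica_deletions_per_node(files_per_s=1000, blocks_per_file=1, replication=3, data_nodes=3)
print(f"each DataNode must delete ~{need:.0f} replicas/s")
# If a DataNode (hypothetically) only manages 800 deletions/s, the queue grows steadily:
print(f"backlog after 1 h: ~{pending_after(3600, need, 800):,.0f} replicas")
```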