
Questions tagged [hdfs]

For questions regarding the Hadoop distributed file system (HDFS) which is part of the Apache Hadoop project.

2 votes
1 answer
206 views

From my understanding, using 20-30 TB disks with HDFS can present some challenges, but it can be managed effectively with proper configuration. It is possible, but it requires ...
King David
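One reason very large disks are a challenge: when a disk dies, all of its block replicas must be re-created elsewhere. A back-of-envelope sketch (all numbers here are assumed for illustration, not taken from the question):

```python
# Rough re-replication time for one failed large disk, assuming the work
# is spread across the cluster and capped by a per-node bandwidth limit.

def rereplication_hours(disk_tb, nodes, mb_per_s_per_node):
    """Hours to re-create replicas of disk_tb of lost block data."""
    total_mb = disk_tb * 1_000_000              # TB -> MB (decimal units)
    aggregate_mb_per_s = nodes * mb_per_s_per_node
    return total_mb / aggregate_mb_per_s / 3600

# e.g. a failed 30 TB disk, 100 nodes each allowed ~50 MB/s of re-replication
print(round(rereplication_hours(30, 100, 50), 2))   # ~1.67 hours
```

The same arithmetic shows why a small cluster with 30 TB disks can spend many hours under-replicated after a single disk failure.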
1 vote
0 answers
174 views

We have a Hadoop cluster with 487 data-node machines (each data-node machine also runs the NodeManager service). All machines are physical DELL servers, and the OS is RHEL 7.9. Each ...
King David
0 votes
1 answer
150 views

We have 254 physical servers, all DELL R740 machines, which are part of a Hadoop cluster. Most of them hold the HDFS filesystem and run the DataNode & NodeManager services; part of ...
King David
1 vote
0 answers
282 views

We have an HDP cluster, version 2.6.5. When we look at the NameNode logs we can see the following warning: 2023-02-20 15:56:37,731 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(...
King David
-2 votes
1 answer
226 views

A snapshot from the GFS research paper says (my interpretation after reading the research paper and its reviews) that "inter-rack bandwidth is lower than aggregated intra-rack bandwidth" (not sure what it means ...
gibmegucci
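The quoted claim usually refers to top-of-rack switch oversubscription: the machines inside one rack can collectively generate more traffic than the rack's uplink to the rest of the network can carry. A small illustration (all numbers are assumed, not from the paper):

```python
# "Inter-rack bandwidth is lower than aggregate intra-rack bandwidth":
# sum the NICs inside a rack and compare with the rack's uplink capacity.

nodes_per_rack = 40
nic_gbps = 10            # each node's NIC
uplink_gbps = 80         # rack switch uplink to the core network

aggregate_intra_rack = nodes_per_rack * nic_gbps   # traffic the rack can source
oversubscription = aggregate_intra_rack / uplink_gbps

print(aggregate_intra_rack, oversubscription)      # 400 Gbps, 5.0x oversubscribed
```

This is why GFS (and HDFS) rack-aware placement tries to keep most replica traffic within a rack.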
1 vote
0 answers
167 views

We are using an HDP cluster with 182 data-node machines (HDP version 2.6.4, Ambari version 2.6.1). We note the following behavior on the data-node machines (it happens on all data-node machines and on ...
King David
0 votes
1 answer
201 views

Do I need to delete all data from a datanode before recommissioning it, or does it not matter because the namenode will not pick up stale data from the datanode?
Guido Aulisi
0 votes
1 answer
534 views

I've changed the replication factor from 3 to 2 for some directories with the command hdfs dfs -setrep -R 2 /path/to/dir, but my HDFS free space is still the same. Should I do something else to free my disks?
John Brown
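For scale, the raw-space accounting when dropping replication from 3 to 2 can be sketched as below (the 12 TiB figure is an assumed example, not from the question). Note that the NameNode deletes the now over-replicated block copies asynchronously, so the savings may take a while to show up in hdfs dfsadmin -report:

```python
# Raw disk consumed = logical data size x replication factor, so going
# from 3 to 2 eventually frees one full logical copy's worth of raw space.

def raw_usage_bytes(logical_bytes, replication):
    """Raw cluster disk consumed for logically-sized data."""
    return logical_bytes * replication

logical = 12 * 1024**4                      # assume 12 TiB of logical data
before = raw_usage_bytes(logical, 3)
after = raw_usage_bytes(logical, 2)
print((before - after) // 1024**4)          # TiB eventually freed: 12
```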
0 votes
1 answer
129 views

I have a cluster with 3 servers. Two of them have 2 TB disks and the third has a 500 GB SSD. I am trying to use the balancer, but I still get 70% usage on the 2 TB disks and 99% on the 500 GB one due to non-DFS files. ...
John Brown
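A likely explanation for this kind of behavior: the HDFS balancer compares DFS-used against DFS-configured capacity per datanode, so non-DFS files on the same disk are invisible to it. A sketch of that calculation with numbers assumed to resemble the question (the usual lever here is dfs.datanode.du.reserved, which shrinks the DFS capacity the balancer sees):

```python
# The balancer moves blocks only when a node's DFS utilization deviates
# from the cluster average by more than the threshold; non-DFS usage
# does not enter this calculation at all.

def dfs_utilization(dfs_used_gb, dfs_capacity_gb):
    return dfs_used_gb / dfs_capacity_gb

nodes = {
    "big-1": dfs_utilization(1400, 2000),   # 2 TB disk, 70% DFS-used
    "big-2": dfs_utilization(1400, 2000),
    "small": dfs_utilization(300, 500),     # 500 GB disk, 60% DFS-used
}
avg = sum(nodes.values()) / len(nodes)
threshold = 0.10                            # balancer default threshold is 10%
for name, util in nodes.items():
    # even if the small node's disk is 99% full of non-DFS files,
    # its *DFS* utilization can still be near the average
    print(name, "over-utilized:", util > avg + threshold)
```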
0 votes
1 answer
258 views

We are planning to build a Hadoop cluster with 12 data-node machines, with a replication factor of 3 and a DataNode failed-disk tolerance of 1. The data-node machines include the disks for HDFS; since we ...
King David
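The usable-capacity arithmetic for such a design can be sketched as follows (disk counts, disk sizes, and the 25% headroom are assumed for illustration, not from the question):

```python
# Usable capacity = raw capacity / replication factor, minus headroom
# kept free so HDFS can re-replicate after disk or node failures.

def usable_tb(nodes, disks_per_node, tb_per_disk, replication, headroom=0.75):
    raw = nodes * disks_per_node * tb_per_disk
    return raw / replication * headroom     # keep ~25% free for cluster health

# e.g. 12 nodes x 8 disks x 4 TB, replication factor 3
print(usable_tb(12, 8, 4, 3))               # 96.0 TB usable
```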
0 votes
1 answer
301 views

I'm trying to determine if there is any practical advantage to configuring a RAID array on the instance store of three d2.2xlarge instances being used for HDFS. Initially I planned to just mount each ...
John R
1 vote
1 answer
1k views

Due to an error in one component, files accumulated in HDFS and the number is huge, i.e. 2123516. I want to list all the files and copy their names into one file, but when I run the following ...
innervoice
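With millions of entries, the usual approach is to stream the `hdfs dfs -ls -R` output and keep only the path column rather than buffering the whole listing. A minimal sketch, assuming the standard 8-field `-ls` line format (the script name and invocation are illustrative):

```python
# Extract only the path (8th whitespace-separated field) from each
# `hdfs dfs -ls` line, skipping headers like "Found N items".
import sys

def path_of(ls_line):
    parts = ls_line.rstrip("\n").split(None, 7)   # at most 8 fields; path is last
    return parts[7] if len(parts) == 8 else None  # None for non-entry lines

def stream_paths(lines, out=sys.stdout):
    for line in lines:
        p = path_of(line)
        if p:
            out.write(p + "\n")

# assumed real use:  hdfs dfs -ls -R /dir | python3 extract_paths.py > names.txt
# stream_paths(sys.stdin)
print(path_of("-rw-r--r--   3 hdfs hadoop 1234 2023-01-01 10:00 /data/part-00000"))
```

Splitting with a maximum of 7 splits keeps paths containing spaces intact.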
0 votes
1 answer
1k views

We are currently using EMR for easy job submission of our Spark jobs. Recently I came across the "FSx Lustre + S3" solution that is being advertised as ideal for HPC situations. EMRFS, however, is also ...
dimisjim
0 votes
1 answer
310 views

We are using the following HDP cluster with Ambari. List of nodes and their RHEL versions: 3 master machines (with NameNode & ResourceManager), installed on RHEL 7.2; 312 data-node machines ...
shalom
0 votes
1 answer
2k views

I have a small (testing) HDFS cluster which I use as snapshot backup space for Flink. Flink creates and deletes roughly 1000 (small) files per second. The namenode seems to handle this without ...
Caesar
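The eventual limit with this workload is usually NameNode heap rather than throughput, since every file and block lives in NameNode memory. A rough sketch using the commonly cited rule of thumb of ~150 bytes of heap per namespace object (the file count below is an assumed example):

```python
# NameNode heap pressure from many small files: each file costs at least
# one inode object plus one block object in NameNode memory.

BYTES_PER_OBJECT = 150  # rule-of-thumb estimate, not an exact figure

def heap_mb(num_files, blocks_per_file=1):
    objects = num_files * (1 + blocks_per_file)   # inode + block objects
    return objects * BYTES_PER_OBJECT / 1024**2

# e.g. ten million small single-block files retained as snapshots
print(round(heap_mb(10_000_000), 1))              # ~2861 MB of heap
```

At 1000 creates/deletes per second the namespace churns quickly, so the steady-state file count, not the rate, is what to watch.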
