Newest 'hdfs' Questions

2 votes

1 answer

206 views

HDFS + Using very large disks with HDFS

From my understanding, using 20-30 TB disks with HDFS can present some challenges, but they can be managed effectively with proper configuration. Using 20-30 TB disks with HDFS is possible, but it requires ...

King David

739

asked Aug 14, 2024 at 10:18

1 vote

0 answers

174 views

Hadoop + warnings as slow block-receive from data-node machines

We have a Hadoop cluster with 487 data-node machines (each data-node machine also runs the NodeManager service). All machines are physical machines (DELL), and the OS is RHEL 7.9. Each ...

King David

739

asked Mar 19, 2024 at 14:06

0 votes

1 answer

150 views

Can a VM replace a physical machine?

We have 254 physical servers, all of them DELL R740 servers. The servers are part of a Hadoop cluster. Most of them hold the HDFS filesystem and run the data node and node manager services; part of ...

King David

739

asked Jul 25, 2023 at 15:33

1 vote

0 answers

282 views

HDP cluster + journal nodes get out of Sync

We have an HDP cluster, version 2.6.5. When we look at the name-node logs, we can see the following warning: 2023-02-20 15:56:37,731 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(...

King David

739

asked Feb 23, 2023 at 12:00

-2 votes

1 answer

226 views

How does placing data in various racks help to exploit the fact that intra-rack aggregated bandwidth >= inter-rack bandwidth?

A snapshot of the GFS research paper says (my interpretation after reading the paper and its reviews) that "inter-rack bandwidth is lower than aggregated intra-rack bandwidth (not sure what it means ...

gibmegucci

1

asked Jul 28, 2022 at 4:50

1 vote

0 answers

167 views

HDFS + how to disable the "du -sk" verification on data node disks

We are using an HDP cluster with 182 data node machines: HDP version 2.6.4, Ambari version 2.6.1. We note the following behavior on the data node machines (it happens on all data-node machines and on ...

King David

739

asked Nov 28, 2021 at 15:49

0 votes

1 answer

201 views

Hadoop recommissioning datanode

Do I need to delete all data from a datanode before recommissioning it, or does it not matter, and the namenode will not pick up stale data from the datanode?

Guido Aulisi

1

asked Feb 19, 2021 at 9:36

0 votes

1 answer

534 views

Change HDFS replication factor

I've changed the replication factor from 3 to 2 for some directories with the command: hdfs dfs -setrep -R 2 /path/to/dir but my HDFS free space is still the same. Should I do something else to free up my disks?

John Brown

1

asked Sep 29, 2020 at 11:16

0 votes

1 answer

129 views

HDFS. How to free 1 particular disk

I have a cluster with 3 servers. 2 of them have 2 TB disks and the other one has a 500 GB SSD. I am trying to use the balancer, but I still get 70% usage on the 2 TB disks and 99% on the 500 GB SSD due to non-DFS files. ...

John Brown

1

asked Sep 16, 2020 at 8:44

0 votes

1 answer

258 views

Hadoop Cluster Capacity Planning of Data Nodes for disks per data node

We are planning to build a Hadoop cluster with 12 data node machines, where the replication factor is 3 and the DataNode failed disk tolerance is 1. The data node machines include the disks for HDFS since we ...

King David

739

asked Aug 2, 2020 at 20:51

0 votes

1 answer

301 views

Optimal RAID configuration for EC2 instance store used for HDFS

I'm trying to determine if there is any practical advantage to configuring a RAID array on the instance store of 3x d2.2xlarge instances being used for HDFS. Initially I planned to just mount each ...

John R

423

asked Jun 25, 2020 at 22:49

1 vote

1 answer

1k views

List all files in hdfs directory

Due to an error in one component, files in HDFS have accumulated and the number is huge, i.e. 2123516. I want to list all the files and copy their names into one file, but when I run the following ...

innervoice

21

asked Jan 21, 2020 at 5:59

0 votes

1 answer

1k views

AWS FSx for lustre with S3 vs EMR (with EMRFS) for spark jobs

We are currently using EMR for easy job submission for our spark jobs. Recently I came across the "FSx lustre + S3" solution that is being advertised as ideal for HPC situations. EMRFS however is also ...

dimisjim

265

asked Jan 12, 2020 at 1:21

0 votes

1 answer

310 views

Is it possible to mix different RHEL OS versions in a Hadoop cluster?

We are using the following HDP cluster with Ambari. List of nodes and their RHEL versions: 3 master machines (with namenode & resource manager), installed on RHEL 7.2; 312 DATA-NODES machines ...

shalom

531

asked Nov 20, 2019 at 19:48

0 votes

1 answer

2k views

HDFS block deletion speed - cause, expectations, tuning?

I have a small (testing) HDFS cluster which I use as snapshot backup space for Flink. Flink creates and deletes roughly 1000 (small) files per second. The namenode seems to handle this without ...

Caesar

111

asked Nov 7, 2019 at 7:01
