Questions tagged [hpc]
High-performance computing (HPC) covers the use of "supercomputers" with large numbers of CPUs, large parallel storage systems, and advanced networks to perform time-consuming calculations. Parallel algorithms and parallel storage are essential to this field, as is dealing with complex, fast networking fabrics such as InfiniBand.
118 questions
0
votes
0
answers
91
views
How to handle stuck jobs in LSF?
I am using the LSF job manager on an HPC cluster. Occasionally the Ansys jobs become "stuck" during execution. Once the jobs are stuck, the Ansys log and result files stop getting updated. The ...
0
votes
0
answers
67
views
MPI fails on send over InfiniBand unless run as root
In a cluster where the nodes are interconnected over Intel True Scale InfiniBand, an Open MPI job executed via Slurm fails on send unless, for testing purposes, I run it as root:
Traceback (most ...
0
votes
0
answers
75
views
I am setting up some new compute nodes, but they perform worse than the older nodes
So as the question says, I am in the process of migrating to new compute nodes. The new servers are HPE ProLiant DL360 Gen10 and the operating system installed is Ubuntu. These are the specifications ...
0
votes
1
answer
120
views
Why are the core IDs interleaved across ranges?
I have a NUMA system with two sockets. I'm curious why the core IDs in NUMA0 are 0-15 and 32-47 instead of 0-31.
Additional information: Hyper-threading is disabled in the BIOS;
Some related boot args: ...
1
vote
1
answer
963
views
OpenLDAP works for v2.4 but v2.6 returns: Invalid syntax (21) additional info: objectClass: value #0 invalid per syntax
I have been facing this problem for a few days now, and I would like to preface this by stating that this is the first time I have ever used LDAP. So after debugging a bit around the error I have ...
3
votes
2
answers
2k
views
Install software on Ubuntu using apt without root?
I’ve got accounts on many HPC clusters. The machines have a minimal install, and the admins won’t add much else. I need to install lots of typical software. Normally I’d do this with apt, but of ...
0
votes
1
answer
192
views
Why can the login node connect to external networks while the allocated compute node fails in Slurm-GCP?
I've noticed that connecting to the internet from the allocated compute node via Slurm-GCP keeps failing. For example, using wget from the login node works successfully:
[me@gcp-login0 ~]$ wget https:/...
0
votes
1
answer
342
views
Linux missing LVM
Hello, I have an Ubuntu HPC cluster and I have a problem with storage.
Whenever I try to access the storage from my compute nodes, I can't; I keep getting this error:
mount:mounting 192.168.100.211:/cm/...
-1
votes
1
answer
900
views
Speeding up SAS data rate from 12 Gbps
I'm curious about SAS data transfer speed.
As far as I understand, the maximum is 12 Gbps for the whole bus (not per drive), but I have a scenario where I would like to have a faster data rate (hopefully ...
0
votes
0
answers
477
views
Unable to SSH into 2 compute nodes on HPE cluster
I recently added two new compute nodes to an HPE cluster, but surprisingly, I am unable to SSH into the new compute nodes from the head node.
(base) [root@hn001 ~...
1
vote
1
answer
358
views
HPC master node without InfiniBand network: influence on compute nodes - Slurm management
I'm facing an issue that I cannot solve while trying to configure a cluster in which the master node (or front-end node) is a virtual machine managing nodes on an InfiniBand network.
I use ...
0
votes
1
answer
57
views
Single SSH login for multiple machines?
I have a number of physical (desktop) machines running at the office as part of a new network to handle processing & serving Open Source data; some of these machines also house VMs.
At the moment, ...
2
votes
0
answers
492
views
InfiniBand fabric with 3 nodes - newbie
I am trying to connect 3 HP z840 workstations using:
Mellanox ConnectX-3 VPI 40 / 56GbE Dual-Port QSFP Adapter MCX354A-FCBT
Mellanox SX6005 12-port Non-blocking Unmanaged 56Gb/s
Description of ...
2
votes
1
answer
1k
views
How can I set up an interactive-job-only or batch-job-only partition on a Slurm cluster?
I'm managing a PBS/TORQUE HPC cluster, and now I'm setting up another cluster with Slurm. On the PBS cluster, I can set a queue to accept only interactive jobs by qmgr -c "set queue interactive_q ...
0
votes
0
answers
1k
views
Setting up Slurm on a cluster
My IT admin has set up a cluster with 3 nodes, which is administered via Windows Server. VMs are hosted via Hyper-V, including an Ubuntu VM to which a substantial portion of the cluster's resources ...