I’m troubleshooting an NFS issue on SUSE Linux Enterprise Server 12 SP3.
Setup
Server: SUSE Linux 12 SP3, exporting via NFSv3 and NFSv4.
Clients: AWS EKS worker nodes. The 5 old nodes work; the 2 new nodes do not.
Firewall: verified open (2049, 111, etc.).
Protocol: NFSv4 — old clients mount and work normally, new client fails.
Symptoms
On the new client, mount -t nfs -o vers=4.1 ... hangs. tcpdump on both the server and the client shows the NFS request going out but never completing. netstat on the client shows the TCP connection as established, but it stays stuck in use. Disk I/O is fine (vmstat shows no bottleneck). I also tried raising the NFS server thread count to 64, but it still does not work.
Diagnostics
nfsstat -s (server, excerpt):
Server rpc stats:
calls badcalls badclnt badauth xdrcall
10365066 2 0 0 0
Server nfs v3:
null 19 0%
getattr 26948 25%
setattr 239441 1%
lookup 413204 9%
access 413204 16%
readlink 0 0%
read 128517 5%
write 233267 9%
create 89095 3%
mkdir 566 0%
symlink 0 0%
mknod 0 0%
remove 17886 0%
rmdir 3634 0%
rename 3634 0%
link 0 0%
readdir 4622 0%
readdirplus 1389 0%
fsstat 694822 27%
fsinfo 0 0%
pathconf 6 0%
commit 27055 1%
Server nfs v4:
null 17 0%
compound 7820412 99%
Server nfs v4 operations:
op0-unused 0 0%
op1-unused 0 0%
op2-future 0 0%
access 70166 0%
close 32186 0%
commit 21028 0%
create 46 0%
delego_purge 0 0%
delegreturn 22263 0%
getattr 3386908 14%
getfh 25887 0%
link 0 0%
lock 0 0%
lockt 0 0%
locku 6437 0%
lookup 0 0%
lookup_root 0 0%
nverify 0 0%
open 32396 0%
openattr 0 0%
open_conf 0 0%
open_dgrd 116 0%
putfh 7374611 32%
putpubfh 0 0%
putrootfh 0 0%
read 43231 0%
readdir 524 0%
readlink 0 0%
remove 3917678 17%
rename 164 0%
renew 0 0%
restorefh 0 0%
savefh 0 0%
secinfo 0 0%
secinfo_no 0 0%
setattr 3917678 17%
setcltid 0 0%
setcltidconf 0 0%
verify 0 0%
write 116222 0%
rellockowner 2092 0%
bc_ctl 0 0%
bind_conn 0 0%
exchange_id 0 0%
create_ses 419831 0%
destroy_ses 0 0%
free_stateid 0 0%
getdirdeleg 0 0%
getdevinfo 0 0%
getdevlist 0 0%
layoutcommit 0 0%
layoutget 0 0%
layoutreturn 0 0%
secinfo_non 0 0%
sequence 7400510 32%
set_ssv 0 0%
test_stateid 0 0%
want_deleg 0 0%
destroy_clid 18 0%
reclaim_comp 15 0%
NFSv4 stats look normal (compound, getattr, setattr, sequence, etc.).
Only 2 badcalls out of ~10M requests.
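To put a number on "only 2 badcalls", here is a minimal Python sketch that computes the bad-call fraction from the two "Server rpc stats" lines above. The function name rpc_badcall_ratio is my own and not part of any NFS tool; it only assumes the header and value lines have the same number of whitespace-separated fields.

```python
def rpc_badcall_ratio(header: str, values: str) -> float:
    """Return badcalls / calls from the two 'Server rpc stats' lines."""
    # Pair each column name with its integer counter.
    fields = dict(zip(header.split(), (int(v) for v in values.split())))
    return fields["badcalls"] / fields["calls"]


# The two lines are copied from the nfsstat -s excerpt above.
ratio = rpc_badcall_ratio(
    "calls badcalls badclnt badauth xdrcall",
    "10365066 2 0 0 0",
)
print(f"bad-call fraction: {ratio:.2e}")
```

A fraction this small suggests the RPC layer itself is not rejecting requests in bulk, which fits the observation that the old clients work.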
vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 536 84668 104136 684344 0 0 0 8 36 70 0 0 99 0 0
0 0 536 84668 104136 684344 0 0 0 0 86 103 0 0 100 0 0
0 0 536 84668 104136 684344 0 0 0 0 36 31 0 0 100 0 0
0 0 536 84668 104136 684344 0 0 0 0 39 30 0 0 100 0 0
0 0 536 84668 104136 684344 0 0 0 0 72 70 0 0 100 0 0
top
top - 19:32:26 up 5 days, 10 min, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 179 total, 1 running, 178 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.3 st
KiB Mem : 1014468 total, 943204 used, 71264 free, 104544 buffers
KiB Swap: 8387580 total, 604 used, 8386976 free, 695416 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 37724 5812 3908 S 0.000 0.573 4:29.80 systemd
2 root 20 0 0 0 0 S 0.000 0.000 0:00.01 kthreadd
3 root 20 0 0 0 0 S 0.000 0.000 0:13.14 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.000 0.000 0:50.74 rcu_sched
8 root 20 0 0 0 0 S 0.000 0.000 0:00.00 rcu_bh
9 root rt 0 0 0 0 S 0.000 0.000 0:00.00 migration/0
10 root rt 0 0 0 0 S 0.000 0.000 0:02.27 watchdog/0
11 root 20 0 0 0 0 S 0.000 0.000 0:00.00 kdevtmpfs
12 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 netns
13 root 20 0 0 0 0 S 0.000 0.000 0:00.00 perf
14 root 20 0 0 0 0 S 0.000 0.000 0:00.00 xenwatch
15 root 20 0 0 0 0 S 0.000 0.000 0:00.00 xenbus
16 root 20 0 0 0 0 S 0.000 0.000 0:00.00 khungtaskd
17 root 20 0 0 0 0 S 0.000 0.000 0:00.00 writeback
18 root 25 5 0 0 0 S 0.000 0.000 0:00.00 kcompactd0
19 root 20 0 0 0 0 S 0.000 0.000 0:00.00 ksmd
20 root 20 0 0 0 0 S 0.000 0.000 0:01.53 khugepaged
21 root 20 0 0 0 0 S 0.000 0.000 0:00.00 crypto
22 root 20 0 0 0 0 S 0.000 0.000 0:00.00 kintegrityd
23 root 20 0 0 0 0 S 0.000 0.000 0:00.00 bioset
24 root 20 0 0 0 0 S 0.000 0.000 0:00.00 vblockd
25 root 20 0 0 0 0 S 0.000 0.000 0:00.00 devfreq_wq
26 root 20 0 0 0 0 S 0.000 0.000 0:00.00 kswapd0
27 root 20 0 0 0 0 S 0.000 0.000 0:00.00 vmstat
28 root 20 0 0 0 0 S 0.000 0.000 0:00.00 fsnotify_mark
29 root 20 0 0 0 0 S 0.000 0.000 0:00.00 ecryptfs-kthrea
30 root 20 0 0 0 0 S 0.000 0.000 0:00.00 kthrotld
tcpdump
19:11:02.021752 IP 10.24.129.188.726 > 10.24.17.151.2049: Flags [P.], seq 2240:2520, ack 385, win 491, options [nop,nop,TS val 2653014297 ecr 2402970920], length 280: NFS request xid 136786115 276 getattr fh 0,1/43
19:11:02.021920 IP 10.24.17.151.2049 > 10.24.129.188.726: Flags [P.], seq 385:433, ack 2520, win 1140, options [nop,nop,TS val 2402971176 ecr 2653014297], length 48: NFS reply xid 136786115 reply ok 44 getattr ERROR: Request couldn't be completed in time
19:11:02.022207 IP 10.24.129.188.726 > 10.24.17.151.2049: Flags [.], ack 433, win 491, options [nop,nop,TS val 2653014297 ecr 2402971176], length 0
19:11:03.045780 IP 10.24.129.188.726 > 10.24.17.151.2049: Flags [P.], seq 2520:2800, ack 433, win 491, options [nop,nop,TS val 2653015321 ecr 2402971176], length 280: NFS request xid 153563331 276 getattr fh 0,1/43
19:11:03.046012 IP 10.24.17.151.2049 > 10.24.129.188.726: Flags [P.], seq 433:481, ack 2800, win 1148, options [nop,nop,TS val 2402971432 ecr 2653015321], length 48: NFS reply xid 153563331 reply ok 44 getattr ERROR: Request couldn't be completed in time
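To check whether each request actually gets a reply, the capture can be paired up by xid. The following is a minimal Python sketch, assuming tcpdump's text output keeps the "NFS request xid N" / "NFS reply xid N" wording shown above. The names LINE_RE and pair_by_xid are my own, and the sample lines are the four request/reply lines from this capture.

```python
import re
from datetime import datetime

# Matches the timestamp, direction (request/reply), xid and trailing text.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP .*?"
    r"NFS (?P<kind>request|reply) xid (?P<xid>\d+) (?P<rest>.*)$"
)


def pair_by_xid(lines):
    """Return ([(xid, latency_seconds, error_or_None)], [unanswered xids])."""
    pending = {}   # xid -> timestamp of the request still waiting for a reply
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # pure ACKs and non-NFS lines are skipped
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        xid = m["xid"]
        if m["kind"] == "request":
            pending[xid] = ts
        elif xid in pending:
            latency = (ts - pending.pop(xid)).total_seconds()
            _, _, error = m["rest"].partition("ERROR: ")
            results.append((xid, latency, error or None))
    return results, sorted(pending)


capture = [
    "19:11:02.021752 IP 10.24.129.188.726 > 10.24.17.151.2049: Flags [P.], "
    "seq 2240:2520, ack 385, win 491, options [nop,nop,TS val 2653014297 "
    "ecr 2402970920], length 280: NFS request xid 136786115 276 getattr fh 0,1/43",
    "19:11:02.021920 IP 10.24.17.151.2049 > 10.24.129.188.726: Flags [P.], "
    "seq 385:433, ack 2520, win 1140, options [nop,nop,TS val 2402971176 "
    "ecr 2653014297], length 48: NFS reply xid 136786115 reply ok 44 getattr "
    "ERROR: Request couldn't be completed in time",
    "19:11:03.045780 IP 10.24.129.188.726 > 10.24.17.151.2049: Flags [P.], "
    "seq 2520:2800, ack 433, win 491, options [nop,nop,TS val 2653015321 "
    "ecr 2402971176], length 280: NFS request xid 153563331 276 getattr fh 0,1/43",
    "19:11:03.046012 IP 10.24.17.151.2049 > 10.24.129.188.726: Flags [P.], "
    "seq 433:481, ack 2800, win 1148, options [nop,nop,TS val 2402971432 "
    "ecr 2653015321], length 48: NFS reply xid 153563331 reply ok 44 getattr "
    "ERROR: Request couldn't be completed in time",
]

results, unanswered = pair_by_xid(capture)
for xid, latency, error in results:
    print(xid, f"{latency * 1000:.3f} ms", error)
print("unanswered xids:", unanswered)
```

On this excerpt both requests are answered in well under a millisecond, and both replies carry the same error text, so the server is responding to these requests rather than ignoring them. The client then retries about one second later.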
Question
To summarize, with the server on SUSE Linux Enterprise Server 12 SP3:
- Server-side counters are clean (nfsstat -s shows ~10M calls, only 2 badcalls).
- Older clients can mount fine using NFSv4.
- The new client fails: tcpdump shows its getattr requests going out, but every reply is "ERROR: Request couldn't be completed in time" and none completes successfully.
Where should I focus next to troubleshoot and fix this?
- Could this be a client-side protocol/config mismatch (mount options, idmapd, domain)?
- Or is there a way to confirm if the server is silently dropping these requests?
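For the mount-option part of the question, one way to compare a working client with a failing one is to diff the option strings each reports in /proc/mounts. The following is a minimal Python sketch. The two sample lines are invented for illustration (the export path /export, the mount point /mnt/data and all option values are hypothetical, not taken from my clients), and parse_mount_options and diff_options are my own helper names.

```python
def parse_mount_options(mounts_line: str) -> dict:
    """Parse the options field (4th column) of one /proc/mounts line."""
    fields = mounts_line.split()
    options = {}
    for item in fields[3].split(","):
        # "vers=4.1" -> ("vers", "4.1"); a bare flag like "rw" -> ("rw", "")
        key, _, value = item.partition("=")
        options[key] = value
    return options


def diff_options(old: dict, new: dict) -> dict:
    """Return {option: (old_value, new_value)} for options that differ."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


# Hypothetical sample lines; replace with real lines from each client.
old_line = "10.24.17.151:/export /mnt/data nfs4 rw,relatime,vers=4.0,proto=tcp,sec=sys 0 0"
new_line = "10.24.17.151:/export /mnt/data nfs4 rw,relatime,vers=4.1,proto=tcp,sec=sys 0 0"

diff = diff_options(parse_mount_options(old_line), parse_mount_options(new_line))
print(diff)
```

Since the mount hangs on the new client, there may be no /proc/mounts entry to read there; in that case the options passed on the mount command line would have to stand in for the second line.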