I'm working on an NFS solution for RHEL6.5 clients (all VMs) with RHEL6.5 and RHEL7 hosts. Currently, the RHEL7 host with RHEL6.5 clients works fine. The trouble is with the RHEL6.5 host.

These problems might be down to aspects of the server I can't control, as the server has been having issues lately that it didn't last year. If you think that's the issue, please suggest ways I can prove this to my superiors, and begin the process of getting a new machine.

The solution was initially built around NFSv4, which was going well. The RHEL6.5 host, however, is not as cooperative as the RHEL7 host. Mounts succeed, but file access (e.g. cp, less) hangs in the terminal. Tailing the client's /var/log/messages shows: state manager: lease expired failed on NFSv4 server nfs_master with error 10018. Per the standard, that error code is NFS4ERR_RESOURCE, documented here. I tried to resolve the resource issue by increasing the number of nfsd processes, both from the command line and via the corresponding setting in /etc/sysconfig/nfs. It didn't help. The issue also occurs when the exported directory is mounted on the NFS server itself.

A second error, 10022, appears in neither the host's nor the client's logs; I assume it is also an NFSv4 error code. It is only visible when running tcpdump on the interface carrying the NFS traffic: IP test-host.nfs > test_client-1.3297002672: reply ok 52 getattr ERROR: unk 10022 If this is indeed an NFSv4 error code, it is NFS4ERR_STALE_CLIENTID, documented here.
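For reference, a minimal sketch of translating the numeric codes that tcpdump prints as "unk" into their names. The function names nfs4_errname and nfs4_err_from_tcpdump are made up for illustration, and the table holds only a handful of status codes from RFC 7530, not the full list:

```bash
# Map a numeric NFSv4 status code to its RFC 7530 name.
# Hypothetical helper; only a few codes are listed, extend as needed.
nfs4_errname() {
    case "$1" in
        10008) echo NFS4ERR_DELAY ;;
        10011) echo NFS4ERR_EXPIRED ;;
        10013) echo NFS4ERR_GRACE ;;
        10018) echo NFS4ERR_RESOURCE ;;
        10022) echo NFS4ERR_STALE_CLIENTID ;;
        *)     echo "unknown ($1)" ;;
    esac
}

# Extract the code from a tcpdump line of the form "... ERROR: unk 10022"
# and print its name. Prints nothing if the line carries no such code.
nfs4_err_from_tcpdump() {
    code=$(printf '%s\n' "$1" | sed -n 's/.*ERROR: unk \([0-9][0-9]*\).*/\1/p')
    [ -n "$code" ] && nfs4_errname "$code"
}

nfs4_err_from_tcpdump 'IP test-host.nfs > test_client-1.3297002672: reply ok 52 getattr ERROR: unk 10022'
```

Run against the tcpdump line quoted above, this prints NFS4ERR_STALE_CLIENTID, and nfs4_errname 10018 gives NFS4ERR_RESOURCE, matching the two codes identified by hand.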

When the mount command is changed to set nfsvers=3, actions like cp succeed and generate no errors on either the client or the host. The first attempt takes a little longer, perhaps 5 seconds; subsequent actions are much faster.

At any one time, at most four clients will mount the export and read from it, potentially the same file.

So, my questions are:

  1. What are the server-side resources being referred to by the NFS4ERR_RESOURCE description?
  2. How do I resolve NFS4ERR_RESOURCE and NFS4ERR_STALE_CLIENTID errors?
  3. Why is NFSv3 functioning as expected, but not NFSv4?

nfs-utils version and release (for both clients and RHEL6.5 host): 1.2.3.39.el6

mount commands:

  • mount -n -t nfs -o ro,noexec,timeo=10,retrans=3,retry=0,soft,rsize=32768,intr,noatime
  • mount -n -t nfs -o nfsvers=3,ro,noexec,timeo=10,retrans=3,retry=0,soft,rsize=32768,intr,noatime

EDIT: Our resolution for this issue was to fall back to the NFSv3 protocol. Everything works just fine. I won't answer this question with a "just fall back to NFSv3", but this issue is probably too niche to ever see an answer.

Ungeheuer
  • You might want to try to force it to use TCP only. Or UDP only. I forget off-hand which, but that helped me in the past when I had problems with C6 <=> C7. – Aaron D. Marasco Jul 08 '19 at 23:20
  • @AaronD.Marasco NFSv4 doesn't work over UDP. The standard doesn't mention UDP support here: https://tools.ietf.org/html/rfc7530#section-3.1. Looks like I'll have to move to NFSv3, which shouldn't be a problem. Just a few more `iptables`/`firewalld` changes. – Ungeheuer Jul 09 '19 at 18:44
  • What kernel versions? – bishop Jul 27 '19 at 03:09
  • @bishop 2.6.32-431.el6.x86_64 – Ungeheuer Jul 27 '19 at 04:35

2 Answers

Since you said you have no control over the offending RHEL 6.5 host: have you checked that the NFS domains match? NFSv3 has no such concept, but NFSv4 needs it (otherwise mounts will still work, but file access will behave strangely, just as you described).

The NFS domain is configured in /etc/idmapd.conf; it might be worth taking a look.
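A rough sketch of how the comparison could be done, assuming the domain is set with a plain "Domain = ..." line in the file. The helper name idmap_domain is made up for illustration. An empty result means no explicit Domain line was found, in which case (as far as I know) the daemon derives a default from the host's DNS domain name:

```bash
# Print the value of the first uncommented "Domain = ..." line in an
# idmapd.conf-style file (defaults to /etc/idmapd.conf).
# Hypothetical helper, not part of nfs-utils.
idmap_domain() {
    awk -F= '
        /^[ \t]*Domain[ \t]*=/ {
            gsub(/[ \t]/, "", $2)   # strip whitespace around the value
            print $2
            exit
        }' "${1:-/etc/idmapd.conf}"
}

# Usage: run on the NFS host and on each client, then compare the output:
#   idmap_domain
```

If the host and the clients print different values, that mismatch would be the first thing to fix before retrying the NFSv4 mount.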

bakunin

Try -fstype=nfs4,rw,intr,hard,proto=tcp,port=2049,acl as a test and make sure 2049/tcp is open to the client on the server. If there's a firewall in the way it needs to pass 2049/tcp as well.