[LTP] LTP nfslock01 test failing on NFS v3 (lockd: cannot monitor 10.0.0.2)

Nikita Yushchenko nikita.yushchenko@virtuozzo.com
Tue Jan 18 16:51:27 CET 2022


18.01.2022 18:26, Petr Vorel wrote:
> Hi all,
> 
> this is a test failure posted by Nikita Yushchenko [1]. LTP NFS test nfslock01
> looks to be failing on NFS v3:
> 
> "not unsharing /var makes AF_UNIX socket for host's rpcbind to become available
> inside ltpns. Then, at nfs3 mount time, kernel creates an instance of lockd for
> ltpns, and ports for that instance leak to host's rpcbind and overwrite ports
> for lockd already active for root namespace. This breaks nfs3 file locking."

What exactly happens is:

The test runs 'mount' in a non-root netns, trying to mount a directory from the root netns of the same host via NFSv3.

(Part of) the call chain inside the kernel:

nfs_try_get_tree()
  nfs3_create_server()
   nfs_create_server()
    nfs_init_server()
     nfs_start_lockd()
      nlmclnt_init()
       lockd_up()
        svc_bind()
         svc_rpcb_setup()
          rpcb_create_local()

... and at this point it tries an AF_UNIX connection to /var/run/rpcbind.sock.

AF_UNIX is not netns-aware.
So it connects to host's rpcbind.
This overwrites the ports that the lockd instance for the root namespace registered in the host's
rpcbind. From this point on, the lockd instance for the root namespace is no longer reachable (it still
listens, but nobody can learn its ports). Thus NFS locks don't work.
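
The AF_UNIX point can be checked from userspace. Below is a minimal sketch in Python, not kernel code: it binds a pathname socket, forks a child that tries to move into a fresh network namespace, and then connects by path. The helper names (try_new_netns, connect_from_child, demo) are made up for illustration. The unshare step needs root or unprivileged user namespaces; if it is refused, the child simply connects from the same netns.

```python
import ctypes
import os
import socket
import tempfile

# Flag values from <linux/sched.h>.
CLONE_NEWUSER = 0x10000000
CLONE_NEWNET = 0x40000000


def try_new_netns():
    """Try to move the calling process into a new network namespace.

    Returns True on success, False if unshare() is unavailable or refused.
    """
    try:
        unshare = ctypes.CDLL(None, use_errno=True).unshare
    except (OSError, AttributeError):
        return False
    # Plain CLONE_NEWNET needs CAP_SYS_ADMIN; adding CLONE_NEWUSER may work
    # for unprivileged users if the system allows user namespaces.
    for flags in (CLONE_NEWNET, CLONE_NEWUSER | CLONE_NEWNET):
        if unshare(flags) == 0:
            return True
    return False


def connect_from_child(path):
    """Fork a child that optionally unshares its netns, then connects to path.

    Returns (unshared, connected) as reported by the child.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            unshared = try_new_netns()
            connected = False
            try:
                with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as c:
                    c.connect(path)
                    connected = True
            except OSError:
                pass
            os.write(w, bytes([int(unshared), int(connected)]))
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 2)
    os.close(r)
    os.waitpid(pid, 0)
    if len(data) < 2:
        return False, False
    return bool(data[0]), bool(data[1])


def demo():
    """Bind a pathname AF_UNIX socket and let a child connect to it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
            srv.bind(path)
            srv.listen(1)
            return connect_from_child(path)


if __name__ == "__main__":
    unshared, connected = demo()
    print("child in new netns:", unshared, "connected:", connected)
```

If the first value is True and the second is also True, the child reached the socket from a different netns. That is the same effect as above: a socket bound to a filesystem path is found through the filesystem, so it stays reachable as long as the path is visible.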

I'm not sure what the correct behavior is here.

Maybe rpcb_create_local() should detect that it is not in the root netns and, in that case, only try
an AF_INET connection to localhost.
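
As a rough userspace model of that idea (a sketch only, not the kernel implementation), the selection logic could look like the Python below. The function name connect_local_rpcbind and the in_root_netns flag are hypothetical; how the kernel would actually detect the root netns is not shown here. The socket path is the one mentioned above, and 111 is the well-known rpcbind port.

```python
import socket

RPCBIND_SOCK = "/var/run/rpcbind.sock"  # path mentioned earlier in this message
RPCBIND_PORT = 111                       # well-known rpcbind port


def connect_local_rpcbind(in_root_netns, unix_path=RPCBIND_SOCK,
                          inet_addr=("127.0.0.1", RPCBIND_PORT)):
    """Model of the proposed policy (hypothetical helper, not kernel code).

    In the root netns, try AF_UNIX first and fall back to AF_INET loopback.
    Outside the root netns, skip AF_UNIX so that only the namespace-local
    loopback address is tried.
    Returns (transport_name, connected_socket); raises OSError on failure.
    """
    if in_root_netns:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(unix_path)
            return "unix", s
        except OSError:
            s.close()
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect(inet_addr)
        return "inet", s
    except OSError:
        s.close()
        raise
```

With this policy, a lockd instance started for a non-root netns would only ever register with an rpcbind listening on that namespace's own loopback, and would fail to register if none is running there.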

Maybe it should not try AF_UNIX at all. Are there any realistic cases where rpcbind is accessible only
via AF_UNIX?

Nikita

