[LTP] [PATCH 1/1] nfsstat01: Update client RPC calls for kernel 6.9

Jeff Layton jlayton@kernel.org
Fri Jul 12 16:07:35 CEST 2024


On Fri, 2024-07-12 at 15:45 +0200, Thorsten Leemhuis wrote:
> On 08.07.24 19:49, Chuck Lever III wrote:
> > > On Jul 8, 2024, at 6:36 AM, Greg KH <greg@kroah.com> wrote:
> > > On Sat, Jul 06, 2024 at 07:46:19AM +0000, Sherry Yang wrote:
> > > > > On Jul 6, 2024, at 12:11 AM, Greg KH <greg@kroah.com> wrote:
> > > > > On Fri, Jul 05, 2024 at 02:19:18PM +0000, Chuck Lever III
> > > > > wrote:
> > > > > > > On Jul 2, 2024, at 6:55 PM, Calum Mackay
> > > > > > > <calum.mackay@oracle.com> wrote:
> > > > > > > On 02/07/2024 5:54 pm, Calum Mackay wrote:
> > > > > > > > I noticed your LTP patch [1][2] which adjusts the
> > > > > > > > nfsstat01 test on v6.9 kernels, to account for Josef's
> > > > > > > > changes [3], which restrict the NFS/RPC stats per-
> > > > > > > > namespace.
> > > > > > > > I see that Josef's changes were backported, as far back
> > > > > > > > as longterm v5.4,
> > [...]
> > > > > > > I'm wondering if this difference between NFS client, and
> > > > > > > NFS server, stat behaviour, across kernel versions, may
> > > > > > > perhaps cause some user confusion?
> > > > > > 
> > > > > > As a refresher for the stable folken, Josef's changes make
> > > > > > nfsstats silo'd, so they no longer show counts from the
> > > > > > whole
> > > > > > system, but only for NFS operations relating to the local
> > > > > > net
> > > > > > namespace. That is a surprising change for some users,
> > > > > > tools,
> > > > > > and testing.
> > > > > > 
> > > > > > I'm not clear on whether there are any rules/guidelines
> > > > > > around
> > > > > > LTS backports causing behavior changes that user tools,
> > > > > > like
> > > > > > nfsstat, might be impacted by.
> > > > > 
> > > > > The same rules that apply for Linus's tree (i.e. no userspace
> > > > > regressions.)
> > > > [...]
> > > > If no userspace regression, should we revert the Josef’s NFS
> > > > client-side changes on LTS?
> > > 
> > > This sounds like a regression in Linus's tree too, so why isn't
> > > it
> > > reverted there first?
> > 
> > There is a change in behavior in the upstream code, but Josef's
> > patches fix an information leak and make the statistics more
> > sensible in container environments. I'm not certain that
> > should be considered a regression, but confess I don't know
> > the regression rules to this fine a degree of detail.
> 
> Chuck pointed me to this thread (I had an eye on it already anyway)
> and
> asked for advice. Take everything I write here with a grain of salt,
> as
> this is somewhat tricky situation which makes it hard to predict how
> Linus would actually want to see this handled. Maybe I should have
> CCed
> him, but I doubt he cares right now; but we maybe should bring him
> in,
> if an actual user complains.
> 
> With that out of the way, let me write a few thoughts:
> 
> * That some test breaks is not a regression, as regressions are about
> "practical issues", not some ABI/API changes that only some tests
> care
> about. So if it's just a test that broke update it.
> 
> * If a user would reported something like "this change broke my app"
> it
> obviously would be something totally different. But that did not
> happen
> yet afaics -- or did it? But from the discussion it sounds like that
> is
> something that will likely happen down the road. If that's the case
> I'd
> say it's best to prevent that from happening.
> 

I doubt anyone outside of automated testcases will ever notice this,
and if they do, then they probably want the new behavior. This was an
oversight when the nfs client and server were first containerized.
These stats should have been made per-net then, but never were.

Basically, the old "/proc/net/rpc/nfs{d}" files presented aggregate
stats for all of the nfsd's on the machine _and_ they presented the
same information no matter the net namespace you're in when reading
them.

Josef's patches changed it so that we collect this information on a
per-net namespace basis, and we only present the totals of the net
namespace reading the procfile.

There are 2 possibilities of breakage:

1) someone in a container expects to see stats for the entire host in
/proc/net/rpc/nfsd. This is a bug -- users in the container should
never have seen this in the first place. In practice, it's probably
pretty benign, but fixing it is the right thing to do.

2) someone in the init_net_ns reads the procfile and expects to see
global totals. Ok, this is a change, but I argue that it's a good one,
since it gives more immediate info about the server running in the
init_net_ns.

In practice most usage of these procfiles is pretty informal (mostly
acting as the source for nfsstat). Segregating this info by container
is the right outcome. It should have been done that way from the get-
go.

> * Not sure how Linus would react if a user would complain that some
> workflow broke because rpc_stat are now per net namespace and shows
> different numbers (e.g. using a format that does not break any apps).
> It
> would likely depend on the actual case and how bad he would consider
> the
> information leak.
> 
> > If it is indeed a regression, how can we go about retaining
> > both behaviors (selectable by Kconfig or perhaps administrative
> > UI)?
> 
> That likely might be the best idea if user report an actual
> regression
> due to this. But switching the format of any existing file creates
> quite
> some trouble, as others already mentioned in this thread. So maybe
> providing the newer format in a different file and allowing to
> disable
> the older one though a Kconfig setting might be the best way forward.
> Sure, it would take years until people would have switched over, but
> that's how it is with our "no regressions" rule.
> 
> Does that help?
> 
> Ciao, Thorsten

-- 
Jeff Layton <jlayton@kernel.org>


More information about the ltp mailing list