[LTP] min_free_kbytes: Handle transient memory drops in check_monitor

Wei Gao wegao@suse.com
Tue Jun 2 09:46:01 CEST 2026


On Tue, Jun 02, 2026 at 04:02:55AM +0000, linuxtestproject.agent@gmail.com wrote:
> Hi Wei,
> 
> Thanks for the patch. A few comments below.
> 
> > min_free_kbytes: Handle transient memory drops in check_monitor
> >
> > High memory pressure can cause MemFree to temporarily drop below the
> > min_free_kbytes threshold before the kernel reclaimer can catch up.
> > This results in intermittent test failures, observed on openQA aarch64
> > virtual machines.
> >
> > Implement a 2-second grace period with high-accuracy 10ms fixed polling
> > in check_monitor() to allow the kernel time to reclaim memory.
> >
> > Introduce a 10% tolerance (90% threshold) for the MemFree check. My
> > measurements showed that under extreme pressure, MemFree can dip as low
> > as ~50% to ~70% of the target. While it typically recovers above 90%
> > within one second, hitting the exact 100% watermark sometimes can take
> > significantly longer. This tolerance prevents false positives during the
> > slow recovery tail while still ensuring memory is maintained near the
> > required level.
> 
> Please rewrite in impersonal form, e.g.:
>   "Measurements under extreme pressure show MemFree can dip as low as
>    ~50% to ~70% of the target."
> 
> Commit messages become a permanent part of the project history and
> should avoid first-person language.
The measurement results are specific to my local environment. Also, is there any
LTP rule that forbids the use of "My"?
> 
> > -    unsigned long tune;
> > -    unsigned long memfree;
> > +    unsigned long tune, threshold;
> > +    unsigned long memfree, min_memfree;
> > +    int i;
> >
> >      while (!end) {
> >          memfree = SAFE_READ_MEMINFO("MemFree:");
> >          tune = TST_SYS_CONF_LONG_GET(MIN_FREE_KBYTES);
> > +        /*
> > +         * Allow 10% tolerance to account for transient states.
> > +         */
> > +        threshold = tune * 9 / 10;
> >
> >          if (memfree < tune) {
> > -            tst_res(TINFO, "MemFree is %lu kB, "
> > -                    "min_free_kbytes is %lu kB", memfree, tune);
> > -            tst_res(TFAIL, "MemFree < min_free_kbytes");
> > +            min_memfree = memfree;
> > +            /*
> > +             * Give it some time to reclaim. The kernel should keep
> > +             * MemFree above min_free_kbytes, but transient drops
> > +             * are possible under high pressure.
> > +             * Check every 10ms for up to 2 seconds for high accuracy.
> > +             */
> > +            for (i = 10; i <= 2000; i += 10) {
> > +                usleep(10000);
> > +                memfree = SAFE_READ_MEMINFO("MemFree:");
> > +                if (memfree < min_memfree)
> > +                    min_memfree = memfree;
> > +
> > +                if (memfree >= tune)
> > +                    break;
> > +            }
> 
> The inner polling loop does not check the 'end' flag, which is set to 1
> by the SIGUSR1 handler when the parent finishes. In the original code,
> the single sleep(2) at the bottom of the outer loop was interrupted by
> the signal (sleep() returns the unslept time, usleep() fails with
> EINTR), so the outer while (!end) check fired promptly. With the new
> grace period loop, the signal can fire
> while we are in any of the 200 inner usleep(10000) calls; the sleep
> returns early but the loop body continues, and the outer while (!end)
> cannot be reached until the full ~2 seconds elapse.
> 
> Please add an early exit:
> 
>   for (i = 10; i <= 2000; i += 10) {
>       if (end)
>           return;
>       usleep(10000);
>       memfree = SAFE_READ_MEMINFO("MemFree:");
>       if (memfree < min_memfree)
>           min_memfree = memfree;
>       if (memfree >= tune)
>           break;
>   }
> 
> Alternatively a `break` is sufficient if the outer loop check is
> reached quickly, but `return` is cleaner given the function only has
> one exit point anyway.
Completeness is more important than saving 2 seconds of test time, especially 
for the final sub-test. If we jump out early, we might miss the final result of the test.
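
For reference, the two positions above are not necessarily exclusive. The
sketch below is only an illustration, not the patch and not LTP API: the
names grace_poll, struct grace_result and the samples[] array are all
hypothetical, and samples[] stands in for successive MemFree reads that
the real test would take 10 ms apart. It leaves the polling loop with
`break` when the end flag is set, so the last sample is still classified
against the 90% threshold and a result is still reported.

```c
#include <stddef.h>

/* Hypothetical names for illustration; none of this is LTP API. */
enum verdict { RECOVERED, WITHIN_TOLERANCE, FAILED };

struct grace_result {
	enum verdict verdict;
	unsigned long last_kb;	/* last MemFree sample seen, in kB */
	unsigned long min_kb;	/* lowest MemFree sample seen, in kB */
	size_t polls;		/* number of samples read after the first */
};

/*
 * samples[0] is the reading that triggered the grace period (n >= 1).
 * samples[1..n-1] simulate later reads; a real test would sleep 10 ms
 * before each one instead of indexing an array.
 */
static struct grace_result grace_poll(const unsigned long *samples, size_t n,
				      unsigned long tune_kb,
				      const volatile int *end)
{
	struct grace_result r = { FAILED, samples[0], samples[0], 0 };
	unsigned long threshold_kb = tune_kb * 9 / 10;	/* 10% tolerance */
	size_t i;

	for (i = 1; i < n; i++) {
		if (*end)
			break;	/* stop polling, but still classify below */

		r.last_kb = samples[i];
		r.polls++;
		if (r.last_kb < r.min_kb)
			r.min_kb = r.last_kb;

		if (r.last_kb >= tune_kb)
			break;	/* fully recovered */
	}

	if (r.last_kb >= tune_kb)
		r.verdict = RECOVERED;
	else if (r.last_kb >= threshold_kb)
		r.verdict = WITHIN_TOLERANCE;
	else
		r.verdict = FAILED;

	return r;
}
```

With tune_kb = 100, the samples {50, 70, 95, 100} would be classified as
RECOVERED after three polls with a minimum of 50, while {50, 60, 92}
would end as WITHIN_TOLERANCE. Whether reporting on a truncated grace
period is acceptable for the final sub-test is a separate question that
this sketch does not settle.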
> 
> > +if (memfree < threshold) {
> > +tst_res(TFAIL, "MemFree %lu < 90%% of min_free_kbytes %lu (MinSeen: %lu%%) after 2s",
> > +memfree, tune, (min_memfree * 100 / tune));
> > +} else if (memfree < tune) {
> > +tst_res(TINFO, "MemFree (%lu) stayed within 10%% tolerance (min %lu%%) after ~2s",
> > +memfree, (min_memfree * 100 / tune));
> > +} else {
> > +tst_res(TINFO, "MemFree recovered to %lu (min %lu%%) after %d ms",
> > +memfree, (min_memfree * 100 / tune), i);
> > +}
> 
> Minor: the TINFO messages mix kB values and percentage values using the
> same format specifier %lu, which can be confusing in the output. The
> TFAIL line prints MemFree and tune as raw kB values but labels the third
> argument "(MinSeen: N%)" — the intent is clear but adding "kB" units to
> the first two values would be more consistent:
> 
>   "MemFree %lu kB < 90%% of min_free_kbytes %lu kB (MinSeen: %lu%%) after 2s"
> 
> Similarly for the TINFO lines.
The tunable is named min_free_kbytes, so the unit is already clear from the test context.

Unless there's a logical error or an LTP rule violation, I won't be sending another patch.
> 
> ---
> 
> Verdict: Needs revision
> 
> The missing 'end' check in the inner polling loop is a regression: it
> removes the prompt termination behaviour of the original sleep(2) and
> can delay the test exit by up to 2 seconds after SIGUSR1 is received.
> Please fix before merging. The commit message first-person wording and
> the missing kB unit labels are minor and can be addressed in the same
> respin.
> 
> LTP AI Reviewer

