[LTP] [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c

Luis Chamberlain mcgrof@kernel.org
Fri Mar 28 02:44:54 CET 2025


On Tue, Mar 25, 2025 at 02:52:49PM +0800, Oliver Sang wrote:
> hi, Luis,
> 
> On Sun, Mar 23, 2025 at 12:07:27AM -0700, Luis Chamberlain wrote:
> > On Sat, Mar 22, 2025 at 06:02:13PM -0700, Luis Chamberlain wrote:
> > > On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote:
> > > > Hey Luis,
> > > > 
> > > > This looks like the same issue the bot reported here:
> > > > 
> > > > https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/
> > > > 
> > > > There is a fix for it queued in next-20250318 and later. Could you
> > > > please double check with your reproducer against a more recent next?
> > > 
> > > Confirmed, at least it's been 30 minutes and no crashes now where as
> > > before it would crash in 1 minute. I'll let it soak for 2.5 hours in
> > > the hopes I can trigger the warning originally reported by this thread.
> > > 
> > > Even though from code inspection I see how the kernel warning would
> > > trigger I just want to force trigger it on a test, and I can't yet.
> > 
> > Survived 5 hours now. This certainly fixed that crash.
> > 
> > As for the kernel warning, I can't yet reproduce that, so trying to
> > run generic/750 forever and looping
> > ./testcases/kernel/syscalls/close_range/close_range01
> > and yet nothing.
> > 
> > Oliver can you reproduce the kernel warning on next-20250321 ?
> 
> the issue still exists on
> 9388ec571cb1ad (tag: next-20250321, linux-next/master) Add linux-next specific files for 20250321
> 
> but randomly (reproduced 7 times in 12 runs, then ltp.close_range01 also failed.
> on another 5 times, the issue cannot be reproduced then ltp.close_range01 pass)

OK, I narrowed the reproducer down; it requires the patch below:


diff --git a/mm/util.c b/mm/util.c
index 448117da071f..3585bdb8700a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -735,6 +735,8 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
 	long nr = folio_nr_pages(src);
 	long i = 0;
 
+	might_sleep();
+
 	for (;;) {
 		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
 			return -EHWPOISON;


And then just running:

dd if=/dev/zero of=/dev/vde bs=1024M count=1024
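
A minimal sketch for checking whether a run actually hit the warning,
assuming the kernel log carries the usual might_sleep() splat text. The
helper name has_sleep_bug is hypothetical and not part of LTP or the
kernel tree; /dev/vde is the scratch disk from the command above and dd
will overwrite it.

```shell
# Hypothetical helper: reads a kernel log on stdin and succeeds if it
# contains the might_sleep() splat.
has_sleep_bug() {
    grep -q 'BUG: sleeping function called from invalid context'
}

# Usage, as root on a throwaway test machine:
#   dd if=/dev/zero of=/dev/vde bs=1024M count=1024
#   dmesg | has_sleep_bug && echo "warning reproduced"
```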

For some reason a kernel with the following options didn't trigger it on
its own, so the above patch is needed:


CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_ACPI_SLEEP=y

It may have to do with my preemption settings:

CONFIG_PREEMPT_BUILD=y
CONFIG_ARCH_HAS_PREEMPT_LAZY=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_LAZY is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y

And so now to see how we should fix it.

  Luis



