[LTP] [PATCH] Add hugemmap41(Migrating the libhugetlbfs/testcases/slbpacaflush.c test)
Petr Vorel
pvorel@suse.cz
Mon Feb 19 23:10:47 CET 2024
Hi Shirisha,
> We are verifying ppc64 kernels prior to 2.6.15-rc5 exhibit a bug in the
Test is from 2005, for 2.6.15-rc5. Is it really relevant now?
> hugepage SLB flushing path. When opening new hugetlb areas, updating masks
> in the thread_struct and copying to the PACA only occurs on the CPU where
> segments are opened, leading to potential stale copies in other CPUs.
> This bug can be triggered by multiple threads sharing the mm or a single thread
> migrating between CPUs, particularly evident in a close-to-idle system,
> as other processes may flush the SLB and prevent the bug from manifesting.
Please run make check-hugemmap41 in the test directory and fix formatting.
> Original test originates from https://github.com/libhugetlbfs/libhugetlbfs/blob/master/tests/slbpacaflush.c
> Signed-off-by: Shirisha G <shirisha@linux.ibm.com>
> ---
> runtest/hugetlb | 1 +
> testcases/kernel/mem/.gitignore | 1 +
> .../kernel/mem/hugetlb/hugemmap/hugemmap41.c | 144 ++++++++++++++++++
> 3 files changed, 146 insertions(+)
> create mode 100644 testcases/kernel/mem/hugetlb/hugemmap/hugemmap41.c
> diff --git a/runtest/hugetlb b/runtest/hugetlb
> index 299c07ac9..d956866ac 100644
> --- a/runtest/hugetlb
> +++ b/runtest/hugetlb
> @@ -35,6 +35,7 @@ hugemmap29 hugemmap29
> hugemmap30 hugemmap30
> hugemmap31 hugemmap31
> hugemmap32 hugemmap32
> +hugemmap41 hugemmap41
nit: Any reason why not add it as hugemmap33? You don't want to clash with other
sent test, right?
> hugemmap05_1 hugemmap05 -m
> hugemmap05_2 hugemmap05 -s
> hugemmap05_3 hugemmap05 -s -m
> diff --git a/testcases/kernel/mem/.gitignore b/testcases/kernel/mem/.gitignore
> index c96fe8bfc..b7e108956 100644
> --- a/testcases/kernel/mem/.gitignore
> +++ b/testcases/kernel/mem/.gitignore
> @@ -34,6 +34,7 @@
> /hugetlb/hugemmap/hugemmap30
> /hugetlb/hugemmap/hugemmap31
> /hugetlb/hugemmap/hugemmap32
> +/hugetlb/hugemmap/hugemmap41
> /hugetlb/hugeshmat/hugeshmat01
> /hugetlb/hugeshmat/hugeshmat02
> /hugetlb/hugeshmat/hugeshmat03
> diff --git a/testcases/kernel/mem/hugetlb/hugemmap/hugemmap41.c b/testcases/kernel/mem/hugetlb/hugemmap/hugemmap41.c
> new file mode 100644
> index 000000000..3b3388c68
> --- /dev/null
> +++ b/testcases/kernel/mem/hugetlb/hugemmap/hugemmap41.c
> @@ -0,0 +1,144 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2005-2006 IBM Corporation.
> + * Author: David Gibson & Adam Litke
> + */
> +/*\
> + * [Description]
> + *
> + * ppc64 kernels (prior to 2.6.15-rc5) have a bug in the hugepage SLB
> + * flushing path. After opening new hugetlb areas, we update the
> + * masks in the thread_struct, copy to the PACA, then do slbies on
> + * each CPU. The trouble is we only copy to the PACA on the CPU where
> + * we're opening the segments, which can leave a stale copy in the
> + * PACAs on other CPUs.
> + *
> + * This can be triggered either with multiple threads sharing the mm,
> + * or with a single thread which is migrated from one CPU, to another
> + * (where the mapping occurs), then back again (where we touch the
> + * stale SLB). We use the second method in this test, since it's
> + * easier to force (using sched_setaffinity). However it relies on a
> + * close-to-idle system, if any process other than a kernel thread
> + * runs on the first CPU between runs of the test process, the SLB
> + * will be flushed and we won't trigger the bug, hence the
> + * PASS_INCONCLUSIVE(). Obviously, this test won't work on a 1-cpu
> + * system (should get CONFIG() on the sched_setaffinity)
> + *
> + */
> +#define _GNU_SOURCE
> +#include "hugetlb.h"
> +#define SYSFS_CPU_ONLINE_FMT "/sys/devices/system/cpu/cpu%d/online"
> +#define MNTPOINT "hugetlbfs/"
Could you please have these 2 #define above _GNU_SOURCE? (readablility)
> +
> +
> +#include <stdio.h>
> +#include <sched.h>
> +
> +
> +long hpage_size;
> +int fd;
> +void *p;
> +volatile unsigned long *q;
> +int online_cpus[2], err;
> +cpu_set_t cpu0, cpu1;
> +
> +
> +
Please remove these blank lines above.
> +void check_online_cpus(int online_cpus[], int nr_cpus_needed)
> +{
> + char cpu_state, path_buf[64];
> + int total_cpus, cpu_idx, fd, ret, i;
> +
> + total_cpus = get_nprocs_conf();
> + cpu_idx = 0;
> +
> + if (get_nprocs() < nr_cpus_needed)
> + tst_res(TFAIL, "Atleast online %d cpus are required", nr_cpus_needed);
> +
> + for (i = 0; i < total_cpus && cpu_idx < nr_cpus_needed; i++) {
nit: Maybe just use get_nprocs_conf() directly?
> + errno = 0;
Is it errno reset really needed?
> + sprintf(path_buf, SYSFS_CPU_ONLINE_FMT, i);
> + fd = open(path_buf, O_RDONLY);
> + if (fd < 0) {
> + /* If 'online' is absent, the cpu cannot be offlined */
> + if (errno == ENOENT) {
> + online_cpus[cpu_idx] = i;
> + cpu_idx++;
> + continue;
> + } else {
> + tst_res(TFAIL, "Unable to open %s: %s", path_buf,
> + strerror(errno));
We have TERRNO:
tst_res(TFAIL | TERRNO, "Unable to open %s: %s", path_buf);
> + }
> + }
> +
> + ret = read(fd, &cpu_state, 1);
> + if (ret < 1)
> + tst_res(TFAIL, "Unable to read %s: %s", path_buf,
> + strerror(errno));
Maybe use SAFE_READ() ?
> +
> + if (cpu_state == '1') {
> + online_cpus[cpu_idx] = i;
> + cpu_idx++;
> + }
> +
> + if (fd >= 0)
> + SAFE_CLOSE(fd);
> + }
> +
> + if (cpu_idx < nr_cpus_needed)
> + tst_res(TFAIL, "Atleast %d online cpus were not found", nr_cpus_needed);
> +}
> +
> +
> +static void run_test(void)
> +{
> + check_online_cpus(online_cpus, 2);
> + CPU_ZERO(&cpu0);
> + CPU_SET(online_cpus[0], &cpu0);
> + CPU_ZERO(&cpu1);
> + CPU_SET(online_cpus[1], &cpu1);
> +
> + err = sched_setaffinity(getpid(), CPU_SETSIZE/8, &cpu0);
> + if (err != 0)
> + tst_res(TFAIL, "sched_setaffinity(cpu%d): %s", online_cpus[0],
> + strerror(errno));
Again, please use TTERRNO.
> +
> + err = sched_setaffinity(getpid(), CPU_SETSIZE/8, &cpu1);
Maybe define CPU_SETSIZE/8 at the top?
> +
> + if (err != 0)
> + tst_res(TFAIL, "sched_setaffinity(cpu%d): %s", online_cpus[1],
> + strerror(errno));
> + p = SAFE_MMAP(NULL, hpage_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> +
> + err = sched_setaffinity(getpid(), CPU_SETSIZE/8, &cpu0);
> + if (err != 0)
> + tst_res(TFAIL, "sched_setaffinity(cpu%d): %s", online_cpus[0],
> + strerror(errno));
> + q = (volatile unsigned long *)(p + getpagesize());
> + *q = 0xdeadbeef;
Why to set the address before end of testing? (yes, the original does it, but
why?). Wouldn't be better to use guarded buffers instead?
https://github.com/linux-test-project/ltp/wiki/C-Test-API#131-guarded-buffers
Kind regards,
Petr
> +
> + tst_res(TPASS, "Test Passed inconclusive");
> +}
> +
> +static void setup(void)
> +{
> + hpage_size = tst_get_hugepage_size();
> + fd = tst_creat_unlinked(MNTPOINT, 0);
> +}
> +
> +void cleanup(void)
> +{
> + if (fd > 0)
> + SAFE_CLOSE(fd);
> +}
> +
> +static struct tst_test test = {
> + .needs_root = 1,
> + .mntpoint = MNTPOINT,
> + .needs_hugetlbfs = 1,
> + .needs_tmpdir = 1,
> + .setup = setup,
> + .cleanup = cleanup,
> + .test_all = run_test,
> + .hugepages = {1, TST_NEEDS},
> +};
More information about the ltp
mailing list