[LTP] [PATCH v3] thermal: add new test group
Kubaj, Piotr
piotr.kubaj@intel.com
Fri Jan 23 14:12:44 CET 2026
2026-01-22 (木) の 15:07 +0100 に Petr Vorel さんは書きました:
> Hi Piotr,
>
> not a complete review, just few thoughts.
>
> > diff --git a/testcases/kernel/thermal/thermal_interrupt_events.c
> > b/testcases/kernel/thermal/thermal_interrupt_events.c
> ...
> > +/*\
> > + * Tests the CPU package thermal sensor interface for Intel
> > platforms.
> > +
> > + * Works by checking the initial count of thermal interrupts.
> IMHO this part should be in setup function.
Done.
>
> > Then it
> > + * decreases the threshold for sending a thermal interrupt to just
> > above
> > + * the current temperature and runs a workload on the CPU.
>
> First, why test needs to run for 30 sec and then sleep for 10 sec?
> Is it possible to check the expected value from sysfs and quit
> testing before?
> Also we have TST_RETRY_FUNC() and TST_RETRY_FN_EXP_BACKOFF() which
> could be used
> for the loop (you create a function which does the check to quit
> before the
> timeout.
Both those macros are unsuitable for this test. TST_RETRY_FUNC() uses a
constant sleep time and TST_RETRY_FN_EXP_BACKOFF() increases its
timeout.
Here the point is to use a decreasing timeout. The test starts with 10s
cooldown to make sure that even pre-production CPU's, which might have
their thermal protections disabled, cool down properly. Once sleep time
reaches 0, the conclusion is that either there was not enough workload
or somehow interrupts are not triggered after all.
>
> > Finally, it restores
> > + * the original thermal threshold and checks whether the number of
> > thermal
> > + * interrupts increased.
> NOTE: the restore will not work if test exits before for some reason
> (i.e. any of SAFE_*() macros fail). Therefore restore should be in a
> cleanup
> function.
Done.
>
> > + */
> > +
> > +#include "tst_safe_stdio.h"
> > +#include "tst_test.h"
> > +#include <ctype.h>
> > +#include <pthread.h>
> > +#define PATH_LEN 69
> > +#define STRING_LEN 23
> > +
> > +static void *cpu_workload(void *arg)
> > +{
> > + time_t start_time = time(NULL);
> > + int num = 2;
> > +
> > + while (difftime(time(NULL), start_time) < *(double *)arg)
> > {
>
> difftime() is sufficient, but FYI we have include/tst_timer.h for
> measuring time difference (which uses more precise struct timeval).
>
> > + for (int i = 2; i * i <= num; i++) {
> > + if (num % i == 0)
> > + break;
> > + }
> > + num++;
> This is supposed to do some real workload on CPU? Is that really
> enough?
Yes, my work desktop currently runs at around 38C, when I start this
test it very quickly goes to around 80C, then when the fan starts
running faster, it decreases to about 65C.
> > + }
> > + return NULL;
> > +}
> ...
> > +static void run(void)
> > +{
> > + bool status = 1;
> > + char line[8192];
> > + int nproc = tst_ncpus();
> > + uint64_t interrupt_init[nproc], interrupt_later[nproc];
> > +
> > + tst_res(TDEBUG, "Number of logical cores: %d", nproc);
> > + read_interrupts(interrupt_init, nproc);
> > +
> > + DIR *dir = SAFE_OPENDIR("/sys/class/thermal/");
> > + struct dirent *entry;
> > + uint8_t tz_counter = 0;
> I'm not sure if AI advices you to use it, but really we don't
> optimize that much
> that we could not afford just to use plain int. Using exact-width
> types is
> usually used only when needed, e.g. when mapping some struct from
> kernel.
Wrong address here :) I'm anti-FAMA, pro-selfhosting etc. and the
current AI fad goes very much against my principles.
>
> > + while ((entry = SAFE_READDIR(dir))) {
> > + if ((strncmp(entry->d_name, "thermal_zone",
> > sizeof("thermal_zone"))) > 0)
> > + tz_counter++;
> > + }
> > + SAFE_CLOSEDIR(dir);
> > + tst_res(TDEBUG, "Found %d thermal zone(s)", tz_counter);
>
> I still think this part can be done in setup and needed values stored
> in static
> variables (outside function), i.e. static int tz_counter, nproc; Why?
> Because some
> people can run test 1000x via "-i 1000", therefore we cache various
> preparation
> results.
Done.
> > +
> > + bool x86_pkg_temp_tz[tz_counter], x86_pkg_temp_tz_found =
> > 0;
> > +
> > + memset(x86_pkg_temp_tz, 0, sizeof(x86_pkg_temp_tz));
> > +
> > + for (uint8_t i = 0; i < tz_counter; i++) {
> > + char path[PATH_LEN];
> > +
> > + snprintf(path, PATH_LEN,
> > "/sys/class/thermal/thermal_zone%d/type", i);
> > + tst_res(TDEBUG, "Checking whether %s is
> > x86_pkg_temp", path);
> > +
> > + SAFE_FILE_SCANF(path, "%s", line);
> > + if (strstr(line, "x86_pkg_temp")) {
> > + tst_res(TDEBUG, "Thermal zone %d uses
> > x86_pkg_temp", i);
> > + x86_pkg_temp_tz[i] = 1;
> > + x86_pkg_temp_tz_found = 1;
> > + }
> > + }
> > + if (!x86_pkg_temp_tz_found) {
> > + tst_res(TINFO, "No thermal zone uses
> > x86_pkg_temp");
> > + status = 0;
> > + }
> > +
> > + for (uint8_t i = 0; i < tz_counter; i++) {
> Again, please just use int.
Done.
>
> > + if (x86_pkg_temp_tz[i]) {
> > + char path[PATH_LEN], temp_path[PATH_LEN],
> > trip_path[PATH_LEN], temp_high[12], trip[12];
>
> You read and write integer values. Using char array is
> overcomplicated:
> int trip, temp_high = temp + 10;
>
> (But maybe have 10 defined as a constant with meaningful name. That
> is kind of
> documentation (instead reader having to read the code to understand
> the meaning
> of the number).
>
> #define TEMP_INCREMENT 10
>
> > + double run_time = 30;
> #define RUNTIME 30
>
> > + uint8_t sleep_time = 10;
> #define SLEEP 10
Done.
>
> > + int temp;
> > +
> > + snprintf(path, PATH_LEN,
> > "/sys/class/thermal/thermal_zone%d/", i);
> > + strncpy(temp_path, path, PATH_LEN);
> > + strncat(temp_path, "temp", 4);
> > + tst_res(TDEBUG, "Testing %s", temp_path);
> nit: I'd put this as TINFO to get at least some debug info without -
> D.
>
> > + SAFE_FILE_SCANF(temp_path, "%d", &temp);
> All SAFE_*() macros quit testing, therefore the following check is
> not needed.
It's necessary because if the temperature is below 0, there's most
likely some kernel or sensor issue.
>
> > + if (temp < 0) {
> > + tst_brk(TBROK, "Unexpected zone
> > temperature value %d", temp);
> > + status = 0;
> > + }
> > + tst_res(TDEBUG, "Current temperature for
> > %s: %d", path, temp);
> > +
> This would not be needed:
Done.
> > + snprintf(temp_high, sizeof(temp_high),
> > "%d", temp + 10);
> > +
> > + strncpy(trip_path, path, PATH_LEN);
> > + strncat(trip_path, "trip_point_1_temp",
> > 17);
> > +
> > + tst_res(TDEBUG, "Setting new
> > trip_point_1_temp value: %s", temp_high);
> > + SAFE_FILE_SCANF(trip_path, "%s", trip);
> SAFE_FILE_SCANF(trip_path, "%d", &trip);
>
> > + SAFE_FILE_PRINTF(trip_path, "%s",
> > temp_high);
> SAFE_FILE_PRINTF(trip_path, "%d", temp_high);
> > +
> > + while (sleep_time > 0) {
>
> > + tst_res(TDEBUG, "Running for %f
> > seconds, then sleeping for %d seconds", run_time, sleep_time);
> > + pthread_t threads[nproc];
> > +
> > + for (uint16_t j = 0; j < nproc;
> > j++)
> > + pthread_create(&threads[j]
> > , NULL, cpu_workload, &run_time);
> IMHO using pthread (and therefore -lpthread) is overkill.
> Could you please use simple fork()?
>
> testcases/cve/cve-2017-2618.c
>
> for (i = 0; i < nproc; i++) {
> if (!SAFE_FORK()) {
> cpu_workload();
> exit(0);
> }
>
> tst_reap_children();
> }
>
Done.
> > + for (uint16_t j = 0; j < nproc;
> > j++)
> > + pthread_join(threads[j],
> > NULL);
> > +
> > + SAFE_FILE_SCANF(temp_path, "%d",
> > &temp);
> > + tst_res(TDEBUG, "Temperature for
> > %s after a test: %d", path, temp);
> > +
> > + if (temp > atol(temp_high))
> > + break;
> > + sleep(sleep_time--);
> > + run_time -= 3;
> > + }
> > + if (temp <= atol(temp_high)) {
> > + tst_res(TINFO, "Zone temperature
> > is not rising as expected");
> > + status = 0;
> > + }
>
> > +
> > + tst_res(TDEBUG, "Restoring original
> > trip_point_1_temp value: %s", trip);
> > + SAFE_FILE_PRINTF(trip_path, "%s", trip);
> > + }
> > + }
> > + read_interrupts(interrupt_later, nproc);
> > +
> > + for (uint16_t i = 0; i < nproc; i++) {
> > + if (interrupt_later[i] < interrupt_init[i]) {
> > + tst_res(TINFO, "For CPU %d interrupt
> > counter is currently %ld, while it was %ld before the test", i,
> > interrupt_later[i], interrupt_init[i]);
> > + status = 0;
> > + }
> > + }
> > +
> > + if (status)
> > + tst_res(TPASS, "x86 package thermal interrupt
> > triggered");
> > + else
> > + tst_res(TFAIL, "x86 package thermal interrupt did
> > not trigger");
> > +}
> > +
> > +static struct tst_test test = {
> > + .min_runtime = 180,
>
> > + .needs_root = true,
> nit: We always set it as 1.
Done.
>
> Kind regards,
> Petr
>
> > + .supported_archs = (const char *const []) {
> > + "x86",
> > + "x86_64",
> > + NULL
> > + },
> > + .test_all = run
> > +};
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
More information about the ltp
mailing list