<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><br></div></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 1, 2019 at 6:09 AM Jan Stancek <<a href="mailto:jstancek@redhat.com" target="_blank">jstancek@redhat.com</a>> wrote:</div><div dir="ltr" class="gmail_attr"><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Current behaviour varies a lot depending on system. I'm thinking if we should<br>

just set it to 80% of free RAM. We already have a number of OOM tests,<br>

so maybe we don't need to worry about memory pressure here too.<br></blockquote><div><br></div><div><div class="gmail_default" style="font-size:small">Yes, I'm OK with that change. After all, if we reduced the allocation to 50% of mem+swap, it would probably only allocate from free memory as well.</div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> +<br>

> +     act.sa_handler = handler;<br>

> +     act.sa_flags = 0;<br>

> +     sigemptyset(&act.sa_mask);<br>

> +     sigaction(SIGRTMIN, &act, 0);<br>

<br>

I was thinking if we can't "abuse" tst_futexes a bit. It's a piece of<br>

shared memory we already have and could use for an atomic counter.<br>

<br>

<snip><br><br>

> +     /* waits in the loop for all children finish allocating*/<br>

> +     while(pid_count < pid_cntr)<br>

> +             sleep(1);<br>

<br>

What happens if one child hits OOM?<br></blockquote><div><br></div><div><div class="gmail_default" style="font-size:small">The new LTP API waits for the children and checks their exit status. If child_A (allocation finished, now paused) hits OOM, the library will just break and report the status. That is fine in this case, because child_A has already signalled the parent, and the children that are still allocating keep running once the system reclaims memory from child_A. So the parent process still receives SIGRTMIN from every child and leaves the while loop correctly. </div></div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">Another situation (which I haven't hit) is that child_B, which is still allocating, gets killed by OOM. That would leave the parent in an infinite loop here. The oom-killer prefers the process with the highest score, so this situation may not be easy to reproduce, but that does not mean it cannot happen, since the oom-killer is not perfect.</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">Anyway, to avoid the second situation, I'd like to take your advice and make the parent exit the loop safely by adding several checks.</div><div class="gmail_default"><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

>  <br>

> -             if (sigchld_count) {<br>

> -                     tst_resm(TFAIL, "child process exited unexpectedly");<br>

> -             } else if (dowrite) {<br>

> -                     tst_resm(TPASS, "%llu kbytes allocated and used.",<br>

> -                              original_maxbytes / 1024);<br>

> -             } else {<br>

> -                     tst_resm(TPASS, "%llu kbytes allocated only.",<br>

> -                              original_maxbytes / 1024);<br>

> -             }<br>

> +     if (dowrite) {<br>

> +             sysinfo(&sstats);<br>

> +             /* Total Free Post-Test RAM */<br>

> +             post_mem = (unsigned long long)sstats.mem_unit * sstats.freeram;<br>

> +             post_mem = post_mem + (unsigned long long)sstats.mem_unit *<br>

> sstats.freeswap;<br>

>  <br>

> +             if (((pre_mem - post_mem) < original_maxbytes))<br>

> +                     tst_res(TFAIL, "kbytes allocated and used less than expected %llu",<br>

> +                                     original_maxbytes / 1024);<br>

> +             else<br>

> +                     tst_res(TPASS, "%llu kbytes allocated and used",<br>

> +                                     original_maxbytes / 1024);<br>

> +     } else {<br>

> +             tst_res(TPASS, "%llu kbytes allocated only",<br>

> +                             original_maxbytes / 1024);<br>

> +     }<br>

> +<br>

> +     i = 0;<br>

> +     while (pid_list[i] > 0) {<br>

> +             kill(pid_list[i], SIGCONT);<br>

> +             i++;<br>

>       }<br>

> -     cleanup();<br>

> -     tst_exit();<br>

>  }<br>

> +<br>

> +static struct tst_test test = {<br>

> +     .forks_child = 1,<br>

> +     .options = mtest_options,<br>

> +     .setup = setup,<br>

> +     .cleanup = cleanup,<br>

> +     .test_all = mem_test,<br>

<br>

Is default timeout going to work on large boxes (256GB+ RAM)?<br></blockquote><div><br></div><div><div class="gmail_default" style="font-size:small">It should be fine. </div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">I had the same worry before, but in this test the number of children (max_pids) increases dynamically with the total memory size of the system, and each child allocates no more than 'alloc_bytes' (alloc_bytes = MIN(THREE_GB, alloc_maxbytes)). The only extra time cost is therefore the forking. In my evaluation on a system with 4T of RAM, mtest01 finished much faster than I expected (99% mem+swap in 2m22s), so the default timeout is not triggered at all.</div></div><div> </div><div><div class="gmail_default" style="font-size:small"><div class="gmail_default"># cat /proc/meminfo  | grep Mem</div><div class="gmail_default">MemTotal:       4227087524 kB</div><div class="gmail_default">MemFree:        4223159948 kB</div><div class="gmail_default">MemAvailable:   4213257308 kB</div></div><br></div><div><div class="gmail_default" style="font-size:small"></div><div class="gmail_default" style="font-size:small"># time ./mtest01 -p99 -w</div><div class="gmail_default">tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s</div><div class="gmail_default">mtest01.c:113: INFO: Total memory already used on system = 3880348 kbytes</div><div class="gmail_default">mtest01.c:120: INFO: Total memory used needed to reach maximum = 4188969005 kbytes</div><div class="gmail_default">mtest01.c:134: INFO: Filling up 99% of ram which is 4185088657 kbytes</div><div class="gmail_default" style="font-size:small">...</div></div><div class="gmail_default">mtest01.c:185: INFO: ... 
3221225472 bytes allocated and used in child 41779</div><div class="gmail_default">mtest01.c:281: PASS: 4185132681 kbytes allocated and used</div><div class="gmail_default">...</div><div class="gmail_default"><br></div><div class="gmail_default">real<span style="white-space:pre-wrap">    </span>2m22.213s</div><div class="gmail_default">user<span style="white-space:pre-wrap">    </span>79m52.390s</div><div class="gmail_default">sys<span style="white-space:pre-wrap">    </span>351m56.059s</div><div class="gmail_default"><br></div><div class="gmail_default" style="font-size:small"></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

<br>

Thinking out loud, what if...<br>

- we define at the start of test how much memory we want to allocate (target == 80% of free RAM)<br>

- we allocate a shared memory for counter, that each child increases<br>

  as it allocates memory (progress)<br>

  (or we abuse tst_futexes)<br>

  we could use tst_atomic_add_return() to count allocated chunks globally<br>

- once child finishes allocation it will pause()<br>

- we set timeout to ~3 minutes<br>

- main process runs in loop, sleeps, and periodically checks<br>

  - if progress reached target, PASS, break<br>

  - if progress hasn't increased in last 15 seconds, FAIL, break<br>

  - if we are 15 seconds away from timeout, end test early, PASS, break<br>

    (reason is to avoid running too long on big boxes)<br>

- kill all children, exit<br>
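<br>
The scheme outlined above could be sketched roughly as follows. This is only a minimal illustration in plain POSIX C, not the mtest01 implementation: run_alloc_monitor(), child_alloc() and the monitor_result values are hypothetical names, and C11 atomics on an anonymous shared mapping stand in for tst_futexes / tst_atomic_add_return(). It assumes atomic_ulong is lock-free on the target, so that it works across processes.<br>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result codes; MON_TIMEOUT is the "end early" case. */
enum monitor_result { MON_PASS, MON_STALLED, MON_TIMEOUT, MON_ERROR };

/* Child: allocate chunks, touch every page, bump the shared counter. */
static void child_alloc(atomic_ulong *progress, unsigned long chunks,
			size_t chunk_size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	for (unsigned long i = 0; i < chunks; i++) {
		volatile char *p = malloc(chunk_size);

		if (!p)
			break;
		for (size_t off = 0; off < chunk_size; off += page)
			p[off] = 1;	/* force the page to be backed */
		atomic_fetch_add(progress, 1);
	}
	for (;;)
		pause();	/* hold the memory until the parent kills us */
}

/* Parent: fork the children, then poll the counter until done or stuck. */
static enum monitor_result run_alloc_monitor(int nchildren,
		unsigned long chunks_per_child, size_t chunk_size,
		int stall_secs, int timeout_secs)
{
	assert(nchildren > 0);

	/* Shared anonymous mapping, inherited by every forked child. */
	atomic_ulong *progress = mmap(NULL, sizeof(*progress),
			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (progress == MAP_FAILED)
		return MON_ERROR;
	atomic_init(progress, 0);

	pid_t *pids = calloc(nchildren, sizeof(*pids));
	unsigned long target = (unsigned long)nchildren * chunks_per_child;
	int started = 0;

	for (; pids && started < nchildren; started++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			child_alloc(progress, chunks_per_child, chunk_size);
			_exit(0);	/* not reached */
		}
		pids[started] = pid;
	}

	enum monitor_result res = MON_ERROR;	/* kept if setup failed */
	time_t start = time(NULL), last_change = start;
	unsigned long last = 0;
	struct timespec tick = { 0, 100 * 1000 * 1000 };	/* 100 ms */

	while (started == nchildren) {
		unsigned long done = atomic_load(progress);
		time_t now = time(NULL);

		if (done >= target) { res = MON_PASS; break; }
		if (done != last) { last = done; last_change = now; }
		else if (now - last_change >= stall_secs) { res = MON_STALLED; break; }
		if (now - start >= timeout_secs) { res = MON_TIMEOUT; break; }
		nanosleep(&tick, NULL);
	}

	/* Kill and reap every child that was started. */
	for (int i = 0; i < started; i++) {
		kill(pids[i], SIGKILL);
		waitpid(pids[i], NULL, 0);
	}
	free(pids);
	munmap(progress, sizeof(*progress));
	return res;
}
```

With the numbers from the list, the caller would pass stall_secs = 15 and a timeout_secs about 15 seconds short of the test timeout, then report MON_PASS and MON_TIMEOUT as PASS and MON_STALLED as FAIL. A child killed by OOM before it finishes simply stops bumping the counter, so the parent leaves the loop through the stall check instead of spinning forever.<br>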

<br></blockquote><div><br></div><div><div class="gmail_default" style="font-size:small">These are really good suggestions. I will try to adopt some of them in V2.</div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Regards,<br>

Jan<br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail-m_7786714366719917067gmail_signature"><div dir="ltr"><div>Regards,<br></div><div>Li Wang<br></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>