[LTP] [PATCH 0/7] docparse improvements

Richard Palethorpe rpalethorpe@suse.de
Thu Oct 28 10:11:09 CEST 2021


Hello,

Cyril Hrubis <chrubis@suse.cz> writes:

> Hi!
>> It's unfortunate that, before starting this effort and the checker, we
>> didn't know about tree-sitter (although Sparse may still be the best
>> choice for the checker).
>> 
>> Tree-sitter can parse C into an AST and can easily be vendored into LTP:
>> https://tree-sitter.github.io/tree-sitter/using-parsers#building-the-library
>> 
>> Then we just need to work on the level of the AST. It also has a query
>> language. This should allow the initial matching to be done on a high
>> level.
>
> The only worry that I have about this would be speed. Currently the code
> I wrote takes a few seconds to process thousands of C files in LTP; that
> is because we take a lot of shortcuts and ignore all the stuff we do not
> need. A full parser that builds an AST would be orders of magnitude slower,
> so before we attempt to use it, it should be benchmarked properly to see
> if it's fast enough.

It's incredibly fast; it has no trouble parsing the entire kernel.

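Weggli uses tree-sitter:

https://github.com/googleprojectzero/weggli

For example, searching the whole LTP tree for verify_alarm() functions
that contain a call to exit(0):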

rich@g78 ~/q/ltp (master)> time weggli '_ verify_alarm(_) { exit(0); }' .
/home/rich/qa/ltp/./testcases/kernel/syscalls/alarm/alarm03.c:21
static void verify_alarm(void)
{
	pid_t pid;

	TEST(alarm(100));

..
		} else {
			tst_res(TPASS,
				"alarm(100), fork, alarm(0) child's "
				"alarm returned %ld", TST_RET);
		}
		exit(0);
	}

	TEST(alarm(0));
	if (TST_RET != 100) {
		tst_res(TFAIL,
..
}
/home/rich/qa/ltp/./testcases/kernel/syscalls/alarm/alarm07.c:20
static void verify_alarm(void)
{
	pid_t pid;
	alarm_cnt = 0;

	TEST(alarm(1));
..
			tst_res(TPASS, "alarm() request cleared in child");
		} else {
			tst_res(TFAIL, "alarm() request not cleared in "
				"child; alarms received:%d", alarm_cnt);
		}
		exit(0);
	}

	if (alarm_cnt != 1)
		tst_res(TFAIL, "Sigalarms in parent %i, expected 1", alarm_cnt);
	else
..
}

________________________________________________________
Executed in   49.35 millis    fish           external
   usr time  110.88 millis    0.00 millis  110.88 millis
   sys time   87.44 millis    1.20 millis   86.24 millis

>
>> If we continue down the path of hand parsing C, then it will most likely
>> result in constant tweaks and additions.
>
> Well I would say that this patchset is the last addition for the parser,
> if we ever need anything more complex we should really switch to
> something else. On the other hand I do not think that we will ever need
> more complexity in the parser than this, as long as we keep things
> sane.

This closes the door on a lot of options for no upside AFAICT. We have
two tools (Sparse and tree-sitter) that can be (or have been) vendored
and will parse a large subset of C. Sparse goes a step further, allowing
control flow analysis. The usual reasons for reinventing the wheel are
not present.
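
To make the "constant tweaks" concern concrete, here is a minimal sketch.
It is not taken from the docparse code; both function names are
hypothetical and exist only for illustration. The first attempt looks for
a call by substring search, and the first tweak is already needed as soon
as the call text appears inside a comment or a string literal:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* First attempt (hypothetical): substring search for "name(". */
static bool naive_has_call(const char *src, const char *name)
{
	size_t len = strlen(name);
	const char *p = src;

	while ((p = strstr(p, name))) {
		if (p[len] == '(')
			return true;
		p += len;
	}

	return false;
}

/* First tweak (hypothetical): skip comments and string literals. */
static bool tweaked_has_call(const char *src, const char *name)
{
	size_t len = strlen(name);
	const char *p = src;

	while (*p) {
		if (p[0] == '/' && p[1] == '*') {
			/* Block comment: jump past the closing marker. */
			const char *end = strstr(p + 2, "*/");

			if (!end)
				return false;
			p = end + 2;
		} else if (p[0] == '/' && p[1] == '/') {
			/* Line comment: skip to the end of the line. */
			while (*p && *p != '\n')
				p++;
		} else if (*p == '"') {
			/* String literal: skip it, honouring backslash escapes. */
			p++;
			while (*p && *p != '"') {
				if (*p == '\\' && p[1])
					p++;
				p++;
			}
			if (*p)
				p++;
		} else if (!strncmp(p, name, len) && p[len] == '(') {
			return true;
		} else {
			p++;
		}
	}

	return false;
}
```

Even the tweaked version still matches my_exit( when asked for exit, and
it knows nothing about character literals, macros or line continuations.
Each of those would be one more addition, which a real parser handles
already.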

-- 
Thank you,
Richard.

