From a quick look it seems like it's "as fast as a linter" because it is a linter. The homepage says "Not just generic AST patterns", but I couldn't find any rule that did anything besides AST matching. I don't see anything in the code that would enable any kind of control or data flow analysis.
true, right now it's AST pattern matching with `pattern-inside`/`pattern-not-inside` for syntactic scoping. I changed the description. Intraprocedural dataflow is the next step (tracking in #10), while trying to keep it close to linter latency.
The speed is really cool, but the fact that your rules are written as Rust code means that new rules need a new binary. That might be fine, but I just wanted to point it out to anyone who's interested.
quick correction: built-in rules are compiled in, but foxguard also loads Semgrep-compatible YAML rules at runtime via `--rules <path>` (or `.foxguard.yml`). You can add or modify rules without touching the binary. The Rust-coded rules are just the default pack for zero-config speed :D
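For anyone curious what a runtime rule file looks like, here's a minimal sketch in the Semgrep rule schema. The rule id and message are made up, and whether foxguard honors every Semgrep key is an assumption on my part:

```yaml
rules:
  - id: no-pickle-loads
    languages: [python]
    severity: ERROR
    message: pickle.loads on untrusted data allows arbitrary code execution
    pattern: pickle.loads(...)
```

Then, per the flag mentioned above, something like `foxguard --rules rules.yml src/` should pick it up without a rebuild.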
This follows the stereotype of every single Rust project immediately advertising the fact that it's written in Rust; Rust devs seem more enamored with the language than with what they're doing with it.
Came here to note this - it's like the old vegan joke:
How do you know if an app is written in Rust? Don't worry, the developer will tell you.
Legitimately, I have had to stay away from certain linting tools because of how slow they are. I'll check this out.
cfn-lint is due for one of these rewrites, it's excruciating. I made some patches to experiment with it and it could be a lot faster.
Appreciate it! cloudformation isn't in scope today but the perf approach (tree-sitter + parallel file walk + rule pre-filtering) transfers, so happy to check it out.
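On the pre-filtering point: the usual trick is a cheap literal scan over the raw source before doing any parsing, so files that can't possibly match a rule skip the parse entirely. A minimal sketch of the idea — the rule table and names here are illustrative, not foxguard's actual internals:

```python
# Map each rule id to a literal substring that must appear in the source
# for the rule to possibly fire. Files lacking every needle skip parsing.
RULES = {
    "no-pickle-loads": "pickle",
    "no-yaml-load": "yaml",
}

def rules_to_run(source: str) -> list[str]:
    """Return the ids of rules whose literal pre-filter appears in the source."""
    return [rid for rid, needle in RULES.items() if needle in source]

safe = "print('hello')"
risky = "import pickle\npickle.loads(blob)"
```

The win is that the substring scan is memory-bandwidth-bound, so on a typical codebase the parser only ever sees the small fraction of files that can match something.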
Some of the checks here seem very brittle. For example, this one[1].
In the context of security scanning (versus, say, linting), I think it's reasonable to expect the tool to be resilient to attempts at obfuscation (or just badly written code that doesn't adhere to normal Python idioms around import paths).
[1]: https://github.com/PwnKit-Labs/foxguard/blob/a215faf52dcff56...
update: `NoPickle`/`NoYamlLoad` string-match the callee text, so `import pickle as p; p.loads(...)` and `from pickle import loads as d` slip past. Filed as #7 with a fix plan (intraprocedural alias table). Thanks!
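For anyone wondering what the alias table might look like, here's a minimal sketch using Python's stdlib `ast` module. The function name and exact approach are my assumptions, not the #7 implementation — but it resolves both bypass shapes mentioned above:

```python
import ast

def dangerous_calls(source: str, module: str = "pickle", func: str = "loads"):
    """Find calls to module.func, resolving import aliases first."""
    tree = ast.parse(source)
    module_aliases = set()  # local names bound to the module itself
    func_aliases = set()    # local names bound directly to the function
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                if a.name == module:
                    module_aliases.add(a.asname or a.name)
        elif isinstance(node, ast.ImportFrom) and node.module == module:
            for a in node.names:
                if a.name == func:
                    func_aliases.add(a.asname or a.name)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        f = node.func
        if isinstance(f, ast.Name) and f.id in func_aliases:
            hits.append(ast.unparse(f))
        elif (isinstance(f, ast.Attribute) and f.attr == func
              and isinstance(f.value, ast.Name)
              and f.value.id in module_aliases):
            hits.append(ast.unparse(f))
    return hits
```

Both `dangerous_calls("import pickle as p\np.loads(b)")` and `dangerous_calls("from pickle import loads as d\nd(b)")` now report a hit, while `json.loads(b)` doesn't. It's intraprocedural only, so reassignments through variables or dicts still slip past, which is where real dataflow comes in.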
Looks interesting, will give it a run on the codebase at $work. One thing that would be nice to see in the README are benchmarks on larger codebases. Everything in the benchmark table is quite small. I’d also list line count over files, since the latter is a much better measure of amount of code.
For context, the codebase I work on most often has 1200 JS/TS files, 685 rust files, and a bunch more. LoC is 13k JS, 80k TS, and 155k Rust
It is still quite fast on that codebase, fwiw. 10.7 ms.
thx for the tip, I'll measure and see if per-LoC time is stable across different codebases. Mind if I cite it in the readme (anonymized)?
Sure thing, feel free. If you’d like more details, you can send me an email at msplanchard (at gmail)
update: filed #9 to build a labeled corpus and publish per-rule numbers.
Running security checks at linter speed is a big deal for CI pipelines. What's the false positive rate in practice? That's usually the tradeoff with fast static analysis — speed vs accuracy. Would love to know how you benchmarked it.
didn't measure that yet, but definitely thinking of adding it into scope soon
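Once there's a labeled corpus (per #9 above), per-rule precision falls out of a simple tally. A hedged sketch — the rule ids and labels below are invented for illustration, not real measurements:

```python
from collections import defaultdict

def per_rule_precision(findings):
    """findings: iterable of (rule_id, is_true_positive) pairs.
    Returns {rule_id: fraction of findings that were true positives}."""
    tp = defaultdict(int)
    total = defaultdict(int)
    for rule, is_tp in findings:
        total[rule] += 1
        tp[rule] += int(is_tp)
    return {rule: tp[rule] / total[rule] for rule in total}

# Hypothetical triaged scan results: each finding hand-labeled TP/FP.
findings = [
    ("no-pickle-loads", True),
    ("no-pickle-loads", True),
    ("no-pickle-loads", False),  # one false positive
    ("no-yaml-load", True),
]
```

The false-positive rate per rule is just `1 - precision`; recall is the harder half, since it needs known-vulnerable code the tool missed.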
There's also https://github.com/mongodb/kingfisher
cool, will check it out thanks!
"No X, no Y no Z. Just a ..."
15 commits on Day #1, starting from a stub/empty repo. 47K lines of code developed in under two weeks by one person.
Sigh... AI slop.