This is a follow-up to my previous post, "I built a 2x faster lexer, then discovered I/O was the real bottleneck". In that post, I benchmarked my ARM64 assembly lexer against the official Dart scanner. This time, I wanted to answer a different question: how fast should a parser generator be able to go?

I found my answer by benchmarking against LLVM.

The setup

I have been working on a parser generator that produces LALR(k) parsers. To test it, I wrote a grammar for LLVM's textual IR format, a language complex enough to require LALR(2). My test corpus consists of 12,161 LLVM IR files totaling 112 MB, extracted from the LLVM test suite.

I benchmarked three lexers and three parsers:

Lexers:

  • A Dart-based lexer (generated from a specification by my lexer generator)
  • An ARM64 assembly lexer (generated from a specification by my lexer generator)
  • LLVM's official lexer (called via FFI)

Parsers:

  • A table-driven recursive ascent parser in Dart (interpreted, not optimized)
  • A table-driven recursive ascent parser (code-generated from a specification by my parser generator)
  • LLVM's official parser (called via FFI)

A note on terminology: most programmers are familiar with recursive descent, a top-down parsing technique where each grammar rule becomes a function that calls other rule functions. Recursive ascent is the bottom-up counterpart: the call stack mirrors the LR parse stack, and functions return by "ascending" to parent rules after reducing a production. Both of my parsers use this technique to implement LALR parsing.
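
To make that concrete, here is a tiny hand-written recursive ascent parser in Dart for the toy left-recursive grammar A -> A 'x' | 'x'. It is not output from my generator; the state layout and the toPop/goingTo bookkeeping are just one common way to encode "pop k frames, then perform the goto" using ordinary function returns:

// Recursive ascent for the toy grammar  A -> A 'x' | 'x'
// Each LR state is a function. A shift is a call into the next state's
// function; a reduction unwinds the call stack (via toPop) until it reaches
// the frame that owns the goto for the reduced nonterminal.

late List<String> tokens;
late int pos;

int toPop = 0;   // frames still to unwind after a reduction
String? goingTo; // nonterminal we are reducing to, if any
int count = 0;   // semantic value: how many 'x' tokens make up A

String peek() => tokens[pos];

// State 0: S' -> .A    A -> .A x    A -> .x
void state0() {
  if (peek() == 'x') {
    pos++;
    state2(); // shift 'x'
  } else {
    throw FormatException('expected x, got ${peek()}');
  }
  // A callee reduced to A; this state owns goto(0, A) = state 1.
  while (toPop == 0 && goingTo == 'A') {
    goingTo = null;
    state1();
  }
  if (toPop > 0) toPop--;
}

// State 1: S' -> A.    A -> A. x    (no goto entries in this state)
void state1() {
  if (peek() == 'x') {
    pos++;
    state3(); // shift 'x'
  } else if (peek() == r'$') {
    return; // accept
  } else {
    throw FormatException('expected x or end of input, got ${peek()}');
  }
  if (toPop > 0) toPop--; // keep unwinding toward state 0
}

// State 2: A -> x.    reduce, popping this frame only
void state2() {
  count = 1;
  goingTo = 'A';
  toPop = 0;
}

// State 3: A -> A x.    reduce, popping this frame and one more
void state3() {
  count += 1;
  goingTo = 'A';
  toPop = 1;
}

void main() {
  tokens = ['x', 'x', 'x', r'$'];
  pos = 0;
  state0();
  print('recognized A covering $count x tokens'); // 3
}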

Both of my parsers are fully deterministic. It turned out to be possible to implement an LLVM IR parser with no backtracking and no non-deterministic steps. This also means that generalized parsing algorithms like GLR or GLL would not help here, as they are designed to handle ambiguous or (more broadly) non-deterministic grammars. Similarly, PEG parsers use ordered choice to implicitly disambiguate grammars, but since my grammar is LALR(2), and therefore deterministic, this offers no advantage.

The results

First, let me note that I/O remains a significant cost. Loading 12,161 files into memory took 2.5 seconds, nearly four times longer than the fastest parser. The lessons from my previous post still apply.

Lexer comparison

Lexer         Time     Throughput
LLVM (FFI)    248ms    451 MB/s
ARM64 ASM     293ms    382 MB/s
Dart          653ms    172 MB/s

My assembly lexer is only 1.18x slower than LLVM's. This is encouraging.

The Dart lexer is 2.6x slower, but this is not a reflection of Dart as a language. The Dart lexer is table-driven rather than direct-coded, and it exists as a quick proof of concept so I do not have to run a code generation step during development. There are many optimization techniques I could apply, like compiling state transitions into functions, but that defeats the purpose of having a fast iteration loop.
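
To make the difference concrete, here is a rough sketch of where the time goes in a table-driven lexer, using a made-up two-state automaton rather than my generated tables: every byte costs a nested table lookup plus a branch, whereas a direct-coded lexer emits a dedicated function or switch per state that the compiler can specialize:

// A sketch of a table-driven DFA step, using a made-up two-state automaton
// that matches runs of ASCII letters. This is not the generated table
// format; it only illustrates where the time goes: a generic loop that
// re-interprets the table for every byte of every token.

const int stateStart = 0;
const int stateIdent = 1;
const int stateDead = 2;

// transitions[state][byte] -> next state
final List<List<int>> transitions = List.generate(3, (state) {
  return List<int>.generate(256, (byte) {
    final isLetter =
        (byte >= 0x41 && byte <= 0x5A) || (byte >= 0x61 && byte <= 0x7A);
    if ((state == stateStart || state == stateIdent) && isLetter) {
      return stateIdent;
    }
    return stateDead;
  });
});

// Table-driven: one generic loop serves every token type. A direct-coded
// lexer would instead emit a dedicated per-state function (or switch) with
// the comparisons inlined.
int matchIdentifier(List<int> input, int start) {
  var state = stateStart;
  var i = start;
  while (i < input.length) {
    final next = transitions[state][input[i]]; // lookup + branch per byte
    if (next == stateDead) break;
    state = next;
    i++;
  }
  return state == stateIdent ? i : start; // end offset, or start on failure
}

void main() {
  final input = 'define void'.codeUnits;
  print(matchIdentifier(input, 0)); // 6, the end of "define"
}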

Parser comparison

Here is where things get interesting.

Parser                           Time       Throughput   Notes
LLVM (FFI)                       647ms      173 MB/s     Combined lex + parse
Generated recursive ascent       3,745ms    30 MB/s      Parse only, tokens pre-lexed
Non-generated recursive ascent   6,560ms    17 MB/s      Parse only, tokens pre-lexed

Important caveat: my parser only collects parse events without building any data structure. It also parses whitespace, which LLVM skips. This is a deliberate design choice: one of my goals is lossless parsing, where the original source can be reconstructed from the parse tree. This enables tools like pretty printers, incremental parsing, and educational tooling that can answer questions like "which grammar rules produced this range of code?" But it does add overhead compared to LLVM's approach.
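
For context, this is roughly the shape of what "collecting parse events" means, with hypothetical names rather than my parser's real internals: every token (whitespace included) and every reduction is appended to a flat list, and no tree nodes are allocated while parsing:

// Hypothetical sketch of a parse-event sink. Instead of allocating tree
// nodes, the parser appends flat records; a later pass can replay them to
// build a tree, reconstruct the source byte for byte, or answer
// "which rule produced this range?" queries.

enum ParseEventKind { token, reduce }

class ParseEvent {
  final ParseEventKind kind;
  final int offsetOrRule;  // token: byte offset      reduce: rule id
  final int lengthOrArity; // token: length in bytes  reduce: children popped
  const ParseEvent(this.kind, this.offsetOrRule, this.lengthOrArity);
}

class EventSink {
  final List<ParseEvent> events = [];

  // Every token is recorded, whitespace included, so the original source
  // can be reconstructed exactly from the event stream.
  void token(int offset, int length) =>
      events.add(ParseEvent(ParseEventKind.token, offset, length));

  // A reduction records only which rule fired and how many children it
  // consumed; the tree structure is implicit in the event order.
  void reduce(int rule, int childCount) =>
      events.add(ParseEvent(ParseEventKind.reduce, rule, childCount));
}

void main() {
  final sink = EventSink()
    ..token(0, 6)    // e.g. the keyword "define"
    ..token(6, 1)    // the space after it, kept for lossless output
    ..token(7, 4)    // e.g. "void"
    ..reduce(42, 3); // hypothetical rule 42 covering those three tokens
  print(sink.events.length); // 4
}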

LLVM's parser, on the other hand, does far more work of a different kind: it builds a complete intermediate representation (IR), with type checking, symbol resolution, and validation. Even though my parser does none of that semantic work, it is significantly slower. This makes LLVM's speed even more impressive, and it means the real gap is even larger than these numbers suggest. To truly compete, I would need to not only match LLVM's parsing speed but also add IR construction on top.

LLVM lexes and parses 112 MB in 647 milliseconds. My best parser, using pre-lexed tokens, takes 3.7 seconds just to parse. The full pipeline comparison is stark:

Pipeline                          Time       vs LLVM
LLVM lex + parse                  647ms      1x
Dart lex + generated parser       4,398ms    6.8x slower
Dart lex + non-generated parser   7,213ms    11x slower

To beat LLVM, I would need to parse in under 400 milliseconds (since my assembly lexer takes about 250ms). My current parser takes 3.7 seconds. That is a 9x gap.

Why LLVM is so fast

LLVM's parser is a hand-written recursive descent parser. It does not generate an intermediate parse tree or abstract syntax tree. Instead, it constructs LLVM's intermediate representation (IR) directly during parsing.

This is not a tree. It is a graph. Instructions reference other instructions. Basic blocks reference other basic blocks. Functions reference global values. The parser builds this graph incrementally as it descends through the grammar.
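
As a deliberately simplified illustration (Dart stand-ins, not LLVM's actual C++ classes), the shape is something like this: every node holds direct references to other nodes and to its parent container, so building a node is only natural once the things it points at already exist:

// Deliberately simplified, Dart-flavored sketch of a graph-shaped IR
// (LLVM's real classes are C++ and far richer). The point is the shape:
// instructions point at other instructions and at their parent block,
// blocks point at their parent function, so construction naturally wants
// the parents and operands to exist first.

class Value {
  final String name;
  Value(this.name);
}

class Instruction extends Value {
  final BasicBlock parent;    // back-reference: child points at parent
  final List<Value> operands; // cross-references into the graph
  Instruction(super.name, this.parent, this.operands) {
    parent.instructions.add(this);
  }
}

class BasicBlock extends Value {
  final Func parent;
  final List<Instruction> instructions = [];
  BasicBlock(super.name, this.parent) {
    parent.blocks.add(this);
  }
}

class Func extends Value {
  final List<BasicBlock> blocks = [];
  Func(super.name);
}

void main() {
  // A top-down parser can build this as it goes: the function exists before
  // its blocks, and each block exists before the instructions inside it.
  final f = Func('@main');
  final entry = BasicBlock('entry', f);
  final a = Instruction('%a', entry, []);
  Instruction('%b', entry, [a]); // %b uses %a: a graph edge, not a tree edge
  print('${f.name}: ${entry.instructions.length} instructions in ${f.blocks.length} block');
}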

This design has significant implications for parser generators.

The problem with fusing parsing and IR construction

The key insight is not that bottom-up parsers are inherently slow. The problem is that LLVM fuses parsing with IR construction, and this fusion is difficult to achieve with bottom-up parsing.

In a top-down parser (LL family, including recursive descent), you process nodes in pre-order: parents before children. When you enter a function, you can create the function object immediately. When you encounter instructions, you attach them to the already-existing function. The parent context is always available.

In a bottom-up parser (LR family, including recursive ascent), you process nodes in post-order: children before parents. When you reduce a production, all children have been parsed, but the parent does not exist yet. You cannot attach children to a parent that has not been created.
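
Here is the difference in miniature, with an invented token stream and a toy FunctionIR class standing in for real IR: the top-down version has the parent in hand the whole time, while the bottom-up version can only create it after all of its children have been reduced and buffered:

// A toy contrast between pre-order and post-order construction. The token
// stream, FunctionIR class, and helper names are invented for illustration.

class FunctionIR {
  final String name;
  final List<String> instructions = [];
  FunctionIR(this.name);
}

// Top-down (recursive descent): the parent exists before its children are
// parsed, so each child can be attached (or resolved against the parent's
// symbol table) the moment it is recognized.
FunctionIR parseFunctionTopDown(Iterator<String> tokens) {
  tokens.moveNext();
  final fn = FunctionIR(tokens.current); // parent created up front
  while (tokens.moveNext() && tokens.current != 'end') {
    fn.instructions.add(tokens.current); // attach immediately
  }
  return fn;
}

// Bottom-up (recursive ascent): when the reduction for "function" fires,
// the children are already parsed but the parent does not exist yet, so
// they must have been buffered somewhere (a value stack, a tree, an event
// list) until this point. For a flat list that is cheap; for a graph with
// cross-references and forward references, this is where it gets awkward.
FunctionIR reduceFunctionBottomUp(List<String> reducedChildren) {
  final fn = FunctionIR(reducedChildren.first); // parent created last
  fn.instructions.addAll(reducedChildren.skip(1));
  return fn;
}

void main() {
  final topDown = parseFunctionTopDown(['@f', 'add', 'ret', 'end'].iterator);
  final bottomUp = reduceFunctionBottomUp(['@f', 'add', 'ret']);
  print(topDown.instructions);  // [add, ret]
  print(bottomUp.instructions); // [add, ret]
}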

This means a bottom-up parser cannot easily inline the construction of a graph structure during parsing. You have two choices:

  1. Build an intermediate tree first, then transform it into the final representation in a separate pass. This adds object creation overhead and an extra traversal.

  2. Collect parse events and defer construction. This is what my parser does. It avoids intermediate tree allocation, but to build the final IR, you would need a separate pass over those events. Whether it is even practical to build a graph structure like LLVM's IR from bottom-up parse events is unclear.

LLVM avoids both of these costs by building the IR directly as it parses. The IR may be called an intermediate representation, but between the parser and the IR there is nothing intermediate at all: no AST, no extra pass. The recursive descent structure naturally provides the parent context needed to build the graph incrementally.

Could a bottom-up parser achieve the same fusion? Perhaps, but it would not be straightforward. The post-order nature of bottom-up parsing means you would need creative workarounds to maintain parent context during reductions. I have not found an elegant solution.

Is LLVM's lexer regular?

Here is an interesting observation: LLVM IR's lexical grammar appears to be mostly regular. There are no nested comments and no string interpolation. In principle, the lexer could be a pure deterministic finite automaton with no additional state.

This is unusual. Consider Dart, which I lexed in my previous post. To correctly lex Dart, you need a stack:

  • C-style block comments can be nested (/* outer /* inner */ still outer */)
  • String interpolation creates nested lexical contexts ("Hello ${name.toUpperCase()}")

LLVM IR has neither. The lexical structure should be regular.
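
As a sketch of what "needing a stack" means in practice (the mode names are invented, not my lexer's actual states): both constructs push a new lexical context on entry and pop it on exit, which is exactly the piece of state a pure DFA cannot carry:

// Why a pure DFA is not enough for Dart: nested block comments and string
// interpolation both need to remember how deeply they are nested, which
// means a stack of lexer modes on top of the automaton. The mode names
// here are invented for illustration.

enum LexMode { normal, blockComment, stringBody, interpolation }

class ModeStack {
  final List<LexMode> _modes = [LexMode.normal];

  LexMode get current => _modes.last;
  void push(LexMode mode) => _modes.add(mode);
  void pop() => _modes.removeLast();
}

void main() {
  final modes = ModeStack();

  // Lexing "Hello ${name.toUpperCase()}" walks through nested contexts:
  modes.push(LexMode.stringBody);    // saw the opening "
  modes.push(LexMode.interpolation); // saw ${ -- back to expression lexing
  modes.pop();                       // saw the matching }
  modes.pop();                       // saw the closing "
  print(modes.current); // LexMode.normal

  // /* outer /* inner */ still outer */ works the same way: each /* pushes
  // blockComment and each */ pops, so the comment only ends once the stack
  // has unwound to where it started.
}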

The cost of hand-written lexers

However, the actual implementation has some context-sensitive parts. The lexer has a flag that controls how colons in identifiers are handled:

// If we stopped due to a colon, unless we were directed to ignore it,
// this really is a label.
if (!IgnoreColonInIdentifiers && *CurPtr == ':') {
  StrVal.assign(StartChar-1, CurPtr++);
  return lltok::LabelStr;
}

llvm/lib/AsmParser/LLLexer.cpp, lines 509-514

The parser sets this flag when parsing summary entries:

// For summary entries, colons should be treated as distinct tokens,
// not an indication of the end of a label token.
Lex.setIgnoreColonInIdentifiers(true);
// ... parsing code ...
Lex.setIgnoreColonInIdentifiers(false);

llvm/lib/AsmParser/LLParser.cpp, lines 1097-1106

And again when parsing memory attributes:

// We use syntax like memory(argmem: read), so the colon should not be
// interpreted as a label terminator.
Lex.setIgnoreColonInIdentifiers(true);

llvm/lib/AsmParser/LLParser.cpp, lines 2581-2588

This is unfortunate. It introduces coupling between the parser and lexer that would not exist if the lexical grammar were truly regular.

This is the kind of issue that would not arise if the lexer were generated from a specification. A formal specification forces you to make the lexical grammar explicit, and a typical generator will not even let you express context-sensitive rules. Hand-written lexers make it too easy to add "just one flag" to handle an edge case, and over time these accumulate. The LLVM team has full control over the IR format and regularly introduces breaking changes, so this could be cleaned up. But the temptation to add a quick fix is always there.

SIMD and the case for generation

Despite these impurities, the lexer is close enough to regular that SIMD techniques should still apply. Parsing regular languages with SIMD is a well-understood problem. Projects like Intel Hyperscan and its fork Vectorscan (which adds ARM NEON support) demonstrate that SIMD-accelerated regex matching can achieve remarkable throughput. Academic work on SIMD-accelerated regular expression matching shows 2-5x speedups over scalar code. Other projects like Rejit and RoaringRegex explore similar territory.

This is actually an argument for lexer generation. A hand-written lexer is unlikely to use SIMD intrinsics for every token type, and it is prone to accumulating context-sensitive hacks over time. A lexer generator could automatically emit SIMD-optimized code while enforcing that the lexical grammar remains regular. The fact that my current generated lexer is slower than LLVM's hand-written one does not mean generation is a dead end. It means there is room for improvement, and SIMD is one concrete path forward.

Real-world programming languages are messier. They have nested comments, string interpolation, heredocs, and other context-sensitive lexical structures that genuinely require a stack (looking at you, JavaScript, and your context-sensitive regex syntax). But for languages whose lexical structure could be regular, generation offers both performance opportunities and protection against accidental complexity.

What this means for my project

I now have a clear target: LLVM processes 112 MB of complex IR in 647ms, achieving 173 MB/s for combined lexing and parsing.

If I can match LLVM's performance, I will have done something significant. LLVM is one of the largest and most optimized compiler projects in the world, with contributions from Apple, Google, ARM, Intel, and dozens of other companies. It is not a low bar.

But I think it is achievable. The lexing side is already close (293ms vs 248ms). The parsing side needs work.

To close the gap, I see several paths:

  1. Optimize the current parser. My table-driven parser has not been optimized. Direct-coded parsing instead of table interpretation should offer a significant speedup.

  2. SIMD tricks. Projects like simdjson achieve remarkable speeds by processing multiple bytes in parallel. Some of these techniques could be applied to parser generators.

  3. Generate recursive descent parsers. My parser generator currently produces LALR(k) parsers. I could also support generating recursive descent (LL) parsers, which would allow fusing parsing with IR construction. Since LLVM IR requires LALR(2), I suspect I would need at least LL(2) to handle the same language, though I am not entirely certain this would be sufficient.

  4. Profile-guided optimization. If I generate parsers that compile to LLVM IR, I could use LLVM's PGO to optimize them. Using LLVM to beat LLVM has a certain poetic appeal.

The more interesting question is whether I can match LLVM's performance while still generating parsers from grammars, rather than writing them by hand. That would be genuinely useful. Once you have a context-free grammar that corresponds precisely to the implementation, you can generate pretty printers from a simple specification, build IR builders for any programming language that are proven to match the implementation, build IDE tooling, implement incremental parsing, and more. Changes to the grammar could be proven to be additive rather than breaking. Breaking changes could be introduced in a backwards-compatible manner with proper documentation.

Of course, migrating from a hand-written parser to a generated one is a difficult sell when it is several times slower. But I believe this is a solvable problem, and I am actively working on it. If you have ideas, feedback, or just want to follow along, feel free to reach out.

Conclusion

I set out to find a target for parser performance. I found one: LLVM parses its textual IR at 173 MB/s. My generated parser currently achieves 30 MB/s, a 5.8x gap. And since my parser does less work than LLVM's (no IR construction), the effective gap is even larger.

The gap exists not because bottom-up parsing is inherently slow, but because LLVM's architecture fuses parsing with IR construction in a way that requires pre-order traversal. To compete, I may need to generate top-down parsers, or find SIMD tricks that make the parsing paradigm less relevant.

I am not there yet. But I have a target now.