Pika parsing: parsing in reverse solves the left recursion and error recovery problems
by Luke A. D. Hutchison (2020)
Abstract
A recursive descent parser is built from a set of mutually-recursive
functions, where each function directly implements one of the nonterminals of a
grammar, such that the structure of recursive calls directly parallels the
structure of the grammar. In the worst case, recursive descent parsers take
time exponential in the length of the input and the depth of the parse tree. A
packrat parser uses memoization to reduce the time complexity for recursive
descent parsing to linear. Recursive descent parsers are extremely simple to
write, but suffer from two significant problems: (i) left-recursive grammars
cause the parser to get stuck in infinite recursion, and (ii) it can be
difficult or impossible to optimally recover the parse state and continue
parsing after a syntax error. Both problems are solved by the pika parser, a
novel reformulation of packrat parsing using dynamic programming to parse the
input in reverse: bottom-up and right to left, rather than top-down and left to
right. This reversed parsing order enables pika parsers to directly handle
left-recursive grammars, simplifying grammar writing, and also enables direct
and optimal recovery from syntax errors, which is a crucial property for
building IDEs and compilers. Pika parsing maintains the linear-time performance
characteristics of packrat parsing, within a moderately small constant factor.
Several new insights into precedence, associativity, and left recursion are
presented.
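
To make the terms in the abstract concrete, the following is a minimal sketch of a memoized recursive descent (packrat-style) parser in Python. It is not the pika algorithm and is not taken from the paper. The arithmetic grammar, the function names (`make_parser`, `parse_sum`, `parse_product`, `parse_atom`, `evaluate`), and the use of `functools.lru_cache` as the memo table are all assumptions chosen for illustration. Each function implements one nonterminal, so the call structure mirrors the grammar.

```python
from functools import lru_cache
from typing import Callable, Optional, Tuple

# A parse result is (value, next position), or None if the rule fails.
Result = Optional[Tuple[int, int]]


def make_parser(text: str) -> Callable[[int], Result]:
    """Build a memoized recursive descent parser for `text`.

    Illustrative grammar (an assumption), written without left recursion:
        Sum     <- Product ('+' Product)*
        Product <- Atom ('*' Atom)*
        Atom    <- [0-9]+ / '(' Sum ')'
    """

    @lru_cache(maxsize=None)  # memo table: one entry per start position
    def parse_sum(pos: int) -> Result:
        first = parse_product(pos)
        if first is None:
            return None
        value, pos = first
        while pos < len(text) and text[pos] == "+":
            nxt = parse_product(pos + 1)
            if nxt is None:
                break  # repetition stops; keep what matched so far
            value, pos = value + nxt[0], nxt[1]
        return value, pos

    @lru_cache(maxsize=None)
    def parse_product(pos: int) -> Result:
        first = parse_atom(pos)
        if first is None:
            return None
        value, pos = first
        while pos < len(text) and text[pos] == "*":
            nxt = parse_atom(pos + 1)
            if nxt is None:
                break
            value, pos = value * nxt[0], nxt[1]
        return value, pos

    @lru_cache(maxsize=None)
    def parse_atom(pos: int) -> Result:
        # First alternative: one or more ASCII digits.
        end = pos
        while end < len(text) and text[end] in "0123456789":
            end += 1
        if end > pos:
            return int(text[pos:end]), end
        # Second alternative: a parenthesized Sum.
        if pos < len(text) and text[pos] == "(":
            inner = parse_sum(pos + 1)
            if inner is not None and inner[1] < len(text) and text[inner[1]] == ")":
                return inner[0], inner[1] + 1
        return None

    return parse_sum


def evaluate(text: str) -> int:
    """Parse and evaluate `text`, requiring that all input is consumed."""
    result = make_parser(text)(0)
    if result is None or result[1] != len(text):
        raise SyntaxError(f"could not parse {text!r}")
    return result[0]
```

In this sketch, `evaluate("2+3*4")` yields 14 and `evaluate("(2+3)*4")` yields 20. If the first rule were instead written left-recursively as `Sum <- Sum '+' Product / Product`, then `parse_sum(pos)` would call `parse_sum(pos)` again before consuming any input. The memo entry is only filled in when a call returns, so memoization alone does not break that cycle, and Python would eventually raise `RecursionError`. This is the infinite recursion problem (i) that the abstract refers to.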
arXiv:2005.06444v2