Writing a grammar
When building a parser for a language, you need to define the structure of valid expressions. This is achieved by writing a grammar that describes how expressions in the language can be formed. The grammar is defined using terminals, non-terminals, and regular expressions (regexes).
Key Components of a Grammar
Terminals:
- These are the basic symbols from which strings are formed in the language. Terminals are usually literal values like keywords, operators, or punctuation.
- In the grammar, terminals are enclosed in single quotes ('), e.g., '+', '(', and ')'.
Non-Terminals:
- These are the abstract symbols that represent structures or expressions that can be expanded into sequences of terminals and other non-terminals. Non-terminals can be seen as "rules" in the grammar.
- Non-terminals are simply written without any special surrounding characters.
Regex Tokens:
- Sometimes, you need to define terminals using regular expressions to capture more flexible patterns, like identifiers or numbers.
- In this grammar, a regex token is defined by using a % sign followed by the name of the token. The regex itself is specified using standard regex notation between slashes (/), e.g., %id -> /[A-Za-z][A-Za-z0-9]+/.
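The three notations can be illustrated with a short Python sketch. It is not part of dotlr: the helper names symbol_kind and is_id are hypothetical, and the regex is evaluated with Python's re module, which is assumed to behave like the library's regex engine for this simple pattern.

```python
import re

# The regex from the %id example above, compiled with Python's re module.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]+")


def symbol_kind(symbol: str) -> str:
    """Classify a grammar symbol by the notation described above (hypothetical helper)."""
    if len(symbol) >= 3 and symbol.startswith("'") and symbol.endswith("'"):
        return "terminal"      # quoted literal such as '+'
    if symbol.startswith("%"):
        return "regex token"   # named regex token such as %id
    return "non-terminal"      # bare name such as Expr


def is_id(text: str) -> bool:
    """Return True when the whole string matches the %id regex."""
    return ID_PATTERN.fullmatch(text) is not None


print(symbol_kind("'+'"), symbol_kind("%id"), symbol_kind("Expr"))
print(is_id("foo"), is_id("x"))
```

Because the pattern ends in + rather than *, it requires at least two characters: foo matches, while a single letter such as x does not.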
A grammar built from these components can recognize and parse an expression like the following:
foo(bar+baz)
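As a rough illustration of how such an expression could be checked, the sketch below assumes a small hypothetical grammar (written out in the comment) that combines quoted terminals, non-terminals, and the %id regex token. It does not use dotlr. Instead it uses a hand-written recursive-descent recognizer in Python, with the left-recursive rule for E handled as a loop. The names tokenize and recognizes are illustrative only.

```python
import re

# Hypothetical grammar assumed for this sketch (not taken from the library docs):
#   E   -> E '+' T
#   E   -> T
#   T   -> %id '(' E ')'
#   T   -> %id
#   %id -> /[A-Za-z][A-Za-z0-9]+/
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z][A-Za-z0-9]+)|(?P<punct>[+()]))")


def tokenize(text):
    """Split text into (kind, lexeme) pairs; raise ValueError on unknown input."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        if match.lastgroup == "id":
            tokens.append(("%id", match.group("id")))
        else:
            tokens.append((match.group("punct"), match.group("punct")))
        pos = match.end()
    return tokens


def recognizes(text):
    """Return True if text can be derived from E in the grammar above."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise ValueError(f"expected {kind}, found {peek()}")
        pos += 1

    def term():
        # T -> %id '(' E ')' | %id
        expect("%id")
        if peek() == "(":
            expect("(")
            expr()
            expect(")")

    def expr():
        # E -> E '+' T | T, rewritten as T ('+' T)* to avoid left recursion
        term()
        while peek() == "+":
            expect("+")
            term()

    try:
        expr()
    except ValueError:
        return False
    return pos == len(tokens)


print(recognizes("foo(bar+baz)"))
print(recognizes("foo(bar+"))
```

Under these assumptions, foo(bar+baz) is accepted, while truncated input such as foo(bar+ is rejected.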
For more information, see the dotlr library documentation.