Writing a grammar
When building a parser for a language, you first need to define the structure of its valid expressions. You do this by writing a grammar that describes how expressions in the language can be formed. The grammar is built from terminals, non-terminals, and regular-expression (regex) tokens.
Key Components of a Grammar
Terminals:
- These are the basic symbols from which strings are formed in the language. Terminals are usually literal values like keywords, operators, or punctuation.
- In the grammar, terminals are enclosed in single quotes ('), e.g., '+', '(', and ')'.
Non-Terminals:
- These are the abstract symbols that represent structures or expressions that can be expanded into sequences of terminals and other non-terminals. Non-terminals can be seen as "rules" in the grammar.
- Non-terminals are simply written without any special surrounding characters.
Regex Tokens:
- Sometimes, you need to define terminals using regular expressions to capture more flexible patterns, like identifiers or numbers.
- In this grammar, a regex token is defined by a % sign followed by the name of the token. The regex itself is written in standard regex notation between slashes (/), e.g., %id -> /[A-Za-z][A-Za-z0-9]+/.
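To make the three kinds of symbols concrete, here is a minimal Python sketch that reads rules written in the notation described above and labels each symbol. It is an illustration only, not the dotlr implementation: the rules for E and T are hypothetical, only the %id line comes from the text, and the helper names (classify, parse_rules) are made up for this example.

```python
import re

# Hypothetical grammar in the notation described above. The E and T rules
# are illustrative assumptions; only the %id definition is from the text.
GRAMMAR = """
E -> E '+' T
E -> T
T -> %id '(' E ')'
T -> %id
%id -> /[A-Za-z][A-Za-z0-9]+/
"""

# A symbol is a quoted literal, a %-prefixed token name, or a bare name.
SYMBOL = re.compile(r"'[^']*'|%\w+|\w+")


def classify(symbol):
    """Label a symbol by its surface syntax."""
    if len(symbol) >= 2 and symbol.startswith("'") and symbol.endswith("'"):
        return "terminal"
    if symbol.startswith("%"):
        return "regex token"
    return "non-terminal"


def parse_rules(text):
    """Split grammar text into production rules and regex token definitions."""
    rules = []
    regexes = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        head, body = (part.strip() for part in line.split("->", 1))
        if head.startswith("%"):
            # Regex definition: drop the surrounding slashes.
            regexes[head] = body[1:-1]
        else:
            symbols = [(s, classify(s)) for s in SYMBOL.findall(body)]
            rules.append((head, symbols))
    return rules, regexes


if __name__ == "__main__":
    rules, regexes = parse_rules(GRAMMAR)
    for head, symbols in rules:
        print(head, "->", symbols)
    print(regexes)
```

The sketch classifies symbols purely by how they are written: quoted means terminal, a leading % means regex token, anything else is a non-terminal.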
A grammar written with these components can recognize and parse an expression like the following:
foo(bar+baz)
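Before a grammar's rules can be applied, the input is split into tokens. The following Python sketch shows how foo(bar+baz) breaks down into the literal terminals '+', '(' and ')' and the %id regex token. It is a hedged illustration, not how dotlr tokenizes internally: the tokenize function is a hypothetical helper, and skipping whitespace is an assumption made for this example.

```python
import re

# Regex taken from the %id definition in the text.
ID = re.compile(r"[A-Za-z][A-Za-z0-9]+")
# Literal terminals mentioned in the text.
LITERALS = ["+", "(", ")"]


def tokenize(text):
    """Split text into (token kind, matched text) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            # Assumption: whitespace between tokens is ignored.
            pos += 1
            continue
        match = ID.match(text, pos)
        if match:
            tokens.append(("%id", match.group()))
            pos = match.end()
            continue
        for lit in LITERALS:
            if text.startswith(lit, pos):
                tokens.append((f"'{lit}'", lit))
                pos += len(lit)
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
    return tokens


if __name__ == "__main__":
    for kind, value in tokenize("foo(bar+baz)"):
        print(kind, value)
```

Note that the regex as written requires at least two characters, because the second character class is followed by +. A single-letter name such as x would not match %id under this definition.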
For more information, see the dotlr library documentation.
