Writing a grammar

To build a parser for a language, you first need to define the structure of its valid expressions. You do this by writing a grammar, which describes how expressions in the language can be formed. A grammar is built from terminals, non-terminals, and regular expression (regex) tokens.

Key Components of a Grammar

Terminals:
- These are the basic symbols from which strings are formed in the language. Terminals are usually literal values like keywords, operators, or punctuation.
- In the grammar, terminals are enclosed in single quotes ('), e.g., '+', '(', and ')'.
Non-Terminals:
- These are the abstract symbols that represent structures or expressions that can be expanded into sequences of terminals and other non-terminals. Non-terminals can be seen as "rules" in the grammar.
- Non-terminals are written as bare names, without quotes or any other surrounding characters.
Regex Tokens:
- Sometimes, you need to define terminals using regular expressions to capture more flexible patterns, like identifiers or numbers.
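- A regex token is named with a % sign followed by the name of the token. Its pattern is written in standard regex notation between slashes (/), e.g., %id -> /[A-Za-z][A-Za-z0-9]+/. Note that this example pattern only matches identifiers of two or more characters, because the + requires at least one character after the leading letter.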
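
To make the difference between literal terminals and a regex token concrete, here is a minimal Python sketch of how input could be split into tokens. It is only an illustration, not how the library works internally: the tokenize helper, the TERMINALS list, and the choice to skip whitespace are assumptions made for this example. Only the terminals '+', '(', ')' and the %id pattern come from the text above.

```python
import re

# Literal terminals, written in single quotes in the grammar notation.
TERMINALS = ["+", "(", ")"]

# Regex token from the text: %id -> /[A-Za-z][A-Za-z0-9]+/
ID = re.compile(r"[A-Za-z][A-Za-z0-9]+")


def tokenize(text):
    """Split text into (kind, lexeme) pairs. Skipping whitespace is an assumption."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        # Try the literal terminals first.
        literal = next((t for t in TERMINALS if text.startswith(t, pos)), None)
        if literal is not None:
            tokens.append((literal, literal))
            pos += len(literal)
            continue
        # Otherwise the input must match the regex token.
        match = ID.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        tokens.append(("%id", match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, lexeme in tokenize("foo(bar+baz)"):
        print(kind, lexeme)
```

In this sketch a single-letter name such as x is rejected, since the %id pattern needs at least two characters.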

A grammar written with these components can recognize and parse an expression like the following:

foo(bar+baz)
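
As a sketch of what "recognize" means here, the Python example below checks input against a small grammar that accepts the expression above. The grammar is hypothetical: its rules (E and T) are not taken from this document and are only written in the notation it describes. The recognizes function is also an assumption of this example, and it uses hand-written recursive descent, which is a different technique from the table-driven LR parsing that the library's name suggests.

```python
import re

# Hypothetical grammar in the notation described above (not from the source):
GRAMMAR = """
E -> E '+' T
E -> T
T -> %id '(' E ')'
T -> %id
%id -> /[A-Za-z][A-Za-z0-9]+/
"""

# One pattern for the regex token and one for the literal terminals.
TOKEN = re.compile(r"\s*(?:(?P<id>[A-Za-z][A-Za-z0-9]+)|(?P<lit>[+()]))")


def tokenize(text):
    """Return a list of token kinds: '%id', '+', '(' or ')'."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append("%id" if match.group("id") else match.group("lit"))
        pos = match.end()
    return tokens


def recognizes(text):
    """Return True if text can be derived from E in the grammar above."""
    pos = 0
    tokens = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise ValueError(f"expected {kind!r}, found {peek()!r}")
        pos += 1

    def parse_e():
        # E -> E '+' T | T, with the left recursion unrolled into a loop.
        parse_t()
        while peek() == "+":
            expect("+")
            parse_t()

    def parse_t():
        # T -> %id '(' E ')' | %id
        expect("%id")
        if peek() == "(":
            expect("(")
            parse_e()
            expect(")")

    try:
        tokens = tokenize(text)
        parse_e()
        return pos == len(tokens)
    except ValueError:
        return False


if __name__ == "__main__":
    print(recognizes("foo(bar+baz)"))
    print(recognizes("foo(bar+"))
```

The loop in parse_e is needed because a recursive-descent parser cannot follow the left-recursive rule E -> E '+' T directly, whereas an LR parser can handle it as written.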

For more information, see the dotlr library documentation.