Writing a grammar

When building a parser for a language, you first need to define the structure of valid expressions. You do this by writing a grammar that describes how expressions in the language can be formed. A grammar is built from three kinds of components: terminals, non-terminals, and regular-expression (regex) tokens.

Key Components of a Grammar

  1. Terminals:

    • These are the basic symbols from which strings are formed in the language. Terminals are usually literal values like keywords, operators, or punctuation.
    • In the grammar, terminals are enclosed in single quotes ('), e.g., '+', '(', and ')'.
  2. Non-Terminals:

    • These are the abstract symbols that represent structures or expressions that can be expanded into sequences of terminals and other non-terminals. Non-terminals can be seen as "rules" in the grammar.
    • Non-terminals are written as bare names, with no surrounding characters.
  3. Regex Tokens:

    • Sometimes, you need to define terminals using regular expressions to capture more flexible patterns, like identifiers or numbers.
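    • In this notation, a regex token is named with a % sign followed by the token name, and its pattern is written in standard regex notation between slashes (/), e.g., %id -> /[A-Za-z][A-Za-z0-9]+/. Note that this particular pattern matches a letter followed by one or more letters or digits, so it only accepts identifiers of at least two characters.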
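To make the three components concrete, here is a minimal Python sketch that reads a small grammar and sorts its symbols into terminals, non-terminals, and regex tokens. The grammar itself is an assumption made for illustration: the rule names P, E, and T, the rules, and the use of -> in ordinary rules are not taken from the text above, which only shows -> in the %id definition. The classify helper is likewise hypothetical and is not part of any library.

```python
# Hypothetical grammar in the notation described above.
# The rule names (P, E, T) and the rules themselves are assumptions.
GRAMMAR = """
P -> E
E -> E '+' T
E -> T
T -> %id '(' E ')'
T -> %id
%id -> /[A-Za-z][A-Za-z0-9]+/
"""


def classify(grammar):
    """Sort the symbols of a grammar into terminals, non-terminals, and regex tokens."""
    terminals, non_terminals, regex_tokens = set(), set(), {}
    for line in grammar.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines between rules
        head, body = (part.strip() for part in line.split("->", 1))
        if head.startswith("%"):
            # Regex token definition: drop the surrounding slashes.
            regex_tokens[head] = body[1:-1]
            continue
        non_terminals.add(head)
        for symbol in body.split():
            if symbol.startswith("'") and symbol.endswith("'"):
                terminals.add(symbol[1:-1])  # quoted literal, e.g. '+'
            elif not symbol.startswith("%"):
                non_terminals.add(symbol)    # bare name, e.g. E
    return terminals, non_terminals, regex_tokens


terminals, non_terminals, regex_tokens = classify(GRAMMAR)
```

The sketch relies only on the surface conventions listed above: single quotes mark terminals, a leading % marks a regex token, and anything else is a non-terminal.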

A grammar built from these components can recognize and parse an expression such as:

foo(bar+baz)
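The following Python sketch shows one way such an expression could be recognized. It is not how an LR parser generator works internally: it uses a hand-written recursive-descent recognizer, and the rules it follows (an expression is one or more terms joined by '+', and a term is an identifier optionally followed by a parenthesized expression) are an assumed grammar chosen for illustration. Only the %id regex and the terminals '+', '(', and ')' come from the text above; the function names are hypothetical.

```python
import re

ID = re.compile(r"[A-Za-z][A-Za-z0-9]+")  # the %id regex from the text
LITERALS = ("+", "(", ")")                 # the quoted terminals from the text


def tokenize(text):
    """Split the input into (kind, value) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = ID.match(text, pos)
        if match:
            tokens.append(("id", match.group()))
            pos = match.end()
        elif text[pos] in LITERALS:
            tokens.append((text[pos], text[pos]))
            pos += 1
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens


def recognizes(text):
    """Return True if the input matches the assumed grammar."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {peek()!r}")
        pos += 1

    def term():
        # term: id, optionally followed by '(' expr ')'
        expect("id")
        if peek() == "(":
            expect("(")
            expr()
            expect(")")

    def expr():
        # expr: term ('+' term)*
        term()
        while peek() == "+":
            expect("+")
            term()

    try:
        expr()
    except SyntaxError:
        return False
    return pos == len(tokens)  # all tokens must be consumed
```

Calling recognizes("foo(bar+baz)") accepts the expression, while an input such as "x" is rejected because the %id pattern requires at least two characters.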

For more information, see the dotlr library documentation.