How to Use the syntax-cli Parser Generator

Published: Monday, March 13, 2023
Updated: Wednesday, March 15, 2023

Greetings, friends! This is my tenth post in the math expression parser series. In this tutorial, we'll learn how to use the Syntax parser generator, created by Dmitry Soshnikov, to make our own LR parsers! Let's get started!

Installing the Syntax Tool

The Syntax tool is an amazing parser generator toolkit that operates through a command-line interface (CLI).

The Syntax tool is known as syntax-cli on npm. Let's install this tool globally on our computer, since this software is meant to be used as a CLI tool.

shell
npm i -g syntax-cli

Next, try running the following command to make sure the program was installed correctly.

shell
syntax-cli --help

Parsing Modes

The Syntax tool implements the following parsing modes:

  • ll1
  • lr0
  • slr1
  • lalr1
  • clr1

In this tutorial, we'll use the lalr1 parsing mode. As mentioned in the previous tutorial, LALR(1) parsers are suitable for a good portion of grammars. We're only building a calculator after all!

How to Use the Syntax Tool

The best way to understand the Syntax tool is to use it right away. Let's write a small grammar file. Create a new file called pizza.g with the following contents:

text
%%

E
  : 'pizza'
  ;

Then, we'll run the following command in the same directory as the pizza.g file.

shell
syntax-cli --grammar pizza.g --mode lalr1 --parse 'pizza'

Alternatively, we can use this shorter command:

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza'

After running the command, you should see the following output:

text
Parsing mode: LALR1_BY_SLR(1).

Parsing:

pizza

✓ Accepted

Parsed value: 

pizza 

Delivery! The pizza has been accepted! 🍕

Let me explain what just happened. We created a grammar file called pizza.g. The .g extension doesn't really matter. We could have named the file pizza.grammar or pizza.party. What's important is the contents of the file. The grammar must adhere to a specific syntax (no pun intended) that is accepted by the Syntax tool.

The grammar we chose is very simple.

text
%%

E
  : 'pizza'
  ;

It starts with the letter E, which can be derived into only one symbol, pizza. The letter E is arbitrary. It just serves as a symbol that can be used throughout the rest of the grammar. Just think of E as an expression. It's easier to write E instead of "expression" over and over. The : is like saying "can be derived into". The ; symbol indicates the end of that specific grammar "rule".

Let's look at another example. Replace the contents of pizza.g with the following contents:

text
%%

E
  : 'pizza'
  | 'donut'
  ;

Now we can run the Syntax tool again:

shell
syntax-cli -g pizza.g -m lalr1 -p 'donut'

This should result in an accepted parse. We can still run the original command, and the parse should still work.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza'

The | symbol is like saying "or". Therefore, the current grammar can be read as E can be derived into pizza or donut.

Now, let's try something different. If we try to parse something like pizzadonut, it will fail.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizzadonut'

You'll probably see an error like this:

text
SyntaxError: 

pizzadonut
     ^
Unexpected token: "donut" at 1:5.

There's an unexpected token on line 1, position 5, which is the start of the donut token. Our grammar doesn't support pizza and donut appearing together in the same input. It only supports one of them at a time. Therefore, we need to modify our grammar so that we can have one token followed by another.

text
%%

E
  : E 'pizza'
  | E 'donut'
  | 'pizza'
  | 'donut'
  ;

The following commands should pass:

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza'

syntax-cli -g pizza.g -m lalr1 -p 'donut'

syntax-cli -g pizza.g -m lalr1 -p 'pizzapizza'

syntax-cli -g pizza.g -m lalr1 -p 'pizzadonut'

syntax-cli -g pizza.g -m lalr1 -p 'donutpizza'

syntax-cli -g pizza.g -m lalr1 -p 'pizzadonutpizza'

Notice how we're using E 'pizza' and E 'donut' in our grammar. The E before the : is often referred to as a nonterminal symbol. The pizza symbol is a terminal symbol because it is a fundamental token. That is, we can't derive it down to anything further.

You can think of the parser as recursively going through our grammar and figuring out what the nonterminal, E, should derive to. It should be noted that the parser is not trying each option in order. The LR parser created by the Syntax tool is smart enough to look up states in a parser table to determine whether E should become pizza or donut depending on the input to the parser, e.g. pizzadonutpizza. The LALR(1) parser looks ahead one token and consults a parsing table to determine how E gets resolved.

For the input, pizzadonutpizza, the parser will try to resolve an expression similar to the following steps:

text
1. E → E 'pizza'
2.   → E 'donut' 'pizza'
3.   → 'pizza' 'donut' 'pizza'

4. Accept 'pizzadonutpizza'

Essentially, E recursively gets resolved until we're left with pizzadonutpizza 🍕🍩🍕

Parsing Tables

The Syntax tool actually provides a way to look at parsing tables using the --table or -t option. Suppose we had the following grammar:

text
%%

E
  : E 'pizza'
  | E 'donut'
  | 'pizza'
  | 'donut'
  ;

We can look at the parsing tables that the parser generator constructs for this particular grammar using the following command.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza' -t

When we run the above command, we'll be able to inspect the parser table that is constructed automatically by the parser generator. The Syntax tool also helps us see a list of each "rule" we included in our grammar. This information is useful for debugging.

text
Grammar:

    0. $accept -> E
    ----------------
    1. E -> E 'pizza'
    2.    | E 'donut'
    3.    | 'pizza'
    4.    | 'donut'

LALR1_BY_SLR(1) parsing table:

┌───┬─────────┬─────────┬─────┬───┐
│   │ 'pizza' │ 'donut' │ $   │ E │
├───┼─────────┼─────────┼─────┼───┤
│ 0 │ s2      │ s3      │     │ 1 │
├───┼─────────┼─────────┼─────┼───┤
│ 1 │ s4      │ s5      │ acc │   │
├───┼─────────┼─────────┼─────┼───┤
│ 2 │ r3      │ r3      │ r3  │   │
├───┼─────────┼─────────┼─────┼───┤
│ 3 │ r4      │ r4      │ r4  │   │
├───┼─────────┼─────────┼─────┼───┤
│ 4 │ r1      │ r1      │ r1  │   │
├───┼─────────┼─────────┼─────┼───┤
│ 5 │ r2      │ r2      │ r2  │   │
└───┴─────────┴─────────┴─────┴───┘

The table might seem confusing if you're not familiar with parser theory. Creating a parsing table is outside the scope of this tutorial series, but it's important to know that these tables are what the parser uses to determine its next course of action. It's kinda like how people follow a list of instructions on a GPS.

The GPS already knows all the possible ways to get to a destination. Once we complete one instruction, we move onto the next instruction, and keep repeating that until we reach our destination. Similarly, our parser will keep scanning text and recognize that the next token should be pizza or donut.

In the table above, the s in s2 refers to shift. The r in r3 refers to reduce. Shift and reduce actions are operations that the parser performs internally as it scans input.

The numbers next to s and r represent all the various states the parser can be in depending on what tokens it scans from a given input. The algorithms used for building parsing tables are a bit complicated, but the Syntax tool takes care of all the hard work for us.

In our parser, a reduce action occurs when we resolve E, the nonterminal at the start of a grammar rule (before the : symbol in the grammar). A shift action occurs when the parser has resolved a terminal (such as pizza or donut) or a nonterminal that appears after the : symbol. For example, a shift action will occur for E 'pizza' for both E and pizza. A reduce action will occur for the E at the start of the grammar rule. I'm talking about the very top E in the grammar.

text
%%

E
  : E 'pizza'
  | E 'donut'
  | 'pizza'
  | 'donut'
  ;

The $ symbol in the table refers to the end of the input string. The acc symbol refers to "accept" as in "accept the parse" if we've scanned the entire input and there were no unexpected tokens or conflicts.
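
To make the shift and reduce actions a bit more concrete, here's a tiny hand-written driver for the table above (my own simplified sketch, not the code that syntax-cli generates):

js
// A toy LR driver for the pizza/donut table above.
// Productions: 1. E -> E 'pizza'   2. E -> E 'donut'   3. E -> 'pizza'   4. E -> 'donut'
const actions = {
  0: { pizza: 's2', donut: 's3' },
  1: { pizza: 's4', donut: 's5', $: 'acc' },
  2: { pizza: 'r3', donut: 'r3', $: 'r3' },
  3: { pizza: 'r4', donut: 'r4', $: 'r4' },
  4: { pizza: 'r1', donut: 'r1', $: 'r1' },
  5: { pizza: 'r2', donut: 'r2', $: 'r2' },
};
const gotoE = { 0: 1 };                          // the E column of the table
const rhsLength = { 1: 2, 2: 2, 3: 1, 4: 1 };    // symbols on the right side of each production

const parse = (tokens) => {
  const input = [...tokens, '$'];                // $ marks the end of the input
  const stack = [0];                             // stack of states, starting at state 0
  let pos = 0;
  while (true) {
    const state = stack[stack.length - 1];
    const action = actions[state][input[pos]];
    if (!action) throw new SyntaxError(`Unexpected token: ${input[pos]}`);
    if (action === 'acc') return true;           // ✓ Accepted
    if (action[0] === 's') {                     // shift: consume the token, push the new state
      stack.push(Number(action.slice(1)));
      pos += 1;
    } else {                                     // reduce: pop the rule's right-hand side,
      stack.length -= rhsLength[action.slice(1)];//   then follow the E column (goto)
      stack.push(gotoE[stack[stack.length - 1]]);
    }
  }
};

console.log(parse(['pizza', 'donut', 'pizza'])); // true

The real generated parser also carries semantic values and source locations along with the states, but the shift/reduce/accept loop is the same basic idea.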

Adding Tokens and Handling Whitespaces

So far, we've only made grammars that don't contain any whitespace. If we try running the following command, it will not work.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza donut'

When running the above command, we'll end up with the following error.

text
SyntaxError: 

pizza donut
     ^
Unexpected token: " " at 1:5.

There's an unexpected token on line 1, position 5, which is where a whitespace character is located. The parser doesn't understand whitespace because our grammar doesn't allow it. To skip whitespace, we can pass the --ignore-whitespaces or -w option.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza donut' -w

However, this is like sweeping dust under the rug 🧹. It's good for quickly testing out grammars, but when it's time to build a real parser, it would be better to include rules about whitespaces directly in our grammar.

text
%lex

%%

\s+             /* skip whitespace */

/lex

%%

E
  : E 'pizza'
  | E 'donut'
  | 'pizza'
  | 'donut'
  ;

Notice the new syntax at the top (no pun intended again). There are two markers, %lex and /lex, that signify a "lexer" section. This is a special section we can include at the top of our grammar file for creating tokens automatically. That's right! No need to build a tokenizer by hand! The parser generator uses regular expressions to define tokens.

If we don't include anything on the same line as a regular expression, then the parser will ignore that token automatically. The /* skip whitespace */ line is actually a comment. Comments we insert in our grammar file are ignored by the parser generator. Effectively, the parser generator sees nothing after \s+. That means we skip whitespaces!
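
To build some intuition for what the generated tokenizer does with these rules, here's a tiny hand-rolled sketch (my own simplification, not syntax-cli's actual lexer). A rule whose action produces no token type simply throws the matched text away:

js
// Toy tokenizer: try each rule at the current position; a rule whose
// action returns null (like the whitespace rule) produces no token.
const rules = [
  [/^\s+/, () => null],            // skip whitespace
  [/^pizza/, () => "'pizza'"],
  [/^donut/, () => "'donut'"],
];

const tokenize = (input) => {
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    const rule = rules.find(([regex]) => regex.test(rest));
    if (!rule) throw new SyntaxError(`Unexpected character: ${rest[0]}`);
    const [regex, action] = rule;
    const value = rest.match(regex)[0];
    const type = action(value);
    if (type !== null) tokens.push({ type, value });  // null means "ignore this match"
    rest = rest.slice(value.length);
  }
  return tokens;
};

console.log(tokenize('pizza donut'));
// [ { type: "'pizza'", value: 'pizza' }, { type: "'donut'", value: 'donut' } ]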

Let's try running our grammar again without the -w flag.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza donut'

Success! It works!

Adding Numbers

We can use the "lex" (or "lexical") section at the beginning of the grammar file to add a NUMBER token. Then, we can access the token in the "bnf" (or "syntactic grammar") section at the bottom of the grammar file.

text
%lex

%%

\s+             /* skip whitespace */
\d+             return 'NUMBER'

/lex

%%

E
  : E 'pizza'
  | E 'donut'
  | 'pizza'
  | 'donut'
  | NUMBER
  ;

Notice how we use the regular expression \d+ to capture numbers that are one or more digits long. Now, we can parse numbers!

shell
syntax-cli -g pizza.g -m lalr1 -p '1 pizza'

Semantic Actions

So far, our parser doesn't really do anything. It just displays ✓ Accepted in the terminal when a parse is successful or throws an error when we do something wrong. Inside our grammar file, we can add "semantic actions" to perform a particular action when a reduction happens. As mentioned earlier, a "reduce action" (or "reduction") occurs when we successfully make a derivation. The nonterminal, E, can derive to the following:

text
E
  : E 'pizza'
  | E 'donut'
  | 'pizza'
  | 'donut'
  | NUMBER
  ;

We would like to attach a custom value to a rule so that we can see it in the terminal when a reduction happens. Let's see how we can do this using our current grammar.

text
%lex

%%

\s+             /* skip whitespace */
\d+             return 'NUMBER'

/lex

%%

E
  : E 'pizza'
  | E 'donut'
  | 'pizza'
  | 'donut'
  | NUMBER { $$ = 'My cool value: ' + Number($1) }
  ;

Next, run the following command:

shell
syntax-cli -g pizza.g -m lalr1 -p '5'

We should see the following output in our terminal:

text
Parsing mode: LALR1_BY_SLR(1).

Parsing:

5

✓ Accepted

Parsed value: 

My cool value: 5 

Notice what happened? We were able to add a cool message that displayed the value of the number we entered as input.

A semantic action is a block of code surrounded by curly braces { }. Inside the semantic action, we can access the resolved values of an expression. When E resolves to a NUMBER, it triggers the semantic action.

According to the documentation, the following notation is used for semantic action arguments:

  • yytext -- a matched token value
  • yyleng -- the length of the matched token
  • Positioned arguments, i.e. $1, $2, etc.
  • Positioned locations, i.e. @1, @2, etc.
  • Named arguments, i.e. $foo, $bar, etc.
  • Named locations, i.e. @foo, @bar, etc.
  • $$ -- result value
  • @$ -- result location

Most of the time, we'll be dealing with the result value, $$, and positioned arguments i.e. $1, $2, etc.

If we look at the grammar, we can see that we're using $1 inside a Number function.

text
E
  : E 'pizza'
  | E 'donut'
  | 'pizza'
  | 'donut'
  | NUMBER { $$ = 'My cool value: ' + Number($1) }
  ;

The code after the $$ = is actually valid JavaScript code! Think of the parser generator doing something like this:

js
const parse = () => {
  const $1 = 5;
  const $$ = 'My cool value: ' + Number($1);

  return $$;
}

console.log(parse());

The parser generator sees 'My cool value: ' + Number($1), but it will replace $1 with an actual value before evaluating the JavaScript code. The value, 5, comes from the value we passed inside our command after the -p option.

Let's add some more semantic actions to our grammar file and change the semantic action for NUMBER.

text
Copied! ⭐️
%lex

%%

\s+             /* skip whitespace */
\d+             return 'NUMBER'

/lex

%%

E
  : E 'pizza' { $$ = $1 + ' ' + $2 + ' 🍕' }
  | E 'donut' { $$ = $1 + ' ' + $2 + ' 🍩'}
  | 'pizza' { $$ = $1 + ' 🍕'}
  | 'donut' { $$ = $1 + ' 🍩' }
  | NUMBER { $$ = Number($1) }
  ;

Try running multiple commands to see how the output behaves:

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza pizza'
# Parsed value: pizza 🍕 pizza 🍕 

syntax-cli -g pizza.g -m lalr1 -p 'pizza donut'
# Parsed value: pizza 🍕 donut 🍩 

syntax-cli -g pizza.g -m lalr1 -p '1 pizza'
# Parsed value: 1 pizza 🍕

Let's look at the first semantic action.

text
E
  : E 'pizza' { $$ = $1 + ' ' + $2 + ' 🍕' }

We'll say that we entered 1 pizza as the input.

The $$ is the result that will be returned by the parser. When running the command in the terminal, it'll simply display the result underneath "Parsed value:". The $1 variable stores information about the first symbol after the :. In this case, the E after the : is the first symbol. Since the input to the Syntax CLI tool is 1 pizza, we know that E will derive to E 'pizza'. Therefore, $1 will equal the number, 1, and $2 will equal 'pizza'.

The final result, $$, of the semantic action is then:

text
$$ = $1 + ' ' + $2 + ' 🍕'
   = 1 + ' ' + 'pizza' + ' 🍕'
   = '1 pizza 🍕'

Remember, everything inside the curly braces is valid JavaScript, except that $$, $1, and $2 are replaced before the JavaScript code is evaluated.
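
To spell out the order in which those reductions happen for the input 1 pizza, the JavaScript that effectively runs looks something like this (again, just an illustration of the idea):

js
// E -> NUMBER reduces first, producing the value of the inner E.
const innerE = Number('1');                     // $$ = Number($1)            -> 1
// E -> E 'pizza' reduces next; the inner E's value becomes its $1.
const result = innerE + ' ' + 'pizza' + ' 🍕';  // $$ = $1 + ' ' + $2 + ' 🍕' -> '1 pizza 🍕'

console.log(result); // 1 pizza 🍕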

Parser Generation

We've been using the command line to output the results of a parse using the Syntax CLI tool. However, the main purpose of this tool is to generate parsers that can be used in JavaScript. We can use the --output or -o option to generate a file that contains a parser.

shell
syntax-cli -g pizza.g -m lalr1 -o parser.js

When we run this command, a file called parser.js will be generated. This file contains all the code necessary to parse an input. By default, the Syntax tool targets JavaScript, but it can target a different language based on the file extension of the output file we pass after the -o option.
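
For instance, asking for a .py file should produce a Python parser instead. I'll hedge here since we won't test other targets in this tutorial, and the semantic actions in the grammar would also need to be written in the target language for the generated parser to make sense.

shell
syntax-cli -g pizza.g -m lalr1 -o parser.py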

If you scroll to the very bottom of the parser.js file, you'll notice that it exports a single variable using the CommonJS module format, which means we'll have to convert it to ECMAScript module syntax if we want to use the parser in the browser. For now, we can test to make sure the parser works using Node.js.

Create a new file called example.js with the following contents:

js
const parser = require('./parser.js');

const result = parser.parse('pizza donut');

console.log(result); // pizza 🍕 donut 🍩

Run the script using node example or node example.js. If everything went well, then you should see a delicious result! The text you see logged to the console should be similar to running the following command:

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza donut'

We now have a working LR parser built from our own grammar!

Supported Grammar Formats

The Syntax tool supports two different grammar formats: JSON-like notation and Yacc/Bison-style notation. Up until now, we've been using the Yacc/Bison-style notation, which is modeled on grammar files written for the Bison parser generator. Both notations accomplish the same thing. We can also write multiline JavaScript comments such as /* ... */ inside the grammar file in both formats.

Here is our grammar in Yacc/Bison-style notation:

text
/**
 * This is a comment.
 */

%lex

%%

\s+             /* skip whitespace */
\d+             return 'NUMBER'

/lex

%%

E
  : E 'pizza' { $$ = $1 + ' ' + $2 + ' 🍕' }
  | E 'donut' { $$ = $1 + ' ' + $2 + ' 🍩'}
  | 'pizza' { $$ = $1 + ' 🍕'}
  | 'donut' { $$ = $1 + ' 🍩' }
  | NUMBER { $$ = Number($1) }
  ;

Below is the same grammar, but it's written in JSON-like notation instead.

text
/**
 * This is a comment.
 */

{
  lex: {
    rules: [
      ["\\s+",        "/* skip whitespace */"],
      ["\\d+",        "return 'NUMBER'"],
    ],
  },

  bnf: {
    E: [
      ["E 'pizza'",   "$$ = $1 + ' ' + $2 + ' 🍕'"],
      ["E 'donut'",   "$$ = $1 + ' ' + $2 + ' 🍩'"],
      ["'pizza'",   "$$ = $1 + ' 🍕'"],
      ["'donut'",   "$$ = $1 + ' 🍩'"],
      ["NUMBER",   "$$ = Number($1)"]
    ],
  }
}

We can also use backticks instead of single or double quotes in JSON-like notation:

text
/**
 * This is a comment.
 */

{
  lex: {
    rules: [
      [`\\s+`,        `/* skip whitespace */`],
      [`\\d+`,        `return 'NUMBER'`],
    ],
  },

  bnf: {
    E: [
      [`E 'pizza'`,   `$$ = $1 + ' ' + $2 + ' 🍕'`],
      [`E 'donut'`,   `$$ = $1 + ' ' + $2 + ' 🍩'`],
      [`'pizza'`,   `$$ = $1 + ' 🍕'`],
      [`'donut'`,   `$$ = $1 + ' 🍩'`],
      [`NUMBER`,   `$$ = Number($1)`]
    ],
  }
}

Whichever notation you choose, you can run the same command to parse an input using the grammar file. The Syntax tool is intelligent enough to automatically figure out which notation you used to write the grammar file.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza donut'
# Parsed value: pizza 🍕 donut 🍩

I prefer the Yacc/Bison-style notation because it's easier to read without all the quotes and backticks in the way. However, the JSON-like notation has the advantage of clearly specifying the type of grammar property such as lex, bnf, or operators.
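
For example, if I remember the calculator example from the Syntax repository correctly, the operators property declares operator precedence and associativity, which resolves the shift/reduce conflicts that ambiguous rules like E + E would otherwise create. Treat the exact shape of this snippet as an assumption based on those examples rather than something verified in this tutorial:

text
{
  lex: {
    rules: [
      ["\\s+",   "/* skip whitespace */"],
      ["\\d+",   "return 'NUMBER'"],
      ["\\+",    "return '+'"],
      ["\\*",    "return '*'"],
    ],
  },

  operators: [
    ["left", "+"],
    ["left", "*"],
  ],

  bnf: {
    E: [
      ["E + E",   "$$ = $1 + $3"],
      ["E * E",   "$$ = $1 * $3"],
      ["NUMBER",  "$$ = Number($1)"],
    ],
  }
}

Operator precedence will matter a lot more once we start building the math expression parser.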

Debugging Parsers

The Syntax tool provides a few ways to debug issues when creating parsers. A helpful tool is the --tokenize option we can pass to the Syntax CLI in order to extract tokens.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza donut' --tokenize

After running the above command, we should see a list of tokens for the input, pizza donut, as well as some information about where the tokens occur in the input.

text
[
  {
    "type": "'pizza'",
    "value": "pizza",
    "startOffset": 0,
    "endOffset": 5,
    "startLine": 1,
    "endLine": 1,
    "startColumn": 0,
    "endColumn": 5
  },
  {
    "type": "'donut'",
    "value": "donut",
    "startOffset": 6,
    "endOffset": 11,
    "startLine": 1,
    "endLine": 1,
    "startColumn": 6,
    "endColumn": 11
  }
] 

If we want to ignore all the extra information that the Syntax tool logs to the terminal, we can use the --tokenizer-only option.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza donut' --tokenize --tokenizer-only

Another helpful tool is the --validate option. This option tells the Syntax CLI tool to check whether our grammar is free from conflicts. It'll also suggest possible solutions for resolving any conflicts it finds.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza donut' --validate

After running the above command, we should see the following when the grammar is free from any conflicts.

text
Parsing mode: LALR1_BY_SLR(1).

✓ Grammar doesn't have any conflicts!


Parsing:

pizza donut

✓ Accepted

Parsed value: 

pizza 🍕 donut 🍩 

For more advanced debugging, we can use the --collection or -c option to generate and output canonical collections of LR items. If you're confused by what I just said, I don't blame you. The internal workings of an LR parser are quite complicated.

Basically, the Syntax tool implements algorithms for designing LR parsers, and part of the algorithm for creating an LR parser involves generating "canonical collections". The parser generator uses canonical collections to figure out all the possible states and the transitions between them. This information is used to detect conflicts and to create the parsing tables that the parser uses when scanning input.

Let's run a command to obtain the canonical collections for an LR parser.

shell
syntax-cli -g pizza.g -m lalr1 -p '1 pizza donut' -c

After running the command, we should see the following output.

text
Parsing mode: LALR1_BY_SLR(1).

Canonical collection of LR items:

Grammar:

    0. $accept -> E
    ----------------
    1. E -> E 'pizza'
    2.    | E 'donut'
    3.    | 'pizza'
    4.    | 'donut'
    5.    | NUMBER

State 0:
  - $accept -> • E (kernel, goes to state 1)
  - E -> • E 'pizza' (goes to state 1)
  - E -> • E 'donut' (goes to state 1)
  - E -> • 'pizza' (shift, goes to state 2)
  - E -> • 'donut' (shift, goes to state 3)
  - E -> • NUMBER (shift, goes to state 4)

State 1:
  - $accept -> E • (kernel, accept)
  - E -> E • 'pizza' (kernel, shift, goes to state 5)
  - E -> E • 'donut' (kernel, shift, goes to state 6)

State 2: (final)
  - E -> 'pizza' • (kernel, reduce by production 3)

State 3: (final)
  - E -> 'donut' • (kernel, reduce by production 4)

State 4: (final)
  - E -> NUMBER • (kernel, reduce by production 5)

State 5: (final)
  - E -> E 'pizza' • (kernel, reduce by production 1)

State 6: (final)
  - E -> E 'donut' • (kernel, reduce by production 2)

Parsing:

1 pizza donut

✓ Accepted

Parsed value: 

1 pizza 🍕 donut 🍩 

All the little dots represent a possible position in an expression. The parser is thinking about all the possible derivations and how it can transition to a different state. Everything begins at State 0. The parser will either perform a "shift" or "reduce" action until it reaches one of the possible final states.

If we want more information about the parsing tables that are created by the parser generator, we can use the --table or -t option we mentioned earlier in this tutorial.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza donut' -t

The output will contain a variety of information such as a numbered list of all the production rules of our grammar and a parsing table containing all the various states our parser can transition to.

text
Parsing mode: LALR1_BY_SLR(1).

Grammar:

    0. $accept -> E
    ----------------
    1. E -> E 'pizza'
    2.    | E 'donut'
    3.    | 'pizza'
    4.    | 'donut'
    5.    | NUMBER

LALR1_BY_SLR(1) parsing table:

┌───┬─────────┬─────────┬────────┬─────┬───┐
│   │ 'pizza' │ 'donut' │ NUMBER │ $   │ E │
├───┼─────────┼─────────┼────────┼─────┼───┤
│ 0 │ s2      │ s3      │ s4     │     │ 1 │
├───┼─────────┼─────────┼────────┼─────┼───┤
│ 1 │ s5      │ s6      │        │ acc │   │
├───┼─────────┼─────────┼────────┼─────┼───┤
│ 2 │ r3      │ r3      │        │ r3  │   │
├───┼─────────┼─────────┼────────┼─────┼───┤
│ 3 │ r4      │ r4      │        │ r4  │   │
├───┼─────────┼─────────┼────────┼─────┼───┤
│ 4 │ r5      │ r5      │        │ r5  │   │
├───┼─────────┼─────────┼────────┼─────┼───┤
│ 5 │ r1      │ r1      │        │ r1  │   │
├───┼─────────┼─────────┼────────┼─────┼───┤
│ 6 │ r2      │ r2      │        │ r2  │   │
└───┴─────────┴─────────┴────────┴─────┴───┘


Parsing:

1 pizza donut

✓ Accepted

Parsed value: 

1 pizza 🍕 donut 🍩 

Do you see all the numbers in the parsing table next to s (shift) and r (reduce)? They represent the state transitions we saw in the canonical collections.
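
As a rough sanity check, here's how I'd trace the input 1 pizza donut through the table above by hand (an s entry shifts and pushes the numbered state; an r entry pops the rule's right-hand side and then follows the E column):

text
State 0, next token NUMBER   → s4   (shift NUMBER, go to state 4)
State 4, next token 'pizza'  → r5   (reduce E → NUMBER, the E column sends us to state 1)
State 1, next token 'pizza'  → s5   (shift 'pizza', go to state 5)
State 5, next token 'donut'  → r1   (reduce E → E 'pizza', back to state 1)
State 1, next token 'donut'  → s6   (shift 'donut', go to state 6)
State 6, next token $        → r2   (reduce E → E 'donut', back to state 1)
State 1, next token $        → acc  (accept)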

Have I confused you yet? It's okay to not understand things like canonical collections and parsing tables. I still can't fully explain them myself 😅

Wikipedia has a good discussion on LR parser theory. After reading the Wikipedia article a few times, things start to make more sense. Don't worry though, because my tutorial series can be completed without much knowledge of the inner workings of LR parsers. We'll focus on building grammars and using the Syntax tool to handle the complicated stuff for us.

The Syntax tool also provides a special debug mode. We can activate debug mode by using the --debug or -d option.

shell
syntax-cli -g pizza.g -m lalr1 -p 'pizza donut' -d

The above command outputs diagnostics such as the timing of certain steps during the parser generation process, as well as other information.

text
DEBUG mode is: ON

[DEBUG] Grammar (bnf) is in JS format
[DEBUG] Grammar loaded in: 0.906ms

Parsing mode: LALR1_BY_SLR(1).

Parsing:

pizza donut

[DEBUG] Building canonical collection: 1.318ms
[DEBUG] Number of states in the collection: 7
[DEBUG] LALR-by-SLR: Building extended grammar for LALR: 0.934ms
[DEBUG] Building Follow sets: 0.545ms
[DEBUG] LALR-by-SLR: Group extended productions by final sets: 0.279ms
[DEBUG] LALR-by-SLR: Updating item reduce sets: 0.134ms
[DEBUG] Building LALR-by-SLR: 2.584ms
[DEBUG] Building LR parsing table: 0.342ms
[DEBUG] LR parsing: 0.803ms
✓ Accepted

Parsed value: 

pizza 🍕 donut 🍩 

For more information on debugging grammars, parsers, or the Syntax tool, please consult the official documentation or run syntax-cli --help in the terminal to see what other options may be of use to you.

Conclusion

This concludes the introduction to the Syntax parser generator tool. The Syntax tool is an amazing piece of software because it can help us understand various parser algorithms, give helpful error messages, and offer lots of convenient features when building grammars.

The Syntax tool can also build parsers for multiple languages, not just JavaScript! In the next tutorial, we'll start building a math expression parser instead of a pizza-donut parser 🍕🍩. See you there!