# How to Build a PEG-based Parser for Math Expressions with Peggy

Greetings, friends! This is my thirteenth post in the math expression parser series. In this tutorial, we'll learn how to use the Peggy parser generator to build a math expression using parsing expression grammars. Let's get started!

## What is Peggy?

Peggy is a parser generator that supports parsing expression grammars. It is a modern fork of the PEG.js library. Since PEG.js is no longer being actively maintained, a group of developers decided to make Peggy to support newer JavaScript features.

Peggy has been used by a variety of tools including Kibana, a data visualization dashboard software for Elasticsearch, a powerful search engine tool.

The predecessor to Peggy, PEG.js, has been used in a variety of tools as well including Markdoc, a markdown-based syntax and toolchain for creating custom documentation sites.

For the most part, grammars made in PEG.js should still be backwards compatible with Peggy. Searching for questions related to PEG.js is still helpful for solving issues related to building grammars for Peggy. There might be a few syntax changes we may need to make when going between PEG.js and Peggy.

In this tutorial, we'll develop our own parsing expression grammar so that Peggy can generate a math expression parser for us.

## Installing Peggy

Let's install the Peggy parser generator globally on our computer, since this software is meant to be used as a CLI tool.

```
npm i -g peggy
```

Next, try running the following command to make sure the program was installed correctly.

```
peggy --help
```

The `--help`

option will display a list of all the other options we can pass to the Peggy CLI tool. We'll use the `-t`

option throughout this tutorial to test various inputs with a generated parser.

## Online Peggy Playground

In this tutorial, we'll be using the Peggy npm package we just installed, but I'd like to mention another tool in case you find it helpful. The online version of the Peggy tool can be used to test out grammars quickly using a user interface (UI). This tool also lets us download a parser, generated from Peggy, using the grammar we provide in the UI.

## Context-free Grammars

In the previous two tutorials, we've been creating grammars that are similar to Context-free grammars (CFG). Most parser generators that support CFG-like syntax, including the Syntax tool, provide some kind of mechanism for storing context in the grammar, making them context-sensitive. Technically, we could have created parsers that tokenized values depending on context.

There are multiple programming languages that can't be considered context-free due to how complicated their syntax is. The parser might rely on context as it scans through a file coded in some kind of programming language. The JavaScript language is a good example.

In JavaScript, the `+`

symbol has multiple meanings depending on how many occur in an expression and whether it appears next to an identifier. For instance, the prefix `++id`

, postfix `id++`

, and addition `1 + 2`

operations all use the `+`

symbol. What would happen if we used them all together?

```
let num = 0;
console.log(++num+num+++1); // 3
console.log(num); // 2
```

Looks weird, huh? The parser has to be able to handle tricky situations such as this. The `+`

symbol is ambiguous until we have enough information before and after it. Using context is one solution to handle this "ambiguity" problem. However, clever computer scientists came up with the notion of parsing expression grammars to help remove ambiguities.

## Parsing Expression Grammars

When developing production rules in our grammars, we've been using a notation similar to Backus–Naur form. Below is a small grammar we fed into a LR parser in Part 11 using the Syntax tool.

```
%lex
%%
\s+ /* skip whitespace */
\d+ return 'NUMBER'
/lex
%left '+'
%left '*'
%%
e
: e '+' e { $$ = $1 + $3 }
| e '*' e { $$ = $1 * $3 }
| NUMBER { $$ = Number($1) }
;
```

We saw that if we didn't add special `%left`

modifiers, our parser would complain about a shift-reduce conflict caused by an ambiguity. For the input, `1 + 2 * 3`

, the parser didn't know whether it should perform addition or resolve an expression as a number.

In parsing expression grammars, most ambiguity is resolved automatically by how we write the grammar. In the Peggy tool, we need to learn a completely new notation for creating grammar files. In parsing expression grammars, we keep trying production rules *in order* and stop when a rule is matched. This contrasts with the BNF-like notation used in the grammars we wrote in previous tutorials.

Let's create a grammar for both the Syntax tool and Peggy to see how they differ. First, create a small grammar file called `pizza.g`

that can be used with the Syntax tool.

```
%%
food
: 'pizza'
| 'pizza' 'donut'
;
```

The above grammar can accept either `pizza`

or `pizzadonut`

(without whitespaces). We can use the Syntax tool to prove these inputs are valid.

```
syntax-cli -g pizza.g -m lalr1 -p 'pizza'
# Parsed value: pizza
syntax-cli -g pizza.g -m lalr1 -p 'pizzadonut'
# Parsed value: pizzadonut
```

Here's a parsing expression grammar (PEG) that can be accepted by Peggy.

```
Food
= 'pizza'
/ 'pizza' 'donut'
```

Now, we'll run a few tests in Peggy to see if the same inputs are valid.

```
peggy pizza.pegjs -t 'pizza'
# Output: 'pizza'
peggy pizza.pegjs -t 'pizzadonut'
# Error running test
# Error: Expected end of input but "d" found.
# --> command line:1:6
# |
# 1 | pizzadonut
# | ^
```

Why did `pizzadonut`

fail? It's because the `/`

symbol in a PEG represents a *choice* operator. If the first choice succeeds, then the parser stops and never reaches the other choices. In our `pizza.pegjs`

grammar file, `pizza`

is our first choice for `Food`

(isn't it everyone's? 😋) which means the parser stops, so it never sees the `pizzadonut`

option.

## Packrat Parsing

Peggy uses a special kind of parsing algorithm known as packrat parsing. Using memoization (caching), packrat parsers can achieve linear time performance, making them really fast. However, this means that packrat parsers consume more memory (due to its caching mechanism).

There's a great series by Guido van Rossum, the creator of Python, where he talks about the process of building a PEG-based parser using packrat parsing and memoization.

The standard implementation of the default Python interpreter, CPython, used to parse Python by a LL(1) parser called pgen. Ever since Python 3.9, the new PEG-based parser called pegen parses Python instead. I must say, "pegen" is a great pun 😂

There's a great discussion on how the new pegen parser works in the Python's Developer Guide. This guide also talks a bit about how PEG-based parsers work in general.

## Left Recursion

We can think of PEG-based parsers (packrat parsers) as a sibling of LL(*) parsers where the `*`

refers to an infinite lookahead. In LL(1) parsers, we only used one lookahead token, but a packrat parser can essentially have infinite lookahead. This may seem like the parser would get slower as we use more lookaheads, but memoization helps improve performance.

However, packrat parsers can still suffer from the dreadful left recursion problem we mentioned in the previous tutorial when we discussed LL parsers. There are ways of mitigating this problem in packrat parsers using some tricks, but for Peggy (and PEG.js), left recursion should be avoided. If left recursion appears in our grammar, then we may get the following error when testing a generated parser.

```
Error parsing grammar
Maximum call stack size exceeded
```

Yuck! We'd like to avoid that!

## The Peggy VS Code Language Extension

When developing a grammar, it'd be nice to get syntax highlighting. We can think of a grammar as its own little language. Sometimes, a grammar file that's used to build a parser for parsing the syntax used in that same grammar is called a "metagrammar".

Some very thoughtful developers have made a VS Code extension for syntax highlighting the Peggy grammar language. It's super useful. It provides not only syntax highlighting but also error reporting, IntelliSense, Symbols, and a language server. Truly amazing!

I highly recommend installing this extension if you're using VS Code. It automatically detects left recursion and helps us find references for when our grammar becomes really long (which definitely happens when people are building a programming language).

## Starting our PEG-based Math Expression Parser

Let's start building a grammar for a PEG-based math expression parser. As always, we'll start with addition. Create a new file called `calc.pegjs`

that contains the following contents.

```
Start
= Expression
Expression
= Primary "+" Expression
/ Primary
Primary
= [0-9]+
```

The grammar rule, `Primary`

, refers to a "primary expression." Like the previous tutorial, we can think of it as the final part of our parse tree. A primary expression usually resolves into simple values such as numbers, unary expressions, or parenthesized expressions.

We can run the following commands to test our grammar using Peggy.

```
peggy calc.pegjs -t '1'
# Output: [ '1' ]
peggy calc.pegjs -t '1+2'
# Output: [ [ '1' ], '+', [ '2' ] ]
```

Keep in mind that we haven't implemented whitespaces just yet in our grammar, so the input, `1 + 2`

won't work, but `1+2`

will work. We also haven't implemented semantic actions, so Peggy will output an array of characters by default.

By convention, we name our grammar files with a `.pegjs`

extension. This convention originated from Peggy's predecessor, PEG.js. You can technically use whatever extension you'd like, but it's helpful to use the `.pegjs`

filename extension to get syntax highlighting quickly in VS Code with the Peggy VS Code extension.

Notice how similar our grammar looks compared to the grammar we built for a LL parser in the previous tutorial.

```
%lex
%%
\s+ /* skip whitespace */
\d+ return 'NUMBER'
/lex
%%
expression
: primary expression'
;
expression'
: "+" primary expression'
| /* epsilon */
;
primary
: NUMBER
;
```

Both the BNF-like grammar used for the Syntax tool and the parsing expression grammar (PEG) used for Peggy are written such that they avoid left recursion. As mentioned earlier in this tutorial, a PEG-based parser reads the PEG file and stops when it matches one of the choices separated by `/`

.

Let's analyze what's happening in our `calc.pegjs`

file.

```
Start
= Expression
Expression
= Primary "+" Expression
/ Primary
Primary
= [0-9]+
```

The `/`

symbol is like saying "or". The `Expression`

symbol can be either `Primary "+" Expression`

OR be `Primary`

. If `Primary "+" Expression`

is matched, then don't proceed to check the `Primary`

choice after the `/`

symbol.

Suppose we were parsing the expression, `1+2`

(without whitespaces). The starting condition, `Start`

, is our starting point. The derivation process will be similar to the following:

```
1. Start → Expression
2. → Primary "+" Expression / Primary
3. → Primary "+" Expression
4. → Primary "+" (Primary "+" Expression / Primary)
5. → Primary "+" Primary
6. → 1 + 2
```

In step 1, `Start`

derives to `Expression`

. Technically, we don't really need the `Start`

grammar rule. We could have simply started with `Expression`

instead.

```
Expression
= Primary "+" Expression
/ Primary
Primary
= [0-9]+
```

In step 2, we are trying to derive `Expression`

, but we're left with a choice: `Primary "+" Expression`

or `Primary`

. In Step 3, the parser figures out that `Primary "+" Expression`

must be the choice because there's a `+`

sign coming up in the expression, `1+2`

. In Step 4, the `Expression`

symbol is resolved to another choice between `Primary "+" Expression`

or `Primary`

. This time, the parser sees a `2`

coming up and nothing else after that, so `Primary`

is selected as the choice in Step 5. Finally, the parse completes in Step 6.

Now, the whole process isn't exactly like that when Peggy parses an input, but I hope you get the idea. The `/`

determines choices, which means order is very important in a parsing expression grammar.

## Semantic Actions

Most parser generators support the ability to use semantic actions when a parser has completed parsing a portion of an input. Similar to the Syntax tool, we can insert JavaScript code inside curly braces `{ }`

. A semantic action is referred to as a "rule action" in the Peggy documentation. Let's create a semantic action to add numbers together. Inside `calc.pegs`

, replace the contents of the file with the following.

```
Expression
= left:Primary "+" right:Expression { return left + right; }
/ Primary
Primary
= digits:[0-9]+ { return parseInt(digits.join(""), 10); }
```

Run the following command to make sure the parser works correctly.

```
peggy calc.pegjs -t '1'
# Output: 1
peggy calc.pegjs -t '1+2'
# Output: 3
```

Notice how we added names in our grammar rules such as `left:Primary`

. Why did we do that? In Peggy, we need to assign "labels" to the left of "rule names" in order to use them as variables in a corresponding semantic action. Let's take a closer look at the `Expression`

grammar rule:

```
Expression
= left:Primary "+" right:Expression { return left + right; }
/ Primary
```

In the grammar rule above, `left`

is a label. `Primary`

and `Expression`

are rule names. In a semantic action (aka rule action), we use the *labels* to serve as variables in the JavaScript code. We then return `left + right`

as a number, since the semantic action corresponding to `digits:[0-9]+`

returns a number.

Next, let's take a closer look at what's going on in the semantic action for the grammar rule containing `digits:[0-9]+`

.

```
Primary
= digits:[0-9]+ { return parseInt(digits.join(""), 10); }
```

In the grammar rule above, we are using a "character match" as seen in the documentation for parsing expression types. Character matches behave similar to the square bracket `[ ]`

notation used in regular expressions. The `[0-9]+`

notation means that we should match one or more integers where each integer is a digit between `0`

(zero) and `9`

. Therefore, we can match multi-digit numbers such as `7`

, `10`

, `123`

, etc.

In the semantic action for `digits:[0-9]+`

, we are using some JavaScript to return a multi-digit number with the JavaScript type of `number`

. This is done by using the following code:

```
return parseInt(digits.join(""), 10);
```

Why is this necessary though? It's due to how Peggy returns values as it parses an input. By default, Peggy will return an array of matched characters. We can see this happen if we remove the semantic actions from our grammar in `calc.pegjs`

.

```
Expression
= left:Primary "+" right:Expression
/ Primary
Primary
= digits:[0-9]+
```

Let's run the following command using the grammar above.

```
peggy calc.pegjs -t '12'
# Output: [ '1', '2' ]
```

As we can see, we get back an array of characters from the input, `12`

. Let's now change our grammar in `calc.pegjs`

again to only have a semantic action for the `Expression`

rule.

```
Expression
= left:Primary "+" right:Expression { return left + right; }
/ Primary
Primary
= digits:[0-9]+
```

Run the following command using the grammar above.

```
peggy calc.pegjs -t '12+34'
# Output: '1,23,4'
```

That's a strange output. What happened? It turns out that this is just regular good ol' fashioned JavaScript quirkiness. During the parse, something similar to the following is happening.

```
function parse() {
const left = '12'.split(''); // ['1', '2']
const right = '34'.split(''); // ['3', '4']
// ['1', '2'] + ['3', '4']
return left + right;
}
const result = parse();
console.log(result); // '1,23,4'
```

As we can see, adding arrays of strings is perfectly valid in JavaScript. Therefore, we end up with the result, `1,23,4`

when we parse `12+34`

. To fix this, we use the following line:

```
return parseInt(digits.join(""), 10);
```

Then, the parser does something similar to the following:

```
function parse() {
let digits;
digits = '12'.split('')
const left = parseInt(digits.join(''), 10); // 12
digits = '34'.split('')
const right = parseInt(digits.join(''), 10); // 34
// 12 + 34
return left + right;
}
const result = parse();
console.log(result); // 46
```

We can return our grammar back to normal and confirm that `12+34`

equals `46`

. Update the `calc.pegjs`

grammar to have the following contents.

```
Expression
= left:Primary "+" right:Expression { return left + right; }
/ Primary
Primary
= digits:[0-9]+ { return parseInt(digits.join(""), 10); }
```

Then, run the following command, and we should see an output of `46`

.

```
peggy calc.pegjs -t '12+34'
# Output: 46
```

## Action Execution Environment

In Peggy, semantic actions run in an action execution environment where they have access to various variables and functions provided to us by Peggy. One of these functions happens to be the `text()`

function which returns the source text between the start and end of matched text.

We can use the `text()`

function to rewrite our grammar rule for handling digits. This will make the grammar look less verbose.

```
Expression
= left:Primary "+" right:Expression { return left + right; }
/ Primary
Primary
= [0-9]+ { return parseInt(text(), 10); }
```

We should get the same result as before.

```
peggy calc.pegjs -t '12+34'
# Output: 46
```

Peggy stores location information about characters from a given input. For example, the string, `'12+34'`

, contains five characters. The parser knows that `'12'`

starts at position 0 of the string and ends at position 1. Similarly, the parser knowns that `'34'`

starts at position 3 and ends at position 4.

The `text()`

function will return the source text between the `start`

and `end`

of a token. Therefore, the `text()`

function will automatically return the strings, `'12'`

and `'34'`

. We can then use `parseInt`

to convert these strings into numbers.

## Handling Whitespaces

Currently, Peggy throws an error if we try to use whitespaces when running the following command:

```
peggy calc.pegjs -t '12 + 34'
# Error running test
# Error: Expected "+", [0-9], or end of input but " " found.
# --> command line:1:3
# |
# 1 | 12 + 34
# | ^
```

We need to add whitespaces to our grammar, but it won't be as straightforward as in previous tutorials because we need to specify everywhere a whitespace is allowed. We don't have access to a tokenizer that says, "Oh! A whitespace! I'll skip that for you." Inside the grammar files we pass to Peggy, we need to adjust each rule to include the possibility of whitespaces occurring.

```
Expression
= left:Primary _ "+" _ right:Expression { return left + right; }
/ Primary
Primary
= _ [0-9]+ _ { return parseInt(text(), 10); }
_ "whitespace"
= [ \t\n\r]*
```

In the grammar above, we recognize that it's possible to have a whitespace appear between numbers and the `+`

operator. It's also possible for whitespaces to occur before or after numbers.

The underscore `_`

is a common convention you'll see in `.pegjs`

files when people define a rule name for whitespaces. In the grammar, we use `_`

as a rule name, but we could have called it `ws`

or something else. An underscore is allowed by Peggy because it is considered a valid identifier.

The JavaScript string, `"whitespace"`

to the right of the underscore is known as a *human-readable name* in the documentation. This name is used in error messages as well as symbols in the Peggy VS Code extension. We can make a human-readable name for any of our grammar rule names (`Start`

, `Expression`

, `Primary`

, etc.).

Let's run the following commands to make sure the parser understands how to deal with whitespaces.

```
peggy calc.pegjs -t '12'
# Output: 12
peggy calc.pegjs -t ' 12 '
# Output: 12
peggy calc.pegjs -t '12 + 34'
# Output: 46
peggy calc.pegjs -t ' 12 + 34 '
# Output: 46
```

## Handling Parentheses

Adding parentheses to our grammar rules is super simple. We can add a new expression choice at the beginning of the `Primary`

grammar rule.

```
Expression
= left:Primary _ "+" _ right:Expression { return left + right; }
/ Primary
Primary
= _ "(" _ expr:Expression _ ")" _ { return expr; }
/ _ [0-9]+ _ { return parseInt(text(), 10); }
_ "whitespace"
= [ \t\n\r]*
```

We can technically make the parentheses grammar rule cleaner by using the `@`

operator, known as the "pluck" operator in the documentation which lets us pluck a value out of a grammar rule instead of returning the entire rule. Instead of writing the following:

```
Primary
= _ "(" _ expr:Expression _ ")" _ { return expr; }
/ _ [0-9]+ _ { return parseInt(text(), 10); }
```

We can simplify the grammar rule using the `@`

pluck operator:

```
Primary
= _ "(" _ @Expression _ ")" _
/ _ [0-9]+ _ { return parseInt(text(), 10); }
```

This will implicitly return the `Expression`

instead of having to write a label and semantic action.

By inserting the parentheses logic inside the `Primary`

rule, we can guarantee higher priority than addition. Remember, we need to add underscores `_`

to represent every possible occurrence of a whitespace.

For the semantic action, we simply return whatever expression is inside the parentheses and let recursion handle going through all the rules and other semantic actions until we calculate a final result.

Run the following command to confirm that the parentheses work correctly.

```
peggy calc.pegjs -t '(1)'
# Output: 1
peggy calc.pegjs -t '( 1 )'
# Output: 1
peggy calc.pegjs -t ' (1) '
# Output: 1
peggy calc.pegjs -t '(1 + 2)'
# Output: 3
peggy calc.pegjs -t ' (1 + 2) '
# Output: 3
peggy calc.pegjs -t '(1 + 2) + 3'
# Output: 6
```

You may have noticed that we're duplicating some information in our grammar inside the `Primary`

grammar rule.

```
Primary
= _ "(" _ @Expression _ ")" _
/ _ [0-9]+ _ { return parseInt(text(), 10); }
```

In both choices, we have a `_`

before and after the main expression. We can split up the `Primary`

rule into multiples rules to make our grammar look cleaner. In the updated grammar below, `Expression`

rule now derives to a new rule called `Group`

(as in a "whitespace-padded" group) instead of `Primary`

.

```
Expression
= left:Group _ "+" _ right:Expression { return left + right; }
/ Group
Group
= _ @Primary _
Primary
= "(" _ @Expression _ ")"
/ [0-9]+ { return parseInt(text(), 10); }
_ "whitespace"
= [ \t\n\r]*
```

All of the commands we ran earlier should still pass. When building parsers, it's always a good idea to run test cases frequently. Parsers are fragile creatures. One small change could cause unexpected parse results.

## Handling Multiplication

In the previous tutorial, you may recall that we had to add new grammar rules to our LL parser as we added more mathematical operations. In a similar fashion, we need to add new rules and shift our existing rules as we add new mathematical operators with higher precedence.

```
Expression
= left:Term _ "+" _ right:Expression { return left + right; }
/ Term
Term
= left:Group _ "*" _ right:Term { return left * right; }
/ Group
Group
= _ @Primary _
Primary
= "(" _ @Expression _ ")"
/ [0-9]+ { return parseInt(text(), 10); }
_ "whitespace"
= [ \t\n\r]*
```

Have you noticed the pattern? The `Term`

rule was inserted between `Expression`

and `Group`

because multiplication has a higher priority than addition and lower priority than parentheses. The starting condition is generally the lowest priority operation in both LL parsers and PEG-based parsers.

Let's run a few commands to make sure addition, multiplication, parentheses, and whitespaces behave correctly.

```
peggy calc.pegjs -t '1 * 2'
# Output: 2
peggy calc.pegjs -t '1 + 2 * 3'
# Output: 7
peggy calc.pegjs -t '2 * 3 + 1'
# Output: 7
peggy calc.pegjs -t '(1 + 2) * 3'
# Output: 9
peggy calc.pegjs -t '3 * (1 + 2)'
# Output: 9
peggy calc.pegjs -t ' (1 + 2) * 3 '
# Output: 9
```

Everything should pass! Next, let's add subtraction and division.

## Issues with Subtraction and Division

Implementing subtraction and division in our grammar is a bit tricky with Peggy. Since subtraction has the same precedence as addition and multiplication has the same precedence as division, you would think it would be as straightforward as the following.

```
Expression
= left:Term _ "+" _ right:Expression { return left + right; }
/ left:Term _ "-" _ right:Expression { return left - right; }
/ Term
Term
= left:Group _ "*" _ right:Term { return left * right; }
/ left:Group _ "/" _ right:Term { return left / right; }
/ Group
Group
= _ @Primary _
Primary
= "(" _ @Expression _ ")"
/ [0-9]+ { return parseInt(text(), 10); }
_ "whitespace"
= [ \t\n\r]*
```

However, there are issues with this grammar. Let's see what happens when we run the following commands:

```
peggy calc.pegjs -t '5 - 2 - 1'
# Output: 4
peggy calc.pegjs -t '12 / 2 / 3'
# Output: 18
```

I'm no math genius, but I can tell that `5 - 2 - 1`

should equal `2`

and `12 / 2 / 3`

should equal `2`

. Due to how our grammar is arranged, the following arithmetic happens instead.

```
5 - 2 - 1 → 5 - 2 + 1 = 3 + 1 = 4
12 / 2 / 3 → 12 / 2 * 3 = 6 * 3 = 18
```

In order to resolve these issues, we need to update our grammar to use parsing expression types.

## Parsing Expression Types

In the Peggy documentation, you will find information about various parsing expression types. Peggy supports pattern matching by using special symbols and notations. Some notations may even look similar to regular expressions.

As a side note, there's a grammar notation known as Extended Backus–Naur form (EBNF) that uses "regular expression"-like syntax as well. Some parser generators will allow EBNF syntax because it's less verbose than the BNF-like syntax we used when designing grammars with the Syntax tool. Peggy lets us use a similar syntax, but it's not the same as EBNF notation (though it might have been inspired by it).

An important parsing expression type is the `(expression)*`

syntax to match zero or more occurrences of a grammar rule. Let's look at an example.

Create a new grammar file called `exp-types.pegjs`

that has the following contents.

```
Food
= ( Pizza / Donut )*
Pizza
= 'pizza'
Donut
= 'donut'
```

This grammar will accept zero or more occurrences of either the `Pizza`

grammar rule or `Donut`

grammar rule. Let's run a few commands to see how this will work. Remember, Peggy will return an array of parsed tokens by default when there are no semantic actions.

```
peggy exp-types.pegjs -t ''
# Output: []
peggy exp-types.pegjs -t 'pizza'
# Output: [ 'pizza' ]
peggy exp-types.pegjs -t 'donut'
# Output: [ 'donut' ]
peggy exp-types.pegjs -t 'pizzapizza'
# Output: [ 'pizza', 'pizza' ]
peggy exp-types.pegjs -t 'pizzadonut'
# Output: [ 'pizza', 'donut' ]
peggy exp-types.pegjs -t 'donutdonut'
# Output: [ 'donut', 'donut' ]
peggy exp-types.pegjs -t 'pizzadonutpizzadonut'
# Output: [ 'pizza', 'donut', 'pizza', 'donut' ]
```

We can repeat `pizza`

and `donut`

infinitely because we're using the `(Pizza/Donut)*`

parsing expression type.

This knowledge can help us support mathematical operators that have the same precedence such as addition/subtraction and multiplication/division.

## Handling Subtraction and Division

Now that we understand parsing expression types a bit more, let's use them to handle subtraction and division in our `calc.pegjs`

grammar file.

```
Expression
= head:Term tail:(_ ("+" / "-") _ Term)* {
return tail.reduce(function (result, element) {
if (element[1] === "+") { return result + element[3]; }
if (element[1] === "-") { return result - element[3]; }
}, head);
}
Term
= head:Group tail:(_ ("*" / "/") _ Group)* {
return tail.reduce(function (result, element) {
if (element[1] === "*") { return result * element[3]; }
if (element[1] === "/") { return result / element[3]; }
}, head);
}
Group
= _ @Primary _
Primary
= "(" _ @Expression _ ")"
/ [0-9]+ { return parseInt(text(), 10); }
_ "whitespace"
= [ \t\n\r]*
```

This grammar may look overwhelming, so let's analyze what's happening inside the `Expression`

rule.

```
Expression
= head:Term tail:(_ ("+" / "-") _ Term)* {
return tail.reduce(function (result, element) {
if (element[1] === "+") { return result + element[3]; }
if (element[1] === "-") { return result - element[3]; }
}, head);
}
```

We're creating a label called `head`

that corresponds to the `Term`

rule name. Then, we're creating a label called `tail`

that corresponds to an expression that repeats zero or more times. To understand the `tail.reduce`

stuff, it might be helpful to insert a couple `console.log`

statements inside the semantic action.

```
Expression
= head:Term tail:(_ ("+" / "-") _ Term)* {
console.log('head:', head);
console.log('tail:', tail);
return tail.reduce(function (result, element) {
if (element[1] === "+") { return result + element[3]; }
if (element[1] === "-") { return result - element[3]; }
}, head);
}
```

When we run our `calc.pegjs`

grammar against the Peggy tool, we should see the following arrays for the input, `1 + 2`

.

```
head: 1
tail: [ [ [ ' ' ], '+', [ ' ' ], 2 ] ]
```

The `head`

is equal to `1`

, and the `tail`

is equal to the rest of the tokens found in the input. There are arrays that contain `' '`

because of the whitespaces. We can use the native reduce method in JavaScript to add or subtract depending on the operator found in the array.

If the grammar rules for `Expression`

still look confusing, we can take away the labels, subexpressions, and semantic actions to obtain the following.

```
Expression
= Term (_ ("+" / "-") _ Term)*
```

Basically, we're creating a subexpression, `("+" / "-")`

, within a larger expression `(_ ("+" / "-") _ Term)*`

. The larger expression is repeated zero or more times, similar to what happened in our "pizza-donut" grammar we made earlier. By using a parsing expression type like this in our grammar, we can be sure that addition is only chosen when the next operator is a `+`

symbol and subtraction is only chosen when the next operator is a `-`

symbol.

Using our new grammar, let's rerun the following commands we tried earlier.

```
peggy calc.pegjs -t '5 - 2 - 1'
# Output: 2
peggy calc.pegjs -t '12 / 2 / 3'
# Output: 2
```

Looks like everything is working now!

But wait! There's more! Notice how the reducer function passed into the `tail.reduce()`

method is similar for each math operation. We can create a function called `operatorReducer`

that determines which math operation to apply using simple switch-case statements.

```
{{
function operatorReducer (result, element) {
const left = result;
const right = element[3];
const op = element[1];
switch (op) {
case '+':
return left + right;
case '-':
return left - right;
case '*':
return left * right;
case '/':
return left / right;
default:
throw new Error(`Invalid operation: ${op}`);
}
}
}}
Expression
= head:Term tail:(_ ("+" / "-") _ Term)* {
return tail.reduce(operatorReducer, head);
}
Term
= head:Group tail:(_ ("*" / "/") _ Group)* {
return tail.reduce(operatorReducer, head);
}
Group
= _ @Primary _
Primary
= "(" _ @Expression _ ")"
/ [0-9]+ { return parseInt(text(), 10); }
_ "whitespace"
= [ \t\n\r]*
```

Our grammar looks so much cleaner now! 🤩

Peggy lets us inject JavaScript at the top of the grammar file in between a set of double curly braces `{{ }}`

. In the documentation, this is referred to as a "global initializer". The global initializer is executed only once when a generated parser is loaded such as through the command line or with a `require`

or `import`

statement.

## Handling Exponentiation

Let's now implement exponentiation in our grammar. This process is similar to multiplication. We create a new rule called `Factor`

and shift the rules for `Term`

such that it points to `Factor`

instead of `Group`

. We also add a new `^`

operator inside our `operatorReducer`

function.

```
{{
function operatorReducer (result, element) {
const left = result;
const right = element[3];
const op = element[1];
switch (op) {
case '+':
return left + right;
case '-':
return left - right;
case '*':
return left * right;
case '/':
return left / right;
case '^':
return left ** right;
default:
throw new Error(`Invalid operation: ${op}`);
}
}
}}
Expression
= head:Term tail:(_ ("+" / "-") _ Term)* {
return tail.reduce(operatorReducer, head);
}
Term
= head:Factor tail:(_ ("*" / "/") _ Factor)* {
return tail.reduce(operatorReducer, head);
}
Factor
= head:Group tail:(_ "^" _ Factor)* {
return tail.reduce(operatorReducer, head);
}
Group
= _ @Primary _
Primary
= "(" _ @Expression _ ")"
/ [0-9]+ { return parseInt(text(), 10); }
_ "whitespace"
= [ \t\n\r]*
```

Notice the `tail`

label refers to `Factor`

within the `Factor`

rule. Therefore, the rule for exponentiation becomes:

```
Factor = Group ("^" Factor)*
```

This will ensure that the exponentiation operation is right associative which is important in expressions such as `2 ^ 2 ^ 3`

, which should be equal to `256`

.

We can test this grammar by running the following commands:

```
peggy calc.pegjs -t '2 ^ 3'
# Output: 8
peggy calc.pegjs -t '2 ^ 2 ^ 3'
# Output: 256
peggy calc.pegjs -t '(2 ^ 2) ^ 3'
# Output: 64
peggy calc.pegjs -t '2 ^ 3 * 2'
# Output: 16
```

## Handling Unary Operations

Adding unary operations is super simple in a PEG-based grammar. Similar to the previous tutorial, we just need to add an extra option in our `Primary`

grammar rule.

```
Primary
= "(" _ @Expression _ ")"
/ "-" expr:Factor { return -expr; }
/ [0-9]+ { return parseInt(text(), 10); }
```

For the semantic action, we create a new label called `expr`

and return the negative version of it. Thus, we end up with the following grammar:

```
{{
function operatorReducer (result, element) {
const left = result;
const right = element[3];
const op = element[1];
switch (op) {
case '+':
return left + right;
case '-':
return left - right;
case '*':
return left * right;
case '/':
return left / right;
case '^':
return left ** right;
default:
throw new Error(`Invalid operation: ${op}`);
}
}
}}
Expression
= head:Term tail:(_ ("+" / "-") _ Term)* {
return tail.reduce(operatorReducer, head);
}
Term
= head:Factor tail:(_ ("*" / "/") _ Factor)* {
return tail.reduce(operatorReducer, head);
}
Factor
= head:Group tail:(_ "^" _ Factor)* {
return tail.reduce(operatorReducer, head);
}
Group
= _ @Primary _
Primary
= "(" _ @Expression _ ")"
/ "-" expr:Factor { return -expr; }
/ [0-9]+ { return parseInt(text(), 10); }
_ "whitespace"
= [ \t\n\r]*
```

Similar to the previous tutorial, we use `"-" Factor`

instead of `"-" Expression`

to ensure that the exponential operator has higher precedence than the unary operator.

`-2 ^ 2 + 1`

is an example of a math expression where exponentiation should occur first before the unary operation is applied.

As always, let's test this grammar by running some commands.

```
peggy calc.pegjs -t '-2'
# Output: -2
peggy calc.pegjs -t '-(3)'
# Output: -3
peggy calc.pegjs -t '-1 * 2 ^ 3'
# Output: -8
peggy calc.pegjs -t '-2 ^ 2'
# Output: -4
peggy calc.pegjs -t '(-2) ^ 2'
# Output: 4
peggy calc.pegjs -t '2 ^ 2 + 1'
# Output: 5
peggy calc.pegjs -t '-2 ^ 2 + 1'
# Output: -3
```

Awesome! Feel free to test each of these expressions in WolframAlpha if you want to verify all of the outputs.

## Handling Trigonometric Functions

Next, it's time to handle trigonometric functions. Since we don't have access to a tokenizer in PEGs, we have to create a grammar rule for handling "identifiers" or words such as `sin`

, `cos`

, and `tan`

.

If we search the Peggy documentation for `[characters]`

, we'll see that Peggy supports character sets, similar to regular expressions. We've actually been using character sets for handling integer values i.e. `[0-9]*`

. Now, we'd like to use character sets for letters.

```
ID = [a-z]+ { return text(); }
```

This grammar rule says an `id`

is equal to one or more letters. The `+`

means that we should match one or more letters. By using the `text()`

function provided to us by Peggy, we can extract out the whole group of matched text.

We'll then use the `ID`

grammar rule inside the `Primary`

grammar rule.

```
ID
= [a-z]+ { return text(); }
Primary
= "(" _ @Expression _ ")"
/ "-" expr:Factor { return -expr; }
/ id:ID "(" _ expr:Expression _ ")" { return trig(id, expr); }
/ [0-9]+ { return parseInt(text(), 10); }
```

You may have noticed that we're making a function call: `trig(id, exp)`

. Let's create our `trig`

function and add it to the top section of the grammar file.

```
{{
function trig(id, value) {
switch (id) {
case 'sin':
return Math.sin(value);
case 'cos':
return Math.cos(value);
case 'tan':
return Math.tan(value);
default:
throw new Error(`Invalid operation: ${id}`);
}
}
}}
```

A keen reader will notice that this `trig`

function is identical to the one we used in Part 11 of this tutorial series. The Syntax tool let us inject code inside our grammar file as well.

Once we add the `trig`

function, the `calc.pegjs`

should look like the following.

```
{{
function operatorReducer (result, element) {
const left = result;
const right = element[3];
const op = element[1];
switch (op) {
case '+':
return left + right;
case '-':
return left - right;
case '*':
return left * right;
case '/':
return left / right;
case '^':
return left ** right;
default:
throw new Error(`Invalid operation: ${op}`);
}
}
function trig(id, value) {
switch (id) {
case 'sin':
return Math.sin(value);
case 'cos':
return Math.cos(value);
case 'tan':
return Math.tan(value);
default:
throw new Error(`Invalid operation: ${id}`);
}
}
}}
Expression
= head:Term tail:(_ ("+" / "-") _ Term)* {
return tail.reduce(operatorReducer, head);
}
Term
= head:Factor tail:(_ ("*" / "/") _ Factor)* {
return tail.reduce(operatorReducer, head);
}
Factor
= head:Group tail:(_ "^" _ Factor)* {
return tail.reduce(operatorReducer, head);
}
Group
= _ @Primary _
ID
= [a-z]+ { return text(); }
Primary
= "(" _ @Expression _ ")"
/ "-" expr:Factor { return -expr; }
/ id:ID "(" _ expr:Expression _ ")" { return trig(id, expr); }
/ [0-9]+ { return parseInt(text(), 10); }
_ "whitespace"
= [ \t\n\r]*
```

Let's make sure our trigonometric functions behave properly and we handle situations where the user enters an invalid function name.

```
peggy calc.pegjs -t 'cos(0)'
# Output: 1
peggy calc.pegjs -t 'sin(0)'
# Output: 0
peggy calc.pegjs -t 'tan(0)'
# Output: 0
peggy calc.pegjs -t 'pineapple(0)'
# Output: Invalid operation: pineapple
```

## Handling Decimal Values

Similar to Part 11, we would like our math parser to support decimal numbers. We have to think about this problem differently than when we created an LR parser with the Syntax tool.

Since we don't have access to a tokenizer that reads from regular expressions, we'll need to rely on some special symbols that Peggy allows. We can use the `?`

symbol to make a specific token or grammar rule optional. Let's see how we can incorporate decimal values using the `?`

symbol.

```
Decimal
= [0-9]* ("." [0-9]*)?
Primary
= "(" _ @Expression _ ")"
/ "-" expr:Factor { return -expr; }
/ id:ID "(" _ expr:Expression _ ")" { return trig(id, expr); }
/ Decimal { return parseFloat(text(), 10); }
```

We replaced `[0-9]*`

with a new rule called `Decimal`

. Inside this rule, we are saying a decimal value can be zero or more integers followed by an optional group composed of a decimal point `.`

and more integer values. Therefore, the following values should be valid decimal numbers:

```
2
2.
2.5
10.50
.5
.50
```

Since these are all valid decimal values in JavaScript, our semantic actions should work properly when performing arithmetic. Keep in mind that we also changed `parseInt`

to `parseFloat`

in the `Primary`

rule so that we parse the string we get back from the parser as a JavaScript `float`

.

```
{{
function operatorReducer (result, element) {
const left = result;
const right = element[3];
const op = element[1];
switch (op) {
case '+':
return left + right;
case '-':
return left - right;
case '*':
return left * right;
case '/':
return left / right;
case '^':
return left ** right;
default:
throw new Error(`Invalid operation: ${op}`);
}
}
function trig(id, value) {
switch (id) {
case 'sin':
return Math.sin(value);
case 'cos':
return Math.cos(value);
case 'tan':
return Math.tan(value);
default:
throw new Error(`Invalid operation: ${id}`);
}
}
}}
Expression
= head:Term tail:(_ ("+" / "-") _ Term)* {
return tail.reduce(operatorReducer, head);
}
Term
= head:Factor tail:(_ ("*" / "/") _ Factor)* {
return tail.reduce(operatorReducer, head);
}
Factor
= head:Group tail:(_ "^" _ Factor)* {
return tail.reduce(operatorReducer, head);
}
Group
= _ @Primary _
ID
= [a-z]+ { return text(); }
Decimal
= [0-9]* ("." [0-9]*)?
Primary
= "(" _ @Expression _ ")"
/ "-" expr:Factor { return -expr; }
/ id:ID "(" _ expr:Expression _ ")" { return trig(id, expr); }
/ Decimal { return parseFloat(text(), 10); }
_ "whitespace"
= [ \t\n\r]*
```

We can run the following commands to test decimal values with various syntax. The output result might have a different syntax than the input due to JavaScript evaluating the passed string using `parseFloat`

in the semantic action.

```
peggy calc.pegjs -t '2'
# Output: 2
peggy calc.pegjs -t '2.'
# Output: 2
peggy calc.pegjs -t '2.5'
# Output: 2.5
peggy calc.pegjs -t '10.50'
# Output: 10.5
peggy calc.pegjs -t '.5'
# Output: 0.5
peggy calc.pegjs -t '.50'
# Output: 0.5
```

## Testing Various Math Expressions

When building parsers, it's always a good idea to come up with as many math expressions as possible to make sure they behave as expected. Test-driven development (TDD) can be super helpful when building parsers. As you add more features to a grammar, you can keep testing various inputs.

For now, we'll just run a bunch of commands to make sure the expressions return the correct output, but I highly encourage everyone to use a testing library such as Jest or Vitest to test as many math expressions as possible!

Let's make sure our grammar still parses a variety of math expression correctly. Remember to test, test, test!

```
peggy calc.pegjs -t '1'
# Output: 1
peggy calc.pegjs -t ' 2 '
# Output: 2
peggy calc.pegjs -t '1 + 2'
# Output: 3
peggy calc.pegjs -t ' 1 + 2 '
# Output: 3
peggy calc.pegjs -t '1 + 2 * 3'
# Output: 7
peggy calc.pegjs -t '(1 + 2) * 3'
# Output: 9
peggy calc.pegjs -t '5 - 2'
# Output: 3
peggy calc.pegjs -t '5 - 2 - 1'
# Output: 2
peggy calc.pegjs -t '12 / 2 / 3'
# Output: 2
peggy calc.pegjs -t '2 ^ 3 + 1'
# Output: 9
peggy calc.pegjs -t '-2 ^ 2'
# Output: -4
peggy calc.pegjs -t '(-2) ^ 2'
# Output: 4
peggy calc.pegjs -t '-2 ^ 2 + 1'
# Output: -3
peggy calc.pegjs -t 'cos(0) + 3 * -4 / -2 ^ 2'
# Output: 4
```

If you copy and paste this into a terminal, everything should parse correctly! 🎉

## Generating Parsers as JavaScript Modules

Peggy provides multiple ways of generating a parser as a JavaScript module or JavaScript function. We can either use the command line or call `peggy.generate()`

using Peggy's JavaScript API. In this tutorial, we'll keep it simple and use the command line to generate a parser.

To generate a parser with Peggy, using our completed grammar, run the following command:

```
peggy calc.pegjs -o calc-parser.js
```

This command will use our `calc.pegjs`

grammar file to output a JavaScript file called `calc-parser.js`

that will contain our generated parser.

Then, we can create a new JavaScript file called `example.js`

to import our parser and use it to parse mathematical expressions.

```
const parser = require('./calc-parser.js');
const result = parser.parse('1 + 2');
console.log(result); // 3
```

We can execute this script using Node.js by running the following command:

```
node example
```

The output should be result of the math expression. Congratulations! You made a math expression parser using parsing expression grammars (PEG) and the Peggy parser generator! 🎉

## Building Abstract Syntax Trees

In Part 11 of this tutorial series, we updated the semantic actions near each grammar rule to output an AST node instead of computing a result from math expressions. We can accomplish the same task using Peggy. Create a new grammar file called `calc-ast.pegjs`

with the following contents.

```
{{
const functionList = ['sin', 'cos', 'tan'];
function FunctionExpression(id, value) {
if (functionList.includes(id)) {
return {
type: 'Function',
name: id,
value
}
}
}
function BinaryExpression(operator, left, right) {
return {
type: 'BinaryExpression',
operator,
left,
right
}
}
function UnaryExpression(value) {
return {
type: 'UnaryExpression',
value
}
}
function NumericLiteral(value) {
return {
type: 'Number',
value: Number(value)
}
}
function operatorReducer (result, element) {
const left = result;
const right = element[3];
const op = element[1];
return BinaryExpression(op, left, right);
}
}}
Expression
= head:Term tail:(_ ("+" / "-") _ Term)* {
return tail.reduce(operatorReducer, head);
}
Term
= head:Factor tail:(_ ("*" / "/") _ Factor)* {
return tail.reduce(operatorReducer, head);
}
Factor
= head:Group tail:(_ ("^") _ Factor)* {
return tail.reduce(operatorReducer, head);
}
Group
= _ @Primary _
Primary
= ParenthesizedExpression
/ UnaryExpression
/ Function
/ Decimal
Decimal
= _ [0-9]* ('.' [0-9]*)? _ { return NumericLiteral(text()) }
UnaryExpression
= '-' expr:Factor { return UnaryExpression(expr) }
ParenthesizedExpression
= '(' _ @Expression _ ')'
ID
= [a-z]+ { return text(); }
Function
= id:ID expr:ParenthesizedExpression { return FunctionExpression(id, expr) }
_ 'whitespace'
= [ \t\n\r]*
```

Run following command to see the AST for the math expression, `1 + 2 * 3`

.

```
peggy calc-ast.pegjs -t '1 + 2 * 3'
```

We should see the following AST:

```
{
"type": "BinaryExpression",
"operator": "+",
"left": { "type": "Number", "value": 1 },
"right": {
"type": "BinaryExpression",
"operator": "*",
"left": { "type": "Number", "value": 2 },
"right": { "type": "Number", "value": 3 }
}
}
```

Next, we can build a JavaScript module using the following command.

```
peggy calc-ast.pegjs -o calc-ast-parser.js
```

Then, we can create a new JavaScript file called `calc-ast-example.js`

and import our newly created parser.

```
const parser = require('./calc-ast-parser');
const result = parser.parse('1 + 2 * 3');
console.dir(result, { depth: null }); // log nested objects
```

We can execute this script using `node calc-ast-example`

in the command line to see the AST for `1 + 2 * 3`

.

```
{
type: 'BinaryExpression',
operator: '+',
left: { type: 'Number', value: 1 },
right: {
type: 'BinaryExpression',
operator: '*',
left: { type: 'Number', value: 2 },
right: { type: 'Number', value: 3 }
}
}
```

## Evaluating the AST

Since our AST is identical to the one we created in Part 8 of this tutorial series, we can reuse our AST evaluator from that tutorial.

Let's create a new file called `calc-ast-evaluator.js`

and add the following contents.

```
const parser = require('./calc-ast-parser');
class NodeVisitor {
visit(node) {
switch (node.type) {
case 'Number':
return this.visitNumber(node);
case 'BinaryExpression':
return this.visitBinaryExpression(node);
case 'UnaryExpression':
return this.visitUnaryExpression(node);
case 'Function':
return this.visitFunction(node);
}
}
visitNumber(node) {
return node.value;
}
visitBinaryExpression(node) {
switch (node.operator) {
case '+':
return this.visit(node.left) + this.visit(node.right);
case '-':
return this.visit(node.left) - this.visit(node.right);
case '*':
return this.visit(node.left) * this.visit(node.right);
case '/':
return this.visit(node.left) / this.visit(node.right);
case '^':
return this.visit(node.left) ** this.visit(node.right);
default:
throw new Error(`Invalid operation: ${node.operator}`);
}
}
visitUnaryExpression(node) {
return -this.visit(node.value);
}
visitFunction(node) {
switch (node.name) {
case 'sin':
return Math.sin(this.visit(node.value));
case 'cos':
return Math.cos(this.visit(node.value));
case 'tan':
return Math.tan(this.visit(node.value));
default:
throw new Error(`Invalid function: ${node.name}`);
}
}
}
const input = '1 + 2 * 3';
const ast = parser.parse(input);
const visitor = new NodeVisitor();
const result = visitor.visit(ast);
console.log(result); // 7
```

After running `node calc-ast-evaluator`

, we should get the result of `7`

. The power of this approach is that it lets the user, who imports the library, decide how they want to use the AST. Inside each "visitor" function (i.e. `visitNumber`

, `visitBinaryExpression`

, etc.), we could have done something else besides evaluating the math expression. It gives more freedom to the user on how they want to use the AST.

## Conclusion

In this tutorial, we built a math expression parser from a parsing expression grammar (PEG) using the Peggy tool. We compared the similarities and differences between a BNF-like grammar used by the Syntax tool and a PEG used by Peggy.

PEG-based parsers (packrat parsers) can help solve problems that LL and LR parsers suffer from. As mentioned earlier, programming languages such as Python may find them suitable for parsing certain syntax faster. Even problems related to parsing nested parentheses might be easier with PEG-based parsers.

The parsing expression grammars we made in this tutorial can actually help serve as a model for the way developers write recursive descent parsers. In the next tutorial, we'll learn how to build our own recursive descent parser from scratch. No more cheating by using parser generators. It's time to manually create a parser again using JavaScript! See you there! ⭐