How to Build a Pratt Parser for Math Expressions in JavaScript

Published: Monday, March 27, 2023

Greetings, friends! This is my sixteenth post in the math expression parser series. In Part 14, we learned how to make a parser by hand using the recursive descent algorithm. When building a math expression parser, recursive descent tends to get tedious as we handle more operators with varying operator precedence and associativity. In this tutorial, we'll learn how to build an amazing Pratt parser in JavaScript to make things easier! By the end of this tutorial, you'll be able to parse anything! Let's get started!

What is a Pratt Parser?

A Pratt parser is a type of top-down operator precedence parser that uses recursive descent together with operator precedence. Think of it as a combination of recursive descent and the shunting yard algorithm.

The "Pratt parser" algorithm was created by the computer scientist, Vaughan Pratt in the 1973 paper "Top down operator precedence", based on the recursive descent algorithm. This algorithm is implemented in multiple compilers and interpreters including the popular Clang compiler used for the C and C++ programming languages.

Why make a Pratt Parser?

Pratt parsers makes it super easy to handle mathematical expressions compared to regular recursive descent parsers, and they might even perform better than them. A lot of people prefer writing Pratt parsers whenever they need to make a parser because they provide a lot of flexibility in regards to error handling and adding new features or operators.

When we designed a recursive descent parser in Part 14, it was very awkward dealing with unary expressions, exponentiation, operator precedence, and associativity. In Pratt parsers, we define a precedence for each operator, and the algorithm automatically takes care of everything! It's truly a beautiful and elegant algorithm.

Building a Tokenizer for Pratt Parsers

Similar to the tutorial on recursive descent parsers, we're going to build a Pratt parser completely from scratch. However, we can reuse the updated tokenizer we made in Part 6 of this tutorial series to help save some time. Please consult Part 5 of this tutorial series to learn how we constructed this tokenizer.

Create a new file called Tokenizer.js with the following contents:

js
Copied! ⭐️
const TokenTypes = {
  NUMBER: 'NUMBER',
  IDENTIFIER: 'IDENTIFIER',
  ADDITION: '+',
  SUBTRACTION: '-',
  MULTIPLICATION: '*',
  DIVISION: '/',
  EXPONENTIATION: '^',
  PARENTHESIS_LEFT: '(',
  PARENTHESIS_RIGHT: ')'
};

const TokenSpec = [
  [/^\s+/, null],
  [/^(?:\d+(?:\.\d*)?|\.\d+)/, TokenTypes.NUMBER],
  [/^[a-z]+/, TokenTypes.IDENTIFIER],
  [/^\+/, TokenTypes.ADDITION],
  [/^\-/, TokenTypes.SUBTRACTION],
  [/^\*/, TokenTypes.MULTIPLICATION],
  [/^\//, TokenTypes.DIVISION],
  [/^\^/, TokenTypes.EXPONENTIATION],
  [/^\(/, TokenTypes.PARENTHESIS_LEFT],
  [/^\)/, TokenTypes.PARENTHESIS_RIGHT]
];

class Tokenizer {
  constructor(input) {
    this.input = input;
    this.cursor = 0;
  }

  hasMoreTokens() {
    return this.cursor < this.input.length;
  }

  match(regex, inputSlice) {
    const matched = regex.exec(inputSlice);
    if (matched === null) {
      return null;
    }

    this.cursor += matched[0].length;
    return matched[0];
  }

  getNextToken() {
    if (!this.hasMoreTokens()) {
      return null;
    }

    const inputSlice = this.input.slice(this.cursor);

    for (let [regex, type] of TokenSpec) {
      const tokenValue = this.match(regex, inputSlice);

      // No rule was matched!
      if (tokenValue === null) {
        continue;
      }

      // Skip whitespace!
      if (type === null) {
        return this.getNextToken();
      }

      return {
        type,
        value: tokenValue,
      };
    }

    throw new SyntaxError(`Unexpected token: "${inputSlice[0]}"`);
  }

  printAllTokens() {
    let token;
    while ((token = this.getNextToken())) {
      console.log(token);
    }
  }
}

module.exports = {
  TokenTypes,
  Tokenizer
}

Starting our Pratt Parser

Now that we have a tokenizer, it's time to start building a Pratt parser. I know you're excited! Similar to Part 14, we'll create a grammar "blueprint" to serve as a model for developing our Pratt parser.

Let's create a simple grammar that returns a "prefix" expression which only derives to a NUMBER token, representing a decimal number.

js
Copied! ⭐️
Expression
  = Prefix

Prefix
  = NUMBER

Notice how we're using Prefix instead of Primary (which we used in previous tutorials). We'll see why soon.

Let's see how we would create a Pratt parser that follows this "blueprint". Create a new file called Parser.js with the following contents.

js
Copied! ⭐️
const { TokenTypes, Tokenizer } = require('./Tokenizer')

class Parser {
  parse(input) {
    this.input = input
    this.tokenizer = new Tokenizer(input)
    this.lookahead = this.tokenizer.getNextToken()

    return this.Expression()
  }

  // Expect a particular token, consume/eat it, and move to the next token
  eat(tokenType) {
    const token = this.lookahead

    if (token == null) {
      throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
    }

    if (token.type !== tokenType) {
      throw new SyntaxError(
        `Unexpected token: "${token.value}", expected "${tokenType}"`
      )
    }

    // Advance to the next token
    this.lookahead = this.tokenizer.getNextToken()

    return token
  }

  /**
   * Expression
   *    = Prefix
   */
  Expression() {
    return this.Prefix()
  }

  /**
   * Prefix
   *    = NUMBER
   */
  Prefix() {
    const token = this.eat(TokenTypes.NUMBER)
    return Number(token.value)
  }
}

const parser = new Parser()
const result = parser.parse('2')
console.log(result) // 2

So far, everything looks very similar to a recursive descent parser. Let's now move onto implementing mathematical operators such as addition and multiplication to see how Pratt parsers truly shine ✨

Prefix vs Infix and NUD vs LED

The reason why we used Prefix instead of Primary is because Pratt parsers will perform operations differently depending on whether the expression is a "prefix" expression or an "infix" expression.

In the original paper by Vaughan Pratt, he used the terms "nud" and "led" instead of "prefix" and "infix", respectively.

A "nud" refers to a "null denotation" which means an expression that doesn't care about tokens to the left. Examples of a "nud" include numbers, unary expressions, parenthesized expressions, and prefix operations (i.e. ++x). In this tutorial, we'll use the term, "prefix", instead of "nud".

A "led" refers to a "left denotation" which means an expression that does care about tokens to the left and tokens to the right. Examples of a "led" include binary expressions and postfix operations (i.e. x++). In this tutorial, we'll use the term, "infix", instead of "led". If we decide to ever add postfix operations later, we would still use the Infix method despite the name.

The main reason why we care about prefix and infix/postfix operations is because of how some mathematical symbols can be used for multiple operations. The most popular example is the - symbol that can be used for both unary expressions and subtraction. A unary expression would be categorized as a "prefix" operation and subtraction would be an "infix" operation.

Notice how the - symbol occurs at the beginning of the unary expression, -2. Nothing appears to the left of the - symbol, so it belongs to the "prefix" or "nud" group.

js
Copied! ⭐️
// Before a number
-2

// Before an expression
-(1 + 2)

For subtraction, the - symbol occurs between two numbers or between two expressions, so it belongs to the "infix" or "led" group.

js
Copied! ⭐️
// Between two numbers
5 - 2

// Between two expressions
(3 + 4) - (1 + 2)

By grouping expressions in a "prefix" and "infix" group, it'll help keep our grammar blueprint simple as well as the code for our parser.

Handling Addition and Multiplication

Let's adjust our grammar blueprint to handle addition and multiplication using a new Infix rule.

js
Copied! ⭐️
Expression
  = Prefix (Infix)*

Prefix
  = NUMBER

Infix
  = ("+" / "*") Expression

As you may recall, this grammar uses EBNF-like syntax which means we can use the * after a group to refer to zero or more occurrences of a grammar rule i.e. (expression)*.

The grammar above is saying that Expression will be a Prefix expression followed by zero or more Infix expressions. Each Infix expression will be zero or more occurrences of a math operator followed by another expression.

Let's look at some examples of how a math expression might get derived. We'll start with a single number, 2.

js
Copied! ⭐️
Expression → Prefix (Infix)*
    → Prefix
NUMBER
2

Next, we'll look at the math expression, 1 + 2.

js
Copied! ⭐️
Expression → Prefix (Infix)*
    → Prefix Infix
Prefix ("+" Expression)
Prefix ("+" Prefix)
    → Prefix + Prefix
NUMBER + NUMBER
1 + 2

Now, a longer math expression: 1 + 2 * 3.

js
Copied! ⭐️
Expression → Prefix (Infix)*
    → Prefix Infix Infix
Prefix ("+" Expression) ("*" Expression)
Prefix ("+" Prefix) ("*" Prefix)
    → Prefix + Prefix * Prefix
NUMBER + NUMBER * NUMBER
1 + 2 * 3

We'll place the grammar rules above each method so that we can keep track of where we are in the grammar during the recursive descent. Inside the Parser.js, update the contents with the following.

js
Copied! ⭐️
const { TokenTypes, Tokenizer } = require('./Tokenizer')

class Parser {
  parse(input) {
    this.input = input
    this.tokenizer = new Tokenizer(input)
    this.lookahead = this.tokenizer.getNextToken()
    this.operators = {
      '*': 3,
      '+': 2
    }

    return this.Expression()
  }

  eat(tokenType) {
    const token = this.lookahead

    if (token == null) {
      throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
    }

    if (token.type !== tokenType) {
      throw new SyntaxError(
        `Unexpected token: "${token.value}", expected "${tokenType}"`
      )
    }

    this.lookahead = this.tokenizer.getNextToken()

    return token
  }

  getPrecedence(token) {
    if (token?.type) {
      return this.operators[token.value]
    }

    return 0
  }

  /**
   * Expression
   *    = Prefix (Infix)*
   */
  Expression(prec = 0) {
    let left = this.Prefix()

    while (prec < this.getPrecedence(this.lookahead)) {
      left = this.Infix(left, this.lookahead?.type)
    }

    return left
  }

  /**
   * Prefix
   *    = NUMBER
   */
  Prefix() {
    const token = this.eat(TokenTypes.NUMBER)
    return Number(token.value)
  }

  /**
   * Infix
   *    = ("+" / "*") Expression
   */
  Infix(left, operatorType) {
    let token = this.eat(operatorType)
    let newPrec = this.operators[token.value] // newPrec is the new precedence we pass to the "Expression" method
    switch (token.type) {
      case TokenTypes.ADDITION:
        return left + this.Expression(newPrec)
      case TokenTypes.MULTIPLICATION:
        return left * this.Expression(newPrec)
    }
  }
}

const parser = new Parser()
const result = parser.parse('1 + 2 * 3')
console.log(result) // 7

We can run this code using the following command.

shell
Copied! ⭐️
node Parser
# Output: 7

Let's walk through what's happening for the math expression, 1 + 2 * 3. We should have the following tokens from the tokenizer.

js
Copied! ⭐️
{ type: 'NUMBER', value: '1'}
{ type: '+', value: '+'}
{ type: 'NUMBER', value: '2'}
{ type: '*', value: '*'}
{ type: 'NUMBER', value: '3'}

At the very start of the parser, we call the parse method that will proceed to call this.Expression(). This begins our recursive descent through the Prefix and Infix methods.

The Prefix method will consume the 1 token and proceed to go to the following code inside the Expression method:

js
Copied! ⭐️
while (prec < this.getPrecedence(this.lookahead)) {
  left = this.Infix(left, this.lookahead?.type)
}

This is where the magic happens inside a Pratt parser. At the beginning, the precedence, prec, is equal to zero. Then, we compare it to the precedence of the next token, +. The getPrecedence method is a utility method for retrieving the precedence of an operator.

In the parse method, we added a new line to define some mathematical operators and their respective precedence.

js
Copied! ⭐️
this.operators = {
  '*': 3,
  '+': 2
}

We enter the while loop because 0 < 2 where 2 refers to the precedence of addition. Then, we run the Infix method which will perform addition and recursively call this.Expression again BUT with a new precedence!

The while loop will run again for multiplication because 0 < 3. Eventually, we reach the end of the math expression, so the tokenizer returns a null token.

The getPrecedence method will return a value of 0 when there's a null token which ends the while loop in the Expression method. This is why we call the getPrecedence method instead of reaching out to the operators object we defined in the parse method.

This all seems elaborate compared to the regular recursive descent parser, but please continue reading to be impressed!

Handling Subtraction and Division

Let's update our grammar blueprint to include subtraction and division.

js
Copied! ⭐️
Expression
  = Prefix (Infix)*

Prefix
  = NUMBER

Infix
  = ("+" / "-" / "*" / "/") Expression

Implementing these operations will be super easy in our parser code. We just need to add subtraction and division precedence in the operators object inside the parse method.

js
Copied! ⭐️
this.operators = {
  '*': 3,
  '/': 3,
  '+': 2,
  '-': 2
}

Then, we need to add subtraction and division to our Infix method.

js
Copied! ⭐️
Infix(left, operatorType) {
  let token = this.eat(operatorType)
  let newPrec = this.operators[token.value]
  switch (token.type) {
    case TokenTypes.ADDITION:
      return left + this.Expression(newPrec)
    case TokenTypes.SUBTRACTION:
      return left - this.Expression(newPrec)
    case TokenTypes.MULTIPLICATION:
      return left * this.Expression(newPrec)
    case TokenTypes.DIVISION:
      return left / this.Expression(newPrec)
  }
}

Let's look at the code for the parser so far and add an example of subtraction and division at the bottom of the code.

js
Copied! ⭐️
const { TokenTypes, Tokenizer } = require('./Tokenizer')

class Parser {
  parse(input) {
    this.input = input
    this.tokenizer = new Tokenizer(input)
    this.lookahead = this.tokenizer.getNextToken()
    this.operators = {
      '*': 3,
      '/': 3,
      '+': 2,
      '-': 2
    }

    return this.Expression()
  }

  eat(tokenType) {
    const token = this.lookahead

    if (token == null) {
      throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
    }

    if (token.type !== tokenType) {
      throw new SyntaxError(
        `Unexpected token: "${token.value}", expected "${tokenType}"`
      )
    }

    this.lookahead = this.tokenizer.getNextToken()

    return token
  }

  getPrecedence(token) {
    if (token?.type) {
      return this.operators[token.value]
    }

    return 0
  }

  /**
   * Expression
   *    = Prefix (Infix)*
   */
  Expression(prec = 0) {
    let left = this.Prefix()

    while (prec < this.getPrecedence(this.lookahead)) {
      left = this.Infix(left, this.lookahead?.type)
    }

    return left
  }

  /**
   * Prefix
   *    = NUMBER
   */
  Prefix() {
    const token = this.eat(TokenTypes.NUMBER)
    return Number(token.value)
  }

  /**
   * Infix
   *    = ("+" / "-" / "*" / "/") Expression
   */
  Infix(left, operatorType) {
    let token = this.eat(operatorType)
    let newPrec = this.operators[token.value]
    switch (token.type) {
      case TokenTypes.ADDITION:
        return left + this.Expression(newPrec)
      case TokenTypes.SUBTRACTION:
        return left - this.Expression(newPrec)
      case TokenTypes.MULTIPLICATION:
        return left * this.Expression(newPrec)
      case TokenTypes.DIVISION:
        return left / this.Expression(newPrec)
    }
  }
}

const parser = new Parser()
console.log(parser.parse('5 - 2 - 1')) // 2
console.log(parser.parse('12 / 2 / 3')) // 2
console.log(parser.parse('1 + 2 * 3 - 4 / 2')) // 5

When we run this code using the command, node Parser, we should see subtraction and division working correctly.

Handling Exponentiation

Let's now include exponentiation in our grammar file. Notice how our grammar is barely changing compared to when we created grammars in previous tutorials. Pratt parsers make everything better!

js
Copied! ⭐️
Expression
  = Prefix (Infix)*

Prefix
  = NUMBER

Infix
  = ("+" / "-" / "*" / "/" / "^") Expression

In our parser code, we'll add the ^ operator to the operators object inside the parse method. Exponentiation should have the highest precedence.

js
Copied! ⭐️
this.operators = {
  '^': 4,
  '*': 3,
  '/': 3,
  '+': 2,
  '-': 2
}

Here comes the fun part! Exponentiation is an operator that has right-associativity instead of left-associativity. This means that the expression, 2 ^ 2 ^ 3 should return a value of 256.

text
Copied! ⭐️
2 ^ 2 ^ 3 = 2 ^ (2 ^ 3) = 2 ^ 8 = 256

Next, we need to add the exponentiation operator to our Infix method. But wait! Notice how we're subtracting one from the newPrec variable before passing it into the Expression method.

js
Copied! ⭐️
Infix(left, operatorType) {
  let token = this.eat(operatorType)
  let newPrec = this.operators[token.value]
  switch (token.type) {
    case TokenTypes.ADDITION:
      return left + this.Expression(newPrec)
    case TokenTypes.SUBTRACTION:
      return left - this.Expression(newPrec)
    case TokenTypes.MULTIPLICATION:
      return left * this.Expression(newPrec)
    case TokenTypes.DIVISION:
      return left / this.Expression(newPrec)
    case TokenTypes.EXPONENTIATION:
      return left ** this.Expression(newPrec - 1)
  }
}

To handle right associativity, we just need to subtract one from a precedence we pass into the Expression method! Yep, that's it! No rearranging the grammar and struggling to get everything working. I think I got a few gray hairs when making this tutorial series due to the exponentiation operation fighting so much with the unary operation. After making Pratt parsers, my hair has now returned to its original color and glamour ✨

Joking aside, let's look at the code for the parser so far. We'll also add some examples at the bottom of the code to make sure the exponentiation operator is working correctly.

js
Copied! ⭐️
const { TokenTypes, Tokenizer } = require('./Tokenizer')

class Parser {
  parse(input) {
    this.input = input
    this.tokenizer = new Tokenizer(input)
    this.lookahead = this.tokenizer.getNextToken()
    this.operators = {
      '^': 4,
      '*': 3,
      '/': 3,
      '+': 2,
      '-': 2
    }

    return this.Expression()
  }

  eat(tokenType) {
    const token = this.lookahead

    if (token == null) {
      throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
    }

    if (token.type !== tokenType) {
      throw new SyntaxError(
        `Unexpected token: "${token.value}", expected "${tokenType}"`
      )
    }

    this.lookahead = this.tokenizer.getNextToken()

    return token
  }

  getPrecedence(token) {
    if (token?.type) {
      return this.operators[token.value]
    }

    return 0
  }

  /**
   * Expression
   *    = Prefix (Infix)*
   */
  Expression(prec = 0) {
    let left = this.Prefix()

    while (prec < this.getPrecedence(this.lookahead)) {
      left = this.Infix(left, this.lookahead?.type)
    }

    return left
  }

  /**
   * Prefix
   *    = NUMBER
   */
  Prefix() {
    const token = this.eat(TokenTypes.NUMBER)
    return Number(token.value)
  }

  /**
   * Infix
   *    = ("+" / "-" / "*" / "/" / "^") Expression
   */
  Infix(left, operatorType) {
    let token = this.eat(operatorType)
    let newPrec = this.operators[token.value]
    switch (token.type) {
      case TokenTypes.ADDITION:
        return left + this.Expression(newPrec)
      case TokenTypes.SUBTRACTION:
        return left - this.Expression(newPrec)
      case TokenTypes.MULTIPLICATION:
        return left * this.Expression(newPrec)
      case TokenTypes.DIVISION:
        return left / this.Expression(newPrec)
      case TokenTypes.EXPONENTIATION:
        return left ** this.Expression(newPrec - 1)
    }
  }
}

const parser = new Parser()
console.log(parser.parse('2 ^ 2')) // 4
console.log(parser.parse('2 ^ 2 ^ 3')) // 256

When we run this code using the command, node Parser, we should see exponentiation working correctly.

Handling Parentheses

Alright, we've handled all the infix operations, so now we just need to add the prefix operations. We'll start with handling parentheses. Let's adjust our grammar blueprint to include a rule for ParenthesizedExpression inside the Prefix rule.

js
Copied! ⭐️
Expression
  = Prefix (Infix)*

Prefix
  = ParenthesizedExpression
  / NUMBER

Infix
  = ("+" / "-" / "*" / "/" / "^") Expression

ParenthesizedExpression
  = "(" Expression ")"

We'll adjust the Prefix method to call a new method, ParenthesizedExpression, when we encounter a left parenthesis (.

js
Copied! ⭐️
Prefix() {
  if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
    return this.ParenthesizedExpression()
  }

  const token = this.eat(TokenTypes.NUMBER)
  return Number(token.value)
}

Then, we'll define the ParenthesizedExpression in the Parser class and insert the following contents.

js
Copied! ⭐️
ParenthesizedExpression() {
  this.eat(TokenTypes.PARENTHESIS_LEFT)
  const expression = this.Expression()
  this.eat(TokenTypes.PARENTHESIS_RIGHT)
  return expression
}

We consume a ( token, run the Expression method (which defaults to precedence of zero when no parameter is given), and then consume a ) token.

Our parser code should now look like the following.

js
Copied! ⭐️
const { TokenTypes, Tokenizer } = require('./Tokenizer')

class Parser {
  parse(input) {
    this.input = input
    this.tokenizer = new Tokenizer(input)
    this.lookahead = this.tokenizer.getNextToken()
    this.operators = {
      '^': 4,
      '*': 3,
      '/': 3,
      '+': 2,
      '-': 2
    }

    return this.Expression()
  }

  eat(tokenType) {
    const token = this.lookahead

    if (token == null) {
      throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
    }

    if (token.type !== tokenType) {
      throw new SyntaxError(
        `Unexpected token: "${token.value}", expected "${tokenType}"`
      )
    }

    this.lookahead = this.tokenizer.getNextToken()

    return token
  }

  getPrecedence(token) {
    if (token?.type) {
      return this.operators[token.value]
    }

    return 0
  }

  /**
   * Expression
   *    = Prefix (Infix)*
   */
  Expression(prec = 0) {
    let left = this.Prefix()

    while (prec < this.getPrecedence(this.lookahead)) {
      left = this.Infix(left, this.lookahead?.type)
    }

    return left
  }

  /**
   * Prefix
   *    = ParenthesizedExpression
   *    / NUMBER
   */
  Prefix() {
    if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
      return this.ParenthesizedExpression()
    }
  
    const token = this.eat(TokenTypes.NUMBER)
    return Number(token.value)
  }

  /**
   * Infix
   *    = ("+" / "-" / "*" / "/" / "^") Expression
   */
  Infix(left, operatorType) {
    let token = this.eat(operatorType)
    let newPrec = this.operators[token.value]
    switch (token.type) {
      case TokenTypes.ADDITION:
        return left + this.Expression(newPrec)
      case TokenTypes.SUBTRACTION:
        return left - this.Expression(newPrec)
      case TokenTypes.MULTIPLICATION:
        return left * this.Expression(newPrec)
      case TokenTypes.DIVISION:
        return left / this.Expression(newPrec)
      case TokenTypes.EXPONENTIATION:
        return left ** this.Expression(newPrec - 1)
    }
  }

  /**
   * ParenthesizedExpression
   *    = "(" Expression ")"
   */
  ParenthesizedExpression() {
    this.eat(TokenTypes.PARENTHESIS_LEFT)
    const expression = this.Expression()
    this.eat(TokenTypes.PARENTHESIS_RIGHT)
    return expression
  }
}

const parser = new Parser()
console.log(parser.parse('(1 + 2) * 3')) // 9
console.log(parser.parse('(2 ^ 2) ^ 3')) // 64

Handling Unary Operations

It's time to add the dreaded unary operations to our parser. Let's update our grammar to include a UnaryExpression rule inside the Prefix rule.

js
Copied! ⭐️
Expression
  = Prefix (Infix)*

Prefix
  = ParenthesizedExpression
  / UnaryExpression
  / NUMBER

Infix
  = ("+" / "-" / "*" / "/" / "^") Expression

ParenthesizedExpression
  = "(" Expression ")"

UnaryExpression
  = "-" Expression

Notice that we're using "-" Expression in our grammar rule for UnaryExpression this time. In previous tutorials, we were using "-" Factor. Since the Expression method in our Pratt parser will automatically deal with operator precedence, we can safely use "-" Expression in our grammar.

As mentioned earlier in this tutorial, the unary operator uses the same symbol, -, as subtraction. However, a unary operation technically has higher precedence than multiplication/division but lower precedence than exponentiation. Therefore, we need to update our operators object to handle a new unary operator.

js
Copied! ⭐️
this.operators = {
  '^': 5,
  unary: 4,
  '*': 3,
  '/': 3,
  '+': 2,
  '-': 2
}

If we weren't using Pratt parsers, making an adjustment to existing parser code to handle unary operations would be a huge pain. Luckily, you're reading this amazing tutorial and building a Pratt parser instead.

We now need to adjust the Prefix method to call a new method, UnaryExpression, when we encounter a subtraction token, -.

js
Copied! ⭐️
Prefix() {
  if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
    return this.ParenthesizedExpression()
  }
  
  if (this.lookahead.type === TokenTypes.SUBTRACTION) {
    return this.UnaryExpression()
  }

  const token = this.eat(TokenTypes.NUMBER)
  return Number(token.value)
}

Next, we'll define the UnaryExpression method in the Parser class and insert the following contents.

js
Copied! ⭐️
UnaryExpression() {
  this.eat(TokenTypes.SUBTRACTION)
  return -this.Expression(this.getPrecedence('unary'))
}

Notice how we pass the string, unary, into our getPrecedence method. Currently, this method expects an object with a type property. Otherwise, it will return zero. Let's add a conditional statement to check if the received token is a "virtual" unary token.

js
Copied! ⭐️
getPrecedence(token) {
  if (token === 'unary') {
    return this.operators.unary;
  }

  if (token?.type) {
    return this.operators[token.value]
  }

  return 0
}

Our parser code should now look like the following.

js
Copied! ⭐️
const { TokenTypes, Tokenizer } = require('./Tokenizer')

class Parser {
  parse(input) {
    this.input = input
    this.tokenizer = new Tokenizer(input)
    this.lookahead = this.tokenizer.getNextToken()
    this.operators = {
      '^': 5,
      unary: 4,
      '*': 3,
      '/': 3,
      '+': 2,
      '-': 2
    }

    return this.Expression()
  }

  eat(tokenType) {
    const token = this.lookahead

    if (token == null) {
      throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
    }

    if (token.type !== tokenType) {
      throw new SyntaxError(
        `Unexpected token: "${token.value}", expected "${tokenType}"`
      )
    }

    this.lookahead = this.tokenizer.getNextToken()

    return token
  }

  getPrecedence(token) {
    if (token === 'unary') {
      return this.operators.unary;
    }

    if (token?.type) {
      return this.operators[token.value]
    }

    return 0
  }

  /**
   * Expression
   *    = Prefix (Infix)*
   */
  Expression(prec = 0) {
    let left = this.Prefix()

    while (prec < this.getPrecedence(this.lookahead)) {
      left = this.Infix(left, this.lookahead?.type)
    }

    return left
  }

  /**
   * Prefix
   *    = ParenthesizedExpression
   *    / UnaryExpression
   *    / NUMBER
   */
  Prefix() {
    if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
      return this.ParenthesizedExpression()
    }

    if (this.lookahead.type === TokenTypes.SUBTRACTION) {
      return this.UnaryExpression()
    }

    const token = this.eat(TokenTypes.NUMBER)
    return Number(token.value)
  }

  /**
   * Infix
   *    = ("+" / "-" / "*" / "/" / "^") Expression
   */
  Infix(left, operatorType) {
    let token = this.eat(operatorType)
    let newPrec = this.operators[token.value]
    switch (token.type) {
      case TokenTypes.ADDITION:
        return left + this.Expression(newPrec)
      case TokenTypes.SUBTRACTION:
        return left - this.Expression(newPrec)
      case TokenTypes.MULTIPLICATION:
        return left * this.Expression(newPrec)
      case TokenTypes.DIVISION:
        return left / this.Expression(newPrec)
      case TokenTypes.EXPONENTIATION:
        return left ** this.Expression(newPrec - 1)
    }
  }

  /**
   * ParenthesizedExpression
   *    = "(" Expression ")"
   */
  ParenthesizedExpression() {
    this.eat(TokenTypes.PARENTHESIS_LEFT)
    const expression = this.Expression()
    this.eat(TokenTypes.PARENTHESIS_RIGHT)
    return expression
  }

  /**
   * UnaryExpression
   *    = "-" Expression
   */
  UnaryExpression() {
    this.eat(TokenTypes.SUBTRACTION)
    return -this.Expression(this.getPrecedence('unary'))
  }
}

const parser = new Parser()
console.log(parser.parse('-2')) // -2
console.log(parser.parse('-2 + 3')) // 1
console.log(parser.parse('-2 ^ 2 + 1')) // -3
console.log(parser.parse('-(2 * 2)')) // -4
console.log(parser.parse('- - 2')) // 2
console.log(parser.parse('- - - 2')) // -2

Notice the examples we added at the bottom of the code. The unary operation should have lower precedence than exponentiation, and we can use multiple - symbols to stack unary operations.

Handling Trigonometric Functions

Finally, let's add support for trigonometric functions to our parser. We'll update our grammar to include a FunctionExpression inside the Prefix rule.

js
Copied! ⭐️
Expression
  = Prefix (Infix)*

Prefix
  = ParenthesizedExpression
  / UnaryExpression
  / FunctionExpression
  / NUMBER

Infix
  = ("+" / "-" / "*" / "/" / "^") Expression

ParenthesizedExpression
  = "(" Expression ")"

UnaryExpression
  = "-" Expression

FunctionExpression
  = IDENTIFIER ParenthesizedExpression

An example of a FunctionExpression would be cos(3.14) where cos is an IDENTIFIER token and (3.14) is a ParenthesizedExpression.

Let's adjust the Prefix method to call FunctionExpression when we encounter an IDENTIFIER token.

js
Copied! ⭐️
Prefix() {
  if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
    return this.ParenthesizedExpression()
  }
  
  if (this.lookahead.type === TokenTypes.SUBTRACTION) {
    return this.UnaryExpression()
  }

  if (this.lookahead.type === TokenTypes.IDENTIFIER) {
    return this.FunctionExpression()
  } 

  const token = this.eat(TokenTypes.NUMBER)
  return Number(token.value)
}

Next, we'll define the FunctionExpression method in the Parser class and insert the following contents.

js
Copied! ⭐️
FunctionExpression() {
  const id = this.eat(TokenTypes.IDENTIFIER).value
  const expression = this.ParenthesizedExpression()
  return this.trig(id, expression)
}

This method calls a new function called trig that will evaluate the trigonometric functions: sin, cos, and tan. This trig function will be identical to the function we used in the previous tutorials.

js
Copied! ⭐️
trig(id, value) {
    switch (id) {
      case 'sin':
        return Math.sin(value)
      case 'cos':
        return Math.cos(value)
      case 'tan':
        return Math.tan(value)
      default:
        throw new Error(`Invalid operation: ${id}`)
    }
  }

After making the above changes, our Parser.js file should look similar to the following.

js
Copied! ⭐️
const { TokenTypes, Tokenizer } = require('./Tokenizer')

class Parser {
  parse(input) {
    this.input = input
    this.tokenizer = new Tokenizer(input)
    this.lookahead = this.tokenizer.getNextToken()
    this.operators = {
      '^': 5,
      unary: 4,
      '*': 3,
      '/': 3,
      '+': 2,
      '-': 2
    }

    return this.Expression()
  }

  eat(tokenType) {
    const token = this.lookahead

    if (token == null) {
      throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
    }

    if (token.type !== tokenType) {
      throw new SyntaxError(
        `Unexpected token: "${token.value}", expected "${tokenType}"`
      )
    }

    this.lookahead = this.tokenizer.getNextToken()

    return token
  }

  getPrecedence(token) {
    if (token === 'unary') {
      return this.operators.unary
    }

    if (token?.type) {
      return this.operators[token.value]
    }

    return 0
  }

  trig(id, value) {
    switch (id) {
      case 'sin':
        return Math.sin(value)
      case 'cos':
        return Math.cos(value)
      case 'tan':
        return Math.tan(value)
      default:
        throw new Error(`Invalid operation: ${id}`)
    }
  }

  /**
   * Expression
   *    = Prefix (Infix)*
   */
  Expression(prec = 0) {
    let left = this.Prefix()

    while (prec < this.getPrecedence(this.lookahead)) {
      left = this.Infix(left, this.lookahead?.type)
    }

    return left
  }

  /**
   * Prefix
   *    = ParenthesizedExpression
   *    / UnaryExpression
   *    / FunctionExpression
   *    / NUMBER
   */
  Prefix() {
    if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
      return this.ParenthesizedExpression()
    }

    if (this.lookahead.type === TokenTypes.SUBTRACTION) {
      return this.UnaryExpression()
    }

    if (this.lookahead.type === TokenTypes.IDENTIFIER) {
      return this.FunctionExpression()
    }

    const token = this.eat(TokenTypes.NUMBER)
    return Number(token.value)
  }

  /**
   * Infix
   *    = ("+" / "-" / "*" / "/" / "^") Expression
   */
  Infix(left, operatorType) {
    let token = this.eat(operatorType)
    let newPrec = this.operators[token.value]
    switch (token.type) {
      case TokenTypes.ADDITION:
        return left + this.Expression(newPrec)
      case TokenTypes.SUBTRACTION:
        return left - this.Expression(newPrec)
      case TokenTypes.MULTIPLICATION:
        return left * this.Expression(newPrec)
      case TokenTypes.DIVISION:
        return left / this.Expression(newPrec)
      case TokenTypes.EXPONENTIATION:
        return left ** this.Expression(newPrec - 1)
    }
  }

  /**
   * ParenthesizedExpression
   *    = "(" Expression ")"
   */
  ParenthesizedExpression() {
    this.eat(TokenTypes.PARENTHESIS_LEFT)
    const expression = this.Expression()
    this.eat(TokenTypes.PARENTHESIS_RIGHT)
    return expression
  }

  /**
   * UnaryExpression
   *    = "-" Expression
   */
  UnaryExpression() {
    this.eat(TokenTypes.SUBTRACTION)
    return -this.Expression(this.getPrecedence('unary'))
  }

  /**
   * FunctionExpression
   *    = IDENTIFIER ParenthesizedExpression
   */
  FunctionExpression() {
    const id = this.eat(TokenTypes.IDENTIFIER).value
    const expression = this.ParenthesizedExpression()
    return this.trig(id, expression)
  }
}

We can test the trigonometric functions by adding the following code at the bottom of the Parser.js file and running the command, node Parser in the command line.

js
Copied! ⭐️
const parser = new Parser()
console.log(parser.parse('sin(0)')) // 0
console.log(parser.parse('cos(0)')) // 1
console.log(parser.parse('tan(0)')) // 0

Testing the Pratt Parser

Finally, our math expression Pratt parser is fully complete! Let's run some tests to make sure our Pratt parser passes the same tests as the recursive descent parser we built in Part 14 of this tutorial series.

Add the following line at the end of our Parser.js file to export our Parser class.

js
Copied! ⭐️
module.exports = { Parser }

Then, we'll reuse the test.js file from Part 14.

js
Copied! ⭐️
const assert = require('assert').strict
const { Parser } = require('./Parser')

const mathTests = [
  ['1', 1],
  [' 2 ', 2],
  ['1 + 2', 3],
  [' 1 + 2 ', 3],
  ['1 + 2 * 3', 7],
  ['(1 + 2) * 3', 9],
  ['5 - 2', 3],
  ['5 - 2 - 1', 2],
  ['12 / 2 / 3', 2],
  ['2 ^ 3 + 1', 9],
  ['-2 ^ 2', -4],
  ['(-2) ^ 2', 4],
  ['-2 ^ 2 + 1', -3],
  ['cos(0) + 3 * -4 / -2 ^ 2', 4]
]

const parser = new Parser()
let result

for (const [expression, expected] of mathTests) {
  try {
    result = parser.parse(expression)
    assert.equal(result, expected)
  } catch (err) {
    const lines = '-'.repeat(process.stdout.columns)
    console.log(lines)
    console.log(`Expression failed: "${expression}"`)
    console.log(`Expected result: ${expected}`)
    console.log(`Actual result: ${result}`)
    console.log(lines)
    throw err
  }
}

console.log('All tests passed! 🎉')

We can execute this file using the following command:

shell
Copied! ⭐️
node tests

After running this command, all the tests should pass 😁

Conclusion

Congrats, friend! You did it! You can parse anything now! 🎉

Pratt parsers are very elegant and popular, and I hope you've seen why. They're much easier to implement than recursive descent, more easily configurable, and make it easy to keep track of the order of mathematical operations.

Pratt parsing is my favorite way to create parsers. We can even use the knowledge we learned today to create a small programming language. In a future article, we'll create our own tiny programming language to support variables in math expressions, so get ready 😉

If this series has helped you in any way, please consider donating to my cause! I love talking about obscure topics and explaining them in a simple and detailed way 💚

In the next tutorial, we'll learn how to create abstract syntax trees (AST) by adjusting our Pratt parser. See you there! 🙂

Resources