How to Build a Recursive Descent Parser for Math Expressions in JavaScript
Greetings, friends! This is my fourteenth post in the math expression parser series. In the last few tutorials, we discussed how to make a parser using parser generators. Now, it's time to build parsers manually again. In this tutorial, we'll learn how to build a recursive descent parser from scratch using JavaScript!
What is a Recursive Descent Parser?
A recursive descent parser is a super popular type of top-down parser built using recursive functions. Many hand-written parsers use some form of recursive descent, including the parsers inside popular libraries like Babel and C++ compilers such as Clang and GCC. Recursive descent parsers are everywhere because they're fast, straightforward to write, and give their authors full control over error messages.
There are two main types of recursive descent parsers:
- Predictive parsers
- Backtracking parsers
A backtracking parser tries a production rule at each step of the grammar and, if it doesn't work out, backtracks and tries a different rule until it finds one that succeeds. Too much backtracking can slow down the parser, but memoization (caching) can help speed things up.
A predictive parser uses lookahead tokens to figure out which grammar rule or production to apply without any backtracking. That is, the parser never rewinds to try a different rule. A predictive parser is typically much quicker, but it doesn't support as many grammars as a backtracking parser.
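To make the difference concrete, here's a tiny sketch of my own (not part of the parser we'll build, and all names here are illustrative) showing both styles deciding between a parenthesized expression and a plain number:
const tokens = ['(', '1', ')']
let cursor = 0

function number() {
  if (!/^\d+$/.test(tokens[cursor])) throw new SyntaxError('Expected a number')
  return Number(tokens[cursor++])
}

function parenthesized() {
  if (tokens[cursor] !== '(') throw new SyntaxError('Expected "("')
  cursor++
  const value = primaryPredictive()
  if (tokens[cursor] !== ')') throw new SyntaxError('Expected ")"')
  cursor++
  return value
}

// Predictive style: one lookahead token decides which rule to use, no rewinding
function primaryPredictive() {
  return tokens[cursor] === '(' ? parenthesized() : number()
}

// Backtracking style: try an alternative and rewind the cursor when it fails
function primaryBacktracking() {
  const saved = cursor
  try {
    return parenthesized()
  } catch (err) {
    cursor = saved // backtrack, then try the next alternative
    return number()
  }
}

console.log(primaryPredictive()) // 1
cursor = 0
console.log(primaryBacktracking()) // 1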
In this tutorial, we will build a predictive recursive descent parser because our grammar only needs to support math expressions. Even for something as complex as a programming language, predictive parsing is usually the preferred approach when writing a recursive descent parser.
As mentioned in previous tutorials, LR parsers can handle more grammars than LL parsers, so where do recursive descent parsers fit in? Generally, a recursive descent parser can handle any LL grammar, and sometimes more, thanks to the flexibility of being hand-written. For languages as complicated as C or C++, recursive descent parsers rely on extra tricks, such as tracking context, to support the more complicated parts of the syntax.
Building a Tokenizer for Recursive Descent Parsers
In this tutorial, we're going to build the parser completely from scratch, but we can reuse the updated tokenizer from Part 6 of this series to save some time. Please consult Part 5 to learn how we originally constructed this tokenizer.
Create a new file called Tokenizer.js with the following contents:
const TokenTypes = {
NUMBER: 'NUMBER',
IDENTIFIER: 'IDENTIFIER',
ADDITION: '+',
SUBTRACTION: '-',
MULTIPLICATION: '*',
DIVISION: '/',
EXPONENTIATION: '^',
PARENTHESIS_LEFT: '(',
PARENTHESIS_RIGHT: ')'
};
const TokenSpec = [
[/^\s+/, null],
[/^(?:\d+(?:\.\d*)?|\.\d+)/, TokenTypes.NUMBER],
[/^[a-z]+/, TokenTypes.IDENTIFIER],
[/^\+/, TokenTypes.ADDITION],
[/^\-/, TokenTypes.SUBTRACTION],
[/^\*/, TokenTypes.MULTIPLICATION],
[/^\//, TokenTypes.DIVISION],
[/^\^/, TokenTypes.EXPONENTIATION],
[/^\(/, TokenTypes.PARENTHESIS_LEFT],
[/^\)/, TokenTypes.PARENTHESIS_RIGHT]
];
class Tokenizer {
constructor(input) {
this.input = input;
this.cursor = 0;
}
hasMoreTokens() {
return this.cursor < this.input.length;
}
match(regex, inputSlice) {
const matched = regex.exec(inputSlice);
if (matched === null) {
return null;
}
this.cursor += matched[0].length;
return matched[0];
}
getNextToken() {
if (!this.hasMoreTokens()) {
return null;
}
const inputSlice = this.input.slice(this.cursor);
for (let [regex, type] of TokenSpec) {
const tokenValue = this.match(regex, inputSlice);
if (tokenValue === null) {
continue;
}
if (type === null) {
return this.getNextToken();
}
return {
type,
value: tokenValue,
};
}
throw new SyntaxError(`Unexpected token: "${inputSlice[0]}"`);
}
}
module.exports = {
TokenTypes,
Tokenizer
}
This tokenizer will create tokens for decimal numbers, words, and mathematical symbols, and it will skip all whitespace. Each token has a type and a value property. The getNextToken method retrieves the next token, using slice so that it only looks at the remaining text from the input.
Given the input sin(123), the tokenizing process will look similar to the following:
const input = 'sin(123)';
const tokenizer = new Tokenizer(input);
const token1 = tokenizer.getNextToken();
// inputSlice = 'sin(123)'
// token1 = { type: 'IDENTIFIER', value: 'sin' }
const token2 = tokenizer.getNextToken();
// inputSlice = '(123)'
// token2 = { type: '(', value: '(' }
const token3 = tokenizer.getNextToken();
// inputSlice = '123)'
// token3 = { type: 'NUMBER', value: '123' }
const token4 = tokenizer.getNextToken();
// inputSlice = ')'
// token4 = { type: ')', value: ')' }
const token5 = tokenizer.getNextToken();
// token5 = null
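If you'd like to see the whole token stream at once, a small helper loop does the trick. This tokenize function is just an illustration of mine, not something the Tokenizer class provides:
const { Tokenizer } = require('./Tokenizer')

// Drain every token from the tokenizer into an array
function tokenize(input) {
  const tokenizer = new Tokenizer(input)
  const tokens = []
  let token
  while ((token = tokenizer.getNextToken()) !== null) {
    tokens.push(token)
  }
  return tokens
}

console.log(tokenize('2 * (3 + 4)'))
// [
//   { type: 'NUMBER', value: '2' },
//   { type: '*', value: '*' },
//   { type: '(', value: '(' },
//   { type: 'NUMBER', value: '3' },
//   { type: '+', value: '+' },
//   { type: 'NUMBER', value: '4' },
//   { type: ')', value: ')' }
// ]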
Starting our Recursive Descent Parser
Now that we have a tokenizer, it's time to start building a recursive descent parser. When constructing parsers by hand, it's still a good idea to write our own grammar to serve as the blueprint for the parser. We'll use a parsing expression grammar (PEG) because PEGs naturally map onto recursive descent parsers.
Let's create a simple grammar with a primary expression that derives only to a NUMBER token, representing a decimal number.
Expression
= Primary
Primary
= NUMBER
This grammar won't be stored in a file anywhere (unless you want to), but it'll help serve as our blueprint when adding more features to our recursive descent parser.
Let's see how we would create a recursive descent parser that follows this "blueprint". Create a new file called Parser.js with the following contents.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
/**
* Expression
* = Primary
*/
Expression() {
return this.Primary();
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
const input = '12';
const parser = new Parser();
const result = parser.parse(input);
console.log(result); // 12
We can run the parser as a Node.js script using the following command:
node Parser
# Output: 12
Let's analyze what's happening. First, we import Tokenizer and TokenTypes from our Tokenizer script. We then created a class called Parser that contains two important methods: parse and eat. Gotta parse and eat every day, am I right? 😂
The parse method takes care of initializing our tokenizer with the provided input. It also sets the lookahead token to the very first token in our input. Then, we return this.Expression() to begin the recursive descent. The Expression method, in turn, calls the Primary method.
Inside the Primary method, we are "eating" or "consuming" a NUMBER token. The eat method expects the token to be of a certain type. If it matches the given type, it consumes the token and moves the cursor to a new position in the input string.
Remember, the getNextToken method is responsible for moving our cursor to the spot directly after the consumed token. Since the string '12' was the only input to the parser, the parse completes after consuming one token. Finally, we return the numeric result, 12.
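If the lookahead doesn't match what eat expects, the parser throws a SyntaxError. For example, with our current NUMBER-only grammar, a word triggers the mismatch branch:
const parser = new Parser()

try {
  parser.parse('abc')
} catch (err) {
  console.log(err.message) // Unexpected token: "abc", expected "NUMBER"
}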
Notice how we put the production rules in comments above the Expression and Primary methods. This helps us visualize where we are in the grammar as our recursive descent parser continues to grow. In our grammar, Expression and Primary represent nonterminal symbols, and NUMBER represents our terminal symbol.
We recursively call methods corresponding to nonterminal symbols until we've reached the end of the input, hence the name recursive descent parser! 😃
Handling Addition
Let the recursion continue! It's time to update our grammar "blueprint" to allow addition.
Expression
= Primary ("+" Primary)*
Primary
= NUMBER
This grammar uses EBNF-like syntax, similar to the previous tutorial. The syntax (expression)* means that zero or more occurrences are allowed. It's like saying the Expression resolves to the following:
Primary + Primary + Primary + ...
To handle a chain of additions that can be arbitrarily long, we can use a while loop that checks whether there's another + symbol. You can see this in the example below.
Let's update our Parser.js file to have the following contents.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
this.lookahead = this.tokenizer.getNextToken()
return token
}
/**
* Expression
* = Primary ("+" Primary)*
*/
Expression() {
let left = this.Primary()
// We use the optional chaining operator "?." because the lookahead can be null
while (this.lookahead?.type === TokenTypes.ADDITION) {
this.eat(TokenTypes.ADDITION)
const right = this.Primary()
left = left + right
}
return left
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
const parser = new Parser();
console.log(parser.parse('1 + 2')); // 3
console.log(parser.parse('1 + 2 + 3')); // 6
Remember! Our tokenizer is already designed to skip whitespace for us, so we don't need to worry about it in our grammar or parser.
Take a moment to go through this program very carefully. As we can see, more recursion occurs as our grammar grows and as more operators appear in our math expression. After the parse completes, there should be no tokens left, and the parse method should return a result.
If you need help debugging this parser, I would suggest using the --inspect-brk option when executing the JavaScript file with Node.js.
node --inspect-brk Parser.js
I have an excellent tutorial on debugging Node.js apps with Chrome DevTools, so check it out!
Handling Subtraction
Let's update our grammar "blueprint" to include subtraction.
Expression
= Primary (("+" / "-") Primary)*
Primary
= NUMBER
Adding subtraction to our parser is super easy. We just add a conditional statement in the Expression method to determine whether the operator is addition or subtraction. Then, we consume that operator token and perform the corresponding operation.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
/**
* Expression
* = Primary (("+" / "-") Primary)*
*/
Expression() {
let left = this.Primary()
while (
this.lookahead?.type === TokenTypes.ADDITION ||
this.lookahead?.type === TokenTypes.SUBTRACTION
) {
if (this.lookahead?.type === TokenTypes.ADDITION) {
this.eat(TokenTypes.ADDITION)
const right = this.Primary()
left = left + right
} else {
this.eat(TokenTypes.SUBTRACTION)
const right = this.Primary()
left = left - right
}
}
return left
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
const parser = new Parser()
console.log(parser.parse('5 - 2')) // 3
console.log(parser.parse('5 - 2 - 1')) // 2
Handling Multiplication and Division
Similar to the previous tutorial, we can create a Term rule to handle multiplication and division.
Expression
= Term (("+" / "-") Term)*
Term
= Primary (("*" / "/") Primary)*
Primary
= NUMBER
In our recursive descent parser, we'll need to create a new method called Term. The Expression method will now invoke Term instead of Primary.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
let left = this.Term()
while (
this.lookahead?.type === TokenTypes.ADDITION ||
this.lookahead?.type === TokenTypes.SUBTRACTION
) {
if (this.lookahead?.type === TokenTypes.ADDITION) {
this.eat(TokenTypes.ADDITION)
const right = this.Term()
left = left + right
} else {
this.eat(TokenTypes.SUBTRACTION)
const right = this.Term()
left = left - right
}
}
return left
}
/**
* Term
* = Primary (("*" / "/") Primary)*
*/
Term() {
let left = this.Primary()
while (
this.lookahead?.type === TokenTypes.MULTIPLICATION ||
this.lookahead?.type === TokenTypes.DIVISION
) {
if (this.lookahead?.type === TokenTypes.MULTIPLICATION) {
this.eat(TokenTypes.MULTIPLICATION)
const right = this.Primary()
left = left * right
} else {
this.eat(TokenTypes.DIVISION)
const right = this.Primary()
left = left / right
}
}
return left
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
const parser = new Parser()
console.log(parser.parse('12 / 2')) // 6
console.log(parser.parse('12 / 2 / 3')) // 2
Handling Exponentiation
We handled exponentiation in the previous tutorial by creating a new grammar rule called Factor. The Term rule will now derive into Factor instead of Primary.
Expression
= Term (("+" / "-") Term)*
Term
= Factor (("*" / "/") Factor)*
Factor
= Primary ("^" Factor)*
Primary
= NUMBER
In our recursive descent parser, we'll need to create a new method called Factor. The Term method will now invoke Factor instead of Primary.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
let left = this.Term()
while (
this.lookahead?.type === TokenTypes.ADDITION ||
this.lookahead?.type === TokenTypes.SUBTRACTION
) {
if (this.lookahead?.type === TokenTypes.ADDITION) {
this.eat(TokenTypes.ADDITION)
const right = this.Term()
left = left + right
} else {
this.eat(TokenTypes.SUBTRACTION)
const right = this.Term()
left = left - right
}
}
return left
}
/**
* Term
* = Factor (("*" / "/") Factor)*
*/
Term() {
let left = this.Factor()
while (
this.lookahead?.type === TokenTypes.MULTIPLICATION ||
this.lookahead?.type === TokenTypes.DIVISION
) {
if (this.lookahead?.type === TokenTypes.MULTIPLICATION) {
this.eat(TokenTypes.MULTIPLICATION)
const right = this.Factor()
left = left * right
} else {
this.eat(TokenTypes.DIVISION)
const right = this.Factor()
left = left / right
}
}
return left
}
/**
* Factor
* = Primary ("^" Factor)*
*/
Factor() {
let left = this.Primary()
while (this.lookahead?.type === TokenTypes.EXPONENTIATION) {
this.eat(TokenTypes.EXPONENTIATION)
const right = this.Factor()
left = left ** right
}
return left
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
const parser = new Parser()
console.log(parser.parse('2 ^ 2')) // 4
console.log(parser.parse('2 ^ 2 ^ 3')) // 256
Notice that the Factor grammar rule doesn't derive into Primary ^ Primary. Instead, it derives into Primary ^ Factor.
Factor
= Primary ("^" Factor)*
This ensures that the exponentiation operation is right associative, which matters in expressions such as 2 ^ 2 ^ 3, which should equal 256.
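JavaScript's own exponentiation operator happens to be right associative as well, so we can sanity-check the grouping directly in Node.js:
console.log((2 ** 2) ** 3) // 64, what left associativity would produce
console.log(2 ** (2 ** 3)) // 256, right associativity, matching our Factor rule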
Cleaning Up the Binary Expressions
As you may have noticed, there's duplicate logic in the Expression, Term, and Factor methods. We can create a BinaryExpression method that helps tidy up our code.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
BinaryExpression(leftRule, rightRule, operatorType1, operatorType2) {
let left = leftRule()
while (
this.lookahead &&
(this.lookahead.type === operatorType1 ||
this.lookahead.type === operatorType2)
) {
const operator = this.eat(this.lookahead.type).type
switch (operator) {
case TokenTypes.ADDITION:
left = left + rightRule()
break
case TokenTypes.SUBTRACTION:
left = left - rightRule()
break
case TokenTypes.MULTIPLICATION:
left = left * rightRule()
break
case TokenTypes.DIVISION:
left = left / rightRule()
break
case TokenTypes.EXPONENTIATION:
left = left ** rightRule()
break
}
}
return left
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
return this.BinaryExpression(
() => this.Term(),
() => this.Term(),
TokenTypes.ADDITION,
TokenTypes.SUBTRACTION
)
}
/**
* Term
* = Factor (("*" / "/") Factor)*
*/
Term() {
return this.BinaryExpression(
() => this.Factor(),
() => this.Factor(),
TokenTypes.MULTIPLICATION,
TokenTypes.DIVISION
)
}
/**
* Factor
* = Primary ("^" Factor)*
*/
Factor() {
return this.BinaryExpression(
() => this.Primary(),
() => this.Factor(),
TokenTypes.EXPONENTIATION
)
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
The BinaryExpression method accepts four parameters: leftRule, rightRule, operatorType1, and operatorType2. Let's look at how this method is used in the Expression method.
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
return this.BinaryExpression(
() => this.Term(),
() => this.Term(),
TokenTypes.ADDITION,
TokenTypes.SUBTRACTION
)
}
As indicated by our grammar blueprint, both the leftRule and rightRule are Term rules. We need to pass arrow functions as parameters since we're inside a class. If we passed this.Term directly instead of an arrow function, the method would lose its this binding when BinaryExpression calls it, and you would get an error similar to the following.
TypeError: Cannot read properties of undefined
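If you prefer, Function.prototype.bind also preserves this and works just as well as arrow functions. Here's the Expression method written that way, purely as an alternative sketch rather than a change we'll make in this tutorial:
/**
 * Expression
 * = Term (("+" / "-") Term)*
 */
Expression() {
  return this.BinaryExpression(
    this.Term.bind(this), // bind keeps "this" pointing at the parser instance
    this.Term.bind(this),
    TokenTypes.ADDITION,
    TokenTypes.SUBTRACTION
  )
}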
I hope this all made sense! Please feel free to choose whichever style you're comfortable with. As software developers, it's a good idea to keep code DRY whenever possible (unless you're building a hyper-performant application that targets a specific compiler's optimizations 😅).
Finally, we can test various mathematical expressions to make sure our parser still works correctly.
const parser = new Parser()
console.log(parser.parse('1 + 2 + 3')) // 6
console.log(parser.parse('5 - 2 - 1')) // 2
console.log(parser.parse('1 * 2 * 3')) // 6
console.log(parser.parse('12 / 2 / 3')) // 2
console.log(parser.parse('2 ^ 2')) // 4
console.log(parser.parse('2 ^ 2 ^ 3')) // 256
console.log(parser.parse('1 + 2 ^ 2 * 3 - 4 / 2')) // 11
Handling Parentheses
Now that our little detour is finished, let's add support for parentheses in our grammar blueprint.
Expression
= Term (("+" / "-") Term)*
Term
= Factor (("*" / "/") Factor)*
Factor
= Primary ("^" Factor)*
Primary
= ParenthesizedExpression
/ NUMBER
ParenthesizedExpression
= "(" Expression ")"
We will create a new method called ParenthesizedExpression that handles eating/consuming the parenthesis tokens ( ) and returns the expression between them.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
BinaryExpression(leftRule, rightRule, operatorType1, operatorType2) {
let left = leftRule()
while (
this.lookahead &&
(this.lookahead.type === operatorType1 ||
this.lookahead.type === operatorType2)
) {
const operator = this.eat(this.lookahead.type).type
switch (operator) {
case TokenTypes.ADDITION:
left = left + rightRule()
break
case TokenTypes.SUBTRACTION:
left = left - rightRule()
break
case TokenTypes.MULTIPLICATION:
left = left * rightRule()
break
case TokenTypes.DIVISION:
left = left / rightRule()
break
case TokenTypes.EXPONENTIATION:
left = left ** rightRule()
break
}
}
return left
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
return this.BinaryExpression(
() => this.Term(),
() => this.Term(),
TokenTypes.ADDITION,
TokenTypes.SUBTRACTION
)
}
/**
* Term
* = Factor (("*" / "/") Factor)*
*/
Term() {
return this.BinaryExpression(
() => this.Factor(),
() => this.Factor(),
TokenTypes.MULTIPLICATION,
TokenTypes.DIVISION
)
}
/**
* Factor
* = Primary ("^" Factor)*
*/
Factor() {
return this.BinaryExpression(
() => this.Primary(),
() => this.Factor(),
TokenTypes.EXPONENTIATION
)
}
/**
* Primary
* = ParenthesizedExpression
* / NUMBER
*/
Primary() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* ParenthesizedExpression
* = "(" Expression ")"
*/
ParenthesizedExpression() {
this.eat(TokenTypes.PARENTHESIS_LEFT)
const expression = this.Expression()
this.eat(TokenTypes.PARENTHESIS_RIGHT)
return expression
}
}
const parser = new Parser()
console.log(parser.parse('(1 + 2) * 3')) // 9
console.log(parser.parse('(2 ^ 2) ^ 3')) // 64
Notice how the Primary method now has a conditional statement that checks whether a left parenthesis ( token is encountered. If so, it returns a parenthesized expression instead of a number.
Handling Unary Operations
Similar to the parenthesized expression, we need to create another rule under Primary called UnaryExpression. Our grammar blueprint should now look like the following.
Expression
= Term (("+" / "-") Term)*
Term
= Factor (("*" / "/") Factor)*
Factor
= Primary ("^" Factor)*
Primary
= ParenthesizedExpression
/ UnaryExpression
/ NUMBER
ParenthesizedExpression
= "(" Expression ")"
UnaryExpression
= "-" Factor
We want the unary expression to resolve to "-" Factor instead of "-" Expression because we want exponentiation to have higher precedence than unary negation. Therefore, the math expression -2 ^ 2 should result in a value of -4.
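We can double-check that grouping against plain JavaScript, which uses the same precedence (note that JavaScript forces you to parenthesize a unary minus next to **):
console.log(-(2 ** 2)) // -4, how our parser reads "-2 ^ 2"
console.log((-2) ** 2) //  4, parentheses apply the minus first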
Let's implement the UnaryExpression method in our recursive descent parser.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
BinaryExpression(leftRule, rightRule, operatorType1, operatorType2) {
let left = leftRule()
while (
this.lookahead &&
(this.lookahead.type === operatorType1 ||
this.lookahead.type === operatorType2)
) {
const operator = this.eat(this.lookahead.type).type
switch (operator) {
case TokenTypes.ADDITION:
left = left + rightRule()
break
case TokenTypes.SUBTRACTION:
left = left - rightRule()
break
case TokenTypes.MULTIPLICATION:
left = left * rightRule()
break
case TokenTypes.DIVISION:
left = left / rightRule()
break
case TokenTypes.EXPONENTIATION:
left = left ** rightRule()
break
}
}
return left
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
return this.BinaryExpression(
() => this.Term(),
() => this.Term(),
TokenTypes.ADDITION,
TokenTypes.SUBTRACTION
)
}
/**
* Term
* = Factor (("*" / "/") Factor)*
*/
Term() {
return this.BinaryExpression(
() => this.Factor(),
() => this.Factor(),
TokenTypes.MULTIPLICATION,
TokenTypes.DIVISION
)
}
/**
* Factor
* = Primary ("^" Factor)*
*/
Factor() {
return this.BinaryExpression(
() => this.Primary(),
() => this.Factor(),
TokenTypes.EXPONENTIATION
)
}
/**
* Primary
* = ParenthesizedExpression
* / UnaryExpression
* / NUMBER
*/
Primary() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
if (this.lookahead.type === TokenTypes.SUBTRACTION) {
return this.UnaryExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* ParenthesizedExpression
* = "(" Expression ")"
*/
ParenthesizedExpression() {
this.eat(TokenTypes.PARENTHESIS_LEFT)
const expression = this.Expression()
this.eat(TokenTypes.PARENTHESIS_RIGHT)
return expression
}
/**
* UnaryExpression
* = "-" Factor
*/
UnaryExpression() {
this.eat(TokenTypes.SUBTRACTION) // consume the "-" token
return -this.Factor() // return the negative version of whatever Factor resolves to
}
}
const parser = new Parser()
console.log(parser.parse('-2 ^ 2')) // -4
console.log(parser.parse('-2 ^ 2 + 1')) // -3
console.log(parser.parse('-(2 * 3)')) // -6
We updated the Primary method to include a new conditional statement that checks whether a negative sign (or subtraction) - token is encountered. If so, it returns a unary expression.
Handling Trigonometric Functions
Next, we would like to add support for the trigonometric functions sin, cos, and tan. The tokenizer is already set up to handle words and return IDENTIFIER tokens. Let's add support for functions in our grammar blueprint.
Expression
= Term (("+" / "-") Term)*
Term
= Factor (("*" / "/") Factor)*
Factor
= Primary ("^" Factor)*
Primary
= ParenthesizedExpression
/ UnaryExpression
/ FunctionExpression
/ NUMBER
ParenthesizedExpression
= "(" Expression ")"
UnaryExpression
= "-" Factor
FunctionExpression
= IDENTIFIER ParenthesizedExpression
Notice that we added another option to our Primary grammar rule. The FunctionExpression rule will consume a word such as cos or sin, and then use the parenthesized expression to consume the parentheses.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
trig(id, value) {
switch (id) {
case 'sin':
return Math.sin(value)
case 'cos':
return Math.cos(value)
case 'tan':
return Math.tan(value)
default:
throw new Error(`Invalid operation: ${id}`)
}
}
BinaryExpression(leftRule, rightRule, operatorType1, operatorType2) {
let left = leftRule()
while (
this.lookahead &&
(this.lookahead.type === operatorType1 ||
this.lookahead.type === operatorType2)
) {
const operator = this.eat(this.lookahead.type).type
switch (operator) {
case TokenTypes.ADDITION:
left = left + rightRule()
break
case TokenTypes.SUBTRACTION:
left = left - rightRule()
break
case TokenTypes.MULTIPLICATION:
left = left * rightRule()
break
case TokenTypes.DIVISION:
left = left / rightRule()
break
case TokenTypes.EXPONENTIATION:
left = left ** rightRule()
break
}
}
return left
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
return this.BinaryExpression(
() => this.Term(),
() => this.Term(),
TokenTypes.ADDITION,
TokenTypes.SUBTRACTION
)
}
/**
* Term
* = Factor (("*" / "/") Factor)*
*/
Term() {
return this.BinaryExpression(
() => this.Factor(),
() => this.Factor(),
TokenTypes.MULTIPLICATION,
TokenTypes.DIVISION
)
}
/**
* Factor
* = Primary ("^" Factor)*
*/
Factor() {
return this.BinaryExpression(
() => this.Primary(),
() => this.Factor(),
TokenTypes.EXPONENTIATION
)
}
/**
* Primary
* = ParenthesizedExpression
* / UnaryExpression
* / FunctionExpression
* / NUMBER
*/
Primary() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
if (this.lookahead.type === TokenTypes.SUBTRACTION) {
return this.UnaryExpression()
}
if (this.lookahead.type === TokenTypes.IDENTIFIER) {
return this.FunctionExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* ParenthesizedExpression
* = "(" Expression ")"
*/
ParenthesizedExpression() {
this.eat(TokenTypes.PARENTHESIS_LEFT)
const expression = this.Expression()
this.eat(TokenTypes.PARENTHESIS_RIGHT)
return expression
}
/**
* UnaryExpression
* = "-" Factor
*/
UnaryExpression() {
this.eat(TokenTypes.SUBTRACTION)
return -this.Factor()
}
/**
* FunctionExpression
* = IDENTIFIER ParenthesizedExpression
*/
FunctionExpression() {
const id = this.eat(TokenTypes.IDENTIFIER).value
return this.trig(id, this.ParenthesizedExpression())
}
}
const parser = new Parser()
console.log(parser.parse('sin(0)')) // 0
console.log(parser.parse('cos(0)')) // 1
console.log(parser.parse('tan(0)')) // 0
console.log(parser.parse('-cos(0) + 3 ^ 2')) // 8
We updated the Primary method to include a new conditional statement that checks whether an IDENTIFIER token is encountered. If so, it returns a function expression. Inside the FunctionExpression method, we're calling a new function called trig. This function is identical to the one we used in the past few tutorials. We added the trig function underneath the eat method.
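As a side note, if you ever want to support more one-argument functions, one option (my own assumption, not something we do in this series) is to map identifiers onto Math methods instead of using a switch statement:
// Hypothetical replacement for trig() supporting a few extra functions
applyFunction(id, value) {
  const functions = {
    sin: Math.sin,
    cos: Math.cos,
    tan: Math.tan,
    sqrt: Math.sqrt,
    log: Math.log
  }
  if (!functions[id]) {
    throw new Error(`Invalid operation: ${id}`)
  }
  return functions[id](value)
}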
Running Quick Tests
As our grammar blueprint and parser grow and become more complicated, it's always best to run tests after adding a new feature. I would encourage everyone to use a testing library such as Jest or Vitest to test as many math expressions as possible!
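For reference, here's roughly what one of those tests could look like in Jest, assuming Jest is installed and that the Parser class is exported as shown below:
const { Parser } = require('./Parser')

test('respects operator precedence', () => {
  const parser = new Parser()
  expect(parser.parse('1 + 2 * 3')).toBe(7)
})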
Without installing a testing library, however, we can run some quick tests to make sure there are no errors in our math expression parser.
Add the following line at the end of our Parser.js file to export our Parser class.
module.exports = { Parser }
Then, create a new file called tests.js with the following contents.
const assert = require('assert').strict
const { Parser } = require('./Parser')
const mathTests = [
['1', 1],
[' 2 ', 2],
['1 + 2', 3],
[' 1 + 2 ', 3],
['1 + 2 * 3', 7],
['(1 + 2) * 3', 9],
['5 - 2', 3],
['5 - 2 - 1', 2],
['12 / 2 / 3', 2],
['2 ^ 3 + 1', 9],
['-2 ^ 2', -4],
['(-2) ^ 2', 4],
['-2 ^ 2 + 1', -3],
['cos(0) + 3 * -4 / -2 ^ 2', 4]
]
const parser = new Parser()
let result
for (const [expression, expected] of mathTests) {
try {
result = parser.parse(expression)
assert.equal(result, expected)
} catch (err) {
const lines = '-'.repeat(process.stdout.columns)
console.log(lines)
console.log(`Expression failed: "${expression}"`)
console.log(`Expected result: ${expected}`)
console.log(`Actual result: ${result}`)
console.log(lines)
throw err
}
}
console.log('All tests passed! 🎉')
We can execute this script using the command node tests. If all the tests pass, we should see a celebratory message! 🎉
If any tests fail, then we should see an error message. We can add the following test at the end of the mathTests array to force a failure.
['1 + 2', 4]
If we scroll to the top of the error message, we should see something similar to the following:
----------------------------------------------------
Expression failed: "1 + 2"
Expected result: 4
Actual result: 3
----------------------------------------------------
This is just one way of quickly testing our math expressions! Cool, huh?
Passing in Math Expressions from the Command Line
Sometimes, it's nice to use a command line interface (CLI) to parse math expressions. Let's see how we can turn our parser into our own CLI tool. Node.js provides a special array called process.argv that contains all the text we pass on the command line when executing a JavaScript file.
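For example, here's roughly what that array holds when we run our script with an expression (the paths are placeholders):
// node Parser '1 + 2 * 3'
// process.argv:
// [
//   '/path/to/node',      // process.argv[0]: the Node.js executable
//   '/path/to/Parser.js', // process.argv[1]: the script being run
//   '1 + 2 * 3'           // process.argv[2]: our math expression
// ]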
Specifically, process.argv[2] will hold the math expression we pass into our Node.js script. Add the following code to the end of the Parser.js file.
const input = process.argv[2]
const parser = new Parser()
console.log(parser.parse(input))
Then, run the following command:
node Parser '1 + 2 * 3'
We should see 7 appear in the same terminal we used to execute the Parser.js script. We now have the ability to run our parser as a CLI tool! 🎉
Conclusion
Congrats, friend! You made it to the end! You now have the power to parse anything! Recursive descent parsing is a very powerful and popular algorithm used everywhere. It's so simple yet elegant.
If this series has helped you in any way, please consider donating to my cause! I would be eternally grateful 🙇
These tutorials take a while to write and get right 😅
In the next tutorial, we'll learn how to create abstract syntax trees (AST) of our math expressions by modifying our recursive descent parser! Until then, happy coding! ⭐