Shunting Yard Algorithm in JavaScript

Published: Monday, February 6, 2023

Greetings, friends! This is my third post in the math expression parser series. In this tutorial, we'll learn how to build a math expression parser and evaluator using the Shunting yard algorithm. This tutorial is quite long, so if you're an experienced developer who just wants to see the code, you can see the completed code at the bottom of this page. However, I encourage everyone to read along as I try to explain my thought process behind the Shunting yard algorithm implementation ๐Ÿ˜Š

Introduction

The Shunting yard algorithm is a method for parsing arithmetical or logical expressions. It is commonly used to convert a mathematical expression from infix notation to Reverse Polish notation (RPN), also known as postfix notation.

Once we have the math expression in RPN/postfix form, we can use the RPN evaluator we built in the previous tutorial to solve the math expression. However, we'll see in the next tutorial how we can skip this step and evaluate a math expression during the shunting yard algorithm instead of waiting until after. This will make our math expression evaluator faster by skipping an unnecessary step.

Then, we'll learn how to produce abstract syntax trees (AST) which provide a great way to visualize the order of operations taking place. Programming languages often convert expressions into abstract syntax trees, so this will be good practice for when we may want to create a small programming language later.

Basic Idea for Shunting Yard Algorithm

There is a really good algorithm on Wikipedia written in pseudocode that describes the steps toward building our math expression parser. However, it'd be nice to see the actual code implementation instead. Luckily, I built out an implementation in JavaScript and will share it with you later in this tutorial ๐Ÿ˜Š

Similar to the RPN evaluator, we need to leverage a stack for the Shunting yard algorithm implementation. Wikipedia mentions that we also need a "queue" for the output, but we'll just use a string instead and add characters to it. The output string will eventually contain the RPN/postfix form of the input string that was in infix form.

Here's the basic idea for the Shunting yard algorithm:

  1. Input: 1 + 2
  2. Append 1 to the output string
  3. Push + to the stack array
  4. Append 2 to the output string
  5. End of string detected
  6. Pop + off the stack array and add it to the output string
  7. Output: 1 2 +

The final output will be the string in RPN/postfix form. Now, this algorithm seems simple, but we're missing some important details: operator precedence and associativity.

Precedence and Associativity

Operator precedence is related to the "PEMDAS" mnemonic we learned in the first part of this tutorial series. The order of operations is as follows:

  1. Parentheses
  2. Exponents (evaluated from right to left)
  3. Multiplication/Division (evaluated from left to right)
  4. Addition/Subtraction (evaluated from left to right)

Let's turn this information into a table. Parentheses are intentionally missing from the table because we'll end up handling them when we implement the Shunting yard algorithm.

NameOperatorPrecedenceAssociativity
Exponentiation^4Right
Multiplication*3Left
Division/3Left
Addition+2Left
Subtraction-2Left

In JavaScript, we can encode this table as a JavaScript object:

js
Copied! โญ๏ธ
const operators = {
  '^': {
    prec: 4,
    assoc: 'right',
  },
  '*': {
    prec: 3,
    assoc: 'left',
  },
  '/': {
    prec: 3,
    assoc: 'left',
  },
  '+': {
    prec: 2,
    assoc: 'left',
  },
  '-': {
    prec: 2,
    assoc: 'left',
  },
};

Precedence loosely means that one operator has priority over another. In the math expression, 1 + 2 * 3, multiplication has priority over addition, so 2 * 3 is evaluated first, not 1 + 2.

Associativity loosely means that an operator is applied in a certain order when operators of the same kind are next to each other. This plays a major role in the exponentiation operator. The math expression, 2 ^ 2 ^ 3, is not the same as 2 ^ 3 ^ 2.

js
Copied! โญ๏ธ
2 ^ 2 ^ 3
  = 2 ^ (2 ^ 3)
  = 2 ^ 8
  = 256
js
Copied! โญ๏ธ
2 ^ 3 ^ 2
  = 2 ^ (3 ^ 2)
  = 2 ^ 9
  = 512

We can confirm these results using JavaScript:

js
Copied! โญ๏ธ
console.log(2 ** 2 ** 3); // 256
console.log(2 ** 3 ** 2); // 512

Pseudocode for the Shunting Yard Algorithm

As mentioned earlier in this article, Wikipedia explains the algorithm in detail using pseudocode. However, we're going to simplify things a bit. We're going to avoid allowing functions as operators, and we're going to avoid using a tokenizer for now. We'll assume that each character in the string represents a token just like the RPN evaluator we made in the previous tutorial. Don't worry! We'll build a tokenizer later, so we can add multi-digit numbers such as 10 or 15.

Taking these simplifications in mind, we end up with the following pseudocode.

js
Copied! โญ๏ธ
while there are tokens to be read:
    - read a token

    if the token is:
    - a number:
        add it into the output string
    - an operator o1:
        while (
            there is an operator o2 at the top of the operator stack which is not a left parenthesis, 
            and (o2 has greater precedence than o1
            or (o1 and o2 have the same precedence and o1 is left-associative))
        ):
            - pop o2 from the operator stack and add it to the output string
        - push o1 onto the operator stack

    - a left parenthesis (i.e. "("):
        push it onto the operator stack
    - a right parenthesis (i.e. ")"):

        while the operator at the top of the operator stack is not a left parenthesis:
            - { assert the operator stack is not empty }
            /* If the stack runs out without finding a left parenthesis, then there are mismatched parentheses. */
            - pop the operator from the operator stack and add it to the output string
        - { assert there is a left parenthesis at the top of the operator stack }
        - pop the left parenthesis from the operator stack and discard it

/* After the while loop, pop the remaining items from the operator stack into the output queue. */
while there are tokens on the operator stack:
    /* If the operator token on the top of the stack is a parenthesis, then there are mismatched parentheses. */
    { assert the operator on top of the stack is not a (left) parenthesis }
    - pop the operator from the operator stack and add it to the output string

Yikes, that's a lot to read, huh? ๐Ÿ˜… Let's break this problem down into pieces and figure out how to implement this in JavaScript.

Shunting Yard Algorithm Implementation

Let me explain my thought process behind developing an implementation of this algorithm. The first thing we need to do is to create a function that accepts an input string and will eventually return an output string. We also initialize a stack variable and set it to an empty array.

js
Copied! โญ๏ธ
const toRPN = (input) => {
  const stack = [];
  let output = '';

  return output;
}

const result = toRPN('1 + 2');
console.log(result); // Eventually, the result should be '1 2 +'

We have named the function, toRPN, because it converts a math expression from infix form to RPN/postfix form, but feel free to name it whatever you'd like.

The next step is to create a loop that will iterate through every character in the input string. We will skip whitespaces and call the handleToken method on the other characters.

js
Copied! โญ๏ธ
const toRPN = (input) => {
  const stack = [];
  let output = '';

  for (let i of input) {
    if (i === ' ') continue;

    handleToken(i);
  }

  return output;
}

So far, we have completed these steps from the pseudocode:

js
Copied! โญ๏ธ
while there are tokens to be read:
    - read a token

Next, let's tackle this chunk:

js
Copied! โญ๏ธ
if the token is:
- a number:
    - add it into the output string
- an operator o1:
    while (
        there is an operator o2 at the top of the operator stack which is not a left parenthesis, 
        and (o2 has greater precedence than o1
        or (o1 and o2 have the same precedence and o1 is left-associative))
    ):
        - pop o2 from the operator stack and add it to the output string
    - push o1 onto the operator stack

- a left parenthesis (i.e. "("):
    - push it onto the operator stack
- a right parenthesis (i.e. ")"):

    while the operator at the top of the operator stack is not a left parenthesis:
        - { assert the operator stack is not empty }
        /* If the stack runs out without finding a left parenthesis, then there are mismatched parentheses. */
        - pop the operator from the operator stack and add it to the output string
    - { assert there is a left parenthesis at the top of the operator stack }
    - pop the left parenthesis from the operator stack and discard it

All of this logic will be added inside the handleToken function. Instead of writing out a bunch of if-else statements, we can use a switch (true) statement. This may look weird and perhaps not the best practice, but I think it's very readable, especially when we're comparing it to the original pseudocode.

js
Copied! โญ๏ธ
const handleToken = (token) => {
  switch (true) {
    case !isNaN(parseFloat(token)):
      output += ' ' + token;
      break;
  }
}

A switch (true) statement will perform a test on every case condition in order. If there's a break statement, then we break out of the switch statement. In the code above, we are saying "Is this token a number? If so, then add the token to the output."

We have now completed this step:

js
Copied! โญ๏ธ
if the token is:
- a number:
    add it into the output string

Now, we need complete the following chunk of pseudocode:

js
Copied! โญ๏ธ
if the token is:
- an operator o1:
    while (
        there is an operator o2 at the top of the operator stack which is not a left parenthesis, 
        and (o2 has greater precedence than o1
        or (o1 and o2 have the same precedence and o1 is left-associative))
    ):
        - pop o2 from the operator stack and add it to the output string
    - push o1 onto the operator stack

We can add another condition to our switch(true) statement:

js
Copied! โญ๏ธ
const handleToken = (token) => {
  switch (true) {
    case !isNaN(parseFloat(token)):
      output += ' ' + token;
      break;
     case Object.keys(operators).includes(token):
        const o1 = token;
        let o2 = stack.at(-1); // look at the top of the stack (last element of the array)

        while (
          o2 !== undefined &&
          o2 !== '(' &&
          (operators[o2].prec > operators[o1].prec ||
            (operators[o2].prec === operators[o1].prec &&
              operators[o1].assoc === 'left'))
        ) {
           output += ' ' + stack.pop();
          o2 = stack.at(-1);
        }
        stack.push(o1);
        break;
  }
}

Going from English to JavaScript isn't so easy, huh? ๐Ÿ˜… The trickiest part is how the operator precedence is handled. o1 is the the first operator, and o2 is the second operator. We are making sure the token isn't a left parenthesis and checking if the second operator has higher precedence than the first. We are also checking the associativity of the first operator.

This logic is what determines how operators are laid out in RPN/postfix form. If we had the string, 1 + 2 * 3, the big stack.pop() event would happen once we've reached the multiplication symbol, since that has higher priority than addition.

The while loop won't run when we encounter the + symbol as the first operator because we are checking if o2 is undefined. Therefore, we add it to the stack using stack.push(o1) at the very bottom of the code.

If you're wondering where the operators object came from, it's the same object we created earlier in this tutorial. It looks like this:

js
Copied! โญ๏ธ
const operators = {
  '^': {
    prec: 4,
    assoc: 'right',
  },
  '*': {
    prec: 3,
    assoc: 'left',
  },
  '/': {
    prec: 3,
    assoc: 'left',
  },
  '+': {
    prec: 2,
    assoc: 'left',
  },
  '-': {
    prec: 2,
    assoc: 'left',
  },
};

Still with me? Take time to read through the code carefully. I know I had to read it like 20 times before I got the program working correctly ๐Ÿ˜…

On to the next task! We will now check if the token is a left parenthesis ( and push it onto the stack if it is. Seems simple enough!

js
Copied! โญ๏ธ
if the token is:
- a left parenthesis (i.e. "("):
    - push it onto the operator stack

Let's go back to our handleToken function and add another case to our switch statement.

js
Copied! โญ๏ธ
const handleToken = (token) => {
  switch (true) {
    case !isNaN(parseFloat(token)):
      output += ' ' + token;
      break;
    case Object.keys(operators).includes(token):
      const o1 = token;
      let o2 = stack.at(-1); // look at the top of the stack (last element of the array)

      while (
        o2 !== undefined &&
        o2 !== '(' &&
        (operators[o2].prec > operators[o1].prec ||
          (operators[o2].prec === operators[o1].prec &&
            operators[o1].assoc === 'left'))
      ) {
          output += ' ' + stack.pop();
        o2 = stack.at(-1);
      }
      stack.push(o1);
      break;
    case token === '(':
      stack.push(token);
      break;
  }
}

That was easy! Alright, the next and final token check is another large chunk:

js
Copied! โญ๏ธ
if the token is:
- a right parenthesis (i.e. ")"):

    while the operator at the top of the operator stack is not a left parenthesis:
        - { assert the operator stack is not empty }
        /* If the stack runs out without finding a left parenthesis, then there are mismatched parentheses. */
        - pop the operator from the operator stack and add it to the output string
    - { assert there is a left parenthesis at the top of the operator stack }
    - pop the left parenthesis from the operator stack and discard it

When the token is a right parenthesis ), a lot of events happen. It signifies that we've reached the end of a grouped expression between parentheses, so we can start adding operators to the output and removing left parentheses ( from the stack.

You may have noticed some "assertions" in the pseudocode. We can implement an assert function, so that the program stops when it encounters an unexpected symbol or token.

js
Copied! โญ๏ธ
const assert = (predicate) => {
  if (predicate) return; // everything is fine, carry on
  throw new Error('Assertion failed due to invalid token');
};

Let's now convert this pseudocode into JavaScript and add it to our handleToken function:

js
Copied! โญ๏ธ
const handleToken = (token) => {
  switch (true) {
    case !isNaN(parseFloat(token)):
      output += ' ' + token;
      break;
    case Object.keys(operators).includes(token):
      const o1 = token;
      let o2 = stack.at(-1); // look at the top of the stack (last element of the array)

      while (
        o2 !== undefined &&
        o2 !== '(' &&
        (operators[o2].prec > operators[o1].prec ||
          (operators[o2].prec === operators[o1].prec &&
            operators[o1].assoc === 'left'))
      ) {
          output += ' ' + stack.pop();
        o2 = stack.at(-1);
      }
      stack.push(o1);
      break;
    case token === '(':
      stack.push(token);
      break;
    case token === ')':
      let topOfStack = stack.at(-1);
      while (topOfStack !== '(') {
        assert(stack.length !== 0);
        addToOutput(stack.pop());
        topOfStack = stack.at(-1);
      }
      assert(stack.at(-1) === '(');
      stack.pop();
      break;
  }
}

Sure is getting crowded! But, we're done with the handleToken function finally ๐Ÿฅต

The last piece of pseudocode we need to convert to JavaScript is the following:

js
Copied! โญ๏ธ
/* After the while loop, pop the remaining items from the operator stack into the output queue. */
while there are tokens on the operator stack:
    /* If the operator token on the top of the stack is a parenthesis, then there are mismatched parentheses. */
    { assert the operator on top of the stack is not a (left) parenthesis }
    - pop the operator from the operator stack and add it to the output string

We will add a while loop to the bottom of the toRPN function that continues to run until the stack is empty. The stack contains all the remaining digits and operators. Now, we just start popping characters off the stack and add it to the output!

js
Copied! โญ๏ธ
const toRPN = (input) => {
  const stack = [];
  const output = '';

  const handleToken = (token) => {
    /* all the code we spent forever typing */
  }

  while (stack.length !== 0) {
    assert(stack.at(-1) !== '(');
    output += ' ' + stack.pop();
  }
}

That's it! We're finally done! Nice job! I knew you could do it ๐Ÿ˜„

Final Code for Shunting Yard Algorithm Implementation

Below is the final code for the Shunting yard algorithm implementation in JavaScript. I created a few functions to clean up repetitive code. I added the peek, addToOutput, and handlePop functions. The handlePop function will be important in the next tutorial, so make sure to add it!

js
Copied! โญ๏ธ
const operators = {
  '^': {
    prec: 4,
    assoc: 'right',
  },
  '*': {
    prec: 3,
    assoc: 'left',
  },
  '/': {
    prec: 3,
    assoc: 'left',
  },
  '+': {
    prec: 2,
    assoc: 'left',
  },
  '-': {
    prec: 2,
    assoc: 'left',
  },
};

const assert = (predicate) => {
  if (predicate) return;
  throw new Error('Assertion failed due to invalid token');
};

const toRPN = (input) => {
  const opSymbols = Object.keys(operators);
  const stack = [];
  let output = '';

  const peek = () => {
    return stack.at(-1);
  };

  const addToOutput = (token) => {
    output += ' ' + token;
  };

  const handlePop = () => {
    return stack.pop();
  }

  const handleToken = (token) => {
    switch (true) {
      case !isNaN(parseFloat(token)):
        addToOutput(token);
        break;
      case opSymbols.includes(token):
        const o1 = token;
        let o2 = peek();

        while (
          o2 !== undefined &&
          o2 !== '(' &&
          (operators[o2].prec > operators[o1].prec ||
            (operators[o2].prec === operators[o1].prec &&
              operators[o1].assoc === 'left'))
        ) {
          addToOutput(handlePop());
          o2 = peek();
        }
        stack.push(o1);
        break;
      case token === '(':
        stack.push(token);
        break;
      case token === ')':
        let topOfStack = peek();
        while (topOfStack !== '(') {
          assert(stack.length !== 0);
          addToOutput(handlePop());
          topOfStack = peek();
        }
        assert(peek() === '(');
        handlePop();
        break;
      default:
        throw new Error(`Invalid token: ${token}`);
    }
  };

  for (let i of input) {
    if (i === ' ') continue;

    handleToken(i);
  }

  while (stack.length !== 0) {
    assert(peek() !== '(');
    addToOutput(stack.pop());
  }

  return output;
};

const input = '1 + 2 * 3 - 4'
const result = toRPN(input);
console.log(result); // 1 2 3 * + 4 -

After all the sweat and tears (unless you skipped to the very bottom of this page ๐Ÿคจ), we finally have a function that can convert a math expression from infix form to RPN/postfix form. Congrats! ๐ŸŽ‰๐ŸŽ‰๐ŸŽ‰

Evaluating the Math Expression

Now that we have our RPN/postfix form of the mathematical expression, we can finally evaluate it using the RPN evaluator we created in the previous tutorial.

js
Copied! โญ๏ธ
// Infix form:   1 + 2 * 3 - 4
// Postfix form: 1 2 3 * + 4 -

const result = RPNEvaluator('1 2 3 * + 4 -');
console.log(result); // 3

I hope it's clear now why computers prefer RPN/postfix form over infix form, huh? Converting to RPN is a lot of work, both for computers and us. People prefer infix notation, so we need to make calculators that can handle it.

You now have the skills necessary to build a calculator in JavaScript without relying on the evil and dangerous eval function! Don't listen to the bad shoulder angel trying to convince you to use eval. Listen to that good shoulder angel telling you to build a parser ๐Ÿ˜‡

Conclusion

In this tutorial, we created a math expression parser that accepts a mathematical expression in the standard infix form and returns another math expression in RPN/postfix form. We can then feed the RPN/postfix form into our RPN evaluator to evaluate the math expression. In the next tutorial, we'll learn how to evaluate expressions during the Shunting yard algorithm instead of outputting an RPN/postfix form and sending it to our RPN evaluator.

Until then, relax, take a break, get some water, stretch, do some shoulder exercises, and I'll see you in the next tutorial ๐Ÿ™‚