Handling Unary Operations in the Shunting Yard Algorithm

Published: Monday, February 20, 2023

Greetings, friends! This is my seventh post in the math expression parser series. In the previous tutorial, we learned how to use a tokenizer with our math expression parser and evaluator! Our tokenizer helped us add functionality to our evaluator such as recognizing decimal values and function names. However, we still haven't been able to use negative numbers in our math expressions because the - is assigned to the subtraction operation only. In this tutorial, we'll learn how to make a unary operation to add support for negative numbers. Let's get started!

What is a Unary Operation?

When designing programming languages, developers often need a way of differentiating between a negative number and the subtraction operation whenever the - token occurs.

In many programming languages, we use the - symbol to perform subtraction between two values. Subtraction is considered a binary operation because it needs two values to perform the operation.

js
Copied! ⭐️
const value = 5 - 2;

The math expression 5 - 2 contains three tokens:

js
Copied! ⭐️
['5', '-', '2']

Most programming languages also let us create negative numbers by placing the - symbol before a number like so:

js
Copied! ⭐️
const value = -3;

The math expression -3 contains two tokens:

js
Copied! ⭐️
['-', '3']

In this situation, the - symbol acts as a unary operation, which means it operates on only one value, the number following the - symbol. Therefore, this symbol can take on two meanings.

When symbols are used to mean multiple things, it gets tricky to parse. We have to think about what the token means in a given context. In JavaScript, it's tricky to figure out how we should treat a slash / because it is used in both division and regular expressions such as /^[a-z]+/.

However, we know that if the / symbol is followed directly after a number token, there's a good chance that we are dealing with division. Similarly, the dash - symbol can be used to represent a subtraction (binary operation) or a unary operation. We just need to think about the context.

The term, context, is used in many places in the world of software development. It generally means that we need to keep track of some state, or we need to treat state differently depending on where something is in a string or piece of code. In our case, we know that the - symbol should perform an action depending on what tokens surround it.

Let's consider all the possibilities of how the - token may occur.

js
Copied! ⭐️
// 1. The '-' token is at the very start of the input
const input1 = '-1';

// 2. The '-' token is after a '(' token
const input2 = '(-1)';

// 3. The '-' token is after a binary operator token such as '+'
const input3 = '1 + -2'

// 4. The '-' token is after after another '-' token
const input4 = '1 - -2'

We need to adjust the shunting yard algorithm to consider each of these four possibilities, so we can perform a unary operation instead of subtraction.

Adding Support for the Unary Operation

The shunting yard algorithm is currently designed such that every token corresponds to a unique operation. Without straying too far from the original implementation of the algorithm, it may be difficult to add support for unary operations.

There are multiple ways of adding unary operation support, but I will show you one way of adding it without changing much code. We're going to use a small little trick that will add a new operator for unary operations.

When we first implemented the shunting yard algorithm, we defined an object that contained a list of operators. We're going to add a new operator named u for "unary" operation. We'll give it the highest priority (the same as exponentiation) and assign a "right" associativity.

js
Copied! ⭐️
const operators = {
  u: {
    prec: 4,
    assoc: 'right',
  },
  '^': {
    prec: 4,
    assoc: 'right',
  },
  '*': {
    prec: 3,
    assoc: 'left',
  },
  '/': {
    prec: 3,
    assoc: 'left',
  },
  '+': {
    prec: 2,
    assoc: 'left',
  },
  '-': {
    prec: 2,
    assoc: 'left',
  },
};

By making the precedence and associativity of the unary operation the same as exponentiation, our calculator will perform the operations in the correct order after we add support for unary operations.

js
Copied! ⭐️
const input1 = '-2 ^ 2'; // same as -(2 ^ 2) according to mathematicians
const input2 = '(-2) ^ 2';

console.log(evaluate(input1)); // -4
console.log(evaluate(input2)); // 4

Let's start adding the functionality for unary support to our math expression evaluator. The idea is that we're going to return a "virtual" token called u that is only understood by the handlePop function. The tokenizer will remain unchanged because we want it to still return a - token whenever it is encountered.

Inside the evaluate function, we are iterating through every token generated by the tokenizer and calling the handleToken function.

js
Copied! ⭐️
const tokenizer = new Tokenizer(input);
let token;
while ((token = tokenizer.getNextToken())) {
  handleToken(token.value);
}

We want to perform a check to see if the current token is a - token. If so, then we need to think about the four possibilities we discussed earlier.

  1. The - token is at the very start of the input
  2. The - token is after a ( token
  3. The - token is after a binary operator token such as +
  4. The - token is after after another - token

We'll need to keep track of a context. That is, we'll need to remember the token that occurred right before we encounter a - token. We can do this by adding a new variable called prevToken. We'll set it equal to null at first and then it'll get updated upon every iteration of the tokenizer.

js
Copied! ⭐️
const tokenizer = new Tokenizer(input);
let token;
let prevToken;
while ((token = tokenizer.getNextToken())) {
  handleToken(token.value);
  prevToken = token; // after the current token is handled, set the previous token
}

Now we need to add conditional logic to check whether the previous token is null, contains a ( token, or is a binary operator such as + or -. This can be done using an if statement. If we meet all of these criteria, then we should return our special u token. Let's look at the code.

js
Copied! ⭐️
const tokenizer = new Tokenizer(input);
let token;
let prevToken = null;
while ((token = tokenizer.getNextToken())) {
  if (
    token.value === '-' &&
    (prevToken === null ||
      prevToken.value === '(' ||
      opSymbols.includes(prevToken.value))
  ) {
    handleToken('u'); // we pass in a "virtual" unary token
  } else {
    handleToken(token.value);
  }
  prevToken = token;
}

When we call the handleToken function, the u token will eventually hit the case opSymbols.includes(token) statement. Recall that the opSymbols variable refers to a list of our operators: +, -, *, /, ^, and our new u operator. Since the u token is in this list, we proceed through this case statement.

Like all operators, we handle the mathematical computation within the handlePop function. Therefore, we need to add code for dealing with a unary operation when a u token is received.

js
Copied! ⭐️
const handlePop = () => {
  const op = stack.pop();

  if (op === '(') return;

  // When we receive a 'u' token, return the
  // negative version of the number!
  if (op === 'u') return -parseFloat(output.pop());

  if (isFunction(op)) {
    const poppedValue = output.pop();
    switch (op) {
      case 'sin':
        return Math.sin(poppedValue);
      case 'cos':
        return Math.cos(poppedValue);
      case 'tan':
        return Math.tan(poppedValue);
    }
  }

  const right = parseFloat(output.pop());
  const left = parseFloat(output.pop());

  switch (op) {
    case '+':
      return left + right;
    case '-':
      return left - right;
    case '*':
      return left * right;
    case '/':
      return left / right;
    case '^':
      return left ** right;
    default:
      throw new Error(`Invalid operation: ${op}`);
  }
};

That's it! A unary operation just parses the next number token and returns the negative version of it. Pretty simple, right?

The last thing we need to do is to fix a small error that occurs with the isFunction method. At the very end of the handleToken function, we are performing a check:

js
Copied! ⭐️
if (isFunction(peek())) {
  addToOutput(handlePop());
}

At this point in the code, it's possible that the stack contains no more tokens. If the input to our evaluator was simply -1, then both tokens would be consumed before we reached the isFunction method. If we passed undefined or null as a parameter to this method, then we'd likely get an error because we can't call toLowerCase on a variable that is not a string.

Therefore, we need to tweak the code slightly:

js
Copied! ⭐️
topOfStack = peek();
if (topOfStack && isFunction(topOfStack)) {
  addToOutput(handlePop());
}

Now we have a guard against a potential error. We can be on our merry way, using unary operations with no issue.

Finished Code

After making all these changes to our implementation of the shunting yard algorithm to handle unary operations, let's look at the finished code!

js
Copied! ⭐️
const TokenTypes = {
  NUMBER: 'NUMBER',
  IDENTIFIER: 'IDENTIFIER',
  ADDITION: '+',
  SUBTRACTION: '-',
  MULTIPLICATION: '*',
  DIVISION: '/',
  EXPONENTIATION: '^',
  PARENTHESIS_LEFT: '(',
  PARENTHESIS_RIGHT: ')',
};

const TokenSpec = [
  [/^\s+/, null],
  [/^(?:\d+(?:\.\d*)?|\.\d+)/, TokenTypes.NUMBER],
  [/^[a-z]+/, TokenTypes.IDENTIFIER],
  [/^\+/, TokenTypes.ADDITION],
  [/^\-/, TokenTypes.SUBTRACTION],
  [/^\*/, TokenTypes.MULTIPLICATION],
  [/^\//, TokenTypes.DIVISION],
  [/^\^/, TokenTypes.EXPONENTIATION],
  [/^\(/, TokenTypes.PARENTHESIS_LEFT],
  [/^\)/, TokenTypes.PARENTHESIS_RIGHT],
];

class Tokenizer {
  constructor(input) {
    this.input = input;
    this.cursor = 0;
  }

  hasMoreTokens() {
    return this.cursor < this.input.length;
  }

  match(regex, inputSlice) {
    const matched = regex.exec(inputSlice);
    if (matched === null) {
      return null;
    }

    this.cursor += matched[0].length;
    return matched[0];
  }

  getNextToken() {
    if (!this.hasMoreTokens()) {
      return null;
    }

    const inputSlice = this.input.slice(this.cursor);

    for (let [regex, type] of TokenSpec) {
      const tokenValue = this.match(regex, inputSlice);

      if (tokenValue === null) {
        continue;
      }

      if (type === null) {
        return this.getNextToken();
      }

      return {
        type,
        value: tokenValue,
      };
    }

    throw new SyntaxError(`Unexpected token: "${inputSlice[0]}"`);
  }
}

const operators = {
  u: {
    prec: 4,
    assoc: 'right',
  },
  '^': {
    prec: 4,
    assoc: 'right',
  },
  '*': {
    prec: 3,
    assoc: 'left',
  },
  '/': {
    prec: 3,
    assoc: 'left',
  },
  '+': {
    prec: 2,
    assoc: 'left',
  },
  '-': {
    prec: 2,
    assoc: 'left',
  },
};

const functionList = ['sin', 'cos', 'tan'];

const assert = (predicate) => {
  if (predicate) return;
  throw new Error('Assertion failed due to invalid token');
};

const isFunction = (token) => {
  return functionList.includes(token.toLowerCase());
};

const evaluate = (input) => {
  const opSymbols = Object.keys(operators);
  const stack = [];
  let output = [];

  const peek = () => {
    return stack.at(-1);
  };

  const addToOutput = (token) => {
    output.push(token);
  };

  const handlePop = () => {
    const op = stack.pop();

    if (op === '(') return;

    if (op === 'u') return -parseFloat(output.pop());

    if (isFunction(op)) {
      const poppedValue = output.pop();
      switch (op) {
        case 'sin':
          return Math.sin(poppedValue);
        case 'cos':
          return Math.cos(poppedValue);
        case 'tan':
          return Math.tan(poppedValue);
      }
    }

    const right = parseFloat(output.pop());
    const left = parseFloat(output.pop());

    switch (op) {
      case '+':
        return left + right;
      case '-':
        return left - right;
      case '*':
        return left * right;
      case '/':
        return left / right;
      case '^':
        return left ** right;
      default:
        throw new Error(`Invalid operation: ${op}`);
    }
  };

  const handleToken = (token) => {
    switch (true) {
      case !isNaN(parseFloat(token)):
        addToOutput(token);
        break;
      case isFunction(token):
        stack.push(token);
        break;
      case opSymbols.includes(token):
        const o1 = token;
        let o2 = peek();

        while (
          o2 !== undefined &&
          o2 !== '(' &&
          (operators[o2].prec > operators[o1].prec ||
            (operators[o2].prec === operators[o1].prec &&
              operators[o1].assoc === 'left'))
        ) {
          addToOutput(handlePop());
          o2 = peek();
        }
        stack.push(o1);
        break;
      case token === '(':
        stack.push(token);
        break;
      case token === ')':
        let topOfStack = peek();
        while (topOfStack !== '(') {
          assert(stack.length !== 0);
          addToOutput(handlePop());
          topOfStack = peek();
        }
        assert(peek() === '(');
        handlePop();
        topOfStack = peek();
        if (topOfStack && isFunction(topOfStack)) {
          addToOutput(handlePop());
        }
        break;
      default:
        throw new Error(`Invalid token: ${token}`);
    }
  };

  const tokenizer = new Tokenizer(input);
  let token;
  let prevToken = null;
  while ((token = tokenizer.getNextToken())) {
    if (
      token.value === '-' &&
      (prevToken === null ||
        prevToken.value === '(' ||
        opSymbols.includes(prevToken.value))
    ) {
      handleToken('u');
    } else {
      handleToken(token.value);
    }
    prevToken = token;
  }

  while (stack.length !== 0) {
    assert(peek() !== '(');
    addToOutput(handlePop());
  }

  return output[0];
};

Let's run a few tests to make sure our unary operation behaves properly.

js
Copied! ⭐️
const input1 = '-1';
const input2 = '(-1)';
const input3 = '1 + -2';
const input4 = '1 - -2';
const input5 = '-2 ^ 2';
const input6 = '(-2) ^ 2';

console.log(evaluate(input1)); // -1
console.log(evaluate(input2)); // -1
console.log(evaluate(input3)); // -1
console.log(evaluate(input4)); // 3
console.log(evaluate(input5)); // -4
console.log(evaluate(input6)); // 4

Looks good to me! It's good to cover as many edge cases as possible. Sometimes, you need to let others try to break your code though. We're all human, so we all can make mistakes or forget something. That's why they put erasers on pencils, after all!

Conclusion

In this tutorial, we added support for unary operations in the shunting yard algorithm, so we can use negative numbers in our math expression evaluator. We thought about all the different possibilities of how the - token might appear in a math expression. Then, we used a small trick of converting a - to a u token to differentiate between subtraction and unary operations.

As parser designers, we need to consider every possible configuration of how tokens appear. This is why adding new operators or symbols to a language is extremely tricky. The ECMAScript 6 version of JavaScript added a lot of new types of tokens and configurations which made it a completely different language than its previous versions. Supersets of JavaScript such as TypeScript and JSX need completely new parsers to handle new types of tokens and configurations.

Adding new tokens to a language can also slow down parsing quite a bit, so careful deliberation must be taken to ensure performance isn't impacted. As you can probably tell, designing parsers can be very tricky as they're quite fragile creatures 😅

In the next tutorial, we'll discuss how to create abstract syntax trees (AST) using the shunting yard algorithm. See you then 🙂