How to Build a Recursive Descent Parser for Math Expressions in JavaScript
Greetings, friends! This is my fourteenth post in the math expression parser series. In the last few tutorials, we discussed how to make a parser using parser generators. Now, it's time to build parsers manually again. In this tutorial, we'll learn how to build a recursive descent parser from scratch using JavaScript!
What is a Recursive Descent Parser?
A recursive descent parser is a super popular type of top-down parser built using recursive functions. Many hand-written parsers use some form of recursive descent, including the parsers inside popular libraries like Babel and C++ compilers such as Clang and GCC. Recursive descent parsers are everywhere because they're fast, straightforward to write, and give their authors full control over error messages.
There are two main types of recursive descent parsers:
- Predictive parsers
- Backtracking parsers
A backtracking parser tries a production rule at each step of the grammar and, if it doesn't work out, backtracks and tries a different rule until it finds one that succeeds. Too much backtracking can slow down the parser, but memoization (caching) can help speed things up.
A predictive parser uses lookahead tokens to figure out which grammar rule or production to apply without any backtracking. That is, the parser never rewinds to try a different rule. A predictive parser is typically much quicker, but it doesn't support as many grammars as a backtracking parser.
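To make the difference concrete, here's a tiny sketch of my own (not part of the parser we'll build, and all names here are illustrative) showing both styles deciding between a parenthesized expression and a plain number:
const tokens = ['(', '1', ')']
let cursor = 0

function number() {
  if (!/^\d+$/.test(tokens[cursor])) throw new SyntaxError('Expected a number')
  return Number(tokens[cursor++])
}

function parenthesized() {
  if (tokens[cursor] !== '(') throw new SyntaxError('Expected "("')
  cursor++
  const value = primaryPredictive()
  if (tokens[cursor] !== ')') throw new SyntaxError('Expected ")"')
  cursor++
  return value
}

// Predictive style: one lookahead token decides which rule to use, no rewinding
function primaryPredictive() {
  return tokens[cursor] === '(' ? parenthesized() : number()
}

// Backtracking style: try an alternative and rewind the cursor when it fails
function primaryBacktracking() {
  const saved = cursor
  try {
    return parenthesized()
  } catch (err) {
    cursor = saved // backtrack, then try the next alternative
    return number()
  }
}

console.log(primaryPredictive()) // 1
cursor = 0
console.log(primaryBacktracking()) // 1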
In this tutorial, we will build a predictive recursive descent parser because our grammar only needs to support math expressions. Even for something as complex as a programming language, predictive parsing is usually the preferred approach when writing a recursive descent parser.
As mentioned in previous tutorials, LR parsers can handle more grammars than LL parsers, so where do recursive descent parsers fit in? Generally, a recursive descent parser can handle any LL grammar, and sometimes more, thanks to the flexibility of being hand-written. For languages as complicated as C or C++, recursive descent parsers rely on extra tricks, such as tracking context, to support the more complicated parts of the syntax.
Building a Tokenizer for Recursive Descent Parsers
In this tutorial, we're going to build the parser completely from scratch, but we can reuse the updated tokenizer from Part 6 of this series to save some time. Please consult Part 5 to learn how we originally constructed this tokenizer.
Create a new file called Tokenizer.js with the following contents:
const TokenTypes = {
NUMBER: 'NUMBER',
IDENTIFIER: 'IDENTIFIER',
ADDITION: '+',
SUBTRACTION: '-',
MULTIPLICATION: '*',
DIVISION: '/',
EXPONENTIATION: '^',
PARENTHESIS_LEFT: '(',
PARENTHESIS_RIGHT: ')'
};
const TokenSpec = [
[/^\s+/, null],
[/^(?:\d+(?:\.\d*)?|\.\d+)/, TokenTypes.NUMBER],
[/^[a-z]+/, TokenTypes.IDENTIFIER],
[/^\+/, TokenTypes.ADDITION],
[/^\-/, TokenTypes.SUBTRACTION],
[/^\*/, TokenTypes.MULTIPLICATION],
[/^\//, TokenTypes.DIVISION],
[/^\^/, TokenTypes.EXPONENTIATION],
[/^\(/, TokenTypes.PARENTHESIS_LEFT],
[/^\)/, TokenTypes.PARENTHESIS_RIGHT]
];
class Tokenizer {
constructor(input) {
this.input = input;
this.cursor = 0;
}
hasMoreTokens() {
return this.cursor < this.input.length;
}
match(regex, inputSlice) {
const matched = regex.exec(inputSlice);
if (matched === null) {
return null;
}
this.cursor += matched[0].length;
return matched[0];
}
getNextToken() {
if (!this.hasMoreTokens()) {
return null;
}
const inputSlice = this.input.slice(this.cursor);
for (let [regex, type] of TokenSpec) {
const tokenValue = this.match(regex, inputSlice);
if (tokenValue === null) {
continue;
}
if (type === null) {
return this.getNextToken();
}
return {
type,
value: tokenValue,
};
}
throw new SyntaxError(`Unexpected token: "${inputSlice[0]}"`);
}
}
module.exports = {
TokenTypes,
Tokenizer
}
This tokenizer will create tokens for decimal numbers, words, and mathematical symbols, and it will skip all whitespace. Each token has a type and a value property. The getNextToken method retrieves the next token, using slice so that it only looks at the remaining text from the input.
Given the input sin(123), the tokenizing process will look similar to the following:
const input = 'sin(123)';
const tokenizer = new Tokenizer(input);
const token1 = tokenizer.getNextToken();
// inputSlice = 'sin(123)'
// token1 = { type: 'IDENTIFIER', value: 'sin' }
const token2 = tokenizer.getNextToken();
// inputSlice = '(123)'
// token2 = { type: '(', value: '(' }
const token3 = tokenizer.getNextToken();
// inputSlice = '123)'
// token3 = { type: 'NUMBER', value: '123' }
const token4 = tokenizer.getNextToken();
// inputSlice = ')'
// token4 = { type: ')', value: ')' }
const token5 = tokenizer.getNextToken();
// token5 = null
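If you'd like to see the whole token stream at once, a small helper loop does the trick. This tokenize function is just an illustration of mine, not something the Tokenizer class provides:
const { Tokenizer } = require('./Tokenizer')

// Drain every token from the tokenizer into an array
function tokenize(input) {
  const tokenizer = new Tokenizer(input)
  const tokens = []
  let token
  while ((token = tokenizer.getNextToken()) !== null) {
    tokens.push(token)
  }
  return tokens
}

console.log(tokenize('2 * (3 + 4)'))
// [
//   { type: 'NUMBER', value: '2' },
//   { type: '*', value: '*' },
//   { type: '(', value: '(' },
//   { type: 'NUMBER', value: '3' },
//   { type: '+', value: '+' },
//   { type: 'NUMBER', value: '4' },
//   { type: ')', value: ')' }
// ]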
Starting our Recursive Descent Parser
Now that we have a tokenizer, it's time to start building a recursive descent parser. When constructing parsers by hand, it's still a good idea to write our own grammar to serve as the blueprint for the parser. We'll use a parsing expression grammar (PEG) because PEGs naturally map onto recursive descent parsers.
Let's create a simple grammar with a primary expression that derives only to a NUMBER token, representing a decimal number.
Expression
= Primary
Primary
= NUMBER
This grammar won't be stored in a file anywhere (unless you want to), but it'll help serve as our blueprint when adding more features to our recursive descent parser.
Let's see how we would create a recursive descent parser that follows this "blueprint". Create a new file called Parser.js with the following contents.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
/**
* Expression
* = Primary
*/
Expression() {
return this.Primary();
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
const input = '12';
const parser = new Parser();
const result = parser.parse(input);
console.log(result); // 12
We can run the parser as a Node.js script using the following command:
node Parser
# Output: 12
Let's analyze what's happening. First, we import Tokenizer and TokenTypes from our Tokenizer script. We then created a class called Parser that contains two important methods: parse and eat. Gotta parse and eat every day, am I right? 😂
The parse method takes care of initializing our tokenizer with the provided input. It also sets the lookahead token to the very first token in our input. Then, we return this.Expression() to begin the recursive descent. The Expression method, in turn, calls the Primary method.
Inside the Primary method, we are "eating" or "consuming" a NUMBER token. The eat method expects the token to be of a certain type. If it matches the given type, it consumes the token and moves the cursor to a new position in the input string.
Remember, the getNextToken method is responsible for moving our cursor to the spot directly after the consumed token. Since the string '12' was the only input to the parser, the parse completes after consuming one token. Finally, we return the numeric result, 12.
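If the lookahead doesn't match what eat expects, the parser throws a SyntaxError. For example, with our current NUMBER-only grammar, a word triggers the mismatch branch:
const parser = new Parser()

try {
  parser.parse('abc')
} catch (err) {
  console.log(err.message) // Unexpected token: "abc", expected "NUMBER"
}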
Notice how we put the production rules in comments above the Expression and Primary methods. This helps us visualize where we are in the grammar as our recursive descent parser continues to grow. In our grammar, Expression and Primary represent nonterminal symbols, and NUMBER represents our terminal symbol.
We recursively call methods corresponding to nonterminal symbols until we've reached the end of the input, hence the name recursive descent parser! 😃
Handling Addition
Let the recursion continue! It's time to update our grammar "blueprint" to allow addition.
Expression
= Primary ("+" Primary)*
Primary
= NUMBER
This grammar uses EBNF-like syntax, similar to the previous tutorial. The syntax (expression)* means that zero or more occurrences are allowed. It's like saying the Expression resolves to the following:
Primary + Primary + Primary + ...
To handle a chain of additions that can be arbitrarily long, we can use a while loop that checks whether there's another + symbol. You can see this in the example below.
Let's update our Parser.js file to have the following contents.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
this.lookahead = this.tokenizer.getNextToken()
return token
}
/**
* Expression
* = Primary ("+" Primary)*
*/
Expression() {
let left = this.Primary()
// We use the optional chaining operator "?." because the lookahead can be null
while (this.lookahead?.type === TokenTypes.ADDITION) {
this.eat(TokenTypes.ADDITION)
const right = this.Primary()
left = left + right
}
return left
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
const parser = new Parser();
console.log(parser.parse('1 + 2')); // 3
console.log(parser.parse('1 + 2 + 3')); // 6
Remember! Our tokenizer is already designed to skip whitespace for us, so we don't need to worry about it in our grammar or parser.
Take a moment to go through this program very carefully. As we can see, more recursion occurs as our grammar grows and as more operators appear in our math expression. After the parse completes, there should be no tokens left, and the parse method should return a result.
If you need help debugging this parser, I would suggest using the --inspect-brk option when executing the JavaScript file with Node.js.
node --inspect-brk Parser.js
I have an excellent tutorial on debugging Node.js apps with Chrome DevTools, so check it out!
Handling Subtraction
Let's update our grammar "blueprint" to include subtraction.
Expression
= Primary (("+" / "-") Primary)*
Primary
= NUMBER
Adding subtraction to our parser is super easy. We just add a conditional statement in the Expression method to determine whether the operator is addition or subtraction. Then, we consume that operator token and perform the corresponding operation.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
/**
* Expression
* = Primary (("+" / "-") Primary)*
*/
Expression() {
let left = this.Primary()
while (
this.lookahead?.type === TokenTypes.ADDITION ||
this.lookahead?.type === TokenTypes.SUBTRACTION
) {
if (this.lookahead?.type === TokenTypes.ADDITION) {
this.eat(TokenTypes.ADDITION)
const right = this.Primary()
left = left + right
} else {
this.eat(TokenTypes.SUBTRACTION)
const right = this.Primary()
left = left - right
}
}
return left
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
const parser = new Parser()
console.log(parser.parse('5 - 2')) // 3
console.log(parser.parse('5 - 2 - 1')) // 2
Handling Multiplication and Division
Similar to the previous tutorial, we can create a Term rule to handle multiplication and division.
Expression
= Term (("+" / "-") Term)*
Term
= Primary (("*" / "/") Primary)*
Primary
= NUMBER
In our recursive descent parser, we'll need to create a new method called Term. The Expression method will now invoke Term instead of Primary.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
let left = this.Term()
while (
this.lookahead?.type === TokenTypes.ADDITION ||
this.lookahead?.type === TokenTypes.SUBTRACTION
) {
if (this.lookahead?.type === TokenTypes.ADDITION) {
this.eat(TokenTypes.ADDITION)
const right = this.Term()
left = left + right
} else {
this.eat(TokenTypes.SUBTRACTION)
const right = this.Term()
left = left - right
}
}
return left
}
/**
* Term
* = Primary (("*" / "/") Primary)*
*/
Term() {
let left = this.Primary()
while (
this.lookahead?.type === TokenTypes.MULTIPLICATION ||
this.lookahead?.type === TokenTypes.DIVISION
) {
if (this.lookahead?.type === TokenTypes.MULTIPLICATION) {
this.eat(TokenTypes.MULTIPLICATION)
const right = this.Primary()
left = left * right
} else {
this.eat(TokenTypes.DIVISION)
const right = this.Primary()
left = left / right
}
}
return left
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
const parser = new Parser()
console.log(parser.parse('12 / 2')) // 6
console.log(parser.parse('12 / 2 / 3')) // 2
Handling Exponentiation
We handled exponentiation in the previous tutorial by creating a new grammar rule called Factor. The Term rule will now derive into Factor instead of Primary.
Expression
= Term (("+" / "-") Term)*
Term
= Factor (("*" / "/") Factor)*
Factor
= Primary ("^" Factor)*
Primary
= NUMBER
In our recursive descent parser, we'll need to create a new method called Factor. The Term method will now invoke Factor instead of Primary.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
let left = this.Term()
while (
this.lookahead?.type === TokenTypes.ADDITION ||
this.lookahead?.type === TokenTypes.SUBTRACTION
) {
if (this.lookahead?.type === TokenTypes.ADDITION) {
this.eat(TokenTypes.ADDITION)
const right = this.Term()
left = left + right
} else {
this.eat(TokenTypes.SUBTRACTION)
const right = this.Term()
left = left - right
}
}
return left
}
/**
* Term
* = Factor (("*" / "/") Factor)*
*/
Term() {
let left = this.Factor()
while (
this.lookahead?.type === TokenTypes.MULTIPLICATION ||
this.lookahead?.type === TokenTypes.DIVISION
) {
if (this.lookahead?.type === TokenTypes.MULTIPLICATION) {
this.eat(TokenTypes.MULTIPLICATION)
const right = this.Factor()
left = left * right
} else {
this.eat(TokenTypes.DIVISION)
const right = this.Factor()
left = left / right
}
}
return left
}
/**
* Factor
* = Primary ("^" Factor)*
*/
Factor() {
let left = this.Primary()
while (this.lookahead?.type === TokenTypes.EXPONENTIATION) {
this.eat(TokenTypes.EXPONENTIATION)
const right = this.Factor()
left = left ** right
}
return left
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
const parser = new Parser()
console.log(parser.parse('2 ^ 2')) // 4
console.log(parser.parse('2 ^ 2 ^ 3')) // 256
Notice that the Factor grammar rule doesn't derive into Primary ^ Primary. Instead, it derives into Primary ^ Factor.
Factor
= Primary ("^" Factor)*
This ensures that the exponentiation operation is right associative, which matters in expressions such as 2 ^ 2 ^ 3, which should equal 256.
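JavaScript's own exponentiation operator happens to be right associative as well, so we can sanity-check the grouping directly in Node.js:
console.log((2 ** 2) ** 3) // 64, what left associativity would produce
console.log(2 ** (2 ** 3)) // 256, right associativity, matching our Factor rule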
Cleaning Up the Binary Expressions
As you may have noticed, there's duplicate logic in the Expression, Term, and Factor methods. We can create a BinaryExpression method that helps tidy up our code.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
BinaryExpression(leftRule, rightRule, operatorType1, operatorType2) {
let left = leftRule()
while (
this.lookahead &&
(this.lookahead.type === operatorType1 ||
this.lookahead.type === operatorType2)
) {
const operator = this.eat(this.lookahead.type).type
switch (operator) {
case TokenTypes.ADDITION:
left = left + rightRule()
break
case TokenTypes.SUBTRACTION:
left = left - rightRule()
break
case TokenTypes.MULTIPLICATION:
left = left * rightRule()
break
case TokenTypes.DIVISION:
left = left / rightRule()
break
case TokenTypes.EXPONENTIATION:
left = left ** rightRule()
break
}
}
return left
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
return this.BinaryExpression(
() => this.Term(),
() => this.Term(),
TokenTypes.ADDITION,
TokenTypes.SUBTRACTION
)
}
/**
* Term
* = Factor (("*" / "/") Factor)*
*/
Term() {
return this.BinaryExpression(
() => this.Factor(),
() => this.Factor(),
TokenTypes.MULTIPLICATION,
TokenTypes.DIVISION
)
}
/**
* Factor
* = Primary ("^" Factor)*
*/
Factor() {
return this.BinaryExpression(
() => this.Primary(),
() => this.Factor(),
TokenTypes.EXPONENTIATION
)
}
/**
* Primary
* = NUMBER
*/
Primary() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
The BinaryExpression method accepts four parameters: leftRule, rightRule, operatorType1, and operatorType2. Let's look at how this method is used in the Expression method.
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
return this.BinaryExpression(
() => this.Term(),
() => this.Term(),
TokenTypes.ADDITION,
TokenTypes.SUBTRACTION
)
}
As indicated by our grammar blueprint, both the leftRule and rightRule are Term rules. We need to pass arrow functions as parameters since we're inside a class. If we passed this.Term directly instead of an arrow function, the method would lose its this binding when BinaryExpression calls it, and you would get an error similar to the following.
TypeError: Cannot read properties of undefined
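If you prefer, Function.prototype.bind also preserves this and works just as well as arrow functions. Here's the Expression method written that way, purely as an alternative sketch rather than a change we'll make in this tutorial:
/**
 * Expression
 * = Term (("+" / "-") Term)*
 */
Expression() {
  return this.BinaryExpression(
    this.Term.bind(this), // bind keeps "this" pointing at the parser instance
    this.Term.bind(this),
    TokenTypes.ADDITION,
    TokenTypes.SUBTRACTION
  )
}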
I hope this all made sense! Please feel free to choose whichever style you're comfortable with. As software developers, it's a good idea to keep code DRY whenever possible (unless you're building a hyper-performant application that targets a specific compiler's optimizations 😅).
Finally, we can test various mathematical expressions to make sure our parser still works correctly.
const parser = new Parser()
console.log(parser.parse('1 + 2 + 3')) // 6
console.log(parser.parse('5 - 2 - 1')) // 2
console.log(parser.parse('1 * 2 * 3')) // 6
console.log(parser.parse('12 / 2 / 3')) // 2
console.log(parser.parse('2 ^ 2')) // 4
console.log(parser.parse('2 ^ 2 ^ 3')) // 256
console.log(parser.parse('1 + 2 ^ 2 * 3 - 4 / 2')) // 11
Handling Parentheses
Now that our little detour is finished, let's add support for parentheses in our grammar blueprint.
Expression
= Term (("+" / "-") Term)*
Term
= Factor (("*" / "/") Factor)*
Factor
= Primary ("^" Factor)*
Primary
= ParenthesizedExpression
/ NUMBER
ParenthesizedExpression
= "(" Expression ")"
We will create a new method called ParenthesizedExpression that handles eating/consuming the parenthesis tokens ( ) and returns the expression between them.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
BinaryExpression(leftRule, rightRule, operatorType1, operatorType2) {
let left = leftRule()
while (
this.lookahead &&
(this.lookahead.type === operatorType1 ||
this.lookahead.type === operatorType2)
) {
const operator = this.eat(this.lookahead.type).type
switch (operator) {
case TokenTypes.ADDITION:
left = left + rightRule()
break
case TokenTypes.SUBTRACTION:
left = left - rightRule()
break
case TokenTypes.MULTIPLICATION:
left = left * rightRule()
break
case TokenTypes.DIVISION:
left = left / rightRule()
break
case TokenTypes.EXPONENTIATION:
left = left ** rightRule()
break
}
}
return left
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
return this.BinaryExpression(
() => this.Term(),
() => this.Term(),
TokenTypes.ADDITION,
TokenTypes.SUBTRACTION
)
}
/**
* Term
* = Factor (("*" / "/") Factor)*
*/
Term() {
return this.BinaryExpression(
() => this.Factor(),
() => this.Factor(),
TokenTypes.MULTIPLICATION,
TokenTypes.DIVISION
)
}
/**
* Factor
* = Primary ("^" Factor)*
*/
Factor() {
return this.BinaryExpression(
() => this.Primary(),
() => this.Factor(),
TokenTypes.EXPONENTIATION
)
}
/**
* Primary
* = ParenthesizedExpression
* / NUMBER
*/
Primary() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* ParenthesizedExpression
* = "(" Expression ")"
*/
ParenthesizedExpression() {
this.eat(TokenTypes.PARENTHESIS_LEFT)
const expression = this.Expression()
this.eat(TokenTypes.PARENTHESIS_RIGHT)
return expression
}
}
const parser = new Parser()
console.log(parser.parse('(1 + 2) * 3')) // 9
console.log(parser.parse('(2 ^ 2) ^ 3')) // 64
Notice how the Primary method now has a conditional statement that checks whether a left parenthesis ( token is encountered. If so, it returns a parenthesized expression instead of a number.
Handling Unary Operations
Similar to the parenthesized expression, we need to create another rule under Primary called UnaryExpression. Our grammar blueprint should now look like the following.
Expression
= Term (("+" / "-") Term)*
Term
= Factor (("*" / "/") Factor)*
Factor
= Primary ("^" Factor)*
Primary
= ParenthesizedExpression
/ UnaryExpression
/ NUMBER
ParenthesizedExpression
= "(" Expression ")"
UnaryExpression
= "-" Factor
We want the unary expression to resolve to "-" Factor instead of "-" Expression because we want exponentiation to have higher precedence than unary negation. Therefore, the math expression -2 ^ 2 should result in a value of -4.
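We can double-check that grouping against plain JavaScript, which uses the same precedence (note that JavaScript forces you to parenthesize a unary minus next to **):
console.log(-(2 ** 2)) // -4, how our parser reads "-2 ^ 2"
console.log((-2) ** 2) //  4, parentheses apply the minus first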
Let's implement the UnaryExpression method in our recursive descent parser.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
BinaryExpression(leftRule, rightRule, operatorType1, operatorType2) {
let left = leftRule()
while (
this.lookahead &&
(this.lookahead.type === operatorType1 ||
this.lookahead.type === operatorType2)
) {
const operator = this.eat(this.lookahead.type).type
switch (operator) {
case TokenTypes.ADDITION:
left = left + rightRule()
break
case TokenTypes.SUBTRACTION:
left = left - rightRule()
break
case TokenTypes.MULTIPLICATION:
left = left * rightRule()
break
case TokenTypes.DIVISION:
left = left / rightRule()
break
case TokenTypes.EXPONENTIATION:
left = left ** rightRule()
break
}
}
return left
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
return this.BinaryExpression(
() => this.Term(),
() => this.Term(),
TokenTypes.ADDITION,
TokenTypes.SUBTRACTION
)
}
/**
* Term
* = Factor (("*" / "/") Factor)*
*/
Term() {
return this.BinaryExpression(
() => this.Factor(),
() => this.Factor(),
TokenTypes.MULTIPLICATION,
TokenTypes.DIVISION
)
}
/**
* Factor
* = Primary ("^" Factor)*
*/
Factor() {
return this.BinaryExpression(
() => this.Primary(),
() => this.Factor(),
TokenTypes.EXPONENTIATION
)
}
/**
* Primary
* = ParenthesizedExpression
* / UnaryExpression
* / NUMBER
*/
Primary() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
if (this.lookahead.type === TokenTypes.SUBTRACTION) {
return this.UnaryExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* ParenthesizedExpression
* = "(" Expression ")"
*/
ParenthesizedExpression() {
this.eat(TokenTypes.PARENTHESIS_LEFT)
const expression = this.Expression()
this.eat(TokenTypes.PARENTHESIS_RIGHT)
return expression
}
/**
* UnaryExpression
* = "-" Factor
*/
UnaryExpression() {
this.eat(TokenTypes.SUBTRACTION) // consume the "-" token
return -this.Factor() // return the negative version of whatever Factor resolves to
}
}
const parser = new Parser()
console.log(parser.parse('-2 ^ 2')) // -4
console.log(parser.parse('-2 ^ 2 + 1')) // -3
console.log(parser.parse('-(2 * 3)')) // -6
We updated the Primary method to include a new conditional statement that checks whether a negative sign (or subtraction) - token is encountered. If so, it returns a unary expression.
Handling Trigonometric Functions
Next, we would like to add support for the trigonometric functions sin, cos, and tan. The tokenizer is already set up to handle words and return IDENTIFIER tokens. Let's add support for functions in our grammar blueprint.
Expression
= Term (("+" / "-") Term)*
Term
= Factor (("*" / "/") Factor)*
Factor
= Primary ("^" Factor)*
Primary
= ParenthesizedExpression
/ UnaryExpression
/ FunctionExpression
/ NUMBER
ParenthesizedExpression
= "(" Expression ")"
UnaryExpression
= "-" Factor
FunctionExpression
= IDENTIFIER ParenthesizedExpression
Notice that we added another option to our Primary grammar rule. The FunctionExpression rule will consume a word such as cos or sin, and then use the parenthesized expression to consume the parentheses.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
trig(id, value) {
switch (id) {
case 'sin':
return Math.sin(value)
case 'cos':
return Math.cos(value)
case 'tan':
return Math.tan(value)
default:
throw new Error(`Invalid operation: ${id}`)
}
}
BinaryExpression(leftRule, rightRule, operatorType1, operatorType2) {
let left = leftRule()
while (
this.lookahead &&
(this.lookahead.type === operatorType1 ||
this.lookahead.type === operatorType2)
) {
const operator = this.eat(this.lookahead.type).type
switch (operator) {
case TokenTypes.ADDITION:
left = left + rightRule()
break
case TokenTypes.SUBTRACTION:
left = left - rightRule()
break
case TokenTypes.MULTIPLICATION:
left = left * rightRule()
break
case TokenTypes.DIVISION:
left = left / rightRule()
break
case TokenTypes.EXPONENTIATION:
left = left ** rightRule()
break
}
}
return left
}
/**
* Expression
* = Term (("+" / "-") Term)*
*/
Expression() {
return this.BinaryExpression(
() => this.Term(),
() => this.Term(),
TokenTypes.ADDITION,
TokenTypes.SUBTRACTION
)
}
/**
* Term
* = Factor (("*" / "/") Factor)*
*/
Term() {
return this.BinaryExpression(
() => this.Factor(),
() => this.Factor(),
TokenTypes.MULTIPLICATION,
TokenTypes.DIVISION
)
}
/**
* Factor
* = Primary ("^" Factor)*
*/
Factor() {
return this.BinaryExpression(
() => this.Primary(),
() => this.Factor(),
TokenTypes.EXPONENTIATION
)
}
/**
* Primary
* = ParenthesizedExpression
* / UnaryExpression
* / FunctionExpression
* / NUMBER
*/
Primary() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
if (this.lookahead.type === TokenTypes.SUBTRACTION) {
return this.UnaryExpression()
}
if (this.lookahead.type === TokenTypes.IDENTIFIER) {
return this.FunctionExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* ParenthesizedExpression
* = "(" Expression ")"
*/
ParenthesizedExpression() {
this.eat(TokenTypes.PARENTHESIS_LEFT)
const expression = this.Expression()
this.eat(TokenTypes.PARENTHESIS_RIGHT)
return expression
}
/**
* UnaryExpression
* = "-" Factor
*/
UnaryExpression() {
this.eat(TokenTypes.SUBTRACTION)
return -this.Factor()
}
/**
* FunctionExpression
* = IDENTIFIER ParenthesizedExpression
*/
FunctionExpression() {
const id = this.eat(TokenTypes.IDENTIFIER).value
return this.trig(id, this.ParenthesizedExpression())
}
}
const parser = new Parser()
console.log(parser.parse('sin(0)')) // 0
console.log(parser.parse('cos(0)')) // 1
console.log(parser.parse('tan(0)')) // 0
console.log(parser.parse('-cos(0) + 3 ^ 2')) // 8
We updated the Primary method to include a new conditional statement that checks whether an IDENTIFIER token is encountered. If so, it returns a function expression. Inside the FunctionExpression method, we're calling a new function called trig. This function is identical to the one we used in the past few tutorials. We added the trig function underneath the eat method.
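As a side note, if you ever want to support more one-argument functions, one option (my own assumption, not something we do in this series) is to map identifiers onto Math methods instead of using a switch statement:
// Hypothetical replacement for trig() supporting a few extra functions
applyFunction(id, value) {
  const functions = {
    sin: Math.sin,
    cos: Math.cos,
    tan: Math.tan,
    sqrt: Math.sqrt,
    log: Math.log
  }
  if (!functions[id]) {
    throw new Error(`Invalid operation: ${id}`)
  }
  return functions[id](value)
}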
Running Quick Tests
As our grammar blueprint and parser grow and become more complicated, it's always best to run tests after adding a new feature. I would encourage everyone to use a testing library such as Jest or Vitest to test as many math expressions as possible!
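For reference, here's roughly what one of those tests could look like in Jest, assuming Jest is installed and that the Parser class is exported as shown below:
const { Parser } = require('./Parser')

test('respects operator precedence', () => {
  const parser = new Parser()
  expect(parser.parse('1 + 2 * 3')).toBe(7)
})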
Without installing a testing library, however, we can run some quick tests to make sure there are no errors in our math expression parser.
Add the following line at the end of our Parser.js file to export our Parser class.
module.exports = { Parser }
Then, create a new file called tests.js with the following contents.
const assert = require('assert').strict
const { Parser } = require('./Parser')
const mathTests = [
['1', 1],
[' 2 ', 2],
['1 + 2', 3],
[' 1 + 2 ', 3],
['1 + 2 * 3', 7],
['(1 + 2) * 3', 9],
['5 - 2', 3],
['5 - 2 - 1', 2],
['12 / 2 / 3', 2],
['2 ^ 3 + 1', 9],
['-2 ^ 2', -4],
['(-2) ^ 2', 4],
['-2 ^ 2 + 1', -3],
['cos(0) + 3 * -4 / -2 ^ 2', 4]
]
const parser = new Parser()
let result
for (const [expression, expected] of mathTests) {
try {
result = parser.parse(expression)
assert.equal(result, expected)
} catch (err) {
const lines = '-'.repeat(process.stdout.columns)
console.log(lines)
console.log(`Expression failed: "${expression}"`)
console.log(`Expected result: ${expected}`)
console.log(`Actual result: ${result}`)
console.log(lines)
throw err
}
}
console.log('All tests passed! 🎉')
We can execute this script using the command node tests. If all the tests pass, we should see a celebratory message! 🎉
If any tests fail, then we should see an error message. We can add the following test at the end of the mathTests array to force a failure.
['1 + 2', 4]
If we scroll to the top of the error message, we should see something similar to the following:
----------------------------------------------------
Expression failed: "1 + 2"
Expected result: 4
Actual result: 3
----------------------------------------------------
This is just one way of quickly testing our math expressions! Cool, huh?
Passing in Math Expressions from the Command Line
Sometimes, it's nice to use a command line interface (CLI) to parse math expressions. Let's see how we can turn our parser into our own CLI tool. Node.js provides a special array called process.argv that contains all the text we pass on the command line when executing a JavaScript file.
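For example, here's roughly what that array holds when we run our script with an expression (the paths are placeholders):
// node Parser '1 + 2 * 3'
// process.argv:
// [
//   '/path/to/node',      // process.argv[0]: the Node.js executable
//   '/path/to/Parser.js', // process.argv[1]: the script being run
//   '1 + 2 * 3'           // process.argv[2]: our math expression
// ]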
Specifically, process.argv[2] will hold the math expression we pass into our Node.js script. Add the following code to the end of the Parser.js file.
const input = process.argv[2]
const parser = new Parser()
console.log(parser.parse(input))
Then, run the following command:
node Parser '1 + 2 * 3'
We should see 7 appear in the same terminal we used to execute the Parser.js script. We now have the ability to run our parser as a CLI tool! 🎉
Conclusion
Congrats, friend! You made it to the end! You now have the power to parse anything! Recursive descent parsing is a very powerful and popular algorithm used everywhere. It's so simple yet elegant.
If this series has helped you in any way, please consider donating to my cause! I would be eternally grateful 🙇
These tutorials take a while to write and get right 😅
In the next tutorial, we'll learn how to create abstract syntax trees (AST) of our math expressions by modifying our recursive descent parser! Until then, happy coding! ⭐