How to Build a Pratt Parser for Math Expressions in JavaScript
Greetings, friends! This is my sixteenth post in the math expression parser series. In Part 14, we learned how to make a parser by hand using the recursive descent algorithm. When building a math expression parser, recursive descent tends to get tedious as we handle more operators with varying operator precedence and associativity. In this tutorial, we'll learn how to build an amazing Pratt parser in JavaScript to make things easier! By the end of this tutorial, you'll be able to parse anything! Let's get started!
What is a Pratt Parser?
A Pratt parser is a type of top-down operator precedence parser that uses recursive descent together with operator precedence. Think of it as a combination of recursive descent and the shunting yard algorithm.
The "Pratt parser" algorithm was created by the computer scientist, Vaughan Pratt in the 1973 paper "Top down operator precedence", based on the recursive descent algorithm. This algorithm is implemented in multiple compilers and interpreters including the popular Clang compiler used for the C and C++ programming languages.
Why make a Pratt Parser?
Pratt parsers makes it super easy to handle mathematical expressions compared to regular recursive descent parsers, and they might even perform better than them. A lot of people prefer writing Pratt parsers whenever they need to make a parser because they provide a lot of flexibility in regards to error handling and adding new features or operators.
When we designed a recursive descent parser in Part 14, it was very awkward dealing with unary expressions, exponentiation, operator precedence, and associativity. In Pratt parsers, we define a precedence for each operator, and the algorithm automatically takes care of everything! It's truly a beautiful and elegant algorithm.
Building a Tokenizer for Pratt Parsers
Similar to the tutorial on recursive descent parsers, we're going to build a Pratt parser completely from scratch. However, we can reuse the updated tokenizer we made in Part 6 of this tutorial series to help save some time. Please consult Part 5 of this tutorial series to learn how we constructed this tokenizer.
Create a new file called Tokenizer.js
with the following contents:
const TokenTypes = {
NUMBER: 'NUMBER',
IDENTIFIER: 'IDENTIFIER',
ADDITION: '+',
SUBTRACTION: '-',
MULTIPLICATION: '*',
DIVISION: '/',
EXPONENTIATION: '^',
PARENTHESIS_LEFT: '(',
PARENTHESIS_RIGHT: ')'
};
const TokenSpec = [
[/^\s+/, null],
[/^(?:\d+(?:\.\d*)?|\.\d+)/, TokenTypes.NUMBER],
[/^[a-z]+/, TokenTypes.IDENTIFIER],
[/^\+/, TokenTypes.ADDITION],
[/^\-/, TokenTypes.SUBTRACTION],
[/^\*/, TokenTypes.MULTIPLICATION],
[/^\//, TokenTypes.DIVISION],
[/^\^/, TokenTypes.EXPONENTIATION],
[/^\(/, TokenTypes.PARENTHESIS_LEFT],
[/^\)/, TokenTypes.PARENTHESIS_RIGHT]
];
class Tokenizer {
constructor(input) {
this.input = input;
this.cursor = 0;
}
hasMoreTokens() {
return this.cursor < this.input.length;
}
match(regex, inputSlice) {
const matched = regex.exec(inputSlice);
if (matched === null) {
return null;
}
this.cursor += matched[0].length;
return matched[0];
}
getNextToken() {
if (!this.hasMoreTokens()) {
return null;
}
const inputSlice = this.input.slice(this.cursor);
for (let [regex, type] of TokenSpec) {
const tokenValue = this.match(regex, inputSlice);
// No rule was matched!
if (tokenValue === null) {
continue;
}
// Skip whitespace!
if (type === null) {
return this.getNextToken();
}
return {
type,
value: tokenValue,
};
}
throw new SyntaxError(`Unexpected token: "${inputSlice[0]}"`);
}
printAllTokens() {
let token;
while ((token = this.getNextToken())) {
console.log(token);
}
}
}
module.exports = {
TokenTypes,
Tokenizer
}
Starting our Pratt Parser
Now that we have a tokenizer, it's time to start building a Pratt parser. I know you're excited! Similar to Part 14, we'll create a grammar "blueprint" to serve as a model for developing our Pratt parser.
Let's create a simple grammar that returns a "prefix" expression which only derives to a NUMBER
token, representing a decimal number.
Expression
= Prefix
Prefix
= NUMBER
Notice how we're using Prefix
instead of Primary
(which we used in previous tutorials). We'll see why soon.
Let's see how we would create a Pratt parser that follows this "blueprint". Create a new file called Parser.js
with the following contents.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
return this.Expression()
}
// Expect a particular token, consume/eat it, and move to the next token
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
// Advance to the next token
this.lookahead = this.tokenizer.getNextToken()
return token
}
/**
* Expression
* = Prefix
*/
Expression() {
return this.Prefix()
}
/**
* Prefix
* = NUMBER
*/
Prefix() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
}
const parser = new Parser()
const result = parser.parse('2')
console.log(result) // 2
So far, everything looks very similar to a recursive descent parser. Let's now move onto implementing mathematical operators such as addition and multiplication to see how Pratt parsers truly shine ✨
Prefix vs Infix and NUD vs LED
The reason why we used Prefix
instead of Primary
is because Pratt parsers will perform operations differently depending on whether the expression is a "prefix" expression or an "infix" expression.
In the original paper by Vaughan Pratt, he used the terms "nud" and "led" instead of "prefix" and "infix", respectively.
A "nud" refers to a "null denotation" which means an expression that doesn't care about tokens to the left. Examples of a "nud" include numbers, unary expressions, parenthesized expressions, and prefix operations (i.e. ++x
). In this tutorial, we'll use the term, "prefix", instead of "nud".
A "led" refers to a "left denotation" which means an expression that does care about tokens to the left and tokens to the right. Examples of a "led" include binary expressions and postfix operations (i.e. x++
). In this tutorial, we'll use the term, "infix", instead of "led". If we decide to ever add postfix operations later, we would still use the Infix
method despite the name.
The main reason why we care about prefix and infix/postfix operations is because of how some mathematical symbols can be used for multiple operations. The most popular example is the -
symbol that can be used for both unary expressions and subtraction. A unary expression would be categorized as a "prefix" operation and subtraction would be an "infix" operation.
Notice how the -
symbol occurs at the beginning of the unary expression, -2
. Nothing appears to the left of the -
symbol, so it belongs to the "prefix" or "nud" group.
// Before a number
-2
// Before an expression
-(1 + 2)
For subtraction, the -
symbol occurs between two numbers or between two expressions, so it belongs to the "infix" or "led" group.
// Between two numbers
5 - 2
// Between two expressions
(3 + 4) - (1 + 2)
By grouping expressions in a "prefix" and "infix" group, it'll help keep our grammar blueprint simple as well as the code for our parser.
Handling Addition and Multiplication
Let's adjust our grammar blueprint to handle addition and multiplication using a new Infix
rule.
Expression
= Prefix (Infix)*
Prefix
= NUMBER
Infix
= ("+" / "*") Expression
As you may recall, this grammar uses EBNF-like syntax which means we can use the *
after a group to refer to zero or more occurrences of a grammar rule i.e. (expression)*
.
The grammar above is saying that Expression
will be a Prefix
expression followed by zero or more Infix
expressions. Each Infix
expression will be zero or more occurrences of a math operator followed by another expression.
Let's look at some examples of how a math expression might get derived. We'll start with a single number, 2
.
Expression → Prefix (Infix)*
→ Prefix
→ NUMBER
→ 2
Next, we'll look at the math expression, 1 + 2
.
Expression → Prefix (Infix)*
→ Prefix Infix
→ Prefix ("+" Expression)
→ Prefix ("+" Prefix)
→ Prefix + Prefix
→ NUMBER + NUMBER
→ 1 + 2
Now, a longer math expression: 1 + 2 * 3
.
Expression → Prefix (Infix)*
→ Prefix Infix Infix
→ Prefix ("+" Expression) ("*" Expression)
→ Prefix ("+" Prefix) ("*" Prefix)
→ Prefix + Prefix * Prefix
→ NUMBER + NUMBER * NUMBER
→ 1 + 2 * 3
We'll place the grammar rules above each method so that we can keep track of where we are in the grammar during the recursive descent. Inside the Parser.js
, update the contents with the following.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
this.operators = {
'*': 3,
'+': 2
}
return this.Expression()
}
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
this.lookahead = this.tokenizer.getNextToken()
return token
}
getPrecedence(token) {
if (token?.type) {
return this.operators[token.value]
}
return 0
}
/**
* Expression
* = Prefix (Infix)*
*/
Expression(prec = 0) {
let left = this.Prefix()
while (prec < this.getPrecedence(this.lookahead)) {
left = this.Infix(left, this.lookahead?.type)
}
return left
}
/**
* Prefix
* = NUMBER
*/
Prefix() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* Infix
* = ("+" / "*") Expression
*/
Infix(left, operatorType) {
let token = this.eat(operatorType)
let newPrec = this.operators[token.value] // newPrec is the new precedence we pass to the "Expression" method
switch (token.type) {
case TokenTypes.ADDITION:
return left + this.Expression(newPrec)
case TokenTypes.MULTIPLICATION:
return left * this.Expression(newPrec)
}
}
}
const parser = new Parser()
const result = parser.parse('1 + 2 * 3')
console.log(result) // 7
We can run this code using the following command.
node Parser
# Output: 7
Let's walk through what's happening for the math expression, 1 + 2 * 3
. We should have the following tokens from the tokenizer.
{ type: 'NUMBER', value: '1'}
{ type: '+', value: '+'}
{ type: 'NUMBER', value: '2'}
{ type: '*', value: '*'}
{ type: 'NUMBER', value: '3'}
At the very start of the parser, we call the parse
method that will proceed to call this.Expression()
. This begins our recursive descent through the Prefix
and Infix
methods.
The Prefix
method will consume the 1
token and proceed to go to the following code inside the Expression
method:
while (prec < this.getPrecedence(this.lookahead)) {
left = this.Infix(left, this.lookahead?.type)
}
This is where the magic happens inside a Pratt parser. At the beginning, the precedence, prec
, is equal to zero. Then, we compare it to the precedence of the next token, +
. The getPrecedence
method is a utility method for retrieving the precedence of an operator.
In the parse
method, we added a new line to define some mathematical operators and their respective precedence.
this.operators = {
'*': 3,
'+': 2
}
We enter the while
loop because 0 < 2
where 2
refers to the precedence of addition. Then, we run the Infix
method which will perform addition and recursively call this.Expression
again BUT with a new precedence!
The while
loop will run again for multiplication because 0 < 3
. Eventually, we reach the end of the math expression, so the tokenizer returns a null
token.
The getPrecedence
method will return a value of 0
when there's a null
token which ends the while
loop in the Expression
method. This is why we call the getPrecedence
method instead of reaching out to the operators
object we defined in the parse
method.
This all seems elaborate compared to the regular recursive descent parser, but please continue reading to be impressed!
Handling Subtraction and Division
Let's update our grammar blueprint to include subtraction and division.
Expression
= Prefix (Infix)*
Prefix
= NUMBER
Infix
= ("+" / "-" / "*" / "/") Expression
Implementing these operations will be super easy in our parser code. We just need to add subtraction and division precedence in the operators
object inside the parse
method.
this.operators = {
'*': 3,
'/': 3,
'+': 2,
'-': 2
}
Then, we need to add subtraction and division to our Infix
method.
Infix(left, operatorType) {
let token = this.eat(operatorType)
let newPrec = this.operators[token.value]
switch (token.type) {
case TokenTypes.ADDITION:
return left + this.Expression(newPrec)
case TokenTypes.SUBTRACTION:
return left - this.Expression(newPrec)
case TokenTypes.MULTIPLICATION:
return left * this.Expression(newPrec)
case TokenTypes.DIVISION:
return left / this.Expression(newPrec)
}
}
Let's look at the code for the parser so far and add an example of subtraction and division at the bottom of the code.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
this.operators = {
'*': 3,
'/': 3,
'+': 2,
'-': 2
}
return this.Expression()
}
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
this.lookahead = this.tokenizer.getNextToken()
return token
}
getPrecedence(token) {
if (token?.type) {
return this.operators[token.value]
}
return 0
}
/**
* Expression
* = Prefix (Infix)*
*/
Expression(prec = 0) {
let left = this.Prefix()
while (prec < this.getPrecedence(this.lookahead)) {
left = this.Infix(left, this.lookahead?.type)
}
return left
}
/**
* Prefix
* = NUMBER
*/
Prefix() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* Infix
* = ("+" / "-" / "*" / "/") Expression
*/
Infix(left, operatorType) {
let token = this.eat(operatorType)
let newPrec = this.operators[token.value]
switch (token.type) {
case TokenTypes.ADDITION:
return left + this.Expression(newPrec)
case TokenTypes.SUBTRACTION:
return left - this.Expression(newPrec)
case TokenTypes.MULTIPLICATION:
return left * this.Expression(newPrec)
case TokenTypes.DIVISION:
return left / this.Expression(newPrec)
}
}
}
const parser = new Parser()
console.log(parser.parse('5 - 2 - 1')) // 2
console.log(parser.parse('12 / 2 / 3')) // 2
console.log(parser.parse('1 + 2 * 3 - 4 / 2')) // 5
When we run this code using the command, node Parser
, we should see subtraction and division working correctly.
Handling Exponentiation
Let's now include exponentiation in our grammar file. Notice how our grammar is barely changing compared to when we created grammars in previous tutorials. Pratt parsers make everything better!
Expression
= Prefix (Infix)*
Prefix
= NUMBER
Infix
= ("+" / "-" / "*" / "/" / "^") Expression
In our parser code, we'll add the ^
operator to the operators
object inside the parse
method. Exponentiation should have the highest precedence.
this.operators = {
'^': 4,
'*': 3,
'/': 3,
'+': 2,
'-': 2
}
Here comes the fun part! Exponentiation is an operator that has right-associativity instead of left-associativity. This means that the expression, 2 ^ 2 ^ 3
should return a value of 256
.
2 ^ 2 ^ 3 = 2 ^ (2 ^ 3) = 2 ^ 8 = 256
Next, we need to add the exponentiation operator to our Infix
method. But wait! Notice how we're subtracting one from the newPrec
variable before passing it into the Expression
method.
Infix(left, operatorType) {
let token = this.eat(operatorType)
let newPrec = this.operators[token.value]
switch (token.type) {
case TokenTypes.ADDITION:
return left + this.Expression(newPrec)
case TokenTypes.SUBTRACTION:
return left - this.Expression(newPrec)
case TokenTypes.MULTIPLICATION:
return left * this.Expression(newPrec)
case TokenTypes.DIVISION:
return left / this.Expression(newPrec)
case TokenTypes.EXPONENTIATION:
return left ** this.Expression(newPrec - 1)
}
}
To handle right associativity, we just need to subtract one from a precedence we pass into the Expression
method! Yep, that's it! No rearranging the grammar and struggling to get everything working. I think I got a few gray hairs when making this tutorial series due to the exponentiation operation fighting so much with the unary operation. After making Pratt parsers, my hair has now returned to its original color and glamour ✨
Joking aside, let's look at the code for the parser so far. We'll also add some examples at the bottom of the code to make sure the exponentiation operator is working correctly.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
this.operators = {
'^': 4,
'*': 3,
'/': 3,
'+': 2,
'-': 2
}
return this.Expression()
}
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
this.lookahead = this.tokenizer.getNextToken()
return token
}
getPrecedence(token) {
if (token?.type) {
return this.operators[token.value]
}
return 0
}
/**
* Expression
* = Prefix (Infix)*
*/
Expression(prec = 0) {
let left = this.Prefix()
while (prec < this.getPrecedence(this.lookahead)) {
left = this.Infix(left, this.lookahead?.type)
}
return left
}
/**
* Prefix
* = NUMBER
*/
Prefix() {
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* Infix
* = ("+" / "-" / "*" / "/" / "^") Expression
*/
Infix(left, operatorType) {
let token = this.eat(operatorType)
let newPrec = this.operators[token.value]
switch (token.type) {
case TokenTypes.ADDITION:
return left + this.Expression(newPrec)
case TokenTypes.SUBTRACTION:
return left - this.Expression(newPrec)
case TokenTypes.MULTIPLICATION:
return left * this.Expression(newPrec)
case TokenTypes.DIVISION:
return left / this.Expression(newPrec)
case TokenTypes.EXPONENTIATION:
return left ** this.Expression(newPrec - 1)
}
}
}
const parser = new Parser()
console.log(parser.parse('2 ^ 2')) // 4
console.log(parser.parse('2 ^ 2 ^ 3')) // 256
When we run this code using the command, node Parser
, we should see exponentiation working correctly.
Handling Parentheses
Alright, we've handled all the infix operations, so now we just need to add the prefix operations. We'll start with handling parentheses. Let's adjust our grammar blueprint to include a rule for ParenthesizedExpression
inside the Prefix
rule.
Expression
= Prefix (Infix)*
Prefix
= ParenthesizedExpression
/ NUMBER
Infix
= ("+" / "-" / "*" / "/" / "^") Expression
ParenthesizedExpression
= "(" Expression ")"
We'll adjust the Prefix
method to call a new method, ParenthesizedExpression
, when we encounter a left parenthesis (
.
Prefix() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
Then, we'll define the ParenthesizedExpression
in the Parser
class and insert the following contents.
ParenthesizedExpression() {
this.eat(TokenTypes.PARENTHESIS_LEFT)
const expression = this.Expression()
this.eat(TokenTypes.PARENTHESIS_RIGHT)
return expression
}
We consume a (
token, run the Expression
method (which defaults to precedence of zero when no parameter is given), and then consume a )
token.
Our parser code should now look like the following.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
this.operators = {
'^': 4,
'*': 3,
'/': 3,
'+': 2,
'-': 2
}
return this.Expression()
}
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
this.lookahead = this.tokenizer.getNextToken()
return token
}
getPrecedence(token) {
if (token?.type) {
return this.operators[token.value]
}
return 0
}
/**
* Expression
* = Prefix (Infix)*
*/
Expression(prec = 0) {
let left = this.Prefix()
while (prec < this.getPrecedence(this.lookahead)) {
left = this.Infix(left, this.lookahead?.type)
}
return left
}
/**
* Prefix
* = ParenthesizedExpression
* / NUMBER
*/
Prefix() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* Infix
* = ("+" / "-" / "*" / "/" / "^") Expression
*/
Infix(left, operatorType) {
let token = this.eat(operatorType)
let newPrec = this.operators[token.value]
switch (token.type) {
case TokenTypes.ADDITION:
return left + this.Expression(newPrec)
case TokenTypes.SUBTRACTION:
return left - this.Expression(newPrec)
case TokenTypes.MULTIPLICATION:
return left * this.Expression(newPrec)
case TokenTypes.DIVISION:
return left / this.Expression(newPrec)
case TokenTypes.EXPONENTIATION:
return left ** this.Expression(newPrec - 1)
}
}
/**
* ParenthesizedExpression
* = "(" Expression ")"
*/
ParenthesizedExpression() {
this.eat(TokenTypes.PARENTHESIS_LEFT)
const expression = this.Expression()
this.eat(TokenTypes.PARENTHESIS_RIGHT)
return expression
}
}
const parser = new Parser()
console.log(parser.parse('(1 + 2) * 3')) // 9
console.log(parser.parse('(2 ^ 2) ^ 3')) // 64
Handling Unary Operations
It's time to add the dreaded unary operations to our parser. Let's update our grammar to include a UnaryExpression
rule inside the Prefix
rule.
Expression
= Prefix (Infix)*
Prefix
= ParenthesizedExpression
/ UnaryExpression
/ NUMBER
Infix
= ("+" / "-" / "*" / "/" / "^") Expression
ParenthesizedExpression
= "(" Expression ")"
UnaryExpression
= "-" Expression
Notice that we're using "-" Expression
in our grammar rule for UnaryExpression
this time. In previous tutorials, we were using "-" Factor
. Since the Expression
method in our Pratt parser will automatically deal with operator precedence, we can safely use "-" Expression
in our grammar.
As mentioned earlier in this tutorial, the unary operator uses the same symbol, -
, as subtraction. However, a unary operation technically has higher precedence than multiplication/division but lower precedence than exponentiation. Therefore, we need to update our operators
object to handle a new unary
operator.
this.operators = {
'^': 5,
unary: 4,
'*': 3,
'/': 3,
'+': 2,
'-': 2
}
If we weren't using Pratt parsers, making an adjustment to existing parser code to handle unary operations would be a huge pain. Luckily, you're reading this amazing tutorial and building a Pratt parser instead.
We now need to adjust the Prefix
method to call a new method, UnaryExpression
, when we encounter a subtraction token, -
.
Prefix() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
if (this.lookahead.type === TokenTypes.SUBTRACTION) {
return this.UnaryExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
Next, we'll define the UnaryExpression
method in the Parser
class and insert the following contents.
UnaryExpression() {
this.eat(TokenTypes.SUBTRACTION)
return -this.Expression(this.getPrecedence('unary'))
}
Notice how we pass the string, unary
, into our getPrecedence
method. Currently, this method expects an object with a type
property. Otherwise, it will return zero. Let's add a conditional statement to check if the received token is a "virtual" unary
token.
getPrecedence(token) {
if (token === 'unary') {
return this.operators.unary;
}
if (token?.type) {
return this.operators[token.value]
}
return 0
}
Our parser code should now look like the following.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
this.operators = {
'^': 5,
unary: 4,
'*': 3,
'/': 3,
'+': 2,
'-': 2
}
return this.Expression()
}
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
this.lookahead = this.tokenizer.getNextToken()
return token
}
getPrecedence(token) {
if (token === 'unary') {
return this.operators.unary;
}
if (token?.type) {
return this.operators[token.value]
}
return 0
}
/**
* Expression
* = Prefix (Infix)*
*/
Expression(prec = 0) {
let left = this.Prefix()
while (prec < this.getPrecedence(this.lookahead)) {
left = this.Infix(left, this.lookahead?.type)
}
return left
}
/**
* Prefix
* = ParenthesizedExpression
* / UnaryExpression
* / NUMBER
*/
Prefix() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
if (this.lookahead.type === TokenTypes.SUBTRACTION) {
return this.UnaryExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* Infix
* = ("+" / "-" / "*" / "/" / "^") Expression
*/
Infix(left, operatorType) {
let token = this.eat(operatorType)
let newPrec = this.operators[token.value]
switch (token.type) {
case TokenTypes.ADDITION:
return left + this.Expression(newPrec)
case TokenTypes.SUBTRACTION:
return left - this.Expression(newPrec)
case TokenTypes.MULTIPLICATION:
return left * this.Expression(newPrec)
case TokenTypes.DIVISION:
return left / this.Expression(newPrec)
case TokenTypes.EXPONENTIATION:
return left ** this.Expression(newPrec - 1)
}
}
/**
* ParenthesizedExpression
* = "(" Expression ")"
*/
ParenthesizedExpression() {
this.eat(TokenTypes.PARENTHESIS_LEFT)
const expression = this.Expression()
this.eat(TokenTypes.PARENTHESIS_RIGHT)
return expression
}
/**
* UnaryExpression
* = "-" Expression
*/
UnaryExpression() {
this.eat(TokenTypes.SUBTRACTION)
return -this.Expression(this.getPrecedence('unary'))
}
}
const parser = new Parser()
console.log(parser.parse('-2')) // -2
console.log(parser.parse('-2 + 3')) // 1
console.log(parser.parse('-2 ^ 2 + 1')) // -3
console.log(parser.parse('-(2 * 2)')) // -4
console.log(parser.parse('- - 2')) // 2
console.log(parser.parse('- - - 2')) // -2
Notice the examples we added at the bottom of the code. The unary operation should have lower precedence than exponentiation, and we can use multiple -
symbols to stack unary operations.
Handling Trigonometric Functions
Finally, let's add support for trigonometric functions to our parser. We'll update our grammar to include a FunctionExpression
inside the Prefix
rule.
Expression
= Prefix (Infix)*
Prefix
= ParenthesizedExpression
/ UnaryExpression
/ FunctionExpression
/ NUMBER
Infix
= ("+" / "-" / "*" / "/" / "^") Expression
ParenthesizedExpression
= "(" Expression ")"
UnaryExpression
= "-" Expression
FunctionExpression
= IDENTIFIER ParenthesizedExpression
An example of a FunctionExpression
would be cos(3.14)
where cos
is an IDENTIFIER
token and (3.14)
is a ParenthesizedExpression
.
Let's adjust the Prefix
method to call FunctionExpression
when we encounter an IDENTIFIER
token.
Prefix() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
if (this.lookahead.type === TokenTypes.SUBTRACTION) {
return this.UnaryExpression()
}
if (this.lookahead.type === TokenTypes.IDENTIFIER) {
return this.FunctionExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
Next, we'll define the FunctionExpression
method in the Parser
class and insert the following contents.
FunctionExpression() {
const id = this.eat(TokenTypes.IDENTIFIER).value
const expression = this.ParenthesizedExpression()
return this.trig(id, expression)
}
This method calls a new function called trig
that will evaluate the trigonometric functions: sin
, cos
, and tan
. This trig
function will be identical to the function we used in the previous tutorials.
trig(id, value) {
switch (id) {
case 'sin':
return Math.sin(value)
case 'cos':
return Math.cos(value)
case 'tan':
return Math.tan(value)
default:
throw new Error(`Invalid operation: ${id}`)
}
}
After making the above changes, our Parser.js
file should look similar to the following.
const { TokenTypes, Tokenizer } = require('./Tokenizer')
class Parser {
parse(input) {
this.input = input
this.tokenizer = new Tokenizer(input)
this.lookahead = this.tokenizer.getNextToken()
this.operators = {
'^': 5,
unary: 4,
'*': 3,
'/': 3,
'+': 2,
'-': 2
}
return this.Expression()
}
eat(tokenType) {
const token = this.lookahead
if (token == null) {
throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)
}
if (token.type !== tokenType) {
throw new SyntaxError(
`Unexpected token: "${token.value}", expected "${tokenType}"`
)
}
this.lookahead = this.tokenizer.getNextToken()
return token
}
getPrecedence(token) {
if (token === 'unary') {
return this.operators.unary
}
if (token?.type) {
return this.operators[token.value]
}
return 0
}
trig(id, value) {
switch (id) {
case 'sin':
return Math.sin(value)
case 'cos':
return Math.cos(value)
case 'tan':
return Math.tan(value)
default:
throw new Error(`Invalid operation: ${id}`)
}
}
/**
* Expression
* = Prefix (Infix)*
*/
Expression(prec = 0) {
let left = this.Prefix()
while (prec < this.getPrecedence(this.lookahead)) {
left = this.Infix(left, this.lookahead?.type)
}
return left
}
/**
* Prefix
* = ParenthesizedExpression
* / UnaryExpression
* / FunctionExpression
* / NUMBER
*/
Prefix() {
if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
return this.ParenthesizedExpression()
}
if (this.lookahead.type === TokenTypes.SUBTRACTION) {
return this.UnaryExpression()
}
if (this.lookahead.type === TokenTypes.IDENTIFIER) {
return this.FunctionExpression()
}
const token = this.eat(TokenTypes.NUMBER)
return Number(token.value)
}
/**
* Infix
* = ("+" / "-" / "*" / "/" / "^") Expression
*/
Infix(left, operatorType) {
let token = this.eat(operatorType)
let newPrec = this.operators[token.value]
switch (token.type) {
case TokenTypes.ADDITION:
return left + this.Expression(newPrec)
case TokenTypes.SUBTRACTION:
return left - this.Expression(newPrec)
case TokenTypes.MULTIPLICATION:
return left * this.Expression(newPrec)
case TokenTypes.DIVISION:
return left / this.Expression(newPrec)
case TokenTypes.EXPONENTIATION:
return left ** this.Expression(newPrec - 1)
}
}
/**
* ParenthesizedExpression
* = "(" Expression ")"
*/
ParenthesizedExpression() {
this.eat(TokenTypes.PARENTHESIS_LEFT)
const expression = this.Expression()
this.eat(TokenTypes.PARENTHESIS_RIGHT)
return expression
}
/**
* UnaryExpression
* = "-" Expression
*/
UnaryExpression() {
this.eat(TokenTypes.SUBTRACTION)
return -this.Expression(this.getPrecedence('unary'))
}
/**
* FunctionExpression
* = IDENTIFIER ParenthesizedExpression
*/
FunctionExpression() {
const id = this.eat(TokenTypes.IDENTIFIER).value
const expression = this.ParenthesizedExpression()
return this.trig(id, expression)
}
}
We can test the trigonometric functions by adding the following code at the bottom of the Parser.js
file and running the command, node Parser
in the command line.
const parser = new Parser()
console.log(parser.parse('sin(0)')) // 0
console.log(parser.parse('cos(0)')) // 1
console.log(parser.parse('tan(0)')) // 0
Testing the Pratt Parser
Finally, our math expression Pratt parser is fully complete! Let's run some tests to make sure our Pratt parser passes the same tests as the recursive descent parser we built in Part 14 of this tutorial series.
Add the following line at the end of our Parser.js
file to export our Parser
class.
module.exports = { Parser }
Then, we'll reuse the test.js
file from Part 14.
const assert = require('assert').strict
const { Parser } = require('./Parser')
const mathTests = [
['1', 1],
[' 2 ', 2],
['1 + 2', 3],
[' 1 + 2 ', 3],
['1 + 2 * 3', 7],
['(1 + 2) * 3', 9],
['5 - 2', 3],
['5 - 2 - 1', 2],
['12 / 2 / 3', 2],
['2 ^ 3 + 1', 9],
['-2 ^ 2', -4],
['(-2) ^ 2', 4],
['-2 ^ 2 + 1', -3],
['cos(0) + 3 * -4 / -2 ^ 2', 4]
]
const parser = new Parser()
let result
for (const [expression, expected] of mathTests) {
try {
result = parser.parse(expression)
assert.equal(result, expected)
} catch (err) {
const lines = '-'.repeat(process.stdout.columns)
console.log(lines)
console.log(`Expression failed: "${expression}"`)
console.log(`Expected result: ${expected}`)
console.log(`Actual result: ${result}`)
console.log(lines)
throw err
}
}
console.log('All tests passed! 🎉')
We can execute this file using the following command:
node tests
After running this command, all the tests should pass 😁
Conclusion
Congrats, friend! You did it! You can parse anything now! 🎉
Pratt parsers are very elegant and popular, and I hope you've seen why. They're much easier to implement than recursive descent, more easily configurable, and make it easy to keep track of the order of mathematical operations.
Pratt parsing is my favorite way to create parsers. We can even use the knowledge we learned today to create a small programming language. In a future article, we'll create our own tiny programming language to support variables in math expressions, so get ready 😉
If this series has helped you in any way, please consider donating to my cause! I love talking about obscure topics and explaining them in a simple and detailed way 💚
In the next tutorial, we'll learn how to create abstract syntax trees (AST) by adjusting our Pratt parser. See you there! 🙂