How to Build a Small Programming Language with a Pratt Parser using JavaScript

Published: Wednesday, March 29, 2023

Greetings, friends! This is my eighteenth post in the math expression parser series. This is a special bonus tutorial! ⭐

In this tutorial, we'll learn how to build a build a small programming language called "Tiny Math Language" or TML for short. This custom language will support the ability to define and use variables in math expressions. We're still building a math expression parser, but it'll be a bit different from all the previous tutorials. As a base, we'll use the Pratt parser we made in Part 16 of this tutorial series, so it shouldn't be too much extra work 😅

Let's get started!

Introducing the TML Programming Language!

Welcome all to the greatest programming language on the planet! Tiny Math Language (TML)!

Before we start building a programming language, let's set a foundation for what the language grammar should like. As the name implies, TML will be super tiny. The main difference between the TML parser and the Pratt parser we built in Part 16 is that the TML parser will support the declaration of variables. We'll also provide a way to print or log the results of multiple math expressions.

Here's a small code snippet of TML:

js
Copied! ⭐️
const x = 1 + 2 * 3
const y = 2 ^ 2 ^ 3
const z = 3

print(x + y + z)

Yep! That's it! Super basic, right? There are three main features we will add to our math expression parser:

  1. The var keyword to declare variables
  2. The print keyword to log the results of an expression
  3. The semicolon ; to indicate the end of a statement

Although our language will be tiny, it'll be a great learning experience! The main purpose of this tutorial is to take a small step into the world of programming language design.

Grammar Blueprint for TML

When designing a parser, it's always a good idea to create a grammar "blueprint" of our language. Let's look at the completed grammar for TML.

js
Copied! ⭐️
Program
  = StatementList

StatementList
  = Statement+

Statement
  = VariableStatement
  / PrintStatement
  / ExpressionStatement

VariableStatement
  = "var" IDENTIFIER "=" Expression ";"

PrintStatement
  = "print" ParenthesizedExpression ";"

ExpressionStatement
  = Expression ";"

Expression
  = Prefix (Infix)*

Prefix
  = ParenthesizedExpression
  / UnaryExpression
  / VariableOrFunctionExpression
  / NUMBER

Infix
  = ("+" / "-" / "*" / "/" / "^") Expression

ParenthesizedExpression
  = "(" Expression ")"

UnaryExpression
  = "-" Expression

VariableOrFunctionExpression
  = Variable
  / FunctionExpression

Variable
  = IDENTIFIER

FunctionExpression
  = IDENTIFIER ParenthesizedExpression

I hope this grammar doesn't look too scary 😅 If you've been following along my tutorial series, this should hopefully make sense. A good portion of the above grammar is actually similar to the grammar we created for the Pratt parser in Part 16 of this tutorial series.

Compared to our Pratt parser from earlier, the grammar for TML has a few new additions:

js
Copied! ⭐️
Program
  = StatementList

StatementList
  = Statement+

Statement
  = VariableStatement
  / PrintStatement
  / ExpressionStatement

VariableStatement
  = "var" IDENTIFIER "=" Expression ";"

PrintStatement
  = "print" ParenthesizedExpression ";"

ExpressionStatement
  = Expression ";"

VariableOrFunctionExpression
  = Variable
  / FunctionExpression

Variable
  = IDENTIFIER

The TML parser will start at Program instead of Expression. Then, we'll proceed to go to a StatementList, but wait...what is a statement?

A statement is an entire group of code that performs a specific action or side effect. In TML, it will be a group of code followed by one semicolon ;.

It might be helpful to look at the difference between a statement and expression. Let's look at an example of a VariableStatement.

js
Copied! ⭐️
// VariableStatement
const x = 1 + 2

In the code above, var x = 1 + 2; is a statement. More specifically, it's a "variable statement". 1 + 2 is an expression inside the "variable statement".

Let's look at an example of a PrintStatement. Note that this type of grammar rule is specific to TML, so you may not see it in actual programming languages 😅

js
Copied! ⭐️
// PrintStatement
print(1 + 2)

In the code above, print(1 + 2); is a statement and 1 + 2 is an expression inside that statement.

Next, let's look at an example of an ExpressionStatement. This type of grammar rule is common among actual programming languages. But wait...how can it be both an expression and statement? An "expression statement" is simply a statement that only includes an expression.

js
Copied! ⭐️
// ExpressionStatement
1 + 2

Notice how an "expression statement" looks very similar to an expression. We'll use the ExpressionStatement grammar rule to serve as a "container" around an expression. All expressions will lie inside a statement. In TML, we'll use a semicolon ; to mark the end of a statement to make it easier to parse.

As seen in the grammar above, there will be three different kinds of statements in TML.

js
Copied! ⭐️
Statement
  = VariableStatement
  / PrintStatement
  / ExpressionStatement

The TML program will start with StatementList which will just be an array of one or more statements as indicated by Statement+.

js
Copied! ⭐️
Program
  = StatementList

StatementList
  = Statement+

Our parser will go through TML code line-by-line, or rather, statement-by-statement. It'll collect a list of statements and store them in array. When we build an abstract syntax tree (AST) in the next tutorial, it'll make more sense why we store the statements in an array. In this tutorial, however, we'll evaluate each statement immediately.

tip
If you navigate to AST Explorer and view the AST for a JavaScript program, then you may notice that a JavaScript program has a body AST node which contains a list of statements.

Building a Tokenizer for TML

In Part 5 of this tutorial series, we learned how to build a tokenizer for math expression parsers. It was then updated in Part 6. We've been using this updated version when building parsers by hand including the recursive descent parser in Part 14 and the Pratt parser in Part 16.

Since we're adding new keywords, we need to update our tokenizer once again. We will add the following token types:

js
Copied! ⭐️
{
  VAR: 'VAR',
  PRINT: 'PRINT',
  SEMICOLON: ';',
  ASSIGNMENT: '=',
}

We'll also add the following token specifications:

js
Copied! ⭐️
[
  [/^\bvar\b/, TokenTypes.VAR],
  [/^\bprint\b/, TokenTypes.PRINT],
  [/^;/, TokenTypes.SEMICOLON],
  [/^\=/, TokenTypes.ASSIGNMENT],
]

Keep in mind that the order of where we place the token specifications matters! Our tokenizer will run through each token spec in order. Create a new file called Tokenizer.js and insert the following contents to complete our tokenizer.

js
Copied! ⭐️
const TokenTypes = {
  VAR: 'VAR',
  PRINT: 'PRINT',
  NUMBER: 'NUMBER',
  IDENTIFIER: 'IDENTIFIER',
  ADDITION: '+',
  SUBTRACTION: '-',
  MULTIPLICATION: '*',
  DIVISION: '/',
  EXPONENTIATION: '^',
  PARENTHESIS_LEFT: '(',
  PARENTHESIS_RIGHT: ')',
  SEMICOLON: ';',
  ASSIGNMENT: '='
}

const TokenSpec = [
  [/^\s+/, null],
  [/^\bvar\b/, TokenTypes.VAR],
  [/^\bprint\b/, TokenTypes.PRINT],
  [/^(?:\d+(?:\.\d*)?|\.\d+)/, TokenTypes.NUMBER],
  [/^[a-z]+/, TokenTypes.IDENTIFIER],
  [/^\+/, TokenTypes.ADDITION],
  [/^\-/, TokenTypes.SUBTRACTION],
  [/^\*/, TokenTypes.MULTIPLICATION],
  [/^\//, TokenTypes.DIVISION],
  [/^\^/, TokenTypes.EXPONENTIATION],
  [/^\(/, TokenTypes.PARENTHESIS_LEFT],
  [/^\)/, TokenTypes.PARENTHESIS_RIGHT],
  [/^;/, TokenTypes.SEMICOLON],
  [/^\=/, TokenTypes.ASSIGNMENT]
]

class Tokenizer {
  constructor(input) {
    this.input = input
    this.cursor = 0
  }

  hasMoreTokens() {
    return this.cursor < this.input.length
  }

  match(regex, inputSlice) {
    const matched = regex.exec(inputSlice)
    if (matched === null)
      return null

    this.cursor += matched[0].length
    return matched[0]
  }

  getNextToken() {
    if (!this.hasMoreTokens())
      return null

    const inputSlice = this.input.slice(this.cursor)

    for (const [regex, type] of TokenSpec) {
      const tokenValue = this.match(regex, inputSlice)

      // No rule was matched!
      if (tokenValue === null)
        continue

      // Skip whitespace!
      if (type === null)
        return this.getNextToken()

      return {
        type,
        value: tokenValue
      }
    }

    throw new SyntaxError(`Unexpected token: "${inputSlice[0]}"`)
  }

  printAllTokens() {
    let token
    while ((token = this.getNextToken()))
      console.log(token)
  }
}

module.exports = {
  TokenTypes,
  Tokenizer
}

Starting our TML Parser

We will use the Pratt parser we built in Part 16 as a base for our TML parser. Below is the finished parser from that tutorial. We'll name this file Parser.js.

js
Copied! ⭐️
const { TokenTypes, Tokenizer } = require('./Tokenizer')

class Parser {
  parse(input) {
    this.input = input
    this.tokenizer = new Tokenizer(input)
    this.lookahead = this.tokenizer.getNextToken()
    this.operators = {
      '^': 5,
      'unary': 4,
      '*': 3,
      '/': 3,
      '+': 2,
      '-': 2
    }

    return this.Expression()
  }

  eat(tokenType) {
    const token = this.lookahead

    if (token == null)
      throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)

    if (token.type !== tokenType) {
      throw new SyntaxError(
        `Unexpected token: "${token.value}", expected "${tokenType}"`
      )
    }

    this.lookahead = this.tokenizer.getNextToken()

    return token
  }

  getPrecedence(token) {
    if (token === 'unary')
      return this.operators.unary

    if (token?.type)
      return this.operators[token.value]

    return 0
  }

  trig(id, value) {
    switch (id) {
      case 'sin':
        return Math.sin(value)
      case 'cos':
        return Math.cos(value)
      case 'tan':
        return Math.tan(value)
      default:
        throw new Error(`Invalid operation: ${id}`)
    }
  }

  /**
   * Expression
   *    = Prefix (Infix)*
   */
  Expression(prec = 0) {
    let left = this.Prefix()

    while (prec < this.getPrecedence(this.lookahead))
      left = this.Infix(left, this.lookahead?.type)

    return left
  }

  /**
   * Prefix
   *    = ParenthesizedExpression
   *    / UnaryExpression
   *    / FunctionExpression
   *    / NUMBER
   */
  Prefix() {
    if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT)
      return this.ParenthesizedExpression()

    if (this.lookahead.type === TokenTypes.SUBTRACTION)
      return this.UnaryExpression()

    if (this.lookahead.type === TokenTypes.IDENTIFIER)
      return this.FunctionExpression()

    const token = this.eat(TokenTypes.NUMBER)
    return Number(token.value)
  }

  /**
   * Infix
   *    = ("+" / "-" / "*" / "/" / "^") Expression
   */
  Infix(left, operatorType) {
    const token = this.eat(operatorType)
    const newPrec = this.operators[token.value]
    switch (token.type) {
      case TokenTypes.ADDITION:
        return left + this.Expression(newPrec)
      case TokenTypes.SUBTRACTION:
        return left - this.Expression(newPrec)
      case TokenTypes.MULTIPLICATION:
        return left * this.Expression(newPrec)
      case TokenTypes.DIVISION:
        return left / this.Expression(newPrec)
      case TokenTypes.EXPONENTIATION:
        return left ** this.Expression(newPrec - 1)
    }
  }

  /**
   * ParenthesizedExpression
   *    = "(" Expression ")"
   */
  ParenthesizedExpression() {
    this.eat(TokenTypes.PARENTHESIS_LEFT)
    const expression = this.Expression()
    this.eat(TokenTypes.PARENTHESIS_RIGHT)
    return expression
  }

  /**
   * UnaryExpression
   *    = "-" Expression
   */
  UnaryExpression() {
    this.eat(TokenTypes.SUBTRACTION)
    return -this.Expression(this.getPrecedence('unary'))
  }

  /**
   * FunctionExpression
   *    = IDENTIFIER ParenthesizedExpression
   */
  FunctionExpression() {
    const id = this.eat(TokenTypes.IDENTIFIER).value
    const expression = this.ParenthesizedExpression()
    return this.trig(id, expression)
  }
}

module.exports = { Parser }

This parser can only accept expressions, so it's time to add statements!

Handling StatementList

The first thing we need to change in our parser is the starting point. Currently, the parser method returns this.Expression. We will change it to this.Program instead.

js
Copied! ⭐️
parse(input) {
  this.input = input
  this.tokenizer = new Tokenizer(input)
  this.lookahead = this.tokenizer.getNextToken()
  this.operators = {
    '^': 5,
    unary: 4,
    '*': 3,
    '/': 3,
    '+': 2,
    '-': 2
  }

  return this.Program() // Our new starting point!
}

Next, we will add a Program method to our Parser class:

js
Copied! ⭐️
/**
 * Program
 *    = StatementList
 */
Program() {
  return this.StatementList()
}

The Program method will then call the StatementList which will begin the recursive descent. Let's add the StatementList method to the Parser class:

js
Copied! ⭐️
/**
 * StatementList
 *    = Statement+
 */
StatementList() {
  const statementList = [this.Statement()]

  while (this.lookahead !== null) {
    statementList.push(this.Statement())
  }

  return statementList
}

The StatementList grammar rule derives to a list of one or statements as indicated by the + sign in Statement+. Any time we're dealing repetition, we can use a loop such as a while loop.

We will initialize the statementList variable to the first Statement in the program. Then, we'll keep pushing new statements into the statementList array until we've reached a null token. That is, we continue until we reach the end of input. The input will be our small TML program.

Handling Statement and ExpressionStatement

Currently, we can't run our code successfully yet. We're missing an implementation of the Statement method. Let's add the following code inside the Parser class:

js
Copied! ⭐️
/**
 * Statement
 *    = ExpressionStatement
 */
Statement() {
  return this.ExpressionStatement()
}

For now, we'll only support the ExpressionStatement. This statement is the simplest, so it's a good starting point. We'll add support for the PrintStatement and VariableStatement methods soon.

Let's create a method for ExpressionStatement:

js
Copied! ⭐️
/**
 * ExpressionStatement
 *    = Expression ";"
 */
ExpressionStatement() {
  const expression = this.Expression()
  this.eat(TokenTypes.SEMICOLON)
  return expression
}

This statement is just an Expression followed by a semicolon ;. Yes, I know. The infamous semicolon. When testing the parser, don't forget to include a semicolon at the end of an expression!

Our parser should have the following code so far:

js
Copied! ⭐️
const { TokenTypes, Tokenizer } = require('./Tokenizer')

class Parser {
  parse(input) {
    this.input = input
    this.tokenizer = new Tokenizer(input)
    this.lookahead = this.tokenizer.getNextToken()
    this.operators = {
      '^': 5,
      'unary': 4,
      '*': 3,
      '/': 3,
      '+': 2,
      '-': 2
    }

    return this.Program()
  }

  eat(tokenType) {
    const token = this.lookahead

    if (token == null)
      throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)

    if (token.type !== tokenType) {
      throw new SyntaxError(
        `Unexpected token: "${token.value}", expected "${tokenType}"`
      )
    }

    this.lookahead = this.tokenizer.getNextToken()

    return token
  }

  getPrecedence(token) {
    if (token === 'unary')
      return this.operators.unary

    if (token && token.type != TokenTypes.PARENTHESIS_RIGHT)
      return this.operators[token.value]

    return 0
  }

  trig(id, value) {
    switch (id) {
      case 'sin':
        return Math.sin(value)
      case 'cos':
        return Math.cos(value)
      case 'tan':
        return Math.tan(value)
      default:
        throw new Error(`Invalid operation: ${id}`)
    }
  }

  /**
   * Program
   *    = StatementList
   */
  Program() {
    return this.StatementList()
  }

  /**
   * StatementList
   *    = Statement+
   */
  StatementList() {
    const statementList = [this.Statement()]

    while (this.lookahead !== null)
      statementList.push(this.Statement())

    return statementList
  }

  /**
   * Statement
   *    = ExpressionStatement
   */
  Statement() {
    return this.ExpressionStatement()
  }

  /**
   * ExpressionStatement
   *    = Expression ";"
   */
  ExpressionStatement() {
    const expression = this.Expression()
    this.eat(TokenTypes.SEMICOLON)
    return expression
  }

  /**
   * Expression
   *    = Prefix (Infix)*
   */
  Expression(prec = 0) {
    let left = this.Prefix()

    while (prec < this.getPrecedence(this.lookahead))
      left = this.Infix(left, this.lookahead?.type)

    return left
  }

  /**
   * Prefix
   *    = ParenthesizedExpression
   *    / UnaryExpression
   *    / FunctionExpression
   *    / NUMBER
   */
  Prefix() {
    if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT)
      return this.ParenthesizedExpression()

    if (this.lookahead.type === TokenTypes.SUBTRACTION)
      return this.UnaryExpression()

    if (this.lookahead.type === TokenTypes.IDENTIFIER)
      return this.FunctionExpression()

    const token = this.eat(TokenTypes.NUMBER)
    return Number(token.value)
  }

  /**
   * Infix
   *    = ("+" / "-" / "*" / "/" / "^") Expression
   */
  Infix(left, operatorType) {
    const token = this.eat(operatorType)
    const newPrec = this.operators[token.value]
    switch (token.type) {
      case TokenTypes.ADDITION:
        return left + this.Expression(newPrec)
      case TokenTypes.SUBTRACTION:
        return left - this.Expression(newPrec)
      case TokenTypes.MULTIPLICATION:
        return left * this.Expression(newPrec)
      case TokenTypes.DIVISION:
        return left / this.Expression(newPrec)
      case TokenTypes.EXPONENTIATION:
        return left ** this.Expression(newPrec - 1)
    }
  }

  /**
   * ParenthesizedExpression
   *    = "(" Expression ")"
   */
  ParenthesizedExpression() {
    this.eat(TokenTypes.PARENTHESIS_LEFT)
    const expression = this.Expression()
    this.eat(TokenTypes.PARENTHESIS_RIGHT)
    return expression
  }

  /**
   * UnaryExpression
   *    = "-" Expression
   */
  UnaryExpression() {
    this.eat(TokenTypes.SUBTRACTION)
    return -this.Expression(this.getPrecedence('unary'))
  }

  /**
   * FunctionExpression
   *    = IDENTIFIER ParenthesizedExpression
   */
  FunctionExpression() {
    const id = this.eat(TokenTypes.IDENTIFIER).value
    const expression = this.ParenthesizedExpression()
    return this.trig(id, expression)
  }
}

module.exports = { Parser }

Let's add the following code at the end of our Parser.js file.

js
Copied! ⭐️
const parser = new Parser()
console.log(parser.parse('1 + 1; 2 * 3;')) // Don't forget the semicolons!

We can execute our code by running the following command:

shell
Copied! ⭐️
node Parser
# Output: [ 2, 6 ]

After executing this code, we should get the following array: [2, 6]. Why do we get an array? As you may recall, we're building an array of statements in our TML parser. We're no longer printing a single result to the console anymore.

The StatementList method will return an array of statements and each statement will return an evaluated expression. In the next tutorial, we'll build an AST where each statement returns an AST node representing the expression instead of an evaluated value.

Handling PrintStatement

For convenience, we would like to let the user print results declaratively instead of displaying a weird array of numbers. That is where the "print statement" comes in handy. It will act like a console.log function to display the results of an expression.

Let's adjust our Statement method to call PrintStatement when the lookahead token is a print token.

js
Copied! ⭐️
/**
 * Statement
 *    = PrintStatement
 *    / ExpressionStatement
 */
Statement() {
  if (this.lookahead.type === TokenTypes.PRINT) {
    return this.PrintStatement()
  }

  return this.ExpressionStatement()
}

Then, we'll add the PrintStatement method to the Parser class.

js
Copied! ⭐️
/**
 * PrintStatement
 *    = "print" ParenthesizedExpression ";"
 */
PrintStatement() {
  this.eat(TokenTypes.PRINT)
  const expression = this.ParenthesizedExpression()
  this.eat(TokenTypes.SEMICOLON)
  console.log(expression) // Print the result of the expression!
}

The console.log line is the main purpose of the PrintStatement method. Since our parser is built using JavaScript, it will use JavaScript as its runtime environment language. We've been using Node.js to execute our parser. Node.js is built using the V8 engine which uses a C++ runtime to execute JavaScript. For our TML language, we're using Node.js as our runtime, so it'll take care of logging information to the console for us.

Our parser should now have the following code:

js
Copied! ⭐️
const { TokenTypes, Tokenizer } = require('./Tokenizer')

class Parser {
  parse(input) {
    this.input = input
    this.tokenizer = new Tokenizer(input)
    this.lookahead = this.tokenizer.getNextToken()
    this.operators = {
      '^': 5,
      'unary': 4,
      '*': 3,
      '/': 3,
      '+': 2,
      '-': 2
    }

    return this.Program()
  }

  eat(tokenType) {
    const token = this.lookahead

    if (token == null)
      throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)

    if (token.type !== tokenType) {
      throw new SyntaxError(
        `Unexpected token: "${token.value}", expected "${tokenType}"`
      )
    }

    this.lookahead = this.tokenizer.getNextToken()

    return token
  }

  getPrecedence(token) {
    if (token === 'unary')
      return this.operators.unary

    if (token && token.type != TokenTypes.PARENTHESIS_RIGHT)
      return this.operators[token.value]

    return 0
  }

  trig(id, value) {
    switch (id) {
      case 'sin':
        return Math.sin(value)
      case 'cos':
        return Math.cos(value)
      case 'tan':
        return Math.tan(value)
      default:
        throw new Error(`Invalid operation: ${id}`)
    }
  }

  /**
   * Program
   *    = StatementList
   */
  Program() {
    return this.StatementList()
  }

  /**
   * StatementList
   *    = Statement+
   */
  StatementList() {
    const statementList = [this.Statement()]

    while (this.lookahead !== null)
      statementList.push(this.Statement())

    return statementList
  }

  /**
   * Statement
   *    = PrintStatement
   *    / ExpressionStatement
   */
  Statement() {
    if (this.lookahead.type === TokenTypes.PRINT)
      return this.PrintStatement()

    return this.ExpressionStatement()
  }

  /**
   * PrintStatement
   *    = "print" ParenthesizedExpression ";"
   */
  PrintStatement() {
    this.eat(TokenTypes.PRINT)
    const expression = this.ParenthesizedExpression()
    this.eat(TokenTypes.SEMICOLON)
    console.log(expression)
  }

  /**
   * ExpressionStatement
   *    = Expression ";"
   */
  ExpressionStatement() {
    const expression = this.Expression()
    this.eat(TokenTypes.SEMICOLON)
    return expression
  }

  /**
   * Expression
   *    = Prefix (Infix)*
   */
  Expression(prec = 0) {
    let left = this.Prefix()

    while (prec < this.getPrecedence(this.lookahead))
      left = this.Infix(left, this.lookahead?.type)

    return left
  }

  /**
   * Prefix
   *    = ParenthesizedExpression
   *    / UnaryExpression
   *    / FunctionExpression
   *    / NUMBER
   */
  Prefix() {
    if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT)
      return this.ParenthesizedExpression()

    if (this.lookahead.type === TokenTypes.SUBTRACTION)
      return this.UnaryExpression()

    if (this.lookahead.type === TokenTypes.IDENTIFIER)
      return this.FunctionExpression()

    const token = this.eat(TokenTypes.NUMBER)
    return Number(token.value)
  }

  /**
   * Infix
   *    = ("+" / "-" / "*" / "/" / "^") Expression
   */
  Infix(left, operatorType) {
    const token = this.eat(operatorType)
    const newPrec = this.operators[token.value]
    switch (token.type) {
      case TokenTypes.ADDITION:
        return left + this.Expression(newPrec)
      case TokenTypes.SUBTRACTION:
        return left - this.Expression(newPrec)
      case TokenTypes.MULTIPLICATION:
        return left * this.Expression(newPrec)
      case TokenTypes.DIVISION:
        return left / this.Expression(newPrec)
      case TokenTypes.EXPONENTIATION:
        return left ** this.Expression(newPrec - 1)
    }
  }

  /**
   * ParenthesizedExpression
   *    = "(" Expression ")"
   */
  ParenthesizedExpression() {
    this.eat(TokenTypes.PARENTHESIS_LEFT)
    const expression = this.Expression()
    this.eat(TokenTypes.PARENTHESIS_RIGHT)
    return expression
  }

  /**
   * UnaryExpression
   *    = "-" Expression
   */
  UnaryExpression() {
    this.eat(TokenTypes.SUBTRACTION)
    return -this.Expression(this.getPrecedence('unary'))
  }

  /**
   * FunctionExpression
   *    = IDENTIFIER ParenthesizedExpression
   */
  FunctionExpression() {
    const id = this.eat(TokenTypes.IDENTIFIER).value
    const expression = this.ParenthesizedExpression()
    return this.trig(id, expression)
  }
}

module.exports = { Parser }

Let's add the following code at the end of our Parser.js file.

js
Copied! ⭐️
const parser = new Parser()
parser.parse('print(1 + 2 * 3);')

We can execute our code by running the following command:

shell
Copied! ⭐️
node Parser
# Output: 7

After executing this code, we should see only one value printed to the console: 7. Now our parser handles printing to the console for us whenever we use a "print statement". We can use the print statement to help debug as well.

Handling VariableStatement

The final statement we need to implement is VariableStatement. This statement is responsible for handling variable declarations in TML. Whenever a programming language needs to store values, it uses a data structure called a symbol table.

A common data structure used to implement a symbol table is a hash table (aka hash map). In JavaScript, we can use regular objects to store values like a hash table or hash map.

Inside our Parser.js file, let's create a new class called SymbolTable and insert the following contents:

js
Copied! ⭐️
class SymbolTable {
  constructor() {
    this.symbolTable = {}
  }

  set(name, value) {
    this.symbolTable[name] = value
  }

  get(name) {
    if (this.has(name))
      return this.symbolTable[name]
  }

  has(name) {
    if (this.symbolTable.hasOwnProperty(name))
      return true
    else throw new Error(`${name} has not been declared.`)
  }
}

Suppose we had the following TML program:

js
Copied! ⭐️
const x = 2
print(x)

When encountering a "variable statement", our parser can use the set method to store a variable named x with a value of 2:

js
Copied! ⭐️
const symbolTable = new SymbolTable()
symbolTable.set(x, 2)

Later, we can retrieve the value stored in variable x using the get method:

js
Copied! ⭐️
symbolTable.get(x)

When we run print(x) in our TML program, the get method in our SymbolTable class will retrieve x and return a value of 2. If the variable was never defined, then the get method will throw an error.

Inside the parse method of our Parser class, let's instantiate a new symbol table:

js
Copied! ⭐️
parse(input) {
  this.input = input
  this.tokenizer = new Tokenizer(input)
  this.lookahead = this.tokenizer.getNextToken()
  this.operators = {
    '^': 5,
    unary: 4,
    '*': 3,
    '/': 3,
    '+': 2,
    '-': 2
  }
  // Instantiate a new symbol table!
  this.symbolTable = new SymbolTable()

  return this.Program()
}

Then, we'll update our Statement method to call the VariableStatement method when we encounter a var token:

js
Copied! ⭐️
/**
 * Statement
 *    = VariableStatement
 *    / PrintStatement
 *    / ExpressionStatement
 */
Statement() {
  if (this.lookahead.type === TokenTypes.VAR) {
    return this.VariableStatement()
  }

  if (this.lookahead.type === TokenTypes.PRINT) {
    return this.PrintStatement()
  }

  return this.ExpressionStatement()
  }

Let's add the VariableStatement method to the Parser class:

js
Copied! ⭐️
/**
 * VariableStatement
 *    = "var" IDENTIFIER "=" Expression ";"
 */
VariableStatement() {
  this.eat(TokenTypes.VAR)
  const name = this.eat(TokenTypes.IDENTIFIER).value
  this.eat(TokenTypes.ASSIGNMENT)
  const value = this.Expression()
  this.eat(TokenTypes.SEMICOLON)

  this.symbolTable.set(name, value)
}

This method will consume an IDENTIFIER token such as myVar, x, y, etc. and use it as the name parameter in the SymbolTable.set method. The value will be equal to the result of the expression after the = token.

Retrieving Variable Values

We can now store variables when they're declared in TML code, but there's still the issue of retrieving them. When we use a variable in an expression such as x + 1, our parser needs to know how to retrieve the value stored in that variable, so it can evaluate the expression.

In the Pratt parser tutorials, our Prefix rule looked like the following.

js
Copied! ⭐️
Prefix
  = ParenthesizedExpression
  / UnaryExpression
  / FunctionExpression
  / NUMBER

However, we now have a conflict between a variable and function in TML. Our Prefix method currently looks like the following:

js
Copied! ⭐️
Prefix() {
  if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
    return this.ParenthesizedExpression()
  }

  if (this.lookahead.type === TokenTypes.SUBTRACTION) {
    return this.UnaryExpression()
  }

  if (this.lookahead.type === TokenTypes.IDENTIFIER) {
    return this.FunctionExpression()
  }

  const token = this.eat(TokenTypes.NUMBER)
  return Number(token.value)
}

Suppose we had the identifier, myVar. How would our parser know if myVar is a function name or a variable name? We can determine if the identifier belongs to a function name by checking the next token. If the next token is a left parenthesis ( after the IDENTIFIER token, then we can say that myVar belongs to a function name.

In the example below, we have three statements in TML code.

js
Copied! ⭐️
const myVar = 0
print(myVar)
print(cos(myVar))

The first statement is declaring a variable named myVar and setting it equal to zero. The second statement is printing the value stored in myVar. Notice how myVar is not followed by a left parenthesis (. The third statement is printing the expression, cos(myVar). Since cos is an identifier that precedes a left parenthesis, the parser knows that it is a function and not a variable.

In the grammar blueprint we defined at the beginning of this tutorial, we had to make an adjustment to the Prefix rule.

js
Copied! ⭐️
Prefix
  = ParenthesizedExpression
  / UnaryExpression
  / VariableOrFunctionExpression
  / NUMBER

VariableOrFunctionExpression
  = Variable
  / FunctionExpression

Variable
  = IDENTIFIER

FunctionExpression
  = IDENTIFIER ParenthesizedExpression

A Prefix can resolve to a VariableOrFunctionExpression rule which then can resolve to either Variable or FunctionExpression. Let's implement all these methods in our parser.

js
Copied! ⭐️
/**
 * Prefix
 *    = ParenthesizedExpression
 *    / UnaryExpression
 *    / VariableOrFunctionExpression
 *    / NUMBER
 */
Prefix() {
  if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
    return this.ParenthesizedExpression()
  }

  if (this.lookahead.type === TokenTypes.SUBTRACTION) {
    return this.UnaryExpression()
  }

  if (this.lookahead.type === TokenTypes.IDENTIFIER) {
    return this.VariableOrFunctionExpression()
  }

  const token = this.eat(TokenTypes.NUMBER)
  return Number(token.value)
}

/**
 * VariableOrFunctionExpression
 *    = FunctionExpression
 *    / Variable
 */
VariableOrFunctionExpression() {
  const id = this.eat(TokenTypes.IDENTIFIER).value

  if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT) {
    return this.FunctionExpression(id)
  }

  return this.Variable(id)
}

/**
 * Variable
 *    = IDENTIFIER
 */
Variable(id) {
  return this.symbolTable.get(id)
}

/**
 * FunctionExpression
 *    = IDENTIFIER ParenthesizedExpression
 */
FunctionExpression(id) {
  const expression = this.ParenthesizedExpression()
  return this.trig(id, expression)
}

Inside the Variable method, we can pass in an identifier, id, which will be the name of the variable. Then, we can use SymbolTable.get to retrieve the value stored in that variable.

Finally, our TML parser is fully complete and should look like the following:

js
Copied! ⭐️
const { TokenTypes, Tokenizer } = require('./Tokenizer')

class SymbolTable {
  constructor() {
    this.symbolTable = {}
  }

  set(name, value) {
    this.symbolTable[name] = value
  }

  get(name) {
    if (this.has(name))
      return this.symbolTable[name]
  }

  has(name) {
    if (this.symbolTable.hasOwnProperty(name))
      return true
    else throw new Error(`${name} has not been declared.`)
  }
}

class Parser {
  parse(input) {
    this.input = input
    this.tokenizer = new Tokenizer(input)
    this.lookahead = this.tokenizer.getNextToken()
    this.operators = {
      '^': 5,
      'unary': 4,
      '*': 3,
      '/': 3,
      '+': 2,
      '-': 2
    }
    this.symbolTable = new SymbolTable()

    return this.Program()
  }

  eat(tokenType) {
    const token = this.lookahead

    if (token == null)
      throw new SyntaxError(`Unexpected end of input, expected "${tokenType}"`)

    if (token.type !== tokenType) {
      throw new SyntaxError(
        `Unexpected token: "${token.value}", expected "${tokenType}"`
      )
    }

    this.lookahead = this.tokenizer.getNextToken()

    return token
  }

  getPrecedence(token) {
    if (token === 'unary')
      return this.operators.unary

    if (token && token.type != TokenTypes.PARENTHESIS_RIGHT)
      return this.operators[token.value]

    return 0
  }

  trig(id, value) {
    switch (id) {
      case 'sin':
        return Math.sin(value)
      case 'cos':
        return Math.cos(value)
      case 'tan':
        return Math.tan(value)
      default:
        throw new Error(`Invalid operation: ${id}`)
    }
  }

  /**
   * Program
   *    = StatementList
   */
  Program() {
    return this.StatementList()
  }

  /**
   * StatementList
   *    = Statement+
   */
  StatementList() {
    const statementList = [this.Statement()]

    while (this.lookahead !== null)
      statementList.push(this.Statement())

    return statementList
  }

  /**
   * Statement
   *    = VariableStatement
   *    / PrintStatement
   *    / ExpressionStatement
   */
  Statement() {
    if (this.lookahead.type === TokenTypes.VAR)
      return this.VariableStatement()

    if (this.lookahead.type === TokenTypes.PRINT)
      return this.PrintStatement()

    return this.ExpressionStatement()
  }

  /**
   * VariableStatement
   *    = "var" IDENTIFIER "=" Expression ";"
   */
  VariableStatement() {
    this.eat(TokenTypes.VAR)
    const name = this.eat(TokenTypes.IDENTIFIER).value
    this.eat(TokenTypes.ASSIGNMENT)
    const value = this.Expression()
    this.eat(TokenTypes.SEMICOLON)

    this.symbolTable.set(name, value)
  }

  /**
   * PrintStatement
   *    = "print" ParenthesizedExpression ";"
   */
  PrintStatement() {
    this.eat(TokenTypes.PRINT)
    const expression = this.ParenthesizedExpression()
    this.eat(TokenTypes.SEMICOLON)
    console.log(expression)
  }

  /**
   * ExpressionStatement
   *    = Expression ";"
   */
  ExpressionStatement() {
    const expression = this.Expression()
    this.eat(TokenTypes.SEMICOLON)
    return expression
  }

  /**
   * Expression
   *    = Prefix (Infix)*
   */
  Expression(prec = 0) {
    let left = this.Prefix()

    while (prec < this.getPrecedence(this.lookahead))
      left = this.Infix(left, this.lookahead?.type)

    return left
  }

  /**
   * Prefix
   *    = ParenthesizedExpression
   *    / UnaryExpression
   *    / VariableOrFunctionExpression
   *    / NUMBER
   */
  Prefix() {
    if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT)
      return this.ParenthesizedExpression()

    if (this.lookahead.type === TokenTypes.SUBTRACTION)
      return this.UnaryExpression()

    if (this.lookahead.type === TokenTypes.IDENTIFIER)
      return this.VariableOrFunctionExpression()

    const token = this.eat(TokenTypes.NUMBER)
    return Number(token.value)
  }

  /**
   * Infix
   *    = ("+" / "-" / "*" / "/" / "^") Expression
   */
  Infix(left, operatorType) {
    const token = this.eat(operatorType)
    const newPrec = this.operators[token.value]
    switch (token.type) {
      case TokenTypes.ADDITION:
        return left + this.Expression(newPrec)
      case TokenTypes.SUBTRACTION:
        return left - this.Expression(newPrec)
      case TokenTypes.MULTIPLICATION:
        return left * this.Expression(newPrec)
      case TokenTypes.DIVISION:
        return left / this.Expression(newPrec)
      case TokenTypes.EXPONENTIATION:
        return left ** this.Expression(newPrec - 1)
    }
  }

  /**
   * ParenthesizedExpression
   *    = "(" Expression ")"
   */
  ParenthesizedExpression() {
    this.eat(TokenTypes.PARENTHESIS_LEFT)
    const expression = this.Expression()
    this.eat(TokenTypes.PARENTHESIS_RIGHT)
    return expression
  }

  /**
   * UnaryExpression
   *    = "-" Expression
   */
  UnaryExpression() {
    this.eat(TokenTypes.SUBTRACTION)
    return -this.Expression(this.getPrecedence('unary'))
  }

  /**
   * VariableOrFunctionExpression
   *    = FunctionExpression
   *    / Variable
   */
  VariableOrFunctionExpression() {
    const id = this.eat(TokenTypes.IDENTIFIER).value

    if (this.lookahead.type === TokenTypes.PARENTHESIS_LEFT)
      return this.FunctionExpression(id)

    return this.Variable(id)
  }

  /**
   * Variable
   *    = IDENTIFIER
   */
  Variable(id) {
    return this.symbolTable.get(id)
  }

  /**
   * FunctionExpression
   *    = IDENTIFIER ParenthesizedExpression
   */
  FunctionExpression(id) {
    const expression = this.ParenthesizedExpression()
    return this.trig(id, expression)
  }
}

module.exports = { Parser }

We can test to make sure variables can be retrieved correctly using the following code:

js
Copied! ⭐️
const parser = new Parser()
parser.parse(`
  var x = cos(0);
  print(x);
  print(1 + 2 * x);
`)

/* OUTPUT:
1
3
*/

Testing our Parser

Similar to the previous tutorial, we'll create a new file called tests.js for running tests. This time, we'll test various statements instead of math expressions. We need to make sure our programming language works correctly with all kinds of combinations of tokens, so it's a good idea to test as many combinations as possible.

Insert the following contents inside the tests.js file.

js
Copied! ⭐️
const assert = require('node:assert').strict
const { Parser } = require('./Parser')

const mathTests = [
  // Expression statements
  ['1;', [1]],
  [' 2 ;', [2]],
  ['1 + 2;', [3]],
  [' 1 + 2 ;', [3]],
  ['1 + 2 * 3;', [7]],
  ['(1 + 2) * 3;', [9]],
  ['5 - 2;', [3]],
  ['5 - 2 - 1;', [2]],
  ['12 / 2 / 3;', [2]],
  ['2 ^ 3 + 1;', [9]],
  ['-2 ^ 2;', [-4]],
  ['(-2) ^ 2;', [4]],
  ['-2 ^ 2 + 1;', [-3]],
  ['cos(0) + 3 * -4 / -2 ^ 2;', [4]],

  // Variable Statements
  ['var x = 1; x + 1;', [undefined, 2]],
  ['var x = (1 + 2) * 3; x + 1;', [undefined, 10]],
  ['var x = 1; var y = 2; x + y;', [undefined, undefined, 3]]
]

const parser = new Parser()
let result

for (const [expression, expected] of mathTests) {
  try {
    result = parser.parse(expression)
    assert.deepEqual(result, expected)
  }
  catch (err) {
    const lines = '-'.repeat(process.stdout.columns)
    console.log(lines)
    console.log(`Expression failed: "${expression}"`)
    console.log(`Expected result: ${expected}`)
    console.log(`Actual result: ${result}`)
    console.log(lines)
    throw err
  }
}

console.log('All tests passed! 🎉')

We can execute this file using the following command:

shell
Copied! ⭐️
node tests

After running this command, all the tests should pass! Notice how the expected result is now an array of values. We're using assert.deepEqual to compare arrays instead of the assert.equal method we used in previous tutorials.

The Program method is where our parser starts. This method returns a "statement list" which is an array of statements. Since we're evaluating statements on the fly, our arrays should contain either undefined or a numeric value. Remember, our VariableStatement method sets the value of a variable and stores it in a symbol table. Then, it returns undefined, so we should expect nothing to be returned from "variable statements".

Running .tml Files

Time for some fun! Suppose we want to run .tml files using our TML parser from the command line. The .tml files will contain only TML code.

Create a new file called example.tml with the following contents.

js
Copied! ⭐️
const x = 1
const y = 2
const z = 3

const average = (x + y + z) / 3
print(average)

Don't forget the semicolons! The code above will compute the average between three numbers. Let's write a Node.js script called tml-runner (with no file extension) that will import our TML parser and execute a file passed to it.

js
Copied! ⭐️
#!/usr/bin/env node

const fs = require('node:fs/promises')
const { Parser } = require('./Parser')

const parser = new Parser()

async function run() {
  try {
    const code = await fs.readFile(process.argv[2], { encoding: 'utf8' })
    parser.parse(code)
  }
  catch (err) {
    console.log(err)
  }
}

run()

The line at the top is called a shebang and lets us run tml-runner as an executable.

shell
Copied! ⭐️
#!/usr/bin/env node

Next, we need to change permissions of this file so that we're allowed to run it as an executable on our computer. In MacOS and Linux operating systems, we can run the following command to make tml-runner an executable.

shell
Copied! ⭐️
chmod +x tml-runner

Now, we can run tml-runner as an executable and pass in a .tml file:

shell
Copied! ⭐️
./tml-runner example.tml

After executing the above command, a value of 2 should display in the console. Congrats! You built your very first interpreter! 🎉

We can even "pipe" the output of the example.tml file into a JavaScript program. Create a new file called display.js with the following contents:

js
Copied! ⭐️
let data = ''

process.stdin.on('data', (chunk) => {
  data += chunk
})

process.stdin.on('end', () => {
  console.log(`The average is: ${data} 🎉🎉🎉`)
})

Next, run the following command to pipe the output of example.tml into the input of display.js.

shell
Copied! ⭐️
./tml-runner example.tml | node display.js

After running the above command, we should see the following in our terminal:

shell
Copied! ⭐️
The average is: 2
 🎉🎉🎉

The result from example.tml gets piped into a normal JavaScript file where we can use the result in any way we want. I know this example seems silly, but it showcases the power of using our own language together with another language. In our case, we're able to use TML together with JavaScript!

Conclusion

In this tutorial, we created a small programming language called "Tiny Math Language" or TML for short. We took the math expression Pratt parser from a previous tutorial and restructured it so that it could parse statements. We used statements to add support for storing variables in a math expression parser.

Then, we created a file to test various statements and math expressions. We also learned how to make our very own interpreter for the TML language using a Node.js runtime environment. By creating an interpreter for the TML language that could be run via the command line, we could pipe a result into a regular JavaScript file to allow communication between a .tml file and .js file. Pretty cool! 😎

In this tutorial, we were evaluating statements and math expressions on the fly. In the next tutorial, we'll learn how to make a brand new abstract syntax tree (AST) that supports the statements we added to our parser. See you there!