AASHAN

Creating a Programming Language (sort of)

Mon Feb 27 2023 · 7 min read

Preface

I have always been fascinated by how computers work. The way bits and pieces combine to perform huge computations that would take a human years to work through has always inspired me to get up and sit in front of a computer screen each day. Of all the things to learn about computers, software is my favorite. I love software. I try to make it, sometimes I break it, and in general I have fun working on it (my boss told me to write that I have fun while debugging). And since I am so obsessed with software of all kinds, I have only grown more curious about how software actually runs.

From a layman's view, software is just some code that the computer understands. Software tells the processor to do something, and it does. To create software, you pick a programming language, write some code in that language, and ask a compiler or interpreter to execute the code. Depending on the programming language (or, to be precise, the inner architecture of its compiler or interpreter), there may be an output executable file, which is your software that you can now run. The first step of that process, picking a language, really means picking a piece of software that someone has already written for us. This is the part we think about least and have very little idea about, even though we spend our lives working with it.

I studied compilers and interpreters in my undergrad, but it was all theory and we never implemented anything. In September 2022, I felt compilers calling out to me. So I picked up my dusty book and started flipping through the pages. The more I read, the more I wanted to write a compiler. I could not resist the temptation, and that is how Balance was born.

How do I write a Programming Language?

It might seem overwhelming at first, and trust me, it is. But once we get a grasp of the internal architecture of a programming language, it gets a bit easier. In this project, we are building an interpreter, or an expression evaluator to be exact, that will evaluate simple arithmetic expressions such as 2 * (2 + (2 - 1)). To write a program that can evaluate a given string into its actual value, we need to understand a few key concepts, which we will pick up along the way. For now, let's focus on the bare minimum:

  • Lexer
  • Parser
  • Syntax Tree
  • Expression Evaluator

The Lexer

The lexer is responsible for separating out individual tokens from the text. In the above example, 2, *, (, 2, +, (, 2, -, 1, ) and ) are the individual tokens, each with its own meaning and priority of evaluation. The lexer is not concerned with the priority of a token; that is something the later stages handle. It is focused only on breaking the whole input down into individual tokens. We will discuss each of these components in depth in its own blog post. For now, it is enough to understand that a lexer simply breaks the input text down into individual tokens.
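To make this concrete, here is a minimal sketch of a lexer in Python. It is not the Balance lexer; the function name tokenize, the token kinds (NUMBER, OPERATOR, LPAREN, RPAREN, EOF) and the (kind, text) tuple shape are all assumptions made up for illustration.

```python
DIGITS = "0123456789"


def tokenize(text):
    """Split an arithmetic expression into a list of (kind, text) tokens.

    The token kinds used here are hypothetical names for this sketch.
    """
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # Whitespace only separates tokens, so skip it.
            i += 1
        elif ch in DIGITS:
            # Consume a whole run of digits as one NUMBER token.
            start = i
            while i < len(text) and text[i] in DIGITS:
                i += 1
            tokens.append(("NUMBER", text[start:i]))
        elif ch in "+-*/":
            tokens.append(("OPERATOR", ch))
            i += 1
        elif ch == "(":
            tokens.append(("LPAREN", ch))
            i += 1
        elif ch == ")":
            tokens.append(("RPAREN", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    # Mark the end of input so a later stage knows where to stop.
    tokens.append(("EOF", ""))
    return tokens


for kind, value in tokenize("2 * (2 + (2 - 1))"):
    print(kind, value)
```

Notice that the sketch never decides which operator runs first. It only labels the pieces and passes them on.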

Parser

Once the lexer has broken the input string into tokens, the parser takes over. The parser is responsible for generating the parse tree, which this post also calls the syntax tree. A parse tree is just a data structure (a tree) that holds all the tokens. The parser keeps adding tokens to the tree until it reaches a stop token (usually an end-of-file token). We will look at the Balance parser in depth in its own blog post.
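As a rough illustration, the sketch below uses recursive descent, one common parsing technique; the Balance parser may well work differently. It assumes the tokens arrive as a plain list of strings ending in a hypothetical "EOF" marker, and it builds the tree out of nested tuples. The names parse, parse_expression, parse_term and parse_factor are invented for this example.

```python
def parse(tokens):
    """Parse a list of token strings into a nested-tuple tree.

    Leaves look like ("num", 2); inner nodes look like (operator, left, right).
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "EOF"

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_expression():
        # expression := term (("+" | "-") term)*
        node = parse_term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, parse_term())
        return node

    def parse_term():
        # term := factor (("*" | "/") factor)*
        node = parse_factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, parse_factor())
        return node

    def parse_factor():
        # factor := NUMBER | "(" expression ")"
        token = advance()
        if token == "(":
            node = parse_expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isascii() and token.isdigit():
            return ("num", int(token))
        raise SyntaxError(f"unexpected token {token!r}")

    tree = parse_expression()
    # Stop only when the end-of-file marker is reached.
    if peek() != "EOF":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tokens = ["2", "*", "(", "2", "+", "(", "2", "-", "1", ")", ")", "EOF"]
print(parse(tokens))
```

In this sketch the grammar itself encodes priority: multiplication and division bind tighter than addition and subtraction because parse_term sits below parse_expression.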

Syntax Tree

A syntax tree, as discussed earlier, is just a tree structure holding the tokens, and it is built by the parser. Its leaf nodes are the simpler tokens, while the inner nodes represent a statement or a block of code. Usually, a syntax tree is generated for each file. For example, if your program contains multiple files, there will be a syntax tree for each file, and these are combined later. In this example, however, we are only going to cover one syntax tree per program, meaning our parser can only parse one file at a time.
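One possible way to represent such a tree is with a small class per node type. The sketch below is only an assumption about what the data structure could look like; the class names Number and BinaryOp and the helper render are hypothetical. It builds the tree for 2 * (2 + (2 - 1)) by hand and prints it with indentation, so the numbers show up as leaves and the operators as inner nodes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Number:
    """Leaf node: a single number token."""
    value: int


@dataclass
class BinaryOp:
    """Inner node: an operator with a left and a right subtree."""
    operator: str
    left: "Node"
    right: "Node"


Node = Union[Number, BinaryOp]


def render(node, depth=0):
    """Return the tree as a list of lines, indenting children under parents."""
    indent = "  " * depth
    if isinstance(node, Number):
        return [f"{indent}{node.value}"]
    lines = [f"{indent}{node.operator}"]
    lines += render(node.left, depth + 1)
    lines += render(node.right, depth + 1)
    return lines


# Tree for 2 * (2 + (2 - 1)), built by hand for illustration.
tree = BinaryOp(
    "*",
    Number(2),
    BinaryOp("+", Number(2), BinaryOp("-", Number(2), Number(1))),
)
print("\n".join(render(tree)))
```

The parentheses from the input do not appear anywhere in the tree. Their only job is to decide the shape of the tree, and once the shape exists they are no longer needed.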

Evaluator

The evaluator is the final piece of the puzzle, and it is the one that actually gives us the result of an expression. It is the part that produces the answer 4 for 2 + 2. We will learn more about evaluators in their own post as well.
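A minimal sketch of an evaluator could walk the tree recursively. This is not Balance's implementation. It assumes the tree is stored as nested tuples, with ("num", value) for leaves and (operator, left, right) for inner nodes; the names evaluate and OPERATIONS are made up for this example.

```python
import operator

# Map each operator symbol to the Python function that performs it.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node):
    """Recursively compute the value of a nested-tuple syntax tree."""
    if node[0] == "num":
        # Leaf node: the value is the number itself.
        return node[1]
    # Inner node: evaluate both subtrees, then combine them.
    op, left, right = node
    return OPERATIONS[op](evaluate(left), evaluate(right))


# Tree for 2 + 2.
print(evaluate(("+", ("num", 2), ("num", 2))))  # prints 4

# Tree for 2 * (2 + (2 - 1)).
nested = ("*", ("num", 2), ("+", ("num", 2), ("-", ("num", 2), ("num", 1))))
print(evaluate(nested))  # prints 6
```

Because each inner node evaluates its children first, the innermost expression (2 - 1) is computed before anything that depends on it.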
