A few days ago, in an attempt to understand how languages are written, I tried to write one myself.
I was not really looking forward to writing a tokenizer/lexer from scratch, and was more interested in the implementation of the language.
So I decided to use a parser generator.
After trying out Jison (JavaScript) and Antlr (Java), I chose Antlr for its support for generating targets in multiple languages (Java, JavaScript, Ruby, Python).
What should this language support?
- Type Inference
- Variables (Int, String, Boolean, Null)
- Printing to stdout
- Branching with conditionals like `if`, `else if` and `else`
- Loops with `while` (and should handle nested loops)
- Arithmetic operations
- Basic string operations
- Implicit type casting
Grammar
First, I defined the tokenizing rules and the grammar in BNF format.
grammar Baritsu;
I then used Antlr to generate a lexer and a parser for this grammar.
I initially started with the listener pattern, but then switched to the visitor pattern for more flexibility.
To know the difference, this link is helpful.
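To make the visitor idea concrete, here is a minimal, hand-rolled sketch in Java. It does not use Antlr's generated classes: `Node`, `NumberNode`, `AddNode`, `Visitor` and `EvalSketch` are hypothetical stand-ins for a parse tree and its visitor. The point it illustrates is that each visit method returns a value and decides for itself which children to visit, whereas listener callbacks are fired by a tree walker and return nothing.

```java
// Hand-rolled stand-ins for a parse tree (assumption: not Antlr's generated classes).
interface Node {
    <T> T accept(Visitor<T> visitor);
}

interface Visitor<T> {
    T visitNumber(NumberNode node);
    T visitAdd(AddNode node);
}

final class NumberNode implements Node {
    final int value;
    NumberNode(int value) { this.value = value; }
    public <T> T accept(Visitor<T> visitor) { return visitor.visitNumber(this); }
}

final class AddNode implements Node {
    final Node left;
    final Node right;
    AddNode(Node left, Node right) { this.left = left; this.right = right; }
    public <T> T accept(Visitor<T> visitor) { return visitor.visitAdd(this); }
}

// Each visit method returns the value of its subtree and
// explicitly chooses which children to visit, and in what order.
final class EvalSketch implements Visitor<Integer> {
    public Integer visitNumber(NumberNode node) { return node.value; }

    public Integer visitAdd(AddNode node) {
        return node.left.accept(this) + node.right.accept(this);
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        Node tree = new AddNode(new NumberNode(42), new NumberNode(5));
        System.out.println(tree.accept(new EvalSketch())); // prints 47
    }
}
```

Because the visitor controls the traversal, it could skip a subtree entirely, which is what an interpreter needs for the untaken branch of an `if`.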
The `Compiler` class initializes the lexer and the parser, and starts the `EvalVisitor`'s visit.
The methods in the `EvalVisitor` class define what happens when a grammar rule is matched.
Let us walk through one such method, using the statement `def x = 42 + 5`:
- When `variable_assignment` from the grammar is matched, the method `visitVariable_assignment` is called.
- The text of the `ID` (`x`) will be `variableName`.
- The right-hand side (`42 + 5`) will then be visited.
- When the value of this RHS expression is computed, it is assigned to a list of global `Variable`s in the `Environment` scope.
- I've used the `Environment` class to group variables in the same context.
- The `Variable` class allows variable declaration, and getting and updating values.
- A map `<String, Value>`, scoped to the `Environment`, stores the variables.
- The primitives in this language are defined in the class `Value`.
- Each `Value` instance has the datatype and the data.
- The `Value` class also exposes methods for typecasting and arithmetic operations.
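The steps above can be sketched roughly as follows. This is a simplified illustration rather than the actual source: the class names `Value` and `Environment` come from the description above, but the `DataType` enum, the method names (`ofInt`, `add`, `assign`, `lookup`) and all the bodies are assumptions, and the `Variable` class is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical enum of the language's primitive types.
enum DataType { INT, STRING, BOOLEAN, NULL }

// A primitive: the datatype plus the underlying data.
final class Value {
    final DataType type;
    final Object data;

    Value(DataType type, Object data) {
        this.type = type;
        this.data = data;
    }

    static Value ofInt(int n) { return new Value(DataType.INT, n); }

    // Arithmetic lives on Value; this sketch only handles INT + INT.
    Value add(Value other) {
        if (type == DataType.INT && other.type == DataType.INT) {
            return ofInt(((Integer) data) + ((Integer) other.data));
        }
        throw new IllegalArgumentException(
            "unsupported operands: " + type + " + " + other.type);
    }
}

// Groups the variables that live in the same context.
final class Environment {
    private final Map<String, Value> variables = new HashMap<>();

    void assign(String name, Value value) { variables.put(name, value); }

    Value lookup(String name) {
        Value v = variables.get(name);
        if (v == null) {
            throw new IllegalStateException("unknown variable: " + name);
        }
        return v;
    }
}

class AssignmentDemo {
    public static void main(String[] args) {
        Environment env = new Environment();
        // def x = 42 + 5: evaluate the right-hand side first,
        // then bind the result to the text of the ID.
        Value rhs = Value.ofInt(42).add(Value.ofInt(5));
        env.assign("x", rhs);
        System.out.println(env.lookup("x").data); // prints 47
    }
}
```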
Notes:
- I had to define a function to evaluate the truthiness of non-boolean primitives (e.g., is the integer 0 true or false?).
- A `getType` function returns the inferred datatype of the primitive; this is used when typecasting (e.g., `def "There are " + 13 + " original Cylon models"`).
- A `null` type was added.
- An `undefined` type had to be added to handle invalid data.
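A rough sketch of how these notes might fit together is below. Only `getType` is named in the notes; `TypedValue`, `Kind`, `isTruthy` and `plus` are hypothetical names, and the specific rules chosen here (0, the empty string, `null` and `undefined` are falsy; `+` concatenates when either operand is a string) are assumptions for illustration, not necessarily what the language does.

```java
// Hypothetical set of datatypes, including the null and undefined types.
enum Kind { INT, STRING, BOOLEAN, NULL, UNDEFINED }

final class TypedValue {
    // Sentinel standing in for invalid data.
    static final Object INVALID = new Object();

    final Object data;

    TypedValue(Object data) { this.data = data; }

    // Infer the datatype from the wrapped Java object.
    Kind getType() {
        if (data == null) return Kind.NULL;
        if (data instanceof Integer) return Kind.INT;
        if (data instanceof String) return Kind.STRING;
        if (data instanceof Boolean) return Kind.BOOLEAN;
        return Kind.UNDEFINED;
    }

    // Assumed rule: 0, "", null and undefined are falsy.
    boolean isTruthy() {
        switch (getType()) {
            case BOOLEAN: return (Boolean) data;
            case INT:     return ((Integer) data) != 0;
            case STRING:  return !((String) data).isEmpty();
            default:      return false;
        }
    }

    // '+' with implicit casting: INT + INT adds,
    // anything involving a STRING is cast to a string and concatenated.
    TypedValue plus(TypedValue other) {
        Kind a = getType();
        Kind b = other.getType();
        if (a == Kind.INT && b == Kind.INT) {
            return new TypedValue(((Integer) data) + ((Integer) other.data));
        }
        if (a == Kind.STRING || b == Kind.STRING) {
            return new TypedValue(String.valueOf(data) + String.valueOf(other.data));
        }
        return new TypedValue(INVALID);
    }
}

class CastingDemo {
    public static void main(String[] args) {
        TypedValue sentence = new TypedValue("There are ")
            .plus(new TypedValue(13))
            .plus(new TypedValue(" original Cylon models"));
        System.out.println(sentence.data); // prints There are 13 original Cylon models
        System.out.println(new TypedValue(0).isTruthy()); // prints false
    }
}
```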
Final notes
Implementing this language has been a fun learning experience.
The language only has a small subset of the features offered by popular general-purpose programming languages, and in spite of this, I learnt much more than I expected.
It made me better appreciate the hard work that goes into implementing and supporting a language.
You can check out the source code of this language here.
Thanks for reading!