Create your own stack-based programming language

21 min readDec 18, 2024

Understand how a stack-based language works by creating your own!

As a developer, you are generally familiar with high-level languages like Rust, Python, or JavaScript, which introduce concepts like variables, conditions, or functions that make writing programmes in these languages easier.

However, you may already know that the computer doesn’t understand these languages. The computer only understands 1s and 0s. In some cases, the compiler of your language will use an “intermediate language” before compiling to bits. A stack-based language is often used as an intermediate language.

One of the most popular uses of a stack-based language nowadays is the set of EVM opcodes used on Ethereum and all the blockchains that use the EVM. Its also a stack-based language on the Tezos blockchain called Michelson that sparked my interest in these languages.

In this tutorial, we will build together a simple stack-based language called “lifo” (for “Last In First Out”) so you can learn how this kind of programming language works by building one.

Let’s get started 👨🏻‍💻

Prerequisites

LIFO is written in Rust. For the lexer, we will use an existing crate called “logos” that will help us focus on the language itself instead of focusing on the lexer.

You will need a beginner to intermediate level in Rust, we won’t do anything too complicated, but you need to understand how Rust works.

If you are impatient, you can check the repo on Github.

Note: I will often use “SBL” instead of “slack-based language” in this article.

What is a stack-based programming language?

As the name implies, a “stack-based” programming language is a language that uses a stack.

A “stack” in programming is a very easy concept to understand. Your programme stores the values it uses one on top of the other. It then manipulates these values according to the instructions it receives.

The stack works on the “last in first out” principle (hence the name of our programming language): you can only use the value (or values, depending on the instruction) that are directly at the top of the stack, i.e. the ones that were added last.

If you want to use a value that’s deeper in the stack, you have to consume the ones that are above it or use an instruction available in the set of instructions to make it come to the top of the stack.

The instructions available in the set are generally of two kinds:

the instructions that add new values to the stack: for example, PUSH adds a new value on top of the stack and MUL multiplies the two values on top of the stack, consumes them and pushes the result on top of the stack.
the instructions that manipulate the stack: for example, SWAP changes the order of the two values on top of the stack and POP removes the first value on the stack.

Note: we say that an instruction consumes a value when it removes it from the stack. Very often, an instruction that operates on a value will consume it, that’s why it is important to duplicate it with DUP if you intend to use that value again later.

Let’s continue and take a look at the primitive types in Ligo.

Lifo’s primitive types & the stack

Because Lifo is designed to be an basic example of a stack-based language, the primitive types are quite simple:

Integer (INT): this type represents unsigned integers. Because it translates to usize in the underlying Rust, it can store a value large enough for our purpose
Boolean (BOOL): this type represents a boolean value, true or false
String (STRING): this type represents a string, it translates to a String in the underlying Rust
Vector (VEC): this represents a vector of values of the same type, it translates to a vector in Rust, it can hold values of type INT , BOOL or STRING but not of type VEC

These types are available to the developer who writes code in Lifo. The lexer in Rust also has types that represent the stack and the values it contains:

#[derive(Debug, Clone, PartialEq)]
pub enum StackElValue {
    Int(usize),
    String(String),
    Bool(bool),
    Vector(LifoVector),
}

#[derive(Debug, Clone, PartialEq)]
pub struct StackEl {
    pub token: Token,
    pub value: StackElValue,
}

pub type Stack = Vec<StackEl>;

Unsurprisingly, the Stack type is a vector of StackEl. The StackEl is a struct with 2 fields: the token field represents the instruction that added the element onto the stack and the value field holds the value on the stack.

The value itself is an enum that represents the 4 primitive types of Lifo with the value held in the associated data.

The vector has a special type, LifoVector:

#[derive(Debug, Clone, PartialEq)]
pub enum LifoVector {
    EmptyVector,
    VectorOfInt(Vec<usize>),
    VectorOfBool(Vec<bool>),
    VectorOfString(Vec<String>),
}

An enum is required here because we want Lifo to be able to push empty vectors for which the type will only be known after a value is added to it.

More about vectors at the end of the article.

The lexer

This article does not focus on building a lexer for a stack-based language, but on understanding how such a language works, so we want to use an existing solution.

This tutorial uses the logos crate, an excellent and fast lexer that will get us up and running in no time.

The lexer is based on an enum called Token where each field describes the different tokens of the language. In a stack-based language, a token is either an instruction or a parameter that follows an instruction. By convention, instructions in Lifo are in uppercase.

For example, ADD is a token (the instruction itself) while PUSH 44 is made of 2 tokens: PUSH is the instruction token and 44 is the parameter token.

Let’s set up the Token enum:

use logos::Logos;

#[derive(Default, Debug)]
pub struct State {
    pub stack: Stack,
    pub prev_op: Token,
    current_label: Option<String>,
}

#[derive(Logos, Debug, PartialEq, Clone, Default)]
#[logos(skip r"[ \t\n\f]+")]
#[logos(extras = State)]
#[logos(error = LexingError)]
pub enum Token {
  ...
}

The enum takes a few attributes:

it must derive a few implementations, including the default one, as you need to set a default behaviour for the lexer when it encounters a token it doesn’t know
#[logos(skip r"[ \t\n\f]+")] tells the lexer to skip white spaces, tabs and new lines, as they are not meaningful tokens in Lifo
#[logos(extras = State)] : logos allows you to keep track of a mutable state as you go through the different tokens, our state here is a struct that will keep track of the state of the stack, the previous operation and the current label (if any)
#[logos(error = LexingError)] : logos also allows you to set your own errors, we will use an enum to represent where the error happens and what it means (you can have a look at the full representation here)

Now, let’s add a few tokens and see how it works!

PUSH, DUP, SWAP and POP

Here are the first instructions you will use in Lifo:

#[derive(Logos, Debug, PartialEq, Clone, Default)]
#[logos(skip r"[ \t\n\f]+")]
#[logos(extras = State)]
#[logos(error = LexingError)]
pub enum Token {
    #[regex("[0-9]+", priority = 1)]
    Int,

    #[regex("PUSH_INT", op_push_int)]
    PushInt,

    #[regex("DUP", op_dup)]
    Dup,

    #[regex("POP", op_pop)]
    Pop,

    #[regex("SWAP", op_swap)]
    Swap,

    ...

    #[default]
    #[regex(
        "[a-zA-Z0-9_:-]+", 
        priority = 0, 
        callback = invalid_token
    )]
    Invalid,
}

Every branch of the enum receives an attribute. Logos lets us set 2 attributes: token to match the exact word or regex to match a regular expression.

The INT token

The first type of token added to Lifo is the integer. It uses the regex attribute with a regular expression as a parameter that catches 1 or more digits between 0 and 9. Every number that matches this rule will be considered as an integer in Lifo.

Note 1: in Lifo, integers are used as instruction parameters, so the value always comes after an instruction that takes a parameter, like PUSH_INT. If an integer appears alone, it will just be ignored.
Note 2: the priority = 1 is part of the logos implementation and only indicates that this regex takes precedence over the more general one at the end.

The PUSH_INT instruction

The PUSH_INT instruction will push an integer onto the stack. You use the token attribute because you want to match the exact token PUSH_INT. This attribute takes two parameters:

the exact token to match
an optional function to run when the token is matched

Let’s have a look at op_push_int, the callback function that will run:

fn op_push_int(lex: &mut Lexer<Token>) -> Result<(), LexingError> {
    match lex.next() {
        Some(Ok(Token::Int)) => {
            let val = lex.slice().parse().unwrap();
            lex.extras
                .stack
                .push(StackEl::new(Token::Int, StackElValue::Int(val)));
            Ok(())
        }
        _ => Err(LexingError::InvalidInteger(String::from(lex.slice()))),
    }
}

Every callback function of a token attribute receives an argument of type &mut Lexer<Token>. This type presents different properties, two of which are going to be important in this tutorial:

extras : this is the state of the program where you keep track of the stack and other properties
next() : the next() function allows you to skip the current token and parse the next token, it is useful for instructions that take a parameter like PUSH_INT

The next() returns an option that indicates if there is a following token or not. In the case of PUSH_INT, there must be a following token and this token must be of type Token::Int.

This is a must, the execution of the programme must stop if there is no token after the instruction and if the following token is not an integer.

After this is verified, you can access the value itself with lex.slice() that must be parsed and unwrapped. Because PUSH_INT doesn’t consume any value on the stack, we can add the integer on top of the stack and update the state.

If the token following PUSH_INT is not an integer, it is important to return an error so the execution of the program can stop. We set up an enum called LexingError where each branch represents an error that can happen when parsing the code followed by an error message.

For example, here: LexingError::InvalidInteger(String::from(lex.slice()) indicates that the expected integer after PUSH_INT is missing and returns the erroneous value.

After implementing PUSH_INT, you can go further and implement PUSH_BOOL and PUSH_STR to add booleans and strings to the stack.

The DUP instruction

One of the most common instructions in a stack-based language is DUP because a lot of instructions consume the value they operate on and you need to duplicate it if you want to be able to reuse it later.

Here is the callback function that is called when the program encounters the DUP instruction:

pub fn dup(stack: &Stack) -> Result<Stack, String> {
    if stack.len() < 1 {
        return Err(String::from("Stack must be at least 1 element deep"));
    }

    let new_value = stack[0].clone();
    Ok(vec![vec![new_value], stack.clone()].concat())
}

fn op_dup(lex: &mut Lexer<Token>) -> Result<(), LexingError> {
    match dup(&lex.extras.stack) {
        Ok(new_stack) => {
            lex.extras.stack = new_stack;
            Ok(())
        }
        Err(err) => Err(LexingError::DupError(err)),
    }
}

The function checks that there is at least one element on the stack (you can’t duplicate nothing!), then it clones the value on the top of the stack and pushes its clone in the first position.

The POP instruction

In addition to duplicating a value, deleting it from the stack can be very useful, and that’s what the POP instruction achieves!

Here is what the code looks like:

pub fn pop(stack: &Stack) -> Result<Stack, String> {
    if stack.len() < 1 {
        return Err(String::from("Stack must be at least 1 element deep"));
    }

    Ok(stack[1..].to_vec())
}

You can see that it is a simple instruction, the only condition to run is to have at least one element on the stack. If that’s true, you can remove the element at the top of the stack and save the new version of the stack.

The SWAP instruction

The last operation that manipulates the order of the elements on the stack is SWAP.

SWAP allows you to put the first element on the stack in the second position and the second element in the first position. This can be useful if you want to consume the value that is in the second position on the stack and keep the first element for later.

pub fn swap(stack: &mut Stack) -> Result<Stack, String> {
    if stack.len() < 2 {
        return Err(String::from("Stack must be at least 2 elements deep"));
    }

    stack.swap(0, 1);

    Ok(stack.to_vec())
}

You have to check that the stack is at least two elements deep, and then Rust provides a swap method on the values of type Vec that makes it even easier to implement!

The default attribute

What happens if the developers write an instruction that is not recognized by the lexer? You have to let them know 🙂

This is why you will add an Invalid token to represent the tokens that the lexer doesn’t understand:

#[default]
#[regex(
    "[a-zA-Z0-9_:-]+", 
    priority = 0, 
    callback = invalid_token
)]
Invalid,

This branch of the Token enum takes the default attribute to tell the lexer to use it when the current token doesn’t match any other branches. There is also a regular expression to catch the token in question, which also means that unknown tokens that don’t match the regex will be ignored.

The invalid_token function is used as a callback:

fn invalid_token(lex: &mut Lexer<Token>) -> Result<(), LexingError> {
    if lex.extras.prev_op == Token::Jump {
        Err(LexingError::InvalidLabel(lex.slice().to_string()))
    } else {
        Err(LexingError::InvalidToken(lex.slice().to_string()))
    }
}

The function will generate a different error based on the previous instruction: if the unknown token happens after a JUMP (we will see this instruction later), it returns an InvalidLabel error, otherwise, it returns an InvalidToken error.

Arithmetic operations

Now, let’s do something more fun, calculations 🤓

You can add a few new tokens to the lexer:

#[derive(Logos, Debug, PartialEq, Clone, Default)]
#[logos(skip r"[ \t\n\f]+")]
#[logos(extras = State)]
#[logos(error = LexingError)]
pub enum Token {
    [...]

    #[token("ADD", op_add)]
    Add,

    #[token("SUB", op_sub)]
    Sub,

    #[token("MUL", op_mul)]
    Mul,

    [...]
}

The first token is ADD. You may have guessed that it will perform an addition.

Here is the function that will run when the ADD token is matched:

pub fn add(stack: &Stack) -> Result<Stack, String> {
    if stack.len() < 2 {
        return Err(String::from("Stack must be at least 2 elements deep"));
    }

    match (
        &stack[0].token,
        &stack[0].value,
        &stack[1].token,
        &stack[1].value,
    ) {
        (Token::Int, StackElValue::Int(val1), Token::Int, StackElValue::Int(val2)) => {
            let new_value = val1 + val2;
            let new_stack = vec![
                vec![StackEl::new(Token::Int, StackElValue::Int(new_value))],
                stack[2..].to_vec(),
            ]
            .concat();
            Ok(new_stack)
        }
        _ => Err(String::from("Only integers can be added")),
    }
}

fn op_add(lex: &mut Lexer<Token>) -> Result<(), LexingError> {
    match add(&lex.extras.stack) {
        Ok(new_stack) => {
            lex.extras.stack = new_stack;
            Ok(())
        }
        Err(err) => Err(LexingError::AddError(err)),
    }
}

Note: these two functions are in different parts of the code base to keep things clean, they are presented here together for conciseness

You pass the stack to the add function that will perform the required checks and changes:

the function checks that the stack is at least 2 elements deep, it wouldn’t make any sense to add 1 or 0 elements, there must always be two
the function matches the type of token and its value to verify that the two elements on the top of the stack are integers
if that’s the case, the two values are added together and a new stack is created by consuming (i.e. removing) the two elements on top of the stack and pushing the result of the addition

It is crucial to verify that the stack is of the expected shape and that the values on the stack are of the expected type. This is why it is essential to set up a strong typing system with clear errors. The programme must immediately abort if the stack is not correct.

The SUB and MUL instructions work similarly:

You check that there are at least two elements on the stack to perform the operation
You check that the two elements are integers
You fetch the values and subtract them or multiply them
You remove the two values and push the result on the stack

If you inspect the code, you can see that the subtraction instruction is a little special:

pub fn sub(stack: &Stack) -> Result<Stack, String> {
    if stack.len() < 2 {
        return Err(String::from("Stack must be at least 2 elements deep"));
    }

    match (
        &stack[0].token,
        &stack[0].value,
        &stack[1].token,
        &stack[1].value,
    ) {
        (Token::Int, StackElValue::Int(minuend), Token::Int, StackElValue::Int(subtrahend)) => {
            if minuend < subtrahend {
                return Err(String::from("Subtraction overflow"));
            }
            let new_value = minuend - subtrahend;

            let new_stack = vec![
                vec![StackEl::new(Token::Int, StackElValue::Int(new_value))],
                stack[2..].to_vec(),
            ]
            .concat();

            Ok(new_stack)
        }
        _ => Err(String::from("Only integers can be subtracted")),
    }
}

Because the integers in Lifo are all unsigned, it is not possible to get a negative value and subtracting two values that would yield a negative integer causes an underflow error.

JUMP and labels

If you have ever inspected SBL code, you may have seen instructions that look like JUMP, JUMPI or GOTO. These instructions are essential to implement basic conditions in your language.

Lifo uses the JUMP instruction followed by a label.

What is a label?

Before you ask your program to skip a set of instructions, you have to give it a stopping point, a place in its execution where it can start reading the instructions again.

This destination will be marked by a label. In Lifo, a label must be a string of alphabetical characters and underscores. The label will be set in memory and the program will skip all the instructions until it meets the label.

Implementing the JUMP instruction

The syntax will be easy: JUMP label_name. Three next tokens will be added to the lexer:

#[derive(Logos, Debug, PartialEq, Clone, Default)]
#[logos(skip r"[ \t\n\f]+")]
#[logos(extras = State)]
#[logos(error = LexingError)]
pub enum Token {
    [...]

    #[regex("JUMP", op_jump)]
    Jump,

    #[regex("[a-z_]+", check_label_name)]
    LabelName,

    #[regex("[a-z_]+:", |lex| {
        match &lex.extras.current_label {
            None => Err(LexingError::UnsetLabel(lex.slice().to_string())),
            Some(current_label) =>
                {
                    let mut label = lex.slice().to_string();
                    let _ = label.pop(); // removes the colon at the end of the string
                    if &label == current_label {
                        // same label
                        lex.extras.current_label = None;
                    }
                    Ok(())
                }
        }
    })]
    Label,

    [...]
}

As you may have guessed from the code, the token that follows the JUMP instruction is LabelName while the actual label is Label.

Here is the function that runs when a JUMP instruction is met:

fn op_jump(lex: &mut Lexer<Token>) -> Result<(), LexingError> {
    lex.extras.prev_op = Token::Jump;

    match lex.next() {
        Some(Ok(Token::LabelName)) => {
            let label = lex.slice();
            lex.extras.current_label = Some(label.to_string());
            Ok(())
        }
        _ => Err(LexingError::InvalidLabel(String::from(lex.slice()))),
    }
}

There is a new step in this code: lex.extras.prev_op = Token::Jump. Indeed, a label is a valid token in the Lifo language, but it is only valid if it is a parameter of the JUMP instruction. However, once you call lex.next(), you do not have access to the previous instructions, so you need to keep at least the previous instruction in memory. This is what this line of code does.

After that, you can parse the next token, which must be a LabelName. Parsing a LabelName token triggers the following function:

fn check_label_name(lex: &mut Lexer<Token>) -> Result<(), LexingError> {
    match lex.extras.prev_op {
        Token::Jump | Token::Jumpi => {
            lex.extras.prev_op = Token::Invalid;
            Ok(())
        }
        _ => Err(LexingError::InvalidOpcode(String::from(lex.slice()))),
    }
}

This function makes sure that the token before a label name was a JUMP instruction and then resets the lex.extras.prev_op value to Token::Invalid (the default value).

If the label name follows the rules of Lifo, it is stored in the state of the program under the lex.extras.current_label property.

Ignoring the code until the label

Once a label name has been registered in the state, you have to tell the code that it must skip the following instructions until the label is reached.

You can create a function that will check if a label is set:

fn label_is_set(current_label: &Option<String>) -> bool {
    match current_label {
        None => false,
        Some(_) => true,
    }
}

If a label is set in the state of the program, the function returns true. You can then use this value to skip the execution of an instruction.

Matching the label

When the lexer finds a label that matches the regex [a-z_]+:, it verifies if the label name exists in the state. If it does, it removes it, so the lexer can start executing instructions again.

#[regex("[a-z_]+:", |lex| {
    match &lex.extras.current_label {
        None => Err(LexingError::UnsetLabel(lex.slice().to_string())),
        Some(current_label) =>
            {
                let mut label = lex.slice().to_string();
                let _ = label.pop(); // removes the colon at the end of the string
                if &label == current_label {
                    // same label
                    lex.extras.current_label = None;
                }
                Ok(())
            }
    }
})]
Label,

The first thing that the lexer does is check if the label has been set. If not, it will throw an UnsetLabel error with the name of the unset label.

If there is a registered label, the lexer will compare it with the label it just encountered. If it is the same one, it will reset the current_label property of the state so that the following instructions can be executed.

Otherwise, it will skip it and continue ignoring the instructions until it finds a matching label.

Vectors

To finish, let’s try to implement a more complex data structure in Lifo.

Until now, you can only store and read simple values like integers, strings and booleans. In this last paragraph, you will allow Lifo to create, manipulate and read vectors.

This is how vectors work in Lifo:

you use the EMPTY_VECTOR instruction to create a new empty vector. The type of the elements it will hold is not indicated when the vector is created.
the elements must be pushed into the vector one by one, the first one being pushed will dictate the type of all the other elements (all the elements must be of the same type)
the SIZE instruction returns the number of elements in the vector
the INDEX instruction returns the element at the provided index

Creating an empty vector

Remember the StackElValue enum that represents in Rust the kind of value that Lifo handles?

#[derive(Debug, Clone, PartialEq)]
pub enum StackElValue {
    Int(usize),
    String(String),
    Bool(bool),
    Vector(LifoVector),
}

The argument of each branch must be typed, but because you won’t specify the type of the elements the vector will hold, you cannot use Vec<...> for the StackElValue, this information is not available now.

Instead, you create an enum called LifoVector to represent the different types of vectors:

#[derive(Debug, Clone, PartialEq)]
pub enum LifoVector {
    EmptyVector,
    VectorOfInt(Vec<usize>),
    VectorOfBool(Vec<bool>),
    VectorOfString(Vec<String>),
}

impl LifoVector {
    pub fn new() -> Self {
        return LifoVector::EmptyVector;
    }
}

Now, you have a branch called EmptyVector to represent a newly pushed vector with no element.

LifoVector also implements a method called new to create a new empty vector.

When the lexer encounters the EMPTY_VECTOR instruction, this is what happens:

fn op_empty_vector(lex: &mut Lexer<Token>) -> Result<(), LexingError> {
    let new_value = 
        StackEl::new(
            Token::EmptyVector, 
            StackElValue::Vector(LifoVector::new())
        );
    lex.extras.stack = vec![vec![new_value], lex.extras.stack.clone()].concat();

    Ok(())
}

The new method is called and a new empty vector is added on top of the stack.

Inserting elements in the vector

Now that there is an empty vector, it is time to fill it!

Let’s introduce the INSERT instruction. Before using it, the stack must have the following state:

At index 0, there must be a value to insert in the vector
At index 1, there must be a vector that accepts the type of value present at index 0

You can add an insert method in the implementation of the LifoVector to make it easier to add elements to the vector and check that the element being added is correct:

impl LifoVector {
    pub fn insert(self, el: StackElValue) -> Result<Self, String> {
        match (self.clone(), el.clone()) {
            // vector is empty
            (LifoVector::EmptyVector, StackElValue::Int(val)) => {
                Ok(LifoVector::VectorOfInt(vec![val]))
            }
            (LifoVector::EmptyVector, StackElValue::Bool(val)) => {
                Ok(LifoVector::VectorOfBool(vec![val]))
            }
            (LifoVector::EmptyVector, StackElValue::String(val)) => {
                Ok(LifoVector::VectorOfString(vec![val]))
            }
            // vector is already populated
            (LifoVector::VectorOfBool(mut vector), StackElValue::Bool(val)) => {
                vector.push(val);
                Ok(LifoVector::VectorOfBool(vector))
            }
            (LifoVector::VectorOfInt(mut vector), StackElValue::Int(val)) => {
                vector.push(val);
                Ok(LifoVector::VectorOfInt(vector))
            }
            (LifoVector::VectorOfString(mut vector), StackElValue::String(val)) => {
                vector.push(val);
                Ok(LifoVector::VectorOfString(vector))
            }
            _ => {
                let vector_type = match self {
                    LifoVector::EmptyVector => "unknown",
                    LifoVector::VectorOfBool(_) => "bool",
                    LifoVector::VectorOfInt(_) => "int",
                    LifoVector::VectorOfString(_) => "string",
                };

                Err(format!(
                    "Cannot insert value of type {} into vector of type vector<{}>",
                    el.to_string(),
                    vector_type
                ))
            }
        }
    }
}

When the instruction is executed, two situations can happen: either the vector is empty or there are already elements in it.

If it’s empty, you can add the element to the underlying Rust vector and return a LifoVector enum that reflects the type of elements that it can accept.

If it is already populated, you must check that the element that must be inserted is of the correct type and if it matches, you can insert it.

Don’t forget to return useful error messages along the way for the developer who will use your awesome language!

After this is implemented, you can create the function that will update the stack. It must verify that the stack has the right state and it must pop the element and the vector from the top before pushing the new vector:

pub fn insert_vector(stack: &mut Stack) -> Result<Stack, String> {
    if stack.len() < 2 {
        return Err(String::from("Stack must be at least 2 elements deep"));
    }

    match (&stack[0].value, &stack[1].value) {
        (val_to_insert, StackElValue::Vector(lifo_vector)) => {
            let new_vector = lifo_vector.clone().insert(val_to_insert.clone())?;
            let stack_val = StackElValue::Vector(new_vector);
            // removes the 2 values on top of the stack and pushes the new vector
            let new_stack = vec![
                vec![StackEl::new(Token::InsertVector, stack_val)],
                stack[2..].to_vec(),
            ]
            .concat();

            Ok(new_stack)
        }
        _ => Err(String::from(
            "Invalid stack to insert an element in a vector",
        )),
    }
}

fn op_insert_vector(lex: &mut Lexer<Token>) -> Result<(), LexingError> {
    match insert_vector(&mut lex.extras.stack) {
        Ok(new_stack) => {
            lex.extras.stack = new_stack;
        
            Ok(())
        }
        Err(err) => Err(LexingError::InsertError(err))
    }

}

Reading the size

It can sometimes be useful to know the size of a vector, for example, to limit the number of elements it contains.

You will now implement the SIZE instruction that will push an integer on the stack that represents the number of elements in the vector.

The instruction must be called when the top element on the stack is a vector. In Lifo, the instruction will also consume the vector, so if you need it later, you must DUP it before reading its length:

pub fn size(stack: &Stack) -> Result<Stack, String> {
    if stack.len() < 1 {
        return Err(String::from("Stack must be at least 1 element deep"));
    }

    let new_value = match stack[0].clone().value {
        StackElValue::String(val) => Ok(StackEl::new(Token::Size, StackElValue::Int(val.len()))),
        StackElValue::Vector(val) => {
            let size = match val {
                LifoVector::EmptyVector => 0,
                LifoVector::VectorOfBool(vec) => vec.len(),
                LifoVector::VectorOfInt(vec) => vec.len(),
                LifoVector::VectorOfString(vec) => vec.len(),
            };
            Ok(StackEl::new(Token::Size, StackElValue::Int(size)))
        }
        _ => Err(format!(
            "Cannot give the size of element of type {}",
            stack[0].value.clone().to_string()
        )),
    }?;

    let new_stack = vec![vec![new_value], stack[1..].to_vec()].concat();

    Ok(new_stack)
}

fn op_size(lex: &mut Lexer<Token>) -> Result<(), LexingError> {
    if label_is_set(&lex.extras.current_label) {
        return Ok(());
    }
    
    match size(&mut lex.extras.stack) {
        Ok(new_stack) => {
            lex.extras.stack = new_stack;
            Ok(())
        }
        Err(err) => Err(LexingError::SizeError(err)),
    }
}

Note: Lifo also uses the SIZE instruction to determine the size of a string. You can see here that the same instruction can be used for different types of values according to the value on top of the stack when the instruction is called.

Returning an element by its index

Why would you want to store items in a vector if you can’t retrieve them later when you need them?

This is why you can now implement an instruction to retrieve an element in a vector by its index: INDEX [unsigned integer] where the unsigned integer represents the index of the element.

Keep in mind that you must check that the index is not out of bound to keep Lifo safe:

pub fn index(stack: &Stack, index: usize) -> Result<Stack, String> {
    if stack.len() < 1 {
        return Err(String::from("Stack must be at least 1 element deep"));
    }

    let new_value = match stack[0].clone().value {
        StackElValue::String(val) => {
            if val.len() < index {
                Err(format!(
                    "Out of bound index {} for string of length {}",
                    index,
                    val.len()
                ))
            } else {
                Ok(StackEl::new(
                    Token::Index,
                    StackElValue::String(val.chars().nth(index).unwrap().to_string()),
                ))
            }
        }
        StackElValue::Vector(val) => match val {
            LifoVector::EmptyVector => {
                Err(format!("Out of bound index {} for empty vector", index))
            }
            LifoVector::VectorOfBool(vec) => {
                if vec.len() < index {
                    Err(format!(
                        "Out of bound index {} for vector of length {}",
                        index,
                        vec.len()
                    ))
                } else {
                    Ok(StackEl::new(Token::Index, StackElValue::Bool(vec[index])))
                }
            }
            LifoVector::VectorOfInt(vec) => {
                if vec.len() < index {
                    Err(format!(
                        "Out of bound index {} for vector of length {}",
                        index,
                        vec.len()
                    ))
                } else {
                    Ok(StackEl::new(Token::Index, StackElValue::Int(vec[index])))
                }
            }
            LifoVector::VectorOfString(vec) => {
                if vec.len() < index {
                    Err(format!(
                        "Out of bound index {} for vector of length {}",
                        index,
                        vec.len()
                    ))
                } else {
                    Ok(StackEl::new(
                        Token::Index,
                        StackElValue::String(vec[index].to_string()),
                    ))
                }
            }
        },
        _ => Err(format!(
            "Cannot index element of type {}",
            stack[0].value.clone().to_string()
        )),
    }?;

    let new_stack = vec![vec![new_value], stack[1..].to_vec()].concat();

    Ok(new_stack)
}

First, you have to make sure that there is at least one element on top of the stack.

Then, you can match the vector by the type of its elements and extract it from the LifoVector value that contains it. Note how matching against the vector type is useful as we will need the type to create the value of a new StackEl that will be pushed onto the stack.

You must verify that the vector is not empty, or you must stop the execution and raise an error.

After that, you must check that the requested index is not out of bound, i.e., if the vector has n elements, the maximum value of the index is n — 1.

As it is the tradition, vectors in Lifo are zero-indexed.

If all that is correct, you can create a new StackEl and push it onto the stack. Otherwise, the program's execution must stop and an error must be returned.

Note: just like SIZE, INDEX also works with strings and extracts the character at the provided index.

To go further

Lifo is a very simple stack-based language with probably no use case 😅

But it’s enough to serve as a base and add more features.

If you check the Github repo, you will notice that I implemented some other instructions that do not appear in this tutorial like EQ, NEQ, CONCAT, JUMPI or LOG.

But you can create your own features and improve Lifo! For example, you could add more types of value that can be manipulated by the language. Until now, it only accepts values of type integer, boolean, string and vector, but you could also have enum or struct.

You could also add more instructions to manipulate the current values, like an instruction to pop the first element of a vector, an instruction to reverse the order of the elements in a vector, an instruction to split a vector or a string, etc.

If you liked the lexer from logos and want to tackle an even harder challenge, you could create a scripting language that compiles to Lifo!

The possibilities are endless, but whatever you build, don’t hesitate to leave a comment and to tell me about it!

Conclusion

Lifo is a perfect minimalistic example of a stack-based language.

Throughout this tutorial, you learn how a stack-based language works, how the instructions are applied and how the stack evolves.

Not only that, but you also learned how a lexer works and how it reads and processes the instructions you wrote.

The knowledge you gained in this article will help you tackle more complex SBLs like Huff or RISC-V.

And who knows, you may even want to take a look at Assembly and write instructions in what is the mother of all modern programming languages!