Introduction
This book serves as a short introduction and reference to the Brain Aneurysm programming language v1.0,
which I will refer to as basm
throughout the book.
This book will also touch on the companion cli tool of the same name.
The basm transpiler is fully open-sourced at
https://github.com/AtmolanderMimi/basm,
if you encounter any issues while using it please report it there so that I can fix the problem!
Basm is a very simple, assembly-like esoteric programming language made to transpile into Brainfuck (aka bf
for short).
My goal in creating basm was to prove (although it had already been done before) that bf can be used to write any program
a user can think of. Whilst bf is Turing-complete, it is rare to see that Turing-completeness actually exploited to
write generic, everyday programs understandably so. Hopefully, by providing tooling to abstract over writing bf, it can become
feasible to write anything in it.
Making us finally free from the cage of that is this language with a funny name.
By aiming to be transpiled into bf, basm shares many similarities with bf. Certain pieces of logic, like instructions, are transpiled pretty much 1:1 into bf. We even allow writing raw bf in a basm source file. Basm does separate itself from bf where it matters however. I've summed up the pain points I've aimed to solve when writing bf programs:
- Relative nature of memory
- Lack of organization or code-reuse features
- Difficulties with text handling
Relative Nature of Memory
In bf, you can't just say: add cell no.3 to cell no.2
. You need to actually manage the tape pointer and most likely
do mental arithmetic to calculate how many >
and <
you'll need to get to the cells you want.
So, you'd better not mess up your bf tape pointer or you might end up with a silent bug killing
both your program and any hope of creating anything productive with bf that may have inhabited you before.
You may even end up in a situation where you don't know exactly where you are, because you conditionally moved the tape pointer.
This can happen in a loop like this for example: [>]
.
You could say that bf dynamically addresses its memory by default. It says: get the cell -1 from me
. Basm tries to solve this by having a static addressing system.
This means that, in a basm source file, you can safely refer to the 3rd cell and it will always be the third cell
no matter the operations you performed before.
That's possible thanks to the tape pointer being automatically managed by the compiler.
Also, because of the abstraction over raw bf operations, basm makes it impossible to reach a "relative state" without you wanting to. You cannot accidentally write [>]
!
You must implicitly use a RAW
or ASUM
operation with the intent of entering a relative state to enter a relative state. Reaching a relative state with these two operation is valid as it is needed to write things like dynamic array getting/setting. (Implementing these operations is really interesting and involves a lot of language features that I describe throghly in the Writing Relative Code
chapter)
In brief, relative is when you don't know where the cell pointer is in the program. As a rule of thumb when programming in basm: relative bad (mostly), static good (mostly).
Lack of Organisation or Code-reuse Features
Bf has virtually nothing to help developers, maybe that's why it got so popular, maybe people like torture... Welp, I don't! So, I filled basm with plenty of ways to help you abstract over bf.
Firstly, in basm, the base building block of our program are not operators, but instead instructions.
This gives the language a bigger indivisible piece of logic compared to bf.
INCR 0 16;
is much easier to conceptualize than ++++++++++++++++
, don't you think?
Secondly, we have aliases with the ALIS
operation, which allow you to
give name to values, addresses or blocks of code.
In this household (programming language), we call blocks of code "scopes".
Lastly, there are "meta-instructions" which are kinda similar to functions, but in implementation are moreso similar to macros. These meta-instructions, once defined can be used like any in-built instruction in the program. Since bf doesn't have a jump operator and data is separate from code, it's practically impossible to create functions. So, meta-instructions, as the name implies, are simply instructions made up of other lower-level instructions. Hence why they are more similar to macros than functions, they are all inlined when used.
Difficulties With Text Handleing
Bf works with text like any other programming language would, it stores one character by cell in Unicode format.
The problem with that is that bf is not like any other programming language in pretty much all regards!
First and foremost, there is no way to set a cell to a specific value directly. If you want the character 'c',
then hold on tight, because you will need to type +
99 times!
Basm implements both a character literal and a string literal, thus with basm you only need to type INCR 0 'c';
to set cell no.0 to the value of 'c'.
There are also in-built operations for loading and printing strings, which is a normally a very tedious task in bf.
Quick Note on Code Formating Throughout This Book
This book will contain and promote how I prefer to write code in basm. Almost nothing in the language forces a specific formatting standard. For example, to save columns, I like to omit putting a level of tab in the upmost scope.
Like so:
[main] [
// my code here
INCR 0 12;
WHNE 0 0 [
// but i raise for scopes after
DECR 0 1;
];
]
There is nothing in the language forcing you to omit whitespaces, in fact there is nothing in the language dictating how you should use whitespaces (except for separating identifiers). So, keep in mind that whilst I like to follow certain naming and formatting standards, the flexible nature of basm allows you to write it however your heart desires!
Now that you know whether or not basm is for you, let's hop in!
Disclamer
Basic knowledge of bf is required to understand this book. That is, you need to know the operators do and little else. If you read up to here I'm guessing, you're probably fine on that.
Both the language and transpiler are my first foray into language development and are serving as my first real major project in Rust. There have been better implementations of bf transpilable languages which are both more efficient are easier to use. This project is only for my own personal pleasure as a creator and is not aimed at creating efficient bf programs. If you are seeking usable and well informed implementations of bf transpilers look here.
Installing the Cli
First, before getting into programming in basm, you'll need to install the basm transpiler.
You can find compiled binaries of the basm
cli tool in the releases section at https://github.com/AtmolanderMimi/basm or you can choose to compile the Rust code straight from source with cargo
.
(Don't be scared, despite being built 100% Rust, learning basm does not require any Rust knowledge)
Once installed you can use the basm
tool through the terminal.
It comes with a basm to bf transpiler, simple bf code optimizer and a bf interpreter.
To transpile and run any basm file use the run
subcommand like below:
basm run my-file.basm
For a more comprehensive list of capabilities, you can use the help
subcommand or read the chapter on the basm
cli tool.
Our First Program
Like any good programming language, you must start from the basics. The basics in this case being a "Hello, world!" program. Here is how you write one in basm:
[main] [
PSTR 0 "Hello, world!";
]
This source code file, once transpiled through the basm
cli would compile down to this is bf:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
+++++++++++++++++++++++++++++.+++++++..
+++.[-]++++++++++++++++++++++++++++++++++++++++++++.------------.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
--------.+++.------.--------.
[-]+++++++++++++++++++++++++++++++++.[-]
This program uses cell 0 as a buffer to increment and output each character in the string "Hello, world!"
consecutively.
Quite simple isn't it? (compared to the bf at least)
Let's start dissecting what it all means.
First, we begin at the top with the [main]
decorator which indicates that the following
scope will be a main field. There can only be one main field per program, and by the name you
have probably already guessed what it is. It's the entry point of our program! Any instruction put inside the scope of the [main]
field will be transpiled and then executed at runtime. In this example, the main field only held one instruction, which is PSTR 0 "Hello, world!"
.
Instructions Statement
"Ok, but what's an instruction in basm?", I hear you asking.
An instruction is the basic building block of basm programs.
They represent an action and are the smallest unit of logic in the programming language.
You can think of them like functions in other programming languages:
They take a set number of typed arguments and operate using these arguments.
In this program, the operation was "print a string" using the cell at index 0
.
Here is the syntax for any instruction statement:
INSTRUCTION_NAME arg1 arg2 .. ;
INSTRUCTION_NAME
is the name of the instruction.
An instruction's name is not required to be fully capitalized,
although it is the standard that I am going to use.
arg1
and arg2
are the arguments to the instruction.
Arguments are separated by whitespaces (don't put commas!).
To end an instruction we need to put a semicolon, and we are done here.
In the case of the PSTR
instruction above:
PSTR
was the name (meaning "print string"),0
was the first argument and"Hello, world!"
was the second argument.
Scopes
As said prior, a [main]
field is made up of a [main]
decorator and a scope. So what is a scope?
A scope is a collection of zero or more instructions or scopes.
Scopes are denoted by matching square brackets which encapsulate their contents.
They can be used to make fields and can serve as arguments to instructions,
such as conditionals and loops.
Scopes also serve as lifetimes for aliases.
Meaning any alias defined within themselves are invalid outside of themselves,
basically serving to limit the lifetime of aliases.
(we will see this later in the Aliases chapter)
The scope syntax is very simple, just put square brackets around stuff! This would be a valid scope:
[] // totally empty
// (comments can be made by writing "//" which will make the rest of a line a comment)
.. and so would this third argument in this WHNE
instruction:
WHNE 0 42 [
INCR 0 1;
];
Like other languages, it is totally valid to write a scope within another scope as a statement. Doing this, inlines the contents of the scope, practically like you would have not written the scope contents in a scope. With what you know now, scope statements are not useful, but you will learn that they can be convenient for making local, temporary aliases and sublogic in parts of code. Note that scope aliases, which you will also see later in the chapter on aliases, can't be inlined like this.
Here is an example of using a scope as an expression: (only instruction need semicolons to terminate their statement, scope statements don't need it)
[main] [
// doing stuff
INCR 0 42;
// oof, the main scope is getting quite crowded let's get out
[
OUT 0;
]
]
Scope statements (for inlining) and scope expressions (for instruction arguments) are written exactly the same. Their expression-ness is dependent on context. When in a scope, a scope is parsed as a statement. Whereas, when in an instruction argument (which is most of the time), a scope is an expression.
Note
From now on, all code will be implicitly contained in [main]
unless otherwise noted for ease of reading (and writing ;)).
In-built Instructions
Any basm source file start with some predefined instructions called called in-built. These instructions are valid wherever and whenever. Basm has very reduced set of in-built instructions for simplicity. This means that any instruction that could be easily reproduced by combining two or more instructions was not included (with some few exceptions). For example, there is no multiplication instruction built in. I said the instruction set was small and, in fact, the set of built-in instructions is so reduced that I can show the 16 of them all here:
Arithmetic
Name | Arguments | Function |
---|---|---|
ZERO | addr | sets the value ofaddr to 0 |
INCR | addr, value | increments the value of theaddr cell by value |
DECR | addr, value | decrements the value of theaddr cell by value |
ADDP | addr1, addr2 | addsaddr2 to addr1 , the result is stored in addr1 (in place) |
SUBP | addr1, addr2 | substractaddr2 from addr1 , the result is stored in addr1 (in place) |
COPY | addr1, addr2, addr3 | copies the value ofaddr1 into addr2 and addr3 |
Control Flow / Loop
Name | Arguments | Function |
---|---|---|
WHNE | addr, value, [scope] | while the value ofaddr cell is not equal to value runs the [scope] . addr is not consumed |
I/O
Name | Arguments | Function |
---|---|---|
IN | addr | takes input from the user and sets it inaddr , behaviour will vary between bf implementations |
OUT | addr | outputs the value ofaddr , addr is not consumed |
LSTR | start_addr, "str" | loads the string character by character into cells from thestart_addr advancing forward |
PSTR | addr, "str" | prints the string character by character using the celladdr as a buffer |
Language / Compilation
Name | Arguments | Function |
---|---|---|
ALIS | ident, value or [scope] | creates an alias to a value or scope namedident . This instruction is purely abstraction |
INLN | [scope] | inlines a scope |
RAW | "str" | includes the string after transpilation, this can be used to include brainfuck operators |
BBOX | addr | moves the tape pointer toaddr |
ASUM | addr | tells to compiler to assume that the tape pointer is ataddr . If that assumption is wrong all cells accesses will be offset |
You can find this table of instructions in the reference section of this book.
Types
Seeing this chart you might be a little confused due to how arguments in the Arguments
column are formatted.
If confusion lasts for more than eight hours contact a doctor (or continue reading).
Of course, there is a reason why arguments are denoted differently, it's because they don't have the same expected type! Each value in basm source code is typed and totally static, meaning that values within the source code can't get mutated at runtime. Only the value of cells in the tape can be mutated during execution. (basm is like one big set of macros, it would be for impossible for values to be mutable at runtime)
Basm has 3 types for its source code values being:
number
denoted by nothingscope
denoted by being surrounded with[..]
string
denoted by being surrounded with".."
Instructions statements will only take arguments with types corresponding to the instruction's arguments' types. You cannot pass in a string where a number is expected, types will not coerce. Instructions in the built-in set have arguments which are ordered by their type and meaning. The way I like to order them follows this little diagram:
address numbers -> pure numbers -> strings -> scopes -> stack pointer number
Number Type
Numbers in basm are the most common type for literals and are probably going to be the type you use most as they are very versatile.
As you may have seen in the table above, numerics usually have one of two purposes.
Either they serve as an address to a cell or they serve as a pure number.
For example, INCR
takes in the arguments addr
and value
which are both numeric,
but their values hold two very different meanings.
The value of addr
(address) denotes the address of the cell to be operated on.
Whereas the value of the passed in value
(pure number) denotes by how much the cell at addr
should be increased.
Be sure to not mix numerics representing address and and numerics representing (pure) numbers as that is a surefire way to create bugs!
There is three ways to write a number literal:
- Via a positive number, ex:
0
,42
,732
- Via a character literal, ex:
'b'
,'F'
- Or by combining two number literals into an expression, ex:
3+'a'
Character literals' values are mapped to their value in Unicode.
Expression are interesting as they allow modularity to arguments.
If I have an array starting at index 3
and I want to zero the first 3 cells in that array,
I could elegantly write it like so:
ZERO 3+0;
ZERO 3+1;
ZERO 3+2;
Another interesting property about expressions is that they themselves are number literals, meaning that we can chain them together like to.
// reminder that values in sources files are computed at compile-time
// expressions don't resolve at runtime!
// 75 - 32 - 1 = 42
INCR 0 'K' - ' ' - 1;
OUT 0;
The way expressions are built is that they take a "base" value and then a "modifier". It merges the two values (the base and modifier value) into one expression and repeats. Currently, modifiers have four types which are implemented.
- Addition:
+
, - Substraction:
-
, - Multiplicaton:
*
, - Integer division (towards 0):
/
Basm uses left to right priority, totally forgoing PEMDAS and any parentheses to denote priority. Here would be the implicit priority of that last example:
((75) - 32) - 1 = 42
A little note on integer division modifiers, basm doesn't do decimal numbers.
This means that division are kinda weird as they can't result a decimal number.
So, all divisions made are truncated (practically, the decimal digits are removed) towards 0.
As a result, 5/3
, which is in normal maths equal to 1.6666.., is equal to 1 in basm.
This also introduces a loss of precision. 10/3*3
is not in fact 10, but 9.
Since we execute from left to right we first compute 10/3
which is 3.3333..
That gets rounded to 3 and then we execute the multiplication modifier on our rounded 3, so we get 9!
The Difference Between Address Numbers and Pure Numbers
You may have noticed that I separate numbers into two "sub-types" based on their meaning. There are "address numbers" which are numeric values which represent addresses to cells and there are "pure numbers" which are numeric values which represent numeric values used for operations. Nothing stops you from using address numbers where pure numbers belong. This is also true the other way around.
This distinction is going to be important once we start naming these numeric values with aliases.
You will see, I prefix address number aliases with A
for "address"
and pure value aliases with V
for pure number "value".
In the built-in's' tables above, arguments named addr
or addr_nth
are address numbers,
whereas arguments named value
are pure numbers.
Scopes
Scopes, as I described in the last chapter, are collections of instructions and other scopes. You can pass them into instructions as arguments by simply writing them in their literal form (which means the same thing as expression here). Here's a reminder of what a scope literal looks like:
[
// ...content here...
]
String
Strings as types are very limited in usage,
they cannot be aliased and you cannot make a meta-instruction with a string arugment.
This means, basically, that the string type is only used by in-built instructions
(i.e: PSTR
, LSTR
and RAW
).
Whilst writing code, it is very rare to actually end up working with string anymore than
loading them into memory or printing them out.
Because of this, string support is currently fairly lacking.
You cannot operate on source code strings in any meaningful way in basm 1.0.
The only power you hold is to summon them through their literal form,
which like many programming languages is simply text between two quotes ("
).
For example:
RAW "this will be included in the transpiled file!
";
Noticed how I needed to actually include a newline character
in the source file for the string to contain it?
This is because basm does not contain escapable characters like \n
as of now.
It also means that you can't write a string containing "
(or a character literal with '
).
That's because the lexer, in the state it currently is,
would freak out if I ever tried to implement escaped characters.
(I really need to rewrite the lexer)
Assumptions Made by Instructions
All instructions both the ones you make and the ones which are built-in will hold some sort of assumptions.
In basm, instructions "consume" the values of the source cells unless otherwise noted.
"consuming", in this context, means that the cell is set to 0.
For example, when we add two numbers together using ADDP
, the second cell, which is not an output,
will inadvertently be zeroed due to the operation we made on it.
Cells which hold the result of operations are of course not zeroed.
Apart from that, instructions also assume that address arguments don't have the same value (aka "alias", but that word has another meaning in basm). An instruction statement passed in with two of the same addresses have unspecified behaviour. Most often, this ends up locking the program in an endless loop, although it can also cause other undesirable behaviour.
Lastly, notably, instructions assume that the values of the cells at output addresses are 0. If that condition is not met, an instruction which "sets a cell to a value" would rather "increment a cell by that value".
Notable Instructions
Now that you know a bit more about what constitutes an instruction, let's look at some notable instructions which may not be immediately intuitive just by reading their description.
WHNE (While Not Equal)
Arguments
Name | Type | Description |
---|---|---|
addr | number | address of the cell being compared against thevalue |
value | number | value that the cell ataddr must reach to stop the loop |
scope | scope | code to be executed in loop while the condition is not resolved |
This instruction is vital, not only for writing good software in basm,
but for writing any software that can be considered Turing-complete in general.
WHNE
is the only source of conditional execution and looping
included within the whole pool of built-in instructions (excluding RAW
, because it's weird).
Thus, all sources of conditionality need to be derive from this.
It is a sad day to be alive, but bf has no other primitive form of conditional than a while not equal to zero with the [
and ]
operators.
This instruction translates directly to a matching pair of [
and]
alongside some +
and -
so that we aren't restricted to equality with zero.
As the name implies the behaviour is to:
- Check if cell at
addr
is equal tovalue
. If it is, exit - Execute the code in
scope
- Return to step 1
Unlike most other operations WHNE does not consume the value of the cell it checks, meaning you don't have to reinitialize the cell after each check.
Example: The fact that the cell is not consumed makes it easy to create something like a counter
// outputs values from 0-99
WHNE 0 100 [
OUT 0;
INCR 0 1;
];
COPY
Arguments
Name | Type | Description |
---|---|---|
addr1 | number | address the source cell to be copied toaddr2 and addr3 , the cell at addr1 is consumed |
addr2 | number | address of a cell receiving a copy of the celladdr1 |
addr3 | number | address of a cell receiving a copy of the celladdr1 |
COPY
is a weird one which is needed because of the weird nature of the weird language which is bf (weirdly).
There is no way to add two numbers together without one disappearing into the void, never to be seen again.
This is why we require COPY
.
If we want operate on a value and still have a copy of it afterwards we will need to have the value twice.
Once to use for the operation and the other to keep it alive.
COPY
was made for this, it takes in one cell and sets the value of two other cells
to the value of the source addr1
cell, consuming it in the process.
This behaviour is essential for building almost any program.
Example: a program outputting the double of the last number forever, or until the cell value limit is reached:
INCR 0 1;
WHNE 0 0 [
OUT 0;
COPY 0 2 3;
ADDP 0 2;
ADDP 0 3;
];
Now, most basm professionals don't want you to know this... but it's acceptable for the two output addresses of COPY
to be the same.
Doing so will copy the value of the cell twice in the same place, practically doubling it.
Although I know it is safe in this circumstance to have two address arguments with the same value due to the fact that I kinda made the language,
this is not a safe bet to take on ANY OTHER instruction! (pretty cool though)
ADDP (Add in Place)
Arguments
Name | Type | Description |
---|---|---|
addr1 | number | cell to be added toaddr2 , output is stored here |
addr2 | number | cell to be added toaddr1 |
You might be surprised to see ADDP
in this list, after all it's an addition operation,
how complicated can it be?
Welp, not all that much, but it still does have a property which is not reflected by its name.
That being the ability to act as a move instruction.
As long as the value at addr1
is 0,
calling ADDP
will simply move the contents of the cell at addr2
to addr1
.
This is because x + 0 = x
, or in layman's terms: adding something to nothing will just return the initial something.
And, since we happen to return that "initial something" to somewhere different from where we took it from, we effectively moved that "something".
So, we can giveADDP
a little pet name likeMOVE
as it really serves as so sometimes.
Example: I want to copy the value of cell 0 to cell 1 without consuming it
INCR 0 42;
// reminder that we can't copy to 0 directly as that would make two pointers alias,
// which in turn would cause an infinite loop
// (no, the doubling trick with COPY was not about that)
COPY 0 1 2;
// we "move" 2 -> 0
ADDP 0 2;
RAW
Arguments
Name | Type | Description |
---|---|---|
str | string | a string to be included in the transpiled code at the appropriate location |
RAW
is a bit of a special case in that it can do the job of all the other instructions.
If I truly wanted to reduce the instruction set to the maximum there would be one instruction
and it would be this.
Now, including a string in the compiled output does not seem like it would be able to do much at first.
Like, ok, we can put comments in the output?
But that's not the main attraction of RAW
. The real reason it is of interest is because you can
smuggle ANY string into the compiled file.
This means that it is possible to include operators like +
, -
, >
, <
, [
, ]
, ,
and .
allowing us to insert raw bf code directly in our basm source file.
With this, we can harvest the full power (it being rather small still) of bf.
This comes with one downside, tape pointer movement, which is usually completely handled by the compiler,
and unmatched brackets are not checked within strings passed to RAW
.
An unmatched bracket will probably just match with a totally unrelated one, messing up jumps made by WHNE
and messing with tape pointer position unintentionally is BAD.
I won't go into much details why it is, but simply put, the compiler needs to know were the pointer is at all times
(unless you plan for it, we'll see that in the relative-code chapter).
Just writing RAW ">";
is enough to completely mess up your program by effectively offsetting all the cells by 1.
With RAW "">";
, we can write dynamic code to our heart content without being limited by basm's static memory addresssing.
The art of writing basm a program capable of harnessing the power of dynamic memory addresssing
is one hard to come by these days,
but in time (reading the chapter on relative code) you shall learn how to master it.
Example: Why would I want to use basm, when bf worked just fine for a "hello, world!"?
// be careful to not include bf operators in your text!
RAW "my hello world program:
";
RAW "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.";
Which will give us exactly this output, as expected:
my hello world program:
++++++++[>++++[->++>+++>+++>+<<<<]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>->.+<.<.+++.------.--------.>>.>++.
ALIS (Alias)
Oh, wait... we have the whole next chapter just for that one!
Aliases
Aliases are by far the most common abstraction feature you are going to use while writing basm code,
no questions asked.
They allow to alias number values or scopes to identifiers to representing the value.
You can think of aliases like variables in other programming languages, except they are purely immutable constants.
Once an alias is defined, it can be used to substitute having the write the literals forms of values anywhere the alias is valid.
Defining an alias is done via the ALIS
"instruction" (ALIS
can hardly be called an instruction as it does not compile into anything).
For example, if I have a cell located at 42
containing the index of an array, rather than having to memorize its location,
I could simply create an alias to the address like Aindex
with the instructionALIS Aindex 42;
.
The ALIS
Instruction
At the exception of meta-instructions, the ALIS
instruction is the only way to create aliases.
Since I baited you in the last chapter, below is its nicely formatted specification!
Arguments
Name | Type | Description |
---|---|---|
alias_name | identifier | the name of the alias |
value | numeric/scope | the value to be bound to the alias |
(don't worry about the "identifier" type, it's not actually a type. Just that ALIS
is a special boy and can take names as argument)
ALIS
allows creating aliases in the current scope by binding numeric or scope expressions to an identifier.
As an instruction, ALIS
does not compile to anything, it is purely abstraction.
To define numeric aliases and then use the alias in place of any numeric value do as so:
ALIS Aindex 1; // address of the index cell
ALIS Vshrob 2; // some random value
ALIS Vvalue 40+Vshrob; // value of the index
ALIS Vshrob 33; // overwriting the last Vshrob alias
INCR Aindex Vvalue;
Aliases are purely abstraction for writing the value of the bound expressions where the alias is. So, to help visualize, here is what the above program looks like after the aliases have been normalized out:
INCR 1 42;
Notice how Vvalue
is replaced by the value of the expression passed into ALIS
when the alias is created.
This is why Vvalue
got replaced by 42 (40+2) rather than 73 (40+33).
The alias was bound to the value of the expression rat
her than the expression itself,
if it were bound to the expression changes to Vshrob
would change Vvalue
.
Alias Validity
Aliases are valid for the duration of the scope they are defined in. This means that an alias defined in a higher scope cannot reach a lower one. That alias behaviour is the main reason why you would want to insert a scope within another, to have operation specific aliases. Here is an example of alias validity over the program:
ALIS Vsome 1;
// Vsome: 1
[
// Vsome: 1
ALIS Vsome 2;
// Vsome: 2
]
// Vsome: 1
Whilst we are talking about this, might as well mention that, as in the example above, aliases can shadow each other and that not only through scope boundaries! They can also refer to the value they will shadow when defined. Here's an example of a fairly common pattern using aliases:
ALIS Vnum_cats 5;
// ... some operations ...
ALIS Vnum_cats Vnum_cats+3;
That example firsts set Vnum_cats
to 5
then does all the operations between the two ALIS
with 5 as Vnum_cats
.
Lastly, the Vnum_cats
alias is shadowed by the second ALIS
which sets the alias to the value of Vnum_cats+5
.
The compiler normalizes the expression and gets the result of 8 (5+3) which it sets to be the new Vnum_cats
.
Using this, we can reach the what would be expected of variables in other programming languages, but don't get it
twisted, aliases are not variables. They are replaced by their value at compile time
and are totally gone at runtime.
So, don't expect something like this to increment forever:
// THIS DOES NOT WORK!
ALIS Vincrement 0;
WHNE 0 1 [
ALIS Vincrement Vincrement+1;
];
That example would completely fail as the alias in the looped scope doesn't persist over loops.
It will get thrown out at the end of its scope like any other alias,
as thus the value of Vincrement
will always be 1 (0+1) in the looped scope.
Alias Types
There are currently two types which are supported by aliases: number and scope.
Yes, this means you can't alias a string, sadly.
So far, we have only made aliases of numeric values as these are the most common,
but we can also create scope aliases.
Aliases with different types won't overwrite each other, even if they have the same name.
They are treated as completely different.
This means you can have both a numeric alias and scope alias named my_alias
at the same time with no issue.
Basm will automatically use the alias with the fitting type when my_alias
is used.
You create aliases of different types by matching identifiers to different expressions.
If you give a numeric value to ALIS
you will create a numeric alias,
on the other hand if you give it a scope you will get a scope alias.
Scope Aliases
Now, scope aliases are interesting to talk about as they allow you to share and reuse code around your source file, unlike numeric aliases which are only about values. Right now, they are not really interesting, but as you'll learn about meta-instruction you will see one of most interesting use case, as a callback-like thingamabob.
Since there hasn't been an example of alias scopes yet, here is one:
// defining "my_scope" scope alias
ALIS my_scope [
INCR 0 1;
OUT 0;
];
// using my_scope with the scope identifier syntax
WHNE 0 128 [my_scope];
You have probably noticed a difference to how we use numeric aliases, being that we need to use a "scope identifier" to specify our scope alias. A scope identifier is simply an identifier (aka the name of the alias) wrapped by square brackets. This syntax tells the compiler to checks for the scope alias rather than the numeric alias (also it allows the person reading it to know that this is a scope alias being passed as an argument, which is cool).
If you don't use the scope identifier syntax you will get this error:
------------------ [ ERRORS ] ------------------
Error: from Ln 7, Col 12 in "/some/path/to/the/file.basm"
→ alias was not defined
[...] [main] [
ALIS my_scope [
INCR 0 1;
OUT 0;
];
WHNE 0 128 my_scope;
]
[...]
(the second my_scope
is highlighted and underscored in the terminal)
This means that the compiler tried to search for the numeric alias my_scope
,
which obviously doesn't exist, and failed.
There is one more thing I want to add to scope aliases and that's inlining them.
Unlike scope literals you can't simply write the alias down to inline it!
So, you will need to use an instruction I made specifically for that called INLN
.
It takes one scope argument and inlines it, as if you would have written a literal in the file at its place.
Here is an example displaying what I just talked about and an extra property that may not be obvious:
ALIS Vscale 7;
ALIS increment [
INCR 0 Vscale;
];
// we repeat the scope three times
INLN [increment];
INLN [increment];
INLN [increment];
// then three more
ALIS Vscale 12;
INLN [increment];
INLN [increment];
INLN [increment];
OUT 0;
This example, when ran, outputs 42 (7*6) or the *
character if your interpreter outputs as text.
That may seem odd, after all we set Vscale
to 12 halfway through which should give us 57 (7*3 + 12*3), right?
Wrong! Whilst, you can include aliases within a scope alias, they will be immediately normalized (aka replaced)
by their bound value. So, the increment
alias actually looks more like this after being defined:
ALIS increment [
// Vscale was replaced by its value
INCR 0 7;
];
Calculating Fibonacci
Now that you've learnt the basic building blocks of the language, you are probably eager to do something useful with them, but like a kid with un-assorted Lego blocks it may not be as easy as it seems to build something coherent with them. This synthesis chapter is a walk-through for creating a program which calculates the nth number of the Fibonacci sequence. We will first start by creating the basic logic then add user input.
If you want to go ahead and try to build a fib program yourself before reading this explanation, I highly recommend it! Getting your hands dirty is the best way to learn and memorise quirks of the language (and any skill for that matter).
Building the Inputless Version
We are going to be implementing an iterative version of a Fibonacci calculator. (In opposition to the recursive one, because there is no recursion in basm)
If you have never built an iterative Fibonacci calculator in another language before, here is pseudo-code for how it typically looks like:
a = 0
b = 1
for i in [0; nth[:
c = a + b
a = b
b = c
output a
What's nice here is that most of the features present in that pseudo-code example are present in basm.
We can store values like a
, b
and c
in cells and we of course have addition and output.
There one pain point though, which is that unlike in the pseudo-code basm instructions consume their inputs.
This means that we can't feasibly use b
in c = a + b
and a = b
without having to copy it.
That will unfortunately require writing a bit of copying logic boilerplate.
Here's the non-functional pseudo-code/basm hybrid with the pieces which we can easily replace:
// unlike in the example above, we need to set the value of the variables
ALIS Aa 0; // a = 0 (all cells are 0 by default)
ALIS Ab 1;
INCR Ab 1; // b = 1
// this for loop syntax is invalid in basm
for i in range [0; nth[:
ALIS Ac 2; // we define Ac here, because it is operation specific to the loop
ALIS Atmp 3;
COPY Ab Ac Atmp; // c = b and tmp = b
ADDP Ac Aa; // c += a
ADDP Aa Atmp; // a = tmp (which is b)
ADDP Ab Ac; // b = c
OUT Aa; // output a
Notice how ADDP
has the double purpose of both adding and moving values around?
We can be sure that these moves are safe to do because the cells were consumed by prior instructions.
An example of this is with Ab
, we knew it was zeroed from the COPY
instruction,
so it was safe to use ADDP
as a move to move Ac
to Ab
.
It is very important to keep track of which cell has been consumed when writing basm code.
That, not only for being able to move cells to other cells without having to preemptively ZERO
them,
but also for what I like to call "scope hygeine". What it means is that a scope, once executed,
will not leave non-zeroed cells. It is the duty of the scope to zero the cells it uses for operation.
In the case of this example, the cells used for operation in the for
loop scope are Ac
and Atmp
.
Looking at the code we can see that they both get consumed before the end of the scope. Perfect!
If you betray scope hygiene, you might end up trying to use a cell with an arbitrary value,
whereas you expected all cells to be zero by default. By default, in basm,
we assume all cells untouched by the current scope to be zero, this allows us to save operations
having to ZERO
all cells before using them.
Here is an example where not zeroing all cells before the end of the scope is problematic:
[
// .. some operation ..
ALIS Atmp 1;
INCR Atmp 2;
COPY Atmp 0 2;
]
ALIS Aval 0; // we expect *Aval == 0
INCR Aval '!';
OUT Aval; // we should get '!', but get '#'
(In this case the problem is very easy to spot, it is as the project size increases that bugs caused by these kinds of errors become harder and harder to spot)
For the last step in transforming our pseudo-code example into real basm code is to deal the for
loop.
There is no in-built for
loop instruction, so we will have to manually handle our loop's exit condition.
Normally other languages would do this for you, but being basm being low-level we will need to explicitly
create and increase the index as well as check whether that index is out of the range.
ALIS Aa 0;
ALIS Ab 1;
INCR Ab 1;
// we add a cell to store the index
ALIS Ai 2;
// the index we want to reach (it's a constant for now)
ALIS Vdesired_index 11;
// this reads like: while (index != desired_index)
WHNE Ai Vdesired_index [
// because of the extra cell used for the index we need to offset those by 1
ALIS Ac 3;
ALIS Atmp 4;
COPY Ab Ac Atmp;
ADDP Ac Aa;
ADDP Aa Atmp;
ADDP Ab Ac;
// increase the index for each iteration
INCR Ai 1;
];
OUT Aa;
If you did everything well, now the program should output 89!
(or Y
, if it does output this you can use the -m
flag on the basm interpreter to output cells as numbers)
You can try to fiddle with the value of Vdesired_index
to get different numbers from the Fibonacci sequence.
Be weary though, the bf interpreter that comes with the basm cli
defaults to using cells of unsigned 8 bit integers, meaning that numbers are limited from 0-255!
To change the size of the cells, use -c 16
flag to get 16 bit unsigned cells.
Adding User Control
We will now be adding a very small amount of user input to our program: we'll let the user decide which number they want from the Fibonacci sequence. This may seem simple, and it is! Rather than increasing a cell until we reach the desired value, we can simply decrease the cell containing the desired value until we reach zero. It's a bit different from the last version thought since the desired value will be stored in a cell rather than be a constant.
ALIS Aa 0;
ALIS Ab 1;
INCR Ab 1;
// we set a cell to take user input (Value -> Address)
ALIS Adesired_index 2;
IN Adesired_index;
WHNE Adesired_index 0 [
ALIS Ac 3;
ALIS Atmp 4;
COPY Ab Ac Atmp;
ADDP Ac Aa;
ADDP Aa Atmp;
ADDP Ab Ac;
// decrease the index for each iteration
DECR Adesired_index 1;
];
OUT Aa;
Now when running the program, you will be prompted to input something which will be treated as the index.
As you may have already guessed IN
and OUT
transpile directly to ,
and .
.
Make sure you use the -n
flag while using the inbuilt basm interpreter
to parse input as numbers rather than as characters.
The prompt will repeat until you enter a valid number. That being until your input can parsed and contained by the cell.
Input is special in bf as it is the only thing that can straight up overwrite cell data
without caring about what it contained before. (it is also special in that its implementation varies a lot from interpreter to interpreter, but whatever)
Now you have a working interactive Fibonacci calculator!
You check out the generated bf by printing it out via the -p
flag
and make my interpreter sweat by asking it anything above fib(30)
(you will need the -c 32
flag for the result to be correct).
Meta-Instructions
Wow, only now are we onto meta-instructions! Meta-instruction are like functions in other programming languages, except that they aren't. They are actually just multiple instructions joined into one, thus the name meta-instruction. These composite instructions are expanded at compile time into their component built-in instructions. So, they are actually more similar to macros or always inlining functions.
The Meta Fields
To define a meta-instruction you need to use the second and only other field other than [main]
,
which is the [@META arg1 arg2]
field.
A basm source file can have zero or more meta-instructions fields present in a file before the [main]
one.
This field allows you to create a meta instruction of the name META
with the arguments following it,
in the case of the example they would be two numeric arguments named arg1
and arg2
.
Once defined a meta-instruction can be used anywhere a normal built-in instruction can, even in other meta-instruction bodies (excluding those who came before and itself of course).
Arguments
Meta-instructions can take zero or more arguments of numeric or scope type. To specify the arguments you want your meta-instruction to take in, you simply write they names (identifiers) in the field header:
[@META]
: would take no arguments[@META arg1 arg2]
: would take two numeric arguments[@META arg1 [scope]]
: would take one numeric argument followed by a scope argument
Basm will automatically create aliases named after the arguments which are bound to the arguments passed in by the caller.
This means that you can refer to arg1
like any other numeric alias in the scope of [@META arg1]
.
Naming
Meta-instructions cannot share the name of other instructions, must they be built-in's or user defined meta-instructions. Also, quick aside, while I prefer to keep the 4 capital letters naming convention for instruction names, names abide by the same restrictions as aliases and are thus not limited to 4 capital letters or less.
Examples
Meta-instructions are very practical to remove boilerplate. Since I decided to greatly reduce the amount of built-in instructions, you will probably need some to define some quality of life instructions. Thankfully, doing so is both easy on the meta-instruction definition end and in the usage end.
SET
If I want to set the value of cell by incrementing, but I know that it may be not consumed, then I would need to zero it before incrementing the cell, causing boilerplate. It would be much easier to read if both zeroing and incrementing the cell to a value would be the same instruction. Well, with meta-instructions you can implement that instruction yourself:
[@SET Acell Vval] [
ZERO Acell;
INCR Acell Vval;
]
[main] [
// instead of:
ZERO 0;
INCR 0 12;
// you can simply write:
SET 0 12;
]
At compile time, the SET
instruction will be expanded and the aliases will be resolved,
leading the main field looking something like this after meta-instruction inlining:
// instead of:
ZERO 0;
INCR 0 12;
// you can simply write:
ZERO 0;
INCR 0 12;
As you can see, using meta-instruction saves you time and brain!
COPC
(Copy Conservatively)
It would have been useful in the Fibonacci program that we just wrote if we could have copied the value of Ab
to Ac
without having to explicitly use a temporary Atmp
, right? We can made an instruction for that!
This meta-instruction is called COPC
for "copy conservatively".
In this context, being conservative means that this instruction does not consume its inputs,
or rather as we will see while implementing it, they are not consumed when the instruction ends.
[@COPC Asrc Adst sp] [
ALIS Atmp sp;
ALIS sp sp+1;
COPY Asrc Adst Atmp;
ADDP Asrc Atmp;
]
I am guessing you aren't too chocked for the most part by this implementation, we simply encapsulate
the temporary shuffle in a meta-instruction, like we would encapsulate logic in any other language.
Everything is normal to one exception, you are probably asking are yourself what sp
is doing here and what it means.
We cannot simply use a static memory position for Atmp
as we don't know which cells are free to be used when when COPC
is called.
Just using a static address for Atmp
would probably just end up creating a situation where COPC
and another operation are both using that static cell for something.
Instruction which need memory for their operation can include a sp
argument as the last one to ask for where it is safe to "allocate" cells for computation.sp
meaning "stack pointer" it is an address pointing to the first free cell that the instruction can use for its operations.
In the context of basm, allocating a cell is synonymous to "reserving for a cell for an operation".
This is whyAtmp
is defined fromsp
. We know that cell including and after sp
are free to use.
I personally like to always have an alias named sp
, which increment as I allocate more cells, in all scopes.
When I want to allocate a new cells, I create an alias for the address of all cells and then increase sp
by each cell I've allocated.
In the example above, this logic comes in. I create 1 alias for a temporary cell named Atmp
and then increase sp
by 1.
The sp
alias also benefits greatly from alias lifetimes.
When you increase the value of sp
what you really do is create a new copy of sp
that shadows the last one.
This means that when you are done with a specific operation and the scope ends, sp
will decrease naturally.
This practically frees the cells that were allocated in the scope!
Using sp
is especially useful in the [main]
scope where you want to keep track of where you have not allocated yet.
Allocating based on the stack pointer also allows you to seamlessly add a cell without having to offset all later
cell pointers by one (like we had to do in the Fibonacci example).
Here is what Fibonacci would look like with COPC
and the sp
system:
[@COPC Asrc Adst sp] [
ALIS Atmp sp;
ALIS sp sp+1;
COPY Asrc Adst Atmp;
ADDP Asrc Atmp;
]
[main] [
ALIS Aa 0;
ALIS Ab 1;
INCR Ab 1;
ALIS Adesired_index 2;
IN Adesired_index;
ALIS sp 3; // set the sp to the next free cell
WHNE Adesired_index 0 [
// "allocate" cells where the is free space
// this allows us to easily add another cell before without having to touch this
ALIS Ac sp;
ALIS Atmp sp+1;
ALIS sp sp+2; // update sp
// we use sp here
COPC Ab Ac sp;
ADDP Ac Aa;
ADDP Aa Ab;
ADDP Ab Ac;
DECR Adesired_index 1;
];
// aliases being bound by scope helps us here by giving us back
// our old sp from before the WHNE scope, basically freeing a part of the stack
OUT Aa;
]
TWIC (Twice)
There aren't many interesting examples of using scopes as arguments
other than trying to make conditional execution, so I wanted to give an little simple example here before actually getting into conditionals,
because those are a bit complicated.
TWIC
is a instruction that takes a scope and inlines it twice, causing it to execute twice.
[@TWIC [scope]] [
INLN [scope];
INLN [scope];
]
I mean, I have a whole chapter written about using scope as arguments for conditionals, and it's the next one... so let's read it!
Creating Custom Conditionals
When we have one type of conditional execution in a language, we can derive all the other types from it.
Same for looping, if we have a looping element we can make any looping element.
In the case of basm, that any conditional/looping element is WHNE
which we can derive all conditionals and loops we desire from.
In this chapter, we're going to first implement IFNE
(If Not Equal), then from it IFEQ
(If Equal).
First off, to make these meta-instruction conditionals we need to think of their arguments.
We are going to want a cell address and a value so we can compare them together,
a scope to be conditionally executed, and lastly,
we are going to make use of the sp
pattern to use some operation memory.
So, the argument table for all of our conditionals should look like this:
Name | Type | Description |
---|---|---|
addr | number | address of cell compared toval |
val | number | value compared to cell ataddr |
[scp] | scope | code to be conditionally executed |
sp | number | address to the next free cell |
IFNE (If Not Equal)
IFNE
is the easiest conditional to define if we consider that right now we only have WHNE
.
When you think about it, an if statement is simply a conditional loop that loops once.
With that in mind, we'll want to purposefully make WHNE
loop once while also keeping its comparason ability.
Here is how I would go about implementing a one iteration WHNE
loop:
[@IFNE Aaddr Vval [scp] sp] [
WHNE Aaddr Vval [
INLN [scp];
// force Aaddr to be equal to Vval
ZERO Aaddr;
INCR Aaddr Vval;
];
]
You will notice that this implementation works! But, there is some pretty big downsides that we will want to avoid.
We don't want these implementations to consume their inputs, so IFNE
's can be chained.
In this case this implementation sets Aaddr
to the value ofVval
, which is no good.
If we want to chain IFNE
's we will need to manually copy Aaddr
each time we pass it in, which is tedious.
Apart from that, there should be no non-zero allocated cells when inlining the scope as it might corrupt the behaviour of the scope,
which does not expectIFNE
to allocate cells while it is running.
The specifics of what I just said are important, we don't want non-zero allocated cells.
What this means, is that we can allocate cells, but when we inline the scope argument, all of our cells should be zero.
If all our cells are zero, it is as if, to the inlined scope, that there are no allocated cells.
Allocation is only a concept, what matters is whether the cells are zero or not when the scope is inlined.
This notion is important as it will allow the caller to use the same sp
both in the scope and in the IFNE
argument.
If IFNE
required a cell to be allocated when inlining the scope, then all the mentions of sp
in the scope should be increased by 1.
So, let's solve these issues via extra cells that are granted by sp
:
[@IFNE Aaddr Vval [scp] sp] [
ALIS Atmp1 sp;
ALIS Atmp2 sp+1;
ALIS sp sp+2;
// copy to Atmp1 so we can consume it
COPY Aaddr Atmp1 Atmp2;
ADDP Aaddr Atmp2;
WHNE Atmp1 Vval [
// we don't care about Atmp1, it needs to be consumed before the scope
ZERO Atmp1;
// at this point, all of the values allocated in IFNE are zero,
// so it's like we allocated nothing, scp can use the same cells as we just did without causing bugs
INLN [scp];
INCR Atmp1 Vval;
];
// cleanup
ZERO Atmp1;
]
Now let's try it and see if it works:
// .. add the IFNE definition
[main] [
ALIS Aval 0;
INCR Aval 42;
ALIS sp 1;
// this will print ..
// (IFNE and PSTR can use the same sp)
IFNE Aval 33 [
PSTR sp "Aval is not equal to 33!";
] sp;
// .. but not this
IFNE Aval 42 [
PSTR sp "Aval is not equal to 42!";
] sp;
]
Notice how both PSTR
uses the same cells as IFNE
? Yet, they don't collide as we purposefully
zeroed all cells from the IFNE
scope before running the scope argument!
Try it yourself: Move the ZERO Atmp1;
instruction after the inlining
and see what happens when values are still non-zero.
IFEQ (If Equal)
IFEQ
is very much linked to IFNE
. IFEQ
executes only when IFNE
doesn't.
We can make use of relation to easily derive IFEQ
from a couple of IFNE
's.
Rather than copying the value around,
this implementation will make use of a flag representing wheter or not IFNE
was executed.
[@IFEQ Aaddr Vval [scp] sp] [
ALIS Aflag sp;
ALIS sp sp+1;
ALIS Vnot_equal 1;
IFNE Aaddr Vval [
INCR Aflag Vnot_equal;
] sp;
// this reads like "if is not equal to not equal"
// both "not equals" cancel out and we get "if is equal"
// once again all cells are zero here,
// as the flag needs to be of 0 for this to execute
IFNE Aflag Vnot_equal [scp] sp;
// cleanup (only useful if it did not execute)
ZERO Aflag;
]
Once again I encourage you to test it:
// .. IFEQ and IFNE definition here
[main] [
ALIS Aval 0;
INCR Aval 42;
ALIS sp 1;
// this will print ..
IFEQ Aval 42 [
PSTR sp "Aval is equal to 42!";
] sp;
// .. but not this
IFEQ Aval 60 [
PSTR sp "Aval is equal to 60!";
] sp;
]
With that out of the way, it's going to be much faster to write actually useful programs
and now that you know the trade, I am sure you will be able to create your own conditionals!
(If you want to challenge yourself, try making a WHEQ
)
Writing Relative Code
Onto the last remaining chip of knowledge that basm requires: Harvesting the Power of Relative Code! Remember at the very start of this book, where I said that every memory access is static unless otherwise stated? Well, this chapter is about stating you don't want static accessing and would much prefer write relative logic. Sometimes static addressing is either less efficient, harder or simply impossible to use for certain problems. In these cases you will need to reach for relative logic.
But, what is the difference between relative and static in basm exactly?
By default, basm refers to cells in a static manner, meaning that an address will always refer to the same cell (e.g address 1
will always refer to the same cell).
This kind of addressing is known as static.
Relative means that an address will not always refer to the same cell,
it will refer to a cell relative to a dynamic point.
In most cases, that "relative point" is simply where your tape pointer is at a given moment.
Now, by default basm moves your tape pointer automatically in order to access specific addresses.
Basm won't lose track of where the tape pointer is unless you explicitly tell it to do so.
When you ask to operate on a cell, basm will have remembered where the tape pointer currently is, and it will then try to move the tape pointer by the delta between the current position and the desired position.
The tape pointer position that is remembered by the compiler is also known as the assumed tape pointer position,
as it can get temporarily invalidated when working with relative code.
For example, if we have the tape pointer at position 3
and ask to reach cell 5
, basm will automatically insert >>
.
There are three instruction made specifically for dealing with the tape pointer
and be able to write relative code from it.
BBOX
(Black Box)
Arguments
Name | Type | Description |
---|---|---|
addr | number | the address for the tape pointer to be moved to |
BBOX
is probably the most simple instruction in the whole set of built-in ones.
Its one and only purpose is to guaranty the position of the tape pointer at a certain point in the program.
All it does it move the tape pointer to addr
and does nothing.
This behaviour is very useful for other instruction that work off of the assumption that the tape pointer is positioned
at a certain cell.
ASUM
(Assume)
Arguments
Name | Type | Description |
---|---|---|
addr | number | the address overwriting the assumed tape pointer position |
This instruction is tightly integrated with the compiler.
What it does is that it overwrites the assumed tape pointer position with the specified addr
.
This can serve either to put the program into a relative state (where the assumed pointer position is wrong)
or to set the assumed pointer position back to a valid value.
When combined with BBOX
it can serve to offset the whole address field.
Think about it, if the real tape pointer position is 1, but you tell the compiler it's 0 through ASUM
,
then the assumed pointer position will be offset by +1.
Trying to access cell no.0 would lead to accessing cell no.1.
When this happens, I like to call this program state the "relative state", all accesses are relative to 1.
So, when this happens, all addresses become relative to the offset.
INCR 1 42;
OUT 1; // returns '*'
// offsets the pointer by +1 (so cells by -1)
BBOX 0;
ASUM 1;
OUT 0; // returns '*'
This may all see like too much pedantry for something as simple as offsetting all addresses by one. Yes, in this case it is, but that's not what makes relative code interesting. Relative programming in basm starts becoming truly interesting when you loop this offset. Once you start looping, your offset cannot be known at compile time. Looping offsets are even the main reason why anyone would want to write relative code as it allows to create "flyers". Flyers are an operation which is applied while moving forwards (or backwards) through the tape. The word "flyer"/"glider" comes from the glider in Conway's Game of Life where it is a self replicating structure moving through the board. In basm, flyers can include data cells which move with the shift in assumed pointer position. I think it is easier to understand them in bf:
[.>]
This is a basic flyer in bf. It outputs every cell until it reaches a cell with a value of 0.
It may seem like it is useless at first, but it can serve to print strings in memory loaded via LSTR
.
This flyer takes advantage of the fact that strings are not constituted of arbitrary data and that they will not contain a zero cell (at least not until you the string's end).
We can easily create this pattern in basm with BBOX
and ASUM
:
// dynamic print string
// Astart is the first cell
// Aend is the 0 cell right after the last non-zero cell
[@DPSTR Astart Aend] [ // "Dynamic Print String"
BBOX Astr;
ASUM 0;
WHNE 0 0 [
OUT 0;
// move forward
BBOX 1;
ASUM 0;
];
// we set the pointer back to the valid assumption after we are done
ASUM Aend;
]
(If you compile the above example with a [main]
field with only DPSTR 0 0;
, it will compile to exactly [.>]
)
Most notably, this instruction will return the assumed pointer position to a valid value after running.
Logic making use of relative state should always use ASUM
to set back the assumed pointer position
to a valid state.
Since all basm code relies on the fact that addressing is static,
leaving it in a relative state will no doubt make your program bug!
RAW
Arguments
Name | Type | Description |
---|---|---|
str | string | inlines the string in the transpiled code, this can be used to include brainfuck operators |
RAW
is inherently relative by default, as any moves made in it is not considered by the compiler.
As thus, writing RAW ">";
has the same effect using BBOX 1; ASUM 0;
like before.
You have access to the whole of bf, which means that you can very easily cause offsets and write flyers.
While I prefer using the more basm'ic solution, RAW
is a valid solution for your relative programming needs.
Next chapter will focus on applying relative knowledge to create a dynamically indexable array.
Dynamically Indexing
We are now here, all the content chapters are after us and we are on the last stretch to making the final project, being a bf interpreter written in basm. But, before being able to go into that directly, I'd like to isolate a part of that bigger program and built it from scratch prior. To make a bf interpreter we will need a dynamically indexable array.
Parking
Yes, I do say dynamically indexable array and not meta-instruction for dynamically indexing. While it would be possible to index any array dynamically without changing its layout, for the sake of conciseness, we will need to modify the layout of a basic array somewhat to allow for our dynamic get/set operations. To hold meta-instruction specific logic we'll reserve a little amount of memory before the actual array content. That memory region in the array is called the "parking".
Since we are going to use a flyer to move through the array, we need somewhere to create it before it starts moving. The reserved space is going to be used for flyer construction. Our implementation of a dynamically indexable array will need 4 cells zeroed cells for parking before the main contents of the array. Getting and setting will use these spaces in different ways.
GETD
(Get Dynamic)
Arguments
Name | Type | Description |
---|---|---|
Aarray | number | the address of the start of the array (including the parking) |
Aindex | number | the address of the cell to be gotten in the array |
Adst | number | the address that will be used to store the gotten cell |
As mentioned prior, we will be using a flyer with state to move through the array.
The data it will need to hold are the index (aka the number of cells it needs to move),
the return index (aka the number of cells it needs to move to get back to the parking)
and, finally, when on the return, it will of course need to contain the gotten cell.
We will also need to reserve an empty space (which we will call [swap]
) to swap array content from the front to the back of our flyer.
Flyer Layout
A diagram of our flyer would look like so:
[swap].[return][index].[element] ->
And then on the return, once index is depleted:
<- [element].[return][cell].[swap]
[index]
holds the distance left until the flyer reaches the index.
It counts down for everytime the glider moves forward.
[return]
holds og_index - [index]
.
Basically everytime [index]
is decreased, we increase [return]
.
Since we slowly consume [index]
to go forward, we need another counter to go backward.
That counter comes in the form of the [return]
cell to be able to return.
In the best of worlds we would just move back until we reach a certain flag, rather than having to store that [return]
index,
but unlike strings, arrays can hold any arbitrary data.
This means that we can't just move back until we meet a cell with a specific value,
as that specific value could be contained in the array itself!
[cell]
holds the value of the cell at the index.
Once [index]
is zeroed by the decrements, [cell]
takes its place.
[swap]
is the destination cell for [element]
.
Since array needs to be moved out of the way in order for the glider to move forward,
an empty cell needs to be reserved for the element to be moved into.
Note that [swap]
needs to be there in the parking so that the cell no.0 can be moved out,
although [[swap]
is not part of the glider.
The [swap]
cell on the flyer should always be zero.
By default,[swap]
is included in our parking space and, as we move forward,
it gets zeroed by the glider moving operation, so[swap]
should always be zero
If it is not, it will corrupt the array when flying through it
as adding into a non-zero cell is no longer a move, like we expect, but an actual addition.
[element]
represents the element of the array right in front of the flyer,
it is not part of the flyer.
It is the first cell of content of the array in the initial glider state.
(Not much to add here...)
(Right now, not all parking cells are used, don't worry we will need all the 4 of them in ADDP
)
Making our Flyer Move
Moving our gliders (yes, I will also call flyers gliders) around is going to be a 4 step operation:
- End if
[index] == 0
, decrease the[index]
by 1, - Move the cell in front of us (
[element]
to our back[swap]
) - Move the whole of the glider cells up by one
- Offset the assumed pointer position by one and repeat
This procedure will shift all the cells before the cell at index to our back, this is fixed on the way back, where we do the same shifting to go backwards, effectively cancelling the shift.
A Simpler INCD
(Increment Dynamic)
Before getting too much into the weeds of implementing this whole thing, let's just stand back a little and write the simplest part of it: a glider that can move to an dynamic index and increase the cell by 1.
That glider will be much simpler! We won't need [return]
or [cell]
, making it a flyer with 1 cell of state.
For context, diagram of our flyer would look like so:
[swap].[index].[element] ->
We won't be able to move backwards without [return]
.
This also means that the assumed pointer position will be busted once the glider has done its thing.
INCD
as written here should be the last thing you do execute your program.
[@INCD Aarray Aindex] [
// we still assume the array has 4 cells of parking
// we load the parking like so: [empty][empty][swap][index]
ADDP Aarray+3 Aindex; // Aindex -> [index]
// entering relative mode
// (since we won't have free unallocated space, we won't be able to use instructions with sp)
BBOX Aarray+2;
ASUM 0;
ALIS Aswap 0;
ALIS Aindex 1;
ALIS Aelement 2;
WHNE Aindex 0 [
DECR Aindex 1; // step 1
ADDP Aswap Aelement; // step 2
ADDP Aindex+1 Aindex; // step 3
// step 4
// (it's very important do this before the end of the scope,
// because we need Aindex to point to the actual index cell so that the WHNE can check it)
BBOX 1;
ASUM 0;
];
// we are here, this means that we moved all we need
// the cell we want is right in front of us at Aelement
INCR Aelement 1;
// normally this is here we would come back to unshift all the array
// and fix the assumed tape pointer position, but we can't since we didn't store a return index!
]
Let's try it by giving it an array with some already initialized values!
// add the INCD definition here...
[main] [
ALIS Aindex 0;
INCR Aindex 3;
ALIS Aarray 1;
ALIS Aarray_contents Aarray+4;
INCR Aarray_contents 1;
INCR Aarray_contents+1 2;
INCR Aarray_contents+2 3;
INCR Aarray_contents+3 4; // the one that will be increased
INCR Aarray_contents+4 5;
INCR Aarray_contents+5 6;
INCD Aarray Aindex;
]
If we use the -d
flag to dump the memory after execution we can see the program state:
-- TAPE STATE NUMERIC --
0: 0
1: 0
2: 0
3: 1
4: 2
5: 3
6: 0
7: 0
8: 5
9: 5
10: 6
Good news, the cell containing 4 was successfully increased to 5! Bad news, we can see the effects of the array shift: every element before the one we operated on got offset by 2. Notably, you can also see the parking (cells 1 to 4) has been partially overwritten by the shifted elements of the array and the cells 6 and 7, being those of our flyer, being completely zeroed as expected.
Don't expect to write these kind of meta-instruction right on the first try, there is always going to be little typos and unforeseen behaviour! Myself, while writing this example, had made many typos and oversights on the first try.
Back to GETD
Alright, enough dilly dallying now! Let's get to actually implementing GETD
with what we have learnt.
The first notable difference of GETD
from what we just wrote is the memory layout,
we already saw what we needed extensively in the layout section.
Just keep in mind that our parking layout should look like this:
[empty][swap][return][index]
Just adding the [return]
cell (and later the [cell]
cell) to our logic isn't too difficult:
[@GETD Aarray Aindex Adst] [
ADDP Aarray+3 Aindex; // Aindex -> [index]
// switch +2 -> +1 because our flyer is bigger
BBOX Aarray+1;
ASUM 0;
ALIS Aswap 0;
ALIS Areturn 1; // add return
ALIS Aindex 2;
ALIS Aelement 3;
WHNE Aindex 0 [
DECR Aindex 1; // step 1
INCR Areturn 1;
ADDP Aswap Aelement; // step 2
ADDP Aindex+1 Aindex; // step 3
// it's important to correctly order the moves,
// so that your glider doesn't overlap itself
ADDP Areturn+1 Areturn;
// step 4
BBOX 1;
ASUM 0;
];
// we arrived!
// we don't have a [index] anymore, but we have a [cell] at its place
ALIS Acell Aindex;
ADDP Acell Aelement; // taking the element
// not done yet...
We are not done here!
The second notable difference from the INCD
is going to be that we want to go back
both to un-shift the cells and reset the assumed pointer position to the correct value.
If we don't do both of these things, GETD
becomes practically the last valid instruction in the program.
To do that we will have to copy the code of going forward and modify it to go back.
It is as simple as making our loop read from [return]
instead of[index]
and reordering the flyer state moves.
// ... rest of GETD implementation
// since we move back, we'll swap [element] and [swap]
ALIS Aswap Aelement;
ALIS Aelement 0;
WHNE Areturn 0 [
DECR Areturn 1; // step 1
// we now have <- [element][return][cell][swap]
// we move back so -1 instead of +1
ADDP Areturn-1 Areturn; // step 3
ADDP Acell-1 Acell;
// step 4
// we inverted BBOX and ASUM values so we can go back
BBOX 0;
ASUM 1;
// step 2 (because of off by 1 reasons, we need it after the move)
ADDP Aswap Aelement;
];
// we are back at parking here and since we know where parking is,
// we can set the assumed pointer back to a valid value!
BBOX Aelement;
ASUM Aarray+1;
ADDP Adst Aarray+3;
]
We can now try the same [main]
as with INCD
before, but with INCD Aarray Aindex;
changed to GETD Aarray Aindex 0;
:
-- TAPE STATE NUMERIC --
0: 4
1: 0
2: 0
3: 0
4: 0
5: 1
6: 2
7: 3
8: 0
9: 5
10: 6
That's exactly what we wanted, the 3rd element of the array has been gotten and moved to cell 0. Notice how none of the array is shifted anymore because of the backtracking we did, perfect!
ADDD
(Add Dynamic)
Arguments
Name | Type | Description |
---|---|---|
Aarray | number | the address of the start of the array (including the parking) |
Aindex | number | the address of the cell to be gotten in the array |
Asrc | number | the address of the cell that will be added to the dynamic one |
Now that we are done with GETD
, we can move onto ADDD
.
This meta-instruction is going to serve as our setter, we won't call it SETD
though,
because it will not zero the cell it adds to, making it more of an add than a set.
This means that pretty much all the rules that we saw for moving with ADDP
's will apply to this, except now in a dynamic fashion.
Flyer Layout
The interesting (and annoying) part of the ADDD
glider, is that rather than carrying a cell value
on the return trip, we do it on the going trip.
This means we can't take advantage of the index being zeroed in the return trip to store the cell value where index stood.
So, sadly, this requires us to make the flyer have 3 reserved data cells at the start
(plus the swap, maxing out our 4 spots of parking).
A diagram of our flyer would look like so:
[swap].[return][index][cell].[element] ->
And then on the return, once index is depleted:
<- [element].[return][empty][empty].[swap]
It's going to be important to keep the [empty]
cells empty, so that we properly un-shift all the cells on the return.
Implementing
ADDD
is very similar to GETD
outside of the flyer layout.
The only notable difference between the two being that the flyer has 3 data cells, and that rather
than taking a cell, it gives one of its cells.
Other than that, you can probably take GETD
and make an ADDD
out of it with ease
like we made GETD
from our knowledge of INCD
(despite being similar in name,
ADDD
is very different from our arguably incomplete INCD
implementation).
With that said, this means that a ADDD
implementation should not be foreign to you.
So, here is its definition:
[@ADDD Aarray Aindex Asrc] [
// this layout forward: [swap].[return][index][cell].[element]
ADDP Aarray+3 Asrc; // Asrc -> [cell]
ADDP Aarray+2 Aindex; // Aindex -> [index]
// we'll remove the +1, because we'll use the full parking
BBOX Aarray;
ASUM 0;
ALIS Aswap 0;
ALIS Areturn 1;
ALIS Aindex 2;
ALIS Acell 3; // new!
ALIS Aelement 4; // .. which offset this by one
WHNE Aindex 0 [
DECR Aindex 1; // step 1
INCR Areturn 1;
ADDP Aswap Aelement; // step 2
ADDP Acell+1 Acell; // step 3 (also, new Acell move!)
ADDP Aindex+1 Aindex;
ADDP Areturn+1 Areturn;
// step 4
BBOX 1;
ASUM 0;
];
// we arrived!
// adding the element out of the flyer
ADDP Aelement Acell;
ALIS Aswap Aelement;
ALIS Aelement 0;
WHNE Areturn 0 [
DECR Areturn 1; // step 1
// we now have <- [element][return][empty][empty][swap]
// (we won't need to move both empty cells)
ADDP Areturn-1 Areturn; // step 3
// step 4
BBOX 0;
ASUM 1;
ADDP Aswap Aelement;
];
BBOX Aelement;
ASUM Aarray; // removed the +1, again
]
Running the following program: ..
// .. ADDD definition here
[main] [
ALIS Aindex 0;
INCR Aindex 3;
// we need to allocate one more cell
ALIS Acell 1;
INCR Acell 40;
ALIS Aarray 2;
ALIS Aarray_contents Aarray+4;
INCR Aarray_contents 1;
INCR Aarray_contents+1 2;
INCR Aarray_contents+2 3;
INCR Aarray_contents+3 4;
INCR Aarray_contents+4 5;
INCR Aarray_contents+5 6;
ADDD Aarray Aindex Acell;
]
.. would expectedly give us this result:
-- TAPE STATE NUMERIC --
0: 0
1: 0
2: 0
3: 0
4: 0
5: 0
6: 1
7: 2
8: 3
9: 44
10: 5
11: 6
With that out of the way, we are ready to tackle the last chapter of this book! In the next chapter, we will use what we just wrote to handle the memory tape array and instruction array reading/setting of our bf interpreter. So, don't throw away these meta-instructions just yet!
Note
These dynamic get/set implementation can probably be improved by you!
Due to the nature of flyers (specifically moving the front element to the back swap),
using long arrays/a lot of dynamic array addresing is not good for performance.
If you can implement something while forgoing using these GETD
/ADDP
, prefer the unglided way.
Furthermore, there is a way to make dynamic indexing for regular arrays without parking. This would require to move some cells out of the way to create a temporary parking and then set them back after the instruction is done. The issue with this, and why this is not the version used here, is because it is a bit more complex. You need to check wether the index refers to one of the cells that you are going to move away for example.
Synthesis: Making a Bf Interpreter
This chapter focuses entirely on the conception process of a bf interpreter in basm. This interpreter will be able in turn to transpile to bf, to make a bf interpreter in bf! It is highly recommended to have read all other chapters before digging into this one.
Meta-instructions "header"
While working on the logic of the interpreter,
I will use meta-instruction which we already defined earlier (notably GETD
and ADDD
).
Make sure to add these meta-instructions definitions before any other code that uses them:
// sets a value to a specific value by zeroing it before writing
[@SET Aaddr Vval] [
ZERO Aaddr;
INCR Aaddr Vval;
]
// copies the content of a cell whilst keeping the source (conservatively)
[@COPC Asrc Adst sp] [
ALIS Atmp sp;
COPY Asrc Adst Atmp;
ADDP Asrc Atmp;
]
// if not equal conditional
[@IFNE Aaddr Vval [scp] sp] [
ALIS Atmp1 sp;
ALIS Atmp2 sp+1;
ALIS sp sp+2;
COPY Aaddr Atmp1 Atmp2;
ADDP Aaddr Atmp2;
WHNE Atmp1 Vval [
ZERO Atmp1;
INLN [scp];
INCR Atmp1 Vval;
];
ZERO Atmp1;
]
// if equal conditional
[@IFEQ Aaddr Vval [scp] sp] [
ALIS Aflag sp;
ALIS sp sp+1;
ALIS Vnot_equal 1;
IFNE Aaddr Vval [
INCR Aflag Vnot_equal;
] sp;
IFNE Aflag Vnot_equal [scp] sp;
ZERO Aflag;
]
// moves the value of the cell at Aarray[value of Aindex] to Adst
[@GETD Aarray Aindex Adst] [
ADDP Aarray+3 Aindex;
BBOX Aarray+1;
ASUM 0;
ALIS Aswap 0;
ALIS Areturn 1;
ALIS Aindex 2;
ALIS Aelement 3;
WHNE Aindex 0 [
DECR Aindex 1
INCR Areturn 1;
ADDP Aswap Aelement;
ADDP Aindex+1 Aindex;
ADDP Areturn+1 Areturn;
BBOX 1;
ASUM 0;
];
ALIS Acell Aindex;
ADDP Acell Aelement;
ALIS Aswap Aelement;
ALIS Aelement 0;
WHNE Areturn 0 [
DECR Areturn 1;
ADDP Areturn-1 Areturn;
ADDP Acell-1 Acell;
BBOX 0;
ASUM 1;
ADDP Aswap Aelement;
];
BBOX Aelement;
ASUM Aarray+1;
ADDP Adst Aarray+3;
]
// adds the value of the cell at Asrc to the Aarray[value of Aindex] cell
[@ADDD Aarray Aindex Asrc] [
ADDP Aarray+3 Asrc;
ADDP Aarray+2 Aindex;
BBOX Aarray;
ASUM 0;
ALIS Aswap 0;
ALIS Areturn 1;
ALIS Aindex 2;
ALIS Acell 3;
ALIS Aelement 4;
WHNE Aindex 0 [
DECR Aindex 1;
INCR Areturn 1;
ADDP Aswap Aelement;
ADDP Acell+1 Acell;
ADDP Aindex+1 Aindex;
ADDP Areturn+1 Areturn;
BBOX 1;
ASUM 0;
];
ADDP Aelement Acell;
ALIS Aswap Aelement;
ALIS Aelement 0;
WHNE Areturn 0 [
DECR Areturn 1;
ADDP Areturn-1 Areturn;
BBOX 0;
ASUM 1;
ADDP Aswap Aelement;
];
BBOX Aelement;
ASUM Aarray;
]
Memory Layout
Our the memory of our bf interpreter's tape consist of 3 parts:
- Generic memory for operations, aka operating memory
- Instruction array (with parking)
- Tape array (with parking)
In memory the sections should look like this:
[operation][instruction][tape]
The tape array will be allowed to grow infinitely, as there is no ranges of allocated allocated cells after it. This is unlike the instruction array, which is stuck in-between the operation and tape array, meaning that we will need to reserve it a static size so that it does not overwrite the tape array's cells. Since we can't expand infinitely before the 0'th cell, as using cells before the 0'th is invalid bf (not the one we are going to interpret, but the one our program is made of), we cannot grow infinitely backwards, only forwards.
Taking User Input
We will want the user to be able to input any bf program they find
and paste it directly into the bf input without any processing on the part of the user.
We'll also need a way for the user to tell us that they are done inputting the program,
for this we can check for a specific character like !
to stop instruction input.
To do this your first though might be to have a counter
representing the number of characters inputted.
When there is an input, use ADDD
to set it to the index of counter and then increase counter by 1.
Repeat until we reach !
.
That is fine, with nothing more. There are some issues, mostly performance (both compute and size), which arise from it. We'll need all the space we can take, and I assure you that any performance gain will be very appreciated.
First, using this method, we will end up storing any non-operator characters in our instruction array.
This both takes up space for no reason and slows down GETD
and ADDD
(the more element in the array, the less efficient they become).
Second, we don't take advantage of the limited nature of the operators.
The set of characters which represent an operator is very restricted.
When storing a +
with this method, we will actually store the +
character, being 43.
Remember, most operations in bf take O(n)
time, where n: value of the cell
Whereas in most languages, operations are in O(1)
time, aka constant time.
This means that, if we can reduce the values representing operators,
this will speed up all operations on operators including indexing them,
since array indexing is just shifting them around (shifting once is O(n)
)
To address these issues, we will discard all non-operators characters and we will map all operators to the lowest values possible. We can also use 0 to denote the start/end of the array contents. Since all cells are zero by default, we won't need to add 0's manually, which is quite nifty. (You could also store the array length to know when you reach the end of the contents, but that will not be possible with our implementation) Here is the mapping:
start / end -> 0
`+` -> 1
`-` -> 2
`>` -> 3
`<` -> 4
`[` -> 5
`]` -> 6
`,` -> 7
`.` -> 8
There is one more issue, although rather small compared to the two first ones.
That being, ADDD
is not made for this at all!
Every time we would add a new operator,
we would need to travel through all the other elements of the array we already added before.
It's not that bad, since instruction input is a one time process,
but this is the best example where GETD
and ADDD
like meta-instruction should be avoided.
So we are going to work around having to use them by making a dedicated instruction array filling flyer.
Writing The Flyer
Instead of having a flyer go out for each operator through the use of ADDD
,
we can simply write a specialized flyer to fill the array.
This input flyer will be able to take advantage of the context to have zero cells of state!
That's right, we don't care about where we are or where we are going, so no index needed.
Furthermore, since the array it will be constructing does not contain arbitrary data,
we can guaranty that certain values will not show up (like 0).
So, we can return without having to store a return cell.
For the return, we will look for 0, which is not mapped to any instructions,
but is present in the parking.
I'll write this logic into a program specific meta-instruction.
A program specific meta-instruction is a meta-instruction which is
only useful within the context of the program.
Custom instructions like COPC
are not program specific,
because they are generic enough to be used in many circumstances.
This is not the case with what we will write.
When writing program specific meta-instructions, I like to give them snake case names, like aliases.
Don't worry meta-instruction names cannot overwrite your aliases.
[@init_input Aarray] [
BBOX Aarray+4;
ASUM 0;
ALIS Aflag 1;
ALIS Vappended 1;
ALIS Vexit 2;
ALIS sp 2;
// we can still use sp, since we know the cells will be zeroed
WHNE Aflag Vexit [
IN 0;
// -- mapping operators --
IFEQ 0 '+' [
SET 0 1;
INCR Aflag Vappended;
] sp;
IFEQ 0 '-' [
SET 0 2;
INCR Aflag Vappended;
] sp;
IFEQ 0 '>' [
SET 0 3;
INCR Aflag Vappended;
] sp;
IFEQ 0 '<' [
SET 0 4;
INCR Aflag Vappended;
] sp;
IFEQ 0 '[' [
SET 0 5;
INCR Aflag Vappended;
] sp;
IFEQ 0 ']' [
SET 0 6;
INCR Aflag Vappended;
] sp;
IFEQ 0 ',' [
SET 0 7;
INCR Aflag Vappended;
] sp;
IFEQ 0 '.' [
SET 0 8;
INCR Aflag Vappended;
] sp;
// check for end
IFEQ 0 '!' [
ZERO 0;
SET Aflag Vexit;
] sp;
// if we added something, we need to move up one
IFEQ Aflag Vappended [
ZERO Aflag;
BBOX 1;
ASUM 0;
] sp;
// we don't clear the last character, even if it was invalid,
// as the next IN will overwrite the cell
];
// NEVER forget cleanup
ZERO Aflag;
// -- going back --
// because the last char will be our zeroed '!', we need to offset by at least 1
// if we don't want to read 0 directly
BBOX 0;
ASUM 1;
// moves back until we reached the zeroed cells of the parking
WHNE 0 0 [
BBOX 0;
ASUM 1;
];
ASUM Aarray+3;
]
As you can see we once again use a flag for control flow. In this case our flag can have three states:
0
: when nothing has happenedVappend
: when a operation is appended to the arrayVexit
: when the '!' character is detected
This flag doesn't count as the flyer having a state properly speaking. It's more like operation memory than state, since we won't carry that flag along for each glider shift.
We also, despite having a flyer, use sp
.
We can safely do this as we can assume that the un-visited parts of the array are still zeroed.
Also, our input safely overwrites the flag when getting user input,
meaning that our code will always consume our flag, leaving no trash on the array.
Once again, taking advantage of safe array assumptions saves us from using ADDD
/GETD
which operate on no assumption and are much less efficient.
Memory Layout in [main]
First and foremost, before defining the main logic we'll define the cells for our program to use.
ALIS Voperating_memory 16;
ALIS Aprog_pointer 0;
ALIS Amem_pointer 1;
ALIS Aflag 2;
ALIS Vexit 1;
ALIS Vbracket 2;
ALIS Aoperator 3;
ALIS Acell 4;
ALIS Abracket_jump 5;
ALIS sp 6;
ALIS Vprog_array_cap 256;
ALIS Aprog Voperating_memory;
// we add 4 to take into account the parking of the program array
ALIS Amem Aprog+Vprog_array_cap+4;
There's alot to unpack here. Here are the most notable ones:
Voperating_memory
This alias defines how many cells are reserved for general computation (that being all but the arrays).
The reserved space granted by Voperating_memory
is required for us to store cells like Aflag
.
It is also required for meta-instruction using sp
so they can have zeroed cells.
Otherwise, without free zeroed cells, meta-instructions would reach into the program array, which would be bad for program state.
Aflag
Aflag
is once again a flag cell. It can have one of these values:
0
: nothing happenedVexit
: program is done and should exitVbracket
: a bracket has jumped, soAprog_pointer
should be updated toAbracket_jump
(this needs to be done after all the other logic, which is why it is stored in the flag)
Aoperator
and Acell
Those two aliased cells will hold both the current operator and current memory cell respectively. Since we can't edit cells directly in their array, we need to keep them in a static place in memory to read/write them.
Aprog
This alias defines the start of the program array (at its parking). The way it is defined is that it comes straight after the operating memory segment, meaning that its address can be defined as the length of the operating memory segment! That means that the starting position of the instruction array is relative to the end of the reserved operation cells range. This would cause an off by 1 error if basm started indexing cells from 1, but it indexes from 0, so we are good!
Aprog
also comes with Vprog_array_cap
aka the capacity of the program array.
We don't inherently need it to define the array,
but we will need to define how far away we want the memory array to be from the program array.
Effectively, choosing how much space we allocated to the program array.
In this example, I set 256 as the array capacity,
because that is the max that you can index with an unsigned 8 bit cells.
If you want to interpret bigger programs, you can increase this capacity,
but beware you will also need to use cells of more than 8 bits to be able to index the instructions!
Amem
All that I specified about Aprog
applies to Amem
.
The value is the start of the array (at its parking).
While the memory array is technically not limited by memory, since it cannot overwrite other memory,
it is still practically limited by the (un)efficiency of indexing
and the maximum value storable in the cells, limiting the number of indexable cells.
Abracket_jump
Is used to store the address of the bracket operation matching the current bracket operation.
(Aka, the [
which is linked to the current ]
or vice versa)
The bracket jumping implementation will not set the program pointer directly to it when running the bracket specific logic,
since the program needs to do some cleanup before updating the Aprog_pointer
.
That cleanup includes setting the current operation back in Aoperator
back in the array.
To do that we need to know the index of that operator, which is always just the index inAprog_pointer
.
This is why we don't change the operator pointer right away as we need to store it to return Aoperator
.
You are probably going to understand it better when you'll see the code managing this for yourself.
Writing [main]
Logic
The bread and butter of our logic is of course going to be a loop. We'll loop over every operator in the program until we reach the end, applying the effects of them as we go. To be more precise here's the broken down version of what the loop does:
- Get the operator at
Aprog_pointer
- Execute the operator specific logic
- Set the operator back
- Increase
Aprog_pointer
Notice how getting the memory cell is not part of the loop, and neither will it be part of operator specific logic.
This is because it would be incredibly wasteful to get and set the cell in the tape for every operator
since the cell mostly remains over operators.
Only >
and <
change the cell index, we will set and get a new cell only when these operators appear.
Think about interpreting the very common pattern +++
.
If we were to get the cell, increase it by one and then put it back 3 times performance would be horrendous!
So, to combat this, Acell
functions more like a cell "cache" then its Aoperator
counterpart.
We only get/set the memory cell when moving the tape pointer with >
and <
.
With all that said, let's start implementing the easy parts of the logic, that being all operators, but the brackets:
// .. add the header here
// .. define init_input here
[main] [
ALIS Voperating_memory 16;
ALIS Aprog_pointer 0;
ALIS Amem_pointer 1;
ALIS Aflag 2;
ALIS Vexit 1;
ALIS Vbracket 2;
ALIS Aoperator 3;
ALIS Acell 4;
ALIS Abracket_jump 5;
ALIS sp 6;
ALIS Vprog_array_cap 256;
ALIS Aprog Voperating_memory;
ALIS Amem Aprog+Vprog_array_cap+4;
// get program from user
init_input Aprog;
WHNE Aflag Vexit [
ALIS Atmp sp;
ALIS sp sp+1;
// 1. load in operator
COPC Aprog_pointer Atmp sp;
GETD Aprog Atmp Aoperator;
// 2. operator specific logic
// + + + + +
IFEQ Aoperator 1 [
INCR Acell 1;
] sp;
// - - - - -
IFEQ Aoperator 2 [
DECR Acell 1;
] sp;
// > > > > >
IFEQ Aoperator 3 [
// store the old cell
COPC Amem_pointer Atmp sp;
ADDD Amem Atmp Acell;
// get the new cell
INCR Amem_pointer 1;
COPC Amem_pointer Atmp sp;
GETD Amem Atmp Acell;
] sp;
// < < < < <
IFEQ Aoperator 4 [
// store the old cell
COPC Amem_pointer Atmp sp;
ADDD Amem Atmp Acell;
// get the new cell
// NOTE: this DECR may underflow the tape pointer
DECR Amem_pointer 1;
COPC Amem_pointer Atmp sp;
GETD Amem Atmp Acell;
] sp;
// TODO: implement '[' and ']'
// , , , , ,
IFEQ Aoperator 7 [
IN Acell;
] sp;
// . . . . .
IFEQ Aoperator 8 [
OUT Acell;
] sp;
// check for the end
IFEQ Aoperator 0 [
INCR Aflag Vexit;
] sp;
// set the operator back
COPC Aprog_pointer Atmp sp;
ADDD Aprog Atmp Aoperator;
// NOTE: we'll add something here later... (ominous)
// increase program pointer
INCR Aprog_pointer 1;
];
]
With that that's seemingly most of the bf interpreter done already!
(with the exception of [
and ]
)
You can take it easy from here we're already on the final stretch.
With this stripped down bf interpreter, we don't have access to looping or conditionals, but we have enough to print some basic characters.
Try the following code: (don't forget to use !
to end program input!)
outputs the answer to life the universe and everything
++++++++++++++++++++++++++++++++++++++++++++++++++++.--.
The "Needlessly" Complicated Part
You may not have understood the part about Abracket_jump
and Vbracket
for Aflag
.
It's time where we need to use them...
Welp, the bracket jumping is logic not actually that bad, it's just that it is the most complicated part of this interpreter by far.
If you equate complexity with badness, like me, this is the worst part, but it ain't that bad.
Don't treat, my implementation is going to be the simplest one possible.
When we get a [
and we point to 0 or when we have ]
and point to a non-zero value,
we follow this procedure to find the index of the opposing matching bracket:
- Initiate a counter at 1
- Initiate a search pointer at the current program pointer
- Add/Subtract the search pointer by 1
- Get the operator at the search pointer
- If the operator is a bracket and it is the same as the original, add 1 to the counter. Else, if the operator is a bracket and the bracket is opposite, remove 1 to the counter
- Set the operator back at the search pointer
- If the counter is non-zero, repeat from number 3
With this procedure, our search counter should always end on the matching bracket!
Or, the program will loop forever/read oob if there is an unmatched bracket, but most bf code doesn't contain unmatched brackets.
So, this is not a issue by one bit.
Once we have completed the procedure the searched search counter, equates to the position of the matching bracket.
We don't want to apply it right away for reasons stated before.
Instead, we will opt into setting the flag to Vbracket
,
so we can apply it after having set the operator back.
Add these IFEQ
instructions to apply bracket logic:
// [ [ [ [ [
IFEQ Aoperator 5 [
IFEQ Acell 0 [
ALIS Acounter sp;
INCR Acounter 1; // 1.
ALIS Asearch_op sp+1;
ALIS sp sp+2;
// 2.
// we don't need to define our search counter here,
// we can use Abracket_jump directly
COPC Aprog_pointer Abracket_jump sp;
WHNE Acounter 0 [
// 3.
INCR Abracket_jump 1;
// 4.
COPC Abracket_jump Atmp sp;
GETD Aprog Atmp Asearch_op;
// 5.
// if op == '['
IFEQ Asearch_op 5 [
INCR Acounter 1;
] sp;
// if op == ']'
IFEQ Asearch_op 6 [
DECR Acounter 1;
] sp;
// 6.
COPC Abracket_jump Atmp sp;
ADDD Aprog Atmp Asearch_op;
]; // 7.
INCR Aflag Vbracket;
] sp;
] sp;
// ] ] ] ] ]
IFEQ Aoperator 6 [
IFNE Acell 0 [
ALIS Acounter sp;
INCR Acounter 1; // 1.
ALIS Asearch_op sp+1;
ALIS sp sp+2;
// 2.
COPC Aprog_pointer Abracket_jump sp;
WHNE Acounter 0 [
// 3.
DECR Abracket_jump 1;
// 4.
COPC Abracket_jump Atmp sp;
GETD Aprog Atmp Asearch_op;
// 5.
// if op == '['
IFEQ Asearch_op 5 [
DECR Acounter 1;
] sp;
// if op == ']'
IFEQ Asearch_op 6 [
INCR Acounter 1;
] sp;
// 6.
COPC Abracket_jump Atmp sp;
ADDD Aprog Atmp Asearch_op;
]; // 7.
INCR Aflag Vbracket;
] sp;
] sp;
And the extra conditional to check the flag and apply the bracket jump: (you should add it at my ominous `NOTE
`)
// if there was a jump, set the pointer to the matching bracket
// (having the pointer increment after is intentional)
IFEQ Aflag Vbracket [
ZERO Aprog_pointer;
ADDP Aprog_pointer Abracket_jump;
ZERO Aflag;
] sp;
Whew... that, was, all. So now, with this enormous blob of bracket logic written, we finally have a 100% functional bf interpreter in basm!!!
Try it out with this fancier "Hello Word!" example: (taken from Wikipedia)
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
What to Improve
You now own a bf interpreter, problem: it's slooooooooow... I highly encourage you to keep on working on this code to get a better handle on the logic that goes behind it. You can probably improve your knowledge of the language overall just by reviewing the interpreter. In implementing tweaks, you will find your own preferred way to approach writing basm code and find how you typically make mistakes. Making mistakes is part of learning, don't forget that!
The bf interpreter may have seemed like an easy project
if you just copied and pasted the code provided in this book, but let me tell you it isn't!
Myself, despite having already written an extremely similar interpreter in basm and despite basm being a language I created,
has struggled with many bugs which were not mentioned in this book for brevity and also for my ego a little.
For example, as I was rewriting every meta-instruction from scratch for the book,
I made a mistake in the original IFEQ
definition. I forgot to zero the flag.
This issue caught up with me when writing this very chapter on the interpreter.
I didn't suspect that IFEQ
, which I had written a long while ago and worked seeming well, would pollute cells.
Now, with my experience and the fact that I already had a working IFEQ
implementation
before writing the book version, you would expect it to be easy to spot that mistake.
It wasn't, especially considering this was not the only mistake I had made on the interpreter.
I won't go over how to implement patches to improve the interpreter beyond what is shown above, but I can point out areas that you can thinker with if you crave performance.
Bulking Operators
Adjacent operators of the same type can be merged into one single 2 cell wide item on the program array.
For example, you could lay bulked operator on the array with this layout: [operator][recurrence]
.
So, [+][+][+][+]
could be represented as [+][4]
saving both space and the time to get those cells.
(Saving space also comes with saving time, as the bigger the array is, the longer it will take to index)
You could also store the precomputed index of the matching partner for brackets instead of recurrence. That would save iterating over the program array every time a bracket jumps.
While at storing items of two cells wide in an indexable array, you could also implement indexable arrays of 2 cell big elements. Firstly, that would save you having to do two gets/sets for each element. Secondly, you could index by element rather than by cell, increasing the length of programs that can be stored with 8 bit cells.
Flipping The Program Array
Currently memory is layed-out as such:
[operating memory].[prog parking][prog content].[mem parking][mem content]
This is perfectly fine if you don't care what the transpiled bf looks like. If you do care about what the compiled bf looks like you should be worried.
The compiled basm (with optimization) looks like this:
>>>>>>>>>>>>>>>>>>>>>--[++<,[->>>+>+<<<<]>>>>[-<<<<+>>>>]<-------------------------------------------[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<+<[-]+>>>][-]---------------------------------------------<[-]<<[->>>+>+<<<<]>>>>[-<<<<+>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<+<[-]++>>>][-]--------------------------------------------------------------<[-]<<[->>>+>+<<<<]>>>>[-<<<<+>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<+<[-]+++>>>][-]------------------------------------------------------------<[-]<<[->>>+>+<<<<]>>>>[-<<<<+>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<+<[-]++++>>>][-]-------------------------------------------------------------------------------------------<[-]<<[->>>+>+<<<<]>>>>[-<<<<+>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<+<[-]+++++>>>][-]---------------------------------------------------------------------------------------------<[-]<<[->>>+>+<<<<]>>>>[-<<<<+>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<+<[-]++++++>>>][-]--------------------------------------------<[-]<<[->>>+>+<<<<]>>>>[-<<<<+>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<+<[-]+++++++>>>][-]----------------------------------------------<[-]<<[->>>+>+<<<<]>>>>[-<<<<+>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<+<[-]++++++++>>>][-]---------------------------------<[-]<<[->>>+>+<<<<]>>>>[-<<<<+>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<[-]>[-]++>>][-]-<[-]<[->>+>+<<<]>>>[-<<<+>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<[-]>>>][-]<[-]<--][-]<<[<]<<<<<<<<<<<<<<<<<-[+<<[->>>>>>+>+<<<<<<<]>>>>>>[->>>>>>>>>>>>>+<<<<<<<<<<<<<]>[-<<<<<<<+>>>>>>>]>>>>>>>>>>>>[->[-<<<+>>>]<[->+<]<+[->+<]>>]>[-<+>]<<[-[-<+>]>[-<+>]<<<[->>>+<<<]>]>[-<<<<<<<<<<<<<<<<+>>>>>>>>>>>>>>>>]<<<<<<<<<<<<<<<<[->>>>>+>+<<<<<<]>>>>>>[-<<<<<<+>>>>>>]<-[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<<+>>>>][-]--<[-]<<<<[->>>>>+>+<<<<<<]>>>>>>[-<<<<<<+>>>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<<->>>>][-]---<[-]<<<<[->>>>>+>+<<<<<<]>>>>>>[-<<<<<<+>>>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<<<<<[->>>>>+>+<<<<<<]>>>[->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<]>>[->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<]>[-<<<<<<+>>>>>>]>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>[->>[-<<<<+>>>>]<[->+<]<[->+<]<+[->+<]>>]>[->+<]<<[-[-<+>]<<[->>>>+<<<<]>]<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<+[->>>>>+>+<<<<<<]>>>>>[->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<]>[-<<<<<<+>>>>>>]>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>[->[-<<<+>>>]<[->+<]<+[->+<]>>]>[-<+>]<<[-[-<+>]>[-<+>]<<<[->>>+<<<]>]>[-<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<][-]----<[-]<<<<[->>>>>+>+<<<<<<]>>>>>>[-<<<<<<+>>>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<<<<<[->>>>>+>+<<<<<<]>>>[->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<]>>[->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<]>[-<<<<<<+>>>>>>]>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>[->>[-<<<<+>>>>]<[->+<]<[->+<]<+[->+<]>>]>[->+<]<<[-[-<+>]<<[->>>>+<<<<]>]<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<-[->>>>>+>+<<<<<<]>>>>>[->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<]>[-<<<<<<+>>>>>>]>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>[->[-<<<+>>>]<[->+<]<+[->+<]>>]>[-<+>]<<[-[-<+>]>[-<+>]<<<[->>>+<<<]>]>[-<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<][-]<[-]<<<<[->>>>>>>+>+<<<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<-----[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<<<<[->>>>>>+>+<<<<<<<]>>>>>>>[-<<<<<<<+>>>>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<+<<<<<<<[->>>>>+>>>>+<<<<<<<<<]>>>>>>>>>[-<<<<<<<<<+>>>>>>>>>]<<[<<+[->+>>>+<<<<]>[->>>>>>>>>>>>>+<<<<<<<<<<<<<]>>>[-<<<<+>>>>]>>>>>>>>>>[->[-<<<+>>>]<[->+<]<+[->+<]>>]>[-<+>]<<[-[-<+>]>[-<+>]<<<[->>>+<<<]>]>[-<<<<<<<<<<<+>>>>>>>>>>>]<<<<<<<<<<<[->>+>+<<<]>>>[-<<<+>>>]<-----[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<+>>>][-]------<[-]<[->>+>+<<<]>>>[-<<<+>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<->>>][-]<[-]<[->>>>>>>>>>>+<<<<<<<<<<<]<<<[->+>>>+<<<<]>[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>[-<<<<+>>>>]>>>>>>>>>[->>[-<<<<+>>>>]<[->+<]<[->+<]<+[->+<]>>]>[->+<]<<[-[-<+>]<<[->>>>+<<<<]>]<<<<<<<<<<]<<<<<++>>>>>>>>][-]<[-]>][-]<[-]<<<<<<[->>>>>>>>>+>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<------[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<<<<<<[->>>>>>>+>+<<<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<[[-]<<+<<<<<<<<<[->>>>>+>>>>>>+<<<<<<<<<<<]>>>>>>>>>>>[-<<<<<<<<<<<+>>>>>>>>>>>]<<[<<<<-[->+>>>>>+<<<<<<]>[->>>>>>>>>>>>>+<<<<<<<<<<<<<]>>>>>[-<<<<<<+>>>>>>]>>>>>>>>[->[-<<<+>>>]<[->+<]<+[->+<]>>]>[-<+>]<<[-[-<+>]>[-<+>]<<<[->>>+<<<]>]>[-<<<<<<<<<+>>>>>>>>>]<<<<<<<<<[->>+>+<<<]>>>[-<<<+>>>]<-----[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<->>>][-]------<[-]<[->>+>+<<<]>>>[-<<<+>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<+>>>][-]<[-]<[->>>>>>>>>+<<<<<<<<<]<<<<<[->+>>>>>+<<<<<<]>[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>[-<<<<<<+>>>>>>]>>>>>>>[->>[-<<<<+>>>>]<[->+<]<[->+<]<+[->+<]>>]>[->+<]<<[-[-<+>]<<[->>>>+<<<<]>]<<<<<<<<]<<<<<<<++>>>>>>>>>][-]>][-]-------<[-]<<<<<<<<[->>>>>>>>>+>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<<<<<<,>>>>>>>>][-]--------<[-]<<<<<<<<[->>>>>>>>>+>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<<<<<<.>>>>>>>>][-]<[-]<<<<<<<<[->>>>>>>>>+>+<<<<<<<<<<]>>>>>>>>>>[-<<<<<<<<<<+>>>>>>>>>>]<[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<<<<<<<<+>>>>>>>>>>][-]<[-]<<<<<<<<<<<[->>>>>>+>>>>>+<<<<<<<<<<<]>>>[->>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<]>>>[->>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>[-<<<<<<<<<<<+>>>>>>>>>>>]>>>>>>>[->>[-<<<<+>>>>]<[->+<]<[->+<]<+[->+<]>>]>[->+<]<<[-[-<+>]<<[->>>>+<<<<]>]<<<<<<<<<<<<<<<[->>>>>>>>>>+>+<<<<<<<<<<<]>>>>>>>>>>>[-<<<<<<<<<<<+>>>>>>>>>>>]<--[[-]<+>][-]<[->+>+<<]>>[-<<+>>]<-[[-]<<<<<<<<<<<<[-]>>[-]>>>[-<<<<<+>>>>>]>>>>>>>][-]<[-]<<<<<<<<<<<+>>-]+
This is 11567 operators. Large spans of them being moves of the tape pointer. This inefficiency is caused by the fact that everytime we want to do something, like move a value from the operating memory to the memory parking, we need to move across the whole the program array.
We can fix this by moving around the memory layout a little bit:
[prog content][prog parking].[operating memory].[mem parking][mem content]
Note the program array being both before the operating memory AND being flipped. (If we had kept the parking before the content, we would still need to jump over all of the content of the program array, except, this time, instead of when it would be to access the memory, it would be to access the program) That new layout saves us the jumps over the program array, but still keeps the program array capacity limit.
This is much better in concept, but is quite annoying in execution. You won't be able to use the same meta-instructions to get/set the program and memory arrays due the fact that the array is flipped. This means two new whole instructions that will need debugging. It also causes a huge amount of code duplication, since you probably just need to change the assumed pointer increments by decrements and decrements by increments for the most part. If you make it, please don't show it to "Don't Repeat Yourself" (aka DRY) people!
While this would probably save many operations, it still would not be enough to cram the interpreter in itself without a little finagling.
In the advent you would want to make this bf interpreter run itself,
you would need to find a way to move from 0 to the operating memory segment while using >
less than Vprog_array_cap
to move over the program array.
Not being able to do so would mean that your interpreter is always bigger than its program array.
This is of course something you want to avoid if you were to want to run it on itself.
You can fix this issue with the usage of flyer, but I will leave this for you to solve.
Get a Better Bf Interpreter / Use a Compiler
Finally, if you want better performance, don't use the bf interpreter built-in with the basm
cli
or, generally, just don't use an interpreter and get a compiler.
While I believe the built-in interpreter is capable of running an helping somewhat with debugging via
its dump feature (with flag -d
), it is not the best for performance in no way.
If you want to get just the transpiled basm code, use basm compile
rather than basm run
.
You can then use that transpiled bf output in another bf interpreter / compiler. (What I mean by "compiler", is something that compiles to machine code)
Conclusion
You've now learnt possibly one of the most useless languages of the world, should I congratulate you? Yes! (but you should really find other things to do with your time)
If you still have unanswered questions, want to contribute to this book or want to contribute to the basm programming language head over to the github repository.
Full Code
For your copy-pasting convinience, here is the full file for the bf interpreter as implemented in this book:
// sets a value to a specific value by zeroing it before writing
[@SET Aaddr Vval] [
ZERO Aaddr;
INCR Aaddr Vval;
]
// copies the content of a cell whilst keeping the source (conservatively)
[@COPC Asrc Adst sp] [
ALIS Atmp sp;
COPY Asrc Adst Atmp;
ADDP Asrc Atmp;
]
// if not equal conditional
[@IFNE Aaddr Vval [scp] sp] [
ALIS Atmp1 sp;
ALIS Atmp2 sp+1;
ALIS sp sp+2;
COPY Aaddr Atmp1 Atmp2;
ADDP Aaddr Atmp2;
WHNE Atmp1 Vval [
ZERO Atmp1;
INLN [scp];
INCR Atmp1 Vval;
];
ZERO Atmp1;
]
// if equal conditional
[@IFEQ Aaddr Vval [scp] sp] [
ALIS Aflag sp;
ALIS sp sp+1;
ALIS Vnot_equal 1;
IFNE Aaddr Vval [
INCR Aflag Vnot_equal;
] sp;
IFNE Aflag Vnot_equal [scp] sp;
ZERO Aflag;
]
// moves the value of the cell at Aarray[value of Aindex] to Adst
[@GETD Aarray Aindex Adst] [
ADDP Aarray+3 Aindex;
BBOX Aarray+1;
ASUM 0;
ALIS Aswap 0;
ALIS Areturn 1;
ALIS Aindex 2;
ALIS Aelement 3;
WHNE Aindex 0 [
DECR Aindex 1;
INCR Areturn 1;
ADDP Aswap Aelement;
ADDP Aindex+1 Aindex;
ADDP Areturn+1 Areturn;
BBOX 1;
ASUM 0;
];
ALIS Acell Aindex;
ADDP Acell Aelement;
ALIS Aswap Aelement;
ALIS Aelement 0;
WHNE Areturn 0 [
DECR Areturn 1;
ADDP Areturn-1 Areturn;
ADDP Acell-1 Acell;
BBOX 0;
ASUM 1;
ADDP Aswap Aelement;
];
BBOX Aelement;
ASUM Aarray+1;
ADDP Adst Aarray+3;
]
// adds the value of the cell at Asrc to the Aarray[value of Aindex] cell
[@ADDD Aarray Aindex Asrc] [
ADDP Aarray+3 Asrc;
ADDP Aarray+2 Aindex;
BBOX Aarray;
ASUM 0;
ALIS Aswap 0;
ALIS Areturn 1;
ALIS Aindex 2;
ALIS Acell 3;
ALIS Aelement 4;
WHNE Aindex 0 [
DECR Aindex 1;
INCR Areturn 1;
ADDP Aswap Aelement;
ADDP Acell+1 Acell;
ADDP Aindex+1 Aindex;
ADDP Areturn+1 Areturn;
BBOX 1;
ASUM 0;
];
ADDP Aelement Acell;
ALIS Aswap Aelement;
ALIS Aelement 0;
WHNE Areturn 0 [
DECR Areturn 1;
ADDP Areturn-1 Areturn;
BBOX 0;
ASUM 1;
ADDP Aswap Aelement;
];
BBOX Aelement;
ASUM Aarray;
]
[@init_input Aarray] [
BBOX Aarray+4;
ASUM 0;
ALIS Aflag 1;
ALIS Vappended 1;
ALIS Vexit 2;
ALIS sp 2;
// we can still use sp, since we know the cells will be zeroed
WHNE Aflag Vexit [
IN 0;
// -- mapping operators --
IFEQ 0 '+' [
SET 0 1;
INCR Aflag Vappended;
] sp;
IFEQ 0 '-' [
SET 0 2;
INCR Aflag Vappended;
] sp;
IFEQ 0 '>' [
SET 0 3;
INCR Aflag Vappended;
] sp;
IFEQ 0 '<' [
SET 0 4;
INCR Aflag Vappended;
] sp;
IFEQ 0 '[' [
SET 0 5;
INCR Aflag Vappended;
] sp;
IFEQ 0 ']' [
SET 0 6;
INCR Aflag Vappended;
] sp;
IFEQ 0 ',' [
SET 0 7;
INCR Aflag Vappended;
] sp;
IFEQ 0 '.' [
SET 0 8;
INCR Aflag Vappended;
] sp;
// check for end
IFEQ 0 '!' [
ZERO 0;
SET Aflag Vexit;
] sp;
// if we added something, we need to move up one
IFEQ Aflag Vappended [
ZERO Aflag;
BBOX 1;
ASUM 0;
] sp;
// we don't clear the last character, even if it was invalid,
// as the next IN will overwrite the cell
];
// NEVER forget cleanup
ZERO Aflag;
// -- going back --
// because the last char will be '!', we need to offset by at least 1
BBOX 0;
ASUM 1;
// moves back until we reached the zeroed cells of the parking
WHNE 0 0 [
BBOX 0;
ASUM 1;
];
ASUM Aarray+3;
]
[main] [
ALIS Voperating_memory 16;
ALIS Aprog_pointer 0;
ALIS Amem_pointer 1;
ALIS Aflag 2;
ALIS Vexit 1;
ALIS Vbracket 2;
ALIS Aoperator 3;
ALIS Acell 4;
ALIS Abracket_jump 5;
ALIS sp 6;
ALIS Vprog_array_cap 256;
ALIS Aprog Voperating_memory;
ALIS Amem Aprog+Vprog_array_cap+4;
// get program from user
init_input Aprog;
WHNE Aflag Vexit [
ALIS Atmp sp;
ALIS sp sp+1;
// 1. load in operator
COPC Aprog_pointer Atmp sp;
GETD Aprog Atmp Aoperator;
// 2. operator specific logic
// + + + + +
IFEQ Aoperator 1 [
INCR Acell 1;
] sp;
// - - - - -
IFEQ Aoperator 2 [
DECR Acell 1;
] sp;
// > > > > >
IFEQ Aoperator 3 [
// store the old cell
COPC Amem_pointer Atmp sp;
ADDD Amem Atmp Acell;
// get the new cell
INCR Amem_pointer 1;
COPC Amem_pointer Atmp sp;
GETD Amem Atmp Acell;
] sp;
// < < < < <
IFEQ Aoperator 4 [
// store the old cell
COPC Amem_pointer Atmp sp;
ADDD Amem Atmp Acell;
// get the new cell
// NOTE: this DECR may underflow the tape pointer
DECR Amem_pointer 1;
COPC Amem_pointer Atmp sp;
GETD Amem Atmp Acell;
] sp;
// [ [ [ [ [
IFEQ Aoperator 5 [
IFEQ Acell 0 [
ALIS Acounter sp;
INCR Acounter 1; // 1.
ALIS Asearch_op sp+1;
ALIS sp sp+2;
// 2.
// we don't need to define our search counter here,
// we can use Abracket_jump directly
COPC Aprog_pointer Abracket_jump sp;
WHNE Acounter 0 [
// 3.
INCR Abracket_jump 1;
// 4.
COPC Abracket_jump Atmp sp;
GETD Aprog Atmp Asearch_op;
// 5.
// if op == '['
IFEQ Asearch_op 5 [
INCR Acounter 1;
] sp;
// if op == ']'
IFEQ Asearch_op 6 [
DECR Acounter 1;
] sp;
// 6.
COPC Abracket_jump Atmp sp;
ADDD Aprog Atmp Asearch_op;
]; // 7.
INCR Aflag Vbracket;
] sp;
] sp;
// ] ] ] ] ]
IFEQ Aoperator 6 [
IFNE Acell 0 [
ALIS Acounter sp;
INCR Acounter 1; // 1.
ALIS Asearch_op sp+1;
ALIS sp sp+2;
// 2.
COPC Aprog_pointer Abracket_jump sp;
WHNE Acounter 0 [
// 3.
DECR Abracket_jump 1;
// 4.
COPC Abracket_jump Atmp sp;
GETD Aprog Atmp Asearch_op;
// 5.
// if op == '['
IFEQ Asearch_op 5 [
DECR Acounter 1;
] sp;
// if op == ']'
IFEQ Asearch_op 6 [
INCR Acounter 1;
] sp;
// 6.
COPC Abracket_jump Atmp sp;
ADDD Aprog Atmp Asearch_op;
]; // 7.
INCR Aflag Vbracket;
] sp;
] sp;
// , , , , ,
IFEQ Aoperator 7 [
IN Acell;
] sp;
// . . . . .
IFEQ Aoperator 8 [
OUT Acell;
] sp;
// check for the end
IFEQ Aoperator 0 [
INCR Aflag Vexit;
] sp;
// set the operator back
COPC Aprog_pointer Atmp sp;
ADDD Aprog Atmp Aoperator;
IFEQ Aflag Vbracket [
ZERO Aprog_pointer;
ADDP Aprog_pointer Abracket_jump;
ZERO Aflag;
] sp;
// increase program pointer
INCR Aprog_pointer 1;
];
]
Addendum - [setup] scope
After having fully written this book and as I started working on an improved version of the current bf interpreter (the naive version of which is the one discussed in this very book), I finally caved to implement something which had been bugging my mind for a long time: a way to have global aliases. Such a feature always was on the list of things to introduce into the language had I had enough dedication and time, and as things went forth and as I made, quite honestly, fairly simple programs I thought that such a thing would not be required for v1.0. The full breath of consequences of this retrospectively earnestly stupid decision of mine only came down onto me as finally undertook a true "project of size", which was as previously stated the improved version of the bf interpreter in basm. Such an undertaking required to refactor and modify many values which were repeated all over, like the ids associated with each bf operator and the size of dynamically readable arrays.
So, to solve this I added one more field: the [setup]
field.
Such a field had always been in my notes, only it is now that I see that this field is truely required to make a workable language.
What is it?
The [setup]
field is very similar to [main]
in function. It's scope contains zero or more scopes or instructions.
Any code included within [setup]
will be put right before [main]
.
There can only be one [setup]
field.
Right about now this field sounds pretty useless: "Just another place to put code in..."
Well, not really. Where [setup]
shines is in it's two unique characteristics:
- It is normalized before any other fields, and
- All aliases contained within the lowest scope are available from anywhere in the file
"It is normalized before any other fields"
This statement may not seem like too much, but it's ramifications are immense.
At least if you can understand it. In the case of basm, normalizing means replacing aliases by their value.
This means that meta-instructions are not normalized before [setup]
.
Then logically, meta-instructions cannot be used in [setup]
.
To be more precise, meta-instructions are not declared when [setup]
is normalized,
so if there are any meta-instructions they will raise an error stating that they are undefined.
Despite the fact that they may be defined elsewhere in the file, they are not defined at the time of [setup]
normalization.
A meta-instruction cannot be inlined as it is not defined!
This clause is limits quite a lot your ability to put logic within this field, however it is required for the second much, more empowering characteristic, of [setup]
.
"All aliases contained within the lowest scope are available from anywhere in the file"
This does what it says on the tin.
Any scope, including most importantly the scopes of meta-instructions, can access aliases defined in [setup]
unless that alias has been shadowed in scopes lower than the current one.
That behaviour allows the creation of aliases available through the whole program, practically creating global aliases.
Although, the wording on this is a little clumsy.
"anywhere in the file", in this context, means that they are available in any scope other than the ones contained within [setup]
.
Of course, [setup]
could not reference an alias which it has not yet created.
Here's an example of the global aliases at work:
[setup] [
// the idiomatic way to name global aliases is to prefix with "G"
ALIS GVfrob 42;
ALIS GVdefault 10;
// any built-in instruction can be inserted here,
// but it is not recommended to do so for anything other than value setting
]
[@FROB Acell] [
INCR Acell GVfrob;
]
[main] [
ALIS Acell 0;
INCR Acell GVdefault;
FROB Acell;
OUT Acell; // should return 52
// overwriting the global (only affects this scope and subscopes)
ALIS GVdefault 5;
// this will overwrite the global in this scope, but not in the meta
// since meta-instructions aliases are independent from the caller's
ALIS GVfrob 0;
ALIS Acell 1;
INCR Acell GVdefault;
FROB Acell;
OUT Acell; // should return 47 (despite the fact that we set GVfrob to 0 in main)
]
Built-in Instructions
Here is the list of all built-in instructions.
Name | Arguments | Function |
---|---|---|
ZERO | addr | sets the value ofaddr to 0 |
INCR | addr, value | increments the value of theaddr cell by value |
DECR | addr, value | decrements the value of theaddr cell by value |
ADDP | addr1, addr2 | addsaddr2 to addr1 , the result is stored in addr1 (in place) |
SUBP | addr1, addr2 | subtractaddr2 from addr1 , the result is stored in addr1 (in place) |
COPY | addr1, addr2, addr3 | copies the value ofaddr1 into addr2 and addr3 |
WHNE | addr, value, [scope] | while the value ofaddr cell is not equal to value runs the [scope] . addr is not consumed |
IN | addr | takes input from the user and sets it inaddr , behaviour will vary between bf implementations |
OUT | addr | outputs the value ofaddr , addr is not consumed |
LSTR | start_addr, "str" | loads the string character by character into cells from thestart_addr advancing forward |
PSTR | addr, "str" | prints the string character by character using the celladdr as a buffer |
ALIS | ident, value or [scope] | creates an alias to a value or scope namedident . This instruction is purely abstraction |
INLN | [scope] | inlines a scope |
RAW | "str" | includes the string after transpilation, this can be used to include brainfuck operators |
BBOX | addr | moves the tape pointer toaddr |
ASUM | addr | tells to compiler to assume that the tape pointer is ataddr . If that assumption is wrong all cells accesses will be offset |
Keywords
Here is a list of keywords specific to basm and their meaning.
allocated
: a cell is said to be "allocated" when it is reserved/used by an operationdynamic/relative
: an access to memory which the address is not known to the compiler at compile time. It can vary over executions.address number
: a value of numeric type in the basm source code file, which represents the address of a cell.pure number
: a value of numeric type in the basm source code file, which represents a number.static
: an access to memory which the address is known to the compiler at compile time. It does not vary over executions.inlining
: expanding the code referred to in the caller's body, all meta-instructions inline in basm.
Cli Capabilities
Alongside the language itself, I'd like to include some information about the transpiler I wrote for it, because, of course, there's no point in writing a book about a compilerless language.
The only (probably forever) basm compiler is
available on github
under the retrospectively confusing name basm
.
This tool doesn't just include a compiler,
it also comes with a bf interpreter to run transpiled basm directly.
Currently, the cli has two subcommands: compile
and run
.
Both of them are similar in that they both take in a file path and have some flags in common.
compile
The compile
subcommand is the simplest way to use the basm cli.
Similar to tool like gcc
, it will create a file in the current working directory named like the one passed in with the extension replaced with .bf
.
File name and output path can be specified with the -o
flag.
If compilation fails, error information will be printed to the terminal.
Flags
Shorthand | Longhand | Description |
---|---|---|
-o | --out <OUT> | The path to put the compiled file |
-p | --show | Print the transpiled bf script |
-u | --unoptimized | Skips the use of the inbuilt brainfuck optimizer |
-h | --help | Print help |
run
This command compiles then run the specified basm source file.
You can use the bf interpreter directly if you use this command the -r
flag,
which specifies that the file is a bf source file, not a basm source file.
Unlike compile
, this does not create a file containing the compiled bf.
If the compilation fails, error information will be printed to the terminal.
Flags
Shorthand | Longhand | Description |
---|---|---|
-c | --cell-size <CELL_SIZE> | Sets the size of cells in bits (only 8, 16 and 32) [default: 8] |
-i | --signed | Sets the cells as signed containing signed numbers |
-a | --abort-overflow | Aborts the execution of the program when a cell over/under-flows |
-t | --tape-limit <TAPE_LIMIT> | Limits the lenght of the tape in cells, aborts execution if it is reached |
-n | --number-input | Treats the input as integer numbers rather than characters |
-m | --number-output | Treats the output as integer numbers rather than characters |
-s | --single-input | Removes the bufferring of the unused parts of inputs to provide to them to later bf inputs |
-r | --raw | Interprets the file as brainfuck, skips the compiling process |
-p | --show | Print the transpiled brainfuck |
-u | --unoptimized | Skips the use of the inbuilt brainfuck optimizer |
-d | --dump | Dump the tape and tape pointer position to terminal once the program ends (includes by erroring out) |
-h | --help | Print help |
Bf Optimizations
basm
applies some basic optimizations to the bf resulting from the transpilation process by default.
It can merge operators to reduce redundant use (e.g: +++-
would turn into ++
)
and it can reorder operations so that less tape pointer moves are used.
Overall, the purpose of the built in optimizer is to reduce the number of operators in compiled scripts.
By default, basm doesn't remove redundant operations and is generally quite stupid.
For example, an instruction like INCR 12 0;
, which does nothing,
would still move the tape pointer to the 12th cell. This gets optimized out.
All optimization are made to have no impact on the resulting program, at a few exceptions:
- Willingfully moving the tape pointer before 0 will be optimized out if there is a cancelling move afterwards (e.g:
<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>
) - Any pointer moves that don't lead to other operations (at the end of the program) are removed
If you suspect that optimizations are messing with your program,
you can disable them with the -u
flag.