Collapse OS Documentation Browser


asm/ code/ hw/ avr.txt blk.txt bootstrap.txt cross.txt dict.txt ed.txt faq.txt grid.txt hal.txt impl.txt intro.txt me.txt primer.txt protocol.txt rsh.txt sega.txt selfhost.txt usage.txt

Collapse OS usage guide

If you already know Forth, start here. Otherwise, read
primer.txt first.

We begin with a few oddities in Collapse OS compared to tradi-
tional forths, then cover higher level operations.


Both () and \ comments are supported. The word "(" begins a
comments and ends it when it reads a ")" word. It needs to be a
word, that is, surrounded by whitespaces. "\" comments the rest
of the line.

Cell size and memory map

Cell size is hardcoded to 16-bit. Endian-ness is arch-dependent
and core words dealing with words will read-write according to
native endian-ness.

Memory is filled by 4 main zones:

1. Boot binary: the binary that has to be present in memory at
   boot time. When it is, jump to the first address of this bin-
   ary to boot Collapse OS. This code is designed to be able to
   run from ROM: nothing is ever written there.
2. Work RAM: As much space as possible is given to this zone.
   This is where HERE begins.
3. SYSVARS: Hardcoded memory offsets where the core system
   stores its things. It's $80 bytes in size. If drivers need
   more memory, it's bigger. See impl.txt for details.
4. PS+RS: Typically around $100 bytes in size, PSP starts at
   the top and RSP starts at the bottom and both stacks "grow"
   against each other. When they meet, it's a stack overflow.

Unless there are arch-related constraints, these zones are
placed in that order (boot binary at addr 0, PSP at $ffff).

Number Literals

Whenever a word is parsed in the interpreter loop, we first try
parsing the word as a number literal. There are 3 literal types.

1. A 100% digits number is parsed as a decimal.
2. A string starting with $ is parsed as hexadecimal ($ab12).
3. A character inside quotes is parsed as that character ('A').

Strings and lines

Strings in Collapse OS are an array of characters in memory
associated with a length. There are no termination.

This length, when refering to that string in the different
string handling words, is usually passed around as a separate
argument in PS. It is common to see "sa sl", "sa" being the
string's address, "sl" being its length.

How that "sl" is encoded depends on the situation. For example,
the LIT" word, which writes the enclosed string and, at runtime,
yields "sa sl", is wrapped around a branch word (so that the
string isn't evaluated by forth) followed by 2 number literals.

When we refer to a "line", it's a string that is of size LNSZ,
a constant that is always 64. It corresponds to the size of the
input buffer and to the size of a line in a Block (16 lines per

Because those lines have a fixed length, we sometimes want to
know the length of the actual content in it (for example, to
EMIT it). When we do so, for example in LNLEN, we go through the
whole line and check when is that last visible character, that
is, the last one that is higher than $20 (space). That's where
our line ends.

We don't use any termination character for lines, it's too
messy.  Blocks might not have them, and when we want to display
lines in a visual mode (that is, always the full 64 characters
on the screen), we need complicated CR handling. It's simpler
to fill lines in blocks with spaces all the way.


For simplicity purposes, numbers are generally considered
unsigned. For convenience, decimal parsing and formatting
support the "-" prefix, but under the hood, it's all unsigned.

This leads to some oddities. For example, "-1 0 <" is false.
To compare whether something is negative, use the "0<" word
which is the equivalent to "$7fff >".


Branching in Collapse OS is limited to 8-bit. This represents
64 word references forward or backward. While this might seem
a bit tight at first, having this limit saves us a non-
negligible amount of resource usage.

The reasoning behind this intentional limit is that huge
branches are generally an indicator that a logic ought to be
simplified. So here's one more constraint for you to help you
towards simplicity.

Interpreter and I/Os

Collapse OS' main I/O loop is line-based. INTERPRET calls WORD
which then iterates over the current "input buffer" (INBUF) for
characters to eat up. That input buffer is generally a 64 char-
acter space in SYSVARS where typed characters are buffered from
KEY, but that's not always the case.

During a LOAD, the input buffer pointer changes and points to
one of the 16 lines of the BLK buffer. WORD eats it up just the
same, but it ain't coming from KEY anymore. When the 16th line
is read, we come back to the regular program.

Back to KEY. It always yields a characters, which means it
blocks until it yields. It loops over KEY? which returns a
flag telling us whether a key is pressed, and if there is one,
the character itself.

KEY? is an ALIAS which points to a driver implementing this
routine. It can also be overridden at runtime for nice tricks.
For example, if you want to control your computer from RS-232,
you can do "' RX<? *TO KEY?".

Interpreter output is unbuffered and only has EMIT. This word
can also be overriden, mostly as a companion to the raison
d'etre of your KEY? override.

Interpreting and compiling words

When the INTERPRET loop reads from INPUF, it separates its input
in words which yields chunks of characters.

Whenever we have a word, we begin by checking if it's a number
literal with PARSE. If yes, push it on the stack and get next
word. Otherwise, check if the word exists in the dictionary.
If yes, EXECUTE. Otherwise, it's a "word not found" error.

Compiling words with ":" follows the same logic, except that
instead of putting literals on the stack, it compiles them with
LITN and instead of executing words, it writes their address
down (except immediates, which are executed).

This "PARSE then FIND" order is the opposite of many traditional
Forths, which generally go the other way around. This is because
traditional forths often don't have hexadecimal prefixes for
their literals and the "PARSE then FIND" order would prevent the
creation of words like "face", "beef", cafe", etc. This is not
a problem we have in Collapse OS.

"PARSE then FIND" is faster because it saves us a dictionary
lookup when parsing a literal.

Native words

Native words can be assembled in two ways.

With the proper assembler loaded in memory, you can compile
words that directly execute native code. See doc/asm/intro.txt.

Otherwise, without anything loaded, you can use the HAL to
generate native code. See doc/hal.txt.

Native words are created with CODE:

CODE foo 42 i>w, PUSHp, ;CODE \ same as : foo 42 ;

Native code can also be inlined in a regular word:

: foo 42 CODE[ INCw, ]CODE . ; \ prints 43


Cell access with @ becomes heavy in cases where a cell is read
at many places in the code and seldom written to. It is also

Collapse OS has a special "value" word type which is very
similar to a cell, but instead of pushing the cell's address to
PS, it reads the value at that address and pushes it to PS in
a much faster and lighter way than "MYVAR @". You create such
word with VALUE:

FOO . \ prints 42

Modifying that value is a bit less straightforward than with
a regular cell, but can be done with TO:

FOO . \ prints 43

Traditional forths have CONSTANT, modern forths have CONSTANT
and VALUE, Collapse OS only has VALUE.

There's an additional word that facilitates the declaration of
multiple values: VALUES. You call it with the number of values
to declare an then type down the associations, like this:

3 VALUES FOO 42 BAR $55 BAZ $1234

Values can be either a literal value or a single word yielding
a single value on PS. If you want to assign a complex result to
a value, you have to use VALUE.

To set a value in a compiled word, use [TO] instead of TO.

There is also *VALUE, which is an indirect version of VALUE. It
contains an address, and calling it returns the value at that
address. It has companion *TO and [*TO] words. To change the
indirect value. Using TO on it changes the address.


A common pattern in Forth is to add an indirection layer with
a pointer word. For example, if you have a word "FOO" for
which you would like to add an indirection layer, you would
rename "FOO" to "_FOO", add a variable "FOO*" pointing to
"_FOO" and re-defining "FOO" as ": FOO FOO* @ EXECUTE".

This is all well and good, but it is resource intensive and
verbose, which make us want to avoid this pattern for words
that are often used.

For this purpose, Collapse OS has two special word types:
ALIAS and *ALIAS (indirect ALIAS).

An alias is a variable that contains a pointer to another word.
When invoked, we invoke the specified pointer with minimal over-
head. Using our FOO example above, we would create an alias with
"' _FOO ALIAS FOO". Invoking FOO will then invoke "_FOO".

An alias is structured exactly like a VALUE and you can change
the alias' pointer with TO like this:

' BAR TO FOO \ FOO now invokes BAR.

Aliases execution takes place in native code and is much, much
faster than the "@ EXECUTE" form.

A ialias is like an alias, but with a second level of indi-
rection, exactly like a *VALUE. 

*ALIAS and *VALUE are used by core code which point to hardcoded
addresses in RAM (because the core code is designed to run from
ROM, we can't have regular variables). You are unlikely to need
*ALIAS or *VALUE in regular code.

Mass storage through disk blocks

Collapse OS can access mass storage through its BLK subsystem.
See doc/blk.txt for more information.

Useful little words

In Collapse OS, we try to include as few words as possible into
the cross-compiled core, making it minimally functional for
reaching its design goals.

However, in its source code, it has a section of what is called
"Useful little words" at B120 and you'll probably want to load
some of them quite regularly because they make the system more


B122 provides the word "context" allowing multiple dictionaries
to exist concurrently. This allows you to develop applications
without having to worry too much about name clashes because
those names exist in separate namespaces.

A context is created with a name like this:

context foo \ creates context "foo"

When a context is created, it is "branched off" CURRENT as it
was at the moment the context was created.

To activate a context, call its name (in the case, "foo"). This
will do two things:

1. Save CURRENT in the previously active context.
2. Restore CURRENT to where it was the last time "foo" was
   active (or created).

Note that creating a context doesn't automatically activate it.


In traditional forths, DOES> is often used with CREATE. Not in
Collapse OS. To use the DOES> word, you must pair it with DOER.
See doc/primer.txt for details.

Collapse OS and its documentation are created by Virgil Dupras and licensed under the GNU GPL v3.

This documentation browser by James Stanley. Please report bugs on github or to

This page generated at 2021-09-19 21:05:03 from documentation in CollapseOS snapshot 20210917.