Collapse OS usage guide
If you already know Forth, start here. Otherwise, read
We begin with a few oddities in Collapse OS compared to tradi-
tional forths, then cover higher level operations.
Both () and \ comments are supported. The word "(" begins a
comment, which ends at the next ")" word. That ")" needs to be
a word, that is, surrounded by whitespace. "\" comments out the
rest of the line.
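Here is a small Python model of that whitespace rule. It is not Collapse OS code. It filters the words of a single line the way the two comment forms are described above. The function name strip_comments is hypothetical, and the model only looks at one line.

```python
def strip_comments(line):
    """Model of comment handling on one line of input.

    Words are whitespace-separated. A "(" word starts a comment that
    ends at the next ")" word; a "\\" word discards the rest of the line.
    """
    out = []
    in_paren = False
    for word in line.split():
        if in_paren:
            if word == ")":       # only a standalone ")" closes the comment
                in_paren = False
            continue
        if word == "(":
            in_paren = True
        elif word == "\\":        # rest of the line is ignored
            break
        else:
            out.append(word)
    return out
```

For example, "(foo)" is not a comment at all because "(" is not a separate word. It would be handed to the interpreter like any other word.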
Cell size and memory map
Cell size is hardcoded to 16-bit. Endianness is arch-dependent
and core words that read or write cells do so in the native
byte order.
Memory is divided into 4 main zones:
1. Boot binary: the binary that has to be present in memory at
boot time. When it is, jump to the first address of this bin-
ary to boot Collapse OS. This code is designed to be able to
run from ROM: nothing is ever written there.
2. Work RAM: As much space as possible is given to this zone.
This is where HERE begins.
3. SYSVARS: Hardcoded memory offsets where the core system
stores its things. It's $80 bytes in size, or bigger if
drivers need more memory. See impl.txt for details.
4. PS+RS: Typically around $100 bytes in size. Their
implementation is entirely arch-specific. Overflows aren't
checked, but PS underflows are checked through SCNT.
Unless there are arch-related constraints, these zones are
placed in that order (boot binary at addr 0, PSP at $ffff).
Whenever a word is parsed in the interpreter loop, we first try
parsing the word as a number literal. There are 3 literal types.
1. A 100% digits number is parsed as a decimal.
2. A string starting with $ is parsed as hexadecimal ($ab12).
3. A character inside quotes is parsed as that character ('A').
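Below is a Python model of the three literal forms. It also handles the "-" prefix that decimal parsing accepts, which is covered further on. It is a sketch, not Collapse OS's PARSE. Two details are assumptions: accepting both upper- and lowercase hex digits, and truncating oversized numbers to 16 bits.

```python
DECIMAL = "0123456789"
HEXDIGITS = "0123456789abcdefABCDEF"  # assumption: both cases accepted


def parse_literal(word):
    """Model of the 3 literal forms. Returns a 16-bit cell or None."""
    # 3. A character inside quotes: 'A'
    if len(word) == 3 and word[0] == "'" and word[2] == "'":
        return ord(word[1]) & 0xFFFF
    # 2. A string starting with $ is hexadecimal: $ab12
    if word.startswith("$"):
        digits = word[1:]
        if digits and all(c in HEXDIGITS for c in digits):
            return int(digits, 16) & 0xFFFF
        return None
    # 1. All digits is decimal. A "-" prefix is accepted for
    #    convenience and the result wraps to an unsigned cell.
    negative = word.startswith("-")
    digits = word[1:] if negative else word
    if digits and all(c in DECIMAL for c in digits):
        n = int(digits)
        return (-n if negative else n) & 0xFFFF
    return None
```

A word for which this returns None is not a literal. The interpreter then looks it up in the dictionary.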
Strings and lines
Strings in Collapse OS are an array of characters in memory
associated with a length. There is no terminator.
When a string is passed to string-handling words, its length is
usually a separate argument on PS. It is common to see "sa sl",
"sa" being the string's address and "sl" being its length.
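The "sa sl" convention can be modeled in Python with a bytearray standing in for memory. The memory size and the helper names store_string and read_string are hypothetical. The point of the sketch is that no terminator is ever written.

```python
mem = bytearray(32)  # hypothetical tiny memory, zero-filled


def store_string(addr, text):
    """Copy text to memory and return its "sa sl" pair. No terminator."""
    data = text.encode("ascii")
    mem[addr:addr + len(data)] = data
    return addr, len(data)


def read_string(sa, sl):
    """Read back sl characters starting at address sa."""
    return mem[sa:sa + sl].decode("ascii")


sa, sl = store_string(4, "hello")
```

The length travels alongside the address, so taking a substring only means adjusting sa and sl. Nothing in memory has to change.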
How that "sl" is encoded depends on the situation. For example,
the LIT" word writes the enclosed string and, at runtime, yields
"sa sl". In compiled code, the string is wrapped in a branch (so
that Forth doesn't try to execute it) and followed by 2 number
literals.
When we refer to a "line", it's a string of size LNSZ, a
constant that is always 64. It corresponds to the size of the
input buffer and to the size of a line in a Block (16 lines per
block).
Because those lines have a fixed length, we sometimes want to
know the length of their actual content (for example, to EMIT
it). When we do so, for example in LNLEN, we go through the
whole line to find the last visible character, that is, the
last one whose value is higher than $20 (space). That's where
our line ends.
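The LNLEN rule can be modeled in a few lines of Python. This version scans backward from the end, which finds the same character as a full forward pass. The text does not say which direction LNLEN itself scans.

```python
LNSZ = 64


def lnlen(line):
    """Model of LNLEN: length up to the last character above $20."""
    if len(line) != LNSZ:
        raise ValueError("a line is always LNSZ characters")
    for i in range(LNSZ - 1, -1, -1):
        if ord(line[i]) > 0x20:  # visible: higher than space
            return i + 1
    return 0  # blank line
```

Control characters such as NUL fall at or below $20, so a line padded with them measures the same as one padded with spaces.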
We don't use any termination character for lines: it's too
messy. Blocks might not have them, and when we want to display
lines in a visual mode (that is, always the full 64 characters
on the screen), we need complicated CR handling. It's simpler
to fill lines in blocks with spaces all the way.
For simplicity purposes, numbers are generally considered
unsigned. For convenience, decimal parsing and formatting
support the "-" prefix, but under the hood, it's all unsigned.
This leads to some oddities. For example, "-1 0 <" is false.
To check whether something is negative, use the "0<" word,
which is equivalent to "$7fff >".
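This Python model shows why "-1 0 <" is false. Every number is wrapped to an unsigned 16-bit cell before comparing. The model uses Python booleans rather than Collapse OS flags, and the function names are hypothetical.

```python
MASK = 0xFFFF


def cell(n):
    """Wrap any integer to a 16-bit unsigned cell."""
    return n & MASK


def less_than(a, b):
    """Model of "<": a plain unsigned comparison of two cells."""
    return cell(a) < cell(b)


def zero_less(n):
    """Model of "0<", i.e. "$7fff >": true when bit 15 is set."""
    return cell(n) > 0x7FFF
```

For example, -1 becomes $ffff, which is larger than 0 when compared as unsigned. Its high bit is still set, though, so "0<" reports it as negative.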
Branching in Collapse OS is limited to 8-bit. This represents
64 word references forward or backward. While this might seem
a bit tight at first, having this limit saves us a non-
negligible amount of resource usage.
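The figure of 64 follows from the cell size. The Python check below rests on two assumptions: the branch offset is a signed byte counted in bytes, and each word reference takes one 16-bit cell (2 bytes). The actual encoding is arch-specific and not described here.

```python
CELL = 2  # bytes per word reference (16-bit cells)


def branch_fits(offset_bytes):
    """True if a byte offset fits in a signed 8-bit field (assumption)."""
    return -128 <= offset_bytes <= 127


refs_back = 128 // CELL  # furthest backward branch, in word references
refs_fwd = 127 // CELL   # furthest forward branch, in word references
```

Under these assumptions, you can branch 64 references backward and 63 forward. "64 word references" is the round figure.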
The reasoning behind this intentional limit is that huge
branches are generally an indicator that the logic ought to be
simplified. So here's one more constraint to help you keep your
code simple.
Interpreter and I/Os
Collapse OS' main I/O loop is line-based. INTERPRET calls WORD,
which iterates over the current "input buffer" (INBUF) for
characters to eat up. That input buffer is usually a
64-character space in SYSVARS where characters typed through
KEY are buffered, but that's not always the case.
During a LOAD, the input buffer pointer changes and points to
one of the 16 lines of the BLK buffer. WORD eats it up just the
same, but it ain't coming from KEY anymore. When the 16th line
is read, we come back to the regular program.
Back to KEY. It always yields a character, which means it
blocks until one is available. It loops over KEY?, which returns
a flag telling us whether a key is pressed and, if there is one,
the character itself.
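This Python model shows the relationship between KEY and KEY?. KEY polls KEY? until a character shows up. KEY? is held in a replaceable slot, standing in for the alias described below. Console and make_key_source are hypothetical names, and a None event stands in for "no key pressed yet".

```python
from collections import deque


def make_key_source(events):
    """Hypothetical KEY? stand-in: None means "no key pressed yet"."""
    pending = deque(events)

    def key_q():
        char = pending.popleft() if pending else None
        return char is not None, char  # (flag, character)

    return key_q


class Console:
    """Model of KEY polling an overridable KEY? routine."""

    def __init__(self, key_q):
        self.key_q = key_q  # plays the role of the KEY? alias

    def key(self):
        # KEY blocks: poll KEY? until it reports a character.
        while True:
            pressed, char = self.key_q()
            if pressed:
                return char
```

Assigning a new function to key_q is this model's version of "' RX<? 'KEY? !". KEY is unchanged but now reads from a different source.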
KEY? is an alias which points to a driver implementing this
routine. It can also be overridden at runtime for nice tricks.
For example, if you want to control your computer from RS-232,
you can do "' RX<? 'KEY? !".
Interpreter output is unbuffered and only has EMIT. This word
can also be overridden, mostly as a companion to the raison
d'etre of your KEY? override.
Interpreting and compiling words
When the INTERPRET loop reads from INBUF, it splits its input
into words, that is, whitespace-separated chunks of characters.
Whenever we have a word, we begin by checking if it's a number
literal with PARSE. If yes, push it on the stack and get the
next word. Otherwise, check if the word exists in the
dictionary. If yes, EXECUTE it. Otherwise, it's a "word not
found" error.
Compiling words with ":" follows the same logic, except that
instead of putting literals on the stack, it compiles them with
LITN and instead of executing words, it writes their address
down (except immediates, which are executed).
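Here is a Python model of the interpret-mode loop: PARSE first, then FIND, then EXECUTE. PARSE is simplified to decimal and $hex, the dictionary is a plain Python dict of functions, and interpret is a hypothetical name. Compile mode would differ only in recording literals and words instead of pushing and executing them.

```python
DIGITS = "0123456789"
HEX = "0123456789abcdef"


def parse(word):
    """Simplified PARSE: decimal or $hex; None if not a literal."""
    if word and all(c in DIGITS for c in word):
        return int(word) & 0xFFFF
    if len(word) > 1 and word[0] == "$" and all(c in HEX for c in word[1:]):
        return int(word[1:], 16) & 0xFFFF
    return None


def interpret(line, dictionary, stack):
    """PARSE first, then FIND, then EXECUTE."""
    for word in line.split():
        n = parse(word)
        if n is not None:            # a literal: push it
            stack.append(n)
            continue
        fn = dictionary.get(word)    # not a literal: look it up
        if fn is None:
            raise LookupError("word not found: " + word)
        fn(stack)                    # EXECUTE
    return stack
```

Because hex literals need the $ prefix, a dictionary word named "face" is still reachable. Only "$face" is read as a number.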
This "PARSE then FIND" order is the opposite of many traditional
Forths, which generally go the other way around. This is because
traditional forths often don't have hexadecimal prefixes for
their literals and the "PARSE then FIND" order would prevent the
creation of words like "face", "beef", "cafe", etc. This is not
a problem we have in Collapse OS.
"PARSE then FIND" is faster because it saves us a dictionary
lookup when parsing a literal.
Native words can be assembled in two ways.
With the proper assembler loaded in memory, you can compile
words that directly execute native code. See doc/asm/intro.txt.
Otherwise, without anything loaded, you can use the HAL to
generate native code. See doc/hal.txt.
Native words are created with CODE:
CODE foo 42 i>, ;CODE \ same as ": foo 42 ;" but faster
Native code can also be inlined in a regular word:
: foo 42 CODE[ INLINE 1+ ]CODE . ; \ prints 43
VALUE, TO, CONSTANT
Cell access with @ becomes heavy in cases where a cell is read
at many places in the code and seldom written to. It is also
slower than it needs to be.
Collapse OS has a special "value" word type which is very
similar to a cell, but instead of pushing the cell's address to
PS, it reads the value at that address and pushes it to PS in
a much faster and lighter way than "MYVAR @". You create such a
word with VALUE:
42 VALUE FOO
FOO . \ prints 42
Modifying that value is a bit less straightforward than with
a regular cell, but can be done with TO:
43 TO FOO
FOO . \ prints 43
To set a value in a compiled word, use [TO] instead of TO.
There's an additional word that facilitates the declaration of
multiple values: VALUES. You call it with the number of values
to declare and then type their names, like this:
3 VALUES FOO BAR BAZ
All values are initialized to 0.
If you don't need to modify your value, it's better to use
CONSTANT instead. It's much faster because it emits native code
that pushes the value to PS directly, which is even faster than
a literal.
42 CONSTANT foo
2 CONSTS 43 bar 44 baz
Sometimes, often for fulfilling protocols, we want to "plug" a
word into another, for example, we want FOO and BAR to mean the
same thing. Of course, you can do ": BAR FOO ;", but this
represents an annoying overhead, both in terms of speed and RS
space. In this case, you'll want to create an alias like this:
ALIAS FOO BAR
Which means "make BAR point to FOO". This generates a native
jump, which is about as low-overhead as it gets.
Those aliases are read-only. Once created, they can't be
changed. If you want to use a word as an indirection, you need
to use EXECUTE like this:
: FOO ;
' FOO VALUE 'BAR
: BAR 'BAR EXECUTE ; \ BAR executes FOO
: BAZ ;
' BAZ TO 'BAR \ BAR now executes BAZ
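The same indirection can be modeled in Python. The value 'BAR holds an execution token, represented here as a function reference, and BAR calls whatever it currently holds. Unlike an ALIAS, the target can be swapped at any time. The values dict is just this model's storage for the value.

```python
calls = []


def foo():
    calls.append("FOO")


def baz():
    calls.append("BAZ")


values = {"'BAR": foo}       # ' FOO VALUE 'BAR


def bar():
    values["'BAR"]()         # : BAR 'BAR EXECUTE ;


bar()                        # runs FOO
values["'BAR"] = baz         # ' BAZ TO 'BAR
bar()                        # now runs BAZ
```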
Core words have 2 special aliases, which jump to an address
determined in their corresponding SYSVAR. These are EMIT and
KEY?.
Each of these system aliases has a corresponding "'" CONSTANT
that yields its SYSVAR address. Store an address there to
change where the alias jumps to. Example:
' RX<? 'KEY? !
' TX> 'EMIT !
Most SYSVARS described in doc/impl.txt have a CONSTANT
corresponding to their absolute address. For example, you get
the value of "NL" with "NL @" and set it with "NL !".
Some SYSVARS are very often used and necessitate faster access.
These SYSVARS are split in 2 words: the accessor and the
address. For example, we have HERE and 'HERE. HERE returns
HERE's value directly and 'HERE returns HERE's address.
Therefore, you get HERE with "HERE" and set it with "'HERE !".
The list of such SYSVARS is:
HERE CURRENT IN( IN> BLK>
Most traditional Forths have DO..LOOP; Collapse OS has
BEGIN..NEXT. It only stores one number on RS instead of 2: a
counter that is decremented at each NEXT. The loop exits when
that counter reaches zero.
The initial value for this loop counter must be manually placed
on RS. Example: 42 >R BEGIN NEXT.
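This Python model of BEGIN..NEXT makes the iteration count explicit. It assumes NEXT decrements the counter before testing it for zero. Under that assumption, "42 >R BEGIN ... NEXT" runs the body 42 times. In this model a starting count of 0 would wrap to $ffff; the text does not describe that case. begin_next is a hypothetical name.

```python
def begin_next(count, body):
    """Model of "count >R BEGIN body NEXT"."""
    rs = [count]                          # >R: counter on the return stack
    while True:
        body()                            # code between BEGIN and NEXT
        rs[-1] = (rs[-1] - 1) & 0xFFFF    # NEXT decrements the counter...
        if rs[-1] == 0:                   # ...and exits when it hits zero
            rs.pop()
            return
```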
The A and B registers
The A and B registers are out-of-stack temporary values that
often help minimize stack juggling. Their location is
arch-dependent, but it's often in SYSVARS. On register-rich
CPUs, it's a register.
Access to them is fast, but their downside is that words using
them must be careful not to call words that also use the same
register. doc/dict.txt indicates such words with *A* and *B*.
Mass storage through disk blocks
Collapse OS can access mass storage through its BLK subsystem.
See doc/blk.txt for more information.
Useful little words
In Collapse OS, we try to include as few words as possible into
the cross-compiled core, making it minimally functional for
reaching its design goals.
However, its source code has a section of what is called
"Useful little words" at B120, and you'll probably want to load
some of them quite regularly because they make the system more
convenient to use.
B122 provides the word "context" allowing multiple dictionaries
to exist concurrently. This allows you to develop applications
without having to worry too much about name clashes because
those names exist in separate namespaces.
A context is created with a name like this:
context foo \ creates context "foo"
When a context is created, it is "branched off" CURRENT as it
was at the moment the context was created.
To activate a context, call its name (in this case, "foo"). This
will do two things:
1. Save CURRENT in the previously active context.
2. Restore CURRENT to where it was the last time "foo" was
active (or created).
Note that creating a context doesn't automatically activate it.
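This Python model shows the save/restore dance around CURRENT. The dictionary is a linked list, and CURRENT points to its latest entry. The Dictionary class and its methods are hypothetical. The model also assumes that when no context is active yet, there is nothing to save.

```python
class Dictionary:
    """Model of a linked-list dictionary with switchable contexts."""

    def __init__(self):
        self.current = None   # CURRENT: most recently defined entry
        self.active = None    # name of the active context, if any
        self.contexts = {}    # context name -> saved CURRENT

    def define(self, name):
        # Each new entry links back to the previous CURRENT.
        self.current = (name, self.current)

    def find(self, name):
        entry = self.current
        while entry is not None:
            if entry[0] == name:
                return True
            entry = entry[1]
        return False

    def context(self, name):
        # "context foo": branch off CURRENT as it is now (not activated).
        self.contexts[name] = self.current

    def activate(self, name):
        if self.active is not None:
            self.contexts[self.active] = self.current  # 1. save CURRENT
        self.current = self.contexts[name]             # 2. restore it
        self.active = name
```

Words defined while "foo" is active stay invisible from a sibling context. They become visible again when "foo" is reactivated.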
DOER, DOES> and CDOES>
In traditional forths, DOES> is often used with CREATE. Not in
Collapse OS. To use the DOES> word, you must pair it with DOER.
See doc/primer.txt for details.
On top of that, Collapse OS has a nice extra: CDOES>. It can
be used instead of DOES> and is followed by native code, which
is of course much faster. Example:
: adder DOER , CDOES> INLINE @ INLINE 1+ ;CODE
42 adder foo
foo . \ prints 43
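The data flow of that adder example can be modeled in Python. Every word created by adder carries its own cell, and the shared DOES>/CDOES> behavior receives that cell. The model stands a one-element list in for the data address and does not capture the native-code speed of CDOES>. The doer helper is a hypothetical name.

```python
def doer(does):
    """Model of a DOER/DOES> pair: words it creates share one behavior."""
    def create(value):
        data = [value]               # the cell written by ","
        return lambda: does(data)    # the new word passes its data along
    return create


# : adder DOER , CDOES> INLINE @ INLINE 1+ ;CODE
adder = doer(lambda data: (data[0] + 1) & 0xFFFF)  # @ then 1+
foo = adder(42)                                    # 42 adder foo
```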
This page generated at 2022-01-16 21:05:03 from documentation in CollapseOS snapshot 20220115.