Before you read this primer, let's try a few commands, just for
This will push the number 42 to the stack, then print the number
at the top of the stack.
4 2 + .
This pushes 4, then 2 to the stack, then adds the 2 numbers on
the top of the stack, then prints the result.
42 $8000 C! $8000 C@ .
This writes the byte "42" at address $8000 ($ prefix is for hex
notation), and then reads back that byte from the same address
and print it.
Forth's main interpeter loop is very simple:
1. Read a word from input.
2. Is it a number literal? Put it on the stack.
3. No? Look it up in the dictionary.
4. Found? Execute.
5. Not found? Error.
A word is a string of non-whitepace characters. We consider that
we're finished reading a word when we encounter a whitespace
after having read at least one non-whitespace character.
Collapse OS doesn't support any other encoding than 7bit ASCII.
A character smaller than $21 is considered a whitespace,
others are considered non-whitespace.
Characters above $7f have no special meaning and can be used in
words (if your system has glyphs for them).
Forth's dictionary link words to code. On boot, this dictionary
contains the system's words (look in dict.txt for a list of
them), but you can define new words with the ":" word. For
: FOO 42 . ;
defines a new word "FOO" with the code "42 ." linked to it. The
word ";" closes the definition. Once defined, a word can be
executed like any other word.
You can define a word that already exists. In that case, the new
definition will overshadow the old one. However, any word def-
ined *before* the overshadowing took place will still use the
: foo 42 . ;
: bar foo ;
: foo 43 . ;
foo \ prints 43
bar \ prints 42
The cell size in Collapse OS is 16 bit, that is, each item in
stacks is 16 bit, @ and ! read and write 16 bit numbers.
Whenever we refer to a number, a pointer, we speak of 16 bit.
To read and write bytes, use C@ and C!.
Traditional Forths often uses HEX/DEC switches to go from deci-
mal to hexadecimal parsing. Collapse OS has no such mode.
Straight numbers are decimals, numbers starting with "$" are
hexadecimals (example "$12ef"), char literals are single
characters surrounded by ' (example 'X'). Char literals can't be
used for whitespaces (conflicts with the concept of "word" as
Unlike most programming languages, Forth execute words directly,
without arguments. The Parameter Stack (PS) replaces them. There
is only one, and we're constantly pushing to and popping from
it. All the time.
For example, the word "+" pops the 2 number on the Top Of Stack
(TOS), adds them, then pushes back the result on the same stack.
It thus has the "stack signature" of "a b -- n". Every word in
a dictionary specifies that signature because stack balance, as
you can guess, is paramount. It's easy to get confused so you
need to know the stack signature of words you use very well.
There's a second stack, the Return Stack (RS), which is used to
keep track of execution, that is, to know where to go back after
we've executed a word. It is also used in other contexts, but
this is outside of the scope of this primer.
Code can be executed conditionally with IF/ELSE/THEN. IF pops
PS and checks whether it's nonzero. If it is, it does nothing.
If it's zero, it jumps to the following ELSE or the following
THEN. Similarly, when ELSE is encountered in the context of a
nonzero IF, we jump to the following THEN.
Because IFs involve jumping, they only work inside word defin-
itions. You can't use IF directly in the interpreter loop.
: FOO IF 42 ELSE 43 THEN . ;
0 FOO --> 43
1 FOO --> 42
Loops work a bit like conditionals, and there's 3 forms:
BEGIN..AGAIN --> Loop forever
BEGIN..UNTIL --> Loop conditionally
X >R BEGIN..NEXT --> Loop X times
UNTIL works exactly like IF, but instead of jumping forward to
THEN, it jumps backward to BEGIN.
NEXT decreases RS' TOS by one and if zero isn't reached, jumps
backward to BEGIN.
Why not have a FOR which would be the equivalent of ">R BEGIN"?
Because in many cases, maybe even most, the order of arguments
in PS is such that it's more convenient to perform the ">R" a
little earlier. Doing so right before BEGIN results in needless
stack juggling. The lack of FOR makes all NEXT loop look the
same, which helps overall readability.
You can use the word "LEAVE" to exit a NEXT loop early. When
used, it will finish the current loop and then stop looping when
NEXT is reached.
: foo 5 >R BEGIN R@ 3 = IF LEAVE THEN R@ . NEXT ;
foo \ prints 543
You can leave a word early with EXIT:
: foo 42 . EXIT 43 . ;
foo \ only 42 is printed
When you're inside a BEGIN..AGAIN or BEGIN..UNTIL, you can use
EXIT just fine, but if you're inside a NEXT loop, you have to
drop RS' TOS with R~ calling EXIT or else you have a messed up
Return Stack and all hell breaks loose.
Memory access and HERE
We can read and write to arbitrary memory address with @ and !
(C@ and C! for bytes). For example, "1234 $8000 !" writes the
word 1234 to address $8000. We call the @ and ! actions
"fetch" and "store".
There's a 3rd kind of memory-related action: "," (write). This
action stores value on PS at a special "HERE" pointer and then
increases HERE by 2 (there's also "C," for bytes).
HERE is initialized at the first writable address in RAM, often
directly following the latest entry in the dictionary. Explain-
ing the "culture of HERE" is beyond the scope of this primer,
but know that it's a very important concept in Forth. For examp-
le, new word definitions are written to HERE.
Linking names to addresses
Accessing addresses only with numbers can become confusing, us
humans often need names associated to them. You can do so with
CREATE. This word creates a dictionary entry of the "cell" type.
This word, when called, will put its own address on the stack.
You are responsible for allocating a proper amount of memory to
For example, if you want to store a single 16-but number, you
would do "CREATE foo 2 ALLOT". You can then do stuff like
"42 foo ! foo @ . ( prints 42 )"
Cells can store more than just a number, they can hold
structures and array. Simply ALLOT appropriately and then use
this memory as you wish.
Another way to link a name to an address is VALUE. The "VALUE"
word takes a value parameter and creates a special "value" type
word. This word type always allocates 2 bytes of memory and when
called, instead of spitting its address, spits the 16-bit value
at that address.
You can change the number associated with a VALUE with TO (or
[TO] if you're inside a definition). Example:
42 VALUE foo foo . ( prints 42 )
43 TO foo foo . ( prints 43 )
VALUEs make more readable code in cases where the value is more
often read than written. It is also significantly faster.
If your VALUE never changes, you can also use CONSTANT, which is
created like a VALUE, but cannot be changed. Its advantage over
VALUE is that it's much faster.
DOER and DOES>
DOER and DOES> allow to bind data and behavior together in a
space-efficient way. Those words are called "does words" and,
when created, behave a bit like a cell (a CREATE word): it
pushes its own address to PS. But then, instead of just
continuing along, it executes its DOES> instructions. Example:
: printer DOER , DOES> @ . ;
42 printer foo
foo \ prints 42
DOER creates a special "does" entry and DOES> tells the latest
DOER entry where to jump for its behavior. The instructions
following DOES> are not executed when the DOER is defined, only
when it's executed. This execution always happen in a context
where the DOER's address in on PS. This is why, in the example
above, we call "@" before ".".
We approach the end of our primer. So far, we've covered the
"cute and cuddly" parts of the language. However, that's not
what makes Forth powerful. Forth becomes mind-bending when we
throw IMMEDIATE into the mix.
A word can be declared immediate thus:
: FOO ; IMMEDIATE
That is, when the IMMEDIATE word is executed, it makes the
latest defined word immediate.
An immediate word, when used in a definition, is executed
immediately instead of being compiled. This seemingly simple
mechanism (and it *is* simple) has very wide implications.
For example, The words "(" and ")" are comment indicators. In
: FOO 42 ( this is a comment ) . ;
The word "(" is read like any other word. What prevents us from
trying to compile "this" and generate an error because the word
doesn't exist? Because "(" is immediate. Then, that word reads
from input stream until a ")" is met, and then returns to word
Words like "IF" and "DO" are all regular Forth words, but their
"power" come from the fact that they're immediate.
Starting Forth by Leo Brodie explains all of this in detail.
Read this if you can. If you can't, well, let this sink in for
a while, browse the dictionary (dict.txt) and try to understand
why this or that word is immediate. Good luck!
This page generated at 2022-01-16 21:05:03 from documentation in CollapseOS snapshot 20220115.