Collapse OS Documentation Browser



Assembling binaries

Collapse OS features many assemblers. Each of them has its
specificities, but they all work in very similar ways.

This page describes common behavior. Some assemblers stray from
it. Refer to arch-specific documentation for details.

Initial setup

Assemblers live in their arch-specific blkfs. To load one, you
first need to run "ARCHM" to have arch-specific loaders, and
then call your assembler loader (for example, "Z80A"). After
that, you have to set it up before spitting opcodes. More
specifically, you might have to set the ORG and BIN( variables.

ORG, defaulting to 0, specifies where the binary begins in
memory. It allows the PC word to return the proper value.
Generally, when you're ready to spit opcodes, you run
"HERE TO ORG" so that PC is set to 0.

BIN(, defaulting to 0, specifies where the resulting binary
lives in memory. If all you spit are relative jumps, it doesn't
matter, but if you need to jump to an absolute address, BIN(
needs to be correct. Note that ;CODE spits an absolute jump in
many arches, so BIN( often needs to be correct.

If you compile for a "live" target (the computer running
Collapse OS), you don't need to set ORG and BIN(.
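
The relationship between ORG, BIN( and PC can be modeled outside
of Collapse OS. The Python sketch below is only an illustration:
the formula in pc() (offset from ORG, plus BIN() is an assumption
inferred from the description above, and the Assembler class,
its field names and the 0x4000 starting address are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Assembler:
    here: int = 0x4000   # hypothetical address where bytes are being written
    org: int = 0         # models ORG
    bin_base: int = 0    # models BIN(
    out: bytearray = field(default_factory=bytearray)

    def spit(self, byte: int) -> None:
        """Write one byte and advance HERE."""
        self.out.append(byte & 0xFF)
        self.here += 1

    def pc(self) -> int:
        # Assumed formula: offset into the binary plus the target base.
        return self.here - self.org + self.bin_base


asm = Assembler()
asm.org = asm.here        # models "HERE TO ORG"
start_pc = asm.pc()       # PC reads 0 at the start of the binary
asm.bin_base = 0x8000     # binary is meant to live at 0x8000
asm.spit(0x00)
asm.spit(0x00)
pc_after_two = asm.pc()   # two bytes in, seen from the target
```

With BIN( left at 0, absolute addresses computed from PC would
only be right for a binary that lives at address 0.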

Wrapping native code

You will often want to wrap your native code in such a way that
it can be used from within forth. You have two main options.

CODE allows you to create a new word, but instead of compiling
references to other words, you write native code directly.


This word can then be used like any other (and is of course
very fast).

Unlike the regular compiling process, you don't go in "compile
mode" when you use CODE. You stay in regular INTERPRET mode.
All CODE does is spit the proper ENTRY head.

Be sure to read about your target platform in doc/code. These
documents specify which registers are assigned to what role.

Another option is "inline assembler". When you're in a tight
spot inside a word that you'd like to be faster, but creating
a whole word would be too much, you can use CODE[. Example:

: foo 42 CODE[ BC INCd, BC INCd, ]CODE . ; \ prints 44

The example above would be significantly faster than "2 +".
At runtime, the overhead, in terms of speed, is the same as a
regular CODE word, but in terms of binary size, it's better.


To spit binary code, use opcode words such as "LDrr," in the
Z80 assembler which spits LD in its "r1, r2" form. Unlike
typical assemblers, operation arguments go before the opcode
word, not after it. Therefore, the "LD A, B" you would write in
a regular assembler becomes "A B LDrr,"

Those opcode words, of which there is a complete list in each
arch-specific documentation, end with "," to indicate that their
effect is to write (,) the corresponding opcode.

The "argtype" suffix after each mnemonic is needed because the
assembler doesn't auto-detect the op's form based on arguments.
It has to be explicitly specified.

Although efforts are made to keep those argtypes consistent
across arches, there are differences. Arch-specific doc has
precise definitions for those argtypes.

For example, in Z80 asm, "r" is for 8-bit registers, "d" for
16-bit ones, "i" for immediate, "c" is for conditions.
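
To make the "arguments first, opcode word last" convention
concrete, here is a small Python model of two Z80 opcode words.
It is a sketch, not Collapse OS code: the stack, the ldrr() and
incd() helpers and the use of register names as strings are
hypothetical stand-ins. The byte encodings themselves are the
standard Z80 ones for LD r, r' and INC on a 16-bit register.

```python
# Standard Z80 register codes used inside opcode bytes.
R8 = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "A": 7}
R16 = {"BC": 0, "DE": 1, "HL": 2, "SP": 3}

stack = []          # models the Forth parameter stack
out = bytearray()   # models the binary being spit


def ldrr():
    """Model of "LDrr,": pops r2 then r1, spits LD r1, r2."""
    src = stack.pop()
    dst = stack.pop()
    out.append(0x40 | (R8[dst] << 3) | R8[src])


def incd():
    """Model of "INCd,": pops a 16-bit register, spits INC on it."""
    out.append(0x03 | (R16[stack.pop()] << 4))


# "A B LDrr," : both arguments are pushed before the opcode word runs
stack.append("A")
stack.append("B")
ldrr()

# "BC INCd,"
stack.append("BC")
incd()
```

The "rr" and "d" suffixes select which helper runs. Nothing in
the model inspects the arguments to guess the form, which
mirrors why the argtype has to be spelled out.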

Labels and flow

All assemblers and HALs implement standard flow words:

JRi,    ( off -- )    \ relative unconditional jump
?JRi,   ( off -- )    \ relative conditional jump
Z?                    \ make Z the condition
C?                    \ make C the condition
^?                    \ invert condition
JMPi,   ( addr -- )   \ unconditional absolute jump
CALLi,  ( addr -- )   \ unconditional absolute call

See doc/hal.txt for details about those words.

The ASMH (asm common words, "high" part) builds upon those words
to implement useful structured flow words:

IFZ, .. ELSE, .. THEN,   \ part 1 if Z is set, part 2 otherwise
IFNZ, .. THEN,           \ execute if Z is unset
IFC, .. THEN,            \ execute if C is set
IFNC, .. THEN,           \ execute if C is unset
BEGIN, .. BR JRi,        \ loop forever
BEGIN, .. BR Z? ?JRi,    \ loop if Z is set
FJR JRi, .. THEN,        \ unconditional forward jump

These structured flow words are elegant, but limited because they need
to be symmetric. There is no way, for example, to jump out of
an infinite loop using only those words.
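
One way a pair such as "IFZ, .. THEN," can resolve a forward
jump is to spit the jump with a placeholder offset and patch it
once the destination is known. The Python sketch below models
that idea for the Z80. The ifz() and then() helpers are
hypothetical and the actual Collapse OS implementation may
differ; only the opcode bytes (0x20 for JR NZ, 0x03 for INC BC)
are standard Z80.

```python
out = bytearray()   # models the binary being spit


def ifz():
    """Model of "IFZ,": skip the body when Z is unset.

    Spits JR NZ with a placeholder offset and returns the index
    of that placeholder so it can be patched later.
    """
    out.extend([0x20, 0x00])
    return len(out) - 1


def then(mark):
    """Model of "THEN,": make the pending jump land here.

    A Z80 relative offset counts from the byte following the
    offset byte, hence the mark + 1.
    """
    out[mark] = (len(out) - (mark + 1)) & 0xFF


mark = ifz()
out.append(0x03)   # INC BC
out.append(0x03)   # INC BC
then(mark)
```

Each opening word leaves exactly one mark for exactly one
closing word, which is the symmetry mentioned above.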

Labels can also be used with those flow words for more
flexibility:

LSET L1 .. L1 BR JRi, .. L1 JMPi, \ backward jumps
FJR JRi, TO L1 .. L1 FMARK \ forward jump
BEGIN, FJR JRi, TO L1 .. BR JRi, .. L1 FMARK \ exiting loop

To avoid using dict memory in compilation targets, we
pre-declare label variables here, which means we have a limited
number of them. We have 3: L1, L2, L3.

You can define your own labels with a simple "0 VALUE lblname",
but you have to do so before you begin spitting opcodes.
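
A backward jump to a label boils down to remembering an address
and later turning it into a relative offset. The Python sketch
below models "LSET L1 .. L1 BR JRi," for the Z80 under that
assumption. The variable names are hypothetical, and the exact
split of work between BR and JRi, in Collapse OS is not modeled;
0x18 is the standard Z80 opcode for an unconditional JR.

```python
out = bytearray()   # models the binary being spit

l1 = len(out)       # models "LSET L1": remember the current position
out.append(0x03)    # INC BC, the loop body

# Models "L1 BR JRi,": the offset is relative to the end of the
# 2-byte JR instruction that is about to be spit.
off = l1 - (len(out) + 2)
out.extend([0x18, off & 0xFF])
```

Because the label is an ordinary value, the same address can be
reused by several jumps, which the structured words cannot do.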


As explained in cross.txt, all assemblers supply words that
write 16-bit numbers in the target's endian-ness. Common words
at B2 already supply these words, and they all depend on the
BIGEND? variable, which defaults to 0. Assemblers for
big-endian architectures have to set it to 1.
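
The effect of such a flag can be shown in a few lines of Python.
This is a model only: spit16() and its bigend parameter are
hypothetical names standing in for the 16-bit writing words and
the BIGEND? variable.

```python
def spit16(n: int, bigend: bool) -> bytes:
    """Return the two bytes of n in the order the target expects."""
    lo = n & 0xFF
    hi = (n >> 8) & 0xFF
    return bytes([hi, lo]) if bigend else bytes([lo, hi])


little = spit16(0x1234, bigend=False)   # low byte first
big = spit16(0x1234, bigend=True)       # high byte first
```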

Collapse OS and its documentation are created by Virgil Dupras and licensed under the GNU GPL v3.


This page generated at 2022-01-16 21:05:03 from documentation in CollapseOS snapshot 20220115.