VIUA-ISA(7) Viua VM Manual VIUA-ISA(7)
NAME
viua-isa - Viua VM instruction set architecture
SYNOPSIS
Viua is a RISC-style architecture, with:
• 128-bit wide type-aware registers
• dedicated call instructions, and no manual call stack management
• built-in memory allocation and deallocation instructions
• pointer tracking
• I/O through asynchronous ports
DESCRIPTION
In Viua VM's ISA design the emphasis is put on the reliability of
programs, with performance being a secondary consideration. The guiding
idea is that an answer delivered quickly is useless if it is wrong, and
that a program that runs fast but crashes often is less useful than one
that works correctly.
REGISTERS
Registers are organised in sets.
┌───────────┬───────┬─────────────────────────────────────────────────┐
│ set │ label │ no. of registers/description │
├───────────┼───────┼─────────────────────────────────────────────────┤
│ local │ l │ 64, fixed │
├───────────┼───────┼─────────────────────────────────────────────────┤
│ argument │ a │ 0-64, variable │
├───────────┼───────┼─────────────────────────────────────────────────┤
│ parameter │ p │ 0-64, variable │
├───────────┼───────┼─────────────────────────────────────────────────┤
│ special │ void │ nothing │
├───────────┼───────┼─────────────────────────────────────────────────┤
│ special │ zero │ wired to signed zero for arithmetic types, in‐ │
│ │ │ valid for other types │
├───────────┼───────┼─────────────────────────────────────────────────┤
│ special │ uzero │ wired to unsigned zero for arithmetic types, │
│ │ │ invalid for other types │
└───────────┴───────┴─────────────────────────────────────────────────┘
A general purpose register is accessed using the following notation:
$I.S
where I is the index of a register, and S is the register set label eg,
• $1.l means register 1 from local register set
• $0.p means register 0 from parameter register set
• $2.a means register 2 from argument register set
A special register is accessed using its label eg,
• void means an empty register
• zero means a signed 0
• uzero means an unsigned 0
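The notation above can be checked mechanically. The following Python
sketch parses an operand into a (set, index) pair. The function name
parse_register and the validation rules (eg, the upper bound on the
index) are assumptions made for illustration, and are not taken from
the Viua assembler.

```python
import re

# Register set labels taken from the table in REGISTERS.
GENERAL_SETS = {"l": "local", "a": "argument", "p": "parameter"}
SPECIAL = {"void", "zero", "uzero"}


# Hypothetical helper: not part of any Viua tool.
def parse_register(operand):
    """Return (set name, index) for $I.S, or (label, None) for a special register."""
    if operand in SPECIAL:
        return (operand, None)
    match = re.fullmatch(r"\$(\d+)\.([lap])", operand)
    if match is None:
        raise ValueError(f"not a register operand: {operand!r}")
    index = int(match.group(1))
    # Assumption: no set holds more than 64 registers, so indexes are 0-63.
    if index > 63:
        raise ValueError(f"register index out of range: {index}")
    return (GENERAL_SETS[match.group(2)], index)


print(parse_register("$1.l"))
print(parse_register("$0.p"))
print(parse_register("void"))
```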
General purpose registers
There are three general-purpose register sets:
• local with 64 registers numbered from 0 to 63
• parameter with between 0 and 64 registers
• argument with between 0 and 64 registers
All three register sets are allocated per-frame.
The local register set is always local to a frame. In contrast,
parameter and argument register sets "move" during a call. The caller
allocates the argument register set using the frame instruction eg,
frame $4.a
The example above allocates a frame with 4 registers in the argument
register set. The program then sets the registers as appropriate, and
issues a call instruction eg,
call void, foo
In the example above, the call instruction uses the allocated frame for
a call to the foo function. Inside foo, the callee, the argument regis‐
ter set allocated by the caller is visible as the parameter register
set. Why?
Arguments are the actual parameters ie, what the caller actually gives
to the callee. Parameters are the formal parameters ie, what the callee
expects to receive from the caller. The difference in how the register
set is viewed by the caller and the callee signifies this change in
meaning.
FUNDAMENTAL TYPES
Registers in Viua are aware of the type of the value they hold, and may
only hold values of the fundamental types. Each of the fundamental
types is described below.
Some instructions may require the use of a specific fundamental type
for an operand, but apply further restrictions on the exact values
consumed or produced; eg, a saturating addition of 8-bit wide signed
integers expects to receive a signed integer as its input and will pro‐
duce the same type as its output, but will never produce a value out‐
side of the [-128, 127] range.
┌─────────┬──────────────────┬────────────────────────────────────────┐
│ symbol │ type name │ short type description │
├─────────┼──────────────────┼────────────────────────────────────────┤
│ │ void │ represents "nothing" │
├─────────┼──────────────────┼────────────────────────────────────────┤
│ │ undefined │ represents raw, uninterpreted values │
│ │ │ loaded from memory │
├─────────┼──────────────────┼────────────────────────────────────────┤
│ atom │ atom │ represents values that represent them‐ │
│ │ │ selves │
├─────────┼──────────────────┼────────────────────────────────────────┤
│ double │ double │ the C double │
├─────────┼──────────────────┼────────────────────────────────────────┤
│ float │ float │ the C float │
├─────────┼──────────────────┼────────────────────────────────────────┤
│ int │ signed integer │ signed 64-bit integer, in twos-comple‐ │
│ │ │ ment format │
├─────────┼──────────────────┼────────────────────────────────────────┤
│ uint │ unsigned integer │ unsigned 64-bit integer │
├─────────┼──────────────────┼────────────────────────────────────────┤
│ pid │ process id │ represents actor (process) identifiers │
├─────────┼──────────────────┼────────────────────────────────────────┤
│ pointer │ pointer │ represents memory addresses │
└─────────┴──────────────────┴────────────────────────────────────────┘
Weird types
There are two "weird" types, which are not immediately useful: void and
undefined.
void
The void type represents the concept of "nothing". It has no bit
representation, and exists only as a flag marking a register as
containing nothing.
undefined
The undefined type represents values loaded from memory that have
not yet been interpreted. Attempts to interpret such values by any
instruction other than cast will abort execution of the offending
process immediately.
However, values of the undefined type may be safely handled as long
as they are not interpreted. The following instructions may be used
to shuttle undefined values around:
• copy
• move
• swap
• return
Since they do not try to use the values in any way, they do not
have to interpret them and do not run afoul of the "undefined can‐
not be interpreted" rule.
INSTRUCTION FORMATS AND ENCODING
The general instruction format looks like this:
63 16│15 0
┌─────────────────────────────┼────────┐
│ workload │ opcode │
└─────────────────────────────┴────────┘
Every instruction is encoded on 64 bits, with the opcode always occupy‐
ing the lowest half-word.
If an instruction stores a value in a register the output register is
always encoded in the third least-significant byte, immediately adja‐
cent to the opcode.
N-format
An instruction with no (N) operands.
63 16│15 0
┌─────────────────────────────┼────────┐
│ │ opcode │
└─────────────────────────────┴────────┘
Bits 63-16 are reserved, and have unspecified value.
T-format
An instruction with a three-way (triple - T) register access. T-format
instructions are used, for example, for arithmetic and logic opera‐
tions, which have two inputs and a single output.
63 40│39 32│31 24│23 16│15 0
┌───────────┼─────┼─────┼─────┼────────┐
│ │ snd │ src │ dst │ opcode │
└───────────┴─────┴─────┴─────┴────────┘
Bits 63-40 are reserved, and have unspecified value.
When the order is relevant, src is the left-hand side of the operation,
and snd is the right-hand side of the operation.
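The T-format layout can be sketched as a pair of pack and unpack
functions in Python. The field positions come from the diagram above.
The function names, the choice to leave reserved bits as zero, and the
opcode value are assumptions for illustration only. Register operands
are treated as opaque 8-bit values, since their internal encoding is not
described here.

```python
# Field positions follow the T-format diagram: opcode in bits 15-0,
# dst in bits 23-16, src in bits 31-24, snd in bits 39-32.
def encode_t(opcode, dst, src, snd):
    """Pack a T-format instruction into a 64-bit integer (reserved bits left as 0)."""
    if not 0 <= opcode <= 0xFFFF:
        raise ValueError("opcode must fit in 16 bits")
    for field in (dst, src, snd):
        if not 0 <= field <= 0xFF:
            raise ValueError("register operand must fit in 8 bits")
    return opcode | (dst << 16) | (src << 24) | (snd << 32)


def decode_t(word):
    """Unpack (opcode, dst, src, snd) from a 64-bit T-format instruction."""
    return (
        word & 0xFFFF,
        (word >> 16) & 0xFF,
        (word >> 24) & 0xFF,
        (word >> 32) & 0xFF,
    )


# The opcode value 0x0001 is a placeholder, not a real Viua opcode.
word = encode_t(0x0001, dst=3, src=1, snd=2)
print(f"{word:#018x}")  # 0x0000000201030001
print(decode_t(word))
```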
D-format
An instruction with a two-way (double - D) register access.
63 32│31 24│23 16│15 0
┌─────────────────┼─────┼─────┼────────┐
│ │ src │ dst │ opcode │
└─────────────────┴─────┴─────┴────────┘
Bits 63-32 are reserved and have unspecified value.
S-format
An instruction with a one-way (single - S) register access.
63 24│23 16│15 0
┌───────────────────────┼─────┼────────┐
│ │ rgr │ opcode │
└───────────────────────┴─────┴────────┘
Bits 63-24 are reserved and have unspecified value.
M-format
An instruction with a two-way register access, and a 32-bit memory off‐
set. M, because the format is used for memory loads and stores.
63 32│31 24│23 16│15 0
┌─────────────────┼─────┼─────┼────────┐
│ offset │ src │ dst │ opcode │
└─────────────────┴─────┴─────┴────────┘
For loads and stores, dst is the register into which the data will be
put or from which it will be taken, and src is the register that holds
the pointer to the memory location.
Every memory instruction has a unit embedded in its opcode:
┌──────┬────────┬─────────────────────────────────────────────────────┐
│ unit │ sizeof │ meaning │
├──────┼────────┼─────────────────────────────────────────────────────┤
│ b │ 1 │ byte │
├──────┼────────┼─────────────────────────────────────────────────────┤
│ h │ 2 │ half-word │
├──────┼────────┼─────────────────────────────────────────────────────┤
│ w │ 4 │ word │
├──────┼────────┼─────────────────────────────────────────────────────┤
│ d │ 8 │ double-word │
├──────┼────────┼─────────────────────────────────────────────────────┤
│ q │ 16 │ quad-word │
├──────┼────────┼─────────────────────────────────────────────────────┤
│ o │ 32 │ octa-word │
├──────┼────────┼─────────────────────────────────────────────────────┤
│ x │ 64 │ hexadecimal-word │
├──────┼────────┼─────────────────────────────────────────────────────┤
│ u │ 128 │ duotrigesimal-word │
└──────┴────────┴─────────────────────────────────────────────────────┘
This unit appears as a part of an instruction's name.
For example, the lm (load memory) instruction may be written as lb
(load byte), or as lq (load quad-word). The unit-less base form, lm, is
not actually usable because the assembler will reject it.
Struct layout considerations
Let us examine a sample of code which would load a quad-word from ad‐
dress 0xdeadbeef into register $2.l:
li $1.l, 0xdeadbeef
lq $2.l, $1.l, 0
The value of the offset operand is 0, meaning that there is no
additional offset to add to the base pointer from src. This is the
simplest form of the lq (Load Quad-word) instruction, and is equivalent
to loading the first element of an array at an address.
However, what if we wanted to load, for example, the fourth double-word
at an address? Something like the following C code:
uint64_t arr[];
uint64_t ele = arr[3]; /* what do we do here? */
The translation is simple:
; get the base pointer ie, the address of the array
li $1.l, arr
; load the double-word at offset (unit * 3) from
; the base address
ld $2.l, $1.l, 3
This design makes sequence access easy, but also nudges compilers to
lay structures out in memory in a certain way for greatest flexibility
and efficiency.
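The address computation implied by the scaled offset can be written
down as a short Python sketch. The unit sizes are taken from the unit
table. The function name and the base address 0x1000 are hypothetical.

```python
# Unit sizes in bytes, from the unit table in the M-format section.
UNIT_SIZE = {"b": 1, "h": 2, "w": 4, "d": 8, "q": 16, "o": 32, "x": 64, "u": 128}


def effective_address(base, unit, offset):
    """Address touched by a load or store: base + offset * sizeof(unit)."""
    return base + offset * UNIT_SIZE[unit]


base = 0x1000  # hypothetical address of arr
# Fourth double-word of the array: base + 3 * 8 bytes.
print(hex(effective_address(base, "d", 3)))  # 0x1018
```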
Given the following C struct:
struct example {
uint64_t dw;
uint16_t hw;
uint8_t b1;
uint32_t w;
uint8_t b2;
};
let us examine the code accessing the struct's fields if the compiler
could not reorder fields and had to lay them out in memory in the order
specified by the programmer (assuming no padding is added for align‐
ment):
; get the address of the struct
li $1.l, ...
; field: dw
ld void, $1.l, 0
; field: hw (byte offset 8 ie, 4 half-words)
lh void, $1.l, 4
; field: b1
lb void, $1.l, 10
; field: w
; The base pointer has to be adjusted because byte offset 11 is
; not a multiple of the word size, so no offset can work here.
addi $2.l, $1.l, 11u
lw void, $2.l, 0
; field: b2
lb void, $1.l, 15
Most cases are simple, but there are two weird ones: loading b2 and w
looks "bad" with the high offset or base address manipulation.
To fix the situation the compiler could either insert padding, or re‐
order the fields in such a way:
struct example {
uint8_t b1;
uint8_t b2;
uint16_t hw;
uint32_t w;
uint64_t dw;
};
Then, the access code is greatly simplified:
; get the address of the struct
li $1.l, ...
; field: b1
lb void, $1.l, 0
; field: b2
lb void, $1.l, 1
; field: hw
lh void, $1.l, 1
; field: w
lw void, $1.l, 1
; field: dw
ld void, $1.l, 1
In general, it is a good idea to put the smallest fields first, and
align the first field with an increased size (ie, align half-words on 2
bytes, words on 4 bytes, etc). This will make field accesses easy to
write and read.
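The effect of field ordering can be checked with a small Python sketch
that computes unpadded byte offsets and the matching scaled offsets. The
function name layout is hypothetical. The field sizes are those of the
C struct above, and, as in the text, no padding is assumed.

```python
def layout(fields):
    """Yield (name, byte offset, scaled offset or None) for an unpadded layout.

    fields is a list of (name, size in bytes) pairs in declaration order.
    The scaled offset is None when the byte offset is not a multiple of
    the field size ie, when the base pointer must be adjusted first.
    """
    position = 0
    for name, size in fields:
        scaled = position // size if position % size == 0 else None
        yield (name, position, scaled)
        position += size


as_written = [("dw", 8), ("hw", 2), ("b1", 1), ("w", 4), ("b2", 1)]
reordered = [("b1", 1), ("b2", 1), ("hw", 2), ("w", 4), ("dw", 8)]

for label, fields in (("as written", as_written), ("reordered", reordered)):
    print(label)
    for name, byte_offset, scaled in layout(fields):
        print(f"  {name}: byte offset {byte_offset}, scaled offset {scaled}")
```

In the first layout only w has no usable scaled offset. In the second
layout every field does.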
The offset being multiplied by unit also means that putting the
smallest fields first allows manipulating bigger structs than putting
biggest fields first would; because while the maximum reach of the lb
instruction is a respectable 4'294'967'296 bytes (4GiB), the reach of
the lq instruction is an even more impressive 68'719'476'736 bytes
(64GiB), and the reach of the lu (load duotrigesimal-word) is a
stunning 549'755'813'888 bytes (512GiB), all with a single instruction!
Ordering the fields from the smallest lets a single instruction access
any field in structs of up to 512GiB without any extra processing.
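The reach of each load can be derived from the unit table, assuming the
32-bit offset field is an unsigned count of units (an assumption
consistent with the 4GiB figure for lb, but not stated explicitly). The
names below are illustrative.

```python
# Unit sizes in bytes, from the unit table in the M-format section.
UNIT_SIZE = {"b": 1, "h": 2, "w": 4, "d": 8, "q": 16, "o": 32, "x": 64, "u": 128}

# Assumption: the offset is an unsigned 32-bit count of units.
OFFSET_VALUES = 2 ** 32


def reach(unit):
    """Number of bytes addressable from one base pointer with a given unit."""
    return OFFSET_VALUES * UNIT_SIZE[unit]


for unit in UNIT_SIZE:
    print(f"l{unit}: {reach(unit)} bytes ({reach(unit) // 2**30} GiB)")
```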
I-format
An instruction with one-way register access, and a 32-bit wide immedi‐
ate (hence I) value.
63 32│31 24│23 16│15 0
┌─────────────────┼─────┼─────┼────────┐
│ immediate │ │ dst │ opcode │
└─────────────────┴─────┴─────┴────────┘
Bits 31-24 are reserved and have unspecified value.
One of the most useful I-format instructions is lui ie, the Load Upper
Immediate, which may be familiar to people who have some RISC-V
background. It loads an immediate value into the higher 32 bits of a
register:
lui $1.l, 0xdeadbeef
U-format
An instruction with a two-way register access, and a 32-bit immediate
value. It is a useful (hence U) format.
63 32│31 24│23 16│15 0
┌─────────────────┼─────┼─────┼────────┐
│ immediate │ src │ dst │ opcode │
└─────────────────┴─────┴─────┴────────┘
This format is used for eg, the addi instruction, which adds a 32-bit
immediate to the value stored in the src register, and places the re‐
sult in the dst register. Combined with the I-format lui instruction,
it can be used to efficiently load a 64-bit value:
; load the upper half of the value
lui $1.l, 0xdeadbeef
; add the lower half of the value
addi $1.l, $1.l, 0xbadc0ffe
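The arithmetic behind this two-instruction sequence can be modelled in
Python. This is only a sketch: it assumes the addi immediate is added as
an unsigned 32-bit quantity and that results wrap at 64 bits, which the
text above does not specify. The functions are models, not the
machine's implementation.

```python
MASK64 = (1 << 64) - 1


def lui(immediate):
    """Place a 32-bit immediate in the upper half of a 64-bit value."""
    return (immediate & 0xFFFFFFFF) << 32


def addi(value, immediate):
    # Assumption: the immediate is added as an unsigned 32-bit quantity
    # and the result wraps at 64 bits; the real instruction may differ.
    return (value + (immediate & 0xFFFFFFFF)) & MASK64


r1 = lui(0xdeadbeef)       # upper half
r1 = addi(r1, 0xbadc0ffe)  # lower half
print(hex(r1))  # 0xdeadbeefbadc0ffe
```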
OPCODE ENCODING
Opcodes are encoded on the lowest 16 bits of an instruction.
15 13│12 10│ 9 │8 0
┌───────┼───────┼───┼──────────────────┐
│ fmt │ flg │ u │ operation │
└───────┴───────┴───┴──────────────────┘
Bits 15-13 encode the instruction format. The formats are described in
section INSTRUCTION FORMATS AND ENCODING.
Bits 12-10 encode the flags, which are mostly relevant for memory and
arithmetic instructions.
Bit 9 encodes the unsigned flag, which is relevant for some arithmetic
instructions, and for the luiu (load upper immediate unsigned)
instruction.
Bits 8-0 encode the actual operation code.
Instruction dispatch depends on bits 15-13 and 9-0. This means there
are 8192 possible instructions.
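The field split described above, and the count of 8192, can be
reproduced with a few bit operations. The function names and the sample
bit pattern are made up for illustration. The sample does not correspond
to a real opcode.

```python
def split_opcode(opcode):
    """Split a 16-bit opcode into its fmt, flg, u and operation fields."""
    return {
        "fmt": (opcode >> 13) & 0b111,
        "flg": (opcode >> 10) & 0b111,
        "u": (opcode >> 9) & 0b1,
        "operation": opcode & 0x1FF,
    }


def dispatch_key(opcode):
    """Keep only bits 15-13 and 9-0 ie, drop the flags in bits 12-10."""
    return opcode & ~(0b111 << 10) & 0xFFFF


sample = 0b101_011_1_000000101  # arbitrary bit pattern, not a real opcode
print(split_opcode(sample))

# Every 16-bit opcode maps to one of 2**13 distinct dispatch keys.
print(len({dispatch_key(op) for op in range(1 << 16)}))  # 8192
```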
LISTING OF INSTRUCTIONS
The following instructions are available:
┌──────────────────┬─────┬────────────────────────────────────────────┐
│ │ fmt │ meaning │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ noop │ N │ no operation │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ halt │ N │ halt the process │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ ebreak │ N │ break execution, shows environment dump by │
│ │ │ default │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ ecall │ N │ issue an environment call (a system call) │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ add │ T │ addition │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ addi │ U │ add immediate │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ addiu │ U │ add (unsigned) immediate │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ sub │ T │ subtraction │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ subi │ U │ subtract immediate │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ subiu │ U │ subtract (unsigned) immediate │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ mul │ T │ multiplication │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ muli │ U │ multiply by immediate │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ muliu │ U │ multiply by (unsigned) immediate │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ div │ T │ division │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ divi │ U │ divide by immediate │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ diviu │ U │ divide by (unsigned) immediate │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ earithmeticwidth │ D │ limit bit width of some operations │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ bitshl │ T │ left bit shift │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ bitshr │ T │ right bit shift │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ bitashr │ T │ arithmetic (sign-preserving) right bit │
│ │ │ shift │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ bitrol │ T │ left bit roll │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ bitror │ T │ right bit roll │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ bitand │ T │ bitwise "and" │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ bitor │ T │ bitwise "or" │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ bitxor │ T │ bitwise "xor" │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ bitnot │ D │ bitwise "not" │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ eq │ T │ lhs = rhs │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ lt │ T │ lhs < rhs │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ gt │ T │ lhs > rhs │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ cmp │ T │ compare two values, returning -1 for less- │
│ │ │ than, 0 for equal-to, and 1 for greater- │
│ │ │ than relation │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ and │ T │ logical "and" │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ or │ T │ logical "or" │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ not │ D │ logical "not" │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ io_submit │ T │ submit an I/O request to an I/O port │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ io_wait │ T │ wait for a completion of an I/O request │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ io_shutdown │ T │ shut an I/O port down │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ io_ctl │ T │ inspect and manipulate an I/O port │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ io_peek │ D │ inspect an in-flight I/O request │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ frame │ S │ allocate a frame for a call │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ call │ D │ call a function │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ actor │ D │ spawn a new actor (a concurrent function │
│ │ │ call) │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ return │ S │ return from a function call │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ copy │ D │ copy bits between two registers │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ move │ D │ move bits between two registers, erasing │
│ │ │ the source │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ swap │ D │ swap bits between two registers │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ if │ D │ take a conditional branch │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ gts │ D │ store value in the Global Table │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ gtl │ D │ load value from the Global Table │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ atom │ S │ load an atom │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ double │ S │ load a double │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ float │ I │ load a float │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ self │ S │ load the PID of the current actor │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ lui │ I │ load upper immediate │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ luiu │ I │ load upper (unsigned) immediate │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ arodp │ I │ create a pointer to a .rodata symbol │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ atxtp │ I │ create a pointer to a .text symbol │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ sm │ M │ store in memory │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ lm │ M │ load from memory │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ cast │ I │ cast a raw value loaded from memory │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ aa │ M │ allocate automatic memory │
├──────────────────┼─────┼────────────────────────────────────────────┤
│ ad │ M │ allocate dynamic memory │
└──────────────────┴─────┴────────────────────────────────────────────┘
Detailed descriptions are provided in viua-ops(7).
ENVIRONMENT
Some of the machine's behaviour is affected by its internal environment
eg, the bit width of some arithmetic operations (see the
earithmeticwidth instruction). The following table lists the
environment registers:
┌──────────┬────────┬─────────────────────────────────────────────────┐
│ register │ values │ meaning │
├──────────┼────────┼─────────────────────────────────────────────────┤
│ earw │ 1-64 │ bit width of styled arithmetic operations │
└──────────┴────────┴─────────────────────────────────────────────────┘
MEMORY
Memory provides additional storage and work space to a process, in ad‐
dition to the registers. While registers are fixed-size and can only
hold a limited amount of data, objects stored in memory can be nearly
unlimited in size and complexity.
The .rodata section
Objects in the .rodata section are available to all functions.
The .data and the .bss sections
The .data and the .bss sections are not used by the VM, and the concept
of a globally accessible read-write segment of memory does not exist.
Layout
Executable code ie, the .text section, and the read-only global data
ie, the .rodata section are allocated in the regions of memory with low
addresses. The programs are not able to manipulate memory in those re‐
gions.
Memory that programs are able to manipulate begins near the highest ad‐
dress and grows downwards, as the program allocates more memory. The
first read-write byte a process can access is always allocated at the
address 0xbfff'ffff'ffff'fff0.
Each process receives a page of memory when it starts. The size of a
page is 256 bytes, divided into lines of 16 bytes.
NOTE: The 256-byte pages are just a temporary measure during the
alpha stage of development, to make it easier to inspect the ma‐
chine's internal state in tests. This is not the intended final
page size.
The stack
Memory on the stack is allocated using the aa (allocate automatic) in‐
struction.
The virtual machine manages stack memory automatically: the programmer
is expected to request memory to be allocated, but the deallocation is
handled by the machine.
Each allocation is assigned to a call frame, and the machine drops al‐
located chunks as soon as the frame they were allocated in is popped
off the call stack. If a function needs memory to return a result, the
memory MUST be allocated by the caller.
Usable stack memory lies between the addresses stored in the fp and
sbrk registers:
┌────────────────────┬────────────────────────────────────────────────┐
│ fp (frame pointer) │ Points at the first byte directly accessible │
│ │ from within the frame. │
├────────────────────┼────────────────────────────────────────────────┤
│ sbrk (stack break) │ Points at the byte one past the last byte │
│ │ directly accessible from within the frame. │
└────────────────────┴────────────────────────────────────────────────┘
Since the stack grows downwards, the fp is also the highest address,
and the sbrk-plus-one the lowest address, directly accessible from
within the frame. To get the amount of memory allocated to a process,
simply subtract sbrk from fp.
Objects located in the .rodata section are treated differently and are
available from anywhere; however, since they exist outside of the
stack, this does not break the rule of "no direct access to the stack
beyond the frame's boundaries from within the frame".
What does direct access mean? It means that no stack address above the
fp or below-or-at the sbrk is accessible, unless the frame received a
pointer to it from the caller.
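The boundary rule can be stated as a one-line predicate. The sketch
below covers direct access only and ignores pointers received from the
caller. The function names and the 32-byte frame are hypothetical, and
using the first read-write address as fp is merely an example.

```python
def directly_accessible(address, fp, sbrk):
    """True if a stack address lies within the frame: sbrk < address <= fp."""
    return sbrk < address <= fp


def usable_bytes(fp, sbrk):
    """Amount of stack memory between the two registers."""
    return fp - sbrk


fp = 0xbfff_ffff_ffff_fff0
sbrk = fp - 32  # hypothetical frame with 32 bytes of usable stack memory

print(usable_bytes(fp, sbrk))                   # 32
print(directly_accessible(fp, fp, sbrk))        # True: the highest address
print(directly_accessible(sbrk + 1, fp, sbrk))  # True: the lowest address
print(directly_accessible(sbrk, fp, sbrk))      # False: at the stack break
print(directly_accessible(fp + 1, fp, sbrk))    # False: above the frame pointer
```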
The heap
Memory on the heap is allocated using the ad (allocate dynamic) in‐
struction.
Heap memory is not implemented at the current moment.
SEE ALSO
viua-asm(5), viua-ops(7).
elf(5).
Patterson, David A. and Waterman, Andrew. The RISC-V Reader. Strawberry
Canyon LLC, 2017.
ISBN 978-09-9924-911-6
Web site
‹https://viuavm.org›
Source code repository
‹https://git.sr.ht/~maelkum/viuavm›
VIUA VM
Part of the viua(1) toolchain.