VIUA-ISA(7) Viua VM Manual VIUA-ISA(7)
NAME
       viua-isa - Viua VM instruction set architecture

SYNOPSIS
       Viua is a RISC-style architecture, with:

       •   128-bit wide type-aware registers
       •   dedicated call instructions, and no manual call stack management
       •   built-in memory allocation and deallocation instructions
       •   pointer tracking
       •   I/O through asynchronous ports

DESCRIPTION
       In Viua VM's ISA design the emphasis is put on the reliability of
       programs, with performance being a secondary consideration. The
       guiding idea is that an answer delivered quickly is useless if it is
       wrong, and that a program that runs fast but crashes often is less
       useful than one that works correctly.

REGISTERS
       Registers are organised in sets.

       ┌───────────┬───────┬─────────────────────────────────────────────────┐
       │ set       │ label │ no. of registers/description                    │
       ├───────────┼───────┼─────────────────────────────────────────────────┤
       │ local     │ l     │ 64, fixed                                       │
       ├───────────┼───────┼─────────────────────────────────────────────────┤
       │ argument  │ a     │ 0-64, variable                                  │
       ├───────────┼───────┼─────────────────────────────────────────────────┤
       │ parameter │ p     │ 0-64, variable                                  │
       ├───────────┼───────┼─────────────────────────────────────────────────┤
       │ special   │ void  │ nothing                                         │
       ├───────────┼───────┼─────────────────────────────────────────────────┤
       │ special   │ zero  │ wired to signed zero for arithmetic types,      │
       │           │       │ invalid for other types                         │
       ├───────────┼───────┼─────────────────────────────────────────────────┤
       │ special   │ uzero │ wired to unsigned zero for arithmetic types,    │
       │           │       │ invalid for other types                         │
       └───────────┴───────┴─────────────────────────────────────────────────┘

       A general purpose register is accessed using the following notation:

           $I.S

       where I is the index of a register, and S is the register set label
       eg,

       •   $1.l means register 1 from the local register set
       •   $0.p means register 0 from the parameter register set
       •   $2.a means register 2 from the argument register set

       A special register is accessed using its label eg,

       •   void means an empty register
       •   zero means a signed 0
       •   uzero means an unsigned 0

   General purpose registers
       There are three general-purpose register sets:

       •   local with 64 registers numbered from 0 to 63
       •   parameter with between 0 and 64 registers
       •   argument with between 0 and 64 registers

       All three register sets are allocated per-frame.

       The local register set is always local to a frame. In contrast, the
       parameter and argument register sets "move" during a call. The caller
       allocates the argument register set using the frame instruction eg,

           frame $4.a

       The example above allocates a frame with 4 registers in the argument
       register set. The program then sets the registers as appropriate, and
       issues a call instruction eg,

           call void, foo

       In the example above, the call instruction uses the allocated frame
       for a call to the foo function. Inside foo, the callee, the argument
       register set allocated by the caller is visible as the parameter
       register set.

       Why? Arguments are the actual parameters ie, what the caller actually
       gives to the callee. Parameters are the formal parameters ie, what the
       callee expects to receive from the caller. The difference in how the
       register set is viewed by the caller and the callee signifies this
       change in meaning.

FUNDAMENTAL TYPES
       Registers in Viua are aware of the type of the value they hold, and
       may only hold values of the fundamental types. Each of the fundamental
       types is described below.

       Some instructions may require the use of a specific fundamental type
       for an operand, but apply further restrictions on the exact values
       consumed or produced; eg, a saturating addition of 8-bit wide signed
       integers expects to receive a signed integer as its input and will
       produce the same type as its output, but will never produce a value
       outside of the [-128, 127] range.
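       The saturating addition mentioned above can be pictured with a short
       C sketch. It is an illustration only, not the machine's actual
       implementation: the name sat_add_8 is hypothetical, and the sketch
       assumes that both inputs already lie within the [-128, 127] range.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of an 8-bit saturating addition on values held in
 * 64-bit signed integers. Assumes both inputs are already within
 * [-128, 127], so the intermediate sum cannot overflow.
 */
static int64_t sat_add_8(int64_t lhs, int64_t rhs)
{
    int64_t const sum = lhs + rhs;
    if (sum > INT8_MAX) {
        return INT8_MAX; /* clamp to the upper bound */
    }
    if (sum < INT8_MIN) {
        return INT8_MIN; /* clamp to the lower bound */
    }
    return sum;
}
```

       With this model sat_add_8(100, 100) yields 127 and
       sat_add_8(-100, -100) yields -128, while sums that fit in 8 bits are
       returned unchanged. The input and output type is the same 64-bit
       signed integer throughout; only the range of values is restricted.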
       ┌─────────┬──────────────────┬────────────────────────────────────────┐
       │ symbol  │ type name        │ short type description                 │
       ├─────────┼──────────────────┼────────────────────────────────────────┤
       │         │ void             │ represents "nothing"                   │
       ├─────────┼──────────────────┼────────────────────────────────────────┤
       │         │ undefined        │ represents raw, uninterpreted values   │
       │         │                  │ loaded from memory                     │
       ├─────────┼──────────────────┼────────────────────────────────────────┤
       │ atom    │ atom             │ represents values that represent       │
       │         │                  │ themselves                             │
       ├─────────┼──────────────────┼────────────────────────────────────────┤
       │ double  │ double           │ the C double                           │
       ├─────────┼──────────────────┼────────────────────────────────────────┤
       │ float   │ float            │ the C float                            │
       ├─────────┼──────────────────┼────────────────────────────────────────┤
       │ int     │ signed integer   │ signed 64-bit integer, in              │
       │         │                  │ twos-complement format                 │
       ├─────────┼──────────────────┼────────────────────────────────────────┤
       │ uint    │ unsigned integer │ unsigned 64-bit integer                │
       ├─────────┼──────────────────┼────────────────────────────────────────┤
       │ pid     │ process id       │ represents actor (process) identifiers │
       ├─────────┼──────────────────┼────────────────────────────────────────┤
       │ pointer │ pointer          │ represents memory addresses            │
       └─────────┴──────────────────┴────────────────────────────────────────┘

   Weird types
       There are two "weird" types, which are not immediately useful: void
       and undefined.

       void   The void type represents the concept of "nothing". It has no
              bit representation, and exists only as a flag marking a
              register as containing nothing.

       undefined
              The undefined type represents values loaded from memory that
              have not yet been interpreted. Attempts to interpret such
              values by any instruction other than cast will abort execution
              of the offending process immediately.

              However, values of the undefined type may be safely handled as
              long as they are not interpreted. The following instructions
              may be used to shuttle undefined values around:

              •   copy
              •   move
              •   swap
              •   return

              Since they do not try to use the values in any way, they do
              not have to interpret them and do not run afoul of the
              "undefined cannot be interpreted" rule.

INSTRUCTION FORMATS AND ENCODING
       The general instruction format looks like this:

            63                         16│15     0
           ┌─────────────────────────────┼────────┐
           │ workload                    │ opcode │
           └─────────────────────────────┴────────┘

       Every instruction is encoded on 64 bits, with the opcode always
       occupying the lowest half-word.

       If an instruction stores a value in a register, the output register is
       always encoded in the third least-significant byte, immediately
       adjacent to the opcode.

   N-format
       An instruction with no (N) operands.

            63                         16│15     0
           ┌─────────────────────────────┼────────┐
           │                             │ opcode │
           └─────────────────────────────┴────────┘

       Bits 63-16 are reserved, and have unspecified value.

   T-format
       An instruction with a three-way (triple - T) register access. T-format
       instructions are used, for example, for arithmetic and logic
       operations, which have two inputs and a single output.

            63       40│39 32│31 24│23 16│15     0
           ┌───────────┼─────┼─────┼─────┼────────┐
           │           │ snd │ src │ dst │ opcode │
           └───────────┴─────┴─────┴─────┴────────┘

       Bits 63-40 are reserved, and have unspecified value.

       When the order is relevant, src is the left-hand side of the
       operation, and snd is the right-hand side of the operation.

   D-format
       An instruction with a two-way (double - D) register access.

            63             32│31 24│23 16│15     0
           ┌─────────────────┼─────┼─────┼────────┐
           │                 │ src │ dst │ opcode │
           └─────────────────┴─────┴─────┴────────┘

       Bits 63-32 are reserved and have unspecified value.

   S-format
       An instruction with a one-way (single - S) register access.

            63                   24│23 16│15     0
           ┌───────────────────────┼─────┼────────┐
           │                       │ rgr │ opcode │
           └───────────────────────┴─────┴────────┘

       Bits 63-24 are reserved and have unspecified value.
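       The register-only formats described so far can be taken apart with
       plain shifts and masks. The C sketch below decodes and re-encodes a
       T-format word according to the bit positions in the diagrams. It is
       an illustration, not code from the Viua sources: the names t_format,
       decode_t, and encode_t are hypothetical, and each operand is treated
       as a raw byte because the layout inside an operand byte is not
       described here.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a T-format instruction, using the documented bit positions. */
struct t_format {
    uint16_t opcode; /* bits 15-0  */
    uint8_t dst;     /* bits 23-16 */
    uint8_t src;     /* bits 31-24 */
    uint8_t snd;     /* bits 39-32 */
};

/* Extract the fields; reserved bits 63-40 are ignored. */
static struct t_format decode_t(uint64_t word)
{
    struct t_format f;
    f.opcode = (uint16_t)(word & 0xffffu);
    f.dst = (uint8_t)((word >> 16) & 0xffu);
    f.src = (uint8_t)((word >> 24) & 0xffu);
    f.snd = (uint8_t)((word >> 32) & 0xffu);
    return f;
}

/* Pack the fields back into a word; reserved bits are left as zero. */
static uint64_t encode_t(struct t_format f)
{
    return ((uint64_t)f.snd << 32) | ((uint64_t)f.src << 24)
           | ((uint64_t)f.dst << 16) | (uint64_t)f.opcode;
}
```

       For the word 0x0000000302011234 the sketch yields opcode 0x1234, dst
       0x01, src 0x02, and snd 0x03. A D-format word can be decoded the same
       way by ignoring snd, and an S-format word by reading only the byte in
       bits 23-16.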
   M-format
       An instruction with a two-way register access, and a 32-bit memory
       offset. M, because the format is used for memory loads and stores.

            63             32│31 24│23 16│15     0
           ┌─────────────────┼─────┼─────┼────────┐
           │ offset          │ src │ dst │ opcode │
           └─────────────────┴─────┴─────┴────────┘

       For loads and stores, dst is the register into which the data will be
       put or from which it will be taken, and src is the register that holds
       the pointer to the memory location.

       Every memory instruction has a unit embedded in its opcode:

       ┌──────┬────────┬─────────────────────────────────────────────────────┐
       │ unit │ sizeof │ meaning                                             │
       ├──────┼────────┼─────────────────────────────────────────────────────┤
       │ b    │ 1      │ byte                                                │
       ├──────┼────────┼─────────────────────────────────────────────────────┤
       │ h    │ 2      │ half-word                                           │
       ├──────┼────────┼─────────────────────────────────────────────────────┤
       │ w    │ 4      │ word                                                │
       ├──────┼────────┼─────────────────────────────────────────────────────┤
       │ d    │ 8      │ double-word                                         │
       ├──────┼────────┼─────────────────────────────────────────────────────┤
       │ q    │ 16     │ quad-word                                           │
       ├──────┼────────┼─────────────────────────────────────────────────────┤
       │ o    │ 32     │ octa-word                                           │
       ├──────┼────────┼─────────────────────────────────────────────────────┤
       │ x    │ 64     │ hexadecimal-word                                    │
       ├──────┼────────┼─────────────────────────────────────────────────────┤
       │ u    │ 128    │ duotrigesimal-word                                  │
       └──────┴────────┴─────────────────────────────────────────────────────┘

       This unit appears as a part of an instruction's name. For example, the
       lm (load memory) instruction may be written as lb (load byte), or as
       lq (load quad-word). The unit-less base form is not actually usable,
       as the assembler will reject it.

   Struct layout considerations
       Let us examine a sample of code which would load a quad-word from
       address 0xdeadbeef into register $2.l:

           li $1.l, 0xdeadbeef
           lq $2.l, $1.l, 0

       The value of the offset operand is 0, meaning that there is no
       additional offset to add to the base pointer from src. This is the
       simplest form of the lq (load quad-word) instruction, and is
       equivalent to loading the first element of an array at an address.

       However, what if we wanted to load, for example, the fourth
       double-word at an address? Something like the following C code:

           uint64_t arr[];
           uint64_t ele = arr[3]; /* what do we do here? */

       The translation is simple:

           ; get the base pointer ie, the address of the array
           li $1.l, arr
           ; load the double-word at offset (unit * 3) from
           ; the base address
           ld $2.l, $1.l, 3

       This design makes sequential access easy, but also nudges compilers to
       lay structures out in memory in a certain way for greatest flexibility
       and efficiency. Given the following C struct:

           struct example {
               uint64_t dw;
               uint16_t hw;
               uint8_t b1;
               uint32_t w;
               uint8_t b2;
           };

       let us examine the code accessing the struct's fields if the compiler
       could not reorder fields and had to lay them out in memory in the
       order specified by the programmer (assuming no padding is added for
       alignment):

           ; get the address of the struct
           li $1.l, ...

           ; field: dw (bytes 0-7)
           ld void, $1.l, 0

           ; field: hw (bytes 8-9, ie, half-word 4)
           lh void, $1.l, 4

           ; field: b1 (byte 10)
           lb void, $1.l, 10

           ; field: w (bytes 11-14)
           ; The base pointer has to be adjusted because
           ; there is no way to make the offset work here.
           addi $2.l, $1.l, 11u
           lw void, $2.l, 0

           ; field: b2 (byte 15)
           lb void, $1.l, 15

       Most cases are simple, but there are two weird ones: loading b2 and w
       looks "bad" with the high offset or base address manipulation.

       To fix the situation the compiler could either insert padding, or
       reorder the fields in such a way:

           struct example {
               uint8_t b1;
               uint8_t b2;
               uint16_t hw;
               uint32_t w;
               uint64_t dw;
           };

       Then, the access code is greatly simplified:

           ; get the address of the struct
           li $1.l, ...

           ; field: b1
           lb void, $1.l, 0

           ; field: b2
           lb void, $1.l, 1

           ; field: hw
           lh void, $1.l, 1

           ; field: w
           lw void, $1.l, 1

           ; field: dw
           ld void, $1.l, 1

       In general, it is a good idea to put the smallest fields first, and to
       align every field on its own size (ie, align half-words on 2 bytes,
       words on 4 bytes, etc). This will make field accesses easy to write
       and read.

       The offset being multiplied by unit also means that putting the
       smallest fields first allows manipulating bigger structs than putting
       the biggest fields first would. While the maximum reach of the lb
       instruction is a respectable 4'294'967'296 bytes (4GiB), the reach of
       the lq instruction is an even more impressive 68'719'476'736 bytes
       (64GiB), and the reach of the lu (load duotrigesimal-word) is a
       stunning 549'755'813'888 bytes (512GiB), all with a single
       instruction! Ordering the fields from the smallest lets a single
       instruction access any field in structs of up to 512GiB without any
       extra processing.

   I-format
       An instruction with one-way register access, and a 32-bit wide
       immediate (hence I) value.

            63             32│31 24│23 16│15     0
           ┌─────────────────┼─────┼─────┼────────┐
           │ immediate       │     │ dst │ opcode │
           └─────────────────┴─────┴─────┴────────┘

       Bits 31-24 are reserved and have unspecified value.

       One of the most useful I-format instructions is lui ie, the Load Upper
       Immediate, which may be familiar to people who have some RISC-V
       background. It loads an immediate value into the higher 32 bits of a
       register:

           lui $1.l, 0xdeadbeef

   U-format
       An instruction with a two-way register access, and a 32-bit immediate
       value. It is a useful (hence U) format.

            63             32│31 24│23 16│15     0
           ┌─────────────────┼─────┼─────┼────────┐
           │ immediate       │ src │ dst │ opcode │
           └─────────────────┴─────┴─────┴────────┘

       This format is used for eg, the addi instruction, which adds a 32-bit
       immediate to the value stored in the src register, and places the
       result in the dst register.
       Combined with the I-format lui instruction, addi can be used to
       efficiently load a 64-bit value:

           ; load the upper half of the value
           lui $1.l, 0xdeadbeef
           ; add the lower half of the value
           addi $1.l, $1.l, 0xbadc0ffe

OPCODE ENCODING
       Opcodes are encoded on the lowest 16 bits of an instruction.

            15   13│12   10│ 9 │8                0
           ┌───────┼───────┼───┼──────────────────┐
           │ fmt   │ flg   │ u │ operation        │
           └───────┴───────┴───┴──────────────────┘

       Bits 15-13 encode the instruction format. The formats are described in
       section INSTRUCTION FORMATS AND ENCODING.

       Bits 12-10 encode the flags, which are mostly relevant for memory and
       arithmetic instructions.

       Bit 9 encodes the unsigned flag, which is relevant for some arithmetic
       instructions, and for the luiu (load upper immediate unsigned)
       instruction.

       Bits 8-0 encode the actual operation code.

       Instruction dispatch depends on bits 15-13 and 9-0. These are 13 bits
       in total, which means there are 8192 possible instructions.

LISTING OF INSTRUCTIONS
       The following instructions are available:

       ┌──────────────────┬─────┬────────────────────────────────────────────┐
       │ instruction      │ fmt │ meaning                                    │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ noop             │ N   │ no operation                               │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ halt             │ N   │ halt the process                           │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ ebreak           │ N   │ break execution, shows environment dump by │
       │                  │     │ default                                    │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ ecall            │ N   │ issue an environment call (a system call)  │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ add              │ T   │ addition                                   │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ addi             │ U   │ add immediate                              │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ addiu            │ U   │ add (unsigned) immediate                   │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ sub              │ T   │ subtraction                                │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ subi             │ U   │ subtract immediate                         │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ subiu            │ U   │ subtract (unsigned) immediate              │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ mul              │ T   │ multiplication                             │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ muli             │ U   │ multiply by immediate                      │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ muliu            │ U   │ multiply by (unsigned) immediate           │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ div              │ T   │ division                                   │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ divi             │ U   │ divide by immediate                        │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ diviu            │ U   │ divide by (unsigned) immediate             │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ earithmeticwidth │ D   │ limit bit width of some operations         │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ bitshl           │ T   │ left bit shift                             │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ bitshr           │ T   │ right bit shift                            │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ bitashr          │ T   │ arithmetic (sign-preserving) right bit     │
       │                  │     │ shift                                      │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ bitrol           │ T   │ left bit roll                              │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ bitror           │ T   │ right bit roll                             │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ bitand           │ T   │ bitwise "and"                              │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ bitor            │ T   │ bitwise "or"                               │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ bitxor           │ T   │ bitwise "xor"                              │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ bitnot           │ D   │ bitwise "not"                              │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ eq               │ T   │ lhs = rhs                                  │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ lt               │ T   │ lhs < rhs                                  │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ gt               │ T   │ lhs > rhs                                  │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ cmp              │ T   │ compare two values, returning -1 for       │
       │                  │     │ less-than, 0 for equal-to, and 1 for       │
       │                  │     │ greater-than relation                      │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ and              │ T   │ logical "and"                              │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ or               │ T   │ logical "or"                               │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ not              │ D   │ logical "not"                              │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ io_submit        │ T   │ submit an I/O request to an I/O port       │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ io_wait          │ T   │ wait for a completion of an I/O request    │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ io_shutdown      │ T   │ shut an I/O port down                      │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ io_ctl           │ T   │ inspect and manipulate an I/O port         │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ io_peek          │ D   │ inspect an in-flight I/O request           │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ frame            │ S   │ allocate a frame for a call                │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ call             │ D   │ call a function                            │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ actor            │ D   │ spawn a new actor (a concurrent function   │
       │                  │     │ call)                                      │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ return           │ S   │ return from a function call                │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ copy             │ D   │ copy bits between two registers            │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ move             │ D   │ move bits between two registers, erasing   │
       │                  │     │ the source                                 │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ swap             │ D   │ swap bits between two registers            │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ if               │ D   │ take a conditional branch                  │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ gts              │ D   │ store value in the Global Table            │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ gtl              │ D   │ load value from the Global Table           │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ atom             │ S   │ load an atom                               │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ double           │ S   │ load a double                              │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ float            │ I   │ load a float                               │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ self             │ S   │ load the PID of the current actor          │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ lui              │ I   │ load upper immediate                       │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ luiu             │ I   │ load upper (unsigned) immediate            │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ arodp            │ I   │ create a pointer to a .rodata symbol       │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ atxtp            │ I   │ create a pointer to a .text symbol         │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ sm               │ M   │ store in memory                            │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ lm               │ M   │ load from memory                           │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ cast             │ I   │ cast a raw value loaded from memory        │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ aa               │ M   │ allocate automatic memory                  │
       ├──────────────────┼─────┼────────────────────────────────────────────┤
       │ ad               │ M   │ allocate dynamic memory                    │
       └──────────────────┴─────┴────────────────────────────────────────────┘

       Detailed descriptions are provided in viua-ops(7).

ENVIRONMENT
       Some of the machine's behaviour is affected by its internal
       environment eg, the bit width of some arithmetic operations (see the
       earithmeticwidth instruction). The following table lists the
       environment registers:

       ┌──────────┬────────┬─────────────────────────────────────────────────┐
       │ register │ values │ meaning                                         │
       ├──────────┼────────┼─────────────────────────────────────────────────┤
       │ earw     │ 1-64   │ bit width of styled arithmetic operations       │
       └──────────┴────────┴─────────────────────────────────────────────────┘

MEMORY
       Memory provides additional storage and work space to a process, in
       addition to the registers. While registers are fixed-size and can only
       hold a limited amount of data, objects stored in memory can be nearly
       unlimited in size and complexity.

   The .rodata section
       Objects in the .rodata section are available to all functions.

   The .data and the .bss sections
       The .data and the .bss sections are not used by the VM, and the
       concept of a globally accessible read-write segment of memory does not
       exist.

   Layout
       Executable code ie, the .text section, and the read-only global data
       ie, the .rodata section are allocated in the regions of memory with
       low addresses. Programs are not able to manipulate memory in those
       regions.

       Memory that programs are able to manipulate begins near the highest
       address and grows downwards, as the program allocates more memory.
       The first read-write byte a process can access is always allocated at
       the address 0xbfff'ffff'ffff'fff0.

       Each process receives a page of memory when it starts. The size of a
       page is 256 bytes, divided into lines of 16 bytes.
       NOTE: The 256-byte pages are just a temporary measure during the
       alpha stage of development, to make it easier to inspect the
       machine's internal state in tests. This is not the intended final
       page size.

   The stack
       Memory on the stack is allocated using the aa (allocate automatic)
       instruction. The virtual machine manages stack memory automatically:
       the programmer is expected to request memory to be allocated, but the
       deallocation is handled by the machine.

       Each allocation is assigned to a call frame, and the machine drops
       allocated chunks as soon as the frame they were allocated in is popped
       off the call stack. If a function needs memory to return a result, the
       memory MUST be allocated by the caller.

       Usable stack memory lies between the addresses stored in the fp and
       sbrk registers:

       ┌────────────────────┬────────────────────────────────────────────────┐
       │ fp (frame pointer) │ Points at the first byte directly accessible   │
       │                    │ from within the frame.                         │
       ├────────────────────┼────────────────────────────────────────────────┤
       │ sbrk (stack break) │ Points at the byte one past the last byte      │
       │                    │ directly accessible from within the frame.     │
       └────────────────────┴────────────────────────────────────────────────┘

       Since the stack grows downwards, the fp is also the highest address,
       and the sbrk-plus-one the lowest address, directly accessible from
       within the frame. To get the amount of memory allocated to a process,
       simply subtract sbrk from fp.

       Objects located in the .rodata section are treated differently and are
       available from anywhere; however, since they exist outside of the
       stack, this does not break the rule of "no direct access to the stack
       beyond the frame's boundaries from within the frame".

       What does direct access mean? It means that no stack address above the
       fp or below-or-at the sbrk is accessible, unless the frame received a
       pointer to it from the caller.

   The heap
       Memory on the heap is allocated using the ad (allocate dynamic)
       instruction.

       Heap memory is not implemented at the current moment.

SEE ALSO
       viua-asm(5), viua-ops(7).

       elf(5).

       Patterson, David A. and Waterman, Andrew. The RISC-V Reader.
       Strawberry Canyon LLC, 2017. ISBN 978-09-9924-911-6

       Web site ‹https://viuavm.org›

       Source code repository ‹https://git.sr.ht/~maelkum/viuavm›

VIUA VM
       Part of the viua(1) toolchain.