Model and description of Viua VM

In theory the implementation should follow a model, but Viua is in such an early stage of development that no model exists for it yet. This document is an attempt at creating an abstract model of Viua VM.

This document is meant to explain and describe the general concepts and ideas behind Viua. For example: what a process, a call frame, and a stack are, and how they are related. Apart from being a glossary of a kind, it also aims to explain how various mechanisms are intended to work in Viua, e.g. exception handling (throw-catch), function calls, and concurrency.

Descriptions of individual instructions are not included in this document, and are provided on the ISA description page.


Registers and register sets

Viua is a register-based VM. Programs running on it manipulate values held in registers instead of on a stack.

Summary

On the lowest level, Viua VM programs are sequences of instructions. Instructions fetch, manipulate (modify, delete, move, copy, etc.), and produce values. Values are stored in registers. This section describes what a register is, what a register set is, and how values can be stored in and fetched from registers.

Register sets

A "register set" is an array of registers with limited capacity. The size of a register set is determined statically, at compile time. There are three main register sets: local, static, and global.

Values held in registers from these register sets can be manipulated directly by instructions, and can be accessed by copy, move, swap, and delete instructions.

There are also a few "special" register sets (that are not really register sets). Values in their registers cannot be manipulated directly, and must be first brought into one of the three main register sets. These sets are:

All register sets are process-local, meaning that no register set is shared between processes.

Local register set

Lifetime

This register set's lifetime is bound to the call frame for which it has been spawned, or to the closure for which it has been created. At any point in time there may be many local register sets spawned.

In the first case, the lifetime of the register set (and its registers and their contents) can be statically determined by analysing when the call frame will be popped off the stack.

In the second case, the lifetime is more dynamic since the closure can be returned as a value from a function and thus outlive its original environment. Determining the lifetime of the local register set of a closure requires analysing the lifetime of the closure, which can almost always be done.

Creation

Local register sets are spawned by either the frame or the closure instruction. The capacity of a register set spawned by the frame instruction is stated explicitly. A register set spawned using the closure instruction inherits its capacity from its enclosing environment.

-- spawn a frame whose local register set will contain 20 registers
-- the frame will accept no parameters
--
--  frame {no-of-parameters} {no-of-local-registers}
frame %0 %20

-- closure stored in register 1, using body of function foo/0
--
--  closure {index-of-register-where-closure-is-stored} {function-implementing-body-of-the-closure}
closure %1 foo/0
                    

Destruction

Local register sets spawned using frame are destroyed when the frame they are associated with is popped off the call stack by a return or tailcall instruction, or during stack unwinding when an exception is thrown.

Closure-local register sets are destroyed when their closure is destroyed. During its lifetime a closure-local register set may be pushed to the stack and popped off it many times without being destroyed. Closures obey the lifetime rules of Viua values, which are described in another section.

Access

The user program has access to the local register set of only the top-most frame on the stack. It is not possible to access registers in local register set of any frame lower on the stack. Also, contents of local register sets of lower frames do not have any effect on contents of local register sets of the upper frames. Local register sets are isolated from each other, and disposable - they are created anew for every call (closure-local register sets being the exception).
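The isolation rule above can be illustrated with a small model. This is only a sketch in Python, not Viua code and not how the VM is implemented; the Frame and Stack classes and the function names main/0 and foo/0 are made up for the illustration, and None stands for an empty register.

```python
class Frame:
    """One call frame with its own local register set (None = empty register)."""

    def __init__(self, function_name, capacity):
        self.function_name = function_name
        self.local = [None] * capacity


class Stack:
    """A call stack; only the top-most frame's registers are reachable."""

    def __init__(self):
        self.frames = []

    def call(self, function_name, capacity):
        # every call gets a brand-new, empty local register set
        self.frames.append(Frame(function_name, capacity))

    def ret(self):
        # popping the frame discards its local register set
        self.frames.pop()

    @property
    def local(self):
        return self.frames[-1].local


stack = Stack()
stack.call("main/0", capacity=4)
stack.local[1] = "Hello World!"

stack.call("foo/0", capacity=2)
callee_sees = stack.local[1]     # empty: foo/0 cannot see main/0's registers
stack.local[1] = 42

stack.ret()
caller_sees = stack.local[1]     # main/0's register is unaffected by foo/0
```

Closure-local register sets are deliberately left out of this model, since they survive across calls.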

Capacity

Capacity of each local register set may be different and is determined by the user program. Capacity of local register sets is limited to 4'294'967'296 (2^32) registers.

Miscellaneous notes

Tail calls do not inherit local register sets of their original frames. They start with a fresh register set.

Static register set

Lifetime

Static register sets live as long as the process inside which they have been spawned.

Creation

Static register sets are created before their first use. This means that they may be spawned eagerly when a process is created for every function that uses a static register set, or lazily - when the runtime detects a function is about to access its static register set. It does not make a difference to the function when the static register set is allocated as long as it can use it.

Destruction

Static register sets are destroyed after the last frame of a process is popped off the stack, i.e. when the process is no longer able to run any more instructions.

Access

Static register sets are assigned per-function and are local to a single process. A function foo/0 does not have access to static registers of function bar/0. If function foo/0 inside process A puts 42 in its first static register, the same function foo/0 inside process B will not see that value inside its first static register. The user program also has access only to the static register set of the function that is currently being executed by the top-most frame on the stack.

Capacity

Capacity of static register sets is currently fixed at 16 registers.

Miscellaneous notes

User functions should always check if their static registers are empty before using them. It is a function's responsibility to initialise its own static registers - a static register set will always be provided when the function requests it, but it will be completely empty during the first access. Checking for empty registers can be done using the isnull instruction.
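The check-then-initialise pattern can be sketched as follows. This is a Python model of the behaviour described in this section, not Viua code; the Process class, the counter function, and the name counter/0 are hypothetical, and lazy creation is only one of the two allocation strategies the text allows.

```python
STATIC_CAPACITY = 16  # capacity stated in the text above


class Process:
    def __init__(self):
        # static register sets keyed by function name, created lazily
        self._static = {}

    def static_of(self, function_name):
        if function_name not in self._static:
            self._static[function_name] = [None] * STATIC_CAPACITY
        return self._static[function_name]


def counter(process):
    """Hypothetical function counter/0 keeping a call count in static register 0."""
    static = process.static_of("counter/0")
    if static[0] is None:      # the kind of check isnull is meant for
        static[0] = 0          # first access: the function initialises the register
    static[0] += 1
    return static[0]


a, b = Process(), Process()
results = [counter(a), counter(a), counter(b)]
```

Here results is [1, 2, 1]: the value persists between calls inside process a, while process b starts from an empty static register set of its own.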

Global register set

Lifetime

Global register set's lifetime is bound to a process for which it has been spawned.

Access

The user program has access to the global register set at all times, and from any call frame on the stack. There are no restrictions similar to those of static or local register sets, except that global register set is spawned per-process and isolated between processes.

Capacity

Capacity of global register sets is currently fixed at 255 registers.

Miscellaneous notes

There are no special notes related to the global register set.

Registers

Registers are "slots" that are used to hold the values Viua VM instructions operate on. A register can hold any value representable by a Viua program, or be empty.

Register indexes

Registers are indexed slots in a register set. Register indexes start from 0, and go to register set's capacity minus 1. For example, a register set with capacity 16 has registers numbered from 0 to 15.
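As an illustration, a register set can be modelled as a fixed-size array with checked access. The sketch below is Python, not Viua; RegisterSet and VMError are invented names, and the two error cases mirror the exceptions this document describes for out-of-range indexes and for reads from empty registers.

```python
class VMError(Exception):
    """Stand-in for an exception thrown by the VM."""


class RegisterSet:
    def __init__(self, capacity):
        self.capacity = capacity
        self._slots = [None] * capacity   # None marks an empty register

    def _check(self, index):
        if not 0 <= index < self.capacity:
            raise VMError(f"register index out of range: {index}")

    def put(self, index, value):
        self._check(index)
        self._slots[index] = value

    def get(self, index):
        self._check(index)
        if self._slots[index] is None:
            raise VMError(f"read from empty register: {index}")
        return self._slots[index]

    def is_null(self, index):
        self._check(index)
        return self._slots[index] is None


static = RegisterSet(16)
static.put(15, "last")        # highest valid index is capacity minus 1
```

With a capacity of 16, put(16, ...) fails while put(15, ...) succeeds, matching the 0 to 15 range in the example above.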

Register addressing

When an instruction wants to fetch a value held in a register, or to put a value in a register, it must properly address the register it wants to access. If the register address supplied by one of the instruction's operands is not valid, the VM throws an exception. To be valid the address must consist of three parts: a fetch mode sigil, a register index, and a register set specifier.

Register set specifiers

Register address must include the register set which should be used.

Local register set

Identified by the local keyword. This register set is resolved at compile time.

Static register set

Identified by the static keyword. This register set is resolved at compile time.

Global register set

Identified by the global keyword. This register set is resolved at compile time.

"Current" register set

Identified by the current keyword. The register set to use is determined at runtime, based on what the "current" register set means at that exact moment in the program. Depending on the program's state, "current" may refer to the local, static, or global register set.

Fetch modes

Fetch mode instructs the VM how it should fetch the value an instruction requests. There are three fetch modes: plain, pointer-dereference, and register-indirect. In source code they are identified by sigils.

"Plain" fetch mode

Identified by percent sign - "%".

The simplest fetch mode. It involves just fetching a value from a register at given index from a given register set. For example:

                    -- print contents of register 1 from local register set
                    print %1 local

                    -- copy contents of register 4 from static register set
                    -- into register 2 from local register set
                    copy %2 local %4 static

                    -- store text in register 1 from global register set
                    text %1 global "Hello World!"
                

Box analogy

If registers were boxes and values were balls the "plain" fetch mode would mean just taking the ball from a box.

"Pointer dereference" fetch mode

Identified by star sign - "*".

This mode is composed of two phases. The first one involves fetching a value of a register at specified index from a specified register set. In the second phase the VM dereferences the pointer.
The value obtained after dereferencing is the one supplied to the instruction.

Value fetched by the first phase of this mode MUST be a pointer. Otherwise the VM throws an exception. An exception is also thrown if the pointer is expired.

An example:

                    -- store text in register 1 from local register set
                    text %1 local "Hello World!"

                    -- store pointer to a value in register 1 from local register set
                    -- in register 2 from local register set
                    ptr %2 local %1 local

                    -- print the pointer
                    -- "TextPointer" will be printed to standard output
                    print %2 local

                    -- print the value pointed-to by the pointer
                    -- "Hello World!" will be printed to standard output
                    print *2 local
                

Pointers to values

It is important to note that Viua pointers point to values. The code below works even though the text value was moved between taking the pointer to it and dereferencing the pointer.

                        text %1 local "Hello World!"
                        ptr %2 local %1 local

                        -- move the value from register 1 from local register set
                        -- to register 4 from local register set
                        move %4 local %1 local

                        -- this still works and prints "Hello World!"
                        print *2 local
                    

Box analogy

If registers were boxes and values were balls, the "pointer dereference" fetch mode would mean putting your hand in a box and, instead of getting a ball, getting a piece of string. You can pull the string to get the ball that is attached to the other end, no matter where the ball currently is. Be wary, though, as there is no guarantee that there actually will be a ball attached to the other end of the string (in which case you get an exception)!
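The "string attached to the ball" idea can be modelled in a few lines. This is a Python sketch, not the VM's implementation; the Value and Pointer classes and the alive flag are assumptions made for the illustration. The pointer holds on to the value itself, so moving the value between registers does not affect it, while destroying the value makes the pointer expire.

```python
class Value:
    """A Viua-like value; `alive` becomes False when the value is destroyed."""

    def __init__(self, payload):
        self.payload = payload
        self.alive = True


class Pointer:
    """Points at a Value object, not at the register holding it."""

    def __init__(self, target):
        self.target = target

    def deref(self):
        if not self.target.alive:
            raise RuntimeError("dereference of expired pointer")
        return self.target.payload


local = [None] * 8
local[1] = Value("Hello World!")       # text %1 local "Hello World!"
local[2] = Pointer(local[1])           # ptr %2 local %1 local

local[4], local[1] = local[1], None    # move %4 local %1 local
after_move = local[2].deref()          # still reaches the moved value

local[4].alive = False                 # the value is destroyed
local[4] = None                        # any later local[2].deref() now raises
```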

"Register indirect" fetch mode

Identified by "at" sign - "@".

This mode is composed of two phases. The first one involves fetching a value of a register at specified index from a specified register set. In the second phase the VM fetches a value from the register index specified by the integer fetched in the first phase.
The second phase fetches from the same register set as the first one.
The value obtained after the second fetch is the one supplied to the instruction.

Value fetched by the first phase of this mode MUST be an integer. Otherwise the VM throws an exception. An exception is also thrown if the register that would be accessed in the second phase does not exist (i.e. the index is out of bounds), or is empty (i.e. there is no value to be fetched; this is not true for the isnull instruction).

An example:

                    -- store text in register 1 from local register set
                    text %1 local "Hello World!"

                    -- store integer 1 in register 2 from local register set
                    istore %2 local 1

                    -- print the value using register-indirect fetch mode
                    -- "Hello World!" will be printed to standard output
                    print @2 local
                

Box analogy

If registers were boxes and values were balls, the "register indirect" fetch mode would mean putting your hand in a box and, instead of getting a ball, getting a piece of paper with a number written on it. You then fetch the ball from the box whose number you read from the piece of paper.
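The three fetch modes can be compared side by side in a small model. The sketch below is Python rather than Viua, and the fetch function and Pointer class are hypothetical; it leaves out bounds checks, empty-register checks, and pointer expiry to keep the two-phase structure visible.

```python
class Pointer:
    def __init__(self, target):
        self.target = target   # the pointed-to value (expiry is not modelled)


def fetch(registers, sigil, index):
    """Return the value an operand such as %1, *2 or @3 resolves to."""
    value = registers[index]           # phase one is the same for every mode
    if sigil == "%":                   # plain
        return value
    if sigil == "*":                   # pointer dereference
        if not isinstance(value, Pointer):
            raise TypeError("'*' requires a pointer")
        return value.target
    if sigil == "@":                   # register indirect
        if not isinstance(value, int):
            raise TypeError("'@' requires an integer")
        return registers[value]        # same register set as phase one
    raise ValueError(f"unknown fetch mode: {sigil}")


local = [None] * 8
local[1] = "Hello World!"
local[2] = Pointer(local[1])
local[3] = 1

plain = fetch(local, "%", 1)           # the text itself
through_pointer = fetch(local, "*", 2) # the text the pointer points to
indirect = fetch(local, "@", 3)        # the text in register 1
```

All three operands resolve to the same text here, while fetch(local, "%", 3) yields the integer 1 and fetch(local, "*", 1) is rejected because register 1 does not hold a pointer.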

Common exceptions

Using empty registers as source operands will result in an exception being thrown by the VM. The isnull instruction may be used to check if a register is empty.

Accessing "out of range" registers either as destination or source operands will result in an exception being thrown by the VM. There is no instruction that can be used to check if a register index is "in range".

Using an incorrect fetch mode for either destination or source operands will result in an exception being thrown by the VM. Sometimes several fetch modes are correct from the VM's point of view. It is the programmer's responsibility to ensure that the fetch mode that is right from their point of view is used. For example, for pointers either the "plain" or the "pointer dereference" fetch mode can be used, but the value supplied to the instruction will be different.

Values

Values are instances of types supported by Viua VM.

Values as entities

From the user's point of view values are "whole pieces" - the VM does not provide instructions to access, for example, the third byte of a floating point value. If a value is an instance of a compound type (e.g. a vector) the VM may expose instructions to access individual elements of the compound value, but each accessed element will be a Viua value. There is no way for a program to learn about the internal structure of a value short of invoking an FFI function and then casting the value to an unsigned char* to access the bytes that form the value.

Also, there is no concept of "memory" in the traditional sense. Viua VM programs do not request "eight bytes for an integer", they just request an integer. The concept of "memory" is abstracted away in Viua VM. Programs cannot put values in "memory", values can only exist in registers.

Value semantics

Viua programs see values as "values", not as "references to values", and this rule is applied consistently. It is probably most visible during function calls: vectors are passed by value; they neither decay into pointers (a la C arrays) nor are passed by reference (a la Python lists).

Moves

To avoid frequent copying which would be implied by the no-references rule, many Viua instructions and mechanisms use move semantics instead. For example: function returns, and exception throwing are always done by moving values.

The thinking behind this is that if you want a copy you can create it yourself before the move happens, but it is not easy the other way around, that is: if you wanted a move but the language by default gives you a copy, what can you do about it?

Moves in Viua are real moves instead of "steal-my-guts" moves. A moved value really is moved from one place to another, e.g. from one register to another, from a register into a slot in a vector, etc.

Inter-process communication and message passing

Moves are not used in message passing and other forms of inter-process communication.

Although, in theory, values could be moved between processes without violating the no-sharing-between-processes rule, they are copied when sent as messages. This provides consistency: there is no difference between sending a value to a process running on the same VM instance (i.e. in the same underlying address space) and sending it over a network to a process running in a different VM instance (and a different address space).

The same thinking is applied when values are transferred between processes as exceptions or return values during process joins. Even though a value is normally moved in such cases, it is copied when the return or catch involves crossing process boundaries.
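The difference between a move inside a process and a copy across process boundaries can be sketched like this. It is a Python model, not Viua code; the move and send helpers and the list used as a mailbox are invented for the illustration, and a Python list stands in for a Viua vector.

```python
import copy


def move(registers, dst, src):
    """Move: the value changes place and the source register becomes empty."""
    registers[dst], registers[src] = registers[src], None


def send(mailbox, registers, src):
    """Send: the receiver gets an independent copy of the value."""
    mailbox.append(copy.deepcopy(registers[src]))


sender = [None] * 4
sender[1] = [1, 2, 3]              # a vector value

move(sender, 2, 1)                 # register 1 is now empty
mailbox_of_receiver = []
send(mailbox_of_receiver, sender, 2)

mailbox_of_receiver[0].append(4)   # the receiver mutates only its own copy
```

After this runs, the sender still holds [1, 2, 3] in register 2 while the receiver holds [1, 2, 3, 4], so nothing is shared between the two sides.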

Pointers

Pointers in Viua point to values, not to locations. For example, a pointer to a vector can be taken, and when this vector is moved (e.g. to another register, or moved as a parameter to a nested call) the pointer is still valid. This is in contrast to pointers known from C or C++.

Safety and expired pointers

Pointers in Viua may be considered safe by virtue of being aware of the fact whether the value they point to exists or not. While it is not possible to create a null pointer, it is possible that the value to which the pointer has been taken was destroyed. In such an event the pointer becomes "expired" and dereferencing it produces an exception (which can be handled).

Invalidation

Pointers become invalid when either the value they point to is destroyed (then they are said to have expired), or when they are sent as a message to another process as pointers are only valid inside the process in which they were created. When a pointer crosses process boundaries any dereference of it will produce an exception.

Viua does not prevent pointers from being sent as messages, but enforces the isolation of processes by making dereferences of pointers taken in process A illegal in process B. If such dereferences were allowed then processes could be made to share values, which is not permitted in Viua.
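One way to picture this rule is a pointer that remembers the process in which it was taken. The following Python sketch is only an assumption about how such a check could look, not a description of the VM's internals; the Pointer class, the owner_pid field, and the numeric process identifiers are hypothetical.

```python
class Pointer:
    def __init__(self, target, owner_pid):
        self.target = target
        self.owner_pid = owner_pid     # process in which the pointer was taken

    def deref(self, current_pid):
        if current_pid != self.owner_pid:
            raise RuntimeError("pointer dereferenced outside its origin process")
        return self.target


PID_A, PID_B = 1, 2                    # hypothetical process identifiers
pointer = Pointer("Hello World!", owner_pid=PID_A)

in_a = pointer.deref(current_pid=PID_A)   # fine: same process
# pointer.deref(current_pid=PID_B) raises: the pointer crossed a process boundary
```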

Nesting

It is possible to create pointers to pointers in Viua. Dereferencing must be done one nest-level at a time. This means that if a value is hidden behind a three-level pointer, three Viua instructions will be needed to get it.

However, it is impossible to make a pointer immediately pointing to itself, because pointers are taken to existing values and the pointer does not exist before its creation. Also, pointers cannot be rebound, so it is impossible to take a pointer and then rebind it to point to itself.

Data types

Description of basic (primitive and complex) data types available in Viua, and of the way user-defined types work in Viua.

Numeric data types

All numeric data types support basic arithmetic operations, and can be compared with each other.

Integers

Integers have unlimited size, and are always signed. They are intended for arithmetic (basic math) operations.

Floats

Typical, 64 bit, floating point numbers. Fast enough but come with their set of imperfections.

Decimals

Slower but more accurate than floating point numbers. No rounding issues known from floats, and with unlimited precision.

Booleans

Represent true, or false values. Can be obtained by comparison (e.g. eq), logic (e.g. and), and some rare other instructions (e.g. isnull).

Every Viua value is implicitly castable to the Boolean type. By default a value is cast to false. When a value is cast to true instead, the reasons are explained in the documentation for its data type.

Text

Viua provides a basic text data type. Text is visible to users as a sequence of characters (each character being a Unicode codepoint). Internally, text is encoded as UTF-8.

Common simple operations are provided as instructions: equality comparison, substring extraction, obtaining the character at an index, extracting common suffixes and prefixes, and concatenation.

Byte strings and bit strings

A byte string is a string whose elements are 8-bit unsigned integers. A bit string is a string whose elements are single bits.

Byte strings

Byte strings are useful for storing a variable amount of bytes. They are the typical unstructured bag-o'-bytes.

Byte strings may be freely converted to bit strings.

Bit strings

Bit strings are useful for storing bitmasks, and fixed-size signed or unsigned integers.

The VM provides instructions to perform (bitwise) or, and, not, xor operations.

Shifts and rotates are provided in "naive" and arithmetic variants: bitshl, bitshr, bitrol, bitror, bitashl, bitashr, bitarol, bitaror. Shifts modify bit strings in place, and the shifted-out part is produced as the output of the instruction. Rotates modify bit strings in place.
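The in-place behaviour of the "naive" left shift and left rotate can be modelled as below. This is a Python sketch under several assumptions that the text does not confirm: a bit string is represented as a list of 0 and 1 with the most significant bit first, vacated positions are filled with zeros, and shifting by more than the length empties the whole string. The shl and rol helpers are hypothetical names, and the arithmetic variants are not modelled.

```python
def shl(bits, n):
    """Logical left shift in place; returns the bits pushed out on the left."""
    n = min(n, len(bits))
    shifted_out = bits[:n]
    bits[:] = bits[n:] + [0] * n       # assumed zero fill on the right
    return shifted_out


def rol(bits, n):
    """Rotate left in place; no bits are lost, so nothing is returned."""
    if bits:
        n %= len(bits)
        bits[:] = bits[n:] + bits[:n]


mask = [1, 0, 1, 1, 0, 0, 0, 1]
out = shl(mask, 3)      # mask becomes [1, 0, 0, 0, 1, 0, 0, 0], out is [1, 0, 1]

ring = [1, 0, 1, 1, 0, 0, 0, 1]
rol(ring, 3)            # ring becomes [1, 0, 0, 0, 1, 1, 0, 1]
```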

Access to individual bits and substrings is provided by bitat, and bitsbetween instructions.

Conversion to Integer type

Bit strings whose length is a multiple of 8 may be freely converted to Integers. When the conversion is performed, it is assumed that the bit string represents a big-endian integer.

Integers may be converted to bit strings without additional constraints. Big endian encoding is used when converting integers to bit strings.
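The big-endian convention can be illustrated with a round trip. The Python sketch below makes assumptions the text leaves open: bit strings are lists of 0 and 1 with the most significant bit first, integers are treated as unsigned and non-negative, and the result of converting an integer is padded to a whole number of bytes. The function names are hypothetical.

```python
def bits_to_int(bits):
    """Interpret a bit string (most significant bit first) as an unsigned integer."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def int_to_bits(value):
    """Encode a non-negative integer big-endian, padded to whole bytes."""
    length = max(1, (value.bit_length() + 7) // 8)
    raw = value.to_bytes(length, "big")
    return [(byte >> shift) & 1 for byte in raw for shift in range(7, -1, -1)]


two_bytes = [0, 0, 0, 0, 0, 0, 0, 1,   0, 0, 0, 0, 0, 0, 0, 0]
as_integer = bits_to_int(two_bytes)    # 256: the first byte is the most significant
round_trip = bits_to_int(int_to_bits(258))
```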

PIDs

PIDs represent process identifiers. PIDs are used to send messages to processes.

Callables

Function

Closure

Structures

Vectors

Arrays

Atoms

Boxes

Pointers

Processes and stacks

Process

Stack

Errors

Isolation and communication

Asynchrony, concurrency, parallelism


Technical details

Scheduling

Virtual processes

FFI

I/O

FFI

Viua to foreign functions

Calling foreign functions from Viua code.

Foreign functions to Viua

Calling Viua code from foreign functions.

Foreign functions calling Viua code should be split into two parts. After calling Viua code, the foreign function should return to free the FFI scheduler. Otherwise, some kind of suspension mechanism must be devised to allow suspending foreign functions mid-call while allowing the Viua code called from them to run.