SECD virtual machine for pure

SECD virtual machine for pure_LISP

SECD: [ index ] [ collection ] [ hard_SECK reference ]
LISP: [ reference ] [ tutorial ] [ implementation scheme ]

This document describes an modified SECD virtual machine, used for implementation of the pure_LISP language.
This modified version of the Landin's original SECD machine may be named hard_SECK machine.
The main differences with the original:

regiter D is removed. S is used to hold E and C during calls.
codes and parameters are packed into one 32-bit integer.
original LD (i.j) sequence is replaced with #ld_i_j integer (see below info about big integer mnemonicks)
there is additional set of registers for user/kernel process swaping.
SECD codes are optimised to do as less as possible conses.
This cause changes in eval-apply circle - the arguments of the user functions are not cons-ed during the eval part.
The whole S register is used as local variables list in apply-like commands.
Each call-like code has tail-recursive twin code.

This document is not yet finished, so you may use the next additional sources of info for my SECD machine:

file History.txt describing the evolution of my SECD architecture.
pascal sources in subdirectory meta, especially file sec_codes.pp, defining the machine.

1. Data types, registers, syntax of the command codes.

Data types:

Terminal data types:

nil - represents empty list
integer - represents 32-bit integer value (the codes of SECD machine are also represented by integers).
atom - used for variable/function names. Atom names are used also as short strings.

Composite data types:

node - used to represent pair (a.b) where a and b are some data structures.
List (x1 x2 ... xn) is represented as (x1.(x2.(...(xn.nil)...)).
non-functional pure_LISP objects - used for I/O, interprocess communications and lazy evaluations.
They are variants of the node (subtypes of the node type).
These objects are created and handled by the strict_LISP program kernel.sys or by some control codes of the hard_SECK machine.
All non-regular nodes are used to save sometimes some or group of pending processes.
The full algorithm for multitasking, timesharing and message passing is represented by strict_LISP function (kernel i r0 r1 io) from the file kernel.sys.
Each node has a subtag field, representing the subtype of the node.
I use the following notation for nodes:
- (a.b) for nodes with subtag 0 (normal nodes)
- [a.b] for nodes with subtag 1 (recipes)
- [k|a.b] for nodes with subtag k.
Possible subtags:
- 0 - normal node / pipe without message handler
  If such node is treated as a pipe, it has the form (q0.q1) where q0 and q1 represents the queuee of sleepeng processes, waiting to send message to unknown message handler.
- 1 - recipe (thunk). There is 3 forms of the recipes:
  - [c.e] - delayed expression (waiting recipe).
    c is non-empty list - the command code for the expression.
    e is the local scope in witch the expression must be evaluated.
  - [p.q] - expression in process of evaluation (working recipe).
    p is an integer, representing primitive code that must be executed after the evaluation.
    q is a queuee of processes, waiting for the result of the evaluation.
  - [nil.x] - evaluated expression (ready recipe).
    x is the result of evaluation if it is a terminal value.
    If the result is a node, the recipe is replaced with this node.
    In the last case the subtag of the recipe is changed to 0 and it becomes a normal node.
- 2 - pipe with message handler
  It has the form [2|s.(e.c)] where s, e and c represents the message handler, waiting to receive a message.
- 3 - IO stream
- 4 - TCP socket, waiting to be connected (connwait)

Registers:

hard_SECK has many registers, but 3 of them (S, E and C) may be treated as main registers:

S is the stack.
E is the environment.
C is the code (the executed program).
S1, E1 and C1 is a swap set for S, E and C.
When the machine works in kernel mode S1, E1 and C1 holds the last user process state.
In this case they are accessible as pseudo-vars p_S, p_E and p_C in program kernel.sys.
When kernel starts some user process, the kernel state is saved into S1, E1 and C1, while the user process is placed into S, E and C.
Sk, Ek and Ck is a set of registers holding the kernel state at the begining of the user interrupt.
It is used if some runtime error hapens while the kernel proceed the interrupt.
In this case the kernel state is restored from Sk, Ek and Ck and runtime error handling is started.
U holds the command code of the last user command or the last interrupt number. It is used in error handling.

hard_SECK include the following additional variables:

boolean userMode. While the kernel is working, userMode is false and if some user process is invoked, userMode becomes true.
counter CPCnt (Current Process Counter). It is incremented at any user process step. When this counter reaches the constant ShdLimit (511 in current version), the user process is interrupted and control is passed to the kernel. After such interrupt kernel will shedule the process with anoder user process.

Syntax of the SECD codes and commands:

SECD codes are packed togeder with 1 or 2 parameters into 32-bit integers.
If the SECD code is i and parameter is p, then the appropriate SECD command is represented by integer i*2^24+p.
In my SECD machine there is special mnemonicks for the positive integers, bigger than 2^24.
Such integers are represented by strings with syntax #cod_par, where cod is the SECD code mnemonick, par is the parameter value.
Examples:

#app_5 is integer 36*2^24+5 and means 'Apply the 5-argument function'.
#car is integer 48*2^24 and means 'Get the left element of the node'.
#ld_2_3 is 2-parameter command. It is integer 2^24+2*256+3 and means 'Load the 3-th element from 2-th level of the environment E'.
#trrec_cdr is the most complex example. It is the integer 41*2^24+49 and means 'Recover the expression and then get the right element of the result'

Note that this is only other syntax for integers, you may do all arithmetick operations with it as with normal integers.
In compiler.lib source there is many function calls like (add #trrec i), constructing the proper SECD commands.
The purpose of these mnemonicks is to make SECD executables (files *.sec) more readable.

2. Main codes (single process commands).

These codes may be devided into 2 groups:

1. Control codes - they may change E and C registers or may intrrupt user process in some cases.

For proper handling of tail recursion there is the only code for the control restoration - #rtn, represented by the nil value of the C register.
There is also a pre-code CleanX, used by codes #rtn, #rec and #trrec.
Post-code StartRec(p) begins a process of recovering for wating recipe:

Non-regular control codes:

CleanX - pre-code, substitutes evaluated recipe with their value:
while top of S is an evaluated recipe [nil.x]
do ([nil.x].s) e c --> (x.s) e c ;
if top of S is a working recipe then #intr_2 ;
StartRec(p) - post-code, begins evaluation of a waiting recipe:
([c'.e'].s) e c --> ([p.nil].s) e' c'
#rtn - restores the control.
S must have the form (x R.s) where x is allready evaluated value, R is working recipe of the form [p.q] or the previous state of the environment E:
if R is recipe [p.q] then begin
CleanX;
if S is ([c'.e'] [p.q].s)
then StartRec(0)
else begin
if x is a node (l.r) then begin
replace [p.q] with (l.r);
(x (l.r).s) e nil --> ((l.r).s) e nil
end else begin
replace [p.q] with [nil.x] ;
(x [nil.x].s) e nil --> (x.s) e nil
end;
if q is node then #intr_3 else #prim_p ;
end;
end
else (x e' c'.s) e nil --> (x.s) e' c'
40 #rec_p - recovers a delayed evaluation.
CleanX;
if S is ([c.e].s) then begin
([c.e].s) e' c' --> ([c.e] e' c'.s) e' c'
StartRec(p)
end else #prim_p
41 #trrec_p - tail recursive form of #rec_p.
CleanX;
if S is ([c.e].s)
then StartRec(p)
else #prim_p

Regular control codes:

1 #ld_x_y
s e c --> ((GetNth(GetNth(e,x),y).s) e c
Loads a value from the environment E.
2 #ldc
s e (x.c) --> (x.s) e c
Loads a constant.
3 #ldf
s e (f.c) --> ((f.e).s) e c
Loads a function.
4 #delay
s e (c'.c) --> ([c'.e].s) e c
Delay evaluation of the control list c'.
5 #case_k
if i in [0..k-1] then c_i:=ci else c_i:=ck
(i.s) e (c0 ... ck.c) --> (c_i.s) e c
Starts conditional evaluation - loads the control list ci.
6 #dum
s e c --> s (nil.e) c
Creates new dummy environment for letrec block.
7 #stop
Terminates the SECK machine (allowed only in kernel mode).
8 #call
if c=nil
then (c'.s) e c --> s e c'
else (c'.s) e c --> (e c.s) e c'
Calls the control list c'.
This code is used to finish the control functions if, case and fork.
14 #pop
(x.s) e c --> s e c
Drop the top element of S.
15 #prim_p
(x.s) e c --> (p(x).s) e c
This code has 3 cases:
- p in [0..63] - starts the primitive code p
- p in [64..127] - starts Interrupt(p)
- p in [128..255] - executes TRApp(p-128)

Apply-like control codes:

These codes are twins of normal/tail-recursive forms.
Normal forms push C and E into the stack S, working like procedure call in procedure languages.
Tail-recursive forms work like goto statement.
All they have parameter n, representing the number of actual parameters for corresponding LISP function.
All they bind the list of actual parameters S to some environment (current for let/letrec, other for app), changing the register E.
#let and #trlet are used for implementation of let block.
#letrec nad #trletr implements letrec block.
#app and #trapp are used when function, defined by lambda expression is called.
Sn is S with top n elements droped (Sn:=Drop(S,n)).

32 #let_n S e (c'.c) --> (e c.Sn) (S.e) c'
33 #trlet_n S e c --> Sn (S.e) c
34 #letrec_n S e (c'.c) --> ((cdr e) c.Sn) SetCar(e,S) c'
35 #trletr_n S e c --> Sn SetCar(e,S) c
36 #app_n ((c'.e').S) e c --> (e c.Sn) (S.e') c'
37 #trapp_n ((c'.e').S) e c --> Sn (S.e') c'

2. Primitive codes - they change only the S register.

Two argument codes:

16 #cons - creates new node.
(x y.s) --> (cons(x,y).s)
17 #eq - tests if 2 arguments are equal.
(x y.s) --> (eq?(x,y).s)
22 #add - adds integers.
(x y.s) --> (y+x.s)
23 #sub - subtracts integers.
(x y.s) --> (y-x.s)
24 #mul - multiplies integers.
(x y.s) --> (y*x.s)
25 #div - divides integers.
(x y.s) --> ((y div x).s)
26 #rem - gets the remainder from division of integers.
(x y.s) --> ((y mod x).s)
27 #leq - tests integers for less or equal.
(x y.s) --> ((y <= x).s)
28 #lexleq - tests atoms for lexical less or equal.
(a b.s) --> ((b <= a).s)
30 #concat - concats two atoms in a new one.
(a b.s) --> ((b||a).s)

One argument codes:

48 #car - gets the first element of a node.
(x.s) --> (car(x).s)
49 #cdr - gets the second element of a node.
(x.s) --> (cdr(x).s)
50 #int - test if the argument is an integer.
(x.s) --> (int?(x).s)
51 #atom - test if the argument is an atom.
(x.s) --> (atom?(x).s)
52 #null - test if the argument is an empty list (nil).
(x.s) --> (null?(x).s)
53 #node - test if the argument is a node.
(x.s) --> (node?(x).s)
55 #stat - print in a log file some statistics about SECD machine.
(x.s) --> (x.s)
56 #inc - increments the integer argument.
(x.s) --> (x+1.s)
57 #dec - decrements the integer argument.
(x.s) --> (x-1.s)
58 #chr - creates an one-character atom from the ASSCI code of the argument.
(x.s) --> (chr(x).s)

3. Special codes (user calls, interrupts, runtime error handling).

These codes changes all registers. They are used to swap control between kernel and user processes.

44 #user0
(s' e' c'.s) e c --> s' e' c'
S1:=s; E1:=e; C1:=c; U:=nil;
userMode:=true
This code invokes the user process (s' e' c') and is used by 3 argument strict_LISP function (user0 S E C)
45 #user1
((s'.(e'.c')).s) e c --> s' e' c'
S1:=s; E1:=e; C1:=c; U:=nil;
userMode:=true
CPCnt:=0
This code invokes the user process (s' e' c') and is used by 1 argument strict_LISP function (user1 SEC).
Note that it sets the shedule counter to 0.
46 #intr_i
This code have different behavior in user and kernel mode.
It is started also when runtime error happens.
See inplementation in file meta/sec_codes.pp for details.
In user mode this code interrupts current user process and swap it with the kernel, saved in regiters S1, E1 and C1.
Inrerrupt number i is passed as parameter to the kernel.
47 #prev_i
s e c --> (S1/E1/C1 s) e c
Loads some of swaped registers.

Interrupts:

Interrupts 0..63 are caused by some SEC machine commands:
- #intr_0 is started by code #fork
- #intr_1 is started when current user process reaches the timesharing limit (CPCnt > ShdLimit).
- #intr_2 is started when CleanX pre-code (started from #rtn, #rec and #trrec) find working recipe at top of S.
- #intr_3 is started by #rtn when some lazy evaluation is completed, but there is a queue of processes, waiting for the result of this evaluation.
  S = (q p s) where q is a queue, p is the primitive code which must be executed after the recipe substitution.
Interrupts 64..127 are started by #intr code and are treated as soft_SEC primitives:
- #intr_64 implements the pure_LISP function (fopen fmod fn)
- #intr_65 - function (fclose f)
- #intr_66 - function (eof f)
- #intr_67 - function (connect IP port)
- #intr_68 - function (listen maxconn port)
- #intr_96 - function (send msg io)
- #intr_97 - function (recv io)
- #intr_98 - function (sendc i stream)
- #intr_99 - function (recvc stream)
- #intr_100 - function (fail n)
Interrupts 128..255 are caused by runtime errors.

4. Kernel codes (low-level i/o, ipc, synchronization).

Kernel primitive codes:

64 #openf
(m fn.s) e c --> (ps.s) e c
open file fn with mode m (0-input 1-output), result ps is an integer, representing a stream pointer to the file descriptor.
65 #putlex
(ps lex.s) e c --> (0/1.s) e c
try to send lexeme lex to stream with descriptor ps. result is 0 if the try finished, 1 if operation is not finished.
66 #putc
(ps i.s) e c --> (0/1.s) e c
same as putlex, but sends ascii code i to stream.
67 #settag
(i [x.y].s) e c --> ([i|x.y].s) e c
changes subtag of the node [x.y].
68 #setcar
(b [x.y].s) e c --> ([b.y].s) e c
changes first element (car) of the node [x.y].
69 #setcdr
(b [x.y].s) e c --> ([x.b].s) e c
changes second element (cdr) of the node [x.y].
70 #opens
(m socket.s) e c --> (ps.s) e c
open stream, representing one direction of the TCP connection socket. m is the direction (0-input 1-output), result ps is an integer, representing a stream pointer.
71 #tcpconn
(P H.s) e c --> (socket.s) e c
Start TCP connection to host with IP address H on port P. Result socket is an integer, representing pointer to Linux socket, traing to establish the connection.
72 #tcplsn
(P M.s) e c --> (socket.s) e c
Start TCP listener on port P, allowing maximum M connections. Result socket is an integer, representing pointer to Linux socket, waiting for the client connections.
80 #getlex
(ps.s) e c --> ((b.lex).s) e c
try to receive lexeme from stream with descriptor ps. b is 0 if the try finished, 1 if operation is not finished. lex is received lexeme if the try finished.
81 #getc
(ps.s) e c --> ((b.i).s) e c
same as getlex, but receives ascii code i.
82 #scont
(ps.s) e c --> ((b.r).s) e c
continue get/put operation for stream ps. If operation finished, b is 0 and r is the result, else b is 1.
83 #seof
(ps.s) e c --> (eof(ps).s) e c
Tests stream ps for eof. If ps is a TCP connection test is for closed/broken connection.
84 #close
(ps.s) e c --> (ps_nil.s) e c
Closes stream ps. Returns null stream pointer.
85 #pause
(n.s) e c --> (n.s) e c
pause SECK machine for n milliseconds. This code is used to leave resources for other programs in Linux if there is no work for the kernel and user processes.
86 #pintr
(n.s) e c --> #intr_n(SECK)
This code is same as #intr_i, but gets its interrupt number from S.
87 #gettag
([t|x.y].s) e c --> (t.s) e c
Returns subtag of the node [x.y].
88 #getconn
(socket.s) e c --> ((b.new_socket).s) e c
try to receive new connection from socket (created by #tcplsn or #tcpconn).
b is 0 if the try finished, 1 if operation is not finished.
new_socket is a ready TCP connection if the try succeed.
89 #parstr
(n.s) e c --> (ParamStr(n).s) e c
Gets the n-th parameter passed to secd interpreter by underlaying OS (nil if not present).

Mail me at home or work

SECD:	[ index ]	[ collection ]	[ hard_SECK reference ]
LISP:	[ reference ]	[ tutorial ]	[ implementation scheme ]