SECD virtual machine for pure_LISP
This document describes an modified SECD virtual machine, used for implementation
of the pure_LISP language.
This modified version of the Landin's original SECD machine may be named
hard_SECK machine.
The main differences with the original:
- regiter D is removed. S is used to hold E
and C during calls.
- codes and parameters are packed into one 32-bit integer.
original LD (i.j) sequence is replaced with #ld_i_j
integer (see below info about big integer mnemonicks)
- there is additional set of registers for user/kernel process swaping.
- SECD codes are optimised to do as less as possible conses.
This cause changes in eval-apply circle - the arguments of the
user functions are not cons-ed during the eval part.
The whole S register is used as local variables
list in apply-like commands.
- Each call-like code has tail-recursive twin code.
This document is not yet finished, so you may use the next additional
sources of info for my SECD machine:
- file History.txt describing the evolution
of my SECD architecture.
- pascal sources in subdirectory meta, especially file
sec_codes.pp, defining the machine.
1. Data types, registers, syntax of the command codes.
Data types:
Terminal data types:
- nil - represents empty list
- integer - represents 32-bit integer value (the codes
of SECD machine are also represented by integers).
- atom - used for variable/function names. Atom names are used
also as short strings.
Composite data types:
- node - used to represent pair (a.b) where a and b are some
data structures.
List (x1 x2 ... xn) is represented as (x1.(x2.(...(xn.nil)...)).
- non-functional pure_LISP objects - used for I/O, interprocess
communications and lazy evaluations.
They are variants of the node (subtypes of the node type).
These objects are created and handled by the strict_LISP program
kernel.sys or by some control codes of the hard_SECK machine.
All non-regular nodes are used to save sometimes some or group of pending
processes.
The full algorithm for multitasking, timesharing and message passing
is represented by strict_LISP function (kernel i r0 r1 io)
from the file kernel.sys.
Each node has a subtag field, representing the subtype of the node.
I use the following notation for nodes:
- (a.b) for nodes with subtag 0 (normal nodes)
- [a.b] for nodes with subtag 1 (recipes)
- [k|a.b] for nodes with subtag k.
Possible subtags:
- 0 - normal node / pipe without message handler
If such node is treated as a pipe, it has the form (q0.q1)
where q0 and q1 represents the queuee
of sleepeng processes, waiting to send message to unknown message
handler.
- 1 - recipe (thunk). There is 3 forms of the recipes:
- [c.e] - delayed expression (waiting recipe).
c is non-empty list - the command
code for the expression.
e is the local scope in witch the
expression must be evaluated.
- [p.q] - expression in process of evaluation
(working recipe).
p is an integer, representing primitive
code that must be executed after the evaluation.
q is a queuee of processes, waiting
for the result of the evaluation.
- [nil.x] - evaluated expression (ready recipe).
x is the result of evaluation if
it is a terminal value.
If the result is a node, the recipe is replaced with this
node.
In the last case the subtag of the recipe is changed to 0
and it becomes a normal node.
- 2 - pipe with message handler
It has the form [2|s.(e.c)] where s, e
and c represents the message handler, waiting to
receive a message.
- 3 - IO stream
- 4 - TCP socket, waiting to be connected (connwait)
Registers:
hard_SECK has many registers, but 3 of them (S, E and
C) may be treated as main registers:
- S is the stack.
- E is the environment.
- C is the code (the executed program).
- S1, E1 and C1 is a swap set for S, E and C.
When the machine works in kernel mode S1, E1 and C1 holds
the last user process state.
In this case they are accessible as pseudo-vars p_S, p_E and
p_C in program kernel.sys.
When kernel starts some user process, the kernel state is
saved into S1, E1 and C1, while the user process is placed
into S, E and C.
- Sk, Ek and Ck is a set of registers holding the kernel
state at the begining of the user interrupt.
It is used if some runtime error hapens while the kernel proceed
the interrupt.
In this case the kernel state is restored from Sk, Ek and Ck
and runtime error handling is started.
- U holds the command code of the last user command or
the last interrupt number. It is used in error handling.
hard_SECK include the following additional variables:
- boolean userMode. While the kernel is working, userMode
is false and if some user process is invoked, userMode becomes true.
- counter CPCnt (Current Process Counter). It is incremented
at any user process step. When this counter reaches the constant ShdLimit
(511 in current version), the user process is interrupted and control
is passed to the kernel. After such interrupt kernel will shedule the
process with anoder user process.
Syntax of the SECD codes and commands:
SECD codes are packed togeder with 1 or 2 parameters into 32-bit integers.
If the SECD code is i and parameter is p, then
the appropriate SECD command is represented by integer i*2^24+p.
In my SECD machine there is special mnemonicks for the positive integers,
bigger than 2^24.
Such integers are represented by strings with syntax #cod_par,
where cod is the SECD code mnemonick, par is the
parameter value.
Examples:
- #app_5 is integer 36*2^24+5 and means 'Apply the 5-argument
function'.
- #car is integer 48*2^24 and means 'Get the left element
of the node'.
- #ld_2_3 is 2-parameter command. It is integer 2^24+2*256+3
and means 'Load the 3-th element from 2-th level of the environment
E'.
- #trrec_cdr is the most complex example. It is the integer
41*2^24+49 and means 'Recover the expression and then get
the right element of the result'
Note that this is only other syntax for integers, you may do all arithmetick
operations with it as with normal integers.
In compiler.lib source there is many function calls like (add
#trrec i), constructing the proper SECD commands.
The purpose of these mnemonicks is to make SECD executables (files *.sec)
more readable.
2. Main codes (single process commands).
These codes may be devided into 2 groups:
1. Control codes - they may change E and C
registers or may intrrupt user process in some cases.
For proper handling of tail recursion there is the only code for the
control restoration - #rtn, represented by the nil value
of the C register.
There is also a pre-code CleanX, used by codes #rtn, #rec
and #trrec.
Post-code StartRec(p) begins a process of recovering for wating
recipe:
Non-regular control codes:
- CleanX - pre-code, substitutes evaluated recipe
with their value:
while
top of S is an evaluated recipe [nil.x]
do ([nil.x].s) e c --> (x.s) e c ;
if top of S is a working recipe then #intr_2 ;
- StartRec(p) - post-code, begins evaluation of a
waiting recipe:
([c'.e'].s)
e c --> ([p.nil].s) e' c'
- #rtn - restores the control.
S must have the form (x R.s) where x
is allready evaluated value, R is working recipe of the form
[p.q] or the previous state of the environment E:
if R is recipe [p.q] then begin
CleanX;
if S is ([c'.e'] [p.q].s)
then StartRec(0)
else begin
if x is a node (l.r) then
begin
replace [p.q]
with (l.r);
(x (l.r).s)
e nil --> ((l.r).s) e nil
end else begin
replace [p.q]
with [nil.x] ;
(x [nil.x].s)
e nil --> (x.s) e nil
end;
if q is node then #intr_3
else #prim_p ;
end;
end
else (x e' c'.s) e nil --> (x.s) e' c'
- 40 #rec_p - recovers a delayed evaluation.
CleanX;
if S is ([c.e].s) then begin
([c.e].s) e' c' --> ([c.e] e' c'.s) e' c'
StartRec(p)
end else #prim_p
- 41 #trrec_p - tail recursive form of #rec_p.
CleanX;
if S is ([c.e].s)
then StartRec(p)
else #prim_p
Regular control codes:
- 1 #ld_x_y
s
e c --> ((GetNth(GetNth(e,x),y).s) e c
Loads a value from the environment E.
- 2 #ldc
s
e (x.c) --> (x.s) e c
Loads a constant.
- 3 #ldf
s
e (f.c) --> ((f.e).s) e c
Loads a function.
- 4 #delay
s
e (c'.c) --> ([c'.e].s) e c
Delay evaluation of the control list c'.
- 5 #case_k
if
i in [0..k-1] then c_i:=ci else c_i:=ck
(i.s) e (c0 ... ck.c) --> (c_i.s) e c
Starts conditional evaluation - loads the control list
ci.
- 6 #dum
s
e c --> s (nil.e) c
Creates new dummy environment for letrec block.
- 7 #stop
Terminates the SECK machine (allowed only in kernel mode).
- 8 #call
if
c=nil
then (c'.s) e c --> s e c'
else (c'.s) e c --> (e c.s) e c'
Calls the control list c'.
This code is used to finish the control functions if,
case and fork.
- 14 #pop
(x.s)
e c --> s e c
Drop the top element of S.
- 15 #prim_p
(x.s)
e c --> (p(x).s) e c
This code has 3 cases:
- p in [0..63] - starts the primitive code p
- p in [64..127] - starts Interrupt(p)
- p in [128..255] - executes TRApp(p-128)
Apply-like control codes:
These codes are twins of normal/tail-recursive forms.
Normal forms push C and E into the stack S, working
like procedure call in procedure languages.
Tail-recursive forms work like goto statement.
All they have parameter n, representing the number of actual
parameters for corresponding LISP function.
All they bind the list of actual parameters S to some environment
(current for let/letrec, other for app), changing the register E.
#let and #trlet are used for implementation of let
block.
#letrec nad #trletr implements letrec block.
#app and #trapp are used when function, defined by lambda
expression is called.
Sn is S with top n elements droped (Sn:=Drop(S,n)).
- 32 #let_n S
e (c'.c) --> (e c.Sn)
(S.e) c'
- 33 #trlet_n S
e c --> Sn
(S.e)
c
- 34 #letrec_n S
e (c'.c) --> ((cdr e) c.Sn) SetCar(e,S)
c'
- 35 #trletr_n S
e c --> Sn
SetCar(e,S) c
- 36 #app_n ((c'.e').S) e
c --> (e c.Sn) (S.e')
c'
- 37 #trapp_n ((c'.e').S) e
c --> Sn
(S.e') c'
2. Primitive codes - they change only the S register.
Two argument codes:
- 16 #cons - creates new node.
(x y.s) --> (cons(x,y).s)
- 17 #eq - tests if 2 arguments are equal.
(x y.s) --> (eq?(x,y).s)
- 22 #add - adds integers.
(x y.s) --> (y+x.s)
- 23 #sub - subtracts integers.
(x y.s) --> (y-x.s)
- 24 #mul - multiplies integers.
(x y.s) --> (y*x.s)
- 25 #div - divides integers.
(x y.s) --> ((y div x).s)
- 26 #rem - gets the remainder from division of integers.
(x y.s) --> ((y mod x).s)
- 27 #leq - tests integers for less or equal.
(x y.s) --> ((y <= x).s)
- 28 #lexleq - tests atoms for lexical less or equal.
(a b.s) --> ((b <= a).s)
- 30 #concat - concats two atoms in a new one.
(a b.s) --> ((b||a).s)
One argument codes:
- 48 #car - gets the first element of a node.
(x.s) --> (car(x).s)
- 49 #cdr - gets the second element of a node.
(x.s) --> (cdr(x).s)
- 50 #int - test if the argument is an integer.
(x.s) --> (int?(x).s)
- 51 #atom - test if the argument is an atom.
(x.s) --> (atom?(x).s)
- 52 #null - test if the argument is an empty list (nil).
(x.s) --> (null?(x).s)
- 53 #node - test if the argument is a node.
(x.s) --> (node?(x).s)
- 55 #stat - print in a log file some statistics about SECD machine.
(x.s) --> (x.s)
- 56 #inc - increments the integer argument.
(x.s) --> (x+1.s)
- 57 #dec - decrements the integer argument.
(x.s) --> (x-1.s)
- 58 #chr - creates an one-character atom from the ASSCI code
of the argument.
(x.s) --> (chr(x).s)
3. Special codes (user calls, interrupts, runtime error handling).
These codes changes all registers. They are used to swap control between
kernel and user processes.
- 44 #user0
(s' e' c'.s) e c --> s' e' c'
S1:=s; E1:=e; C1:=c; U:=nil;
userMode:=true
This code invokes the user process (s' e' c') and is used by
3 argument strict_LISP function (user0 S E C)
- 45 #user1
((s'.(e'.c')).s) e c --> s' e' c'
S1:=s; E1:=e; C1:=c; U:=nil;
userMode:=true
CPCnt:=0
This code invokes the user process (s' e' c') and is used by
1 argument strict_LISP function (user1 SEC).
Note that it sets the shedule counter to 0.
- 46 #intr_i
This code have different behavior in user and kernel mode.
It is started also when runtime error happens.
See inplementation in file meta/sec_codes.pp for details.
In user mode this code interrupts current user process and swap
it with the kernel, saved in regiters S1, E1 and C1.
Inrerrupt number i is passed as parameter to the kernel.
- 47 #prev_i
s e c --> (S1/E1/C1 s) e c
Loads some of swaped registers.
Interrupts:
- Interrupts 0..63 are caused by some SEC machine commands:
- #intr_0 is started by code #fork
- #intr_1 is started when current user process reaches
the timesharing limit (CPCnt > ShdLimit).
- #intr_2 is started when CleanX pre-code (started from
#rtn, #rec and #trrec) find working recipe at top of S.
- #intr_3 is started by #rtn when some lazy evaluation
is completed, but there is a queue of processes, waiting
for the result of this evaluation.
S = (q p s) where q is a queue, p is the primitive code
which must be executed after the recipe substitution.
- Interrupts 64..127 are started by #intr code and are treated
as soft_SEC primitives:
- #intr_64 implements the pure_LISP function (fopen fmod
fn)
- #intr_65 - function (fclose f)
- #intr_66 - function (eof f)
- #intr_67 - function (connect IP port)
- #intr_68 - function (listen maxconn port)
- #intr_96 - function (send msg io)
- #intr_97 - function (recv io)
- #intr_98 - function (sendc i stream)
- #intr_99 - function (recvc stream)
- #intr_100 - function (fail n)
- Interrupts 128..255 are caused by runtime errors.
4. Kernel codes (low-level i/o, ipc, synchronization).
Kernel primitive codes:
- 64 #openf
(m fn.s) e c --> (ps.s) e c
open file fn with mode m (0-input 1-output), result ps is an
integer, representing a stream pointer to the file descriptor.
- 65 #putlex
(ps lex.s) e c --> (0/1.s) e c
try to send lexeme lex to stream with descriptor ps. result
is 0 if the try finished, 1 if operation is not finished.
- 66 #putc
(ps i.s) e c --> (0/1.s) e c
same as putlex, but sends ascii code i to stream.
- 67 #settag
(i [x.y].s) e c --> ([i|x.y].s) e c
changes subtag of the node [x.y].
- 68 #setcar
(b [x.y].s) e c --> ([b.y].s) e c
changes first element (car) of the node [x.y].
- 69 #setcdr
(b [x.y].s) e c --> ([x.b].s) e c
changes second element (cdr) of the node [x.y].
- 70 #opens
(m socket.s) e c --> (ps.s) e c
open stream, representing one direction of the TCP connection
socket. m is the direction (0-input 1-output), result ps is an integer,
representing a stream pointer.
- 71 #tcpconn
(P H.s) e c --> (socket.s) e c
Start TCP connection to host with IP address H on port P.
Result socket is an integer, representing pointer to Linux socket,
traing to establish the connection.
- 72 #tcplsn
(P M.s) e c --> (socket.s) e c
Start TCP listener on port P, allowing maximum M connections.
Result socket is an integer, representing pointer to Linux socket,
waiting for the client connections.
- 80 #getlex
(ps.s) e c --> ((b.lex).s) e c
try to receive lexeme from stream with descriptor ps. b
is 0 if the try finished, 1 if operation is not finished. lex is received
lexeme if the try finished.
- 81 #getc
(ps.s) e c --> ((b.i).s) e c
same as getlex, but receives ascii code i.
- 82 #scont
(ps.s) e c --> ((b.r).s) e c
continue get/put operation for stream ps. If operation finished,
b is 0 and r is the result, else b is 1.
- 83 #seof
(ps.s) e c --> (eof(ps).s) e c
Tests stream ps for eof. If ps is a TCP connection test
is for closed/broken connection.
- 84 #close
(ps.s) e c --> (ps_nil.s) e c
Closes stream ps. Returns null stream pointer.
- 85 #pause
(n.s) e c --> (n.s) e c
pause SECK machine for n milliseconds. This code is used
to leave resources for other programs in Linux if there is no work
for the kernel and user processes.
- 86 #pintr
(n.s) e c --> #intr_n(SECK)
This code is same as #intr_i, but gets its interrupt number
from S.
- 87 #gettag
([t|x.y].s) e c --> (t.s) e c
Returns subtag of the node [x.y].
- 88 #getconn
(socket.s) e c --> ((b.new_socket).s) e c
try to receive new connection from socket (created by #tcplsn
or #tcpconn).
b is 0 if the try finished, 1 if operation is not finished.
new_socket is a ready TCP connection if the try succeed.
- 89 #parstr
(n.s) e c --> (ParamStr(n).s) e c
Gets the n-th parameter passed to secd interpreter by
underlaying OS (nil if not present).
(c) Skelet, Nov 2002 in terms of GNU GPL
Mail me at home or work