Ah, I'll have a closer look at arcformat then. Might be good to have a reader :)
Continuations & tail-calls are treated the way Steele proposed it in the lambda papers.
The following code :
(set f (fn (n)
(if (< n 2)
n
(+ (f (- n 1)) (f (- n 2))))))
(prn (f 30))
is first translated so as to have unique names for variables and so that primitive operations are prefixed with a % (we won't treat them the same way as other operations in the next steps) :
Then we perform continuation-passing-style transformation : each function call is given, as an extra parameter, the portion of code (the continuation) it will send its result to : functions do not return, they call another function with the value they calculated :
(Sorry for the indentation, I don't feel like indenting this by hand, but you get the idea I guess ?)
Generating C code from there is then easy. It is almost only Marc feeley's code, based on Steele's ideas a few years ago. I only changed the code so that it understands Arc code and primitives instead of Scheme's...
I assume that for code blocks 3 and 4, 'let is the Lisp 'let, not the Arc one? (let ((var val)) ...) not (let var val ...)?
Also on code block 1 you show (prn (f 30)) but in code block 2 you seem to pass (f <continuation> 20)? I assume this is just a mistake?
I managed to understand the transformation up to code block 3, but not code block 4; what does '%closure do? Allocate heap memory for the closure? Also, what's the syntax for "goto"?
Finally: how about the equivalent C code?
LOL, maybe I should just wait for you to push it on the git and experiment with it myself.
Aruu, okay okay I finally actually looked at the presentation docs, which I probably should have looked at first. One of the things that threw me off was 'self - I was thinking of Arc's sense of 'self as in 'afn!
The output C code looks suspiciously like assembly language to me. Perhaps we can also target a temporary assembly syntax so that we can do minimal peephole opts, such as convert stuff like PUSH(x); y = TOS(); to MOVE(x,y);. Can't wait to actually see this code on the git ^^.
P.S. Given that the transformations are (gasp!) syntactic, it might actually be possible to implement the entire compiler as (gasp gasp!) a treeparse parser (or at the very least a piped chain of treeparse parsers) ^^.
Maybe treeparse would be the right thing to use... The code is getting uglier everytime I try to add a new primitive / special form... I dunno...
As for the generated code, yes, it's a lot like a portable assembly code. There are certainly easy optimizations to perform on it, but as for now, it's working and that's a lot :)
And yes, let is the traditional one -- with tons of parens everywhere.
I'll go through the code later to see what can be done. Certainly the AST looks representable as plain lists to me, although I haven't fully analyzed it yet.
As an aside compile-file could be restructured like the following:
(def compile-file (filename)
(compile-ast (parse-file filename) (+ (strip-ext filename) ".c")))
; to allow programmatic access
(def compile-ast (ast dest)
; chain of conversions
(let chain
(list
(list cps-convert "CPS-CONVERSION")
(list closure-convert "CLOSURE-CONVERSION"))
; do reduction
(let final-ast
(reduce
(fn (ast (f desc))
(let new-ast (f ast)
(prn "----------------- AST after " desc)
(prn (source new-ast))
new-ast))
chain ast)
(prn "-------------------- C Code:")
(w/outfile f dest
(w/stdout f
(prn:liststr:code-generate final-code))))))
This should allow easy insertion of any steps (e.g. optimization steps) along the way.
In fact the chain list should probably be better use `(,), so that we can support flags or suchlike for optimizations:
For that matter my concern is with the expansion of PUSH and POP:
PUSH(x); y = POP();
=>
*sp++ = x; y = *--sp;
Can gcc peephole the above, completely eliminating the assignment to the stack (which is unnecessary in our case after all)?
y = x; //desired target
Without somehow informing gcc to the contrary, gcc will assume that writing to * sp is significant, even though in our "higher-level" view the assignment to * sp is secondary to transferring the data between two locations.
Well, with full optimizations on gcc (-O3), it doesn't change anything (at least in execution time, I didn't compare generated machine codes). Wow, gcc seems really clever. Now that I know how hard it is to implement a compiler, I can only applaud :)