As an aside, there's already a parser combinator library for Arc; I think it would be preferable to use that if possible.
Also, the tokenizer function could be redone as a scanner using (scanner 'car (car-expression) 'cdr (cdr-expression)). You can even make each state of the tokenizer a separate sub-function and model state transitions as tail calls. Something like this:
  (def tokenize-scanner (s (o ind 0))
    (with ((reading-comment reading-unquote ...) nil
           cur-token nil
           next-ind ind)
      (= reading-comment
         (fn () ... (default-state)))
      ...
      (default-state)
      (scanner 'car cur-token
               'cdr (tokenize-scanner s next-ind))))
It might also be good to abstract away the generator portion and put the generator in a library (arguably, though, scanners are already monadic generators).
It might also be useful to have the reader read from a list or scanner; in my opinion, that is how it will be done in SNAP and/or arc2c.
Still, I'm not above actually using your code ^^ Good job!
? I've been browsing arki source for examples but I'm completely out of my depth there ...
One aspect of the token generator is that sometimes it recognises two tokens simultaneously. In other words, when it sees the right-paren in

  ... foo)

it recognises "foo" and the right-paren all at once. Perhaps this is the wrong way to do it, and I should be using 'peekc instead. But I suppose I can do this with a scanner:
Note that the expressions in the 'scanner form are delayed, i.e. they are not evaluated until a 'car or 'cdr is performed on your scanner, and they are evaluated only once.
edit: an important note: scanners have the exact read semantics of lists. So simply zapping cdr at a scanner will not advance the scanner, it will only advance the place you are zapping.
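A small sketch of what that means in practice, assuming 'scanner behaves as described in this thread (the names s and p are just for illustration):

  (= s (scanner 'car 'a
                'cdr (scanner 'car 'b
                              'cdr nil)))
  (= p s)        ; p and s start out referring to the same scanner
  (zap cdr p)    ; same as (= p (cdr p)): only the place p moves
  (car p)        ; the second element
  (car s)        ; s itself has not advanced; still the first element

So if you want to "consume" input, keep reassigning your own variable, exactly as you would when walking a list.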
There's no need to use 'peekc or similar: all you need is to use stuff like 'cadr, 'caddr.
Because scanners have the exact read semantics of lists, you can use such things as 'each, 'map, etc. Just don't write using scar, scdr, or sref.
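For the "foo)" case, lookahead could be sketched roughly like this. char-scanner is a hypothetical helper, not something from the library; it follows the same (s ind) pattern as the tokenize-scanner outline and assumes 'scanner works as described above:

  ; hypothetical helper: present the characters of string s as a scanner
  (def char-scanner (s (o i 0))
    (if (< i (len s))
        (scanner 'car (s i)
                 'cdr (char-scanner s (+ i 1)))))

  (= cs (char-scanner "foo)"))
  (car cs)              ; the current character
  (cadr cs)             ; one character of lookahead; nothing is consumed
  (each c cs (prn c))   ; walks the scanner like an ordinary list

Returning nil once i reaches the end of the string is what makes the scanner terminate the way a list would.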
If you wanted to emulate lists, you can do something like:
  (def my-cons (a d)
    (scanner 'car a
             'cdr d))
Of course, since a and d are just var references, there's little point in delaying their execution.
edit2: Here's how you might make a generator:
  (def generator (f v)
    (scanner 'car v
             'cdr (generator f (f v))))
  (= b (generator [+ _ 1] 0))
  (car b)
  => 0
  (cadr b)
  => 1
  (cadr:cdr b)
  => 2
'map, 'keep, and a few other functions become lazy when applied to scanners, so you can safely use them on an infinite-series generator.
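For instance, something along these lines should be fine, assuming 'keep really is lazy on scanners as stated above. It reuses 'generator from the earlier example; nats and evens are illustrative names:

  (= nats  (generator [+ _ 1] 0))   ; the infinite series 0, 1, 2, ...
  (= evens (keep even nats))        ; should return immediately if 'keep is lazy
  (car evens)                       ; first element that satisfies even
  (cadr evens)                      ; forces only as much of nats as needed

With a non-lazy 'keep, the second line would never return, so this is also a quick way to check whether a given function is safe on infinite scanners.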