Where w/locals expands into a big (withs ...) form with key-value pairs taken from the list.
Use an alternative def to put function definitions in such a table instead of the global namespace:
(def-in tokeniser-helpers make-token (kind tok start length) ...)
This way, arc-tokeniser would be less horribly big, easier to hack, and non-namespace-polluting. As a kind of plugin system, it doesn't seem terribly obtrusive does it? The disadvantage is that you need to search further for the definitions of your helper functions.
Is this something like how your prefix-abbrev macro would work?
I think it's not just a question of worrying about clashes that may never happen - it also feels inelegant, dirty even, to have globally-accessible functions that are relevant only in a very specific context. Otherwise I would completely agree - it would be a kind of premature optimisation to worry about them.
it also feels inelegant, dirty even, to have globally-accessible functions that are relevant only in a very specific context
Yes, but how do you know that your Arc parser functions are only going to be relevant in the code you've written? Perhaps someday I'll be writing my own parser, or something completely different, and I'll find it useful to use one of your functions in a way that you didn't think of!
I suggest trying out writing your code is in the simplest possible way. For example, in your original:
(def arc-tokeniser (char-stream)
(withs (make-token (fn (kind tok start length)
(list kind tok start (+ start length)))
"make-token" does not use "char-stream", so we can make this simpler:
(def make-token (kind tok start length)
(list kind tok start (+ start length))
Now I can look at "make-token" in isolation. I can easily understand it. I know that all that other stuff in arc-tokeniser isn't affecting it in some way. And, if I'm writing my own parser and I want to use "make-token", I can do so easily.
And sure, down the road there may be some other library that also defines "make-token". At that point, it will be easy to make a change so that they work together. Perhaps by renaming one or the other, or by doing something more complicated. The advantage of waiting is that then we'll know which functions actually conflict, instead of going to a lot of work now to avoid any possibility of future conflict, the majority of which may never happen.
Now of course I'm not saying to pull every single function out of arc-tokenizer. You've some functions that depend on char-stream and token and states and so on. So those it makes perfect sense to leave inside arc-tokenizer. My claim is to today write the simplest possible parser.arc library, explicitly not worrying about future namespace clashes. That it is better to deal with them in the future, when they actually happen.