"What do people think of this? Basically I want all list operations to treat atoms as degenerate dotted lists. Does this have any adverse implications for other aggregates (tables)?"
I don't see a problem, but I may not be the right person to ask. If it came down to it, I'd be willing to use 'table-pushnew, 'alist-pushnew, 'maxheap-pushnew, and so on. :-p
What other list operations do you have in mind? While 'pushnew makes sense to use with degenerate dotted lists, it's a very special case: It only clobbers the list at a finite depth, and that depth is actually zero. (Or zero in the unmodified list, anyway.)
Oh, whoops. I forgot 'pushnew actually does traverse the input list to find out whether the element is new. I was thinking of 'push. XD
I disagree with the notion of membership you're using for dotted lists. If I 'pushnew nil onto (1 2 3 . nil), I want to get (nil 1 2 3 . nil), and it won't work that way if nil is already considered to be a member.
I think the membership check you're using is like this (I'll give a code sketch after the alternatives below):
(The final cdr is an element iff it isn't nil.)
If the list is...
A cons cell:
If the car is the value we're looking for, succeed. Otherwise,
continue by searching the cdr.
Nil:
Fail.
A non-cons, non-nil value:
Continue by comparing the list to the value we're looking for.
I feel we could simplify this specification by rewriting the last two cases in one of these ways, ordered from my least to most favorite:
(The final cdr is an element.)
(This breaks (pushnew nil ...) in existing Arc code.)
...
A non-cons value:
Continue by comparing the list to the value we're looking for.
(The final cdr is not an element.)
...
A non-cons value:
Fail.
(The final cdr is always nil and is not an element.)
(Arc 3.1 already uses this.)
...
Nil:
Fail.
A non-cons, non-nil value:
Raise an exception.
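Here's that first notion of membership as a minimal Arc sketch, for comparison with the alternatives above. ('mem-dotted is a made-up name, and I'm using 'is rather than whatever test 'pushnew actually uses.)

  (def mem-dotted (x lst)
    (if (acons lst)  (or (is (car lst) x)           ; cons: check the car,
                         (mem-dotted x (cdr lst)))  ;   then search the cdr
        (no lst)     nil                            ; nil: fail
                     (is lst x)))                   ; other atom: compare it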
By using the "element iff it isn't nil" approach, you're able to use 'pushnew to traverse the simple argument lists you build as intermediate results of that 'make-br-fn implementation. But I don't know if it's worthwhile to complicate the notion of "list" just to accommodate a special case of argument lists.
"I don't know if it's worthwhile to complicate the notion of "list" just to accommodate a special case of argument lists."
Yeah I see your point. My aim was to extend the correspondence between the syntax for vararg params, rest params and lists of regular params. But those features merely match a (proper) list of args to some template. I'm entangling the template with the notion of a list. Hmm..
"Frankly, I think that bracket functions should always have at least one argument permitted."
Groovy does this. The Closure syntax { ... } is shorthand for { it = null -> ... }, which is a one-argument Closure whose argument "it" defaults to null. When I moved from Groovy to Arc, I missed this.
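For comparison, Arc 3.1's stock bracket syntax already commits to exactly one argument, since [...] ends up as (fn (_) ...):

  ([* _ 2] 5)  ; => 10, since [* _ 2] is read as (fn (_) (* _ 2))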
"Well, how did the change not break setforms in arc.arc? Was the original definition for make-br-fn silently overwritten somewhere?"
Yes. The [...] syntax is redefined in make-br-fn.arc after arc.arc is loaded.
I described my opinion of this at https://github.com/nex3/arc/commit/957501d81bafecab070268418.... It did break my code, but I don't mind breaking changes these days (maybe because I'm hardly a user!), and I think a:b syntax is already more useful than either version of [...].
Wow, this makes a lot of sense, and I don't think I've ever heard of it before. I wonder how the pursuit of this feature could impact the design of a language's macro system.
Less optimistically, I wonder whether it'll distract too much from meaningful patterns that span across nesting levels (e.g. imperative code in continuation-passing style).
"I wonder how the pursuit of this feature could impact the design of a language's macro system."
Whoa, how do you mean?
One idea I've been thinking about a lot is to attach metadata to code. We can do a lot to code (pretty-print, optimize, ...) besides just eval it, and metadata might help with those without affecting eval.
My favorite use for this idea is to give functions levels:
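A minimal sketch of what this might look like in Arc, assuming levels live in a plain table that tools consult but eval never touches. ('fn-levels and 'deflevel are made-up names.)

  (= fn-levels (table))  ; fn name -> level; for tools, invisible to eval

  (mac deflevel (level name args . body)
    `(do (= (fn-levels ',name) ,level)
         (def ,name ,args ,@body)))

  (deflevel 1 parse-options (args) args)     ; plumbing
  (deflevel 3 do-something  (x)    (* x 2))  ; the actual app code
  (fn-levels 'do-something)                  ; => 3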
This can be used in a variety of ways by different tools. You could flag as errors any calls from a lower to a higher level, so that level 3 functions could only call functions in levels 1-3. You could tell tracers to trace everything until level x, not below. You could tell debuggers to automatically step past calls below level x and step _into_ calls above that level. In gdb I often accidentally hit 'next' when I meant 'step', and now I have to restart and laboriously try again from scratch.
You could syntax-highlight calls within a level to be more salient than calls to lower levels. We've all seen open source projects with this structure.
int main() {
... // lots of option parsing crap
if (doSomething()) { // the actual app code
... // more crap
}
... // more crap
return -1;
}
Wouldn't it be cool here to be able to highlight the call to doSomething() compared to all the other calls?
In general, syntax highlighting sucks[1] because we simply highlight the things it's easiest to infer. Metadata might help reduce the inference burden and open up new uses for color.
If we syntax-highlight (withs ...) this way, since it expands into several (fn ...) forms it'll give a different color to each of its variables. But that's assuming we're able to perform some amount of macroexpansion frequently enough to be useful as the programmer edits, and it also assumes we can relate the original syntax to the transformed version returned by the macro.
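For instance, (withs (x 1 y (+ x 1)) (* x y)) expands into roughly this, so a highlighter that colors each (fn ...) scope differently gives x and y different colors:

  ((fn (x)
     ((fn (y)
        (* x y))
      (+ x 1)))
   1)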
It would be interesting to see a macro system tailor-made for purposes like these. Some hygienic macro systems might already apply, but I wonder what the options are here. I think Reactive Demand Programming would be an effective ingredient for this kind of incremental program visualization, but that's not a complete plan on its own.
Fun fact: if you used boxes, it would be really easy to associate the "withs" form with its expanded form, since the boxes would be the same.
The Nulan IDE already incrementally macroexpands as you're typing, and syntax-highlights boxes different colors depending on their scope. So this would be fairly trivial to add in.
Arc just uses Racket's hash tables, which come with this disclaimer:
"Caveat concerning mutable keys: If a key in an equal?-based hash table is mutated (e.g., a key string is modified with string-set!), then the hash table’s behavior for insertion and lookup operations becomes unpredictable." - http://docs.racket-lang.org/reference/hashtables.html
If you're just interested in what's happening in the current implementation, I have a guess for your table-in-table example: When you initially put h2 in the table, its hash is calculated, and it's filed in a bin of entries that have that hash. Then when you look up h2 again, it calculates a different hash, so it searches the wrong bin and doesn't find the entry. When you look up (table), it finds the right bin, but the only key in that bin is h2. Since h2 is not currently equal? to (table), the search fails again.
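A minimal illustration of this failure mode at an Arc REPL backed by Racket. (Officially the behavior is just "unpredictable," so the exact results may vary.)

  (= h (table) h2 (table))
  (= (h h2) 'found)  ; h2 is filed in the bin for its current hash
  (h h2)             ; => found
  (= (h2 'x) 1)      ; mutating the key changes its hash
  (h h2)             ; likely nil: we search the wrong bin
  (h (table))        ; likely nil too: right bin, but h2 is no longer
                     ;   equal? to an empty table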
I can't reproduce your second example, but what I get is consistent with this explanation.
I've been using Racket 5.1.1. That's a bit reassuring, then. I guess that means I shouldn't be worrying too much about the behaviour of mutating keys in the implementation of Arcueid. If Racket says that the insertion and lookup operations of its hash tables become unpredictable, then the same is true for reference Arc as well, and it doesn't matter if Arcueid does something completely different.
It's not something I'm eager to build off of, for a few reasons:
- Rainbow strives to be consistent with official Arc, in the sense that they can be used to run the same programs. It goes out of its way to implement Arc's warts, making it a more complicated language than it has to be.
- Rainbow.js strives to be consistent with Rainbow, this time in the sense that their code can be maintained side-by-side. I've gone out of my way to make the JavaScript code similar to the original Java code, even if it would have been easier to use more first-class functions or dynamic JavaScript compilation. And if I see a bug, I take note of it but I don't fix it.
- By necessity, I implemented some of the primitive I/O operations in terms of asynchronous JavaScript I/O. This opens up a window for other JavaScript code to run while the side effect is executing. I find it unclear how to make this consistent with either Rainbow or official Arc; both of them support preemptive threading, and yet neither of them spells out exactly which moments in the program are possible preemption points.
- I was only able to develop Rainbow.js because Rainbow was mostly stagnant. But the converse worries me: Since I developed Rainbow.js, it will take more work to update Rainbow (or Arc) because multiple implementations must follow along with the update. Seems standards processes are slow and frustratingly arbitrary for a reason. :)
After Pauan's comment at http://arclanguage.org/item?id=17449, I think I've figured out a convoluted but surprisingly comprehensive approach Arcueid could take. This would support (most) existing Arc code, including a programming style that still uses Arc's unhygienic macros, while also supporting first-class namespaces and hygienic macros using Pauan's get-variable-box recommendation.
A programmer would not observe it to have a hyper-static global environment, because neither does Arc! Nevertheless, a hyper-static environment could be supported as a compiler option, in the sense of having alternative versions of 'eval, 'load, and the REPL. I think the important part of the hyper-static discussion was the use of first-class namespaces.
What dido's been talking about is a system like Common Lisp's or Clojure's. I haven't used either system firsthand, but it seems they both transform all unqualified symbols in a file by associating them with an implicit prefix, and both languages use unhygienic macros and support large community projects.
In response to this, Pauan was saying something about implementing first-class namespaces by using symbol replacement tables. What I'm about to describe is a spin on that: We use first-class namespaces with Pauan's notion of boxes, as I currently understand it, and we also use symbol replacement tables to stand in for CL-like symbol qualification.
I'm actually going to call these concepts "first-class (global) environments," "compilation boxes," and "(symbol) replacement rules," because otherwise I'd confuse myself. We essentially have two notions of first-class namespace at the same time, a compilation box won't be just any generic kind of box, and I'll actually represent symbol replacement rules as functions, not tables.
---
Okay, here's a comprehensive overview of the command loop:
1) Read a command as though by using 'read. The result can contain some symbols of the form "Moral::sin", and that's not a problem.
2) Code-walk over the command s-expression (a rough sketch of this step follows the list). For each symbol:
2a) If the symbol does not contain "::", walk over its non-ssyntax segments and replace them according to the current replacement rule. For instance, the symbol "sin.x" could become "ns/math/1-sin.ns/example/1-x". (The "1" is here in case we want to load the same file multiple times.)
2b) If the symbol begins with a non-ssyntax string followed by "::", look up that string using the current replacement rule and the current global environment, and use that value as the current replacement rule for the rest of the symbol. For instance, the symbol "Math::sin.pi" could become "ns/math/1-sin.ns/math/1-pi", and "Foo::Bar::abc:def" could first look up "ns/foo/1-Bar" and then become "ns/bar/1-abc:ns/bar/1-def".
2c) If the symbol contains "::" but has ssyntax before that, report an error. The precedence is unclear, and it's pretty much unimportant, since the programmer can almost always write out their code without using ssyntax at all.
3) Macroexpand the resulting s-expression using the usual Arc semantics. If programmers inspect the s-expressions they're manipulating here, they'll see things of the form "ns/bar/1-abc:ns/bar/1-def", but that's fine.
4) Compile the results of macroexpansion--possibly as the macroexpander goes along, as Arc 3.1 does internally.
4a) When compiling a (get-variable-box ...) form or global variable reference, look up that name in the current global environment. If it doesn't exist, create a new compilation box with a unique ID, and entirely replace the current global environment with another environment that has this binding. The compilation box's ID isn't necessarily related to symbols; it could be a numeric index or a JavaScript identifier, or whatever else the compiler will need.
4b) When compiling a literal compilation box or a global variable reference, use the compilation box's ID as part of the compilation result.
5) Execute the compiled code.
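Here's the promised sketch of step 2's walk, for a single symbol with no ssyntax in it. I'm simplifying by treating the environment as a function from qualifier to replacement rule, and I'm ignoring the per-segment ssyntax walk; all the names here ('expand-sym, 'split-qualified) are made up.

  (def split-qualified (s)
    ; Split 'Math::sin into (Math sin); nil if there's no "::".
    (let str (string s)
      (iflet i (posmatch "::" str)
        (list (sym (cut str 0 i))
              (sym (cut str (+ i 2)))))))

  (def expand-sym (rule env s)
    (iflet (q rest) (split-qualified s)
      (expand-sym (env (rule q)) env rest)  ; step 2b: switch rules
      (rule s)))                            ; step 2a

Since expand-sym recurs on the rest of the symbol, nested qualifiers like "Foo::Bar::abc" fall out of the same two cases.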
---
As usual, the programmer gets to run code during macroexpansion and execution. At this point, the programmer needs access to several low-level builtins in order to take full advantage of the system. Here are some reasonable builtins we could provide:
- Create and manipulate first-class environments. An environment can just be (a function that takes (a symbol) and returns (a zero- or one-element list where the element is (a compilation box))). This representation offers no way to view the set of bound variables, but neither does Arc 3.1. (A toy sketch follows this list.)
- Get, set, or dynamically bind the global environment.
- Create and manipulate a symbol replacement rule. A symbol replacement rule can just be (a function that takes (a non-ssyntax symbol) and returns (a non-ssyntax symbol)).
- Get, set, or dynamically bind the current symbol replacement rule.
- Dynamically evaluate an s-expression using a particular global environment and a particular symbol replacement rule, and return three things: The final global environment, the final symbol replacement rule, and the evaluated result.
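To illustrate the first of these, here's a toy encoding of an environment in that representation. ('empty-env and 'env-extend are made-up names.)

  (def empty-env (s)
    nil)  ; nothing is bound

  (def env-extend (env s box)
    ; A new environment with s bound to box; other lookups fall through.
    (fn (s2)
      (if (is s2 s) (list box) (env s2))))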
---
Sometimes (in parts 2a and 2b of the command loop), replacement symbols will be inserted in the midst of a single ssyntax symbol. When the replacements are interned symbols, we can just use string manipulation for this. But what if they're uninterned? We have a few different options for dealing with this:
1) It just won't happen, because all symbols in the language are interned. Even the gensyms returned by (uniq) are interned. This is true in Arc 3.1, Jarc, and Rainbow, but Anarki "fixes" it.
2) Uninterned symbols exist, but symbol replacement rules must always return interned symbols anyway, just so that they can take part in ssyntax.
3) Replacement symbols can be uninterned, but there's a dynamic error if these replacements would end up taking part in ssyntax.
4) Interned and uninterned symbols exist, but there's also a third category of symbols with some rigid nested structure. Given an interspersed list of symbols and ssyntax operators, we can construct a symbol that will execute that ssyntax properly, even if some of the original symbols are gensyms.
5) Interned and uninterned symbols exist, and every uninterned symbol is associated with an arbitrary ssexpansion result (usually itself). (In Racket versions of Arc, a weak table would suffice to implement this.) If we try to insert uninterned symbols into ssyntax, instead we make a fictional ssexpansion and create a new uninterned symbol that will ssexpand to that.
My favorite options are 3 and 5. It would be nifty to see 5 in other versions of Arc, even if it somehow breaks assumptions I made in my old code.
---
I have to continue this in a separate post. This is the first time I've seen "That comment is too long." XD
Now to build a high-level import system on top of this, the kind dido is looking for. I'll assume we're restricted to interned symbols.
Just before loading arc.arc, the initial symbol replacement rule is the identity function, and the initial environment contains only the builtins. Each builtin value is located in its own compilation box (no sharing!) and filed under its familiar name as an interned symbol. For instance, the + procedure is filed under the "+" symbol, with no prefixing.
Just before we go to the REPL or the main program file, we take a snapshot of the current global environment and the current symbol replacement rule.
Suppose we want (import foo) to do an unqualified import of everything in foo.arc, including the things it depends on. This will be very much like (load "foo.arc"), but foo.arc will see only the core utilities, not our additional definitions. Here are the steps:
1) Create a new global environment based on the core snapshot.
2) Generate a unique symbol prefix like "ns/foo/1-".
3) Create a new symbol replacement rule (sketched below) which checks whether the symbol exists in the core global environment. If it does, it's returned as-is. Otherwise, the prefix is attached.
4) Process each command in foo.arc as described above, using the created environment and replacement rule as context. Then come back to our original context.
5) Replace our current global environment with a function that tries the final foo.arc environment first and our preexisting environment second. (This lookup won't affect run time efficiency, since we use compilation boxes.)
6) Replace our current symbol replacement rule with a function that checks whether or not the symbol exists in the final foo.arc environment. If it does, the function defers to the final foo.arc replacement rule. Otherwise, it defers to our preexisting replacement rule. Now, if we write "bar" and foo.arc defines bar, it'll rewrite to "ns/foo/1-bar", which is part of our new environment.
If we want to do a qualified import (import-as foo Foo) instead, then step 6 is different:
6) Replace our current global environment again so that "Foo" maps to the final foo.arc symbol replacement rule. Now, if we write "Foo::bar" and foo.arc defines bar, it'll rewrite to "ns/foo/1-bar", which is part of our new environment.
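For concreteness, step 3's replacement rule from the unqualified import could be as simple as this, assuming 'core-env is the snapshot environment (returning a nonempty list for bound symbols):

  (def foo-rule (s)
    (if (core-env s)
        s                                   ; a core utility: leave it alone
        (sym (+ "ns/foo/1-" (string s)))))  ; everything else gets prefixed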
---
Whew! I know this is too much to expect anyone to read and understand it all at once, let alone to implement all at once. ^_^
Here are some things to consider, dido, if you generally like this synthesis of ideas but want to spin it up or simplify it:
In the (import ...) mechanisms I described, we filter the symbol replacement rule by querying the environment, so this mechanism actually requires both notions of first-class namespace. If we provide some other way to filter, such as letting each file build its own list of export symbols, then replacement rules could exist as the sole namespace system.
On the other hand, if we use first-class environments, we can simply bind "bar" in our namespace to a compilation box we get from foo.arc, and we don't need to fiddle with replacements like "ns/foo/1-bar" and the ssyntax issues. I think there's much more conceptual simplicity to this approach. Unfortunately, it demands the use of hygienic macros, which undermines Arc compatibility.
Whoops, I guess I never posted it here. XD I made it right after we had a thread about yet another one of these languages, and then I chatted about it with Pauan and never returned to it.
Maybe I was waiting until I could document what I meant with the columns. Google Pages disappoints me, because although it'll support a sortable table like this, there's no option to add an explanatory caption. I was thinking of posting an explanation in a comment, but that's not a community-editable part of the wiki, right?
Guess I'll describe my intentions right here. Feel free to migrate these explanations to a better home, and of course, feel free to redesign the table to make it better.
---
Language: The name of the language.
Whitespace: In what way does whitespace matter to the parsing of a program text?
Infix arithmetic: Some users care about expressing their arithmetic-heavy code using traditional infix notation. Does the language syntax special-case particular arithmetic operators for this purpose? If so, the answer is "Hardcoded." If not, does it provide generic mechanisms that can satisfy this user? If not, the answer is "No." Otherwise, the answer names those generic mechanisms.
Affects semantics?: To make it onto this list, the language must have a syntax that transforms into a uniform yet non-semantic nested structure like s-expressions. It must also have some sugar that's explained as an abbreviation of particular kinds of structure. By the time the programmer has a structured representation of the code to work with, are there still visible remnants of these abbreviations? If so, how so?
---
Maybe the infix arithmetic column would be better off as two: A "generic infix" column and a "domain-specific notation" column. Just because a language provides good generic mechanisms doesn't mean it can't hardcode some traditional notations too.
Maybe if the "Affects semantics?" answer is "No", we should add another shade of meaning to indicate that the project actually has no semantics to affect. I think Readable Lisp S-expressions pretty much falls into this category, and so does "Gel: A Generic Extensible Language."
On the other hand, Gel actually saves a lot of detail about the user's input (e.g. which parentheses they used), and any language that uses Gel would probably have to normalize away the unwanted detail in a second stage, possibly in a way that the programmer can notice.
To digress further... Racket allows macros to discover which parentheses were used, but then it treats all parentheses the same after macroexpansion. Do alternate parentheses count as an s-expression abbreviation? If so, I think Racket should make this list, and it should be listed as having an abbreviation layer that affects semantics. (Consider that Racket also has an infix "abbreviation": (a . -> . b) means (-> a b). Not to mention that (a b c) abbreviates (a . (b . (c . ()))) in a wide variety of lisps.)
"But I don't follow the 'affects semantics?' column."
I think it means whether the parsing happens before macros are run or not. In Arc, ssyntax happens after macros, so macros can affect the meaning/behavior of ssyntax. In most Lisps, syntax is read before macros, so macros can't change the meaning.
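If I recall the Arc 3.1 behavior correctly, a quick demonstration:

  (mac show (x) `(quote ,x))
  (show a:b)  ; => a:b -- the macro saw the raw symbol, before any
              ;   ssyntax expansion took place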
---
Ah yes, Nulan has changed since that page was added, I'll go in and update it...
Huh. Somehow that rep trick never occurred to me either.
One of my big motivations for first-class macros was to be able to organize my code in any order. If there are no external constraints, it increases the odds that a given snapshot will have some reasonable organization that's easy for newcomers to understand. Any constraints just increase the odds of stale organization. I've ranted before about this: http://arclanguage.org/item?id=15587 (footnote 1); https://github.com/akkartik/wart/blob/master/001organization. If macros can be written in any order, that reduces a lot of my need for first-class macros.
Sorry to say, but I don't believe any of this actually lets you use macros before they're defined. Calling the macro's rep will just succeed at expanding the macro, not executing the resulting code (and not suppressing expansion and execution of the arguments).
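To illustrate, in Arc 3.1 a macro is just a tagged function, so:

  (mac double (x) `(* 2 ,x))
  ((rep double) 5)  ; => (* 2 5): you get the expansion back as a list,
                    ;   but nothing evaluates it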