Arc Forum
A fix for arc's gensyms
5 points by eds 6130 days ago | 9 comments
Arc's gensyms are currently broken.

Maybe this is old news to some of you, but I was quite surprised when I discovered this.

The essential problem is that gensyms created in arc are not unique symbols. They are merely built from a string prefix and a counter, so if someone happens to use a symbol with the same name as the printed representation of a gensym, the two names conflict.

  (define (ar-gensym)
    (set! ar-gensym-count (+ ar-gensym-count 1))
    (string->symbol (string-append "gs" (number->string ar-gensym-count))))
In arc, if gensyms were truly unique, the last part of the following would fail.

  arc> (= g (uniq))
  gs1767
  arc> (eval `(= ,g "foo"))
  "foo"
  arc> (eval g)
  "foo"
  arc> gs1767 ; this would result in an error if g was not 'gs1767
  "foo"
  arc> (= gs1767 "bar")
  "bar"
  arc> gs1767
  "bar"
  arc> (eval g) ; this would still be "foo" if g was a unique symbol
  "bar"
This behavior isn't surprising considering that uniq just increments a counter and creates a new symbol from the resulting string. (And a comment above the definition admits the current implementation is something of a joke.)

This problem does not occur in mzscheme; symbols created with gensym, and symbols that happen to have the same name, are distinct and separate.

  > (define g (gensym))
  > g
  g2
  > (eval (list 'define g "foo"))
  > (eval g)
  "foo"
  > g2
  reference to undefined identifier: g2
  > (define g2 "bar")
  > g2
  "bar"
  > (eval g)
  "foo"
Changing the definition of 'ar-gensym in ac.scm to the following seems to fix half the problem:

  (define ar-gensym gensym)
The part fixed is for local variables declared as function parameters (and therefore 'let and 'with). However, for global variables, the problem persists because of arc's explicit addition of a prefix, which results in a symbol that does not act as a gensym.

  (define (ac-global-name s)
    (string->symbol (string-append "_" (symbol->string s))))
I believe that global gensyms will work properly if we simply don't attempt to add the prefix. I don't know of any way to definitively determine if a symbol is a gensym, but my best guess at this point is to convert the symbol to a string and back again, and if the two symbols are equal, then the original is an interned symbol (and thus not a gensym). The following change to 'ac-global-name seems to fix the behavior of the global gensym test case above.

  (define (ac-global-name s)
    (if (equal? s (string->symbol (symbol->string s)))
        (string->symbol (string-append "__" (symbol->string s)))
        s))
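The round-trip test can be tried on its own at an mzscheme or Racket prompt. This is only a minimal sketch: the helper name probably-interned? is hypothetical and not part of ac.scm, and it assumes that string->symbol always returns the interned symbol for a given name.

```scheme
;; Hypothetical helper, not part of ac.scm.
;; string->symbol returns the interned symbol with the given name, so
;; only an interned symbol comes back eq? to itself after the round trip.
(define (probably-interned? s)
  (eq? s (string->symbol (symbol->string s))))

(define plain 'foo)                             ; ordinary interned symbol
(define fresh (gensym))                         ; uninterned, from mzscheme's gensym
(define made (string->uninterned-symbol "foo")) ; uninterned, but named "foo"

(probably-interned? plain) ; #t
(probably-interned? fresh) ; #f
(probably-interned? made)  ; #f
```

Later Racket releases also provide symbol-interned?, which answers the question directly. Whether the mzscheme version Arc targets has it has not been checked here.
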
So I guess my question is, is this (assuming the last hack is correct) the right way to solve the gensym issue? If so, how do you determine if a symbol is a gensym or not? (Is there a better way to determine if a symbol has been interned?) Are there any other reasons why pg didn't want to use mzscheme's 'gensym?

If people like the idea of using mzscheme's gensyms, and if they think this hack is good enough, I'll put it up on Anarki.

Also, I hope pg reads this and either uses my fix (or fixes it himself), because I really would like working gensyms in the official distro.



2 points by drcode 6130 days ago | link

Saying that they're "broken" is a bit extreme... The current design has clear benefits (it's trivial to check the value of a gensym, the implementation code is simpler, etc.). I kind of like how they work, actually. The fact that gensyms thrash the main namespace, the main drawback of this design, is pretty typical arc behavior. I hope gensyms continue to work in the current manner.

By that line of thinking, almost everything in arc is broken, from macros to equality checks (what, no EQL support?), because arc is always willing to accept a risk of rare bugs/inefficiencies in exchange for simplicity.

-----

3 points by eds 6129 days ago | link

Ok, so I exaggerated a little in my initial language. Sorry if I misled you.

Perhaps the current design is slightly simpler (but not by much, because my changes remove about as much code as they add). And personally, I think it is more unintuitive to think that gensyms are converted to globals and interned into the main arc namespace, than just left alone when assigned in a global context.

But either way, the whole point of gensyms in the first place is to protect from unintended variable capture, right? I don't really see how a design which basically fails to do that can be considered acceptable. Especially when every other Lisp I have ever used ensures somehow that gensyms do not interfere with other variables.

Much of what might be called broken in arc is either just broken until someone gets around to fixing it (like first class macros), or intentionally designed that way and not broken at all (like unhygienic macros). The former will presumably be fixed at some point, and the latter, well, isn't as "broken" as people claim it to be (ask kennytilton for figures on how many macros he has written).

Gensyms are the former, as indicated by the comment above the implementation of ar-gensym. On the other hand, one might argue that gensyms are the latter because unintended variable capture doesn't occur very often. I won't directly argue against this, but consider that CL uses unhygienic macros, yet goes to the trouble of ensuring safe gensyms. (In fact, it can partly get away with unhygienic macros because it ensures unique gensyms.)

Also, I think there is a matter of elegance involved in an issue's classification as "to be fixed" or "not really broken". I can't conclusively prove that gensyms increase the elegance of code that uses them, it is just a gut feeling I have. And likewise I feel unhygienic macros decrease elegance (but this may be slightly more arbitrary, since I have never actually written a hygienic macro).

There is always an issue of risk vs. gain, but in this case, I consider the gain to be greater than the risk/cost. Your opinion may differ.

P.S. I think it would settle things pretty decisively if pg told us what he intends to do with gensyms.... pg?

-----

2 points by tung 6129 days ago | link

The arc implementation doubles as the specification. Piggy-backing on mzscheme's gensyms seems like an attractive prospect at first, until you realise that we now have a part of arc which isn't defined in arc. The only way that could work is if gensym becomes considered as a language axiom, and I'm not sure if it's really a candidate for that.

In practice, I wouldn't expect it to be that much of a problem. Who would want to mimic the abominable symbols generated by arc with the current methods anyway, intentionally or otherwise?

-----

1 point by eds 6129 days ago | link

You have a good point there. Fortunately it can be fixed trivially, by returning the definition of 'ar-gensym to its original form and using 'string->uninterned-symbol instead of 'string->symbol. (I suspect this is what mzscheme 'gensym does anyways.)

  (define (ar-gensym)
    (set! ar-gensym-count (+ ar-gensym-count 1))
    (string->uninterned-symbol (string-append "gs" (number->string ar-gensym-count))))
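To make the difference concrete, here is a small side-by-side sketch that runs at an mzscheme or Racket prompt. The names counter, next-name, interned-gensym and uninterned-gensym are hypothetical stand-ins for illustration, not the definitions in ac.scm.

```scheme
;; Hypothetical stand-ins for the two variants of ar-gensym.
(define counter 0)

(define (next-name)
  (set! counter (+ counter 1))
  (string-append "gs" (number->string counter)))

;; Original approach: the result is interned, so anyone who types the
;; same name gets the very same symbol.
(define (interned-gensym)
  (string->symbol (next-name)))

;; Proposed approach: same printed name, but a fresh uninterned symbol.
(define (uninterned-gensym)
  (string->uninterned-symbol (next-name)))

(define a (interned-gensym))   ; named "gs1"
(define b (uninterned-gensym)) ; named "gs2"

(eq? a 'gs1)       ; #t, the literal and the "gensym" collide
(eq? b 'gs2)       ; #f, the literal is a different symbol
(symbol->string b) ; "gs2", the printed name is unchanged
```

The counter and the prefix stay defined in the implementation itself. Only the final conversion from string to symbol changes.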

-----

1 point by pg 5702 days ago | link

The reason I didn't simply use Scheme's gensym is that it would have imported the concept of interning into Arc with it. That has always seemed rather a hack to me. It may turn out to be the best solution, but for now I want to take more time to think about it.

-----

1 point by elibarzilay 5701 days ago | link

You could come up with a new "subtype" of symbols which is reserved for gensymed symbols -- then use some mapping of unique names for new such gensyms (say, a counter), and then use some new syntax like `__name' to evaluate to these things (assuming that you want code to be able to write such gensyms literally)...

But doing all of that is equivalent to just creating gensyms as done now, with a `__' prefix -- so if you really want that last property then nothing needs to be changed (perhaps only the `gs' prefix)...

-----

2 points by sacado 6130 days ago | link

That means that, with the FFI hack now in Anarki, we would have to add a third underscore to Arc names? Wow :)

-----

1 point by eds 6129 days ago | link

Wait, what?

Looking at my code I realize it is inconsistent with respect to underscores, but that was because I did testing on both official and Anarki, and forgot to check the code I pasted into the submission.

I'm not sure if that has anything to do with your comment about three underscores or not. (I'd appreciate an explanation either way.)

-----

1 point by sacado 6129 days ago | link

I hadn't read your code very closely, so obviously I'm wrong.

Anarki's FFI is based on mzscheme's (of course), which itself imports symbols beginning with an underscore (_int, _pointer, _string, ...) These can clash with Arc's names (and they actually do, the string function for example). To overcome this issue, I added an underscore to Arc's names in Anarki (Arc's string is now mzscheme's __string).

Reading the original ac-global-name code and the corrected one, I saw yours had one more underscore. I thought it was part of the correction, but it seems you just gave arc2's original ac-global-name and Anarki's corrected code. Hence the mistake. Sorry for that noise :)

-----