Arc Forum | > Why is the arc community so interested in adding lots of intrasymbol synta...

1 point by almkglor 6259 days ago | link | parent

> Why is the arc community so interested in adding lots of intrasymbol syntax?

Because PG suggested adding them to Lisp and related languages, citing the case of 'quote and 'quasiquote and friends.

That said I don't understand your concern about this here.

Edit2: Personally I'm also somewhat concerned about exploding syntax. This makes things harder, sometimes a lot harder. http://russ.unwashedmeme.com/blog/?p=58

> Adding all of these special case reader macros sounds like it might be nice to use if it was easy to add

cref. my ssyntaxes.arc on Anarki

> Calling (package symbol) would just return the value of the symbol from that package

This will require special handling for macros; if I wanted to use a macro explicitly from a particular package, I'd need to do something like this:

  ; library.arc
  (in-package foo)
  (mac a-macro (x) (+ "the foo macro says: " x))

  ; my-program.arc
  (in-package my-program)
       ; since a.b == (a b)
  (pr (foo.a-macro "temp"))

However this means that the arc base system has to check that a list in head position might resolve to a macro or a symbol for a macro. Someone actually posted a patch for this but didn't push it on Anarki (I'll search for it later when I have a bit more time).

Your suggestion is certainly valid; one can implement packages as macros that perform lookup for a hidden global table and would probably use gensyms.

However the problem lies not in functions or macros but in types. Currently the standard Arc Way is to use plain symbols for types. Now, suppose package A defines a type 'parser, while package B also defines a type 'parser. If symbols are kept global (unpackaged!!) then we have a type clash: if you have an object of type 'parser, is it 'parser as known by A or 'parser as known by B? My proposal, on the other hand, will transform them into '<A>parser and '<B>parser.
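A minimal Python sketch of the clash, just to make it concrete. This is a model, not Arc: `annotate`, `type_of` and `packaged` are hypothetical stand-ins, and the `<A>parser` spelling is simply the one proposed above.

```python
# Hypothetical model of Arc-style tagged objects; not Arc itself.
def annotate(tag, value):
    return ("tagged", tag, value)

def type_of(obj):
    return obj[1]

# Unpackaged symbols: both packages end up using the same tag.
a_parser = annotate("parser", "state owned by package A")
b_parser = annotate("parser", "state owned by package B")
clash = type_of(a_parser) == type_of(b_parser)        # indistinguishable

# Packaged symbols, spelled the way the proposal would spell them.
def packaged(package, name):
    return "<" + package + ">" + name

a_parser2 = annotate(packaged("A", "parser"), "state owned by package A")
b_parser2 = annotate(packaged("B", "parser"), "state owned by package B")
distinct = type_of(a_parser2) != type_of(b_parser2)   # no clash
```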

Edit: PG: "I can't imagine why users would want to have type labels other than symbols, but I also can't see any reason to prevent it." http://www.paulgraham.com/ilc03.html

Edit: Of course, currently Arc is semi-standardized on using symbols, and I doubt it'll change anytime soon. In principle it's possible to use non-symbol type tags (say some random object containing inheritance and fields information).

Digression:

I'm also implementing multimethods/generic functions, and they depend on type. So you can do something like, say:

  (in-package foo)
  (using <arc>v3)
  (interface v1 my-type)
  (def my-type (x)
    (annotate 'my-type x))
  (defm <base>+ ((t a my-type) (t b my-type))
    (my-type (<base>+ (rep a) (rep b))))

... and you can do something like:

  (using <foo>v1)
  (= a (my-type 1))
  (= b (my-type 2))
  (= c (my-type 3))
  (+ a b c)
  => #3(tagged <foo>my-type 6)

It involves some trickery on the Scheme side (mostly to keep '+ efficient even though it's mostly defined on the Arc side), but hey, overloading + is cute ^^

Of course, package <bar> can define <base>+ on its own my-type, and adding a <foo>my-type and <bar>my-type will correctly throw a typing error ^^, which is even cuter.
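To show the dispatch idea without the Scheme-side trickery, here is a rough Python model. Everything in it (`METHODS`, `defm`, `call`, `plus`, the `<arc>int` tag) is made up for illustration and says nothing about how the real implementation is structured.

```python
# Hypothetical sketch: dispatch on a tuple of type tags, in the spirit of defm.
METHODS = {}

def annotate(tag, value):
    return ("tagged", tag, value)

def type_of(x):
    if isinstance(x, tuple) and len(x) == 3 and x[0] == "tagged":
        return x[1]
    return "<arc>int" if isinstance(x, int) else "<arc>unknown"

def rep(x):
    return x[2]

def defm(name, tags, fn):
    METHODS[(name, tuple(tags))] = fn

def call(name, a, b):
    key = (name, (type_of(a), type_of(b)))
    if key not in METHODS:
        raise TypeError("no method %s for %r" % (name, key[1]))
    return METHODS[key](a, b)

def plus(*args):
    # Fold left over the arguments, the way (+ a b c) would.
    total = args[0]
    for arg in args[1:]:
        total = call("<base>+", total, arg)
    return total

defm("<base>+", ["<arc>int", "<arc>int"], lambda a, b: a + b)
defm("<base>+", ["<foo>my-type", "<foo>my-type"],
     lambda a, b: annotate("<foo>my-type", plus(rep(a), rep(b))))
```

Adding three `<foo>my-type` values goes through the second method twice, while mixing a `<foo>my-type` with a `<bar>my-type` finds no method and raises a `TypeError`.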

> However, wouldn't that mean that the library file would need to maintain all of the old versions, even if you weren't using them?

How good or bad is backwards compatibility? If you use a library today and someone fixes a bug in it, but in the process completely changes the interface from under you, would you be pleased or pissed?

Backwards compatibility accumulates cruft, true. BUT, it helps with a very human need to be lazy. If updating a library means I might have to look through all my other tools, checking that something the library provides in older versions is changed to conform to newer versions, I might prefer not to update anyway.

> How's snap going?

Badly. I'm stuck on I/O. The bad thing is with sockets. A socket is bidirectional (one FD number for input and output directions), but mzscheme splits it into two mzscheme ports, one for input and one for output. Arc-on-mzscheme thus expects monodirectional sockets. This means I might have to redesign a bit of the central I/O, since I was designing with listen/socket-accept returning a single port object.
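For anyone who hasn't run into the one-FD-versus-two-ports mismatch, a small Python illustration (Python only, nothing to do with SNAP's or mzscheme's actual code): one bidirectional socket is wrapped into two one-directional file objects, and both still sit on the same descriptor.

```python
import socket

# One bidirectional socket (a single file descriptor)...
left, right = socket.socketpair()

# ...split into two one-directional file objects, similar in spirit to
# how mzscheme hands back separate input and output ports.
left_in = left.makefile("rb")
left_out = left.makefile("wb")

left_out.write(b"ping\n")
left_out.flush()
received = right.recv(16)

right.sendall(b"pong\n")
reply = left_in.readline()

# Both wrappers report the descriptor of the underlying socket.
same_fd = left_in.fileno() == left_out.fileno() == left.fileno()

for f in (left_in, left_out, left, right):
    f.close()
```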

The bits I'm doing are actually listed in the second-to-the-last post on the SNAP VM blog, i.e. the backlog.^^

3 points by shader 6258 days ago | link

I've been thinking a bit more on the "package" problem, which is a bit dangerous, I know.

I don't really know how you're doing scoping in SNAP, or even much of how it's done in arc, but if the lexical scoping is done the same way as it is according to SICP (which I'm just now reading), then each "environment" has a pointer to its parent frame, and if the interpreter can't find the definition in the current frame, it moves up one level.

What if you just merged the concept of "packages", "environments", and "scope" into the same thing? I don't really know what the syntax should be, but basically when you load a package at a certain level of the program, it adds a pointer to the current frame (maybe between this frame and its parent? I don't know how well multiple parents work for lexical scoping) and it's treated by the interpreter as if it was just another scope.

Then, to make things fun and interesting for more than just packages, you add a syntax for naming scopes, and referring to them either by name or by relative height. This way you can access not only package specific variables, but also get around some shadowing problems by telling it "I only want 'a if it's in this scope" or "I want 'a from the frame two levels up"
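Roughly what I have in mind, as a Python sketch of SICP-style frames. The class and method names (`Env`, `up`, `named`, `lookup_in`) are invented for this example, and a package is modelled as nothing more than a frame with a name.

```python
class Env:
    """A frame with a pointer to its parent, as in SICP. Names are hypothetical."""
    def __init__(self, bindings, parent=None, name=None):
        self.bindings = dict(bindings)
        self.parent = parent
        self.name = name

    def lookup(self, sym):
        # Ordinary lookup: walk outward until the symbol is found.
        env = self
        while env is not None:
            if sym in env.bindings:
                return env.bindings[sym]
            env = env.parent
        raise NameError(sym)

    def up(self, levels):
        # Relative addressing: the frame `levels` steps out from this one.
        env = self
        for _ in range(levels):
            if env.parent is None:
                raise LookupError("no frame %d levels up" % levels)
            env = env.parent
        return env

    def named(self, name):
        # Addressing by name: the nearest enclosing frame with that name.
        env = self
        while env is not None:
            if env.name == name:
                return env
            env = env.parent
        raise LookupError("no enclosing scope named %r" % name)

    def lookup_in(self, sym):
        # Strict lookup: only this frame, no walking outward.
        if sym in self.bindings:
            return self.bindings[sym]
        raise NameError(sym)

top = Env({"a": "global a"}, name="top")
pkg = Env({"a": "package a", "b": "package b"}, parent=top, name="foo")
inner = Env({"a": "local a"}, parent=pkg)
```

With this, `inner.lookup("a")` sees the local binding, `inner.named("foo").lookup_in("a")` reaches past the shadowing to the package's own `a`, and `inner.up(2).lookup_in("a")` asks for the frame two levels up.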

I don't really care too much what the syntax looks like, but this would be an interesting way to reference packages that should be (if scoping actually works in a way similar to what I've presumed) a) fast and b) capable of managing more than just packages.

One major problem is that it would probably require lots of low-level munging in scheme to get this to work. It would probably be easier to do in SNAP, since I think you're implementing most of this from scratch there.

What do you think?

Edit: I've been thinking again ^^, and I just realized that though this might be an interesting way to implement packages, I don't think it would fix your typing issue. Maybe variables should all carry around info of which scope they're from (kind of what you were suggesting)? That way the system can look in the right place for variables more quickly, too. If 'a says it's from scope this+2, we wouldn't have to wait for the interpreter to miss twice before it found the value.

-----

2 points by almkglor 6258 days ago | link

> then each "environment" has a pointer to its parent frame

Your understanding of this is quite correct.

But this is what it's equivalent to, not necessarily how it's implemented. AST-traversal?

The problem with the nested-environment implementation is that mere variable lookup can be O(N) where N is the number of nested environments:

  (def foo (x)
    (fn (y)
      (fn (z)
          ;accessing x takes 2 indirections!
        (+ x y z))))

Also, consider an alternative:

  (def foo (x)
    (fn (y)
      (let tmp (+ x y)
        (fn (z)
          (+ tmp z)))))

In a true nested-environment implementation, the innermost function will still carry around 'x and 'y even though it doesn't need them. This means that GC will not be able to collect the contents of those variables.

A faster way would be to "flatten" the environment. In effect, what's really done is:

  (def foo (x)
    (fn (y)
      (fn (z)
        (+ x y z))))
  =>
  (def foo (x)
    (closure (list x)
      (fn (y)
        (closure (list (my-environment 0) y)
          (fn (z)
            (+ (my-environment 0) (my-environment 1) z))))))
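The same transformation written out as runnable Python, as a sketch only. The `Closure` class and the tuple layout are assumptions made for this example, not the arc2c/SNAP representation.

```python
class Closure:
    """Code plus a flat tuple of captured values (a hypothetical layout)."""
    def __init__(self, env, code):
        self.env = tuple(env)
        self.code = code

    def __call__(self, arg):
        return self.code(self.env, arg)

def foo(x):
    # (closure (list x) (fn (y) ...))
    def outer_code(env, y):
        # (closure (list (my-environment 0) y) (fn (z) ...))
        def inner_code(env2, z):
            return env2[0] + env2[1] + z
        return Closure([env[0], y], inner_code)
    return Closure([x], outer_code)

def foo_tmp(x):
    # The 'tmp variant: the innermost closure captures only tmp.
    def outer_code(env, y):
        tmp = env[0] + y
        def inner_code(env2, z):
            return env2[0] + z
        return Closure([tmp], inner_code)   # x and y are not captured
    return Closure([x], outer_code)

innermost = foo(1)(2)
innermost_tmp = foo_tmp(1)(2)
```

Every variable access is a single index into the closure's own tuple, and in the `foo_tmp` case the innermost closure holds only `tmp`, so nothing keeps `x` and `y` alive on its account.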

For mutated and shared variables, most implementations have some sort of "boxed" variable; these are what are called "shared variables" in arc2c/SNAP.

------- BUT!

Just because environment flattening is how it's usually implemented doesn't mean it's not possible to capture and name an environment.

For example, consider that what we call a "function" is really an object constructed by a very hidden 'closure function. It's really a pair of the environment and the function code. Similarly, if Arc had ways of destructuring a function, we could implement a call-with-current-environment function. Suppose we had an 'environment-of function which extracts just the environment data of a given function:

  (def call-with-current-environment (f)
    (f:environment-of f))

Then a theoretical 'w/environment:

  (mac w/environment (var . body)
    `(call-with-current-environment
       (fn (,var) ,@body)))
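As a point of comparison, Python happens to expose both halves of a closure (`__code__` and `__closure__`), so the same two definitions can actually be run there. This is only an analogy: `environment_of` and `make_adder` are names chosen for this sketch, and Arc has no such accessors today.

```python
def environment_of(f):
    """Return the captured variables of a Python closure as a dict."""
    cells = f.__closure__ or ()
    return dict(zip(f.__code__.co_freevars,
                    (cell.cell_contents for cell in cells)))

def call_with_current_environment(f):
    # Same shape as the Arc sketch: pass f its own environment.
    return f(environment_of(f))

def make_adder(x, y):
    def inner(env):
        # Referring to x and y here is what makes Python capture them.
        return (x + y, env)
    return inner

total, env = call_with_current_environment(make_adder(1, 2))
```

Note that only variables the function body actually mentions show up in the environment, which is the flattening behaviour described above.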

> What if you just merged the concept of "packages", "environments", and "scope" into the same thing?

This has been implemented. Take a look at lib/module/module1.arc . But aside from having problems with macros, there's also....

> I've been thinking again ^^, and I just realized that though this might be an interesting way to implement packages, I don't think it would fix your typing issue.

Quite right ^^

> Maybe variables should all carry around info of which scope they're from (kind of what you were suggesting)?

Err. I don't quite understand this. Also, I don't quite understand how this relates to symbols-as-types...

-----

2 points by shader 6257 days ago | link

I seem to be slowly arriving at conclusions similar to yours, the only differences being things that I didn't know about the way that lisp is optimized. Maybe I'll end up learning something through all of this.

> Err. I don't quite understand this. Also, I don't quite understand how this relates to symbols-as-types...

Well, what I had originally thought when I said this was rather silly, which was that in the value of the variable there should be information about which scope it's located in. Doh! Maybe that would work if it looked in the current env to figure out where the "real" var is.

What I now think is something very similar to what you've been saying, except I still think it would be cool if this could be a method of managing general environments, instead of just modules. I don't know if that was what you were thinking, or not; it's more of an implementation detail.

I do like the idea of being able to extract and name the environment at any point in the code. If all that a package was was an environment with a name and we had the ability to access items from each environment directly, it could (possibly?) be useful for a lot more than just modules.

I really like the syntax (env sym), where env is the environment object, and 'sym is the symbol we're looking up. This way env.sym works. That's probably a syntax that many people are used to. I just don't know if it would work or not.

For your typing problem, I would do what you recommended. If it made any sense, I would use the syntax env.sym, so that it can be evaluated to figure out what type it is. You would still have to evaluate everything in head position to see if it evaluated to a macro. On the other hand, you could just check the car of the first item, and see if it was an environment. Or you could treat the whole thing as a single symbol and have it bypass the reader? How does arc's typing system work?

Also, how much would changing the type symbol to <env>sym help? How would it help the code decide how to interpret it?

I just don't know enough about scheme's pointer and environment system, or arc's typing system to know if any of my thoughts are even going in the right direction. So I just make more of them to cover the bases, and have you explain why none of them work. ^^

Basically at this point, I'll agree with anything you say, because I don't know enough to disagree with you. I would be very interested in learning/having you teach me all of the details about arc's implementation. As always, I'd love to help, but I don't know nearly enough to be useful.

Maybe we should start a new thread for the module system that you're proposing?

As per one of your previous comments: If symbols weren't global, how would you do them?

Edit: PG: It would be really nice if there were "reply" and "View Thread" buttons for each comment on the comments page. How hard would these be to add?

-----

2 points by almkglor 6257 days ago | link

> What I now think is something very similar to what you've been saying, except I still think it would be cool if this could be a method of managing general environments, instead of just modules. I don't know if that was what you were thinking, or not; it's more of an implementation detail.

Yes, it would definitely be cool to access the environment. For one, it would make serializing functions possible: aside from giving accessors to the environment and to the function code, you can give constructors for functions that accept a serializable environment and a serializable function code.

> If it made any sense, I would use the syntax env.sym, so that it can be evaluated to figure out what type it is.

Typical 'isa pattern:

  (isa foo 'cons)

Translating to module.sym form:

  (isa foo 'arc.cons)

.... except that arc.cons == (arc cons) and you'd be comparing the type to the unevaluated (arc cons), because of the ' mark.

With <arc>cons syntax, the "<" and ">" characters are simply passed directly, i.e. <arc>cons is just a single symbol.
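A toy Python model of the difference. The `read_symbol` and `quote` helpers are hypothetical and far simpler than Arc's real ssyntax expansion; they only show why a quoted `arc.cons` can never equal a type tag while `<arc>cons` can.

```python
def read_symbol(token):
    """Toy reader: expand a.b into (a b); leave everything else as one symbol."""
    if "." in token:
        return token.split(".")
    return token

def quote(datum):
    # Quoting returns the datum unevaluated, whatever shape it has.
    return datum

dotted = quote(read_symbol("arc.cons"))        # a two-element list
bracketed = quote(read_symbol("<arc>cons"))    # still a single symbol

type_tag = "<arc>cons"
dotted_matches = dotted == type_tag
bracketed_matches = bracketed == type_tag
```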

-----

1 point by shader 6256 days ago | link

Is there any reasonable way to prevent the dot from being interpreted there? Or alternatively, evaluate the whole statement?

And how is the isa statement going to be updated to work independent of which module it's running in? Does each module type all of its objects that way from the beginning, or were you going to have the interpreter do something fancy?

-----

1 point by almkglor 6254 days ago | link

> Is there any reasonable way to prevent the dot from being interpreted there? Or alternatively, evaluate the whole statement?

First things first. We want to make the type arc.cons readable, don't we? So if say 'arc here is a macro, it will expand to a symbol, which is basically "the symbol for cons in the arc package".

Now that symbol can't be a 'uniq symbol, since those are perfectly unreadable.

What we could do is....... oh, just make up a symbol from the package name and the given symbol. Like, say...... <arc>cons. The choice of <> is completely arbitrary.

Now.... if (arc cons) is a macro that expands to <arc>cons, why go through the macro?

> And how is the isa statement going to be updated to work independent of which module it's running in? Does each module type all of its objects that way from the beginning, or were you going to have the interpreter do something fancy?

Something fancy, of course. Specifically it hinges on the bit about the "contexter". Remember that in my proposal I proposed adding an additional step after the reader, viz. the contexter.

Basically the contexter holds the current package and it puts any unpackaged symbols it finds into the mapping for the current package.

So for example we have:

  (in-package foo)
  (using <arc>v3)
  (interface <foo>v1
    make-a-foo foo-type is-a-foo)
  (def make-a-foo (x)
    (annotate 'foo-type x))
  (def is-a-foo (x)
    (isa x 'foo-type))

Now the contexter goes through it and maintains hidden state. This state is not shared and is not assured of being shared across threads (it might, it might not, implementer's call - for safety just assume it doesn't)

Initially the contexter has the package "User".

It encounters:

  (in-package foo)

This changes its internal package to "foo". This (presumably) newly-created package is given a set of default mappings, corresponding to the arc axioms: fn => <axiom>fn, quote => <axiom>quote, if => <axiom>if, etc. The contexter then returns:

  t

The t is evaluated and returns.... t.

Then it accepts:

  (using <arc>v3)

This causes the contexter to look for a "v3" interface in the "arc" package. On finding it, it creates default mappings; among them are:

  def => <arc>def
  isa => <arc>isa
  annotate => <arc>annotate
  ...etc.

Upon accepting this and setting up the "foo" package to use the <arc>v3 interface, it again returns:

  t

Then it accepts:

  (interface <foo>v1
    make-a-foo foo-type is-a-foo)

This causes the contexter to create a new interface for the package "foo", named "v1". This interface is composed of <foo>make-a-foo, <foo>foo-type, and <foo>is-a-foo.

After creating the interface, it then returns:

  t

Then it accepts:

  (def make-a-foo (x)
    (annotate 'foo-type x))

Since it isn't one of the special contexter forms, it simply uses the mapping of the current package - specifically the package foo - and returns the form:

  (<arc>def <foo>make-a-foo (<foo>x)
    (<arc>annotate '<foo>foo-type <foo>x))

Notice how x is implicitly mapped into the package foo; basically any unpackaged symbol which doesn't have a mapping in a package is automatically given to that package.

Then the contexter accepts:

  (def is-a-foo (x)
    (isa x 'foo-type))

Which is returned as:

  (<arc>def <foo>is-a-foo (<foo>x)
    (<arc>isa <foo>x '<foo>foo-type))

So there: the contexter automagically inserts packages.
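The walk-through can be condensed into a small Python model. Treat it as a sketch under heavy assumptions: forms are nested Python lists, symbols are plain strings, 'x is written as ["quote", "x"], string literals are not modelled, and the `Contexter` class with its `export` helper is invented here rather than taken from any real code.

```python
import re

PACKAGED = re.compile(r"<([^<>]+)>(.+)$")

class Contexter:
    """Toy contexter: qualifies bare symbols with the current package."""
    AXIOMS = ("fn", "quote", "if")

    def __init__(self):
        self.current = "User"
        self.mappings = {"User": self._fresh()}
        self.interfaces = {}

    def _fresh(self):
        return {name: "<axiom>" + name for name in self.AXIOMS}

    def export(self, package, interface, names):
        # Pre-register an interface such as <arc>v3 (stands in for loading arc).
        self.interfaces[(package, interface)] = [
            "<%s>%s" % (package, n) for n in names]

    def qualify(self, sym):
        if PACKAGED.match(sym):            # already packaged: leave alone
            return sym
        table = self.mappings[self.current]
        if sym not in table:               # unmapped: give it to this package
            table[sym] = "<%s>%s" % (self.current, sym)
        return table[sym]

    def context(self, form):
        if isinstance(form, str):
            return self.qualify(form)
        if not isinstance(form, list) or not form:
            return form                    # numbers, empty list
        head = form[0]
        if head == "in-package":
            self.current = form[1]
            self.mappings.setdefault(self.current, self._fresh())
            return "t"
        if head == "using":
            key = PACKAGED.match(form[1]).groups()
            for full in self.interfaces[key]:
                self.mappings[self.current][PACKAGED.match(full).group(2)] = full
            return "t"
        if head == "interface":
            key = PACKAGED.match(self.qualify(form[1])).groups()
            self.interfaces[key] = [self.qualify(s) for s in form[2:]]
            return "t"
        return [self.context(sub) for sub in form]
```

Feeding it the forms from the example in order yields "t" for the three special forms and the fully packaged `(<arc>def <foo>make-a-foo ...)` list for the definition.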

-----

1 point by shader 6253 days ago | link

See, I told you you'd convince me to come around to your way of thinking, once I'd asked enough questions to find out the reasons for the choices you've made.

So, how hard is it to take what you've made so far, and generalize it to work with any kind of environment, and not just packages? So, things like destructuring functions, naming environments, passing them around, etc. Would this allow us to avoid shadowing variables by naming which level they came from explicitly?

Also, if I type in the symbol '<arc>bar, will the contexter read that and presume I'm looking for the bar in arc, or will it rename it <foo><arc>bar? And which is better?

If it's the latter, how would you propose we directly reference a particular environment? Would that be pack.sym, like I had thought earlier?

-----

1 point by almkglor 6252 days ago | link

> So, how hard is it to take what you've made so far, and generalize it to work with any kind of environment, and not just packages?

Hmm. Well, if you want to be technical about things, one Neat Thing (TM) that most compilers of lexical-scope languages (like Scheme and much of CL, as well as Arc) do is "local renaming". Basically, suppose you have the following code:

  (fn (x)
    (let x (+ x 1)
      x))

This is transformed (say by arc2c) into:

  (fn (x@1)
    (let x@2 (+ x@1 1)
      x@2))
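A sketch of such a renaming pass in Python, assuming a tiny language with only fn, single-variable let, and function application, represented as nested lists. The `rename` function is written for this example and is not arc2c's code.

```python
import itertools

def rename(form, scope=None, counter=None):
    """Give every local binding a unique name, e.g. x -> x@1, x@2."""
    scope = {} if scope is None else scope
    counter = itertools.count(1) if counter is None else counter

    def fresh(name):
        return "%s@%d" % (name, next(counter))

    if isinstance(form, str):
        return scope.get(form, form)       # globals keep their names
    if not isinstance(form, list) or not form:
        return form                        # numbers, empty list
    if form[0] == "fn":
        inner = dict(scope)
        new_params = []
        for p in form[1]:
            inner[p] = fresh(p)
            new_params.append(inner[p])
        return ["fn", new_params] + [rename(b, inner, counter) for b in form[2:]]
    if form[0] == "let":
        var, init = form[1], form[2]
        new_init = rename(init, scope, counter)   # init still sees the outer x
        inner = dict(scope)
        inner[var] = fresh(var)
        return ["let", inner[var], new_init] + [
            rename(b, inner, counter) for b in form[3:]]
    return [rename(sub, scope, counter) for sub in form]

renamed = rename(["fn", ["x"], ["let", "x", ["+", "x", 1], "x"]])
```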

Note that the renaming could actually be made more decent, i.e. something readable. For example, in theory the renaming could be done more like:

  (fn (<fn>x)
    (let <fn/let>x (+ <fn>x 1)
      <fn/let>x))

In fact I am reasonably sure that Scheme's hygienic macros work after local renaming, unlike CL's which work before renaming. You'll probably have to refer to some rather turgid papers though IMO, and decompressing turgid papers is always hard ^^.

> Also, if I type in the symbol '<arc>bar, will the contexter read that and presume I'm looking for the bar in arc, or will it rename it <foo><arc>bar? And which is better?

Well, since packages are constrained to be nonhierarchical, <arc>bar is considered already packaged and the contexter will ignore it. The contexter will add context only to unpackaged symbols.

-----

1 point by rincewind 6253 days ago | link

If I write this into the foo-package-file:

   (let list '(this is a test case)
       (all is-a-foo (map make-a-foo list)))

would list be read as <foo>list or <arc>list? Should the expression

   (sym "make-a-foo")

evaluate to <foo>make-a-foo?

-----

2 points by almkglor 6253 days ago | link

> If I write this into the foo-package-file: ....

Assuming you mean in the same file with the example foo package:

  (<arc>let <arc>list (<axiom>quote (<foo>this <arc>is <foo>a <foo>test <arc>case))
    (<arc>all <foo>is-a-foo (<arc>map <foo>make-a-foo <arc>list)))

> Should the expression ....

No, it evaluates to the unpackaged symbol 'make-a-foo.

-----

1 point by shader 6259 days ago | link

Well, I'm sure that since you're actually having to use arc, and implement it in a multi-thread environment, you'd know much more about the problems that a module system is likely to have than I. How does erlang do theirs? I found a short whitepaper on the subject, but it didn't have very many details. I'm sure that whatever they use is perfectly safe for whatever you'd be doing in snap, though the implementation/syntax might not be the nicest.

About macros, hasn't there been some interest for a while in getting macros to be "first-class" and so forth? How are they implemented, exactly, that makes this so hard? Is it really that hard for the reader to take the output of a function and automatically interpret it as code?

Also, you mention some effort being made into allowing arc to see if the form in head position resolves to a macro. How hard is that to do? I really don't know how macros work, though I understand what they do.

Aren't they just functions tagged to let the interpreter know that they can a) be expanded at compile time, because the output is not dependent upon the value of the input, and b) return forms that need to be evaluated?

How hard would it be to let normal functions do this? Suppose we had a syntactical feature on function definitions, such that (fn x 'a) prequoted a, so that the form located at a is captured, instead of its value. Would that be enough to "simulate" macros? Or does the interpreter need to know that the forms that come out the other end probably need evaluation?

I should think that all a macro is is a tag that guarantees to the compiler that its output is completely independent of its input, and can thus be expanded at compile time.

I know we kind of went through something like this before, but what am I missing? If you clear this up, maybe I'll finally understand how macros really work ;)

And good luck on SNAP. I would love to help you, as lisp + erlang (feature wise) is something I am very interested in. Unfortunately, it's just way beyond my abilities at this point. I shall look forward to reading your blog!

Is the problem with the sockets due to the fact that you're trying to maintain compatibility with the present arc-on-mzscheme? Or is it something more fundamental than that?

-----

1 point by almkglor 6259 days ago | link

The difference is that Arc evaluates forms from a loaded file one expression at a time. 'def in Arc is simply an assignment to a global variable; technically, 'load doesn't load a module, it executes a program (which in most cases just assigns functions to global names).

Erlang source files, on the other hand, are a set of function definitions - they aren't executed at the time you load. There are no globals in Erlang (although the function names are effectively equivalent to global variables that can't be mutated normally). Each Erlang source file is compiled as a single unit, meaning one source file == one Erlang module.

> About macros, hasn't there been some interest for a while in getting macros to be "first-class" and so forth?

I presume you mean something like this:

  (let my-macro (annotate 'mac
                  (fn (x y)
                    (+ "my-macro says " x " and " y)))
    (my-macro "hmm" "haw"))

> How are they implemented, exactly, that makes this so hard?

It isn't how macros, per se, are implemented that makes this hard, it's how efficient interpreters are implemented that makes this hard.

One of the slowest kinds of interpreter implementation is what's called an "AST traverser". Basically, the interpreter simply goes through the list-like tree structure of the code and executes it. In a Lisp-like, the AST is the list structures input by the s-expression syntax. This is what macros fool around with.

The slowness of this is usually because it needs to enter each sub-AST (i.e. a sub-expression, e.g. in (foo bar (qux quux)), (qux quux) is a sub-AST) and then return to the parent AST (in the example, it has to return to (foo bar _)).

However a faster way to do it is to pre-traverse the syntax tree and create a sequence of simple instructions. This is usually called a "bytecode" implementation, but take note that it doesn't have to be a byte code.

For example (foo bar (qux quux)) would become:

  (call qux quux) ; puts the return value in 'it
  (call foo bar it)

The increase in speed per se is not big (you just lose the overhead of the AST-traversal stack while retaining the overhead of the function-call stack), but it gives an opportunity for optimization. For example, since the code is now a straight linear sequence of simple instructions, the interpreter loop can be very tight (and relatively dumb, so there's very little overhead). In addition, it's also possible to transform the linear sequence of simple instructions to even simpler instructions... such as assembly language.
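A small Python sketch of the pre-traversal, assuming numbered temporaries (t0, t1, ...) in place of the single 'it register so that it works for more than one nested call. `compile_form` and `run` are names made up for this example.

```python
def compile_form(form, code, temps):
    """Flatten nested calls into a linear list of ("call", dest, fn, args) steps."""
    if not isinstance(form, list):
        return form                         # variable reference or constant
    args = [compile_form(sub, code, temps) for sub in form[1:]]
    dest = "t%d" % len(temps)
    temps.append(dest)
    code.append(("call", dest, form[0], args))
    return dest

def run(code, env):
    # The interpreter loop is a plain for loop: no recursion into sub-ASTs.
    regs = dict(env)
    result = None
    for _op, dest, fn, args in code:
        values = [regs[a] if isinstance(a, str) else a for a in args]
        result = regs[dest] = regs[fn](*values)
    return result

code = []
compile_form(["foo", "bar", ["qux", "quux"]], code, [])
env = {"foo": lambda a, b: a + b, "qux": lambda n: n * 2, "bar": 1, "quux": 20}
answer = run(code, env)
```

The compiled `code` list has the call to qux first and the call to foo second, which is exactly the ordering that causes trouble once foo is allowed to be a macro.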

However, consider the above sequence if 'foo turns out to be a macro. If it is, then it's too late: the program has already executed 'qux. If it were part of say a 'w/link macro, then it shouldn't have executed yet. Also, recreating the original form is at best difficult and in general highly intractable, and remember that the macro expects the original form.

So in general, for efficient execution, most Lisplike systems force macros to execute before pretraversing the AST into the bytecoded form. This also means that macros aren't truly first class, because they must be executed during compilation.

In short: most lisplikes (mzscheme included) do not execute the AST form (i.e. the list structures). They preprocess it into a bytecode. But macros work on the AST form. So by the time the code is executed, macros should not exist anymore.
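The ordering can be sketched in Python as a separate expansion pass over the list structure. This is a simplification with assumptions: quote is not treated specially, macros live in a plain dict, and the `unless` macro is a hypothetical example.

```python
def macroexpand(form, macros):
    """Expand macros on the list structure before any compilation step."""
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if isinstance(head, str) and head in macros:
        # The macro receives its arguments unevaluated, as raw forms.
        return macroexpand(macros[head](*form[1:]), macros)
    return [macroexpand(sub, macros) for sub in form]

# Hypothetical macro: (unless c body) => (if c nil body)
macros = {"unless": lambda cond, body: ["if", cond, "nil", body]}

expanded = macroexpand(
    ["foo", ["unless", ["qux", "quux"], ["launch"]]], macros)
```

Once this pass has run, no macro names remain in head position, so whatever compiles the result into linear instructions never needs to know macros existed.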

> Also, you mention some effort being made into allowing arc to see if the form in head position resolves to a macro. How hard is that to do?

Trivial, just add a few lines in ac.scm. However rntz didn't push it on Anarki, which suggests that the modification hasn't been very well tested yet. See http://arclanguage.com/item?id=7451, but the patch itself has been lost T.T . I think it'll work, but I haven't done the patch myself either ^^.

> And good luck on SNAP. I would love to help you, as lisp + erlang (feature wise) is something I am very interested in.

Thanks. oh and yeah: http://weblog.thatmattbone.com/search/label/erlang-in-lisp

-----

1 point by shader 6258 days ago | link

Ah, I see now. How naive of me to presume that lisp actually worked with the AST like it says it does. Oh well.

Is there any way to optimize the interpreter without sacrificing AST interpretation? Or should I write my own language that says "interpreted languages are supposed to be slow; don't worry about it" for the sake of more powerful (in theory) macros? ^^

Or is there actually no difference between the qualities of the two macro systems? Would you care to enumerate the pros and cons of each system? You can do it on a new thread, if you like.

-----

1 point by almkglor 6258 days ago | link

> Or should I write my own language that says "interpreted languages are supposed to be slow; don't worry about it" for the sake of more powerful (in theory) macros? ^^

You might be interested in Common Lisp's 'macrolet form. I've actually implemented this in Arc waaaaaaay back.

Considering that CL is targeted for direct compilation to native machine code (which is even simpler than mere bytecode), you might be interested in how CL makes such first-class macros unnecessary.

-----

1 point by shader 6256 days ago | link

I'm very interested in both of those. Would you care to explain? If not, do you have any particularly good resources (besides a google search, which I can do myself)?

-----

1 point by almkglor 6255 days ago | link

The macrolet form in Arc: http://arclanguage.com/item?id=3085

As for how CL makes first-class macros unnecessary most of the time, it's primarily 'macrolet and packages.

-----

1 point by shader 6249 days ago | link

So, how does that work, exactly? Does macrolet tell lisp that since the macro is only defined in that scope, it should search more carefully for it, because it doesn't have to worry about slowing down the whole program?

-----

1 point by almkglor 6247 days ago | link

Err, no. It simply means that the particular symbol for it is bound only within the scope of the 'macrolet form. In practice, most of the time, the desire for first-class macros is really just the desire to bind a particular symbol to a macro within just a particular scope, and 'macrolet does that.
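A toy Python expander showing the scoping, with a made-up `(macrolet name expander body)` shape in which the expander is a Python callable sitting directly in the form. Real CL macrolet syntax is different; only the binding behaviour is being modelled.

```python
def expand(form, macros):
    """Expand macros, honouring bindings that exist only inside a macrolet."""
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if head == "macrolet":
        # (macrolet name expander body): bind name only while expanding body.
        _, name, expander, body = form
        return expand(body, dict(macros, **{name: expander}))
    if isinstance(head, str) and head in macros:
        return expand(macros[head](*form[1:]), macros)
    return [expand(sub, macros) for sub in form]

program = ["do",
           ["macrolet", "twice", lambda x: ["do", x, x],
            ["twice", ["pr", "hi"]]],
           ["twice", ["pr", "bye"]]]

result = expand(program, {})
```

Inside the macrolet body `twice` expands; the second use, outside that scope, is left alone as an ordinary call.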

For other cases where a macro expansion should be used more often than just a particular scope, then usually the module or whatever is placed within a package and a package-level macro is used.

-----