Arc Forumnew | comments | leaders | submitlogin
A Psyco equivalent for Arc : is it easy, possible or inconceivable ?
12 points by sacado 6147 days ago | 13 comments
Hello everyone. This is a veeeeery long post, I know, so feel free not to read it all at once (or not to read it at all anyway).

PART I. "n" operators and the 20 x factor

You may know I am interested in improving Arc's performance (I am sometimes crunching numbers or doing equivalent things and would like to do so in Arc, at least at the prototype level, so that's an issue). I was just thinking about making an optimizer for Arc in the same way Psyco works (or the way I think it works).

Recently, I've been adding a few more axioms in ac.scm (secretly, I didn't share it on Anarki yet) that optimize the generated mzscheme code when compiled. These are, basically, n+, n-, n<, n>, nis, nisnt (and friends), all versions of their counterparts without the initial n that know that their args are of type 'num. When you type (+ a b), the generated code is :

  (ar-funcall2 __+ __a __b)
That means : find the __+ version, apply it the values a and b, check the type for these values, ah, they are numbers ? So call mzscheme's +. It then checks they are effictively numbers, adds them and returns the result.

If you call this in a loop, you have a very big overhead besides the simple computation. That's the price to pay for polymorphic functions and dynamic typing. Well, when I know my vars are all nums and I want performance, I call (n+ a b), and the generated code becomes :

  (+ __a __b)
This is inlined and optimized by mzscheme's JIT, so it just becomes : "add __a and __b".

That gives impressive results : (fib 40) with only "canonical" operators takes 350 s. while (nfib 40) with "n" operators only takes 18 s. Yes, a factor of 20. I get no honor of course, mzscheme's JIT does 95% of the work. Have a look at this one now :

  (time:for i 1 10000000)
  time: 11375 msec.

  (time:nfor i 1 10000000)
  time: 722 msec.
That does work with numbers of course, but it could be done with string operators too, for example (intensive regexes anyone ?)

PART II. This is not really Arcish !

Well, there are a few drawbacks of course. First, n operators are special forms, not regular functions. They have the same status as if or fn and are not first-class values. You cannot reduce a list of integers with n+, for example : it has to be embedded in a lambda.

Next, when you want to optimize something, you have to change its definition so as to manually change all the operators to optimized ones. It can be discouraging for code you didn't write yourself (the for macro, for example). I would like to have that performance when I need it, without having to take care about it and having to rewrite functions whenever I need it !

That brings me to the subject of my post (yes ! finally !)

PART III. A Psyco equivalent for Arc : is it easy, possible or unconceivable ?

The goal of Python Psyco is quite the same as mine. Take a bunch of Python code. It's elegant but slow, particularily when you calculate numbers in loops, as, when it sees an innocent "+" operator, it does dynamic dispatching, type checking of arguments and all that stuff, taking 95% of the time checking or are fine adding numbers. Yes, I am, thanks for asking !

There comes Psyco. You are calling that fib function with a number. Hmm, OK, let's generate a particular function for when fib gets called with a number. So, n is a number, right ? Then, that call n < 2 is just comparing numbers, let's inline it. n -1 and n - 2 are numbers, too, etc. Now, every time fib is called, we look at its arg. If it is a number, the inlined, faster version is called. Otherwise, the more general version is (but it is never the case). Congratulations, you won a 50 x speed factor.

How hard is it to do such a tool for Arc ? I'd like a function named, say, arco, doing this : everytime the function is called with different arg types, generate an optimized version of the code for that type. If you arco fib, the first time you call it with an 'int value, it will generate a version specialized on that type. As < is called with n and 2, since n is currently an int and 2 is, well, an int obviously, change the code to (n< n 2). That would generate something like that (the code could be changed again later if we call fib with, say, a string) :

  (def fib (n)
    (if (isa n 'int)
      (if (n< n 2) ; Specialized, automatically generated code
        n
        (n+ (fib (n- n 1)) (fib (n- n 2))))
      (if (< n 2)   ; Original code
        n
        (+ (fib (- n 1)) (fib (- n 2))))))
What are the gotchas there ? + could have been modified, n could have been changed by a function call inside the optimized function (not possible here, but you never know), many other things I guess... Does this mean my dream is an impossible one ? I think these potential traps exist in Python too, however it does work... But if I'm not wrong, the very optimizing Scheme compilers require the code to be encapsulated, i.e., it has to be a closed world : see Stalin or Ikarus. Even mzscheme's JIT is only working when you enclose the code inside a module where it knows you didn't redefine + (that's why my nfib is faster than mzscheme's version ran from the REPL, btw).

What's your advice / opinion / feeling ? Oh, and thanks for reading all that !



4 points by almkglor 6146 days ago | link

Well, you just have to study type inference. ^^ Haskell is a good example ^^.

re: "n could have been changed by a function call inside the optimized function". This normally can't happen on a function call. This can only happen in the following cases:

1) n is a global variable, in which case you can't safely optimize on it anyway

2) n is enclosed both by this function and the function it calls - e.g. if the function it calls is a subfunction that can see this local variable - and the function it calls does 'set on that variable.

3) if the function itself does 'set on that variable.

Otherwise, 'n cannot possibly change (i.e. purely functional programming)

Note also that you simply need to create a really big data structure to handle each part of the program.

Say you have the following canonical definition:

  (def fib (n)
    (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))
This defines an infinite set of functions, where (type n) is all the possible types (including types that haven't been defined yet, which is why it's an infinite set). Most of those functions will really be (err "<: expects <number>, got ..."), but hey ^^.

(of course, infinite sets can be modelled using lazy eval.)

Now let's also consider the "+ could have been modified". In most cases this happens only if someone has a new type. Otherwise, any modification of '+ which doesn't involve someone defining a new type is probably someone screwing your system, in which case we assume that it's screwed anyway so you might as well ignore that possibility (and consider how to stop someone screwing your system in the first place).

But if the modification of + is itself based on having a new type, then we should also include '+ itself in the optimizing type system. So now '+ itself represents an infinite set of functions, based on (type n). Adding a definition for '+ based on type must thus extend the lazy set of '+ functions.

Thus at compilation, we only have to detect what gets called at top-level (that is, the repl). We can trivially determine the types of the arguments at top-level (just do 'type on each of them!).

Now any variables defined in the called function will be either the arguments themselves or will be other local variables, based on computations. Those computations will simply be a call to another function, based on the arguments (and globals).

For the functions called within other functions, we simply analyze them downwards and determine: if a function is given type a, what is the output type? Then we can determine the type of local variables.

We can determine the output type of the called functions by simply recursively determining what the possible outputs are. The output of the function is always the result of the last expression in the function and we can then determine the type from there. If the last expression is some sort of conditional, then it could be the output of the types from any of them.

Then we eventually reach one of a set of basic functions. These basic functions would have defined exactly what type it returns, based on its inputs. For example '+ would be applied to a list of stuff, [a], and return a type of a. (obviously such basic functions can also be defined by the user - all you have to do is support some sort of a -> a syntax, like in Haskell). Basically, '+ promises that if given a list of objects of type a, it will return an object of type a, or die trying (i.e. raise an exception).

Then we can select which '+ to use, based on the actual type we've determined.

So the flow on your 'fib function would be:

1) 'fib receives an object of type a. If it's received an object of this type before, it will use the function created for that type.

2) otherwise, the analyzer goes through the 'fib code, looking for locals and temporaries (i.e. return values of functions, which will be temporarily placed in a stack-like location) and determining their types.

3) the analyzer reaches (if ...) and enters into each expression in it

4) it sees (< n 2) and determines that < is a (a,b) -> t/nil basic function. It then knows that the 'if it has to use just has to handle t and nil, and doesn't have to handle random objects (could be useful nfo, who knows? maybe some computer hardware somewhere can branch based on that faster than normal)

5) since the last expression in 'fib is an 'if form, it knows that 'fib itself returns several possible types. The first type is (type n) itself.

6) The other type is the type returned on '+. It knows that the type returned by '+ is the same type as all its inputs.

7) The type input into '+ is the type of 'fib. Since we're analyzing 'fib itself, we use a separate type, b.

8) The input into 'fib is the output of '-, which is also a basic function whose type is the same as its inputs. Since the inputs are of (type n), we know that the input to 'fib is therefore (type n).

9) The hard part: how do we know that 'fib therefore is a b -> b function (i.e. returns a b, given a b)?

-----

1 point by sacado 6146 days ago | link

Hmm... I think I'll have to read that more deeply later on... There are obviously interesting ideas in it, but hard to implement... Not sure I understood everything yet...

Currently, I'm working on a naïve approach : it supposes native numerical functions (only numerics as for now) are not redefined, or, at least, that they are redefined in a conservative way (that is, as you mention it, e.g. + was redefined to be able to add apple, a user type, but still works the regular way with numbers), and it knows that, if a numerical operation is called only with numbers :

- it can use its inlined version,

- the result will be a number too, so nested numerical operations can be taken into account too.

For example, when we call (fib 30), the compiler knows the n arg and literal numbers are numbers, so (- n 1) and (- n 2) and (< n 2) are numbers too and this gets translated into (n- n 1), (n- n2) and (n< n 2). However, it cannot know (yet) (fib (n- n 1)) is a number, so the final sum cannot rely on the inlined + :

  (gen-ad-hoc (listtab '((n int))) '(fn (n)
    (if (< n 2)
      n
      (+ (fib (- n 1) (fib (- n 2)))))))

  -> (fn (n) (if (n< n 2) n (+ ((fib (n- n 1) (fib (n- n 2)))))))
The gen-ad-hoc function generates the ad hoc code, based on the fact that n is an int. I still have a few bugs (too many parens somewhere), but that's a good start :)

-----

4 points by nlavine 6145 days ago | link

I've actually been thinking about this a lot lately. I belive psyco is just a special case of partial application.

nfib is faster than fib because it's compiled with the knowledge that its argument is an integer. But you could make it even faster if you knew that its argument was an integer that fit into a machine word. (That might not be what you want to do with nfib, but I imagine a lot of numerical functions are going to be used almost always with numbers that fit in machine words.)

My thought is that you should be able to specialize functions for arbitrary predicates. For instance,

  (def fib (n)  ; yeah, it's a bad definition
    (specialize (and (number? n) (< n  (exp 2 32)) (> n 0)))
    (if (< n 2)
        n
        (* n (fib (- n 1)))))
Or maybe you could even declare specializations outside of the function. For instance, imagine writing

  (specialize (fib n) (and (number? n) (> n 0) (< n (exp 2 32))))
right next to some code that really needed a fast fibonacci function. And of course, if you think about it like that, there's no need to restrict it to function calls - that's just an arbitrary restriction. Let's say you need really fast mapping of a function.

  (specialize [map f _]).
Or even,

  (= fmapper (specialize [map f _]))

-----

1 point by sacado 6144 days ago | link

Yep, that's probably the next step after psyco.

-----

2 points by sacado 6145 days ago | link

OK, I've got a working implementation now. It works this way : I defined a macro named adef which works exactly as def, except that it defines a hashtable associating tuples of types to optimized functions definitions. When the function is defined, this table is emtpy.

Now, every time the function is called, it :

- determines the types of all args

-checks whether this list of types is a key in the table

- if it is, the associated function is called with the current args

- if not, a new function is generated based on the given types. In the fib example, since the type of n is an int, we try to optimize the code. This way, (- n 1) is rewritten (n- n 1), etc.

Actually, the code that was given when calling the adef macro is never really called : it is just a skeleton to generate code adapted to the given types (if possible).

The algorithm is currently very naïve, as it is only able to know the return type of functions if it was manually declared (i.e., as for now, mathematical operators). In the fib example, it does not know the result of (fib (- n 1)), so we can't optimize the '+ operator there. almkglor's suggestions are the way to do I guess, but I already had hard time fighting with these macros, so we'll do type inference later ;)

And, well, there is another little problem. The lookup in the hash table takes a very long time. Most of the time is spent looking for the actual function to be called, thus slowing down the whole process... :( Maybe I should rewrite it using redef instead, or maybe I should write all of this directly in mzscheme (as what we are after is generated the best mzscheme code possible).

  (mac adef (name parms . body)
     (w/uniq (htypes tparms types)
        `(let ,htypes (table)
           (def ,name ,parms
              (withs
                 (,tparms (mergel ',parms (map type (list ,@parms)))
                  ,types (map cadr ,tparms))
                 (aif (,htypes ,types)
                    (apply it (list ,@parms))
                    (apply (= (,htypes ,types) (gen-fn ,parms ,tparms ,@body)) (list ,@parms))))))))

-----

3 points by stefano 6146 days ago | link

To avoid the +(and other operators) redefinition problem we could add an optional directive to the interpreter promising that, from now on, we won't redefine basic operators. As an example:

(promise 'no-redef)

This would let the interpreter/compiler do the optimizations.

-----

1 point by sacado 6146 days ago | link

Good idea. Not perfect as you have to annotate your code to get performance (but after all CL and GambitC Scheme work this way), but this could be tried...

-----

1 point by sacado 6144 days ago | link

OK, so I started doing it in mzscheme. It shouldn't be done in pure Arc for the following reasons :

- compiling Arc code is better done on the compiler side rather than on the arc side

- that way, I can get rid of n+ et al., as they never really get called in Arc code

- manipulating what is generated by the 'ac function is easier than manipulating raw Arc code : 'ac does macro-expansions and translates ssyntax in Scheme syntax.

In practice, until now, I added a function, inside 'ac, wrapping the result of the latter. This function explores and modifies the code generated by 'ac. Every time it sees a 'lambda, it takes its args and body and generates an actual lambda that looks like the arc code I wrote there : http://arclanguage.org/item?id=5216 .

So, 2 hash tables are generated for each lambda. Well, as you can guess, the code is eating as much memory as you can imagine, particularily if you arco the code in arc.arc (which should be done anyway, should it only be to accelerate loops). Now, I'm quite sure the solution based on hash tables is a dead end.

Maybe I should do otherwise : instead of using hash tables, I could make the function's code grow everytime a new type is applied on it :

  (if (type-is-the-new-type)
    (call new-generated-code-for-this-type)
    (call the-previously-generated-code))
I don't know if this would work better, however I will probably not work on this today. Writing macros generating macros generating lambdas analysing other lambdas to generate other lambdas is an unfailing source of headaches. And bugs too.

-----

2 points by eds 6147 days ago | link

Wow, long post. And yes I did read all of it.

It looks like a very interesting idea, and would certainly be very valuable if we can actually get it to work. But some of the issues you mention do look difficult.

Does Psyco itself have any documentation that discusses their implementation? It might be useful for finding ways around the difficulties in writing an "arco".

Possibly this could make a good Arc Summer of Code project? (http://arclanguage.org/item?id=5175)

-----

1 point by sacado 6147 days ago | link

"Possibly this could make a good Arc Summer of Code project?" Yes, probably. Well, actually I don't know how these things work. At all. But I guess it is...

Edit : yes, I think that would be a good idea !

Btw, yes, these are really hard problems I think. It won't be done in a few hours. However, in our case, we already have mzscheme doing a good part of the work (it could be interesting to get even closer to the bare metal, though, but that could be done later). Psyco, on the other hand, compiles down to assembly code.

I found a few references about how Psyco works, I'll give back the links here next time I use my regular machine.

-----

3 points by mattknox 6146 days ago | link

I'm pretty heavily involved with LispNYC, and they're the lispiest of the Google SoC sponsoring organizations. I'm also pretty interested in psyco, so if someone finds a student who wants to work on this, I'd be happy to mentor.

-----

1 point by mayson 6142 days ago | link

Re: Psyco you might like to take a look at <a href=http://www.sagemath.org/doc/html/prog/node35.html>Why Cython is the best available option for Sage</a> perhaps something like Cython's cdef and cpdef might be possible in Arc.

-----

1 point by wfarr 6147 days ago | link

Pretty interesting stuff. I'll have to think on it some more, and re-read your post before I can really decide how I feel about it.

-----