akkartik, I was wondering if you might like to test this approach and see how fast it is compared to your version? (I could test the speed myself if you wanted, but without your JSON data my results might not be representative).
I believe the particular require planet line below requires MzScheme version 4; let me know if you're using version 3...
(def deepcopylist (xs)
  (if (no xs)
      nil
      (cons (deepcopy (car xs)) (deepcopylist (cdr xs)))))

(= scheme-f (read "#f"))
(= scheme-t (read "#t"))

(def deepcopy (x)
  (if (is x scheme-t)
       t
      (is x scheme-f)
       nil
      (is x #\nul)
       nil
      (case (type x)
        table (w/table new
                (each (k v) x
                  (= (new (deepcopy k)) (deepcopy v))))
        cons  (deepcopylist x)
              x)))

($ (require (planet dherman/json:3:0)))
(= scheme-read-json ($ read-json))

(def read-json (s)
  (deepcopy (scheme-read-json s)))
The require planet line gave me a bunch of warnings about not having scribble installed, but it worked anyway.
The idea is we run the original unmodified Scheme module, and then convert its Scheme return value to Arc.
I expect this would probably be slower than your code. The question is, how much slower? If it turns out to be, oh, 1% slower for example, we might not care! :) And it would be a lot easier than going into every Scheme module we might want to use and modifying it to return correct Arc values.
OK, here's a version that uses the lshift parser that you ported. Uncomment the require file line and put in the path to the original json.ss module...
I timed a few runs of your port and this version against your data set; times for both versions varied between 585ms and 831ms on my laptop, but there wasn't a difference that I could see between the two versions given the spread of times for each run.
I'm running Linux on my laptop. There could well be a difference, I simply haven't run the tests enough times to be able to tell, given that I'm getting a pretty wide spread of run times for each version. Which could be for example my web browser sucking up some CPU at random times or whatever...
Great idea. I thought of that approach but discounted it without attempting.
I tried running it but got a parse error.
default-load-handler: cannot open input file: "/usr/lib/plt/collects/scribble/base.ss" (No such file or directory; errno=2)
default-load-handler: cannot open input file: "/usr/lib/plt/collects/scribble/base.ss" (No such file or directory; errno=2)
setup-plt: error: during making for <planet>/dherman/json.plt/3/0 (json)
setup-plt: default-load-handler: cannot open input file: "/usr/lib/plt/collects/scribble/base.ss" (No such file or directory; errno=2)
setup-plt: error: during Building docs for /home/pair0/.plt-scheme/planet/300/4.1.3/cache/dherman/json.plt/3/0/json.scrbl
setup-plt: default-load-handler: cannot open input file: "/usr/lib/plt/collects/scribble/base.ss" (No such file or directory; errno=2)
Error: "read: expected: digits, got: ."
Yeah, be sure to try each test multiple times. You can get that much of a variance simply from the garbage collector running at one time but not another, from the file needing to be read from disk vs. already cached in memory by the operating system, some other process hitting the CPU, or in EC2, the virtual CPU getting fewer cycles at the moment...
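For what it's worth, here is a rough sketch of how repeated timings could be collected from inside Arc. The name bench is made up for illustration; it only assumes msec, accum and repeat from arc.arc, and reports the fastest and slowest run so the spread is visible:

```arc
; Hypothetical helper: call the thunk f n times (default 10) and
; return a list (fastest slowest) of elapsed milliseconds.
(def bench (f (o n 10))
  (let times (accum add
               (repeat n
                 (let start (msec)
                   (f)
                   (add (- (msec) start)))))
    (list (apply min times) (apply max times))))

; Usage, where parse-my-data stands in for whichever parser call
; is being measured:
;   (bench (fn () (parse-my-data)))
```

If the fastest and slowest runs of the two versions overlap heavily, more runs are needed before drawing any conclusion.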
Seems to work. (require (file "lib/.../foo.ss")) appears to be the right thing to do for .ss files; a Scheme load uses Arc's readtable which messes up on square brackets, and a plain (require "lib/.../foo.ss") doesn't like periods in directory names.
Note that you're not converting Scheme '() terminated lists to Arc nil terminated lists. This often isn't noticeable since Arc largely treats '() as a synonym for nil, but it does have an effect in some cases, such as keys in a table:
arc> (= a (table))
#hash()
arc> (= k (fromstring "[1,2]" (json-read (stdin))))
(1 2)
arc> (= (a k) 'foo)
foo
arc> k
(1 2)
arc> (a k)
foo
arc> (a '(1 2))
nil
Here k is (1 2 . ()) while '(1 2) is (1 2 . nil), so they're actually different keys in the table.
Whether you feel the need to fix this depends on your application... I ran into a similar issue myself in another project when I was using lists as keys in a table, but it might not affect you for what you're doing.
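If it does matter, one possible fix is to walk the result and rebuild each list with a nil terminator. This is only a sketch, and niltree is a name picked for illustration; it relies on Arc's no treating '() as false, and it leaves tables, strings and other atoms untouched:

```arc
; Hypothetical helper: rebuild a cons tree so that every list
; ends in Arc's nil rather than Scheme's '().
(def niltree (x)
  (if (acons x)
       (cons (niltree (car x)) (niltree (cdr x)))
      (no x)
       nil
      x))
```

Applying it to the parser's return value before using it as a table key should make k and '(1 2) hash to the same entry.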
Arc does treat Scheme's '() as a synonym for nil quite a bit, so there are only a few odd corner cases where the use of '() becomes visible.
Not sure what you mean by getting the reader to recognize #t, #f, #<void>?
For fun I once hacked ac.scm so that Arc's nil was represented internally by Scheme's '(). Inside of Arc everything was still the same: nil was still a symbol and so on. It even seemed to work, though I didn't test it very much.
Most interesting, thanks for these ideas and tips.
The original scheme version converted true to #t, false to #f, and null to (void) which turns into the #<void> literal. These break in arc because #f isn't nil, and #<void> can't even be read by the reader. So I think I have to take a comprehensive look at mzscheme's syntax at some point and make sure that anything scheme code can emit can be read and understood by the arc compiler.
Scheme libraries can be loaded dynamically from Arc using Anarki's $ or my ac-scheme. And, using the "file" variant of require means that PLTCOLLECTS doesn't need to be set. Thus in a fresh clone of Anarki as of this morning, with nothing uncommented:
I decided that the json library perhaps shouldn't even be in anarki, so I've been experimenting with keeping it in my project sources. It seems useful to be able to mix arc with scheme in a project, especially for performance reasons.
Problem is, I can't do it without literally hardcoding a string literal:
($ (require (file "/path/to/json.ss")))
I've tried defining a helper function, but require must be at the top level.
I've tried saying (require (file (+ dir "json.ss"))), but it seems the inside of the require form isn't lisp but some toy, bizarro universe.
This whole experience is bringing home just how much I hate the mzscheme module system. Lisp is all flowing lines; require is a brick. Just one way to use it; once you release it all it can do is sink.
We have a number of "things" in the language: function names, a literal false value, user-defined symbols... the question is, does it make sense to use one representation for these, or to create several disjoint types?
Part of the power of Lisp is the ease of treating programs as data, to create, generate, and manipulate code with code, making Lisp as they say a "programmable programming language".
Representing all the tokens of the language as symbols simplifies manipulating code since we don't need to separately recognize symbols vs. booleans.
Someone who feels that well defined, disjoint types are important to correctly reasoning about programs might prefer to have separate symbol and boolean types. But I would ask, "how does this help me actually write shorter programs?"
I don't have a problem with nil being a symbol, but speaking of writing shorter programs, I can't think of a single time I've actually needed to coerce nil/null to a string in any language except when printing it out for debugging purposes, and even then, in Arc:
(is "nil" (tostring pr.nil)) ; because 'nil is sent to MzScheme's 'display
So when...
(is "" string.nil) ; because 'coerce special-cases 'nil to do this
...I'm taken by surprise, and I'm not sure where this discrepancy would pay off in brevity.
I suppose string-returning procedures that need to be convenient in boolean contexts can return nil instead of "" and expect anyone who calls them to wrap them up as string:foo in string contexts... but even so, I can't think of any practical examples of when doing that would be shorter than having the procedures return only strings and wrapping them up as ~empty:foo in boolean contexts.
Is there some brilliant use of (is "" string.nil) I'm not thinking of?
This just gave me an idea... x!first translates into (x 'first), which for a list in arc3 expects an integer so that it can return the item in the list at that position. But this means that non-number arguments to a list call are available: currently we get an error that the argument is not an integer. So if the argument isn't a number, calling a list could do an association list lookup instead. Thus:
arc> x!first
"First Name"
but
arc> x!0
(first "First Name")
as Arc does now.
The difference with your syntax is that with your syntax we could use integers as association list keys.
I'm interested in this topic because in many places in my code where I'm using tables, I'm using tables for the convenience of the x!first syntax, not because I have a large number of keys and values that I'd need the efficiency of a hash table for.
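As a rough illustration of the dispatch described above, written as an ordinary function rather than a change to how lists are called: lookup is a made-up name, the last entry in x is invented sample data, and alref and isa are the existing arc.arc functions.

```arc
; Sample association list; the (last ...) entry is invented.
(= x '((first "First Name") (last "Last Name")))

; Hypothetical helper: integer keys index by position,
; anything else is treated as an alist key.
(def lookup (xs k)
  (if (isa k 'int)
      (xs k)
      (alref xs k)))

(lookup x 'first)   ; alist lookup, like the proposed x!first
(lookup x 0)        ; positional, same as x!0 today
```

The limitation mentioned earlier shows up here too: with this dispatch an integer can never be used as an alist key.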