Arc Forum
Random thoughts on the substring procedure (more-magic.net)
3 points by gus_massa 3880 days ago | 10 comments


3 points by akkartik 3880 days ago

I'm conflicted about this post.

1. Bounds-checking errors are certainly in keeping with the spirit of Scheme; no argument there. If you care about distinguishing () from [], want 'if' to require an 'else' branch, and consider it important to distinguish #f from nil, then such substring behavior fits right into that aesthetic milieu.

2. Incredibly, arc has never had a substring function, so I can't be sure what pg would do. My immediate reaction is that exceeding bounds would be permitted without error, but then SBCL seems to throw an error as well.

I looked at my toy language, which is more in the arc mould and more relaxed about the set of inputs acceptable to low-level primitives:

  ("abc" 0 10)
  024string.cc:21 no such end-index in string: "abc" 10
  => nil
My unit tests show that I never considered out-of-bounds indices. Now that I do consider them, I'm tempted to permit them silently.

3. I am not persuaded by the reasons in the post. What counts as "one thing" in "do one thing" is subjective. The Unix creators themselves have spoken out against all the flags that standard commands have gradually accumulated: http://harmful.cat-v.org/cat-v, and more recently https://twitter.com/rob_pike/status/410830396444532736. This disagreement with the community at large suggests that the rule of thumb leaves some room for interpretation, which makes it useless for resolving arguments. The phrase "referentially transparent" is grotesquely misused. The list of useful facts didn't feel compelling. The connection to recent security holes in other languages feels tenuous; a Java programmer could make the same argument against permitting lists to contain elements of multiple types, and comparing bounds checking to implicit type coercion is just a strawman. The two possible interfaces for substring are simply different; there's no magic in either. I'm not convinced that the author made a good-faith attempt to build the "simple" version atop the more complex one.

There's clearly a difference of opinion here, but escalating a simple question into such diverse subjects is just an invitation to flames. Simply pointing to the other places where Scheme has a similar policy of strict, immediate errors would suffice, and would keep aesthetics from devolving into religion.
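On the claim that the two substring interfaces are simply different: a minimal Python sketch (not Arc, and not code from the post or the mailing list) suggests each can be built atop the other in a couple of lines. The names strict_substring and lenient_substring are hypothetical. The strict one raises on out-of-range indices; the lenient one clamps them and never raises.

```python
# Hypothetical names: strict_substring / lenient_substring are illustrations,
# not functions from Arc, CHICKEN, or the linked post.


def strict_substring(s: str, start: int, end: int) -> str:
    """Return s[start:end]; raise IndexError unless 0 <= start <= end <= len(s)."""
    if not 0 <= start <= end <= len(s):
        raise IndexError(
            f"substring: [{start}, {end}) out of range for string of length {len(s)}"
        )
    # Python slicing itself clamps out-of-range indices, so this function is
    # the "strict atop lenient" direction: a bounds check in front of a slice.
    return s[start:end]


def lenient_substring(s: str, start: int, end: int) -> str:
    """Clamp start into [0, len(s)] and end into [start, len(s)]; never raise.

    This is the "lenient atop strict" direction: clamp, then delegate.
    """
    n = len(s)
    start = min(max(start, 0), n)
    end = min(max(end, start), n)
    return strict_substring(s, start, end)
```

With these, lenient_substring("example", 2, 10) returns "ample", while strict_substring("example", 2, 10) raises IndexError. Neither direction takes more than a bounds check or a pair of clamps, which is some evidence that the choice between them is aesthetic rather than a matter of one being harder to build.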

4. The flame war itself feels, in hindsight and just from reading the archives, unnecessary and a poor advertisement for this community. Responses like http://lists.nongnu.org/archive/html/chicken-hackers/2013-02... and http://lists.nongnu.org/archive/html/chicken-hackers/2013-02... made me wince. Perhaps that's partly a culture gap; scanning the archives, I found that the noob is still around, so maybe no major offense was caused. Regardless, this wasn't a flame war so much as the insiders getting together to flame a noob. I looked around for possible reasons why it might have happened, but the mailing list seems to consistently stay around 25-50 participants per year, not a high enough volume to justify such frustration. With such a small inflow, every newcomer is precious and every bad experience is hugely significant. Nor does the subject of substring seem to have come up before, which might otherwise explain their impatience.

Arc's attitude, championed by aw, is to avoid generalities and look for concrete code examples where an alternative makes for greater concision. That's our aesthetic, and it has served us well, avoiding friction without avoiding debate altogether.

-----

4 points by rocketnia 3879 days ago

"Incredibly, arc has never had a substring function, so I can't be sure what pg would do."

Here you go:

  arc> (cut "example" 2 7)
  "ample"
  arc> (cut "example" 2 10)
  Error: "string-ref: index 7 out of range [0, 6] for string: "example""
(Transcript from http://tryarc.org)

-----

2 points by rocketnia 3879 days ago

"Arc's attitude, championed by aw, is to avoid generalities and look for concrete code examples where an alternative makes for greater concision."

Yeah, this can be a pretty nice metric sometimes. :) This is probably one of the leading reasons to use weakly typed (or otherwise sloppy) versions of utilities.

-----

3 points by rocketnia 3878 days ago

I've been looking for Olin Shivers's "100% and 80% solutions" for a while, and this finally linked to it again!

Here's the link on its own: http://scsh.net/docu/post/sre.html

I keep thinking back to this... and begrudging all extant languages for being 80% solutions. :-p

-----

1 point by akkartik 3878 days ago

I'm not sure I'd ever seen that before, thanks.

"I am not saying that these three designs of mine represent the last word on the issues -- "100%" is really a bit of a misnomer, since no design is ever truly 100%. I would prefer to think of them as sufficiently good that they at least present low-water marks -- future systems, I'd hope, can at least build upon these designs."

But it's not clear that an 85% solution or 95% solution helps to solve the problem he identified earlier in the post:

"[The 80%] socket interface isn't general. It just covers the bits this particular hacker needed for his applications. So the next guy that comes along and needs a socket interface can't use this one. Not only does it lack coverage, but the deep structure wasn't thought out well enough to allow for quality extension. So he does his own 80% implementation. Five hackers later, five different, incompatible, ungeneral implementations had been built. No one can use each others code."

At best the number of different, incompatible systems will be lower over time. But there's no reason to believe that dissatisfaction with prior solutions will be more likely to build on them.

I'm curious whether Conrad Barski was aware of Olin Shivers's regular expression library when he built http://www.lisperati.com/arc/regex.html and, if so, whether he built on its design. That would be a strong counter-example to my hypothesis.

-----

2 points by rocketnia 3878 days ago

"At best the number of different, incompatible systems will be lower over time. But there's no reason to believe that dissatisfaction with prior solutions will be more likely to build on them."

Could you clarify this? I think there might be a typo in here, but I don't know where.

-----

1 point by akkartik 3878 days ago

Say a 95% solution leaves 1 in 20 hackers dissatisfied, where an 80% solution leaves 1 in 5 hackers dissatisfied. The number of dissatisfied hackers goes down. But won't they still continue to react to their dissatisfaction by creating new libraries that don't build on prior attempts?

-----

2 points by rocketnia 3876 days ago

"The number of dissatisfied hackers goes down. But won't they still continue to react to their dissatisfaction by creating new libraries that don't build on prior attempts?"

Who's saying they won't? If you're willing to believe that 100% solutions will lead to fewer dissatisfied hackers and less duplication of effort, I can't tell what other claims you're trying to challenge here.

---

To bring in my personal goals: I'm interested in using programming to improve the expressiveness of communication, so that our petty misunderstandings, terminology barriers, etc. become less severe.

While I'm interested in reducing duplicated effort or saving people from dissatisfaction, I'm mainly interested in these things because they go hand-in-hand with establishing better communication. If hackers are more productive, they can communicate more; and if they communicate more, they can find satisfactory tools and avoid duplicated effort.

---

I think it'll be more interesting to look at the "80% solution" idea with a scenario that combines more than one solution, so that it's not a simple feedback loop anymore.

Suppose we have two completely unrelated projects A and B, with A being a 95% solution and B being an 80% solution.

Now let's say some hackers have goals in the A+B domain, so that people might suggest they use project A and/or B as part of the solution.

       +A    -A
  +B   76%    4%  = 80%
  -B   19%    1%  = 20%
      ----  ----
       95%    5%
Only 76% are actually satisfied with A and satisfied with B at the same time. About 4/5 of the remaining hackers can still build on top of A, but they have to reinvent the functionality they hoped to get from B.[1]
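The table's cells equal the products of the two coverage rates (0.95 × 0.80 = 0.76, and so on), i.e. they treat satisfaction with A and with B as independent. A minimal Python sketch under that independence assumption (the function names are hypothetical) reproduces the table and the "about 4/5" figure, which works out to 19/(19+4+1) = 19/24.

```python
from fractions import Fraction


def quadrants(a: Fraction, b: Fraction) -> dict:
    """Share of hackers in each satisfied/dissatisfied quadrant.

    Assumes satisfaction with A and with B are independent, so each cell is
    the product of the two marginal coverage rates.
    """
    return {
        "+A+B": a * b,
        "+A-B": a * (1 - b),
        "-A+B": (1 - a) * b,
        "-A-B": (1 - a) * (1 - b),
    }


def share_building_on_a(a: Fraction, b: Fraction) -> Fraction:
    """Among hackers not satisfied by both A and B, the fraction still satisfied with A."""
    q = quadrants(a, b)
    return q["+A-B"] / (1 - q["+A+B"])


table = quadrants(Fraction(95, 100), Fraction(80, 100))
for cell, share in table.items():
    print(cell, f"{float(share):.0%}")
```

This prints the four cells as 76%, 19%, 4% and 1%. share_building_on_a(Fraction(95, 100), Fraction(80, 100)) gives 19/24 (about 0.79), and with a 99% A and a 96% B it gives 99/124 (about 0.80), consistent with footnote [1].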

Unfortunately, sometimes it's very difficult to combine the two dependencies A and B in a single program. If A and B involve different operating systems, different programming languages, etc., then the task of combining them may be even more difficult than reinventing the wheel. In these cases, the 76% quadrant of happy hackers must redistribute itself among the other quadrants.

How does it get distributed? Well, the user community itself is a feature of the system, so I'm assuming a 95% solution tends to have a more helpful community than the 80% solution by definition. We also haven't considered any reason that a helpful hacker would make a different technology choice than an unhelpful one, so I'll make the simplifying assumption that all communities have the same population-to-helpfulness ratio.[2] So I'd guess the 76% "+A+B" quadrant is necessarily redistributed to the 19% "+A-B" and 4% "-A+B" quadrants while roughly preserving that 19:4 ratio.

       +A    -A
  +B    0%   17%
  -B   82%    1%
This fuzzy reasoning suggests that if an 80% solution B is incompatible with a 95% solution A, approximately 83% of hackers won't use B to achieve their A+B goals. So we see, an 80% solution can leave the vast majority of hackers dissatisfied.
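The redistribution step can be checked the same way. A minimal Python sketch, assuming (as the comment does) that the 76% "+A+B" share splits between "+A-B" and "-A+B" in their existing 19:4 ratio; the function name redistribute is hypothetical.

```python
from fractions import Fraction


def redistribute(both: Fraction, only_a: Fraction, only_b: Fraction) -> tuple:
    """Empty the '+A+B' cell into '+A-B' and '-A+B' in proportion to their sizes.

    Assumption (from the comment): communities are equally helpful per member,
    so the displaced hackers split in the existing only_a : only_b ratio (19:4).
    """
    pool = only_a + only_b
    return only_a + both * only_a / pool, only_b + both * only_b / pool


only_a, only_b = redistribute(Fraction(76, 100), Fraction(19, 100), Fraction(4, 100))
neither = Fraction(1, 100)
# Hackers who won't use B for their A+B goals: the '+A-B' and '-A-B' cells.
wont_use_b = only_a + neither

print(f"+A-B {float(only_a):.0%}, -A+B {float(only_b):.0%}, won't use B {float(wont_use_b):.0%}")
```

The exact shares are about 81.8% for "+A-B", 17.2% for "-A+B" and 82.8% for "won't use B", which round to the 82%, 17% and 83% in the comment.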

But 80% solutions must be acceptable in our software, or else all our software will be monolithic projects that take lots of resources to develop and maintain, and we won't be communicating very effectively.

Personally, one thing I take away from this is that the world could really use a near-100% approach to modularity, so that almost any two well-designed projects A and B can be used together. I've been working toward this.

(However, I also think this could backfire to some degree. Reusable software that doesn't bit-rot is in a way immortal, and certain immortal software may get in the way of or add noise to our communication.)

---

[1] While this 4/5 may look like it corresponds to the 80% solution, that's just a coincidence. This 4/5 roughly comes from the remaining 20% and 5% that each solution doesn't cover. We'd get a similar 4/5 result if we compared a 96% solution with a 99% solution.

[2] Some reasons this might be wrong: A big community invites people to use it as a symbol for inclusion-exclusion politics, even if that's a non sequitur relative to the community's original purpose. Sometimes technology has lots of users not because its community is helpful or political, but because few people have the kind of expertise it would take to reinvent this wheel at all (not to mention how many people don't even know they have a choice).

-----

1 point by akkartik 3876 days ago

Hmm, my interpretation of the socket complaint (that I quoted above) was that it was a qualitative rather than a quantitative argument.

Ah, I think I've found a way to reframe what Olin Shivers is saying that helps me make sense of it. The whole 80% thing is confusing. The real problem is solutions that don't work for the very first problem you try them on. That invariably sucks. By contrast, a solution that is almost certain to support pretty much anyone's initial use case is going to sucker in users, so that later, when they run into its limitations, it makes more sense to fix or enhance it than to shop around and then NIH ("not invented here", i.e. reinvent) a new solution for themselves. In this way it hopefully continues to receive enhancements, thereby making still more users inclined to switch to it.

:) I'm using some critical-sounding language there ("sucker") just to help highlight the core dynamic. No value judgement implied.

---

I think both of us are starting with the same goal of helping hackers communicate better and share code better. But we're choosing different approaches. Your approach of fixing modularity is compatible with the Olin Shivers model, but I think both of you assume that receiving enhancements is always a good thing and that things improve monotonically. I don't buy that more and more hackers adding code and enhancements always improves things in the current world. There's a sweet spot beyond which it starts to hurt, turning away users who go off to NIH their own thing all over again.

My (more speculative) idea instead is to ensure more people understand the internals so that the escape hatch isn't NIH'ing something from scratch but forking from a version that is reasonable to them and also more likely to be understandable to others. By 'softening' the interface I'm hoping to make it more sticky over really long periods, more than a couple of generations. I don't think modularity will work at such long timescales.

-----

2 points by akkartik 3871 days ago

I just came up with an example of my version of an "80% solution": http://www.reddit.com/r/vim/comments/22ixkq/navigate_around_...

It's not packaged up, all the internals are hanging out, some assembly is required. But it enumerates the scenarios I considered so that others can understand precisely what I was trying to achieve and hopefully build on it if their needs are concordant.

-----