Arc Forum
Poor Man's Regular Expressions
3 points by drcode 6008 days ago | 3 comments
Until a decent parsing library for Arc becomes available (whether regexp-based or not), I've written the following regexp matcher, which simply shells out to perl on the command line (tested on Linux only).

Here's the code:

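  ; mmatch: have perl read file fname (-0 makes a text file one record) and,
  ; for every match of regex (flags i, g, s), print fmt with $1, $2, ... filled in.
  ; tostring captures perl's output and returns it as a string.
  (def mmatch (regex fmt fname)
    (tostring:system:string "perl -0ne 'print " (esc fmt) " while (/" regex "/igs)' " fname))

  ; match: shorthand that prints just the first capture group, one per line
  (def match (regex fname)
    (mmatch regex "$1\n" fname))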
That's all there is!

These functions support the full perl regexp syntax :-)

Example #1: Let's say you have a file full of junk and want to extract anything that looks like an email address. Use the "match" function, which finds every occurrence of a single subexpression:

  arc> (prn:match "([a-zA-Z0-9]+@[a-zA-Z0-9]+\\.[a-zA-Z0-9]+)" "~/temp/foo.txt")
  bob@gmail.com
  lauren@yahoo.com
Example #2: If you want to break each email address down into several subexpressions at once, call the "mmatch" function instead. It takes an extra parameter that controls how the output is formatted. In this example, we want the three parts of each address broken out as items in a list:

  arc> (readall:mmatch "([a-zA-Z0-9]+)@([a-zA-Z0-9]+)\\.([a-zA-Z0-9]+)" "($1 $2 $3)\n" "~/temp/foo.txt")
  ((bob gmail com) (lauren yahoo com))
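A nice side effect of the "($1 $2 $3)\n" format string is that perl's output is already valid Arc syntax, so readall hands back real lists that ordinary list functions can work on. A minimal sketch, assuming the same ~/temp/foo.txt as in the examples above:

```arc
; Same pattern and format string as Example #2 (assumes the same foo.txt).
; readall parses perl's "(user domain tld)" lines into Arc lists.
(let emails (readall:mmatch "([a-zA-Z0-9]+)@([a-zA-Z0-9]+)\\.([a-zA-Z0-9]+)"
                            "($1 $2 $3)\n" "~/temp/foo.txt")
  ; keep just the user part of each address
  (map car emails))
```

With the two addresses shown above this should give (bob lauren). Note that each part is read as a symbol (or as a number, if it is all digits), not as a string.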


2 points by almkglor 6008 days ago

On Anarki there's the 're function, which matches regular expressions. It uses mzscheme's underlying regular expressions.

Also, we do have a decent parser library, treeparse: it's just not a very concise one. And I should really, really finish the treeparse tutorial I started.

-----

2 points by drcode 6008 days ago

Thanks for the info. In that case, these functions aren't so useful (at least for Anarki users).

-----

2 points by kostas 5999 days ago

I haven't played with Arc in a while, but I have these two lines in my ac.scm to expose mzscheme's underlying regexp functionality to Arc:

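  ; expose mzscheme's pregexp (builds a Perl-style regexp) to Arc as regexp
  (xdef 'regexp pregexp)
  ; expose pregexp-match as r-match; ar-nill turns a failed match (#f) into nil
  (xdef 'r-match (lambda (x y) (ar-nill (pregexp-match x y))))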
Look up pregexp and pregexp-match in the mzscheme documentation for details on how they are called.
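For example, calls might look like the following. This is only a sketch: it assumes kostas's two lines have been added to ac.scm, and the input strings are made up. Per the mzscheme docs, pregexp-match returns a list holding the whole match followed by each captured group, or #f when nothing matches.

```arc
; Assumes the patched ac.scm above, so r-match is defined.

; On success: a list of the whole match followed by the three groups
(r-match "([a-zA-Z0-9]+)@([a-zA-Z0-9]+)\\.([a-zA-Z0-9]+)"
         "contact bob@gmail.com today")

; On failure: pregexp-match returns #f, which ar-nill turns into nil
(r-match "[0-9]+" "no digits here")
```

Unlike the perl-based match above, r-match works on a string already in memory and returns only the first match.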

-----