Thursday, July 12, 2007

There is a new version of the regex engine that includes a few bugfixes.

Thursday, November 14, 2002

Made a variety of small improvements to clawk. Also changed the semantics of {n} in regex to mean {n,n}, i.e. exactly n repetitions.
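As a rough illustration of the {n} = {n,n} reading, here is a small sketch using Python's `re` module. Python is only a stand-in because its engine happens to treat {n} the same way; this is not the clawk/regex API, and the helper name `full` is made up for the example.

```python
import re

# Stand-in engine: Python's re, NOT the Lisp regex package from this page.
exact = re.compile(r"a{3}")     # {n}
ranged = re.compile(r"a{3,3}")  # {n,n}

def full(pattern, text):
    """Hypothetical helper: True when the whole string matches."""
    return pattern.fullmatch(text) is not None

samples = ["aa", "aaa", "aaaa"]
# Each tuple is (input, result with {3}, result with {3,3}).
results = [(s, full(exact, s), full(ranged, s)) for s in samples]

for row in results:
    print(row)
```

Under this reading the two patterns agree on every input, and only the string with exactly three repetitions matches in full.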

Wednesday, November 13, 2002

Fixed a variety of bugs:

  • In regex, now interpret ']' immediately following the opening '[' as a shorthand for '\]'.
  • In regex, changed +special-class-names+ from a defconstant to a defparameter.
  • In clawk, removed dependence on the utils package.
  • In lexer, fixed deflexer macro to allow lexers to be compiled to a file.
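The first item in the list above (a ']' right after the opening '[' counts as a literal) is a common bracket-expression convention. A minimal sketch of that behaviour, again using Python's `re` as a stand-in rather than the Lisp regex package itself:

```python
import re

# Stand-in engine: Python's re follows the same convention for a leading ']'.
leading = re.compile(r"[]ab]+")   # ']' directly after '[' is a literal
escaped = re.compile(r"[\]ab]+")  # the explicit, escaped spelling

text = "x]a]b y"
leading_match = leading.search(text).group(0)
escaped_match = escaped.search(text).group(0)

# Both spellings describe the same character set: ']', 'a' and 'b'.
print(leading_match, escaped_match)
```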

Sunday, October 06, 2002

Posted a new version of regex that implements the Perl \d \D \w \W \s \S metasequences, as well as the egrep \< and \> metasequences. It also implements (?=...) lookahead and (?!...) negative lookahead. The code is in the file regexexp.tgz (in tputils.tgz or available by itself).
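For readers unfamiliar with these metasequences, the sketch below shows what they mean, using Python's `re` as a stand-in engine (it is not the Lisp regex package, and its API is unrelated). Python has no \< and \> operators, so the example assumes `\b` as the nearest equivalent of those word-boundary anchors.

```python
import re

# Character-class shorthands: \d digits, \s whitespace, \W non-word characters.
digits = re.findall(r"\d+", "a1 b22 c333")
fields = re.split(r"\s+", "a1 b22\tc333")
non_word = re.findall(r"\W", "a-b c")

# Word boundaries: egrep writes \<cat\>; Python's closest spelling is \bcat\b.
whole_words = re.findall(r"\bcat\b", "cat concat cats")

# Lookahead checks what follows without consuming it.
text = "foobar foobaz"
followed_by_bar = [m.start() for m in re.finditer(r"foo(?=bar)", text)]
not_followed_by_bar = [m.start() for m in re.finditer(r"foo(?!bar)", text)]

print(digits, fields, non_word, whole_words)
print(followed_by_bar, not_followed_by_bar)
```

The two lookahead patterns both match only the three characters "foo"; they differ in which occurrence they accept.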

If there are no problems over the next few weeks, I'll upgrade this version from experimental to stable.

Friday, September 20, 2002

The Regex distribution now comes with the source for the GNU benchmark program.

The previous Lexer distribution used a very large floating point number in the example that was blowing up CMUCL. The new one uses a smaller one that should be ok.

Thursday, September 12, 2002

All utilities should now come with a BSD-style license.

Wednesday, September 11, 2002

Fixed a bug in regex that broke the deflexer system.

Saturday, September 07, 2002

Fixed another bug in patterns of the form "(abc)*" and "(abc)+".

Friday, July 26, 2002

Fixed a bug in patterns of the form "(abc)*" and "(abc)+".

Tuesday, July 16, 2002

Oops, spoke too soon: expand.lisp was indeed being used (by defregex, which was used by the speed test code). A new version that fixes this is now available.

Monday, July 15, 2002

Uploaded a new version of regex.tgz and tputils.tgz that doesn't have the spurious reference to "expand.lisp" in the system file. It wasn't being used by this release.

Sunday, July 14, 2002

Uploaded a new version of regex.tgz and tputils.tgz that doesn't use the "finally return" extensions to LOOP, which apparently CMUCL and ACL don't like very much.

Thursday, June 06, 2002

After skimming Wall's latest Apocalypse on regexes, I think the next version of the CLAWK regex engine will move towards supporting something close to this, although probably as a separate syntax. Several of the features he's talking about are supported by the intermediate representation and the backend compilers, but there's no good way to add them to the surface syntax and remain compatible with AWK regexes.

Besides the changes to the surface parsers, I will also be putting in support for a sexpr surface syntax.

Thursday, May 09, 2002

After further testing the new sexpr-generating backend, I think I'm gonna have to abandon it. While it works just fine, the compile times get incredibly long for complicated patterns. The problem seems to be in the Lispworks compiler itself, trying to chew on the large number of internal functions, and the sheer size of the code.

Given the relatively small improvements in matching speed, I think my next tack is to rewrite it to simply generate code to build a closure-based matcher.

Wednesday, May 08, 2002

It looks like the new back-end to my Common Lisp regular expression engine is about finished. This one returns Lisp s-exprs instead of closures, so it's suitable for use in macros like deflexer. It still needs a bit more brushing up before I'm ready to make it available, but it looks good so far. At the moment it doesn't seem to match any faster than the closure-based code. Using Lispworks 4.2, turning on inlining seems to improve the speed by 30% or so, but increases compile times exponentially and unacceptably. The old sexpr-generating regex compiler had the same unpleasant exploding-compile-time behavior with the Symbolics compiler, although there it gave an order-of-magnitude improvement in matching speed. At any rate, given the specialized nature of the code involved, I'm pretty sure I can hand-roll a customized inliner that is linear-time.
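To make the "closure-based" idea concrete, here is a minimal sketch of compiling a pattern tree into nested closures. It is written in Python rather than Lisp, and everything in it is an assumption for illustration: the tuple-based tree, the node names ("char", "seq", "alt", "star"), and the functions `compile_node` and `matches` are hypothetical and do not reflect the engine's actual intermediate representation or API.

```python
def compile_node(node):
    """Turn a tiny pattern tree into a closure taking (string, index, continuation)."""
    kind = node[0]
    if kind == "char":
        ch = node[1]
        def match_char(s, i, k):
            # Consume one character, then hand the new index to the continuation.
            return i < len(s) and s[i] == ch and k(i + 1)
        return match_char
    if kind == "seq":
        first, second = compile_node(node[1]), compile_node(node[2])
        return lambda s, i, k: first(s, i, lambda j: second(s, j, k))
    if kind == "alt":
        left, right = compile_node(node[1]), compile_node(node[2])
        return lambda s, i, k: left(s, i, k) or right(s, i, k)
    if kind == "star":
        inner = compile_node(node[1])
        def match_star(s, i, k):
            # Greedy: try another iteration first, but only if it made progress,
            # then fall back to stopping here.
            return inner(s, i, lambda j: j > i and match_star(s, j, k)) or k(i)
        return match_star
    raise ValueError(f"unknown node kind: {kind!r}")

def matches(matcher, s):
    """Hypothetical driver: True when the matcher consumes the whole string."""
    return bool(matcher(s, 0, lambda j: j == len(s)))

# Tree for the pattern "(abc)*".
abc = ("seq", ("char", "a"), ("seq", ("char", "b"), ("char", "c")))
abc_star = compile_node(("star", abc))

for candidate in ["", "abc", "abcabc", "abca", "ab"]:
    print(repr(candidate), matches(abc_star, candidate))
```

The pattern is walked once when `compile_node` runs; matching afterwards only calls the prebuilt closures. A code-generating back-end would instead emit source for the host compiler to process, which is where the compile-time cost described in these entries comes from.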