This code is released into the public domain; you may do anything you wish with it.
The regular-expression engine here is inspired by the one in the book “Software Tools in Pascal” by B.W. Kernighan and P.J. Plauger, Addison-Wesley, 1981 (ISBN 0-201-10342-7). It was originally written in PDP-11 assembly and Pascal (in 1984) to run on a Terak 8510a running UCSD Pascal; the Pascal files were then converted to a homebrew language called P2 for use on a Terak 8510a running a much-modified version of UCSD Pascal. In 1985, it was rewritten in 68000 assembly language and C for use on the Atari ST; modifications, tweaks, and enhancements have been applied over the years since then. The current version will probably run on any 680x0 machine that has a C compiler and a Motorola-format 680x0 assembler on it.
Installing the regex library on your system is fairly simple: move libregex.a into your library directory and move regex.h into your include-file directory.
char *compile(char *pattern) - compile regular expressions
char *execute(char *start, char *end) - match strings with compiled regular expressions
extern char *RElastp; - pointer at last char matched
extern char *REargs[]; - arguments array
This is a simple regular-expression compiler/interpreter inspired by the one in Kernighan & Plauger’s Software Tools in Pascal. To use it, you feed compile() a regular expression, then execute() each text object you want to do searches on.
Compile() understands regular expressions containing the following tokens:
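$ matches the end of a line (before a \n).
^ matches the beginning of a line (after a \n).
. matches any single character.
[xxx] matches any one character in the set xxx. [abc], for example, matches a single a, b, or c. [a-z] matches any lowercase alphabetic (to match a literal -, it must be the first or last character in the set). If the first character in the set is a ^, it means match anything except characters found in the set. The only way you can match a ] character is to have it be the first character in the set (to match ] and -, you’d need to do []-]).
* matches zero or more occurrences of the previous token. The previous token can’t be ^, $, *, or +, though.
+ matches one or more occurrences of the previous token.
\< matches the beginning of a word.
\> matches the end of a word.
\( and \) enclose an argument; the text matched between them is reported in REargs.
\digit matches the text of the corresponding argument (the text enclosed by \( and \)).
\ quotes the character that follows it (\* matches a *.)
{minimum,maximum} matches the previous token at least minimum times but not more than maximum times. (For example, to match the previous token zero or more times, use {0,0}. Note that this is the * token; for +, it’s {1,0}. Using 0 for the maximum means it can match an infinite number of times.)
If compile() successfully compiles the string you give it, it will return a null pointer; otherwise it will return a string describing the reason that it couldn’t compile the expression.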
The compiled pattern can be found in the string _REpattern, which is a Pascal-style string (1 word length, then arbitrary data out to that length).
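The {minimum,maximum} rule from the token list can be sketched in a few lines of C. This is only an illustration of the counting rule as described above, not the library's own code: repeat_span() is a hypothetical helper, it handles a single literal character rather than an arbitrary token, and it assumes the repeat is greedy (it takes as many copies as the bounds allow).

```c
#include <stddef.h>

/* Hypothetical helper, not part of the regex library.
 * Returns how many leading copies of the character c in s a
 * {min,max} repeat would consume, or -1 if fewer than min copies
 * are present.  As in the text, a max of 0 means "no upper limit".
 * Assumption: the repeat is greedy. */
long repeat_span(const char *s, char c, unsigned min, unsigned max)
{
    size_t n = 0;

    /* Take copies of c until the text or the upper bound runs out. */
    while (s[n] != '\0' && s[n] == c && (max == 0 || n < max))
        n++;

    return (n >= min) ? (long)n : -1L;
}
```

With this helper, repeat_span(text, 'a', 0, 0) behaves like a* and repeat_span(text, 'a', 1, 0) behaves like a+, mirroring the {0,0} and {1,0} forms described above.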
Execute() tries to match the last pattern fed to compile() against whatever text string you feed it. It doesn’t care what the format of that string is, just that you give it a pointer to the start and end of the string. If it finds a match to the pattern in the string, it returns a pointer to the start of the matched text; otherwise it returns a null pointer.
On a successful match, execute() fills in a couple of globals: it fills in RElastp with a pointer at the last character matched and REargs with pointers to the start and end of each argument contained in the pattern.
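As a rough sketch of how the return value of execute() and RElastp might be used together, the helper below copies the matched text into a buffer. copy_match() is a hypothetical name, not part of the library. It assumes that RElastp points at the final matched character itself (an inclusive end), as the description above suggests, and that an empty match would show up as an end pointer lying before the start pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the regex library.
 * Copies the matched text into dst as a C string.
 *   start - the pointer execute() returned
 *   last  - RElastp, assumed to point AT the last matched character
 * Returns the number of characters copied (truncated to fit dst). */
size_t copy_match(char *dst, size_t dstsize, const char *start, const char *last)
{
    size_t len;

    if (dstsize == 0)
        return 0;

    /* Inclusive end pointer, so the length is (last - start) + 1.
     * Assumption: an empty match appears as last < start. */
    len = (last < start) ? 0 : (size_t)(last - start) + 1;
    if (len > dstsize - 1)
        len = dstsize - 1;      /* leave room for the terminator */

    memcpy(dst, start, len);
    dst[len] = '\0';
    return len;
}
```

After a successful p = execute(start, end), a call such as copy_match(buf, sizeof buf, p, RElastp) would then leave the matched text in buf, under the assumptions stated above.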
To match “hello” on stdin:
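    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <regex.h>

    int main(void)
    {
        char *msg;
        char line[200];
        char *args[20]; /* 10 arguments, 2 pointers each */
        char *p;

        msg = compile("hello");
        if (msg != (char *)0) {
            fprintf(stderr, "RE: %s\n", msg);
            exit(1);    /* nonzero status: the pattern did not compile */
        }
        while (fgets(line, sizeof line, stdin)) {   /* bounded, unlike gets() */
            line[strcspn(line, "\n")] = '\0';       /* drop the newline, as gets() did */
            if ((p = execute(line, line+strlen(line)-1)) != (char *)0)
                puts(line);
        }
        return 0;
    }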
jessica l. parsons (Orc), 25-Mar-1991