Regular Expressions

Defines

nelem(x)

Macro for the number of elements of a statically allocated array.
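The expansion of nelem is not shown here. The following is a minimal sketch that assumes the conventional sizeof-based definition; the actual macro in re.h may be written differently.

```c
#include <stddef.h>

/* Assumed definition: total array size divided by the size of one element.
 * Only meaningful for real arrays; applied to a pointer it gives a wrong count. */
#define nelem(x) (sizeof(x) / sizeof((x)[0]))

static const int primes[] = {2, 3, 5, 7, 11};

/* Returns the element count of the statically allocated array above. */
size_t count_primes(void)
{
    return nelem(primes);
}
```

Because the macro relies on sizeof, it has to be applied where the array type is still visible. Inside a function that received the array as a parameter, the array has decayed to a pointer and the result is meaningless.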

Typedefs

typedef Regex

This typedef is for convenience. See the documentation for struct Regex.

Functions

Regex reread(char * str)

Read in a program from a string. This takes the “assembly-like” text representation and turns it into compiled instructions. Each instruction must be on a single line, with spaces as delimiters, and each label must be on its own line. Here is an example of a program in this format:

    save 0
    L1:
    char a
    split L1 L2
    L2:
    match

Return
The bytecode of the program.
Parameters
  • str: The text representation of the program.
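To make the line-oriented rules concrete, here is a small hypothetical helper (program_shape() and ProgramShape are not part of the library) that scans text in this format and counts label lines versus instruction lines. It assumes a label is any line whose last non-space character is ':'; the real parser may be stricter.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tally of the two kinds of non-blank lines. */
typedef struct {
    size_t labels;
    size_t instructions;
} ProgramShape;

/* Assumes a label line ends in ':'; every other non-blank line is an instruction. */
ProgramShape program_shape(const char *text)
{
    ProgramShape shape = {0, 0};
    while (*text != '\0') {
        /* Find the end of the current line. */
        const char *end = strchr(text, '\n');
        if (end == NULL)
            end = text + strlen(text);

        /* Trim leading and trailing whitespace. */
        const char *start = text;
        while (start < end && isspace((unsigned char)*start))
            start++;
        const char *last = end;
        while (last > start && isspace((unsigned char)last[-1]))
            last--;

        if (start < last) {
            if (last[-1] == ':')
                shape.labels++;
            else
                shape.instructions++;
        }
        text = (*end == '\n') ? end + 1 : end;
    }
    return shape;
}
```

Run on the example program above, this reports two labels and four instructions.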

Regex refread(FILE * f)

Reads in a program from a file instead of a string.

Return
The regex bytecode.
Parameters
  • f: File to read from.

void rewrite(Regex r, FILE * f)

Writes a program in the same text format that reread() and refread() read.

Parameters
  • r: The regex to write.
  • f: The file to write to.
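As an illustration of the output side, the sketch below writes a toy instruction array back out in the text format. ToyInstr, its opcodes, and write_program() are hypothetical: the library's Instr type is not documented in this section, and rewrite() may choose different label names. Here each label is named after the index of the instruction it marks.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy opcodes: only those used in the reread() example. */
enum Op { OP_SAVE, OP_CHAR, OP_SPLIT, OP_MATCH };

typedef struct {
    enum Op op;
    char c;      /* OP_CHAR: character to match */
    size_t slot; /* OP_SAVE: index of the save slot */
    size_t x, y; /* OP_SPLIT: target instruction indices */
} ToyInstr;

/* Writes one instruction per line; an instruction that is a split target
 * is preceded by a label line "L<index>:". */
void write_program(const ToyInstr *prog, size_t n, FILE *f)
{
    for (size_t pc = 0; pc < n; pc++) {
        for (size_t j = 0; j < n; j++) {
            if (prog[j].op == OP_SPLIT && (prog[j].x == pc || prog[j].y == pc)) {
                fprintf(f, "L%zu:\n", pc);
                break;
            }
        }
        switch (prog[pc].op) {
        case OP_SAVE:  fprintf(f, "save %zu\n", prog[pc].slot); break;
        case OP_CHAR:  fprintf(f, "char %c\n", prog[pc].c); break;
        case OP_SPLIT: fprintf(f, "split L%zu L%zu\n", prog[pc].x, prog[pc].y); break;
        case OP_MATCH: fputs("match\n", f); break;
        }
    }
}
```

Naming labels after instruction indices keeps them unique without a separate symbol table, at the cost of names that are not sequential (L1 and L3 for the example program).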

void refree(Regex r)

Free a Regex object. You must do this when you’re done with it.

Parameters
  • r: The regex to free.

Regex recomp(const char * regex)

Compile a regular expression!

Return
The compiled bytecode for the regex.
Parameters
  • regex: The text form of the regular expression.

Regex recompw(const wchar_t * regex)

Compile a wide regular expression!

Return
The compiled bytecode for the regex.
Parameters
  • regex: The text form of the regular expression.

ssize_t reexec(Regex r, const char * input, size_t ** saved)

Execute a regex on a string.

Return
Length of match, or -1 if no match.
Parameters
  • r: Compiled regular expression bytecode to execute.
  • input: Text to use as input.
  • saved: Out pointer for captured indices.
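To illustrate the return convention (match length, or -1) and the role of save slots, here is a tiny recursive backtracking interpreter for a toy instruction set covering only the opcodes in the reread() example. It is a hypothetical sketch, not the library's executor, which may use a different matching strategy entirely. It assumes matching is anchored at the start of the input, so the final input position equals the match length. Unlike reexec(), which takes a size_t ** out pointer, this toy writes into a caller-supplied array.

```c
#include <stddef.h>
#include <sys/types.h> /* ssize_t (POSIX) */

/* Hypothetical toy opcodes: only those used in the reread() example. */
enum Op { OP_SAVE, OP_CHAR, OP_SPLIT, OP_MATCH };

typedef struct {
    enum Op op;
    char c;      /* OP_CHAR: character to match */
    size_t slot; /* OP_SAVE: index of the save slot */
    size_t x, y; /* OP_SPLIT: preferred and fallback targets */
} ToyInstr;

/* Runs from instruction pc at input position sp. Returns the position where
 * OP_MATCH is reached, or -1. No protection against empty loops. */
static ssize_t run(const ToyInstr *prog, size_t pc, const char *input,
                   size_t sp, size_t *saved)
{
    for (;;) {
        const ToyInstr *in = &prog[pc];
        switch (in->op) {
        case OP_CHAR:
            if (input[sp] == '\0' || input[sp] != in->c)
                return -1;
            sp++;
            pc++;
            break;
        case OP_SAVE: {
            /* Record the position, restoring the old value if this path fails. */
            size_t old = saved ? saved[in->slot] : 0;
            if (saved)
                saved[in->slot] = sp;
            ssize_t r = run(prog, pc + 1, input, sp, saved);
            if (r < 0 && saved)
                saved[in->slot] = old;
            return r;
        }
        case OP_SPLIT: {
            /* Try the preferred branch first; fall back to the other on failure. */
            ssize_t r = run(prog, in->x, input, sp, saved);
            if (r >= 0)
                return r;
            pc = in->y;
            break;
        }
        case OP_MATCH:
            return (ssize_t)sp;
        default:
            return -1;
        }
    }
}

/* Anchored at the start of input, so the returned position is the match length. */
ssize_t toy_exec(const ToyInstr *prog, const char *input, size_t *saved)
{
    return run(prog, 0, input, 0, saved);
}
```

Because split tries its first target before its second, the loop in the example program is greedy: on "aaab" it consumes all three a's and reports a length of 3, while on "b" it fails and returns -1.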

ssize_t reexecw(Regex r, const wchar_t * input, size_t ** saved)

Execute a regex on a wide string.

Return
Length of match, or -1 if no match.
Parameters
  • r: Compiled regular expression bytecode to execute.
  • input: Text to use as input.
  • saved: Out pointer for captured indices.

size_t renumsaves(Regex r)

Return the number of saved index slots required by a regex.

Return
Number of slots.
Parameters
  • r: The regular expression bytecode.
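One plausible way to compute this count, assuming save instructions carry a slot index and slots are numbered from zero, is to take the largest index used plus one. ToyInstr and toy_numsaves() are hypothetical stand-ins, not the library's implementation.

```c
#include <stddef.h>

/* Hypothetical toy opcodes and instruction layout. */
enum Op { OP_SAVE, OP_CHAR, OP_SPLIT, OP_MATCH };

typedef struct {
    enum Op op;
    char c;      /* OP_CHAR: character to match */
    size_t slot; /* OP_SAVE: index of the save slot */
    size_t x, y; /* OP_SPLIT: target instruction indices */
} ToyInstr;

/* Slots needed = one more than the largest slot index of any save instruction. */
size_t toy_numsaves(const ToyInstr *prog, size_t n)
{
    size_t slots = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].op == OP_SAVE && prog[i].slot + 1 > slots)
            slots = prog[i].slot + 1;
    }
    return slots;
}
```

A program with no save instructions needs zero slots under this scheme.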

Captures recap(const char * s, const size_t * l, size_t n)

Convert a string and a capture list into a list of strings.

This copies each capture into a newly allocated string and returns them all in a newly allocated array of strings. Both the strings and the array must be freed when you’re done with them. You can either free each string and then the array yourself, or use recapfree() to do it for you.

Return
A new Captures object.
Parameters
  • s: String to get strings from.
  • l: List of captures returned from reexec().
  • n: Number of saves - use renumsaves() if you don’t know.
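The copying that recap() performs can be sketched as follows, assuming the capture list stores n save indices as consecutive (start, end) pairs, so that there are n / 2 captured strings. SketchCaptures mirrors the documented n and cap members but is a hypothetical stand-in, as is sketch_recap().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in mirroring the documented members of Captures. */
typedef struct {
    size_t n;   /* number of captured strings */
    char **cap; /* n independently allocated strings */
} SketchCaptures;

/* Assumes l holds n save indices as (start, end) pairs, giving n / 2 captures.
 * Allocation failures are not checked in this sketch. */
SketchCaptures sketch_recap(const char *s, const size_t *l, size_t n)
{
    SketchCaptures c;
    c.n = n / 2;
    c.cap = malloc(c.n * sizeof *c.cap);
    for (size_t k = 0; k < c.n; k++) {
        size_t start = l[2 * k];
        size_t len = l[2 * k + 1] - start;
        c.cap[k] = malloc(len + 1);
        memcpy(c.cap[k], s + start, len);
        c.cap[k][len] = '\0'; /* terminate the copied substring */
    }
    return c;
}
```

With s = "hello world" and l = {0, 5, 6, 11}, this yields the two strings "hello" and "world".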

void recapfree(Captures c)

Free a capture list from recap().

Since the array and strings were all newly allocated by recap(), they need to be cleaned up. This function does the cleanup. It’s nothing complicated - you can do it yourself, but it’s convenient to have this to do it for you. Note that if you want to keep one of the strings from the capture list, you’ll have to set its entry in the array to NULL (so free() does nothing), or else do manual cleanup.
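The cleanup described above, along with the NULL trick for keeping one string, might look like this. SketchCaptures, sketch_capfree(), and sketch_capsteal() are hypothetical helpers written for illustration; the point is that free(NULL) is a no-op, so an entry that has been set to NULL is skipped safely.

```c
#include <stdlib.h>

/* Hypothetical stand-in mirroring the documented members of Captures. */
typedef struct {
    size_t n;
    char **cap;
} SketchCaptures;

/* Frees every string, then the array. NULL entries are harmless. */
void sketch_capfree(SketchCaptures c)
{
    for (size_t k = 0; k < c.n; k++)
        free(c.cap[k]);
    free(c.cap);
}

/* Takes ownership of capture k; the caller must free the returned string.
 * c.cap is a pointer, so clearing the entry affects every copy of c. */
char *sketch_capsteal(SketchCaptures c, size_t k)
{
    char *s = c.cap[k];
    c.cap[k] = NULL;
    return s;
}
```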

Parameters
  • c: The capture list to free.

WCaptures recapw(const wchar_t * s, const size_t * l, size_t n)

Convert a string and a capture list into a list of strings.

This copies each capture into a newly allocated wide string and returns them all in a newly allocated array of wide strings. Both the strings and the array must be freed when you’re done with them. You can either free each string and then the array yourself, or use recapwfree() to do it for you.

Return
A new WCaptures object.
Parameters
  • s: String to get strings from.
  • l: List of captures returned from reexecw().
  • n: Number of saves - use renumsaves() if you don’t know.
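The wide variant differs mainly in element type. Here is a sketch under the same (start, end) pair assumption as the narrow version, additionally assuming the indices count wchar_t elements rather than bytes. SketchWCaptures and sketch_recapw() are hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in mirroring the documented members of WCaptures. */
typedef struct {
    size_t n;
    wchar_t **cap;
} SketchWCaptures;

/* Assumes l holds (start, end) pairs measured in wchar_t elements.
 * Allocation failures are not checked in this sketch. */
SketchWCaptures sketch_recapw(const wchar_t *s, const size_t *l, size_t n)
{
    SketchWCaptures c;
    c.n = n / 2;
    c.cap = malloc(c.n * sizeof *c.cap);
    for (size_t k = 0; k < c.n; k++) {
        size_t len = l[2 * k + 1] - l[2 * k];
        c.cap[k] = malloc((len + 1) * sizeof(wchar_t));
        wmemcpy(c.cap[k], s + l[2 * k], len);
        c.cap[k][len] = L'\0'; /* terminate the copied wide substring */
    }
    return c;
}
```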

void recapwfree(WCaptures c)

Free a capture list from recapw().

Since the array and strings were all newly allocated by recapw(), they need to be cleaned up. This function does the cleanup. It’s nothing complicated - you can do it yourself, but it’s convenient to have this to do it for you. Note that if you want to keep one of the strings from the capture list, you’ll have to set its entry in the array to NULL (so free() does nothing), or else do manual cleanup.

Parameters
  • c: The wide capture list to free.

struct Regex
#include <re.h>

Represents a compiled regular expression. This consists of a pointer to the instructions and a count of how many instructions there are. The struct is small, so there is no need to pass pointers to it; passing copies by value works just as well.

Public Members

size_t n

Number of instructions.

Instr* i

Pointer to instruction buffer.
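Here is a sketch of why copying is cheap, using hypothetical stand-ins SketchInstr and SketchRegex that mirror the two documented members (the real Instr layout is not shown here). A copy duplicates only the count and the pointer, so every copy refers to the same instruction buffer.

```c
#include <stddef.h>

/* Hypothetical stand-in for Instr; the real layout is not documented here. */
typedef struct {
    int opcode;
} SketchInstr;

/* Mirrors the documented members of struct Regex. */
typedef struct {
    size_t n;       /* number of instructions */
    SketchInstr *i; /* pointer to instruction buffer */
} SketchRegex;

/* Takes the struct by value: only the count and the pointer are copied. */
size_t sketch_count(SketchRegex r)
{
    return r.n;
}
```

A consequence worth keeping in mind, assuming refree() releases the instruction buffer: once any copy has been freed, all other copies point at freed memory, so free exactly one of them.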

struct Captures
#include <re.h>

A convenience data structure for getting copies of captured strings.

The reexec() function produces, through its saved out pointer, an array of start and end indices for captured strings, which is technically all you need. In practice, though, a more convenient way to access your captures is helpful, so you can use recap() to convert the string and capture list into a list of freshly allocated strings. Be sure to call recapfree() on the capture list when you’re done (or clean up manually).

Public Members

size_t n

The number of captured strings.

char** cap

An array of n captured strings. Each one is independently allocated, so you’ll need to free them when you’re done.

struct WCaptures
#include <re.h>

A convenience data structure for getting copies of captured wide strings. This is just a wide version of Captures.

Public Members

size_t n

The number of captured strings.

wchar_t** cap

An array of n captured wide strings. Each one is independently allocated, so you’ll need to free them when you’re done.