News Story


Parsing and Tokenizing Libraries

Published at 7:26pm on 29 Jul 2007

We've added new open source libraries to our REALbasic source page, including classes for tokenizing and parsing complex formatted strings such as programming languages and structured data.

The Parser and Tokenizer classes we've added to the libraries section of our open source page offer REALbasic developers the power to add all kinds of new and exciting functionality to their projects.

The Tokenizer class provides sophisticated string-splitting capabilities that go well beyond the built-in Split and NthField operations available in REALbasic. The Tokenizer lets you define tokens using regular expressions for both pattern matching and replacement. Its event-based model allows for maximum flexibility and performance when handling large data sets. Some simple example implementations are also included for those who do not need that much flexibility.
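
To give a rough idea of what regular-expression tokenizing with an event-based model looks like, here is a minimal sketch in Python. It is not the REALbasic Tokenizer API: the token names, the patterns and the on_token callback are all assumptions made for illustration, and it covers pattern matching only, not replacement.

```python
import re

# Hypothetical token definitions as (name, pattern) pairs.
# These are illustrative and not taken from the library itself.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]

# Combine all patterns into one regex with a named group per token type.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text, on_token):
    """Scan text left to right and call on_token(kind, value) for each token.

    The callback plays the role of an event handler: tokens are handed
    over one at a time instead of being collected into a list first.
    """
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace is matched but not reported
            on_token(match.lastgroup, match.group())
        pos = match.end()


# Usage: collect the events into a list so they can be inspected.
tokens = []
tokenize("total = price * 2", lambda kind, value: tokens.append((kind, value)))
```

Because each token is delivered through the callback as soon as it is found, the caller decides whether to store, filter or process tokens immediately, which is what keeps memory use low on large inputs.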

For even more powerful string handling, the Parser class can take the token stream produced by the Tokenizer and perform semantic processing, collapsing the tokens into syntactic structures of your own design. Developers could use this as the basis of a custom XML parser or validator, for example, or even a scripting language interpreter.
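
The idea of collapsing a flat token stream into nested structures can be sketched with a small recursive-descent parser for arithmetic expressions. Again, this is a Python illustration under stated assumptions, not the REALbasic Parser class: the grammar, the function names and the tuple-based tree are all hypothetical.

```python
import re


def tokenize(text):
    # Illustrative tokenizer: runs of digits, or any single
    # non-space, non-digit character (operators and parentheses).
    return re.findall(r"\d+|[^\s\d]", text)


def parse(tokens):
    """Collapse a flat token list into a nested (op, left, right) tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # expr := term (("+" | "-") term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        # term := factor (("*" | "/") factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        # factor := NUMBER | "(" expr ")"
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


# Usage: the flat tokens become a tree that reflects operator precedence.
tree = parse(tokenize("1 + 2 * (3 - 4)"))
```

The resulting tree could then be walked to evaluate the expression, validate it, or translate it, which is the step where an interpreter or validator would attach its own meaning.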

Ever wanted to create your own programming language in REALbasic? With these classes you can, and we're giving them away completely FREE and open source. If you do find our open source tools useful, please consider making a small donation to support future development.

Like all our open source offerings to date, the Parser and Tokenizer libraries are released under the zlib/libpng licence. Similar to the BSD licence, this places very few restrictions on how the software is used, and does not require derived works to be open source or non-commercial.