Some Parse Tools

Introduction

I've written some scripts that leverage PARSE. Because they could be useful to others, I'm writing this short article to draw attention to them.

They can be found on this site (each link points to the latest version) and at REBOL.org.

I start with my motivation for each script, then end with some usage ideas.

PARSE-ANALYSIS.R

My first insight was to realise I could track parse rules by modifying them to call tracking code. In this way I could have some functions help me debug complex parse rules.

I used the concept of hooking into existing parse rules. This allows tracing of parse rules that you may have downloaded, or even the parse rules used by REBOL's mezzanine functions (e.g. REBOL's parse-xml function).
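The hooking idea can be sketched in a few lines. This is only an illustration, assuming REBOL 2; it is not the code from parse-analysis.r, and the names expr, number-original, trace-log, pos-start and pos-end are made up for the example. The trick relies on PARSE looking up rule words when it runs, so rebinding the word number changes what every rule that mentions it will do.

```rebol
REBOL [Title: "Hooking a parse rule (illustrative sketch)"]

digit: charset "0123456789"
number: [some digit]
expr: [number any ["+" number]]

trace-log: copy []

; Keep the original rule reachable under a new (hypothetical) name...
number-original: number

; ...then point the rule word at a wrapper that records where the
; rule started and where it finished each time it succeeds.
number: [
    pos-start: number-original pos-end:
    (repend trace-log ['number index? pos-start index? pos-end])
]

; expr itself is untouched, yet it now runs the wrapper.
parse/all "12+345" expr
probe trace-log    ; [number 1 3 number 4 7]
```

Each entry in the log is a rule name followed by the start and end positions of the text it matched. Note that the paren only runs when the hooked rule succeeds, so failed attempts are not recorded in this sketch.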

PARSE-ANALYSIS-VIEW.R

My second insight was to realise I could visualise how the rules break up text by displaying the textual input in a window and overlaying it with boxes and colour that represent the rules.

Parse-Analysis-View.r uses Parse-Analysis.r to tokenise the input and then creates an interactive display of the input and rules.

The input can be text or a REBOL block.

This could be handy for developing REBOL dialects.

Screenshot: Token Stepper highlighting an ABNF rule using abnf-parser.r.

LOAD-PARSE-TREE.R

My third insight was to realise that parse rules describe the structure of a format implicitly and that each parse rule name (a word) represents a term in the structure.

The normal way to build output with parse rules is to add actions (parens) to the rules that build up the output structure. I figured I should be able to create a function that automatically generates an output structure, given the input and the rules to parse it. The result is a token tree.

It allows a tree of the input to be built automatically.
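For comparison, here is a minimal sketch of the normal, hand-written approach, assuming REBOL 2. The rule names and the shape of the result block are invented for illustration. Every rule that contributes to the output needs its own paren, which is the bookkeeping that an automatic approach based on rule names aims to remove.

```rebol
REBOL [Title: "Building output by hand with parens (illustrative sketch)"]

letter: charset [#"a" - #"z" #"A" - #"Z"]
digit: charset "0123456789"

result: copy []

; Each rule copies the text it matched and appends a name/value pair.
name: [copy str some letter (repend result ['name str])]
number: [copy str some digit (repend result ['number str])]
item: [name | number]
items: [item any ["," item]]

parse/all "abc,42,xyz" items
probe result    ; [name "abc" number "42" name "xyz"]
```

The rule names name and number already say what each piece of input is; the parens only repeat that information in the output.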

[27 May 2015] Update. I've created an updated function called Get-Parse.

REBOL-TEXT-PARSER.R

I needed a way to map blocks to their text representations within a larger body of text, in order to add the block input mode to PARSE-ANALYSIS-VIEW.

ABNF-PARSER.R

The iCalendar RFC has a huge number of rules. As part of my work with that RFC, I decided I wanted a script that could convert ABNF into REBOL parse rules. It might come in handy for other RFCs, since it saves typing and avoids silly mistakes in translation.
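To give a feel for the translation involved, here is a hand-written example, assuming REBOL 2. It is not output from abnf-parser.r, and the small grammar is invented for illustration. The ABNF source is shown in comments above each equivalent REBOL rule.

```rebol
REBOL [Title: "ABNF translated to parse rules by hand (illustrative sketch)"]

; DIGIT = %x30-39
digit: charset [#"0" - #"9"]

; number = 1*DIGIT             (one or more)
number: [some digit]

; number-list = number *("," number)   (zero or more repetitions)
number-list: [number any ["," number]]

print parse/all "1,22,333" number-list    ; true
print parse/all "1,,2" number-list        ; false
```

Even in a grammar this small there are three notations to convert (value ranges, 1* and *), which is where typing mistakes tend to creep in when a large RFC is translated by hand.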

Now that I've got all the rules of the iCalendar RFC, I have to decide whether I need them, because there are a couple of ways to look at parsing iCalendar files. Just because you have the rules doesn't mean you should use them: some structures might be parsed in simpler ways.

RFC-PARSER.R

Well, if I was going to have automatic conversion of ABNF to REBOL, I should have a way to extract the ABNF from the RFC document in the first place...

PARSERULE-PARSER.R

The generated iCalendar rules were complex, so running some tests on them to identify problems would be worthwhile. There's more to do in this script.

BNF-PARSER.R

An example script for parsing a simple BNF grammar.

Using these scripts

The functions I use most out of these scripts are:

LOAD-PARSE-TREE offers interesting possibilities:

Filtering information:

With tokenise-parse, explain-parse, visualise-parse, etc., it is important to note that any rules you do not track do not appear in the outputs.

There are two ways to prevent tracking of terms:

This is useful when you want to filter out terms that are not important to your application. On the other hand you do not want to filter out terms that are necessary to get a complete picture of the parsing.

For example, if your data is described by [a b c] and you filter out b, you will miss important information.

But if your data is described by [a b] where [b: [x y z]] and you filter out b OR you filter out x, y and z, then there is no problem because your input is still completely specified by the rules.
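The first case can be shown with hand-written tracking, assuming REBOL 2. This sketch does not use the scripts themselves; the rules, the tokens block and the position words p1 and p2 are invented for illustration. Rules a and c record the span they matched, while b is deliberately left untracked.

```rebol
REBOL [Title: "An untracked rule leaves a gap (illustrative sketch)"]

tokens: copy []

; a and c record the start and end position of their match.
a: [p1: "GET" p2: (repend tokens ['a index? p1 index? p2])]
b: [" /index.html "]    ; not tracked, so it never reaches the output
c: [p1: "HTTP" p2: (repend tokens ['c index? p1 index? p2])]

parse/all "GET /index.html HTTP" [a b c]
probe tokens    ; [a 1 4 c 17 21]
```

Positions 4 to 16 of the input are not covered by any token, so the part matched by b is invisible in the result even though the parse succeeded.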

Caveat

These routines introduce extra overhead during parsing through recursion and extra memory usage. Compared with coding actions directly into the parse rules, you are more likely to hit an internal parse limit, more memory will be used, and the parse will be slower.

So if you're thinking of using these routines in a production server, rather than just on an ad-hoc basis during development, think and test carefully.

In future my intention is to rewrite some of these functions to be more memory efficient and to emit more useful output structures.