Yacker User Guide

Abstract

Yacker parses ABNF and generates parser executables in a variety of languages.

Table of Contents

  1. Introduction
  2. HTML Demo
  3. Capabilities
    1. Input Formats
    2. Generated Languages
  4. Encoding
  5. Installation
    1. Requirements
    2. Command Line Interface
    3. CGI Interface
  6. Using the Generated Parsers
    1. Encapsulation
    2. Using Multiple Parsers
  7. Development
    1. Wish List
  8. See Also

Introduction

Yacker parses ABNF and generates parser executables in a variety of languages.

HTML Demo

An on-line HTML demonstration of Yacker is at http://www.w3.org/2005/01/yacker. Try it out.

Capabilities

Input Formats

Generated Languages

Encoding

Encoding support is not consistent across the languages. 4GLs tend to have regular expression libraries that work with wide characters. The C variants, however, rely on flex, which compiles a language specified in bytes. Thus, Yacker needs to convert characters (Unicode code points) and character ranges to bytes. This process is described in Adding utf-8 Encoding to Lex. Ultimately, it means that the lex specification for the language includes the byte sequences that correspond to the UTF-8 encoding of the input characters.
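
As a rough illustration of the character-to-bytes step, the sketch below encodes a single code point as UTF-8 and writes each byte as a lex-style hex escape. The function name utf8LexEscape is hypothetical and is not part of Yacker; it covers single characters only, not the splitting of character ranges that Yacker also has to do.

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper (not Yacker code): encode one Unicode code point
// as UTF-8 and render every byte as a \xHH escape usable in a lex pattern.
std::string utf8LexEscape(std::uint32_t cp)
{
    // Reject values outside Unicode and the UTF-16 surrogate block.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");

    unsigned char bytes[4];
    int n;
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        bytes[0] = static_cast<unsigned char>(cp);
        n = 1;
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        bytes[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
        bytes[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        bytes[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
        bytes[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        bytes[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        n = 3;
    } else {                         // 4 bytes: 11110xxx followed by 3 x 10xxxxxx
        bytes[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
        bytes[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
        bytes[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        bytes[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        n = 4;
    }

    std::string out;
    char buf[5];                     // "\xHH" plus the terminating NUL
    for (int i = 0; i < n; ++i) {
        std::snprintf(buf, sizeof buf, "\\x%02X", bytes[i]);
        out += buf;
    }
    return out;
}
```

With this sketch, U+00E9 becomes the two-byte pattern \xC3\xA9. A character range generally cannot be translated this simply, because its endpoints may encode to different byte lengths and the range then has to be split into several byte-level alternatives.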

Installation

Requirements

It is generally best to use an operating system package, e.g. libparse-yapp-perl or perl-Parse-Yapp, if one is available. If not, see these CPAN module installation instructions.

Command Line Interface

$ mkdir /tmp/yack
$ cd /tmp/yack
$ cvs -d :pserver:anonymous@dev.w3.org:/sources/public login
enter the password:anonymous
$ cvs -Q -d :pserver:anonymous@dev.w3.org:/sources/public co perl/modules/W3C/{Grammar,Util/{Exception,YappDriver}.pm}
$ wget -O foo.bnf http://www.w3.org/2005/01/yacker/uploads/foo/bnf
$ PERL5LIB=perl/modules perl/modules/W3C/Grammar/bin/yacker --lang=perl -s -o foo foo.bnf
$ yapp foo.yp
$ perl foo.pm 
you type: word "asdf" num 100 end, carriage return, and control-D.
you see: word "asdf" num 100 end and nod sagely.

The last step will create a new file called foo.pm which you can

CGI Interface

$ mkdir /tmp/yacker
$ cd /tmp/yacker
$ cvs -d :pserver:anonymous@dev.w3.org:/sources/public login
enter the password:anonymous
$ cvs -Q -d :pserver:anonymous@dev.w3.org:/sources/public co perl/modules/W3C/{Grammar,Util/{Exception,YappDriver,W3CDebugCGI,Filter,FlavorBuffer}.pm}
$ cd perl/modules/W3C/Grammar/bin/
$ echo -e \#\!"/bin/bash\\nexport PERL5LIB=../../..\\n./yacker" > yacker.cgi
$ chmod +x yacker.cgi
$ mkdir uploads
$ chmod go+w uploads

For more instructions on CVS, see the CVS instructions.

Using the Generated Parsers

The parsers are designed only to serve as input stream validators. You can replace the semantic actions in the grammar file, or use them as-is. To use a parser that was generated, for example, on the on-line Yacker demo, you can download the parser. For a perl parser called someParser, that would look like:

wget http://www.w3.org/2005/01/yacker/uploads/someParser/someParser.pm
# or with curl:
curl http://www.w3.org/2005/01/yacker/uploads/someParser/someParser.pm > someParser.pm

These perl modules can be used to validate some testInput:

perl -MsomeParser -e test < testInput > /dev/null
# or, if you like to see lots of XML:
perl -MsomeParser -e test < testInput

More information is available in the embedded perldocs:

perldoc someParser.pm

Encapsulation

Building yacc parsers into a large application can be tricky because the generated C file has structures that reference yacc internals. They also have lots of virtual functions, so you can't fake a constructor. Yacker C++ grammars include a Frob class that is designed to interface between the intimate parser and the rest of the app. This allows the grammar file to include whatever knowledge of your application is required in the semantic actions, and the application to include only a simple interface header.

  1. Move the Frob class definition into a header file and include that header file.
  2. Remove main.
  3. Include the Frob header file and the parser construction/invocation in the main application.
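
The steps above might end up looking roughly like the following sketch. The names exampleFrob and exampleparse are hypothetical, and exampleparse is stubbed here so that the sketch compiles on its own; in a real build it would be the parser entry point, which bison names <prefix>parse when run with -p <prefix>. The Frob class that Yacker actually generates may well differ.

```cpp
// --- exampleFrob.h (hypothetical): the only header the application sees ---
class exampleFrob {
public:
    // Runs the parser; returns 0 on success and non-zero on a parse error.
    int parse();
};

// --- grammar side (normally in the trailing code section of the grammar) ---
// Stand-in for the generated parser entry point. This stub is an assumption
// made so the sketch is self-contained; it always reports success.
int exampleparse()
{
    return 0;
}

// The Frob method is the one place that touches the parser internals.
int exampleFrob::parse()
{
    return exampleparse();
}
```

The application then only constructs an exampleFrob and calls parse(), so nothing outside the grammar file needs to see the yacc structures.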

Using Multiple Parsers

Yacker-generated parsers are designed to be used with each other and other parsers. The Makefile uses the -p <name> argument to build Flex/Bison source.

  1. Remove the main from your grammar files.
  2. Build the object files.
  3. Create a main that references the relevant grammar files, for example:
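    #include <stdio.h>
    #include "SPARQL/MIN/SPARQLFrob.h"
    #include "test/testFrob.h"
    
    int main (int argc, char **argv)
    {
      int result;
      const char* name;
      /* Any command-line argument selects the SPARQL parser;
         otherwise the test parser runs. */
      if (argc > 1) {
        name = "SPARQL";
        SPARQLFrob spfrob;
        result = spfrob.parse();
      } else {
        name = "test";
        testFrob frob;
        result = frob.parse();
      }
      /* A non-zero result from parse() is reported as an error. */
      printf("%d Parsing %s result: %s.\n", argc, name, result ? "Error" : "OK");
      return result;
    }
    
    /* Tell the scanner that there is no further input at end of file. */
    extern "C" int yywrap()
    {
      return(1);
    }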
    
  4. Compile and link against the appropriate object files for the interface frobs:
    main : main.cc SPARQL/MIN/SPARQLParser.o SPARQL/MIN/SPARQLScanner.o SPARQL/MIN/SPARQL.h test/testParser.o test/testScanner.o test/test.h
    	g++ -Wno-deprecated -g -o $@ main.cc SPARQL/MIN/SPARQLParser.o SPARQL/MIN/SPARQLScanner.o test/testParser.o test/testScanner.o

Development

Below is a list of things that would be nice to do to Yacker. Help is enthusiastically rewarded and new wish list items are grudgingly accepted.

Wish List

Multiple grammars
BNF grammars are thick on the ground. It would be nice to have support for ABNF (with '|'s or '/'s for OR), EBNF, regular BNF, and yacc. Maybe more?
Cooler compile tree
The grammar objects were kind of hastily hacked. The SchemaValidatorDataTypes compile tree is better-engineered and would be interesting to apply to a non-XML data model. I think one can just ignore attributes and make some more of the functions modal between XML and flat text.
Repository format
I initially was just trying to test grammars against one compiler. At present, Yacker supports perl, C, C++, and a couple of forms in Python. All this crap ends up in one directory and the binaries overwrite each other and it's all a bit confusing.
Language tests
Associated with each grammar could be a set of strings that are tested in each of the compiled languages (perl, C++...). This also impacts the repository format.
Ports to other languages
We have a program for compiling grammars in various languages which some folks would prefer to work on in Python or Java.

See Also


Eric Prud'hommeaux, W3C <eric@w3.org>
$Date: 2012-11-30 14:37:43 $