W3C Lib

Parser Guide

This section describes how to use CSParse or its children, CSParse_.

Parser States

The parser works through a series of TargetObjects and StateTokens. A TargetObject describes the data type currently being filled, and its StateTokens describe which tokens are permitted in each SubState of that TargetObject. For instance, while reading a label service, the name of the service must come before any options. Therefore, the StateToken that matches the service name advances the parser to the next SubState, which accepts options.
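To make the TargetObject/StateToken idea concrete, here is a minimal table-driven sketch. The SubState enum, the SketchToken type, the helpers isServiceName and isOption, and the rule that an option is "by" or "for" are all hypothetical simplifications, not CSParse's real StateToken_t. The sketch only shows that a token is accepted when a StateToken registered for the current SubState checks it, and that the match moves the parser to a new SubState.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified SubStates for a "label service" TargetObject. */
typedef enum { SUB_A, SUB_B, SUB_ERROR } SubState;

/* Hypothetical, simplified StateToken: far fewer fields than StateToken_t. */
typedef struct {
    const char *note;                    /* debugging label */
    SubState    substate;                /* SubState in which this token applies */
    int       (*check)(const char *tok); /* nonzero if the token is acceptable */
    SubState    next;                    /* SubState to advance to on a match */
} SketchToken;

/* Hypothetical checks: service names are URLs, options are "by" or "for". */
static int isServiceName(const char *tok) { return strncmp(tok, "http://", 7) == 0; }
static int isOption(const char *tok) { return strcmp(tok, "by") == 0 || strcmp(tok, "for") == 0; }

static const SketchToken serviceTokens[] = {
    {"service name", SUB_A, isServiceName, SUB_B}, /* name must come first */
    {"option",       SUB_B, isOption,      SUB_B}, /* then any number of options */
};

/* Try every StateToken registered for the current SubState, in order. */
SubState sketchStep(SubState current, const char *tok) {
    for (size_t i = 0; i < sizeof serviceTokens / sizeof serviceTokens[0]; i++)
        if (serviceTokens[i].substate == current && serviceTokens[i].check(tok))
            return serviceTokens[i].next;
    return SUB_ERROR; /* no StateToken in this SubState accepts the token */
}
```

Because the service name is only accepted in SUB_A, an option that arrives first is rejected; the ordering rule falls out of the table rather than out of special-case code.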

One of the parameters when creating a parser is a callback function that is called whenever the parser changes TargetObject. This allows the calling program to see each data element as it is created. See CSParse.html for more information.
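A minimal sketch of the callback idea follows. It assumes a hypothetical signature that receives the old and new TargetObject names plus an opaque user pointer; the real callback type and its arguments are documented in CSParse.html and are not reproduced here. SketchParser, sketchChangeTarget, and traceTarget are illustrative names only.

```c
#include <string.h>

/* Hypothetical callback signature; not the real CSParse callback type. */
typedef void TargetChangeCallback(const char *fromTarget,
                                  const char *toTarget, void *userData);

typedef struct {
    const char           *currentTarget; /* name of the active TargetObject */
    TargetChangeCallback *onChange;      /* supplied when the parser is created */
    void                 *userData;      /* handed back to the callback untouched */
} SketchParser;

/* Switch TargetObject and let the calling program observe the transition. */
void sketchChangeTarget(SketchParser *p, const char *next) {
    if (p->onChange)
        p->onChange(p->currentTarget, next, p->userData);
    p->currentTarget = next;
}

/* Example callback: append each newly entered TargetObject to a trace string.
   The caller must supply a buffer large enough for the whole trace. */
void traceTarget(const char *from, const char *to, void *userData) {
    char *trace = userData;
    (void)from; /* unused in this example */
    strcat(trace, to);
    strcat(trace, ";");
}
```

The opaque userData pointer is what lets one callback serve many parsers without global state.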

Parsing Example

This example is taken from the Label parser. StateTokens are generated from the BNF:
labellist    :: '(' 'PICS-1.0' service-info+ ')'
service-info :: 'error' '(no-ratings' explanation* ')'
              | serviceID service-error
              | serviceID option* labelword label*
From this, we generate two TargetObjects, LabelList_targetObject and ServiceInfo_targetObject.
StateToken_t LabelList_stateTokens[] = { 
    /* A: fresh LabelList
       C: expect end */
     {       "open", SubState_N,    Punct_ALL,              0,        0, 0, 0,   &LabelList_targetObject, SubState_A, Command_MATCHANY|Command_OPEN|Command_CHAIN, 0},
     {"get version", SubState_A,  Punct_WHITE, &LabelList_getVersion, 0, 0, 0, &ServiceInfo_targetObject, SubState_N, 0, 0},
     {"end of list", SubState_C, Punct_RPAREN,              0,        0, 0, 0,   &LabelList_targetObject, SubState_A, Command_MATCHANY|Command_CLOSE, 0}
    };
TargetObject_t LabelList_targetObject = {"LabelList", &LabelList_open, &LabelList_close, &LabelList_destroy, LabelList_stateTokens, raysize(LabelList_stateTokens), CSLLSC_LIST};
StateToken_t ServiceInfo_stateTokens[] = {
    /* A: fresh ServiceInfo
       B: has service id
       C: needs option value
       D: call from Awkward or NoRat to close 
       E: call from Awkward to re-enter */
     {             "open", SubState_N,    Punct_ALL,                0,        0,   0, 0, &ServiceInfo_targetObject, SubState_A, Command_MATCHANY|Command_OPEN|Command_CHAIN, 0},
     {     "error w/o id", SubState_A, Punct_LPAREN,                0, "error",    0, 0, &ServiceNoRat_targetObject, SubState_N, 0, 0},
     {       "service id", SubState_A,  Punct_WHITE, &ServiceInfo_getServiceId, 0, 0, 0,  &ServiceInfo_targetObject, SubState_B, 0, 0},
     .
     .
     .
     {            "close", SubState_D, Punct_ALL,                   0,        0,   0, 0, &LabelList_targetObject, SubState_C, Command_MATCHANY|Command_CLOSE|Command_CHAIN, 0},
     {         "re-enter", SubState_E, Punct_ALL,                   0,        0,   0, 0, &ServiceInfo_targetObject, SubState_N, Command_MATCHANY|Command_CLOSE|Command_CHAIN, 0}
    };
TargetObject_t ServiceInfo_targetObject = {"ServiceInfo", &ServiceInfo_open, &ServiceInfo_close, &ServiceInfo_destroy, ServiceInfo_stateTokens, raysize(ServiceInfo_stateTokens), CSLLSC_SERVICE};
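The TargetObject_t initializers in the example list, in order, a note, open/close/destroy functions, the StateToken table, its length, and a callback code. The sketch below mirrors that field order with hypothetical types (Sk_TargetObject, Sk_StateToken) whose function-pointer signatures are guesses, not the library's declarations. It also assumes that raysize is the usual element-count macro, so the table length is computed at compile time and stays correct when StateTokens are added or removed.

```c
#include <stddef.h>

/* Assumed definition: number of elements in a statically sized array. */
#define raysize(A) (sizeof(A) / sizeof((A)[0]))

/* Hypothetical, trimmed-down StateToken: just the note and its SubState. */
typedef struct {
    const char *note;
    int subState;
} Sk_StateToken;

/* Hypothetical TargetObject following the field order of the initializers
   in the example; the function-pointer signatures are guesses. */
typedef struct {
    const char *note;                  /* e.g. "LabelList" */
    int  (*open)(void *parser);        /* cf. LabelList_open */
    int  (*close)(void *parser);       /* cf. LabelList_close */
    void (*destroy)(void *parser);     /* cf. LabelList_destroy */
    const Sk_StateToken *stateTokens;  /* the StateToken table */
    size_t stateTokenCount;            /* raysize(table) */
    int callbackCode;                  /* cf. CSLLSC_LIST */
} Sk_TargetObject;

static const Sk_StateToken demoTokens[] = {
    {"open", 'N'}, {"get version", 'A'}, {"end of list", 'C'},
};

/* The count is a compile-time constant derived from the table itself. */
const Sk_TargetObject demoTarget = {
    "LabelList", NULL, NULL, NULL, demoTokens, raysize(demoTokens), 0
};
```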

LabelList_targetObject starts in SubState_N, where its data structure must first be created and initialized. The "open" StateToken matches any punctuation (Punct_ALL) and any string (Command_MATCHANY). It calls the TargetObject's open function (Command_OPEN), here LabelList_open, and passes its input on (Command_CHAIN) to SubState_A.
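The effect of Command_OPEN combined with Command_CHAIN can be sketched as follows: the "open" StateToken consumes nothing; it only performs the open action and re-feeds the same input to the next SubState. The Sketch_* bit values, the Token and State types, and the exact string match standing in for LabelList_getVersion are assumptions made for illustration; the real Command_* constants are defined in CSParse's headers.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit values standing in for the Command_* flags. */
enum { Sketch_MATCHANY = 1, Sketch_OPEN = 2, Sketch_CLOSE = 4, Sketch_CHAIN = 8 };
enum { SUB_N, SUB_A, SUB_B };

typedef struct {
    const char *match;  /* exact string required, or NULL with MATCHANY */
    int substate;       /* SubState this token belongs to */
    int next;           /* SubState entered on a match */
    int command;        /* Sketch_* bits */
} Token;

typedef struct { int substate, opened, consumed; } State;

static const Token labelListTokens[] = {
    {NULL,       SUB_N, SUB_A, Sketch_MATCHANY | Sketch_OPEN | Sketch_CHAIN}, /* "open" */
    {"PICS-1.0", SUB_A, SUB_B, 0},                                           /* "get version" */
};

/* Feed one input token. A CHAIN token hands the same input to the new
   SubState instead of consuming it. Returns false if nothing matches. */
bool feed(State *s, const char *input) {
    for (size_t i = 0; i < sizeof labelListTokens / sizeof labelListTokens[0]; i++) {
        const Token *t = &labelListTokens[i];
        if (t->substate != s->substate)
            continue;
        if (!(t->command & Sketch_MATCHANY) && strcmp(t->match, input) != 0)
            continue;
        if (t->command & Sketch_OPEN)
            s->opened++;          /* stands in for calling LabelList_open */
        s->substate = t->next;
        if (t->command & Sketch_CHAIN)
            return feed(s, input);
        s->consumed++;
        return true;
    }
    return false;
}
```

Feeding "PICS-1.0" to a fresh State opens the list once and consumes the token once; feeding "PICS-2.0" still opens the list but then fails in SUB_A.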

SubState_A has only one possible match, "get version". The token is checked by LabelList_getVersion and, if it is valid, the parser proceeds to SubState_N of ServiceInfo_targetObject, which, as above, opens the ServiceInfo and chains to SubState_A. From the BNF rule for service-info, we see that a service-info must start with 'error' or a serviceID. These two alternatives are checked by the StateTokens "error w/o id" and "service id".

If the token is the string "error" followed by a left paren, the next TargetObject is ServiceNoRat_targetObject, which is not included in the example. If "error" does not match, the parser tries the next StateToken in SubState_A, "service id". If ServiceInfo_getServiceId accepts the input, the service ID is read and the parser proceeds to SubState_B.
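Trying the StateTokens of SubState_A in order, with both the string and the punctuation that follows it taken into account, might look like the sketch below. The punctuation masks, the AltToken and Next types, and looksLikeServiceId (which stands in for ServiceInfo_getServiceId and simply accepts quoted URLs) are hypothetical.

```c
#include <string.h>

/* Hypothetical punctuation masks modelled on Punct_WHITE, Punct_LPAREN, ... */
enum { P_WHITE = 1, P_LPAREN = 2, P_RPAREN = 4, P_ALL = 7 };

typedef enum { GO_NORAT, GO_SERVICE_B, GO_NONE } Next;

typedef struct {
    const char *note;
    const char *literal;          /* exact string to match, or NULL */
    int (*check)(const char *s);  /* validation function, or NULL */
    int punct;                    /* punctuation allowed after the token */
    Next next;
} AltToken;

/* Hypothetical stand-in for ServiceInfo_getServiceId: accept quoted URLs. */
static int looksLikeServiceId(const char *s) {
    return s[0] == '"' && strstr(s, "://") != NULL;
}

static const AltToken subStateA[] = {
    {"error w/o id", "error", NULL,               P_LPAREN, GO_NORAT},
    {"service id",   NULL,    looksLikeServiceId, P_WHITE,  GO_SERVICE_B},
};

/* StateTokens are tried in table order; the first one that accepts both
   the string and the following punctuation wins. */
Next matchSubStateA(const char *s, int punct) {
    for (size_t i = 0; i < sizeof subStateA / sizeof subStateA[0]; i++) {
        const AltToken *t = &subStateA[i];
        if (!(t->punct & punct))
            continue;
        if (t->literal && strcmp(t->literal, s) != 0)
            continue;
        if (t->check && !t->check(s))
            continue;
        return t->next;
    }
    return GO_NONE;
}
```

Because "error w/o id" comes first in the table, the literal "error" is tested before any service ID check; "error" followed by whitespace falls through both tokens and is rejected.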

Finishing

When the last rating has been read and a close paren is found, the paren is passed through all the open TargetObjects: SingleLabel, Label, ServiceInfo, and LabelList. This is done with SubStates that exist specifically to close out the TargetObjects. In the example, the "close" StateToken in SubState_D of ServiceInfo_targetObject chains the close paren to SubState_C of LabelList_targetObject, whose "end of list" StateToken closes the LabelList.
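Passing one close paren through every open TargetObject behaves like unwinding a stack, innermost first. The sketch below models only that ordering; Nest, nestOpen, nestCloseAll, and the MAX_DEPTH limit are hypothetical, and the real parser gets the same effect through its closing SubStates rather than an explicit stack.

```c
#include <string.h>

#define MAX_DEPTH 8

typedef struct {
    const char *open[MAX_DEPTH]; /* names of open TargetObjects, innermost last */
    int depth;
    char closed[128];            /* trace of close calls, for illustration */
} Nest;

/* Record a newly opened TargetObject (silently ignored past MAX_DEPTH). */
void nestOpen(Nest *n, const char *name) {
    if (n->depth < MAX_DEPTH)
        n->open[n->depth++] = name;
}

/* Pass a close paren outward: each open TargetObject is closed in turn,
   innermost first, the way the closing SubStates chain the paren. */
void nestCloseAll(Nest *n) {
    while (n->depth > 0) {
        strcat(n->closed, n->open[--n->depth]);
        strcat(n->closed, n->depth ? "," : "");
    }
}
```

Opening LabelList, ServiceInfo, Label, and SingleLabel and then calling nestCloseAll closes them in the order SingleLabel, Label, ServiceInfo, LabelList.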

Products of Parsing

Each of the parsable objects (PICS labels, machine-readable service descriptions, and users) creates an object that contains all the information needed to parse and iterate through its data structures. This object is a container for all the TargetObjects created in the parsing process. These containers are implemented in the following files:

Plans

note

The first field in both the TargetObject and the StateToken is a char * called the note. I have only used this field for debugging; it makes it very easy to track which state you are in and where you are going. It may, however, also be useful for generating precise error messages that specify what was expected next. At that point, it may be worth changing the note on the Awkward_TargetObject to something like "after rating set" or something else palatable.
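As a sketch of that idea, the notes of every StateToken registered for the current SubState could be joined into an "expected ..." message. NotedToken and expectedMessage are hypothetical; the real StateToken_t has many more fields, and SubStates are reduced to plain ints here.

```c
#include <stdio.h>
#include <string.h>

typedef struct { const char *note; int substate; } NotedToken;

/* Build an "expected ..." message from the notes of every StateToken that
   could have matched in the current SubState. Output is truncated to fit
   buf, as snprintf does. */
void expectedMessage(const NotedToken *tokens, size_t count, int substate,
                     char *buf, size_t size) {
    size_t used = (size_t)snprintf(buf, size, "expected");
    int first = 1;
    for (size_t i = 0; i < count && used < size; i++) {
        if (tokens[i].substate != substate)
            continue;
        used += (size_t)snprintf(buf + used, size - used, "%s \"%s\"",
                                 first ? "" : " or", tokens[i].note);
        first = 0;
    }
}
```

With the notes "error w/o id" and "service id" in SubState 0, the message reads: expected "error w/o id" or "service id".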

Destroy

All the TargetObjects have a Destroy method, which allows them to proceed after an error. I'm not sure when this will be useful, but the mechanism is there and has been only nominally tested.
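One way such a Destroy mechanism could support error recovery is sketched below: partially built objects sit on a stack alongside their destroy functions, and on an error everything above a chosen recovery depth is destroyed, innermost first. Pending and unwindTo are hypothetical, and the standard free stands in for a TargetObject's destroy method.

```c
#include <stdlib.h>

typedef struct {
    void *data;               /* partially built object */
    void (*destroy)(void *);  /* stands in for a TargetObject's destroy method */
} Pending;

/* On error, destroy every partially built object above `keep`, innermost
   first, so parsing can resume with the outer objects intact.
   Returns the new stack depth. */
int unwindTo(Pending *stack, int depth, int keep) {
    while (depth > keep) {
        depth--;
        stack[depth].destroy(stack[depth].data);
        stack[depth].data = NULL; /* guard against a second destroy */
    }
    return depth;
}
```

Keeping the recovery depth as a parameter lets the caller decide how much of the structure survives an error, for example keeping the LabelList while discarding a damaged ServiceInfo.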


Eric Prud'hommeaux, Feb 1996