OMM Binary Format Draft

From Object Memory Modeling Incubator Group Wiki

Proposal: OMM Binary Format

The members of the XG have identified two requirements for the binary representation of an OMM:

  • compactness
  • random access

The latter is not addressed by existing generic binary representations for XML, such as EXI. We therefore propose a standard binary representation for the OMM alongside the usual XML specification. The binary representation standard covers only a core set of OMM data.

The binary representation is designed so that it can easily be generated using the Thrift framework. This facilitates the implementation of generators and parsers for the binary representation.

As an example, the binary representation for OMM_ID_BLOCK is defined in Thrift IDL as follows:

enum id_type {
    NONE = 0,
    URI = 1,
    RFID = 2,
    gid96 = 3
}

struct id {
    1: required id_type type;
    2: required byte length;
    3: required string data;
}

struct omm_id_block {
    1: required byte n;
    2: id primary;
    3: list<id> other;
}
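
To illustrate how such a definition maps to bytes, the following is a minimal sketch that hand-encodes a single id struct in the layout of Thrift's binary protocol: each field starts with a one-byte type code and a two-byte big-endian field id, enums are written as 32-bit integers, strings carry a four-byte length prefix, and a struct ends with a zero stop byte. The helper names (field_header, encode_id) and the value 0 passed for the length field are assumptions made for this illustration; a real implementation would use the code generated by the Thrift compiler instead.

```python
import struct

# Type codes of Thrift's binary protocol that are needed for the id struct.
T_STOP, T_BYTE, T_I32, T_STRING = 0, 3, 8, 11


def field_header(ttype: int, field_id: int) -> bytes:
    """One type byte followed by a big-endian 16-bit field id."""
    return struct.pack(">bh", ttype, field_id)


def encode_id(id_type: int, length: int, data: str) -> bytes:
    """Hand-encode one id struct (hypothetical helper, not generated code)."""
    raw = data.encode("utf-8")
    return b"".join([
        field_header(T_I32, 1), struct.pack(">i", id_type),        # 1: type (enum as i32)
        field_header(T_BYTE, 2), struct.pack(">b", length),        # 2: length
        field_header(T_STRING, 3), struct.pack(">i", len(raw)), raw,  # 3: data
        bytes([T_STOP]),                                           # end of struct
    ])


# RFID = 2 in id_type; the length value 0 is a placeholder assumption.
encoded = encode_id(2, 0, "some_rfid")
print(encoded.hex(" "))
```

Note that the string is preceded by its own length, so the size of an encoded id varies with its data.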


With the above definition and the following example data (SemProM ID Block, not in OMM format):

  • primary id type: RFID
  • primary id value: myotheruri_is_much_longer
  • secondary id 1 type: RFID
  • secondary id 1 value: some_rfid
  • secondary id 2 type: gid96
  • secondary id 2 value: one_more_id

this is the generated binary representation (hexadecimal bytes):

03 00 01 03 0C 00 02 08 00 01 00 00 00 02 03 00 02 00 0B 00 
03 00 00 00 19 6D 79 6F 74 68 65 72 75 72 69 5F 69 73 5F 6D 
75 63 68 5F 6C 6F 6E 67 65 72 00 0F 00 03 00 00 00 02 08 00 
01 00 00 00 02 03 00 02 00 0B 00 03 00 00 00 09 73 6F 6D 65 
5F 72 66 69 64 00 08 00 01 00 00 00 03 03 00 02 00 0B 00 03 
00 00 00 0B 6F 6E 65 5F 6D 6F 72 65 5F 69 64 00 00
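
The start of this dump can be read back with a few lines of code. The sketch below is not the Thrift library; it is a hand-written reader, under the assumption that the bytes follow Thrift's binary protocol (type byte, two-byte field id, big-endian values, zero stop byte). It decodes only the bytes that cover field n and the primary id. The names read_field and read_struct are hypothetical.

```python
import struct

T_STOP, T_BYTE, T_I32, T_STRING, T_STRUCT = 0, 3, 8, 11, 12

# Bytes from the dump that cover field n and the primary id struct.
DUMP = bytes.fromhex(
    "03 00 01 03 0C 00 02 08 00 01 00 00 00 02 03 00 02 00 0B 00 "
    "03 00 00 00 19 6D 79 6F 74 68 65 72 75 72 69 5F 69 73 5F 6D "
    "75 63 68 5F 6C 6F 6E 67 65 72 00"
)


def read_field(buf, pos):
    """Decode one field at pos; return (field_id, value, next_pos).

    At a stop byte, field_id and value are None.
    """
    ttype = buf[pos]
    pos += 1
    if ttype == T_STOP:
        return None, None, pos
    (fid,) = struct.unpack_from(">h", buf, pos)
    pos += 2
    if ttype == T_BYTE:
        (val,) = struct.unpack_from(">b", buf, pos)
        pos += 1
    elif ttype == T_I32:
        (val,) = struct.unpack_from(">i", buf, pos)
        pos += 4
    elif ttype == T_STRING:
        (size,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        val = buf[pos:pos + size].decode("utf-8")
        pos += size
    elif ttype == T_STRUCT:
        val, pos = read_struct(buf, pos)
    else:
        raise ValueError(f"type code {ttype} is not handled by this sketch")
    return fid, val, pos


def read_struct(buf, pos):
    """Read fields until the stop byte; return ({field_id: value}, next_pos)."""
    fields = {}
    while True:
        fid, val, pos = read_field(buf, pos)
        if fid is None:
            return fields, pos
        fields[fid] = val


n_id, n, pos = read_field(DUMP, 0)
primary_id, primary, pos = read_field(DUMP, pos)
print(n, primary, pos)
```

The reader has to walk the fields one after another: the offset at which the primary id ends (51 here) is known only after its string length has been read.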


Using Thrift (or Protocol Buffers, or any other binary serialization framework) greatly facilitates a reference implementation and extensibility. However, the flexibility of the generated binary representation is limited to what the framework's type system provides. Fixed-length strings, for example, are not supported, so it is not possible to make the second id start at a fixed byte address. This can be worked around by always storing the string as a fixed number of bytes, e.g., 256. The application would then have to map the byte buffer to a string itself.
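
The fixed-size workaround can be sketched as follows. This is an illustration only, assuming that each id value is padded with zero bytes to a 256-byte slot and that values never end in a zero byte. The names SLOT_SIZE, pack_fixed, unpack_fixed and read_id are hypothetical and not part of Thrift or the OMM specification.

```python
SLOT_SIZE = 256  # fixed number of bytes reserved per id value


def pack_fixed(value: str, size: int = SLOT_SIZE) -> bytes:
    """Encode a string and pad it with zero bytes to exactly `size` bytes."""
    raw = value.encode("utf-8")
    if len(raw) > size:
        raise ValueError("value does not fit into the fixed-size slot")
    return raw.ljust(size, b"\x00")


def unpack_fixed(slot: bytes) -> str:
    """Map a padded byte buffer back to a string (the application's job)."""
    return slot.rstrip(b"\x00").decode("utf-8")


def read_id(buf: bytes, index: int, size: int = SLOT_SIZE) -> str:
    """Random access: the slot offset depends only on the index."""
    start = index * size
    return unpack_fixed(buf[start:start + size])


ids = ["myotheruri_is_much_longer", "some_rfid", "one_more_id"]
buffer = b"".join(pack_fixed(v) for v in ids)

second = read_id(buffer, 1)  # starts at byte 256 regardless of the first id
print(second, len(buffer))
```

The trade-off is visible in the buffer size: the three short values occupy 768 bytes, which works against the compactness requirement.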

More flexible notations, such as ASN.1, are difficult to implement and use. A completely hand-crafted serializer/deserializer offers all the flexibility one could wish for, but comes with the known problems of adoption, standardization, etc.