Warning:
This wiki has been archived and is now read-only.

Syntax

From Media Fragments Working Group Wiki

This page discusses the current (evolving) syntax for representing Media Fragment URIs.

Query versus Fragment

Character: #
  • Pro: The context of the fragment is kept.
  • Cons: The fragment is not sent to the server. We need to encode the content of the fragment through means outside the URI, e.g. new HTTP header.
Character: ?
  • Pro: The fragment is sent to the server as a query.
  • Cons: A new resource is created. The context of the fragment is lost.

Decisions

  • At the moment, the preference is for the fragment (#) symbol, because it preserves the relationship to the main resource and also fits better with a previous time offset scheme available in RTP/RTSP.
  • The URI fragment (or query) symbol will be followed by a set of name-value parameters:
  • The fragment dimensions considered are: temporal, spatial, track and name.
  • Each fragment dimension should be specified only once as a parameter; if a dimension is repeated, only the first occurrence is evaluated.
  • We think our projections on the time/space/track axes are commutative, and therefore the parameters can be applied in any order.
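
The decisions above can be sketched in Python as follows: split the fragment on '&', read each name=value pair, and keep only the first occurrence of a dimension. The function name parse_fragment_params and the choice to silently skip malformed pairs are assumptions made for illustration, not WG decisions.

```python
def parse_fragment_params(fragment):
    """Split a fragment such as 't=10,20&xywh=0,0,320,240' into a dict.

    Only the first occurrence of each parameter name is kept.
    Pairs without '=' are skipped (an assumption of this sketch).
    """
    params = {}
    for pair in fragment.lstrip("#").split("&"):
        name, sep, value = pair.partition("=")
        if not sep or not name:
            continue  # not a name=value pair
        params.setdefault(name, value)  # first occurrence wins
    return params


if __name__ == "__main__":
    print(parse_fragment_params("#t=10,20&xywh=0,0,320,240&t=30,40"))
```

Because the result is keyed by parameter name, the order in which the parameters appear does not change it, which is consistent with the commutativity assumption.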

Proposed ideas but rejected

  • Include a special character after the URI fragment symbol that is not allowed as an identifier in HTML4/5 but is allowed in the URI specification, and that will clearly differentiate this fragment addressing from web page fragment addressing. The group thinks it burdens the syntax for little added value.
  • The values of the dimensions still have to be qualified - e.g. time could be a discontinuous section, region can only be one square, track can be a set of values, and name can only be one identifier.

Proposed Syntax

This is the result of the brainstorming session held during our 2nd face-to-face meeting in Ghent (BE), 9-10 December 2008.

General

The grammar below is written in a pseudo-EBNF syntax.

  • 4 dimensions: time, space, track, name
  • combination of them: name XOR (time, space, track)
    • the reason is that a name has no particular meaning, it can be for example already a specification of a region of an image, or sequence in time, or any combination
  • order is not relevant, the processing will always be:
    1. time or track selection, dealt with at the container level
    2. spatial clipping if any, dealt with at the codec level (i.e. for a particular track!)
  • name often (always?) refers to a temporal selection
  • first class separator is '&' and second class separator is ',' (see WG resolution)
  • an extreme corner case would be to select all video tracks of a media resource and then apply a spatial clipping: the result of such an operation would be unspecified; another extreme case would be to select a temporal interval of a still image, resulting in a still image
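
A small Python sketch of the rules listed above: a name cannot be combined with the other dimensions, and whatever order the parameters are written in, time or track selection is processed before spatial clipping. The parameter names (t, track, xywh, aspect, id) are the ones used on this page; the function name plan_processing and the use of ValueError are assumptions of this sketch.

```python
# Parameter names as used on this page: t, track, xywh, aspect, id.
CONTAINER_LEVEL = ("t", "track")    # step 1: time or track selection
CODEC_LEVEL = ("xywh", "aspect")    # step 2: spatial clipping


def plan_processing(dimensions):
    """Return the given dimension names in processing order.

    Raises ValueError for an unknown name, or for a name (id) that is
    combined with any other dimension.
    """
    dims = set(dimensions)
    unknown = dims - set(CONTAINER_LEVEL) - set(CODEC_LEVEL) - {"id"}
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    if "id" in dims:
        if len(dims) > 1:
            raise ValueError("id cannot be combined with time, space or track")
        return ["id"]
    # Written order is ignored: container-level steps always come first.
    return [name for name in CONTAINER_LEVEL + CODEC_LEVEL if name in dims]


if __name__ == "__main__":
    print(plan_processing(["xywh", "t"]))
```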

Dimensions

  1. Time:
    • t = timerange
    • timerange : ["npt" ":"] [clocktime] "," [clocktime] | format ":" [frametime] "," [frametime] | "clock" ":" [utctime] "," [utctime]
    • clocktime : DIGIT+ ["." DIGIT+ ] ["s"] | DIGIT+ ":" 2DIGIT ":" 2DIGIT ["." DIGIT+ ]
    • frametime : DIGIT+ ":" 2DIGIT ":" 2DIGIT [":" 2DIGIT ["." 2DIGIT] ]
    • utctime : 8DIGIT "T" 6DIGIT [ "." 2DIGIT ] "Z"
    • format : "smpte" | "smpte-25" | "smpte-30" | "smpte-30-drop"
    • npt is the default format, and can be specified as seconds or hh:mm:ss, with optional (dot-separated) fractional seconds
    • other formats are frame based, and always specified as hh:mm:ss, with optional (colon-separated) frame number.
    • the intention is that this follows the SMIL and RTSP spec (with the exception of using ":" to separate format name, where those specs use "=").
  2. Space:
    • xywh = [unit ":"] int "," int "," int "," int
    • unit = pixel | %
    • pixel is the default unit
    • origin (0,0) is always the top-left of the screen
    • aspect = int":"int
    • aspect defines a crop region centered in the center of the current image with the maximum size possible respecting that aspect ratio.
    • rationale for not having cm, inch, or pt: we believe media formats will rarely provide a mapping between these units and pixels
  3. Track:
    • track = "'" UTF-8 string "'" (%-escaped or not?)
    • see #Character_encoding_of_track_names_and_named_fragments_in_container_formats for more information on character encodings used within container formats
    • link with the MAWG to agree on a ROE-like syntax for describing tracks within a media resource (could use the XMP syntax)
    • no pre-definition of track names such as audio, video, subtitles, because it can be ambiguous to select the track depending on the container format
  4. Name:
    • id = "'" UTF-8 string "'" (%-escaped or not?)
    • see also RFC 3986 for the %-escaped version, or RFC 3987 for the non %-escaped version
    • we should use single quotes, as double quotes need to be escaped
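
To illustrate the aspect parameter of the space dimension, here is a minimal Python sketch that computes the largest region with the requested ratio, centered in the image, and returns it as x, y, w, h in pixels with the origin at the top-left. The function name aspect_crop and the rounding rule (integer division, rounding down) are assumptions; this page does not define how fractional pixels are handled.

```python
def aspect_crop(width, height, aspect_w, aspect_h):
    """Return (x, y, w, h) of the largest centered region with ratio
    aspect_w:aspect_h inside a width x height image.

    Integer division rounds down; that rounding rule is an assumption.
    """
    if min(width, height, aspect_w, aspect_h) <= 0:
        raise ValueError("all arguments must be positive")
    if width * aspect_h > height * aspect_w:
        # Image is wider than the target ratio: keep full height, trim width.
        crop_h = height
        crop_w = height * aspect_w // aspect_h
    else:
        # Image is narrower than (or equal to) the target: keep full width.
        crop_w = width
        crop_h = width * aspect_h // aspect_w
    x = (width - crop_w) // 2
    y = (height - crop_h) // 2
    return x, y, crop_w, crop_h


if __name__ == "__main__":
    print(aspect_crop(640, 480, 16, 9))
```

Under these assumptions, aspect=16:9 on a 640x480 image selects the same region as xywh=0,60,640,360.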

Examples

Some valid URI fragments:

Discussion

  • ISSUE 1: "Combining Media Fragment URI with other time-clipping methods". More generally, how to cover the cases where the media fragment is i) encompassing, ii) embedding, iii) disjoint or iv) partially overlapping the boundaries of the other time-clipping method? Specifying a time-clipping method, for example in SMIL, is relative to the (timeline of the) resource. Therefore, if one specifies:
<video clipBegin="5s" clipEnd="15s" src="http://www.example.com/video.mov#t=20,30"/>
  • Two possibilities:
    • EITHER the media fragment is regarded as out-of-context and the clipping MUST be applied relative to the media fragment and bounded by it, i.e. the UA plays the video segment between seconds 25 (=max[20,20+5]) and 30 (=min[30,20+15]).
    • OR the media fragment is regarded as in-context and it depends on the application what the best solution is for the UA to do: some UAs may, for example, provide a timeline that encompasses the whole document, not only the time clipping. Implementors SHOULD follow semantics similar in spirit to the previous bullet, but adapted to their situation.
  • It is a good idea to not mix units, i.e. cm with pixels for defining a spatial region or npt with smpte values for defining a temporal interval
  • ACTION-27: Units recommended to be used for the spatial dimension:
    • pixels, percentages (as percentage of width and height)
    • recommendation to NOT have cm/in/pt since the media format rarely provides the mapping between pixels and these units
  • ACTION-28: Units recommended to be used for the temporal dimension:
    • npt, smpte, smpte-25, smpte-30, smpte-30-drop
  • ACTION-84, ACTION-97: Jean Pierre Evain suggested during the Barcelona F2F meeting to investigate whether editUnit could be used as another dimension for addressing media fragments. The WG thinks this mechanism should not be supported, for example because editUnits differ between tracks; see the full explanation.
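
The first possibility discussed under ISSUE 1 (media fragment regarded as out-of-context) can be written out as a short Python sketch. It applies clipBegin and clipEnd relative to the start of the fragment and bounds the result by the fragment, reproducing the max/min arithmetic given there. The function name and the choice to return None when nothing is left to play are assumptions; the page does not say what a UA should do in that case.

```python
def combine_clip_with_fragment(frag_start, frag_end, clip_begin, clip_end):
    """Apply a clip (relative to the fragment) and bound it by the fragment.

    All values are in seconds. Returns (begin, end) on the timeline of the
    original resource, or None if the result is empty (an assumption).
    """
    begin = max(frag_start, frag_start + clip_begin)
    end = min(frag_end, frag_start + clip_end)
    if begin >= end:
        return None  # the clip falls outside the fragment
    return begin, end


if __name__ == "__main__":
    # video.mov#t=20,30 with clipBegin="5s" clipEnd="15s"
    print(combine_clip_with_fragment(20, 30, 5, 15))
```

For the SMIL example above this gives the interval from second 25 to second 30.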

Formal Grammar

  • ACTION-41: Yves to propose the formal grammar in ABNF for the syntax
segment       = mediasegment / *( pchar / "/" / "?" ) ; augmented fragment 
                                                     ; definition taken from 
                                                     ; rfc3986
npt-sec       = 1*DIGIT [ "." *DIGIT ]                     ; definitions taken
npt-hhmmss    = npt-hh ":" npt-mm ":" npt-ss [ "." *DIGIT] ; from rfc2326
npt-hh       =   1*DIGIT     ; any positive number
npt-mm       =   2DIGIT      ; 0-59
npt-ss       =   2DIGIT      ; 0-59
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Media Segment ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
mediasegment  = namesegment / axissegment
axissegment   = ( timesegment / spacesegment / tracksegment ) 
               *( "&" ( timesegment / spacesegment / tracksegment ) )
; 
; note that this does not capture the restriction to one kind of fragment
; in the axissegment definition, unless we list explicitly the 14 cases.
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Time Segment ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
timesegment   = timeprefix "=" timeparam
timeprefix    = %x74                                      ; "t"
timeparam     = npttimedef / smptetimedef / clocktimedef
npttimedef    = [ deftimeformat ":"] ( npttime  [ "," npttime ] ) / ( "," npttime )
smptetimedef  = smpteformat ":"( frametime [ "," frametime ] ) / ( "," frametime )
clocktimedef  = clockformat ":"( clocktime [ "," clocktime ] ) / ( "," clocktime )
deftimeformat = %x6E.70.74                                ; "npt"
smpteformat   = %x73.6D.70.74.65                          ; "smpte"
               / %x73.6D.70.74.65.2D.32.35                ; "smpte-25"
               / %x73.6D.70.74.65.2D.33.30                ; "smpte-30"
               / %x73.6D.70.74.65.2D.33.30.2D.64.72.6F.70 ; "smpte-30-drop"
clockformat   = %x63.6C.6F.63.6B                          ; "clock"
timeunit      = %x73                                      ; "s"
npttime       = npt-sec / npt-hhmmss
frametime     = 1*DIGIT ":" 2DIGIT ":" 2DIGIT [ ":" 2DIGIT [ "." 2DIGIT ] ]
clocktime     = (datetime / walltime / date)
datetime      = date "T" walltime
date          = years "-" months "-" days
walltime      = (HHMM / HHMMSS) tzd
HHMM          = hours24 ":" minutes
HHMMSS        = hours24 ":" minutes ":" seconds ["." fraction]
years         = 4DIGIT
months        = 2DIGIT   ; range from 01 to 12
days          = 2DIGIT   ; range from 01 to 31
hours24       = 2DIGIT   ; range from 00 to 23
minutes       = 2DIGIT   ; range from 00 to 59
seconds       = 2DIGIT   ; range from 00 to 59
fraction      = 1*DIGIT
tzd           = "Z" / (("+" / "-") hours24 ":" minutes )
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Space Segment ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
spacesegment  = xywhdef / aspectdef
xywhdef       = xywhprefix   "=" xywhparam
aspectdef     = aspectprefix "=" aspectparam
xywhprefix    = %x78.79.77.68                             ; "xywh"
aspectprefix  = %x61.73.70.65.63.74                       ; "aspect"
xywhparam     = [ xywhunit ":" ] 1*DIGIT "," 1*DIGIT "," 1*DIGIT "," 1*DIGIT
xywhunit      = %x70.69.78.65.6C                          ; "pixel"
              / %x70.65.72.63.65.6E.74                    ; "percent"
aspectparam   = 1*DIGIT ":" 1*DIGIT
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Track Segment ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
tracksegment  = trackprefix "=" trackparam
trackprefix   = %x74.72.61.63.6B                          ; "track"
trackparam    = utf8string
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Name Segment ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
namesegment   = nameprefix "=" nameparam
nameprefix    = %x69.64                                   ; "id"
nameparam     = utf8string
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;; Imported definitions ;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
DIGIT         = <DIGIT, defined in rfc4234#3.4>
pchar         = <pchar, defined in rfc3986>
unreserved    = <unreserved, defined in rfc3986> 
pct-encoded   = <pct-encoded, defined in rfc3986>
utf8string    = "'" *( unreserved / pct-encoded ) "'"     ; utf-8 character
                                                          ; encoded URI-style
  • Discussion 1: syntax for writing timestamps?
    • Jack first elaborated an RTSP-style syntax
    • Conrad has proposed to relax this syntax, making the hours optional when frames are omitted and mandatory when frames are present.
    • Jack also proposed replacing the colon separator with the characters 's', 'm', 'h', which is closer to the Google syntax
    • Resolution: Include in the WD a clear paragraph stating we request feedback from the community on this issue!
  • Discussion 2: errors in the current syntax
    • Include in the WD a paragraph stating that we disallow the single quote (') in a utf8string, see Jack's message
    • Include in the WD a paragraph stating that we disallow sub-delims (e.g. &) in a utf8string, see Jack's message
    • Spell 'percent' instead of the '%' character, see Jack's message (DONE)

Points 1 and 2 of Discussion 2 were addressed by changing the production of utf8string to use only unreserved and pct-encoded from rfc3986.
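
As a reading aid for the NPT productions of the grammar (npt-sec, npt-hhmmss, npttimedef), here is a hedged Python sketch that converts an NPT value to seconds and parses the value of a t parameter into a (start, end) pair. It follows the productions as written, so a trailing "s" unit is not accepted because timeunit is not referenced by npttime. The function names, the use of float for fractions, and the reading that the optional "npt:" prefix applies to both alternatives of npttimedef are assumptions of this sketch.

```python
import re

# npt-sec    = 1*DIGIT [ "." *DIGIT ]
NPT_SEC = re.compile(r"([0-9]+)(?:\.([0-9]*))?")
# npt-hhmmss = npt-hh ":" npt-mm ":" npt-ss [ "." *DIGIT ]
NPT_HHMMSS = re.compile(r"([0-9]+):([0-9]{2}):([0-9]{2})(?:\.([0-9]*))?")


def npt_to_seconds(text):
    """Convert an npttime string to seconds (float)."""
    m = NPT_SEC.fullmatch(text)
    if m:
        return float(m.group(1) + "." + (m.group(2) or "0"))
    m = NPT_HHMMSS.fullmatch(text)
    if m:
        hh, mm, ss = int(m.group(1)), int(m.group(2)), int(m.group(3))
        if mm > 59 or ss > 59:
            raise ValueError("minutes and seconds must be in 0-59")
        return hh * 3600 + mm * 60 + ss + float("0." + (m.group(4) or "0"))
    raise ValueError(f"not an NPT time: {text!r}")


def parse_npt_range(value):
    """Parse 'start', 'start,end' or ',end' into (start, end) in seconds."""
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, sep, end_text = value.partition(",")
    if sep and not end_text:
        raise ValueError("a comma must be followed by an end time")
    if not start_text and not end_text:
        raise ValueError("empty time range")
    start = npt_to_seconds(start_text) if start_text else None
    end = npt_to_seconds(end_text) if sep else None
    return start, end


if __name__ == "__main__":
    print(parse_npt_range("npt:0:02:30.5,0:03:00"))
```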

Character encoding of track names and named fragments in container formats

Container format, with its track name and named fragment entries:
MP4
  • Track name:
    • track number is used for identification
    • no track name
  • Named fragment:
    • Movie fragments are identified by a sequence number
    • MPEG-4 Streaming Text (based on 3GP TimedText): UTF-8 or UTF-16
MOV
  • Track name:
    • track number is used for identification
    • no track name
  • Named fragment:
    • QuickTimeText: character encoding is dependent on the textencoding descriptor
3GP
  • Track name:
    • track number is used for identification
    • no track name
  • Named fragment:
    • TimedText: UTF-8 or UTF-16
MPEG-21 File Format
  • Track name:
    • track number is used for identification
    • no track name
  • Named fragment:
    • MPEG-21 DID: no restrictions on the character encoding used within statements
Ogg
  • Track name:
    • used for identification: track number (bitstream_serial_number)
  • Named fragment:
    • encoding used in CMML: UTF-8
Matroska
  • Track name:
    • used for identification: no (number is used)
    • encoding: UTF-8
    • syntax element: trackName
  • Named fragment:
    • encoding: UTF-8
    • syntax element: chapName
MXF
  • Track name:
    • used for identification: no (number is used)
    • encoding: UTF-16
    • syntax element: TrackName
  • Named fragment:
    • Dependent on metadata format
ASF
  • Track name:
    • a 6-bits integer number is used for identification
    • no track name
  • Named fragment:
    • encoding: UTF-16
    • syntax structure: marker object
AVI
  • Track name:
    • used for identification: no (number is used)
    • encoding: not specified (from spec: "a null-terminated text string describing the stream")
    • syntax element: streamName
  • Named fragment:
    • No character encodings requirements specified for text streams
FLV
  • Track name:
    • identification via object structure
    • no track name
  • Named fragment:
    • encoding: UTF-8
    • syntax structures: FrameLabel and DefineSceneAndFrameLabelData tags (aka cue points)
    • Metadata tag: includes a description of the media resource and is compliant with Adobe’s Extensible Metadata Platform (XMP) specification
RMFF
  • Track name:
    • used for identification: no (number is used)
    • encoding: ASCII
    • syntax structures: streamName, content description header
  • Named fragment:
    • ?