UA Server RTSP Communication

From Media Fragments Working Group Wiki
Revision as of 09:29, 21 January 2011 by Dvandeur (Talk | contribs)

The Real-Time Streaming Protocol (RTSP, rfc2326) enables the delivery of one or more time-synchronized, continuous media streams. It does not typically deliver the continuous streams but rather acts as a "network remote control" for multimedia servers.

For media fragment URIs, we need to explain how a media fragment specification is mapped to RTSP protocol activity. We assume here that you have a general understanding of the RTSP protocol mechanism as defined in rfc2326. The following picture provides a general overview of the communication between an RTSP client and server:

  • from a DESCRIBE activity, in which the client asks the server which resources it has available,
  • through a SETUP activity, which sets up the communication between the client and the server, including the requested tracks,
  • to a PLAY activity, where the client requests time ranges from the server for playback.
  • A PAUSE is always possible in the middle of an RTSP communication, and
  • a TEARDOWN closes the communication.

Note that the RTSP protocol is intentionally similar in syntax and operation to HTTP.

Figure: RTSP communication between client and server


How to map Media Fragment URIs to RTSP protocol methods

Dealing with the media fragment URI dimensions in RTSP

We have the following dimensions:

  • (1) temporal: #t=10,20
  • (2) tracks: #track=audio&track=video
  • (3) spatial: #xywh=160,120,320,24
  • (4) id: #id=Airline%20Edit

(1) Temporal Media Fragment URIs

In RTSP, temporal fragment URIs are provided through the PLAY method:

A URL such as

  rtsp://example.com/media#t=10,20

is executed as the following sequence of methods (all shortened for readability - full examples below). The data selection is provided in the PLAY method:

  • C->S: DESCRIBE rtsp://example.com/media
  • S->C: RTSP/1.0 200 OK (with an SDP description, see wiki)
  • C->S: SETUP rtsp://example.com/media/video
  • S->C: RTSP/1.0 200 OK
  • C->S: SETUP rtsp://example.com/media/audio
  • S->C: RTSP/1.0 200 OK
  • C->S: PLAY rtsp://example.com/media
           Range: npt=10-20
  • S->C: RTSP/1.0 200 OK
           Range: npt=9.5-20.1

...

This mapping can be explained in the same way for all time schemes defined for media fragments.

Several temporal media fragment URI requests can be sent as pipelined commands without having to re-send the DESCRIBE and SETUP commands.
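
The temporal mapping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name temporal_fragment_to_range is hypothetical, only plain NPT seconds (with an optional "npt:" prefix) are handled, and if several t= values occur the sketch simply uses the last one.

```python
from urllib.parse import urlsplit, parse_qsl


def temporal_fragment_to_range(uri):
    """Return (request_url, range_value) for an rtsp URI.

    range_value is the value for an RTSP Range header, or None when the
    URI carries no temporal dimension. Hypothetical helper, NPT seconds only.
    """
    parts = urlsplit(uri)
    # The fragment is never sent to the server, so drop it from the request URL.
    request_url = parts._replace(fragment="").geturl()
    range_value = None
    for name, value in parse_qsl(parts.fragment):
        if name != "t":
            continue
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        start, _, end = value.partition(",")
        # "t=,20" means "from the beginning"; "t=10" means "until the end".
        range_value = f"npt={start or '0'}-{end}"
    return request_url, range_value


print(temporal_fragment_to_range("rtsp://example.com/media#t=10,20"))
```

The returned pair corresponds to the URL of the PLAY request and the value of its Range header; the server may still answer with a slightly different range, as in the example above.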


(2) Track Media Fragment URIs

In RTSP, track fragment URIs are provided through the SETUP method:

A URL such as

  rtsp://example.com/media#track=audio&track=video

is executed as the following sequence of methods (all shortened for readability). The data selection is provided in the SETUP methods:

  • C->S: DESCRIBE rtsp://example.com/media
  • S->C: RTSP/1.0 200 OK (with an SDP description, see wiki)
  • C->S: SETUP rtsp://example.com/media/video
  • S->C: RTSP/1.0 200 OK
  • C->S: SETUP rtsp://example.com/media/audio
  • S->C: RTSP/1.0 200 OK

...

The discovery of available tracks is provided through the SDP reply to DESCRIBE, but it could be done through alternative methods, too.

Several consecutive track media fragment URI requests can only be sent with new SETUP commands and cannot be pipelined.
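
As a rough sketch of the track mapping, the following Python function turns the track dimension into the list of URLs to SETUP. The function name and the control_urls dictionary (track name to control URL, as it might be gathered from the SDP reply) are assumptions made for illustration; setting up every track when no track dimension is given is likewise an assumption of this sketch.

```python
from urllib.parse import urlsplit, parse_qsl


def setup_urls_for_tracks(uri, control_urls):
    """Return the control URLs to SETUP for the tracks named in the fragment.

    control_urls is a hypothetical dict mapping track names to control URLs.
    """
    fragment = urlsplit(uri).fragment
    requested = [value for name, value in parse_qsl(fragment) if name == "track"]
    if not requested:
        # Assumption: without a track dimension, all tracks are set up.
        return list(control_urls.values())
    unknown = [name for name in requested if name not in control_urls]
    if unknown:
        raise KeyError(f"unknown track(s): {unknown}")
    return [control_urls[name] for name in requested]


controls = {
    "video": "rtsp://example.com/media/video",
    "audio": "rtsp://example.com/media/audio",
}
for url in setup_urls_for_tracks("rtsp://example.com/media#track=audio&track=video", controls):
    print("SETUP", url)
```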


(3) Spatial Media Fragment URIs

In RTSP, spatial fragment URIs are not specifically provided for. Just like in HTTP, it is probably not a good idea to have the server deal with this dimension anyway, so it should be stripped off by the UA and be dealt with in the UA alone.

A URL such as

  rtsp://example.com/media#xywh=160,120,320,24

is executed as the URL rtsp://example.com/media.
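
A minimal sketch of how a UA might strip the spatial dimension before talking to the server is given below. The function name strip_spatial is hypothetical, and only integer pixel values (with an optional "pixel:" prefix) are handled; other dimensions are left in the fragment untouched.

```python
from urllib.parse import urlsplit


def strip_spatial(uri):
    """Return (uri_without_xywh, region), where region is (x, y, w, h) or None."""
    parts = urlsplit(uri)
    region = None
    kept = []
    for pair in parts.fragment.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        if name == "xywh":
            if value.startswith("pixel:"):
                value = value[len("pixel:"):]
            x, y, w, h = (int(n) for n in value.split(","))
            region = (x, y, w, h)
        else:
            # Every other dimension is kept for later processing.
            kept.append(pair)
    return parts._replace(fragment="&".join(kept)).geturl(), region


print(strip_spatial("rtsp://example.com/media#xywh=160,120,320,24"))
```

The UA would keep the returned region for cropping at display time and use only the returned URI for the RTSP exchange.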


(4) Named Media Fragment URIs

In RTSP, named fragment URIs are not specifically provided for.

This needs to be discussed, but there are probably several possibilities to resolve this.

One possibility is to extend SDP so that it describes all named address points in a resource, which the UA can then use to rewrite the URI, e.g.

 rtsp://example.com/media#id=Airline%20Edit

could have been mapped to

 rtsp://example.com/media#t=50,70

by the UA through the SDP in the reply for the DESCRIBE command.


A second possibility is to resolve these mappings through the server, e.g.

 rtsp://example.com/media#id=Airline%20Edit

could have been redirected to

 Location: rtsp://example.com/media#t=50,70

for the UA to re-retrieve. That would even solve it in the cases where we have a realtime stream and ids are written dynamically.
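
To make the first possibility more concrete, here is a small Python sketch of the UA-side rewrite. Everything in it is hypothetical: no such SDP extension exists, so the named_points table (id to start and end seconds) simply stands in for whatever the DESCRIBE reply would provide, and the function name is made up for this example.

```python
from urllib.parse import urlsplit, parse_qsl, quote


def resolve_named_fragment(uri, named_points):
    """Rewrite #id=... into #t=start,end using a (hypothetical) lookup table."""
    parts = urlsplit(uri)
    rewritten = []
    for name, value in parse_qsl(parts.fragment):
        if name == "id" and value in named_points:
            start, end = named_points[value]
            rewritten.append(f"t={start},{end}")
        else:
            # Unknown ids and other dimensions are passed through, re-encoded.
            rewritten.append(f"{name}={quote(value, safe=',:')}")
    return parts._replace(fragment="&".join(rewritten)).geturl()


points = {"Airline Edit": (50, 70)}
print(resolve_named_fragment("rtsp://example.com/media#id=Airline%20Edit", points))
```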


Putting the media fragment URI dimensions together in RTSP

A URL such as

 rtsp://example.com/media#xywh=160,120,320,24&t=10,20&track=audio&track=video

is executed as the following sequence of methods (all shortened for readability). The data selection is provided both in the SETUP methods and the PLAY method:

  • C->S: DESCRIBE rtsp://example.com/media
  • S->C: RTSP/1.0 200 OK (with an SDP description, see wiki)
  • C->S: SETUP rtsp://example.com/media/video
  • S->C: RTSP/1.0 200 OK
  • C->S: SETUP rtsp://example.com/media/audio
  • S->C: RTSP/1.0 200 OK
  • C->S: PLAY rtsp://example.com/media
           Range: npt=10-20
  • S->C: RTSP/1.0 200 OK
           Range: npt=9.5-20.1

...

It is the UA's task to display only the rectangle xywh=160,120,320,24.

It is true that the resolution of the dimensions is done at different levels of the protocol, but that does not create a problem.
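
The way the dimensions are resolved at different levels can be sketched as a small planning function. This is an illustration only: plan_requests is a hypothetical name, the steps are shortened strings rather than full RTSP messages, and the sketch assumes (as in the shortened example above) that a track's control URL is the base URL followed by the track name.

```python
from urllib.parse import urlsplit, parse_qsl


def plan_requests(uri):
    """Return (steps, crop): shortened RTSP steps plus the UA-only rectangle."""
    parts = urlsplit(uri)
    base = parts._replace(fragment="").geturl()
    tracks, npt_range, crop = [], None, None
    for name, value in parse_qsl(parts.fragment):
        if name == "track":
            tracks.append(value)          # resolved by SETUP
        elif name == "t":
            start, _, end = value.partition(",")
            npt_range = f"npt={start or '0'}-{end}"  # resolved by PLAY
        elif name == "xywh":
            crop = tuple(int(n) for n in value.split(","))  # resolved by the UA
    steps = [f"DESCRIBE {base}"]
    # Assumption: control URL = base URL + "/" + track name.
    steps += [f"SETUP {base}/{track}" for track in tracks]
    steps.append(f"PLAY {base}" + (f" (Range: {npt_range})" if npt_range else ""))
    return steps, crop


steps, crop = plan_requests(
    "rtsp://example.com/media#xywh=160,120,320,24&t=10,20&track=audio&track=video"
)
print("\n".join(steps))
print("UA crops to", crop)
```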


Caching and RTSP for media fragment URIs

Since media fragment URIs rely only on existing protocol negotiations in RTSP, there is no need to discuss any changes to the caching approach in RTSP - it still works as before.


Addendum: full examples of the different RTSP protocol methods

An explanation of the RTSP messages sent between client and server is given below.

DESCRIBE

C->S: DESCRIBE rtsp://example.com/media RTSP/1.0
      CSeq: 1

S->C: RTSP/1.0 200 OK
      CSeq: 1
      Content-Base: rtsp://example.com/media
      Content-Type: application/sdp
      m=video 0 RTP/AVP 96
      a=control:rtsp://example.com/media/video
      a=range:npt=0-7.741000
      a=rtpmap:96 MP4V-ES/5544
      a=mimetype:string;"video/MP4V-ES"
      a=AvgBitRate:integer;304018
      a=StreamName:string;"video"
      m=audio 0 RTP/AVP 97
      a=control:rtsp://example.com/media/audio_en
      a=range:npt=0-7.712000
      a=rtpmap:97 mpeg4-generic/32000/2
      a=mimetype:string;"audio/mpeg4-generic"
      a=AvgBitRate:integer;65790
      a=StreamName:string;"en audio"
      m=audio 0 RTP/AVP 97
      a=control:rtsp://example.com/media/audio_fr
      a=range:npt=0-7.712000
      a=rtpmap:97 mpeg4-generic/32000/2
      a=mimetype:string;"audio/mpeg4-generic"
      a=AvgBitRate:integer;64510
      a=StreamName:string;"fr audio"

The client requests the description of a media object by using the DESCRIBE request. The server responds with a description of the requested resource. Such a description can be represented by means of the Session Description Protocol (SDP, rfc2327), as illustrated in this example. An SDP description contains, among other things, the available tracks, the coding formats of the media streams, etc. In this example, the media resource contains three media tracks: an MPEG-4 Visual video track and two AAC audio tracks (English and French audio track).
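
As an illustration of how a client might read the available tracks from such a description, the following Python sketch collects the control URL and stream name of each m= section. It is not a full SDP parser; the function name tracks_from_sdp is hypothetical, and a=StreamName is treated as it appears in this example rather than as a standard SDP attribute.

```python
def tracks_from_sdp(sdp_text):
    """Return one dict per m= section with its media type, control URL and name."""
    tracks = []
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # A new media section starts; the first token is the media type.
            tracks.append({"media": line[2:].split()[0], "control": None, "name": None})
        elif tracks and line.startswith("a=control:"):
            tracks[-1]["control"] = line[len("a=control:"):]
        elif tracks and line.startswith("a=StreamName:string;"):
            tracks[-1]["name"] = line[len("a=StreamName:string;"):].strip('"')
    return tracks


sdp = """m=video 0 RTP/AVP 96
a=control:rtsp://example.com/media/video
a=StreamName:string;"video"
m=audio 0 RTP/AVP 97
a=control:rtsp://example.com/media/audio_en
a=StreamName:string;"en audio"
"""
for track in tracks_from_sdp(sdp):
    print(track["name"], "->", track["control"])
```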

It is important to note that the way track and codec information is discovered is out of scope for the mapping of Media Fragments to RTSP. Thus, the DESCRIBE message is out of scope for RTSP media fragments.

SETUP

C->S: SETUP rtsp://example.com/media/video RTSP/1.0
      CSeq: 2
      Transport: RTP/AVP;unicast;client_port=8000-8001

S->C: RTSP/1.0 200 OK
      CSeq: 2
      Transport: RTP/AVP;unicast;client_port=8000-8001;server_port=9000-9001
      Session: 12345678

C->S: SETUP rtsp://example.com/media/audio_en RTSP/1.0
      CSeq: 3
      Transport: RTP/AVP;unicast;client_port=8002-8003
      Session: 12345678

S->C: RTSP/1.0 200 OK
      CSeq: 3
      Transport: RTP/AVP;unicast;client_port=8002-8003;server_port=9002-9003
      Session: 12345678

Suppose the user wants to obtain the following media fragment: rtsp://example.com/media#track=video&track=audio%20en&t=5,20.

When the client sends the first SETUP request to the server, a new RTSP session is created. Note that for each requested track, a SETUP request needs to be sent to the server. The session id returned by the server during the SETUP request of the first track will be used for the following SETUP requests. This way, the server creates for each requested track an RTSP subsession. In our example, only the video track and the English audio track are requested. Note that, in case setup information is obtained through an SDP description, the value behind "a=control:" in the SDP description is used to request a track.
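
The reuse of the session id across SETUP requests can be sketched as follows. The helper names build_request and setup_request are hypothetical, and the sketch only formats the request text; sending it over a connection and parsing the reply are left out.

```python
def build_request(method, url, cseq, headers=None):
    """Format a minimal RTSP/1.0 request as text (no body)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


def setup_request(url, cseq, client_port, session=None):
    """Format a SETUP request; pass the session id for every track after the first."""
    headers = {
        "Transport": f"RTP/AVP;unicast;client_port={client_port}-{client_port + 1}"
    }
    if session is not None:
        # Reusing the id returned for the first track ties the tracks to one session.
        headers["Session"] = session
    return build_request("SETUP", url, cseq, headers)


# The first SETUP creates the session; the id comes back in the server's reply.
print(setup_request("rtsp://example.com/media/video", 2, 8000))
# Later SETUP requests carry that id (here the value from the example above).
print(setup_request("rtsp://example.com/media/audio_en", 3, 8002, session="12345678"))
```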

PLAY

C->S: PLAY rtsp://example.com/media RTSP/1.0
      CSeq: 4
      Range: npt=5-20
      Session: 12345678

S->C: RTSP/1.0 200 OK
      CSeq: 4
      Session: 12345678
      Range: npt=4.8-20

The client sends the PLAY request to the server in order to start sending data via the mechanism specified in the SETUP request. By making use of the Range header, a temporal fragment can be addressed. In our example, video data corresponding to seconds 5 to 20 is requested. The response also contains a Range header indicating the actual time range returned to the client.

Note that multiple PLAY requests can be sent by the client, each containing a different Range header (e.g., npt=5-20, npt=40-200, and npt=300-). In this case, the PLAY requests will be queued by the server and executed in order.

PAUSE

C->S: PAUSE rtsp://example.com/media RTSP/1.0
      CSeq: 5
      Session: 12345678

S->C: RTSP/1.0 200 OK
      CSeq: 5
      Session: 12345678

The client can decide to pause the current session. Playback will be interrupted temporarily. A PLAY request can be sent to resume the playback.

Note that the Range header can also be used with the PAUSE command as illustrated in the following example:

C->S: PAUSE rtsp://example.com/media RTSP/1.0
      CSeq: 5
      Session: 12345678
      Range: npt=15

The Range header must contain exactly one value rather than a time range when used in combination with the PAUSE command. In the above example, media playback will be paused when the normal play time reaches 15 seconds. Note that this time point is absolute (i.e., it is not relative to the moment the PAUSE command was sent). This functionality has no mapping into media fragment URIs.

A combination of PLAY and PAUSE could be used to implement seeking functionality. More specifically, when a user agent wants to seek to a later time point in the media, it sends a PAUSE command. This results in an interruption of the media playback. Subsequently, the user agent sends a PLAY command, including a Range header pointing to the time point that corresponds to the seek request.
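
The seek sequence described above could look roughly like this in Python. The function name seek_requests is hypothetical, and the sketch only produces the two request texts in the order they would be sent; it assumes an open-ended range (npt=target-) for the PLAY request.

```python
def seek_requests(url, session, next_cseq, target_seconds):
    """Return the PAUSE and PLAY request texts that together implement a seek."""
    pause = (
        f"PAUSE {url} RTSP/1.0\r\n"
        f"CSeq: {next_cseq}\r\n"
        f"Session: {session}\r\n\r\n"
    )
    play = (
        f"PLAY {url} RTSP/1.0\r\n"
        f"CSeq: {next_cseq + 1}\r\n"
        f"Range: npt={target_seconds}-\r\n"  # play from the seek target onwards
        f"Session: {session}\r\n\r\n"
    )
    return [pause, play]


for request in seek_requests("rtsp://example.com/media", "12345678", 5, 40):
    print(request)
```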

TEARDOWN

C->S: TEARDOWN rtsp://example.com/media RTSP/1.0
      CSeq: 6
      Session: 12345678

S->C: RTSP/1.0 200 OK
      CSeq: 6

With a TEARDOWN request from the client, the server stops the stream delivery for the given URI, and frees the resources associated with it.


Addendum: Potential Extension of RTSP for spatial fragment retrieval (deprecated)

RTSP does not provide support to address and request spatial fragments of media resources. As stated in section 1.5 of rfc2326, RTSP can be extended in three ways: existing methods can be extended with new parameters (i.e., new headers), new methods can be added (i.e., next to PLAY, DESCRIBE, etc.), or a new version of the protocol can be created.

For instance, a new header (e.g., a Region header) could be defined to signal a particular region of a media resource. Similar to the Range header, this new header could be added to the PLAY method. This is illustrated in the following example, where a fragment corresponding to seconds 5 to 20 is requested. Further, only a rectangular region of the video is requested, which could correspond to the region of interest.

C->S: PLAY rtsp://example.com/media RTSP/1.0
      CSeq: 4
      Range: npt=5-20
      Region: rect(40,40,100,50)
      Session: 12345678

Note that the JPEG2000 specification defines a protocol for the efficient exchange of JPEG2000 images. It is called the JPEG2000 Interactive Protocol (JPIP, see also http://www.jpeg.org/public/fcd15444-9v2.pdf) and is defined in Part 9 (Interactivity tools, APIs, and protocols) of the JPEG2000 specification. It provides support for the delivery of image fragments. Section C4 (View-window request fields) of ISO/IEC 15444-9 explains how regions of a JPEG2000 image can be requested. Note that JPIP is not an extension of RTSP, but the way regions are requested could serve as a source of inspiration for defining new RTSP headers.

With the current status of the media fragment URI specification, we do not expect spatial media fragments to be supported by either the HTTP or the RTSP protocol. The thoughts above are recorded in case such a need arises later.