UA Server RTSP Communication
From Media Fragments Working Group Wiki
The Real-Time Streaming Protocol (RTSP, RFC 2326) establishes and controls the delivery of one or more time-synchronized, continuous media streams. It typically does not deliver the continuous streams itself but rather acts as a "network remote control" for multimedia servers. RTSP is intentionally similar in syntax and operation to HTTP. The following figure shows the communication between a client and a server.
Delivering fragments with RTSP
RTSP provides support for the selection of temporal fragments of media resources, as well as the selection of media tracks. As an example, a media fragment corresponding to seconds 5 to 20 of the video track of the media resource rtsp://example.com/media.mp4 is requested. An explanation of the RTSP messages sent between client and server is given below.
DESCRIBE
C->S: DESCRIBE rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 1
S->C: RTSP/1.0 200 OK
CSeq: 1
Content-Base: rtsp://example.com/media.mp4
Content-Type: application/sdp
m=video 0 RTP/AVP 96
a=control:streamid=0
a=range:npt=0-7.741000
a=length:npt=7.741000
a=rtpmap:96 MP4V-ES/5544
a=mimetype:string;"video/MP4V-ES"
a=AvgBitRate:integer;304018
a=StreamName:string;"hinted video track"
m=audio 0 RTP/AVP 97
a=control:streamid=1
a=range:npt=0-7.712000
a=length:npt=7.712000
a=rtpmap:97 mpeg4-generic/32000/2
a=mimetype:string;"audio/mpeg4-generic"
a=AvgBitRate:integer;65790
a=StreamName:string;"hinted audio track"
The client requests the description of a media object by using the DESCRIBE request. The server responds with a description of the requested resource. This description can be expressed in the Session Description Protocol (SDP, RFC 2327), as illustrated in this example. It lists, among other things, the available tracks and the coding formats of the media streams. In this example, the media resource media.mp4 contains two media tracks: an MPEG-4 Visual video track and an AAC audio track.
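As a rough illustration of how a client might extract the track list from such a description, the following Python sketch splits an SDP body into its m= sections and collects the a= attributes of each one. The function name parse_sdp_tracks and the dictionary layout are assumptions made for this example, and the parser ignores session-level lines; it is not a complete SDP implementation.

```python
def parse_sdp_tracks(sdp_text):
    """Return one dict per m= section with its media type and a= attributes."""
    tracks = []
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt list>
            media, port, proto, fmt = line[2:].split(None, 3)
            tracks.append({"media": media, "proto": proto, "fmt": fmt, "attrs": {}})
        elif line.startswith("a=") and tracks:
            # a=<name>:<value>; attributes before the first m= line are skipped
            name, _, value = line[2:].partition(":")
            tracks[-1]["attrs"][name] = value
    return tracks


# Shortened copy of the SDP body from the DESCRIBE response above.
SDP = """\
m=video 0 RTP/AVP 96
a=control:streamid=0
a=range:npt=0-7.741000
a=rtpmap:96 MP4V-ES/5544
m=audio 0 RTP/AVP 97
a=control:streamid=1
a=range:npt=0-7.712000
a=rtpmap:97 mpeg4-generic/32000/2
"""

tracks = parse_sdp_tracks(SDP)
for track in tracks:
    print(track["media"], track["attrs"]["control"], track["attrs"]["rtpmap"])
```

The a=control value of each track is what the client later appends to the base URL when it sets up that track.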
SETUP
C->S: SETUP rtsp://example.com/media.mp4/streamid=0 RTSP/1.0
CSeq: 2
Transport: RTP/AVP;unicast;client_port=8000-8001
S->C: RTSP/1.0 200 OK
CSeq: 2
Transport: RTP/AVP;unicast;client_port=8000-8001;server_port=9000-9001
Session: 12345678
By using the SETUP request, the client specifies the transport mechanism to be used for the streamed media. The server creates a new RTSP session. Note that a separate SETUP request must be sent to the server for each requested track. The session id returned by the server in response to the SETUP request of the first track is reused in the subsequent SETUP requests. This way, the server creates an RTSP subsession for each requested track. In our example, only the video track is requested: media.mp4/streamid=0. Note that the value following "a=control:" in the SDP description is used to address a track.
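The per-track SETUP sequence can be sketched in Python as follows. The helper names track_url and build_setup are hypothetical, and joining the base URL and the control value with a slash is a simplification of proper relative-URL resolution. The first request carries no Session header; later requests reuse the id returned by the server.

```python
def track_url(content_base, control):
    """Combine the Content-Base URL with a track's a=control value."""
    if control.startswith("rtsp://"):
        return control  # absolute control URL, use as-is
    # Simplified join; a real client would apply relative-URL resolution.
    return content_base.rstrip("/") + "/" + control


def build_setup(url, cseq, client_ports, session=None):
    """Build a SETUP request for RTP over UDP unicast."""
    lines = [
        f"SETUP {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Transport: RTP/AVP;unicast;client_port={client_ports[0]}-{client_ports[1]}",
    ]
    if session is not None:
        # Reuse the session id from the first SETUP response.
        lines.append(f"Session: {session}")
    return "\r\n".join(lines) + "\r\n\r\n"


base = "rtsp://example.com/media.mp4"
first = build_setup(track_url(base, "streamid=0"), 2, (8000, 8001))
# If the audio track were requested as well, it would join the same session.
second = build_setup(track_url(base, "streamid=1"), 3, (8002, 8003), session="12345678")
print(first)
print(second)
```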
PLAY
C->S: PLAY rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 4
Range: npt=5-20
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 4
Session: 12345678
RTP-Info: url=rtsp://example.com/media.mp4/streamid=0;seq=9810092;rtptime=3450012
The client sends the PLAY request to tell the server to start sending data via the mechanism specified in the SETUP request. By making use of the Range header, a temporal fragment can be addressed. In our example, video data corresponding to seconds 5 to 20 is requested. Multiple PLAY requests can be sent by the client, each containing a different Range header (e.g., npt=5-20, npt=40-200, and the open-ended npt=300-). In this case, the PLAY requests are queued by the server and executed in order.
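To make the Range syntax concrete, here is a small Python sketch that parses an NPT range and builds the matching PLAY request. The names parse_npt_range, format_seconds and build_play are invented for this example, and only NPT values given in plain seconds are handled (the hh:mm:ss and "now" forms are left out).

```python
def parse_npt_range(value):
    """Parse 'npt=5-20', 'npt=300-' or 'npt=-20' into a (start, end) tuple."""
    unit, _, spec = value.strip().partition("=")
    if unit != "npt":
        raise ValueError("this sketch only handles npt ranges")
    start, sep, end = spec.partition("-")
    if not sep or (not start and not end):
        raise ValueError(f"not a valid PLAY range: {value!r}")
    # A missing bound is reported as None (open-ended range).
    return (float(start) if start else None, float(end) if end else None)


def format_seconds(seconds):
    """Render seconds without trailing zeros, e.g. 5.0 -> '5'."""
    return format(seconds, "f").rstrip("0").rstrip(".")


def build_play(url, cseq, session, start, end=None):
    """Build a PLAY request for the fragment [start, end) in seconds."""
    range_value = f"npt={format_seconds(start)}-"
    if end is not None:
        range_value += format_seconds(end)
    lines = [
        f"PLAY {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Range: {range_value}",
        f"Session: {session}",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


start, end = parse_npt_range("npt=5-20")
play = build_play("rtsp://example.com/media.mp4", 4, "12345678", start, end)
print(play)
```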
PAUSE
C->S: PAUSE rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 5
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 5
Session: 12345678
The client can decide to pause the current session. Playback will be interrupted temporarily. A PLAY request can be sent to resume the playback.
Note that the Range header can also be used with the PAUSE command as illustrated in the following example:
C->S: PAUSE rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 5
Session: 12345678
Range: npt=15
When used in combination with the PAUSE command, the Range header must contain exactly one value rather than a time range. In the above example, media playback will be paused when it reaches the 15-second mark. Note that this time point is absolute (i.e., it is a position in the media, not an offset from the moment the PAUSE command was sent).
A combination of PLAY and PAUSE could be used to implement seeking functionality. More specifically, when a user agent wants to seek to a later time point in the media, it sends a PAUSE command. This results in an interruption of the media playback. Subsequently, the user agent sends a PLAY command, including a Range header pointing to the time point that corresponds to the seek request.
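The seek procedure described above could be sketched in Python as a pair of requests. The helpers rtsp_request and seek_requests are hypothetical names, and the example assumes the client wants to play from the target position to the end of the media, hence the open-ended range.

```python
import itertools


def rtsp_request(method, url, cseq, headers):
    """Serialize one RTSP/1.0 request with the given extra headers."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    lines += [f"{name}: {value}" for name, value in headers]
    return "\r\n".join(lines) + "\r\n\r\n"


def seek_requests(url, session, target_seconds, cseq_counter):
    """Return the PAUSE and PLAY requests that together implement a seek."""
    pause = rtsp_request("PAUSE", url, next(cseq_counter), [("Session", session)])
    play = rtsp_request(
        "PLAY",
        url,
        next(cseq_counter),
        # Open-ended range: play from the seek target onwards.
        [("Range", f"npt={target_seconds}-"), ("Session", session)],
    )
    return [pause, play]


cseq = itertools.count(5)  # continue numbering after the earlier requests
messages = seek_requests("rtsp://example.com/media.mp4", "12345678", 40, cseq)
for message in messages:
    print(message)
```

A real client would wait for the 200 OK response to the PAUSE request before sending the PLAY request.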
TEARDOWN
C->S: TEARDOWN rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 6
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 6
With a TEARDOWN request from the client, the server stops the stream delivery for the given URI, and frees the resources associated with it.
Extending RTSP
RTSP does not provide support for addressing and requesting spatial fragments of media resources. As stated in section 1.5 of RFC 2326, RTSP can be extended in three ways: existing methods can be extended with new parameters (i.e., new headers), new methods can be added (i.e., next to PLAY, DESCRIBE, etc.), or a new version of the protocol can be created.
For instance, a new header (e.g., a Region header) could be defined to signal a particular region of a media resource. Similar to the Range header, this new header could be added to the PLAY method. This is illustrated in the following example, where a fragment corresponding to seconds 5 to 20 is requested. In addition, only a rectangular region of the video is requested, which could correspond to the region of interest.
C->S: PLAY rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 4
Range: npt=5-20
Region: rect(40,40,100,50)
Session: 12345678
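A server that wanted to experiment with such a header would need to parse its value. The Python sketch below shows one possible way to do so. Both the Region header and the rect(...) syntax are only proposals on this page, the function name parse_region is invented, and the meaning of the four numbers is not defined here, so the parser returns them as a plain tuple without interpreting them.

```python
import re

# Matches the proposed syntax rect(a,b,c,d) with four non-negative integers.
_REGION_RE = re.compile(
    r"^rect\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)$"
)


def parse_region(value):
    """Parse the value of the hypothetical Region header into four integers."""
    match = _REGION_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Region value: {value!r}")
    return tuple(int(group) for group in match.groups())


region = parse_region("rect(40,40,100,50)")
print(region)
```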
Note that the JPEG2000 specification defines a protocol for the efficient exchange of JPEG2000 images. It is called the JPEG2000 Interactive Protocol (JPIP, see also http://www.jpeg.org/public/fcd15444-9v2.pdf) and is defined in Part 9 (Interactivity tools, APIs, and protocols) of the JPEG2000 specification. It provides support for the delivery of image fragments. Section C4 (View-window request fields) of ISO/IEC 15444-9 explains how regions of a JPEG2000 image can be requested. Note that JPIP is not an extension of RTSP, but the way regions are requested could serve as a source of inspiration for defining new RTSP headers.

