Sunday, February 1, 2009


Ever wonder how real-time content is controlled? Me too.

One option is RTSP: the Real Time Streaming Protocol. This jewel of 1998 is a classic of the web boom era. I mean, come on, who doesn't like text-based syntax? Unfortunately, the cold truth is that RTSP is more like the crazy inbred cousin of HTTP than the prince of online video content it could have been. Video is everywhere today; RTSP is not. Why not? Well, let's take a look...

RTSP has eleven methods. The three this walkthrough leans on most are:
  1. SETUP
  2. PLAY
  3. PAUSE
Each method listed here can be sent between an RTSP server and client via either UDP or TCP.

To better understand this protocol, let's follow a typical session. Our user, Calvin, has his nice-looking RTSP client GUI ready to go, a blank address bar anxiously awaiting input. He sits down and enters:

>> rtsp://

What happens now? First, the client establishes a TCP connection with the RTSP server on port 554 (RTSP). The client also opens up a UDP socket to receive incoming video traffic. A SETUP request is sent:

SETUP rtsp:// RTSP/1.0\r\n
Cseq: 1\r\n
Transport: RTP/AVP/UDP; unicast; destination=; client_port=32884\r\n
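
To make the wire format concrete, here is a minimal Python sketch of how a client might serialize a request like the one above. The helper name build_request and the URL rtsp://media.example.com/spartacus.avi are placeholders of mine, not anything from a real library or from Calvin's session. It writes the header as CSeq, the spelling RFC 2326 uses; header names are case-insensitive, so Cseq means the same thing.

```python
def build_request(method, uri, cseq, headers=None):
    """Serialize one RTSP/1.0 request as text (hypothetical helper)."""
    lines = [f"{method} {uri} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends in CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


# Placeholder URL, for illustration only.
setup = build_request(
    "SETUP",
    "rtsp://media.example.com/spartacus.avi",
    1,
    {"Transport": "RTP/AVP/UDP;unicast;client_port=32884"},
)
print(setup)
```

The resulting string would be written to the TCP connection on port 554; the socket handling is left out to keep the sketch short.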

If the server understands the message and recognizes the URL, it will return a SETUP response:

RTSP/1.0 200 OK\r\n
Cseq: 1\r\n
Session: 5748271\r\n
Transport: RTP/AVP/UDP; unicast; destination=; client_port=32884\r\n

The server has now established a state machine for this new RTSP session and assigned it a unique session identifier (5748271). The SETUP response echoes back the sequence number (the Cseq header) and the transport header. All in all, the server is ready to start streaming content. (woohoo)
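
On the client side, the interesting part of that response is the status code and the Session header. The following is a rough sketch of how one might pull them out; parse_response is a hypothetical helper, and the raw text is simply the response shown above with the elided destination left out.

```python
def parse_response(raw):
    """Split an RTSP response into (status_code, reason, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    if not version.startswith("RTSP/"):
        raise ValueError(f"not an RTSP response: {status_line!r}")
    headers = {}
    for line in header_lines:
        if not line:
            continue
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


raw = (
    "RTSP/1.0 200 OK\r\n"
    "Cseq: 1\r\n"
    "Session: 5748271\r\n"
    "Transport: RTP/AVP/UDP;unicast;client_port=32884\r\n"
    "\r\n"
)
code, reason, headers = parse_response(raw)
# A server may append ";timeout=<seconds>" to the session id; keep only the id.
session_id = headers["session"].split(";")[0]
print(code, reason, session_id)
```

Lowercasing the header names means the client does not care whether the server writes Cseq or CSeq.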

From the ready state, a client may issue any method. Some require the session identifier (such as PLAY, PAUSE, and RECORD) while others do not (such as DESCRIBE and OPTIONS). Suppose Calvin wants to start watching "spartacus.avi". As you may have guessed, a PLAY is in order:

PLAY * RTSP/1.0\r\n
Cseq: 2\r\n
Session: 5748271\r\n
Range: npt=0.0-\r\n
Scale: 4.0\r\n

The client includes the session identifier provided in the SETUP response and increments the sequence number. A Range header specifies the span of video time the stream should cover. In Calvin's example above, the range uses the "normal play time" (npt) format and asks for playback from the very beginning, with no end point. The Scale header indicates the content speed (not the bit rate). Calvin has chosen to fast forward at 4x speed.
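
The Range and Scale headers are easier to reason about with a little arithmetic. Below is a small sketch, under my own assumptions, that parses an npt range (plain seconds, hh:mm:ss, or "now") and estimates how far into the video a stream has travelled after some wall-clock time at a given scale. Both function names are made up for this example.

```python
def parse_npt(value):
    """Parse 'npt=<start>-<end>' into (start, end); a missing bound is None."""
    unit, _, span = value.partition("=")
    if unit.strip() != "npt":
        raise ValueError(f"unsupported range unit: {unit!r}")
    start, _, end = span.partition("-")

    def to_seconds(text):
        text = text.strip()
        if not text:
            return None
        if text == "now":
            return "now"
        # Accept plain seconds ("12.5") or hh:mm:ss ("0:10:20.5").
        seconds = 0.0
        for part in text.split(":"):
            seconds = seconds * 60 + float(part)
        return seconds

    return to_seconds(start), to_seconds(end)


def media_position(start, scale, wall_seconds):
    """Media time reached after wall_seconds of playback at the given Scale."""
    return start + scale * wall_seconds


start, end = parse_npt("npt=0.0-")
print(start, end)                        # 0.0 None
print(media_position(start, 4.0, 30.0))  # 120.0
```

So after half a minute of fast forwarding at 4x, Calvin is roughly two minutes into the film, even though the stream itself is not necessarily four times fatter.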

So while Calvin's looking for his favorite scene, let's discuss a few of RTSP's shortcomings thus far. Sure, we have the ability to set up, play, and pause content, but what's missing? How about NAT traversal, should that be part of RTSP? Or how does RTSP account for buffering? Is RTSP even a reasonable approach for clips shorter than 10 minutes?

In truth, RTSP's usefulness broke down in a world of short videos and high-memory clients. For that limited 1998 hardware model, the client was assumed to lack sufficient memory for buffering an entire piece of content. Content was assumed to last dozens of minutes, if not hours. Thus, storing content on a media server and controlling it remotely was a reasonable solution. When reality kicked in, clients had more than enough memory. Content, for the most part, lasted just a couple of minutes. The very problems RTSP attempted to solve no longer existed.

However! Over the last few years, long-form content and low-memory embedded devices have re-emerged. Many content providers offer feature-length video via the internet. Handheld devices have become video-capable. Some ISPs are starting to offer set-top boxes with Ethernet interfaces rather than coaxial. Some homes are re-adopting the terminal-mainframe model by keeping a single high-capacity media server along with several thin clients for viewing.

With all that going on, RTSP is born again.

Alright, Calvin glimpses his scene while he's fast forwarding. Time to pause:

PAUSE * RTSP/1.0\r\n
Cseq: 3\r\n
Session: 5748271\r\n
Range: npt=now-\r\n

Aha, found that scene. Calvin issues another PLAY, now with a normal scale:

PLAY * RTSP/1.0\r\n
Cseq: 4\r\n
Session: 5748271\r\n
Range: npt=now-\r\n
Scale: 1.0\r\n

Once Calvin has satisfied his cinematic cravings, he stops the video and allows the server to release the session:

TEARDOWN * RTSP/1.0\r\n
Cseq: 5\r\n
Session: 5748271\r\n
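
Calvin's whole session can be replayed as a tiny state machine. This is only a sketch: the transition table is a simplified subset of the client states described in RFC 2326 (init, ready, playing), the class name RtspSession is invented, and the state is advanced as soon as a request is built rather than when the server's response arrives.

```python
# Simplified client-side state table: (state, method) -> next state.
TRANSITIONS = {
    ("init", "SETUP"): "ready",
    ("ready", "SETUP"): "ready",
    ("ready", "PLAY"): "playing",
    ("ready", "TEARDOWN"): "init",
    ("playing", "PLAY"): "playing",
    ("playing", "PAUSE"): "ready",
    ("playing", "TEARDOWN"): "init",
}
# Methods that carry a Session header in this sketch.
NEEDS_SESSION = {"PLAY", "PAUSE", "TEARDOWN"}


class RtspSession:
    def __init__(self):
        self.state = "init"
        self.cseq = 0
        self.session_id = None

    def next_request(self, method):
        """Advance the state machine and return the headers for the request."""
        key = (self.state, method)
        if key not in TRANSITIONS:
            raise ValueError(f"{method} is not allowed in state {self.state!r}")
        self.cseq += 1
        headers = {"CSeq": str(self.cseq)}
        if method in NEEDS_SESSION:
            headers["Session"] = self.session_id
        self.state = TRANSITIONS[key]
        return headers


session = RtspSession()
session.next_request("SETUP")
session.session_id = "5748271"  # taken from the SETUP response
for method in ("PLAY", "PAUSE", "PLAY", "TEARDOWN"):
    headers = session.next_request(method)
    print(method, headers["CSeq"], session.state)
```

The sequence numbers run from 1 to 5, matching the requests above, and a stray PAUSE before any PLAY is rejected instead of being sent.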

Of course, there's a lot more to RTSP. If this article has piqued your interest, I suggest reading RFC 2326. It's a breeze, really. If your eyes are truly on the future, take a look at RTSP 2.0. It is still in draft, but it's on the move.

As devices continue to shrink, thin, small-footprint video clients will become more and more prevalent. Need a fun project? Write a compact RTSP library... You may not have the next Apache Web Server, but who knows where your efforts may take you.

- John "I am Spartacus!" Calthrup
