Showing posts with label protocol. Show all posts

Sunday, February 8, 2009

STUN: NAT Traversal

As a software developer, how do you get around the limitations of NAT? STUN - Session Traversal Utilities for NAT (RFC5389) - provides a simple tool for just this purpose. STUN servers and clients are commonplace today, but how do they work? Why do we use them? And what does the average developer need to know?

Prerequisite: NAT Primer or a basic understanding of NAT

Given a decent understanding of NAT, we know that a client behind a NAT device cannot receive inbound packets unless a binding has been established. In the context of UDP, bindings are created by first sending an outbound UDP packet from client to server. This initial packet sets up some state on the NAT device, thus allowing subsequent traffic from server to client to proceed unhindered. STUN Binding requests serve this very purpose and more.

For example, suppose a video player client wishes to receive a stream of video packets. The client has somehow learned the video server's source IP and port number (perhaps via RTSP). Two problems must be resolved before video can flow from server to client:
  1. Open a binding on the NAT device
  2. Inform the server of the client's public IP address
Step one is a piece of cake. The client could simply send an empty UDP packet to the server's address. Unfortunately, this one-way transaction doesn't offer enough information to complete step two.
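
Step one can be sketched in a few lines of Python. This is only a minimal illustration, not a full client: the function name open_binding is made up for this post, and the server address is whatever the client learned out of band.

```python
import socket

def open_binding(server_addr, local_port=0):
    """Send one empty UDP datagram so a NAT on the path creates a mapping.

    server_addr is an (ip, port) tuple. local_port=0 lets the OS choose
    the source port. The socket is returned because the mapping belongs
    to this socket's local address and port, so inbound traffic must be
    read from the same socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", local_port))
    sock.sendto(b"", server_addr)  # empty payload, the headers are enough
    return sock
```

Note that nothing comes back from this call, which is exactly why the client still has no idea which external port the NAT picked.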

Step two is a little more difficult. The client needs to learn the dynamic mapping that the NAT device created for the first outgoing UDP packet. For example, the client may have sent an empty packet from its internal IP address (192.168.1.100:12345) to the server's public IP address (64.1.2.3:45678). Now, any incoming packets which reach the NAT on the port mapped for 192.168.1.100:12345 will be forwarded accordingly. So what is the mapped port? STUN provides the answer.

Using STUN accomplishes step one and easily facilitates step two. Here's how it works: First, the video server listens for STUN packets on the port it will be using for video traffic. The client sends a STUN packet to that port on the video server. For the sake of this explanation, assume the STUN packet is just an empty UDP packet with a few attributes used to distinguish it (explained later). Once the server receives this STUN packet, it generates a response. In the response's body, the server includes the source IP and port of the original incoming packet. This address is actually the external IP:port on the NAT device that was dynamically bound to the client's internal address and port when the STUN request went out. The response is sent back to the NAT device, the NAT device forwards it to the client, and the client receives it. Finally, the client interprets the response's contents and learns its own external address. The client may then hand this address to other services. For example, the video server can be told the address and port that it should use for sending video traffic.
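
For the curious, here is a rough Python sketch of both halves of that exchange: building a Binding request and pulling the mapped address out of a Binding success response. The function names are my own. The constants come from RFC5389, which also defines XOR-MAPPED-ADDRESS, an obfuscated variant of MAPPED-ADDRESS that newer servers tend to send. Only IPv4 is handled, and no retransmission or error handling is attempted.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442          # fixed value in every RFC5389 header
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
ATTR_MAPPED_ADDRESS = 0x0001
ATTR_XOR_MAPPED_ADDRESS = 0x0020

def build_binding_request(transaction_id=None):
    """20-byte header: type, attribute length (0), cookie, transaction id."""
    tid = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid

def parse_mapped_address(response):
    """Return (ip, port) from a Binding success response, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not an RFC5389 Binding success response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        wanted = attr_type in (ATTR_MAPPED_ADDRESS, ATTR_XOR_MAPPED_ADDRESS)
        if wanted and len(value) >= 8 and value[1] == 0x01:  # 0x01 = IPv4
            port, addr = struct.unpack("!HI", value[2:8])
            if attr_type == ATTR_XOR_MAPPED_ADDRESS:
                port ^= MAGIC_COOKIE >> 16   # XOR with top 16 cookie bits
                addr ^= MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # values are padded to 4 bytes
    return None
```

A client would send build_binding_request() from the same socket it plans to receive video on, then feed the reply to parse_mapped_address().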

One benefit of STUN, over other NAT traversal utilities, is its ability to cope with a complex hierarchy of NAT devices. Protocols such as UPnP lack this ability.

STUN also offers security. Each STUN packet contains a number of key-value attributes. MESSAGE-INTEGRITY is one such attribute. It carries an HMAC-SHA1 of the packet (a keyed hash, not encryption). The server knows the client's key by some other means and may use that key to authenticate the inbound STUN packet.
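
As a rough sketch of how that check could look in Python: RFC5389 specifies MESSAGE-INTEGRITY as a 20-byte HMAC-SHA1, computed after the header's length field has been bumped to include the attribute itself. The function names below are mine, the key is assumed to be already shared (deriving it from a password is out of scope here), and the verifier assumes MESSAGE-INTEGRITY is the last attribute in the packet.

```python
import hashlib
import hmac
import struct

ATTR_MESSAGE_INTEGRITY = 0x0008

def add_message_integrity(message, key):
    """Append MESSAGE-INTEGRITY to a STUN message (header plus attributes)."""
    msg_type, length = struct.unpack("!HH", message[:4])
    # The length field must already count the 24 bytes being appended
    # (4-byte attribute header + 20-byte digest) when the HMAC is taken.
    adjusted = struct.pack("!HH", msg_type, length + 24) + message[4:]
    digest = hmac.new(key, adjusted, hashlib.sha1).digest()
    return adjusted + struct.pack("!HH", ATTR_MESSAGE_INTEGRITY, 20) + digest

def check_message_integrity(message, key):
    """Verify a message whose final attribute is MESSAGE-INTEGRITY."""
    if len(message) < 44:
        return False
    covered, attr = message[:-24], message[-24:]
    attr_type, attr_len = struct.unpack("!HH", attr[:4])
    if attr_type != ATTR_MESSAGE_INTEGRITY or attr_len != 20:
        return False
    expected = hmac.new(key, covered, hashlib.sha1).digest()
    return hmac.compare_digest(expected, attr[4:])  # constant-time compare
```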

Other attributes include the following (refer to RFC5389 for a complete list):

MAPPED-ADDRESS: Returned by the server in its response to convey the observed source IP and port of the client.
MESSAGE-INTEGRITY: Explained above, used for authentication.
FINGERPRINT: Useful for distinguishing STUN packets from other packet types.
SOFTWARE: Textual description of the software application being used by the agents.
ALTERNATE-SERVER: Instructs client to use a different STUN server.
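
FINGERPRINT is a nice one to see in code, since it is what lets STUN share a port with other traffic such as the video stream itself. Per RFC5389 it is a CRC-32 of the message XORed with 0x5354554e, and it must be the last attribute. A minimal sketch in Python, with function names of my own choosing:

```python
import struct
import zlib

ATTR_FINGERPRINT = 0x8028
FINGERPRINT_XOR = 0x5354554E

def add_fingerprint(message):
    """Append FINGERPRINT to a STUN message (header plus attributes)."""
    msg_type, length = struct.unpack("!HH", message[:4])
    # The length field counts the 8-byte attribute before the CRC is taken.
    adjusted = struct.pack("!HH", msg_type, length + 8) + message[4:]
    crc = (zlib.crc32(adjusted) & 0xFFFFFFFF) ^ FINGERPRINT_XOR
    return adjusted + struct.pack("!HHI", ATTR_FINGERPRINT, 4, crc)

def looks_like_stun(packet):
    """True if the packet ends in a valid FINGERPRINT attribute."""
    # Shortest case is a 20-byte header plus the 8-byte attribute, and
    # the top two bits of the first byte of any STUN message are zero.
    if len(packet) < 28 or packet[0] & 0xC0:
        return False
    attr_type, attr_len, crc = struct.unpack("!HHI", packet[-8:])
    if attr_type != ATTR_FINGERPRINT or attr_len != 4:
        return False
    expected = (zlib.crc32(packet[:-8]) & 0xFFFFFFFF) ^ FINGERPRINT_XOR
    return crc == expected
```

A server listening on a shared port could call looks_like_stun() on each datagram and hand everything else to the media path.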

The STUN RFC (5389) is short and certainly a worthwhile read. Note that STUN is not a complete NAT traversal solution. It is merely a NAT traversal "utility." In a future post, I will discuss ICE, Interactive Connectivity Establishment, which puts STUN to work in a complete NAT traversal solution.

If you're looking to get started with STUN and NAT traversal right away, but don't want to spend time implementing your own stack, check out the following:
  1. STUN Client and Server written in C++
  2. stun4j written in Java
  3. JSTUN written in Java
Note that most existing STUN implementations are RFC3489 compliant (which was replaced by RFC5389). If you know of a good RFC5389 implementation, please leave a link in the comments or send me an email and I will add it to this post.

Sunday, February 1, 2009

RTSP

Ever wonder how real-time content is controlled? Me too.

One option is RTSP: the Real Time Streaming Protocol. This jewel of 1998 is a classic of the web boom era. I mean, come on, who doesn't like text-based syntax? Unfortunately, the cold truth is that RTSP is more like the crazy inbred cousin of HTTP than the prince of online video content it could have been. Video is everywhere today, RTSP is not. Why not? Well, let's take a look...

RTSP has eleven methods:
  1. SETUP
  2. TEARDOWN
  3. PLAY
  4. PAUSE
  5. RECORD
  6. ANNOUNCE
  7. DESCRIBE
  8. GET_PARAMETER
  9. SET_PARAMETER
  10. REDIRECT
  11. OPTIONS
Each method listed here can be sent between an RTSP server and client via either UDP or TCP.

To better understand this protocol, let's follow a typical session. Our user, Calvin, has his nice-looking RTSP client GUI ready to go, a blank address bar anxiously awaiting input. He sits down and enters:

>> rtsp://192.168.1.150/spartacus.avi

What happens now? First, the client establishes a TCP connection with the RTSP server on port 554 (RTSP). The client also opens up a UDP socket to receive incoming video traffic. A SETUP request is sent:

SETUP rtsp://192.168.1.150/spartacus.avi RTSP/1.0\r\n
Cseq: 1\r\n
Transport: RTP/AVP/UDP; unicast; destination=64.2.3.2; client_port=32884\r\n
\r\n
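
Because the syntax is plain text, producing a request like this takes very little code. Here is a minimal Python sketch. The helper name format_rtsp_request is made up for this post, and it spells the header CSeq the way RFC2326 does (header names are case-insensitive, so Cseq works too).

```python
def format_rtsp_request(method, url, cseq, headers=None):
    """Build the bytes of an RTSP/1.0 request with no message body."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends in CRLF, and a blank line terminates the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

setup = format_rtsp_request(
    "SETUP",
    "rtsp://192.168.1.150/spartacus.avi",
    1,
    {"Transport": "RTP/AVP/UDP; unicast; destination=64.2.3.2; client_port=32884"},
)
```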

If the server understands the message and recognizes the URL, it will return a SETUP response:

RTSP/1.0 200 OK\r\n
Cseq: 1\r\n
Session: 5748271\r\n
Transport: RTP/AVP/UDP; unicast; destination=64.2.3.2; client_port=32884\r\n
\r\n

The server has now established a state machine for this new RTSP session and a unique session identifier has been assigned (5748271). The SETUP response echoes back the connection sequence number and transport header. All-in-all the server is ready to start streaming content. (woohoo)
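
On the client side, the interesting parts of that response are the status code and the Session header. A small Python sketch of pulling them out follows. The function names are mine, and it ignores message bodies and repeated headers. RFC2326 allows the Session header to carry a ";timeout=..." suffix, so the sketch strips that off.

```python
def parse_rtsp_response(raw):
    """Return (status_code, reason, headers) from raw response bytes."""
    head = raw.decode("ascii").split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names: any case
    return int(code), reason, headers

def session_id(headers):
    """Extract the bare session identifier, dropping any ;timeout=..."""
    return headers["session"].split(";", 1)[0].strip()
```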

From the ready state, a client may issue any method. Some require the session identifier (such as PLAY, PAUSE, and RECORD) while others do not (such as DESCRIBE and OPTIONS). Suppose Calvin wants to start watching "spartacus.avi". As you may have guessed, a PLAY is in order:

PLAY * RTSP/1.0\r\n
Cseq: 2\r\n
Session: 5748271\r\n
Range: npt=0.0-\r\n
Scale: 4.0\r\n
\r\n

The client includes the session identifier provided in the SETUP response. The connection sequence number is incremented. A Range header specifies the actual video time range for which the stream should be transmitted. In Calvin's example above, the range is a "normal play time" format indicating that the video content should be played starting from the beginning. The Scale header indicates the content speed (not the bit rate). Calvin has chosen to fast forward at 4x speed.
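
To make the Range header a bit more concrete, here is a rough Python sketch that parses the "normal play time" values used in this post. It is deliberately partial: the function names are mine, only the npt unit is handled, and times are accepted as plain seconds, as hh:mm:ss, or as the literal "now".

```python
def parse_npt_time(text):
    """Convert one npt time to seconds; '' means open-ended (None)."""
    if text == "":
        return None
    if text == "now":
        return "now"
    if ":" in text:
        hours, minutes, seconds = text.split(":")
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return float(text)

def parse_npt_range(value):
    """Parse a Range header value such as 'npt=0.0-' into (start, end)."""
    unit, _, span = value.partition("=")
    if unit.strip() != "npt":
        raise ValueError("only npt ranges are handled in this sketch")
    start, _, end = span.partition("-")
    return parse_npt_time(start.strip()), parse_npt_time(end.strip())
```

So Calvin's "npt=0.0-" comes back as a start of 0.0 seconds with no end, meaning "from the beginning until told otherwise".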

So while Calvin's looking for his favorite scene, let's discuss a few of RTSP's shortcomings thus far. Sure, we have the ability to set up, play, and pause content, but what's missing? How about NAT traversal, should that be part of RTSP? Or how does RTSP account for buffering? Is RTSP even a reasonable approach for clips shorter than 10 minutes?

In truth, RTSP's usefulness broke down in a world of short videos and high-memory clients. For that limited 1998 hardware model, the client was assumed to lack sufficient memory for buffering an entire piece of content. Content was assumed to last dozens of minutes, if not hours. Thus, storing content on a media server and controlling it remotely was a reasonable solution. When reality kicked in, clients had more than enough memory. Content, for the most part, lasted just a couple of minutes. The very problems RTSP attempted to solve no longer existed.

However! Over the last few years, long term content and low-memory embedded devices have re-emerged. Many content providers offer feature length video via the internet. Handheld devices have become video-capable. Some ISPs are starting to offer set top boxes with ethernet interfaces rather than coaxial. Some homes are re-adopting the terminal-mainframe model by keeping a single high-capacity media server along with several thin clients for viewing.

On top of all that, RTSP is born again.

Alright, Calvin glimpses his scene while he's fast forwarding. Time to pause:

PAUSE * RTSP/1.0\r\n
Cseq: 3\r\n
Session: 5748271\r\n
Range: npt=now-\r\n
\r\n

Aha, found that scene. Calvin issues another PLAY, now with a normal scale:

PLAY * RTSP/1.0\r\n
Cseq: 4\r\n
Session: 5748271\r\n
Range: npt=now-\r\n
Scale: 1.0\r\n
\r\n

Once Calvin has satisfied his cinematic cravings, he stops the video and allows the server to release the session:

TEARDOWN * RTSP/1.0\r\n
Cseq: 5\r\n
Session: 5748271\r\n
\r\n
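
Calvin's whole session maps neatly onto the state machine the server keeps per session. Here is a simplified Python sketch of it, loosely following the Init, Ready, and Playing states described in RFC2326. The table and function name are my own, and RECORD and mid-session SETUP are left out.

```python
# (current state, method) -> next state
TRANSITIONS = {
    ("Init", "SETUP"): "Ready",
    ("Ready", "PLAY"): "Playing",
    ("Playing", "PLAY"): "Playing",    # e.g. a new Range or Scale
    ("Playing", "PAUSE"): "Ready",
    ("Ready", "TEARDOWN"): "Init",
    ("Playing", "TEARDOWN"): "Init",
}

def run_session(methods, state="Init"):
    """Apply each method in order and return the final state."""
    for method in methods:
        try:
            state = TRANSITIONS[(state, method)]
        except KeyError:
            raise ValueError(f"{method} is not valid in state {state}") from None
    return state

calvin = ["SETUP", "PLAY", "PAUSE", "PLAY", "TEARDOWN"]
final_state = run_session(calvin)
```

After the TEARDOWN the session is back in Init, and the server is free to forget identifier 5748271.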

Of course, there's a lot more to RTSP. If this article has piqued your interest, I suggest reading RFC2326 http://tools.ietf.org/html/rfc2326. It's a breeze, really. If your eyes are truly on the future, take a look at http://tools.ietf.org/html/draft-ietf-mmusic-rfc2326bis-19. RTSP 2.0 is still in draft, but it's on the move.

As devices continue to shrink, thin small-footprint video clients will become more and more prevalent. Need a fun project? Write a compact RTSP library... You may not have the next Apache Web Server, but who knows where your efforts may take you.

- John "I am Spartacus!" Calthrup