From its humble beginnings as a means of transferring data packets, the Internet has grown into a massive worldwide infrastructure for multimedia communication. These developments date back to the early efforts to establish Voice over IP (VoIP) calls over the Internet in the mid-1990s.
Fast-forward to today, and the same technologies behind VoIP still support voice and video transfer over the Internet. Additionally, new protocols now make it possible to extend these capabilities directly to web applications.
WebRTC epitomizes this seamless experience, enabling voice, video, and other forms of realtime communication over the World Wide Web.
What is WebRTC?
WebRTC stands for Web Real-Time Communication. It is an open standard that supports realtime media interaction, such as voice, video, and chat message exchange, between web or mobile apps. It is enabled natively in web browsers through a set of protocols and APIs for handling audio and video sessions.
With WebRTC, users can experience live video calls, video chat, and audio/video streaming. It heralds a new era of apps that support internet telephony right within the browser. As of 2021, all major desktop and mobile browsers have built-in support for WebRTC.
How does WebRTC work?
WebRTC relies on setting up a peer-to-peer connection between two or more endpoints over the Internet. WebRTC does not explicitly define its signaling protocol. Instead, it relies on a set of existing standard protocols developed over the years since the advent of VoIP and realtime communication.
However, establishing communication over the Internet has its own challenges. In the ideal case, peers that are directly reachable on the public IP network can establish a WebRTC session with each other. In practice, most computers and devices connect to the Internet from behind middleboxes such as firewalls and NATs. To set up a WebRTC session, the peers have to discover each other even when they sit behind these middleboxes.
Therefore, the signaling mechanism for WebRTC must take these constraints into account. It involves a set of protocols that locate peers within the network topology before a connection is established between them.
WebRTC relies on a generic offer/answer exchange mechanism for peers to engage in a session. A few IETF standard protocols facilitate the signaling procedure between WebRTC peers.
ICE (Interactive Connectivity Establishment)
ICE is defined in RFC 5245 (since obsoleted by RFC 8445). A protocol based on the offer/answer framework, ICE is used to exchange candidate endpoints, each identified by an IP address and port number. As the name suggests, ICE provides an interactive way for peers to discover their respective media communication endpoints and test whether a connection between those endpoints is feasible.
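To make the idea of a candidate endpoint concrete, here is a minimal sketch that parses a single ICE candidate line into its address, port, and type. The field order follows the candidate attribute grammar in the ICE specification; the names `IceCandidateInfo` and `parseIceCandidate` are illustrative assumptions rather than part of any WebRTC API, and optional trailing fields such as raddr and rport are ignored.

```typescript
// Candidate types defined by ICE: host, server-reflexive, peer-reflexive, relayed.
type CandidateType = "host" | "srflx" | "prflx" | "relay";

// Hypothetical shape for the fields this sketch extracts.
interface IceCandidateInfo {
  foundation: string;
  component: number; // 1 = RTP, 2 = RTCP
  transport: string; // usually "udp"
  priority: number;
  address: string;
  port: number;
  type: CandidateType;
}

function isCandidateType(value: string): value is CandidateType {
  return value === "host" || value === "srflx" || value === "prflx" || value === "relay";
}

// Parses "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...".
function parseIceCandidate(line: string): IceCandidateInfo {
  const fields = line.trim().replace(/^a=/, "").replace(/^candidate:/, "").split(/\s+/);
  const type = fields[7];
  if (fields.length < 8 || fields[6] !== "typ" || !isCandidateType(type)) {
    throw new Error("Not a well-formed ICE candidate line: " + line);
  }
  return {
    foundation: fields[0],
    component: Number(fields[1]),
    transport: fields[2].toLowerCase(),
    priority: Number(fields[3]),
    address: fields[4],
    port: Number(fields[5]),
    type,
  };
}
```

A candidate of type srflx carries the public address discovered through a STUN server, while a relay candidate points at a TURN server.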
STUN (Session Traversal Utilities for NAT)
STUN is a protocol that allows a computer to discover its public IP address and port as seen from outside its NAT. Most consumer-facing devices and computers are connected to the Internet over NAT, on private IP addresses. Therefore, STUN is a crucial component of the WebRTC handshake, letting each peer advertise its correct public address. It is defined in RFC 5389 (since obsoleted by RFC 8489).
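As a rough illustration of how this discovery works on the wire, the sketch below builds a 20-byte STUN Binding Request and decodes the IPv4 XOR-MAPPED-ADDRESS value a server would send back. The header layout and the magic cookie come from the STUN specification; the function names are assumptions made for this example, and the sketch skips sending the packet, walking the response attributes, and IPv6.

```typescript
// Fixed value that appears in every STUN message header.
const MAGIC_COOKIE = 0x2112a442;

// Builds a Binding Request with no attributes: type, length, cookie, transaction ID.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("STUN transaction ID must be 12 bytes");
  }
  const message = new Uint8Array(20);
  const view = new DataView(message.buffer);
  view.setUint16(0, 0x0001); // message type: Binding Request
  view.setUint16(2, 0); // message length: no attributes follow the header
  view.setUint32(4, MAGIC_COOKIE);
  message.set(transactionId, 8);
  return message;
}

// Decodes the value of an IPv4 XOR-MAPPED-ADDRESS attribute (8 bytes).
function decodeXorMappedIPv4(value: Uint8Array): { address: string; port: number } {
  if (value.length < 8 || value[1] !== 0x01) {
    throw new Error("Expected an 8-byte IPv4 XOR-MAPPED-ADDRESS value");
  }
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  // The port is XORed with the top 16 bits of the cookie, the address with all 32 bits.
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);
  const addr = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0;
  const address = [addr >>> 24, (addr >>> 16) & 0xff, (addr >>> 8) & 0xff, addr & 0xff].join(".");
  return { address, port };
}
```

In a browser none of this is written by hand; the ICE agent inside the browser sends these requests to whichever STUN servers the application configures.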
TURN (Traversal Using Relays around NAT)
A TURN server manages the WebRTC exchange between peers by relaying all the packets through itself. This is done to avoid the problem of packets being rejected by a symmetric NAT configuration. A symmetric NAT does not allow incoming data from an IP address unless there is a prior session with packets sent out to that address. A TURN server accepts an initial outgoing connection from each WebRTC peer, which allows subsequent packets to flow once the WebRTC connection is established. The TURN protocol is defined in RFC 5766 (since obsoleted by RFC 8656).
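In application code, STUN and TURN servers are typically supplied as configuration when a peer connection is created. The sketch below assembles such a configuration object. The `iceServers` and `iceTransportPolicy` fields follow the shape of the browser's RTCPeerConnection configuration, but the interfaces are declared locally so the example runs outside a browser, and the helper name and the example.org server URLs are purely hypothetical.

```typescript
// Local stand-ins for the browser's ICE server configuration shape.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
  // "relay" restricts ICE to relayed (TURN) candidates only.
  iceTransportPolicy: "all" | "relay";
}

interface TurnCredentials {
  username: string;
  credential: string;
}

// Hypothetical helper: one STUN server for address discovery, one TURN server as a relay.
function buildPeerConfig(
  stunUrl: string,
  turnUrl: string,
  creds: TurnCredentials,
  forceRelay = false,
): PeerConfig {
  return {
    iceServers: [
      { urls: stunUrl },
      { urls: turnUrl, username: creds.username, credential: creds.credential },
    ],
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

// In a browser, the result would be passed along the lines of:
//   new RTCPeerConnection(buildPeerConfig("stun:stun.example.org:3478",
//     "turn:turn.example.org:3478", { username: "demo", credential: "secret" }));
```

Forcing the relay policy is a handy way to confirm that a TURN deployment works, at the cost of sending all media through the relay.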
SDP (Session Description Protocol)
SDP is a data format for describing a WebRTC peer's session parameters. Despite its name, it isn't a protocol in the strict sense, as it does not dictate a bidirectional message exchange. SDP is mainly used to let peers publish their media-related capabilities, such as codecs, encryption algorithms, session properties, and more.
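Since SDP is plain text, the capabilities a peer publishes can be read with a few lines of code. The following minimal sketch lists each media section of a session description together with the codecs declared in its rtpmap attributes. The `MediaSection` type and `parseMediaSections` function are assumptions made for illustration, and the parser deliberately ignores every other SDP attribute.

```typescript
// Hypothetical summary of one "m=" section of a session description.
interface MediaSection {
  kind: string; // "audio", "video", or "application"
  port: number;
  protocol: string;
  codecs: string[]; // e.g. "opus/48000/2"
}

function parseMediaSections(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <payload types...>
      const parts = line.slice(2).split(" ");
      sections.push({ kind: parts[0], port: Number(parts[1]), protocol: parts[2], codecs: [] });
    } else if (line.startsWith("a=rtpmap:") && sections.length > 0) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const encoding = line.slice(line.indexOf(" ") + 1);
      sections[sections.length - 1].codecs.push(encoding);
    }
  }
  return sections;
}
```

Feeding it the offer and the answer from a call shows which codecs each side supports; the media session can only use a codec that appears on both sides.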
All applications over the Internet have to be associated with an underlying transport protocol. WebRTC is no different. Thankfully, over the years, a lot of research has gone into transporting realtime data on the Internet, and some robust protocols have been standardized.
RTP (Real-time Transport Protocol) was one of the earlier transport protocols devised for carrying realtime voice and video packets in VoIP systems. WebRTC employs a secured version of RTP called SRTP. SRTP runs over UDP, which is the preferred underlying transport for realtime communication.
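Every RTP packet starts with a fixed 12-byte header that carries the sequence number and timestamp a receiver needs to reorder packets and play them out on time. The sketch below decodes that header from raw bytes, following the layout in the RTP specification. SRTP encrypts the payload but leaves this header readable, so the same parsing applies. The `RtpHeader` and `parseRtpHeader` names are assumptions, and CSRC entries and header extensions are not decoded.

```typescript
// Fields of the fixed RTP header (first 12 bytes of every packet).
interface RtpHeader {
  version: number;
  padding: boolean;
  extension: boolean;
  csrcCount: number;
  marker: boolean;
  payloadType: number;
  sequenceNumber: number;
  timestamp: number;
  ssrc: number;
}

function parseRtpHeader(packet: Uint8Array): RtpHeader {
  if (packet.length < 12) {
    throw new Error("RTP packet is shorter than the 12-byte fixed header");
  }
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const first = view.getUint8(0);
  const second = view.getUint8(1);
  return {
    version: first >> 6, // top two bits; 2 for RTP as used today
    padding: (first & 0x20) !== 0,
    extension: (first & 0x10) !== 0,
    csrcCount: first & 0x0f,
    marker: (second & 0x80) !== 0,
    payloadType: second & 0x7f, // maps to a codec via the SDP rtpmap lines
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8), // identifies the media source
  };
}
```

Gaps in `sequenceNumber` indicate packet loss, which is the price of running over UDP instead of a retransmitting transport.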
WebRTC Network Architecture
The various signaling and media transport protocols of WebRTC are encapsulated within specific server components arranged within a network topology over the Internet.
The signaling between the peers is managed via an orchestration server that handles call initiation via the ICE and SDP message exchanges. Subsequently, the peers can initiate a WebRTC connection using the W3C RTCPeerConnection API to establish a session between the browsers at both ends. The STUN and TURN servers play their part in ensuring a smooth flow of packets throughout the call flow.
Note that this is a very simplified architecture for a direct peer-to-peer WebRTC application. For all practical applications, additional server components such as an MCU (Multipoint Control Unit) or SFU (Selective Forwarding Unit) and AAA (Authentication, Authorization, and Accounting) are required, among others, to deploy and host a commercial WebRTC service.
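The order of operations in the offer/answer exchange can be sketched as three small functions, one per step of the call setup. To keep the example runnable outside a browser, `PeerLike` is a simplified, hypothetical interface that borrows a few method names from RTCPeerConnection, and `SendToPeer` stands in for whatever channel the orchestration server provides (WebSocket, HTTP, or anything else). ICE candidate exchange is left out for brevity.

```typescript
// Simplified stand-ins so the sketch runs without a browser (assumptions).
interface SessionDescription {
  type: "offer" | "answer";
  sdp: string;
}

interface PeerLike {
  createOffer(): Promise<SessionDescription>;
  createAnswer(): Promise<SessionDescription>;
  setLocalDescription(description: SessionDescription): Promise<void>;
  setRemoteDescription(description: SessionDescription): Promise<void>;
}

// Hypothetical hook that hands a description to the signaling server.
type SendToPeer = (description: SessionDescription) => void;

// Step 1 (caller): create an offer, apply it locally, send it to the callee.
async function startCall(pc: PeerLike, send: SendToPeer): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  send(offer);
}

// Step 2 (callee): apply the received offer, create an answer, send it back.
async function acceptCall(pc: PeerLike, offer: SessionDescription, send: SendToPeer): Promise<void> {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  send(answer);
}

// Step 3 (caller): apply the received answer; both sides now share a session description.
async function completeCall(pc: PeerLike, answer: SessionDescription): Promise<void> {
  await pc.setRemoteDescription(answer);
}
```

Keeping the signaling channel behind a single callback reflects the fact that WebRTC leaves the choice of signaling transport to the application.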
Benefits of WebRTC
The most important differentiator for WebRTC is its interoperability on the web. This is facilitated by the inclusion of the <video> and <audio> tags in the HTML5 specification. WebRTC leverages these advancements in web standards to usher in a new era of web applications. These Web 3.0 applications enable seamless media experiences on the Internet along with human-to-machine communication capabilities.
Here are some of the key benefits of using WebRTC.
The Fastest Way to Adopt WebRTC
So if you are contemplating using WebRTC for building your next application, then the standard APIs are available to give you a head start. But is that enough?
The standard WebRTC APIs take you only as far as building a simple app between two peers. However, for advanced apps dealing with multiple peers, things get interesting.
These requirements call for hosted WebRTC solutions powered by CPaaS (Communications Platform as a Service).
There are quite a few CPaaS service providers that offer an integrated voice and video experience. Twilio leads the pack as one of the oldest and biggest CPaaS companies around. Additionally, you can explore EnableX, Infobip, and Sinch. All these companies provide a robust and highly scalable WebRTC infrastructure, with support for video and voice APIs, along with a host of other communication channels, like SMS, chat, RCS, and more.