A WebRTC Client as a Skype Alternative

Recently, I've been musing in this post about a self-hosted alternative to Skype for communication between family members, using Asterisk on a Raspberry Pi and the Ekiga SIP client. With the recent discovery that Microsoft is actually listening in on Skype text conversations, that need has grown even stronger. Then I read that the upcoming Firefox 22 will have the full WebRTC API implemented and activated. So far my knowledge about WebRTC has been rather limited: I just knew it had something to do with web browser-based peer-to-peer communication, but not much else. Time to fill the gaps:

The Wikipedia entry on WebRTC is actually quite brief and pretty much reflects what I already knew: there's almost nothing about how it works. The FAQ over at webrtc.org is quite enlightening, however. Here are some key facts that will help you better understand WebRTC if you know how VoIP works with SIP:

[Original version of this paragraph, corrected below following a reader comment:] According to the FAQ, WebRTC can be thought of as a web browser-based JavaScript API to SIP (Session Initiation Protocol) functionality. In other words, WebRTC contains a full implementation of a SIP stack that JavaScript programs in the browser can use to establish a communication session. While the communication session is peer-to-peer, a centralized SIP server is still needed to initially connect the two endpoints. So instead of a native SIP client that has to be installed once, the JavaScript program in the web browser, loaded from a (web) server that also hosts a SIP server, becomes the client. WebRTC can do more than just abstract the SIP API. However, if you're familiar with SIP then this is the way to start thinking about WebRTC.

According to the FAQ and this blog post, WebRTC can be thought of as a web browser-based JavaScript API for two things:

To access the camera and microphone
To connect to another peer (i.e. to the destination user)

What is not defined is how the other peer is discovered initially and how audio and video codecs are negotiated. Traditionally this is done with a number of different protocols, SIP being one of them. In other words, the SIP protocol and communication with a SIP server are not part of WebRTC and have to be implemented by the web app on its own (for details see the blog post linked above). What is defined, however, is the use of the Session Description Protocol (SDP) to describe the audio and video codecs available on each end.
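
Since discovery and the exchange of session descriptions are left to the web app, something has to carry the SDP from one browser to the other. The following is a minimal in-memory sketch of that idea in Python, not a real server: the `SignalingRelay` class, its method names and the user names are all hypothetical, and the SDP strings are treated as opaque placeholders. In a real deployment this role would be played by SIP, XMPP or a custom protocol, for example over WebSockets.

```python
from collections import defaultdict, deque


class SignalingRelay:
    """Toy stand-in for a signaling server (hypothetical, in-memory only)."""

    def __init__(self):
        self._registered = set()
        self._inbox = defaultdict(deque)

    def register(self, user):
        """Make `user` reachable by other peers."""
        self._registered.add(user)

    def send(self, sender, recipient, kind, sdp):
        """Queue a message for a registered peer; return False if unknown."""
        if recipient not in self._registered:
            return False
        self._inbox[recipient].append({"from": sender, "type": kind, "sdp": sdp})
        return True

    def receive(self, user):
        """Return the oldest pending message for `user`, or None."""
        queue = self._inbox[user]
        return queue.popleft() if queue else None


relay = SignalingRelay()
relay.register("alice")
relay.register("bob")

# Alice's app sends an offer; Bob's app picks it up and answers.
relay.send("alice", "bob", "offer", "<offer SDP from alice's browser>")
offer = relay.receive("bob")
relay.send("bob", offer["from"], "answer", "<answer SDP from bob's browser>")
answer = relay.receive("alice")
```

Once both sides hold each other's description, the relay is no longer involved and the media can flow peer-to-peer.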

What I am wondering at this point is how two JavaScript applications running on different devices can communicate with each other directly, as I always thought JavaScript enforces the rule that the program is only allowed to establish connections back to the site from which the script was loaded. Obviously that can't be the case here anymore.

The FAQ also mentions a number of other interesting facts: WebRTC implements STUN (Session Traversal Utilities for NAT) to establish a peer-to-peer session through Network Address Translation gateways, a must in today's IPv4 environment. Echo cancellation techniques are mentioned as well, and the codecs used look pretty neat: bandwidth efficient, wideband, and HD video enabled. As all functionality is part of the web browser, there's hope that performance will not suffer as much as it would if all the code were written in JavaScript.
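
To make the STUN part a little more concrete: a client asks a STUN server "what does my address look like from outside the NAT?" and gets its public IP address and port back. The following Python sketch builds such a Binding Request and decodes the XOR-MAPPED-ADDRESS attribute of a response, following the message layout in RFC 5389. It does no network I/O, handles IPv4 only, and the function names and the sample bytes are my own illustration rather than anything taken from a WebRTC implementation.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001       # STUN message type


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request without attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Header: type (2 bytes), body length (2), magic cookie (4), transaction ID (12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def decode_xor_mapped_address(value):
    """Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute."""
    _reserved, family, xport = struct.unpack("!BBH", value[:4])
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    # The port is XORed with the upper 16 bits of the magic cookie,
    # the address with the full 32-bit cookie.
    port = xport ^ (MAGIC_COOKIE >> 16)
    (xaddr,) = struct.unpack("!I", value[4:8])
    ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
    return ip, port


request = build_binding_request(b"\x00" * 12)
# Hand-built sample attribute value that encodes 192.0.2.1, port 32853
ip, port = decode_xor_mapped_address(bytes.fromhex("0001a147e112a643"))
```

The address the server reports is what a peer would then offer to the other side as a candidate for the direct connection.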

So one of the simplest use cases would be to replace native SIP clients with a browser-based WebRTC client that implements its own SIP stack. WebRTC clients can even communicate with native SIP clients over a proxy server if both support a common subset of audio and video codecs. This seems to be the case, with WebRTC supporting the G.711 and G.722 audio codecs that are widely used in the SIP world today.
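
Whether a browser and a native SIP client can talk to each other thus comes down to finding codecs that both list in their SDP. Here is a rough Python sketch of that check, reading the `a=rtpmap` lines of the audio section on each side and keeping what both have. The two SDP snippets are made-up illustrations, not captures from a real browser or phone, and the function names are hypothetical. Note that real endpoints may omit `rtpmap` lines for static payload types, which this sketch does not handle.

```python
def audio_codecs(sdp):
    """Collect codec names from the a=rtpmap lines of the audio section."""
    codecs = []
    in_audio = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # Each m= line starts a new media section
            in_audio = line.startswith("m=audio")
        elif in_audio and line.startswith("a=rtpmap:"):
            # Format: a=rtpmap:<payload type> <name>/<clock rate>[/<channels>]
            _payload, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            codecs.append(encoding.split("/")[0].upper())
    return codecs


def common_codecs(offer_sdp, answer_sdp):
    """Codecs present on both sides, in the offerer's preference order."""
    theirs = set(audio_codecs(answer_sdp))
    return [c for c in audio_codecs(offer_sdp) if c in theirs]


# Illustrative SDP only; PCMU and PCMA are the two G.711 variants.
browser_sdp = """v=0
m=audio 49170 RTP/AVP 103 9 0 8
a=rtpmap:103 ISAC/16000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
m=video 49172 RTP/AVP 96
a=rtpmap:96 VP8/90000
"""

sip_phone_sdp = """v=0
m=audio 5004 RTP/AVP 8 0 9
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:9 G722/8000
"""

shared = common_codecs(browser_sdp, sip_phone_sdp)
print(shared)  # ['G722', 'PCMU', 'PCMA']
```

In this example iSAC drops out because the SIP phone does not offer it, while G.722 and both G.711 variants remain usable.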

This obviously fits into Google's overall (Chrome) strategy to have everything running via a centralized server in the network and in the web browser on user devices. While this is not exactly what I have in mind, given my preference for hosting my own web services at home, the architecture is open, so nothing would prevent me from running my own SIP server at home with an open source WebRTC client and proxy. Having the client run in the browser also means that the client software can be deployed without any hassle. The use case implemented by Ericsson over here gives an insight into what's possible with "just in time" deployed communication clients. At this point, WebRTC breaks with today's technology to offer new and interesting possibilities to explore.

For further insight, have a look here on SIP servlets and here for an HTML5 SIP client (+proxy) implementation with WebRTC.

6 thoughts on “A WebRTC Client as a Skype Alternative”

André Silva says:

May 24, 2013 at 9:43 am

You can’t say that the browser has a full SIP stack in it. That’s false, as SIP is only a signalling protocol and WebRTC is signalling agnostic. The only thing WebRTC provides to SIP is the SDP to send in INVITEs.

There are some projects and specifications to use WebSockets as a protocol to transport SIP, and some stacks are being migrated to JavaScript. The application can use SIP, XMPP, or any other protocol over WebSockets or HTTP.
Martin says:

May 26, 2013 at 10:04 am

Hi André,

Thanks very much for this clarification! I got that important point wrong when I wrote the blog entry and have corrected the post accordingly.

Kind regards,
Martin
Aswath Rao says:

May 27, 2013 at 4:09 am

This is how I am using it: the RPi is running a WebRTC app in a server. I am signed into it from a browser which maintains a WebSocket connection. I give an HTTP URL to you. You visit the URL from a browser. The RPi will authenticate you with OpenID/OAuth and, if you are in my whitelist (it is a dynamic one), it will maintain a WebSocket connection with your browser and will inform me. If I respond, the two of us can continue our conversation.
Kramasundar says:

May 27, 2013 at 7:22 pm

I think G.722 is not part of the standard. G.711 and OPUS are.
Martin says:

May 27, 2013 at 8:49 pm

Hi, according to the FAQ linked above “The currently supported voice codecs are G.711, G.722, iLBC, and iSAC”.
Robert Syouta says:

May 28, 2013 at 11:21 pm

Why incorporate the SIP stack into WebRTC in the first place? I may be missing a lot; however, it would seem to make more sense not to duplicate a SIP stack within WebRTC, on the basis that it should be called like other application stacks and might otherwise lead to conflicts in development.
