What’s harder: setting up a network connection between two smartphones thousands of miles apart, or between two smartphones in the same room? In 2012 it’s surprisingly difficult for smartphones that are physically close to detect each other’s presence. Bluetooth is a good option, but it can’t be used to pair iPhones with Android phones. NFC is awkward and not widely deployed. BUMP has a smart solution based on GPS and accelerometers, but many users are concerned about location privacy. So we set out to develop our own solution.

When starting this project, our main requirements were:
On the surface it might seem tricky to get smartphones to detect each other, but in fact very little information needs to be exchanged to accomplish the pairing. It is sufficient to bootstrap the connection by exchanging a small “token”. Once this token has been received, traditional TCP sockets can be used to finalize the pairing.
How do you exchange a token between the phones? Due to limitations in the iPhone and Android APIs, Bluetooth is ruled out, and NFC is only available on a few Android phones. When thinking about I/O on modern smartphones it’s easy to overlook the audio system, but it can carry arbitrary data, not just voice and music. Square’s credit card reader uses the audio system to communicate with their app running on the iPhone. The audio system has clear benefits as a carrier for digital communication: third-party developers have reasonable access to the audio hardware, and every phone has a speaker and a microphone.
Just as the original, acoustically-coupled modems in the 60s and 70s used sound waves as the intermediary between the telephone company’s line and the computer itself, we can wirelessly transmit our digital pairing token via analog sound waves. Unlike those noisy modems of our collective childhood, our system is effectively silent.

The analog telephone system was designed to transmit human voice. Since bandwidth was at a premium, the telephone line was band-filtered to 300-3400 Hz, covering just enough of the human voice spectrum that speech remained intelligible. The original modems had to operate within this spectrum, which means that they were forced to modulate the data at frequencies that were audible to humans(1). The sound that you would hear when dialing up to a BBS or ISP was the sound of the connection handshake protocol and data exchange.
While many people who grew up with computers fondly recall the sound of a dial-up modem, such clicks, screeches and tones coming from their new iPhone would cause concern. So if we are going to use sound waves to carry our token, we had better make sure that it is silent. The limit of human hearing varies with age and degree of hearing damage, but anything above 19 kHz is either inaudible or barely audible. This establishes the lower frequency bound for our system.
The upper frequency bound is determined by the maximum sample rate of the smartphone’s DAC (digital-to-analog converter). In nearly all cases, this is 44.1 kHz. The Nyquist sampling theorem says that an analog signal containing only frequencies below half the sample rate can be discretely sampled without any loss of information. This establishes the theoretical upper frequency bound at 22.05 kHz. However, this is not achievable in practice because audio systems have low-pass filters that begin at roughly 20 kHz to prevent aliasing of frequencies that exceed 22.05 kHz. So the actual upper frequency bound is more like 20 kHz. This 19-20 kHz range gives us a bandwidth of 1 kHz in which to encode a silent signal.
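To make the arithmetic above concrete, here is a small sketch that works out the Nyquist limit, the usable band, and how many FFT bins fall inside it. The sample rate and the 19-20 kHz bounds come from the text; the FFT size of 4096 is purely an illustrative assumption and not necessarily what our app uses.

```python
import math

SAMPLE_RATE_HZ = 44_100   # typical smartphone sample rate (from the text)
LOWER_BOUND_HZ = 19_000   # roughly the edge of human hearing (from the text)
UPPER_BOUND_HZ = 20_000   # where the anti-aliasing filters begin (from the text)
FFT_SIZE = 4096           # ASSUMPTION: illustrative value only

# Nyquist limit: half the sample rate.
nyquist_hz = SAMPLE_RATE_HZ / 2

# Width of the silent band we can actually use.
bandwidth_hz = UPPER_BOUND_HZ - LOWER_BOUND_HZ

# Each FFT bin covers sample_rate / fft_size hertz.
bin_width_hz = SAMPLE_RATE_HZ / FFT_SIZE

# First and last bins whose center frequency lies inside the band.
first_bin = math.ceil(LOWER_BOUND_HZ / bin_width_hz)
last_bin = math.floor(UPPER_BOUND_HZ / bin_width_hz)
usable_bins = last_bin - first_bin + 1

# Time needed to collect one FFT frame of audio.
fill_time_ms = 1000 * FFT_SIZE / SAMPLE_RATE_HZ

print(f"Nyquist limit: {nyquist_hz} Hz")
print(f"Usable band:   {bandwidth_hz} Hz wide")
print(f"Bin width:     {bin_width_hz:.2f} Hz")
print(f"Usable bins:   {usable_bins} (bins {first_bin}..{last_bin})")
print(f"Frame time:    {fill_time_ms:.1f} ms")
```

With these assumed numbers, a 1 kHz band still leaves dozens of distinct frequency slots, which is plenty for a short token.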
Our earliest attempts to transmit digital data over sound waves used traditional modem techniques such as phase shift keying. These techniques allowed us to send arbitrary amounts of data, but in the end they were just too sensitive to acoustic noise and device movement. Since then we have settled on a simpler technique where the pairing token is encoded in the frequency domain, much like the way a unique identifier is encoded in the barcode on a can of tomato soup.

The token’s bit pattern is encoded as a series of near-ultrasonic tones at fixed frequency intervals, each of which is either present or absent depending on whether the corresponding bit is a 1 or a 0. This scheme can be efficiently encoded using the inverse FFT and decoded using the FFT and a thresholding algorithm. The whole token exchange can occur in as little time as it takes to fill the FFT. The amount of buffering imposed by the audio system varies by OS, but our technology can transmit and receive a token in ~400 ms.
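The presence/absence scheme described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our production code: the FFT size, bin layout, 16-bit token width, and the always-on pilot tone used as a threshold reference are all hypothetical choices. To keep the sketch dependency-free, it sums cosines directly (equivalent to an inverse DFT with only a few nonzero bins) and evaluates individual DFT bins directly instead of running a full FFT.

```python
import cmath
import math

SAMPLE_RATE_HZ = 44_100
FFT_SIZE = 4096        # ASSUMPTION: frame length in samples
FIRST_BIN = 1765       # first bin above 19 kHz for this frame length
BIN_STEP = 5           # ASSUMPTION: spacing between neighboring bit tones
TOKEN_BITS = 16        # ASSUMPTION: token width
PILOT_BIN = FIRST_BIN  # ASSUMPTION: always-on tone used as a level reference


def bit_bin(i):
    """DFT bin that carries bit i of the token."""
    return FIRST_BIN + BIN_STEP * (i + 1)


def encode_token(token):
    """Return one frame of samples: a tone is present for every 1 bit."""
    active = [PILOT_BIN]
    active += [bit_bin(i) for i in range(TOKEN_BITS) if (token >> i) & 1]
    scale = 1.0 / (TOKEN_BITS + 1)  # keeps the summed signal within [-1, 1]
    return [
        scale * sum(math.cos(2 * math.pi * k * n / FFT_SIZE) for k in active)
        for n in range(FFT_SIZE)
    ]


def bin_magnitude(samples, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n_samples = len(samples)
    acc = sum(
        x * cmath.exp(-2j * math.pi * k * n / n_samples)
        for n, x in enumerate(samples)
    )
    return abs(acc)


def decode_token(samples):
    """Threshold each bit bin against half the pilot tone's magnitude."""
    threshold = 0.5 * bin_magnitude(samples, PILOT_BIN)
    token = 0
    for i in range(TOKEN_BITS):
        if bin_magnitude(samples, bit_bin(i)) > threshold:
            token |= 1 << i
    return token


frame = encode_token(0xBEEF)
print(hex(decode_token(frame)))
```

In a real deployment the received frame would not be aligned with the transmitted one and would contain room noise, so the thresholding step needs to be considerably more forgiving than this clean round trip suggests.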
Once the token has been received, the pairing process moves over to regular TCP sockets. From there the devices are connected in a “Circle” and the application can freely send messages between the devices.
This technology is currently shipping in our PhotoCircle app for iPhone and Android. In a future post, I’ll discuss lessons learned doing real-time mobile networking. In the meantime, feel free to send me any questions or discuss at HackerNews.
Keith Lazuka
keith@circle38.com
Footnotes:
(1) Dial-up modems would have been silent if it were not for a historical accident. When the first modems were built in the 1960s, engineers had to work around laws that prevented third-party devices from being directly connected to a phone line. The solution was to acoustically couple the modem to the phone system over a short distance. After these laws were overruled in the 1980s, modems could be directly, electrically coupled to the phone line. But the modem retained its speaker so that the computer user could monitor the familiar sounds during the connection process.
We’re excited to announce the launch of PhotoCircle. It’s been a busy few months for the team. We want to thank our friends and family for their incredible support.
More updates coming soon on this blog. In the meantime, download PhotoCircle, get together with friends, and take some pictures.