Many Internet-of-Things (IoT) devices are real-world objects like appliances and thermostats, and therefore network security should be a paramount concern for vendors of IoT systems. Nothing erodes trust faster than real-world and personal consequences: Imagine the headlines if your refrigerator stopped working because of a software bug!
There are various layers of security to consider for an IoT product, from base-level traffic encryption to user- and device-level authentication. In this post I will focus on network-level security.
A typical IoT device resides on a private network, behind a firewall. This architecture presents both benefits and challenges. On one hand, it reduces the device’s attack surface to the firewall and Wi-Fi network itself and, right or wrong, defers some security responsibility to the network owner. Delegating security this way helps keep memory and CPU costs down on the IoT device. On the other hand, the firewall frustrates attempts by a cloud server to connect directly to IoT devices in order to control them from a mobile application.
Some techniques exist that can help bypass a firewall and allow inbound connections from the cloud to the IoT device. However, many of these techniques exploit loopholes that may not be allowed on every network, and this architecture also makes it harder to design a truly secure IoT device. By far the easiest, most secure, and most reliable method of bypassing a firewall is for the IoT device to follow the “outbound connections only” rule: All connections should originate from the device to the cloud, and then the device should keep that connection open. This approach is easier and more secure because it doesn’t rely on opening ports in the firewall, or on techniques that trick the firewall into routing traffic, such as Session Traversal Utilities for NAT (STUN).
Once connectivity is established, we have to choose how to use the connection efficiently, balancing responsiveness against resource consumption. HTTP long polling is one possible “inside out” standards-based technique for two-way communication (Figure 1).
In the long polling model, the IoT device makes an HTTP request to the server. The server delays its response to the HTTP request until either a prescribed time period has passed, or until it has a command or request for the IoT device. When a command is pending, the server immediately responds with the command, and the IoT client processes it. The IoT client then immediately makes another HTTP request of the server and begins the cycle over. If the device needs to post the results of a command back to the server, it does so in the body of the next long poll request.
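As a rough illustration of the cycle described above, here is a minimal Python sketch of the device side. The transport is passed in as a function so the loop stays independent of any particular HTTP library. The names long_poll_loop, send_request, and handle_command, the JSON message shape, and the max_cycles bound are all assumptions made for this example; a real device would typically loop indefinitely.

```python
import json
from typing import Callable, Optional

# Hypothetical transport: takes the request body and returns the response
# body. An empty response means the server's hold time expired with no
# command pending.
Transport = Callable[[bytes], bytes]


def long_poll_loop(send_request: Transport,
                   handle_command: Callable[[dict], dict],
                   max_cycles: int) -> int:
    """Run up to max_cycles long-poll cycles; return how many commands ran."""
    pending_result: Optional[dict] = None
    handled = 0
    for _ in range(max_cycles):
        # The result of the previous command rides in the body of this poll.
        body = json.dumps({"result": pending_result}).encode("utf-8")
        pending_result = None
        response = send_request(body)  # blocks until a command or a timeout
        if not response:
            continue  # timeout with no command: poll again immediately
        command = json.loads(response.decode("utf-8"))
        pending_result = handle_command(command)
        handled += 1
    # Note: a result produced in the final cycle is not sent, because the
    # loop is artificially bounded for the sake of the example.
    return handled
```

Because the transport is injected, the same loop can be exercised with a fake function in a unit test and later wired to a real HTTPS client.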
There are several benefits to this approach. From the perspective of the IoT device, HTTP is simple and lightweight, and many support libraries exist for it. HTTP is also routable across virtually the entire Internet, and it includes support for the persistent connections that long polling requires.
HTTP also allows for arbitrary payloads, allowing you flexibility in your choice of device-specific messaging. We have successfully used JSON/REST-like semantics over long polling, as well as Google Protocol Buffer messages.
From the perspective of the cloud server, most frameworks are designed to process HTTP requests, allowing server developers freedom to use existing frameworks and libraries to speed development.
HTTP long polling also allows the server to control bandwidth utilization by simply implementing longer response delays. Furthermore, by adding a simple “poll timer” that forces the IoT client to delay its HTTP requests, you can easily move to a true polling model, in which you sacrifice response time in exchange for lower resource utilization and the opportunity to run on battery power.
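The two server-side knobs described here, the response delay and the poll timer, can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name hold_for_command, the use of an in-memory queue for pending commands, and the reply fields "command" and "poll_delay" are all hypothetical.

```python
import queue
from typing import Optional


def hold_for_command(pending: queue.Queue,
                     hold_seconds: float,
                     poll_delay_seconds: float = 0.0) -> dict:
    """Build the reply to one long-poll request.

    hold_seconds is how long the server sits on the request when no command
    is waiting; raising it means fewer empty round trips. poll_delay_seconds
    is the "poll timer": how long the client is asked to wait before it
    issues its next request (0 means poll again immediately).
    """
    try:
        command: Optional[dict] = pending.get(timeout=hold_seconds)
    except queue.Empty:
        command = None  # hold time expired with nothing to send
    return {"command": command, "poll_delay": poll_delay_seconds}
```

On the device, honoring the timer would amount to sleeping for the "poll_delay" value before sending the next request. With a zero delay the scheme behaves as long polling; with a large one it degrades gracefully into ordinary polling.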
WebSockets are another viable alternative, and are very similar to HTTP long polling. WebSockets can be far more efficient because, once the connection is established, messages no longer carry HTTP headers. WebSockets also offer real-time device-to-server notifications, which some applications may require. To get real-time notifications with the long polling model, a separate socket must be used to preserve the server-controlled long poll channel.
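To put a rough number on the efficiency difference, the sketch below computes the framing bytes that the WebSocket protocol (RFC 6455) adds to a single unfragmented message and compares them with a small set of HTTP request headers. The function name, the sample headers, and the host iot.example.com are made up for illustration; real requests often carry considerably more header data.

```python
def ws_frame_overhead(payload_len: int, masked: bool) -> int:
    """Framing bytes added to one unfragmented WebSocket message (RFC 6455)."""
    if payload_len < 0:
        raise ValueError("payload_len must be non-negative")
    overhead = 2  # FIN/opcode byte plus mask-bit/7-bit-length byte
    if payload_len > 65535:
        overhead += 8  # 64-bit extended payload length
    elif payload_len > 125:
        overhead += 2  # 16-bit extended payload length
    if masked:
        overhead += 4  # masking key, required on client-to-server frames
    return overhead


# A deliberately small, hypothetical set of request headers for one poll.
SAMPLE_HTTP_HEADERS = (
    "POST /poll HTTP/1.1\r\n"
    "Host: iot.example.com\r\n"
    "Content-Type: application/json\r\n"
    "Content-Length: 48\r\n"
    "\r\n"
)

if __name__ == "__main__":
    print("WebSocket framing bytes:", ws_frame_overhead(48, masked=True))
    print("HTTP header bytes:", len(SAMPLE_HTTP_HEADERS.encode("ascii")))
```

The comparison covers only per-message overhead. A WebSocket connection still begins with an HTTP handshake, so the savings accrue over the life of the connection rather than on the first exchange.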
Unfortunately, WebSockets are not supported by all server frameworks, and in rare instances network routing can be an issue. At some point in the near future WebSockets may be a better choice, but for the moment I’d avoid them unless you’re sure your target environment will support them.
Underpinning both of these techniques is a Transmission Control Protocol (TCP) socket. Any sensible security model encrypts traffic across this socket connection. Despite recent bad press, Secure Sockets Layer (SSL) or Transport Layer Security (TLS) remains the best choice as an encryption layer. SSL is heavily scrutinized, widely supported, and well understood.
Thinking about inventing your own protocol? Given recent security flaws found in SSL implementations from major companies and open source libraries, such as the now infamous “Heartbleed” defect, it should be readily apparent how hard it is to develop a truly secure protocol, let alone implement it correctly. So think twice before attempting your own security scheme.
Unfortunately, SSL is an inherently heavy protocol, requiring a nontrivial handshake during connection setup. This overhead further justifies using a persistent connection for all communication, and thankfully HTTPS long polling (in conjunction with HTTP keep-alive, so that the underlying TCP socket isn’t re-opened with each poll) fits the bill.
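Here is a minimal sketch of what paying for the handshake once and then reusing the connection might look like, using Python’s standard ssl and http.client modules. The class name PollChannel, the 90-second timeout, and the use of POST are assumptions for illustration; the timeout simply needs to exceed whatever hold time the server uses.

```python
import http.client
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client context that verifies the server's certificate and hostname."""
    return ssl.create_default_context()


class PollChannel:
    """One HTTPS connection reused for every long-poll request."""

    def __init__(self, host: str, context: ssl.SSLContext,
                 timeout: float = 90.0) -> None:
        # No network traffic happens here; the socket (and the TLS
        # handshake) is opened lazily by the first request.
        self._conn = http.client.HTTPSConnection(
            host, context=context, timeout=timeout)

    def poll(self, path: str, body: bytes) -> bytes:
        # HTTP/1.1 connections are persistent by default; the explicit
        # header states the intent for any intermediary that expects it.
        self._conn.request("POST", path, body=body,
                           headers={"Connection": "keep-alive"})
        response = self._conn.getresponse()
        # The body must be read in full before the socket can be reused.
        return response.read()

    def close(self) -> None:
        self._conn.close()
```

Each call to poll() then travels over the same socket, so the handshake cost is incurred only when the connection is first opened or has to be re-established after a drop.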
SSL can also consume significant code space and runtime memory. For a resource-rich cloud server, the SSL resource overhead isn’t much of a concern, but it can be worrisome for a low-cost IoT device, where memory and CPU power come at a premium. When considering an SSL library for an IoT device, evaluate which aspects of SSL you actually need and check whether your chosen library lets you disable the features you’re not using. You should also see if your target hardware can help lighten the load. SSL relies on Advanced Encryption Standard (AES), Secure Hash Algorithm (SHA), and other standard cryptographic algorithms, and many hardware platforms provide specialized crypto hardware to offload these computations from the CPU.
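Embedded SSL libraries usually expose this kind of trimming as build-time options, which vary by library. As a stand-in, the Python sketch below shows the same idea at runtime: restricting a client to a recent protocol version and to AES-GCM cipher suites, the kind of algorithm a crypto accelerator is likely to cover. The function names are hypothetical, and the cipher string is only an example policy, not a recommendation for any particular device.

```python
import ssl
from typing import List


def make_trimmed_context() -> ssl.SSLContext:
    """Client context limited to TLS 1.2+ and ECDHE with AES-GCM."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # OpenSSL cipher-list syntax. This governs TLS 1.2 suites only; the
    # TLS 1.3 suites are configured separately by the underlying library.
    context.set_ciphers("ECDHE+AESGCM")
    return context


def tls12_cipher_names(context: ssl.SSLContext) -> List[str]:
    """Names of the TLS 1.2 cipher suites the context will offer."""
    return [c["name"] for c in context.get_ciphers()
            if c["protocol"] == "TLSv1.2"]


if __name__ == "__main__":
    for name in tls12_cipher_names(make_trimmed_context()):
        print(name)
```

Whatever subset the device offers must also be enabled on the cloud server, so a policy like this needs to be agreed on both ends of the connection.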
In the end, the approach becomes a game of balancing resources and understanding your security requirements. Careful thought and planning up front can save a lot of headaches down the road.
Matt Osminer is a partner and Director of Engineering at Cardinal Peak, an engineering services firm that specializes in developing embedded devices, mobile apps, and digital video products. In his role at Cardinal Peak, Osminer has overseen the development of several Internet-of-Things systems, including developing the connected device, the cloud server, and mobile apps. Prior to Cardinal Peak, Osminer was a director of software engineering at Time Warner Cable, where he was responsible for the development of numerous apps for cable television platforms. Earlier in his career he served as an embedded software engineer at several startups.