Are We Breaking The Internet? – Fast Company
The packetized technology that underlies most of the internet was created by Paul Baran as part of an effort to protect communications by moving from a centralized model of communication to a distributed one. While the Internet Society questions whether the creation of the internet was in direct response to concerns about nuclear threat, it clearly agrees that “later work on Internetting did emphasize robustness and survivability, including the capability to withstand losses of large portions of the underlying networks.”
From there, the foundation was laid for an internet that treated the distributed model as a key component of reliability. Almost 50 years later, consolidation in hosting and mobile, along with the development of the cloud, has concentrated much of the web among a few key players: Amazon, Microsoft, and Google now host a large number of sites across the web. Many of those companies’ customers have opted to host their infrastructure in a single set of data centers, potentially making the web more fragile by re-centralizing large portions of the net.
That’s what happened when Amazon’s S3 service, essentially a large hard drive used by companies like Spotify, Pinterest, Dropbox, Trello, Quora, and many others, lost one of its data centers on Tuesday morning. The problem began around 9:37 a.m. Pacific, the company later explained, after an employee tried to fix a problem with S3’s billing system: “an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers… Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.”
Companies with content stored on those servers, located in Northern Virginia, largely stopped functioning, prompting experts to recommend that companies store data across multiple data centers to increase reliability. The failure rippled across Amazon’s other services, many of which depend upon S3, leading to “increased error rates” for sites that rely on AWS and making engineers’ recovery efforts that much more difficult. Even the webpage Amazon uses to alert customers to outages was affected.
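To make the experts' recommendation concrete, here is a minimal sketch of reading from more than one location, assuming each copy of the data can be reached through its own fetch function. The names `fetch_with_failover`, `make_store`, `region-a`, and `region-b` are hypothetical and do not correspond to any AWS API; the stores are simulated in memory so the example runs on its own.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class AllReplicasFailed(Exception):
    """Raised when no replica could serve the requested object."""


def fetch_with_failover(
    key: str,
    replicas: Iterable[Tuple[str, Callable[[str], bytes]]],
) -> Tuple[str, bytes]:
    """Try each replica in order and return the first successful read.

    Hypothetical helper: `replicas` is a sequence of (name, fetch) pairs,
    where fetch(key) returns the object bytes or raises on failure.
    """
    errors: List[str] = []
    for name, fetch in replicas:
        try:
            return name, fetch(key)
        except Exception as exc:  # a sketch: treat any error as "replica down"
            errors.append(f"{name}: {exc}")
    raise AllReplicasFailed("; ".join(errors))


def make_store(data: Dict[str, bytes], healthy: bool = True) -> Callable[[str], bytes]:
    """Build a simulated storage location (an assumption for illustration)."""
    def fetch(key: str) -> bytes:
        if not healthy:
            raise ConnectionError("region unavailable")
        return data[key]
    return fetch


# The same object is stored in two places; the first location is down.
replicas = [
    ("region-a", make_store({"logo.png": b"image-bytes"}, healthy=False)),
    ("region-b", make_store({"logo.png": b"image-bytes"})),
]

name, body = fetch_with_failover("logo.png", replicas)
print(name)  # region-b
```

The trade-off is cost and complexity: every object has to be written to, and kept consistent across, more than one location, which is one reason many customers settle for a single set of data centers.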
At the other end of the spectrum, services intended to provide reliability in the event of an outage or an attack have been experiencing their own issues. Cloudflare, which provides security and hosting services for thousands of websites, revealed last week that it had discovered a security bug that could leak passwords from the sites of its customers, including companies like Betterment, Medium, Uber, and OkCupid. Thousands of companies were forced to ask their customers to change their passwords and to assess the potential security impact of the leak on their overall infrastructure.
While those issues can only be fixed by the owners of the respective sites, the problem of centralization is also starting to reach the millions of people who rely upon these services directly. People using Google Wifi and Google Chromecast found themselves forced to reinstall their systems last week after a bug wiped out centralized configuration files for many of those devices, forcing them offline for a period of time.
As more people and more devices get connected to the internet, the lure of centralizing control, which makes those devices easier for companies to manage, is colliding with the original design goals of the internet: reliability and scalability. With every new largely centralized system that comes online, the internet becomes more brittle, as centralization increases the number of single points of failure. In a world where hackers are looking for new ways to take down infrastructure, those centralized services must double down on security and reliability if we want the internet to survive.