Costing a packet…
You are probably a hermit, or have recently been on an extended holiday to a deserted tropical
island, if you haven’t heard of the whistle-blower Edward Snowden and the infamous PRISM
PowerPoint presentation. On 9th June 2013, Snowden revealed his identity on the website of the
Guardian, the UK-based daily newspaper. As a direct consequence, the next day the Guardian
website received the largest number of visitors in its history – 6.97 million. This also set another
record for the Guardian: for the first time, more people visited its website from the US than from
the UK. Edward Snowden, an American senior security analyst, revealed one of
the greatest state secrets – that the US, with the connivance of the UK, is capturing and analysing
Internet traffic. Snowden, now holed up in a Russian airport and seeking refuge, also revealed that
Internet companies like Google, Facebook, Yahoo, and Microsoft actively facilitate this data
collection and help by providing decryption keys for their data traffic. While it’s long been accepted
in international political circles that China has been monitoring Internet traffic for some years, the
shock of Snowden’s revelations was caused by the duration, the scale and the technological
capability of the American and British effort.
This article explains how this apparently amazing logistical feat is possible. Think about it for a
moment: everything you see in your Internet browser has been broken into small packets of data.
Any website server (like this one) will send the packets that make up this specific web page across
the network using Internet protocols. These packets travel by many different routes before being
reassembled by the networking software on your computer and displayed by your browser as the
text and images you are seeing at the moment. So, if you are an insatiably curious megalomaniac, or a
government, how on earth can you intercept these packets of data as they travel across the
Internet on their many diverse routes towards their final destination, someone’s computer screen?
Look at the map on the slide above – the grey landmass of south-west Britain, almost surrounded
by the white ocean – and you might guess what makes Internet data traffic-capture possible.
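The idea of packets sketched above can be illustrated in a few lines of Python. This is a toy illustration, not the real IP or TCP machinery – real packets carry headers, checksums and routing information – but the principle that numbered chunks can arrive out of order and still be put back together is the same:

```python
import random

def packetize(message: bytes, size: int) -> list:
    """Split a message into numbered chunks (a toy stand-in for packetisation)."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets: list) -> bytes:
    """Sort by sequence number and rejoin the chunks, as the receiving computer does."""
    return b"".join(chunk for _, chunk in sorted(packets))

page = b"<html>Hello, packet-switched world!</html>"
packets = packetize(page, 8)
random.shuffle(packets)  # different routes mean packets can arrive in any order
assert reassemble(packets) == page
```

The sequence numbers are doing the work here: however scrambled the arrival order, sorting restores the original page – which is also why anyone copying the packets mid-route can reconstruct the traffic just as easily as the intended recipient.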
As you probably know, the origin of the Internet goes back to the need for military communications
in the event of a nuclear war. In the mid-1960s a huge amount of publicity was given to the fact
that even if parts of the distributed network were destroyed in a conflict, the data packets would
automatically be re-routed along another path to reach their ultimate destination. “The messages
would still get through.” We owe much of this clever capability to the creative electrical
engineer, Paul Baran, who began working in the computer science department at the Pentagon
defence consultancy Rand Corporation in 1959. Baran’s diligence and imagination meant he was
often ahead of his time. In the early 1960s America’s telephone system used vulnerable direct wire
connections between phones, and Baran foresaw that a digital distributed network, with packet
switching and redundant routes, would be far more robust, as well as much more secure. Years
later he explained how impossible it was in 1964 to convince the engineers at AT&T of the
advantages of the technical underpinnings of such a system. The company held a monopoly on the
American phone structure and remained obdurately resistant to changing from analogue direct
connection to digital distributed connections. It wasn’t until the paranoid pressures of the Cold
War increased in the late 1960s that military insistence forced the change on AT&T. But both this
changeover and the network built by the American Defense Department’s Advanced Research Projects
Agency – the ARPANET – were based on Baran’s brilliant concept. The ARPANET was the embryo which grew and
eventually developed into the “network of networks” we now call the Internet, and packet switching
still lies at the heart of the network.
Way back in the mid-1960s Baran had already foreseen that the weak points of a digitally distributed
network are its edges – and it is these edges that allow the NSA (National Security Agency) to
monitor and capture vast amounts of Internet traffic. Baran realised that the strength and
robustness of a distributed network lie in the number of connections within the network. The
greater the number of connections, the higher the degree of resilience if connections are destroyed
in a nuclear attack. Conversely, because the edges of a distributed network have the fewest
connections, they are its most vulnerable points.
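Baran’s point can be demonstrated with a small sketch. The five-node topology below is invented for the example: a meshed core (A–D) shrugs off a broken link, while the single link out to edge node E is a fatal weak point:

```python
from collections import deque

def connected(adj, a, b, removed=frozenset()):
    """Breadth-first search: is b still reachable from a after the 'removed' links fail?"""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen and frozenset((node, nxt)) not in removed:
                seen.add(nxt)
                queue.append(nxt)
    return False

# A densely meshed core (A-D) with a single link out to edge node E.
adj = {"A": ["B", "C", "D"], "B": ["A", "C", "D"],
       "C": ["A", "B", "D"], "D": ["A", "B", "C", "E"], "E": ["D"]}

# Losing one core link: traffic simply re-routes, A still reaches C.
assert connected(adj, "A", "C", removed={frozenset(("A", "C"))})
# Losing the single edge link: E is cut off entirely.
assert not connected(adj, "A", "E", removed={frozenset(("D", "E"))})
```

The core tolerates failures because there are redundant paths; the edge fails because there is exactly one. That asymmetry is the whole of Baran’s observation – and, as the article argues, the whole of the NSA’s opportunity.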
Now look at the map again and you can see why UK involvement in Internet data collection is vital.
Those undersea cables beneath the Atlantic Ocean link all the networks within Europe to America,
and the majority of them come ashore in the peaceful English counties of Cornwall and Devon. So
all the Internet data traffic between Europe and America has to pass through a handful of nodes
(connection points), linked to a very small number of undersea cables. These are the vulnerable
edges of the distributed network. If the data had to pass through many more nodes it would be
extremely difficult to capture. As it is, all the transatlantic data going both ways through a handful
of nodes makes collection a comparatively simple task. The Internet has natural and artificial
choke points: these cable-landing nodes are natural choke points, while China, in order to
facilitate comprehensive data gathering, forces all its Internet traffic through four nodes –
artificial choke points.
These thoughts sprang to mind because I recently met someone who had been responsible for
much of the wiring of the telephone infrastructure for the UK. He reminded me that however
much we think the Internet is digitally sophisticated, it still relies on cables running under the
ground or under the sea. I was told how some of the buried cables are now physically impossible
to get at in order to repair them, and how frequently the decision has to be made to re-run the
cable at great expense. As you read this article the text and images you are receiving in your
browser are almost certainly travelling the bulk of their journey to you via undersea and
underground cables. As the cables emerge from the briny they are gathered into nodes or points
where the precious data packets are transferred to a distributed network of hundreds, thousands,
and eventually millions of connections which make up the Internet in the UK. But it is during those
susceptible nanoseconds in the nodes where the cables emerge from the sea that the NSA makes
a copy of everything for later inspection and analysis. Presumably, as a reward for hosting the
interception, the UK gets its share of the data, which is then routed to GCHQ in Cheltenham (the
UK’s Government Communications Headquarters, marked in red on the map), where some of the
fastest super-computers in the UK go to work interpreting and analysing.
It has long been public knowledge that the NSA has a backdoor into the Windows operating system.
Although Microsoft has always denied it, Edward Snowden’s information shows that these denials
have been false. Until Snowden made his top-secret information public, only people involved in the
NSA were aware of the vast scale of the data capture. Although there have been many confused
rumbles about the privacy implications of this data collection, few people appear to be aware or
concerned about the enormous cost of the PRISM data gathering operation. The storage and
processing of colossal, Internet-sized, amounts of data for analysis is not a cheap operation.
Set against this cost, the privacy implications are rather trivial. The real, and perhaps the only, limitation
on Internet data collection is the vast amount of money needed to run the operation on the scale
of the Anglo-American effort. In a time of severe economic constraint in both countries, the
long-suffering taxpayers clearly have no idea of the true cost of such an operation.
Just think about it: harvesting Internet-scale data as Google does so effectively is highly expensive –
the company is currently spending over a billion dollars a quarter on infrastructure alone – so
how much more does capturing, storing and analysing all Euro-American Internet traffic
cost? More significantly, what is it going to cost in the next four years as Internet traffic grows from
523 exabytes in 2012 to 1.4 zettabytes in 2017? Despite its extraordinary commercial success, at
the moment Google can still only afford to store a truncated cache of partial website data stripped
down to basic keywords. The NSA’s unimaginably huge cache of data probably uses a similar
architecture, and I think Google may well be involved in providing the necessary software
technology. Could it be that the tangible cost of the secret involvement of Google and other large
American Internet companies with the NSA is the reason they don’t pay very much tax anywhere?
Perhaps this is a hidden charge which citizens in the US and UK unwittingly pay to keep these
powerful companies involved in PRISM? Whatever the truth of the matter, as the volume of
Internet data continues to escalate at an incredible pace, the expense of capturing and analysing
that data will also escalate alarmingly. The true extent of these costs is one state secret that even
the US and UK governments don’t really know – so, for the present at least, it is a secret that
remains safe.
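Putting rough numbers on that escalation: the traffic figures quoted earlier imply a compound annual growth rate of just over a fifth, and – on the simplifying assumption that interception costs scale with traffic volume – a near-tripling of the bill over the five years:

```python
# Traffic figures quoted in the article: 523 exabytes (2012) -> 1.4 zettabytes (2017).
start_eb, end_eb, years = 523, 1400, 5

# Compound annual growth rate of global Internet traffic over the period.
growth = (end_eb / start_eb) ** (1 / years) - 1
print(f"Traffic grows roughly {growth:.1%} per year")

# Assumption for illustration: storage and analysis costs scale with traffic.
cost_multiplier = end_eb / start_eb
print(f"The storage/analysis bill multiplies by about {cost_multiplier:.1f}x")
```

Roughly 22% compound growth per year, and a bill around 2.7 times larger by the end of the period – before allowing for any growth in the sophistication of the analysis itself.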