"It is a very sad thing that nowadays there is so little useless information." Oscar Wilde, 1894.
This is the first blog written for the OOI CI web site. I’m John Orcutt, the PI for the NSF Ocean Observatories Initiative Cyberinfrastructure Implementing Organization. Generally, the program is referred to as the OOI CI and the acronym will be used to reduce the length of the mouthful in the second sentence. The new web site can be reached through the parent organization, the Consortium for Ocean Leadership, or COL. You will find, with time, that the site and the blog will allow you to drill much more deeply into the thousands of pages comprising the requirements, architecture, meeting notes, presentations, change control board actions and much more. The web site will seek to provide a higher-level, much more readable view of what the OOI CI does. The web site and accompanying Facebook page, Twitter and other social networking connections seek to provide information and encourage interaction. Several of us will be writing blogs in this location with the intent of providing a broad overview in plain English (perhaps even good English!) of much of the underlying complexity of the OOI CI goals, construction progress, testing and annual software releases.
Today, I am going to discuss data in broad terms and hint at the capabilities of the OOI CI in this context. You’re all familiar with the stately Library of Congress, the largest modern library in the world. The digitized portion of the books in the LoC is about 10 TeraBytes (10 TB); if all the books were digitized the collection would comprise something like 25 TB. A byte is a collection of eight 1’s and 0’s, which can be used to identify as many as 256 things. Often a byte is used to designate a character such as a or A in the ASCII alphabet. Anyway, 25 TB is 25,000,000,000,000 bytes or, in scientific notation, 25x10^12 bytes. These sizes and terms have a hierarchy:
Kilo = 10^3 = 1,000
Mega = 10^6 = 1,000,000
Giga = 10^9 = 1,000,000,000
Tera = 10^12 = 1,000,000,000,000
Peta = 10^15 = 1,000,000,000,000,000
Exa = 10^18 = 1,000,000,000,000,000,000
Zetta = 10^21 = 1,000,000,000,000,000,000,000
Yotta = 10^24 = 1,000,000,000,000,000,000,000,000
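For readers who like to check such things, here is a minimal Python sketch of the hierarchy. The function name describe_bytes and the output wording are my own illustrative choices, and the prefixes are treated as strict powers of ten, as in the list above.

```python
# Decimal (SI) prefixes as powers of ten, largest first.
PREFIXES = [
    ("Yotta", 24), ("Zetta", 21), ("Exa", 18), ("Peta", 15),
    ("Tera", 12), ("Giga", 9), ("Mega", 6), ("Kilo", 3),
]

def describe_bytes(n_bytes):
    """Express a byte count using the largest prefix that fits."""
    for name, exponent in PREFIXES:
        if n_bytes >= 10 ** exponent:
            return f"{n_bytes / 10 ** exponent:g} {name}bytes"
    return f"{n_bytes:g} bytes"

# Eight binary digits can distinguish 2^8 different things.
values_per_byte = 2 ** 8

# The estimated size of all LoC books, if digitized.
loc_books = describe_bytes(25 * 10 ** 12)

print(values_per_byte)
print(loc_books)
```

The same function can be pointed at any of the larger figures quoted later in this post.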
While still not common, the Internet today can transfer data from place to place at speeds of 10 Gbps or even ten and a hundred times that rate. However, 10 Gbps represents the best commercial off-the-shelf (COTS) fiber optic and switch/router technology. Notice I just changed the notation; 10 Gbps is 10x10^9 bits per second and is only an eighth of the size of 10 GB/s. If we were to use one of these modern networks to transfer all the books at the LoC to La Jolla as an example, this would require 200,000,000,000,000/10,000,000,000 seconds, or 20,000 seconds: about 5 hours and 33 minutes. Establishing a copy of the Library in La Jolla would be fairly inexpensive. I can now buy a TB disk drive for about $100, so the costs of the drives would only be about $2,500--not a very large investment! In addition to books in the LoC there are also something like 120 TB of film, images and video available as well. Transferring this information to La Jolla would take a bit more than a day: 26 hours and 40 minutes--still not a formidable task. While state of the art networks and storage (nevertheless COTS) can achieve this modern miracle, getting all this information and knowledge into the office or home is a much more difficult task today.
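The transfer-time arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation under my own simplifying assumptions: the link runs at its full rated speed the whole time and there is no protocol overhead. The function names are illustrative.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Seconds to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second

def hours_minutes(seconds):
    """Split a duration in seconds into (whole hours, whole minutes)."""
    total_minutes = int(seconds // 60)
    return divmod(total_minutes, 60)

TB = 10 ** 12            # bytes in a terabyte (decimal)
LINK = 10 * 10 ** 9      # a 10 Gbps link, in bits per second

books = hours_minutes(transfer_seconds(25 * TB, LINK))   # all LoC books
media = hours_minutes(transfer_seconds(120 * TB, LINK))  # film, images, video

print(books, media)
```

Under these assumptions the books come to roughly five and a half hours and the other media to a bit more than a day. Real transfers would be slower, since no network delivers its rated speed end to end.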
The federal government has a new National Broadband Policy and has been testing network speeds to and from homes over the past several months. As an aside, if you go to this site, you can also check your own connection rates and the results will be added to the database. The news is not encouraging. The national average network speed to homes is 3.9 Mbps, or about 0.04% of the modern 10 Gbps network. This statistic places the US in 18th place globally, while all of the top ten countries are in Asia and Europe. South Korea is fastest (14.6 Mbps) and Japan’s average is about half this at 7.9 Mbps. The fastest network in the US is in Sandy, UT with an average speed of 32.7 Mbps! Amsterdam recently installed a citywide network operating at 1 Gbps, including houseboats, and Google has a contest to install a 1 Gbps Internet in a few cities in the US to prime the pump. FCC regulations as well as competition will be required, however, to get modern networking the last mile to the home.
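To put the last-mile problem in perspective, the short sketch below compares the home speeds quoted above with a 10 Gbps link and estimates how long a copy of the 25 TB of LoC books would take at each. It assumes, unrealistically, that the average speed is sustained around the clock; the function names are mine.

```python
TEN_GBPS = 10 * 10 ** 9               # bits per second
LOC_BOOKS_BITS = 25 * 10 ** 12 * 8    # 25 TB expressed in bits

# Average home speeds in Mbps, as quoted in the text.
home_speeds_mbps = {
    "US average": 3.9,
    "South Korea": 14.6,
    "Japan": 7.9,
    "Sandy, UT": 32.7,
}

def percent_of_10gbps(mbps):
    """Home speed as a percentage of a 10 Gbps link."""
    return mbps * 10 ** 6 / TEN_GBPS * 100

def days_to_copy_books(mbps):
    """Days to move 25 TB at a sustained speed of mbps megabits per second."""
    return LOC_BOOKS_BITS / (mbps * 10 ** 6) / 86400

for place, mbps in home_speeds_mbps.items():
    print(f"{place}: {percent_of_10gbps(mbps):.3f}% of 10 Gbps, "
          f"{days_to_copy_books(mbps):.0f} days for 25 TB")
```

At the US average the copy that took hours on the fast network stretches to well over a year.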
It can be argued, in terms of industrial competitiveness, that providing such open and clear paths to information, knowledge and wisdom is essential. As an aside, there is a great blog that deals with computing, networking and public policy (Ars Technica). It’s worth a try and even joining; nevertheless you can also find a lot of detractors. Astronomy is seeing an enormous increase in data available for study. The Sloan Sky Survey began in 2000 with the opening of its telescope in New Mexico. In the first few weeks of operation, Sloan accumulated more digital data than had previously been collected in the entire field. Its archive now comprises 140 TB of data (about the size of the LoC). When the new Large Synoptic Survey Telescope begins operation in Chile in 2016, the system will collect that much data in five days! The Large Hadron Collider at CERN, which has been in the news this past year with magnet problems, generates 40 TB/s of data during collisions. The International Data Corporation (IDC) has estimated that 1.2 zettabytes of data in all areas will be created this year alone. At UCSD researchers assert that in 2008, 3.6 zettabytes of data were sent to households in the US in the form of TV and games.
This is a fairly significant per capita consumption of 34 GB/person/day. Peter Lyman at UCB estimated that only six exabytes were generated in 2002. The numbers here are not perfectly consistent, but lead to the conclusion that the amounts of data or information are very large and growing at an extraordinary rate. It’s also encouraging that the US population reads more; TV led to a decrease in reading, but there is good evidence that reading has tripled since 1980 given ready access to huge amounts of information. The Economist published a special section on "The Data Deluge" in February of this year--it’s definitely worth a read. You may have access to the Economist, but if not, you can read it here. Data and information today are nearly synonyms, while knowledge derives from strands of data and information.
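The per capita figure can be sanity-checked with a rough calculation. The population value below is my own assumption (roughly 304 million US residents in 2008) and is not taken from the UCSD study, so the result should only be expected to land in the same range as the quoted number.

```python
ZETTABYTE = 10 ** 21
GIGABYTE = 10 ** 9

data_to_households = 3.6 * ZETTABYTE   # bytes sent to US households in 2008 (from the text)
us_population = 304_000_000            # ASSUMPTION: approximate 2008 US population
days_in_year = 365

gb_per_person_per_day = data_to_households / us_population / days_in_year / GIGABYTE

print(f"{gb_per_person_per_day:.1f} GB/person/day")
```

With this population estimate the result comes out a little below the quoted 34 GB, which is about as consistent as numbers of this kind ever get.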
The OOI CI architecture has been designed to deal not only with data, but also with the knowledge developed from it. As is clear from the previous paragraph, the world is becoming more information rich or, from a different point of view, swamped. However, this growth has a very substantial effect on science and, for that matter, business. WalMart manages more than a million customer transactions each hour and the company is possibly managing nearly three petabytes of data and information. If you look back at the first part of this blog, you can see that this is roughly 120 times as much data as all the books in the world’s largest library (about 25 TB if fully digitized). For an example closer to home (or at least to the OOI), climate is becoming an increasingly important part of our future, including adaptation, mitigation and even engineering. Verifying a climate treaty will require a much more complete and advanced observational system than is in place today.
The OOI is a prototype for how such a network can be built, operated, expanded and maintained, while at the same time providing large quantities of multidisciplinary data, information and knowledge. The Economist article noted: “Revolutions in science have often been preceded by revolutions in measurement” (Sinan Aral, NYU). This is a common refrain in oceanography, but is hardly confined to the field with which we are so familiar. The OOI departs in another way from the history of oceanography in that many scientists and engineers will deposit data in repositories and stream the data in real time. Others will create virtual observatories from data streams of interest or extract files from databases. These harvested data will be used for analysis and the accompanying research, even though these scientists may never see the ocean described by those data.
Such “cubicle science” is anathema in the field of oceanography, but the explosion of data accompanied by access to large-scale computing will change oceanography. If oceanography is to scale beyond the individual or individual’s laboratory to increase the scale and scope of measurements to a level and diversity needed to answer growing scientific and societal problems, change is not only inevitable, but also desirable. That’s a limited introduction. Much more to follow!
"Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" T.S. Eliot.








Knowledge versus Wisdom
We are surrounded today by learned people who know a lot, but are not very wise. So, the Eliot quote may have been accurate for its time, but I doubt it.
A more pragmatic example would be in the area of modern medicine.
We have "learned" how to keep someone alive for a very, very long time, but we have lost sight of the wisdom that tells us that life requires more than a pulse and brainwaves.
Nonetheless, I think this is a great blog piece and I thank the author.
That Wilde quote
I dunno; I have to say that I think that our problem lately is that there IS so much useless information out there. What we now lack is MEANING.