Here is a FAQ that came to us from a recent community workshop:
Q. How is CI pulling together OOI data and other data like IOOS, NDBC, and other sources?
A. We designed the OOI Cyberinfrastructure to manage and integrate data from many different sources, including OOI sensors, select ocean community models, and external observatories. We have been specifically funded to integrate data from three external observatories: the Integrated Ocean Observing System (IOOS), the World Meteorological Organization (WMO), and NEPTUNE Canada. The integration project schedules are staggered; we have started working on IOOS integration already, with NEPTUNE Canada integration starting in the middle of 2011. Other observatories could be added through new user requirements or additional funding from other sources. Technical details are described on the OOI CI architecture and development web pages as development proceeds.
We're cruising along in Cyberinfrastructure land, and there are always more things to do than we can keep up with. This is the last week of our 3rd and last 8-week iteration in the Elaboration phase of the first release of the project. What this means in plain English is it's the first time we are supposed to be able to show a solid base of working code at a public review.
But meanwhile, we really want to get the new web site up and running, and in front of you, the reader.
So I'm putting in some time to help the web team get the project out the door. Might be tomorrow, might be Monday, but you'll be seeing it soon.
You'll probably see some things that don't work right—well, you might see some things that don't work right, we've gone over it a fair bit—and we'll be fixing those in the coming days. Meanwhile, bear with us as we pull together the trifecta delivery in the next two weeks, then we'll be back with more to see and do.
Happy reading!
My first job at Scripps was as a shipboard computer technician taking care of computer needs, sonar operations, communications systems, and oftentimes deck work on the large SIO research vessels. I quickly learned that a slightly different breed of scientist and tech is found on ships. The job of these sea-going personnel usually involves deploying very expensive, usually custom one-off equipment, often in the vicinity of the sea floor. As equipment descends hundreds or thousands of meters, the pressure on the instrument package ratchets up, the salt water begins to wear on the materials, and the creatures of the sea start to take notice of the potential habitat. Throughout this process, there are plenty of opportunities for something to go wrong. Cables may part, housings may implode, electronics may short, acoustic releases may not work, gear may be installed wrong, O-rings may leak, you name it. And when something does go wrong, you can't just open it up to fix it right then and there.
The successful scientists and techs are the ones who go through a deployment with a deep calmness. The relaxation is all about knowing that they have built a robust, deployable package and tested it thoroughly. When something doesn't make sense during development, it is investigated until understood. High-quality parts are used. Backup plans are made. The what-if scenarios are thoughtfully considered and plans made for each situation. That's the idea, anyway.
It was with this intent that our team of people from various SIO projects assembled a seismometer to attach to the MARS cabled observatory at the Monterey Bay Aquarium Research Institute (MBARI). The parts were robust and well tested on other projects in the past. The gear was assembled, the software was checked and rechecked, configurations were fine tuned, and the requirements were reviewed. The package consisted of a sensor ball to measure movement, an electronics cylinder to house the data logger and networking equipment, and a connector to plug into the MARS switch. The whole package was put on a frame and designed for easy deployment with an ROV.
When the package was ready to go, we did a wet test in the MBARI test tank. With our seismometer's need for precise and accurate timing, we spent the first day verifying that the timing signals we were expecting during the sea deployment were going to work with our gear. The test tank connection offered a simulation of those signals so that we could be sure the seismometer could lock up with the correct high precision time. When we were finally convinced that what we had worked with the simulated signals, we went home and planned to deploy to the sea floor in a couple of weeks. We could only hope that the simulator timing signals had the same characteristics as the real MARS signals...
Deployment day rolled around and our team headed out to sea. With the help of the Point Lobos crew, our package (complete with carefully calculated amounts of float packs attached) was deployed to the sea floor. Chasing it was the ROV Ventana. When everything was safely at the bottom, operations commenced to connect our package to the MARS node. Gear was shuffled about by the ROV, cables were plugged in carefully... and it didn't work due to an electrical short. Despite the testing and attention to detail during the development of the gear, we had some bad luck. After much theorizing and testing what we could from the end of the cable, the seas started picking up and we had to leave the site before we could recover the gear for further work. With our package in the ocean, we would have to come back in a few days to recover it and investigate. Thankfully we were able to retrieve the instrument and pressure housing a couple of days later. With the equipment back on shore, we found the problem, corrected it, and scheduled another deployment.
During the next deployment, everything went safely and smoothly to get the equipment on the sea floor. The gear was successfully plugged into the MARS node and powered up. The sensor and electronics all checked out okay and started sending data home. However, the timing was off. Is the issue in the electronics inside our package? Maybe the software running down on the sea floor? The timing pulse through the MARS node? Does it even exist? Is it the same as the simulator? Theories abound, tests have been done, and changes have been tried. There is still more work to be done to get synchronized seismic data from the sea floor. The next opportunity to investigate the connection will hopefully be soon.
-Steve Foley
The MARS seismometer project is funded by the NSF as a testbed for the OOI CI. "ITR: Collaborative Research: Looking Ahead: Designing the Next Generation Cyber-infrastructure to Operate Interactive Ocean Observatories," or "LOOKING." NSF award 0427974.
It was a busy week on the OOI CI "main campus" at UC San Diego. All the design leads and senior developers were on the scene, coming together for the first time to discuss the system we are about to build, and plan our next team activity in late June. Later in the week, the entire senior management of the whole OOI met for two solid days (!) to prepare for the National Science Foundation annual review, which will be held in … the same week in late June.
These double-bookings are inevitable on a project with hundreds of participants and a budget of several hundred million dollars. Technical coordination, program coordination, and letting our funders make sure we're doing the 'right things right' means we'll just have to get used to meetings and meeting scheduling issues.
What struck me this week is the quality of the team, and the quality of the collaboration, that is being created.
I remember some of the first meetings between the teams: marine scientists and computer scientists from different institutions and backgrounds, trying to find ways to work together. Tensions ran high on more than a few occasions. You can still see tension, a bit, but the camaraderie and collaborative spirit seem to grow by the week, and I can sense that the excitement of putting together this kind of observatory is starting to take over.
Meanwhile, in our Cyberinfrastructure team, we've gone from desperately understaffed to noticeably understaffed, a considerable improvement. (We started this project sooner than was planned, thanks to stimulus funding, and we're still scrambling to catch up.) With all of the key development systems represented by at least one senior developer, the pace has jumped plenty in the last month. The quality of our own CI team is remarkable so far, and everyone is eager to begin rolling out "real software" that will make good science happen.
We'll be hiring more people over the next few months—watch for announcements on this site or on the UCSD job web site, some positions are open as I write this—and that will be exciting too, even if it takes time away from the software work. Putting together the right team to build this kind of system is its own little project, and we're going to get that done as quick as we can. When we're done itsa gonna be sumpin'.
"It is a very sad thing that nowadays there is so little useless information." Oscar Wilde, 1894.
This is the first blog written for the OOI CI web site. I’m John Orcutt, the PI for the NSF Ocean Observatories Initiative Cyberinfrastructure Implementing Organization. Generally, the program is referred to as the OOI CI and the acronym will be used to reduce the length of the mouthful in the second sentence. The new web site can be reached through the parent organization, the Consortium for Ocean Leadership, or COL. You will find, with time, that the site and the blog will allow you to drill much more deeply into the thousands of pages comprising the requirements, architecture, meeting notes, presentations, change control board actions and much more. The web site will seek to provide a higher-level, much more readable view of what the OOI CI does. The web site and accompanying Facebook page, Twitter and other social networking connections seek to provide information and encourage interaction. Several of us will be writing blogs in this location with the intent of providing a broad overview in plain English (perhaps even good English!) of much of the underlying complexity of the OOI CI goals, construction progress, testing and annual software releases.
Today, I am going to discuss data in broad terms and hint at the capabilities of the OOI CI in this context. You’re all familiar with the stately Library of Congress, the largest modern library in the world. The digitized portion of the books in the LoC is about 10 TeraBytes (10 TB); if all the books were digitized the collection would comprise something like 25 TB. A byte is a collection of eight 1’s and 0’s, which can be used to identify as many as 256 things. Often a byte is used to designate a character such as a or A in the ASCII alphabet. Anyway, 25 TeraBytes is 25,000,000,000,000 bytes or, in scientific notation, 25x10^12 bytes. These sizes and terms have a hierarchy:
Kilo = 10^3 = 1,000
Mega = 10^6 = 1,000,000
Giga = 10^9 = 1,000,000,000
Tera = 10^12 = 1,000,000,000,000
Peta = 10^15 = 1,000,000,000,000,000
Exa = 10^18 = 1,000,000,000,000,000,000
Zetta = 10^21 = 1,000,000,000,000,000,000,000
Yotta = 10^24 = 1,000,000,000,000,000,000,000,000
While still not common, the Internet today can transfer data from place to place at speeds of 10 Gbps or even ten and a hundred times that rate. However, 10 Gbps represents the best commercial off the shelf (COTS) fiber optic and switch/router technology. Notice I just changed the notation; 10 Gbps is 10x10^9 bits per second and is only an eighth of the rate of 10 GB/s. If we were to use one of these modern networks to transfer all the books at the LoC to La Jolla as an example, this would require 200,000,000,000,000/10,000,000,000 = 20,000 seconds, or about 5 hours and 33 minutes. Establishing a copy of the Library in La Jolla would be fairly inexpensive. I can now buy a TB disk drive for about $100, so the costs of the drives would only be about $2,500--not a very large investment! In addition to books in the LoC there are also something like 120 TB of film, images and video available as well. Transferring this information to La Jolla would take a bit more than a day: about 26 hours and 40 minutes--still not a formidable task. While state of the art networks and storage (nevertheless COTS) can achieve this modern miracle, getting all this information and knowledge into the office or home is a much more difficult task today.
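The transfer arithmetic above can be checked with a short sketch. It assumes decimal (SI) units, the 25 TB and 120 TB sizes quoted in the text, and an ideal 10 Gbps link with no protocol overhead; the function and variable names are purely illustrative.

```python
# Back-of-the-envelope transfer times, assuming decimal (SI) units and an
# ideal link with no protocol overhead. All names here are illustrative.

TERA = 10**12
GIGA = 10**9
BITS_PER_BYTE = 8


def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Time to move size_bytes over a link rated in bits per second."""
    return size_bytes * BITS_PER_BYTE / link_bits_per_second


def as_hours_minutes(seconds: float) -> tuple[int, int]:
    """Split a duration into whole hours and whole (rounded) minutes."""
    total_minutes = round(seconds / 60)
    return total_minutes // 60, total_minutes % 60


books = transfer_seconds(25 * TERA, 10 * GIGA)    # all LoC books, 25 TB
media = transfer_seconds(120 * TERA, 10 * GIGA)   # film, images, video, 120 TB

print(books, as_hours_minutes(books))
print(media, as_hours_minutes(media))
```

The factor of eight between bytes and bits is the step that is easiest to drop; under these assumptions the books come out near five and a half hours and the media a little over a day.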
The federal government has a new National Broadband Policy and they’ve been testing network speeds to/from homes over the past several months. As an aside, if you go to this site, you can also check your own connection rates and this will be added to the database. The news is not encouraging. The national average network speed to homes is 3.9Mbps, or about 0.04% of the modern 10 Gbps network. This statistic places the US in 18th place globally, while all of the top ten countries are in Asia and Europe. South Korea is fastest (14.6Mbps) and Japan’s half this at 7.9Mbps. The fastest network in the US is in Sandy, UT with an average speed of 32.7Mbps! Amsterdam recently installed a citywide network operating at 1Gbps including houseboats and Google has a contest to install a 1Gbps Internet in a few cities in the US to prime the pump. FCC regulations as well as competition will be required, however, to get modern networking the last mile to the home.
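To put the home figure in perspective, here is a minimal sketch that compares the quoted 3.9 Mbps average with a 10 Gbps link and estimates how long the 25 TB book collection would take over a home connection. It assumes decimal units and a connection that sustains its average rate the whole time, which is optimistic; the names are illustrative only.

```python
# Compare the quoted home average with a 10 Gbps link (decimal units,
# sustained rate assumed). Names are illustrative.

HOME_BPS = 3.9e6        # national average to homes, bits per second
BACKBONE_BPS = 10e9     # 10 Gbps link
LOC_BOOKS_BYTES = 25e12 # 25 TB, the book collection size used earlier
SECONDS_PER_DAY = 86_400


def percent_of(rate_bps: float, reference_bps: float) -> float:
    """Express one link rate as a percentage of another."""
    return 100.0 * rate_bps / reference_bps


def transfer_days(size_bytes: float, rate_bps: float) -> float:
    """Days needed to move size_bytes at a sustained rate in bits/second."""
    return size_bytes * 8 / rate_bps / SECONDS_PER_DAY


home_share = percent_of(HOME_BPS, BACKBONE_BPS)
home_days = transfer_days(LOC_BOOKS_BYTES, HOME_BPS)

print(f"{home_share:.3f}% of 10 Gbps")
print(f"{home_days:.0f} days for 25 TB at the home average")
```

Under these assumptions the same transfer that takes hours on the fast network stretches to well over a year at the home average.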
It can be argued, in terms of industrial competitiveness, that providing such open and clear paths to information, knowledge and wisdom is essential. As an aside, there is a great blog that deals with computing, networking and public policy (Ars Technica). It’s worth a try and even joining; nevertheless you can also find a lot of detractors. Astronomy is seeing an enormous increase in data available for study. The Sloan Sky Survey began in 2000 with the opening of its telescope in New Mexico. In the first few weeks of operation, Sloan accumulated more digital data than had previously been collected in the entire field. Its archive now comprises 140 TB of data (about the size of the LoC). When the new Large Synoptic Survey Telescope begins operation in Chile in 2016, the system will collect that much data in five days! The Large Hadron Collider at CERN, which has been in the news this past year with magnet problems, generates 40 TB/s of data during collisions. The International Data Corporation (IDC) has estimated that 1.2 zettabytes of data in all areas will be created this year alone. At UCSD researchers assert that in 2008, 3.6 zettabytes of data were sent to households in the US in the form of TV and games.
This is a fairly significant per capita consumption of 34 GB/person/day. Peter Lyman at UCB estimated that only six exabytes were generated in 2002. The numbers here are not perfectly consistent, but lead to the conclusion that the amounts of data or information are very large and growing at an extraordinary rate. It’s also encouraging that the US population reads more; TV led to a decrease in reading, but there is good evidence that reading has tripled since 1980 given ready access to huge amounts of information. The Economist published a special section on "The Data Deluge" in February of this year--it’s definitely worth a read. You may have access to the Economist, but if not, you can read it here. Data and information today are nearly synonyms, while knowledge derives from strands of data and information.
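The per capita figure can be roughly reproduced from the 3.6 zettabyte total. The sketch below assumes a 2008 US population of about 304 million and a 365-day year; both are my assumptions rather than numbers from the study, so the result is only a sanity check.

```python
# Rough per-capita check. The population figure is an assumption made for
# this sketch, not a number taken from the UCSD study.

ZETTA = 10**21
GIGA = 10**9

TOTAL_BYTES_2008 = 3.6 * ZETTA      # data sent to US households in 2008
ASSUMED_POPULATION = 304_000_000    # approximate 2008 US population
DAYS_PER_YEAR = 365


def per_capita_gb_per_day(total_bytes: float, people: int, days: int) -> float:
    """Average gigabytes per person per day for a yearly total."""
    return total_bytes / people / days / GIGA


daily_gb = per_capita_gb_per_day(TOTAL_BYTES_2008, ASSUMED_POPULATION, DAYS_PER_YEAR)
print(f"{daily_gb:.1f} GB/person/day")
```

With these inputs the result lands in the low 30s of GB per person per day, the same ballpark as the 34 GB quoted; the small gap is within what different population or rounding choices would produce.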
The OOI CI architecture has been designed to deal not only with data, but also with the developed knowledge. As is clear from the previous paragraph, the world is becoming more information rich or, from a different point of view, swamped. However, this growth has a very substantial effect on science and, for that matter, business. WalMart manages more than a million customer transactions each hour and the company is possibly managing nearly three petabytes of data and information. If you look back at the first part of this blog, you can see that this is nearly 200 times as much data as all the holdings of the world’s largest library. As an example closer to home (or at least to the OOI), climate is becoming an increasingly important part of our future, including adaptation, mitigation and even engineering. Verifying a climate treaty will require a much more complete and advanced observational system than is in place today.
The OOI is a prototype for how such a network can be built, operated, expanded and maintained, while at the same time providing large quantities of multidisciplinary data, information and knowledge. The Economist article noted: “Revolutions in science have often been preceded by revolutions in measurement” (Sinan Aral, NYU). This is a common refrain in oceanography, but is hardly confined to the field with which we are so familiar. The OOI departs in another way from the history of oceanography in that many scientists and engineers will deposit data in repositories and stream the data in real time. Others will create virtual observatories from data streams of interest or extract files from databases. These harvested data will be used for analysis and the accompanying research, even though these scientists may never see the ocean described by those data.
Such “cubicle science” is anathema in the field of oceanography, but the explosion of data accompanied by access to large-scale computing will change oceanography. If oceanography is to scale beyond the individual or individual’s laboratory to increase the scale and scope of measurements to a level and diversity needed to answer growing scientific and societal problems, change is not only inevitable, but also desirable. That’s a limited introduction. Much more to follow!
"Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" T.S. Eliot.
We are surrounded today by learned people who know a lot, but are not very wise. So, the Eliot quote may have been accurate for its time, but I doubt it.
A more pragmatic example would be in the area of modern medicine.
We have "learned" how to keep someone alive for a very, very long time, but we have lost sight of the wisdom that tells us that life requires more than a pulse and brainwaves.
Nonetheless, I think this is a great blog piece and I thank the author.
In November of 2009 we conducted an Observing System Simulation Experiment (OSSE) to test the capabilities of the CI using a distributed ocean observing network in the Mid-Atlantic Bight. For this effort, the team tested the Planning and Prosecution CI software, which provides the ability to monitor and control individual components within an ocean observing network. The CI software coordinates and prioritizes the shared resources, allows for reconfigurable tasking, and enables autonomous execution of observation plans of the fixed and mobile platforms.
A distributed community of ocean scientists provided the OOI CI team with daily adaptive guidance for a taskable satellite, a fleet of four autonomous Slocum gliders, and a multi-vehicle network of autonomous underwater vehicles. The scientists used the coupled system to study physical forcing of the fall/winter phytoplankton bloom dynamics, which is under-sampled due to the harsh field conditions that hinder traditional sampling techniques.
Efforts were coordinated through a web portal that provided an access point for the observational and model predictions. The development of the portal was simplified by the use of IOOS standard web services for gridded data (OPeNDAP with CF conventions). During this experiment, summaries were distributed daily that described both the atmospheric and oceanographic conditions. The numerical models could be assessed individually or combined as multi-model ensembles; the model performances were evaluated in real-time against the satellite, shore-based radar, and in situ glider measurements. Users could view the model and observation comparison using the web portal.
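As an illustration of what combining models into an ensemble and scoring them against observations can look like, here is a minimal sketch. It is not the portal's code: the equal-weight average, the RMSE score, and the sample values are all assumptions chosen for clarity.

```python
# Equal-weight multi-model ensemble and an RMSE score against observations.
# A hypothetical sketch; the values below are made up for illustration.
from math import sqrt


def ensemble_mean(forecasts: list[list[float]]) -> list[float]:
    """Point-by-point average of several equal-length model forecasts."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


def rmse(predicted: list[float], observed: list[float]) -> float:
    """Root-mean-square error between a forecast and matching observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


# Made-up temperatures (degrees C) at three locations.
model_a = [14.0, 15.0, 16.0]
model_b = [16.0, 15.0, 14.0]
observed = [15.0, 15.0, 15.0]

combined = ensemble_mean([model_a, model_b])
print(rmse(model_a, observed), rmse(model_b, observed), rmse(combined, observed))
```

In this contrived case the two models err in opposite directions, so the ensemble scores better than either one alone; real ensembles are rarely that tidy, which is why each model was also assessed individually.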
This infrastructure was used to conduct the following two tests of the CI planning and prosecution capabilities.
1. Using CI software, we remotely coordinated an underwater network of autonomous underwater vehicles (AUVs). Scientists onshore in New Jersey used ocean color satellite data to define an area of operations; this was forwarded to planners at the NASA Jet Propulsion Laboratory in California, who then emailed AUV deployment missions back to the boat teams at sea off New Jersey. Acoustic modems on the AUVs (which enabled underwater communications between vehicles) as well as ship communications via a gateway buoy provided operators with command and control capabilities. Using this communications network, information from the AUVs, including position, speed, heading, and some scientific sensor readings was published on Google Earth and distributed to a wider community of scientists in real time. The AUVs themselves were outfitted with CI software that allowed the vehicles to autonomously adapt to the environmental features measured by their scientific sensors.
2. We enabled coordinated sampling between underwater gliders and the space-based Hyperion imager flying on the Earth Observing One spacecraft. The Hyperion images are typically 7.5 km (across track) by over 100 km (along track), and resolve 220 spectral bands from 0.4 to 2.5 microns with a spatial resolution of 30 m. This small spatial footprint makes it difficult to ensure that in situ assets are present for calibration. The Hyperion is a taskable platform, and therefore an alternative approach would be to mobilize in situ assets and simultaneously adjust the satellite swath to be coincident. During the field experiment, both observational data and multi-model forecasts were analyzed to determine the tasking location for the satellite. These coordinates were used by the EO-1 web capability to re-task the spacecraft. The 48-hour model forecast was then used by CI software to plan the optimal path to co-locate any gliders within the tasked EO-1 Hyperion swath. Two gliders were successfully moved to the swath; other gliders that were not capable of reaching the swath were diverted to accomplish other science missions. This represents a major technology breakthrough in simultaneously coordinating satellite and underwater assets guided by multi-model forecasts. It provided a machine-to-machine interactive loop driven by a geographically distributed group of scientists.
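The decision about which gliders could reach the swath can be pictured with a much simplified sketch. This is not the CI planning software: it ignores ocean currents and the model forecast entirely, treats the swath as a single target point, and uses an assumed glider speed and made-up coordinates.

```python
# Simplified reachability test: can a glider cover the great-circle distance
# to a target point within a time window? Currents are ignored, and the
# speed and coordinates below are hypothetical.
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def can_reach(glider: tuple[float, float], target: tuple[float, float],
              speed_m_per_s: float, window_hours: float) -> bool:
    """True if the glider's still-water range covers the distance in time."""
    range_km = speed_m_per_s * window_hours * 3600 / 1000
    return haversine_km(*glider, *target) <= range_km


swath_point = (39.0, -73.0)       # hypothetical point inside the swath
near_glider = (39.2, -73.0)       # roughly 22 km away
far_glider = (40.0, -73.0)        # roughly 111 km away
ASSUMED_SPEED = 0.3               # m/s, an assumed glider speed

print(can_reach(near_glider, swath_point, ASSUMED_SPEED, 48))
print(can_reach(far_glider, swath_point, ASSUMED_SPEED, 48))
```

At the assumed speed a glider covers only about 50 km in 48 hours, which gives a feel for why some vehicles were diverted to other missions; the real planner used the forecast currents, which can help or hinder a slow vehicle considerably.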
I dunno; I have to say that I think that our problem lately is that there IS so much useless information out there. What we now lack is MEANING.