New system will run the Hadoop file system on commodity servers and include search, discovery, and analysis tools.
The National Security Agency is taking a cloud computing approach in developing a new collaborative intelligence gathering system that will link disparate intelligence databases.
The system, currently in testing, will be geographically distributed in data centers around the country, and it will hold "essentially every kind of data there is," said Randy Garrett, director of technology for NSA's integrated intelligence program, at a cloud computing symposium last week at the National Defense University's Information Resources Management College.
The system will house streaming data, unstructured text, large files, and other forms of intelligence data. Analysts will be able to add metadata and tags that, among other things, designate how securely information is to be handled and how widely it gets disseminated. For end users, the system will come with search, discovery, collaboration, correlation, and analysis tools.
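The article does not describe how the NSA implements tagging, so the following is only a minimal sketch of the general idea: records carry a handling level and a dissemination list, and a search returns only what a given user may see. Every name here (`Record`, `User`, `LEVELS`, `can_view`, `search`) and the four-level ordering are assumptions made for illustration, not details of the actual system.

```python
from dataclasses import dataclass, field

# Hypothetical handling levels, ordered from least to most restricted.
LEVELS = ["unclassified", "confidential", "secret", "top_secret"]


@dataclass
class Record:
    body: str
    handling: str                                 # how securely the item must be handled
    audiences: set = field(default_factory=set)   # groups allowed to receive it
    tags: set = field(default_factory=set)        # free-form analyst tags


@dataclass
class User:
    name: str
    clearance: str
    groups: set


def can_view(user, record):
    """Allow access only if the user is cleared AND belongs to an allowed audience."""
    cleared = LEVELS.index(user.clearance) >= LEVELS.index(record.handling)
    in_audience = bool(user.groups & record.audiences)
    return cleared and in_audience


def search(records, user, tag):
    """Return the records carrying `tag` that this user is permitted to see."""
    return [r for r in records if tag in r.tags and can_view(user, r)]


records = [
    Record("port activity summary", "secret", {"navy", "nsa"}, {"maritime"}),
    Record("open source shipping news", "unclassified", {"navy", "army"}, {"maritime"}),
]
analyst = User("analyst", "confidential", {"army"})
visible = search(records, analyst, "maritime")  # only the unclassified record
```

In this sketch the handling level and the audience list are checked independently, so a user with a high clearance still sees nothing that was not disseminated to one of their groups.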
The intelligence agency is using Hadoop, the open source framework that combines a distributed file system with an implementation of Google's MapReduce parallel processing model, because it makes it easier to "rapidly reconfigure data" and because of its ability to scale.
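For readers unfamiliar with MapReduce, the model can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps using a word count, not Hadoop's API and not anything the NSA has described; the function names are chosen for this example only.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map over every document, then shuffle, then reduce each key."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["signal report alpha", "alpha report"])
```

In a real cluster the map and reduce calls run in parallel on many machines, each working on the data stored locally, which is where the scaling comes from.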
The NSA's decision to use cloud computing technologies wasn't about cutting costs or seeking innovation for innovation's sake; rather, cloud computing was seen as a way to enable new scenarios and unprecedented scalability, Garrett said. "The object is to do things that were essentially impossible before," he said.
NSA's challenge has been to provide vast amounts of real-time data gathered from intelligence agencies, military branches, and other sources of intelligence to authorized users based on different access privileges. Federal agencies have their own systems for sharing information, but many remain disconnected, while community-wide systems like Intellipedia require significant user input to be helpful.
The NSA project falls under Intelligence Community Directive 501, an initiative to overhaul intelligence sharing proposed under the Bush administration. Current director of national intelligence Dennis Blair has promised that intelligence sharing will remain a priority.
"The legacy systems must be modernized and consolidated to allow for data to actually be shared across an enterprise, and the organizations that collect intelligence must be trained and incentivized to distribute it widely," he said in response to questions from the Senate prior to his confirmation.
The new system will run on commodity hardware and "largely" on commercial software, Garrett said. The NSA will manage the arrayed servers as a pool of resources rather than as individual machines.
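Garrett did not say how the pool is managed. As a rough illustration of what treating servers as a pool can mean, the sketch below places each job on whichever machine is currently least loaded, so callers never name a server themselves. The `ServerPool` class, its `submit` method, and the node names are hypothetical.

```python
import heapq


class ServerPool:
    """Hypothetical pool that places each job on the least-loaded server."""

    def __init__(self, names):
        # Heap entries are (running_jobs, server_name); the smallest load pops first.
        self._heap = [(0, name) for name in names]
        heapq.heapify(self._heap)
        self.placements = {}

    def submit(self, job):
        """Assign `job` to the least-loaded server and return that server's name."""
        load, name = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + 1, name))
        self.placements[job] = name
        return name


pool = ServerPool(["node-a", "node-b"])
first = pool.submit("index-build")
second = pool.submit("correlation-run")
third = pool.submit("search-query")
```

This sketch never releases load when a job finishes; a fuller version would also need completion tracking and failure handling.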