Thursday, October 31, 2013, 3pm
Location: Room 1110 of the Nguyen Engineering Building

Edward Wegman
School of Physics, Astronomy, and Computational Sciences
George Mason University

BIG DATA, Technology, and Analysis


On March 29, 2012, the Obama administration announced the Big Data Research and Development Initiative. A number of U.S. federal agencies, including the National Science Foundation, the National Institutes of Health, the Department of Defense, the Department of Energy, and the U.S. Geological Survey, have committed substantial additional funds to Big Data projects. The White House press release described the goals of the Big Data Initiative: “to advance the state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data; harness these technologies to accelerate the pace of discovery in science and engineering; to strengthen our national security, and transform teaching and learning; and to expand the work force needed to develop and use Big Data technologies.”

The scale of what is considered Big Data has been increasing steadily. Kilobytes (10^3), megabytes (10^6), gigabytes (10^9), and terabytes (10^12) are by now familiar to any researcher using modern computer resources. The Earth Observing System of the Jet Propulsion Laboratory introduced serious consideration of petabytes (10^15). Data collection systems looming on the horizon, such as the Large Synoptic Survey Telescope, promise data on the scale of exabytes (10^18). It is conceivable that future data collection methods may generate data sets on the scale of zettabytes (10^21) and yottabytes (10^24). The issue with big data is that while computing power doubles every 18 months (Moore’s Law) and I/O bandwidth increases about 10% every year, the amount of data doubles every year. It is clear that conventional distributed systems such as those employed by Google, Facebook, and JPL (distributed active archive centers) must be expanded to include such new technologies as Hadoop and new analysis methods. In this lecture, I will focus on aspects of these Big Data issues.
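
As a rough illustration of the growth rates quoted above, the following Python sketch compounds each rate over a ten-year horizon. The rates are the ones stated in the abstract; the ten-year horizon and the function names are assumptions chosen only for illustration.

```python
def doubling_growth(years, doubling_time_years):
    """Growth factor for a quantity that doubles every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)


def percent_growth(years, annual_rate):
    """Growth factor for a quantity that grows by `annual_rate` per year."""
    return (1 + annual_rate) ** years


# Assumed horizon, for illustration only.
YEARS = 10

compute = doubling_growth(YEARS, 1.5)  # computing power: doubles every 18 months
io = percent_growth(YEARS, 0.10)       # I/O bandwidth: about 10% per year
data = doubling_growth(YEARS, 1.0)     # data volume: doubles every year

print(f"After {YEARS} years:")
print(f"  computing power grows by a factor of about {compute:.0f}")
print(f"  I/O bandwidth grows by a factor of about {io:.1f}")
print(f"  data volume grows by a factor of about {data:.0f}")
```

Under these assumptions, data volume outgrows computing power by roughly a factor of ten over the decade and outgrows I/O bandwidth by a factor of several hundred, which is the gap the abstract points to.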