Thursday, October 31, 2013, 3pm
Location: Room 1110 of the Nguyen Engineering Building
Edward Wegman
School of Physics, Astronomy, and Computational Sciences
George Mason University
BIG DATA, Technology, and Analysis
Abstract
On March 29, 2012, the Obama administration announced the Big Data Research and Development Initiative. A number of U.S. federal agencies, including the National Science Foundation, the National Institutes of Health, the Department of Defense, the Department of Energy, and the U.S. Geological Survey, have committed substantial additional funds to Big Data projects. The White House press release described the goals of the Big Data Initiative: “to advance the state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data; harness these technologies to accelerate the pace of discovery in science and engineering; to strengthen our national security, and transform teaching and learning; and to expand the work force needed to develop and use Big Data technologies.”
It should be noted that the scale of what is considered Big Data has been increasing steadily. Kilobytes (10^3), megabytes (10^6), gigabytes (10^9), and terabytes (10^12) by now are familiar to any researcher using modern computer resources. The Earth Observing System of the Jet Propulsion Laboratory introduced serious consideration of petabytes (10^15). Data collection systems looming on the horizon such as the Large Synoptic Survey Telescope promise data on the scale of exabytes (10^18). It is conceivable that data collection methods in the future may generate data sets of the scale of zettabytes (10^21) and yottabytes (10^24).
The issue with Big Data is that while computing power doubles every 18 months (Moore’s Law) and I/O bandwidth increases about 10% every year, the amount of data doubles every year. It is clear that conventional distributed systems such as those employed by Google, Facebook, and JPL (distributed active archive centers) must be expanded to include such new technologies as Hadoop, along with new analysis methods. In this lecture, I will focus on aspects of these Big Data issues.
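
As a rough illustration of why these growth rates diverge, the following Python sketch compounds the three rates quoted in the abstract (computing power doubling every 18 months, I/O bandwidth growing about 10% per year, data volume doubling every year). The function name growth_factors and the chosen time horizons are illustrative assumptions, not part of the lecture, and the sketch assumes smooth compounding from a common starting point.

```python
def growth_factors(years: float) -> dict[str, float]:
    """Return how much each quantity has grown after `years`,
    relative to its starting value, using the rates quoted in the abstract."""
    return {
        "compute": 2 ** (years * 12 / 18),  # doubles every 18 months
        "io_bandwidth": 1.10 ** years,      # grows about 10% per year
        "data": 2 ** years,                 # doubles every year
    }


if __name__ == "__main__":
    for years in (3, 6, 9):
        f = growth_factors(years)
        print(
            f"after {years} years: "
            f"compute x{f['compute']:.1f}, "
            f"I/O x{f['io_bandwidth']:.2f}, "
            f"data x{f['data']:.0f}, "
            f"data/compute x{f['data'] / f['compute']:.1f}, "
            f"data/I/O x{f['data'] / f['io_bandwidth']:.1f}"
        )
```

Under these assumptions, after three years data has grown eightfold while computing power has grown fourfold and I/O bandwidth by roughly a third, so the gap between data volume and the ability to move it widens fastest.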