Wiley's Executive Seminar takes place today, and over the next week we will be featuring issues, trends, and key learning points from the seminar. This year's theme is "The Next Big Thing" and, below, seminar speaker Scott Lachut starts us off discussing human engagement with big data.
What if the scientific method were no longer necessary? A bold statement, but one that Wired’s Chris Anderson posits in his 2008 article, extolling the virtues of “Big Data.” He writes, “Petabytes allow us to say: ‘Correlation is enough.’ We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.” Simply put, take your Excel spreadsheets, throw them in an algorithmic blender, chill your insights and serve.
In the five-plus years since Anderson made this claim, the process of pure science remains largely unchanged, though we're seeing early signs of the data revolution beginning to influence the research process. What's more, its impacts extend beyond the hard sciences into the social sciences and humanities as well, creating new opportunities for academics of every stripe to look for answers at a scale never before possible. Consider that Google alone had digitized more than 30 million texts as of April 2013, creating a vast trove of knowledge accessible by anyone on the planet. And this is just the tip of the proverbial data iceberg, which grows with every Tweet, like, selfie, step logged and internet-enabled device.
On top of this stream of information and updates permeating nearly every facet of society and daily life, we're seeing a host of new initiatives that recognize the value in making data and research archives available to the wider community in the hopes of unleashing new breakthroughs. The Health Data Exploration project from the California Institute for Telecommunications and Information Technology is looking to bring together the disparate silos of data being generated by personal health devices and startups in the space with more traditional information gathered by the medical community, creating a comprehensive resource for research and study. Similarly, UK startup Thingful is working to create a search index for the Internet of Things, providing access to datasets and information streams from the world's connected objects. Beyond the data, the site's pages are centered on Twitter profiles, building a community of people willing to discuss why and how they are using the devices that they add to the index.
Yet despite the sheer volume of raw information available, the vast majority of it (less than 1% has been analyzed, according to the Guardian) has yet to yield any significant meaning. Until recently, researchers have lacked tools sophisticated and intuitive enough to allow anyone, regardless of skill level, to search for answers within this data. For researchers without a computer science degree, nuanced data reporting usually begins and ends with a few rows and columns on a spreadsheet, and maybe a formula or two, but even these somewhat antiquated systems can be given new life with the right set of tools. Kinetica is an iPad application that lets users 'physically' manipulate data from a spreadsheet, remixing the points into highly visual charts and graphs. Developed by the Human-Computer Interaction Institute at Carnegie Mellon University, the platform was designed to help people uncover new patterns and connections within their data.
While it's important to consider how to mine insights from fresh research, what are researchers to do with the legacy of information that already exists? Science has classically been adept at handling numbers and stats, but it is only beginning to work out how to unearth meaning from other types of content, particularly written text. However, as advanced algorithms become better at understanding human language and sentiment, an entirely new set of possibilities opens up for what can be accomplished. Researchers need only ask the right questions. To that end, Baylor College of Medicine partnered with IBM to develop software that could read back through 60,000 research papers focused on p53, a protein that is present in most cancers, to look for new discoveries that could be applied to novel drug development. While still in early-stage trials, the software has proven successful at making connections between disparate pieces of research, identifying isolated facts that many human researchers might otherwise miss.
As the speed of information increases, an emerging school of thought holds that knowledge must also follow suit, enabling researchers to stay on top of important trends as they happen, and one day even predict future outcomes. In order to achieve this, organizations are combining powerful analytics platforms with visual reporting tools to provide detailed snapshots of real-time data. In the case of Recorded Future, the company's algorithms scour the web to find data associated with particular events or corporations, and plot that information along a graphic timeline. This highly curated point of view arms the company's clients with the intelligence to anticipate change and react accordingly.
Looping back to Anderson's original premise on the ascendancy of Big Data, it's becoming clear that these tools are a valuable addition to existing research methodologies, particularly where large amounts of information are in play, but not a replacement for them. Superseding the human element would mean equating rational analysis with nuanced thinking, and quantifiable results with understanding: two aspects that we'll continue to rely on academics, scientists and researchers to interpret for us.