Jennifer Beal
Events & Ambassador Manager, Wiley

We continue our Data Week with coverage of an insightful "Who's afraid of big data" session from the recent ALPSP conference.

 

Source: DrAfter123 / Thinkstock

Ah, Big Data, how things have changed! Ten years ago a data talk at a conference would have been nearly empty, but these days it’s standing room only. And so kicked off an interesting – but sometimes terrifying – session on big data at the ALPSP conference held earlier this month, chaired by Wiley’s Fiona Murphy, Life Sciences Journal Publisher. The first panelist, Eric T. Meyer from the University of Oxford, gave us a brief history of big data, making the comparison noted above about the number of people interested, and stated that big data was now the “sexiest thing in the room.” He certainly managed to grab the audience’s attention with some examples of how big data is being used, including how a major retailer deduced that a teenage girl was pregnant before her father knew, based on her buying habits; how Google’s autocomplete search can be mined to reveal differences in cultural perspectives; and how even a general election result can be predicted from Wikipedia page views. Meyer also highlighted that big data differs by discipline. For instance, while big data has been around for years in physics and astronomy, other disciplines are still experimenting with it. For those of us not quite sure when data becomes BIG, Meyer told us: “if it’s easily handled on your laptop, it’s not big data!”

The next speaker, David Kavanagh, Managing Director of Scrazzl, described how Scrazzl was created as a “social discovery platform” to harness the wisdom of crowds. It uses data to validate research-reliant decisions, including choices between experimental protocols and the development of research collaborations. The key challenge is that the scholarly record is not full of big data per se, but of big unstructured data, which needs to be categorized and standardized to be usable. If you direct computational power at structured data, you can get instant results, but unstructured data needs to be processed first, which generally requires human intervention. With challenges such as turning 380 million unstructured resource matches in PubMed into actionable points, the Scrazzl team have been working to find ways of making this process as automated, and therefore as cost-efficient, as possible.

The final speaker, Paul F. Uhlir from the Board on Research & Information at The National Academies, addressed the potential threats of big data, sometimes painting a bleak picture of the future. For instance, big data is the ‘fuel’ of driverless cars, which will lead to fewer accidents – but what will happen to the car mechanics and insurance companies? Will big data lead to greater inequality by becoming a powerful means to solidify existing gaps in political or financial power? With a small number of companies holding most of the data, Uhlir pondered the implications. What if these companies are able to make stock trades seconds before others? There is also the challenge of complexity: huge amounts of data give us more information but potentially less knowledge – the more we see, the less we know. It was not all threats, however. Uhlir also identified areas of current weakness (and therefore of potential business opportunities) as big data becomes... bigger. For instance, how should data be reviewed, what are the impacts on traditional metrics, and how do we best share data? And what of the human side: time and education?

The questions arising in this panel discussion showcased many of the issues which need to be addressed in big data and publishing. The question of whether researchers should be mandated to share their data was answered with ‘it depends’: there are some disciplines where this is more appropriate than others, but the panel agreed that a default starting point of open data would be the ideal approach. What can publishers do to address issues around data? Underpinning all suggestions were the following needs: cross-publisher collaboration to establish standard data identifiers and labels which authors would have to use at submission stage, established standards for citing data, and standard categorization of author roles on papers, so that researchers can be credited for all different types of involvement.

Were you at this session or do you have thoughts on data you'd like to share?  Leave us a comment below or tweet @WileyExchanges