We're closing out our data week with an interview with Mark Hahnel, founder of figshare, a Digital Science company which enables researchers to publish their data in a citable, searchable and shareable manner. Also, an update: the special issue of Learned Publishing on data, covered by Liz Ferguson in "Everybody loves data", is now freely available online.


Q. Why did you decide to start figshare?

A. To begin with, my Digital Science profile describes me as “Genuinely passionate about open science and the potential it has to revolutionize the research community”. I started figshare because I was frustrated by so many aspects of the academic system. It seemed both important and possible to address the opening up of digital content – given the way that funders and policy makers are heading, it will ultimately need to be shared at some point anyway. The traditional research ecosystem is already being disrupted by the internet – I want to try to evolve it in a way that doesn’t ruin careers.

Q. What are the implications of open data?

A. Opening up research data has the potential both to save lives (say, through medical advances) and to enhance them through socio-economic progress. It’s a space where humans and computers can work symbiotically, and where industry can also benefit. As an aside, we’ve just updated the figshare homepage to show the images of the Dreadnoughtus dinosaur remains – which can also be downloaded from figshare and printed off on a 3D printer. This is something that would have been unthinkable even a short time ago, when paleontologists were unwilling to share their fossils. Now clever content and developments such as Jean-Claude Bradley’s Open Notebook Science illustrate that it is the right time to move in this direction. Publishers and institutions can both benefit from such developments, and figshare is continuing to build new partnerships with learned societies, libraries and data providers. Neelie Kroes of the European Commission has stated how important it is for data to be open and discoverable, and of course the OSTP announcement in the US demonstrates similar momentum.

Q. Practically speaking, what does opening data involve from your point of view?

A. Change seems to occur at different rates among scholarly communities. We’re currently noticing a two-pronged spike, if you like, where early career researchers really get what we’re doing, and embrace the ‘open science’ rationale. Similarly, those who have achieved tenure and become very established in their careers are thinking about legacy and subject heritage, and are similarly inclined to share. However, there’s a middle group – post-docs, lecturers, and so forth – who are very much absorbed by the current system of academic accreditation and who don’t feel able to spare the time or take the risk with their data. It’s this group that would most benefit from changes in the academic reward system.

I’ve got a slide which I call “The Six Steps of Open Access” that describes the stages I think we all have to go through (including funders) in order to maximize the benefits of open science. They are:


1. Recommend Open Access for Primary Publications
2. Recommend Open Data
3. Mandate Open Access for Primary Publications
4. Mandate Open Data
5. Enforce Open Access for Primary Publications
6. Enforce Open Data


Currently, I think we’re in the throes of step 4, but I believe that in 5 years’ time we will be safely through all the steps. By then, the technology to enable policing will be well in place.

Q. How do you see open science progressing?

A. I believe that communication is key in order to progress this. The more emphatically we can demonstrate increased impact to researchers as a result of open science, the more attractive this paradigm will become. At the moment there are too many papers, a tidal wave of research content, and we need more filters and more finely-tuned incentives to encourage the right behaviors. Impact factor was a good measure during the print age, but it doesn’t enable you to dig into the story behind a piece of research. A paper may have been cited many times because it’s wrong, for instance, whereas altmetrics are increasingly looking to provide nuance around positive vs negative citations and a fuller context; in other words, they capture the ‘digital footprint’ (as coined by Jason Priem in ‘Scholarship: beyond the paper’). Similarly, computational subjects should be able to credit researchers as much for code as for papers. There should also be a system for differentiating between authors’ contributions to a piece of research (in the same way that movie credits display the roles of various contributors).

At figshare, we’re currently building our institutional offering, in which we are scaling up our partnerships with libraries in particular. The libraries typically provide the critical curation layer, while we build a technology layer on top of that which supports metrics, queries, and other intelligence about the information contained. Libraries are going to be key players in the open science landscape and figshare is looking to empower them to be able to support research communities.

Q. What part can the Research Data Alliance play among all of this movement?

A. It’s generating huge momentum as it operates on a global stage and is really raising awareness about data’s importance. I’m hoping that before too long, anyone who has a niche problem or edge-case query (about, say, personal data, commercial sensitivity, long-tail data, sub-dataset citation issues, etc.) will be able to simply check the RDA website or request a response from RDA and have access to the best-practice response.

Q. What does your crystal ball show five years from now?

A. There is still a lot of work to be done in getting through to the research community, and more carrots need to be in place to encourage them to fully participate. I do feel that in five years or so the landscape will look very different, with the scientific paper being more of a “wrapper”, or story, with digital files as the real essence of the material. I think altmetrics will continue to be refined and will become ubiquitous as a research impact measurement. I also feel that peer review will be similarly transformed, to become much more open and portable. I certainly see a place for publishers within this system – particularly regarding the peer review management piece. I find myself wondering whether there will be a proliferation of new journals in that time, or if we’ll be down to one or two mega-journals. It’s certain that journals will change – we could never have had an eLife, PeerJ or F1000R until recently, but I’m just not sure how existing titles are going to adapt in response to the new demands being placed upon them by various stakeholders.

But I am certain that the ‘internet of things’, smart labs, and sensors in everyday life will be far more integrated into the knowledge canon. I’m not concerned about where content resides – I’m a proponent of the Data FAIRport principles (that data should be “findable, accessible, interoperable, re-usable”). So long as there is a persistent identifier, a sound storage policy, openness and a usable interface, this is real progress as far as I’m concerned. I also don’t believe that humans will be replaced by machines; rather, by building better infrastructures and enabling access and re-use of data, humans will be able to interrogate, clean, and re-purpose data in ways that have not been possible before.