During the last week of February, I attended the 54th annual convention of NFAIS – the National Federation of Advanced Information Services. This was my second NFAIS event, and both were very high quality. NFAIS membership seems to be primarily academic publishers and aggregators, with a sprinkling of librarians and non-profits.
The theme was roughly this:
We are in an age of disruption. The Emerging Information Landscape is not just about digitization of existing forms of knowledge (print articles into pdfs) but the new forms of knowledge containers (datasets, multimedia, dynamically changing resources) and the sociological and cultural changes being brought about by the affordances of new technologies. From mobile devices to cloud computing, there are big changes in the way people access and use information. Everyone in any part of the information services business needs to take notice.
What’s the new normal?
The keynote address was given by John Wilbanks, Senior Fellow at the Ewing Marion Kauffman Foundation. Here’s my summary of his remarks.
He talked about networks: thinking about networks is key to conceptualizing information landscapes.
Decentralized networks (with hubs and spokes) — in nature, networks generally evolve towards this kind of formation, probably because these kinds of networks are hard to disrupt in a bad way.
Centralized networks — big companies like ‘em, but they are expensive. In the old normal, we condensed our discoveries into text, mailed them to centralized publishers, who redistributed them.
Perhaps publishers thought the web would be a web-based version of this centralized model. If you are the center, you love this (everyone sends stuff to you, you don’t pay for peer review, etc.)
But, the underlying internet is distributed, and offers many more options for the distribution of information, all of which are cheaper for society, faster and easier ways to share, to disseminate.
Hanging on to a centralized network model now gives people two choices: be screwed, or pirate the content.
The new normal is not centralized – it’s decentralized. I can broadcast to the world to ask for help getting a copy of a paper and within hours, if not minutes, I will have people emailing me pdf copies. This is sharing info cheaply without recourse to a scholarly system of publishing, everyone does it, and yet there is a huge effort to try and fence that content behind copyright and charge high prices for it. That didn’t work so well for the music industry….
The next topic Wilbanks discussed was crowdsourcing science. He was balanced, pointing out very useful projects like openSNP while also saying we don’t want to celebrate the power of the crowd at the exclusion of professionals who can bring smart things to the discussion. He gave other examples of crowdsourcing: a protein-folding game that helps scientists solve a problem that, it turns out, humans are much better at than computers. Another example is an app called The Eatery – it’s about carb-heavy food, and just through a game-like interaction they can collect info from 500,000 people a month. In contrast, a grant-funded science project is considered huge if it gathers data from 25,000 people in a years-long study. The cautionary tale is when the crowdsourced science data is being collected by companies, not academia.
Data publication is another hot topic, and Wilbanks sees this as another complex part of the information landscape (It just isn’t the simple trend people talk about as if “everyone will share all their data!”). There are lots of reasons people don’t want to share their data.
And more importantly, scientific discovery is complex — we haven’t decided epistemologically what data we need to curate. At what point does the data need to get published—for study verification, for re-purposing? The lab notebooks, the cleaned-up data, the analyzed data? We don’t know about this yet.
Curating data that people want to save, and developing good practices for doing so, is a growing need in the information landscape – this is a growth occupation, much like physician assistants were a couple of decades ago. Have you ever curated data by yourself? Most people have not. People who can curate data are making HUGE amounts of money in commerce – why would they do it for academics? Scientists will not curate their own data, and they do need people who have the skills to help.
So what will the future look like?
Wilbanks offered a few ideas.
Scenario #1: Radical incrementalism
Legislation will keep trying to catch up to technology, and it’s not yet decided whether data is going to be open or closed. Wilbanks pointed to thecostofknowledge.com (a boycott-Elsevier website with plenty of well-known scientists signed on publicly). If legislation locks down knowledge sharing, scientists will increasingly engage in civil disobedience.
Wilbanks drew our attention to Figshare. Here’s the description from the Figshare website where users can upload their data sets:
Figshare allows researchers to publish all of their research outputs in seconds in an easily citable, sharable and discoverable manner. All file formats can be published, including videos and datasets … By opening up the peer review process, researchers can easily publish null results, avoiding the file drawer effect and helping to make scientific research more efficient. Figshare uses creative commons licensing …
Figshare gives users unlimited public space and 1GB of private storage space for free.
Figshare is based in London and is supported by Digital Science. Digital Science’s relationship with Figshare represents the first of its kind in the company’s history: a community-based, open science project that will retain its autonomy whilst receiving support from the division.
Scenario #2: Tale of 2 sh*ts
It starts with a provocative statement – “I will cure cancer in my garage!”
Bullsh*t, you aren’t going to do that!
But Wilbanks showed us how, using open data and equipment bought from eBay and other online outlets for maybe $30,000, it seems possible to do quality science “at home.”
Wilbanks wants us to note the extreme absence of traditional information players. He described a recent effort (grant-funded, I think) in which a group of volunteer graduate students created a textbook (biology, I think) in a rather short period of time, using open sources: tables of contents, Creative Commons content and images, etc. They even found respected editors – the whole thing cost about $13,000.
Scenario #3: Weak, simple, open, together
Create resources in the information landscape that follow the principles of simplicity, open platforms that are openly extensible, weak controls over content, and fostering collaborative work.
If we intentionally design things with only the functions we need, we avoid the necessity to clobber the market with our juggernaut product.
Avoid unnecessary controls or tools. Abstract and modularize so we can add & subtract as we need to.
Open doesn’t mean unpaid.
These are the design principles coded into email, web, Wikipedia, etc.
This enables us to react to things that happen and that everyone thinks are important – for example, we can more feasibly integrate the Arab Spring into our encyclopedia.
People are trying to use legislation to set embargoes before we know where the embargo should end. If we allow enough open data while we are learning what is going on, we can do some research to determine when the value of data plateaus, the point past which no significant profit is lost by making the data openly available. We don’t yet have the data to determine the “value curve” for each different type of article, each journal, etc. So we get a blunt instrument (all articles embargoed from open access for 6 months). We could provide far more nuance. If we worked together we could gather the data and get policy for open access that actually works.
We don’t have enough people who can handle or who understand data to start making rules yet. Let 1000 flowers bloom. If you publish, you ought to make your data available.
Wilbanks pointed out that people can route around you. If you put up big fences around your content, people can find other resources in a big distributed network. So embrace weak, simple, open practices.
Will the current publishing companies create the future? Wilbanks observed that the new normal is rarely built by the old normal (e.g., AT&T did not put in a bid to help develop the internet.)
Someone asked about the importance of established journals for getting academic credit – doesn’t blogging, etc., begin to undermine that?
Wilbanks noted that you don’t get credit for saying something first, you get credit for demonstrating that something is true first.
ORCID, a unique author ID system now in development, could have a big impact on the way academic credit is counted.
There was more discussion of data curation, and Wilbanks said that he would love to see the library sciences fully embrace data curation. Where do you want the scholarly record to reside? Probably not in companies and probably not in nonprofits, all of which morph and evolve and fail way more often than institutions like Ivy League universities.
What does each mode of curation – self-curation, institutional curation, and third-party curation – contribute to good data curation? The person who is closest to the data can make it possible for it to be curated by a third party later: you need the right metadata to be supplied by the original investigator, but we need to give scholars tools to lower that burden. Preserve enough information around the data for later curation. Then there is the preservation function, best done by institutions. Then the ability to come in and add meaning – the true goal of third-party curation. Let unexpected third parties come into the picture.