Thursday, March 9, 2017

Data management for complete beginners

Bill Michener is a longtime advocate of data management and archiving practices for ecologists, and I was lucky to catch him giving a talk on the topic this week. It clarified for me the value of formalizing data management plans for institutions and lab groups, but also the gap between recommendations for best practices in data management and the reality in many labs.

Michener started his talk with two contrasting points. First, we are currently deluged by data. There is more data available to scientists now than ever, perhaps 45,000 exabytes by 2020. On the other hand, scientific data is constantly lost. The longer it has been since a paper was published, the less likely it is that its data can be recovered (one study he cited showed that data had a half-life of 20 years). There are many causes of data loss, some technological, some due to changes in sharing and publishing norms. The rate at which data is lost may be declining, though. We're in the middle of a paradigm shift in how scientists see our data. Our vocabulary now includes concepts like 'open access', 'metadata', and 'data sharing'. Many related initiatives (e.g. GenBank, Dryad, GitHub, GBIF) are fairly familiar to most ecologists. Journal policies increasingly ask for data to be deposited into publicly available repositories, computer code is increasingly submitted during the review process, and many funding agencies now require statements about data management practices.

This has produced huge changes in typical research workflows over the past 25 years. But data management practices have advanced so quickly that there’s a danger some researchers will begin to feel that good data management is unattainable, given the time, expertise, or effort involved. I feel like data management is sometimes presented as a series of unfamiliar (and often changing) tools and platforms, and this can make it seem hard to opt in. It’s important to emphasize that good data management is possible without particular expertise, and in the absence of cutting-edge practices and tools. What I liked about Michener's talk is that it presented practices as modular ('if you do nothing else, do this') and as incremental. Further, I think the message was that this paradigm shift is really about moving from a mindset in which data management is done post hoc ('I have a bunch of data, what should I do with it?') to considering how to treat data from the beginning of the research process.

Hierarchy of data management needs.

Once you make it to 'Share and archive data', you can follow some of these great references.

Hart EM, Barmby P, LeBauer D, Michonneau F, Mount S, Mulrooney P, et al. (2016) Ten Simple Rules for Digital Data Storage. PLoS Comput Biol 12(10): e1005097. doi:10.1371/journal.pcbi.1005097

James A. Mills, et al. Archiving Primary Data: Solutions for Long-Term Studies, Trends in Ecology & Evolution, Volume 30, Issue 10, October 2015, Pages 581-589, ISSN 0169-5347. (lots of references on reproducibility)

K.A.S. Mislan, Jeffrey M. Heer, Ethan P. White, Elevating The Status of Code in Ecology, Trends in Ecology & Evolution, Volume 31, Issue 1, January 2016, Pages 4-7, ISSN 0169-5347.

Thanks to Matthias Grenié for discussion on this topic.

Tuesday, January 24, 2017

The removal of the predatory journal list means the loss of necessary information for scholars.

We at EEB & Flow periodically post about trends and issues in scholarly publishing, and one issue that we keep coming back to is the existence of predatory Open Access journals. These are journals that abuse a valid publishing model to make a quick buck, apply clearly substandard editorial practices, and subvert the normal scholarly publishing pipeline (for example, see: here, here and here). In identifying those journals that, through their publishing model and activities, are predatory, we have relied heavily on Beall's list of predatory journals. This list was created by Jeffrey Beall, with the goal of giving scholars the information they need to make informed decisions about which journals to publish in and to avoid those that likely take advantage of authors.

As of a few days ago, the predatory journal list has been taken down and is no longer available online. Rumour has it that Jeffrey Beall removed the list in response to threats of lawsuits. This is really unfortunate, and I hope that someone who is dedicated to scholarly publishing will assume the mantle.

However, for those who still wish to consult the list, an archive of it still exists online, found here.

Thursday, August 13, 2015

#ESA100 The big-data era: ecological advances through data availability

Ecology is in a time of transition, from small-scale studies being the norm to large, global datasets employed to test broad generalities. Along with this ‘big data’ trend comes a change in the ethical responsibility of scientists who receive public funds to share their data and ensure public access. As a result, big online data repositories have been popping up everywhere.

One thing that I have been doing while listening to talks, or talking with people, is to make note of the use of large online databases. It is clear that the use of these types of data has become commonplace. So much so, that in a number of talks, the speakers simply referred to them by acronyms and we all understood what it was that they used. Here are examples of online data sources I heard referenced (and there are certainly many more):

It seems difficult to keep track of all the different sources of available data, and these repositories differ in their openness to public access, with some requiring registration, permission requests, or the inclusion of data submitters as authors on publications. With GenBank as the gold standard for a data repository, it is inevitable that other types of ecological data will soon be required to be freely available. I've never figured out why genetic data has different accessibility expectations than, say, leaf trait data.

Despite the attractiveness of huge amounts of data available online, such data can only paint broad pictures of patterns in nature and cannot capture small-scale variability very well (Simberloff 2006). We still require detailed experiments and trait measurements at small scales for things like within-species trait variability.

Ecology has grown, and will continue to do so as data is made available. Yet, the classic ecological field experiment will continue to be the mainstay for ecological advancement into the future.

Simberloff, D. (2006) Rejoinder to Simberloff (2006): don't calculate effect sizes; study ecological effects. Ecology Letters, 9, 921-922.

Tuesday, February 3, 2015

Predatory open access journals: still keep'n it classy

As most academics are aware, there are hundreds of predatory open access journals that try to trick authors into submitting to their journals, charge exorbitant fees, and do not ensure that articles are peer reviewed or live up to basic scientific standards. The most celebrated cases are journals that embarrassingly publish nonsensical fake papers. I don't know why, but I sometimes go to the journal websites to see what they publish or who is on their editorial boards. This morning I received an e-mail from one such journal: SOJ Genetic Science, published by Symbiosis, a recognized predatory publisher. This journal, unlike others, actually has a single published issue with an editorial! I thought: "wow, are they trying to be legitimate?"; then I read the editorial. The editorial is probably best described as a nonsensical diatribe about genetics, which lacks any real connection to modern genetic theory. Here is my favourite paragraph:


Friday, October 26, 2012

Open access: where to from here?

Undoubtedly, readers of this blog have: a) published in an open access (OA) journal; b) debated the merits of an OA journal; and/or c) received spam from shady, predatory OA journals (I know my grad students have 'made it' when they tell me they got an e-mail invite to submit to the Open Journal of the Latest Research Keyword). Now that we have had OA journals operating for several years, it is a good time to ask about their meaningfulness for research and researchers. Bob O'Hara has recently published an excellent reflection on OA in the Guardian newspaper, and it deserves to be read and discussed. Find it here.

Wednesday, June 9, 2010

Another reason why a new publishing model is needed...

The finances and ethics of scientific publishing are complex, and there is an inherent tension between commercial publishers and academics and their institutions. On the one hand, we as scientists are (most often) using public money to carry out research, usually in the public interest, and then we typically publish in for-profit journals that restrict public access to our publications. Authors seldom see any of the financial return from publisher profits. On the other hand, publishers provide a level of distribution and visibility for our work, which individual authors could not match. In previous posts I have discussed Open Access publications, but there is another reason to consider other publication models. Recently Nature Publishing Group notified the University of California system of an impending 400% increase in the cost for their publications. The UC administration has responded with an announced plan to boycott NPG publications. The announcement rightly points out a 400% increase is not feasible given the current plight of library budgets, especially in California, and that scientists in the UC system disproportionately contribute to publishing, reviewing and editing NPG publications and thus are the engine for NPG profits. (See a nice story about the boycott in The Chronicle of Higher Education)

This is just the latest symptom of the growing tension between publishing and academia, and is a stark reminder that other publishing models need to be actively supported. Perhaps the UC system could invest in open access publishers in lieu of NPG's outrageous costs? Something has to give, and perhaps the UC boycott will remind libraries that they hold the purse strings and could be the greatest driving force for change.

Saturday, October 17, 2009

The making of an open era

With the availability of open access (OA) journals, academics now have a choice to make when deciding where to send their manuscripts. The idealistic version of OA journals represents a 'win-win' for researchers. The researchers publishing their work ensure the widest possible audience and research has shown a citation advantage for OA papers. The other side of the 'win-win' scenario is that researchers, no matter where they are, or how rich their institution, get immediate access to high-caliber research papers.

However, not all researchers have completely embraced OA journals. There are two commonly articulated concerns. The first is that many OA journals are not indexed, most notably in Thomson Reuters Web of Knowledge, meaning that a paper will not show up in topic searches, nor will citations be tracked. I for one do not like the idea of a company determining which journals deserve inclusion, thus affecting our choice of journals to submit to.

The second concern is that some OA journals are expensive to publish in. This is especially true for the more prestigious OA journals. Even though such OA journals often provide cash-strapped authors the ability to request a cost deferment, the perception is that you generally need to allocate significant funds for publishing in OA journals. This cost may be justifiable to an author for inclusion in a journal like PLoS Biology, because of the level of readership and visibility. However, there are other, new, profit-driven journals, which see the OA model as a good business model, with little overhead and the opportunity to charge $1000-2000 per article.

I think that the first concern is easing: with the rise of Google Scholar and tools to assess impact factors (e.g., Publish or Perish), other ways of finding and assessing articles from different sources are now available. The second concern is a little more serious, and a broad-scale solution is not readily apparent.

Number of Open Access journals

Regardless, OA journals have proliferated in the past decade. Using the directory of biology OA journals, I show above that the majority of OA journals have appeared after 2000. Some of these have not been successful, having faltered after a few volumes, such as the World Wide Web Journal of Biology, which published nine volumes with the last in 2004. I am fairly confident that not all these journals can be successful, but I hope that enough are. By having real OA options, especially higher-profile journals, research and academia benefit as a whole.

Which journals become higher profile and come to be viewed as attractive places to submit a paper is determined by a complex process that depends on a strong and dedicated editorial staff and on emergent properties of the articles submitted. I hope that researchers out there really consider OA journals as a venue for some of their papers and become part of the 'win-win' equation.

Friday, February 20, 2009

Increased access to science, but who gets to publish?

What role will open access (OA) journals play as science publishing increasingly moves to the internet and involves a more diverse array of participants? In a recent short article in Science, Evans and Reimer tried to answer this by using citation rates from 8253 journals to examine trends in citation rate shifts. They found that researchers from wealthier countries were not likely to shift to citing OA journals while researchers from poorer countries did. The authors conclude that the overall shift to citing OA journals has been rather modest, but these journals have increased inclusion for researchers at institutions in poorer countries that cannot afford commercial subscriptions. However, there is an unfortunate flip side to the OA model: paying to publish. Most OA journals recoup the lack of subscription earnings by placing the financial onus on the publishing scientists. This means that while researchers from poorer countries can now read and cite current articles in OA journals, they are still limited in their ability to publish in them. True, most OA journals allow researchers lacking funds to defer costs, but there is usually a cap on how often this can be done.

J. A. Evans, J. Reimer (2009). Open Access and Global Participation in Science. Science, 323 (5917), 1025. DOI: 10.1126/science.1154562