All the cool kids are on arXiv and Haldane's Sieve: why you should be too

Something very exciting has happened in recent weeks on arXiv, the preprint server which many biologists believe is the preserve of angry physicists, beardy mathematicians and unwashed computer scientists (joke!!!).

Not any longer. I first felt a disturbance in the force in September, when a few high-profile human genomicists started pledging to send all their manuscripts to arXiv first, including the angriest man in biology, Michael Eisen. I'm pretty sure genomics wunderkinds Daniel MacArthur and Joe Pickrell are also planning to send their manuscripts there first. The venerable Ewan Birney is also thinking of getting in on the action, tweeting back in August:

 "Ah. Bugger. Scooped by George Church on arbitrary DNA storage. Our paper is in review <sigh>. (wish we had posted on arXive now"

Things are happening!

Those working in human population genetics and paleogenomics are already posting fascinating, high-impact manuscripts. See for example "The Date of Interbreeding between Neandertals and Modern Humans" from a team including Svante Pääbo. Joe Pickrell and crew posted a detailed study on "The Genetic Prehistory of Southern Africa", focusing on groups whose languages use click consonants.

But as interesting and inspiring as these papers are, I am more interested in bioinformatics, microbial genomics, evolution and ecology, so they don't really impact my day job. Recently, though, things have got even more interesting, by which I mean microbial. Witness:

Posted 15 October 2012: Species Identification and Unbiased Profiling of Complex Microbial Communities Using Shotgun Illumina Sequencing of 16S rRNA Amplicon Sequences (Haldane's Sieve, arXiv). This paper describes a novel method that uses shotgun sequencing of long 16S amplicons to permit species-level assignments.

Posted 28 September 2012: Horizontal gene transfer may explain variation in θs (Haldane's Sieve) - a paper from Lenski, no less. It offers a possible explanation for the intriguing finding of potential mutational "hotspots" in the E. coli genome, published in Nature by Inigo Martincorena and Nicholas Luscombe. The Martincorena paper suggested an "evolutionary risk management strategy", which challenges our fundamental understanding that mutations are acquired randomly and only subsequently selected for (demonstrated beautifully by Luria and Delbrück in 1943). Lenski, using data from his long-term E. coli evolution experiment, suggests that undetected recombination is instead the likely cause of these mutational hotspots.

Posted 13 October 2012: A 454 survey of the community composition and core microbiome of the common bed bug, Cimex lectularius, reveals significant microbial community structure across an urban landscape. Notable for being one of the first microbial ecology studies posted to arXiv, and obviously bed bugs are kind of cool/gross.

Posted 19 September 2012: Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data (Haldane's Sieve) - a really interesting bit of software which may be useful for haplotype reconstruction in metagenomics and pooled sequencing experiments. I plan to give this a whirl and feed my findings back to the authors.

Posted 1 October 2012: Best Practices for Scientific Computing (Haldane's Sieve) - not specifically microbial or biological, but a useful treatise on writing better code, relevant to anyone writing bioinformatics software and pipelines, or even just doing analysis.

So, a little taster there.

Make no mistake: these are all quality manuscripts. They aren't being dumped there because they couldn't get past the PLoS ONE reviewers, or for some equally banal reason.

Why are they there? My interpretation is that these authors get it, and are posting to arXiv to take advantage of the particular benefits of a preprint server. Firstly, there is the immediacy of getting your work out there: simply submit the PDF and it's available to everyone. Your manuscript then gets a permanent home with a citable arXiv identifier. Submitting to arXiv may also help with establishing priority.

For me, even more useful than these things is the benefit of publishing to a self-selected audience who are genuinely interested in the subject, and who actively wish to read and critique such papers out of professional curiosity, not just because they are lucky/unlucky enough to be selected as peer reviewers. On arXiv, the "vibe" seems very different from that of the now-closed Nature Precedings, which honestly did sometimes feel like a dumping ground for unloved or hurried manuscripts.

A potential worry for these authors is that, although they have deposited in arXiv, the community as a whole may not be looking there (arXiv is not indexed by PubMed), so their papers may not be routinely cited by others simply because they weren't seen. Hence this blog post, a small attempt to draw attention to this exciting development!

So I've talked a lot about arXiv; where does Haldane's Sieve come in? It is simply a blog run by Graham Coop, Bryan Howie and Joe Pickrell. It matters because arXiv provides no facility for commenting on manuscripts, preferring that individual communities figure out the best way to discuss articles (and sensibly recognising that this may not be a single place, something that even the open-access publishers can't really understand).

In maths and physics this is usually done on listservs, but in genomics and biology I guess we are more comfortable with the blog format for discussion, hence the choice of WordPress. Haldane's Sieve finds new postings on arXiv, mainly in the field of population genetics, and then posts summary articles for you to comment on. In future we may need a similar site for microbial genomics and ecology, but for now things aren't so busy that this nascent community needs splitting up. Another place to find links is Twitter, e.g. by following me (shameless plug).

It seems to be working: the discussion of Lenski's paper has already drawn a vigorous response from Inigo Martincorena, the likes of which you are unlikely to see in a published journal, and in my opinion all the better for its frankness and energy.

So, in summary, you should add Haldane's Sieve and the arXiv q-bio category to your feed reader if you want to spot exciting new articles and comment on them. And why not think about sending your next manuscript to arXiv first? (No, it doesn't prevent you from publishing in peer-reviewed journals.)