Goodbye and thank you to Birmingham!

Well, today is my last working day at the University of Birmingham, before I set off after Easter for a new position at the University of Warwick as Professor of Microbial Genomics and Head of a new Division of Microbiology and Infection in Warwick Medical School. I have been at the University of Birmingham since July 2001 and there is no question that my time here has been the pinnacle of my professional life (at least up till now: of course, the best is still to come!). I move on with a sense of gratitude at having had the privilege of getting to know and work with so many great people and leave with a head stuffed full of wonderful memories. Sadly for me, the co-author of this blog, Nick Loman, is not moving to Warwick with me, but has instead taken up a permanent position in Birmingham. But I draw comfort from the fact that geographically we will still be only 40 minutes' drive apart and psychologically will remain united in our appreciation of all things cool and quirky at the conjunction of sequencers, sequences and software!

This will be my last blog post here, but in any case Nick has already largely made this blog his own. I have set up a new blog for my new life in the new Division of Microbiology and Infection, drawing on an old title that Nick will recognise from when we first met in the late 1990s: the Microbial Underground: catch up on news from Warwick there after Easter and follow us (@WarwickMicrobio) on Twitter too!

I did start writing a long discursive ramble through my memories of my time in Birmingham, but in the end I have decided to sign off with some pictures that encapsulate all the good times! Goodbye Birmingham and thank you!

[gallery link="file"]

Crowd-sourcing killer outbreaks: Nice video from the BBSRC and Arran Frood

The BBSRC have done a nice job making a short video about the E. coli O104:H4 outbreak crowd-sourcing project, featuring little old me as well as the far more telegenic Lisa Crossman. Check it out, it's got some spooky music too.

[embed]http://www.youtube.com/watch?v=ttMnQIE-P-s[/embed]

Also please check out the OpenAshDieBack crowd-sourcing project currently ongoing, coordinated by the chaps at the John Innes Centre, The Sainsbury Lab and TGAC.

A chat with Oxford Nanopore's Clive Brown at AGBT 2013

Don't judge me, reader, because I'd skipped a session at AGBT to go and have a swim in the sea. A man can only spend so much time in dimly-lit, low-ceilinged hotel conference rooms, popping low-sugar sweets, before the will to live ebbs away.

On returning to the conference, passing the bar I spotted a distinctive bald head. Wait. I recognise that guy. Was it him?

I reversed and took another peek. Yes, it was Clive Brown, deep in a meeting. "Hello Clive!". He looked up, slightly grumpy to be interrupted mid-flow. "I'm Nick Loman, wasn't expecting to see you here!". Oh hello Nick. I catch a glimpse of a prototype MinIon on the table. "Hah, yeah, I've buried three of those on the beach. Tell everyone!"

We meet in the bar the next day. Clive talks at machine-gun pace, whilst fiddling with a prototype MinIon which is on the desk, repeatedly taking it apart and reassembling it, like a soldier checking his gun before battle. It feels weighty, substantial, larger than the version announced previously. It's got a mini-connector for USB3. "Feels a bit too expensive for a disposable sequencer, needs to be more plastic-y", I venture. Clive agrees.

Clive is angry. He feels he's been treated unreasonably by the community, and the press, since AGBT 2012's electrifying announcements. "I'm bloody sequencing single molecules directly on this little device here!". The implication is that no-one should be surprised it's taken longer than expected to be released. He is unapologetic.

Clive is angry. He's angry with the guy that marched to the nanopore suite at AGBT, banging and shouting through the door: "Where is your data? Show us the data!!".

Clive is angry with reporters who keep asking him the same questions: why they haven't released data, why they haven't fulfilled their promise of commercialisation by the end of 2012.

Clive has a list. A list of people that he says he'll see to it won't get a MinIon when it comes out. I can't tell if he's joking.

So why didn't you release some data, Clive? He tells me that the raw signal data is commercially valuable, that someone in the business could take the traces and reverse engineer details of their customised nanopore this way. The idea that other parties could steal information to further their own nanopore projects is a recurring theme in our chat.

So why didn't the MinIon come out in 2012? Technically, he lists several setbacks. The custom sensor microchip (ASIC) wasn't performing as they wanted, necessitating a redesign from scratch. "That put us back about 5 months, but it was the right thing to do". There have also been problems stabilising the lipid bilayer, which degrades over days and weeks. He set his team a new error-rate target of 1%, a major improvement from the 4% error rate announced at AGBT.

I venture the idea that even if the MinIon is a year, two years late, if it's half as good as he says it is, all will be forgotten. Like waiting for the next version of Quake or Grand Theft Auto.

"It's not going to be that long, we're going to start announcing stuff this year, including data from our early-access programme."

Why don't you engage with the community better? I suggest that no tweets and no web updates isn't a good look for a company with so many eyes on it. He says that they have to be careful about putting any information out there right now, in case it is used against them. He suggests that now Zoe McDougall, their communications director, is back from maternity leave, they will improve their communication with the community.

Technical breakthroughs. They've found that error rates can be improved by having multiple nanopores on the chip with different properties, and then merging the data. Some nanopores are better at recognising certain nucleotide signatures than others, and so they can be complementary. This is a hint that consensus accuracy might ultimately be important, a la Pacific Biosciences. ** see footnote

He's keen on the idea of nanopore as a disruptive technology for proteomics, citing the unfoldase that should permit proteins to pass through the pore.

Clive is a man under pressure. I genuinely got the impression that the company were caught off-guard by all the attention and had no idea they would be under such scrutiny.

"We didn't even know that long reads were so important to people until after that AGBT presentation." He explains that he set his team a technical challenge to go from 20kb to 50kb to 100kb, simply because he likes pushing them further than they think they can go. His focus on getting error rate down to 1% results from similar pushing, sometimes to the chagrin of the commercial side of the operation.

Clive is guarded, and regularly checks himself, ensuring he doesn't say anything that would "get him in trouble".

"You know what, I hated doing that presentation at AGBT. I had to hide in my room for two days afterwards."

"I'm not Jonathan Rothberg".

What do I think? I find it hard to simply write off nanopore as vapourware, as some seem happy to do. There is a great group of people in this company, and frankly it just wouldn't be cricket to promise so much without delivering. I will wait and see. I feel sure the conversation will have moved on by AGBT 2014.

"I want to believe" as they might say on the X-Files.

Plus, I don't want to end up on Clive's list.

 

** Clive has written to clarify this point: I didnt mean that as an alternative to raw read, but it came up repeatedly during the conference that a number of early access groups are trying to do major projects to "improve the reference" of their given organism. They are currently mixing a number of short reads from different technologies and without the long reads, they have difficulty assembling (a major use of PB data). I have noticed that with two pores (or more) we effectively have two orthogonal error modes, which means this kind of improved reference, with assembly, can be done economically on one platform – which should be a lot easier.

Getting real allele frequencies in VCF files

Today’s problem was getting real allele frequencies in VCF files produced by samtools mpileup/bcftools. I tend to use bwasw-mpileup-bcftools as my default SNP calling pipeline, for no other reason than I am familiar with it.

For calling variants within bacterial genomes the issue is that samtools assumes a diploid organism, and so all variant calls are forced to fit the model of categorical allele frequencies, e.g. 0 (reference), ~0.5 (heterozygote) or 1 (homozygote). This is still usable for most applications, but I find it more intuitive to think about true allele frequencies when dealing with haploid organisms. Allele frequency is useful either for population-based studies (as you are actually sequencing a bunch of bacterial cells) or for an indication as to whether you are accidentally calling in repetitive regions. In many bacterial genome papers a cut-off of 0.9 is commonly used to threshold real SNPs.
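To make the idea of a "true" allele frequency concrete, here is a minimal sketch that computes it from the number of reads supporting each allele and applies the 0.9 cut-off mentioned above. The function names and the read counts are made up for illustration; they are not taken from samtools or any other tool.

```python
def allele_frequency(ref_reads, alt_reads):
    """Fraction of reads at a site that support the variant allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def passes_cutoff(ref_reads, alt_reads, cutoff=0.9):
    """True if the variant allele frequency reaches the cut-off."""
    return allele_frequency(ref_reads, alt_reads) >= cutoff


# Hypothetical sites: (reference-supporting reads, variant-supporting reads)
sites = {"clean SNP": (2, 98), "possible repeat": (45, 55)}
for name, (ref, alt) in sites.items():
    print(name, round(allele_frequency(ref, alt), 2), passes_cutoff(ref, alt))
```

A site sitting near 0.5 in a haploid genome is the sort of call I would want to eyeball for repeats or a mixed population, rather than accept as a heterozygote.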

You can get read depth per genotype from samtools (with the -D flag to mpileup), but you do not get the number of reads that support the reference or the variant call.

I asked the Twitter hive mind for help with this issue, and this is what they came up with. Many thanks as always!

From Zev Kronenburg: “You could try using SNVER. I have been using it on pooled pox viral pops. Super easy format”.

From Casey Bergman: “I think @aaronquinlan just put something together for this - try piledriver at https://github.com/arq5x/piledriver”

From Jeramia Ory: “I’ve been using FreeBayes, the VCF it produces has read counts, and supports arbitrary ploidy & pooled reads.”

It later occurred to me that Dan Koboldt’s VarScan2 will also give you this information.
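Once a caller does report per-allele read counts, pulling out the allele frequency is straightforward. The sketch below assumes a VCF whose INFO column carries RO (reference observations) and AO (alternate observations) tags, which is my understanding of what FreeBayes writes; check the header of your own VCF, as other callers use different tags. The record itself is invented for the example and the function names are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=100;RO=3;AO=97' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    return info


def alt_frequencies(vcf_line):
    """Return (chrom, pos, ref, [(alt, frequency), ...]) for one VCF record.

    Assumes RO/AO INFO tags holding reference/alternate read counts;
    AO may be comma-separated when there is more than one ALT allele.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    alts = fields[4].split(",")
    info = parse_info(fields[7])
    ref_count = int(info["RO"])
    alt_counts = [int(n) for n in info["AO"].split(",")]
    total = ref_count + sum(alt_counts)
    if total == 0:
        raise ValueError("no supporting reads at %s:%d" % (chrom, pos))
    freqs = [(alt, count / total) for alt, count in zip(alts, alt_counts)]
    return chrom, pos, ref, freqs


# Made-up record for illustration only (not real data).
record = "chr1\t1234\t.\tA\tG\t500\t.\tDP=100;RO=3;AO=97"
print(alt_frequencies(record))
```

Header lines (those starting with #) would need to be skipped before calling this on a real file.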

Applied Bioinformatics & Public Health Microbiology: 15 – 17 May 2013

The awesome ABPHM meeting is back in 2013! This is a really nice conference that I am very happy to help organise. It's a bit different from other public health microbiology conferences in that it specifically aims to bring together public health microbiologists and epidemiologists with bioinformaticians. Once we have everyone in the same room, we try and understand what each other does a little better!

High-throughput sequencing has been high on the agenda for the past few meetings, and I expect this to be the case again. However, I expect this meeting will start to focus on the practical and logistical aspects of getting WGS of bacterial isolates into the microbiology lab for routine use in hospital and community outbreak tracing and surveillance of important pathogens.

The last meeting in 2011 was notable for taking place right in the middle of the E. coli O104:H4 outbreak in Germany, and BGI released reads from an isolate that triggered the crowd-sourcing initiative during the meeting!

My job on the committee, alongside Jon Green from the HPA, is to try and represent the bioinformaticians, and so I am really pleased that we have a couple of high-profile international speakers who know their way around a bash shell: Aaron Darling, of Mauve/progressiveMauve fame (until recently of Jonathan Eisen's lab), will be talking about his tools and work, as will Torsten Seemann, author of incredibly useful assembly tools including VelvetOptimiser.

Julian Parkhill and Sharon Peacock will both be speaking about their current work, always something to look forward to.

Also of note is that Oxford Nanopore and Illumina are sponsoring the meeting!

If you do bioinformatics for infectious disease outbreaks or public health surveillance, I strongly recommend you register. Also put in an abstract (deadline 20th March!) as we like to select many talks from submissions. It is a unique grouping of people and the talks are always good. This time it will be held at the Moller Centre which is near Cambridge city centre rather than in Hinxton. It's a really nice venue, and it means that instead of heading to the Red Lion, this time we can take an out-trip to The Eagle and perhaps also The Panton Arms.

Head over to the event website for the agenda and the registration form.