Calling haploid consensus sequence

For some reason, calling a haploid consensus sequence from a VCF seems harder than it needs to be.

I've experimented with samtools mpileup and bcftools call/consensus with much frustration and little success, as they always want to call heterozygous positions, which I don't want.

In the end, the easiest way I have found to do this is to use freebayes:

freebayes -f ref.fa -p 1 aln.sorted.bam > vcffile

And then use vcf2fasta from vcflib to call a consensus:

vcf2fasta -f ref.fa -P 1 vcffile

This will spit out a file with the consensus sequence.

Of course, given that the VCF format is not really a format, trying to use vcf2fasta on VCFs produced by tools other than FreeBayes (VarScan, in my case) didn't work for me.
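For VCFs that vcf2fasta will not accept, the substitution step itself is simple enough to sketch by hand. The Python below is a minimal sketch under stated assumptions, not a description of what vcf2fasta does internally: it assumes a single-sample VCF with a haploid GT field, it only handles single-base substitutions (indels are skipped), and the function names called_alt and apply_haploid_snps are hypothetical, made up for illustration.

```python
def called_alt(fields):
    """Return the ALT allele called in the first sample, or None if no ALT is called."""
    alts = fields[4].split(",")
    if len(fields) < 10:
        return alts[0]  # assumption: no genotype columns, so take the first ALT
    keys = fields[8].split(":")
    if "GT" not in keys:
        return alts[0]  # assumption: no GT field, so take the first ALT
    gt = fields[9].split(":")[keys.index("GT")]
    # A haploid genotype is a single allele index: "0" is reference, "." is missing.
    # Diploid-style calls such as "0/1" are not plain digits and are skipped.
    if not gt.isdigit() or gt == "0":
        return None
    return alts[int(gt) - 1]


def apply_haploid_snps(ref_seq, vcf_lines, chrom):
    """Substitute called single-base ALT alleles into ref_seq for one contig."""
    seq = list(ref_seq)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[0] != chrom:
            continue
        ref, alt = fields[3], called_alt(fields)
        # Only simple SNPs are applied in this sketch; indels are ignored.
        if alt is None or len(ref) != 1 or len(alt) != 1 or alt in (".", "*"):
            continue
        idx = int(fields[1]) - 1  # VCF positions are 1-based
        if seq[idx].upper() != ref.upper():
            raise ValueError(f"REF mismatch at {chrom}:{fields[1]}")
        seq[idx] = alt
    return "".join(seq)
```

To use it, read the reference contig into a string, pass the lines of the VCF file and the contig name, and write the returned string out as FASTA. Because indels are skipped, the result is only a rough cross-check against the vcf2fasta output, not a replacement for it.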

Real time genomic surveillance of Ebola outbreak 2014-2015

The current Ebola outbreak in West Africa is the largest ever recorded, with over 26,500 cases reported resulting in an estimated 11,000 deaths. Yet genomic surveillance of this outbreak has been patchy, hampered by understandable but vexing logistical, social, political and technical obstacles in securing and transporting samples for processing.

We wanted to help address the gaps in our knowledge of viral evolution and to generate data for epidemiological use. So, in April, Josh Quick from my group went to Conakry, Guinea to establish proof-of-principle for portable nanopore sequencing. This was the most practical way we could rapidly establish a local sequencing lab in order to generate real-time information.

His travels have been documented in several recent news articles. For background I would recommend reading Erika Hayden's report over at Nature News, the BMC On Biology blog and this recent GenomeWeb article (registration free for academic subscribers).

In the two weeks he was there, he sequenced 14 genomes while based at Donka Hospital in Conakry. However, the surveillance sequencing has continued, thanks to the hard work of Sophie Duraffour in Coyah under the auspices of the European Mobile Laboratory project. Sophie has been working around the clock in the laboratory generating the real-time genome data, uploading it to Birmingham for analysis and then distributing it to WHO central coordination. We have had early feedback that the data has been extremely useful for the epidemiologists on the ground.

As is often the case in outbreaks, genomic data production and sharing has been patchy and uncoordinated. However, an exciting new development is under way to try and address this. Andrew Rambaut, author of essential phylogenetics software such as BEAST and FigTree and viral genome maven, has taken on a kind of unofficial role of coordinating genome sequence data, which is distributed through his website and forum.

His personal database of Ebola genomes sits at nearly 1000 sequences and he has been privately sharing some wonderful integrated phylogenetic analyses covering the entire Ebola outbreak. However, until recently the sharing has been limited by access to public data. At a recent conference at the Institut Pasteur, I met him and his colleague Richard Neher and discussed ways to improve sharing. Richard and Trevor Bedford are the developers of the nextflu website, which aims to track the real-time evolution of flu.

I said that we needed this for Ebola, and of course they had already thought of this and had started building something. I said that we would contribute our nanopore sequencing dataset to this project in real-time, and those with large datasets to compare also contributed theirs.

So it is a real thrill to see the website up and running now and available to use. On this website you can explore Ebola evolution during this outbreak, using controls to scroll through time, and restricting analysis to particular locations or laboratories. You can also zoom into particular clades, and see frequency distributions of specific mutations.

One thing that was particularly notable from the data integration is that, when our surveillance data from Guinea is compared with Ian Goodfellow's recently produced surveillance data from Sierra Leone, the two extant Guinean lineages overlap with cases from close to the Guinean border in Sierra Leone. This makes sense, and suggests that cross-country transmission may be occurring frequently.

We will be updating this website with new sequences generated by the EMLab until the end of the outbreak. We have decided to leave a one-week delay before releasing the data, so that WHO central coordination can see it first, and the data is limited to prefecture-level information without more specific locations.


I am at the incredibly impressive and huge ECCMID meeting in Copenhagen.

I've given a talk already on "So I have sequenced my organism .. what do I do now?" (organisers' title!). It is viewable here:

Tomorrow I am doing a "Meet-the-Expert" session about what tools to use for bacterial genome analysis. Feel free to look at my slides in advance and ask some questions (even if you aren't at ECCMID!)

One thing that is noticeable about this conference is how incredibly high-tech the conference website is. Talks are posted in near-real-time after they are given.

Here are some you should definitely check out!

Matt Holden, Whole genome sequencing for microepidemiological investigations: can person-to-person transmission be identified?

Ed Feil, Whole genome sequencing and public health: how can high-risk clones be identified and what can be learned for prevention and control?

Frank Aarestrup, Bacterial genome sequencing for outbreak detection

Diversity of P. aeruginosa in CF airways

The Gut and Lung Microbiota

Implications of microbiome alterations due to pro/antibiotics


Benefits of microbiome manipulation in reducing resistance

There's lots more at

SGM 2015 livenotes

Some notes from SGM 2015:

Microbiome session

Diabetes, obesity and gut microbiota
Patrice Cani 

1. Microbiota-host interactions play a major role in obesity

2. Intestinal MyD88 is a sensor switching host metabolism during fat feeding

3. Endocannabinoids are key players involved in the microbiota-host interaction

Intestinal epithelial MyD88 is a sensor switching host metabolism towards obesity according to nutritional status
The endocannabinoid system links gut microbiota to adipogenesis.

Adipose tissue NAPE-PLD controls fat mass development by altering the browning process and gut microbiota

Test from Torsten, not at SGM. Looked like an interesting talk!

Bacterial persisters
Sophie Helaine

Many species form persisters - highly tolerant to Abx. Persister cells represent a small sub-population caused by a phenotypic switch. E. coli in vitro persisters are non-replicating bacterial cells.


Bacterial persisters: formation, eradication, and experimental systems

Internalization of Salmonella by Macrophages Induces Formation of Nonreplicating Persisters

5-10% of the non-replicating bacteria resume growth.

Toxin-antitoxin systems related to persistence of E. coli in vitro.

Thoughts: how to link dead/alive/persistence state to metagenomics data (single cell RNA-Seq?)

Stephen Bentley

Pneumococcus human host-restricted, usually lives in nasopharynx.

No overall change in species prevalence post-PCV7. Subtle effects on resistance - generally remains stable.

Population genomics of post-vaccine changes in pneumococcal epidemiology

MDR not secret of success for antimicrobial resistance.

Association between high admixture and AMR previously shown

Consistency in recombination hotspots between lineages

Non-typeable clone most efficient recipient of DNA by recombination

MGEs in pneumococcal AMR

Prophage insertion in comYC genes blocks recombination in IC1 - !

Variable recombination dynamics during the emergence, transmission and ‘disarming’ of a multidrug-resistant pneumococcal clone

Fleming Prize Lecture

Michael Brockhurst

Sex, Death and the Red Queen

Running with the Red Queen: the role of biotic conflicts in evolution

Antagonistic coevolution accelerates molecular evolution.

Coevolution accelerates molecular evolution
Coevolution drives greater between-population divergence

In clinical samples:

Divergent, Coexisting, Pseudomonas aeruginosa Lineages in Chronic Cystic Fibrosis Lung Infections.

Highly parallel evolution of LES lineages: mexAB-oprM, creBCD, ampC, lasR, oprD, pmaA, etc. etc.

Rapid turnover of diversity within patients

Evidence for changes in diversity during exacerbations, and evidence for lineage 'switching' over time.

Single strain bacterial populations highly diverse
Most diversity is present in individual sputum samples
Diversity in clinically important traits like AbR and secreted molecules
Genetic data shows parallel evolution and patient-patient transmission.

What's driving diversification in CF lungs? (Immune system, Abx, species interactions, etc.)


Evolutionary adaptation to ASM environment by: 
- loss of motility structures esp flagellum
- metabolic and biofilm changes

Adding temperate phages to artificial sputum medium selects for a different set of mutations than seen in CF in-host evolution, e.g. pili, Type 6 secretion, flagellum, quorum sensing, etc.

Phage insertions cause several parallel mutations.

AMR in South Asia
Stephen Baker

Return to pre-AMR era

S. Typhi: monomorphic

Emergence of fluoroquinolone resistance, independent gyrA mutations.

Fitness benefits in fluoroquinolone-resistant Salmonella Typhi in the absence of antimicrobial pressure.

A high‐resolution genomic analysis of multidrug‐resistant hospital outbreaks of Klebsiella pneumoniae

K. pneumoniae is exceptional coloniser of surfaces, tubes, etc.

K. pneumoniae outbreak - two distinct lineages, acquired blaNDM-1. 

Longitude Prize

The test must: identify when antibiotics are needed and, if they are, which ones to use.

Test must be: needed, accurate, affordable, rapid (<30 min), easy-to-use, scalable, safe, a prototype must be available.

Easy-to-use: minimally invasive, easy to dispose, long expiration, heat stable, withstand transportation, minimum maintenance etc.

Modelling Clostridium difficile infection
Caroline Chilton 

In vitro human gut model: triple chemostat system arranged in weir cascade, primed with faecal slurry, validated against the caecal content of sudden death victims.

Antibiotics knock down Bifidobacterial populations. Bacteroides not affected by clindamycin but affected by vancomycin.

16S profiling matches colony counts well.

Observed diversity highest pre-antibiotic, lowest with recurrence. Fidaxomicin less effect on diversity than others.

Biofilm human gut model using rods


Mutation rate and genotype variation of Ebola virus from Mali case sequences

Epidemiological and viral genomic sequence analysis of the 2014 Ebola outbreak reveals clustered transmission

Michael Tunney, QUB

CF microbiome: culture studies show significant numbers of anaerobes (similar to P. aeruginosa).

Healthy airway microbiome quite similar to CF microbiome --- Streptococcus, Haemophilus, Rothia etc. but don't see Pseudomonas, Burkholderia etc.

Diversity decreased in CF

Diversity positively correlated with lung function

Decade-long bacterial community dynamics in cystic fibrosis airways

Lung explant microbiome study

Healthy microbiome cannot be cultured in late-stage CF infection.

William Wade, Oral Microbiome

50% of oral bacteria are uncultivable

Human Oral Microbiome

Most human oral bacteria are found only in the mouth, notable exception Fusobacterium nucleatum.

Intra-oral habitats have characteristic microbiota.

Diet has relatively little effect on oral microbiome.

Why do historical dental samples correlate diet with microbiome? A: Effect of dental hygiene.

Willem van Schaik

E. faecium and E. faecalis genetically distinct - penicillin resistant.

E. faecium clade A1 "clinical isolates" - highest mutation rate
E. faecium clade A2 "animal isolates" - medium mutation rate
E. faecium clade B "human commensal" - lowest mutation rate

Phylogeny of closely related strains: gene content mirrors phylogeny. Differences caused by gain/loss of plasmids and phage-like elements.

Hospital ICU microbiota: characterised by outgrowth of Enterococcus on long stays

Sewage resistome

Zamin Iqbal

75 out of 1607 samples have minor resistance calls

100% resistance correction on MDR S. aureus

Ebola virus

Prior to 2013: ~20 outbreaks, ~1,600 deaths, 25-90% mortality rate, 5 Ebolavirus species

Filovirus epidemic in 1956 in Bili, DRC - first Ebola outbreak?

Emergence of Ebola

25000 cases, 10000 deaths

Burial rites: involve touching the bodies, washing them. Mobile phone connectivity make it easier to gather more relatives for funerals.

SGM 2015 genomics presentations to watch out for

If you are coming to Birmingham for this year's Society for General Microbiology Annual Conference, welcome! This meeting is pretty big, and the website is curiously unnavigable, so I have put my own personal schedule up here.

A few notables! On Tuesday night there is the announcement of the new SGM journal Microbial Genomics in Hall 3; I am pleased to be serving on its editorial board. Lots of tweeps are likely to be there. Straight afterwards we will head to a local pub for a "tweet-up".

On Wednesday morning we will be launching our new BBSRC-funded grant MicrobesNG over at the stunning new Library of Birmingham, which is well worth a visit if you haven't been in. MicrobesNG aims to provide a very different type of sequencing service, specifically tailored to the needs of microbiologists. We will be running two sessions explaining exactly what we are doing, with refreshments provided. Please sign up over at the website if you would like to come along!

In terms of scientific sessions I will be mainly hopping between the antibiotic resistance session and the microbiome session, with a few detours for prize lectures and hot genomics lectures. During the breaks I will be over on the MicrobesNG trade-stand. Look forward to saying hello!

And if you are interested in nice places to eat in Birmingham, the Guardian just did a nice piece on cheap eats and we also keep a map of places we like over on my food blog.

Monday 30th

BI05 Microbiome in Health and Disease

09:00 The human microbiome in health and disease Julian R. Marchesi (Cardiff University, UK)

09:30 Diabetes, obesity and gut microbiota Patrice Cani (University of Louvain, Belgium)

BI21 Antimicrobial resistance

11:00 Salmonella persisters in the host Sophie Helaine (Imperial College London, UK)

11:30 A population genomics view of pneumococcal antimicrobial resistance Stephen Bentley (Wellcome Trust Sanger Institute, UK)

12:10 Fleming Prize Lecture – Rapid microbial evolution: From the lab to the clinic and back again Michael Brockhurst (University of York, UK)

14:00 Antimicrobial resistance issues and selective pressures in South East Asia Stephen Baker (Oxford University, UK)

14:30 Offered paper - Detection of NDM-1 positive pathogens and genes in the Ganges River associated with seasonal human migration to pristine areas David Graham (Newcastle University, UK)

14:45 Offered paper - Broad spectrum antimicrobial peptides derived from a bovine rumen Linda Oyama (Aberystwyth University, UK)

15:00 Small World Initiative Paul Hoskisson (University of Strathclyde, UK)

15:15 Longitude Prize Tamar Ghosh (NESTA, UK) and Laura Piddock (University of Birmingham, UK)

BI05 Microbiome in Health and Disease

16:00 Metabolomic characterisation of the gut microbiome and disease Elaine Holmes (Imperial College, London)

16:30 Diet and the gut microbiome Yolanda Sanz (National Research Council, Spain)

17:00 Modelling Clostridium difficile Infection Caroline Chilton (University of Leeds, UK)

Tuesday 31st

BI01 Natural and Unnatural Virus Evolution

09:15 Offered paper - Elucidating variations in the nucleotide sequence of Ebola virus associated with increasing pathogenicity Isabel García-Dorival (University of Liverpool, UK)

BI05 Microbiome in Health and Disease

09:30 Lung and normal airway microbiota and implications for cystic fibrosis Michael Tunney (Queen's University Belfast, UK)

10:00 The oral microbiome in health and disease William Wade (Barts and the London School of Medicine and Dentistry, UK)

11:00 Klebsiella pneumoniae population genomics and antimicrobial resistance Kathryn Holt (University of Melbourne, Australia)

11:30 Emergence of resistance in tuberculosis: Clinical and in vitro studies Stephen Gillespie (University of St Andrews)

BI21 Antimicrobial resistance

14:00 The human gut as reservoir for antibiotic resistance genes and opportunistic pathogens Willem van Schaik (University Medical Center Utrecht, Netherlands)

14:30 Mathematical modelling as a tool to explore unexpected aspects of antimicrobial resistance Robert Beardmore (University of Exeter, UK)

15:00 Offered paper - Displacement of stable bacterial plasmids by a self-transmissible pCURE plasmid as a means of reducing antibiotic resistance gene load Alessandro Lazdins (University of Birmingham, UK)

15:15 Offered paper - Enabling genomic-based antimicrobial susceptibility predictions in the clinic: case studies for S. aureus and M. tuberculosis Zamin Iqbal (University of Oxford, UK)

16:00 Offered paper - The infant airway microbiome in health and disease impacts later asthma development Kathryn Holt (University of Melbourne, Australia)

16:15 Offered paper - The Effects of Novel Dietary Interventions on Campylobacter and the Caecal Microbiome of Broiler Chickens Adrian Horton (Aberystwyth University, UK)

16:30 Phylogenetic assessment of microbiomes – how do we make it more democratic? Jeroen Raes (Vrije University, Belgium)

17:00 Microbe-host interactions in chronic intestinal inflammation - microbial dysbiosis versus pathobiont selection Dirk Haller (TU Munich, Germany)

17:35 Hot Topic Lecture: Ebola virus Hall 1

18:30 New Journal Announcement: Microbial Genomics Hall 3, ICC Birmingham

18:45 Straight after announcement: Tweet-up!

Wednesday 1st April

Over at the Library of Birmingham we will be hosting two workshops in the morning (09:00 - 10:15 and 10:30 - 10:45) to launch our new microbial genome sequencing and strain archiving service, MicrobesNG. Head over to the website to sign up if you want to find out more!

12:10 Marjory Stephenson Prize Lecture – What's the host and what's the microbe? ICC Birmingham Robin Weiss (University College London, UK)

BI21 Prokaryotic Genetics Forum

14:00 Bacterial protein glycosylation - never say never with bacteria Hall 11a Brendan Wren (London School of Hygiene and Tropical Medicine, UK)

14:30 Offered paper - Identification of DNA uptake sequences in Neisseria gonorrhoeae that are intrinsic transcriptional terminators using bioinformatics supported by RNA-seq Sabrina Roberts (Kingston University, UK)

14:45 Offered paper - Regulation of fimbrial genes in Enteroaggregative Escherichia coli Muhammad Yasir (University of Birmingham, UK)

15:00 Offered paper - Investigating the fitness implications of phase variation rate in Campylobacter jejuni using a cyclical selection assay based on phage and human sera Jack Aidley (University of Leicester, UK)

15:15 Offered paper - Expanding your horizons: phenotypic and genomic insights into very broad-host range phages isolated from Lake Michigan Siobhan Watkins (Loyola University Chicago, USA)

16:00 Offered paper - Evolution of Staphylococcus aureus after a human to livestock host-jump event Rodrigo Bacigalupe (The Roslin Institute, UK)