First SQK-MAP-006 experiment

We've finally found the time to break open the new SQK-MAP-006 kits from Oxford Nanopore. These kits are notable because they introduce the first really major changes to the chemistry in some time.

  • First up, the speed has more than doubled, from ~30 bp/s to ~75 bp/s. The assumption is that this will increase yields, but it will be interesting to see what effect, if any, it has on the quality profile. The worry would be that higher speeds increase the chance of missing events (transitions between signal levels), which would manifest as deletions after basecalling.
  • Secondly, the previous hairpin-motor complex (which enabled 2D reads and also stalled the complement strand) has been jettisoned in favour of a simpler setup. As I understand it, the hairpin remains (it is now biotinylated and pulled down by beads to ensure very high 2D yields), but the second motor has gone. I assume the new motor is clever enough to stall both the template and complement strands. It will be interesting to compare the translocation times of the two strands (in SQK-MAP-005 the complement strand went through the pore more slowly, as it was retarded by two enzymes).
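The worry about missed events can be made a little more concrete with a toy model. The sketch below assumes, purely for illustration, that the time the strand spends at each position is exponentially distributed with mean 1/speed, and that the event detector cannot resolve any dwell shorter than a fixed minimum. The 5 ms detection limit and 3 kHz sampling rate are hypothetical values, not ONT specifications.

```python
import math


def p_missed(speed_bps: float, t_min_s: float) -> float:
    """Probability that one translocation step is too short to detect.

    Toy model (an assumption): dwell times are exponential with mean
    1 / speed_bps, and any dwell shorter than t_min_s is missed.
    """
    return 1.0 - math.exp(-speed_bps * t_min_s)


def mean_samples_per_base(speed_bps: float, sample_rate_hz: float) -> float:
    """Average number of raw current samples recorded per base."""
    return sample_rate_hz / speed_bps


T_MIN = 0.005         # hypothetical 5 ms minimum resolvable dwell
SAMPLE_RATE = 3000.0  # hypothetical 3 kHz sampling rate

for label, speed in [("SQK-MAP-005", 30.0), ("SQK-MAP-006", 75.0)]:
    print(f"{label}: {mean_samples_per_base(speed, SAMPLE_RATE):.0f} samples/base, "
          f"P(missed step) = {p_missed(speed, T_MIN):.3f}")
```

Under these made-up parameters the chance of an unresolvable step rises from roughly 0.14 to roughly 0.31, and the average number of samples per base drops from 100 to 40. The real detector and dwell-time distribution will differ; the point is only the direction of the effect.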

The new chemistry is accompanied by a new Metrichor basecaller workflow specific to SQK-MAP-006.

A notable change, looking at the returned FAST5 files, is that the model now considers signal levels for each of the 4^6 possible 6-mers when basecalling; previously, 5-mers were used. Does this mean that the ionic flux through the nanopore is in fact affected by 6 or more bases, rather than the 5 that we initially assumed? Or was 5 simply chosen to simplify the analysis? If the latter (and this seems likely), the move to 6-mers may help with basecalling accuracy, and it will be interesting to see whether it resolves any previously difficult-to-sequence motifs (we looked at such under-represented sequences, in the context of 5-mers, in our recent paper).
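The jump from 5-mers to 6-mers quadruples the number of distinct k-mers the model has to describe, from 4^5 = 1,024 to 4^6 = 4,096, while each single-base movement of the strand still shifts the k-mer in the pore by one position. A minimal sketch of that bookkeeping (an illustration only, not Metrichor's model):

```python
from itertools import product


def all_kmers(k: int) -> list[str]:
    """Enumerate every possible DNA k-mer in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmers_in_read(seq: str, k: int) -> list[str]:
    """Overlapping k-mers a strand steps through, one per base of movement."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(len(all_kmers(5)), len(all_kmers(6)))  # 1024 4096
print(kmers_in_read("GATTACAG", 6))          # ['GATTAC', 'ATTACA', 'TTACAG']
```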

Basecalling older, pre-SQK-MAP-006 data with the new 6-mer basecaller does not appear to be supported.

So far we have done four SQK-MAP-006 runs. Two were generated with natural DNA, and two were generated with the low-input library that includes a PCR step.

Each of the files below is an archive of a run following basecalling with Metrichor. We also provide a subset of one of the runs in 'raw' format, which contains the individual signal measurements (i.e. before event detection is carried out).
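For readers unfamiliar with event detection: it segments the raw current trace into runs of roughly constant level, each summarised by its start, length, mean and standard deviation. The sketch below uses a simplified two-window t-statistic change-point approach; the window size, threshold and the `detect_events` function are assumptions for illustration, and the detector used in the production MinION software may differ in its details.

```python
from statistics import mean, pstdev, variance


def detect_events(signal, window=5, threshold=4.0):
    """Segment a raw current trace into events (simplified sketch).

    For each position, compare the `window` samples before it with the
    `window` samples after it using a two-sample t-like statistic; local
    peaks of that statistic above `threshold` become event boundaries.
    Both parameters are illustrative assumptions, not production settings.
    """
    signal = list(signal)
    n = len(signal)
    tstat = [0.0] * n
    for i in range(window, n - window + 1):
        left, right = signal[i - window:i], signal[i:i + window]
        # Standard error of the difference in means (equal window sizes).
        se = ((variance(left) + variance(right)) / window) ** 0.5
        # Small constant avoids division by zero on perfectly flat windows.
        tstat[i] = abs(mean(left) - mean(right)) / (se + 1e-9)

    # Boundaries are local peaks of the statistic that exceed the threshold.
    boundaries = [0]
    for i in range(1, n - 1):
        if tstat[i] > threshold and tstat[i] >= tstat[i - 1] and tstat[i] > tstat[i + 1]:
            boundaries.append(i)
    boundaries.append(n)

    # Summarise each segment as (start, length, mean level, standard deviation).
    return [
        (start, end - start, mean(signal[start:end]), pstdev(signal[start:end]))
        for start, end in zip(boundaries, boundaries[1:])
    ]


# Idealised, noise-free trace with three current levels (made-up values).
trace = [100.0] * 20 + [120.0] * 20 + [90.0] * 20
for event in detect_events(trace):
    print(event)
```

Real traces are noisy, so a practical detector also needs a sensible threshold and usually a minimum event length; missed events at higher speeds correspond to boundaries that this kind of statistic fails to pick out.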

More files to follow.

Run            Basecalled data (archive size)     Raw data           2D pass FASTA
MAP-006-1      MAP-006-1 basecalled (120 Gb)      -                  MAP-006-1 2D pass FASTA
MAP-006-2      MAP-006-2 basecalled (75 Gb)       -                  MAP-006-2 2D pass FASTA
MAP-006-PCR-1  MAP-006-PCR-1 basecalled (64 Gb)   -                  MAP-006-PCR-1 2D pass FASTA
MAP-006-PCR-2  MAP-006-PCR-2 basecalled (154 Gb)  MAP-006-PCR-2 raw  MAP-006-PCR-2 2D pass FASTA

Head over to Jared Simpson's blog to see some early results of using these data for assembly polishing.


As always, thanks to Josh Quick for his masterful library preparation technique.

Calling haploid consensus sequence

For some reason, calling a haploid consensus sequence from a VCF seems harder than it needs to be.

I've experimented with samtools mpileup and bcftools call/consensus with much frustration and little success, as they always want to call heterozygous positions, which I don't want.
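Why ploidy matters can be seen with a toy genotype caller: if the model allows only one allele per site, a heterozygous call simply cannot be made. The sketch below scores each candidate base with a simple per-read error model; the error model and the hypothetical `haploid_call` function are illustrative assumptions, not how FreeBayes or bcftools actually compute likelihoods.

```python
import math


def haploid_call(bases: list[str], error_rate: float = 0.01) -> str:
    """Pick the single most likely allele at one site, assuming ploidy 1.

    Simplified model (an assumption): each read base matches the true
    allele with probability 1 - e and equals any particular other base
    with probability e / 3. Because a haploid genotype holds exactly one
    allele, a heterozygous call is impossible by construction.
    """
    def log_lik(allele: str) -> float:
        return sum(
            math.log(1 - error_rate) if b == allele else math.log(error_rate / 3)
            for b in bases
        )

    return max("ACGT", key=log_lik)


# A mixed pileup still yields one allele rather than an A/G heterozygote.
print(haploid_call(list("AAAAAAAGGG")))  # A
```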

In the end, the easiest way I have found to do this is to use FreeBayes:

freebayes -f ref.fa -p 1 aln.sorted.bam > vcffile

Then use vcf2fasta from vcflib to generate the consensus:

vcf2fasta -f ref.fa -P 1 vcffile

This will spit out a file with the consensus sequence.

Of course, given that the VCF format is not really a format, trying to use vcf2fasta on VCFs produced by tools other than FreeBayes (VarScan, in my case) didn't work for me.
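Conceptually, the consensus step applies each called variant to the reference. A simplified sketch, assuming the variants have already been parsed into (POS, REF, ALT) tuples; the `apply_haploid_variants` helper is hypothetical and skips much of what vcf2fasta handles (sample genotypes, overlapping records, multi-allelic sites):

```python
def apply_haploid_variants(ref: str, variants: list[tuple[int, str, str]]) -> str:
    """Build a consensus by applying haploid variant calls to a reference.

    `variants` holds (POS, REF, ALT) tuples using VCF's 1-based POS.
    Overlapping records, multi-allelic ALTs and FILTER/genotype fields
    are deliberately ignored in this sketch.
    """
    seq = ref
    # Apply from right to left so earlier coordinates stay valid after indels.
    for pos, ref_allele, alt_allele in sorted(variants, reverse=True):
        start = pos - 1  # convert 1-based VCF position to a 0-based index
        if seq[start:start + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF mismatch at position {pos}")
        seq = seq[:start] + alt_allele + seq[start + len(ref_allele):]
    return seq


reference = "ACGTACGTAC"
calls = [(2, "C", "T"), (5, "AC", "A"), (9, "A", "AGG")]  # SNP, deletion, insertion
print(apply_haploid_variants(reference, calls))  # ATGTAGTAGGC
```

Checking that each REF allele matches the reference before substituting is a cheap guard against VCFs whose coordinates or normalisation differ from what the consensus step expects.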

Real time genomic surveillance of Ebola outbreak 2014-2015

The current Ebola outbreak in West Africa is the largest ever recorded, with over 26,500 cases reported resulting in an estimated 11,000 deaths. Yet genomic surveillance of this outbreak has been patchy, hampered by understandable but vexing logistical, social, political and technical obstacles in securing and transporting samples for processing.

We wanted to help address the gaps in our knowledge of viral evolution and to generate data for epidemiological use. So, in April, Josh Quick from my group went to Conakry, Guinea to establish proof-of-principle for portable nanopore sequencing. This was the most practical way we could rapidly establish a local sequencing lab in order to generate real-time information.

His travels have been documented in several recent news articles. For background I would recommend reading Erika Hayden's report over at Nature News, the BMC On Biology blog and this recent GenomeWeb article (registration free for academic subscribers).

In the two weeks he was there, he sequenced 14 genomes while based at Donka Hospital in Conakry. However, the surveillance sequencing has continued, thanks to the hard work of Sophie Duraffour in Coyah under the auspices of the European Mobile Laboratory project. Sophie has been working around the clock in the laboratory, generating the real-time genome data, uploading it to Birmingham for analysis and then distributing it to WHO central coordination. We have had early feedback that the data has been extremely useful for the epidemiologists on the ground.

As is often the case in outbreaks, genomic data production and sharing has been patchy and uncoordinated. However, an exciting new development is under way to try to address this. Andrew Rambaut, author of essential phylogenetics software such as BEAST and FigTree and viral genome maven, has taken on an unofficial role coordinating genome sequence data, which is distributed through his website and forum.

His personal database of Ebola genomes sits at nearly 1,000 sequences, and he has been privately sharing some wonderful integrated phylogenetic analyses covering the entire Ebola outbreak. However, until recently the sharing has been limited by access to public data. At a recent conference at the Institut Pasteur, I met him and his colleague Richard Neher and discussed ways to improve sharing. Richard and Trevor Bedford are the developers of the nextflu website, which aims to track the real-time evolution of flu.

I said that we needed this for Ebola, and of course they had already thought of this and had started building something. I said that we would contribute our nanopore sequencing dataset to this project in real-time, and those with large datasets to compare also contributed theirs.

So it is a real thrill to see the website up and running and available to use now. On this website you can explore Ebola evolution during this outbreak, using controls to scroll through time and to restrict the analysis to particular locations or laboratories. You can also zoom into particular clades and see frequency distributions of specific mutations.
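At its core, a mutation frequency view counts, per time window, the fraction of sampled genomes that carry a given mutation. A minimal sketch with made-up example records; the input format, the monthly binning and the `mutation_frequency_by_month` function are assumptions, not how the website actually processes its data:

```python
from collections import defaultdict
from datetime import date


def mutation_frequency_by_month(samples):
    """Fraction of sequenced genomes carrying a mutation, per calendar month.

    `samples` is an iterable of (collection_date, carries_mutation) pairs,
    a hypothetical simplification of real sequence metadata.
    """
    totals = defaultdict(int)
    carriers = defaultdict(int)
    for when, has_mutation in samples:
        key = (when.year, when.month)
        totals[key] += 1
        carriers[key] += int(has_mutation)
    # Return months in chronological order.
    return {key: carriers[key] / totals[key] for key in sorted(totals)}


# Made-up records, for illustration only.
samples = [
    (date(2015, 3, 2), False), (date(2015, 3, 20), True),
    (date(2015, 4, 5), True), (date(2015, 4, 9), True),
    (date(2015, 4, 28), False), (date(2015, 5, 11), True),
]
print(mutation_frequency_by_month(samples))
```

Restricting the analysis to a location or laboratory would simply mean filtering the records before counting.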

One particularly notable finding from the data integration: when our surveillance data from Guinea is compared with Ian Goodfellow's recently produced surveillance data from Sierra Leone, the two extant Guinean lineages overlap with cases from close to the Guinean border in Sierra Leone. This makes sense, and suggests that cross-country transmission may be occurring frequently.

We will be updating this website with new sequences generated by the EMLab until the end of the outbreak. We have decided to leave a one-week delay before releasing new sequences, so that WHO central coordination sees the data first, and the data is limited to prefecture-level information without more specific locations.


I am at the incredibly impressive and huge ECCMID meeting in Copenhagen.

I've already given a talk on "So I have sequenced my organism .. what do I do now?" (organisers' title!). It is viewable online.

Tomorrow I am doing a "Meet-the-Expert" session about which tools to use for bacterial genome analysis; feel free to look at my slides in advance and ask some questions (even if you aren't at ECCMID!).

One thing that is noticeable about this conference is how incredibly high-tech the conference website is. Talks are posted in near-real-time after they are given.

Here are some you should definitely check out!

Matt Holden, Whole genome sequencing for microepidemiological investigations: can person-to-person transmission be identified?

Ed Feil, Whole genome sequencing and public health: how can high-risk clones be identified and what can be learned for prevention and control?

Frank Aarestrup, Bacterial genome sequencing for outbreak detection

Diversity of P. aeruginosa in CF airways

The Gut and Lung Microbiota

Implications of microbiome alterations due to pro/antibiotics


Benefits of microbiome manipulation in reducing resistance

There's lots more on the conference website.

SGM 2015 livenotes

Some notes from SGM 2015:

Microbiome session

Diabetes, obesity and gut microbiota
Patrice Cani 

1. Microbiota-host interactions play a major role in obesity

2. Intestinal MyD88 is a sensor switching host metabolism during fat feeding

3. Endocannabinoids are key players involved in the microbiota-host interaction

Intestinal epithelial MyD88 is a sensor switching host metabolism towards obesity according to nutritional status
The endocannabinoid system links gut microbiota to adipogenesis.

Adipose tissue NAPE-PLD controls fat mass development by altering the browning process and gut microbiota

Test from Torsten, not at SGM. Looked like an interesting talk!

Bacterial persisters
Sophie Helaine

Many species form persisters - highly tolerant to Abx. Persister cells represent a small sub-population caused by a phenotypic switch. E. coli in vitro persisters are non-replicating bacterial cells.


Bacterial persisters: formation, eradication, and experimental systems

Internalization of Salmonella by Macrophages Induces Formation of Nonreplicating Persisters

5-10% of the non-replicating bacteria resume growth.

Toxin-antitoxin systems related to persistence of E. coli in vitro.

Thoughts: how to link dead/alive/persistence state to metagenomics data (single cell RNA-Seq?)

Stephen Bentley

Pneumococcus human host-restricted, usually lives in nasopharynx.

No overall change in species prevalence post-PCV7. Subtle effects on resistance, which generally remains stable.

Population genomics of post-vaccine changes in pneumococcal epidemiology

MDR not secret of success for antimicrobial resistance.

Association between high admixture and AMR previously shown

Consistency in recombination hotspots between lineages

Non-typeable clone most efficient recipient of DNA by recombination

MGEs in pneumococcal AMR

Prophage insertion in comYC genes blocks recombination in IC1 - !

Variable recombination dynamics during the emergence, transmission and ‘disarming’ of a multidrug-resistant pneumococcal clone

Fleming Prize Lecture

Michael Brockhurst

Sex, Death and the Red Queen

Running with the Red Queen: the role of biotic conflicts in evolution

Antagonistic coevolution accelerates molecular evolution.

Coevolution accelerates molecular evolution
Coevolution drives greater between-population divergence

In clinical samples:

Divergent, Coexisting, Pseudomonas aeruginosa Lineages in Chronic Cystic Fibrosis Lung Infections.

Highly parallel evolution of LES lineages: mexAB-oprM, creBCD, ampC, lasR, oprD, pmaA, etc. etc.

Rapid turnover of diversity within patients

Evidence for changes in diversity during exacerbations, and evidence for lineage 'switching' over time.

Single-strain bacterial populations highly diverse
Most diversity is present in individual sputum samples
Diversity in clinically important traits like AbR and secreted molecules
Genetic data shows parallel evolution and patient-patient transmission.

What's driving diversification in CF lungs? (Immune system, Abx, species interactions, etc.)


Evolutionary adaptation to ASM environment by: 
- loss of motility structures esp flagellum
- metabolic and biofilm changes

Adding temperate phages to artificial sputum medium selects for a different set of mutations than seen in CF in-host evolution, e.g. pili, Type 6 secretion, flagellum, quorum sensing, etc.

Phage insertions cause several parallel mutations.

AMR in South Asia
Stephen Baker

Return to pre-AMR era

S. Typhi: monomorphic

Emergence of fluoroquinolone resistance, independent gyrA mutations.

Fitness benefits in fluoroquinolone-resistant Salmonella Typhi in the absence of antimicrobial pressure.

A high‐resolution genomic analysis of multidrug‐resistant hospital outbreaks of Klebsiella pneumoniae

K. pneumoniae is an exceptional coloniser of surfaces, tubes, etc.

K. pneumoniae outbreak - two distinct lineages, acquired blaNDM-1. 

Longitude Prize

The test must identify when antibiotics are needed and, if they are, which ones to use.

Test must be: needed, accurate, affordable, rapid (<30 min), easy-to-use, scalable, safe, a prototype must be available.

Easy-to-use: minimally invasive, easy to dispose, long expiration, heat stable, withstand transportation, minimum maintenance etc.

Modelling Clostridium difficile infection
Caroline Chilton 

In vitro human gut model: triple chemostat system arranged in a weir cascade, primed with faecal slurry, validated against the caecal contents of sudden death victims.

Antibiotics knock down Bifidobacterial populations. Bacteroides not affected by clindamycin but affected by vancomycin.

16S profiling matches colony counts well.

Observed diversity highest pre-antibiotic, lowest with recurrence. Fidaxomicin less effect on diversity than others.
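The talk's exact diversity metric isn't recorded in these notes; "observed diversity" in 16S work usually means the number of distinct taxa detected. The sketch below shows that measure (observed richness) alongside the Shannon index, a common complement, computed from made-up read counts:

```python
import math


def observed_richness(counts):
    """Number of taxa (e.g. 16S OTUs) with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Made-up communities: same richness, very different evenness.
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
print(observed_richness(even), round(shannon_index(even), 3))
print(observed_richness(skewed), round(shannon_index(skewed), 3))
```

The two toy communities have identical richness but very different Shannon values, which is why studies of antibiotic effects on gut communities often report both.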

Biofilm human gut model using rods


Mutation rate and genotype variation of Ebola virus from Mali case sequences

Epidemiological and viral genomic sequence analysis of the 2014 Ebola outbreak reveals clustered transmission

Michael Tunney, QUB

CF microbiome: culture studies show significant numbers of anaerobes (similar to P. aeruginosa).

Healthy airway microbiome quite similar to CF microbiome: Streptococcus, Haemophilus, Rothia etc., but don't see Pseudomonas, Burkholderia etc.

Diversity decreased in CF

Diversity positively correlated with lung function

Decade-long bacterial community dynamics in cystic fibrosis airways

Lung explant microbiome study

Healthy microbiome cannot be cultured in late-stage CF infection.

William Wade, Oral Microbiome

50% of oral bacteria are uncultivable

Human Oral Microbiome

Most human oral bacteria are found only in the mouth, notable exception Fusobacterium nucleatum.

Intra-oral habitats have characteristic microbiota.

Diet has relatively little effect on oral microbiome.

Why do historical dental samples correlate diet with microbiome? A: Effect of dental hygiene.

Willem van Schaik

E. faecium and E. faecalis genetically distinct - penicillin resistant.

E. faecium clade A1 "clinical isolates" - highest mutation rate
E. faecium clade A2 "animal isolates" - medium mutation rate
E. faecium clade B "human commensal" - lowest mutation rate

Phylogeny of closely related strains: gene content mirrors phylogeny. Differences caused by gain/loss of plasmids and phage-like elements.

Hospital ICU microbiota: characterised by outgrowth of Enterococcus on long stays

Sewage resistome

Zamin Iqbal

75 out of 1607 samples have minor resistance calls

100% correct resistance calls on MDR S. aureus

Ebola virus

Prior to 2013: ~20 outbreaks, ~1,600 deaths, 25-90% mortality rate, 5 Ebolavirus species

Filovirus epidemic in 1956 in Bili, DRC - first Ebola outbreak?

Emergence of Ebola

25000 cases, 10000 deaths

Burial rites: involve touching the bodies, washing them. Mobile phone connectivity makes it easier to gather more relatives for funerals.