24 Sep 2015
We've finally found the time to break open the new SQK-MAP-006
kits from Oxford Nanopore. These kits are notable because they
introduce the first really major changes to the chemistry for some time.
- First up, the speed has more than doubled, from ~30 bp/s to ~75 bp/s.
The assumption is this will increase yields, but it will be
interesting to see what - if any - effect it has on the quality profile.
The worry would be that increased speeds would increase the chance
of missing events (transitions between signal levels),
which would manifest as deletions after basecalling.
- Secondly, the previous hairpin-motor complex (which enabled 2D
reads and also stalled the complement strand) has been jettisoned
to return to a simpler setup. As I understand it, the hairpin
remains (and is now biotinylated and pulled down by beads to
ensure very high 2D yields) but the second motor has gone. I
assume the new motor is clever enough to stall both the
template and complement strands. It will be interesting to
compare translocation times of the two strands (in SQK-MAP-005
the complement strand went through the pore more slowly,
as it was retarded by two enzymes).
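To make the worry in the first point concrete, here is a toy sketch of how a single missed event turns into a deletion. It assumes a deliberately naive model in which every event corresponds to a one-base step and the caller just appends the last base of each new k-mer; the function names are made up for illustration, and a real basecaller can model skipped events rather than blindly trusting the event stream.

```python
def kmer_events(seq, k=5):
    """Return the ordered overlapping k-mers ("events") for a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def naive_basecall(events):
    """Rebuild a sequence assuming each event advances the strand by one base."""
    if not events:
        return ""
    return events[0] + "".join(event[-1] for event in events[1:])


seq = "ACGTTGCAAGGCT"
events = kmer_events(seq)

# With every event detected, the naive caller recovers the sequence.
assert naive_basecall(events) == seq

# Drop the fifth event to mimic a missed transition between signal levels.
missed = events[:4] + events[5:]
called = naive_basecall(missed)

print(seq)     # original sequence
print(called)  # one base shorter: the missed event shows up as a deletion
```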
The new chemistry is accompanied by a new Metrichor basecaller
workflow specific to SQK-MAP-006.
A notable change, looking at the returned FAST5 files, is that the
model is now considering signal levels from each of the 4^6 possible
combinations of 6-mers when doing basecalling. Previously, 5-mers were
used. Does this mean that the ionic flux through the nanopore is
in fact affected by 6 or more bases, rather than the 5 that we
initially assumed? Or was 5 simply chosen to simplify the analysis?
If the latter - and this seems likely - this may help with
basecalling accuracy and it will be interesting to see if it
resolves any previously difficult-to-sequence motifs (we looked at
such under-represented sequences, in the context of 5-mers, in our
recent paper).
Calling older, pre-SQK-MAP-006 data with the new 6-mer model
basecaller does not appear to be supported.
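The jump from 5-mers to 6-mers quadruples the number of signal levels the model has to consider (4^5 = 1024 versus 4^6 = 4096). A minimal sketch of enumerating the k-mer space and checking which k-mers never appear in a sequence is below. The helper names are invented for illustration, and this is not how Metrichor or our paper does it.

```python
from collections import Counter
from itertools import product


def all_kmers(k):
    """Every possible DNA k-mer, in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def missing_kmers(seq, k):
    """k-mers from the full k-mer space that never occur in seq."""
    counts = kmer_counts(seq, k)
    return [kmer for kmer in all_kmers(k) if counts[kmer] == 0]


print(len(all_kmers(5)), len(all_kmers(6)))
```

Running `kmer_counts` over a set of reads and over the reference, then comparing the two tables, is one simple way to look for under-represented motifs.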
Links to FAST5 data files
So far we have done four SQK-MAP-006 runs. Two were generated with natural
DNA, and two were generated with the low-input library that includes
a PCR step.
Each of the files below is an archive of a run following base calling
with Metrichor. We also provide a subset of one of the runs in 'raw'
format which has the individual signal measurements (i.e. before event
detection is carried out).
Run | Basecalled data | 2D pass FASTA
MAP-006-1 | MAP-006-1 basecalled | MAP-006-1 2D pass FASTA
MAP-006-2 | uploading | MAP-006-2 2D pass FASTA
MAP-006-PCR-1 | MAP-006-PCR-1 basecalled | MAP-006-PCR-1 2D pass FASTA
MAP-006-PCR-2 | uploading | MAP-006-PCR-2 2D pass FASTA
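If you grab one of the 2D pass FASTA files, a quick first look is the read count and length distribution. Here is a minimal standard-library sketch. The demo records are invented, and you would pass an open file handle for the downloaded FASTA instead.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) tuples from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths):
    """Length L such that reads of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Invented records standing in for a real 2D pass FASTA file.
demo = io.StringIO(">read1 extra\nACGTACGT\n>read2\nACGT\n>read3\nAC\n")
lengths = [len(seq) for _, seq in read_fasta(demo)]
print(len(lengths), sum(lengths), n50(lengths))
```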
Head over to Jared Simpson's blog to see some early results of using these data for assembly polishing.
As always, thanks to Josh Quick for his masterful library preparation.
28 Jul 2015
For some reason, calling a haploid consensus sequence from a VCF
seems harder than it needs to be.
I've experimented with samtools mpileup and bcftools call/consensus
with much frustration and little success, as they always want to
call heterozygous positions, which I don't want.
In the end, the easiest way I have found to do this is to use FreeBayes with the ploidy set to 1:
freebayes -f ref.fa -p 1 aln.sorted.bam > vcffile
And then use vcflib to call a consensus:
vcf2fasta -f ref.fa -P 1 vcffile
This will spit out a file with the consensus sequence.
Of course, given that the VCF format is not really a format,
trying to use vcf2fasta on VCFs produced by tools other than
FreeBayes (VarScan, in my case) didn't work for me.
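For the curious, the idea behind a haploid consensus is simple enough to sketch in a few lines of Python. This is an illustration only, not what vcf2fasta does internally: it assumes a single reference sequence, non-overlapping records, a single sample column with GT as the first FORMAT field, and it only ever takes the first ALT allele. The function name is made up.

```python
def apply_haploid_variants(ref, vcf_lines):
    """Apply ALT alleles with haploid genotype 1 to a reference sequence."""
    edits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        fields = line.rstrip("\n").split("\t")
        pos, ref_allele = int(fields[1]), fields[3]
        alt = fields[4].split(",")[0]  # assumption: first ALT allele only
        genotype = fields[9].split(":")[0] if len(fields) > 9 else "1"
        if genotype == "1":
            edits.append((pos - 1, ref_allele, alt))  # VCF POS is 1-based

    seq = ref
    # Work right to left so earlier coordinates stay valid after indels.
    for start, ref_allele, alt in sorted(edits, reverse=True):
        end = start + len(ref_allele)
        if seq[start:end] != ref_allele:
            raise ValueError("REF allele mismatch at position %d" % (start + 1))
        seq = seq[:start] + alt + seq[end:]
    return seq


# Invented records: a SNP, a 1 bp deletion, a REF call, and a 2 bp insertion.
records = [
    "chr\t3\t.\tG\tT\t50\t.\t.\tGT\t1",
    "chr\t5\t.\tAC\tA\t50\t.\t.\tGT\t1",
    "chr\t8\t.\tT\tC\t50\t.\t.\tGT\t0",
    "chr\t9\t.\tA\tAGG\t50\t.\t.\tGT\t1",
]
print(apply_haploid_variants("ACGTACGTAC", records))
```

The REF check is the useful part in practice: if a VCF from another tool encodes indels differently, this fails loudly rather than silently producing a wrong consensus.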
05 Jun 2015
The current Ebola outbreak in West Africa is the largest ever recorded, with over 26,500 cases reported resulting in an estimated 11,000 deaths. Yet genomic surveillance of this outbreak has been patchy, hampered by understandable but vexing logistical, social, political and technical obstacles in securing and transporting samples for processing.
We wanted to help address the gaps in our knowledge of viral evolution and to generate data for epidemiological use. So, in April, Josh Quick from my group went to Conakry, Guinea to establish proof-of-principle for portable nanopore sequencing. This was the most practical way we could rapidly establish a local sequencing lab in order to generate real-time information.
His travels have been documented in several recent news articles. For background I would recommend reading Erika Hayden's report over at Nature News, the BMC On Biology blog and this recent GenomeWeb article (registration free for academic subscribers).
In the two weeks he was there, he sequenced 14 genomes while based at Donka Hospital in Conakry. However, the surveillance sequencing has continued, thanks to the hard work of Sophie Duraffour in Coyah under the auspices of the European Mobile Laboratory project. Sophie has been working around the clock in the laboratory generating the real-time genome data, uploading it to Birmingham for analysis and then distributing it to WHO central coordination. We have had early feedback that the data has been extremely useful for the epidemiologists on the ground.
As is often the case in outbreaks, genomic data production and sharing has been patchy and uncoordinated. However, an exciting new development is under way to try and address this. Andrew Rambaut, author of essential phylogenetics software such as BEAST and FigTree and viral genome maven, has taken on a kind of unofficial role of coordinating genome sequence data, which is distributed through his website and forum Virological.org.
His personal database of Ebola genomes sits at nearly 1000 sequences and he has been privately sharing some wonderful integrated phylogenetic analyses covering the entire Ebola outbreak. However, until recently the sharing has been limited by access to public data. At a recent conference at the Institut Pasteur, I met him and his colleague Richard Neher and discussed ways to improve sharing. Richard and Trevor Bedford are the developers of the nextflu website, which aims to track real-time evolution of flu.
I said that we needed this for Ebola, and of course they had already thought of this and had started building something. I said that we would contribute our nanopore sequencing dataset to this project in real-time, and those with large datasets to compare also contributed theirs.
So it is a real thrill to see the website up and running now and available to use at ebola.nextflu.org. On this website you can explore Ebola evolution during this outbreak, using controls to scroll through time, and restricting analysis to particular locations or laboratories. You can also zoom into particular clades, and see frequency distributions of specific mutations.
One particularly notable result of the data integration is that, when our surveillance data from Guinea is compared with Ian Goodfellow's recently produced surveillance data from Sierra Leone, the two extant Guinean lineages overlap with cases from close to the Guinean border in Sierra Leone. This makes sense, and suggests that cross-country transmission may be occurring frequently.
We will be updating this website with new sequences generated by the EMLab until the end of the outbreak. We have decided to leave a one-week delay before releasing the data, so that WHO central coordination can see it first, and the data is limited to prefecture-level information without more specific locations.
26 Apr 2015
I am at the incredibly impressive and huge ECCMID meeting in Copenhagen.
I've given a talk already on "So I have sequenced my organism .. what do I do now?" (organisers' title!). It is viewable here:
Tomorrow I am doing a "Meet-the-Expert" session about what tools to use
for bacterial genome analysis. Feel free to look at my slides in advance
and ask some questions (even if you aren't at ECCMID!).
One thing that is noticeable about this conference is how incredibly
high-tech the conference website is. Talks are posted in near-real-time
after they are given.
Here are some you should definitely check out!
Matt Holden, Whole genome sequencing for microepidemiological investigations: can person-to-person transmission be identified?
Ed Feil, Whole genome sequencing and public health: how can high-risk clones be identified and what can be learned for prevention and control?
Frank Aarestrup, Bacterial genome sequencing for outbreak detection
Diversity of P. aeruginosa in CF airways
The Gut and Lung Microbiota
Implications of microbiome alterations due to pro/antibiotics
Benefits of microbiome manipulation in reducing resistance
There's lots more at http://www.eccmidlive.org
30 Mar 2015
Some notes from SGM 2015:
Diabetes, obesity and gut microbiota
1. Microbiota-host interactions play a major role in obesity
2. Intestinal MyD88 is a sensor switching host metabolism during fat feeding
3. Endocannabinoids are key players involved in the microbiota-host interaction
Intestinal epithelial MyD88 is a sensor switching host metabolism towards obesity according to nutritional status
The endocannabinoid system links gut microbiota to adipogenesis.
Adipose tissue NAPE-PLD controls fat mass development by altering the browning process and gut microbiota
Test from Torsten, not at SGM. Looked like an interesting talk!
Many species form persisters - highly tolerant to Abx. Persister cells represent a small sub-population caused by a phenotypic switch. E. coli in vitro persisters are non-replicating bacterial cells.
Bacterial persisters: formation, eradication, and experimental systems
Internalization of Salmonella by Macrophages Induces Formation of Nonreplicating Persisters
5-10% of the non-replicating bacteria resume growth.
Toxin-antitoxin systems related to persistence of E. coli in vitro.
Thoughts: how to link dead/alive/persistence state to metagenomics data (single cell RNA-Seq?)
Pneumococcus human host-restricted, usually lives in nasopharynx.
No overall change in species prevalence post-PCV7. Subtle effects on resistance - generally remain stable.
Population genomics of post-vaccine changes in pneumococcal epidemiology
MDR not secret of success for antimicrobial resistance.
Association between high admixture and AMR previously shown
Consistency in recombination hotspots between lineages
Non-typeable clone most efficient recipient of DNA by recombination
MGEs in pneumococcal AMR
Prophage insertion in comYC genes blocks recombination in IC1 - !
Variable recombination dynamics during the emergence, transmission and ‘disarming’ of a multidrug-resistant pneumococcal clone
Fleming Prize Lecture
Sex, Death and the Red Queen
Running with the Red Queen: the role of biotic conflicts in evolution
Antagonistic coevolution accelerates molecular evolution.
Coevolution accelerates molecular evolution
Coevolution drives greater between-population divergence
In clinical samples:
Divergent, Coexisting, Pseudomonas aeruginosa Lineages in Chronic Cystic Fibrosis Lung Infections.
Highly parallel evolution of LES lineages: mexAB-oprM, creBCD, ampC, lasR, oprD, pmaA, etc. etc.
Rapid turnover of diversity within patients
Evidence for changes in diversity during exacerbations, and evidence for lineage 'switching' over time.
Single strain bacterial populations highly diverse
Most diversity is present in individual sputum samples
Diversity in clinically important traits like AbR and secreted molecules
Genetic data shows parallel evolution and patient-patient transmission.
What's driving diversification in CF lungs? (Immune system, Abx, species interactions, etc.)
Evolutionary adaptation to ASM environment by:
- loss of motility structures esp flagellum
- metabolic and biofilm changes
Adding temperate phages to artificial sputum medium selects for a different set of mutations than seen in CF in-host evolution, e.g. pili, Type 6 secretion, flagellum, quorum sensing, etc.
Phage insertions cause several parallel mutations.
AMR in South Asia
Return to pre-AMR era
S. Typhi: monomorphic
Emergence of fluoroquinolone resistance, independent gyrA mutations.
Fitness benefits in fluoroquinolone-resistant Salmonella Typhi in the absence of antimicrobial pressure.
A high‐resolution genomic analysis of multidrug‐resistant hospital outbreaks of Klebsiella pneumoniae
K. pneumoniae is exceptional coloniser of surfaces, tubes, etc.
K. pneumoniae outbreak - two distinct lineages, acquired blaNDM-1.
The test must: identify when antibiotics are needed and if they are which ones to use.
Test must be: needed, accurate, affordable, rapid (<30 min), easy-to-use, scalable, safe, a prototype must be available.
Easy-to-use: minimally invasive, easy to dispose, long expiration, heat stable, withstand transportation, minimum maintenance etc.
Modelling Clostridium difficile infection
In vitro human gut model: triple chemostat system arranged in weir cascade, primed with faecal slurry, validated against the caecal content of sudden death victims.
Antibiotics knock down Bifidobacterial populations. Bacteroides not affected by clindamycin but affected by vancomycin.
16S profiling matches colony counts well.
Observed diversity highest pre-antibiotic, lowest with recurrence. Fidaxomicin less effect on diversity than others.
Biofilm human gut model using rods
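The notes above don't say which diversity measure the talk used. As a minimal sketch, assuming the Shannon index computed from 16S taxon counts, the comparison could look like this. The counts are invented purely for illustration and are not data from the talk.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Invented taxon counts: an even community versus one dominated by one taxon.
even_community = [25, 25, 25, 25]
dominated_community = [97, 1, 1, 1]

print(shannon_diversity(even_community))
print(shannon_diversity(dominated_community))
```

An even community scores ln(number of taxa), while a community dominated by a single taxon scores close to zero.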
Mutation rate and genotype variation of Ebola virus from Mali case sequences
Epidemiological and viral genomic sequence analysis of the 2014 Ebola outbreak reveals clustered transmission
Michael Tunney, QUB
CF microbiome: culture studies show significant numbers of anaerobes (similar to P. aeruginosa).
Healthy airway microbiome quite similar to CF microbiome: Streptococcus, Haemophilus, Rothia etc., but don't see Pseudomonas, Burkholderia etc.
Diversity decreased in CF
Diversity positively correlated with lung function
Decade-long bacterial community dynamics in cystic fibrosis airways
Lung explant microbiome study
Healthy microbiome cannot be cultured in late-stage CF infection.
William Wade, Oral Microbiome
50% of oral bacteria are uncultivable
Human Oral Microbiome
Most human oral bacteria are found only in the mouth, notable exception Fusobacterium nucleatum.
Intra-oral habitats have characteristic microbiota.
Diet has relatively little effect on oral microbiome.
Why do historical dental samples correlate diet with microbiome? A: Effect of dental hygiene.
Willem van Schaik
E. faecium and E. faecalis genetically distinct - penicillin resistant.
E. faecium clade A1 "clinical isolates" - highest mutation rate
E. faecium clade A2 "animal isolates" - medium mutation rate
E. faecium clade B "human commensal" - lowest mutation rate
Phylogeny of closely related strains: gene content mirrors phylogeny. Differences caused by gain/loss of plasmids and phage-like elements.
Hospital ICU microbiota: characterised by outgrowth of Enterococcus on long stays
75 out of 1607 samples have minor resistance calls
100% resistance correction on MDR S. aureus
Prior to 2013: ~20 outbreaks, ~1,600 deaths, 25-90% mortality rate, 5 Ebolavirus species
Filovirus epidemic in 1956 in Bili, DRC - first Ebola outbreak?
Emergence of Ebola
25000 cases, 10000 deaths
Burial rites: involve touching the bodies, washing them. Mobile phone connectivity makes it easier to gather more relatives for funerals.