Ultra long reads (up to 882 kb and indeed higher) can be achieved on the Oxford Nanopore MinION using traditional DNA extraction techniques and minor changes to the library preparation protocol, without the need for size selection
The protocol is available here; it involves a modified Sambrook phenol-chloroform extraction/purification, DNA QC, minimal pipetting steps, high DNA input into the rapid kit, and MinKNOW 1.4
We have tested it on E. coli and human so far with good results; data is of course available
Ultra-long reads: background
What if you could sequence E. coli in just one read? This was the challenge I set Josh. And why can’t we do that, if nanopore sequencing really has no read length limit?
Well actually: we’re not quite there yet, but we did manage to sequence 1/6th of the whole genome in a single read last week. Here’s how we (well, he) did it. As usual we like to release our protocols openly and early to encourage the community to test and improve them. Please let us know about any tweaks you find helpful! The community seems very excited by this, judging by my Twitter feed and email inbox, so we have rush-released the protocol. The tweets have also inspired commentaries by Keith Robison and James Hadfield, thanks guys!
First … a bit of background and the importance of working with moles not mass. This line of thinking was triggered during the Zika sequencing project when we noticed our yields when sequencing amplicons were never as good as with genomic DNA. Why was that?
We decided a possible reason is that nanopore sequencing protocols are usually expressed in terms of starting mass (typically 1 microgram for the ligation protocols). But of course 1 microgram of 300 bp fragments contains a lot more (>25x more) DNA fragments than 1 microgram of 8000 base fragments. By not factoring this into the library prep, we were likely making an inefficient library, because the protocol had not been scaled up 25 times to account for the difference. It stands to reason that it’s the molarity that’s important when loading the flowcell, rather than the total mass of DNA. If you could load some imaginary molecule of DNA with mass 1000ng (bear with me), the chances of that interacting with the pore would still be quite low. More molecules means more potential interactions with the pore, meaning more potential yield.
We calculated the desired starting amount as 0.2 pmol based on the length assumptions in the ONT protocol (in practice you load about 40% less after losses from library construction). So by increasing the amount of barcodes and adaptors, as we do in our Zika protocol, we can compensate for this.
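As a rough check on these numbers, here is a minimal Python sketch of the mass-to-moles conversion. It assumes the commonly used average of ~650 g/mol per base pair of double-stranded DNA; the function name `dna_pmol` is ours, purely for illustration.

```python
# Approximate mass of one base pair of double-stranded DNA, in g/mol.
AVG_BP_MASS = 650.0


def dna_pmol(mass_ng: float, length_bp: float) -> float:
    """Convert a mass (ng) of dsDNA fragments of a given length (bp) into picomoles."""
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * AVG_BP_MASS)
    return moles * 1e12


amplicons = dna_pmol(1000, 300)   # 1 ug of 300 bp amplicons
genomic = dna_pmol(1000, 8000)    # 1 ug of 8 kb genomic fragments
print(f"{amplicons:.2f} pmol vs {genomic:.2f} pmol "
      f"({amplicons / genomic:.1f}x more molecules)")
```

For 1 ug of 8 kb fragments this gives about 0.19 pmol, while the same mass of 300 bp amplicons carries roughly 27 times as many molecules.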
That solves the short read problem, but we started thinking about how it would work in the other direction. What if you wanted to get the longest reads possible, what would this mean in mass? The rather silly idea was — if you wanted to get reads sufficiently long to cover a whole bacterial chromosome in a single read, what would the starting DNA concentration need to be?
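The math here is simple: a 4.6 Mb chromosome is roughly 500 times longer than an 8 kb fragment, so to keep the same number of molecules you need to scale the starting DNA by ~500x. But this would mean putting ~500ug of DNA into the library preparation!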
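The same conversion can be run backwards to ask how much mass is needed to hold the number of molecules constant as fragments get longer. Again this is a sketch: `mass_for_pmol` is a hypothetical helper, 650 g/mol per base pair is an approximation, and the 0.2 pmol target is about what 1 ug of 8 kb fragments works out to.

```python
AVG_BP_MASS = 650.0  # approximate g/mol per base pair of dsDNA


def mass_for_pmol(pmol: float, length_bp: float) -> float:
    """Micrograms of dsDNA needed to supply `pmol` picomoles of fragments of `length_bp`."""
    moles = pmol * 1e-12
    grams = moles * length_bp * AVG_BP_MASS
    return grams * 1e6


TARGET_PMOL = 0.2
for length in (8_000, 80_000, 4_600_000):  # standard prep, 10x scale-up, whole chromosome
    print(f"{length:>9,} bp: {mass_for_pmol(TARGET_PMOL, length):8.1f} ug")
```

This works out to roughly 1 ug at 8 kb, 10 ug at 80 kb and about 600 ug for a full 4.6 Mb chromosome, the same ballpark as the ~500 ug estimate.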
500ug of DNA is… quite a lot. And practically there are several problems with this idea:
you would need a lot of cells to start with (perhaps not such a problem with bacterial cultures but certainly restrictive for some applications)
what volume do you elute in? DNA starts to get viscous and thick as concentrations increase, and at some point you just won’t be able to pipette it any more
how do you deposit that much DNA into the flow cell?
So - we slightly scaled down our ambitions and decided that it could be practical to scale up the protocol 10-fold, which could still result in an average read length of 80 kb, a significant improvement on the 8 kb typically seen with the standard protocol.
We’d already been using the Sambrook protocol (from the classic Molecular Cloning - over 173,000 citations!) for our human genome extractions; it reliably gives very high molecular weight DNA that can be recovered with a Shepherd’s crook fashioned from a glass rod. Previously Dominik Handler demonstrated that HMW extractions with careful pipetting could generate long reads with the rapid kit. So we did a new Sambrook extraction using an overnight culture of E. coli K-12 MG1655 and generated something that was very pure (260:280 of 2.0) and very high molecular weight (>60kb by TapeStation - the limit of the instrument). In fact the DNA is so long that you can’t really size it without employing a pulsed-field gel electrophoresis setup. Sadly we don’t have a working one in the department, so infrequently are they used these days. So we were flying blind in terms of the true length of the fragments.
Scaling up the rapid kit was relatively straightforward when dealing with inputs up to 2 ug: you take DNA at a concentration of 250 ng/ul and add the maximum 7.5 ul. However, getting inputs of 10 ug requires concentrations of 1 ug/ul, and that is where things start to get tricky. The library is so viscous that the loading beads start to clump together, and it becomes harder to get the library through the SpotON port on the flowcell. Not satisfied with 10 ug either, we pushed on towards 20 ug, which required making a double-volume library and adjusting the dilution downstream. We eventually settled on a protocol which could reliably give read N50s over 100 kb (i.e. half of the bases in the dataset are in reads of 100 kb or longer) with a tail stretching out to 500 kb, or sometimes beyond…
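For anyone computing this on their own runs, read N50 can be sketched in a few lines of Python: sort read lengths longest-first and walk down until the running total reaches half of all bases. The function name is illustrative and not taken from any particular toolkit.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Read N50: the largest length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one read")
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because 10 + 9 + 8 = 27 is exactly half of the 54 bases in total.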
The final piece of the puzzle was something we were aware of; the nanopore control software as of version 1.3 does periodic ‘global voltage flicks’ - meaning that the voltage is reversed across the flow cell every 10 minutes. The aim of this is to prevent strands or proteins blocking up the pores, by a rapid change of the direction of the ionic current. However, the problem with a 10 minute flicking interval is that it intrinsically limits the longest read on the system to 150kb (with 250 base/s chemistry) and 270kb (with 450 base/s chemistry). In MinKNOW 1.3 you could change the script parameters (stored in a YAML file) to remove this flick, but in MinKNOW 1.4 luckily it has been dispensed with entirely in favour of a much smarter system that dynamically unblocks individual pores on demand.
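The 150 kb and 270 kb ceilings follow directly from the flick interval multiplied by the translocation speed. A minimal sketch, assuming a constant speed and a strand that is captured immediately after the previous flick:

```python
def max_read_length(flick_interval_s: float, bases_per_s: float) -> float:
    """Longest read (in bases) possible between two global voltage flicks,
    assuming constant translocation speed and capture right after a flick."""
    return flick_interval_s * bases_per_s


FLICK_INTERVAL_S = 10 * 60  # 10 minutes between flicks

for speed in (250, 450):  # bases per second for the two chemistries
    print(f"{speed} b/s: {max_read_length(FLICK_INTERVAL_S, speed) / 1000:.0f} kb")
```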
So … how does it look after all that’s been done?
We ran E. coli K-12 MG1655 on a standard FLO-MIN106 (R9.4) flowcell.
E. coli stats
Total bases: 5,014,576,373 (5Gb)
Number of reads: 150,604
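Dividing the two totals above gives the mean read length for the run. It is a trivial calculation, shown here in Python for completeness:

```python
TOTAL_BASES = 5_014_576_373
NUM_READS = 150_604

mean_read_length = TOTAL_BASES / NUM_READS
print(f"mean read length: {mean_read_length / 1000:.1f} kb")
```

That comes to roughly 33.3 kb per read.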
Read length stats
Ewan Birney suggested this would be more interpretable as a log10 scale, and by golly he was right!
But hold your horses. As Keith Robison likes to say, and Mark Akeson as well, it’s not a read unless it maps to a reference. Or as Sir Lord Alan Sugar might say, “squiggles are for vanity, basecalls are sanity, but alignments are reality”.
Are these reads actually real, then?
Just judging by the distribution it’s clear that this is not all spurious channel noise.
Let’s align all the reads…
This dataset poses a few challenges for aligners. BWA-MEM works, but is incredibly slow. It goes much faster if you split the reads into 50kb chunks (e.g. with split.py), but this is a bit annoying.
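We don’t reproduce split.py here, but the idea is simple enough to sketch. The following is a hypothetical stand-in, not the actual script: it reads FASTA records and writes each read out in 50 kb chunks, tagging each chunk name with its start offset so alignments can be traced back to the original read.

```python
import sys
from typing import Iterable, Iterator, TextIO, Tuple

CHUNK_SIZE = 50_000  # bases per chunk, following the 50kb suggestion


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            fields = line[1:].split()
            name, parts = (fields[0] if fields else ""), []
        elif line and name is not None:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def split_read(name: str, seq: str,
               chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, str]]:
    """Cut one read into consecutive chunks; the suffix records each chunk's start offset."""
    for start in range(0, len(seq), chunk_size):
        yield f"{name}_{start}", seq[start:start + chunk_size]


def split_fasta(lines: Iterable[str], out: TextIO = sys.stdout) -> None:
    """Write every read in `lines` to `out` as FASTA chunks of at most CHUNK_SIZE bases."""
    for name, seq in read_fasta(lines):
        for chunk_name, chunk_seq in split_read(name, seq):
            out.write(f">{chunk_name}\n{chunk_seq}\n")
```

Usage might look like `with open("reads.fasta") as fh: split_fasta(fh)` (the filename is hypothetical). The chunk alignments then need stitching back together by read name and offset if you want per-read results.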
I decided to use GraphMap, which has a few useful features: it will try to make an end-to-end alignment, and it also has a circular alignment mode, which is useful as we would expect many of these reads to cross the start of the reference sequence (position 0) on the circular chromosome.
Another problem! The SAM output will not convert to BAM successfully, so I’ve output alignments in the BLASR -m5 format for ease of parsing. The SAM/BAM developers are working on this (CRAM is fine).
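The post doesn’t say exactly why the conversion fails, but the usual culprit with reads this long is a hard limit in the BAM format: each record stores its number of CIGAR operations in a 16-bit field (`n_cigar_op`), so an alignment with more than 65,535 operations cannot be encoded, and a noisy read of several hundred kb with frequent indels can easily exceed that. Assuming that is the cause here, a quick Python check for offending SAM records might look like this (the function names are ours):

```python
import re
from typing import Iterable, Iterator, Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# BAM stores the CIGAR operation count in an unsigned 16-bit integer.
BAM_MAX_CIGAR_OPS = 65_535


def cigar_op_count(cigar: str) -> int:
    """Number of operations in a SAM CIGAR string ('*' means no CIGAR)."""
    return 0 if cigar == "*" else len(CIGAR_OP.findall(cigar))


def fits_in_bam(cigar: str) -> bool:
    return cigar_op_count(cigar) <= BAM_MAX_CIGAR_OPS


def oversized_alignments(sam_lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (read name, CIGAR op count) for SAM records too long for a BAM CIGAR field."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 5 and not fits_in_bam(fields[5]):
            yield fields[0], cigar_op_count(fields[5])
```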
After a solid couple of days of alignment, here are the results:
So we lose a few of the really long reads here, which are obviously noise (the 1Mb read is just repetitive sequence and probably represents something stuck in a pore, and the 900Kb read is not a full-length alignment), but otherwise there is an excellent correlation between the reads and alignments.
That’s a theoretical 1x coverage of the 4.6Mb chromosome of E. coli in just the 7 longest reads!!
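“Theoretical” because it simply sums read lengths and ignores overlaps between reads. A sketch of that calculation takes reads longest-first until their combined length reaches the genome size; the helper name is ours, and the read lengths would come from the run itself.

```python
from typing import Iterable, Optional


def longest_reads_for_coverage(lengths: Iterable[int], genome_size: int,
                               depth: float = 1.0) -> Optional[int]:
    """Fewest reads, taken longest-first, whose summed length reaches
    depth * genome_size. Overlaps are ignored, so this is theoretical
    coverage only. Returns None if all reads together fall short."""
    target = depth * genome_size
    total = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        total += length
        if total >= target:
            return count
    return None
```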
95.47% of the bases in the dataset map to the reference, and the mean alignment length is slightly higher than the mean read length, at 34.7kb.
A few other notable things:
The 790kb read that didn’t align full-length is interesting. On inspection it is actually two reads, the template and complement strands of the same starting molecule, separated by an open pore signal. This gives us a clue as to how the proposed 1D² technology (which is replacing 2D reads) could work.
Calling the two reads together (thanks, Chris Wright) gives a 95% accuracy read!
We’ve started using the Albacore basecaller for this, rather than uploading to Metrichor.
Albacore seems to keep up with basecalling a live run when using 60 cores.
So we would like to claim at least four world records here!
Longest mappable DNA strand sequence **
Longest mappable DNA strand & complement sequence
Highest nanopore run N50 (not sure about other platforms?)
Highest nanopore run mean read length
(**) we’ve actually beaten that record already with another run, but that’s a subject for another post.
An interesting exercise for the reader is to figure out the minimum number of reads that can be taken from this dataset to produce a contiguous E. coli assembly! My first attempt found a set of 43 reads which covers 92% of the genome, but you can do better!
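One way to attack this puzzle is to treat each alignment as an arc on the circular chromosome and look for a small set of arcs that covers the whole circle. The sketch below is a heuristic of our own, not the method behind the 43-read attempt: any complete cover must include a read spanning position 0, so it tries each such read as an anchor, extends greedily from there, and keeps the smallest set found. It assumes alignments have already been reduced to hypothetical `(start, length)` pairs on the reference, and it ignores the overlap an assembler would need to actually join reads. Its cost grows with the number of reads spanning position 0, so on a deep dataset it is slow rather than clever.

```python
from typing import List, Optional, Tuple

# Hypothetical input: one (start, length) pair per aligned read on a circular
# reference, with 0 <= start < genome_size. Arcs may wrap past position 0.
Arc = Tuple[int, int]


def _spans_origin(arc: Arc, genome_size: int) -> bool:
    start, length = arc
    return start == 0 or start + length > genome_size


def _cover_from_anchor(arcs: List[Arc], anchor: int,
                       genome_size: int) -> Optional[List[int]]:
    """Fix arcs[anchor] as the first read, then greedily extend around the circle."""
    a_start, a_len = arcs[anchor]
    if a_len >= genome_size:
        return [anchor]
    # Rotate coordinates so the anchor covers [0, a_len). Each arc contributes
    # two unrolled copies so that wrap-around segments are seen on the line.
    intervals = []
    for i, (start, length) in enumerate(arcs):
        s = (start - a_start) % genome_size
        intervals.append((s, s + length, i))
        intervals.append((s - genome_size, s - genome_size + length, i))
    intervals.sort()

    chosen = {anchor}
    pos, k = a_len, 0
    while pos < genome_size:
        best_end, best_id = pos, None
        # Among intervals starting at or before pos, pick the one reaching furthest.
        while k < len(intervals) and intervals[k][0] <= pos:
            if intervals[k][1] > best_end:
                best_end, best_id = intervals[k][1], intervals[k][2]
            k += 1
        if best_id is None:
            return None  # a gap that no read covers
        chosen.add(best_id)
        pos = best_end
    return sorted(chosen)


def greedy_circular_cover(arcs: List[Arc], genome_size: int) -> Optional[List[int]]:
    """Indices of a small set of arcs covering the whole circle, or None if impossible."""
    best = None
    for i, arc in enumerate(arcs):
        if not _spans_origin(arc, genome_size):
            continue  # every complete cover contains some read spanning position 0
        cover = _cover_from_anchor(arcs, i, genome_size)
        if cover is not None and (best is None or len(cover) < len(best)):
            best = cover
    return best
```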
Where now? Well, readers will notice that a real landmark is in sight - the first megabase read. We’ve been running this protocol for a bit over a week and a new hobby is ‘whale spotting’ for the largest reads we can see.
We haven’t quite yet worked out a systematic naming scheme for whales, but perhaps Google has the answer.
Going by that scheme, in the past few days we’ve hit our first narwhal (an 882kb read from a different run, which translates to a 950kb fragment judged against the reference).
How can we go longer? Well, it might be possible to increase the DNA input some more, but we start hitting issues with viscosity, which may prevent pipetting onto the flowcell. Pipette shearing forces are also presumably an issue at these concentrations.
The general consensus is that we will need to employ solid-phase DNA extractions and library construction, e.g. in agarose plugs. The SageHLS instrument also looks quite interesting.
The nanopore squad, John Tyson and Matt Loose provided much helpful advice and input during the development of this protocol. Matt Loose came up with the whale naming scheme.
Thanks to ONT for technical support with particular thanks to Clive Brown, Chris Wright, David Stoddart and Graham Hall for advice and information.
Conflicts of interest
I have received an honorarium to speak at an Oxford Nanopore meeting, and travel and accommodation to attend London Calling 2015 and 2016. I have ongoing research collaborations with ONT although I am not financially compensated for this and hold no stocks, shares or options. ONT have supplied free-of-charge reagents as part of the MinION Access Programme and also generously supported our infectious disease surveillance projects with reagents.
Fields of importance: Floor, Level, SURFACE, BUILDING
Q: What surfaces have the greatest amount of diversity? Is this expected?
Q: What do the profiles of stool, etc. look like?
Q: Are there any natural looking clusters in the data?
Q: Which sources of samples are most similar to others?
Q: Is there any clustering between different floors of the building?
Q: Compare the weighted vs unweighted UniFrac results: do the clusters look more natural in one or the other?
Q: Which surfaces have the most diversity? Least?
Q: Now, read the paper in more detail and prepare a short summary to present to the whole group. Consider: the context for the study, the methods that were employed and the results found. What did you think? What are the limitations of the study?
Q: Is there any evidence of a gradient? (Key: use SampleID and turn gradient colours on)
Q: How do the taxa change over time?
Q: Which infant samples do the maternal stool most look like?
Q: Is the colour of stools associated with their bacterial diversity?
Q: Now, read the paper in more detail and prepare a short summary to present to the whole group. Consider: the context for the study, the methods that were employed and the results found. What did you think? What are the limitations of the study?
“The past is a foreign country” – well, that’s how I feel about January 2016 looking back today. Definitely some things happened in January, but I can’t remember them. So I’m using Twitter Analytics to remind me.
Oh! This was the month that #researchparasites came out, to the horror and amusement of the genomics field:
The logical fallacy of #researchparasites: expert data gatherers are highly unlikely to be the best people to analyse their own data.
Phylogenetic analysis showed that the new cases were very closely related to an Ebola genome sequenced 500 days previously, as can be seen from this Nextstrain tree.
Independently, the epidemiologists identified a survivor who had been infected some 500 days previously: the very same individual whose virus had been sequenced.
This was a remarkable demonstration of the power of genomics, working in synergy with the epidemiologists on the ground.
Around the same time, we learnt we had been awarded funding by MRC/Wellcome Trust/Newton as part of the emergency response to Zika. Remarkably, the outcome was known just a few weeks after submitting the application, and we had the money just days after that. If only all grant funding could be like this …
We hit the road for the ZiBRA project and started generating Zika genomes working in collaboration with the local public health laboratories. Lots of diaries and blog posts are on the Zibra project website if you want to read more about this trip.
We made lots of new lifelong friends in Brazil, and we didn’t die even though our bus caught fire at one point, but we did hit some technical obstacles with sequencing very, very low abundance samples.
The year seemed to be going pretty well. Until the Brexit vote …
Going to bed feeling quite relaxed that the outcome will be fairly strong remain vote tomorrow. Don't make a mockery of this tweet overnight
So far over 150 research groups in the UK have signed up for our virtual machine infrastructure which runs across three sites (Birmingham, Warwick and Cardiff) with Swansea to launch in 2017. Particular props to Radoslaw Poplawski, Tom Connor, Andy Smith, Marius Bakke and Matt Bull on the technical side who helped get this launched - just in time!
We sweated over the Zika sequencing protocol and eventually by the end of summer Josh Quick nailed something that worked well on samples with very low viral copy numbers.
In August we just about had time to fit in a week in Cornwall to teach Porecamp with Konrad and his crew.
Pablo, Emily, Jennie and Andy really got MicrobesNG motoring (over 7500 genomes sequenced with a median wait time of 6 weeks!), with insert sizes bolstered with a nice new Nextera XT protocol.
Zika genomes were coming out thick and fast thanks to the new protocol, our Brazilian collaborators, Sarah Hill and Alli Black. Josh’s third trip in 2017. A picture of Zika diversity was now starting to be built (beautifully visualised by Trevor and Richard’s wonderful Nextstrain site - vote for them to win the Open Science prize!).
All the depressing news on Twitter got too much for me, and I took a break for 3 weeks. After 24 hours of extreme withdrawal symptoms, it was actually quite nice to do something else with my time, like imagining what people were saying on Twitter.
Relentless Brexit and Trump news is turning Twitter into an ordeal rather than a pleasure these days. (I realise this isn't helping).
A relaxed end to the year – we did a bit of Beach (well, Leith) sequencing with Andrew Rambaut and Tom Little in Edinburgh; look out for more BeachSeq action early in 2017.
And we even managed to release data from a 30x human genome on MinION working collaboratively on the sequencing with Nottingham, UCSC, UBC and Norwich, a mere 39 flowcells for that (assembly N50 - 3Mb!):
Finally in December, we heard the great news that the Ebola ça suffit! trial reported 100% efficacy for the Ebola vaccine. Well done Stephan, Miles, Sophie and all the others who worked on this.
A few changes in 2017
The MicrobesNG team sees a change in 2017 - we are sad to say goodbye to Andy Smith, our database programmer on the MicrobesNG project. He’s done an amazing job building the MicrobesNG website, our LIMS, the CLIMB Bryn site, and even had time to help out with the Zibra Project database and the Primal Scheme site. We cannot be too annoyed that he only spent a year with us – he’s had the once-in-a-lifetime opportunity to become a trainee pilot with Aer Lingus, a lifelong dream for Andy.
There have been some changes in Birmingham too – it’s been really nice to have Alan McNally join the IMI as a new Senior Lecturer. And we are really excited that Willem van Schaik is joining the IMI later in April, Brexit be damned!
Politically we are in uncharted territory, so we enter 2017 with some trepidation about what is to happen to the scientific environment, but we also hope that the awesome wins out.
Happy New Year to all friends and collaborators from the Loman Lab!
The nanoporati are currently thrilling to a bevy of new announcements from Oxford Nanopore Technologies (ONT). More information over at the “wafer-thin update” and insightful commentary from Keith Robison on his blog.
But amongst the noise and excitement of future products, there are three important updates we are focusing on right now:
the release of the 5-10 minute 1D rapid prep (Mu transposase based)
coupled with the new R9 (now R9.4) chemistry that produces usable and high accuracy 1D reads (both discussed in this previous post)
and, new updates to the pore, membrane, motor and loading protocol which suggest 5-10Gb output may now be achievable.
We just received our first R9.4 double-speed (450 b/s) kits and so we will see how it looks soon, but as of now, we are able to get up to 3Gb of output on the vanilla R9.
The significance for our work: we can now start to consider using MinION for metagenomic sequencing (previously we have restricted our ambitions to sequencing individual viruses and bacterial cultures due to relatively low outputs).
Ultimately our research group would like to get to culture-free diagnosis of infectious diseases, with full genomic coverage, as a near-patient assay. There have been a few proof-of-principle papers here, including use on Ebola and chikungunya (from Charles Chiu) and on bacterial urinary tract infections (from Justin O’Grady).
However, for portable metagenomics sequencing to really become a viable prospect, the sample needs to be rapidly prepared at point of collection (from near the patient, in diagnostics, or from water, food, animals, the natural environment, etc.).
In my view, local sample preparation, DNA extraction and local bioinformatics analysis are now the major open issues for portable sequencing.
To illustrate this point, we recently saw the exciting news that the nanopore had been run on the International Space Station - surely a landmark moment in genomics. Yet the sample was still not prepared, nor its DNA extracted, in space.
And sadly, you cannot get away without proper DNA extraction. We saw in the past few days the bizarre spectacle of David Eccles and Chris Mason at a conference in Australia attempting to sequence various food samples (coffee, strawberries and cream, etc.) on the nanopore, using the 1D prep. A valiant experiment, but the output was effectively noise due to the lack of a pure DNA prep. We experienced similar results when attempting to sequence from a virtually DNA-free sample on the beach in Cornwall.
So, DNA extraction remains a fact of life for sequencing.
For single molecule sequencing it’s even trickier: you need high purity, high molecular weight, high concentration DNA to get good results from single molecule sequencers like the nanopore (current input 500ng for the transposon prep).
This should not be problematic for many environmental samples. For low-concentration samples, the easiest way to get enough DNA is via PCR (targeted, or untargeted WGA, although fragment length can suffer without significant optimisation).
Solutions for portable sample preparation
Whilst presumably not a big market yet, a few companies have started producing solutions for ‘in-field’ sample preparation and DNA extraction. In the rest of this post I want to explore some of the available options for portable sample preparation and DNA extraction.
Just as a reminder, the steps for sample preparation are to a) make the sample safe (particularly important e.g. in Ebola) b) homogenise the sample and lyse cells c) extract DNA and then d) make a sequencing library.
Microbiome maven Elizabeth Bik has a very nice review of this in a recent article which is focused around microbiome studies but applies equally to other types of study:
Bead Beating/Tissue Lysis
Many samples, and particularly environmental samples, need homogenisation and cellular disruption before DNA extraction can proceed efficiently. One of the most popular methods is bead beating, which usually requires a benchtop instrument. Luckily, there is a portable, battery-powered option in the form of the TerraLyzer (we have one). This device, available from Zymo Research, uses a converted power tool to act as a portable bead beater. It’s a solid bit of kit, but costs about $1000. If that’s too rich for you, Russell Neches has developed a template that can be 3D printed to turn a Craftsman automatic hammer into a portable bead beater.
Here is a video of Russell using the TerraLyzer to extract DNA from cat poo:
DNA can be extracted very simply from a variety of foods like bananas or strawberries, by a method probably familiar to school children. First the fruit is blended to break up the tissues, then washing-up liquid is added to break down the cell membranes, and the mixture is strained to remove solids. DNA is then precipitated by adding alcohol and spooled off using a toothpick. More detailed instructions here: http://biology.about.com/od/biologylabhowtos/ht/dnafromabanana.htm.
100% ethanol is a problematic substance to ship (it is banned from aircraft), so it is an open research question whether a lower-proof alcohol that is readily available, e.g. vodka, would be an acceptable substitute. Please fund this important project.
Claire Lonsdale brought along an interesting device to Porecamp in Cornwall called the PureLyse from Claremont Bio. This device combines bead-beating with DNA capture using silica beads which are agitated by a small motor. They have built a small disposable device which combines a syringe and a reusable battery pack. The sample, ideally bacterial culture, is aspirated via the syringe then the motor is turned on for a minute to burst the cells/bind the DNA. Claire presented results at London Calling demonstrating that the DNA extracted is probably suitable for PCR but may be too fragmented for single-molecule sequencing.
Screenshot from Claire’s London Calling talk, showing rapid extraction on the right compared to a regular spin-column extraction on the left.
The announced, but not currently available, Zumbador from ONT looks to take this syringe concept further by including reagents for lysis, purification and potentially library preparation in a single pre-loaded cartridge. This looks appealing, but the worry for those of us who deal with a lot of DNA extractions from different organisms is whether any single cell lysis solution can be applied universally to organisms with quite different cell wall compositions. Gram positives and spore-forming bacteria are notoriously tough shells to crack, so this may need to be combined with the bead-beating step above.
Zymo also offer the Xpedition kit range, which is designed with field work in mind and contains a stabilisation solution that will preserve your DNA (after bead beating with the TerraLyzer) for up to a month at room temperature.
However, you can also use traditional column-based extraction methods in the field, if you have a portable centrifuge.
From my research, microcentrifuges are nearly all mains powered, which limits their utility in the field.
A homebrew solution is simply to modify a cordless drill with a 3D-printed centrifuge adaptor, one example being the DremelFuge, which offers up to 52,000 x g (RCF).
However one should be extremely careful here because a flying, solid object at these rotations could cause serious harm, please take appropriate safety precautions if you are thinking of using this solution. Disclaimer! More generally, if in doubt about any safety aspects of field sample preparation, please first get in contact with your local safety officer for advice.
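Centrifuge g-force figures like the one above come from the standard relationship RCF = 1.118e-5 × r × rpm², with r the rotor radius in cm. A small Python helper for sanity-checking a homebrew rig follows; the 3.8 cm radius and 35,000 rpm in the example are illustrative assumptions, not DremelFuge specifications.

```python
import math

# RCF = 1.118e-5 * r * rpm^2 with r in cm; the constant is (2*pi/60)^2 / (100 * 9.80665).
RCF_CONSTANT = 1.118e-5


def rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (in multiples of g) at the given radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(target_g: float, radius_cm: float) -> float:
    """Rotational speed needed to reach target_g at the given radius."""
    return math.sqrt(target_g / (RCF_CONSTANT * radius_cm))


# Illustrative values only: a hypothetical 3.8 cm radius spinning at 35,000 rpm.
print(f"{rcf(35_000, 3.8):,.0f} x g")
```

With those assumed values the result lands at roughly 52,000 x g, which shows how quickly the forces climb with speed: doubling the rpm quadruples the g-force.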
An alternative is to adapt a regular lab microcentrifuge that can take DC input, as that means they can be easily powered from a Lithium-Ion battery pack.
Portable PCR Thermocyclers
The MiniPCR is a fantastic (we’ve got one) biohacker/kickstarter product which costs £500 from Cambio in the UK. It is programmed via a laptop or phone but then must be plugged into a mains adapter or battery pack to start the program. We bought a LiPo powerbank off Amazon for £70 which can provide the 19V, 3.7A power requirement. They also produce a small electrophoresis and visualisation system to go with it.
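The power requirement translates directly into how big a battery pack you need. A back-of-the-envelope sketch: 19 V at 3.7 A is about 70 W at the supply’s maximum rating. The 100 Wh pack and 85% conversion efficiency below are assumptions for illustration, and the real average draw over a PCR program is likely lower than the peak rating, so treat the result as a worst case.

```python
def power_watts(volts: float, amps: float) -> float:
    """Electrical power drawn at a given voltage and current."""
    return volts * amps


def runtime_hours(battery_wh: float, load_w: float, efficiency: float = 0.85) -> float:
    """Rough battery runtime; `efficiency` (an assumed value) covers conversion losses."""
    return battery_wh * efficiency / load_w


peak_load = power_watts(19, 3.7)  # supply rating quoted in the text
print(f"peak load {peak_load:.1f} W, "
      f"~{runtime_hours(100, peak_load):.1f} h from a hypothetical 100 Wh pack")
```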
An alternative is the Bento Lab from Bento Bio. This device caught many people’s attention with its Fisher-Price toy looks and intriguing functionality - it is a PCR thermocycler, gel visualiser block and minifuge all in one! Although mains powered, it should draw sufficiently little power that it could be powered via a car battery or possibly a Lithium pack. We had the pleasure of seeing a prototype box and it kicks ass - the only problem at the moment is that it’s still not available to buy. I hope it will ship soon and we’ll be first in the queue to test it out.
## Portable Liquid Handler
Pipetting is only accurate at relatively large volumes (>1 ul), which both increases reagent costs and can be a major source of errors in multi-step protocols. The Voltrax is an interesting device that was announced by ONT at London Calling 2015 and has not yet been seen in the wild, although the access programme was recently announced. The basic principle is the movement of ultra-low liquid volumes around a matrix by an applied electrical current, a process called electrowetting. The appeal of such a process is that complex pipetting and mixing steps could be automated (apparently via a scriptable Python interface).
There may well be more that I have not mentioned … feel free to drop your suggestions in the comments box below!
Conflict of interests
I have received an honorarium to speak at an Oxford Nanopore meeting, and travel and accommodation to attend London Calling 2015 and 2016. I have ongoing research collaborations with ONT although I am not financially compensated for this and hold no stocks, shares or options. ONT have supplied free-of-charge reagents as part of the MinION Access Programme and also generously supported our infectious disease surveillance projects with reagents. Cambio sent us some free reagents to go with the MiniPCR instrument we purchased.
Thanks to Josh Quick for contributing to this post, and to Matt Loose and John Tyson for reading a draft version.