A few more MiSeq nuggets

14 Jan 2011

Following on from Wednesday's post on the HiSeq, I had the opportunity to quiz Neil Ward from Illumina further about the MiSeq. A few notes from our conversation.

One thing people on Seqanswers.com were curious about is how the machine has become so much quicker than the HiSeq. This is mainly down to the reduced size of the flowcell. This means a single fixed field of view for the camera, i.e. the camera can see the entire flowcell at once so no moving parts required. The HiSeq currently takes about 1 hour per base, with 15 minutes of that being fluidics and 45 minutes being imaging, so that's where the major gain in speed is made. Secondarily, the reduced size of the flowcell means the fluidics are much more efficient. Assuming 1 hour for cluster generation, the MiSeq should run about 1 base every 5 minutes. A massive improvement.
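As a rough sanity check on that 5 minute figure, here is a back-of-envelope sketch in Python. It assumes the run times from the launch announcement (4 hours for a 35bp fragment run, 27 hours for 2 x 150bp), the 1 hour cluster generation allowance above, and that each sequenced base costs exactly one cycle. The function name is mine, purely for illustration.

```python
def minutes_per_cycle(total_hours, cluster_hours, read_length, paired=True):
    """Average minutes per sequencing cycle once cluster generation is removed."""
    cycles = read_length * (2 if paired else 1)
    return (total_hours - cluster_hours) * 60 / cycles

# MiSeq, 2 x 150bp: 27 h quoted, 1 h ASSUMED for cluster generation
miseq_paired = minutes_per_cycle(27, 1, 150, paired=True)
# MiSeq, 35bp fragment run: 4 h quoted, same 1 h assumption
miseq_fragment = minutes_per_cycle(4, 1, 35, paired=False)
# HiSeq: about 1 hour per base, as described in the post
hiseq = 60.0

print(f"MiSeq 2 x 150bp: {miseq_paired:.1f} min per cycle")
print(f"MiSeq 35bp fragment: {miseq_fragment:.1f} min per cycle")
print(f"Speed-up over HiSeq: {hiseq / miseq_paired:.1f}x")
```

Both run configurations come out at a little over 5 minutes per cycle, so the quoted run times and the "1 base every 5 minutes" estimate hang together.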

I asked about quality scores - one problem with Illumina sequencing in the past has been quality drop-off towards the 3' end of the read. Neil reckons that, when running the 2x150bp configuration, >75% of base calls at the 150th position are Q30 or higher. Hopefully he will send some charts so we can understand the error profile in more detail.

The use of the Nextera kits means that the input requirements are ~50ng, potentially useful for applications where you don't want to amplify your sample.

The MiSeq will be compatible with the mate-pair protocols (for long jumping libraries) but the libraries will need to be prepared off the machine in the regular way.

The MiSeq has everything you need to get going in the box - what used to be the PE module and the cluster station are built-in. You aren't going to need any extra bits of kit to get going other than standard mol biol wetware. Even the built-in server will be powerful enough for primary and secondary analysis (although what manufacturers mean by this isn't usually what bioinformaticians mean).

We probably won't see these in the UK 'til August but the price should relate to the exchange rate, so about £85,000 in today's money.

For more on the battle between Ion Torrent and MiSeq, Keith Robison has it sewn up.

MiSeq: Now that's what I'm talking about ...

12 Jan 2011

Illumina announced the MiSeq today. A direct aim at Ion Torrent's PGM and Roche's 454 GS Junior and a strong bid for the potentially lucrative clinical diagnostics by sequencing marketplace.

As always we should be cautious about the specs before the machine is in the hands of any users, but on paper they have (in my opinion) the most compelling offering of the 3. I'm discounting Life's SOLiD PI from the equation.

The machine is coming in at $125k with a run cost of $400-$750 depending on setup. Read length is up to 2 x 150bp, with 1Gb of sequence and 6.5M reads per run.

The major innovation here is the run time, between 4 hours (fragment 35bp) and 27 hours (150bp paired-end) which has neutralised the major criticism of the Illumina platform (the MiSeq's big brother can run for up to 2 weeks).

Crucially, this number includes the cluster generation (amplification) stage unlike the 2 hour figure for Ion Torrent which depends on bead-based emulsion PCR being completed first.

Although the machine is a bit pricier than Ion Torrent, you'll need a lot less ancillary lab equipment to get it working. And you won't have to spend days agonising why your emulsion PCR failed.

Ion Torrent now need to urgently juice their throughput to respond to this machine. Look for the 316 chips being rushed to market and a new one announced.

Roche just need to do something - anything - to get back in the game. Their only advantage right now is read length (a win mainly for PCR amplicon studies) but I predict this advantage will not last long.

Check out Keith Robison's post at Omics Omics (so quick, I'm in awe!), also GenomeWeb, Forbes.

I'll sign off with this great quote from John Hawks:

Put these things together, and personal genomics today is where personal computing was in 1973. We haven't yet had an Altair, much less an Apple 2. But it's almost in reach.

Ion Torrent: Hype cycle status "disillusionment"

16 Dec 2010

I'm as guilty as anyone about buying into the hype. When I labelled the Ion Torrent an "agile gazelle" earlier this year, a reader admonished me for not being skeptical enough. Well, he was right, I was wrong. The tediously useful hype cycle has come into play and we've hit disillusionment rather earlier than expected with Ion Torrent.

I heard the launch specs for Ion Torrent in November at the ELRIG meet-up in Hinxton. The information we got there was that the shipping "314" chip would permit 100k reads of 100 bases length, i.e. 10 megabases per run. To put that in perspective, it's 1/40th of a 454 run or 0.00005 of a HiSeq run. This is disappointing for those who, like me, had believed 100-200 megabases was going to be the shipping specification, useful enough for bacterial genomics. 10 megs is useful pretty much only for amplicons and viral genomes.
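To make those ratios concrete, here is the arithmetic as a small Python sketch. The 454 yield uses the lower end of the 400-600Mb range. The 200Gb HiSeq yield and the 5Mb bacterial genome size are my assumptions: the first is back-calculated from the 0.00005 ratio, the second is a round number for illustration.

```python
def run_yield_bases(reads, read_length):
    """Total bases per run, assuming every read reaches full length."""
    return reads * read_length

ion_314 = run_yield_bases(100_000, 100)   # launch spec: 100k reads of 100 bases
run_454 = 400_000_000                     # lower end of the 400-600Mb range
run_hiseq = 200_000_000_000               # ASSUMPTION: 200Gb per HiSeq run
genome = 5_000_000                        # ASSUMPTION: ~5Mb bacterial genome

print(f"314 chip yield: {ion_314 / 1e6:.0f} Mb")
print(f"Fraction of a 454 run: 1/{run_454 // ion_314}")
print(f"Fraction of a HiSeq run: {ion_314 / run_hiseq:.5f}")

# Coverage of the assumed genome at the shipping and hoped-for yields
for yield_mb in (10, 100, 200):
    coverage = yield_mb * 1_000_000 / genome
    print(f"{yield_mb} Mb -> {coverage:.0f}x coverage")
```

At 10 megabases the assumed genome gets only 2x coverage, which is why the shipping spec is no use for bacterial de novo work, while 100-200 megabases would give 20-40x.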

Even more of a blow to this instrument's prospects is the sample preparation workflow. Once shrouded in secrecy, this turns out to be virtually identical to 454's. That means fragmentation, adapter ligation and amplification using a bead-based emulsion PCR stage. This is a major hassle and makes a mockery of the instrument's 2 hour run time. A 454 only takes 8 hours to run but the user is tied up for days making libraries. The Ion Torrent, like the 454, will therefore be idle most of the time. Some of this can be improved by automation, but the presently available automation solutions cost more than the Ion Torrent!

Rothberg explains this away: "If it takes a machine two weeks to sequence [a genome], it doesn't matter if the sample prep takes 1.5 days. But if you're getting sequence in two hours, it does!", conveniently forgetting the 454, his own invention! Is it significant that 454 has had no major innovations for 2 years now, still generating 400-600Mb and 1m reads? Let's hope the chip can outperform the 454 optics-based system.

Lastly, and not surprisingly, the instrument suffers the same "read-forward" issues as 454, which means homopolymers can't be called accurately and the reads are full of indels. This is inevitable with a flow-based approach. When I asked whether Ion Torrent performed worse or better than 454 in this regard, I got a "no comment".

Specs are supposed to improve with the announced but unavailable "316" chip, which should bring output up to 100 megabases. But despite the promises of infinite scalability through semiconductors, the launch spec is poor. It is instructive that the 314 chip has 1.4m "sensors" but that this doesn't yet seem to translate into read count. Is the argument for semiconductor scalability fragile?

Anyway, it turns out I didn't need to blog on this because Life Tech have publicised the limitations of their technology themselves, in the form of their $7M contest to "democratise" sequencing. The plan is to outsource the technical development to the community. This is a genius move if it works: if these 3 drawbacks can be fixed quickly (and so cheaply) then the machine has a great future ahead of it.

But I would urge caution for those people thinking of taking part - the market value of your successful invention will be greater than the prize money on offer! Maybe you can sell it to Roche who desperately need to overcome the same problems with the 454!

Remember Solexa was sold to Illumina for $600m because of the genius of their solid-surface cluster based DNA chemistry. That genius has permitted Illumina to comprehensively beat Moore's Law since the introduction of their technology, scaling greater heights daily (at ELRIG, they announced a ~400Gb run from a HiSeq!).

I love the idea of out-sourcing method development to the community, it's very zeitgeisty and Web 2.0. But the community should get a cut of the profits, not a cash prize. Although (as far as I can see) the T&Cs have not been published yet, you can bet that Life take ownership of the intellectual property. Please correct me if I'm wrong, Life Tech!

There's also a worry here: don't Life Tech have a solid roadmap to juice throughput, improve sample workflow and reduce error rates? Alarm bells are sounding.

In the meantime, the jury is out for the Ion Torrent. It's got a great price tag at $50k, but a lot of problems need to be overcome quickly.

What do you think? Is this helping democratise sequencing, or is it a cynical tactic to get cheap R&D?

Genome sequencing platforms compared for bacterial de novo assemblies

15 Dec 2010

Wow, I haven't blogged for ages. Partly this is the usual excuse of not having time, and partly a lack of inspiration. Sorry. Perhaps just before Xmas is the wrong time to get my mojo back, but I guess that's the way life is.

So what's been happening? Well, on the sequencing front we've recently been celebrating getting single-scaffold assemblies for bacterial genomes, a grand total of 4 in a week! This was achieved with the 454 8kb paired-end protocol and 454 WGS data. I know lots of other groups have done this, but it is very satisfying when it happens to you!

That brings me on to some results which I thought were interesting enough to share. Mike Halachev, my fellow developer on the xBASE project, was importing the latest batch of bacterial genomes deposited in GenBank and noticed that the COMMENT block often reveals the sequencing platform, coverage depth and assembler used. Needless to say, like a good bioinformatician, he decided to graph the results and see what they showed us.
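For anyone wanting to try something similar, here is a minimal sketch of how such fields could be pulled out of COMMENT text. This is not Mike's actual code. It assumes the "Key :: Value" layout of GenBank structured comments, so free-text COMMENT blocks would need looser matching, and the two sample records are invented for illustration.

```python
import re
from collections import Counter

# Matches "Key :: Value" lines inside a structured comment block
FIELD = re.compile(r"^\s*(?P<key>[^:]+?)\s*::\s*(?P<value>.+?)\s*$")

def parse_comment(comment_text):
    """Return a dict of the 'Key :: Value' pairs found in a COMMENT block."""
    fields = {}
    for line in comment_text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group("key")] = match.group("value")
    return fields

# HYPOTHETICAL records, made up for this example
SAMPLE_COMMENTS = [
    """##Genome-Assembly-Data-START##
    Assembly Method       :: Newbler v. 2.3
    Genome Coverage       :: 25x
    Sequencing Technology :: 454
    ##Genome-Assembly-Data-END##""",
    """##Genome-Assembly-Data-START##
    Assembly Method       :: Velvet v. 1.0
    Genome Coverage       :: 67x
    Sequencing Technology :: Illumina
    ##Genome-Assembly-Data-END##""",
]

records = [parse_comment(text) for text in SAMPLE_COMMENTS]

# Tally projects per platform, as in the first figure
platforms = Counter(r.get("Sequencing Technology", "unknown") for r in records)

# Coverage as a number, stripping the trailing "x"
coverages = [float(r["Genome Coverage"].rstrip("xX")) for r in records]

print(platforms)
print(coverages)
```

Real records are messier than this (mixed platforms, missing fields, typos in the submission form), so expect to do some manual clean-up before graphing.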

Firstly, incomplete bacterial genomes submitted to NCBI over the past 12 months (fig 1).

Out of an amazing 514 projects, the majority of people preferred to use 454 for sequencing (286), about half as many used Illumina (144) and most of the rest went for a hybrid 454/Illumina approach. SOLiD (ABI) was used almost as much as Sanger, i.e. not a lot. This is kind of what I would expect: the 454 is a good and tested platform for de novo assembly of microbial genomes. But I might have expected more Illumina deposits given that the large sequencing centres are so focused on this instrument. Some bacterial resequencing studies only do mapping and so the reads end up in the Sequence Read Archive, not covered by these data. I expect the balance to shift in the next 12 months a little towards Illumina.

Coverage for different platforms (fig 2).

As you might expect, Illumina assemblies have greater average coverage (median 67x) than 454 assemblies (25x), reflecting the increased throughput of these instruments. SOLiD is a bit skewed by the 7 Listeria genomes submitted by Life Tech, each at >200x coverage. For 454 quite a range of coverage depths is seen, from 10x but going up to 200x. It's a bit of a waste of money getting that much coverage. For Illumina the range is higher and narrower, concentrating around the 60x mark.

In terms of number of contigs (fig 3) it is surprising and notable that the 454 and Illumina contig numbers are comparable despite the difference in read-length.

Of course 454 covers GS 20, GS FLX and Titanium read lengths and Illumina can be run fragment or paired-end from 25 - 125 base pairs, so the comparisons are not direct. I would presume most of the Illumina sequences used paired-end sequencing which produces the equivalent of ~250bp reads. The 454/Illumina hybrid assemblies are not obviously better, with some being much worse, which I think reflects the lack of a decent assembly pipeline for combining these data. The SOLiD assemblies are pretty bad, reflected in those Listeria Life Tech sequences again. These data may be skewed by the fact many people omit their really small contigs when depositing in GenBank. N50 would be better but I don't have that information.

Plotting coverage / number of contigs (fig 4) you can see a truth that is still unpalatable to some people (forgive me for not doing any linear regression here) - increasing coverage beyond a certain point (I think about 15x for 454) doesn't mean you get fewer contigs. For those raised on Sanger sequencing and Lander-Waterman statistics this is a bit of a surprise. When planning an experiment it is important to realise that the assembly will never be in fewer contigs than there are repeat regions in the genome (longer than the read length). It's impossible without some manual finishing or guessing against a reference. If you add in scaffolding this is still true but contigs can be oriented and gap lengths defined.
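To illustrate the point, here is a toy Python sketch. The first function is the standard Lander-Waterman expectation for the number of contigs, N·e^(-cσ), where N is the number of reads, c the coverage and σ = 1 - T/L for minimum overlap T and read length L. The second function is my own crude addition, not a real assembly model: it simply refuses to go below the number of repeats longer than the read length. The genome size, read length, overlap and repeat count are all made-up illustrative values.

```python
import math

def lander_waterman_contigs(genome_size, read_length, coverage, min_overlap=0):
    """Expected contig count under Lander-Waterman, which ignores repeats."""
    n_reads = coverage * genome_size / read_length
    sigma = 1 - min_overlap / read_length
    return n_reads * math.exp(-coverage * sigma)

def floored_contigs(genome_size, read_length, coverage, long_repeats, min_overlap=0):
    """Crude floor: never fewer contigs than repeats longer than the reads."""
    ideal = lander_waterman_contigs(genome_size, read_length, coverage, min_overlap)
    return max(1.0, ideal, float(long_repeats))

# ASSUMPTIONS: 5Mb genome, 400bp reads, 40bp minimum overlap, 50 long repeats
GENOME, READ_LEN, OVERLAP, REPEATS = 5_000_000, 400, 40, 50

for cov in (5, 10, 15, 30, 60):
    ideal = lander_waterman_contigs(GENOME, READ_LEN, cov, OVERLAP)
    floored = floored_contigs(GENOME, READ_LEN, cov, REPEATS, OVERLAP)
    print(f"{cov:>3}x  Lander-Waterman: {ideal:10.2f}  with repeat floor: {floored:8.1f}")
```

The Lander-Waterman column keeps falling towards zero as coverage rises, while the floored column flattens out at the repeat count. Past that point, extra coverage buys nothing.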

Fig4a

Update: And for Lex Nederbragt who took the time to post in the comments, here's a log/log scale. It strikes me that a few of the genome projects labelled 'ABI' are likely Sanger, and the ones you can see in the top right are SOLiD. I'd be inclined to ignore the outliers which look like they result from mistakes when filling in NCBI's genome project submission form.

Finally, what assemblers are in use? Well, there are really only two contenders for the crown of most popular assembler for bacterial data: Newbler for 454 data (does a good job, in my experience) and Velvet for Illumina / SOLiD data. Celera is popular, but mainly at JCVI for obvious reasons. I find it interesting that few other short-read assemblers get a look in, especially as there are heaps of them.

Well, I hope you found that interesting, and I promise not to leave it so long for my next post!

Gazelles, elephants, blue whales and dodos: next-generation sequencing at the zoo

18 Aug 2010

The big news today is that Life Tech, of SOLiD fame, intend to acquire Ion Torrent, subject to certain technical milestones being reached. The best blog coverage is at Omics Omics and Genetic Future.

This means Illumina, Life Tech and Roche, our sequencing Big Three have all now got into bed with "next-next-generation" technology platforms. Roche have previously signed up IBM's nanopore technology and Illumina have entered into a marketing and distribution alliance with Oxford Nanopore.

What does this all mean?

Each platform holder is trying to balance their portfolio and position for the future. Roche are struggling to get much more throughput from the 454 platform and so it makes sense that they focus on the potentially superior nanopore platform. This is some way off, perhaps 5-7 years. Roche's portfolio has got a big hole in the middle right now.

Illumina are dominant but are also looking to the future. Oxford Nanopore looks like a sensible alliance, and I'd predict we'll see it hit in about 3 years' time.

By snapping up Ion Torrent, Life Tech have gone for a quite different technology which looks much more likely to integrate happily into the clinical diagnostics market. As announced, Ion Torrent's closest competitor is the 454 Jr.

In fact, the Ion Torrent system is rather similar to the 454 sequencing system in many ways (not surprising as Jonathan Rothberg started 454 as well as Ion Torrent). Ion Torrent's main advantage is price, clocking in at $50-100k for the instrument and $500 a run in consumables. It's also fast, a promised 1 hour per run (but still a day for sample prep). But this isn't single molecule and the throughput is low, clocking in at 100Mb. This is a gazelle: fast, agile, but relatively weak.

But Life Tech have also announced another in-house technology called quantum dot single-molecule sequencing (more here), and covered their bases well.

A more practical consideration is that the term "next-generation sequencing" is dead. It's all getting way too confusing. Despite my expectation that technology should progress linearly and result in steady improvements in each area, this isn't the case. Right now you can pick and choose from a menu of options. You can have direct sequencing (single molecule), faster sequencing, faster sample prep, higher throughput, longer reads, cheaper, but you can't have it all.

Continuing the animal analogy, SOLiD and HiSeq are now looking like elephants (or rhinos, or hippos) - powerful, but slow and cumbersome.

So, where does that leave us:

Company	Current platform	Animal	Next platform	Animal	Expected arrival
Roche	454	Gazelle (ailing)	IBM Nanopore	Unknown	5-7 years
Life Tech	SOLiD	Elephant	Ion Torrent	Gazelle	Next year
Illumina	HiSeq 2000	Elephant	Oxford Nanopore	Unknown	Unknown - 2-3 years?

Before you prompt me: where does the PacBio RS sit? Well, in view of its size, perhaps this should be the blue whale. And Helicos, alas, is perhaps the dodo.

Loman Labs
