Getting Windows 7 running on KVM as a guest OS (Ubuntu 12.04 LTS)

There are surprisingly few resources on this on teh intarwebs, so here are some notes for my future self and anyone else attempting it. If you are wondering why I want to run a Windows 7 virtual machine: we need a server to run MiSeq Reporter and RTA on, so that we can reanalyse runs.

Make sure your user belongs to the libvirtd and kvm groups (on Ubuntu 12.04 the libvirt group is called libvirtd).
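If you want to check this from a script rather than eyeballing the output of `id`, here is a minimal Python sketch. It assumes the Ubuntu 12.04 group names `libvirtd` and `kvm` (later Ubuntu releases use `libvirt` instead of `libvirtd`), and the helper names are my own, not part of any libvirt tooling.

```python
import grp
import os

# Assumed group names for Ubuntu 12.04; later releases call libvirtd "libvirt".
REQUIRED_GROUPS = ("libvirtd", "kvm")


def missing_groups(required, member_of):
    """Return the names in `required` that are absent from `member_of`, in order."""
    have = set(member_of)
    return [name for name in required if name not in have]


def current_user_groups():
    """Group names of the running process, as seen by this login session."""
    names = set()
    # Supplementary groups plus the effective primary group.
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid with no entry in the group database; ignore it.
            pass
    return names


if __name__ == "__main__":
    missing = missing_groups(REQUIRED_GROUPS, current_user_groups())
    if missing:
        print("Not in:", ", ".join(missing))
    else:
        print("Group membership looks fine")
```

If anything is missing, `sudo usermod -a -G libvirtd,kvm $USER` adds you. Because the check reads the groups of the current session, log out and back in before re-running it.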

The commands below remove any existing WIN7 domain and then define an 8-CPU virtual server with about 24 GB of RAM (virt-install's -r takes megabytes), ready to boot the Windows 7 installer DVD.

virsh --connect qemu:///system destroy WIN7
virsh --connect qemu:///system undefine WIN7
virt-install --connect qemu:///system \
--arch=x86_64 \
-n WIN7 \
-r 24000 \
--vcpus 8 \
--vnc \
--vnclisten 0.0.0.0 \
--noautoconsole \
--os-type windows \
--os-variant win7 \
--disk path=/home/nick/windows_partition \
--disk path=/home/nick/virtio-win-0.1-30.iso,device=cdrom,perms=ro \
--cdrom /home/nick/win7.iso \
--boot cdrom,hd \
--prompt

The virtio drivers ISO (virtio-win-0.1-30.iso above) comes from the Fedora Project.

Update 30/07

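To get all 8 cores working correctly, you need to add a topology entry to the domain's XML definition using virsh edit WIN7 (against qemu:///system, where virt-install created the domain). By default each vCPU is presented to the guest as a separate socket, and Windows 7 will only use a limited number of physical processors (at most two, depending on the edition), so the 8 vCPUs need to appear as one socket with 4 cores and 2 threads per core: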

<cpu>
 <topology sockets='1' cores='4' threads='2'/>
</cpu>
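The rule to keep in mind is that sockets × cores × threads should equal the number of vCPUs given to virt-install (1 × 4 × 2 = 8 here). Below is a small Python sketch that builds the fragment and refuses inconsistent numbers; `cpu_topology_xml` is a hypothetical helper of my own, not part of libvirt or virsh, and you would still paste its output into virsh edit by hand.

```python
import xml.etree.ElementTree as ET


def cpu_topology_xml(vcpus, sockets, cores, threads):
    """Return a libvirt-style <cpu><topology .../></cpu> fragment.

    Raises ValueError unless sockets * cores * threads == vcpus, so the
    guest sees exactly as many logical CPUs as the domain is given.
    """
    total = sockets * cores * threads
    if total != vcpus:
        raise ValueError(
            f"{sockets} x {cores} x {threads} = {total} logical CPUs, "
            f"but the domain has {vcpus} vCPUs")
    cpu = ET.Element("cpu")
    ET.SubElement(cpu, "topology", sockets=str(sockets),
                  cores=str(cores), threads=str(threads))
    return ET.tostring(cpu, encoding="unicode")


# One socket, 4 cores, 2 threads per core: 8 vCPUs, as in the snippet above.
print(cpu_topology_xml(8, sockets=1, cores=4, threads=2))
```

Putting everything on a single socket is the point of the exercise here: it keeps all 8 logical CPUs visible to a Windows edition that only licenses one or two sockets.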


  

Map of high-throughput instruments: What can you do with the data?

I'm going to try and get myself in the habit of more frequent, smaller updates.

A few people have started using the data from Omicsmaps.com, the world map of high-throughput sequencing instruments that James Hadfield and I run, to power their own projects, which we think is great.

For example a service called Findini is scraping the data and using it to help people find sequence providers. They've done a nice job with it.

Art Wuster, a post-doc at the Wellcome Trust Sanger Institute, has started a nice blog called Seqonomics, and he regularly uses the map data to try to understand the sequencing market; see posts like "How is commercial sequencing getting on?" and "Who are the sequencing superpowers?".

I'm even helping the Genomics Network at the University of Lancaster use the map to look at the social impact of genomics and sequencing.

So this is all great. But right now James and I have reached a bit of an impasse with Omicsmaps. We have had the occasional excited conversation about how the map could be extended and improved, but quite honestly real work means we don't have the time to do a whole heap with it. It ticks along quite nicely thanks to your community submissions, but I think the explosion of benchtop instruments means we now capture proportionately fewer installations than we used to. That is not surprising: many new users are not necessarily in touch with our close-knit genomics community centred around Twitter and SEQanswers.

My one thought is that if I can open it up a bit more, perhaps the community will come to my rescue and give it a second lease of life.

I'm happy to put the website code up on GitHub (well, I will definitely do this, I just haven't got round to it yet) if anyone thinks they might make changes to it.

As a first step in opening up the map, I have put the data up as a public Google Fusion Table. Not only does it have locations and counts, it also has snapshots from various timepoints going back to 2010, so hopefully it is a useful resource.

The really cool thing about Google Fusion Tables is that it allows you to do quick little visualisations like the one below really easily.

[iframe src="https://www.google.com/fusiontables/embedviz?viz=GVIZ&t=LINE&containerId=gviz_canvas&isXyPlot=true&q=select+col0%2C+SUM(col8)%2C+SUM(col9)%2C+SUM(col10)%2C+SUM(col11)%2C+SUM(col12)%2C+SUM(col13)%2C+SUM(col14)+from+1tYRJ6qreHion4wWx4bd_TnL7WrmMGai63jKEHPw&qrs=+where+col0+%3E%3D+&qre=+and+col0+%3C%3D+&qe=+group+by+col0+order+by+col0+asc+limit+10&att=true&width=800&height=285" width="800" height="300"]
Figure: Growth of sequencing platforms

Or perhaps see the number of sequencers by country:

[iframe src="https://www.google.com/fusiontables/embedviz?viz=GVIZ&t=PIE&containerId=gviz_canvas&q=select+col2%2C+SUM(col7)+from+1tYRJ6qreHion4wWx4bd_TnL7WrmMGai63jKEHPw+where+col0+%3D+'2012-07-20'&qrs=+and+col2+%3E%3D+&qre=+and+col2+%3C%3D+&qe=+group+by+col2+limit+52&att=true&width=800&height=285" width="800" height="285"]
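Both charts are just grouped sums over the table. Judging by the embedded queries, col0 holds the snapshot date, col2 the country, col7 an instrument count and col8 to col14 per-platform counts, but that reading of the columns is my assumption. If you export the table to CSV, a rough Python equivalent of the two queries could look like the sketch below; the field names (`snapshot`, `country`, `total`, and whatever platform names you pass in) are hypothetical placeholders for the headers your export actually has.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV export of the table into a list of dicts (one per row)."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def to_int(value):
    """Treat blank or missing cells as zero."""
    return int(value) if value not in (None, "") else 0


def totals_by_snapshot(rows, platform_fields):
    """Sum each platform column per snapshot date, ordered by date.

    Roughly: SELECT snapshot, SUM(p1), SUM(p2), ... GROUP BY snapshot
    ORDER BY snapshot ASC. ISO dates (YYYY-MM-DD) sort correctly as text.
    """
    sums = defaultdict(lambda: [0] * len(platform_fields))
    for row in rows:
        acc = sums[row["snapshot"]]
        for i, field in enumerate(platform_fields):
            acc[i] += to_int(row.get(field))
    return [(date, tuple(vals)) for date, vals in sorted(sums.items())]


def totals_by_country(rows, snapshot, count_field="total"):
    """Sum count_field per country for a single snapshot date.

    Roughly: SELECT country, SUM(total) WHERE snapshot = '...' GROUP BY country.
    """
    sums = defaultdict(int)
    for row in rows:
        if row["snapshot"] == snapshot:
            sums[row["country"]] += to_int(row.get(count_field))
    return dict(sums)
```

For example, `totals_by_country(load_rows("omicsmaps.csv"), "2012-07-20")` would give the numbers behind the pie chart, with `omicsmaps.csv` standing in for wherever you saved the export.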

And it has built-in geolocation support so you can even make little visualisations overlaid on maps.

As always you are only limited by your imagination with datasets like this (sorry, these examples weren't very imaginative, but as I say I'm trying to blog more regularly).

I'm putting the following licence on these data, which basically lets you do what you like with them; obviously a citation would be nice. I will look into getting the map its own DOI via Figshare or similar.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

As always, feedback very welcome.

SBTM12: Sequence-based typing methods for micro-organisms

I'm going to be helping out with teaching the NGS-related aspects of this course, which might interest you if you are thinking about using whole-genome sequencing for bacterial typing. Keith Jolley will also be there, talking about his BIGSdb software and his new rMLST method for bacterial typing.

The best news is that the course will be in Lisbon in September so the presence of some sunny weather - whilst not guaranteed - is significantly more likely than in the UK! The course is also about 500m from the beach! Not that this should influence your decision to come, obviously.

Here's the skinny:

ANNOUNCEMENT

SBTM12 - SEQUENCE-BASED TYPING METHODS for MICROORGANISMS

with Keith Jolley, Nick Loman, João Carriço,
Mario Ramirez, Nuno Faria and Teresa Conceição

IMPORTANT DATES for SBTM12
Deadline for applications: September 10th 2012
Notification of acceptance dates:
EARLY: September 3rd 2012
(on special request, see Application)
NORMAL: September 12th 2012
Course date: September 25th - September 28th 2012

Course Description
Overview
Technological advances in DNA sequencing have led to the adoption of
sequence-based typing methods as the standard techniques for bacterial
identification at strain level, especially due to their portability and
reproducibility. The widespread use of remotely accessible databases
offering different typing data and the development of diverse data
analysis techniques show the impact of Bioinformatics in this field,
and the need for understanding how to operate with the databases and
algorithms. Recently, the capability of sequencing whole bacterial
genomes in a few days using Next Generation Sequencing (NGS)
methodologies has opened a new door for the development of more
sophisticated strain identification tools.

Objectives
This training course is directed at the data analysis of sequence-based
typing methods, from the raw data to the identification of a
strain type by several typing methodologies, and the use of analysis
algorithms to create groups of related strains. It will span from
Single Locus Sequence Typing methods, such as spa typing or emm
typing, to the now established MultiLocus Sequence Typing (MLST) and
Multilocus Variable Number of Tandem Repeats Analysis (MLVA)
methodologies. The course will also cover methods that apply to whole
genome sequence data, and show how NGS data can be analyzed and made
sense of, in this context.
The course will be essentially hands-on. Short presentations will be
interleaved with tutor-assisted exercises.
The participants are expected to gradually gain user independence by
acquiring new analytical skills in using software and online databases
for specific typing methods. Special attention will be given to
methods that can use NGS datasets.

Target Audience
This course is aimed at anyone working in molecular epidemiology who
wants to develop or consolidate skills in the use and analysis of
sequence-based typing methods. The course will be illustrated with
examples from different bacterial species, but the concepts are
applicable to any species and to other sequence-based typing methods,
not explicitly referred to in the course.

Course Pre-requisites
Basic understanding of molecular biology, namely in microbial typing
methodologies and elementary computer interaction skills are expected.

Further details, including application instructions, are available at
http://gtpb.igc.gulbenkian.pt/bicourses/SBTM12

Benchtop Sequencer Comparison paper

In case you haven't seen our recent paper, you can download it here (subscription required): http://dx.doi.org/10.1038/nbt.2198

Performance comparison of benchtop high-throughput sequencing platforms 
Nature Biotechnology advance online publication published online 22 April 2012
Nicholas J Loman, Raju V Misra, Timothy J Dallman, Chrystala Constantinidou, Saheer E Gharbia, John Wain & Mark J Pallen

Here is a press release on the blog of our newly launched Institute of Microbiology and Infection: http://imicrobham.blogspot.co.uk/2012/04/performance-comparison-of-benchtop-high.html

And here are some links to the lively coverage in the blogosphere:

http://www.nature.com/nbt/journal/v30/n5/full/nbt.2198.html
http://www.nature.com/news/next-generation-genome-sequencers-compared-1.10497
http://flxlexblog.wordpress.com/2012/04/22/fast-genome-sequencing-of-pathogenic-bacteria-which-benchtop-instrument-to-choose/
http://core-genomics.blogspot.co.uk/2012/04/battle-of-benchtopsmiseq-vs-ion-vs-454.html
http://imicrobham.blogspot.co.uk/2012/04/performance-comparison-of-benchtop-high.html
http://flxlexblog.wordpress.com/2012/05/09/loman-et-al-reflects-the-past-not-the-present-a-rebuttal/
http://massgenomics.org/2012/04/comparison-of-benchtop-sequencers.html
http://www.biotechniques.com/news/Battle-of-the-Benchtop-Sequencers/biotechniques-329861.html
http://www.genomeweb.com/sequencing/life-tech-illumina-ramp-campaigns-battle-benchtop-sequencing-market

The paper is currently the most downloaded on the Nature Biotechnology web site!

All the Tweets from #MMGC 2012

The Rapid Next-Generation-Sequencing Conference for Public Health and Clinical Microbiology was held in Münster last week. The title is pretty self-explanatory. It is probably the first meeting since last summer's E. coli outbreak in Germany to allow epidemiologists, microbiologists and genomicists to get together and discuss the impact the rapid release of genome data had on that occasion, and how we might handle future outbreaks.

Its scope is similar to that of the conference I help organise, Applied Bioinformatics & Public Health, which will return in 2013 (date to be announced soon).

I unfortunately couldn't attend this meeting, but through Twitter I did get a good flavour of what was discussed. Many thanks to those who made the effort to tweet; it's really appreciated.

So for posterity I have captured the tweets here, with kind permission of @TimDallman and @jacarrico:


Most popular tweets
(4) jacarrico: MiSeq testing 2x 400 bp reads. 3.4Gb of data, > 70% Q>30 error rates at 400 cycles 2-4% #mmgc
(4) raqueltobes: OPGEN: A great NGS-technology independent method to evaluate NGS assemblers #mmgc
(1) eduardopareja: #PacBio and Optical Mapping new alternatives to Sanger sequencing for bacterial genome closure #mmgc
(1) TimDallman: Smith: original spec 1.5-2gb 2x150 soon to be 7gb 2x250 reads this summer #mmgc
(1) eduardopareja: Geoff Smith: MRSA Methicillin-resistant Staphylococcus aureus: SNP count alone cannot predict outbreak link #mmgc

7 - 8pm EST
+1m jacarrico: @marina_manrique Hello Marina! I'll try my best. I'll use the hashtag #MMGC. To Tweeps: is anyone coming here?
+56m jacarrico: #mmgc (http://t.co/e7Bih1Y5) starting keynote: Putting the genes into genomics: from MLST to BIGSdb Martin Maiden; U.Oxford, UK
+59m jacarrico: @eduardopareja @timdallman @raqueltobes @era7bioinfo hello there fellow twitterers! Shall we use the #mmgc tag?
+5m TimDallman: Keynote - Maiden, "Genes into genomics" #mmgc
+8m jacarrico: Maiden : How many loci: the ultimate question? The answer is 42! ;-). But the problem is always the question ;-)#hhgg #ftw #mmgc
+10m jacarrico: Maiden: Timescales are important. What questions do we want to ask? Different questions => different answers #mmgc
+12m jacarrico: Maiden : The Planet of the Bacteria (Stephen Jay Gould) Prokaryotes are responsible for the most diversity on the tree of life #mmgc
+15m TimDallman: Maiden - defining core and accessory genome - moving target with bacterial diversity - what is meaningful? #mmgc
+16m jacarrico: Maiden: You can extract barrels of SNPs, but you have to figure out the meaning of those SNPs #mmgc
+18m jacarrico: Maiden: If there was a completely clonal population then we would be here since anyone could make meaningful trees very easily! #mmgc
+19m jacarrico: Maiden: bacteria are much more "sexy" than we thought. Maynard Smith Nature 1981 #mmgc
+20m jacarrico: Maiden: 7 lanes x 96 isolates can be done in an Illumina run #mmgc
+23m TimDallman: Maiden - BigsDb MLST for the whole genome #mmgc
+23m jacarrico: Maiden For core genes regions Illumina has great qual and they only got 12 errors from a large collection of assembled/draft genomes #mmgc
+24m jacarrico: Maiden: the 12 errors were all from manual editing ! #mmgc
+25m TimDallman: Maiden - BigsDB, sequence bin of assembled NGS BLAST against reference alleles which can then be defined into a scheme #mmgc
+26m jacarrico: Maiden: BIGSdb can create MLST-like schemas for any number of loci from contig data. #mmgc
+27m TimDallman: Maiden: these alleles can be annotated by the community help link to function #mmgc
+28m jacarrico: Maiden: with BIGSdb, you can put the genes back into genomics because you can create specific schemas and annotate them #mmgc
+29m jacarrico: Maiden: associating phenotype with genotype in the Neisseria #mmgc
+31m TimDallman: Maiden - defining species in Neisseria, 16S rRNA doesn't work very well #mmgc
+33m jacarrico: Maiden: Ribosomal MLST - 53 genes high resolution, present in all domains, encode complex protein structure - coevolution of genes #mmgc
+34m TimDallman: Maiden - whole genome means we have whole ribosomes - we can use 53 genes for universal bacterial rMLST #mmgc
+35m jacarrico: Maiden: rMLST - very different diversity between the 53 genes. Also gives more detailed structure than standard 16S rRNA #mmgc
+36m TimDallman: Maiden - rMLST gives good species resolution as well as genus #mmgc
+38m jacarrico: Maiden: rplG gene reproduces the species structure of Neisseria. Great for field diagnostics #mmgc
+41m jacarrico: Maiden: The MLST allele number plots have asymptotes and the ST numbers keep increasing => most variation is due to recombination #mmgc
+44m TimDallman: Maiden: showing how MLST clonal complex matters for meningococcal infection likelihood from carriage #mmgc
+46m TimDallman: Maiden: rMLST and MLST complement each other, shows relationships between clonal complexes #mmgc
+47m jacarrico: Maiden: in meningococcus they are not changing the capsule due to vaccine pressure. No idea why! #mmgc
+57m jacarrico: Correction : Maiden: rplF (not rplG) reproduces the species structure of neisseria #mmgc
+3m aunderwo: @TimDallman @jacarrico Keep up the good work with the #mmgc tweets. Enjoying the coverage from here in the UK!
+7m TimDallman: Mellmann: German EHEC outbreak #mmgc
+8m jacarrico: German EHEC 2011 outbreak NGS analysis Alexander Mellmann; University Münster, Germany #mmgc
+9m eduardopareja: #mmgc Alexander Mellmann University Münster German #EHEC 2011 Outbreak #NGS talk starting now...
+12m jacarrico: Mellman 10% of patients develop HUS #EHEC #mmgc
+13m TimDallman: Mellman: 3842 cases 855 HUS in sprout implicated outbreak in 2001 Germany #mmgc
+14m TimDallman: That should be 2011 #mmgc
+17m jacarrico: Mellman #EHEC subtyping SLST, MLST,... #mmgc
+17m jacarrico: Mellman http://t.co/9p00UOUH #EHEC #mmgc
+19m TimDallman: Mellmann: O104 serotype outbreak strain was uncommon for EHEC though not novel #mmgc
+25m jacarrico: Mellmann: E. coli O104:H4 had an aggregation pattern similar to EAEC #mmgc
+27m jacarrico: Mellmann: MST used 1144 core genome genes Mellmann et al Plos One 2011 #mmgc
+29m TimDallman: Mellmann: WGS showed "mosaic" structure of EHEC and EAEC virulence factors #mmgc
+33m TimDallman: Mellmann: can transduce K12 with stx2 phage via outbreak strain but cannot transduce 55989 EAEC by outbreak strain #mmgc
+34m TimDallman: Mellmann: Suggests hypothetical Stx2 O104 is progenitor of 55989. #mmgc
+36m eduardopareja: #mmgc Alexander Mellmann showing interesting results of the outbreak from a PLoS ONE paper about E coli O104:H4 http://t.co/dALL4TYP
+38m TimDallman: Mellmann: Found that there has been carriage of this strain in patients for several months post outbreak. Loss of virulence factors? #mmgc
+47m jacarrico: Haiti Cholera 2010 outbreak NGS analysis Rene S. Hendriksen; Technical University Denmark, Lyngby, Denmark #mmgc
+51m TimDallman: Hendriksen: outbreak of cholera in Nepal 6 months before Haiti quake #mmgc
+53m TimDallman: Hendriksen: half a million cases by end of November in Haiti. Nepalese troops implicated - true or false? #mmgc
+55m jacarrico: Hendriksen: Chin et al NEJM 2011, 5 strains sequenced by PacBio.. Used other sequenced historical strains #mmgc
+56m eduardopareja: #PacBio system was used to sequence 5 strains, published in New England... http://www.nejm.org/doi/pdf/10.1056/NEJMoa1012928 #mmgc
+59m jacarrico: Hendriksen: Reimer et al EID 2011. 23 whole genomes, also matched PFGE profiles #mmgc
+1m TimDallman: Hendriksen: initial papers suggest strain from Bangladesh closest match by WGS #mmgc
+1m jacarrico: Reimer et al 0-2 SNP difference in core genomes among the 9 Haitian outbreak isolates #mmgc
+3m TimDallman: Hendriksen: WGS sequences of PFGE matched strain found strains with <10 SNPS to Haitian strain #mmgc
+6m jacarrico: Hendriksen et al MBio 2011. Interesting strain collection with PFGE, MLVA and MLST characterization #mmgc
+8m TimDallman: Hendriksen: Strains from Nepal sequenced, Haiti strains cluster within them. One strain differs by only one SNP #mmgc
+9m jacarrico: Hendriksen: Very strong evidence that cholera was transmitted from Nepal to Haiti from the paper #mmgc
+10m jacarrico: Hendriksen: PCR assay based on canonical SNPs is being developed #mmgc
+13m jacarrico: Hendriksen: need to build global WGST databases to increase the power to identify outbreaks worldwide and track sources #mmgc
+14m eduardopareja: One conclusion: "There is a need to build a global WGST database to increase the power to identify global outbreaks in real time" #mmgc
+16m jacarrico: Peter Gerner-Smidt comment: Nepalese soldiers weren't sampled due to political reasons #mmgc
+19m jacarrico: Dag Harmsen comment: traditional typing methods could be used to exclude, but it is much harder to prove similarity #mmgc
+19m TimDallman: Gharbia - Assembling the biome of emerging pathogens #mmgc
+20m jacarrico: Assembling the biome of emerging pathogens, a public health perspective Saheer Gharbia, HPA, Colindale, United Kingdom #mmgc
+23m jacarrico: Gharbia: C. diff in persistent outbreaks: MDR, hypervirulent, high toxin producers #mmgc
+25m eduardopareja: Saheer Gharbia speaking about C. difficile - HPA - Clostridium difficile http://t.co/qckifT7d #mmgc
+33m jacarrico: Gharbia Need to integrate new tools for these new tech to get better insights #mmgc
+35m eduardopareja: Saheer Gharbia "We use proteomics to integrate genotype (WGS) to phenotype (gene expression)" #mmgc
+44m eduardopareja: Saheer Gharbia: "Using proteomics to select the common expressed proteins shared among pathotypes" #mmgc
+46m eduardopareja: Saheer Gharbia: Metagenomics: geno-proteome derived signatures to predict new outbreaks #mmgc
+48m TimDallman: Tom Connor from the Sanger about the PacBio - Filling the gaps #mmgc
+48m eduardopareja: Thomas Connor: Filling the gaps: using the #PacBio RS in microbial whole genome sequencing projects #mmgc
+48m jacarrico: Filling in the gaps: using the PacBio RS in microbial whole genome sequencing projects Thomas Connor Wellcome Trust Sanger Institute #mmgc
+52m jacarrico: Connor #PacBio SMRT cell v2 read length max: 15176bp compared to max=4527bp of v1 #mmgc
+52m TimDallman: Connor: Version 2 SMRT cell read lengths up to 15k bp up from 5k for v1 #mmgc
+52m eduardopareja: Thomas Connor: Version 2 of SMRT cells from #PacBio has a maximum read length of 15176 bp !! #mmgc
+55m TimDallman: Connor: PacBio needs far greater coverage than other WGS methods to recover SNPs #mmgc
+56m jacarrico: Connor Pacbio has about 30% of errors in snp calls in their test (190x on Pacbio) #mmgc
+57m jacarrico: Connor Some genomes aren't amenable to sequencing using short read technologies #mmgc
+58m jacarrico: Connor Pacbio useful for rapid generating reference sequences, for outbreak situations when isolates are very similar #mmgc
+59m TimDallman: Connor: highlighting need for good reference when investigating outbreak I.e. close comparator to best call Polymorphisms. #mmgc
+59m eduardopareja: Thomas Connor: #PacBio could be useful to generate new reference genomes for analyzing the isolates of an outbreak #mmgc
+1m jacarrico: Connor: PacBio strobing runs can damage the DNA and create errors #mmgc
+3m TimDallman: Connor: out of box CCS and strobing only gives as good de novo as illumina #mmgc
+3m jacarrico: Connor: PacBio standard + CCS + strobing was still worse than Illumina assembly in number of contigs #mmgc
+4m TimDallman: If you include illumina to hybrid assembly much better de novo #mmgc
+5m jacarrico: Connor hybrid illumina and pacbio assemblies get good results as illumina reads correct Pacbio larger reads #mmgc
+6m TimDallman: Connor: mentions IMAGE and iCORN for hybrid assemblies #mmgc
+8m jacarrico: Connor: Pacbio advantages: faster and less labor intensive, great scaffolding! #mmgc
+8m jacarrico: Connor: Every SNP is sacred! (paraphrasing Monty Python) #mmgc
+10m eduardopareja: Thomas Connor: #PacBio is faster and less labour intensive than Sanger and useful for scaffolding when many repetitive sequences #mmgc
+10m jacarrico: Connor: If 2 strains have just a couple of snps check for phages and Mobile Genetic Elements #mmgc
+10m TimDallman: Connor: for outbreaks long read lengths allow Polymorphisms in repetitive and phage to be detected. May be needed when diversity low #mmgc
+12m eduardopareja: #PacBio and Optical Mapping new alternatives to Sanger sequencing for bacterial genome closure #mmgc
+18m eduardopareja: Geoff Smith from Illumina: The MiSeq System and applications to human disease #mmgc
+18m TimDallman: Next Geoff Smith from Illumina talking about MiSeq #mmgc
+18m jacarrico: The MiSeq system and applications to human disease Geoff Smith; Illumina, Chesterford Research Park, United Kingdom #mmgc
+21m jacarrico: Smith MiSeq : On board clustering, fast and On-board analysis #mmgc
+23m jacarrico: Smith MiSeq RFID based reagent and flow cell tracker #mmgc
+25m TimDallman: Smith: original spec 1.5-2gb 2x150 soon to be 7gb 2x250 reads this summer #mmgc
+25m jacarrico: MiSeq 1.5-2 G yield 2x150 read length 75% bases Q>30. Free upgrade next year for better stats #mmgc
+26m TimDallman: Smith: 80% reads >Q30 #mmgc
+27m jacarrico: MiSeq testing 2x 400 bp reads. 3.4Gb of data, > 70% Q>30 error rates at 400 cycles 2-4% #mmgc
+29m TimDallman: Smith: Some public health examples, detection, identification, antimicrobial susceptibility and epidemiology #mmgc
+29m jacarrico: Smith: use in pub health: Detection, Identification, Antibiotic susceptibility testing, epidemiology #mmgc
+29m eduardopareja: Geoff Smith: MiSeq extending read length to 400 x 2 bp (in research) #mmgc
+32m jacarrico: Smith: TB example from Niemann PLoS ONE paper where a SNP predicted resistance #mmgc
+44m eduardopareja: Geoff Smith: MRSA Methicillin-resistant Staphylococcus aureus: SNP count alone cannot predict outbreak link #mmgc
+48m eduardopareja: Geoff Smith: MRSA Methicillin-resistant Staphylococcus aureus: Sequencing positive clones can detect a transmission event #mmgc
+28m TimDallman: John Wain - MiSeq and Typhoid #mmgc
+33m TimDallman: Wain: Phenotypic data still needed to understand genotype, unknown resistance mechanisms in Paratyphi A #mmgc
+35m TimDallman: Wain: What SNPs to use for epidemiology and transmission how to get rid of homoplasy etc #mmgc
+37m TimDallman: Wain: After 1993 amount of diversity in S.Typhi decreased rapidly - how does this affect epidemiology studies? #mmgc
+38m eduardopareja: John Wain HPA: Selectable mutations can impact on the clustering of isolates of an outbreak #mmgc
+44m TimDallman: Wain: Case and Carrier 0 SNPs 3 SNPs between unrelated strain. Statistically significant? #mmgc
+51m jacarrico: Ion Torrent Semiconductor sequencing - an overview and outlook Simone Guenther; Life Technologies; Darmstadt, Germany #mmgc
+54m jacarrico: Guenther : Ion Proton - real incarnation of a sequencing centre on a box #mmgc
+5m jacarrico: Guenther Ion new product AmpliSeq for target enrichment #mmgc
+7m jacarrico: Guenther: 2x100bp PE protocol for Ion Torrent will be (is?) available #mmgc
+10m jacarrico: Guenther Long mate pair sequencing is also working in Ion Torrent . Good for scaffolding in de novo applications #mmgc
+10m eduardopareja: Simone Guenther Ion Torrent: 314 chip , paired rate > 90% !! #mmgc
+11m jacarrico: Guenther AmpliSeq technology can start from only 10ng DNA. #mmgc
+14m jacarrico: Guenther: Ion torrent software has a core part and plugins that can be user-contributed. Also moving to the Cloud #mmgc
+14m eduardopareja: Simone Guenther Ion Torrent: For RNA-seq maintains RNA orientation #mmgc
+15m jacarrico: Guenther: Ion Proton can be ordered right now. Arrives in the Summer or December ( noise in the com channel...) #mmgc
+16m eduardopareja: Simone Guenther Ion Torrent: Open protocols , datasets and Source Code: community http://t.co/6l1WUfcT #mmgc
+17m jacarrico: Whole genome mapping for de-novo assembly and typing, Richard Moore; OpGen, Gaithersburg, Maryland, USA #mmgc
+18m jacarrico: Moore Whole Genome Mapping = Optical Mapping ..they're changing names #mmgc
+19m eduardopareja: Richard Moore OpGen: Whole Genome Mapping , Locates and measures distance between restriction sites in a genome #mmgc
+24m jacarrico: Moore: ~ 500 fragments in Salmonella genome #mmgc
+27m jacarrico: Moore: WGmaps can be used for Genome Comparison #mmgc
+29m eduardopareja: Richard Moore OpGen: Whole Genome Maps distinguish features not detectable by PFGE #mmgc
+33m jacarrico: Moore, using contigs and the MapSolver software you could simulate digestion and assemble a Sequence when doing de novo seq #mmgc
+41m eduardopareja: Arndt von Haeseler, Vienna University: Comparison of NGS platform error rates #mmgc
+28m eduardopareja: Alexander Goesmann, Bielefeld University Bioinformatics software tools for analyzing and comparing microbial genomes #mmgc
+53m eduardopareja: Dirk Höper, Friedrich Loeffler Institute, Bioinformatics for rapid single read based analysis of diagnostic metagenome datasets #mmgc
+4m eduardopareja: Dirk Höper, Friedrich Loeffler Institute, finding a needle in a haystack .... #mmgc
+15m eduardopareja: Dirk Höper, Friedrich Loeffler Institute, 7 reads from Orthobunyavirus found in total 27420 reads led to the diagnosis #mmgc
+16m eduardopareja: Dirk Höper, 14 sequence reads representing an Arenavirus in 103632 total reads pointed to the causative agent #mmgc
+25m raqueltobes: OPGEN: A great NGS-technology independent method to evaluate NGS assemblers #mmgc
+37m jacarrico: Microbe hunting: ultra deep sequencing to discover microbes in human tissue, Thomas Briese; Columbia University, New York, USA #mmgc
+1m eduardopareja: Thomas Briese, NGS for diagnosis: no previous knowledge of target required #mmgc
+21m jacarrico: Design of pathogen-based diagnostic assays for the German EHEC and Dutch Klebsiella OXA-48 outbreaks 2011. C.A. Cummings Life Tech #mmgc
+22m jacarrico: Cummings Bioinformatics plays a large role in the design of new diagnostics #mmgc
+24m eduardopareja: Craig Cummings, Life Technol, Design of pathogen-based diagnostic assay for the German EHEC and Dutch Klebsiella OXA-48 outbreaks 2011 #mmgc
+24m jacarrico: Cummings: False Positives are costly to industry (Ecoli O157 and meat industry example) #mmgc
+25m jacarrico: Cummings: Signature identification by genome alignment by MUMmer #mmgc
+26m jacarrico: Cummings: Draft genome assemblies can also be used to signature identification #mmgc
+27m jacarrico: Cummings: signature sequences are the difference between the intersection and union between all the test sequences #mmgc
+28m eduardopareja: Craig Cummings, Strain - specific assays often require sequencing of more strains #mmgc
+30m eduardopareja: Craig Cummings, Nucmer for strain - specific signature identification #mmgc
+34m jacarrico: Cummings: K.pneumoniae dutch outbreak: 21 persons died #mmgc
+36m jacarrico: Cummings: they used MIRA and CAP3 to do the PGM data assemblies #mmgc
+39m jacarrico: Cummings: Signatures found by comparing 12 Klebsiella genomes and 7 other Enterobacteriaceae genomes #mmgc
+41m jacarrico: Cummings: 2009 Montevideo Salmonella outbreak 1.28M pounds (weight) of Salami was recalled. 47 strains sequenced by SOLiD #mmgc
+45m jacarrico: Cummings: Good results for E. coli 50x coverage with Long mate pair and assemblies with MIRA. Newbler actually gave better results. #mmgc
+50m jacarrico: Vaccinology in the era of NGS Michele Barocchi; Novartis Vaccines and Diagnostics, Siena, Italy #mmgc
+51m TimDallman: Barocchi - NGS vaccinology (Novartis) #mmgc
+52m eduardopareja: Michele Barocchi, Novartis Vaccines and Diagnostics: Genomic reverse vaccinology #mmgc
+53m jacarrico: Barocchi: Pasteur idea: Isolate, Inactivate, Inject => vaccines #mmgc
+54m jacarrico: Barocchi: From Jenner to Pasteur 100 years passed and 2 vaccines were out... #mmgc
+56m TimDallman: Barocchi - vaccine success rate correlates to immune response. Antibodies easy e.g. Diphtheria, T-cell hard #mmgc
+57m jacarrico: Barocchi: Pneumococcus and Meningococcus have an antigen variability period on the 10 years range. Influenza 1 year. HIV 1 day. #mmgc
+57m TimDallman: Barocchi - antigen variability over time plays important role in vaccine success booster etc #mmgc
+59m jacarrico: Barocchi: For Reverse Vaccinology, the Bioinformatics step is where targets are identified, targets are then expressed and tested #mmgc
+1m TimDallman: Neisseria . 2158 ORFs - 570 surface- 350 express in E.coli - 91 confirmation of surface exposure in mouse vaccine discovery pipeline #mmgc
+2m TimDallman: Barocchi - 3 final Neisseria targets. 12 years to get to this point. #mmgc
+6m TimDallman: Barocchi - designing a vaccine on one genome difficult due to pan genome and divergence #mmgc
+7m TimDallman: Barocchi - how many genomes are needed? Streptococcus has 18% genome not shared by all members. #mmgc
+10m jacarrico: Barocchi: for GBS 3 new genes are added to the genome for each genome after 12 and core genome seems about 1800 genes. #mmgc
+11m TimDallman: Barocchi - difficult to use core genomes of a species as often not immunogenic, surface exposed #mmgc
+13m jacarrico: Barocchi : S pneumo core genome of about 1300 genes. After 100 genomes one gene in average was added to pan genome #mmgc
+14m jacarrico: Barocchi: Population Genetics is especially important for Vaccine Development #mmgc
+14m TimDallman: Barocchi - if use non-core genes for antigen have to understand pathogens population structure #mmgc
+14m eduardopareja: Michele Barocchi, Some new vaccines need to be based on non core antigens #mmgc
+15m jacarrico: Barocchi: HGT has a very important role and implications for vaccine design and need to be understood #mmgc
+18m jacarrico: Evaluating vaccine preventability by bacterial whole genome sequencing Ulrich Vogel; University Würzburg, Germany #mmgc
+20m TimDallman: Vogel - WGS evaluating vaccine preventability (Neisseria meningitidis (again)) #mmgc
+23m jacarrico: #mmgc room .Ulrich Vogel speaking http://t.co/NZhK1q9b
+30m TimDallman: Vogel suggests NGS will just be a way to extract current typing rather than use the whole genome. MLST, porA etc #mmgc
+35m jacarrico: Vogel: they used BIGSdb for the analysis of the MIRA assembled sequences. They were surprised how simple it was #mmgc
+37m TimDallman: Vogel - gold standard of surrogate of protection is serum bactericidal assay #mmgc
+38m jacarrico: Vogel: Meningococcal antigen typing system (MATS) pnas paper #mmgc
+38m TimDallman: Vogel - however SEM not high throughput new surrogate ELISA assay Donnelly et al #mmgc
+42m jacarrico: Vogel: Clonal Complex can predict the MATS with different Accuracies but lots of strains cannot be assigned to a CC. #mmgc
+45m jacarrico: Vogel: Phenotypic tests continue to be necessary due to variable expression levels, gene regulation and new variants #mmgc
+18m jacarrico: Nubel: MRSA ST22 genome international project: 193 genome sequences #mmgc
+19m jacarrico: Nubel: MRSA ST22 is a measurable evolving population #mmgc
+21m jacarrico: Nubel: MRSA STs 239, 225, 22, 5, 121 have 2-3 x 10^-6 mutations/nucleotide per year #mmgc
+26m jacarrico: Nubel: SNPs found against a reference genome, but also did de novo assembly for the detection of Mobile Genetic Elements #mmgc
+28m jacarrico: Nubel: evaluating the MRSA 225 spread in Denmark. Tracking single patients #mmgc
+29m eduardopareja: Ulrich Nübel, comparing phenotypes and genotypes: resistance profiles can be predicted by DNA resistance analysis alone #mmgc
+31m jacarrico: Nubel: 6 patients, 43 isolates, 80-something SNPs (?), 5 different STs #mmgc
+34m jacarrico: Nubel: the patient follow-up had samples covering 2-3 years (maybe more) and they had even treatment data #mmgc
+35m jacarrico: Nubel: the snp analysis could infer the direction of the transmission #mmgc
+40m CDK536: Nübel states there's one clone per patient, but it is evolving as fast as at the genus level #mmgc
+44m TimDallman: Off to catch my flight back to London see you later #mmgc
+44m jacarrico: Rossello-Mora: Species is a pragmatic unit, (it is )artificial, an artifact of the mind #mmgc
+45m eduardopareja: Ramon Rossello-Mora, Institut Mediterrani d'Estudis Avançats, Esporles, Spain: The use of NGS in the taxonomy of prokaryotes #mmgc
+47m eduardopareja: Ramon Rossello-Mora, Taxonomy is an artifact of the mind #mmgc
+48m jacarrico: Rossello-Mora: Systematics doesn't need a scientific background. Taxonomy: general purpose classification, need db #mmgc
+53m jacarrico: Rossello-Mora: Avoid species descriptions based on a single strain. Need to have standardized methods. Strains need to be deposited #mmgc
+54m eduardopareja: Ramon Rossello-Mora, A taxonomical unit requires phylogenetic, genomic and phenotypic coherence #mmgc
+5m jacarrico: Rossello-Mora: JSpecies software: biologist-oriented software for doing BLAST comparisons and classifying #mmgc
+6m eduardopareja: Ramon Rossello-Mora, DDH is not working for analyzing genomic coherence. It would be replaced by in silico sequence comparison #mmgc
+36m eduardopareja: Dag Harmsen, University of Münster, Germany: Real-time Genomic surveillance of infectious disease #mmgc
+37m eduardopareja: Dag Harmsen, In the sequencing era an error-tolerant clone definition is needed #mmgc
+44m CDK536: Maiden: Naming has to be stable and backwards compatible #mmgc
+3m CDK536: Stefan Niemann introducing the EU Patho-NGen-Trace Project, wanting to calibrate and validate NGS data for pathogen typing, esp. on TB #mmgc
+6m eduardopareja: João Carriço, Faculdade de Medicina de Lisboa, Portugal: Microbial Typing Ontology #mmgc
+8m CDK536: jacarrico: future wgs databases have to be connected to existing typing databases #mmgc
+9m CDK536: jacarrico: we need a babel fish #mmgc
+10m CDK536: jacarrico: babel fish available on http://t.co/1fHAmB5j #mmgc
+11m rjlwillems: João André Carriço: TyPon, a prototype ontology for sequence-based methods #mmgc
+11m eduardopareja: João Carriço, TyPon: an ontology to connect the whole genome sequences network with the results of other typing methods #mmgc
+14m rjlwillems: João André Carriço: TyPon provides a common language to facilitate communication between different typing databases #mmgc
+26m CDK536: Aarestrup: Genetic Epidemiology has to become user-friendly so that everyone can use it #mmgc
+29m CDK536: Aarestrup: We need fast automatic one click solutions for medical stuff #mmgc
+39m jacarrico: @CDK536 thanks! It is still a work in progress but already translates a bit ;-) We are preparing a proof of concept asap #mmgc
+40m CDK536: Aarestrup is voting for a global database that everyone takes part in and that stores data as raw as possible #mmgc
+46m eduardopareja: Marc Struelens, ECDC Stockholm: European perspective of NGS for public health #mmgc
+55m lexnederbragt: Pretty cool: my blog post http://t.co/8uJSkrFi cited in a conference presentation, + I find out through somebody tweeting @jacarrico #mmgc
+58m jacarrico: Marc Struelens (ECDC): TESSy 3.0 in 2012. BioNumerics for data analysis #mmgc
+1m rjlwillems: ECDC starts a European pilot project on Salmonella typing using PFGE! Difficult to understand after two days of talks on NG-seq? #mmgc
+2m CDK536: @jacarrico Looking forward to see this proof of concept, wondering how you will rule out human errors in false naming and so on #mmgc
+11m eduardopareja: Peter Gerner-Smidt, CDC, Atlanta, Georgia: The sunset of culture techniques and the dawning of genomics for the diagnosis... #mmgc
+13m eduardopareja: Peter Gerner-Smidt, Culture-independent tests for diagnosis of infectious #mmgc
+14m jacarrico: Gerner-Smidt: Public health is somebody else's private health (missed the original citation) #mmgc
+16m aunderwo: @jacarrico: Gerner-Smidt: public health is somebody else's private health #mmgc
+17m jacarrico: Gerner-Smidt: With direct sequencing we can save 3-37 days compared with the current PulseNet protocols #mmgc
+25m eduardopareja: Peter Gerner-Smidt, Virulence factors from different pathogens can interact and modify virulence #mmgc
+26m jacarrico: Gerner-Smidt: will metagenomics be the future for rapid diagnostics? He thinks things will evolve that way #mmgc
+38m pathogenomenick: @jacarrico @eduardopareja @TimDallman shall I archive #mmgc tweets on the blog? they are useful (wish I was there but hands are full!)
+55m sciencewr: What have We Got in Common with a Gorilla? http://t.co/oTu2uVIW #Science #Evolution #mmgc