Setting up the CLIMB website

02 Dec 2014

If you have Docker installed, it is now trivially easy to set up a working WordPress installation.

For example, this GitHub repository will set up WordPress using MySQL and nginx:

https://github.com/eugeneware/docker-wordpress-nginx

Forking this repository permits you to make custom configurations to the setup.

git clone https://github.com/eugeneware/docker-wordpress-nginx.git
cd docker-wordpress-nginx
sudo docker build -t="docker-climb.ac.uk" .

Then create the instance:

sudo docker run -p 192.168.1.5:9991:80 --name docker-climb.ac.uk-test -d docker-climb.ac.uk

Not much more to it than this.

It should be possible now to:

trivially make copies of the entire website, e.g. to make the live version
back up the website (content, configuration)

However, I assume this Docker instance is now fairly locked into whatever versions of MySQL and nginx etc. were used initially. To upgrade everything will require making a new Dockerfile which has all the files and changes associated with it?

Making a copy of an existing instance:

sudo docker commit -m "climb-test go live" -a "Nick Loman" 7ade97b25bc9 nick/docker-climb.ac.uk:live

sudo docker run -p 192.168.1.5:9992:80 --name docker-climb.ac.uk-live -d nick/docker-climb.ac.uk:live
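
As a rough sketch of the backup idea mentioned earlier, a committed image can be written to a tarball with docker save and restored elsewhere with docker load. The helper names (backup_name, backup_image), the DRY_RUN switch and the backup directory are all hypothetical and only for illustration. Note also that a commit does not capture data held in Docker volumes, so if the image declares any, those would need backing up separately.

```shell
#!/usr/bin/env bash
# Sketch only: archive a committed image as a dated tarball.
# IMAGE matches the tag used in the commit above; BACKUP_DIR is a made-up path.
IMAGE="nick/docker-climb.ac.uk:live"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/climb}"

# Build a tarball name from an image tag and a date stamp,
# replacing '/' and ':' (awkward in file names) with '_'.
backup_name() {
    local image="$1" stamp="$2"
    local safe
    safe="$(printf '%s' "$image" | tr '/:' '__')"
    printf '%s-%s.tar\n' "$safe" "$stamp"
}

# Save the image to a tarball in the given directory.
# With DRY_RUN=1 the docker command is printed instead of executed.
backup_image() {
    local image="$1" dir="$2"
    local out
    out="$dir/$(backup_name "$image" "$(date +%Y%m%d)")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo docker save -o $out $image"
    else
        sudo docker save -o "$out" "$image"
    fi
}

# To restore on another host (file name is whatever backup_image produced):
#   sudo docker load -i <tarball>
```

Calling backup_image "$IMAGE" "$BACKUP_DIR" would then write the tarball; running it first with DRY_RUN=1 shows the command without touching Docker.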

COST ES1103 Workshop: Shotgun Metagenomics Analysis: From Assembly to Function

30 Oct 2014

University of Birmingham, 10th - 12th December

Organisers:

Dr Nick Loman (University of Birmingham)
Dr Chris Quince (University of Warwick)

Details:

This workshop will focus on the emerging technology of shotgun metagenomics, with an emphasis on bioinformatics steps including de novo assembly, contig clustering/binning and functional analysis of the resulting dataset. The challenges of very large genome assemblies will be discussed, with reference to normalization, partitioning and use of very large memory servers. Assembly technologies considered will include Ray, Meta-IDBA and MetaVelvet. The use of CONCOCT, a novel algorithm that clusters genomes based on coverage and composition, will be demonstrated. Challenges of functional and taxonomic assignments of resulting genome bins will be discussed, with reference to KEGG, PROKKA etc.

This course would suit biologists with familiarity with bioinformatics software and the UNIX command line. Preference will be given to those users who have already generated large shotgun metagenomics datasets. There will be opportunities to analyse real participant data on provided large-memory virtual machines.

Fifteen funded places providing a stipend of £500 for accommodation and travel are available to researchers and students working in an EU member state through the COST project ES1103 (http://www.cost.eu/domains_actions/essem/Actions/ES1103). Application deadline: 1st November 2014

Non-COST funded applicants are also welcome, although these participants will have to pay their own way. Application deadline for these applicants is: 7th November 2014

Applications should be sent to Isabel Dodd (I.Dodd@warwick.ac.uk). Please specify your name, department, supervisor, bioinformatics experience, the type of data you are interested in analysing and details of any datasets you have already generated.

Should I join the MinION Access Programme?

21 Oct 2014

Oxford Nanopore have recently opened phase 2 of the MinION access programme (MAP) and I have been asked a few times by email by folk wondering whether they should sign up, so here is a little blog post that hopefully will be helpful when making your decision.

In case you are wondering what MAP is: it gives you the chance to try out the Oxford Nanopore MinION USB sequencer for a very small initial outlay, specifically $1000 per “MAP package” (one MinION sequencer and some number of flow cell reagents and sample preparation kits). The $1000 is refundable, although the delivery costs are not.

Joining was a no-brainer for us. We wanted to be one of the first to try the MinION out, we did not care if we spent a lot of time on it, some of which might be wasted, and we had the right resource in place to run it. The MAP also came along at just the right time for us, in the summer holidays. As we do not have children of school age, summer holidays are the sweet spot for starting an intense project: we are all around, but the students and many of the academic staff are not. Party time.

For other labs it may not be such a clear equation.

The first thing to note is that, in common with other early access programmes, you are helping the company debug its hardware, software and reagents. In other words, if everything worked perfectly, they would be selling you a product, not virtually giving it away.

That’s not to say you won’t generate useful data for your science, but you should certainly not expect it out of the box.

What kind of person are you? If you like things to be nice and stable and sorted out, an early access programme may not be for you. In the last six months we have had three chemistry updates (R6, R7 and R7.3), three major library construction protocol updates (SQK-MAP-001 to 003) and countless software updates.

If you like to obsessively practice things until you are an expert, this may be a frustration.

What is your level of bioinformatics savvy? Right now there is very little software provided for handling nanopore data, which is characterised by a high error rate (improving, but still not much better than 85% read accuracy on average). The community is feeling its way towards solutions for certain tasks (e.g. LAST for alignment to a reference, and SPAdes for scaffolding and repeat resolution along with Illumina data), but right now there is no definitive variant calling pipeline, no nanopore-only assembly pipeline and no de novo amplicon correction pipeline.

What do you want to do with your data?

If you are familiar with PacBio, you might be tempted to buy into the MAP as a way of getting easier access to a long-read instrument, perhaps for bacterial genome assembly. You can use it really quite successfully to scaffold genomes, currently in conjunction with Illumina data. Right now there is no nanopore-only de novo assembler equivalent to HGAP. Hopefully someone will build one soon.

You might be interested in using it for isoform detection from RNA-Seq data. This could be a good use that plays to the long reads generated by the system and does not rely on single-read accuracy.

If you want a drop-in replacement for a MiSeq or a HiSeq this is definitely not for you. If you want to sequence a whole human genome, think again for now.

So who should join the MAP? Right now I think if you want to mess around and be one of the first to touch a completely new sequencing paradigm, to start thinking of experiments that would benefit from long reads and real-time sequencing, then it might be for you. Microbial sequencing would seem to be a sweet spot.

I think it works best for small teams. In our group we have one person (Josh) who can literally do everything, from the sample preparation all the way to the bioinformatics analysis. I would like to think even I could build a library if I needed to, although this might not be desirable for others in the lab due to my general lumbering ineptitude with a pipette. But I can do everything after that point including loading the instrument.

Or to put it another way, if you were to raise the money to buy a PacBio or a HiSeq, you’d probably want some staff to go with it. The barrier to entry for nanopore is significantly (ridiculously!) lower in terms of financial outlay, but you still need to be able to resource producing libraries and running the instrument.

If you would take pleasure in messing around, potentially generating some exciting data, and can handle a bit of frustration, I would say go right ahead and happy MAPping!

Advert: WG2 hackathon: Extracting strain level variation from shotgun metagenome data

06 Oct 2014

Location: Isaac Newton Institute, Cambridge, 7th-11th November 2014

Organisers: Dr Christopher Quince - University of Warwick (c.quince@warwick.ac.uk) and Dr Nick Loman - University of Birmingham (n.j.loman@bham.ac.uk)

Special Guest: Dr Jared Simpson - Ontario Institute for Cancer Research, co-author of ABySS and the SGA assembler.

Objectives: The objective of the workshop is to build on the success of the earlier COST ES1103 funded hackathon in Lisbon that developed the CONCOCT algorithm for contig clustering (http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3103.html). CONCOCT uses co-occurrence information to cluster contigs into genome bins in an unsupervised fashion. In this follow-up hackathon we will explore three further avenues of research:

Extension to strain-level variation: CONCOCT is very successful at extracting species level bins but it does not fully resolve individual strains. We will develop algorithms to extract strains possibly by incorporating co-occurrence information directly into assemblers or through machine-learning based decomposition techniques.
Long-reads: We will explore the role of long-read data in strain-aware metagenomics assembly by analysis of data produced by new technologies such as Oxford Nanopore, Illumina Synthetic Long Reads (Moleculo) and Pacific Biosciences.
Integration into metagenomics analysis pipeline: CONCOCT could form the first step in an unsupervised contig-based metagenome analysis pipeline. We will integrate CONCOCT into a complete pipeline for phylogeny, annotation, metabolic reconstruction and visualisation of metagenome bins.

This is a joint programme with the MRC funded Cloud Infrastructure for Microbial Bioinformatics (CLIMB) project and computational resources from the CLIMB computing infrastructure will be available to participants, including access to super high memory virtual machines (1.5-3TB).

Participants: This will be a computationally intensive workshop. We are therefore only requesting participants with bioinformatics software development expertise and/or knowledge of statistics and machine learning. Principally we will be coding in Python, Javascript and C or C++. Later in the year (Birmingham - December 10-12th) a workshop for training in metagenome analysis will be run - this will be suitable for biologists.

To participate, please e-mail Chris Quince (c.quince@warwick.ac.uk) before the 13th of October. Around eight COST-funded individuals will be selected from those who express an interest.

Please note that a training element in these areas is being organised in December; details will be sent around in due course.

Where can I get Oxford Nanopore MinION(tm) data from?

01 Oct 2014

We have released Oxford Nanopore MinION(tm) data for E. coli K-12 MG1655 with two chemistries (R7 and R7.3).

Preprint describing them, see Fig 2 for a guide to read types and associated error rates: http://biorxiv.org/content/early/2014/09/26/009613

Direct downloads via GigaDB: http://gigadb.org/dataset/100102

Or via ENA: http://www.ebi.ac.uk/ena/data/view/PRJEB7385

FASTA direct links:

R7 2D: http://pathogenomics.bham.ac.uk/filedist/nanopore/Ecoli_R7_2D.fasta

R7.3 2D: http://pathogenomics.bham.ac.uk/filedist/nanopore/Ecoli_R73_2D.fasta

R7.3 workflow 1.9 passing 2D: http://pathogenomics.bham.ac.uk/filedist/nanopore/FC20.wf1.9.2D.pass.fasta

To extract data from the FAST5 files you will want poretools:

Source code: https://github.com/arq5x/poretools

Paper: http://bioinformatics.oxfordjournals.org/content/early/2014/09/15/bioinformatics.btu555

Documentation: http://poretools.readthedocs.org/en/latest/
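
For a quick sanity check after downloading one of the FASTA files linked above, something like the following sketch may be useful. The function names (fetch_fasta, count_reads, total_bases) are hypothetical helpers, not part of poretools or any other tool, and the sketch assumes curl, grep, tr and wc are available.

```shell
#!/usr/bin/env bash
# Sketch only: fetch a FASTA file and report simple summary numbers.

# Download a URL into the current directory unless a non-empty copy
# already exists, then print the local file name.
fetch_fasta() {
    local url="$1"
    local file="${url##*/}"
    [ -s "$file" ] || curl -fsSL -o "$file" "$url"
    echo "$file"
}

# Count reads: every FASTA record starts with a '>' header line.
count_reads() {
    grep -c '^>' "$1"
}

# Count bases: drop header lines and line endings, then count characters.
total_bases() {
    grep -v '^>' "$1" | tr -d '\n\r' | wc -c | tr -d ' '
}

# Example use (performs a real download):
#   f="$(fetch_fasta http://pathogenomics.bham.ac.uk/filedist/nanopore/Ecoli_R73_2D.fasta)"
#   echo "$(count_reads "$f") reads, $(total_bases "$f") bases"
```

Dividing the base count by the read count gives a crude mean read length, which is a reasonable first check that the download is complete and looks like long-read data.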
