I want to learn bioinformatics! A guide for complete beginners.
18 Jul 2013
I asked a question on Twitter yesterday:
What is the correct way to handle a request to “help me learn bioinformatics” from a non-computer-literate person?
This is a frequent request I encounter, and although I have various stock answers, I was curious to find out what you guys would say. Further, I wanted a resource with jumping off links, which I hope this blog post can serve as.
And, as many times before, I was blown away by how useful Twitter can be: such a seemingly innocuous question generated a real diversity and quality of opinion and information. I know I am lucky in having >3,500 eager followers hanging on my every question, but still ;)
Luis Pedro Coelho initially took issue with the question and said it is important to define what a user really wants, as "learning bioinformatics" "is not a goal per se". Is the user driven by intellectual curiosity, do they want to learn to code, or do they simply want to develop enough skills to "analyse my RNA-Seq data"? Russell Neches takes the extreme view, which is that bioinformatics "is a fiction. Biologists just use computers for certain things". Manoj Semanta likes to stratify bioinformatics into five layers of increasing complexity, with using web interfaces and running command-line programmes being the easiest and developing new algorithms being the hardest.
Some suggested learning R or Python and pointed to helpful online tutorials, such as http://learnpythonthehardway.org/ [Phil Ashton]. Another useful resource was the excellent Software Carpentry website which is aimed specifically at scientists wishing to learn best practices, for example the use of version control and Makefiles for reproducible research (software-carpentry.org) [suggested by Rob Davey, Deanna Church]. Software Carpentry plan to run more bootcamps for complete novices in the coming year, so keep an eye on their website.
Casey Bergman suggested using Galaxy, a web-based bioinformatics engine particularly heavily used for NGS/genomics analysis (www.galaxy-project.org). Rob Davey qualified this advice, however, pointing out that "using a workflow UI and being a bioinformatician may not be the same thing". Chris Cole agrees: "Galaxy != bioinformatics. It's a great and powerful system, tho."
The always opinionated Mick Watson suggested more of a 'sink or swim' approach, specifically to go away and install Ubuntu on a PC or laptop, because "to "learn bioinformatics" you need commitment and time and effort". Mick also pointed to some online resources hosted at ARK Genomics which expand on this idea of bioinformatics being tough, but worth investing time in learning (http://www.ark-genomics.org/events-online-training/eu-training-course).
The theme of "learning by doing" is probably the one that I suggest most to people, and it was also suggested by Aylwyn Scally. I tell people that learning programming through reading a book and doing simple exercises is demotivating if you don't have a problem in mind. So pick a problem that you think can be solved with scripting or bioinformatics tools, perhaps a biological question, and attempt to solve it "by all means necessary". Being driven by a goal will help you stay motivated. Mario Caccamo says "Learning by doing is not ideal but that's the reality".
Some suggested attempting some simple tasks. One suggestion was to take data from a laboratory evolution paper (e.g. http://www.ncbi.nlm.nih.gov/pubmed/21940899) and try to reproduce it, in this case by detecting a small set of mutations [@contaminatedscience]. Another was to use the intriguing ROSALIND platform to attempt bioinformatics problem solving (http://rosalind.info/problems/locations/) [Robert Lanfae, Adam Kiezun].
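To give a flavour of what a first "find the mutations" exercise might look like, here is a minimal Python sketch. It is only a toy: it assumes you already have two aligned sequences of equal length, whereas a real re-analysis of a laboratory evolution dataset starts from sequencing reads and needs read mapping and variant calling. The function name `point_differences` and the example sequences are made up for illustration and are not taken from the paper or from ROSALIND.

```python
def point_differences(reference, sample):
    """List (position, ref_base, sample_base) where two aligned sequences differ.

    Positions are 1-based. Both sequences are assumed to be already aligned
    and of the same length (a simplifying assumption for this toy example).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(
            zip(reference, sample), start=1
        )
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    # Made-up sequences, purely for illustration.
    ancestor = "ATGGCGTACGTTAGC"
    evolved = "ATGGCGTACATTAGC"
    for position, ref_base, sample_base in point_differences(ancestor, evolved):
        print(f"position {position}: {ref_base} -> {sample_base}")
```

Even a small script like this gives you something concrete to extend, for example by reading the sequences from a file instead of typing them in.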
Bastien Chevreux suggested the popular "Dummies Guide …" series, including Bioinformatics for Dummies (http://www.dummies.com/store/product/Bioinformatics-For-Dummies-2nd-Edition.productCd-0470089857.html), although C. Titus Brown takes issue with the name of these books, suggesting that "the culture of "I'm too stupid" inhibits learning". Good point. They look rubbish on your bookshelf too.
Pete @drosophilic suggested enrolling in local training courses; a list is maintained by Stephen Turner at http://stephenturner.us/p/edu. I also note the newly launched iAnn Events platform (http://iann.pro/iannviewer).
Another useful resource is BioStars; see the thread "Advice for newcomers to the bioinformatics field" [Pierre Lindenbaum].
C. Titus Brown has a workshop with online course materials for next-generation sequence analysis.
Alan McNally suggested it is possible for a newbie to learn bioinformatics successfully, citing himself as a case study: "I was the requester 4 years ago. Was told to switch to linux and start reading user guides. Haven't looked back".
And finally, Aylwyn Scally remarks that "the first thing I tell them is close MS Excel".
Thanks to all those who took part in the discussion!
Do you have anything to add?
Update: 31st July 2013 - added some links to the Homolog.us blog.