Basics of bioinformatics

Bioinformatics learning resources

1. Sequence retrieval and manipulation

A search for a sequence of interest begins with a keyword, accession number, gene name, species name, or similar term. In addition to retrieving sequences, the Entrez search engine at NCBI returns pre-computed lists of related data elements such as related sequences, genes, proteins, and taxonomy. The search can be run across all databases or restricted to a single one, such as Nucleotide, through the drop-down menu. Results can be displayed in different formats or downloaded, the most common format being FASTA.
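
As a small illustration of the FASTA format mentioned above, the following Python sketch reads FASTA text into a dictionary. The function name parse_fasta and the two example records are made up for illustration; they are not taken from NCBI or from any particular library.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first word after ">" as the record ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else "record_%d" % (len(records) + 1)
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: a record may span several lines.
            records[current_id] += line.upper()
    return records


# Hypothetical example records (not real database entries).
example = """>seq1 hypothetical example record
ATGCGTAC
GTTAGC
>seq2 another made-up record
ttgacc
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq), seq)
```

The same function can be applied to a downloaded file by passing an open file handle instead of example.splitlines().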

2. Sequence alignment

Sequence alignment is a prerequisite for virtually all forms of sequence analysis, ranging from similarity search to assembly and phylogenetics. Various algorithms have been developed to produce optimal alignments, a topic beyond the scope of this paper. Two widely used, freely available software packages, BioEdit (Hall, 1999) and MEGA (Tamura et al. 2007), can be downloaded and installed easily and come with easy-to-understand user manuals. A pair of sequences or multiple sequences, saved for example in FASTA format, can be used as input. Sequence alignment can also be done on the web at one of the resources listed in this review (e.g. EMBL-EBI) using the ClustalW program or other methods.
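
To give a feel for what an alignment algorithm does, here is a minimal Python sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch approach). It is not the method used by BioEdit, MEGA, or ClustalW; the scoring values (match, mismatch, gap) and the function name are arbitrary assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning a[i-1] against b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, total = needleman_wunsch("ACGT", "AGT")
print(aligned_a)
print(aligned_b)
print("score:", total)
```

Real alignment programs use substitution matrices and more refined gap penalties, so this sketch is only meant to show the principle.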

3. Phylogenetics

Phylogenetic analysis is the basis of taxonomic and evolutionary studies. In the context of this paper, phylogenetic analysis is performed to cluster multiple sequences based on genetic distances. This is a broad topic and the subject of hundreds of articles and books. A large number of tools and web services can also be found online (e.g. http://evolution.genetics.washington.edu/phylip/software.html). For beginners, stand-alone programs such as MEGA do an excellent job of phylogenetic tree construction. In addition, web services such as EMBL-EBI provide similar tools.
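
The genetic distances mentioned above can be illustrated with a short Python sketch. It assumes the sequences are already aligned to equal length, computes the proportion of differing sites (p-distance), and applies the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3). The three sequences and their names are invented for illustration; programs such as MEGA offer many more distance models.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore sites with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a nucleotide p-distance."""
    if p >= 0.75:
        return math.inf  # correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned sequences (made up for this example).
alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTACGTTC",
    "taxonC": "ACGAACGTTG",
}

distances = {}
for (name1, seq1), (name2, seq2) in combinations(alignment.items(), 2):
    p = p_distance(seq1, seq2)
    distances[(name1, name2)] = jukes_cantor(p)
    print(name1, name2, "p = %.2f" % p, "JC = %.4f" % distances[(name1, name2)])
```

A pairwise distance table of this kind is the usual input to distance-based tree-building methods such as neighbor joining.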

4. Similarity search

Sequence comparison is essential for understanding evolutionary relationships between genes. The most common and widely used similarity search tool is BLAST (Basic Local Alignment Search Tool; Ye et al. 2006). BLAST is a set of programs used to compare a nucleotide or protein query sequence against the available sequence databases. NCBI and EBI provide many different types of BLAST. Information on how to access BLAST services on the web, how to choose the right type of BLAST, how to interpret BLAST results, how to run batch BLAST jobs, and more can be found at the NCBI BLAST home page (http://blast.ncbi.nlm.nih.gov/Blast.cgi).
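
When many BLAST hits have to be interpreted, results saved in tabular form can be filtered with a few lines of code. The Python sketch below assumes the standard 12-column tab-separated layout (query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E-value, bit score). The hits shown and the thresholds are invented for illustration and should be adapted to the task at hand.

```python
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_hits(lines):
    """Parse tab-separated BLAST hits into a list of dictionaries."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < len(COLUMNS):
            continue  # skip malformed rows
        hit = dict(zip(COLUMNS, fields))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def filter_hits(hits, max_evalue=1e-5, min_identity=90.0):
    """Keep hits at or below the E-value cutoff and at or above the identity cutoff."""
    return [h for h in hits
            if h["evalue"] <= max_evalue and h["pident"] >= min_identity]


# Hypothetical hits (made up for this example, not real BLAST output).
rows = [
    ("query1", "subjA", "98.5", "200", "3", "0", "1", "200", "11", "210", "1e-100", "360"),
    ("query1", "subjB", "85.0", "150", "20", "2", "5", "154", "30", "179", "2e-30", "120"),
    ("query2", "subjC", "99.0", "30", "0", "0", "1", "30", "400", "429", "0.5", "40"),
]
hits = parse_hits("\t".join(row) for row in rows)
kept = filter_hits(hits)
for h in kept:
    print(h["qseqid"], h["sseqid"], h["pident"], h["evalue"])
```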

5. Primer design

Primer design is required in several marker development applications. Examples include, but are not limited to, retrieved sequences containing simple sequence repeats (SSRs) suitable for SSR marker development. Orphan crops often lack sequence information; in such cases, comparative genomics approaches that draw on homologous sequences from other species are used to design degenerate primers or to re-sequence the gene of interest. The most widely used program for primer design is Primer3 (http://frodo.wi.mit.edu/primer3/), which is available through several versions of web interface. The web site provides a user-friendly interface and a user manual describing the underlying principles of the program.
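
Two quick checks commonly applied to a candidate primer are its GC content and a rough melting temperature. The Python sketch below uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is only a rule of thumb for short oligonucleotides; dedicated primer design programs apply more detailed models. The primer sequence and function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement, e.g. to derive a reverse primer from the target strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Hypothetical 20-base primer (made up for this example).
primer = "ATGCGTACGTTAGCCTAGGA"
print("GC content (%):", gc_content(primer))
print("Wallace Tm (deg C):", wallace_tm(primer))
print("Reverse complement:", reverse_complement(primer))
```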

6. Advanced Skills

The major areas of high-end bioinformatics include the development of databases and of algorithms for multiple sequence alignment, and the analysis and annotation of data from various microarray platforms, high-density oligonucleotide chips, a variety of mass spectrometry methods, and diverse next-generation sequencing platforms. Computer-savvy researchers who aspire to become bioinformatics tool developers should consider learning a scripting language such as Perl (community web site: http://www.cpan.org/). Some genomics tasks are daunting when done by hand: discovering SNPs or SSRs in thousands of sequences, filtering sequences for a target motif and designing assay reagents (e.g. primers), filtering BLAST results, and annotating thousands of EST sequences. Programming skills allow such large-scale and complex jobs to be automated.
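
As an example of such automation, the sketch below scans sequences for simple sequence repeats with a regular expression. It is written in Python rather than Perl, but the same idea carries over to any scripting language. The unit lengths, the minimum repeat count, and the example reads are assumptions chosen for illustration, not recommended settings.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. not 'AA' or 'ATAT')."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(seq, min_unit=2, max_unit=4, min_repeats=4):
    """Return sorted (start, motif, repeat_count) tuples; start is 0-based."""
    seq = seq.upper()
    found = []
    for unit in range(min_unit, max_unit + 1):
        # A motif of `unit` bases followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # skip runs better described by a shorter motif
            found.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(found)


# Hypothetical reads (made up for this example).
reads = {
    "read1": "GGATATATATATCC",
    "read2": "TTCAGCAGCAGCAGCAGTT",
    "read3": "ACGTTGCA",
}
for name, seq in reads.items():
    print(name, find_ssrs(seq))
```

Looping over thousands of sequences read from a FASTA file turns this into the kind of batch job described above.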

Online learning resources

Suggested online resources for self-paced tutorials and other skill-building opportunities

NCBI Training and Tutorials: http://www.ncbi.nlm.nih.gov/guide/training-tutorials/
2Can Support, a free bioinformatics educational resource at EBI: http://www.ebi.ac.uk/2can/home.html
Ensembl tutorials and other help documentation: http://www.ensembl.org/info/website/tutorials/index.html
Open access literature, books, manuals, lecture notes, and slides (via Google Scholar, Scirus.com, lab pages, Wikipedia, and university web sites)
Train online with EMBL-EBI: http://www.ebi.ac.uk/training/online

Most researchers who currently use bioinformatics in their day-to-day work are self-taught. Numerous free bioinformatics educational resources are available.

African scientists can tap into advances in communication technology for cyber-learning, capitalizing on widely available ICT devices such as computers, mobile phones, and other wireless devices to access freely available educational resources.

Key References

Armstead et al. (2009) “Bioinformatics in the orphan crops”, Brief Bioinform, 10:645-653.
Cochrane, G.R., and M.Y. Galperin. (2010) “The 2010 Nucleic Acids Research Database Issue and online Database Collection: a community of data resources”, Nucl. Acids Res., 38:D1-D4.
Gedil, M. (2010) “Tailoring Bioinformatics for the Genetic Improvement of Orphan Crops”, Afri Tech Dev Forum, 6:34-43.
Hall, T.A. (1999) “BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT”, Nucl. Acids. Symp. Ser., 41:95-98.
Larrinua, I.M., and S.B. Belmer. (2009) “Bioinformatics and its relevance to Weed Science”, Weed Science, 56:297-305.
Rhee, S.Y., J. Dickson, and D. Xu. (2006) “Bioinformatics and its applications in plant biology”, Annu. Rev. Plant Biol., 57:335-360.
Tamura, K., J. Dudley, M. Nei, and S. Kumar. (2007) “MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0”, Mol. Biol. Evol., 24:1596-1599.
Ye, J., S. McGinnis, and T.L. Madden. (2006) “BLAST: improvements for better sequence analysis”, Nucleic Acids Res, 34:W6-W9.