Bioinformatics Online Resources

Contents

Nucleotide Sequence Databases (the principal ones)

  • NCBI – National Center for Biotechnology Information
  • EBI – European Bioinformatics Institute
  • DDBJ – DNA Data Bank of Japan

Protein Sequence Databases

  • SWISS-PROT & TrEMBL – curated protein sequence database and its computer-annotated supplement
  • UniProt – Universal Protein Resource: a comprehensive catalog of information on proteins and a central repository of protein sequence and function, created by joining the information contained in Swiss-Prot, TrEMBL, and PIR
  • PIR – Protein Information Resource
  • MIPS – Munich Information centre for Protein Sequences
  • HUPO – HUman Proteome Organization
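
Sequences from these databases are usually downloaded in FASTA format: a '>' header line followed by one or more (often wrapped) sequence lines. Below is a minimal Python sketch of a reader. The header splitter assumes a UniProtKB-style header of the form 'db|accession|entry_name description', and the record in the example is made up.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequence lines may be wrapped."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # join wrapped sequence lines
    if header is not None:
        yield header, "".join(chunks)


def split_uniprot_header(header: str) -> Dict[str, str]:
    """Split an assumed 'db|accession|entry_name description' header."""
    identifier, _, description = header.partition(" ")
    db, accession, entry_name = identifier.split("|", 2)
    return {"db": db, "accession": accession,
            "entry_name": entry_name, "description": description}
```

Because read_fasta accepts any iterable of lines, an open file handle can be passed to it directly.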

Database Searching by Sequence Similarity

Sequence Alignment

  • USC Sequence Alignment Server – align two sequences using a range of dynamic programming variants
  • T-COFFEE – multiple sequence alignment
  • ClustalW @ EBI – multiple sequence alignment
  • MSA 2.1 – optimal multiple sequence alignment using the Carrillo-Lipman method
  • BOXSHADE – pretty printing and shading of multiple alignments
  • Splign – computes cDNA-to-genomic (spliced) sequence alignments; at its core is a global alignment algorithm that explicitly accounts for introns and splice signals
  • Spidey – an mRNA-to-genomic alignment program
  • SIM4 – a program to align cDNA and genomic DNA (my personal favorite!)
  • Wise2 – align a protein or profile HMM against genomic sequence to predict a gene structure, and related tools
  • PipMaker – computes alignments of similar regions in two (long) DNA sequences (Yet another of my favorites!)
  • VISTA – align + detect conserved regions in long genomic sequences
  • myGodzilla – align a sequence to its ortholog in the human genome
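
Most of the pairwise aligners above are built on dynamic programming. As an illustration only, here is a sketch of the textbook Needleman-Wunsch global alignment with a linear gap penalty. The scores (match +1, mismatch -1, gap -2) are arbitrary assumptions. Real tools use substitution matrices and affine gaps, and the spliced aligners add intron models.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment, linear gap penalty; returns (score, aligned_a, aligned_b)."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                          F[i - 1][j] + gap,                          # gap in b
                          F[i][j - 1] + gap)                          # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, needleman_wunsch("ACGT", "AGT") returns (1, "ACGT", "A-GT"). When several alignments tie, the traceback order (diagonal, then up, then left) picks one of them.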

Human Genome Databases

Databases of other Organisms

Genome-wide Analysis

  • MBGD – comparative analysis of completely sequenced microbial genomes
  • COGs – phylogenetic classification of orthologous proteins from complete genomes
  • STRING – detect whether a given query gene occurs repeatedly with certain other genes in potential operons
  • Pedant – automatic whole genome annotation
  • GeneCensus – various whole genome comparisons
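
Orthology resources such as COGs start from best hits between genomes. Roughly speaking, the COG procedure groups proteins whose genome-specific best hits form consistent triangles across several genomes. The simpler two-genome idea it builds on, reciprocal best hits, can be sketched as follows. The input is assumed to be precomputed similarity scores (for example, from BLAST); the dictionary layout is a hypothetical choice made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: query gene -> {subject gene: similarity score}
Hits = Dict[str, Dict[str, float]]


def best_hit(hits: Hits) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject (first one wins on ties)."""
    return {q: max(subjects, key=subjects.get) for q, subjects in hits.items() if subjects}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Return sorted gene pairs (a, b) in which each gene is the other's best hit."""
    best_ab, best_ba = best_hit(a_vs_b), best_hit(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```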

Protein Domains: Databases and Search Tools

  • InterPro – integration of Pfam, PRINTS, PROSITE, SWISS-PROT + TrEMBL
  • PROSITE – database of protein families and domains
  • Pfam – alignments and hidden Markov models covering many common protein domains
  • SMART – analysis of domains in proteins
  • ProDom – protein domain database
  • PRINTS Database – groups of conserved motifs used to characterise protein families
  • Blocks – multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins
  • Protein Domain Profile Analysis @ BMERC – search a library of profiles with a protein sequence
  • TIGRFAMs – yet more protein families based on Hidden Markov Models
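
Many PROSITE entries are patterns written in a compact syntax: x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, and '<' or '>' for N- or C-terminal anchors. Below is a rough sketch that converts such patterns to Python regular expressions. It ignores rarer forms such as '>' inside brackets, so ScanProsite itself is the tool to use for real searches.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as 'C-x(2,4)-C' into a Python regex.

    Handles x, [..], {..}, e(n), e(n,m) and leading '<' / trailing '>' anchors.
    Not handled: '>' or '<' inside brackets.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        n_term, c_term = element.startswith("<"), element.endswith(">")
        element = element.strip("<>")
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        body, low, high = m.groups()
        if body == "x":
            token = "."                          # any residue
        elif body.startswith("{"):
            token = "[^" + body[1:-1] + "]"      # excluded residues
        else:
            token = body                         # single residue or [..] set
        if low:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + token + ("$" if c_term else ""))
    return "".join(parts)
```

For example, a C2H2 zinc-finger-style pattern, "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H", becomes "C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H", which can then be passed to re.search or re.finditer.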

Motif and Pattern Search in Sequences

  • Gibbs Motif Sampler – identification of conserved motifs in DNA or protein sequences
  • AlignACE Homepage – gene regulatory motif finding
  • MEME – motif discovery and search in protein and DNA sequences
  • SAM – tools for creating and using Hidden Markov Models
  • Pratt – discover patterns in unaligned protein sequences
  • Motivated Proteins – a web facility for exploring small hydrogen-bonded motifs

Protein 3D Structure

Phylogeny & Taxonomy

Gene Prediction

Gene Expression Databases

Gene Regulation

  • TRAFAC – identify conserved and shared cis-regulatory elements between a pair of genes
  • CisMols – identify conserved and shared cis-regulatory elements among a set of co-expressed genes
  • TRANSFAC – database of eukaryotic cis-acting regulatory DNA elements and trans-acting factors
  • EPD – eukaryotic promoter database
  • DBTSS – DataBase of Transcriptional Start Sites (human)
  • SCPD – Saccharomyces cerevisiae promoter database
  • DCPD – Drosophila Core Promoter Database
  • RegulonDB – a database on transcriptional regulation in E. coli
  • DPInteract – protein binding sites on E. coli DNA
  • PromoterInspector – prediction of promoter regions in mammalian genomic sequences
  • MatInspector – search for transcription factor binding sites
  • Cister – cis-element cluster finder
  • Gene regulatory Tools
  • microRNA.org – microRNA targets and expression profiles
  • miRBase
  • TarBase – search a comprehensive set of experimentally supported microRNA targets in at least 8 organisms
  • microRNA resource – a gateway to all types of information about microRNAs, including articles, products, news, events, and other websites
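
Binding-site search tools of the MatInspector kind score sequence windows against weight matrices derived from known sites. Here is a simplified sketch using a plain log2-odds position weight matrix with a uniform background and a pseudocount; both are assumptions. MatInspector and similar tools use their own normalized similarity scores and matrix libraries, and this version scans only the strand it is given.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 0.5,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds position weight matrix from aligned, equal-length binding sites."""
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("need at least one site, all of the same length")
    total = len(sites) + len(BASES) * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i].upper() for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]], threshold: float) -> List[Tuple[int, float]]:
    """Score every window on the given strand; return (start, score) at or above threshold."""
    width, seq = len(pwm), seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

To search the other strand, scan the reverse complement and convert each start position back to forward-strand coordinates.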

Metabolic, Gene Regulatory & Signal Transduction Network Databases

  • KEGG – Kyoto Encyclopedia of Genes and Genomes
  • BioCarta
  • DAVID – Database for Annotation, Visualization and Integrated Discovery; a useful server for annotating microarray and other genetic data
  • STKE – Signal Transduction Knowledge Environment
  • BIND – Biomolecular Interaction Network Database
  • EcoCyc
  • WIT
  • PathGuide – a very useful collection of resources dealing primarily with pathways
  • SPAD – Signaling Pathway Database
  • CSNDB – Cell Signalling Networks Database
  • PathDB
  • Transpath
  • DIP – Database of Interacting Proteins
  • PFBP – Protein Function and Biochemical Networks
  • Alliance for Cellular Signalling

Systems Biology

Other Databases (Annotations, Ontologies, Consortia, etc.)

Miscellaneous Tools

  • NCBI Genome Workbench – an integrated application for viewing and analyzing sequence data; it lets you view data from publicly available sequence databases at NCBI and combine it with your own private data
  • RepeatMasker – mask repetitive elements in DNA sequences
  • Tandem Repeats Finder
  • Vienna RNA Package – RNA secondary structure prediction
  • mfold (1) – RNA secondary structure prediction
  • mfold (2) – RNA secondary structure prediction
  • EST parser – find alternative polyadenylation sites in mRNAs, using ESTs
  • UTR-extender – extends missing ends of an mRNA using EST and genome sequence data
  • CpG Islands – predict CpG islands
  • NetStart – prediction of translation start sites in vertebrate and A. thaliana sequences
  • ATGpr – prediction of translation start sites in cDNA sequences
  • SignalP – secretory signal peptide prediction
  • PSORT – prediction of protein sorting signals and transmembrane helices
  • CBS Prediction Servers – prediction of protein subcellular localization and various sites in protein and nucleotide sequences
  • Compute pI/Mw Tool
  • Translate Tool
  • Reverse complement nucleotide sequences
  • Melting – calculate melting temperature for nucleic acid duplexes
  • bend.it – calculate curvature and bendability of a DNA sequence
  • webcutter – detect restriction enzyme cutting sites in DNA sequences
  • Primer3 – pick primers from a DNA sequence
  • Probability Distribution Calculators – normal, chi-square, t, F, etc.
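
Several of the small utilities above (translation, reverse complement, CpG island prediction, melting temperature) rest on simple calculations over the sequence. The Python sketch below shows the core of each. It uses the standard genetic code only. The CpG island thresholds (at least 200 bp, GC above 50%, CpG observed/expected above 0.6) are a common rule of thumb and not necessarily what any server listed here applies. The Wallace rule is only a crude Tm estimate for short oligonucleotides.

```python
from itertools import product
from typing import Tuple

# Complement map; characters outside ACGTN (e.g. other IUPAC codes) pass through unchanged
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
# Standard genetic code: amino acids for codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate reading frame 1 with the standard code; '*' = stop, 'X' = unknown codon."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, CpG observed/expected ratio = CpG * N / (C * G))."""
    seq = seq.upper()
    n, c, g = len(seq), seq.count("C"), seq.count("G")
    gc = (c + g) / n if n else 0.0
    obs_exp = seq.count("CG") * n / (c * g) if c and g else 0.0
    return gc, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Rule-of-thumb test on a single window; the default thresholds are assumptions."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp


def wallace_tm(oligo: str) -> int:
    """Wallace rule, Tm = 2(A+T) + 4(G+C) in deg C; crude, for short oligos only."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))
```

In practice, looks_like_cpg_island would be applied to a sliding window along a genomic sequence, and Primer3 or Melting would be used for realistic primer Tm estimates.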

Computational Resources

Bioinformatics on-line course materials and tutorials (not an exhaustive collection)

Intro to bioinformatics and computational biology:

Algorithms:

Miscellaneous:

Web Sites for Background Information & News

Other Collections of Bioinformatics Resources
