Table 1 Bioinformatics tools useful for motif discovery. Each resource is listed with its name, web link, main reference, and a short description.

From: Experimental detection of short regulatory motifs in eukaryotic proteins: tips for good practice as well as for bad

Motif Resources/Predictors
ELM http://elm.eu.org [26]
To explore candidate functional sites in proteins and to learn about known motifs
MiniMotif Miner http://mnm.engr.uconn.edu [88]
To analyse protein queries for the presence of short contiguous peptide motifs that have a known function in at least one other protein
Scansite http://scansite3.mit.edu [89]
To identify short protein sequence motifs that are recognized by modular signalling domains, phosphorylated by protein Ser/Thr- or Tyr-kinases or mediate specific interactions with proteins or phospholipids
PePSite http://pepsite2.russelllab.org [90]
To predict binding of a given peptide to a protein structure
Motif Discovery
DILIMOT http://dilimot.russelllab.org [39]
To find short, over-represented peptide patterns (linear motifs) in a set of proteins
SLiMFinder http://bioware.ucd.ie/slimfinder.html [91]
To find novel, significantly over-represented, short protein motifs
Sequence Retrieval/Analysis
BLAST http://www.uniprot.org/blast http://blast.ncbi.nlm.nih.gov [47, 92]
To identify regions of local similarity between nucleotide or protein sequences, which can be used to infer functional and evolutionary relationships between sequences and to help identify members of gene families
BioMART http://www.biomart.org [93]
Provides free software and data services to foster scientific collaboration and facilitate the scientific discovery process; the project adheres to the open source philosophy that promotes collaboration and code reuse
Alignment
Clustal http://www.clustal.org/omega http://www.ebi.ac.uk/Tools/msa/clustalo [49, 94]
General purpose DNA or protein multiple sequence alignment program
MAFFT http://mafft.cbrc.jp/alignment/server [95]
Multiple alignment program for amino acid or nucleotide sequences
Jalview http://www.jalview.org [48]
Lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment
Phylogenetic Tree/Orthology
TreeFam http://www.treefam.org [96]
Database composed of phylogenetic trees inferred from animal genomes, providing orthology/paralogy predictions as well as the evolutionary history of genes
EggNog http://eggnog.embl.de [97]
Database of orthologous groups of genes annotated with functional categories derived from COG/KOG categories
COG http://www.ncbi.nlm.nih.gov/COG [98]
Database providing phylogenetic classification of proteins encoded in complete genomes
Motif Conservation
Conscore http://conscore.embl.de [63]
Linear motif conservation filter
Consurf http://consurf.tau.ac.il [99]
To identify functional regions in proteins
SLiMPrints http://bioware.ucd.ie/~compass/biowareweb/Server_pages/slimprints.php [41]
De novo motif discovery tool to identify relatively over-constrained proximal groupings of residues within intrinsically disordered regions, indicative of a putatively functional motif
Protein Domains
SMART http://smart.embl.de [52]
To identify and annotate genetically mobile domains and to analyse domain architectures
PFAM http://pfam.xfam.org [51]
Database providing a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models
InterPro http://www.ebi.ac.uk/interpro [53]
To classify sequences into protein families and to predict the presence of important domains and sites
Structure/Disorder
PDB http://www.rcsb.org [55]
Single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids
PDBsum http://www.ebi.ac.uk/pdbsum [100]
Pictorial database providing an at-a-glance overview of the contents of each 3D structure deposited in PDB
IUPred http://iupred.enzim.hu [54]
To predict intrinsically unstructured regions in proteins
D2P2 http://d2p2.pro [101]
Community resource, providing pre-computed disorder predictions on a large library of proteins from completely-sequenced genomes
MobiDB http://mobidb.bio.unipd.it [102]
Centralized resource for annotations of intrinsic protein disorder
DISPROT http://www.disprot.org [103]
Database providing information about proteins that lack fixed 3D structure in their putatively native states, either in their entirety or in part
Protein-Protein Interactions
BioGRID http://thebiogrid.org [104]
Online interaction repository with data compiled through comprehensive curation efforts
STRING http://string-db.org [57]
Provides known and predicted protein-protein interactions
IntAct http://www.ebi.ac.uk/intact [105]
Freely available, open source database system and analysis tools for molecular interaction data; all interactions are derived from literature curation or direct user submissions and are freely available
PiSITE http://pisite.hgc.jp [106]
Web-based database of protein interaction sites, providing information on interaction sites of a protein from multiple PDB entries
DOMINO http://mint.bio.uniroma2.it/domino [107]
Database of domain-peptide interactions
ComPPI http://ComPPI.LinkGroup.hu [108]
Cellular compartment-specific database for protein-protein interaction network analysis
iELM http://i.elm.eu.org [109]
Web server to explore short linear motif-mediated interactions
KEGG http://www.genome.jp/kegg [110]
Database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies
CORUM http://mips.gsf.de/genre/proj/corum [56]
Collection of experimentally verified mammalian protein complexes
Subcellular Localization
CELLO2GO http://cello.life.nctu.edu.tw/cello2go [59]
Web server for protein subcellular localization prediction with functional gene ontology annotation
LocDB https://www.rostlab.org/services/locDB [111]
Database that collects experimental annotations for the subcellular localization of proteins in Homo sapiens and Arabidopsis thaliana
GeneOntology http://geneontology.org/ http://www.ebi.ac.uk/QuickGO [112]
Collaborative effort to address the need for consistent descriptions of gene products across databases
Compartments http://compartments.jensenlab.org [113]
Database of protein subcellular localization data manually curated from the literature or obtained from high-throughput microscopy-based screens
LOCATE http://locate.imb.uq.edu.au [114]
Curated database providing data that describe the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set
Tissue Expression
Protein Atlas http://www.proteinatlas.org [58]
Publicly available database with millions of high-resolution images showing the spatial distribution of proteins in 44 different normal human tissues and 20 different cancer types, as well as 46 different human cell lines
TISSUES http://tissues.jensenlab.org [115]
Resource integrating evidence on tissue expression from manually curated literature, proteomics and transcriptomics screens, and automatic text mining
Generic Resources
UniProt http://www.uniprot.org [116]
Manually annotated, non-redundant protein sequence and sequence isoform database; related information about the biological function of proteins is curated from the scientific literature
Antibodypedia http://www.antibodypedia.com [117]
Open-access database of publicly available antibodies against human protein targets; contains data on antibody efficacy in a range of biochemical and cell biological techniques
IUPAC http://www.iupac.org [118]
Serves to advance the worldwide aspects of the chemical sciences and to contribute to the application of chemistry in science