Table 1. Bioinformatics tools useful for motif discovery. Each resource is listed with its name, weblink, main reference, and a short description.

From: Experimental detection of short regulatory motifs in eukaryotic proteins: tips for good practice as well as for bad

Motif Resources/Predictors

ELM

http://elm.eu.org

[26]

To explore candidate functional sites in proteins and to learn about known motifs

MiniMotif Miner

http://mnm.engr.uconn.edu

[88]

To analyse protein queries for the presence of short contiguous peptide motifs that have a known function in at least one other protein

Scansite

http://scansite3.mit.edu

[89]

To identify short protein sequence motifs that are recognized by modular signalling domains, phosphorylated by protein Ser/Thr- or Tyr-kinases or mediate specific interactions with proteins or phospholipids

PePSite

http://pepsite2.russelllab.org

[90]

To predict binding of a given peptide to a protein structure

Motif Discovery

DILIMOT

http://dilimot.russelllab.org

[39]

To find short, over-represented peptide patterns/linear motifs, in a set of proteins

SLiMFinder

http://bioware.ucd.ie/slimfinder.html

[91]

To find novel, significantly over-represented, short protein motifs

Sequence Retrieval/Analysis

BLAST

http://www.uniprot.org/blast http://blast.ncbi.nlm.nih.gov

[47, 92]

To identify regions of local similarity between nucleotide or protein sequences, which can be used to infer functional and evolutionary relationships between sequences and to help identify members of gene families

BioMART

http://www.biomart.org

[93]

Provides free software and data services to foster scientific collaboration and facilitate the scientific discovery process; the project adheres to the open source philosophy that promotes collaboration and code reuse

Alignment

Clustal

http://www.clustal.org/omega http://www.ebi.ac.uk/Tools/msa/clustalo

[49, 94]

General purpose DNA or protein multiple sequence alignment program

MAFFT

http://mafft.cbrc.jp/alignment/server

[95]

Multiple alignment program for amino acid or nucleotide sequences

Jalview

http://www.jalview.org

[48]

Lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment

Phylogenetic Tree/Orthology

TreeFam

http://www.treefam.org

[96]

Database composed of phylogenetic trees inferred from animal genomes, providing orthology/paralogy predictions as well as the evolutionary history of genes

eggNOG

http://eggnog.embl.de

[97]

Database of orthologous groups of genes annotated with functional categories derived from COG/KOG categories

COG

http://www.ncbi.nlm.nih.gov/COG

[98]

Database providing phylogenetic classification of proteins encoded in complete genomes

Motif Conservation

Conscore

http://conscore.embl.de

[63]

Linear motif conservation filter

Consurf

http://consurf.tau.ac.il

[99]

To identify functional regions in proteins

SLiMPrints

http://bioware.ucd.ie/~compass/biowareweb/Server_pages/slimprints.php

[41]

De novo motif discovery tool to identify relatively over-constrained proximal groupings of residues within intrinsically disordered regions, indicative of a putatively functional motif

Protein Domains

SMART

http://smart.embl.de

[52]

To identify and annotate genetically mobile domains and to analyse domain architectures

PFAM

http://pfam.xfam.org

[51]

Database providing a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models

InterPro

http://www.ebi.ac.uk/interpro

[53]

To classify sequences into protein families and to predict the presence of important domains and sites

Structure/Disorder

PDB

http://www.rcsb.org

[55]

Single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids

PDBsum

http://www.ebi.ac.uk/pdbsum

[100]

Pictorial database providing an at-a-glance overview of the contents of each 3D structure deposited in PDB

IUPred

http://iupred.enzim.hu

[54]

To predict intrinsically unstructured regions in proteins

D2P2

http://d2p2.pro

[101]

Community resource, providing pre-computed disorder predictions on a large library of proteins from completely-sequenced genomes

MobiDB

http://mobidb.bio.unipd.it

[102]

Centralized resource for annotations of intrinsic protein disorder

DISPROT

http://www.disprot.org

[103]

Database providing information about proteins that lack fixed 3D structure in their putatively native states, either in their entirety or in part

Protein-Protein Interactions

BioGRID

http://thebiogrid.org

[104]

Online interaction repository with data compiled through comprehensive curation efforts

STRING

http://string-db.org

[57]

Provides known and predicted protein-protein interactions

IntAct

http://www.ebi.ac.uk/intact

[105]

Open source database system and analysis tools for molecular interaction data; all interactions are derived from literature curation or direct user submissions and are freely available

PiSITE

http://pisite.hgc.jp

[106]

Web-based database of protein interaction sites, providing information on interaction sites of a protein from multiple PDB entries

DOMINO

http://mint.bio.uniroma2.it/domino

[107]

Database of domain-peptide interactions

ComPPI

http://ComPPI.LinkGroup.hu

[108]

Cellular compartment-specific database for protein-protein interaction network analysis

iELM

http://i.elm.eu.org

[109]

Web server to explore short linear motif-mediated interactions

KEGG

http://www.genome.jp/kegg

[110]

Database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies

CORUM

http://mips.gsf.de/genre/proj/corum

[56]

Collection of experimentally verified mammalian protein complexes

Subcellular Localization

CELLO2GO

http://cello.life.nctu.edu.tw/cello2go

[59]

Web server for protein subcellular localization prediction with functional gene ontology annotation

LocDB

https://www.rostlab.org/services/locDB

[111]

Database that collects experimental annotations for the subcellular localization of proteins in Homo sapiens and Arabidopsis thaliana

GeneOntology

http://geneontology.org/ http://www.ebi.ac.uk/QuickGO

[112]

Collaborative effort to address the need for consistent descriptions of gene products across databases

Compartments

http://compartments.jensenlab.org

[113]

Database of protein subcellular localization data manually curated from the literature or obtained from high-throughput microscopy-based screens

LOCATE

http://locate.imb.uq.edu.au

[114]

Curated database providing data that describe the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set

Tissue Expression

Protein Atlas

http://www.proteinatlas.org

[58]

Publicly available database with millions of high-resolution images showing the spatial distribution of proteins in 44 different normal human tissues and 20 different cancer types, as well as 46 different human cell lines

TISSUES

http://tissues.jensenlab.org

[115]

Resource integrating evidence on tissue expression from manually curated literature, proteomics and transcriptomics screens, and automatic text mining

Generic Resources

UniProt

http://www.uniprot.org

[116]

Manually annotated, non-redundant protein sequence and sequence isoform database; related information about the biological function of proteins is curated from the scientific literature

Antibodypedia

http://www.antibodypedia.com

[117]

Open-access database of publicly available antibodies against human protein targets; contains data on the antibody efficacy in a range of biochemical and cell biological techniques

IUPAC

http://www.iupac.org

[118]

Serves to advance the worldwide aspects of the chemical sciences and to contribute to the application of chemistry in science