Table 1 Bioinformatics tools useful for motif discovery. Each resource is listed with its name, weblink, main reference, and short description

Motif Resources/Predictors
ELM [26]
To explore candidate functional sites in proteins and to learn about known motifs
MiniMotif Miner [88]
To analyse protein queries for the presence of short contiguous peptide motifs that have a known function in at least one other protein
Scansite [89]
To identify short protein sequence motifs that are recognized by modular signalling domains, phosphorylated by protein Ser/Thr- or Tyr-kinases or mediate specific interactions with proteins or phospholipids
PePSite [90]
To predict binding of a given peptide to a protein structure
Motif Discovery
To find short, over-represented peptide patterns/linear motifs in a set of proteins
SLiMFinder [91]
To find novel, significantly over-represented, short protein motifs
Sequence Retrieval/Analysis
BLAST [47, 92]
To identify regions of local similarity between nucleotide or protein sequences, which can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families
BioMART [93]
Provides free software and data services to foster scientific collaboration and facilitate the scientific discovery process; the project adheres to the open source philosophy that promotes collaboration and code reuse
Clustal [49, 94]
General purpose DNA or protein multiple sequence alignment program
MAFFT [95]
Multiple alignment program for amino acid or nucleotide sequences
Jalview [48]
Lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment
Phylogenetic Tree/Orthology
TreeFam [96]
Database composed of phylogenetic trees inferred from animal genomes, providing orthology/paralogy predictions as well as the evolutionary history of genes
EggNog [97]
Database of orthologous groups of genes annotated with functional categories derived from COG/KOG categories
COG [98]
Database providing phylogenetic classification of proteins encoded in complete genomes
Motif Conservation
Conscore [63]
Linear motif conservation filter
Consurf [99]
To identify functional regions in proteins
SLiMPrints [41]
De novo motif discovery tool to identify relatively over-constrained proximal groupings of residues within intrinsically disordered regions, indicative of a putatively functional motif
Protein Domains
SMART [52]
To identify and annotate genetically mobile domains and to analyse domain architectures
PFAM [51]
Database providing a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models
InterPro [53]
To classify sequences into protein families and to predict the presence of important domains and sites
PDB [55]
Single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids
PDBsum [100]
Pictorial database providing an at-a-glance overview of the contents of each 3D structure deposited in PDB
IUPred [54]
To predict intrinsically unstructured regions in proteins
D2P2 [101]
Community resource providing pre-computed disorder predictions on a large library of proteins from completely sequenced genomes
MobiDB [102]
Centralized resource for annotations of intrinsic protein disorder
Database providing information about proteins that lack fixed 3D structure in their putatively native states, either in their entirety or in part
Protein-Protein Interactions
BioGRID [104]
Online interaction repository with data compiled through comprehensive curation efforts
Provides known and predicted protein-protein interactions
IntAct [105]
Freely available, open source database system and analysis tools for molecular interaction data; all interactions are derived from literature curation or direct user submissions
PiSITE [106]
Web-based database of protein interaction sites, providing information on interaction sites of a protein from multiple PDB entries
DOMINO [107]
Database of domain-peptide interactions
ComPPI [108]
Cellular compartment-specific database for protein-protein interaction network analysis
iELM [109]
Web server to explore short linear motif-mediated interactions
KEGG [110]
Database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies
CORUM [56]
Collection of experimentally verified mammalian protein complexes
Subcellular Localization
Web server for protein subcellular localization prediction with functional gene ontology annotation
LocDB [111]
Database that collects experimental annotations for the subcellular localization of proteins in Homo sapiens and Arabidopsis thaliana
GeneOntology [112]
Collaborative effort to address the need for consistent descriptions of gene products across databases
Compartments [113]
Database of protein subcellular localization data manually curated from the literature or obtained from high-throughput microscopy-based screens
LOCATE [114]
Curated database providing data that describe the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set
Tissue Expression
Protein Atlas [58]
Publicly available database with millions of high-resolution images showing the spatial distribution of proteins in 44 different normal human tissues and 20 different cancer types, as well as 46 different human cell lines
Resource integrating evidence on tissue expression from manually curated literature, proteomics and transcriptomics screens, and automatic text mining
Generic Resources
UniProt [116]
Manually annotated, non-redundant protein sequence and sequence isoform database; related information about the biological function of proteins is curated from the scientific literature
Antibodypedia [117]
Open-access database of publicly available antibodies against human protein targets; contains data on antibody efficacy in a range of biochemical and cell biological techniques
IUPAC [118]
Serves to advance the worldwide aspects of the chemical sciences and to contribute to the application of chemistry in science