※ Computational resources for PPBDs–specific phosphorylation sites:

Introduction:

Protein phosphorylation is among the most important and widespread post-translational modifications (PTMs) of proteins, and plays important roles in most biological processes. Numerous proteins containing phosphoprotein-binding domains (PPBDs) recognize and bind phosphoserine (pS), phosphothreonine (pT) or phosphotyrosine (pY) residues in specific substrates as ‘readers’ and function as ‘molecular integrators’ in space and time, thereby playing pivotal roles in determining the specificity of signal transduction events emanating from protein kinases as ‘writers’. In contrast with laborious and time-consuming experimental methods, the computational prediction of PPBDs–specific phosphorylation sites can guide basic research and further experimental design. In this review, we present a comprehensive but brief summary of computational resources for PPBDs–specific phosphorylation sites, including PPBDs databases, tools for the prediction of PPBDs–specific and kinase-specific phosphorylation sites, and other tools.

We apologize that computational studies without web links to databases or tools are not included in this compendium, since it is not easy for experimentalists to use such studies directly. We are grateful for user feedback. Please contact us to add, remove or update one or more of the web links below.

Index:

<1> PPBDs databases

<2> Prediction of PPBDs–specific phosphorylation sites

<3> Prediction of kinase-specific phosphorylation sites

==================================================================================

<1> PPBDs Databases:

1. iEKPD: contains 197,348 phosphorylation regulators, including 109,912 protein kinases, 23,294 protein phosphatases and 68,748 PPBD-containing proteins in 164 eukaryotic species (Guo, et al., 2019).

2. PepCyber:P~Pep 1.2: a database of human protein-protein interactions mediated by 10 classes of phosphoprotein-binding domains (PPBDs) (Gong, et al., 2008).

<2> Prediction of PPBDs–specific phosphorylation sites:

1. ScanSite 2.0: searches for motifs within proteins that are likely to be phosphorylated by specific protein kinases or bind to domains such as SH2 domains, 14-3-3 domains and PTB domains (Obenauer, et al., 2003).

2. SMALI: a web-accessible program for scoring matrix-assisted ligand identification (SMALI) for SH2 domains and other signaling modules (Li, et al., 2008).

3. NetPhorest: an atlas of consensus sequence motifs that covers 179 kinases and 104 phosphorylation-dependent binding domains [SH2, PTB, BRCT, WW, and 14-3-3] (Miller, et al., 2008).

4. GPS-Polo: an integrative approach for the analysis of Plk-specific phospho-binding and phosphorylation sites (p-sites) in proteins (Liu, et al., 2013).

5. NetSH2: artificial neural network (ANN) predictors for each of the 70 profiled SH2 domains (Tinti, et al., 2013).

6. 14-3-3-Pred: improved methods to predict 14-3-3-binding phosphopeptides (Madeira, et al., 2015).
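Several of the tools above rank candidate binding sites with a position-specific scoring matrix. The following is a minimal sketch of that general idea in Python, not the implementation of SMALI or any other listed tool. The matrix weights, the default penalty and the function name `score_sites` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy matrix: weights for positions +1..+3 after a candidate
# phosphotyrosine. The numbers are invented for illustration and are NOT
# taken from SMALI or any published SH2 specificity profile.
TOY_MATRIX: List[Dict[str, float]] = [
    {"V": 1.0, "E": 0.8, "I": 0.6},  # position +1
    {"N": 2.0, "Q": 0.5},            # position +2
    {"V": 0.7, "P": 0.4, "I": 0.5},  # position +3
]
DEFAULT_WEIGHT = -0.5  # assumed penalty for residues absent from a column


def score_sites(
    sequence: str,
    matrix: List[Dict[str, float]] = TOY_MATRIX,
    center: str = "Y",
) -> List[Tuple[int, str, float]]:
    """Score each `center` residue by summing the matrix weights of the
    residues that follow it. Returns (1-based position, window, score)
    tuples sorted from highest to lowest score."""
    width = len(matrix)
    hits = []
    for i, residue in enumerate(sequence):
        window = sequence[i + 1 : i + 1 + width]
        # Skip non-center residues and sites too close to the C-terminus.
        if residue != center or len(window) < width:
            continue
        score = sum(col.get(aa, DEFAULT_WEIGHT) for col, aa in zip(matrix, window))
        hits.append((i + 1, window, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Usage on an invented sequence with two tyrosines.
ranked = score_sites("AAYVNVAAYGGGA")
for position, window, score in ranked:
    print(position, window, round(score, 2))
```

In practice the matrix would be derived from experimental binding data for a given domain, and a score threshold would be calibrated against known ligands.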

<3> Prediction of kinase-specific phosphorylation sites:

1. GPS 2.1: implemented in Java; predicts kinase-specific phosphorylation sites for 408 human protein kinases organized in a hierarchy (Xue, et al., 2008).

2. GPS 1.10: the old version of GPS. We designed a novel algorithm, GPS (Group-based Phosphorylation sites Prediction), and constructed an easy-to-use web server for experimentalists (Xue, et al., 2005; Zhou, et al., 2004).

3. PPSP 1.0: we also developed another online program for the prediction of kinase-specific phosphorylation sites, based on Bayesian Decision Theory (BDT) (Xue, et al., 2006).

4. ScanProsite: scans protein sequences against PROSITE, which consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them (de Castro, et al., 2006; Hulo, et al., 2008).

5. ELM: a resource for predicting functional sites in eukaryotic proteins (Puntervoll, et al., 2010).

6. Minimotif Miner: analyzes protein queries for the presence of short functional motifs that, in at least one protein, have been demonstrated to be involved in post-translational modifications (PTMs), binding to other proteins, nucleic acids, or small molecules, or protein trafficking (Balla, et al., 2006; Rajasekaran, et al., 2009).

7. PhosphoMotif Finder: contains known kinase/phosphatase substrate as well as binding motifs that are curated from the published literature. It reports the PRESENCE of any literature-derived motif in the query sequence (Amanchy, et al., 2007).

8. PREDIKIN 1.0: produces a prediction of substrates for serine/threonine protein kinases based on the primary sequence of a protein kinase catalytic domain (Brinkworth, et al., 2003).

9. Predikin & PredikinDB 2.0: consists of two components: (i) PredikinDB, a database of phosphorylation sites that links substrates to kinase sequences and (ii) a Perl module, which provides methods to classify protein kinases, reliably identify substrate-determining residues, generate scoring matrices and score putative phosphorylation sites in query sequences (Saunders, et al., 2008; Saunders and Kobe, 2008).

10. ScanSite 2.0: searches for motifs within proteins that are likely to be phosphorylated by specific protein kinases or bind to domains such as SH2 domains, 14-3-3 domains or PDZ domains (Obenauer, et al., 2003).

11. NetPhosK 1.0: produces neural network predictions of kinase-specific eukaryotic protein phosphorylation sites. Currently NetPhosK covers the following kinases: PKA, PKC, PKG, CKII, Cdc2, CaM-II, ATM, DNA PK, Cdk5, p38 MAPK, GSK3, CKI, PKB, RSK, INSR, EGFR and Src (Blom, et al., 2004).

12. PredPhospho 1.0: implemented with an SVM algorithm; predicts kinase-specific phosphorylation sites for 4 kinase groups and 4 kinase families (Kim, et al., 2004).

13. PredPhospho 2.0: an enhanced version of the PredPhospho predictor, still implemented with an SVM algorithm, covering 7 kinase groups and 18 kinase families (Ryu, et al., 2009).

14. KinasePhos 1.0: predicts kinase-specific phosphorylation sites within given protein sequences. A profile hidden Markov model (HMM) is trained on each group of sequences surrounding the phosphorylated residues (Huang, et al., 2005).

15. KinasePhos 2.0: a new version of the kinase-specific phosphorylation site prediction tool, which uses sequence-based amino acid coupling-pattern analysis and solvent accessibility as new features for a support vector machine (SVM) (Wong, et al., 2007).

16. PhoScan: predicts kinase-specific phosphorylation sites from sequence features using a log-odds ratio approach (Li, et al., 2007).

17. pkaPS: Prediction of protein kinase A phosphorylation sites using the simplified kinase binding model (Neuberger, et al., 2007).

18. CRPhos 0.8: prediction of kinase-specific phosphorylation sites using conditional random fields. Its source code is free for academic research and can be compiled on Linux/Unix (Dang, et al., 2008).

19. AutoMotif 2.0: allows for identification of PTM (post-translational modification) sites, including phosphorylation sites, in proteins. The AutoMotif Server 2.0 trains a support vector machine (SVM) for each type of PTM separately on proteins from the Swiss-Prot database (version 42.0) (Plewczynski, et al., 2005; Plewczynski, et al., 2008).

20. MetaPredPS: Meta-predictors make predictions by organizing and processing the predictions produced by several other predictors in a defined problem domain (Wan, et al., 2008).

21. SMALI: searches for peptide ligands in human proteins that are likely to bind to SH2 domains (Huang, et al., 2008; Li, et al., 2008).

22. NetPhorest: a non-redundant collection of 125 sequence-based classifiers for linear motifs in phosphorylation-dependent signaling. The collection contains both family-based and gene-specific classifiers (Miller, et al., 2008; Horn, et al., 2014).

23. SiteSeek: trained using a novel compact evolutionary and hydrophobicity profile to detect possible protein phosphorylation sites in a target sequence (Yoo, et al., 2008). The tool is not available.

24. PostMod: a prediction server for phosphorylation sites. The authors combined physicochemical, motif, and evolutionary information by comparing sequence similarities, and the server predicts phosphorylation sites for 48 different kinases (Jung, et al., 2010).

25. iGPS 1.0: we developed the iGPS software package (GPS algorithm with the interaction filter, or in vivo GPS) mainly for the prediction of in vivo site-specific kinase-substrate relations (ssKSRs). Eukaryotic protein kinases (PKs) were classified into a hierarchy with four levels: group, family, subfamily, and single PK. Based on the hypothesis that similar PKs recognize similar short linear motifs (SLMs), we selected a predictor in GPS 2.0 for each PK and directly predicted the potential PKs for non-annotated p-sites from phosphoproteomic studies. Protein-protein interaction (PPI) information was then used as the major contextual factor to filter out potentially false-positive hits (Song, et al., 2012).

26. Musite: a tool for global prediction of general and kinase-specific phosphorylation sites. The authors collected phosphoproteomics data in multiple organisms from several reliable sources and used them to train prediction models by a comprehensive machine-learning approach that integrates local sequence similarities to known phosphorylation sites, protein disorder scores, and amino acid frequencies (Gao, et al., 2010; Yao, et al., 2012).

27. MusiteDeep: the first deep-learning framework for predicting general and kinase-specific phosphorylation sites (Wang, et al., 2017).

28. PlantPhos: a web tool for predicting potential phosphorylation sites in plant proteins with various substrate motifs, based on Hidden Markov Models (HMMs) and Maximal Dependence Decomposition (MDD) (Lee, et al., 2011).

29. PSEA: the authors proposed a new method called PSEA (Phosphorylation Set Enrichment Analysis) to detect new sites phosphorylated by a specific kinase, kinase family or kinase group. Each query peptide is assigned a P-value according to its similarity to known phosphorylated peptides. The smaller the P-value, the more likely it is that the peptide is phosphorylated by the chosen kinase type (Suo, et al., 2014).

30. PKIS: a kinase identification web server based on Phospho.ELM (version 9.0). It combines support vector machines (SVMs) with the composition of monomer spectrum (CMS) to assign protein kinases to experimentally verified human P-sites with high specificity (Zou, et al., 2013).

31. PTMPred: applies a support vector machine (SVM), with a kernel matrix computed from position-specific propensity matrices (PSPM), to predict post-translational modification sites (Xu, et al., 2014).

32. Phos_pred: a method for the prediction of protein kinase-specific phosphorylation sites in hierarchical structure using functional information and random forest (Fan, et al., 2014).

33. PhosK3D: a web server for identifying kinase-specific phosphorylation sites on protein sequences and three-dimensional structures (Su and Lee, 2013).

34. PhosphoPICK: a method for predicting kinase substrates using cellular context information; it can currently make predictions for 59 human kinases (Patrick, et al., 2013).
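Several predictors in this list score candidate sites by comparing residue frequencies around known phosphorylation sites with background frequencies. The sketch below illustrates a generic position-specific log-odds score in Python. It is not the algorithm of PhoScan or any other listed tool, and the training peptides, the pseudocount and the function names are assumptions made up for the example.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Invented 5-residue peptides (site at index 3); NOT real training data.
POSITIVES = ["RRASV", "RRGSL", "KRASI", "RRLSA"]
BACKGROUND = ["AGSLV", "GASTP", "LVSAG", "PTSGA"]


def train_log_odds(
    positives: List[str], background: List[str], pseudocount: float = 1.0
) -> List[Dict[str, float]]:
    """Return one {residue: log-odds} table per peptide position.

    log-odds = ln(frequency at this position in positives / background frequency),
    with a pseudocount added to every residue to avoid division by zero.
    """
    # Background frequencies are pooled over all positions of the background set.
    bg_counts = Counter("".join(background))
    bg_total = sum(bg_counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    tables = []
    for pos in range(len(positives[0])):
        col = Counter(peptide[pos] for peptide in positives)
        col_total = sum(col[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        tables.append({
            a: math.log(
                ((col[a] + pseudocount) / col_total)
                / ((bg_counts[a] + pseudocount) / bg_total)
            )
            for a in AMINO_ACIDS
        })
    return tables


def score_peptide(peptide: str, tables: List[Dict[str, float]]) -> float:
    """Sum the per-position log-odds; unknown residues contribute 0."""
    return sum(table.get(aa, 0.0) for table, aa in zip(tables, peptide))


tables = train_log_odds(POSITIVES, BACKGROUND)
motif_like = score_peptide("RRASL", tables)
unrelated = score_peptide("AGSGA", tables)
print(round(motif_like, 2), round(unrelated, 2))
```

A positive total means the peptide resembles the positive set more than the background. Real predictors train on much larger curated datasets and calibrate a cutoff per kinase.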