In eukaryotes, protein phosphorylation is by far the most important and widespread post-translational modification (PTM). It occurs mainly on specific serine (S), threonine (T) or tyrosine (Y) residues of protein substrates and orchestrates a variety of biological processes, including signal transduction, cell cycle progression and proliferation, autophagy and metabolism (Reinhardt, H.C., et al., 2013; Morrison, D.K., et al., 2009; Lim, W.A., et al., 2010; Yaffe, M.B., et al., 2002). Importantly, numerous proteins containing phosphoprotein-binding domains (PPBDs) act as “readers”: they recognize and bind phosphoserine (pS), phosphothreonine (pT) or phosphotyrosine (pY) residues in specific substrates, interpret the phosphorylation signals written by “writers”, namely protein kinases (PKs), and accurately propagate these signals into downstream pathways (Pawson, T., et al., 1997; Yaffe, M.B., et al., 2001; Pawson, T., et al., 2004). Dysregulation of normal interactions between PPBDs and phosphorylation sites (p-sites) is frequently associated with human diseases such as cancer (Hermeking, H., et al., 2003; Garnett, M.J., et al., 2005) and neurodegenerative disorders (Yuan, Z., et al., 2008). Thus, identifying PPBD-specific binding p-sites (PBSs) is fundamental for revealing dynamic phosphorylation signaling networks.

In this work, we manually collected 4458 experimentally identified PBSs in 950 PPBD-binding proteins (PPBPs) that interact with 268 PPBD-containing proteins (PPCPs) from 12 eukaryotic species. Based on the annotations of PPCPs (Guo, Y., et al., 2019), we classified these known PBSs into a three-level hierarchy: group, family and single PPBD cluster. Hypothesizing that PPBDs in the same family or cluster might recognize similar sequence motifs in their substrates, we considerably improved our previously developed group-based prediction system (GPS) algorithm and adopted deep learning combined with transfer learning for model training. We then developed a new online service named GPS-PBS, which implements 138 predictors for 122 PPBD clusters belonging to 2 groups and 16 families. In total, GPS-PBS can predict PBSs for 159 human PPCPs. In comparisons, GPS-PBS showed highly competitive accuracy against other existing tools. Using GPS-PBS, we conducted a large-scale prediction to computationally annotate potential PPBDs from a mammalian phosphoproteomic data set, and observed that various PPCPs and PKs synergistically orchestrate a number of important pathways. Taken together, we anticipate that GPS-PBS can be a helpful tool for prioritizing promising candidates for further experimental investigation. For convenience, the GPS-PBS online service was implemented in PHP and JavaScript and is freely available for academic research at http://pbs.biocuckoo.cn/.
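The three-level classification (group, family, cluster) and the hypothesis that PPBDs in the same family share substrate motifs can be illustrated with a small Python sketch. This is not the GPS-PBS implementation: the record layout, the labels G1/F1/C1 and the peptide strings are hypothetical, and pooling family-level sites for pretraining before fine-tuning on one cluster's sites is only one plausible way to pair the hierarchy with transfer learning.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PBSRecord:
    """One known PPBD-binding p-site (hypothetical record layout)."""
    group: str    # top level of the hierarchy
    family: str   # middle level
    cluster: str  # single PPBD cluster
    peptide: str  # sequence around the p-site (assumed representation)


def index_by_level(records):
    """Build a nested group -> family -> cluster -> [peptides] index."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for r in records:
        tree[r.group][r.family][r.cluster].append(r.peptide)
    return tree


def training_sets(tree, group, family, cluster):
    """Return (pretrain, finetune) peptide lists for one cluster.

    Pretraining pools every site in the cluster's family (shared-motif
    assumption); fine-tuning uses only the cluster's own sites.
    """
    fam = tree[group][family]
    pretrain = [p for peptides in fam.values() for p in peptides]
    finetune = list(fam[cluster])
    return pretrain, finetune


# Toy usage with hypothetical labels and peptides.
records = [
    PBSRecord("G1", "F1", "C1", "RSASEPSL"),
    PBSRecord("G1", "F1", "C1", "RSKSTPSA"),
    PBSRecord("G1", "F1", "C2", "RLSTMPSE"),
    PBSRecord("G1", "F2", "C3", "EPQYEEIP"),
]
tree = index_by_level(records)
pretrain, finetune = training_sets(tree, "G1", "F1", "C1")
```

Here the predictor for cluster C1 would see all three F1 peptides during pretraining but only its own two during fine-tuning, which is how a sparsely annotated cluster could borrow motif information from its family.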

If you publish results obtained with GPS-PBS, please cite the following articles:

GPS-PBS: A deep learning framework to predict phosphorylation sites that specifically interact with phosphoprotein-binding domains.
Yaping Guo, Wanshan Ning, Peiran Jiang, Shaofeng Lin, Chenwei Wang, Xiaodan Tan, Lan Yao, Di Peng, Yu Xue*.
2020, Submitted

Systematic analysis of PLK-mediated phosphoproteome in eukaryotes.
Zexian Liu, Jian Ren, Jun Cao, Jiang He, Xuebiao Yao, Changjiang Jin and Yu Xue.
Briefings in Bioinformatics, 2013, 14(3):344–360
