
A Summary of Computational Resources for Protein Phosphorylation

2010-08-25 00:10
For publication of results, please cite the following article:

A summary of computational resources for protein phosphorylation. Yu Xue, Xinjiao Gao, Cao Jun, Liu Zexian, Changjiang Jin, Longping Wen, Xuebiao Yao, and Jian Ren. Current Protein & Peptide Science. 2010; 11(6):485-496.






Introduction:
Protein phosphorylation is the most ubiquitous post-translational modification (PTM) and plays important roles in most biological processes. Identification of site-specific phosphorylated substrates is fundamental for understanding the molecular mechanisms of phosphorylation. Besides experimental approaches, prediction of potential candidates with computational methods has also attracted great attention for its convenience and speed. In this review, we present a comprehensive but brief summary of computational resources for protein phosphorylation, including phosphorylation databases, prediction of non-specific or organism-specific phosphorylation sites, prediction of kinase-specific phosphorylation sites or phospho-binding motifs, and other tools. A testing data set prepared from Phospho.ELM 6.0 is available at: Comparison_data.

We apologize that computational studies without any web links to databases or tools are not included in this compendium, since it is not easy for experimentalists to use such studies directly. We are grateful for user feedback. Please contact Prof. Yu Xue or Prof. Jian Ren to add, remove or update one or more of the web links below.
Index:
<1> Phosphorylation Databases
<2> Prediction of non-specific or organism-specific phosphorylation sites
<3> Prediction of kinase-specific phosphorylation sites or phospho-binding motifs
<4> Miscellaneous tools
<5> Detection of potential phosphorylation sites from mass spectrometry data

==================================================================================


<1> Phosphorylation Databases:
1. Phospho.ELM 8.3 (PhosphoBase): contains 5,115 experimentally verified phosphorylated proteins from different species, with 2,746 tyrosine, 15,972 serine and 3,283 threonine sites. All instances were manually collected from the scientific literature (Diella, et al., 2004; Diella, et al., 2008).
2. PhosphoSitePlus: a new version of PhosphoSite, a web-based database that collects protein modification sites, including protein phosphorylation sites, from the scientific literature as well as high-throughput discovery programs. Currently, PhosphoSitePlus contains 78,022 phosphorylation sites (Hornbeck, et al., 2004).
3. PhosphoNET: presently holds data on more than 74,000 phosphorylation sites in over 12,400 human proteins that have been collected from the scientific literature and other reputable websites. It features direct links to several other useful websites, and will continue to expand as a useful portal for phosphoproteomics information.
4. HPRD release 9: currently contains information on 16,972 PTMs belonging to various categories, with phosphorylation (10,858), dephosphorylation (3,118) and glycosylation (1,860) forming the majority of the annotated PTMs. At least one responsible enzyme has been annotated for 8,960 PTMs, which resulted in the documentation of 7,253 enzyme-substrate relationships (Keshava Prasad, et al., 2009).
5. PHOSIDA (Mirror website): a phosphorylation site database that integrates thousands of high-confidence in vivo phosphosites identified by mass spectrometry-based proteomics in various species. For each phosphosite, PHOSIDA lists matching kinase motifs, predicted secondary structures, conservation patterns, and its dynamic regulation upon stimulus. Using support vector machines, PHOSIDA also predicts non-specific phosphosites (Gnad, et al., 2007; Gnad, et al., 2009).

6. PhosphoPep v2.0: contains MS-derived phosphorylation data from 4 different organisms: fly (Drosophila melanogaster), human (Homo sapiens), worm (Caenorhabditis elegans), and yeast (Saccharomyces cerevisiae) (Bodenmiller, et al., 2008).
7. PhosPhAt 3.0: contains information on Arabidopsis phosphorylation sites identified by mass spectrometry in large-scale experiments from different research groups, with 6,282 phosphopeptides (Heazlewood, et al., 2008; Durek, et al., 2010).
8. P(3)DB 1.1: provides a database of protein phosphorylation data from multiple plants. The database was initially constructed with a dataset from oilseed rape, including 14,670 nonredundant phosphorylation sites from 6,382 substrate proteins (Gao, et al., 2009).
9. Swiss-Prot knowledge base (Mirror website): for each protein annotation, the "Amino acid modifications" entry in the "Sequence annotation (Features)" section collects the post-translational modification information of the protein (Farriol-Mathis, et al., 2004).
10. dbPTM 2.0: integrates experimentally verified PTMs from several databases, and annotates the predicted PTMs on Swiss-Prot proteins (Lee, et al., 2006).
11. SysPTM 1.1 (Mirror website): provides a systematic and sophisticated platform for proteomic PTM research, equipped not only with a knowledge base of manually curated multi-type modification data, but also with four fully developed, in-depth data mining tools (Li, et al., 2009).
12. PhosphoPOINT: a comprehensive human kinase interactome and phospho-protein database, containing 4,195 phospho-proteins with a total of 15,738 phosphorylation sites (Yang, et al., 2008).

13. NetworKIN 1.0 (NetworKIN-2.0 beta version): a method for predicting in vivo kinase-substrate relationships that augments consensus motifs with context for kinases and phosphoproteins. It is a great resource and opens a door to the computational discovery of phospho-regulatory networks (Linding, et al., 2007; Linding, et al., 2008).
14. Phospho3D: a database of three-dimensional structures of phosphorylation sites. It stores information retrieved from the Phospho.ELM database, enriched with structural information and annotations at the residue level (Zanzoni, et al., 2007).
15. PepCyber:P~Pep 1.2: a database of human protein-protein interactions mediated by 10 classes of phosphoprotein binding domains (PPBDs) (Gong, et al., 2008).
16. PhosphoVariant: a database of human phosphovariants, defined as genetic variations that change phosphorylation sites or their interacting kinases (Ryu, et al., 2009).

17. ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites, containing 4,557 manually validated spectra associated with 4,226 unique peptides from 1,367 proteins (Hummel, et al., 2009).

18. PlantsP: contains more than 300 phosphorylation sites from Arabidopsis thaliana plasma membrane proteins (Nühse, et al., 2009).

19. LymPHOS: a phosphosite database of primary human T cells, with 342 phosphorylation sites mapping to more than 200 gene sequences (Ovelleiro, et al., 2009).

20. PhosSNP 1.0: a genome-wide analysis of genetic polymorphisms that influence protein phosphorylation in H. sapiens. It was estimated that ~69.76% of nsSNPs (non-synonymous SNPs) are potential phosSNPs (phosphorylation-related SNPs): 64,035 in 17,614 proteins (Ren, et al., 2010).

21. The Phosphorylation Site Database: provides ready access to information from the primary scientific literature concerning those proteins from prokaryotic organisms, i.e., the members of the domains Archaea and Bacteria, that have been reported to undergo covalent phosphorylation on the hydroxyl side chains of serine, threonine, and/or tyrosine residues (Wurgler-Murphy, et al., 2004).

22. PhosphoGRID: a database of experimentally verified in vivo phosphorylation sites curated from the S. cerevisiae primary literature. PhosphoGRID records the positions of over 5,000 specific phosphorylated residues on 1,495 gene products (Stark, et al., 2004).



<2> Prediction of non-specific or organism-specific phosphorylation sites:
1. NetPhos 2.0: produces neural network predictions for serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins (Blom, et al., 1999).
2. CRP: Cleaved Radioactivity of Phosphopeptides. CRP performs an in silico proteolytic cleavage of the sequence and reports the predicted Edman cycles in which radioactivity would be observed if a given serine, threonine or tyrosine were phosphorylated (Mackey, et al., 2003).
3. DISPHOS 1.3: uses disorder information to improve the discrimination between phosphorylation and non-phosphorylation sites, and predicts serine, threonine and tyrosine phosphorylation sites in proteins (Iakoucheva, et al., 2004).
4. NetPhosYeast 1.0: predicts serine and threonine phosphorylation sites in yeast proteins (Ingrell, et al., 2007).
5. NetPhosBac 1.0: predicts serine and threonine phosphorylation sites in bacterial proteins (Miller, et al., 2009).
6. PhosPhAt 3.0: the authors used a set of 802 experimentally validated serine phosphorylation sites as the training data set for version 2.2, and an additional 1,818 threonine phosphorylation sites and 676 tyrosine sites in Arabidopsis to develop the 3.0 predictor for phosphorylation sites in Arabidopsis (Heazlewood, et al., 2008; Durek, et al., 2010).
7. PHOSIDA (Mirror website): a predictor trained on more than 5,000 high-confidence phosphosites with the support vector machine (SVM) algorithm (Gnad, et al., 2007).

8. GANNPhos: uses a genetic algorithm integrated neural network (GANN) algorithm (Tang, et al., 2007). The tool is not available.


9. PHOSITE: is based on case-based sequence analysis (Koenig and Grabe, 2004). The tool is not available.
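
The in silico cleavage idea described for CRP (entry 2 above) can be illustrated with a minimal Python sketch. This is not the CRP implementation: it assumes a simplified trypsin rule (cleave after K or R unless the next residue is P), and the function names `tryptic_digest` and `edman_cycles` are hypothetical. For each serine, threonine or tyrosine it reports the Edman cycle, i.e. the residue's position within its peptide, at which a radiolabel would be released if that residue were phosphorylated.

```python
def tryptic_digest(sequence):
    """Split a protein sequence with a simplified trypsin rule (assumption):
    cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder that does not end in K/R.
        peptides.append(sequence[start:])
    return peptides


def edman_cycles(sequence, residues="STY"):
    """Return (peptide, protein_position, residue, edman_cycle) tuples.

    protein_position and edman_cycle are 1-based; the Edman cycle of a
    residue equals its position counted from the peptide N-terminus.
    """
    results, offset = [], 0
    for peptide in tryptic_digest(sequence):
        for i, aa in enumerate(peptide):
            if aa in residues:
                results.append((peptide, offset + i + 1, aa, i + 1))
        offset += len(peptide)
    return results


if __name__ == "__main__":
    for row in edman_cycles("MKSAPRTSYK"):
        print(row)
```

A real tool would also support other proteases and missed cleavages; the sketch only shows how cleavage positions translate into cycle numbers.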




<3> Prediction of kinase-specific phosphorylation sites or phospho-binding motifs:
1. GPS 2.1: the current version of the GPS system. We renamed the tool the Group-based Prediction System. GPS 2.1 was implemented in Java and can predict kinase-specific phosphorylation sites for 408 human protein kinases in a hierarchy (Xue, et al., 2008).
2. GPS 1.10: the old version of GPS. We designed a novel algorithm, GPS (Group-based Phosphorylation sites Prediction), and constructed an easy-to-use web server for experimentalists (Xue, et al., 2005; Zhou, et al., 2004).
3. PPSP 1.0: we also developed another online program for the prediction of kinase-specific phosphorylation sites, based on Bayesian Decision Theory (BDT) (Xue, et al., 2006).
4. ScanProsite: consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them (de Castro, et al., 2006; Hulo, et al., 2008).
5. ELM: a resource for predicting functional sites in eukaryotic proteins (Puntervoll, et al., 2003).
6. Minimotif Miner: analyzes protein queries for the presence of short functional motifs that, in at least one protein, have been demonstrated to be involved in post-translational modifications (PTMs), binding to other proteins, nucleic acids, or small molecules, or protein trafficking (Balla, et al., 2006; Rajasekaran, et al., 2009).
7. PhosphoMotif Finder: contains known kinase/phosphatase substrate motifs as well as binding motifs that are curated from the published literature. It reports the presence of any literature-derived motif in the query sequence (Amanchy, et al., 2007).
8. PREDIKIN 1.0: produces a prediction of substrates for serine/threonine protein kinases based on the primary sequence of a protein kinase catalytic domain (Brinkworth, et al., 2003).
9. Predikin & PredikinDB 2.0: consists of two components: (i) PredikinDB, a database of phosphorylation sites that links substrates to kinase sequences, and (ii) a Perl module, which provides methods to classify protein kinases, reliably identify substrate-determining residues, generate scoring matrices and score putative phosphorylation sites in query sequences (Saunders, et al., 2008; Saunders and Kobe, 2008).
10. ScanSite 2.0: searches for motifs within proteins that are likely to be phosphorylated by specific protein kinases or to bind to domains such as SH2 domains, 14-3-3 domains or PDZ domains (Obenauer, et al., 2003).
11. NetPhosK 1.0: produces neural network predictions of kinase-specific eukaryotic protein phosphorylation sites. Currently NetPhosK covers the following kinases: PKA, PKC, PKG, CKII, Cdc2, CaM-II, ATM, DNA PK, Cdk5, p38 MAPK, GSK3, CKI, PKB, RSK, INSR, EGFR and Src (Blom, et al., 2004).
12. PredPhospho 1.0: implemented with the SVM algorithm, it can predict kinase-specific phosphorylation sites for 4 kinase groups and 4 kinase families, respectively (Kim, et al., 2004).
13. PredPhospho 2.0: an enhanced version of the PredPhospho predictor, still implemented with the SVM algorithm, for 7 kinase groups and 18 kinase families, respectively (Ryu, et al., 2009).
14. KinasePhos 1.0: predicts kinase-specific phosphorylation sites within given protein sequences. A profile hidden Markov model (HMM) is learned for each group of sequences surrounding the phosphorylation residues (Huang, et al., 2005).
15. KinasePhos 2.0: a new version of the kinase-specific phosphorylation site prediction tool, based on sequence-based amino acid coupling-pattern analysis and solvent accessibility as new features for an SVM (support vector machine) (Wong, et al., 2007).
16. PhoScan: predicts kinase-specific phosphorylation sites from sequence features using a log-odds ratio approach (Li, et al., 2007).
17. pkaPS: prediction of protein kinase A phosphorylation sites using the simplified kinase binding model (Neuberger, et al., 2007).
18. CRPhos 0.8: prediction of kinase-specific phosphorylation sites using conditional random fields. Its source code is free for academic research and can be compiled on Linux/Unix (Dang, et al., 2008).
19. AutoMotif 2.0: allows for the identification of PTM (post-translational modification) sites, including phosphorylation sites, in proteins. For the AutoMotif Server 2.0, a support vector machine (SVM) was trained for each type of PTM separately on proteins from the Swiss-Prot database (version 42.0) (Plewczynski, et al., 2005; Plewczynski, et al., 2008).
20. MetaPredPS: meta-predictors make predictions by organizing and processing the predictions produced by several other predictors in a defined problem domain (Wan, et al., 2008).
21. SMALI: searches for peptide ligands in human proteins that are likely to bind to SH2 domains (Huang, et al., 2008; Li, et al., 2008).
22. NetPhorest: a non-redundant collection of 125 sequence-based classifiers for linear motifs in phosphorylation-dependent signaling. The collection contains both family-based and gene-specific classifiers (Miller, et al., 2008).
23. SiteSeek: is trained using a novel compact evolutionary and hydrophobicity profile to detect possible protein phosphorylation sites for a target sequence (Yoo, et al., 2008). The tool is not available.


24. PostMod: a prediction server for phosphorylation sites. The authors combined physicochemical information, motif information, and evolutionary information by simply comparing sequence similarities, and can predict phosphorylation sites for 48 different kinases (Jung, et al., 2010).
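
Several of the tools above (for example ScanProsite, PhosphoMotif Finder and ScanSite) report where known motifs occur in a query sequence. The following Python sketch illustrates that idea only; it is not the code or the motif library of any listed tool. The two patterns are simplified, commonly cited consensus forms (R-R-x-S/T for PKA, S/T-x-x-D/E for CK2) chosen for illustration, and the names `MOTIFS` and `scan_motifs` are hypothetical.

```python
import re

# Illustrative consensus patterns (assumption: simplified textbook forms).
# The inner capturing group marks the candidate phospho-acceptor residue.
MOTIFS = {
    "PKA-like": r"RR.([ST])",
    "CK2-like": r"([ST])..[DE]",
}


def scan_motifs(sequence, motifs=MOTIFS):
    """Return (motif_name, site_position, matched_text) for every match.

    site_position is the 1-based position of the S/T residue. A lookahead
    wrapper is used so that overlapping occurrences are all reported.
    """
    hits = []
    for name, pattern in motifs.items():
        for m in re.finditer(f"(?=({pattern}))", sequence):
            # group(1) is the whole motif, group(2) the acceptor residue.
            hits.append((name, m.start(2) + 1, m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for hit in scan_motifs("AARRASLEDE"):
        print(hit)
```

Plain pattern matching of this kind reports every occurrence and therefore many false positives, which is one reason the trained predictors in this section exist.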


<4> Miscellaneous tools:

1. DOG 1.0: prepares publication-quality figures of protein domain structures. The scale of a protein domain and the position of a functional motif/site are precisely calculated (Ren, et al., 2009).
2. Motif-X: a software tool designed to extract overrepresented patterns from any sequence data set. The algorithm is an iterative strategy which builds successive motifs through comparison to a dynamic statistical background (Schwartz and Gygi, 2005).
3. Scan-X: a software tool designed to find motifs (identified using Motif-X) within any sequence data set. The first large-scale scan was performed using all available human, mouse, fly and yeast phosphorylation and acetylation data to search for undiscovered sites (Schwartz, et al., 2008).
4. MoDL: finds multiple motifs in a set of phosphorylated peptides (Ritz, et al., 2009).
5. PhosphoBlast: allows the user to submit a protein query to search against the curated dataset of phosphorylated peptides (Wang and Klemke, 2008).
6. RLIMS-P: a rule-based text-mining program specifically designed to extract protein phosphorylation information on protein kinases, substrates and phosphorylation sites from abstracts (Hu, et al., 2005; Yuan, et al., 2006).
7. KEA: Kinase enrichment analysis (KEA) is a web-based tool with an underlying database that lets users link lists of mammalian proteins/genes with the kinases that phosphorylate them (Lachmann and Ma'ayan, 2009).
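
Linking a protein list to kinases, as described for KEA (entry 7 above), is usually framed as an over-representation question: does the list contain more known substrates of a kinase than expected by chance? One common way to score this is a one-sided hypergeometric test, sketched below in Python. Whether KEA uses exactly this statistic is not stated here, so treat the choice of test, the function name `hypergeom_enrichment_p` and all parameter names as assumptions.

```python
from math import comb


def hypergeom_enrichment_p(background_size, kinase_substrates, list_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    background_size   : number of proteins in the background set
    kinase_substrates : background proteins annotated as substrates of the kinase
    list_size         : number of proteins in the user's list
    overlap           : list proteins that are substrates of the kinase
    """
    total = comb(background_size, list_size)
    upper = min(kinase_substrates, list_size)
    tail = sum(
        comb(kinase_substrates, k)
        * comb(background_size - kinase_substrates, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background proteins, 150 substrates of one
    # kinase, a 300-protein list containing 12 of those substrates.
    print(hypergeom_enrichment_p(20000, 150, 300, 12))
```

When many kinases are tested against the same list, the resulting p-values would normally be corrected for multiple testing.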


<5> Detection of potential phosphorylation sites from mass spectrometry data:

1. PhosphoScore: a phosphorylation assignment program that is compatible with all levels of tandem mass spectrometry spectra (MSn) generated through the Bioworks/Sequest platform. The program utilizes a “cost function” which takes into account both the match quality and normalized intensity of observed spectral peaks compared to a theoretical spectrum. PhosphoScore was written in Java (Ruttenberg, et al., 2008).
2. Ascore: measures the probability of correct phosphorylation site localization based on the presence and intensity of site-determining ions in MS/MS spectra (Beausoleil, et al., 2006).
3. Colander: a probability-based support vector machine algorithm for automatic screening of CID spectra of phosphopeptides prior to database search (Lu, et al., 2008).
4. DeBunker: SVM-based software that can automatically validate phosphopeptide identifications from tandem mass spectra (Lu, et al., 2007).
5. APIVASE 2.2: was developed for phosphopeptide validation by combining the information obtained from MS2 spectra and their corresponding neutral loss MS3 spectra (Jiang, et al., 2008).
6. InsPecT: provides a new scoring function developed for phosphorylated peptide tandem mass spectra from ion-trap instruments, without the need for manual validation (Payne, et al., 2008).
7. Phosphopeptide FDR Estimator: is designed for the analysis of phosphopeptide LC-MS/MS data (Du, et al., 2008). The tool is not available.


8. PhosTShunter: a fast and reliable tool to detect phosphorylated peptides in liquid chromatography Fourier transform tandem mass spectrometry data sets (Kocher, et al., 2006). The tool is not available.


9. PhosphoScan: a probability-based method for phosphorylation site prediction using MS2/MS3 pair information (Wan, et al., 2008). The tool is not available.


10. ArMone: a new scoring function was developed for phosphorylated peptide tandem mass spectra for ion-trap instruments, without the need for manual validation (Jiang, et al., 2010).
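
Probability-based site localization, as described for Ascore (entry 2 above), can be illustrated with a simplified Python sketch. It assumes a binomial model in which each site-determining ion matches a random peak with probability `p_random`, converts the chance probability to a score with -10*log10(P), and reports the difference between the two best candidate sites. The function names, the parameters and the default `p_random` are assumptions for illustration, not the published implementation.

```python
from math import comb, log10


def binomial_tail(n_trials, n_matched, p):
    """P(X >= n_matched) for X ~ Binomial(n_trials, p)."""
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_matched, n_trials + 1)
    )


def localization_score(n_ions, matched_best, matched_second, p_random=0.05):
    """Difference of -10*log10(P) between the two best candidate sites.

    n_ions         : number of site-determining ions that distinguish the sites
    matched_best   : ions matched in the spectrum for the best candidate
    matched_second : ions matched for the second-best candidate
    p_random       : assumed chance of matching one ion at random
    """
    score_best = -10 * log10(binomial_tail(n_ions, matched_best, p_random))
    score_second = -10 * log10(binomial_tail(n_ions, matched_second, p_random))
    return score_best - score_second


if __name__ == "__main__":
    # Hypothetical spectrum: 6 distinguishing ions, 5 matched for one
    # candidate site and 1 for the other.
    print(round(localization_score(6, 5, 1), 2))
```

A larger difference means the spectrum supports one site much more strongly than the alternative; a difference of zero means the two sites cannot be told apart.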