The Arabidopsis Protein Phosphorylation Site Database

Prediction of Phosphorylation Hotspots

Training data set

Experimentally determined phosphorylation hotspots: hotspots-experimental_w17_d10.txt

Resulting positive and negative data sets as sequences and vectors:

negative sequences
positive sequences
all resulting vectors
vectors training set 1
vectors training set 2
vectors training set 3
vectors training set 4
vectors training set 5
vectors test set 1
vectors test set 2
vectors test set 3
vectors test set 4
vectors test set 5

Prediction

Raw data of predictions

For each of the 12,866,960 windows within the Arabidopsis proteome, a prediction score was determined. A window can receive a positive score even if it contains no phosphorylatable amino acids, because the fractions of S, T and Y were not used as additional parameters in the training data set. This was done to keep this step unbiased; the S, T and Y content was considered only during consolidation.

All 12,866,960 scores
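As a rough illustration of what "one score per window" means, the sketch below enumerates 17-residue windows over a sequence and counts S, T and Y in each. The window size of 17 is taken from the consolidation description; the step of one residue is inferred from the 1-17, 2-18, ... example there. The sequence and the function names are hypothetical, and the actual scoring model is not shown.

```python
WINDOW_SIZE = 17          # window length stated in the consolidation section
PHOSPHO = set("STY")      # phosphorylatable amino acids


def sliding_windows(sequence, size=WINDOW_SIZE):
    """Yield (start, end, subsequence) using 1-based inclusive positions.

    Assumes a step of one residue between consecutive windows.
    """
    for i in range(len(sequence) - size + 1):
        yield i + 1, i + size, sequence[i:i + size]


def sty_count(window):
    """Count S, T and Y residues in a window."""
    return sum(1 for aa in window if aa in PHOSPHO)


# Hypothetical 21-residue sequence, for illustration only.
example = "MSTAYGKLSPRTEVDSAQLYK"
windows = list(sliding_windows(example))
```

A 21-residue sequence yields five windows (1-17 to 5-21). A window's S/T/Y count plays no part in scoring it and only becomes relevant during consolidation.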

Consolidation step 1: Runs

Consecutive windows with positive scores (not interrupted by a stretch of negative scores) were consolidated into one "run". Each "run" was then checked against the conditions for a phosphorylation hotspot: a phosphorylation hotspot was defined as containing at least 4 phosphorylatable amino acids (S, T or Y) that are no further apart than 10 amino acids.
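The hotspot condition can be sketched as a small check on a sequence region. This is only one possible reading of the definition: it assumes that "not further apart than 10 amino acids" refers to the distance between neighbouring S/T/Y residues, and the function name and parameters are hypothetical.

```python
PHOSPHO = set("STY")  # phosphorylatable amino acids


def meets_hotspot_condition(region, min_sites=4, max_distance=10):
    """Return True if `region` contains a chain of at least `min_sites`
    S/T/Y residues whose neighbouring members are at most `max_distance`
    positions apart (assumed interpretation of the definition).
    """
    positions = [i for i, aa in enumerate(region) if aa in PHOSPHO]
    chain = 1 if positions else 0
    for prev, cur in zip(positions, positions[1:]):
        # Extend the chain if the next site is close enough, else restart.
        chain = chain + 1 if cur - prev <= max_distance else 1
        if chain >= min_sites:
            return True
    return chain >= min_sites
```

For example, `meets_hotspot_condition("SAAATAAAYAAAS")` is true (four sites, each four positions apart), while a region with only three S/T/Y residues fails the check.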

If two "runs" were separated by fewer than 17 amino acids (one window size), they overlapped. For example: if the windows at amino acid positions 1-17, 2-18, 3-19, 4-20 and 5-21 were predicted with a positive score, they form a "run" (positions 1-21). If the next window (6-22) has a negative score and the following windows again form a stretch of positive scores (7-23, 8-24, 9-25; "run" 7-25), these two runs overlap at positions 7-21.

Runs for SVM (score>0)

Runs for SVM (score>1)

Consolidation step 2: From Runs to Hotspots

Overlapping "runs" were consolidated into hotspot regions. In the example above, the two "runs" 1-21 and 7-25 would be combined into a single hotspot 1-25.
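The two consolidation steps can be sketched as follows, assuming window scores are available as a list in which index i holds the score of the window starting at position i + 1. The score values, function names and list layout are hypothetical; only the window size of 17 and the thresholds come from the text. The per-run hotspot validity check is left out to keep the sketch short.

```python
WINDOW_SIZE = 17  # window length from the text


def build_runs(window_scores, threshold=0.0, size=WINDOW_SIZE):
    """Group consecutive windows scoring above `threshold` into runs.

    Returns a list of (start, end) tuples, 1-based inclusive positions.
    """
    runs = []
    run_start = None
    run_end = None
    for i, score in enumerate(window_scores):
        start = i + 1
        if score > threshold:
            if run_start is None:
                run_start = start
            run_end = start + size - 1  # run extends to this window's end
        elif run_start is not None:
            runs.append((run_start, run_end))
            run_start = None
    if run_start is not None:
        runs.append((run_start, run_end))
    return runs


def merge_runs(runs):
    """Merge runs that share at least one position into hotspot regions."""
    hotspots = []
    for start, end in sorted(runs):
        if hotspots and start <= hotspots[-1][1]:
            prev_start, prev_end = hotspots[-1]
            hotspots[-1] = (prev_start, max(prev_end, end))
        else:
            hotspots.append((start, end))
    return hotspots


# Made-up scores: windows 1-5 positive, window 6 negative, windows 7-9 positive.
scores = [0.8, 0.5, 0.3, 0.9, 0.2, -0.4, 0.6, 0.7, 0.1]
runs = build_runs(scores)
hotspots = merge_runs(runs)
```

With these scores the windows starting at positions 1-5 and 7-9 form two runs that share positions 7-21, so they merge into the single hotspot 1-25. Passing `threshold=1.0` would correspond to the stricter score > 1 variant.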

predicted Hotspots (Score > 0)

predicted Hotspots (Score > 1)

Statistics

Number of windows: 12866960
Windows with score >0: 945670
Valid windows with score >0: 592681
Valid runs with score >0: 1563102
Consolidated hotspots with score >0: 54329
Windows with score >1: 160780 (sic!)
Valid windows with score >1 (at least 4 STY): 102664
Valid runs with score >1: 338681
Consolidated hotspots with score >1: 13677

An overview of the score distributions (i.e. how many windows were predicted with which score) can be found here: window-score-distribution.ods