Experimentally determined phosphorylation hotspots: hotspots-experimental_w17_d10.txt
Resulting positive and negative data sets as sequences and vectors:
negative sequences
positive sequences
all resulting vectors
vectors training set 1
vectors training set 2
vectors training set 3
vectors training set 4
vectors training set 5
vectors test set 1
vectors test set 2
vectors test set 3
vectors test set 4
vectors test set 5
A prediction score was determined for each of the 12,866,960 windows within the Arabidopsis proteome. A window can receive a positive score even if it contains no phosphorylatable amino acids, because the fraction of S, T and Y was not used as an additional parameter in the training data set. This was done to keep this step unbiased; the S, T and Y content was only considered during consolidation.
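As a minimal sketch of the windowing step (not the authors' code), the Python below enumerates overlapping windows. It assumes a window size of 17 residues and a step of one residue, as in the example further down (1-17, 2-18, ...), with 1-based inclusive positions. `sliding_windows` and `sty_count` are hypothetical helpers, and the scoring model itself is not reproduced. Every window is enumerated regardless of its S/T/Y content, which is why that content has to be checked separately later.

```python
from typing import Iterator, Tuple

WINDOW_SIZE = 17  # assumption: window length taken from the example and the "w17" file suffix


def sliding_windows(sequence: str, size: int = WINDOW_SIZE) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, subsequence) for every window, 1-based inclusive, step 1."""
    for i in range(len(sequence) - size + 1):
        yield i + 1, i + size, sequence[i:i + size]


def sty_count(window: str) -> int:
    """Number of phosphorylatable residues (S, T or Y) in a window."""
    return sum(window.count(aa) for aa in "STY")


protein = "MASTSPKLGYEEAVKRTSSPQ"  # hypothetical 21-residue sequence
windows = list(sliding_windows(protein))
print(len(windows))  # 5 windows: 21 - 17 + 1
print(windows[0][:2], sty_count(windows[0][2]))  # (1, 17) 5
```

A protein of length L yields L - 17 + 1 windows, so proteins shorter than 17 residues contribute none.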
Consecutive windows with positive scores (not interrupted by a stretch of negative scores) were consolidated into one "run". Each run was then checked against the conditions for a phosphorylation hotspot, which was defined as containing at least 4 phosphorylatable amino acids (S, T or Y) that are no more than 10 amino acids apart.
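The two steps can be illustrated with the following sketch, which rests on several assumptions: window scores are given as a list indexed by window start (step 1), "positive" means a score above the threshold (0 here; the > 1 threshold works the same way), and the hotspot condition is read as at least four S/T/Y residues in which each one lies at most 10 positions from the next. That distance reading is an interpretation of the definition above, and `find_runs` and `is_hotspot` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple


def find_runs(scores: Sequence[float], threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Group consecutive windows with score > threshold into runs.

    Returns (first_window_index, last_window_index) pairs, 0-based inclusive.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i  # a new run begins
        elif start is not None:
            runs.append((start, i - 1))  # a non-positive score ends the run
            start = None
    if start is not None:
        runs.append((start, len(scores) - 1))  # run extending to the last window
    return runs


def is_hotspot(segment: str, min_sty: int = 4, max_gap: int = 10) -> bool:
    """True if the segment has min_sty S/T/Y residues, each at most max_gap positions from the next."""
    positions = [i for i, aa in enumerate(segment) if aa in "STY"]
    # check every group of min_sty consecutive S/T/Y residues
    for k in range(len(positions) - min_sty + 1):
        group = positions[k:k + min_sty]
        if all(b - a <= max_gap for a, b in zip(group, group[1:])):
            return True
    return False


print(find_runs([0.5, 1.2, 0.3, 0.8, 0.1, -0.4, 0.9, 0.2, 0.7]))  # [(0, 4), (6, 8)]
print(is_hotspot("SAASAATAAY"))  # True
```

In this sketch `is_hotspot` would be applied to the amino acid sequence covered by each run.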
Because each window spans 17 amino acids, two "runs" overlap whenever the first window of the second run starts fewer than 17 positions after the last window of the first run. For example, if the windows at amino acid positions 1-17, 2-18, 3-19, 4-20 and 5-21 receive positive scores, they form one run (positions 1-21). If the next window (6-22) has a negative score and the following windows again form a stretch of positive scores (7-23, 8-24, 9-25, i.e. run 7-25), the two runs overlap at positions 7-21.
Overlapping runs were consolidated into hotspot regions. In the example above, the two runs 1-21 and 7-25 would be combined into a single hotspot spanning positions 1-25.
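The overlap test and the consolidation of runs can be sketched as a standard interval merge. This is an illustration under assumptions, not the original implementation: positions are 1-based and inclusive, windows are 17 residues long, and only runs that share at least one residue are merged (runs that are merely adjacent stay separate). `run_span`, `overlap` and `merge_runs` are hypothetical names. The last lines reproduce the example from the text.

```python
from typing import List, Optional, Tuple

WINDOW_SIZE = 17  # window length used in the example
Span = Tuple[int, int]  # 1-based, inclusive residue range


def run_span(first_start: int, last_start: int, size: int = WINDOW_SIZE) -> Span:
    """Residue range covered by a run whose windows start at first_start..last_start."""
    return first_start, last_start + size - 1


def overlap(a: Span, b: Span) -> Optional[Span]:
    """Shared residue range of two spans, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def merge_runs(runs: List[Span]) -> List[Span]:
    """Merge overlapping run spans into hotspot regions, sorted by start."""
    merged: List[Span] = []
    for start, end in sorted(runs):
        if merged and start <= merged[-1][1]:
            # overlaps the current region: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


run_a = run_span(1, 5)  # windows 1-17 ... 5-21
run_b = run_span(7, 9)  # windows 7-23 ... 9-25
print(run_a, run_b)  # (1, 21) (7, 25)
print(overlap(run_a, run_b))  # (7, 21)
print(merge_runs([run_a, run_b]))  # [(1, 25)]
```

Sorting the spans first means each run only has to be compared with the most recently opened hotspot region.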
Predicted hotspots (score > 0)
Predicted hotspots (score > 1)
Count | Score > 0 | Score > 1 |
Windows (total) | 12,866,960 | 12,866,960 |
Windows above threshold | 945,670 | 160,780 (sic!) |
Valid windows above threshold (at least 4 S/T/Y) | 592,681 | 102,664 |
Valid runs above threshold | 1,563,102 | 338,681 |
Consolidated hotspots | 54,329 | 13,677 |
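To show how strongly each filter step reduces the data, the ratios below are computed directly from the table counts. This is plain arithmetic on the reported numbers, assuming "valid" means containing at least four S/T/Y residues (as stated for the score > 1 column). `COUNTS` and `shares` are illustrative names.

```python
from typing import Dict

TOTAL_WINDOWS = 12_866_960  # total number of windows, from the table

# Window counts per score threshold, copied from the table
COUNTS: Dict[str, Dict[str, int]] = {
    "score > 0": {"windows": 945_670, "valid_windows": 592_681},
    "score > 1": {"windows": 160_780, "valid_windows": 102_664},
}


def shares(c: Dict[str, int], total: int = TOTAL_WINDOWS) -> Dict[str, float]:
    """Fraction of all windows above the threshold, and fraction of those with >= 4 S/T/Y."""
    return {
        "positive_of_all": c["windows"] / total,
        "valid_of_positive": c["valid_windows"] / c["windows"],
    }


for label, c in COUNTS.items():
    s = shares(c)
    print(f"{label}: {s['positive_of_all']:.2%} of all windows; "
          f"{s['valid_of_positive']:.1%} of these contain >= 4 S/T/Y")
```

With these counts, about 7.35% of all windows score above 0 and about 1.25% above 1; in both cases roughly 63% of those windows (62.7% and 63.9%) contain at least four S/T/Y residues.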
An overview of the score distributions (i.e. the number of windows that received each score) can be found here: window-score-distribution.ods