
x_HingeMaster = x λ ≈ y.    [6]

To obtain λ, we minimized the quantity (x λ − y)². The least squares regression methodology is a standard one [82] which will not be derived here. The result is that:

λ = (xᵀ x)⁻¹ xᵀ y    [7]

Equation 7 can be said to train λ based on predictor output and gold standard annotation over some set of residues i. The best available value of λ is likely to be one fitted using the set of all residues in all proteins in the HAG, which we designate as {HAG}. That is to say, in Equation 7 we use x, y (i | i ∈ {HAG}) and obtain a particular value of λ, which we call λ_HAG.
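Equation 7 is the standard normal-equations solution to the least squares problem. As a minimal sketch (the matrix x of per-residue predictor outputs and the gold-standard vector y are synthetic stand-ins, not data from the paper), the fit can be computed with numpy:

```python
import numpy as np

# Synthetic stand-ins: n residues, k constituent predictors.
rng = np.random.default_rng(0)
n, k = 200, 4
x = rng.normal(size=(n, k))                        # predictor outputs, one column per predictor
true_lam = np.array([0.5, -0.2, 0.8, 0.1])         # hypothetical "true" weights
y = x @ true_lam + rng.normal(scale=0.01, size=n)  # gold-standard annotation

# Equation 7: lambda = (x^T x)^-1 x^T y
lam = np.linalg.inv(x.T @ x) @ x.T @ y

# In practice np.linalg.lstsq solves the same problem more stably
# than forming and inverting x^T x explicitly.
lam_stable, *_ = np.linalg.lstsq(x, y, rcond=None)
assert np.allclose(lam, lam_stable)
```

Both routes minimize (x λ − y)²; the explicit inverse mirrors the equation, while `lstsq` is the numerically preferred form.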

Cross-validating HingeMaster parameters

The HingeMaster cannot be tested on the same dataset used to train it: given sufficient predictors, it is possible to fit the results perfectly to the training data, with no guarantee that similar performance will be obtained on different proteins. This is referred to as overfitting. To avoid this, we train and validate the predictor by first randomly separating the 20 homologous pairs of proteins in HAG into a training set consisting of 15 of these pairs (30 total proteins) and a test set consisting of the remaining 5 pairs. The set of all residues in all proteins in the training set we call {TRAINING},
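The pair-level split described above can be sketched as follows. The 15/5 split of 20 homologous pairs comes from the text; the protein identifiers and the use of Python's `random` module for shuffling are illustrative assumptions:

```python
import random

# 20 homologous pairs; each pair contributes two proteins (hypothetical IDs).
pairs = [(f"protA_{i}", f"protB_{i}") for i in range(20)]

random.seed(42)               # fixed seed for reproducibility of the sketch
shuffled = pairs[:]
random.shuffle(shuffled)

# 15 pairs (30 proteins) for training, the remaining 5 pairs for testing.
train_pairs, test_pairs = shuffled[:15], shuffled[15:]

# {TRAINING}: all proteins (hence all their residues) from the training pairs.
training_set = [protein for pair in train_pairs for protein in pair]
test_set = [protein for pair in test_pairs for protein in pair]

assert len(training_set) == 30 and len(test_set) == 10
# Splitting by pair, not by protein, keeps homologs of training
# proteins out of the test set.
assert not set(training_set) & set(test_set)
```

Splitting at the level of homologous pairs rather than individual proteins matters: placing one homolog in training and its partner in testing would leak information and inflate the measured performance.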
