Andrew Ng's notes on learning theory
- goes over the bias / variance tradeoff.
- variance = the part of generalization error that comes from fitting the quirks of a finite training set; a high-variance model has low training error but a large test / generalization error.
- bias = the expected generalization error that remains even if the model is fit to a very large training set.
- proves that, with a sufficiently large training set, the training error of every hypothesis will be close to its generalization error (uniform convergence, via the Hoeffding bound plus a union bound).
- also gives an upper bound on the generalization error of the hypothesis with minimal training error, in terms of the best achievable error in the class and the number of available hypotheses k (finite hypothesis class).
- this bound is only logarithmic in k, the number of hypotheses.
- the training set size m that an algorithm requires in order to achieve a certain level of performance is the algorithm's sample complexity.
- shows that, even with an infinite hypothesis space, the number of training examples needed is (informally) at most roughly linear in the number of parameters of the model.
- goes over the Vapnik-Chervonenkis dimension, VC(H) = the size of the largest set that the hypothesis space H can shatter.
- H shatters a set S if it can realize every binary labeling of the points in S. see his diagram.
- In order to prove that VC(H) is at least d, one only needs to exhibit a single set of size d that H can shatter.
- There are more notes in the containing directory - http://www.stanford.edu/class/cs229/notes/
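The finite-hypothesis-class bound above can be made concrete. This is a minimal sketch (not from Ng's notes verbatim): by Hoeffding plus a union bound over k hypotheses, with probability at least 1 - delta every hypothesis's training error is within epsilon = sqrt(log(2k/delta) / (2m)) of its generalization error, so epsilon grows only like sqrt(log k).

```python
import math

def uniform_convergence_bound(m, k, delta):
    """Hoeffding + union bound for a finite hypothesis class of size k:
    with probability >= 1 - delta, every hypothesis's training error is
    within this epsilon of its generalization error."""
    return math.sqrt(math.log(2 * k / delta) / (2 * m))

# Logarithmic dependence on k: squaring the class size barely moves epsilon.
eps_small = uniform_convergence_bound(m=10_000, k=100, delta=0.05)
eps_big = uniform_convergence_bound(m=10_000, k=100**2, delta=0.05)
```

Comparing `eps_small` and `eps_big` shows the cost of a vastly larger hypothesis class is mild, which is the point of the "only logarithmic in k" remark.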
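The shattering definition can also be checked by brute force on a toy class. As a sketch (my own example, not one from the notes), take the class of intervals [a, b] on the real line, which labels a point 1 iff it lies inside the interval; this class has VC dimension 2, so it shatters any 2 distinct points but no set of 3.

```python
from itertools import product

def interval_labeling(a, b, xs):
    """Label each point 1 if it falls in [a, b], else 0."""
    return tuple(1 if a <= x <= b else 0 for x in xs)

def shatters(xs):
    """Check whether the class of intervals on the real line shatters xs:
    every binary labeling of xs must be realized by some interval [a, b].
    Endpoints drawn from the points, midpoints between them, and sentinels
    outside the range cover all distinct behaviors of the class on xs."""
    pts = sorted(xs)
    cuts = [pts[0] - 1] + [(p + q) / 2 for p, q in zip(pts, pts[1:])] + [pts[-1] + 1]
    cuts += pts
    realizable = {interval_labeling(a, b, xs) for a in cuts for b in cuts}
    return all(lab in realizable for lab in product((0, 1), repeat=len(xs)))
```

`shatters((0.0, 1.0))` is true, while `shatters((0.0, 1.0, 2.0))` fails because no interval realizes the labeling (1, 0, 1) — exhibiting one shattered set of size 2 proves VC is at least 2, exactly as the "at least d" bullet above describes.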