Function (Model):

Loss function:

The number of times f gets incorrect results on the training data.
Find the best function:
Example: Perceptron, SVM
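The loss described above simply counts misclassified training examples. Because that count is not differentiable, it cannot be minimized directly with gradient descent, which is why methods such as the Perceptron and SVM are listed. A minimal sketch of the loss follows; the threshold function `f`, its coefficients, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def zero_one_loss(f, X, y):
    """Count how many training examples f classifies incorrectly."""
    predictions = np.array([f(x) for x in X])
    return int(np.sum(predictions != y))

# Hypothetical model: output class 1 if g(x) > 0, otherwise class 2.
def f(x):
    g = x[0] + x[1] - 1.0  # illustrative linear score, not from the source
    return 1 if g > 0 else 2

# Toy training set (assumed values).
X = np.array([[2.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.2, 0.3]])
y = np.array([1, 2, 2, 2])

loss = zero_one_loss(f, X, y)  # the third example is misclassified
```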
Classification as Regression?
Binary classification as an example:
Training: Class 1 means the target is 1; Class 2 means the target is -1
Testing: closer to 1 → class 1; closer to -1 → class 2

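Regression penalizes the examples that are “too correct”: points far on the correct side of the boundary produce outputs much larger than the target, and the squared error treats that as a mistake.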
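Multiple classes: Class 1 means the target is 1; Class 2 means the target is 2; Class 3 means the target is 3, and so on. This is problematic because it implies an ordering and a distance between classes (class 1 closer to class 2 than to class 3) that may not exist.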
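The “too correct” effect can be reproduced in one dimension. The sketch below fits a least-squares line to targets +1 and -1, then adds two class 1 points far on the correct side; all data values are made up for illustration, and `fit_boundary` is a hypothetical helper.

```python
import numpy as np

def fit_boundary(x, y):
    """Fit y = w*x + b by least squares and return the x where the line crosses 0."""
    w, b = np.polyfit(x, y, deg=1)
    return -b / w

# Class 1 (target +1) at x = 1, 2; class 2 (target -1) at x = -1, -2.
x_base = np.array([1.0, 2.0, -1.0, -2.0])
y_base = np.array([1.0, 1.0, -1.0, -1.0])
boundary_before = fit_boundary(x_base, y_base)

# Add two class 1 points that are far on the correct side ("too correct").
x_far = np.append(x_base, [20.0, 21.0])
y_far = np.append(y_base, [1.0, 1.0])
boundary_after = fit_boundary(x_far, y_far)

# The boundary moves toward class 1, so the class 1 point at x = 1
# now falls on the class 2 side.
```

With the symmetric data the boundary sits at 0. After the two distant points are added, the line flattens to keep their outputs near 1 and the boundary moves past x = 1.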

An item is drawn from one of the boxes. Which box does it come from?

Estimating the probabilities from training data

Given an x, which class does it belong to?
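The question above is answered with Bayes' rule: the posterior of class 1 is P(x|C1)P(C1) divided by P(x|C1)P(C1) + P(x|C2)P(C2). A minimal sketch, with hypothetical probability values that are not taken from the slides:

```python
def posterior_class1(p_x_given_c1, p_c1, p_x_given_c2, p_c2):
    """Bayes' rule for two classes: P(C1 | x)."""
    evidence = p_x_given_c1 * p_c1 + p_x_given_c2 * p_c2  # P(x)
    return p_x_given_c1 * p_c1 / evidence

# Assumed values for illustration only.
p = posterior_class1(p_x_given_c1=0.8, p_c1=2 / 3,
                     p_x_given_c2=0.4, p_c2=1 / 3)
predicted_class = 1 if p > 0.5 else 2
```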

Input: vector x; output: the probability density at x (informally, how likely it is to sample x)
The shape of the function is determined by the mean μ and covariance matrix Σ
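For reference, the multivariate Gaussian density in D dimensions is exp(-(x-μ)ᵀ Σ⁻¹ (x-μ) / 2) divided by sqrt((2π)^D |Σ|). A minimal NumPy sketch follows; the function name and the test values are illustrative choices.

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Density of a multivariate Gaussian with mean mu and covariance sigma at x."""
    d = mu.shape[0]
    diff = x - mu
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(sigma))
    # solve(sigma, diff) computes sigma^{-1} diff without forming the inverse.
    exponent = -0.5 * diff @ np.linalg.solve(sigma, diff)
    return np.exp(exponent) / norm

mu = np.array([0.0, 0.0])
sigma = np.eye(2)
peak = gaussian_density(mu, mu, sigma)                   # density at the mean
tail = gaussian_density(np.array([3.0, 3.0]), mu, sigma)  # density far from the mean
```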

A Gaussian with any mean μ and covariance matrix Σ could have generated these points, but some choices of μ and Σ make the points more likely than others

Likelihood of a Gaussian with mean μ and covariance matrix Σ = the probability that the Gaussian generates the samples x^1, x^2, x^3, ..., x^79
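Assuming the samples are drawn independently, the likelihood is the product of the densities at each sample. The maximizing parameters have a closed form: the sample mean, and the average of (x^n - μ)(x^n - μ)ᵀ. The sketch below checks this numerically on synthetic data that stands in for the 79 training points; the real data is not reproduced here.

```python
import numpy as np

def fit_gaussian_mle(X):
    """Closed-form maximum-likelihood estimates of mean and covariance."""
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / X.shape[0]  # divide by N, not N - 1
    return mu, sigma

def log_likelihood(X, mu, sigma):
    """Sum of log densities (the log of the product of densities)."""
    d = X.shape[1]
    centered = X - mu
    _, logdet = np.linalg.slogdet(sigma)
    mahal = np.einsum("ni,ij,nj->n", centered, np.linalg.inv(sigma), centered)
    return float(np.sum(-0.5 * (d * np.log(2.0 * np.pi) + logdet + mahal)))

rng = np.random.default_rng(0)
X = rng.normal(size=(79, 2))  # synthetic stand-in for 79 two-feature samples

mu_hat, sigma_hat = fit_gaussian_mle(X)
best = log_likelihood(X, mu_hat, sigma_hat)
shifted = log_likelihood(X, mu_hat + 0.5, sigma_hat)   # worse mean
inflated = log_likelihood(X, mu_hat, 2.0 * sigma_hat)  # worse covariance
```

The log-likelihood is used instead of the raw product to avoid numerical underflow; both are maximized by the same μ and Σ.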

Testing data: 47% accuracy
All: hp, att, sp att, de, sp de, speed (6 features)

Modifying Model:

Function Set (Model):

Goodness of a function:
The mean μ and covariance Σ that maximize the likelihood (the probability of generating the data)
Find the best function: easy
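The three steps can be put together as a small generative classifier: fit one Gaussian per class by maximum likelihood, estimate the prior from class counts, and compare posteriors. This is a sketch under those assumptions; the two synthetic clusters and all function names are hypothetical.

```python
import numpy as np

def fit_gaussian(X):
    """Maximum-likelihood mean and covariance for one class."""
    mu = X.mean(axis=0)
    centered = X - mu
    return mu, centered.T @ centered / X.shape[0]

def gaussian_density(x, mu, sigma):
    """Multivariate Gaussian density at x."""
    d = mu.shape[0]
    diff = x - mu
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm

def posterior_class1(x, X1, X2):
    """P(C1 | x) from per-class Gaussians and count-based priors."""
    mu1, sigma1 = fit_gaussian(X1)
    mu2, sigma2 = fit_gaussian(X2)
    prior1 = X1.shape[0] / (X1.shape[0] + X2.shape[0])
    joint1 = gaussian_density(x, mu1, sigma1) * prior1
    joint2 = gaussian_density(x, mu2, sigma2) * (1.0 - prior1)
    return joint1 / (joint1 + joint2)

# Synthetic, well-separated clusters (assumed data for illustration).
rng = np.random.default_rng(1)
X1 = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
X2 = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))

p_near_class1 = posterior_class1(np.array([2.0, 2.0]), X1, X2)
p_near_class2 = posterior_class1(np.array([-2.0, -2.0]), X1, X2)
```

A point is assigned to class 1 when the posterior exceeds 0.5, and to class 2 otherwise.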
Probability Distribution

Posterior Probability:
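A standard way to rewrite the two-class posterior is as a sigmoid of a log-odds term. The derivation below is a sketch using the usual notation (C_1, C_2 for the classes, σ for the sigmoid function, z for the log-odds), which is assumed here and not copied from the slides.

```latex
\begin{aligned}
P(C_1 \mid x)
  &= \frac{P(x \mid C_1)\,P(C_1)}{P(x \mid C_1)\,P(C_1) + P(x \mid C_2)\,P(C_2)} \\
  &= \frac{1}{1 + \dfrac{P(x \mid C_2)\,P(C_2)}{P(x \mid C_1)\,P(C_1)}}
   = \frac{1}{1 + \exp(-z)} = \sigma(z), \\
z &= \ln \frac{P(x \mid C_1)\,P(C_1)}{P(x \mid C_2)\,P(C_2)}.
\end{aligned}
```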

People usually believe that discriminative models are better.
Benefits of generative models:
With the assumption of a probability distribution, less training data is needed.
With the assumption of a probability distribution, the model is more robust to noise.
Priors and class-dependent probabilities can be estimated from different sources.
