Function (Model):
Loss function:
The number of times f gets incorrect results on the training data.
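As a minimal sketch of this loss, the count of training errors could be computed as below. The classifier `f` and the toy data are hypothetical and invented purely for illustration.

```python
def zero_one_loss(f, xs, ys):
    """Count how many training examples f classifies incorrectly."""
    return sum(1 for x, y in zip(xs, ys) if f(x) != y)

# Hypothetical toy classifier: class 1 if the first feature is positive, else class 2.
def f(x):
    return 1 if x[0] > 0 else 2

# Made-up training data: feature vectors and their true class labels.
xs = [(0.5, 1.0), (-1.2, 0.3), (2.0, -0.7), (-0.1, 0.9)]
ys = [1, 2, 2, 2]

loss = zero_one_loss(f, xs, ys)
print(loss)
```

This count is piecewise constant in the parameters of f, so it cannot be minimized by gradient descent directly, which is why methods such as the Perceptron and SVM are listed for the last step.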
Find the best function:
Example: Perceptron, SVM
Classification as Regression?
Binary classification as an example:
Training: Class 1 means the target is 1; Class 2 means the target is -1
Testing: closer to 1 → class 1; closer to -1 → class 2
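Regression penalizes the examples that are “too correct”
Multiple classes: Class 1 means the target is 1; Class 2 means the target is 2; Class 3 means the target is 3, and so on. This is problematic because it implies an ordering and a distance between classes that may not exist.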
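The "too correct" penalty can be illustrated with a small sketch. It assumes a squared-error regression loss and uses made-up model outputs; neither is stated in the original.

```python
def squared_error(output, target):
    """Regression loss for a single example (assumed here to be squared error)."""
    return (output - target) ** 2

def predict(output):
    """Testing rule: closer to 1 -> class 1, closer to -1 -> class 2."""
    return 1 if abs(output - 1) < abs(output + 1) else 2

target = 1  # the example belongs to class 1

# Hypothetical model outputs for two class 1 examples.
near_boundary = squared_error(0.1, target)   # barely on the class 1 side
too_correct = squared_error(10.0, target)    # far on the class 1 side

print(predict(0.1), near_boundary)
print(predict(10.0), too_correct)
```

Both outputs are classified as class 1, yet the output far from the boundary receives a much larger loss, so regression would shift the boundary to reduce it.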
From one of the boxes, where does it come from?
Estimating the Probabilities From training data
Given an x, which class does it belong to
Input: vector x, output: probability (density) of sampling x
The shape of the function is determined by the mean μ and covariance matrix Σ
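A minimal sketch of such a function, assuming the standard multivariate Gaussian density and using NumPy, might look like this. The function name `gaussian_density` is chosen here for illustration.

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Density of a D-dimensional Gaussian with mean mu and covariance sigma at x."""
    d = x.shape[0]
    diff = x - mu
    # Normalization constant: 1 / sqrt((2*pi)^D * |Sigma|)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), computed without an explicit inverse
    exponent = -0.5 * diff @ np.linalg.solve(sigma, diff)
    return norm * np.exp(exponent)

# Same input x, two different (mu, Sigma) pairs give two different densities.
x = np.array([1.0, 0.0])
p_standard = gaussian_density(x, np.zeros(2), np.eye(2))
p_shifted = gaussian_density(x, np.array([1.0, 0.0]), np.eye(2))
print(p_standard, p_shifted)
```

Changing μ moves the peak of the density, and changing Σ changes its spread and orientation.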
A Gaussian with any mean μ and covariance matrix Σ can generate these points, but with different likelihoods
Likelihood of a Gaussian with mean μ and covariance matrix Σ = the probability that the Gaussian samples x^1, x^2, x^3, ……, x^79
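Written out, and assuming the 79 training points are sampled independently and that f_{μ,Σ} denotes the Gaussian density (notation introduced here for illustration), the likelihood would take the form:

```latex
L(\mu, \Sigma) = \prod_{n=1}^{79} f_{\mu,\Sigma}(x^n)
             = f_{\mu,\Sigma}(x^1)\, f_{\mu,\Sigma}(x^2)\, f_{\mu,\Sigma}(x^3) \cdots f_{\mu,\Sigma}(x^{79})
```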
Testing data: 47% accuracy
All: hp, att, sp att, de, sp de, speed (6 features)
Modifying Model:
Function Set (Model):
Goodness of a function:
The mean μ and covariance Σ that maximize the likelihood (the probability of generating the data)
Find the best function: easy
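One reason this step is easy is that the maximum-likelihood Gaussian has a closed form: the sample mean and the sample covariance divided by N. The sketch below assumes NumPy and uses synthetic data standing in for the 79 training points; `fit_gaussian` is a name chosen for illustration.

```python
import numpy as np

def fit_gaussian(X):
    """Maximum-likelihood mean and covariance for the rows of X (shape N x D)."""
    mu = X.mean(axis=0)
    centered = X - mu
    # Maximum likelihood divides by N, not N - 1.
    sigma = centered.T @ centered / X.shape[0]
    return mu, sigma

# Synthetic stand-in for one class of training data: 79 points, 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(79, 2))

mu, sigma = fit_gaussian(X)
print(mu)
print(sigma)
```

In a two-class setting, this fit would be repeated once per class, on that class's training examples only.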
Probability Distribution
Posterior Probability:
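For two classes, the posterior follows from Bayes' rule by combining the class priors with the class-dependent probabilities. The sketch below uses made-up numbers for all four inputs; they are not taken from the lecture data.

```python
def posterior_class1(px_c1, px_c2, prior_c1, prior_c2):
    """P(C1 | x) by Bayes' rule for a two-class generative model."""
    joint1 = px_c1 * prior_c1  # P(x | C1) * P(C1)
    joint2 = px_c2 * prior_c2  # P(x | C2) * P(C2)
    return joint1 / (joint1 + joint2)

# Hypothetical values: class-dependent densities at x and equal priors.
p = posterior_class1(px_c1=0.2, px_c2=0.1, prior_c1=0.5, prior_c2=0.5)

# Decision rule: output class 1 if P(C1 | x) > 0.5, otherwise class 2.
predicted_class = 1 if p > 0.5 else 2
print(p, predicted_class)
```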
Usually people believe the discriminative model is better
Benefits of the generative model
With the assumption of a probability distribution
less training data is needed
more robust to noise
Priors and class-dependent probabilities can be estimated from different sources.