In the previous parts of the tutorial (part 1, part 2) we introduced quantitative indicators of classification model quality. In the next two parts we will take a closer look at a couple of graphical indicators. The first one is called the Confusion Matrix (the name “Contingency Table” is also used).
What is a Confusion Matrix?
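A Confusion Matrix is an N x N matrix, where N is the number of decision classes. Its rows correspond to the correct decision classes and its columns to the decisions made by the classifier. The number n_{i,j} at the intersection of the i-th row and the j-th column is the number of cases from the i-th class that have been classified as belonging to the j-th class. Correct classifications therefore lie on the main diagonal, and every off-diagonal entry counts one specific kind of error.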
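The definition above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation; the function name `confusion_matrix`, the label order, and the sample data are assumptions made for the example.

```python
def confusion_matrix(actual, predicted, labels):
    """Build an N x N matrix: rows = correct class, columns = classifier decision."""
    index = {label: k for k, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        # One observation from class a, classified as class p
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical data: 0 = good debtor, 1 = bad debtor
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

cm = confusion_matrix(actual, predicted, labels=[0, 1])
for row in cm:
    print(row)
```

With this sample data the diagonal entries count the correctly classified applicants, and the two off-diagonal entries count good debtors classified as bad and bad debtors classified as good.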
Forms of Confusion Matrices
Various forms of the Confusion Matrix make it easier to observe certain characteristics of the classification (e.g. the cost incurred by incorrect classifications).
- Numerical form – contains counts of observations assigned to particular classes.
- Percentage form – contains the percentages of observations assigned to particular classes, calculated as the count of observations assigned to the class divided by the total observation count.
- Gains and losses form – contains information about gains and losses due to correct and incorrect classification decisions.
In the gains and losses form, each cell of the Confusion Matrix contains the sum of the gains or costs resulting from the classification decisions counted in that cell.
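As a rough sketch, the percentage form and the gains and losses form can both be derived from the numerical form. The helper names and the per-decision payoff values below are hypothetical and chosen only for illustration; real gains and losses would come from the business problem.

```python
def percentage_form(counts):
    """Express each cell as a percentage of the total observation count."""
    total = sum(sum(row) for row in counts)
    return [[100.0 * c / total for c in row] for row in counts]


def gains_losses_form(counts, payoff):
    """Multiply each count by the gain (+) or loss (-) of a single such decision."""
    return [
        [c * v for c, v in zip(count_row, payoff_row)]
        for count_row, payoff_row in zip(counts, payoff)
    ]


# Hypothetical numerical form: rows = actual (good, bad), columns = predicted (good, bad)
counts = [[3, 1],
          [1, 3]]

# Hypothetical payoff per decision (assumed values, not from the tutorial):
# a good debtor accepted earns 100, a good debtor rejected loses 20,
# a bad debtor accepted loses 500, a bad debtor rejected costs nothing.
payoff = [[100, -20],
          [-500, 0]]

percentages = percentage_form(counts)
gains = gains_losses_form(counts, payoff)
net_result = sum(sum(row) for row in gains)

print(percentages)
print(gains)
print(net_result)
```

Summing all cells of the gains and losses form gives a single net figure, which is one way to compare classifiers when different errors have very different costs.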
Cut off point and the Confusion Matrix
The cut off point is a threshold value used to decide whether an observation belongs to a particular class.
if P(class(x) = 1) >= alpha, then assign x to class 1, otherwise assign x to the other class
alpha – the cut off point
P(class(x) = 1) – the probability that the given element x belongs to the class denoted by 1
For example, with a cut off point of 60%: if the probability (calculated by our classification model) that a given loan applicant will fail to repay the loan is greater than or equal to 60%, then the applicant is assigned to the class of bad debtors; otherwise he/she is assigned to the class of good debtors.
Different cut off points can be considered for the same problem (e.g. assessing creditworthiness), and each of them leads to a different confusion matrix. By analyzing these matrices, the optimal cut off point can be selected.
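The effect of the cut off point can be illustrated with a short sketch. The probabilities and true classes below are made up for the example, and the names `classify` and `confusion_matrix_at` are hypothetical; `alpha` plays the role of the cut off point and class 1 denotes bad debtors.

```python
def classify(prob_bad, alpha):
    """Assign class 1 (bad debtor) when the probability reaches the cut off point."""
    return 1 if prob_bad >= alpha else 0


def confusion_matrix_at(probs, actual, alpha):
    """2 x 2 matrix for a given cut off point: rows = actual, columns = predicted."""
    matrix = [[0, 0], [0, 0]]
    for p, a in zip(probs, actual):
        matrix[a][classify(p, alpha)] += 1
    return matrix


# Hypothetical model outputs: probability that the applicant is a bad debtor
probs = [0.9, 0.7, 0.55, 0.4, 0.65, 0.2, 0.1, 0.3]
# Hypothetical true classes: 1 = bad debtor, 0 = good debtor
actual = [1, 1, 1, 1, 0, 0, 0, 0]

for alpha in (0.3, 0.6, 0.8):
    print(alpha, confusion_matrix_at(probs, actual, alpha))
```

In this sample, a low cut off point catches every bad debtor but rejects more good ones, while a high cut off point does the opposite. Which trade-off is best depends on the gains and losses attached to each kind of decision.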
Confusion Matrix – summary
- A simple and readable way of collecting classification results
- Makes assessment of classification quality easier
- Different forms of the Confusion Matrix can help in observing the required properties of the classifier
- Can be used to determine gains and losses due to classification