Precision, Recall and Friends for Dummies

Qualitative descriptions of popular machine learning and statistical analysis jargon

Yuval Greenfield
2 min read · Dec 18, 2018


All of the following metrics range from 0 to 1, where a better classifier scores closer to 1 and a worse classifier scores closer to 0.

Precision

  • The quality of each positive prediction: given a positive prediction, what are the odds that it was correct? (See the sketch below.)
  • Σ True positive / Σ Predicted condition positive
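
As a concrete illustration, here is a minimal Python sketch that counts true positives among the predicted positives. The label lists y_true and y_pred are made up for the example.

```python
# Hypothetical binary labels: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Precision = Σ true positive / Σ predicted condition positive.
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
predicted_positives = sum(1 for p in y_pred if p == 1)

precision = true_positives / predicted_positives
print(precision)  # 3 / 4 = 0.75
```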

Recall = Sensitivity = True Positive Rate

  • Probability of marking a positive as such: what percentage of the actual positives were correctly identified. Measures how well the classifier avoids false negatives (see the sketch below).
  • Σ True positive / Σ Condition positive
  • 1 - (Σ False negatives / Σ Condition positive)
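
A matching sketch with the same hypothetical y_true and y_pred lists; recall divides by the actual positives rather than the predicted ones.

```python
# Hypothetical binary labels: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Recall = Σ true positive / Σ condition positive.
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
condition_positives = sum(1 for t in y_true if t == 1)

recall = true_positives / condition_positives
print(recall)  # 3 / 4 = 0.75
```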

Specificity = Selectivity = True negative rate

  • Probability of marking a negative as such: what percentage of the actual negatives were correctly identified.
  • Extra important for medical diagnostics because you don’t want to tell someone they have cancer when they don’t. Practically useless when evaluating a search engine, where billions of true negatives make a 99.999% specificity trivial to reach (see the sketch below).
  • Σ True negative / Σ Condition negative
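
To make the search-engine point concrete, here is a sketch with assumed counts where true negatives dominate; the numbers are invented for illustration.

```python
# Assumed counts for a search-engine-like task: almost every document
# is irrelevant, so true negatives vastly outnumber everything else.
true_negatives = 1_000_000_000
false_positives = 10_000

# Specificity = Σ true negative / Σ condition negative.
specificity = true_negatives / (true_negatives + false_positives)
print(specificity)  # ~0.99999 regardless of how good the results actually are
```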

F1 Score = Sørensen–Dice coefficient

  • How close to perfect the classifier is while ignoring the number of true negatives; equivalently, how much overlap there is between the predicted-positive group and the actually-positive group. The harmonic mean of precision and recall (see the sketch below).
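
A minimal sketch of the harmonic mean, reusing the precision and recall values from the hypothetical example above (0.75 each):

```python
# F1 is the harmonic mean of precision and recall.
precision = 0.75  # values from the hypothetical example above
recall = 0.75

f1 = 2 * precision * recall / (precision + recall)
print(f1)  # 0.75 -- the harmonic mean of equal values is that value
```

Note that true negatives never appear in the formula, which is the sense in which F1 ignores them.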

In the next post I’ll prove that F1 = DSC and build some intuition for it alongside a visualization.

Identity Equations

You can infer these from the confusion matrix rows and columns. They’re useful, for example, to prove F1 = DSC; a quick sanity check in code follows the list.

  • Σ True positive + Σ False negative = Σ Condition positive
  • Σ True negative + Σ False positive = Σ Condition negative
  • Σ True negative + Σ True positive + Σ False negative + Σ False positive = Σ All samples
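
Here is that sanity check with assumed confusion-matrix counts (all numbers are made up):

```python
# Assumed confusion-matrix counts for a 100-sample problem.
tp, fn = 30, 10          # actual positives split into hits and misses
tn, fp = 50, 10          # actual negatives split into correct rejections and false alarms
condition_positive = 40  # total actual positives
condition_negative = 60  # total actual negatives
all_samples = 100

# The three identities from the list above.
assert tp + fn == condition_positive
assert tn + fp == condition_negative
assert tn + tp + fn + fp == all_samples
```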
