Sukant Ranjan
3 min read · Jun 20, 2021


Basics of the ROC Curve and AUC Score

ROC stands for Receiver Operating Characteristic, and AUC stands for Area Under the Curve.

To fully understand the ROC curve and the AUC score, we first need to understand the confusion matrix. A confusion matrix is a 2x2 matrix that crosses the actual labels (Actual+ / Actual-) with the test results (Test+ / Test-). The "Actual" side is sometimes labelled "True"; the terms are used interchangeably.

When the test result matches the actual label, the case is called a True Positive (Actual+, Test+) or a True Negative (Actual-, Test-). When an Actual+ case comes back as Test-, it is a False Negative; when an Actual- case comes back as Test+, it is a False Positive.

Confusion matrix in terms of the diagnosis of a disease

Now, Sensitivity is the True Positive fraction: TP / (TP + FN), i.e. the true positives out of all actual positives. Specificity is the True Negative fraction: TN / (TN + FP), i.e. the true negatives out of all actual negatives. The False Positive fraction is FP / (FP + TN), which is exactly 1 - Specificity.
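
As a concrete illustration, here is a minimal sketch that derives these quantities from a confusion matrix with scikit-learn. The labels and test results are hypothetical, with 1 = positive and 0 = negative:

```python
from sklearn.metrics import confusion_matrix

actual = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical actual labels (1 = disease, 0 = healthy)
test = [1, 1, 0, 0, 1, 0, 1, 0]    # hypothetical test results

# For labels (0, 1), confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(actual, test).ravel()

sensitivity = tp / (tp + fn)  # True Positive fraction
specificity = tn / (tn + fp)  # True Negative fraction
fpr = fp / (fp + tn)          # False Positive fraction, equals 1 - specificity

print(f"Sensitivity: {sensitivity:.2f}")  # 0.75
print(f"Specificity: {specificity:.2f}")  # 0.75
print(f"FPR:         {fpr:.2f}")          # 0.25
```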

When we plot 1 - Specificity (the False Positive Rate) on the X-axis and the True Positive Rate (Sensitivity) on the Y-axis, the plot obtained is called the ROC curve.

Thus every point on the ROC curve is actually a cut-off, and each cut-off reflects a different confusion matrix. You can't see those matrices directly, but from the TPR and FPR at each point they can be recovered.

To make an ROC curve from data, sort the values by score, assign ranks, and link the positive/negative flag to each one.

Then calculate the TPR (True Positive Rate, i.e. Sensitivity) and 1 - Specificity at each cut-off.

With those pairs of Sensitivity and 1 - Specificity values, we can plot the ROC curve in any available tool.
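
Here is a minimal sketch of that procedure, assuming hypothetical model scores with no ties: each score is treated as a cut-off, and the running TP and FP counts give the TPR and FPR at that cut-off.

```python
import matplotlib.pyplot as plt

scores = [0.9, 0.8, 0.7, 0.55, 0.4, 0.3, 0.2, 0.1]  # hypothetical scores, sorted descending
labels = [1, 1, 0, 1, 0, 1, 0, 0]                   # positive/negative flags

P = sum(labels)      # total actual positives
N = len(labels) - P  # total actual negatives

tpr_points, fpr_points = [0.0], [0.0]
tp = fp = 0
for score, label in zip(scores, labels):
    if label == 1:
        tp += 1  # this cut-off admits one more true positive
    else:
        fp += 1  # ... or one more false positive
    tpr_points.append(tp / P)
    fpr_points.append(fp / N)

plt.plot(fpr_points, tpr_points, marker="o")
plt.xlabel("1 - Specificity (FPR)")
plt.ylabel("Sensitivity (TPR)")
plt.title("ROC Curve")
plt.show()
```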

Coming to AUC, or Area Under the Curve.

Tools like Python already have a dedicated library function for this: from sklearn.metrics import roc_auc_score.
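
For example, with the same hypothetical labels and scores as above:

```python
from sklearn.metrics import roc_auc_score

labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.55, 0.4, 0.3, 0.2, 0.1]

print(roc_auc_score(labels, scores))  # 0.8125 for this toy data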

But we can also calculate it in any tool. It is just the sum of the trapezoid areas between the X-axis and the line connecting each pair of adjacent points, using the formula:

(X_k - X_(k-1)) * (Y_k + Y_(k-1)) / 2
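
Here is a minimal sketch of that trapezoidal sum, applied to the ROC points from the earlier sketch; it gives the same 0.8125 as roc_auc_score on that toy data.

```python
def trapezoidal_auc(xs, ys):
    """Sum of trapezoid areas (x_k - x_(k-1)) * (y_k + y_(k-1)) / 2."""
    area = 0.0
    for k in range(1, len(xs)):
        area += (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2
    return area

# ROC points produced by the threshold sweep above
fpr_points = [0.0, 0.0, 0.0, 0.25, 0.25, 0.5, 0.5, 0.75, 1.0]
tpr_points = [0.0, 0.25, 0.5, 0.5, 0.75, 0.75, 1.0, 1.0, 1.0]

print(trapezoidal_auc(fpr_points, tpr_points))  # 0.8125
```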

Analyzing the ROC Curve:

A perfect test has no overlap between the positive and negative distributions. It has 100% sensitivity and 100% specificity, and its curve passes through the top-left corner of the plot (FPR = 0, TPR = 1),

whereas a worthless test has 50% sensitivity at 50% specificity: the two distributions overlap completely, and the curve is just the diagonal line.

The closer the ROC curve hugs the top-left corner (toward the y-axis and then the top of the plot), the better the test performs.

We can presumably say from a plot like the one above that Test A is better than Test B because, at the same false positive rate, its true positive rate is higher.

Comparing AUC scores in this way gives a basic, fair measure of a model's performance.
