After you make predictions, you need to know if they are any good.

There are standard measures that we can use to summarize how good a set of predictions actually is.

Knowing how good a set of predictions is allows you to estimate how good a given machine learning model of your problem really is.

In this tutorial, you will discover how to implement four standard prediction evaluation metrics from scratch in Python.

After reading this tutorial, you will know:

- How to implement classification accuracy.
- How to implement and interpret a confusion matrix.
- How to implement mean absolute error for regression.
- How to implement root mean squared error for regression.

Let’s get started.

How To Implement Machine Learning Algorithm Performance Metrics From Scratch With Python

Photo by Hernán Piera, some rights reserved.

Description

You must estimate the quality of a set of predictions when training a machine learning model.

Performance metrics like classification accuracy and root mean squared error can give you a clear objective idea of how good a set of predictions is, and in turn how good the model is that generated them.

This is important as it allows you to tell the difference and select among:

- Different transforms of the data used to train the same machine learning model.
- Different machine learning models trained on the same data.
- Different configurations for a machine learning model trained on the same data.

As such, performance metrics are a required building block in implementing machine learning algorithms from scratch.

Tutorial

This tutorial is divided into 4 parts:

1. Classification Accuracy.
2. Confusion Matrix.
3. Mean Absolute Error.
4. Root Mean Squared Error.

These steps will provide the foundations you need to handle evaluating predictions made by machine learning algorithms.

1. Classification Accuracy

A quick way to evaluate a set of predictions on a classification problem is by using accuracy.

Classification accuracy is the ratio of the number of correct predictions to the total number of predictions made.

It is often presented as a percentage between 0% for the worst possible accuracy and 100% for the best possible accuracy.

accuracy = correct predictions / total predictions * 100

We can implement this in a function that takes the expected outcomes and the predictions as arguments.

Below is this function named accuracy_metric() that returns classification accuracy as a percentage. Notice that we use "==" to compare actual to predicted values for equality. This allows us to compare integers or strings, the two main data types that we may choose to use when loading classification data.

# Calculate accuracy percentage between two lists
def accuracy_metric(actual, predicted):
	correct = 0
	for i in range(len(actual)):
		if actual[i] == predicted[i]:
			correct += 1
	return correct / float(len(actual)) * 100.0

We can contrive a small dataset to test this function. Below are a set of 10 actual and predicted integer values. There are two mistakes in the set of predictions.

actual  predicted
0       0
0       1
0       0
0       0
0       0
1       1
1       0
1       1
1       1
1       1

Below is a complete example with this dataset to test the accuracy_metric() function.

# Calculate accuracy percentage between two lists
def accuracy_metric(actual, predicted):
	correct = 0
	for i in range(len(actual)):
		if actual[i] == predicted[i]:
			correct += 1
	return correct / float(len(actual)) * 100.0

# Test accuracy
actual = [0,0,0,0,0,1,1,1,1,1]
predicted = [0,1,0,0,0,1,0,1,1,1]
accuracy = accuracy_metric(actual, predicted)
print(accuracy)

Running this example produces the expected accuracy of 80%, or 8/10 correct.

80.0

Accuracy is a good metric to use when you have a small number of class values, such as 2, which is called a binary classification problem.

Accuracy starts to lose its meaning when you have more class values, and you may need to review a different perspective on the results, such as a confusion matrix.
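To see why accuracy alone can mislead, consider a quick sketch using the accuracy_metric() function from above (the imbalanced toy data is contrived for illustration): a model that only ever predicts the majority class can still score highly while never detecting the minority class.

```python
# Calculate accuracy percentage between two lists
def accuracy_metric(actual, predicted):
	correct = 0
	for i in range(len(actual)):
		if actual[i] == predicted[i]:
			correct += 1
	return correct / float(len(actual)) * 100.0

# 9 examples of class 0 and only 1 of class 1:
# always predicting class 0 still looks strong
actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
predicted = [0] * 10
print(accuracy_metric(actual, predicted))  # 90.0, yet class 1 is never predicted
```

A confusion matrix makes this kind of failure visible, because the missed class shows up as a zero on the diagonal.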

2. Confusion Matrix

A confusion matrix provides a summary of all of the predictions made compared to the expected actual values.

The results are presented in a matrix with counts in each cell. The counts of actual class values are summarized horizontally, whereas the counts of predictions for each class value are presented vertically.

A perfect set of predictions is shown as a diagonal line from the top left to the bottom right of the matrix.

The value of a confusion matrix for classification problems is that you can clearly see which predictions were wrong and the type of mistake that was made.

Let’s create a function to calculate a confusion matrix.

We can start off by defining the function to calculate the confusion matrix given a list of actual class values and a list of predictions.

The function is listed below and is named confusion_matrix(). It first makes a list of all of the unique class values and assigns each class value a unique integer or index into the confusion matrix.

The confusion matrix is always square, with the number of class values indicating the number of rows and columns required.

Here, the first index into the matrix is the row for actual values and the second is the column for predicted values. After the square confusion matrix is created and initialized to zero counts in each cell, it is a matter of looping through all predictions and incrementing the count in each cell.

The function returns two objects. The first is the set of unique class values, so that they can be displayed when the confusion matrix is drawn. The second is the confusion matrix itself with the counts in each cell.

# calculate a confusion matrix
def confusion_matrix(actual, predicted):
	unique = set(actual)
	matrix = [list() for x in range(len(unique))]
	for i in range(len(unique)):
		matrix[i] = [0 for x in range(len(unique))]
	lookup = dict()
	for i, value in enumerate(unique):
		lookup[value] = i
	for i in range(len(actual)):
		x = lookup[actual[i]]
		y = lookup[predicted[i]]
		matrix[x][y] += 1
	return unique, matrix

Let’s make this concrete with an example.

Below is another contrived dataset, this time with 3 mistakes.

actual  predicted
0       0
0       1
0       1
0       0
0       0
1       1
1       0
1       1
1       1
1       1

We can calculate and print the confusion matrix for this dataset as follows:

# calculate a confusion matrix
def confusion_matrix(actual, predicted):
	unique = set(actual)
	matrix = [list() for x in range(len(unique))]
	for i in range(len(unique)):
		matrix[i] = [0 for x in range(len(unique))]
	lookup = dict()
	for i, value in enumerate(unique):
		lookup[value] = i
	for i in range(len(actual)):
		x = lookup[actual[i]]
		y = lookup[predicted[i]]
		matrix[x][y] += 1
	return unique, matrix

# Test confusion matrix with integers
actual = [0,0,0,0,0,1,1,1,1,1]
predicted = [0,1,1,0,0,1,0,1,1,1]
unique, matrix = confusion_matrix(actual, predicted)
print(unique)
print(matrix)

Running the example produces the output below. The example first prints the list of unique values and then the confusion matrix.

set([0, 1])
[[3, 2], [1, 4]]

It’s hard to interpret the results this way. It would help if we could display the matrix as intended with rows and columns.

Below is a function to correctly display the matrix with labeled rows and columns.
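The original display helper is cut off in this copy, so the following is a minimal sketch of what such a function could look like. The name print_confusion_matrix() and the "(A)(P)" row/column labeling are my own choices, not taken from the source.

```python
# Pretty-print a confusion matrix: a header row of predicted class labels,
# then one row of counts per actual class value.
def print_confusion_matrix(unique, matrix):
	labels = [str(x) for x in unique]
	# (A) marks the actual-value rows, (P) the predicted-value columns
	lines = ['(A)(P) ' + ' '.join(labels)]
	for label, row in zip(labels, matrix):
		lines.append(label + ' | ' + ' '.join(str(count) for count in row))
	for line in lines:
		print(line)
	return lines  # returned as well, purely to make the function easy to test

# Display the matrix computed in the example above
print_confusion_matrix([0, 1], [[3, 2], [1, 4]])
```

Reading the output, the diagonal cells (3 and 4) are correct predictions, while the off-diagonal cells (2 and 1) show how many examples of each actual class were misclassified.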

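The tutorial's outline also promises mean absolute error and root mean squared error for regression (parts 3 and 4), but those sections are missing from this copy. Below is a minimal sketch in the same style as the functions above; the function names mae_metric() and rmse_metric() are my own choices, and the small regression dataset is contrived for illustration.

```python
from math import sqrt

# Calculate mean absolute error between two lists of numbers
def mae_metric(actual, predicted):
	sum_error = 0.0
	for i in range(len(actual)):
		sum_error += abs(predicted[i] - actual[i])
	return sum_error / float(len(actual))

# Calculate root mean squared error between two lists of numbers
def rmse_metric(actual, predicted):
	sum_error = 0.0
	for i in range(len(actual)):
		sum_error += (predicted[i] - actual[i]) ** 2
	return sqrt(sum_error / float(len(actual)))

# Contrived regression data with small errors
actual = [0.1, 0.2, 0.3, 0.4, 0.5]
predicted = [0.11, 0.19, 0.29, 0.41, 0.5]
print(mae_metric(actual, predicted))
print(rmse_metric(actual, predicted))
```

Both metrics are in the units of the target variable; RMSE punishes large individual errors more heavily than MAE because the errors are squared before averaging.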