Sklearn Predict Function

This tutorial shows how to use the predict() method of scikit-learn (sklearn) machine learning models to predict outcomes in Python.

We will briefly summarise what the method does, review its syntax, and then work through examples that use it with several machine learning models.

A Brief Overview of Sklearn Prediction

To understand what the predict method does, you should be familiar with the standard machine learning workflow.

Although a machine learning model passes through several stages as it is developed and deployed, the workflow boils down to two steps:

train the model on a dataset
use the trained model to predict values

In practice it is a bit more complex than that. We often need to tune the model's hyperparameters or adjust the optimizer to improve the model's performance.

However, at a high level, we train the model on known data and then use it for tasks such as prediction.

A machine learning model is mostly used to predict outcomes for new data based on patterns learned from old data. Many ML algorithms are designed specifically to make accurate predictions.

Predicting Data using ML algorithms

Given a set of input values, the main aim of most ML models is to predict the value of some quantity.

As an illustration, consider a model that predicts housing prices. The inputs, also known as the features, might include each residence's postal code, its square footage, the number of bedrooms and bathrooms, and a range of other amenities.

Once we have trained a model on these features, we can feed it additional input data and it should produce a useful output. In other words, after training the model on our observations, we can use it to make predictions for completely new data.
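The housing illustration can be sketched in a few lines. This is only a minimal sketch: the figures are made up for illustration, the variable names are hypothetical, and the postal code is left out to keep the features numeric.

```python
from sklearn.linear_model import LinearRegression

# Made-up training data: [square footage, bedrooms, bathrooms]
house_features = [
    [1000, 2, 1],
    [1500, 3, 2],
    [2000, 3, 2],
    [2500, 4, 3],
    [1200, 2, 2],
    [1800, 4, 2],
]
# Made-up prices that follow 100*sqft + 10000*bedrooms + 5000*bathrooms
house_prices = [125000, 190000, 240000, 305000, 150000, 230000]

# Train the model on the known observations
price_model = LinearRegression()
price_model.fit(house_features, house_prices)

# Predict the price of a house the model has never seen
new_house = [[1600, 3, 2]]
predicted_price = price_model.predict(new_house)
print(predicted_price)  # a single value close to 200000
```

Because the made-up prices are an exact linear function of the features, the model recovers that relationship and the prediction lands near 200000. Real housing data would be noisier.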

Most ML models follow the same approach. Machine learning algorithms can be used to make predictions such as:

predict whether a person with certain characteristics will respond to a given marketing campaign
predict whether an email message from a certain source is "spam" or not
predict whether a given input image shows a cat, a dog, or some other object

Making a prediction of some form is the primary goal of many machine learning algorithms, including regression and classification models.
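To make the difference between regression and classification models concrete, here is a small sketch on made-up toy data. It assumes nothing beyond scikit-learn itself. The same predict() call returns real-valued estimates for a regressor and class labels for a classifier.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data made up for illustration: one feature, six samples
X = [[1], [2], [3], [4], [5], [6]]
y_continuous = [1.5, 3.1, 4.4, 6.2, 7.4, 9.0]  # regression target
y_labels = [0, 0, 0, 1, 1, 1]                  # classification target

regressor = LinearRegression().fit(X, y_continuous)
classifier = LogisticRegression().fit(X, y_labels)

X_new = [[1.5], [5.5]]
print(regressor.predict(X_new))   # real-valued estimates
print(classifier.predict(X_new))  # class labels taken from y_labels
```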

Syntax of Predict Method of Sklearn Library

Now that we know what the predict method of the sklearn library does, let us look at its syntax.

The syntax below assumes that you have already imported the scikit-learn library and have trained a model such as LinearRegression or RandomForestRegressor on a dataset.

Sklearn Predict Syntax

We call the predict method on an instance of a machine learning model class that has first been trained on the training dataset. Support Vector Machines, Decision Tree Regressor, Logistic Regression, and Linear Regression are all examples of models that scikit-learn provides.

Using the "dot" syntax, we call the predict method on the trained model:

trained_model.predict(X_test)

Inside the method's parentheses, we pass the variable that holds the data whose values we want to predict with the trained model (here, the test dataset).
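The requirement that the model be trained first can be checked directly. The sketch below uses hypothetical variable names and calls predict() on a LinearRegression instance that was never fitted. In that case scikit-learn raises NotFittedError.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression

# An estimator that has been created but never trained
untrained_model = LinearRegression()
X_test = [[1.0], [2.0]]

try:
    untrained_model.predict(X_test)
    raised = False
except NotFittedError:
    # predict() refuses to run until fit() has been called
    raised = True

print(raised)
```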

As an example, suppose we create a LinearRegression object to run an ordinary linear regression and store it in regressor_model. Calling regressor_model.fit() trains the model. To then predict the values for some input features, we use the following command:

regressor_model.predict(input_features)
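Putting the pieces together, a complete run might look like the sketch below. The training data is made up for illustration and follows y = 2x + 1 exactly. The names regressor_model and input_features match the command above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data that follows y = 2x + 1 exactly
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

# Create and train the model
regressor_model = LinearRegression()
regressor_model.fit(X_train, y_train)

# Predict for two inputs that were not in the training data
input_features = np.array([[5.0], [6.0]])
predicted_values = regressor_model.predict(input_features)
print(predicted_values)  # values close to 11 and 13
```

predict() returns one value per row of input_features, so the result here is an array with two entries.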

The Format of the Data to Be Used as Input

Before continuing, just one more thing.

The X_test data must be provided to the predict() function as a 2-dimensional array-like object with one row per sample and one column per feature. For example, the features can be stored in a 2-D NumPy array.

If X_test is not 2-dimensional (for example, a 1-D array holding a single sample), scikit-learn raises an error and execution stops. In that case we have to convert the data to a NumPy array, or reshape it if it has a different shape, before passing it to the predict() method.
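The shape requirement is easiest to see with a single sample. The sketch below uses made-up data that follows y = x1 + 2*x2. It passes a 1-D array to predict(), which raises a ValueError. It then reshapes the same sample to one row with reshape(1, -1) so that the call succeeds.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data with two features, following y = x1 + 2*x2
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_train = np.array([5.0, 4.0, 11.0, 10.0])
model = LinearRegression().fit(X_train, y_train)

single_sample = np.array([5.0, 6.0])  # 1-D array, shape (2,)

try:
    model.predict(single_sample)
    rejected = False
except ValueError:
    # scikit-learn expects a 2-D input and rejects the 1-D array
    rejected = True

# Reshape to one row and two columns: shape (1, 2)
sample_2d = single_sample.reshape(1, -1)
prediction = model.predict(sample_2d)
print(rejected, prediction)  # prediction is close to 17
```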

Implementing Python predict() Function

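Our first step is to load the required dataset into the working environment. A dataset stored on disk can be loaded with the pandas.read_csv() function. Here we will use the iris dataset that is built into the sklearn library.

In this first example we fit the model on the whole dataset and predict on the same data, so the score it prints is the accuracy on the training data. The later examples use the train_test_split() function to divide the dataset into training and testing sets.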

Code

# Python program to show how to use the predict() method to predict the values using the trained model

# Importing the required classes and dataset
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Loading and separating the independent and the dependent features of the dataset
X, Y = load_iris(return_X_y = True)

# Creating an instance of the Logistic Regression model class
logreg = LogisticRegression(random_state = 0)

# Fitting our load_iris dataset to the model
logreg.fit(X, Y)

# Predicting the values using the predict method in the model class
Y_pred = logreg.predict(X)

# Calculating the accuracy score of the model based on the predicted values and the true values of the dependent feature
score = accuracy_score(Y, Y_pred)
print(score)

Output:

0.9733333333333334
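The example above predicts on the data the model was trained on. As a sketch of predicting for one new observation, the block below passes a single hypothetical set of measurements (chosen to resemble a setosa flower) and maps the predicted label back to a species name. The max_iter=1000 setting is an assumption added here to give the solver more iterations. It is not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# max_iter=1000 is an assumption, not part of the original example
logreg = LogisticRegression(random_state=0, max_iter=1000)
logreg.fit(iris.data, iris.target)

# Hypothetical flower: sepal length, sepal width, petal length, petal width (cm)
new_flower = [[5.1, 3.5, 1.4, 0.2]]

predicted_label = logreg.predict(new_flower)            # array with one integer label
predicted_name = iris.target_names[predicted_label[0]]  # label mapped to species name
print(predicted_label, predicted_name)
```

Note that new_flower is a list containing one inner list, so it is already in the 2-D form that predict() expects.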

Using the predict() Method with Decision Trees

We will now apply the Decision Tree method to the same dataset to predict the target labels of the test dataset.

Code

# Python program to show how to use the predict() method with the decision tree model

# Importing the required classes and dataset
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Loading and separating the independent and the dependent features of the dataset
X, Y = load_iris(return_X_y = True)

# Splitting the dataset
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.3)

# Creating an instance of the Decision Tree model class
DT_model = DecisionTreeClassifier(max_depth = 5)

# Fitting the model on the training portion of the iris dataset
DT_model.fit(X_train, Y_train)

# Predicting the values using the predict method in the model class
Y_pred = DT_model.predict(X_test)

# Calculating the accuracy score of the model based on the predicted values and the true values of the dependent feature
score = accuracy_score(Y_test, Y_pred)
print(score)

Output:

0.9777777777777777
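Because train_test_split() shuffles the data randomly, the score printed above can change from one run to the next. If a reproducible result is wanted, one option is to pass a fixed random_state. The sketch below does this, with 42 as an arbitrary seed and run_once as a hypothetical helper name.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, Y = load_iris(return_X_y=True)

def run_once(seed):
    # Fixing random_state makes both the split and the tree deterministic
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.3, random_state=seed
    )
    model = DecisionTreeClassifier(max_depth=5, random_state=seed)
    model.fit(X_train, Y_train)
    return accuracy_score(Y_test, model.predict(X_test))

first_score = run_once(42)
second_score = run_once(42)
print(first_score == second_score)  # True: same seed, same score
```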

Using predict() Function with KNN Algorithm

In this case, we train a KNN model on the dataset to make predictions. We follow the same steps: split the data into training and test sets, then use the training data to train the KNeighborsClassifier.

Additionally, we will use the accuracy_score() function to find the model's accuracy on the test data.

Code

# Python program to show how to use the predict() method with the KNN model

# Importing the required classes and dataset
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Loading and separating the independent and the dependent features of the dataset
X, Y = load_iris(return_X_y = True)

# Splitting the dataset
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.3)

# Creating an instance of the KNN classifier model class
knn_model = KNeighborsClassifier()

# Fitting the model on the training portion of the iris dataset
knn_model.fit(X_train, Y_train)

# Predicting the values using the predict method in the model class
Y_pred = knn_model.predict(X_test)

# Calculating the accuracy score of the model based on the predicted values and the true values of the dependent feature
score = accuracy_score(Y_test, Y_pred)
print(score)

Output: