Sklearn Predict Function

This tutorial shows how to use trained Python machine learning models to predict outcomes with the scikit-learn predict() method.

We will briefly summarise what the method does, review its syntax, and then work through examples of using it with various machine learning models.

A Brief Overview of Sklearn Prediction

To understand the main function of the predict method, you must be familiar with the standard machine learning approach.

Although a machine learning model passes through several stages as it is developed and deployed, at a high level there are just two:

  1. train the model on known data
  2. use the trained model to predict values

In practice it is a bit more complex than that. We often need to tune the model's hyperparameters or adjust how it is optimized to improve its performance.

However, at a high level, we train the model on known data before using it for tasks such as prediction.
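The two steps above can be sketched as follows. This is a minimal illustration, not a complete workflow: the data values are made up, and LinearRegression is just one convenient choice of estimator.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Known data (made-up values that follow y = 2x + 1, for illustration only)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

# Step 1: train the model on known data
model = LinearRegression()
model.fit(X_train, y_train)

# Step 2: use the trained model to predict values for inputs it has not seen
X_new = np.array([[5.0], [6.0]])
predictions = model.predict(X_new)
print(predictions)
```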

Machine learning models are mostly used to predict or forecast results for new data based on patterns learned from old data. Many ML algorithms are designed specifically to make such predictions accurately.

Predicting Data using ML algorithms

Given a set of input values, the main aim of most ML models is to predict the value of some quantity.

As an illustration, let's look at a model that predicts housing values. The inputs, also known as the features, might include each residence's postal code, its square footage, the number of bedrooms and bathrooms, and a range of other amenities.

If we have trained a model on these features, we can feed it additional input data and it will produce an output we can use, here an estimated price. In other words, once the model has been trained on our observations, we can use it to make predictions for completely new data.
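As a rough sketch of the housing example, the snippet below trains a regressor on a handful of houses and predicts the price of a new one. All numbers are hypothetical and exist only to make the code runnable. The choice of three numeric features (square footage, bedrooms, bathrooms) is an assumption; a categorical feature such as the postal code would need to be encoded first and is left out here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data, one row per house:
# [square footage, bedrooms, bathrooms]
X_train = np.array([
    [1400, 3, 2],
    [1600, 3, 2],
    [1700, 4, 2],
    [1875, 4, 3],
    [1100, 2, 1],
    [2350, 5, 3],
])
# Hypothetical sale prices for the houses above
y_train = np.array([245000, 312000, 279000, 308000, 199000, 405000])

house_model = LinearRegression().fit(X_train, y_train)

# A house the model has never seen: one row with the same three columns
new_house = np.array([[1500, 3, 2]])
predicted_price = house_model.predict(new_house)
print(predicted_price.shape)  # one prediction per input row
```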

Most ML models follow the same approach. Machine learning algorithms can be used to make predictions such as:

  • predict whether a person with certain characteristics will respond to a given marketing campaign
  • predict whether an email message from a certain source is "spam" or not
  • predict whether a given input image shows a cat, a dog, or some other object

Making a forecast of some form is the primary goal for many machine learning algorithms, such as regression and classification models.
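For classification tasks like those listed above, predict() returns a class label for each input row rather than a continuous value. The sketch below imitates the spam example with two hypothetical features (number of links and number of exclamation marks per message); the data and the choice of LogisticRegression are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per message: [number of links, number of "!"]
X_train = np.array([[0, 0], [1, 0], [0, 1], [8, 5], [10, 7], [7, 9]])
# Labels: 0 = not spam, 1 = spam
y_train = np.array([0, 0, 0, 1, 1, 1])

spam_model = LogisticRegression().fit(X_train, y_train)

# Two new messages: one spam-like, one ordinary
new_messages = np.array([[9, 6], [1, 1]])
labels = spam_model.predict(new_messages)
print(labels)
```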

Syntax of Predict Method of Sklearn Library

Now that we know what the predict method does, let us look at its syntax.

This section assumes that you have already imported scikit-learn into your current workspace and have trained a model such as LinearRegression or RandomForestRegressor on a dataset.

Sklearn Predict Syntax

The predict method must be called on an instance of a machine learning model class that has already been trained on the training dataset. Scikit-learn provides many such model classes, for instance support vector machines, DecisionTreeRegressor, LogisticRegression, and LinearRegression.

Using the "dot" syntax, we call the predict method on the trained model:

trained_model.predict(X_test)

Within the method's parentheses, we supply the variable holding the input data whose target values we want to predict (here, the features of the test dataset).

As an example, suppose we use a LinearRegression object to perform an ordinary linear regression. Calling fit() on the object, here named regressor_model, trains the model. To predict the values for a set of input features, we then use the following command:

regressor_model.predict(input_features)
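A minimal sketch of this call sequence is shown below, reusing the names regressor_model and input_features from the text. The training data is made up for illustration. The sketch also shows why the model has to be trained first: calling predict() on an unfitted estimator raises scikit-learn's NotFittedError.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression

# Made-up data for illustration: two features per sample, y = x1 + 2*x2
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_train = np.array([5.0, 4.0, 11.0, 10.0])

input_features = np.array([[5.0, 5.0], [0.0, 1.0], [2.0, 2.0]])

regressor_model = LinearRegression()

# Calling predict() before fit() fails because nothing has been learned yet
try:
    regressor_model.predict(input_features)
    failed_before_fit = False
except NotFittedError:
    failed_before_fit = True

# Train the model, then predict: one output value per input row
regressor_model.fit(X_train, y_train)
predicted_values = regressor_model.predict(input_features)
print(predicted_values.shape)
```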

The Format of the Input Data

Before continuing, there is one more thing to note.

The X_test data must be passed to predict() as a 2-dimensional array-like object of shape (n_samples, n_features), with one row per sample and one column per feature. For example, the complete set of features can be stored in a 2-D NumPy array.

If X_test is not 2-dimensional, for example a 1-D array holding a single sample, scikit-learn raises a ValueError. In that case, convert the data to a NumPy array if necessary and reshape it to two dimensions before passing it to the predict() method.
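The shape requirement can be illustrated with a short sketch. The model and data here are made up; the point is only that a 1-D input is rejected, while the same values reshaped to one row (or passed as a nested list) are accepted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up single-feature training data (y = 2x)
X_train = np.array([[1.0], [2.0], [3.0]])
y_train = np.array([2.0, 4.0, 6.0])
model = LinearRegression().fit(X_train, y_train)

single_sample = np.array([4.0])  # 1-D array, shape (1,)

# Passing a 1-D array raises a ValueError
try:
    model.predict(single_sample)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# reshape(1, -1) turns it into one row with as many columns as needed
reshaped = single_sample.reshape(1, -1)  # shape (1, 1)
single_prediction = model.predict(reshaped)
print(single_prediction)

# A nested list is also a valid 2-D array-like input
list_prediction = model.predict([[5.0], [6.0]])
print(list_prediction)
```

Use reshape(1, -1) for a single sample with several features, and reshape(-1, 1) for several samples of a single feature.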

Implementing Python predict() Function

Our first step is to load the required dataset into the working environment. A dataset stored on disk can be loaded with the pandas.read_csv() function; here, we will instead use one of the datasets built into the sklearn library.

Using the train_test_split() function, we divide the dataset into training and testing datasets.

Code
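The steps described above could look roughly like the sketch below. Several details are assumptions, since the text does not name them: the built-in iris dataset, a LogisticRegression classifier, a 70/30 split, and the random_state value. The score printed by this sketch depends on those choices and need not match the figure reported below.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a built-in dataset (assumption: iris)
X, y = load_iris(return_X_y=True)

# Split into training and testing datasets (assumed 70/30 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Train the model on the training data
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict the target labels of the test data
predictions = model.predict(X_test)

# Compare the predictions with the true labels
score = accuracy_score(y_test, predictions)
print(score)
```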

Output:

0.9733333333333334

Using the predict() Method with Decision Trees

We will now apply the Decision Tree method to the same dataset to predict the target labels of the test dataset.

Code
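A minimal sketch of this step, again assuming the built-in iris dataset and a 70/30 split (neither is stated in the text). DecisionTreeClassifier is used because the targets are class labels; the printed score may differ from the figure reported below.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same assumed dataset and split as before
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Train a decision tree on the training data
tree_model = DecisionTreeClassifier(random_state=0)
tree_model.fit(X_train, y_train)

# Predict the target labels of the test dataset
tree_predictions = tree_model.predict(X_test)

tree_score = accuracy_score(y_test, tree_predictions)
print(tree_score)
```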

Output:

0.9777777777777777

Using predict() Function with KNN Algorithm

In this case, a KNN model is trained on the dataset to make predictions. We follow the same steps: split the data into training and test sets, then use the training data to train a KNeighborsClassifier(), the classification variant of KNN, since accuracy is only defined for class labels.

Additionally, we will use the accuracy_score() function from sklearn.metrics to find the model's accuracy on this dataset.

Code
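A minimal sketch of the KNN example follows. It assumes the built-in iris dataset, a 70/30 split, and scikit-learn's KNeighborsClassifier with its default of five neighbours; accuracy is computed with accuracy_score() from sklearn.metrics. The printed score may differ from the figure reported below.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same assumed dataset and split as in the earlier examples
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Train the KNN classifier (default n_neighbors=5)
knn_model = KNeighborsClassifier()
knn_model.fit(X_train, y_train)

# Predict labels for the test data and measure accuracy
knn_predictions = knn_model.predict(X_test)
knn_score = accuracy_score(y_test, knn_predictions)
print(knn_score)
```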

Output:

0.9777777777777777




