Boston Housing Kaggle Challenge with Linear Regression

Boston Housing Data

The dataset comes from the StatLib library, which is maintained at Carnegie Mellon University, and describes housing prices in the Boston area. It contains 506 instances and 13 features.

The column descriptions below summarize the dataset. Our goal is to build a linear regression model on this data to predict house prices.

The following columns are present in the data:

  • "crim": per capita crime rate by town.
  • "zn": percentage of residential land zoned for lots larger than 25,000 square feet.
  • "indus": percentage of non-retail business acres per town.
  • "chas": Charles River dummy variable (= 1 if the tract bounds the river; 0 otherwise).
  • "nox": nitrogen oxides concentration (parts per 10 million).
  • "rm": average number of rooms per dwelling.
  • "age": percentage of owner-occupied homes built before 1940.
  • "dis": weighted mean of the distances to five Boston employment centres.
  • "rad": index of accessibility to radial highways.
  • "tax": full-value property-tax rate per $10,000.
  • "ptratio": pupil-teacher ratio by town.
  • "black": 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town.
  • "lstat": lower-status share of the population (percent).
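The "black" column is easier to read with a few worked values. The sketch below assumes the transformation given in the original dataset documentation, 1000(Bk - 0.63)^2, with Bk expressed as a proportion between 0 and 1. The function name `black_feature` is my own and is not part of any library.

```python
def black_feature(bk: float) -> float:
    """Return 1000 * (Bk - 0.63)^2 for a proportion Bk in [0, 1]."""
    if not 0.0 <= bk <= 1.0:
        raise ValueError("bk must be a proportion between 0 and 1")
    return 1000.0 * (bk - 0.63) ** 2


# The value is zero at Bk = 0.63 and grows as Bk moves away from it.
for bk in (0.0, 0.63, 1.0):
    print(f"Bk = {bk:.2f} -> black = {black_feature(bk):.1f}")
```

Because the difference is squared, two different values of Bk can map to the same feature value, so the column cannot be inverted back to a unique proportion.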

P.S. I am still learning how and where to interpret the graphs; this is my first analysis.

Code:

Output:

Input:

Output:

(506, 13)

Input:

Output:

array(['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratio', 'black', 'lstat'])
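The loading code is missing from this write-up. Older tutorials relied on scikit-learn's `load_boston`, which was removed in scikit-learn 1.2, so the sketch below instead assumes the data has been saved as a CSV file with a header row and the price in a column named `medv`. The file path, the `medv` column name, and the helper `load_housing` are all assumptions for illustration, not the original code.

```python
import csv


def load_housing(path, target="medv"):
    """Read a CSV with a header row into (feature_names, X, y)."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Every column except the target is treated as a feature.
        feature_names = [c for c in reader.fieldnames if c != target]
        X, y = [], []
        for row in reader:
            X.append([float(row[c]) for c in feature_names])
            y.append(float(row[target]))
    return feature_names, X, y


# Usage (hypothetical file name):
# feature_names, X, y = load_housing("boston.csv")
# print((len(X), len(feature_names)))  # number of rows, number of features
# print(feature_names)
```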

Converting the data from an nd-array to a data frame and adding the feature names
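A minimal sketch of this step, assuming pandas and NumPy are installed. The array is a small placeholder with made-up values standing in for the real feature matrix, and the variable names are my own.

```python
import numpy as np
import pandas as pd

# Placeholder nd-array standing in for the real (506, 13) feature matrix.
features = np.array([[0.1, 6.0, 15.0],
                     [0.2, 5.5, 18.0]])
feature_names = ["crim", "rm", "ptratio"]

# Wrap the array in a DataFrame so each column carries its feature name.
data = pd.DataFrame(features, columns=feature_names)
print(data.head())
print(data.shape)
```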

Input:

Output:

Input:

Output:

(506,)

Input:

Output:

Obtaining input and output data, then dividing the data into training and testing datasets.

Output:

atrain shape: (404, 13)
atest shape: (102, 13)
btrain shape: (404,)
btest shape: (102,)
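These sizes are consistent with holding out 20% of the 506 rows, with the test count rounded up (the convention scikit-learn's `train_test_split` follows for a fractional `test_size`). The 20% fraction and the seed are assumptions, since the original call is not shown. The sketch uses only the standard library so the arithmetic is visible, and `split_indices` is a name of my own.

```python
import math
import random


def split_indices(n_samples, test_size=0.2, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    n_test = math.ceil(test_size * n_samples)  # test count is rounded up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # seeded for reproducibility
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(506)
print(len(train_idx), len(test_idx))  # 404 102
```

Selecting rows of the features and of the prices with the same index lists keeps each input aligned with its target.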

Fitting a linear regression model to the training data and using it to predict prices for the test data.
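The fitting code is not shown, so here is a minimal sketch of ordinary least squares using NumPy's `lstsq`. It is not the original code; a library estimator such as scikit-learn's `LinearRegression` would be the usual choice. The small arrays are made-up values, and `fit_linear` and `predict` are names of my own.

```python
import numpy as np


def fit_linear(X, y):
    """Return least-squares weights, with the intercept as the last entry."""
    X1 = np.column_stack([X, np.ones(len(X))])  # append a bias column
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return weights


def predict(X, weights):
    """Apply fitted weights (intercept last) to new rows."""
    return np.column_stack([X, np.ones(len(X))]) @ weights


# Made-up training data generated exactly from y = 2*x1 - x2 + 5.
X_train = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 5.0], [4.0, 2.0]])
y_train = 2 * X_train[:, 0] - X_train[:, 1] + 5

w = fit_linear(X_train, y_train)
y_pred = predict(np.array([[5.0, 3.0]]), w)
print(np.round(w, 3), np.round(y_pred, 3))
```

On noise-free data like this the fit recovers the generating coefficients; on the real housing data the predictions will only approximate the true prices.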

Plotting the true values ('y true') against the predicted values ('y pred') in a scatter graph shows how close the predictions are.

Output:

[Figure: scatter plot of true prices vs. predicted prices]

Evaluating the linear regression with the Mean Squared Error and the Mean Absolute Error.

Output:

Mean Square Error is :  33.4489799151161496
Mean Absolute Error is :  3.8429092484151966
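For reference, both metrics reduce to a few lines. The sketch below uses made-up values rather than the model's actual predictions, and the function names are my own. In practice `mean_squared_error` and `mean_absolute_error` from `sklearn.metrics` compute the same quantities.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up prices and predictions, for illustration only.
y_true = [24.0, 22.0, 35.0, 33.0]
y_pred = [25.0, 21.0, 31.0, 35.0]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = mse ** 0.5  # same units as the target, unlike MSE
print(mse, mae, rmse)
```

MSE penalizes large misses more heavily than MAE because each error is squared before averaging.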

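The model's errors are sizeable. An MSE of about 33.45 corresponds to a root mean squared error of roughly 5.78, and the MAE of about 3.84 means predictions miss the true price by about 3.84 units on average; since prices in this dataset are recorded in thousands of dollars, that is roughly $3,840. These are error measures rather than percentages, so subtracting the MSE from 100 to report an "accuracy" of 66.55% is not meaningful. Even so, the fitted model is not particularly effective at forecasting home prices, and other machine learning methods and approaches may improve the predictions.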
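A bounded goodness-of-fit score such as the coefficient of determination (R^2) is the more usual way to summarize how well a regression explains the prices. A minimal sketch with made-up values follows; the function is my own, and `sklearn.metrics.r2_score` computes the same quantity.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up prices and predictions, for illustration only.
y_true = [24.0, 22.0, 35.0, 33.0]
y_pred = [25.0, 21.0, 31.0, 35.0]
r2 = r2_score(y_true, y_pred)
print(round(r2, 3))
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean price.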





