Example use of polynomial regression: it can be used to describe how pandemics and epidemics spread across a territory or through a population.
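Polynomial regression is still called a linear model. "Linear" here does not refer to the powers of the independent variables (IVs); it refers to whether y can be written as a linear combination of the coefficients. For example, y = b0 + b1*x + b2*x^2 is nonlinear in x but linear in b0, b1, and b2.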
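To make the "linear in the coefficients" point concrete, here is a minimal sketch. It assumes a degree-2 model and made-up sample points (not the salary data): the powers of x are computed first, and ordinary linear least squares then recovers the coefficients.

```python
import numpy as np

# Hypothetical sample points generated from y = 1 + 2x + 3x^2 (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x**2

# Design matrix with columns [1, x, x^2]: nonlinear in x ...
A = np.column_stack([np.ones_like(x), x, x**2])

# ... but y = A @ b is linear in b, so plain linear least squares applies
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(b, 6))  # coefficients b0, b1, b2
```

The solver never sees a "polynomial"; it only sees three columns and solves for three weights, which is why the method counts as linear regression.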
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Suppose I am a manager at Level 4 and have held that role for 4 years. My salary should then fall between the base manager salary (Level 4) and the country manager salary (Level 5), at something like Level 4.5. To estimate the salary for level 4.5 we need polynomial regression, because salary does not increase linearly with level.
dataset = pd.read_csv('Position_Salaries.csv')
dataset.head()
| | Position | Level | Salary |
|---|---|---|---|
| 0 | Business Analyst | 1 | 45000 |
| 1 | Junior Consultant | 2 | 50000 |
| 2 | Senior Consultant | 3 | 60000 |
| 3 | Manager | 4 | 80000 |
| 4 | Country Manager | 5 | 110000 |
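The rows shown above already suggest why a straight line fits poorly. As a quick check, using only the five rows displayed by `dataset.head()`, the raise from one level to the next is not constant:

```python
import numpy as np

# Salaries for levels 1-5, copied from the dataset.head() output above
salary = np.array([45000, 50000, 60000, 80000, 110000])

# Raise from each level to the next; a linear relationship would give a constant step
step = np.diff(salary).tolist()
print(step)  # [5000, 10000, 20000, 30000]
```

The step grows with the level, which is the pattern a polynomial term is meant to capture.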
X = dataset.iloc[:, 1:-1].values # Level
y = dataset.iloc[:, -1].values
X[:5]
array([[1],
[2],
[3],
[4],
[5]])
from sklearn.linear_model import LinearRegression
lin_regressor = LinearRegression()
lin_regressor.fit(X, y)
LinearRegression()
We will create a new matrix, with the polynomial terms and then apply Multiple Linear Regression to it.
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)
X_poly[:5]
array([[ 1., 1., 1.],
[ 1., 2., 4.],
[ 1., 3., 9.],
[ 1., 4., 16.],
[ 1., 5., 25.]])
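The columns above are simply the powers x^0, x^1, and x^2 of the level. As a sketch using only NumPy (this is not a claim about what scikit-learn does internally, just an equivalent result for a single feature), `np.vander` builds the same matrix:

```python
import numpy as np

# First five Level values from the dataset
levels = np.array([1, 2, 3, 4, 5])

# increasing=True orders the columns as x^0, x^1, x^2
X_poly_manual = np.vander(levels, N=3, increasing=True).astype(float)
print(X_poly_manual)
```

The first column of ones plays the role of the intercept term, the second is the original feature, and the third is its square.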
lin_regressor_2 = LinearRegression()
lin_regressor_2.fit(X_poly, y)
LinearRegression()
plt.figure(figsize=(14, 10), dpi=120)
plt.grid(linestyle="--", alpha=0.7)
plt.scatter(X, y, color="red")
plt.plot(X, lin_regressor.predict(X))
plt.title("Truth or Bluff - Linear Regression", fontsize=14)
plt.xlabel("Position Level", fontsize=14)
plt.ylabel("Salary", fontsize=14)
plt.show()
plt.figure(figsize=(14, 10), dpi=120)
plt.grid(linestyle="--", alpha=0.7)
plt.scatter(X, y, color="red")
plt.plot(X, lin_regressor_2.predict(X_poly))
plt.title("Truth or Bluff - Polynomial Regression w/ n = 2", fontsize=14)
plt.xlabel("Position Level", fontsize=14)
plt.ylabel("Salary", fontsize=14)
plt.show()
poly_reg = PolynomialFeatures(degree=3)
X_poly = poly_reg.fit_transform(X)
lin_regressor_3 = LinearRegression()
lin_regressor_3.fit(X_poly, y)
LinearRegression()
plt.figure(figsize=(14, 10), dpi=120)
plt.grid(linestyle="--", alpha=0.7)
plt.scatter(X, y, color="red")
plt.plot(X, lin_regressor_3.predict(X_poly))
plt.title("Truth or Bluff - Polynomial Regression w/ n = 3", fontsize=14)
plt.xlabel("Position Level", fontsize=14)
plt.ylabel("Salary", fontsize=14)
plt.show()
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
lin_regressor_4 = LinearRegression()
lin_regressor_4.fit(X_poly, y)
LinearRegression()
plt.figure(figsize=(14, 10), dpi=120)
plt.grid(linestyle="--", alpha=0.7)
plt.scatter(X, y, color="red")
plt.plot(X, lin_regressor_4.predict(X_poly))
plt.title("Truth or Bluff - Polynomial Regression w/ n = 4", fontsize=14)
plt.xlabel("Position Level", fontsize=14)
plt.ylabel("Salary", fontsize=14)
plt.show()
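# Finer grid of levels (step 0.1) so the fitted curve is drawn smoothly
X_grid = np.arange(X.min(), X.max(), 0.1)
X_grid = X_grid.reshape(-1, 1)  # column vector, as scikit-learn expects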
plt.figure(figsize=(14, 10), dpi=120)
plt.grid(linestyle="--", alpha=0.7)
plt.scatter(X, y, color="red")
plt.plot(X_grid, lin_regressor_4.predict(poly_reg.transform(X_grid)))  # poly_reg is already fitted; only transform
plt.title("Truth or Bluff - Polynomial Regression w/ n = 4", fontsize=14)
plt.xlabel("Position Level", fontsize=14)
plt.ylabel("Salary", fontsize=14)
plt.show()
lin_regressor.predict([[6.5]])
array([330378.78787879])
lin_regressor_4.predict(poly_reg.transform([[6.5]]))
array([158862.45265158])
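In the cells above, `poly_reg` is reassigned for each degree, so the transformer and the regressor have to be kept in sync by hand. One possible way to avoid that, sketched here with scikit-learn's `make_pipeline`, is to bundle both steps into one object. The names `X_head`, `y_head`, and `model` are illustrative, the data is only the five rows shown by `dataset.head()`, and degree 2 is an arbitrary choice for the sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# First five rows of the dataset, as shown by dataset.head()
X_head = np.array([[1], [2], [3], [4], [5]])
y_head = np.array([45000, 50000, 60000, 80000, 110000])

# The pipeline expands the features and fits the regression in one object,
# so the degree used for prediction always matches the one used for fitting
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_head, y_head)

# Salary estimate between Level 4 and Level 5
pred = model.predict([[4.5]])
print(pred)
```

With this arrangement, `model.predict` accepts raw levels directly, and there is no separate transform call to forget or mismatch.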