
Regression

Let us imagine a simpler world where only the area of the house determines its price. Using regression we could determine the relationship between the area of the house (independent variable: a variable that does not depend upon any other variables) and its price (dependent variable: a variable that depends upon one or more independent variables). Later, we could use this relationship to predict the price of any house, given its area. To learn more about dependent and independent variables and how to identify them, you can refer to this post: http://www.aimldl.org/ml/dependent_independent_variables.html. In machine learning, the independent variables are normally the input to the model and the dependent variables are the output of the model.

Depending upon the number of independent variables, the number of dependent variables, and the type of relationship, there are many different types of regression. There are two important components of regression: the relationship between independent and dependent variables, and the strength of impact of the different independent variables on the dependent variables. In the following section, we will learn in detail about the widely used linear regression technique.

Prediction using linear regression

Linear regression is one of the most widely known modeling techniques. Existing for more than 200 years, it has been explored from almost all possible angles. Linear regression assumes a linear relationship between the input variable (X) and the output variable (Y). It involves finding a linear equation for the predicted value Ŷ of the form:

Ŷ = WᵀX + b

where X = {x₁, x₂, ..., xₙ} are the n input variables and W = {w₁, w₂, ..., wₙ} are the linear coefficients, with b as the bias term. The bias term allows our regression model to provide an output even in the absence of any input; it lets us shift the fitted line to better fit the data. The error between the observed value (Yᵢ) and the predicted value (Ŷᵢ) for an input sample i is:

eᵢ = Yᵢ − Ŷᵢ
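As a quick sketch of these definitions, the prediction Ŷ = WᵀX + b and the per-sample errors can be computed directly with NumPy. The numbers below are illustrative values, not data from the text:

```python
import numpy as np

# Illustrative data (made up): 4 samples, 3 input variables each
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 0.0, 1.0],
              [0.5, 1.5, 2.5],
              [3.0, 1.0, 0.0]])      # shape (4, 3): one row per sample
W = np.array([0.4, -0.2, 0.1])       # linear coefficients w1, w2, w3
b = 0.5                              # bias term

Y_hat = X @ W + b                    # predicted values, shape (4,)
Y = np.array([1.0, 1.2, 0.8, 1.6])   # observed values (made up)
e = Y - Y_hat                        # per-sample errors e_i = Y_i - Y_hat_i
```

Each entry of `e` is one eᵢ; a good choice of W and b makes these entries small across all samples.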

The goal is to find the best estimates for the coefficients W and bias b such that the error between the observed values Y and the predicted values Ŷ is minimized. Let's go through some examples in order to better understand this.
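One standard way to obtain such estimates (a sketch, not necessarily the method used later in the text) is ordinary least squares, which minimizes the sum of squared errors Σeᵢ² and can be solved in closed form with NumPy. The synthetic data below is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic noise-free data (illustrative): true W = [2.0, -1.0], true b = 0.5
X = rng.normal(size=(100, 2))
Y = X @ np.array([2.0, -1.0]) + 0.5

# Append a column of ones so the bias b is learned as one extra coefficient
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares solution minimizes the sum of squared errors sum(e_i ** 2)
coef, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
W_est, b_est = coef[:-1], coef[-1]
```

Because the synthetic data here contains no noise, the recovered `W_est` and `b_est` match the true values almost exactly; with real, noisy data they would be the best linear fit rather than an exact recovery.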
