
Chapter 3

Predicting house price using linear regression

Now that we have the basics covered, let us apply these concepts to a real dataset. We will consider the Boston housing price dataset (http://lib.stat.cmu.edu/datasets/boston) collected by Harrison and Rubinfeld in 1978. The dataset contains 506 sample cases. Each case is described by 14 attributes:

• CRIM – per capita crime rate by town
• ZN – proportion of residential land zoned for lots over 25,000 sq.ft.
• INDUS – proportion of non-retail business acres per town
• CHAS – Charles River dummy variable (1 if tract bounds river; 0 otherwise)
• NOX – nitric oxide concentration (parts per 10 million)
• RM – average number of rooms per dwelling
• AGE – proportion of owner-occupied units built prior to 1940
• DIS – weighted distances to five Boston employment centers
• RAD – index of accessibility to radial highways
• TAX – full-value property-tax rate per $10,000
• PTRATIO – pupil-teacher ratio by town
• B – 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
• LSTAT – percentage of lower status citizens in the population
• MEDV – median value of owner-occupied homes in $1,000s

The authors and Packt Publishing do not endorse the historical use of the racially-based attribute presented above. Note that this is a classical dataset, and the attributes are listed as per the original work by Harrison and Rubinfeld, 1978.

We will use the TensorFlow estimator to build the linear regression model.

1. Import the modules required:

    import tensorflow as tf
    import pandas as pd
    import tensorflow.feature_column as fc
    from tensorflow.keras.datasets import boston_housing

2. Download the dataset:

    (x_train, y_train), (x_test, y_test) = boston_housing.load_data()

3. Now let us define the features in our data and, for easy processing and visualization, convert the data into pandas DataFrames:

    features = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
    x_train_df = pd.DataFrame(x_train, columns=features)
    x_test_df = pd.DataFrame(x_test, columns=features)
    y_train_df = pd.DataFrame(y_train, columns=['MEDV'])
    y_test_df = pd.DataFrame(y_test, columns=['MEDV'])
    x_train_df.head()

4. At present we are taking all the features; we suggest that you check the correlation between the different features and the predicted label MEDV to choose the best features and repeat the experiment:

    feature_columns = []
    for feature_name in features:
        feature_columns.append(fc.numeric_column(feature_name, dtype=tf.float32))

5. We create the input function for the estimator. The function returns a tf.data.Dataset object that yields tuples of features and labels in batches. Use it to create train_input_fn and val_input_fn:

    def estimator_input_fn(df_data, df_label, epochs=10, shuffle=True, batch_size=32):
        def input_function():
            # Pair a dict of feature columns with the label column, row by row
            ds = tf.data.Dataset.from_tensor_slices((dict(df_data), df_label))
            if shuffle:
                # Shuffle using a buffer of 100 elements
                ds = ds.shuffle(100)
            ds = ds.batch(batch_size).repeat(epochs)
            return ds
        return input_function

    train_input_fn = estimator_input_fn(x_train_df, y_train_df)
    val_input_fn = estimator_input_fn(x_test_df, y_test_df, epochs=1, shuffle=False)
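Step 4 suggests checking the correlation between each feature and MEDV before choosing features. The following is a minimal sketch of one way that check could be done with pandas. The helper name rank_features_by_correlation, the 0.3 cut-off, and the small synthetic DataFrame are all assumptions made for illustration; the synthetic values are not taken from the Boston dataset and only stand in for it so that the example runs without a download.

```python
import numpy as np
import pandas as pd


def rank_features_by_correlation(x_df, y_df, label="MEDV"):
    """Rank features by absolute Pearson correlation with the label column.

    Hypothetical helper for illustration; it is not part of the original text.
    """
    # Reset the indexes so features and labels are aligned row by row.
    data = pd.concat(
        [x_df.reset_index(drop=True), y_df.reset_index(drop=True)], axis=1
    )
    # Take the label's column of the correlation matrix and drop its self-correlation.
    corr_with_label = data.corr()[label].drop(label)
    return corr_with_label.abs().sort_values(ascending=False)


# Synthetic stand-in data (NOT the Boston dataset): the label is built so that
# it rises with 'RM', falls with 'LSTAT', and ignores 'CHAS'.
rng = np.random.default_rng(0)
n = 200
demo_x = pd.DataFrame({
    "RM": rng.normal(6.0, 0.7, n),
    "LSTAT": rng.normal(12.0, 7.0, n),
    "CHAS": rng.integers(0, 2, n).astype(float),
})
demo_y = pd.DataFrame({
    "MEDV": 9.0 * demo_x["RM"] - 0.9 * demo_x["LSTAT"] + rng.normal(0.0, 1.0, n)
})

ranking = rank_features_by_correlation(demo_x, demo_y)

# Keep features whose absolute correlation exceeds an assumed cut-off of 0.3.
selected = ranking[ranking > 0.3].index.tolist()

print(ranking)
print("Selected features:", selected)
```

With the real data, the same helper could be called as rank_features_by_correlation(x_train_df, y_train_df), and the resulting list could replace features in the loop that builds feature_columns. Pearson correlation only captures linear relationships, so a low score here does not prove that a feature is uninformative.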

