Chapter 3
Predicting house price using linear regression
Now that we have the basics covered, let us apply these concepts to a real dataset.
We will consider the Boston housing price dataset (http://lib.stat.cmu.edu/datasets/boston) collected by Harrison and Rubinfeld in 1978. The dataset contains 506 sample cases. Each house is described by 14 attributes:
• CRIM – per capita crime rate by town
• ZN – proportion of residential land zoned for lots over 25,000 sq. ft.
• INDUS – proportion of non-retail business acres per town
• CHAS – Charles River dummy variable (1 if tract bounds river; 0 otherwise)
• NOX – nitric oxide concentration (parts per 10 million)
• RM – average number of rooms per dwelling
• AGE – proportion of owner-occupied units built prior to 1940
• DIS – weighted distances to five Boston employment centers
• RAD – index of accessibility to radial highways
• TAX – full-value property-tax rate per $10,000
• PTRATIO – pupil-teacher ratio by town
• B – 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
• LSTAT – percentage of lower status citizens in the population
• MEDV – median value of owner-occupied homes in $1,000s
The first 13 attributes are the input features; MEDV is the label we want to predict.
The authors and Packt Publishing do not endorse the historical use of the racially based attribute presented above. Note that this is a classical dataset, and the attributes are listed as in the original work by Harrison and Rubinfeld (1978).
We will use the TensorFlow Estimator API (tf.estimator) to build the linear regression model.
1. Import the required modules:
import tensorflow as tf
import pandas as pd
from tensorflow import feature_column as fc
from tensorflow.keras.datasets import boston_housing
2. Download the dataset:
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()
3. Now let us name the features in our data and, for easier processing and visualization, convert the arrays into pandas DataFrames:
features = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS',
            'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
x_train_df = pd.DataFrame(x_train, columns=features)
x_test_df = pd.DataFrame(x_test, columns=features)
y_train_df = pd.DataFrame(y_train, columns=['MEDV'])
y_test_df = pd.DataFrame(y_test, columns=['MEDV'])
x_train_df.head()
4. For now we use all 13 features and create one numeric feature column for each. We suggest that you also check the correlation between each feature and the label MEDV, keep only the most informative features, and repeat the experiment:
feature_columns = []
for feature_name in features:
    # One float32 numeric column per input feature
    feature_columns.append(fc.numeric_column(feature_name, dtype=tf.float32))
5. Next we create the input function for the estimator. estimator_input_fn returns a function that, when called by the Estimator, builds a tf.data.Dataset yielding (features, labels) tuples in batches. We use it to create train_input_fn and val_input_fn:
def estimator_input_fn(df_data, df_label, epochs=10, shuffle=True, batch_size=32):
    def input_function():
        # Pair a dict of feature columns with the labels, one row per element
        ds = tf.data.Dataset.from_tensor_slices((dict(df_data), df_label))
        if shuffle:
            ds = ds.shuffle(100)  # shuffle buffer of 100 examples
        # Batch first, then repeat the batched dataset for the given epochs
        ds = ds.batch(batch_size).repeat(epochs)
        return ds
    return input_function

train_input_fn = estimator_input_fn(x_train_df, y_train_df)
val_input_fn = estimator_input_fn(x_test_df, y_test_df, epochs=1, shuffle=False)
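Step 4 suggests ranking the features by their correlation with MEDV before choosing a subset. The following is a minimal sketch of that check, assuming that Pearson correlation computed with pandas `DataFrame.corrwith` is an adequate first filter. The helper name `rank_features_by_correlation` and the toy values are hypothetical; in practice you would pass `x_train_df` and `y_train_df['MEDV']` instead.

```python
import pandas as pd


def rank_features_by_correlation(x_df: pd.DataFrame, y: pd.Series, k: int) -> list:
    """Return the k column names of x_df whose Pearson correlation with y
    has the largest absolute value (hypothetical helper for illustration)."""
    # corrwith computes the Pearson correlation of every column with y
    corr = x_df.corrwith(y)
    # A strong negative correlation is as informative as a strong positive one
    ranked = corr.abs().sort_values(ascending=False)
    return ranked.head(k).index.tolist()


# Toy data standing in for x_train_df and y_train_df['MEDV']
toy_x = pd.DataFrame({
    "RM": [5.0, 6.0, 6.5, 7.0, 8.0],
    "LSTAT": [30.0, 20.0, 15.0, 10.0, 5.0],
    "CHAS": [0.0, 1.0, 0.0, 1.0, 0.0],
})
toy_y = pd.Series([10.0, 20.0, 24.0, 30.0, 40.0], name="MEDV")

print(rank_features_by_correlation(toy_x, toy_y, k=2))  # ['RM', 'LSTAT']
```

Ranking by absolute value keeps features such as LSTAT, whose correlation with the label is strongly negative. Correlation only captures linear, one-feature-at-a-time relationships, so treat the ranking as a starting point for the repeated experiment rather than a final selection.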
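In estimator_input_fn the dataset is batched before it is repeated, so no batch mixes examples from two epochs, and each epoch ends with a smaller batch whenever the number of examples is not a multiple of batch_size. The plain-Python model below only illustrates that ordering: it is not tf.data, it ignores shuffling, and its function names are hypothetical. It assumes the 404 training examples that `boston_housing.load_data()` returns under its default 20% test split (an assumption about the Keras defaults, not stated in the text).

```python
import math


def batch_then_repeat(examples, batch_size, epochs):
    """Model of ds.batch(batch_size).repeat(epochs): batches never cross epochs."""
    batches = []
    for _ in range(epochs):
        # Slice one epoch into consecutive batches; the final one may be short
        for start in range(0, len(examples), batch_size):
            batches.append(examples[start:start + batch_size])
    return batches


def repeat_then_batch(examples, batch_size, epochs):
    """Model of ds.repeat(epochs).batch(batch_size): batches may straddle epochs."""
    stream = list(examples) * epochs
    return [stream[start:start + batch_size]
            for start in range(0, len(stream), batch_size)]


def steps_per_epoch(num_examples, batch_size):
    """Batches produced by one pass over the data when partial batches are kept."""
    return math.ceil(num_examples / batch_size)


examples = list(range(404))  # stand-in for the 404 training rows
b_first = batch_then_repeat(examples, batch_size=32, epochs=10)
r_first = repeat_then_batch(examples, batch_size=32, epochs=10)
print(steps_per_epoch(404, 32), len(b_first), len(b_first[12]))  # 13 130 20
print(len(r_first), len(r_first[-1]))  # 127 8
```

With batch_size=32 and epochs=10, batching first gives 13 batches per epoch (the last one holding 20 examples) and 130 batches in total; repeating first would give 127 batches, one of which straddles the boundary between epochs.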