Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


Data Preparation

class CustomDataset(Dataset):
    def __init__(self, x, y):
        self.x = [torch.as_tensor(s).float() for s in x]
        self.y = torch.as_tensor(y).float().view(-1, 1)

    def __getitem__(self, index):
        return (self.x[index], self.y[index])

    def __len__(self):
        return len(self.x)

train_var_data = CustomDataset(var_points, var_directions)

But this is not enough; if we create a data loader for our custom dataset and try to retrieve a mini-batch out of it, it will raise an error:

train_var_loader = DataLoader(train_var_data, batch_size=16, shuffle=True)
next(iter(train_var_loader))

Output

-----------------------------------------------------------------
RuntimeError                    Traceback (most recent call last)
      1 train_var_loader = DataLoader(train_var_data, batch_size=16,
        shuffle=True)
----> 2 next(iter(train_var_loader))
...
RuntimeError: stack expects each tensor to be equal size, but got
[3, 2] at entry 0 and [4, 2] at entry 2

It turns out, the data loader is trying to stack() together the sequences, which, as we know, have different sizes and thus cannot be stacked together.

We could simply pad all the sequences and move on with a TensorDataset and a regular data loader. But, in that case, the final hidden states would be affected by the padded data points, as we've already discussed.
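To see both halves of this trade-off in isolation, here is a minimal sketch (using toy zero-filled sequences, not the actual var_points data) of why the default collation fails and what padding would do instead:

```python
import torch
from torch.nn.utils import rnn as rnn_utils

# Two toy sequences of different lengths (3 and 4 points, each in 2-D)
seqs = [torch.zeros(3, 2), torch.zeros(4, 2)]

# torch.stack() requires tensors of equal shape, so the data loader's
# default collation raises the error we just saw:
try:
    torch.stack(seqs)
except RuntimeError as e:
    print(e)  # stack expects each tensor to be equal size...

# Padding makes the sequences stackable, but the extra zero rows would
# still flow through the RNN and affect its final hidden states:
padded = rnn_utils.pad_sequence(seqs, batch_first=True)
print(padded.shape)  # torch.Size([2, 4, 2])
```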

We can do better than that: We can pack our mini-batches using a collate function.

Collate Function

The collate function takes a list of tuples (sampled from a dataset using its __getitem__()) and collates them into a batch that's being returned by the data loader. It gives you the ability to manipulate the sampled data points in any way you want to make them into a mini-batch.
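To make that contract concrete, the default behavior for equal-sized samples is roughly equivalent to stacking each field — a simplified sketch (naive_collate is a hypothetical name, not PyTorch's actual default implementation):

```python
import torch

# A collate function's contract: a list of (x, y) tuples goes in,
# a ready-made mini-batch comes out.
def naive_collate(batch):
    xs = torch.stack([item[0] for item in batch])
    ys = torch.stack([torch.as_tensor(item[1]) for item in batch])
    return xs, ys

# Two equal-length samples collate just fine this way:
samples = [(torch.zeros(4, 2), torch.tensor(0.)),
           (torch.ones(4, 2), torch.tensor(1.))]
x, y = naive_collate(samples)
print(x.shape)  # torch.Size([2, 4, 2])
print(y.shape)  # torch.Size([2])
```

Variable-length sequences break this stacking step, which is exactly why we need a custom collate function that packs instead.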

In our case, we'd like to get all sequences (the first item in every tuple) and pack them. Besides, we can get all labels (the second item in every tuple) and make them into a tensor that's in the correct shape for our binary classification task:

Data Preparation

def pack_collate(batch):
    X = [item[0] for item in batch]
    y = [item[1] for item in batch]
    X_pack = rnn_utils.pack_sequence(X, enforce_sorted=False)

    return X_pack, torch.as_tensor(y).view(-1, 1)

Let's see the function in action by creating a dummy batch of two elements and applying the function to it:

# list of tuples returned by the dataset
dummy_batch = [train_var_data[0], train_var_data[1]]
dummy_x, dummy_y = pack_collate(dummy_batch)
dummy_x
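Once the collate function works, it can be handed to the data loader through its collate_fn argument, so every mini-batch comes out already packed. A self-contained sketch — TinyVarDataset and its zero-filled sequences are stand-ins for the train_var_data built earlier, used here only so the example runs on its own:

```python
import torch
from torch.nn.utils import rnn as rnn_utils
from torch.utils.data import DataLoader, Dataset

def pack_collate(batch):
    X = [item[0] for item in batch]
    y = [item[1] for item in batch]
    X_pack = rnn_utils.pack_sequence(X, enforce_sorted=False)
    return X_pack, torch.as_tensor(y).view(-1, 1)

# Stand-in dataset with three variable-length sequences of 2-D points
class TinyVarDataset(Dataset):
    def __init__(self):
        self.x = [torch.zeros(3, 2), torch.zeros(4, 2), torch.zeros(2, 2)]
        self.y = [0., 1., 0.]
    def __getitem__(self, index):
        return (self.x[index], self.y[index])
    def __len__(self):
        return len(self.x)

# collate_fn replaces the default stacking with our packing function
loader = DataLoader(TinyVarDataset(), batch_size=3, collate_fn=pack_collate)
x_batch, y_batch = next(iter(loader))
print(type(x_batch).__name__)  # PackedSequence
print(y_batch.shape)           # torch.Size([3, 1])
```

No error this time: the mini-batch arrives as a PackedSequence, which an RNN can consume directly, and the labels come out in the (N, 1) shape our binary classification loss expects.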

