
I have sequential time series data. At each timestamp there is only one variable to observe (if my understanding is correct, this means the number of features = 1). I want to train a simple RNN with more than one layer to predict the next observation.

I created the training data using a sliding window, with the window size set to 8. To give a concrete idea, below are my original data, training data, and targets.

Sample Data

0.40 0.82 0.14 0.01 0.98 0.53 2.5 0.49 0.53 3.37 0.49

Training Data

X = 
    0.40 0.82 0.14 0.01 0.98 0.53 2.5 0.49 
    0.82 0.14 0.01 0.98 0.53 2.5 0.49 0.53
    0.14 0.01 0.98 0.53 2.5 0.49 0.53 3.37

corresponding targets are

Y = 
     0.53 
     3.37
     0.49
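
For reference, here is a minimal sketch of how such windows can be generated with NumPy (equivalent to the arrays above):

import numpy as np

series = np.array([0.40, 0.82, 0.14, 0.01, 0.98, 0.53, 2.5, 0.49, 0.53, 3.37, 0.49], dtype=np.float32)
window = 8

# each row of X is a window of 8 consecutive values;
# the matching entry of Y is the value that follows that window
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
Y = series[window:].reshape(-1, 1)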

I set the batch size to 3, but it gives me this error:

RuntimeError: input.size(-1) must be equal to input_size. Expected 8, got 1

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
import numpy as np

X = np.array([
        [0.40, 0.82, 0.14, 0.01, 0.98, 0.53, 2.5, 0.49],
        [0.82, 0.14, 0.01, 0.98, 0.53, 2.5, 0.49, 0.53],
        [0.14, 0.01, 0.98, 0.53, 2.5, 0.49, 0.53, 3.37]
    ], dtype=np.float32)

Y = np.array([[0.53], [3.37], [0.49]], dtype=np.float32)

class RNNModel(nn.Module):
    def __init__(self, input_sz, n_layers):
        super(RNNModel, self).__init__()
        self.hidden_dim = 3*input_sz
        self.n_layers = n_layers
        output_sz = 1
        self.rnn = nn.RNN(input_sz, self.hidden_dim, num_layers=n_layers, batch_first=True)
        self.linear = nn.Linear(self.hidden_dim, output_sz)

    def forward(self,x):
        batch_sz = x.size(0)
        hidden = torch.zeros(self.n_layers, batch_sz, self.hidden_dim) # initialize n_layers * batch_sz hidden states of dimension hidden_dim
        out, hidden = self.rnn(x, hidden)
        out = out.contiguous().view(-1, self.hidden_dim)
        return out,hidden

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = RNNModel(8,2)
X = torch.tensor(X[:,:,np.newaxis])
Y = torch.tensor(Y[:,:,np.newaxis])
X = X.to(device)
Y = Y.to(device)
model = model.to(device)
optimizer = optim.Adam(model.parameters())
loss_fn = nn.MSELoss()

loader = data.DataLoader(data.TensorDataset(X, Y), shuffle=False, batch_size=3)

n_epoch = 10
for epoch in range(n_epoch):
    model.train()
    for X_batch, Y_batch in loader:
        Y_pred = model(X_batch)
        loss = loss_fn(Y_pred,Y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if epoch % 10 != 0:
        continue
    model.eval()
    with torch.no_grad():
        Y_pred = model(X)
        train_rmse = np.sqrt(loss_fn(Y_pred,Y))
    print("Epoch %d: train RMSE %.4f" % (epoch, train_rmse))

What am I doing wrong? Can anyone help me?

3 Answers


A few suggestions to start with:

X = X.reshape((3, 8, 1)) 
X = torch.tensor(X)

This reshapes X to have the shape (batch_size, seq_len, input_size).

model = RNNModel(1,2)

With the above change, the model now takes a single feature per timestep, so input_size is 1.

out = self.linear(out[:, -1, :]) 

You should only use the output of the last timestep. Currently your forward flattens the outputs of all timesteps, so every timestep's output feeds into the prediction instead of just the last one, which can hinder your final prediction.
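
For example, the forward method could look like this (a sketch, reusing the names from your class):

    def forward(self, x):
        batch_sz = x.size(0)
        hidden = torch.zeros(self.n_layers, batch_sz, self.hidden_dim, device=x.device)
        out, hidden = self.rnn(x, hidden)   # out: (batch_sz, seq_len, hidden_dim)
        out = self.linear(out[:, -1, :])    # keep only the last timestep -> (batch_sz, 1)
        return out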

These are a few things I can suggest from just reading the code; once we run it, we might face other issues too.

  • Can you please elaborate on "You should only use the output of the last timestep"? Also, where should out = self.linear(out[:, -1, :]) be placed, and what will it do?
    – Shew
    Commented Jan 19, 2024 at 21:01

The issue actually originates from this line:

X = torch.tensor(X[:,:,np.newaxis]) 

where you are changing the shape of your input tensor X from [3, 8] to [3, 8, 1]. With batch_first=True, RNN() expects batched input in the format [N, L, Hin] (batch size, sequence length, input size). This Hin must match the input_size parameter of nn.RNN() that you defined here:

self.rnn = nn.RNN(input_sz, self.hidden_dim, num_layers=n_layers, batch_first=True)

When you initialize your model with model = RNNModel(8,2), you set the input_size of the nn.RNN() layer to 8, which does not match your input's last dimension of 1. So just update the input with:

X = torch.tensor(X[:,np.newaxis,:]) # torch.Size([3, 1, 8])
Y = torch.tensor(Y[:,np.newaxis,:]) # torch.Size([3, 1, 1])

and it should work fine.
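
A quick shape check (a sketch, assuming X is still the original (3, 8) NumPy array and model = RNNModel(8,2) as in the question):

X = torch.tensor(X[:,np.newaxis,:])  # torch.Size([3, 1, 8])
out, hidden = model(X)               # no shape error now
print(out.shape)     # torch.Size([3, 24]); hidden_dim = 3 * input_sz = 24
print(hidden.shape)  # torch.Size([2, 3, 24]); (n_layers, batch, hidden_dim)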

NOTE: You are returning both out and hidden from your forward function, which would throw the error below:

---> 12         loss = loss_fn(Y_pred,Y_batch)
AttributeError: 'tuple' object has no attribute 'size'

You should either return only out or just replace this line in your train loop:

Y_pred = model(X_batch)

with:

Y_pred, _ = model(X_batch)
  • Sequence length and input size are a bit confusing. Sequence length refers to the number of features, and input size refers to the size of the "time series" (window size?)
    – Shew
    Commented Jan 19, 2024 at 20:41
  • Yes, you're correct and that's my understanding too.
    – Ro.oT
    Commented Jan 19, 2024 at 20:48
  • documentation suggests input_size is the number of expected features... pytorch.org/docs/stable/generated/…
    – Shew
    Commented Jan 19, 2024 at 20:57

Adding to the current answers with a more modern approach.

As others have pointed out, the RNN model takes as input a tensor of size (bs, sl, n_features) when batch_first=True. To accommodate this, you need to add an extra unit axis as you have a single feature.

For your prediction, you want a unidirectional RNN that predicts the next value at every timestep. Doing next-step prediction gives you extra training data for free, even if you intend to use it for multistep prediction. We can set up the data as follows:

X = np.array( [ 
                [0.40, 0.82, 0.14, 0.01, 0.98, 0.53, 2.5, 0.49], 
                [0.82, 0.14, 0.01, 0.98, 0.53, 2.5, 0.49, 0.53], 
                [0.14, 0.01, 0.98, 0.53, 2.5, 0.49, 0.53, 3.37] 
            ], dtype=np.float32)

x = torch.from_numpy(X[:, :-1]).unsqueeze(-1) # all values except the last
y = torch.from_numpy(X[:, 1:]).unsqueeze(-1) # next step values shifted by one

Now the model. We add a linear_projection to the input just to give the model a bit more to work with. We also update the forward method to optionally take in an existing hidden state, which we will use for inference.

class RNNModel(nn.Module):
    def __init__(self, d_in, d_rnn, d_hidden, n_layers, d_output):
        super().__init__()
        
        self.linear_projection = nn.Linear(d_in, d_rnn)
        self.rnn = nn.RNN(d_rnn, d_hidden, num_layers=n_layers, batch_first=True)
        self.output_layer = nn.Linear(d_hidden, d_output)
        
        self.n_layers = n_layers
        self.d_hidden = d_hidden

    def forward(self, x, hidden=None):
        
        x = self.linear_projection(x)
        
        if hidden is None:
            hidden = self.get_hidden(x)
            
        x, hidden = self.rnn(x, hidden)
        
        x = self.output_layer(x)
        
        return x, hidden
        
    def get_hidden(self, x):
        hidden = torch.zeros(self.n_layers, x.shape[0], self.d_hidden, device=x.device)
        return hidden
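
A quick shape sanity check before training (a sketch, using the x tensor built above):

m = RNNModel(d_in=1, d_rnn=32, d_hidden=128, n_layers=2, d_output=1)
out, hidden = m(x)
print(out.shape)     # torch.Size([3, 7, 1]); one prediction per input timestep
print(hidden.shape)  # torch.Size([2, 3, 128]); (n_layers, batch, d_hidden)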

Now we train

d_in = 1
d_rnn = 32
d_hidden = 128
n_layers = 2
d_output = 1

model = RNNModel(d_in, d_rnn, d_hidden, n_layers, d_output)

optimizer = optim.Adam(model.parameters())
loss_fn = nn.MSELoss()

loader = data.DataLoader(data.TensorDataset(x, y), shuffle=False, batch_size=3)

n_epoch = 10
for epoch in range(n_epoch):
    model.train()
    for X_batch, Y_batch in loader:
        Y_pred, hidden = model(X_batch)
        loss = loss_fn(Y_pred,Y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # omitting validation, etc for brevity 

And now we can do inference

init_value = torch.tensor([0.35, 0.55, 0.2])[:,None,None] # input of size (3, 1, 1)
hidden = None
prediction_steps = 5 # number of timesteps to predict 
preds = []

input_value = init_value
with torch.no_grad():
    for i in range(prediction_steps):
        # the prediction and hidden state become the inputs for the next timestep
        input_value, hidden = model(input_value, hidden)
        preds.append(input_value)

preds = torch.cat(preds, dim=1) # predictions of size (3, 5, 1)
