
I've created an LSTM in PyTorch and I need it to handle a variable sequence length. The following is my code:

import torch
import torch.nn as nn

class Seq2SeqSingle(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, in_features, out_features):
        super(Seq2SeqSingle, self).__init__()
        self.out_features = out_features
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size

        self.fc_i = nn.Linear(input_size, out_features)
        self.fc_o = nn.Linear(out_features, input_size)
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)
        self.fc_0 = nn.Linear(128*11, out_features)         ## <----------- LOOK HERE
        self.fc_1 = nn.Linear(out_features, out_features)

    def forward(self, x):
        output = self.fc_i(torch.relu(x))
        output = self.fc_o(torch.relu(output))

        # initial hidden and cell states: (num_layers, batch, hidden_size), created on the input's device
        h_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        output, (h_out, c_out) = self.lstm(output, (h_0, c_0))
        # flatten all time steps: (batch, seq_len * hidden_size)
        output = output.reshape(x.size(0), -1)
        output = self.fc_0(torch.relu(output))
        output = self.fc_1(torch.relu(output))
        output = nn.functional.softmax(output, dim=1)
        return output

In order to match the size of the output of the LSTM layer I need to multiply 128 (the hidden size) by 11 (the sequence length). Obviously, if I change the sequence length it crashes. How can I avoid specifying this fixed size?

  • Usually, people will use the last hidden states instead of flattening all hidden states for the next layer. If you are concerned about losing information from early steps, you can take an aggregation of all hidden states by mean, sum, or weighted sum (attention).
    – joe32140
    Commented Dec 6, 2022 at 16:10
  • @joe32140 How can I do that? ("use the last hidden states instead of flattening all hidden states for the next layer")
    Commented Dec 7, 2022 at 6:46
  • It looks like you're trying to classify input sequences, i.e. assign a single label to a given input. Can you please confirm this in your question?
    – kmkurn
    Commented Dec 7, 2022 at 9:29
  • The output is (N, L, D * H_out) when batch_first=True, so you can do last_hidden = output[:, -1, :]. Note that if you padded the sequences, taking the last step's hidden state might not be the best method.
    – joe32140
    Commented Dec 7, 2022 at 15:46
  • The length might change, but the size of D * H_out does not depend on the sequence length. last_hidden = output[:, -1, :] means you take only the hidden state of the last step.
    – joe32140
    Commented Dec 7, 2022 at 19:27

1 Answer


You can't avoid giving the linear layer a fixed input size. However, as joe32140 said in a comment, a common approach is to take only the hidden state of the last step as input for the linear layer, so that its input size is hidden_size and no longer depends on the number of steps.
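A minimal sketch of that change, assuming the hidden size of 128 from the question; the class name and layer sizes below are illustrative, not from the original code:

import torch
import torch.nn as nn

class Seq2SeqLastStep(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, out_features):
        super().__init__()
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        # the classifier input is hidden_size, independent of the sequence length
        self.fc_0 = nn.Linear(hidden_size, out_features)
        self.fc_1 = nn.Linear(out_features, out_features)

    def forward(self, x):
        # output: (batch, seq_len, hidden_size); h_0 and c_0 default to zeros
        output, (h_n, c_n) = self.lstm(x)
        last_hidden = output[:, -1, :]            # (batch, hidden_size)
        out = self.fc_0(torch.relu(last_hidden))
        out = self.fc_1(torch.relu(out))
        return nn.functional.softmax(out, dim=1)

Any sequence length now works with the same model:

model = Seq2SeqLastStep(input_size=4, hidden_size=128, num_layers=2, out_features=10)
print(model(torch.randn(8, 11, 4)).shape)   # torch.Size([8, 10])
print(model(torch.randn(8, 25, 4)).shape)   # torch.Size([8, 10])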
