
Questions tagged [sequence-modeling]

Questions about modeling sequential data, e.g. for audio analysis or time-series prediction.

2 votes
1 answer
34 views

Recurrent architectures such as LSTMs and GRUs were originally designed to address the vanishing gradient problem and capture long-range dependencies in sequential data. However, in recent years ...
Avalon Brooks
0 votes
0 answers
32 views

[fig-1] Notation & Assumptions: assume that for $t=1$ we feed the zero vector as $a^{<0>}$; $\hat{X}_i$ is the $i$th token, where $\hat{X}_i \in \mathbb{R}^{2}$; $a^{<t>[L]}_{j}$ is the $j$th node or ...
Rambal heart remo
1 vote
1 answer
162 views

I'm designing a neural network that takes input of shape (batch_size, seq_len, features), where seq_len can vary between samples....
bliu • 11
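A common way to handle the variable `seq_len` asked about above is to pad every sequence in a batch to the batch maximum and carry a boolean mask so downstream layers can ignore the padded steps. A minimal NumPy sketch (the function name and zero-padding convention are illustrative assumptions, not from the question):

```python
import numpy as np

def pad_batch(seqs, pad_value=0.0):
    """Pad a list of (seq_len_i, features) arrays to a common length.

    Returns:
        batch: (batch_size, max_len, features) padded array
        mask:  (batch_size, max_len) boolean array, True at real timesteps
    """
    max_len = max(s.shape[0] for s in seqs)
    features = seqs[0].shape[1]
    batch = np.full((len(seqs), max_len, features), pad_value, dtype=np.float32)
    mask = np.zeros((len(seqs), max_len), dtype=bool)
    for i, s in enumerate(seqs):
        batch[i, : s.shape[0]] = s
        mask[i, : s.shape[0]] = True
    return batch, mask

seqs = [np.ones((3, 2)), np.ones((5, 2))]
batch, mask = pad_batch(seqs)
# batch.shape == (2, 5, 2); mask[0] is [True, True, True, False, False]
```

In frameworks like PyTorch the same idea is packaged as `pad_sequence` plus `pack_padded_sequence`, but the mask-alongside-padding pattern is the core of it.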
0 votes
0 answers
53 views

I would like to generate sequences of tuples using a neural network algorithm such that the model trains on a dataset of sequences of tuples and generates synthetic sequences of tuples. Each tuple ...
Ben Bost • 101
1 vote
2 answers
524 views

I was recently brushing up on my deep-learning basics and came back to RNNs. LSTMs/GRUs and the Transformer architecture were invented to solve the RNN's vanishing/exploding-gradient problem. I was at ...
Vladislav Korecký
0 votes
0 answers
53 views

I have a time series forecasting problem whose data consists of date, item number, and quantity columns. I have defined a function which takes as input a data frame and a forecasting period (Daily, Weekly, Monthly, ...
Rohit • 1
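For a date/item/quantity frame like the one described above, the usual first step is to aggregate quantity per item at the requested frequency before forecasting. A pandas sketch, assuming hypothetical column names `date`, `item_no`, `quantity` (the question's real schema is not shown):

```python
import pandas as pd

# Map the question's period labels to pandas offset aliases (an assumption).
FREQ = {"Daily": "D", "Weekly": "W", "Monthly": "M"}

def aggregate(df, period):
    """Resample total quantity per item at the requested frequency."""
    return (df.set_index("date")
              .groupby("item_no")["quantity"]
              .resample(FREQ[period])
              .sum()
              .reset_index())

df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-09"]),
    "item_no": [1, 1, 1],
    "quantity": [2, 3, 5],
})
weekly = aggregate(df, "Weekly")
# two weekly buckets for item 1, quantities 5 and 5
```

The forecasting model itself (per item or global) would then consume the regularly-spaced output of this step.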
1 vote
1 answer
619 views

Is it theoretically possible to use a transformer architecture to autoregressively generate a sequence of embedding vectors, instead of discrete tokens? For example, if I were to provide an input of a ...
Theo Coombes
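On the question above: nothing in an autoregressive decoder loop requires discrete tokens; you can feed each output vector straight back in as the next input. A toy sketch of that continuous loop, with an arbitrary linear-plus-tanh map standing in for a trained model (purely illustrative; a real transformer would attend over the whole prefix, and training such a model well is the hard part):

```python
import numpy as np

def generate_embeddings(step_fn, prompt, n_steps):
    """Autoregressively extend a sequence of embedding vectors.

    step_fn: maps the sequence so far, shape (t, d), to the next vector (d,)
    prompt:  (t0, d) array of seed embeddings
    """
    seq = list(prompt)
    for _ in range(n_steps):
        seq.append(step_fn(np.stack(seq)))   # feed outputs back in
    return np.stack(seq)

d = 4
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d)) * 0.1
step_fn = lambda seq: np.tanh(seq[-1] @ W)   # stand-in "model"
out = generate_embeddings(step_fn, rng.normal(size=(2, d)), n_steps=3)
# out has 2 prompt vectors + 3 generated ones
```

The practical caveat, often raised with this idea, is error accumulation: regression targets in continuous space drift in a way that discrete sampling does not.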
1 vote
2 answers
2k views

I have been looking for the answer in other questions, but none tackled this. How is the padding mask accounted for in the formula of attention? The attention formula taking into ...
Daviiid • 605
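On the padding-mask question above: in standard implementations the mask enters additively, as large negative values on the masked score positions before the softmax, so padded keys receive (near-)zero attention weight. A minimal NumPy sketch of that convention:

```python
import numpy as np

def masked_attention(Q, K, V, key_mask):
    """Scaled dot-product attention with a key padding mask.

    key_mask: (n_keys,) boolean, True for real tokens, False for padding.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_q, n_k)
    scores = np.where(key_mask, scores, -1e9)         # mask padded keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(2, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
mask = np.array([True, True, True, False])  # last key is padding
out, w = masked_attention(Q, K, V, mask)
# w[:, 3] is ~0: the padded position gets no attention
```

This is the same mechanism exposed as `src_key_padding_mask` in PyTorch's transformer layers; the `-1e9` (or `-inf`) stands in for "this position does not exist".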
2 votes
1 answer
180 views

Hadamard defines (Well-posed problem (Wikipedia)) a well-posed problem as one for which: a solution exists, the solution is unique, the solution depends continuously on the data (e.g. it is stable) ...
aren't eistert
1 vote
0 answers
76 views

I am practicing machine translation using a seq2seq model (more specifically, with GRU/LSTM units). The following is my first model: This model first achieved about a 0.03 accuracy score and gradually ...
Đạt Trần
4 votes
1 answer
1k views

As far as I know, attention was first introduced in Learning To Align And Translate. There, the core mechanism which is able to disregard the sequence length, is a dynamically-built matrix, of shape ...
Gulzar • 799
-1 votes
1 answer
293 views

While trying to understand transformers by reading "Attention Is All You Need", I noticed the authors constantly refer to "self attention" without explaining it. The original attention ...
Gulzar • 799
0 votes
0 answers
342 views

When exploring the Twitter Sentiment Analysis dataset on Kaggle, I came up with a model that looks like this: ...
Tran Khanh
1 vote
1 answer
128 views

I'm new to LSTMs, and I'm trying to do a basic timeseries prediction using stock prices. However, I'm a bit confused as to how the LSTM is supposed to remember outputs from previous timesteps when it ...
Krusty the Clown
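On the "how does the LSTM remember previous timesteps" question above: the recurrence itself carries the memory; at every step the cell receives its own previous hidden (and cell) state alongside the new input. A stripped-down recurrent loop makes the threaded state visible (a vanilla RNN cell rather than the full LSTM gates, purely to show the mechanism):

```python
import numpy as np

def run_rnn(xs, W_x, W_h, b):
    """Unroll a vanilla RNN: h_t = tanh(x_t W_x + h_{t-1} W_h + b)."""
    h = np.zeros(W_h.shape[0])        # h_0: initial state
    states = []
    for x in xs:                      # one iteration per timestep
        h = np.tanh(x @ W_x + h @ W_h + b)   # h carries the past forward
        states.append(h)
    return np.stack(states)           # (T, hidden)

rng = np.random.default_rng(2)
T, d_in, d_h = 6, 3, 5
xs = rng.normal(size=(T, d_in))
W_x = rng.normal(size=(d_in, d_h)) * 0.5
W_h = rng.normal(size=(d_h, d_h)) * 0.5
b = np.zeros(d_h)
H = run_rnn(xs, W_x, W_h, b)
# H[t] depends on all of xs[:t+1] through the carried state h
```

An LSTM replaces the single `tanh` update with gated updates to an additional cell state, which is what lets the influence of early inputs survive many steps.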
0 votes
2 answers
260 views

I need to predict a binary vector given a sequential dataset, meaning the current data point depends on its predecessors as well as its (known) successors. So it looks something like this: Given the ...
toom • 101
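When both predecessors and successors are known at prediction time, as in the question above, one baseline (short of a bidirectional RNN) is to hand each timestep a window of past and future context as its features. A NumPy sketch, with zero-padding at the sequence boundaries (the function name and window convention are illustrative):

```python
import numpy as np

def context_windows(x, k):
    """For each timestep t, gather [x[t-k], ..., x[t], ..., x[t+k]] as one
    feature row, zero-padding at the boundaries, so a per-step classifier
    can see both past and future context."""
    T, d = x.shape
    padded = np.vstack([np.zeros((k, d)), x, np.zeros((k, d))])
    return np.stack([padded[t : t + 2 * k + 1].ravel() for t in range(T)])

x = np.arange(10, dtype=float).reshape(5, 2)   # toy sequence, T=5, d=2
feats = context_windows(x, k=1)
# each of the 5 rows holds previous, current, and next timestep features
```

A bidirectional LSTM/GRU generalizes this by replacing the fixed window with learned forward and backward states.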
