Questions tagged [sequence-modeling]
Questions about the analysis of sequential data, often used to analyse audio or to predict time series.
68 questions
2 votes · 1 answer · 34 views
Why do Transformers handle long-range dependencies better than LSTMs despite lacking explicit recurrence?
Recurrent architectures such as LSTMs and GRUs were originally designed to address the vanishing gradient problem and capture long-range dependencies in sequential data. However, in recent years ...
0 votes · 0 answers · 32 views
Is this illustration an accurate representation of RNN?
[Figure 1: the RNN illustration in question]
Notation & Assumptions
Assume that for $t=1$ we give the zero vector as $a^{<0>}$.
$\hat{X}_i$: the $i$-th token, where $\hat{X}_i \in \mathbb{R}^{2}$
$a^{<t>[L]}_{j}$: the $j$-th node or ...
1 vote · 1 answer · 162 views
Ensure Monotonic Output in a Neural Network with Variable-Length Sequence Input
I'm designing a neural network that takes input of shape (batch_size, seq_len, features), where seq_len can vary between samples....
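A minimal sketch of one common trick for this kind of question, assuming "monotonic" means non-decreasing along the sequence axis: squash the network's raw outputs through a softplus so each increment is non-negative, then take a cumulative sum. The function name and shapes are illustrative, not from the question.

```python
import numpy as np

def monotonic_outputs(raw):
    # raw: (batch, seq_len) unconstrained network outputs.
    # Softplus makes every increment non-negative; a cumulative sum then
    # guarantees the output is non-decreasing along the time axis,
    # regardless of how long each sequence is.
    increments = np.log1p(np.exp(raw))   # softplus
    return np.cumsum(increments, axis=1)
```

Because the constraint is applied per step, it works unchanged for variable `seq_len`; padded positions can simply be masked out afterwards.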
0 votes · 0 answers · 53 views
Best neural network algorithms/architectures for generating synthetic sequences of tuples of words
I would like to generate sequences of tuples using a neural network algorithm such that the model trains on a dataset of sequences of tuples and generates synthetic sequences of tuples. Each tuple <...
1 vote · 2 answers · 524 views
Wouldn't residual connections in RNNs solve the vanishing/exploding gradient problem?
I was recently brushing up on my deep-learning basics and came back to RNNs. LSTMs/GRUs and the Transformer architecture were invented to solve RNN's vanishing/exploding gradient problem. I was at ...
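To make the premise of this question concrete, here is a sketch of a vanilla tanh RNN step with a residual (skip) connection on the hidden state; the function and weight names are illustrative. The `h + ...` form gives the backward pass an identity path between timesteps, which is the intuition the question probes.

```python
import numpy as np

def residual_rnn_step(h, x, Wh, Wx, b):
    # Plain tanh RNN update with a residual connection on the hidden state:
    # the gradient through `h + ...` carries an identity term across time,
    # while the path through Wh still exists and can vanish or explode.
    return h + np.tanh(Wh @ h + Wx @ x + b)
```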
0 votes · 0 answers · 53 views
Executing Multiple ML Models simultaneously on multiple cores to reduce the model building time
I have a time series forecasting problem whose data consists of date, item number, and quantity columns. I have defined a function which takes as input a data frame and a forecasting period (Daily, Weekly, Monthly,...
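A minimal sketch of the parallelization this question asks about, with hypothetical names: since each item's model is fit independently, the per-item fits can be dispatched to a worker pool. The mean stands in for the real forecasting fit.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_item(args):
    # Hypothetical per-item fit: the mean quantity stands in for the
    # forecasting model described in the question.
    item_no, quantities = args
    return item_no, sum(quantities) / len(quantities)

def fit_all_parallel(per_item_data, workers=4, executor_cls=ThreadPoolExecutor):
    # per_item_data: dict mapping item number -> list of quantities.
    # Each item's model is independent, so the fits parallelize trivially.
    # For CPU-bound fits, pass concurrent.futures.ProcessPoolExecutor instead
    # (and guard the call with `if __name__ == "__main__":`) to use real cores.
    with executor_cls(max_workers=workers) as ex:
        return dict(ex.map(fit_item, per_item_data.items()))
```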
1 vote · 1 answer · 619 views
Can transformers autoregressively generate a sequence of embeddings (instead of predictions)?
Is it theoretically possible to use a transformer architecture to autoregressively generate a sequence of embedding vectors, instead of discrete tokens?
For example, if I were to provide an input of a ...
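A sketch of the rollout the question describes, with the transformer abstracted into a stand-in `step_fn` (an assumption, not from the question): each generated embedding vector is appended to the context and fed back, exactly as with discrete tokens but with no argmax/sampling step.

```python
import numpy as np

def generate_embeddings(step_fn, start, n_steps):
    # Autoregressively roll out continuous vectors: each output embedding
    # is appended to the context and fed back as the next input.
    # step_fn maps a (t, d) context to the next (d,) embedding; in the
    # question's setting it would be a transformer decoder.
    seq = [start]
    for _ in range(n_steps):
        ctx = np.stack(seq)
        seq.append(step_fn(ctx))
    return np.stack(seq[1:])
```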
1 vote · 2 answers · 2k views
How is the padding mask incorporated in the attention formula?
I have been looking for the answer in other questions, but none of them tackled it. How is the padding mask taken into account in the attention formula?
The attention formula taking into ...
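A minimal sketch of the usual answer to this question: the padding mask is applied by setting the scores of padded key positions to a large negative value before the softmax, so those positions receive (near-)zero attention weight. Names and shapes are illustrative.

```python
import numpy as np

def masked_attention(Q, K, V, pad_mask):
    # pad_mask: (seq_k,) boolean, True where the key token is padding.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq_q, seq_k)
    scores = np.where(pad_mask[None, :], -1e9, scores)  # mask padded keys
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

After the softmax, the masked columns contribute essentially nothing, so the output is a weighted average over the real tokens only.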
2 votes · 1 answer · 180 views
Is the problem of Language Modelling a Well-Posed Learning Problem?
Hadamard defines (see Well-posed problem, Wikipedia) a well-posed problem as one for which:
a solution exists,
the solution is unique,
the solution depends continuously on the data (e.g. it is stable)
...
1 vote · 0 answers · 76 views
The model's accuracy suddenly becomes unreasonably good at the beginning of training. I need an explanation
I am practicing machine translation using seq2seq model (more specifically with GRU/LSTM units). The following is my first model:
This model first achieved an accuracy score of about 0.03 and gradually ...
4 votes · 1 answer · 1k views
Difference between dot product attention and "matrix attention"
As far as I know, attention was first introduced in Learning To Align And Translate.
There, the core mechanism which is able to disregard the sequence length, is a dynamically-built matrix, of shape ...
-1 votes · 1 answer · 293 views
Understanding self attention - How come there is no connection between different states?
While trying to understand transformers by reading Attention is all you need, I noticed the authors repeatedly refer to "self attention" without explaining it.
The original attention ...
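A sketch that may clarify the terminology behind this question: "self" attention simply means queries, keys, and values are all projections of the same sequence, so every position attends to every other position in that sequence. Names and projection shapes are illustrative.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Queries, keys, and values all come from the SAME sequence X;
    # there is no separate encoder/decoder state to connect to.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # softmax over the key axis
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V
```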
0 votes · 0 answers · 342 views
Increasing "output_sequence_length" in TextVectorization layer worsens model's performance
When exploring the Twitter Sentiment Analysis dataset on Kaggle, I came up with a model that looks like this:
...
1 vote · 1 answer · 128 views
Many To One LSTM - Can I Use the Same Sequence as Input from Previous Timesteps?
I'm new to LSTMs, and I'm trying to do a basic time-series prediction using stock prices. However, I'm a bit confused as to how the LSTM is supposed to remember outputs from previous timesteps when it ...
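For questions like this, the standard many-to-one setup builds sliding windows over the series: each training input is a window of consecutive past values and the target is the value that follows, so the "memory" lives inside the window the LSTM is given. A minimal sketch (function name is illustrative):

```python
import numpy as np

def make_windows(series, window):
    # Build (X, y) pairs for many-to-one prediction: each row of X is
    # `window` consecutive values, and y is the value immediately after.
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y
```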
0 votes · 2 answers · 260 views
Which model should I apply on sequential data?
I need to predict a binary vector given a sequential dataset, meaning the current data point depends on its predecessors as well as its (known) successors.
So, it looks something like this:
Given the ...