
Questions tagged [efficiency]

For questions about the efficiency of ML/AI algorithms solving a particular problem.

1 vote
0 answers
69 views

Let $A$ be an integer matrix of size $n\times t$ and $B$ an integer matrix of size $t\times m$, and let the maximum entry of $A,B$ in absolute value be of $b$ bits. If we can multiply $A,B$ in say $\leq100(n+m)tb(\...
Justaperson
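For reference on the cost model in the question above: the schoolbook algorithm uses exactly $n\cdot t\cdot m$ integer multiplications, and each entry of the product of two $b$-bit matrices is a sum of $t$ products of two $b$-bit integers, so it fits in about $2b + \lceil\log_2 t\rceil$ bits. A minimal sketch of these two counts (function names are made up for illustration):

```python
import math
import random

def schoolbook_multiplies(n, t, m):
    """Number of integer multiplications in the schoolbook product
    of an n x t matrix with a t x m matrix."""
    return n * t * m

def product_entry_bits(t, b):
    """Bit-length bound for one entry of the product: a sum of t
    products of two b-bit integers is at most t * (2**b - 1)**2."""
    return 2 * b + math.ceil(math.log2(t))

def matmul(A, B):
    """Naive exact integer matrix product."""
    n, t, m = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(t)) for j in range(m)]
            for i in range(n)]

n, t, m, b = 4, 8, 3, 5
rand_entry = lambda: random.randint(-(2**b - 1), 2**b - 1)
A = [[rand_entry() for _ in range(t)] for _ in range(n)]
B = [[rand_entry() for _ in range(m)] for _ in range(t)]
C = matmul(A, B)  # every entry of C respects the bit bound above
```

Multiplying per-entry bit cost by the multiplication count recovers the general $n\cdot t\cdot m\cdot b$ shape of bounds like the one quoted.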
0 votes
0 answers
26 views

I am training a Graph Neural Network for inductive link prediction. The final objective is to predict links for unseen nodes. My neural network follows the general GraphSAGE pipeline, but I have ...
WYSIWYG • 101
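For context on the pipeline mentioned above: GraphSAGE embeds a node by aggregating its neighbours' features (e.g. a mean) and combining the result with the node's own features, which is what makes it inductive; an unseen node can be embedded from its neighbourhood alone. A toy mean-aggregator layer (weights are random and untrained; all names illustrative):

```python
import random

def mean_aggregate(node, features, neighbors):
    """Mean of the neighbours' feature vectors (zero vector if isolated)."""
    dim = len(next(iter(features.values())))
    nbrs = neighbors.get(node, [])
    if not nbrs:
        return [0.0] * dim
    return [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]

def sage_layer(node, features, neighbors, W_self, W_nbr):
    """One GraphSAGE-style step: combine self features with the
    aggregated neighbour features, then apply a ReLU."""
    h_self = features[node]
    h_nbr = mean_aggregate(node, features, neighbors)
    out = [sum(W_self[j][i] * h_self[i] for i in range(len(h_self))) +
           sum(W_nbr[j][i] * h_nbr[i] for i in range(len(h_nbr)))
           for j in range(len(W_self))]
    return [max(0.0, x) for x in out]

# tiny graph: "new" is an unseen node embedded purely from its neighbourhood
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "new": [0.5, 0.5]}
neighbors = {"new": ["a", "b"]}
W = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
h = sage_layer("new", features, neighbors, W, W)
```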
1 vote
0 answers
59 views

Intro Recently, in an effort to find new, MCU-suitable training algorithms (for a NN library I'm developing), I came up with a trick (which I doubt I'm the first to discover). A ...
Giorgos Xou
3 votes
1 answer
129 views

The legacy LLMs have so much more compute power than DeepSeek, yet they are comparable. If the efficiencies of DeepSeek were applied to the models that have significantly more compute power, would that ...
Joe • 133
0 votes
1 answer
237 views

As I understand it, neural networks have a slow training phase (quadratic or cubic time) and a fast (linear-time) inference phase. Also, the slow training phase comes from the requirement of doing ...
user172776
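For concreteness on the question above: one forward pass through a dense layer with $n_{in}$ inputs and $n_{out}$ outputs costs about $n_{in}\cdot n_{out}$ multiply-adds, i.e. linear in the parameter count, while training repeats a forward plus backward pass (the backward pass costing roughly twice the forward pass) over every example, every epoch. A rough cost sketch (all names and the backward factor are illustrative assumptions):

```python
def forward_flops(layer_sizes):
    """Multiply-adds for one forward pass through a dense MLP,
    i.e. roughly the number of weights."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

def training_flops(layer_sizes, n_examples, n_epochs, backward_factor=2):
    """Rough training cost: forward + backward (~2x forward)
    per example, per epoch."""
    per_example = forward_flops(layer_sizes) * (1 + backward_factor)
    return per_example * n_examples * n_epochs

sizes = [784, 128, 10]                 # an MNIST-sized MLP, for scale
inference_cost = forward_flops(sizes)  # one prediction
train_cost = training_flops(sizes, n_examples=60_000, n_epochs=10)
```

The gap between the two numbers, a factor of (1 + backward_factor) × examples × epochs, is where "training is slow, inference is fast" comes from, independent of the exact asymptotic exponent.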
0 votes
1 answer
154 views

I've seen it stated multiple times that LLMs have much worse data efficiency than humans (i.e. they require more data to reach the same or worse performance), e.g. this Tweet by Yann LeCun, or 19:30 in this talk ...
Jake Levi • 101
1 vote
0 answers
56 views

I am working on dynamical systems using Optimal Control theory and am trying to find the connection between this field and Machine Learning. Consider a simple 2-layer Neural Network (NN) where the ...
Mehdi Moghadasian
0 votes
1 answer
83 views

Suppose a certain task T is solved by a non-learning-based method A (let's say, an optimization-based approach), and we now train a machine learning model B (let's say a neural network) on the same task. What ...
GSH • 1
3 votes
1 answer
471 views

For part of a paper I am writing on Clinical Decision Support Systems (computer-aided medical decision making, e.g. diagnosis, treatment), I am trying to compare Expert Systems with systems based on ...
Chris • 25
2 votes
0 answers
139 views

In NLP and sequence-modeling problems, the Transformer architectures based on the self-attention mechanism (proposed in Attention Is All You Need) have achieved impressive results and now are ...
spiridon_the_sun_rotator
2 votes
0 answers
110 views

I noticed that there are many studies in recent years on how to train/update neural networks faster with equal or better performance. I have found the following methods (apart from the chip arms race): ...
Lerner Zhang • 1,065
4 votes
2 answers
2k views

In the Attention is all you need paper, on the 4th page, we have equation 1, which describes the self-attention mechanism of the transformer architecture $$ \text { Attention }(Q, K, V)=\operatorname{...
Uğurcan Özalp
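The excerpt's equation is cut off by the listing; for reference, equation 1 on page 4 of Attention Is All You Need reads

$$
\text{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$

where $Q$, $K$, $V$ are the query, key, and value matrices and $d_k$ is the key dimension used for scaling.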
1 vote
0 answers
148 views

Training neural networks takes a while. My question is: how efficient is a neural network once it is completely trained (assuming it's not a model that is constantly learning)? I understand that this is ...
Anton • 111
5 votes
2 answers
451 views

I have just started to study reinforcement learning and, as far as I understand, existing algorithms search for the optimal solution/policy but do not allow the programmer to ...
Cristian M
2 votes
0 answers
170 views

In ML we often have to store a huge number of values ranging from 0 to 1, mostly probabilities. The most common representation seems to be floating point. Indeed, the range of ...
Rustam • 471
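One alternative the question above gestures at: since probabilities live in $[0, 1]$, they can be stored in 8-bit fixed point (256 evenly spaced levels) instead of a 32-bit float, at the cost of a worst-case rounding error of $1/510 \approx 0.002$. A minimal round-trip sketch (helper names are made up):

```python
def to_u8(p):
    """Quantize a probability in [0, 1] to one byte (256 levels)."""
    assert 0.0 <= p <= 1.0
    return round(p * 255)

def from_u8(q):
    """Dequantize one byte back to a float in [0, 1]."""
    return q / 255

probs = [0.0, 0.25, 0.5, 0.999, 1.0]
encoded = bytes(to_u8(p) for p in probs)   # 5 bytes vs 20 for float32
decoded = [from_u8(q) for q in encoded]
# round-trip error is at most half a quantization step, 1/(2*255)
```

Whether 8 bits is enough depends on the use; the same scheme extends to 16-bit fixed point when probabilities near 0 or 1 must be resolved more finely.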
