
Questions tagged [efficiency]

For questions about the efficiency of ML/AI algorithms solving a particular problem.

1 vote
0 answers
69 views

Let $A$ be an integer matrix of size $n\times t$ and $B$ an integer matrix of size $t\times m$, and let the maximum entry of $A,B$ in absolute value be of $b$ bits. If we can multiply $A,B$ in say $\leq100(n+m)tb(\...
Justaperson
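For reference on the cost model in the question above: the schoolbook algorithm uses exactly $n\cdot t\cdot m$ integer multiplications, and each entry of the product of two $b$-bit matrices is a sum of $t$ products of two $b$-bit integers, so it fits in about $2b + \lceil\log_2 t\rceil$ bits. A minimal sketch of these two counts (function names are made up for illustration):

```python
import math
import random

def schoolbook_multiplies(n, t, m):
    """Number of integer multiplications in the schoolbook product
    of an n x t matrix with a t x m matrix."""
    return n * t * m

def product_entry_bits(t, b):
    """Bit-length bound for one entry of the product: a sum of t
    products of two b-bit integers is at most t * (2**b - 1)**2."""
    return 2 * b + math.ceil(math.log2(t))

def matmul(A, B):
    """Naive exact integer matrix product."""
    n, t, m = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(t)) for j in range(m)]
            for i in range(n)]

n, t, m, b = 4, 8, 3, 5
rand_entry = lambda: random.randint(-(2**b - 1), 2**b - 1)
A = [[rand_entry() for _ in range(t)] for _ in range(n)]
B = [[rand_entry() for _ in range(m)] for _ in range(t)]
C = matmul(A, B)  # every entry of C respects the bit bound above
```

Multiplying per-entry bit cost by the multiplication count recovers the general $n\cdot t\cdot m\cdot b$ shape of bounds like the one quoted.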
0 votes
0 answers
26 views

I am training a Graph Neural Network for inductive link prediction. The final objective is to predict links for unseen nodes. My neural network follows the general GraphSAGE pipeline, but I have ...
WYSIWYG • 101
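For context on the pipeline mentioned above: GraphSAGE embeds a node by aggregating its neighbours' features (e.g. a mean) and combining the result with the node's own features, which is what makes it inductive; an unseen node can be embedded from its neighbourhood alone. A toy mean-aggregator layer (weights are random and untrained; all names illustrative):

```python
import random

def mean_aggregate(node, features, neighbors):
    """Mean of the neighbours' feature vectors (zero vector if isolated)."""
    dim = len(next(iter(features.values())))
    nbrs = neighbors.get(node, [])
    if not nbrs:
        return [0.0] * dim
    return [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]

def sage_layer(node, features, neighbors, W_self, W_nbr):
    """One GraphSAGE-style step: combine self features with the
    aggregated neighbour features, then apply a ReLU."""
    h_self = features[node]
    h_nbr = mean_aggregate(node, features, neighbors)
    out = [sum(W_self[j][i] * h_self[i] for i in range(len(h_self))) +
           sum(W_nbr[j][i] * h_nbr[i] for i in range(len(h_nbr)))
           for j in range(len(W_self))]
    return [max(0.0, x) for x in out]

# tiny graph: "new" is an unseen node embedded purely from its neighbourhood
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "new": [0.5, 0.5]}
neighbors = {"new": ["a", "b"]}
W = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
h = sage_layer("new", features, neighbors, W, W)
```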
1 vote
0 answers
59 views

Intro Recently, in an effort to find new, MCU-suitable training algorithms (for a NN library I'm developing), I came up with a trick (which I doubt I'm the first to discover). A ...
Giorgos Xou
3 votes
1 answer
129 views

The legacy LLMs have so much more compute power than DeepSeek, yet they are comparable. If the efficiencies of DeepSeek were applied to the models that have significantly more compute power, would that ...
Joe • 133
0 votes
1 answer
237 views

As I understand it, neural networks have a slow training phase (quadratic or cubic time) and a fast (linear-time) inference phase. Also, the slow training phase comes from the requirement of doing ...
user172776
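For concreteness on the question above: one forward pass through a dense layer with $n_{in}$ inputs and $n_{out}$ outputs costs about $n_{in}\cdot n_{out}$ multiply-adds, i.e. linear in the parameter count, while training repeats a forward plus backward pass (the backward pass costing roughly twice the forward pass) over every example, every epoch. A rough cost sketch (all names and the backward factor are illustrative assumptions):

```python
def forward_flops(layer_sizes):
    """Multiply-adds for one forward pass through a dense MLP,
    i.e. roughly the number of weights."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

def training_flops(layer_sizes, n_examples, n_epochs, backward_factor=2):
    """Rough training cost: forward + backward (~2x forward)
    per example, per epoch."""
    per_example = forward_flops(layer_sizes) * (1 + backward_factor)
    return per_example * n_examples * n_epochs

sizes = [784, 128, 10]                 # an MNIST-sized MLP, for scale
inference_cost = forward_flops(sizes)  # one prediction
train_cost = training_flops(sizes, n_examples=60_000, n_epochs=10)
```

The gap between the two numbers, a factor of (1 + backward_factor) × examples × epochs, is where "training is slow, inference is fast" comes from, independent of the exact asymptotic exponent.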
0 votes
1 answer
154 views

I've seen it stated multiple times that LLMs have much worse data efficiency than humans (i.e. they require more data to reach the same or worse performance), e.g. this Tweet by Yann LeCun, or 19:30 in this talk ...
Jake Levi • 101
1 vote
0 answers
56 views

I am working on dynamical systems using Optimal Control theory and am trying to find the connection between this field and Machine Learning. Consider a simple 2-layer Neural Network (NN) where the ...
Mehdi Moghadasian
0 votes
1 answer
83 views

Suppose a certain task T is solved by a non-learning-based method A (let's say, an optimization-based approach), and we now train a machine learning model B (let's say a neural network) on the same task. What ...
GSH • 1
3 votes
1 answer
471 views

For part of a paper I am writing on Clinical Decision Support Systems (computer-aided medical decision making, e.g. diagnosis, treatment), I am trying to compare Expert Systems with systems based on ...
Chris • 25
2 votes
0 answers
139 views

In NLP and sequence-modeling problems, the Transformer architectures based on the self-attention mechanism (proposed in Attention Is All You Need) have achieved impressive results and now are ...
spiridon_the_sun_rotator
2 votes
0 answers
110 views

I noticed that there are many studies in recent years on how to train/update neural networks faster with equal or better performance. I have found the following methods (apart from the chip arms race): ...
Lerner Zhang • 1,065
4 votes
2 answers
2k views

In the Attention is all you need paper, on the 4th page, we have equation 1, which describes the self-attention mechanism of the transformer architecture $$ \text { Attention }(Q, K, V)=\operatorname{...
Uğurcan Özalp
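The excerpt's equation is cut off by the listing; for reference, equation 1 on page 4 of Attention Is All You Need reads

$$
\text{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$

where $Q$, $K$, $V$ are the query, key, and value matrices and $d_k$ is the key dimension used for scaling.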
1 vote
0 answers
148 views

Training neural networks takes a while. My question is: how efficient is a neural network once it is completely trained (assuming it's not a model that is constantly learning)? I understand that this is ...
Anton • 111
5 votes
2 answers
451 views

I have just started to study reinforcement learning and, as far as I understand, existing algorithms search for the optimal solution/policy but do not allow the programmer to ...
Cristian M
2 votes
0 answers
170 views

In ML we often have to store a huge number of values ranging from 0 to 1, mostly probabilities. The most common representation seems to be floating point. Indeed, the range of ...
Rustam • 471
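One alternative the question above gestures at: since probabilities live in $[0, 1]$, they can be stored in 8-bit fixed point (256 evenly spaced levels) instead of a 32-bit float, at the cost of a worst-case rounding error of $1/510 \approx 0.002$. A minimal round-trip sketch (helper names are made up):

```python
def to_u8(p):
    """Quantize a probability in [0, 1] to one byte (256 levels)."""
    assert 0.0 <= p <= 1.0
    return round(p * 255)

def from_u8(q):
    """Dequantize one byte back to a float in [0, 1]."""
    return q / 255

probs = [0.0, 0.25, 0.5, 0.999, 1.0]
encoded = bytes(to_u8(p) for p in probs)   # 5 bytes vs 20 for float32
decoded = [from_u8(q) for q in encoded]
# round-trip error is at most half a quantization step, 1/(2*255)
```

Whether 8 bits is enough depends on the use; the same scheme extends to 16-bit fixed point when probabilities near 0 or 1 must be resolved more finely.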
