All Questions

1 vote
1 answer
62 views

What makes `print(np.half(500.2))` differ from `print(f"{np.half(500.2)}")`?

Hi everyone. I've been learning about floating-point truncation errors recently, but I found that print(np.half(500.2)) and print(f"{np.half(500.2)}") yield different results. Here are the logs I got in ...
Cestimium
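
A minimal sketch of what is likely going on (the exact behavior depends on the NumPy version): NumPy's str() of a float16 prints the shortest decimal string that round-trips at half precision, while f-string formatting may go through a Python float (float64) and print the exact stored value.

```python
import numpy as np

x = np.half(500.2)   # nearest representable float16 is exactly 500.25

print(x)         # "500.2": NumPy prints the shortest string that
                 # round-trips at float16 precision
print(f"{x}")    # may print "500.25": formatting can go through
                 # float(x), exposing the exact stored value
print(float(x))  # 500.25, the value actually stored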
0 votes
1 answer
86 views

What is the difference, if any, between model.half() and model.to(dtype=torch.float16) in huggingface-transformers?

Example: # pip install transformers from transformers import AutoModelForTokenClassification, AutoTokenizer # Load model model_path = 'huawei-noah/TinyBERT_General_4L_312D' model = ...
Franck Dernoncourt
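
For a plain module the two calls perform the same cast (PyTorch documents .half() as casting all floating-point parameters and buffers). A minimal sketch with a toy nn.Linear standing in for the TinyBERT checkpoint from the question:

```python
import torch
import torch.nn as nn

model_a = nn.Linear(4, 2)
model_b = nn.Linear(4, 2)

model_a.half()                   # casts floating-point params and buffers
model_b.to(dtype=torch.float16)  # the generic .to() performs the same cast

print(model_a.weight.dtype)      # torch.float16
print(model_b.weight.dtype)      # torch.float16
```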
-1 votes
1 answer
2k views

I load a float32 Hugging Face model, cast it to float16, and save it. How can I load it as float16?

I load a huggingface-transformers float32 model, cast it to float16, and save it. How can I load it as float16? Example: # pip install transformers from transformers import ...
Franck Dernoncourt
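
A sketch of one way to do this with from_pretrained's torch_dtype argument, assuming the checkpoint was saved after calling .half():

```python
import torch
from transformers import AutoModelForTokenClassification

model_path = 'huawei-noah/TinyBERT_General_4L_312D'
model = AutoModelForTokenClassification.from_pretrained(model_path)
model.half()                          # cast float32 weights to float16
model.save_pretrained('./tinybert-fp16')

# Pass torch_dtype so the weights are not upcast back to float32 on load
reloaded = AutoModelForTokenClassification.from_pretrained(
    './tinybert-fp16', torch_dtype=torch.float16)
print(next(reloaded.parameters()).dtype)  # torch.float16
```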
0 votes
1 answer
490 views

Is there any point in setting `fp16_full_eval=True` if training in `fp16`?

I train a Huggingface model with fp16=True, e.g.: training_args = TrainingArguments( output_dir="./results", evaluation_strategy="epoch", learning_rate=4e-5, ...
Franck Dernoncourt
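
The distinction, as I understand the Trainer docs: fp16=True enables mixed-precision training while keeping float32 master weights, whereas fp16_full_eval=True casts the model itself to float16 for evaluation, saving memory at a possible cost in metric accuracy. A sketch of the two flags together:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=4e-5,
    fp16=True,            # mixed-precision training (float32 weights kept)
    fp16_full_eval=True,  # evaluate with the model cast to float16,
                          # halving eval memory, possibly hurting metrics
)
```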
3 votes
0 answers
145 views

How to verify whether TensorFlow code trains completely in FP16?

I'm trying to train TensorFlow (version 2.11.0) code in float16. I checked that FP16 is supported on the RTX 3090 GPU, so I followed the link below to train the whole code in reduced precision. ...
Sherlock
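
One way to check, sketched below: under Keras's mixed_float16 policy, layers compute in float16 but keep float32 variables, so inspecting both dtypes shows how much of the run is really FP16 (a true full-FP16 run would need the 'float16' policy and float16 variables).

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

for layer in model.layers:
    # under mixed_float16: compute dtype float16, variable dtype float32
    print(layer.name, layer.compute_dtype, layer.variable_dtype)
```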
1 vote
1 answer
947 views

Can language model inference on a CPU save memory by quantizing?

For example, according to https://cocktailpeanut.github.io/dalai/#/ the relevant figures for LLaMA-65B are: Full: the model takes up 432.64GB; Quantized: 5.11GB * 8 = 40.88GB. The full model won't fit ...
rwallace
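
The memory side is mostly parameter-count arithmetic; a back-of-envelope sketch (the dalai figures above are measured file sizes, so they differ from this naive weights-only estimate):

```python
params = 65e9  # LLaMA-65B parameter count

# bytes per parameter at different precisions (int4 = 0.5 bytes)
for name, bytes_per_param in [("float32", 4), ("float16", 2), ("int4", 0.5)]:
    print(f"{name}: {params * bytes_per_param / 2**30:.1f} GiB")
# float32: 242.1 GiB, float16: 121.1 GiB, int4: 30.3 GiB -- weights only,
# ignoring activations and runtime overhead
```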
0 votes
0 answers
700 views

Is there a reason why a NaN value appears when there are no NaN values in the model parameters?

I want to train the model in FP32 and perform inference in FP16. For other networks (ResNet) FP16 worked, but EDSR (super-resolution) with FP16 did not. The differences I found are ...
SIwoo Lee
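
One common cause, sketched below: float16 overflows above 65504, and super-resolution networks can produce large intermediate activations; an inf then turns into NaN a few ops later even though no parameter was ever NaN.

```python
import torch

x = torch.tensor([70000.0])  # fine in float32
h = x.half()
print(h)      # tensor([inf], dtype=torch.float16): float16 tops out at 65504
print(h - h)  # tensor([nan], dtype=torch.float16): inf - inf is NaN,
              # even though no parameter was ever NaN
```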
1 vote
1 answer
3k views

Convert a 16-bit hex value to FP16 in Python?

I'm trying to write a basic FP16-based calculator in Python to help me debug some hardware. I can't seem to find how to convert 16-bit hex values into floating-point values I can use in my code to do the ...
ajcrm125
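
A minimal sketch of one way to do the conversion (hex_to_fp16 is a hypothetical helper name): reinterpret the 16 bits as an IEEE 754 half via a NumPy view.

```python
import numpy as np

def hex_to_fp16(h):
    """Interpret a 16-bit hex string as the raw bits of an IEEE 754 half."""
    return np.uint16(int(h, 16)).view(np.float16)

print(hex_to_fp16("3C00"))  # 1.0
print(hex_to_fp16("C000"))  # -2.0
print(hex_to_fp16("7BFF"))  # 65504.0, the largest finite float16
```

Python's struct module also understands half precision via the 'e' format character (since Python 3.6), which avoids the NumPy dependency.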
1 vote
2 answers
2k views

Why is it dangerous to convert integers to float16?

I recently ran into a surprising and annoying bug in which I converted an integer into a float16 and the value changed: >>> import numpy as np >>> np.array([2049]).astype(np....
guhur
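
A sketch of the usual explanation: float16 has a 10-bit mantissa (11 significant bits), so only integers up to 2048 are all exactly representable; above that, consecutive float16 values are at least 2 apart and conversions silently round.

```python
import numpy as np

print(np.array([2048]).astype(np.float16))  # [2048.]  exact
print(np.array([2049]).astype(np.float16))  # [2048.]  rounded: above 2048,
                                            # consecutive float16s are 2 apart
print(np.array([2050]).astype(np.float16))  # [2050.]  exact again
```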
1 vote
1 answer
514 views

Reading a binary structure in JavaScript

I have a table that I am trying to read in Javascript, with data that is large enough that I would like to have it in binary format to save space. Most of the table is either numbers or enums, but ...
PearsonArtPhoto
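
To stay consistent with the Python used elsewhere in this list, here is a hedged sketch of the same idea using the struct module (the record layout is hypothetical, standing in for the asker's table):

```python
import struct

# Hypothetical record: uint16 enum, int32 count, IEEE 754 half, little-endian
record = struct.pack('<Hie', 3, 1000, 1.5)
enum_val, count, half_val = struct.unpack('<Hie', record)
print(enum_val, count, half_val)  # 3 1000 1.5
```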
1 vote
1 answer
3k views

Why does converting from np.float16 to np.float32 modify the value?

When converting a number from half- to single-precision floating-point representation, I see a change in the numeric value. Here I have 65500 stored as a half-precision float, but upgrading to single precision changes ...
Mikhail
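
A sketch of what is actually happening: the value never changes on conversion; 65500 was already stored as 65504 in float16, and float32's shortest-repr printing simply reveals it.

```python
import numpy as np

h = np.float16(65500)
print(h)                   # 65500.0: the shortest decimal that round-trips
                           # at float16 precision
print(np.float32(h))       # 65504.0: the exact value stored all along
print(h == np.float32(h))  # True -- the up-conversion itself is lossless
```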
8 votes
1 answer
9k views

TensorFlow - how to use 16-bit precision floats

Question: float16 can be used in NumPy but not in TensorFlow 2.4.1, causing an error. Is float16 available only when running on an instance with a GPU with 16-bit support? Mixed precision: Today, most ...
mon
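
A small sketch: casting tensors to float16 works on any device, but whether a given op has a float16 kernel depends on the device and TF version, which is why mixed precision (float32 variables, float16 compute) is the usual route rather than forcing everything to float16.

```python
import tensorflow as tf

x32 = tf.constant([1.0, 2.0, 3.0])  # float32 by default
x16 = tf.cast(x32, tf.float16)      # explicit 16-bit tensor
print(x16.dtype, x16.numpy())

# Whether e.g. matmul/conv then run in float16 on this device depends on
# the available kernels; mixed precision avoids hitting missing ones.
```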
3 votes
3 answers
14k views

FP16 inference on CPU in PyTorch

I have a pretrained PyTorch model I want to run inference with in FP16 instead of FP32. I have already tried this on the GPU, but when I try it on the CPU I get: "sum_cpu" not implemented for 'Half' torch. ...
user123
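
A sketch of the usual workarounds when a CPU float16 kernel is missing: upcast around the offending op, or use bfloat16, which has much wider CPU kernel coverage (newer PyTorch releases have also filled in many Half CPU kernels).

```python
import torch

x = torch.randn(4, dtype=torch.float16)

# Upcast just for the op that lacks a Half CPU kernel, then cast back:
s = x.float().sum().half()
print(s.dtype)  # torch.float16

# bfloat16 is generally better supported on CPU:
y = torch.randn(4, dtype=torch.bfloat16)
print(y.sum())
```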
0 votes
1 answer
2k views

Incomplete Cholesky Factorization Very Slow

Background: I'm doing a project for my Numerical Linear Algebra course. For this project I decided to experiment with doing incomplete Cholesky factorization with half-precision arithmetic and using ...
Onye
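
One caveat worth knowing before timing such an experiment (an illustrative sketch, not the asker's code): most CPUs have no native half-precision arithmetic, so NumPy emulates float16 in software and it is usually the slowest of the three dtypes, not the fastest.

```python
import time
import numpy as np

a = np.random.rand(500, 500)

for dt in (np.float64, np.float32, np.float16):
    m = a.astype(dt)
    t0 = time.perf_counter()
    m @ m  # float16 matmul is typically emulated and much slower on CPU
    print(dt.__name__, f"{time.perf_counter() - t0:.4f}s")
```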
2 votes
2 answers
2k views

How to do convolution with FP16 (Eigen::half) in TensorFlow

How can I use TensorFlow to do convolution using FP16 on GPU (the Python API using __half or Eigen::half)? I want to test a model with FP16 on TensorFlow, but I got stuck. Actually, I found that ...
Di Huang
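
A minimal sketch from the Python side: tf.nn.conv2d accepts float16 tensors (backed by Eigen::half or cuDNN depending on the device); whether a CPU float16 kernel is available depends on the TF version.

```python
import tensorflow as tf

x = tf.random.normal((1, 32, 32, 3), dtype=tf.float16)  # NHWC input
k = tf.random.normal((3, 3, 3, 8), dtype=tf.float16)    # HWIO filter

y = tf.nn.conv2d(x, k, strides=1, padding='SAME')
print(y.dtype, y.shape)  # float16 (1, 32, 32, 8)
```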
