97 questions
5 votes · 1 answer · 259 views
How do I convert a `float` to a `_Float16`, or even initialize a `_Float16`? (And/or print with printf?)
I'm developing a library which uses _Float16 for many of its constants to save space when passing them around. However, in my initial tests, it seems that telling GCC to simply "set it to 1" isn't ...
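A minimal sketch of the initialization, conversion, and printing being asked about, assuming GCC 12+ (or a recent Clang) on x86-64, where _Float16 is available:

    // Assumes GCC 12+ or Clang 15+ on x86-64, where _Float16 is available
    // (a compiler/target assumption, not guaranteed everywhere).
    #include <cstdio>

    int main() {
        _Float16 one = (_Float16)1;    // initialize from an integer constant
        float f = 3.14159f;
        _Float16 h = (_Float16)f;      // a cast converts (and rounds) float -> _Float16

        // printf has no conversion specifier for _Float16; widen to double explicitly.
        std::printf("one = %f, h = %f\n", (double)one, (double)h);
        return 0;
    }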
1 vote · 0 answers · 48 views
Flipping a single bit of a floating-point number (IEEE 754) mathematically
I'm working on implementing a mathematical approach to bit flipping in IEEE 754 FP16 floating-point numbers without using direct bit manipulation. The goal is to flip a specific bit (particularly in ...
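For the fraction bits of a positive, normal FP16 value, the flip can be expressed purely arithmetically: bit i of the 10-bit fraction carries weight 2^(e-11+i), where e is the frexp exponent, so toggling it means adding or subtracting that weight. A sketch under those assumptions (sign and exponent bits are not handled):

    #include <cmath>

    // Sketch: flip fraction bit `i` (0 = LSB of the 10-bit FP16 fraction) of a
    // positive, normal FP16 value using arithmetic only, no bit manipulation.
    // Assumes the input float holds a value exactly representable in FP16.
    float flip_fp16_fraction_bit(float x, int i) {
        int e;
        std::frexp(x, &e);                       // x = f * 2^e, f in [0.5, 1)
        double w = std::ldexp(1.0, e - 11 + i);  // weight of fraction bit i
        double q = std::floor(x / w);            // integer made of bits >= i
        bool set = std::fmod(q, 2.0) != 0.0;     // parity of q is bit i
        return (float)(set ? x - w : x + w);     // toggle by +/- its weight
    }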
1 vote · 0 answers · 61 views
Does cuBLAS support mixed precision matrix multiplication in the form C[f32] = A[bf16] * B[f32]?
I'm working with mixed precision in deep-learning LLMs. The intermediates are mostly F32, while the weights could be any other type such as BF16, F16, or even a quantized type like Q8_0 or Q4_0. It would be very useful if ...
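For reference, a hedged sketch of the cublasGemmEx call shape for the closest case I'm aware of being documented, BF16 inputs with FP32 output and accumulation; as far as I can tell the listed type combinations use the same type for A and B, so a B[f32] operand would first need its own conversion pass (this is an assumption about the supported combinations, not a definitive capability matrix):

    #include <cublas_v2.h>
    #include <cuda_bf16.h>

    // Sketch only: C (FP32) = A (BF16) * B (BF16) with FP32 accumulation,
    // column-major, no transposes. dA, dB, dC are device pointers already
    // allocated and populated by the caller.
    void gemm_bf16_inputs_f32_out(cublasHandle_t handle,
                                  const __nv_bfloat16* dA, const __nv_bfloat16* dB,
                                  float* dC, int m, int n, int k) {
        const float alpha = 1.0f, beta = 0.0f;
        cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                     m, n, k,
                     &alpha,
                     dA, CUDA_R_16BF, m,
                     dB, CUDA_R_16BF, k,
                     &beta,
                     dC, CUDA_R_32F, m,
                     CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);
    }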
1 vote · 1 answer · 372 views
Do all processors supporting AVX2 support F16C?
Is it safe to assume that all machines that support AVX2 also support the F16C instructions? I haven't encountered a machine that doesn't, so far. Thanks
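Rather than relying on the implication, the two feature bits can be checked separately at runtime; a minimal sketch assuming GCC or Clang on x86:

    #include <cstdio>

    // Sketch (GCC/Clang on x86): query the AVX2 and F16C feature bits
    // independently instead of inferring one from the other.
    int main() {
        int avx2 = __builtin_cpu_supports("avx2");
        int f16c = __builtin_cpu_supports("f16c");
        std::printf("AVX2: %s, F16C: %s\n", avx2 ? "yes" : "no", f16c ? "yes" : "no");
        return 0;
    }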
2 votes · 1 answer · 78 views
float16_t rounding on ARM NEON
I am implementing emulation of ARM float16_t on x64 using SSE; the idea is to have bit-exact values on both platforms. I have mostly finished the implementation, except for one thing: I cannot correctly ...
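A sketch of the x86 side of such an emulation, using F16C with round-to-nearest-even requested explicitly (which, as far as I know, matches ARM's default rounding for FP16 conversion); requires compiling with -mf16c:

    #include <immintrin.h>
    #include <cstdint>

    // Sketch: convert one float to an IEEE binary16 bit pattern with F16C,
    // requesting round-to-nearest-even explicitly rather than the current
    // MXCSR rounding mode.
    static uint16_t f32_to_f16_rne(float x) {
        __m128  v = _mm_set_ss(x);
        __m128i h = _mm_cvtps_ph(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
        return (uint16_t)_mm_extract_epi16(h, 0);
    }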
1 vote · 1 answer · 61 views
What makes `print(np.half(500.2))` differ from `print(f"{np.half(500.2)}")`?
Hi everyone. I've been learning about floating-point truncation errors recently, but I found that print(np.half(500.2)) and print(f"{np.half(500.2)}") yield different results. Here are the logs I got in ...
-2 votes · 1 answer · 300 views
Why do BF16 models have slower inference on Mac M-series chips compared to F16 models?
I read on https://github.com/huggingface/smollm/tree/main/smol_tools (mirror 1):
All models are quantized to 16-bit floating-point (F16) for efficient inference. Training was done on BF16, but in our ...
3 votes · 2 answers · 506 views
How can I convert an integer to CUDA's __half FP16 type, in a constexpr fashion?
I'm the developer of aerobus and I'm facing difficulties with half precision arithmetic.
At some point in the library, I need to convert an IntType to the related FloatType (same bit count) in a constexpr ...
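A sketch of the compile-time half of such a conversion for small non-negative integers: the IEEE binary16 bit pattern is computed in a constexpr function, and only the final bits-to-__half step (typically via __half_raw) is left to the CUDA toolkit, since whether that step is itself constexpr depends on the toolkit version:

    #include <cstdint>

    // Sketch: compile-time IEEE binary16 bit pattern for a small non-negative
    // integer. Values above 2047 would need rounding and are not handled here.
    constexpr uint16_t int_to_half_bits(int v) {
        if (v == 0) return 0;
        int e = 0;                                   // exponent: 2^e <= v < 2^(e+1)
        for (int t = v; t > 1; t >>= 1) ++e;
        if (e > 10) return 0x7C00;                   // out of this sketch's range (+inf placeholder)
        uint32_t frac = ((uint32_t)v << (10 - e)) & 0x3FFu;   // drop the implicit leading 1
        return (uint16_t)(((e + 15) << 10) | frac);  // biased exponent | fraction
    }

    static_assert(int_to_half_bits(1) == 0x3C00, "1.0 in binary16");
    static_assert(int_to_half_bits(3) == 0x4200, "3.0 in binary16");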
0 votes · 0 answers · 34 views
Collecting DCGM_FI_PROF_PIPE_FP16_ACTIVE data
When I use dcgm-exporter to collect DCGM_FI_PROF_PIPE_FP16_ACTIVE data, I find that the value is as small as 0.001458, and the unit is still %. Is this normal?
And this is the program I ...
0 votes · 1 answer · 86 views
What is the difference, if any, between model.half() and model.to(dtype=torch.float16) in huggingface-transformers?
Example:
# pip install transformers
from transformers import AutoModelForTokenClassification, AutoTokenizer
# Load model
model_path = 'huawei-noah/TinyBERT_General_4L_312D'
model = ...
-1 votes · 1 answer · 2k views
I load a float32 Hugging Face model, cast it to float16, and save it. How can I load it as float16?
I load a huggingface-transformers float32 model, cast it to float16, and save it. How can I load it as float16?
Example:
# pip install transformers
from transformers import ...
0 votes · 1 answer · 489 views
Is there any point in setting `fp16_full_eval=True` if training in `fp16`?
I train a Huggingface model with fp16=True, e.g.:
training_args = TrainingArguments(
output_dir="./results",
evaluation_strategy="epoch",
learning_rate=4e-5,
...
6 votes · 1 answer · 953 views
AVX-512 BF16: load bf16 values directly instead of converting from fp32
On CPUs with AVX-512 and BF16 support, you can use the 512-bit vector registers to store 32 16-bit floats.
I have found intrinsics to convert FP32 values to BF16 values (for example: ...
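A hedged sketch of one way to do this with GCC or Clang: since BF16 values in memory are plain 16-bit patterns, they can be loaded as an integer vector and reinterpreted as __m512bh. The C-style cast between same-sized vector types relies on GNU vector-extension behavior, an assumption about the toolchain rather than a dedicated BF16 load intrinsic:

    #include <immintrin.h>

    // Sketch (GCC/Clang, AVX512F + AVX512BF16): load 32 bf16 values without
    // converting from fp32, by reinterpreting an integer load as __m512bh.
    static inline __m512bh load_bf16x32(const void* p) {
        __m512i raw = _mm512_loadu_si512(p);
        return (__m512bh)raw;                 // same-size vector cast (GNU extension)
    }

    // Example use: fused bf16 dot products accumulated into fp32 lanes.
    static inline __m512 bf16_dot_acc(__m512 acc, const void* a, const void* b) {
        return _mm512_dpbf16_ps(acc, load_bf16x32(a), load_bf16x32(b));
    }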
0 votes · 1 answer · 236 views
Xcode on Apple Silicon not compiling ARM64 half-precision NEON instructions: Invalid operand for instruction
To date I have had no issue compiling and running complex ARM NEON assembly-language routines in Xcode/Clang, and the Apple M1 supposedly supports ARMv8.4.
But when I try to use half precision with ...
0 votes · 1 answer · 107 views
std::floating_point concept in CUDA for all IEEE 754 types
I would like to know whether CUDA provides a concept similar to std::floating_point but covering all IEEE 754 types, e.g. __half. I provide below sample code that tests that the __half template ...
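A sketch of the usual workaround: std::floating_point is specified in terms of the standard floating-point types only, so a project-local concept can admit __half explicitly (the name gpu_floating_point below is purely illustrative, and C++20 concept support in the compiler is assumed):

    #include <concepts>
    #include <type_traits>
    #include <cuda_fp16.h>

    // Sketch: extend the standard concept with the extra type(s) you care about.
    template <typename T>
    concept gpu_floating_point =
        std::floating_point<T> || std::same_as<std::remove_cv_t<T>, __half>;

    static_assert(gpu_floating_point<float>);
    static_assert(gpu_floating_point<double>);
    static_assert(gpu_floating_point<__half>);
    static_assert(!gpu_floating_point<int>);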