5 votes
1 answer
259 views

How do I convert a `float` to a `_Float16`, or even initialize a `_Float16`? (And/or print with printf?)

I'm developing a library that uses _Float16s for many of the constants to save space when passing them around. However, in some quick testing, it seems that telling GCC to just "set it to 1" isn't ...
Coarse Rosinflower
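A minimal sketch of the kind of code being asked about, assuming GCC or Clang on a target with native _Float16 support; this is an illustration, not the accepted answer:

    #include <stdio.h>

    int main(void) {
        _Float16 one = 1.0f16;        /* f16 literal suffix (C23 / TS 18661-3) */
        float    f   = 3.14159f;
        _Float16 h   = (_Float16)f;   /* a plain cast performs the narrowing conversion */

        /* printf has no _Float16 conversion specifier; promote to double for printing */
        printf("%f %f\n", (double)one, (double)h);
        return 0;
    }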
1 vote
0 answers
48 views

Flipping a single bit of an IEEE-754 floating-point number mathematically

I'm working on implementing a mathematical approach to bit flipping in IEEE 754 FP16 floating-point numbers without using direct bit manipulation. The goal is to flip a specific bit (particularly in ...
Muhammad Zaky
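For reference, a small Python/NumPy sketch (the language is an assumption, since the excerpt names none) that flips a chosen bit via direct bit manipulation — i.e. the baseline the question wants to replace — which is handy for validating any arithmetic-only approach:

    import numpy as np

    def flip_bit_fp16(x, k):
        # k = 0 is the least significant mantissa bit, k = 15 the sign bit
        bits = np.array([x], dtype=np.float16).view(np.uint16)
        return (bits ^ np.uint16(1 << k)).view(np.float16)[0]

    print(flip_bit_fp16(1.0, 15))  # sign bit flipped: 1.0 -> -1.0
    print(flip_bit_fp16(1.0, 10))  # lowest exponent bit flipped: 1.0 -> 0.5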
1 vote
0 answers
61 views

Does cuBLAS support mixed precision matrix multiplication in the form C[f32] = A[bf16] * B[f32]?

I'm concerned with mixed precision in deep-learning LLMs. The intermediates are mostly F32, while the weights could be any other type like BF16, F16, or even a quantized type such as Q8_0 or Q4_0. It would be very useful if ...
dentry
1 vote
1 answer
372 views

Do all processors supporting AVX2 support F16C?

Is it safe to assume that all machines that support AVX2 also support F16C instructions? I haven't encountered a machine that doesn't, so far. Thanks
Srihari S
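Rather than relying on the implication, a short GCC/Clang sketch can simply check both feature bits at runtime ("avx2" and "f16c" are feature names __builtin_cpu_supports accepts):

    #include <stdio.h>

    int main(void) {
        /* nonzero means the running CPU reports the feature */
        printf("AVX2: %d, F16C: %d\n",
               __builtin_cpu_supports("avx2"),
               __builtin_cpu_supports("f16c"));
        return 0;
    }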
2 votes
1 answer
78 views

float16_t rounding on ARM NEON

I am implementing emulation of ARM float16_t for X64 using SSE; the idea is to have bit-exact values on both platforms. I mostly finished the implementation, except for one thing, I cannot correctly ...
Bogi
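For context, a sketch of the x86 side of such an emulation, assuming the F16C extension (compile with -mf16c): _mm_cvtps_ph with round-to-nearest-even matches AArch64's default rounding for float-to-half conversion, though it does not cover arithmetic performed directly on half values:

    #include <immintrin.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint16_t f32_to_f16_bits(float x) {
        __m128  v = _mm_set_ss(x);
        /* round to nearest even, suppress exceptions */
        __m128i h = _mm_cvtps_ph(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
        return (uint16_t)_mm_extract_epi16(h, 0);
    }

    int main(void) {
        printf("0x%04x\n", f32_to_f16_bits(1.0f));  /* expect 0x3c00 */
        return 0;
    }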
1 vote
1 answer
61 views

What makes `print(np.half(500.2))` differ from `print(f"{np.half(500.2)}")`?

I've been learning about floating-point truncation errors recently, but I found that print(np.half(500.2)) and print(f"{np.half(500.2)}") yield different results. Here are the logs I got in ...
Cestimium
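A small reproduction sketch (NumPy assumed; the exact strings printed can vary by NumPy version). The stored value is the nearest float16 to 500.2, which is 500.25; the usual explanation is that str() uses NumPy's float16-aware shortest repr, while the f-string goes through __format__ and formats the value as a regular Python float:

    import numpy as np

    x = np.half(500.2)
    print(x)          # str(x): NumPy's shortest float16 repr, e.g. 500.2
    print(f"{x}")     # format(x, ""): e.g. 500.25
    print(float(x))   # the exact stored value as a double: 500.25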
-2 votes
1 answer
300 views

Why do BF16 models have slower inference on Mac M-series chips compared to F16 models?

I read on https://github.com/huggingface/smollm/tree/main/smol_tools (mirror 1): All models are quantized to 16-bit floating-point (F16) for efficient inference. Training was done on BF16, but in our ...
Franck Dernoncourt
3 votes
2 answers
506 views

How can I convert an integer to CUDA's __half FP16 type, in a constexpr fashion?

I'm the developer of aerobus and I'm facing difficulties with half-precision arithmetic. At some point in the library, I need to convert an IntType to the related FloatType (same bit count) in a constexpr ...
Regis Portalez
0 votes
0 answers
34 views

DCGM_FI_PROF_PIPE_FP16_ACTIVE data collect

When I use dcgm-exporter to collect DCGM_FI_PROF_PIPE_FP16_ACTIVE data, I find the value is as small as 0.001458 while the unit is still %. Is this normal? And this is the program I ...
刘润泽
0 votes
1 answer
86 views

What is the difference, if any, between model.half() and model.to(dtype=torch.float16) in huggingface-transformers?

Example: # pip install transformers from transformers import AutoModelForTokenClassification, AutoTokenizer # Load model model_path = 'huawei-noah/TinyBERT_General_4L_312D' model = ...
Franck Dernoncourt
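A quick comparison sketch (transformers and torch assumed installed): for floating-point parameters the two calls should end up equivalent, .half() being shorthand for casting to float16 and .to(dtype=...) being the more general form:

    import torch
    from transformers import AutoModelForTokenClassification

    model_path = 'huawei-noah/TinyBERT_General_4L_312D'
    m1 = AutoModelForTokenClassification.from_pretrained(model_path).half()
    m2 = AutoModelForTokenClassification.from_pretrained(model_path).to(dtype=torch.float16)

    # both should report only torch.float16 for the parameters
    print({p.dtype for p in m1.parameters()})
    print({p.dtype for p in m2.parameters()})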
-1 votes
1 answer
2k views

I load a float32 Hugging Face model, cast it to float16, and save it. How can I load it as float16?

I load a huggingface-transformers float32 model, cast it to float16, and save it. How can I load it as float16? Example: # pip install transformers from transformers import ...
Franck Dernoncourt
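A minimal sketch (transformers/torch assumed; the model name is reused from the excerpt above as a stand-in): save the float16 weights with save_pretrained, then pass torch_dtype when loading:

    import torch
    from transformers import AutoModelForTokenClassification

    model = AutoModelForTokenClassification.from_pretrained('huawei-noah/TinyBERT_General_4L_312D')
    model = model.half()                      # cast float32 -> float16
    model.save_pretrained('./tinybert-fp16')  # weights on disk are now float16

    reloaded = AutoModelForTokenClassification.from_pretrained(
        './tinybert-fp16', torch_dtype=torch.float16)  # or torch_dtype="auto"
    print(next(reloaded.parameters()).dtype)  # torch.float16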
0 votes
1 answer
489 views

Is there any point in setting `fp16_full_eval=True` if training in `fp16`?

I train a Huggingface model with fp16=True, e.g.: training_args = TrainingArguments( output_dir="./results", evaluation_strategy="epoch", learning_rate=4e-5, ...
Franck Dernoncourt
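A sketch showing the two flags side by side (transformers assumed; flag semantics hedged per the documentation): fp16=True enables mixed-precision training, while fp16_full_eval=True additionally runs evaluation with the model cast to float16, which saves memory but can change metric values slightly:

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="epoch",  # eval_strategy in newer transformers versions
        learning_rate=4e-5,
        fp16=True,                    # mixed-precision training
        fp16_full_eval=True,          # evaluate with the model cast to fp16 as well
    )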
6 votes
1 answer
953 views

AVX-512 BF16: load bf16 values directly instead of converting from fp32

On CPUs with AVX-512 and BF16 support, you can use the 512-bit vector registers to store 32 16-bit floats. I have found intrinsics to convert FP32 values to BF16 values (for example: ...
Thijs Steel
0 votes
1 answer
236 views

Xcode on Apple Silicon not compiling ARM64 half-precision NEON instructions: Invalid operand for instruction

To date I have had no issue compiling and running complex ARM NEON assembly-language routines in Xcode/Clang, and the Apple M1 supposedly supports ARMv8.4. But when I try to use half precision with ...
user2465201
0 votes
1 answer
107 views

std::floating_point concept in CUDA for all IEEE 754 types

I would like to know if CUDA provides a concept similar to std::floating_point but covering all IEEE 754 types, including e.g. __half. I provide below sample code that tests that __half template ...
Dimitri Lesnoff
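One possible approach, sketched under the assumption that no built-in concept fits: define a custom concept that extends std::floating_point with CUDA's extended floating-point types (requires nvcc with -std=c++20; available types depend on the CUDA version):

    #include <concepts>
    #include <cuda_fp16.h>
    #include <cuda_bf16.h>

    // Accepts the standard floating-point types plus CUDA's __half and __nv_bfloat16
    template <typename T>
    concept extended_floating_point =
        std::floating_point<T> ||
        std::same_as<T, __half> ||
        std::same_as<T, __nv_bfloat16>;

    static_assert(extended_floating_point<float>);
    static_assert(extended_floating_point<__half>);
    static_assert(!extended_floating_point<int>);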
