All Questions
Tagged with half-precision-float floating-point
19 questions
5
votes
1
answer
259
views
How do I convert a `float` to a `_Float16`, or even initialize a `_Float16`? (And/or print with printf?)
I'm developing a library which uses _Float16s for many of the constants to save space when passing them around. However, just testing, it seems that telling GCC to just "set it to 1" isn't ...
1
vote
0
answers
48
views
Flipping a single bit of Floating-points (IEEE-754) mathematically
I'm working on implementing a mathematical approach to bit flipping in IEEE 754 FP16 floating-point numbers without using direct bit manipulation. The goal is to flip a specific bit (particularly in ...
3
votes
2
answers
506
views
How can I convert an integer to CUDA's __half FP16 type, in a constexpr fashion?
I'm the developer of aerobus and I'm facing difficulties with half precision arithmetic.
At some point in the library, I need to convert an IntType to the related FloatType (same bit count) in a constexpr ...
2
votes
2
answers
498
views
How do I print the half-precision / bfloat16 values from a (binary) file?
This is a variant of:
How to print float value from binary file in shell?
In that question, we wanted to print IEEE 754 single-precision (i.e. 32-bit) floating-point values from a binary file.
Now ...
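The excerpt is cut off, so the following is only a minimal sketch of one way to decode such values, written in Python rather than shell. It assumes the file holds tightly packed little-endian 16-bit values; the function names `decode_half` and `decode_bfloat16` are hypothetical. It relies on the fact that bfloat16 is the upper 16 bits of an IEEE 754 single-precision value.

```python
import struct

def decode_half(data: bytes) -> list:
    """Decode little-endian IEEE 754 binary16 values (assumed byte order)."""
    count = len(data) // 2
    return list(struct.unpack(f"<{count}e", data[:count * 2]))

def decode_bfloat16(data: bytes) -> list:
    """Decode little-endian bfloat16 values by widening each to float32."""
    count = len(data) // 2
    words = struct.unpack(f"<{count}H", data[:count * 2])
    # A bfloat16 is the top half of a float32, so shift it into place
    # and reinterpret the 32-bit pattern as a float.
    return [struct.unpack("<f", struct.pack("<I", w << 16))[0] for w in words]

if __name__ == "__main__":
    print(decode_half(bytes.fromhex("003c00c0")))      # 0x3C00, 0xC000
    print(decode_bfloat16(bytes.fromhex("803f00c0")))  # 0x3F80, 0xC000
```

To read from a real file, the bytes could come from `open(path, "rb").read()`, with `path` being whatever file is of interest.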
0
votes
0
answers
190
views
Clarification on IEEE 754 rounding to nearest, ties to even
I am working on an IEEE 754 16-bit adder, and I am confused by the round-to-nearest, ties-to-even logic.
The first addition which confuses me is 169.8 (0x594E) + -0.06256 (0xAC01).
After shifting and ...
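A reference result for this addition can be obtained in software, as a cross-check for the hardware logic. The sketch below is not the asker's adder: it decodes the two bit patterns with Python's `struct` module (format `e` is IEEE 754 binary16), adds them exactly in double precision, and rounds the sum once to binary16 with round-half-to-even. The helper names are hypothetical.

```python
import struct

def half_from_bits(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def half_to_bits(value: float) -> int:
    """Round to binary16 (nearest, ties to even) and return the bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

a = half_from_bits(0x594E)   # 169.75, the binary16 value nearest 169.8
b = half_from_bits(0xAC01)   # -0.06256103515625
exact = a + b                # exact in double precision for these operands
result_bits = half_to_bits(exact)

print(a, b, exact, hex(result_bits))
```

For these operands the exact sum, 169.68743896484375, lies just below the midpoint between 169.625 and 169.75, so it rounds down to 169.625 (0x594D) and the ties-to-even rule is not actually triggered.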
0
votes
0
answers
75
views
Precision loss reading from `r16Snorm` texture to `half` variable in Metal
Am I correct in my assumption that reading a value from a .r16Snorm texture into the Metal Shading Language half data type always and unavoidably incurs precision loss? It wasn't obvious to me from the start ...
1
vote
3
answers
4k
views
How to convert a float to a half type and the other way around in C
How can I convert a float (float32) to a half (float16) and the other way around in C while accounting for edge cases like NaN, Infinity, etc.?
I don't need arithmetic because I just need the types in ...
0
votes
0
answers
101
views
16-bit floating point division (half-precision)?
How can I divide one 16-bit (half-precision) floating-point number by another?
I handled the sign with an XOR gate and the exponent with a 5-bit subtractor, but couldn't work out the mantissa.
How can I ...
0
votes
1
answer
2k
views
List of ARM instructions implementing half-precision floating-point arithmetic
Arm Architecture Reference Manual for A-profile architecture (emphasis added):
FPHP, bits [27:24]
0b0011 As for 0b0010, and adds support for half-precision floating-point arithmetic.
A simple ...
2
votes
2
answers
1k
views
Double vs Float vs _Float16 (Running Time)
I have a simple question about C. I am implementing half-precision software using _Float16 in C (my Mac is ARM-based), but the running time is not much faster than single or double-precision ...
8
votes
1
answer
2k
views
Why does bfloat16 have so many exponent bits?
It's clear why a 16-bit floating-point format has started seeing use for machine learning; it reduces the cost of storage and computation, and neural networks turn out to be surprisingly insensitive ...
1
vote
2
answers
2k
views
Bit shifting a half-float into a float
I have no choice but to read in 2 bytes that make up a half-float. I would like to work with this in the form of a 4-byte float. I've done some research and the only thing I can come up with is bit ...
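The bit-shifting approach can be sketched as follows. This is an illustration in Python, not the asker's code, and the function names are hypothetical. It widens a binary16 bit pattern to a binary32 bit pattern by moving the sign, re-biasing the exponent (127 - 15 = 112) and shifting the 10-bit fraction up by 13 bits, with separate handling for zero, subnormals, infinity and NaN.

```python
import struct

def half_bits_to_float_bits(h: int) -> int:
    """Widen a binary16 bit pattern to the equivalent binary32 bit pattern."""
    sign = (h & 0x8000) << 16
    exp = (h >> 10) & 0x1F
    frac = h & 0x03FF
    if exp == 0x1F:
        # Infinity (frac == 0) or NaN (frac != 0): exponent becomes all ones.
        return sign | 0x7F800000 | (frac << 13)
    if exp == 0:
        if frac == 0:
            return sign  # signed zero
        # Subnormal half: normalise so the implicit leading 1 appears.
        shift = 0
        while not (frac & 0x0400):
            frac <<= 1
            shift += 1
        frac &= 0x03FF
        exp = 1 - shift
    # Re-bias the exponent from 15 to 127 and left-align the fraction.
    return sign | ((exp + 112) << 23) | (frac << 13)

def half_bits_to_float(h: int) -> float:
    """Reinterpret the widened bit pattern as a Python float."""
    return struct.unpack("<f", struct.pack("<I", half_bits_to_float_bits(h)))[0]

if __name__ == "__main__":
    for bits in (0x3C00, 0xC000, 0x0001, 0x7C00):
        print(hex(bits), half_bits_to_float(bits))
```

Every binary16 value is exactly representable in binary32, so this direction needs no rounding; only the float-to-half direction does.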
5
votes
3
answers
5k
views
How to correctly determine at compile time that _Float16 is supported?
I am trying to determine at compile time that _Float16 is supported:
#define __STDC_WANT_IEC_60559_TYPES_EXT__
#include <float.h>
#ifdef FLT16_MAX
_Float16 f16;
#endif
Invocations:
# gcc trunk ...
1
vote
1
answer
3k
views
Why does converting from np.float16 to np.float32 modify the value?
When converting a number from half-precision to single-precision floating-point representation I see a change in the numeric value.
Here I have 65500 stored as a half precision float, but upgrading to single precision changes ...
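The excerpt is truncated, but one likely explanation is that 65500 is not representable in half precision in the first place. A minimal sketch, using Python's `struct` module instead of NumPy (an assumption made here to keep the example dependency-free; `to_half` is a hypothetical helper): binary16 values between 32768 and 65536 are spaced 32 apart, so 65500 is stored as the nearest representable value.

```python
import struct

def to_half(value: float) -> float:
    """Round a value to IEEE 754 binary16 and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

stored = to_half(65500.0)
below = to_half(65472.0)   # representable neighbour below (2046 * 32)
above = to_half(65504.0)   # representable neighbour above (2047 * 32)

print(stored, below, above)
```

Under this reading, the wider type does not modify the value; it shows the value that was actually stored, which a shorter half-precision printout may have displayed in rounded form.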
0
votes
0
answers
253
views
How to Initialise 16-bit Half Floats (GAS for ARM32)?
When writing an ARM assembly program one can use data type directives to initialise some values. For example, in the example below we are initializing a single float:
label: .single 0.0
However, when ...