537 questions
0 votes, 1 answer, 75 views
How to tell ARM clang to emit SCVTF with the #fbits parameter?
In an embedded application we need to convert a signed-fractional number (S1.31 format) to a single-precision floating point number. A C function looks like this:
#include <stdint.h>
float ...
-1 votes, 1 answer, 54 views
When converting floating point to fixed point, why have I seen some code use BIT_WIDTH ^ 2 MINUS 1?
Intuitively, and even after some thought, if I want to convert a normalised float value back to fixed point, I would multiply it by the maximum value the fixed-point format can hold. ...
2 votes, 0 answers, 68 views
How to round an Ada fixed-point type to decimal precision?
type Number_Type is delta 0.000_1 digits 18;
--------------------------------------------------------------
-- Round Number To Decimal 0
------------------------------------------------------...
0 votes, 1 answer, 48 views
Is there an efficient way to overload operators for structs when the underlying data is piecewise?
I have various methods for accumulating quantities in fixed-precision numerics based on the following data type:
struct int95_t {
  long long int x;
  int y;
};
The numbers count forward in x until it may ...
2 votes, 2 answers, 265 views
Is there a way to check if 128-bit integers are available?
I'm working on a fixed-point math library. The accepted way of handling multiplications for fixed-point numbers is to multiply them into an integer variable that's larger than what you're storing, ...
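A minimal sketch of the widening multiply this excerpt describes, assuming a hypothetical Q16.16 format (16 integer bits, 16 fractional bits) stored in an `int32_t`. The type name `q16_16` and the helper `q16_16_mul` are illustrative only and do not come from the question:

```c
#include <stdint.h>

/* Hypothetical Q16.16 type: value = raw / 2^16. */
typedef int32_t q16_16;

#define Q16_16_FRAC_BITS 16

/* Multiply two Q16.16 values using a 64-bit intermediate.
 * The raw product carries 32 fractional bits, so it is shifted
 * right by 16 to return to Q16.16. Overflow of the final result
 * is not checked in this sketch. */
static q16_16 q16_16_mul(q16_16 a, q16_16 b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    /* Right-shifting a negative value is implementation-defined in C;
     * mainstream compilers perform an arithmetic shift. */
    return (q16_16)(wide >> Q16_16_FRAC_BITS);
}
```

With 64-bit storage the same pattern would need a 128-bit intermediate, which is presumably why the availability of such a type matters here.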
1 vote, 2 answers, 180 views
How to perform accurate fixed-point number calculation in C?
How can I perform accurate fixed-point number calculations in C?
I use a struct in C to store the integer part and the fractional part.
I suspect the function is incorrect; perhaps it doesn't handle the overflow ...
2 votes, 1 answer, 262 views
How does rounding work in float multiplication?
The exact value of float 0.7f is 0.69999...
so I thought the result of 0.7f * 100f would be something below 70, like 69.99999...
But the result is exactly 70.
Does float multiplication involve such ...
1 vote, 0 answers, 102 views
Alpha compositing in fixed point (0xff is not 1.0)
I'm writing some code (intended for use from both C and C++) that does alpha compositing using integer math (fixed point), and I'm coming across the problem that 0xff isn't quite 1.0. This is ...
0 votes, 1 answer, 126 views
Sign fixed point arithmetic
Can you please help me with the arithmetic of binary numbers in fixed-point format?
I have two sign-magnitude 8-bit numbers, for example, 0_0101010 and 1_0001100, where the most significant bit ...
0 votes, 1 answer, 376 views
RISC-V Fixed Point Arithmetic
I have the following problem:
I wanted to add two numbers, -1.25 and 3.25 (the result is obviously 2), in RISC-V assembly using 32-bit registers.
I assumed Q format (imaginary decimal point), so that there ...
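A rough C sketch of the arithmetic in this excerpt, assuming a Q16.16 layout in a 32-bit register. The excerpt does not say which Q format was chosen, so the format and the helper names `to_q16_16` and `q16_16_add` are assumptions made purely for illustration:

```c
#include <stdint.h>

/* Assumed for illustration: 16 fractional bits in a 32-bit word. */
#define FRAC_BITS 16

/* Scale a real value by 2^16 to get its Q16.16 encoding.
 * The inputs used here (-1.25, 3.25) are exactly representable. */
static int32_t to_q16_16(double v)
{
    return (int32_t)(v * (double)(1 << FRAC_BITS));
}

/* Both operands share the same scale factor, so fixed-point
 * addition is plain integer addition. Overflow is not handled. */
static int32_t q16_16_add(int32_t a, int32_t b)
{
    return a + b;
}
```

Under these assumptions, -1.25 encodes as -81920 and 3.25 as 212992, and their integer sum 131072 is the encoding of 2.0.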
0 votes, 0 answers, 23 views
Quantization effects on Gradient Descent algorithm
I'm writing a research paper about the effects of quantization on the gradient descent algorithm when precision is reduced from floating point to fixed-point arithmetic, for instance 16 bits. If anyone could ...
0 votes, 0 answers, 38 views
Fixed point vs Floating point
I have a task to analyze the effects of quantization in the Madgwick and Mahony filters, which are two orientation estimation algorithms. Madgwick uses a gradient descent optimisation technique which ...
0 votes, 0 answers, 59 views
Fixed point vs floating point number
I have a project that is basically to analyze the effects of quantization on orientation estimation algorithms. I have sensor data from a gyroscope that looks like this when using the float data type:
gx=-0....
-1 votes, 2 answers, 228 views
Calculating the fixed-point representation of (1 - SQRT(0.5)) to arbitrary levels of precision
I have a constant 0.29289321881345247559915563789515..., which can be calculated using the equation (1 - SQRT(0.5)) and then transformed into fixed-point format in order to be used with various sizes ...
1 vote, 2 answers, 316 views
Which bits do I need to extract in a signed fixed point multiplier?
I need to design a fixed-point multiplier in Verilog that takes in a 16-bit value formatted with one sign bit, 6 integer bits and 7 fractional bits. I just can't figure out which bits to extract to ensure ...