Portably generate uniformly random floats from mt19937 output

My goal is to generate a sequence of random floats from a seeded random generator. The sequence should be the same on any machine and any compiler -- this rules out std::uniform_real_distribution, since the standard doesn't guarantee its output is reproducible across implementations. Instead we have to write our own version, whose implementation we control and can therefore guarantee behaves identically on all platforms.

We can assume I'm starting from std::mt19937, since the C++ standard does mandate its implementation exactly. The goal then becomes: how can I convert a uniformly random uint32_t to a uniformly random float in the range [0.0f, 1.0f]? The main things I'm concerned about are:

  • Efficiency
  • Loss of precision

I threw something together which looks like this:

#include <cstdint>      // std::uint32_t
#include <type_traits>  // std::is_same

template <typename RNG>
float uniform_float(RNG & rng) {
  static_assert(std::is_same<std::uint32_t, typename RNG::result_type>::value,
                "Expected to be used with an RNG whose result type is uint32_t, like mt19937");

  // Convert the full 32-bit draw to float, then scale it down into [0, 1]
  // by halving 32 times (i.e. dividing by 2^32 overall).
  float result = static_cast<float>(rng());

  for (int i = 0; i < 32; ++i) {
    result /= 2;
  }
  return result;
}

In preliminary tests it seems to be outputting floats in the range [0.0f, 1.0f].

I did also try looking around in the libstdc++ headers to see how they do the equivalent thing, but it looked like it was going to take some digging to actually find it.

Here are some natural questions in my mind:

  • When static-casting uint32_t to float, the value is never outside the representable range, so the behavior is not undefined. Typically it will not be representable exactly, though: both types have 32 bits, and float spends some of its bits on the sign and exponent. The standard says it is implementation-defined whether I get the next-highest or next-lowest representable number in this case. I assume that it doesn't matter, since I'm going to divide by two many times after this, and many of these values will collide anyway?
  • Is it better (faster) to divide by the uint32_t max value, rather than by 2^32? I assume not.
  • Does dividing by two repeatedly cause a subtle bias as the least significant bits are repeatedly discarded? If they are only being discarded then I would expect not, but possibly there is some rounding taking place that could cause problems?

An alternative strategy would be to start with 2^-32 as a float, and then multiply it by the random integer to map it into the range [0.0f, 1.0f]. However, it's harder for me to understand, in terms of the standard, exactly what will happen if I do that -- what if 2^-32 is not exactly representable? If I simply write it as a multiplication, then the int will be converted to a float first anyway, right? Is it better to do some kind of hand-rolled operation for int * float, using a bit-by-bit doubling routine, etc.?