Batcher's sort in C++20

Question

Intro

This is the very first time I actually expose myself to C++20 (concepts in this case). I decided to rewrite the Batcher's sort.

(This post has a continuation: Batcher's sort in C++20 - Take II.)

Code

batcherssort.h

#ifndef IO_GITHUB_CODERODDE_UTIL_BATCHERSSORT_H
#define IO_GITHUB_CODERODDE_UTIL_BATCHERSSORT_H

#include <cmath>
#include <concepts>
#include <utility>

#ifdef _OPENMP
#include <omp.h>
#endif

template <std::totally_ordered T>
void batchersSort(T* begin, T* end) {
    auto n = end - begin;

    if (n < 2) {
        return;
    }

    auto t = (size_t) std::ceil(std::log2(n));
    auto s = 1 << (t - 1);

    for (size_t p = s; p > 0; p /= 2) {
        size_t r = 0;
        size_t d = p;

        for (size_t q = s; q >= p; q /= 2) {

            #ifdef _OPENMP
            #pragma omp parallel for schedule(static)
            #endif 
            for (std::ptrdiff_t k = 0; k < n - static_cast<std::ptrdiff_t>(d); ++k) {
                if ((static_cast<std::size_t>(k) & p) == r) {
                    if (begin[k] > begin[k + d]) {
                        std::swap(begin[k], begin[k + d]);
                    }
                }
            }

            d = q - p;
            r = p;
        }
    }
}

#endif // IO_GITHUB_CODERODDE_UTIL_BATCHERSSORT_H

main.cpp

#include "batcherssort.h"
#include <algorithm>
#include <chrono>
#include <iostream>
#include <limits>
#include <random>

static const size_t N = 100000;

static size_t millis() {
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::system_clock::now().time_since_epoch())
        .count();
}

static void fill(int* arr1, int* arr2) {
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> dist(std::numeric_limits<int>::min(),
                                            std::numeric_limits<int>::max());

    for (size_t i = 0; i < N; ++i) {
        int val = dist(gen);
        arr1[i] = val;
        arr2[i] = val;
    }
}

int main()
{
    int* arr1 = new int[N];
    int* arr2 = new int[N];

    fill(arr1, arr2);

    auto ta = millis();
    std::sort(arr1, arr1 + N);
    auto tb = millis();

    std::cout << "std::sort: " << (tb - ta) << " ms.\n";

    ta = millis();
    batchersSort(arr2, arr2 + N);
    tb = millis();

    std::cout << "batchersSort: " << (tb - ta) << " ms.\n";
    std::cout << "Algorithms agree: "
              << std::boolalpha
              << std::equal(arr1, arr1 + N, arr2)
              << "\n";
}

Typical output

std::sort: 11 ms.
batchersSort: 25 ms.
Algorithms agree: true

Critique request

As always, I am eager to hear some constructive commentary.

On what platform did you run your benchmark? I get results more like 1 ms/12 ms on a MacBook Air M4, code compiled with Xcode/clang, optimization level set to "Fastest [-Os]". — Martin R
– Martin R, Commented Jan 24 at 9:02

Martin R · Accepted Answer · 2026-01-23 21:08:11Z

5

The C++ std::sort function has an optional third argument to provide a custom comparator. You can do the same with your sort function to make it more flexible:

template <class T, typename Comparator = std::less<T> >
void batchersSort(T* begin, T* end, const Comparator comp = Comparator()) {
    // ...

                    if (comp(begin[k+d], begin[k])) {
                        std::swap(begin[k], begin[k + d]);
                    }
    // ...
}

Now you can do things like

std::array a{ 1, 2, 3, 4, 5, 6 };
batchersSort(a.begin(), a.end(), std::greater{});

to sort the elements in descending order.

edited Jan 23 at 21:08

answered Jan 23 at 15:45

Martin R

24.8k2 gold badges39 silver badges97 bronze badges

\$\begingroup\$ You shouldn’t use both std::totally_ordered and a comparator. The whole point of the comparator is that you can sort things that are aren’t ordered, totally or otherwise. \$\endgroup\$

indi
– indi

2026-01-23 21:00:44 +00:00
Commented Jan 23 at 21:00
\$\begingroup\$ @indi: Yes, that makes sense. My C++ is a bit rusty, is it better now? \$\endgroup\$

Martin R
– Martin R

2026-01-23 21:08:50 +00:00
Commented Jan 23 at 21:08

Add a comment |

Toby Speight · Accepted Answer · 2026-01-23 16:07:31Z

6

There are several headers that define std::size_t:

Defined in header <cstddef>
Defined in header <cstdio>
Defined in header <cstdlib>
Defined in header <cstring>
Defined in header <ctime>
Defined in header <cwchar>

However, we haven't included any of these.

Don't rely on any of these additionally defining global-namespace size_t - that's an option to the implementation, not a guarantee to the programmer.

We also need to include <cstddef> for std::ptrdiff_t.

Conversely, we include <omp.h> but don't use any of its definitions.

n, t and s remain constant throughout their scope - I recommend adding const to their declaration/initialisation lines.

I'd probably prefer d to be a std::ptrdiff_t from the start, rather than casting in the loop control expression, where it makes the logic harder to read:

        auto d = static_cast<std::ptrdiff_t>(p);

            for (std::ptrdiff_t k = 0;  k < n - d;  ++k) {

C++20 provides a better way to get the required power of 2, without using floating-point arithmetic:

#if __cpp_lib_int_pow2 >= 202002L
    // #include <bit>
    auto const s = std::bit_ceil(static_cast<std::size_t>(n/2));
#else
    auto const t = (size_t) std::ceil(std::log2(n));
    auto const s = 1uz << (t - 1);
#endif

(note the use of std::size_t constant for s in the fallback case - int is a bad choice there)

edited Jan 23 at 16:07

answered Jan 23 at 14:43

Toby Speight

89.6k14 gold badges105 silver badges329 bronze badges

\$\begingroup\$ Unless I am mistaken, s from the original code is the largest power of 2 less than n, whereas your std::bit_ceil version calculates the smallest power of 2 greater than or equal to n. So a small correction might be necessary. \$\endgroup\$

Martin R
– Martin R

2026-01-23 15:55:22 +00:00
Commented Jan 23 at 15:55
1

\$\begingroup\$ I may have misunderstood the original, in which case there are other suitable functions (such as std::bit_floor()). I think that adds weight to my argument that the floating-point logic is hard to comprehend! \$\endgroup\$

Toby Speight
– Toby Speight

2026-01-23 16:05:23 +00:00
Commented Jan 23 at 16:05
\$\begingroup\$ I fully agree. – std::bit_floor(static_cast<std::size_t>(n-1)) should produce the same values as the original code. \$\endgroup\$

Martin R
– Martin R

2026-01-23 16:30:24 +00:00
Commented Jan 23 at 16:30

Add a comment |

Chris · Accepted Answer · 2026-01-24 06:46:44Z

6

Flexibility

The biggest opportunity I see, aside from adding the option for a customer comparator, is enabling your sort to work with any iterators over a container of std::totally_ordered values.

By using pointers, you've limited the usefulness of the function to essentially just the dynamically allocated arrays you're using. What if you wanted to sort a std::vector<int>?

Dynamically allocating memory

A bit pedantic, but in your test you've used new to allocate arrays, but there's no accompanying delete [] for each.

    int* arr1 = new int[N];
    int* arr2 = new int[N];

    ...

    delete [] arr1;
    delete [] arr2;

Or use smart pointers which would automatically free their memory on going out of scope.

    auto arr1 = std::make_unique<int[]>(N);
    auto arr2 = std::make_unique<int[]>(N);

edited Jan 24 at 6:46

answered Jan 23 at 17:51

Chris

10.3k1 gold badge9 silver badges66 bronze badges

1

\$\begingroup\$ std::vector might not be the best example to use for the flexibility observation, since it's specified to have contiguous storage accessible via its data() member function (excluding std::vector<bool> of course, which is a maverick). \$\endgroup\$

Toby Speight
– Toby Speight

2026-01-26 08:04:07 +00:00
Commented Jan 26 at 8:04

Add a comment |

Stack Exchange Network

Batcher's sort in C++20

Intro

Code

Typical output

Critique request

3 Answers 3

Flexibility

Dynamically allocating memory

You must log in to answer this question.

Linked

Hot Network Questions

Batcher's sort in C++20

Intro

Code

Typical output

Critique request

3 Answers 3

Flexibility

Dynamically allocating memory

You must log in to answer this question.

Linked

Related

Hot Network Questions