C++ Bitwise adjacency matrix implementation

Question

I have tried to implement a bitwise adjacency matrix to represent a graph with a fixed number of vertices.

- Instead of wastefully representing each connection with an integer, 64 connections are packed into a single uint64_t, to reduce memory use and hopefully improve performance (via better cache locality).
- Instead of vector<vector<uint64_t>>, a single contiguous vector<uint64_t> is used, again to improve cache locality.
- The user interface is similar to that of a vector<unordered_set<int>> adjacency list:
  - adjmat[i] to access a row
  - adjmat[i].contains(j) to check whether an edge exists between i and j
  - for(int neighbor : adjmat[i]) to iterate over neighbors

My main motivation is just to create a really fast, efficient representation for dense graphs. However, I am quite inexperienced, and since performance optimization is quite technical, I'd really like some feedback since I might be out of my depth.

Currently some concerns I have with my code include:

- Is this actually a valid way to improve performance of an adjacency matrix? I am using memory much more efficiently, but at the expense of more arithmetic and bit-shifting operations - does that erode any potential gains?
- I want to add familiar layers of abstraction (e.g. iterators) to replicate some functionality of vector<unordered_set<int>>. However, since I am already operating at such a low-level, I'm worried that the overhead of an iterator offsets all potential gains - if so, how can I create that abstraction without sacrificing performance?
- Also, I can't figure out why I can't constexpr my constructor, and if I can't, does that make all the other constexpr methods useless?

Of course, any other comments or thoughts are appreciated too. Thanks!


#include <vector>
#include <cstdint>
#include <iostream>

class BitAdjmat {
public:

    // Abstraction for a "Row" of the BitAdjmat
    class Row {
    public:
        /**
        * This row iterator should only iterate through existing edges, e.g. "1" entries.
        * I've settled for only using a read-only iterator (i.e. const_iterator), since
        * there is no way to hold a reference to a single bit.
        */
        class const_iterator {
        public:

            constexpr const_iterator(std::size_t index,
                std::vector<uint64_t>::const_iterator start,
                std::size_t end_index)
                : index{index}, start{start}, end_index{end_index}
            {
                // An empty row has index == end_index; never read a bit in that case.
                if (index != end_index && !exists()) {
                    seek_next();
                }
            }

            /**
            * Warning: This is special constructor that is optimized for constructing
            * the end() iterator, and shouldn't be used for anything else.
            */
            constexpr const_iterator(std::size_t end_index,
                std::vector<uint64_t>::const_iterator start) noexcept
                : index{end_index}, start{start}, end_index{end_index} {}

            constexpr const_iterator &operator++() noexcept {
                seek_next();
                return *this;
            }

            constexpr std::size_t operator*() const noexcept { return index; }
            
            constexpr bool operator==(const const_iterator &other) const noexcept { 
                return index == other.index && start == other.start; 
            }
            constexpr bool operator!=(const const_iterator &other) const noexcept { return !(*this == other); }

        private:

            // Check if current neighbor pointed to by *(start+index/N) exists, i.e., if the bit returned is 1.
            constexpr bool exists() const noexcept { return ((*(start + index / N)) >> (index % N)) & 1ULL; }

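            constexpr void seek_next() noexcept {
                while (index != end_index) {
                    ++index;
                    // Don't read the bit at end_index: when num_vertices is a multiple of 64,
                    // it lies past the end of the row (and past the vector for the last row).
                    if (index != end_index && exists())
                        return;
                }
            }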

        private:
            std::size_t index; // logical vertex index (not vector<uint64_t> index)
            // Not const, so that the iterator remains copy-assignable.
            std::vector<uint64_t>::const_iterator start;
            std::size_t end_index;
        };

    public:

        constexpr Row(const BitAdjmat &parent, std::size_t row_index) noexcept
        : start{parent.matrix.begin() + parent.num_uint64_per_row * row_index}, 
          num_vertices{parent.num_vertices} {}
        
        constexpr bool contains(std::size_t nb_index) const noexcept { 
            return ((*(start + nb_index / N)) >> (nb_index % N)) & 1ULL; 
        }

        constexpr const_iterator begin() const noexcept { return const_iterator(0, start, num_vertices); }
        constexpr const_iterator end() const noexcept { return const_iterator(num_vertices, start); }

    private:
        const std::vector<uint64_t>::const_iterator start;
        const std::size_t num_vertices;
    };

public:
  
    explicit BitAdjmat(std::size_t num_vertices)
        : num_vertices{num_vertices}, 
          num_uint64_per_row{num_vertices / N + (num_vertices % N != 0)}, 
          matrix{std::vector<uint64_t>(num_vertices * num_uint64_per_row, 0)} {}

    constexpr Row operator[](std::size_t row_index) noexcept { return Row(*this, row_index); }
    constexpr const Row operator[](std::size_t row_index) const noexcept { return Row(*this, row_index); }

    constexpr std::size_t size() const { return num_vertices; }

    // Direct access to adjmat entry, where 'i' and 'j' are logical vertex indices.
    constexpr bool get(std::size_t i, std::size_t j) const noexcept { 
        return (matrix[flat_index(i, j / N)] >> (j % N)) & 1ULL; 
    }

    // Direct setting of adjmat entries, where 'i' and 'j' are logical vertex indices.
    constexpr void set(std::size_t i, std::size_t j, bool val) noexcept {
        std::size_t q = j / N;
        std::size_t r = j % N;
        matrix[flat_index(i, q)] = (matrix[flat_index(i, q)] & ~(1ULL << r)) | (static_cast<uint64_t>(val) << r);
    }


private:
    const std::size_t num_vertices;
    const std::size_t num_uint64_per_row;
    std::vector<uint64_t> matrix; // Flat container representing a 2D vector<vector<uint64_t>>.

    constexpr static std::size_t N = 64;

private:
    constexpr std::size_t flat_index(std::size_t i, std::size_t j) const { return i * num_uint64_per_row + j; }


};

You spelt std::size_t correctly, but not std::uint64_t. Is there a reason for that? – Toby Speight, Commented Jul 6, 2023 at 8:17
I've taken exactly the same approach before to represent graphs with at most 64 vertices. Are you looking to solve a particular problem with this representation? I've implemented various algorithms on top of it too, including greedy coloring, maximal clique enumeration, path finding, etc. – Juho, Commented Jul 10, 2023 at 18:18

user555045 · Accepted Answer · 2023-07-06 10:33:58Z

Is this actually a valid way to improve performance of an adjacency matrix?

Yes, I have used this technique in several software projects to great effect. But it only really works well if you actually exploit the fact that your booleans are bit-packed, beyond just saving space. By the way, you cannot really exploit that packing through a vector<bool>, so I'll make the opposite recommendation to Toby Speight's: keep the vector<uint64_t>, but use it.

How to use a bit-packed format

The key to actually making use of bit-packed data is doing as much as possible with whole uint64_ts, and as little as possible bit-by-bit.

So don't do this; it wastes the potential that you created by packing your bits:

constexpr void seek_next() noexcept {
    while (index != end_index) {
        ++index;
        if (exists())
            return;
    }
}

Instead, change your iterator state to a pair: "index of the current uint64_t" and "copy of the current uint64_t, with the bits that have already been iterated over reset". Then you can do the following:

- While the current uint64_t (with the already-iterated bits reset) is zero, skip ahead to the next uint64_t.
- If it's not zero, calculate where the next set bit is with std::countr_zero. Remember to reset that bit in the iterator state.

That way there is much less "searching", the only actual searching that happens is on the level of uint64_ts, not individual bits. Large blocks of zeroes are efficiently skipped. High-entropy blocks of 50% zeroes and 50% ones do not suffer from branch misprediction, since there is no branch per individual bit.

Here's a reference: Daniel Lemire's blog post "Iterating over set bits quickly". The iteration is "out in the open" there, but you can put it into an iterator. And these days you can use the standard std::countr_zero (C++20, header <bit>).

There are other operations that you can implement efficiently using similar principles, such as counting neighbours (std::popcount) or computing the complement graph. Bulk operations that treat each row as a bit set can then serve as building blocks for graph search algorithms and such.

This is a good point, although I would go one step further: create a class bitvector that stores the bits and provides iterators that return the indices of only the bits that are 1. Add something equivalent to a std::span for it so you can iterate over subsets of a bitvector. Then use that class inside BitAdjmat. This gives you a reusable bit vector and a greatly simplified BitAdjmat. – G. Sliepen, Commented Jul 6, 2023 at 12:37
Thanks, that's pretty clever, and much better than my version that checks every single bit. I tried it out in my code and it is definitely faster, although I couldn't figure out how to do it without 4 member variables. Minimal example: godbolt.org/z/osdxMn8Ec. I needed cumulative_ to keep track of how many uint64s we have passed, and end_ so that seek_next knows when to stop. – Paradox, Commented Jul 7, 2023 at 20:11

G. Sliepen · Accepted Answer · 2023-07-06 08:43:19Z

This looks nice, but I agree with Toby Speight that you should leverage std::vector<bool> instead, which would get rid of a lot of code. I just want to add:

About your concerns

Is this actually a valid way to improve performance of an adjacency matrix? I am using memory much more efficiently, but at the expense of more arithmetic and bit-shifting operations - does that erode any potential gains?

Accessing memory, especially if it's not already in the L1 cache, is quite slow compared to simple arithmetic and bit-shifting operations (and since N is a power of two, the / N and % N on unsigned indices compile to a shift and a mask when optimizations are enabled). Your solution also needs fewer pointer indirections. So I am pretty sure your solution is faster than a set of nested containers holding integer indices. However, you should not take my word for it; you should actually measure it: create a realistic benchmark for what you want to do with this (for example, a minimum spanning tree or a breadth-first search on the matrix), run it with both implementations of the adjacency matrix, and measure the time and memory each takes.

I want to add familiar layers of abstraction (e.g. iterators) to replicate some functionality of vector<unordered_set<int>>. However, since I am already operating at such a low-level, I'm worried that the overhead of an iterator offsets all potential gains - if so, how can I create that abstraction without sacrificing performance?

The iterators themselves need not add overhead. You need some way to iterate over the neighbors, regardless of whether you implement the iterators yourself or std::unordered_set<int> provides them for you. Typically, iterators only live for the duration of a loop; they will be inlined and can be heavily optimized by the compiler.

Also, I can't figure out why I can't constexpr my constructor,

This is probably because your constructor has to construct a std::vector, which is not constexpr before C++20. Since C++20, std::vector has constexpr constructors, so try compiling with -std=c++20 (GCC/Clang) or /std:c++20 (MSVC).

and if I can't, does that make all the other constexpr methods useless?

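Since C++20, all of these member functions can be made constexpr, with one caveat: memory allocated during constant evaluation must be freed before that evaluation ends. So you can create and use a BitAdjmat inside a function that is evaluated at compile time, but you cannot keep a non-empty one in a constexpr variable. Before C++20 the constexpr would indeed have been useless, although it never hurts to prepare for a future where constexpr is possible, so the more functions that compile with constexpr, the better.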

Thanks for your comments! I tried comparing it to an equivalent class using vector<unordered_set<int>>, and it did seem significantly faster for adjacency iteration, at least for sparse/dense graphs with ~1000 vertices, which was my use case. – Paradox, Commented Jul 7, 2023 at 19:36

Toby Speight · Accepted Answer · 2023-07-06 09:20:10Z


Most of this code seems to be a reimplementation of std::vector<bool>. I think that implementing in terms of that class will result in simpler code (it may also be more efficient, and will work on platforms that don't provide std::uint64_t).
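For comparison, a minimal sketch of this suggestion (the class name BoolAdjmat and its member functions are hypothetical): the whole n-by-n matrix lives in one std::vector<bool>, and the library takes care of the packing. Note that neighbors() here still tests bits one at a time, since std::vector<bool> gives no standard access to its underlying words.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector<bool>-backed adjacency matrix, stored row-major as n * n bits.
class BoolAdjmat {
public:
    explicit BoolAdjmat(std::size_t n) : n_{n}, bits_(n * n, false) {}

    std::size_t size() const noexcept { return n_; }

    bool get(std::size_t i, std::size_t j) const {
        assert(i < n_ && j < n_);
        return bits_[i * n_ + j];
    }

    void set(std::size_t i, std::size_t j, bool val) {
        assert(i < n_ && j < n_);
        bits_[i * n_ + j] = val;  // assigns through vector<bool>'s proxy reference
    }

    // Neighbours of i in increasing order; each bit of the row is tested individually.
    std::vector<std::size_t> neighbors(std::size_t i) const {
        assert(i < n_);
        std::vector<std::size_t> out;
        for (std::size_t j = 0; j < n_; ++j)
            if (bits_[i * n_ + j]) out.push_back(j);
        return out;
    }

private:
    std::size_t n_;
    std::vector<bool> bits_;
};
```

This is much shorter than the hand-written version; the trade-off is that word-level tricks such as std::countr_zero or std::popcount on whole rows are not available through this interface.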

answered Jul 6, 2023 at 8:20, edited Jul 6, 2023 at 9:20

Thanks for bringing vector<bool> to my attention. I tried a version with vector<bool> instead of manual bit operations, and the logic was much simpler since I could just use its iterators, with comparable performance. – Paradox, Commented Jul 7, 2023 at 19:30
