Optimizing C++ Implementation of the Simulated Bifurcation Algorithm (SB)

Question

I've implemented the Simulated Bifurcation (SB) algorithm in C++, based on the method described in [2] (link to Science Advances). The goal of this algorithm is to find the ground state of an Ising model, introduced in my paper [1] (link to arXiv) .

Here's the code:

#include <iostream>
#include <Eigen/Dense>
#include <vector>
#include <random>
#include <cmath>
#include <ctime>

// [1]: https://arxiv.org/abs/2411.19604
// [2]: https://www.science.org/doi/full/10.1126/sciadv.aav2372

std::pair<double, Eigen::VectorXd> calculate_min_J_o(const Eigen::MatrixXd& J) {

    /*
    This function calculates the minimum energy and configuration of the Hamiltonian of the Ising model in [1].
    This assumes the two cluster pattern holds (which has been proven in the paper). 
    */

    int N = J.rows();
    std::vector<double> H_l;
    std::vector<Eigen::VectorXd> s_l;

    for (int M = 1; M < N; ++M) {
        Eigen::VectorXd s(N);
        s.head(M).setOnes();
        s.tail(N - M).setConstant(-1.0);

        double H = s.transpose() * J * s;
        H_l.push_back(H);
        s_l.push_back(s);
    }

    auto min_H_iter = std::min_element(H_l.begin(), H_l.end());
    int min_H_index = std::distance(H_l.begin(), min_H_iter);
    return { *min_H_iter, s_l[min_H_index] };
}


Eigen::MatrixXd J_o(int N, double d = 1.0) {

    /*
    Ordered J matrix [1]
    */ 

    Eigen::MatrixXd matrix(N, N);
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            matrix(i, j) = std::pow((i + 1.0) / N, d) + std::pow((j + 1.0) / N, d);
        }
    }
    matrix /= std::pow(N, 2);
    return matrix;
}

// Function g(x)
Eigen::MatrixXd g(const Eigen::MatrixXd& x) {
    return x.unaryExpr([](double xi) {
        if (std::abs(xi) <= 1) return xi;
        return xi > 1 ? 1.0 : -1.0;
    });
}

// Function h(x, y)
Eigen::MatrixXd h(const Eigen::MatrixXd& x, const Eigen::MatrixXd& y) {
    Eigen::MatrixXd result(y.rows(), y.cols());
    for (int i = 0; i < x.rows(); ++i) {
        for (int j = 0; j < x.cols(); ++j) {
            result(i, j) = std::abs(x(i, j)) <= 1 ? y(i, j) : 0;
        }
    }
    return result;
}

// Simulated Bifurcation Algorithm with K parallel batches [2]
Eigen::MatrixXd SimulatedBifurcation(const Eigen::MatrixXd& J, int K = 3, int steps = 10000, 
                                     double c_0 = -1, double a_0 = 1, double delta_t = 0.1, double perturbation = 0.005) {
    int N = J.rows();
    if (c_0 == -1) {
        double expectation_J_squared = std::sqrt(J.cwiseAbs2().sum() / (N * (N - 1)));
        c_0 = 0.5 / (expectation_J_squared * std::sqrt(N));
    }

    std::mt19937 gen(std::random_device{}());
    std::uniform_real_distribution<double> dist(-perturbation, perturbation);

    Eigen::MatrixXd x = Eigen::MatrixXd::NullaryExpr(K, N, [&]() { return dist(gen); });
    Eigen::MatrixXd y = Eigen::MatrixXd::NullaryExpr(K, N, [&]() { return dist(gen); });

    double a = 0;

    for (int step = 0; step < steps; ++step) {
        Eigen::MatrixXd f = x * J; // Batch-wise matrix multiplication
        y += ((-(a_0 - a) * x + c_0 * f) * delta_t);
        x += (a_0 * y * delta_t);
        a += 1.0 / steps;

        x = g(x);
        y = h(x, y);
    }

    return x.unaryExpr([](double xi) { return std::copysign(1.0, xi); });
}

int main() {

    // compile with: g++ -O3 -march=native -flto SB.cpp -o SB

    int N = 1000;
    int K = 10;
    int steps = 1000;

    Eigen::MatrixXd J = -J_o(N);

    clock_t start_time = std::clock();
    Eigen::MatrixXd min_x = SimulatedBifurcation(J, K, steps, -1, 1, 0.2, 0.005);
    clock_t end_time = std::clock();

    std::cout << "Time taken: " << (double)(end_time - start_time) / CLOCKS_PER_SEC << " seconds.\n";

    auto [min_H, true_gs] = calculate_min_J_o(-J);
    Eigen::VectorXd true_gs_sign = true_gs.array().sign();
    Eigen::MatrixXd min_x_sign = min_x.array().sign();
    
    // std::cout << "min_x_sign:" << std::endl << min_x_sign << std::endl;
    // std::cout << "true_gs_sign:" << std::endl << true_gs_sign.transpose() << std::endl;

    bool ground_state_matches = false;
    for (int i = 0; i < min_x_sign.rows(); ++i) {
        bool match = (true_gs_sign.array() == min_x_sign.row(i).transpose().array()).all() || 
                     (true_gs_sign.array() == (-min_x_sign.row(i).transpose().array())).all();
        if (match) {
            ground_state_matches = true;
            break;
        }
    }
    
    std::cout << "Number of Batches (K) = " << K << std::endl;
    std::cout << "N = " << N << std::endl;
    std::cout << "steps = " << steps << std::endl;
    std::cout << "Reached the true ground state in any batch: " << (ground_state_matches ? "Yes" : "No") << std::endl;

    return 0;
}

I'm specifically interested in any tips for optimizing the SimulatedBifurcation function, as this is the core of the algorithm and where most of the computation happens. Given the heavy use of matrix operations, I'm particularly curious about efficient use of the Eigen library.

My questions are:

Are there any obvious bottlenecks or areas within the SimulatedBifurcation function where I could improve performance, particularly regarding the Eigen matrix operations? Any suggestions for more efficient ways to express these calculations using Eigen would be greatly appreciated.
What's the best way to migrate this code base from CPU to GPU based calculation?

In section IV, when you spoke of "neglecting the Kronecker delta in Equation (5)", I think you meant equation 2. — J_H
– J_H, Commented Dec 27, 2024 at 23:03
@J_H Thank you for your feedback. We fixed this issue in the version 2, which is available on arxiv. — Amirhossein Rezaei
– Amirhossein Rezaei, Commented Mar 20 at 20:08

G. Sliepen · Accepted Answer · 2024-12-27 20:22:52Z

There are several little things that could be improved in the code:

Unnecessary use of parentheses and some operations. Consider the calculation of match in main(), which could be written much simpler as:
```
bool match = true_gs_sign == min_x_sign.row(i).transpose() || 
             true_gs_sign == -min_x_sign.row(i).transpose();
```
Don't use std::endl, prefer to use '\n'; the former is equivalent to the latter, but also forces the output to be flushed, which isn't necessary.
Use std::chrono to calculate elapsed time instead of C's clock() function.

Find ways to simplify expressions and move things out of loops. The compiler isn't always able to do it for you. For example, in J_o():

matrix(i, j) = std::pow(i + 1.0, d) + std::pow(j + 1.0, d)
…
matrix /= std::pow(N, 2 + d);

And in g():

return x.unaryExpr([](double xi) { return std::clamp(xi, -1.0, 1.0); });

Remove unused values. For example, min_H is never used, which in turn means that calculate_min_J_o()'s return value can be simplified.
Avoid unnecessary (re)calculations. This is especially important when matrices get large. Consider the repeated calculation of \$\mathbf s^\mathrm{T} J \mathbf s\$: each iteration you are only flipping the sign of one element of \$\mathbf s\$. This means calculating the next value of H could be done with much less effort.
Naming things: if you use short variable names that come from formulas from the papers you referenced, add a comment explaining the function of that variable, the paper it's from and the number of the formula where it was first mentioned. For example:
```
// J is the interaction matrix (see arXiv:2411.19604v1 eq. 2)
```
However, I don't see \$g(\mathbf x)\$ or \$h(\mathbf x, \mathbf y)\$ mentioned in the papers, so I don't know what g() and h() are supposed to do. Give them more descriptive names, or refer to a paper they are coming from (possibly your own).

I don't know how to best convert this to something that can run on a GPU, hopefully someone else will add an answer about that.

Stack Exchange Network

Optimizing C++ Implementation of the Simulated Bifurcation Algorithm (SB)

1 Answer 1

You must log in to answer this question.

Hot Network Questions

Optimizing C++ Implementation of the Simulated Bifurcation Algorithm (SB)

1 Answer 1

You must log in to answer this question.

Related

Hot Network Questions