4
\$\begingroup\$

I would like to modify my shortest path finding code to use multithreading and improve its performance. The algorithm must be able to work with negative weights, and as a result, we cannot use Dijkstra's algorithm. Instead, our approach combines three algorithms: Bellman-Ford, SPFA_Queue (shortest path fast algorithm, an improvement over Bellman-Ford that uses a queue and is faster in some cases), and SPFA_MinHeap (a second variation of SPFA that uses a minheap).

After carrying out a large number of tests, we have determined the optimal conditions for selecting one of the three algorithms based on the size and density of the graph.

The question now is how to integrate multithreading in the most efficient way possible based on the way each algorithm works. If necessary, we are open to rewriting parts of the algorithms to enable multithreading and improve performance, we would like to retain the use of these three algorithms since they have proven to be very efficient so far, and we hope to preserve the performance gains obtained from previous work while achieving maximum optimization through multithreading.

I'm not asking for a full code review here. We're quite satisfied with the single-threaded code itself, and all the headers and used methods are properly defined and tested in separate files (which can be provided upon request). However, our main concern is to find the most efficient strategy to implement multithreading into the current logic.

Here is the code in question:

/** 
 * Computes the shortest path between a given source node and all other nodes in a weighted graph using either the Bellman-Ford or SPFA_SLF algorithm based on the density and number of nodes in the graph.
 *
 * Arguments:
 *     links: A 2D array of integers representing the links between nodes. Each row represents a link and contains the indices of the two nodes and the cost of the link.
 *     links_size: The number of links in the links array.
 *     s: The index of the starting node.
 *     dist: An array of size nb_nodes to store the shortest distances from each node to s.
 *     path: An array of size nb_nodes to store the shortest paths from each node to s.
 *     nb_nodes: The total number of nodes in the graph.
 *
 * Returns:
 *     0 if the computation succeeded, 1 if a negative cycle was detected in the graph.
 **/

int shortest_path(sint links[][3], uint links_size, uint s, lsint *dist, uint *path, uint nb_nodes)
{

    // Select algorithm based on graph density and number of nodes
    if (links_size <= 100 || nb_nodes <= 100)
    {
        return bellman_ford(links, links_size, s, dist, path, nb_nodes);
    }
    else if (((double)links_size / (double)(nb_nodes)) < 1)
    {   
        return SPFA_queue(links, links_size, s, dist, path, nb_nodes);
    }
    else
    {   
        return SPFA_minheap(links, links_size, s, dist, path, nb_nodes);
    }
}

// Version 1: classical Bellman-Ford algorithm
int bellman_ford(sint links[][3], uint links_size, uint s, lsint *dist, uint *path, uint nb_nodes)
{
    int updates;
    uint i;
    uint j;
    uint k;

    // Initialize all elements of dist to INT64_MAX ("infinity").
    for (i = 0; i < nb_nodes; i++)
    {
        dist[i] = INT64_MAX;
    }

    // Initialise path
    memset(path, -1, nb_nodes * sizeof(sint));

    dist[s] = 0;

    // Perform nb_nodes - 1 iterations of Bellman-Ford algorithm
    for (i = 0; i < nb_nodes - 1; i++)
    {
        updates = 0;
        // For each link in the links array, attempt to update shortest distances for each node.
        for (j = 0; j < links_size; j++)
        {
            uint node_from = links[j][0];
            uint node_to = links[j][1];
            lsint cost = links[j][2];

            // If a shorter distance is found, store it in the dist array and update the path array
            if ((dist[node_from] != INT64_MAX) && (dist[node_from] + cost < dist[node_to]))
            {
                dist[node_to] = dist[node_from] + cost;
                path[node_to] = node_from;
                updates++;
            }
        }

        // If no update was made during the previous iteration, exit the loop
        if (updates == 0)
        {
            break;
        }
    }

    // Check for negative cycle in the graph
    for (k = 0; k < links_size; k++)
    {
        uint node_from = links[k][0];
        uint node_to = links[k][1];
        lsint cost = links[k][2];

        // If a shorter distance is still found, there is a negative cycle
        if ((dist[node_from] != INT64_MAX) && (dist[node_from] + cost < dist[node_to]))
        {
            // printf("Negative cycle detected\n");
            return 1;
        }
    }

    return 0;
}


// Version 2: SPFA using small label first method and a queue
int SPFA_queue(sint links[][3], uint links_size, uint s, lsint *dist, uint *path, uint nb_nodes)
{
    queue_t *queue = create_queue(); // Initialize the queue
    if (queue == NULL)
    {
        fprintf(stderr, "Failed to allocate memory for queue.\n");
        return 1;
    }

    uint i;

    uint *length = calloc(nb_nodes, sizeof *length); // Array to store the number of nodes in the shortest path for each node in the graph.
    if (length == NULL)
    {
        fprintf(stderr, "Failed to allocate memory for length.\n");
        free_queue(queue);
        return 1;
    }

    // Initialize all elements of dist to INT64_MAX ("infinity").
    for (i = 0; i < nb_nodes; i++)
    {
        dist[i] = INT64_MAX;
    }

    // Initialise path
    memset(path, -1, nb_nodes * sizeof(sint));

    dist[s] = 0;       // Set the distance of the source node to 0
    enqueue(queue, s); // Add the source node to the queue
    length[s] = 1;

    bool *in_queue = calloc(nb_nodes, sizeof(bool));
    in_queue[s] = true;

    // While the queue is not empty, perform iterations of the algorithm
    while (!is_empty(queue))
    {
        uint current_node = dequeue(queue); // Dequeue the next node to process
        in_queue[current_node] = false;
        // If the length of the shortest path for any link reaches nb_nodes, there is a negative cycle in the graph
        if (length[current_node] >= nb_nodes)
        {
            // printf("Negative cycle detected\n");
            free(length);
            free(in_queue);
            free_queue(queue);
            return 1;
        }

        int min_node = -1;
        lsint min_dist = INT64_MAX;

        // Iterate through all the links in the graph
        for (i = 0; i < links_size; i++)
        {
            // If the current link starts from the dequeued node (current_node)
            if (links[i][0] == current_node)
            {
                uint node_from = links[i][0];
                uint node_to = links[i][1];
                lsint cost = links[i][2];

                // If a shorter path is found, update the distances and paths
                if (dist[node_from] + cost < dist[node_to])
                {
                    dist[node_to] = dist[node_from] + cost;
                    path[node_to] = node_from;
                    length[node_to] = length[node_from] + 1;

                    // If the destination node (node_to) is not in the queue, enqueue it
                    if (!in_queue[node_to])
                    {
                        enqueue(queue, node_to);
                        in_queue[node_to] = true;
                    }
                }

                // Update the minimum distance and node for Small Label First (SLF)
                if (dist[node_to] < min_dist)
                {
                    min_dist = dist[node_to];
                    min_node = node_to;
                }
            }
        }
        if (min_node != -1)
        {
            // Move the node with the minimum distance to the front of the queue
            move_to_front(queue, min_node);
        }
    }

    free(length);
    free(in_queue);
    free_queue(queue);
    return 0;
}


// Version 3: SPFA using a min_heap
int SPFA_minheap(sint links[][3], uint links_size, uint s, lsint *dist, uint *path, uint nb_nodes)
{
    // Create a min heap to store the nodes and their distances.
    MinHeap *heap = create_minheap(nb_nodes);
    if (heap == NULL)
    {
        fprintf(stderr, "Failed to allocate memory for min heap.\n");
        return EXIT_FAILURE;
    }

    uint i;

    // Allocate memory for an integer array to keep track of whether a node is already in the heap.
    uint *in_heap = calloc(nb_nodes, sizeof *in_heap);
    if (in_heap == NULL)
    {
        fprintf(stderr, "Failed to allocate memory for in_heap.\n");
        free_minheap(heap);
        return EXIT_FAILURE;
    }

    // Create an array to store the number of nodes in the shortest path for each node in the graph.
    uint *length = calloc(nb_nodes, sizeof *length); // Array to store the number of nodes in the shortest path for each node in the graph.
    if (length == NULL)
    {
        fprintf(stderr, "Failed to allocate memory for length.\n");
        free_minheap(heap);
        free(in_heap);
        return EXIT_FAILURE;
    }

    // Initialize all elements of dist to INT64_MAX ("infinity").
    for (i = 0; i < nb_nodes; i++)
    {
        dist[i] = INT64_MAX;
    }

    // Initialize all elements of `path` to -1.
    memset(path, -1, nb_nodes * sizeof(sint));

    // Create an adjacency list to represent the graph.
    adjacency_node_t **adj_list = create_adjacency_list(nb_nodes, links, links_size);
    if (adj_list == NULL)
    {
        fprintf(stderr, "Failed to allocate memory for adj_list.\n");
        free(length);
        free(in_heap);
        free_minheap(heap);
        return EXIT_FAILURE;
    }

    // Set the distance of the source node to 0.
    dist[s] = 0;

    // Add the source node to the heap.
    heap = insert_minheap(heap, s);
    in_heap[s] = 1;

    // Set the length of the shortest path for the source node to 1.
    length[s] = 1;

    // While the heap is not empty, perform iterations of the algorithm
    while (heap->size != 0)
    {
        // Extract the node with the minimum distance from the heap.
        uint current_node = get_min(heap);
        heap = delete_minimum(heap);
        in_heap[current_node] = 0;

        // If the length of the shortest path for any link reaches nb_nodes, there is a negative cycle in the graph
        if (length[current_node] >= nb_nodes)
        {
            // printf("Negative cycle detected\n");
            free(length);
            free_adjacency_list(adj_list, nb_nodes);
            free(in_heap);
            free_minheap(heap);
            return EXIT_FAILURE;
        }

        adjacency_node_t *adj_node = adj_list[current_node];

        // Loop over all the edges that start from the current node
        while (adj_node != NULL)
        {
            uint node_to = adj_node->node_to;
            lsint cost = adj_node->cost;

            // If a shorter path is found, update the distances and paths
            if (dist[current_node] + cost < dist[node_to])
            {
                dist[node_to] = dist[current_node] + cost;
                path[node_to] = current_node;
                length[node_to] = length[current_node] + 1;

                // If the destination node (node_to) is not in the heap, insert it
                if (!in_heap[node_to])
                {
                    insert_minheap(heap, node_to);
                    in_heap[node_to] = 1;
                }
            }
            adj_node = adj_node->next;
        }
    }

    free(length);
    free_adjacency_list(adj_list, nb_nodes);
    free_minheap(heap);
    free(in_heap);    
    return EXIT_SUCCESS;
}
\$\endgroup\$
6
  • 2
    \$\begingroup\$ Note that at Code Review we primarily review the code you have written, we normally don't answer "how do I do X?" type questions and don't write code for you. However, we're happy to review your single-threaded code for you, and maybe give some hints about how to approach making it multi-threaded. \$\endgroup\$
    – G. Sliepen
    Commented Apr 8, 2023 at 17:42
  • \$\begingroup\$ Thanks for your response. I realize my question wasn't quite on-topic. If you could share any insights on developing a multi-threaded approach for my code, without straying too far from the forum's scope, it would be really valuable to me. I appreciate your help. \$\endgroup\$
    – c.leblanc
    Commented Apr 8, 2023 at 22:43
  • \$\begingroup\$ So far, the only opportunity I found to multithread a shortest path algorithm was parallel computing of the edge weights. As a node can have many neighbors, if the calculation is expensive it might be quicker if done in parallel. \$\endgroup\$ Commented Apr 9, 2023 at 16:23
  • \$\begingroup\$ @DasKrümelmonster Not sure I got you fully. Which case are you parallelizing exactly ? \$\endgroup\$
    – c.leblanc
    Commented Apr 10, 2023 at 13:06
  • \$\begingroup\$ This loop: // Loop over all the edges that start from the current node while (adj_node != NULL) can be parallel. But if your cost is known ahead of time, it's likely not better then \$\endgroup\$ Commented Apr 11, 2023 at 19:34

1 Answer 1

6
\$\begingroup\$

Avoid redefining integer types

I see types like sint and uint, I think somewhere you have lines like:

typedef signed int sint;
typedef unsigned int uint;

However, I strongly recommend you avoid doing that. Your particular typedefs are unfamiliar to other people, who now have to check what sint means. It saved you some typing, but made the code harder to read. I recommend you use a combination of these two alternatives instead:

  1. Use the standard fixed-width integer types like int32_t and uint32_t. They have the benefit that they have a well-defined size, whereas something like int and long int does not, and might vary between CPU architectures and even operating systems. For example, long int might not be 64-bits, and thus storing INT64_MAX in a lsint might not do what you want.

    Also, for sizes, counts and indices, prefer using size_t.

  2. Give them a name that reflects how they are used. For example, you might want to create type aliases named node_id and node_cost.

Create structs to group related data

Instead of representing links as arrays of length 3, create a struct to hold all the information:

struct link {
    node_id from;
    node_id to;
    node_cost cost;
};

Then you don't have to remember the exact order in which things are stored in the array, and you can mix different types. It also avoids having to pass two-dimensional arrays to shortest_path:

int shortest_path(struct link links[], …) {…}

Avoid unnecessary floating point math

Converting integers to/from floating points can be expensive, and if your data is all integers, I would try to avoid it. For example:

else if (((double)links_size / (double)(nb_nodes)) < 1)

Can be rewritten as:

else if (links_size < nb_nodes)

Parallelizing the code

Not all algorithms lend themselves to be parallelized. Most require that the steps are done in a well-defined order, and having multiple threads work at the same time on the same problem might prevent that.

You can add mutexes to prevent multiple threads from modifying the same bit of data simultaneously. However, mutexes add some overhead, and if multiple threads try to access the same mutex often, a multi-threaded version of an algorithm might even be slower than a single-threaded version.

Even without the overhead of mutexes, using multiple threads only makes sense if a large part of the work can be done independently, and there is little need for synchronization between the threads (see Amdahl's law).

Finally, multiple threads allow more computation to be done at the same time, but unless you are on a NUMA system, all cores share access to the main memory, which can be a bottleneck. For example, it doesn't make sense to parallelize memset(), as one core can probably saturate the available memory bandwidth.

With all that in mind, let's look at the algorithms:

Bellman-Ford

In the main part of the algorithm, you could decide to parallelize the inner for-loop. Most of the time, each thread will work on a link between two nodes that none of the other threads are working on. However, the problem is that sometimes they might work on links with overlapping nodes. The algorithm only works correctly if the whole if-statement is executed atomically. That would require mutexes, and the overhead of that compared to the whole if-statement probably does not make it worth parallelizing.

You could try partitioning the graph into segments that have the least amount of links to other segments, and then have each thread do the interior links of one segment, and then afterwards process all the links between segments in a single-threaded way. That might work for some graphs, and would probably need very large graphs before the effort spent partitioning would give a net benefit.

SFPA

While Bellman-Ford didn't care about the order in which links were processed, the same is not true for the shortest path fast algorithms. The problem is that each time a node is visited, it modifies the order of the queue or heap. You would need a mutex to protect the queue or heap, but even then you cannot start multiple threads because you don't know what the right order is to visit the nodes. This makes it very hard to parallelize. I don't think this is a candidate for parallelization at all.

\$\endgroup\$
8
  • \$\begingroup\$ Thank you very much for your response, I've clearly received and taken in your recommendations. Two lines of thought about SFPA: 1) While it is correct that there is an order in which links are visited, it doesn't mean it's mandatory for the logic of the algorithm. It makes a better average complexity, but the algorithm also works if that order is random. 2) What about dividing the graph into subgraphs with common nodes at the boundaries and make each threads work on a subgraph ? (with mutex on common nodes). Input graphs are very large, from thousands to hundreds of thousands of nodes/edges. \$\endgroup\$
    – c.leblanc
    Commented Apr 9, 2023 at 9:36
  • 1
    \$\begingroup\$ 1) Sure but then it almost becomes Bellman-Ford again? 2) Yes, that's what I meant by partitioning the graph. Of course then you have to find an algorithm to do that, and that algorithm must be a lot faster than the shortest-path algorithm, otherwise there is no benefit. \$\endgroup\$
    – G. Sliepen
    Commented Apr 9, 2023 at 9:46
  • \$\begingroup\$ You are correct about both, those are indeed my concerns. I have come up with another idea for SPFA. What if we parallelize the main loop, have each thread extract the next shortest distance node in a sequential way (through a lock), and then work on their nodes in parallel (with a lock around the actual lines where values are updated, to avoid concurrent modification)? What do you think would be more promising and interesting to investigate: partitioning the graph (which implies finding a fast algorithm to do that), or this last idea? \$\endgroup\$
    – c.leblanc
    Commented Apr 9, 2023 at 11:01
  • 1
    \$\begingroup\$ That doesn't work. Consider that the "next shortest distance node" depends on the work performed on the previously visited node. I think you will have more luck parallelizing the inner loops. For the queue version, that means parallelizing visiting every link. For the min-heap version, you only visit the neighbors, which is a much smaller set and thus likely won't give you much benefit. \$\endgroup\$
    – G. Sliepen
    Commented Apr 9, 2023 at 11:34
  • 1
    \$\begingroup\$ No, I said that Bellman-Ford probably only is worth parallelizing if you can partition the work into subgraphs with little connections to each other, and both SPFA variants you might try parallelizing the inner loops, and I think only the one with a queue might see any benefit. \$\endgroup\$
    – G. Sliepen
    Commented Apr 9, 2023 at 13:06

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.