Longest (simple) path problem implementation for contest

Question

For a seminar with problems from programming contests I have to implement an algorithm to find the longest path in a graph. Here is a short summary of what I want to solve/achieve:

In this problem we are having a look at tasks which have a number of time units needed to finish the task and a list of dependencies. A task can only be started once all predecessors are finished. A test case has the following form:
2
5
6 4 2 3 4 5
7 3 3 4 5
3 2 4 5
2 1 5
2 0

4
5 3 2 3 4
3 1 4
4 1 4
2 0
The first line contains an integer t. t test cases follow, each of them separated by a blank line. Each test starts with a single line denoting the number of tasks. For each task there is a line with the number of time needed for a task, the number of successors and the list of these successors.

My solution so far seems to work for small test cases but triggers a timeout in the automatic judge system we use to hand in our solutions.

My current main method looks like this:

int main() {
    int testCases; wcin >> testCases;
    for(int run = 1; run <= testCases; ++run) {
        int n; wcin >> n;
        vector<vector<bool>> G(n + 2, vector<bool>(n + 2));
        vector<int> t(n + 2);
        for(int i = 1; i < n + 1; ++i) {
            int p, suc; wcin >> p >> suc;
            // Time for task i
            t[i] = p;
            while(suc--) {
                wcin >> p;
                // Task i and task p are connected
                G[i][p] = true;
            }
        }
        // AS -> 1 and n -> AE
        G[0][1] = true;
        G[n][n + 1] = true;
        auto c = JobShop(G, t);
        wcout << L"Case #" << run << L": " << c[n + 1] << endl;
        wcin.ignore();
    }
    return 0;
}

I am representing the graph with a vector<vector<bool>> which denotes whether or not two nodes/tasks are connected. The vector t describes the time for the tasks. Furthermore I am creating n+2 nodes with a preceeding node AS and a last node AE (there is no real reason do so but I did this in my notes and was searching a path from AS to AE). The last step is to call my actual algorithm called JobShop.

vector<int> JobShop(vector<vector<bool>> G, vector<int> t) {
    vector<int> c(G.size());
    vector<int> s(G.size());
    vector<bool> visited(G.size());

    c[1] = t[1];
    visited[0] = true;
    visited[1] = true;

    for(size_t step = 0; step < G.size() - 2; ++step) {
        int u = nextTask(G, t, visited);
        visited[u] = true;
        s[u] = completeTask(G, u, c);
        c[u] = s[u] + t[u];
    }
    return c;
}

The algorithm saves a vector c which denotes the completion times of a task, i.e. their earliest possible starting time + task time, and their earliest possible starting times s. Furthermore I am keeping track whether or not I have visited a task. I am then iterating |V|-1 times over all nodes and in each step I am trying to find a task which to work with next. Such a task is one that has all it's dependencies done and has the highest task time so far. The starting time of a task s[u] is the maximum of all completion times of predecessors.

int completeTask(vector<vector<bool>> G, int v, vector<int> c) {
    int max = 0;
    for(size_t i = 0; i < G.size(); ++i)
        if(G[i][v] && c[i] > max)
            max = c[i];
    return max;
}

int nextTask(vector<vector<bool>> G, vector<int> t, vector<bool> visited) {
    int max = 0, u = G.size() - 1;
    for(size_t v = 0; v < G.size(); ++v)
        if(!visited[v] && !isTaskLocked(G, v, visited) && t[v] > max)
            max = t[v], u = v;
    return u;
}

bool isTaskLocked(vector<vector<bool>> G, int v, vector<bool> visited) {
    for(size_t i = 1; i < G.size() - 1; ++i)
        if(G[i][v] && !visited[i])
            return true;
    return false;
}

Obviously I am iterating way too many times over all nodes which results in the timeout for larger test cases. I would like to know whether it is possible to simplify the code in a reasonable way or whether I have to find a completely different approach. If so, I was already thinking about an idea (there are definitely no cycles which was mentioned by our supervisors):

For each task create two nodes u and v with an arc (u,v) with the task time as its weight.
If two tasks (u,v) and (w,x) are dependent, then connect the nodes v and w with an arc of weight 0.
Make all weights negative.
Apply Dijkstra.

What is the sample output for the test cases given? Also can you explain what the test cases mean? Are you just trying to output the longest or shortest time it will take to finish all tasks? — smac89
– smac89, Commented May 7, 2015 at 4:36

Edward · Accepted Answer · 2015-06-09 19:15:45Z

There are a few things that could be improved here.

Use `const` references where practical

All of the functions in your code are declared something like this:

int completeTask(vector<vector<bool>> G, int v, vector<int> c) {

However, none of them actually alter the passed vectors. I was able to get a 6x speed improvement simply by changing all four functions to use const references like this:

int completeTask(const vector<vector<bool>> &G, int v, const vector<int> &c) {

The test I used was simply to clone your two test cases so that I had 200 test cases. I then ran valgrind on both versions. Your original code:

total heap usage: 42,000 allocs, 42,000 frees, 1,581,600 bytes allocated

The modified code:

total heap usage: 3,400 allocs, 3,400 frees, 104,800 bytes allocated

Eliminate "expensive" data structures where practical

The JobShop routine does not need the vector s. Instead, the last two lines of the loop can simply be replaced with this one line:

c[u] = completeTask(G, u, c) + t[u];

Similarly, it doesn't seem that the vector c needs to be returned, since only a single value from it is used by main.

Put each statement on a single line

It detrimental to the readability of your code to abuse the comma operator:

max = t[v], u = v;

And to jam multiple statements on a line:

int testCases; wcin >> testCases;

Separating each statement on its own line makes the code easier to read and maintain.

Use more descriptive variable names

All of the single-letter variable names make it difficult to decipher the algorithm. More meaningful variable names make the code easier to read and maintain.

Omit `return 0`

If your program completes successfully, the return 0 at the end of main() will be generated automatically, so it's not needed in C++ programs.

Stack Exchange Network

Longest (simple) path problem implementation for contest

1 Answer 1

Use `const` references where practical

Eliminate "expensive" data structures where practical

Put each statement on a single line

Use more descriptive variable names

Omit `return 0`

You must log in to answer this question.

Hot Network Questions

Longest (simple) path problem implementation for contest

1 Answer 1

Use const references where practical

Eliminate "expensive" data structures where practical

Put each statement on a single line

Use more descriptive variable names

Omit return 0

You must log in to answer this question.

Related

Hot Network Questions

Use `const` references where practical

Omit `return 0`