0

I've been working on porting a library from C to CUDA. Two functions that I need to port make use of recursion, which CUDA doesn't seem to be playing nicely with. I therefore want to convert these functions from using a recursive approach to an iterative one, as even if the recursion did work, I suspect it would be more efficient to have these functions be iterative rather than recursive.

I've been having quite a bit of trouble though, as the way in which the functions make use of recursion is quite different to traditional recursion to iteration problems.

int get_resulting_node(const uint64_t np[6], const BiomeTree *bt, int idx,
    int alt, uint64_t ds, int depth)
{
    if (bt->steps[depth] == 0)
        return idx;
    uint32_t step;
    do
    {
        step = bt->steps[depth];
        depth++;
    }
    while (idx+step >= bt->len);

    uint64_t node = bt->nodes[idx];
    uint16_t inner = node >> 48;

    int leaf = alt;
    uint32_t i, n;

    for (i = 0, n = bt->order; i < n; i++)
    {
        uint64_t ds_inner = get_np_dist(np, bt, inner);
        if (ds_inner < ds)
        {
            int leaf2 = get_resulting_node(np, bt, inner, leaf, ds, depth);
            uint64_t ds_leaf2;
            if (inner == leaf2)
                ds_leaf2 = ds_inner;
            else
                ds_leaf2 = get_np_dist(np, bt, leaf2);
            if (ds_leaf2 < ds)
            {
                ds = ds_leaf2;
                leaf = leaf2;
            }
        }

        inner += step;
        if (inner >= bt->len)
            break;
    }

    return leaf;
}

and

float getSpline(const Spline *sp, const float *vals)
{
    if (!sp || sp->len <= 0 || sp->len >= 12)
    {
        printf("getSpline(): bad parameters\n");
        exit(1);
    }

    if (sp->len == 1)
        return ((FixSpline*)sp)->val;

    float f = vals[sp->typ];
    int i;

    for (i = 0; i < sp->len; i++)
        if (sp->loc[i] >= f)
            break;
    if (i == 0 || i == sp->len)
    {
        if (i) i--;
        float v = getSpline(sp->val[i], vals);
        return v + sp->der[i] * (f - sp->loc[i]);
    }
    const Spline *sp1 = sp->val[i-1];
    const Spline *sp2 = sp->val[i];
    float g = sp->loc[i-1];
    float h = sp->loc[i];
    float k = (f - g) / (h - g);
    float l = sp->der[i-1];
    float m = sp->der[i];
    float n = getSpline(sp1, vals);
    float o = getSpline(sp2, vals);
    float p = l * (h - g) - (o - n);
    float q = -m * (h - g) + (o - n);
    float r = lerp(k, n, o) + k * (1.0F - k) * lerp(k, p, q);
    return r;
}

What's confusing me is that neither are using tail recursion. While I'm sure it's the way to go, I'm really struggling to see how a stack can be applied here.

2
  • That code is quite evil overall. Realistically it would have to be rewritten from scratch. But then you need to dig into all the dirty details of how the algorithm is supposed to work and then take it from there.
    – Lundin
    Commented Dec 12, 2024 at 11:59
  • To have any chance of an answer, please provide the missing definitions (Spline, FixSpline, BiomeTree, ...etc) and a main function that is runnable and which calls the relevant function will an example.
    – trincot
    Commented Dec 13, 2024 at 7:53

1 Answer 1

-2

It can be helpful to think about how the function calling abstraction is implemented in assembly.

When you make a function call, a new stack frame is allocated in memory, and the stack pointer is incremented accordingly. When a function returns, the frame is popped from the stack and the stack pointer is decremented.

You can "simulate" this in C by creating a Stack object (which can just be a statically allocated array if you know the max recursion depth in advance. Or you can implement something similar to a vector and dynamically reallocate the stack array) with StackFrame elements. The latter is a struct which captures all the state required to execute the function up to then next recursive function call or the return statement. It can look something like this

struct StackFrame {
  const uint64_t np[6];
  const BiomeTree *bt; 
  int idx;
  int alt; 
  uint64_t ds;
  int depth;

  // you should use an enum instead of separate bools
  bool function_start;
  bool after_for_loop;
  size_t inner_loop_idx; // which step of the main for loop 
  // anything else to capture the state 
};

You will also need to separately keep track of the latest return value.

Now the main body of the function should look something like this

// Create the first StackFrame
StackFrame first_frame = { .function_start = true; //etc.. };
stack_push(&frames, first_frame);

// Return values from all function calls are saved in this variable
int return_value; 

// "Run the simulation" 
while (!stack_empty(&stack)) {
    StackFrame current_frame = stack_last(&stack);

    // execute the appropriate portion of get_restuling_node based on the state captured in the stack frame. 
    if current_frame.function_start {
      //...
    }
    else if current_frame.after_for_loop {
      //...
      // return_value = ...;
      stack_pop(&stack);
    }
    else {
      // execute current_frame.inner_loop_idx th step of the for loop
      // use return_value to fetch the value of leaf2
      // finish the loop step 
      // current_frame.inner_loop_idx += 1;
      // stack_push(new_frame)
    }
}
return return_value;

EDIT: See this thread for more discussion and examples around how to convert recursive functions into iterative ones. Convert recursion to iteration

2
  • This looks like LLM generated
    – ray
    Commented Dec 15, 2024 at 14:55
  • What makes you think it is LLM generated?
    – Mushegh
    Commented Dec 15, 2024 at 17:40

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.