Let's talk about the API. I've worked with multiple imaging libraries over the past ~25 years, and have significant experience designing them too.
API:
Let's start with the image container, the core of any imaging library.
```c
typedef struct {
    int height;
    int width;
    Pixel *pixels;
} Image;

...() {
    Image *ema = load_image(argv[1]);
    Image *sob = sobel(*ema);
    free_image(ema); /* missing from the code, but should be there */
}
```
Most of the time when you refer to an image you use *name, except in a few places where it's just name. This is confusing and unnecessarily complicated. First, passing a struct by value copies all of its fields, which is more expensive than passing a pointer to it (even if the struct is small now). Second, you malloc the image object when you create it, so it's awkward to then copy its contents onto the stack to call a function. Third, as the library grows, so will this struct: a flag for RGB vs grayscale, or an int for the number of channels, a flag for the color space, a flag for the data type, a flag for whether the struct owns the pixel data, and so on. At some point copying the struct will become prohibitive.
So, let's enforce passing the struct by pointer, and let's make it easy to do so!
```c
typedef struct {
    int height;
    int width;
    Pixel *pixels;
} *Image;

...() {
    Image ema = load_image(argv[1]);
    Image sob = sobel(ema);
    free_image(ema);
}
```
Now Image is always a pointer to the struct. There's no type you can use to refer to the struct itself; you can only refer to the pointer. But it looks nice in the code: the user doesn't even need to know it's a pointer!
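A minimal sketch of the allocation side of such a pointer-typedef Image, assuming an 8-bit RGB `Pixel` layout. `new_image` is a hypothetical constructor name; `free_image` matches the name used above, but this body is only an illustration. Because the header and the pixel buffer are allocated together, the user only ever holds and passes the handle.

```c
#include <stdlib.h>

/* Assumed pixel layout: 8-bit RGB. */
typedef struct {
    unsigned char r, g, b;
} Pixel;

/* Image is a pointer type: users never see or copy the struct itself. */
typedef struct {
    int height;
    int width;
    Pixel *pixels;
} *Image;

/* Hypothetical constructor: allocates the header and a zeroed pixel buffer.
   Returns NULL on invalid size or allocation failure. */
Image new_image(int width, int height)
{
    if (width <= 0 || height <= 0) return NULL;
    Image img = malloc(sizeof *img);
    if (img == NULL) return NULL;
    img->pixels = calloc((size_t)width * (size_t)height, sizeof *img->pixels);
    if (img->pixels == NULL) {
        free(img);
        return NULL;
    }
    img->width = width;
    img->height = height;
    return img;
}

/* Releases the pixel buffer and the header; accepts NULL, like free(). */
void free_image(Image img)
{
    if (img == NULL) return;
    free(img->pixels);
    free(img);
}
```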
For Kernel you're doing something totally different: when you create a kernel (e.g. sobel_x) you create the struct on the stack and return it by value. Why the distinction? A kernel is, after all, just an image with a different type for the pixels.
Why is the kernel always square? This is an important limitation. At some point you'll be looking to use a 15x1 kernel, and you'll have to create a 15x15 kernel with lots of zeros, which will be 15 times as expensive to use.
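One way to address both points is to make Kernel look like Image: a pointer typedef with independent width and height. This is a sketch under assumptions: the float weight type and the names `new_kernel`, `free_kernel` and `box_kernel_x` are hypothetical. The 15x1 case then costs 15 multiplications per pixel instead of 225.

```c
#include <stdlib.h>

/* Hypothetical kernel type mirroring Image: width and height are independent,
   so 15x1 or 1x15 kernels need no zero padding. */
typedef struct {
    int height;
    int width;
    float *weights; /* row-major, height * width entries (assumed float) */
} *Kernel;

Kernel new_kernel(int width, int height)
{
    if (width <= 0 || height <= 0) return NULL;
    Kernel k = malloc(sizeof *k);
    if (k == NULL) return NULL;
    k->weights = calloc((size_t)width * (size_t)height, sizeof *k->weights);
    if (k->weights == NULL) {
        free(k);
        return NULL;
    }
    k->width = width;
    k->height = height;
    return k;
}

void free_kernel(Kernel k)
{
    if (k == NULL) return;
    free(k->weights);
    free(k);
}

/* Example: a horizontal box (averaging) filter of the given length, height 1. */
Kernel box_kernel_x(int length)
{
    Kernel k = new_kernel(length, 1);
    if (k == NULL) return NULL;
    for (int i = 0; i < length; i++)
        k->weights[i] = 1.0f / (float)length;
    return k;
}
```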
The function convolve doesn't convolve two images; it only computes the result of the convolution at one pixel. This is surprising. Your function apply_kernel should be called convolve (or maybe convolution). The single-pixel sub-function should, IMO, be private.
Likewise, Accumulator could be private until you have a reason to make it public. The fewer things you make public initially, the easier it will be to improve the API. Making something public (i.e. putting it into improc.h) more or less fixes it in perpetuity: as soon as people start using that API you can't change it anymore, though you can always add to it.
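A sketch of that split, shown in one block with comments marking what would go in the header and what in the source file. It simplifies to a hypothetical single-channel `GrayImage` type and assumes clamp-to-edge boundaries; the point is only that the whole-image `convolve` is exported while the per-pixel helper is `static`, so its signature can change without breaking users. Like a typical implementation, it does not flip the kernel (strictly a correlation, identical for symmetric kernels).

```c
#include <stdlib.h>

/* ---- improc.h (public) ---- */
typedef struct { int height, width; float *data; } *GrayImage;  /* hypothetical */
typedef struct { int height, width; float *weights; } *Kernel;

/* Convolves `in` with `kernel`, writing into `out` (same size as `in`). */
void convolve(GrayImage in, Kernel kernel, GrayImage out);

/* ---- improc.c (private): static helpers are invisible to users ---- */

/* Result at a single pixel. Neighbors outside the image are clamped to the
   nearest edge pixel (an assumed boundary rule). */
static float convolve_at(GrayImage in, Kernel kernel, int row, int col)
{
    float acc = 0.0f;
    int r0 = row - kernel->height / 2;
    int c0 = col - kernel->width / 2;
    for (int kr = 0; kr < kernel->height; kr++) {
        int ir = r0 + kr;
        if (ir < 0) ir = 0;
        if (ir >= in->height) ir = in->height - 1;
        for (int kc = 0; kc < kernel->width; kc++) {
            int ic = c0 + kc;
            if (ic < 0) ic = 0;
            if (ic >= in->width) ic = in->width - 1;
            acc += in->data[ir * in->width + ic] * kernel->weights[kr * kernel->width + kc];
        }
    }
    return acc;
}

void convolve(GrayImage in, Kernel kernel, GrayImage out)
{
    for (int r = 0; r < in->height; r++)
        for (int c = 0; c < in->width; c++)
            out->data[r * in->width + c] = convolve_at(in, kernel, r, c);
}
```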
kernel_min and kernel_max are misnamed. Reading the code, I wondered why you were using addition and not max(). Only later did I realize that you use these functions to determine the minimum and maximum possible values of the output image when you compute the convolution with that kernel.
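A more descriptive name makes the intent obvious. The sketch below uses the hypothetical name `kernel_output_range` and assumes input pixels lie in [0, max_input]: negative weights can only pull the result down, positive weights can only push it up, which is why the computation is a sum rather than a max().

```c
/* Hypothetical replacement for kernel_min/kernel_max: the range of values a
   convolution with `weights` can produce when every input lies in
   [0, max_input]. */
void kernel_output_range(const float *weights, int count, float max_input,
                         float *min_out, float *max_out)
{
    float lo = 0.0f, hi = 0.0f;
    for (int i = 0; i < count; i++) {
        if (weights[i] < 0.0f)
            lo += weights[i] * max_input;  /* worst case: max input under a negative weight */
        else
            hi += weights[i] * max_input;  /* worst case: max input under a positive weight */
    }
    *min_out = lo;
    *max_out = hi;
}
```

For a 3x3 Sobel kernel on 8-bit input this gives [-1020, 1020].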
You could instead consider adding offset and scale arguments to your convolution function, and clip the result of the convolution before writing it to the uint8 output. This makes the function more flexible. The maximum and minimum possible values are rarely reached in practice, so your scaling is too drastic: the result is a very dim image with a strongly quantized derivative. A user might want to choose a less aggressive scaling.
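A minimal sketch of the final scale, offset and clip step, assuming the convolution produces float results. `to_uint8` and `store_scaled` are hypothetical names. For a derivative such as Sobel, an offset of 128 maps zero to mid-gray, and the user picks the scale.

```c
#include <stddef.h>
#include <stdint.h>

/* Maps one convolution result to 8 bits:
   clamp(round(value * scale + offset), 0, 255). */
uint8_t to_uint8(float value, float scale, float offset)
{
    float v = value * scale + offset;
    if (!(v > 0.0f)) return 0;   /* also catches NaN */
    if (v >= 255.0f) return 255;
    return (uint8_t)(v + 0.5f);  /* round to nearest; v is positive here */
}

/* Converts a buffer of raw convolution results to 8-bit output. */
void store_scaled(const float *results, uint8_t *out, size_t n,
                  float scale, float offset)
{
    for (size_t i = 0; i < n; i++)
        out[i] = to_uint8(results[i], scale, offset);
}
```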
improc.h, which defines your API, should contain documentation for the functions and types it makes public. You can document them in a separate file, but it's easier to put the documentation directly in the header: your users will easily find it in their IDE, and many IDEs even show it in a tooltip when you hover over a function call. I suggest the Doxygen style for documentation. Doxygen is a nice tool for generating HTML from the documentation in the header files, though it has some downsides as well (many people, including me, have written alternatives, but most of them use the same style for the documentation source).
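As an illustration of the Doxygen style in a header, here is a small documented excerpt. The RGB `Pixel` layout and the `get_pixel` accessor are hypothetical; the `/** ... */` blocks, `@brief`, `@param`, `@return` and trailing `/**< ... */` member comments are standard Doxygen syntax.

```c
#include <stdlib.h>

/** @brief An 8-bit RGB pixel (assumed layout). */
typedef struct {
    unsigned char r; /**< Red channel. */
    unsigned char g; /**< Green channel. */
    unsigned char b; /**< Blue channel. */
} Pixel;

/**
 * @brief Handle to an image; always a pointer, never copied by value.
 *
 * Pixels are stored row-major: the pixel at (row, col) is
 * `pixels[row * width + col]`.
 */
typedef struct {
    int height;    /**< Number of rows. */
    int width;     /**< Number of columns. */
    Pixel *pixels; /**< Pixel buffer with height * width entries. */
} *Image;

/**
 * @brief Returns the pixel at the given coordinates.
 *
 * @param image  Image to read from; must not be NULL.
 * @param row    Row index, 0 <= row < image->height.
 * @param col    Column index, 0 <= col < image->width.
 * @return       A copy of the pixel value.
 */
Pixel get_pixel(Image image, int row, int col)
{
    return image->pixels[row * image->width + col];
}
```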
Efficiency:
The convolution tests, for each pixel, whether a neighbor is inside the image or not (you use modulo for this, a neat solution, but it still has a branch in it). In my experience it is actually faster to copy the image into a larger buffer and fill the values outside the original image using some padding scheme. The convolution then doesn't need any tests at all.
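A sketch of the padded approach on a single-channel float image (a simplification; `pad_replicate` and `convolve_padded` are hypothetical names). It uses replicate padding, which is a different choice from your modulo: periodic padding would reproduce the wrap-around behavior exactly. Once the border exists, the inner loops contain no bounds tests.

```c
#include <stdlib.h>

/* Copies a w x h image into a new (w + 2*pad) x (h + 2*pad) buffer, filling
   the border by replicating the nearest edge pixel. Caller frees the result. */
float *pad_replicate(const float *src, int w, int h, int pad)
{
    int pw = w + 2 * pad, ph = h + 2 * pad;
    float *dst = malloc((size_t)pw * (size_t)ph * sizeof *dst);
    if (dst == NULL) return NULL;
    for (int r = 0; r < ph; r++) {
        int sr = r - pad;
        if (sr < 0) sr = 0;
        if (sr >= h) sr = h - 1;
        for (int c = 0; c < pw; c++) {
            int sc = c - pad;
            if (sc < 0) sc = 0;
            if (sc >= w) sc = w - 1;
            dst[r * pw + c] = src[sr * w + sc];
        }
    }
    return dst;
}

/* Convolves a padded buffer with a kw x kh kernel (requires pad >= kw / 2 and
   pad >= kh / 2). Every neighbor lies inside `padded`, so no bounds tests. */
void convolve_padded(const float *padded, int w, int h, int pad,
                     const float *kernel, int kw, int kh, float *out)
{
    int pw = w + 2 * pad;
    for (int r = 0; r < h; r++) {
        for (int c = 0; c < w; c++) {
            /* top-left corner of this pixel's neighborhood in the padded buffer */
            const float *window = padded + (r + pad - kh / 2) * pw + (c + pad - kw / 2);
            float acc = 0.0f;
            for (int kr = 0; kr < kh; kr++)
                for (int kc = 0; kc < kw; kc++)
                    acc += window[kr * pw + kc] * kernel[kr * kw + kc];
            out[r * w + c] = acc;
        }
    }
}
```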
You can also consider reducing the amount of coordinate computation you do within the loop:
```c
Accumulator convolve(Image image, Kernel kernel, int row, int col)
{
    Accumulator accumulator = {0, 0, 0};
    int r_offset = row - kernel.size / 2;
    int c_offset = col - kernel.size / 2;
    int kindex = 0;  /* index into kernel.weights, advanced once per tap */
    for (int kr = 0; kr < kernel.size; kr++) {
        /* the image row is constant for the whole inner loop */
        int ir = modulo(r_offset + kr, image.height);
        int iindex = ir * image.width;
        for (int kc = 0; kc < kernel.size; kc++, kindex++) {
            int ic = modulo(c_offset + kc, image.width);
            Pixel pixel = image.pixels[iindex + ic];
            accumulator.r += pixel.r * kernel.weights[kindex];
            accumulator.g += pixel.g * kernel.weights[kindex];
            accumulator.b += pixel.b * kernel.weights[kindex];
        }
    }
    return accumulator;
}
```
kindex, the index into the kernel, increases by 1 every inner loop iteration, so just increment it; don't compute kr*kernel.size + kc (and certainly don't compute that 3 times, even though your compiler will likely optimize that out). ir, and therefore iindex, doesn't change during the inner loop, so compute it outside that loop. Much of the remaining computation in your version came from looping from -size/2 to size/2 rather than from 0 to size, which forced you to add an offset again to index into the kernel.
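(By the way, your code has a bug if kernel.size is even: a loop from -size/2 to size/2 inclusive then visits size + 1 positions and reads past the end of the kernel weights.)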
Style:
Please use an empty line in between functions. Vertical space is very important for readability.