I wrote some code for image convolution.
The code is in my Image Convolution GitHub repository.
The code is a straightforward implementation using SSE intrinsics for vectorization and OpenMP for multithreading. It is also portable (compiles with both GCC and MSVC) and written in pure C.
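For readers who want a picture of the general approach, here is a minimal sketch of how SSE intrinsics and OpenMP can be combined for one horizontal convolution pass. It is not the code from the repository: the function name `convolve_rows_sse`, its parameters, the single-channel `float` layout and the "valid" (no border padding) output are all assumptions made for illustration.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics, available with both GCC and MSVC */

/* Hypothetical helper (illustration only, not the repository's API).
 * Convolves every row of a single-channel float image with a 1D kernel.
 * Only the "valid" region is computed, so each output row holds
 * width - klen + 1 samples and dst must have room for
 * height * (width - klen + 1) floats. */
void convolve_rows_sse(const float *src, float *dst, int width, int height,
                       const float *kernel, int klen)
{
    int out_w = width - klen + 1;
    int y; /* signed loop index declared up front, as older OpenMP/C compilers expect */

    if (out_w <= 0)
        return;

    /* Rows are independent, so each thread processes its own set of rows.
     * Without OpenMP enabled the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (y = 0; y < height; ++y) {
        const float *in = src + (size_t)y * width;
        float *out = dst + (size_t)y * out_w;
        int x = 0;
        int k;

        /* Vector part: four output pixels per iteration. */
        for (; x + 4 <= out_w; x += 4) {
            __m128 acc = _mm_setzero_ps();
            for (k = 0; k < klen; ++k) {
                /* Unaligned load, because in + x + k is rarely 16-byte aligned. */
                __m128 pix = _mm_loadu_ps(in + x + k);
                /* Kernel is read back to front: convolution, not correlation. */
                __m128 w = _mm_set1_ps(kernel[klen - 1 - k]);
                acc = _mm_add_ps(acc, _mm_mul_ps(pix, w));
            }
            _mm_storeu_ps(out + x, acc);
        }

        /* Scalar tail for the remaining (out_w % 4) pixels. */
        for (; x < out_w; ++x) {
            float acc = 0.0f;
            for (k = 0; k < klen; ++k)
                acc += in[x + k] * kernel[klen - 1 - k];
            out[x] = acc;
        }
    }
}
```

With GCC this would typically be built with `-msse -fopenmp`, and with MSVC with `/openmp`. A separable 2D filter could apply a pass like this along rows and a similar pass along columns, though whether that matches the repository's structure is not something this sketch claims.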
Any assistance will be appreciated.