
Here are some OpenCL test results on Intel HD Graphics 400 (12 compute units) with single-channel DDR3-1600 RAM:

  • 1024 x 1024 image, 1 byte per pixel: 4.5 ms
  • 2048 x 2048 image, 1 byte per pixel: 9.7 ms
  • 4096 x 4096 image, 1 byte per pixel: 21 ms
  • 8192 x 8192 image, 1 byte per pixel: 65 ms
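To put these timings in perspective, they can be converted into an effective bandwidth. The C sketch below assumes that every pixel is read once and written once (2 bytes of traffic per 1-byte pixel), that GB means 10^9 bytes, and that the theoretical peak of one 64-bit DDR3-1600 channel is 1600e6 transfers/s x 8 bytes = 12.8 GB/s. The function names are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Effective bytes per second, assuming each 1-byte pixel is read once and
   written once, as an in-place flip does. */
static double effective_bandwidth(long width, long height, double time_ms)
{
    assert(time_ms > 0.0);
    double bytes = 2.0 * (double)width * (double)height;
    return bytes / (time_ms / 1000.0);
}

/* Theoretical peak of one DDR3-1600 channel: 1600e6 transfers/s x 8 bytes. */
static double ddr3_1600_single_channel_peak(void)
{
    return 1600e6 * 8.0;
}

/* Prints effective bandwidth for the four measurements listed above. */
void print_report(void)
{
    const long sizes[] = {1024, 2048, 4096, 8192};
    const double times_ms[] = {4.5, 9.7, 21.0, 65.0};
    for (int k = 0; k < 4; k++) {
        double bw = effective_bandwidth(sizes[k], sizes[k], times_ms[k]);
        printf("%5ld x %-5ld: %.2f GB/s (%.1f%% of peak)\n", sizes[k], sizes[k],
               bw / 1e9, 100.0 * bw / ddr3_1600_single_channel_peak());
    }
}
```

Calling print_report() from a main function gives roughly 0.47, 0.86, 1.60 and 2.06 GB/s, i.e. about 4% to 16% of the single-channel peak, which matches the observation that throughput grows with image size.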

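Kernel code (the number of work-items is half the total number of pixels; each work-item swaps a pixel in row y with the pixel in the same column of row height-1-y and inverts both, so the image is flipped vertically and negated):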

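__kernel void test0(__global uchar *imagebuf)
{
        /* one work-item per pixel in the top half: global size = width*height/2 */
        int i = get_global_id(0);
        const int height = 8192;   /* must match the image being processed */
        const int width = 8192;
        int y = i / width;         /* row in the top half */
        int x = i % width;         /* column */
        int top = x + y * width;                     /* pixel in row y */
        int bottom = x + (height - 1 - y) * width;   /* mirrored pixel in row height-1-y */
        /* read and invert both pixels, then write them back swapped */
        uchar tmp = 255 - imagebuf[bottom];
        uchar tmp2 = 255 - imagebuf[top];
        imagebuf[top] = tmp;
        imagebuf[bottom] = tmp2;
}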
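When validating the GPU output, a CPU reference of the same operation can be useful. The sketch below is written in C (the name flip_invert_reference is hypothetical) and emulates the width*height/2 work-items with a loop: "work-item" i swaps pixel (x, y) in the top half with (x, height-1-y) and inverts both with 255 - value. It assumes an even height and 1 byte per pixel.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference: vertical flip plus inversion, one loop iteration per
   emulated work-item. Assumes an even height. */
static void flip_invert_reference(unsigned char *img, int width, int height)
{
    assert(width > 0 && height > 0 && height % 2 == 0);
    const size_t items = (size_t)width * (size_t)height / 2;
    for (size_t i = 0; i < items; i++) {
        int y = (int)(i / (size_t)width);   /* row in the top half */
        int x = (int)(i % (size_t)width);   /* column */
        size_t top = (size_t)y * (size_t)width + (size_t)x;
        size_t bottom = (size_t)(height - 1 - y) * (size_t)width + (size_t)x;
        /* read both before writing, exactly like the per-work-item swap */
        unsigned char a = (unsigned char)(255 - img[bottom]);
        unsigned char b = (unsigned char)(255 - img[top]);
        img[top] = a;
        img[bottom] = b;
    }
}
```

Because the operation is its own inverse, applying it twice should restore the original buffer, which makes a cheap self-check; comparing a GPU-processed buffer against this function byte by byte is another.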

Throughput increases for larger images, while the minimum latency depends on the hardware and on how thin the OpenCL wrapper is. These timings were measured through a fairly thick (not thin) wrapper.
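One rough way to separate a fixed per-launch cost from the size-dependent part is to fit t = t0 + bytes / B by least squares. This linear model is an assumption for illustration, not something measured separately, and fit_overhead_model is a hypothetical helper; bytes again assumes one read and one write per pixel.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary least-squares fit of t_ms = overhead_ms + bytes / bandwidth.
   Outputs the intercept in ms and the bandwidth in bytes per second. */
static void fit_overhead_model(const double *bytes, const double *t_ms, size_t n,
                               double *overhead_ms, double *bandwidth)
{
    assert(n >= 2);
    double mx = 0.0, my = 0.0;
    for (size_t k = 0; k < n; k++) {
        mx += bytes[k];
        my += t_ms[k];
    }
    mx /= (double)n;
    my /= (double)n;
    double sxy = 0.0, sxx = 0.0;
    for (size_t k = 0; k < n; k++) {
        sxy += (bytes[k] - mx) * (t_ms[k] - my);
        sxx += (bytes[k] - mx) * (bytes[k] - mx);
    }
    assert(sxx > 0.0);
    double slope_ms_per_byte = sxy / sxx;
    *overhead_ms = my - slope_ms_per_byte * mx;
    *bandwidth = 1000.0 / slope_ms_per_byte;   /* bytes per ms -> bytes per s */
}
```

On the four measurements (2 x 1024^2 up to 2 x 8192^2 bytes) this gives roughly 5 ms of fixed cost and about 2.2 GB/s. The 1024 x 1024 run (4.5 ms) already falls below that intercept, so the model is only a coarse guide; timing the kernel with OpenCL event profiling would isolate wrapper overhead more reliably.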