5,782 questions
-3
votes
0
answers
64
views
Intel ARC GPU hangs when performing an untyped surface read [closed]
I am currently writing a driver for the Intel ARC GPU series (specifically I use the A750 for testing purposes) for my own operating system.
I am already able to execute compute kernels that use ...
1
vote
0
answers
52
views
OnnxRuntime with ACL Execution Provider on RK3588 (Mali-G610): Nodes assigned to ACL but GPU load remains 0%
[Goal & Problem]
I am trying to accelerate ONNX model inference on an RK3588 (Orange Pi 5) board using the Mali-G610 GPU. I have built OnnxRuntime (ORT) with the ACL (Arm Compute Library) Execution ...
3
votes
2
answers
100
views
OpenCL Kernel slow and doesn't utilise CPU fully
I tried to solve an old Advent of Code problem in OpenCL, but it's very slow.
const char *KernelSource_part_b = "\n" \
"typedef unsigned long uint64_t; ...
0
votes
0
answers
63
views
How to use OpenCL on Exynos 2400 in Termux?
I want to compile and run OpenCL programs to do some parallel computing on my mobile device, an S24 FE (Exynos 2400e). I tried to compile clinfo, but it always reports 0 as the number of devices.
I tried various ...
0
votes
1
answer
58
views
What's the OpenCL idiom for elementwise array-lookup / gather operation with vectorized types?
Consider the following OpenCL code in which each element in a vector-type variable gets its value via array lookup:
float* tbl = get_data();
int4 offsets = get_offsets();
float4 my_elements = {
...
1
vote
1
answer
104
views
Local atomics cause GPU to crash
I am writing an OpenCL kernel that uses atomics. As I only need to synchronize groups of 192 threads, I figured using local atomics would be ideal. However, the change from global to local atomics ...
1
vote
0
answers
46
views
Unresolved extern function '__write_pipe_2' when building an OpenCL program
I'm using the OpenCL clBuildProgram() API function on a program created from a source string. The source is:
kernel void foo(int val, write_only pipe int outPipe)
{
write_pipe(outPipe, &val);
}...
0
votes
0
answers
17
views
Can clEnqueueSVMMap be used with a sub-region of an SVM memory region?
Suppose I've allocated a region of memory with clSVMAlloc(). Looking at the clEnqueueSVMMap() function, we are told that it will "allow the host to update a region of a SVM buffer".
Does ...
0
votes
0
answers
27
views
When should I use clEnqueueSVMMemcpy?
OpenCL has the mechanism of "shared virtual memory" (SVM), where the same memory region is available both in OpenCL kernel code and in host-side code - and updates on one side affect the ...
0
votes
0
answers
16
views
How can I determine why clSVMAlloc failed?
Most OpenCL API calls return a status/error value, either directly or via an out-parameter (example: clCreateBuffer()). While that is not as informative as a long-form string description, it can tell ...
0
votes
1
answer
33
views
How should I perform an elementwise cast of an OpenCL C vector value?
OpenCL C supports "vector data types" - a fixed number of scalar types which may be operated on together, as though they were a single scalar, mostly: we can apply arithmetic and logic ...
0
votes
1
answer
37
views
Why is clEnqueueWaitForEvents deprecated? It seems indispensable
I'm looking at the clEnqueueWaitForEvents() OpenCL API function.
As I see it, this is a real boon. You see, almost all clEnqueueXXX functions take an array-of-events, and the size of that array, to ...
0
votes
1
answer
46
views
What's the right way to determine which kind of cl_program I have?
The OpenCL API has one object which is sort of a "kitchen sink" for a lot of stuff: The program (with handle type cl_program). It can hold:
A textual program source ( ...
1
vote
1
answer
46
views
Why can't I create a kernel (CL_INVALID_PROGRAM_EXECUTABLE) after successfully compiling an OpenCL program?
In the following program, I compile a kernel for the first device on the first platform:
const char* kernel_source_code = R"(
__kernel void vectorAdd(
__global float * __restrict C,
...
0
votes
1
answer
99
views
OpenCL createProgramWithSource doesn't work with a c-string declared in either global or function scope
I'm trying to run a basic kernel in OpenCL. See the snippet attached:
const char kernel_source[] = "__kernel void matmul(__global float* A, __global float* B, __global float* C) { int row = ...