I have seen many attempts to bring the power of OpenCL to jME3. All of them required direct interaction with the underlying renderer implementation. Therefore, I created a wrapper around the OpenCL API to decouple it from the renderer.
The following diagram outlines the structure of the API:
The central object is the Context. The creation of all other objects, such as kernels, buffers and images, is controlled by the context. The context instance is obtained via JmeContext.getOpenCLContext().
From there on, all OpenCL calls are encapsulated in a typesafe class structure.
All classes are placed in the package com.jme3.opencl.
    Context context = yourJmeContext.getOpenCLContext(); // acquire the context
    CommandQueue queue = context.createQueue(); // create a command queue
    Program program = context.createProgramFromSourceFiles(assetManager, "OpenCLTest.cl"); // load a program from sources
    program.build(); // build the program
    Kernel kernel = program.createKernel("TestKernel"); // create the kernel
    Buffer buffer = context.createBuffer(1024); // create a buffer with 1024 bytes
    kernel.Run1(queue, new Kernel.WorkSize(1024), buffer, 512, 0.25f); // call a kernel with three arguments: a buffer, an int and a float
As you can see from the example, calling kernels is especially easy due to the use of var-arg methods.
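To illustrate why var-arg methods keep kernel calls short, here is a minimal, self-contained sketch of how a launch method could bind each argument by its runtime type. It is not the actual code in com.jme3.opencl: SketchBuffer, SketchKernel, setArg and run1 are hypothetical names, and the native call is replaced by recording a string.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an OpenCL buffer; only its size matters here. */
final class SketchBuffer {
    final long sizeInBytes;
    SketchBuffer(long sizeInBytes) { this.sizeInBytes = sizeInBytes; }
}

/** Hypothetical kernel that records the arguments it would hand to OpenCL. */
final class SketchKernel {
    final List<String> boundArgs = new ArrayList<>();

    // One typed setter per supported argument kind.
    void setArg(int index, SketchBuffer buffer) {
        boundArgs.add(index + ":buffer(" + buffer.sizeInBytes + " bytes)");
    }
    void setArg(int index, int value)   { boundArgs.add(index + ":int=" + value); }
    void setArg(int index, float value) { boundArgs.add(index + ":float=" + value); }

    /** Var-arg launch: binds every argument by its runtime type. */
    void run1(long globalWorkSize, Object... args) {
        boundArgs.clear();
        for (int i = 0; i < args.length; i++) {
            Object arg = args[i];
            if (arg instanceof SketchBuffer) {
                setArg(i, (SketchBuffer) arg);
            } else if (arg instanceof Integer) {
                setArg(i, ((Integer) arg).intValue());
            } else if (arg instanceof Float) {
                setArg(i, ((Float) arg).floatValue());
            } else {
                throw new IllegalArgumentException(
                        "Unsupported kernel argument at index " + i + ": " + arg);
            }
        }
        // A real implementation would enqueue the kernel with globalWorkSize here.
    }
}
```

With this shape, a call like run1(1024, new SketchBuffer(1024), 512, 0.25f) needs no explicit per-argument setup in user code, and an unsupported argument type fails early with a clear message.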
This API would not be of much use if it did not integrate into the existing jME system.
Therefore, an integral part is the interoperability between OpenCL and jME:
    Buffer clBuf = context.bindVertexBuffer(vertexBuffer, MemoryAccess.READ_WRITE); // use a vertex buffer as an OpenCL buffer
    Image clImg1 = context.bindImage(jmeTexture, MemoryAccess.READ_WRITE); // use a texture as an OpenCL image
    // ... and more methods
This makes it possible, for example, to:
- modify meshes: particle systems, morphing, animation, mesh deformation
- modify textures: dynamic textures, light maps
- access the renderbuffer: post-process effects, compute the overall luminance for tone mapping
- … whatever you like
I created two test classes in jme3test.opencl showing the interoperation.
A note on the design decisions:
Unlike the OpenGL renderer, I did not funnel all OpenCL calls through a single CL wrapper class with the logic implemented directly in the API classes. Instead, the API classes are all interfaces or abstract classes, and the actual implementation is provided by the renderer implementation (currently only lwjgl). There are several reasons for that. The classes are now very lightweight: each holds only one pointer to its OpenCL object, and no CL wrapper instance has to be passed around. Furthermore, the underlying native bindings are very different: lwjgl has special classes for every OpenCL object, while lwjgl3 only passes long values around. lwjgl also requires a special PointerBuffer for size parameters, and the handling of error codes and callback functions is not uniform across the bindings. This would make it very painful to introduce a single CL wrapper class. I found it simpler to implement the logic in subclasses that can adapt to the quirks of the different bindings.
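The design described above can be sketched roughly as follows. This is only an illustration under stated assumptions: AbstractClBuffer, ObjectStyleBuffer, HandleStyleBuffer, FakeMemObject and FakeHandleApi are hypothetical names, and the two "bindings" are in-memory fakes that merely mimic the difference between a binding with one class per OpenCL object and a binding that passes raw long handles.

```java
import java.util.HashMap;
import java.util.Map;

/** Engine-facing type: user code only ever sees this abstract class. */
abstract class AbstractClBuffer {
    abstract long sizeInBytes();
    abstract void release();
    abstract boolean isReleased();
}

/** Fake binding A: wraps every OpenCL object in its own class. */
final class FakeMemObject {
    final long size;
    boolean valid = true;
    FakeMemObject(long size) { this.size = size; }
}

/** Implementation for binding A: holds exactly one reference. */
final class ObjectStyleBuffer extends AbstractClBuffer {
    private final FakeMemObject mem;
    ObjectStyleBuffer(long size) { this.mem = new FakeMemObject(size); }
    @Override long sizeInBytes() { return mem.size; }
    @Override void release() { mem.valid = false; }
    @Override boolean isReleased() { return !mem.valid; }
}

/** Fake binding B: only hands out raw long handles via static functions. */
final class FakeHandleApi {
    private static final Map<Long, Long> SIZES = new HashMap<>();
    private static long nextHandle = 1;

    static long create(long size) {
        long handle = nextHandle++;
        SIZES.put(handle, size);
        return handle;
    }
    static long querySize(long handle) {
        Long size = SIZES.get(handle);
        if (size == null) throw new IllegalStateException("released handle " + handle);
        return size;
    }
    static void release(long handle) { SIZES.remove(handle); }
    static boolean isValid(long handle) { return SIZES.containsKey(handle); }
}

/** Implementation for binding B: holds exactly one long handle. */
final class HandleStyleBuffer extends AbstractClBuffer {
    private final long handle;
    HandleStyleBuffer(long size) { this.handle = FakeHandleApi.create(size); }
    @Override long sizeInBytes() { return FakeHandleApi.querySize(handle); }
    @Override void release() { FakeHandleApi.release(handle); }
    @Override boolean isReleased() { return !FakeHandleApi.isValid(handle); }
}

/** Binding-independent user code: works with either implementation. */
final class BufferUser {
    static long totalSize(AbstractClBuffer... buffers) {
        long total = 0;
        for (AbstractClBuffer buffer : buffers) total += buffer.sizeInBytes();
        return total;
    }
}
```

Each subclass deals with the quirks of its own binding, while BufferUser never needs to know which one is in use and no shared wrapper instance is passed around.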
At the moment, the following questions are still open:
- Only OpenCL 1.2 is supported; the additional functions and types introduced in OpenCL 2.0 / 2.1 are not included yet
- Memory handling: I implemented a system similar to the NativeObjectManager to release unused CL objects. However, Event objects in particular are very small and are collected by the GC very late. Therefore, I have to call System.gc() periodically to release these objects so that I do not run out of native memory. This, however, leads to a huge performance penalty that has to be fixed.
  The following ideas might fix this:
  - Extreme way: no automatic releasing, the user has to release every object manually
  - Only event objects are created so frequently, and most of them are not used at all
    → Provide alternative versions of kernel launches, resource requests, memory copies, etc. that do not return an event object but release it immediately
- Provide the implementation for lwjgl3 and jogl
- Cache system for programs, similar to the cache system of PyOpenCL
- Automatic detection and resolving of #include statements in kernel source code
- Library of often used functions (I already have them, I just need to port them from C++ to this API):
  - 4x4 matrix + quaternion math
  - simple random numbers
  - sorting (radix sort + bitonic sort)
- Real-world examples:
  - particle systems
  - grid-based fluids for smoke, clouds, wind blowing around houses
  - particle-based fluids (SPH) for water
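The event idea from the list above (launch variants that do not return an event but release it immediately) could look roughly like this. It is a minimal sketch, not the implementation: NativeEventCounter, SketchEvent, SketchLauncher, run and runNoEvent are hypothetical names, and native memory is simulated by a simple counter of live events.

```java
/** Simulates native memory: counts event objects that have not been released yet. */
final class NativeEventCounter {
    private static int live = 0;
    static void allocated() { live++; }
    static void released()  { live--; }
    static int liveEvents() { return live; }
}

/** Hypothetical event wrapper with explicit, idempotent release. */
final class SketchEvent {
    private boolean released = false;

    SketchEvent() { NativeEventCounter.allocated(); }

    void release() {
        if (!released) {
            released = true;
            NativeEventCounter.released();
        }
    }
}

/** Hypothetical launcher offering both variants of a kernel launch. */
final class SketchLauncher {
    /** Regular variant: the caller receives the event and is responsible for it. */
    SketchEvent run(int globalWorkSize) {
        // A real implementation would enqueue the kernel here.
        return new SketchEvent();
    }

    /** NoEvent variant: the event is released right away, so nothing waits for the GC. */
    void runNoEvent(int globalWorkSize) {
        run(globalWorkSize).release();
    }
}
```

With the NoEvent variant, a loop of many launches leaves no live events behind, so periodic System.gc() calls would not be needed for this case. Callers that actually want to wait on an event can still use the regular variant and release it themselves.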
Any suggestions or ideas?
Then that’s it for now.