White screen on context restart

I want you to know that (probably like many others) I’m not able to help much directly in your quest, but I’m watching from the sidelines and cheering you on.

On my to-do list is to make a Lemur settings screen before I release the next Mythruna… and the last time I tried context-restart stuff was probably almost 10 years ago, and it worked fine way back when (further evidence that gamma is involved in lwjgl2).

One of the cool things about open source is that others might find and fix problems before I even have to wrestle with them myself and this is exactly one of those cases.

So, keep going! We’re pulling for you. You are making good progress.

6 Likes

Thanks for the support, all.
Anyway, I think I found the crux of the issue.
When you set gamma correction, the LWJGL context calls the setMainFrameBufferSrgb() and setLinearizeSrgbImages() methods on the renderer class. Now, for the GLRenderer implementation, the latter method simply sets a field in the class; the former, however, sets a flag within the OpenGL instance. The kicker is that when we restart the context, we destroy the OpenGL instance and create it again (provided the pixel format hasn’t changed on jme3-lwjgl; otherwise all of that is skipped and we just resize things). As such, we lose all of our state, including the gamma correction flag.

The way I see it, we can do one of two things.

  1. We can find a way to save the state of the relevant OpenGL flags and put them back after we reset the context. Considering that I’m not very familiar with low-level OpenGL, I have no idea what sort of unwanted side effects this would introduce. However, it may solve not only this, but also other obscure bugs of a similar nature that we may not know about yet.
  2. We could just check whether gamma correction is needed, and then flip the flag back after we restart the context (a sketch follows this list). Considering that we already seem to do this for input on jme3-lwjgl3, it isn’t too earth-shattering. Also, there don’t seem to be too many direct calls to OpenGL going on in initContextFirstTime, so this should wrap up a lot of cases (although I am concerned about ARBDebugOutput.glDebugMessageCallbackARB and GLFW.glfwSetJoystickCallback, so that may be something to look into eventually).
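
To make option 2 concrete, here is a minimal sketch of what re-applying the flags could look like once the new context is up. The helper’s name is hypothetical; the two renderer calls are the ones described above:

    // Hypothetical helper, called after the new GL instance exists:
    // push the gamma settings back into the freshly created context.
    private void restoreGammaState() {
        renderer.setMainFrameBufferSrgb(settings.isGammaCorrection());
        renderer.setLinearizeSrgbImages(settings.isGammaCorrection());
    }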

Still no idea why the screen blanks when calling AppSettings#setRenderer(String), though. I get the feeling that this is technically a different issue.

4 Likes

This pull request fixes the issue with just a couple of additions to the context restart methods.

4 Likes

This is interesting: I was trying to reproduce the original #1445 issue, but, while I could reproduce it on my Linux machine, it wouldn’t show up on Windows 10 with either jme3-lwjgl or jme3-lwjgl3. Both machines were using an NVIDIA Quadro M4000 GPU (I used both the nvidia-driver-450 and 460 proprietary drivers on the Linux end).

You said you were getting the issue on your Windows machine, right @tlf30?

Yes, I am on Windows with an RTX Titan. I can get out some other GPUs to test with if needed. Currently on Driver 466.11. I just ran the test to reproduce just in case. I also tested with opengl2 and saw the expected darkening of the scene.

Here is the debugging output:

The restart of the context takes place at Apr 14, 2021 5:12:15

At this point, I’m a bit concerned that this issue is beyond me. The issue seems to be specific to either LWJGL or the OpenGL API, and highly dependent on the OS. If someone with a little more experience with low-level OpenGL wants to take a look at it, I would recommend starting at calls to LwjglContext’s createContextAttribs() method.
Until then, a workaround is to stick with OpenGL 2, or to ask the user to restart the application whenever display settings change (a sketch of the first option follows).
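
For reference, the OpenGL 2 workaround is a one-line settings change. A minimal sketch, assuming the usual AppSettings constant and an app with setSettings():

    // Workaround sketch: force the OpenGL 2 renderer before starting the app.
    AppSettings settings = new AppSettings(true); // load default settings
    settings.setRenderer(AppSettings.LWJGL_OPENGL2);
    app.setSettings(settings);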

Been doing some more experiments: I tried moving the context restart logic from LwjglDisplay’s runLoop method into the LwjglAbstractDisplay runnable. Essentially, whenever I need to restart the context, I now just restart pretty much all of LWJGL.

public void run() {
    if (listener == null) {
        throw new IllegalStateException("SystemListener is not set on context!"
                                      + " Must set with JmeContext.setSystemListener().");
    }

    loadNatives();
    logger.log(Level.FINE, "Using LWJGL {0}", Sys.getVersion());
    do {
        if (!initInThread()) {
            logger.log(Level.SEVERE, "Display initialization failed. Cannot continue.");
            return;
        }
        while (true) {
            if (renderable.get()) {
                if (Display.isCloseRequested())
                    listener.requestClose(false);

                if (wasActive != Display.isActive()) {
                    if (!wasActive) {
                        listener.gainFocus();
                        timer.reset();
                        wasActive = true;
                    } else {
                        listener.loseFocus();
                        wasActive = false;
                    }
                }
            }

            runLoop();

            // Restart requested: leave the inner loop so the outer do/while
            // tears the display down and initializes it again.
            if (needRestart.get()) {
                needRestart.set(false);
                break;
            }
            if (needClose.get())
                break;
        }
        deinitInThread();
    } while (!needClose.get());
}


As you might notice, the GUI viewport seems to disappear, but at least we are getting something. The only major issue is that, when I start moving the objects, it looks like a new set of objects gets added every time I do a context restart.

Odds are that either my scene-graph objects are being duplicated, or the entire viewport (root node and all) is.

1 Like

Alright, I scrapped what I had above and tried a new approach, and I think I have it:
First, I took the initContextFirstTime() method from LwjglContext and moved its logic into a new method, initContext(boolean). This method does pretty much the same thing when the parameter is true; when it is false, it skips creating the renderer (instead just calling its initialize() method) and skips input initialization.

    private void initContext(boolean first) {
        if (!GLContext.getCapabilities().OpenGL20) {
            throw new RendererException("OpenGL 2.0 or higher is "
                    + "required for jMonkeyEngine");
        }

        int[] vers = getGLVersion(settings.getRenderer());
        if (vers != null) {
            // First-time init: create the GL wrapper objects and the renderer.
            if (first) {
                GL gl = new LwjglGL();
                GLExt glext = new LwjglGLExt();
                GLFbo glfbo;

                if (GLContext.getCapabilities().OpenGL30) {
                    glfbo = new LwjglGLFboGL3();
                } else {
                    glfbo = new LwjglGLFboEXT();
                }

                if (settings.getBoolean("GraphicsDebug")) {
                    gl = (GL) GLDebug.createProxy(gl, gl, GL.class, GL2.class, GL3.class, GL4.class);
                    glext = (GLExt) GLDebug.createProxy(gl, glext, GLExt.class);
                    glfbo = (GLFbo) GLDebug.createProxy(gl, glfbo, GLFbo.class);
                }
                if (settings.getBoolean("GraphicsTiming")) {
                    GLTimingState timingState = new GLTimingState();
                    gl = (GL) GLTiming.createGLTiming(gl, timingState, GL.class, GL2.class, GL3.class, GL4.class);
                    glext = (GLExt) GLTiming.createGLTiming(glext, timingState, GLExt.class);
                    glfbo = (GLFbo) GLTiming.createGLTiming(glfbo, timingState, GLFbo.class);
                }
                if (settings.getBoolean("GraphicsTrace")) {
                    gl = (GL) GLTracer.createDesktopGlTracer(gl, GL.class, GL2.class, GL3.class, GL4.class);
                    glext = (GLExt) GLTracer.createDesktopGlTracer(glext, GLExt.class);
                    glfbo = (GLFbo) GLTracer.createDesktopGlTracer(glfbo, GLFbo.class);
                }
                renderer = new GLRenderer(gl, glext, glfbo);
            }
            // On a restart, the existing renderer is simply re-initialized.
            renderer.initialize();
        } else {
            throw new UnsupportedOperationException("Unsupported renderer: " + settings.getRenderer());
        }
        if (GLContext.getCapabilities().GL_ARB_debug_output && settings.getBoolean("GraphicsDebug")) {
            ARBDebugOutput.glDebugMessageCallbackARB(new ARBDebugOutputCallback(new LwjglGLDebugOutputHandler()));
        }
        // Re-apply the gamma flags that were lost with the old GL instance.
        renderer.setMainFrameBufferSrgb(settings.isGammaCorrection());
        renderer.setLinearizeSrgbImages(settings.isGammaCorrection());

        if (first) {
            // Init input (only needed the first time; it survives the restart)
            if (keyInput != null) {
                keyInput.initialize();
            }

            if (mouseInput != null) {
                mouseInput.initialize();
            }

            if (joyInput != null) {
                joyInput.initialize();
            }
        }
    }

Now, whenever I restart the context, LwjglDisplay does this:

    @Override
    public void runLoop(){
        // This method is overridden to handle restarts.
        if (needRestart.getAndSet(false)) {
            try {
                createContext(settings);
            } catch (LWJGLException ex) {
                logger.log(Level.SEVERE, "Failed to set display settings!", ex);
            }
            listener.reshape(settings.getWidth(), settings.getHeight());
            if (renderable.get()) {
                reinitContext();
            } else {
                assert getType() == Type.Canvas;
            }
            logger.fine("Display restarted.");
        } else if (Display.wasResized()) {
            int newWidth = Display.getWidth();
            int newHeight = Display.getHeight();
            settings.setResolution(newWidth, newHeight);
            listener.reshape(newWidth, newHeight);
        }

        super.runLoop();
    }

That said, I remembered the duplication issue from before, so I checked the memory usage, and it does increase every time I do a restart, so it looks like something isn’t being cleared. However, the controls attached to the boxes in my example aren’t triggering more often than they should, so whatever is being duplicated at least isn’t being processed like before.

While trying to find a better stress test, I ran into TestLeakingGL. It is supposed to create and destroy 900 spheres every tick (the documentation says 400, but the code generates 900). Ideally, memory usage shouldn’t change; however, when I ran it, memory steadily increased and the FPS kept dropping. It makes me wonder whether the engine is properly clearing the objects. This applies to both jme3-lwjgl and jme3-lwjgl3.
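
For context, the stress pattern boils down to something like the following in simpleUpdate (a simplified sketch, not the actual test code; 'mat' is assumed to be a Material created once during initialization):

    @Override
    public void simpleUpdate(float tpf) {
        rootNode.detachAllChildren(); // drop last frame's spheres
        for (int i = 0; i < 900; i++) {
            Geometry geom = new Geometry("sphere" + i, new Sphere(4, 4, 1f));
            geom.setMaterial(mat); // 'mat' assumed created in simpleInitApp()
            rootNode.attachChild(geom); // rendered once, then discarded
        }
    }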

Besides the memory issue, though, context restarting now seems to be working great on jme3-lwjgl, so there’s that!

EDIT: Change seems to work on jme3-lwjgl3 as well.
EDIT 2:

I’ve pushed the changes to GitHub, if anyone is interested in checking them out. However, I’m a little hesitant to create a PR until we can track down this memory issue. I’m pretty sure that it is unrelated to my changes, but it would be good to look into it.
As a side note, a pull request from this branch would supersede #1524.

EDIT 3: I’ve been looking into the memory issue, and TestLeakingGL hasn’t worked since before JME 3.1. Anything earlier is hard to test, considering that JME 3.0 was a rather different project back then.

2 Likes

Aloha,
GLRenderer, in its postFrame() method, calls NativeObjectManager’s deleteUnused() method, which in turn deletes some NativeObjects. However, this has a hardcoded limit of 100 objects per frame, which is why the test you mentioned is actually flawed (or the core is, but 100 is reasonable). The NativeObjectManager uses a ReferenceQueue so it can delete GL objects once their jME counterparts have been GC’ed. It does so by calling the NativeObject’s createDestructableClone(), which creates a shallow clone of that object (not actually a shallow clone, just some object of the same class with the same id, so the GL object can later be properly deleted), and wrapping that, together with a WeakReference to the original NativeObject, in a NativeObjectRef, which is a PhantomReference whose referent is an object referenced only by that original NativeObject. It is also put into a map so that the NativeObjectRef itself is not GC’ed on the next GC.
Then, whenever the renderer’s postFrame() method is called and a GC has happened beforehand, objects are polled from the ReferenceQueue, and the destructable clones are used to remove the NativeObjectRef from the map as well as to actually delete the GL object (its class and id are enough for that). Never more than 100 objects are deleted per frame, and manually deleted objects (via dispose()) are deleted first. So, in theory, for VertexBuffers for example, their ByteBuffer / IntBuffer / whatever should get freed properly, but the VertexBuffer’s counterpart in GL will never be destroyed if you constantly leak more than 100 NativeObjects per frame (ones that have actually been used by the renderer; you can create 1000 VertexBuffers per frame, not add them to anything, and they will be GC’ed properly).
You can just make that hardcoded limit a public static instead of a private static final (somewhat like the Unsafe toggle it already offers) and set it to 900 in the mentioned test to see whether memory is then GC’ed properly.
Then again, in a real game you would never create/destroy more than 100 objects per frame over many frames, at least not with the way jME is set up, if you want good performance.
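
If the reference machinery sounds abstract, here is a minimal, self-contained sketch of the same pattern; all class and method names are illustrative, not jME’s actual ones:

    import java.lang.ref.PhantomReference;
    import java.lang.ref.Reference;
    import java.lang.ref.ReferenceQueue;
    import java.util.HashSet;
    import java.util.Set;

    public class NativeCleanupSketch {

        // Stand-in for the "destructable clone": just enough info (the GL id)
        // to delete the GL-side object after its Java owner is gone.
        static final class Handle {
            final int glId;
            Handle(int glId) { this.glId = glId; }
        }

        // Enqueued by the GC once the owning Java object becomes unreachable.
        static final class TrackedRef extends PhantomReference<Object> {
            final Handle clone;
            TrackedRef(Object owner, Handle clone, ReferenceQueue<Object> q) {
                super(owner, q);
                this.clone = clone;
            }
        }

        private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
        // Keeps the TrackedRefs themselves reachable until we process them.
        private final Set<TrackedRef> refs = new HashSet<>();
        // The per-frame cap discussed above.
        private static final int MAX_DELETES_PER_FRAME = 100;

        void track(Object owner, int glId) {
            refs.add(new TrackedRef(owner, new Handle(glId), queue));
        }

        // Call once per frame, analogous to deleteUnused() in postFrame().
        void deleteUnused() {
            int deleted = 0;
            Reference<?> ref;
            while (deleted < MAX_DELETES_PER_FRAME && (ref = queue.poll()) != null) {
                TrackedRef tracked = (TrackedRef) ref;
                refs.remove(tracked);
                deleteGlObject(tracked.clone.glId); // e.g. glDeleteBuffers(id)
                deleted++;
            }
        }

        private void deleteGlObject(int id) { /* the GL delete call goes here */ }
    }

The key point is the cap in deleteUnused(): anything beyond it stays in the queue until a later frame.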

greetings from the shire!

4 Likes

@Ali_RS and I actually already found it. So, apparently, if I set the limit high enough, I end up creating and destroying 1800 objects every frame (each sphere seems to have two objects in the object manager). Nevertheless, the objects I destroy don’t seem to be properly garbage-collected. Even just creating a bunch of meshes every frame and doing nothing with them, the memory usage climbs. Odds are, there is something going on with the Java Mesh object itself that may not even have anything to do with the native objects.

1 Like

…and eventually throws an OutOfMemoryError?

If not, no leak.
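
One quick way to check, as a sketch using only standard JDK calls: poll the used heap while the test runs. With a genuine leak, the low point after each GC keeps climbing; with merely lazy GC, it stays flat.

    // Rough leak check: log the used heap periodically (e.g. once per second).
    Runtime rt = Runtime.getRuntime();
    long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    System.out.println("Used heap: " + usedMb + " MB");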

1 Like

The default heap size was large enough that I usually ended up stopping the test before it grew past a few GB. I retested it just now: no leak.
I already posted my apologies in the issue thread. I’m going ahead with the context restart PR.

1 Like

What I can see is that the heap size increases upon GC. This means the time between two GCs will increase over time, which in turn means more objects will be created between two GCs (on the heap, as well as GL objects that the driver manages). This also causes the driver to allocate more memory, because it has to keep around all those buffers that were created for a single frame but are only deleted much later, after the next GC. And this is another problem, because increasing the number of removes per frame to 2000 does not account for deleting all those objects that were destroyed at once. So it will also take longer over time to delete the objects in the queue, while new objects are created during deletion; and it might be even more objects this time, in case the heap size increased again.
Just a guess: you will not see an OutOfMemoryError, because the Java objects are GC’ed properly. However, you might see a performance decrease, because the GL driver has to manage an increasing number of objects, to the point where maybe your RAM is filled and it starts paging stuff in and out.

EDIT: OK, I just read through the GitHub issue, and from my side there is no need to apologize; I respect every motivation towards improving jME :smiley:

2 Likes

Thanks, @Samwise.
That said, the rapid creation and destruction of objects many times a second does seem to drastically affect the resulting frames per second. However, I think this is less a JME issue and more a result of pushing the limits of Java. Either way, since there is no memory leak, I’m probably going to drop it.

On a broader topic, I closed my original PR and instead am pushing this (sorry, @sgold):

It should solve not only the gamma correction issue, but also the screen blanking issue that @tlf30 originally reported.

3 Likes

Since this is a bug fix, perhaps we can get it into 3.4.0? @sgold

3 Likes

Sorry to bump this thread!

I noticed that, with LWJGL 2, if I change BitsPerPixel

// By default, the app started with 24 BitsPerPixel
// let's change it to 16!
settings.setBitsPerPixel(16);
setSettings(settings);

just before a context restart, it results in a blank window.

I made a PR to fix it:

I tried TestContextRestart and it works fine for me on Linux with the above change.

However, I noticed that I am actually reverting this commit.

I also noticed that @sgold reported getting a blank window when running TestContextRestart with LWJGL 2, which seems to be why the above commit was made(?)

I am considering merging my PR in less than 24 hours; it should then be immediately available in 3.7.0-SNAPSHOT for testing.

After I merge the PR, I would appreciate it if someone could test jme3test.renderer.TestContextRestart with LWJGL 2 (on both a Windows and a Linux machine) and let me know whether it also works fine for you.

5 Likes