White screen on context restart

If you would like, I can provide the test case including the JavaFX component, but it is not necessary for testing this issue.

No, I can see that I am moving the flycam, etc. I was just a little disoriented because I didn’t realize that the screenshots were from the original project.

Update on the CPU use: I tried commenting out the call to swap the v-sync mode.

  • Pegged thread is gone
  • Still lose the statistics window

So, maybe I was seeing two different issues?

Can you confirm if this test-case over-uses CPU on your system?

Hmm, I am not seeing the CPU usage increase with VSync.


Were you able to find the time to work on tracking down the issue?

Not much.

  • I do believe that the CPU spike happens when v-sync is disabled, because the render loop then runs without pausing (see the sketch after this list). It's not related to this issue.
  • I’ve not been able to collect a heap dump while the render is running correctly for comparison purposes, as the app UI captures my mouse until after I’ve triggered the issue.
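
For reference, a minimal way to avoid that free-running loop with standard AppSettings calls (a sketch; `app` stands in for your SimpleApplication):

    AppSettings settings = new AppSettings(true);
    settings.setVSync(false);   // loop no longer blocks on the display refresh
    settings.setFrameRate(60);  // so cap it explicitly, or it will peg a core
    app.setSettings(settings);
    app.restart();              // apply the new settings to the running context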

Continuing the discussion from Updating Display Settings Darkens Scene:

So, I ran your test case on my end, and I made a few observations:

  1. The screen blanking only seems to happen with the jme3-lwjgl3 library when messing with the “setRenderer” option in the app settings (specifically, setting it to any OpenGL version 3 or 4). Setting it to OpenGL 2, or not setting it at all, doesn’t seem to blank the screen (although colors will still darken).
  2. A regular context restart won’t normally trigger anything on the jme3-lwjgl library. However, setting stencil bits, depth bits, bits per pixel, or samples prior to the restart will trigger either of the above behaviors (darkening colors on OpenGL 2 compatibility, screen blanking on OpenGL 3 or 4 compatibility). A minimal trigger sketch follows this list.
  3. The darkening barely affects color channels close to the maximum
    Before:

    After:

    There does appear to be a difference, but it is nearly imperceptible. It only really begins to take effect at lower values of the color channel.
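
For anyone who wants to reproduce observation 2, a minimal trigger sketch using jME 3.3-era APIs (`app` is a placeholder for the test’s SimpleApplication; constant names may differ in older versions):

    AppSettings settings = app.getContext().getSettings();
    settings.setSamples(4);                           // changing samples/bits arms the bug on jme3-lwjgl
    settings.setRenderer(AppSettings.LWJGL_OPENGL33); // OpenGL 3+ blanks the screen on jme3-lwjgl3
    app.setSettings(settings);
    app.restart();                                    // scene darkens or goes blank after this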

@yaRnMcDonuts actually mentioned gamma correction, so I did some more experimentation.
Gamma Correction off:


Gamma Correction on:

From what I can see, restarting the context seems to turn the gamma correction system off, regardless of what is in the settings. Chances are, there is a bug somewhere in this. I’ll do some digging and see if I can locate it.

UPDATE: When the application runs for the first time in jme3-lwjgl3, isGammaCorrection() is called first from LwjglWindow’s createContext(AppSettings) method, and then twice more in LwjglContext’s initContextFirstTime() method. However, when I restart the context, it is only called from LwjglWindow, not LwjglContext. Apparently, the lines in LwjglContext are critical for enabling gamma correction in the jme3-lwjgl3 library.
UPDATE 2: For the jme3-lwjgl library, isGammaCorrection() isn’t called from anywhere but LwjglContext’s initContextFirstTime() method. Naturally, this method is not called when restarting the context. Oddly enough, though, this doesn’t seem to matter if the samples or bits haven’t been changed. I’m guessing there may be some sort of flag for jme3-lwjgl that checks whether these have changed.
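
For context, the gamma-related lines in question boil down to these two renderer calls, driven by the settings (excerpted from initContextFirstTime(); the full method appears later in this thread):

    // Gamma setup in LwjglContext.initContextFirstTime():
    renderer.setMainFrameBufferSrgb(settings.isGammaCorrection());
    renderer.setLinearizeSrgbImages(settings.isGammaCorrection());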


I want you to know that (probably like many others) I’m not able to help much directly in your quest, but I’m watching from the sidelines and cheering you on.

On my to-do list is making a Lemur settings screen before I release the next Mythruna… and the last time I tried context-restart stuff was probably almost 10 years ago, and it worked fine way back then (further evidence that gamma is involved in lwjgl2).

One of the cool things about open source is that others might find and fix problems before I even have to wrestle with them myself and this is exactly one of those cases.

So, keep going! We’re pulling for you. You are making good progress.


Thanks for the support, all.
Anyway, I think I found the crux of the issue.
When you set gamma correction, the LWJGL context calls the setMainFrameBufferSrgb() and setLinearizeSrgbImages() methods on the renderer class. Now, for the GLRenderer implementation, the latter method simply sets a field in the class. However, the former sets a flag within the OpenGL instance. The kicker is that when we restart the context, we destroy the OpenGL instance and create it again (on jme3-lwjgl, if the pixel format hasn’t changed, all of that is skipped and we just resize). As such, all of our state is lost, including the gamma correction flag.
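
For reference, the flag in question corresponds to a single piece of per-context OpenGL state; shown here as a raw LWJGL call for illustration (jME sets it through the renderer, not like this):

    import static org.lwjgl.opengl.GL11.glEnable;
    import static org.lwjgl.opengl.GL30.GL_FRAMEBUFFER_SRGB;

    // Enables sRGB (gamma-correct) writes to the default framebuffer.
    // Per-context state: destroying and recreating the context resets it.
    glEnable(GL_FRAMEBUFFER_SRGB);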

The way I see it, we can do one of two things.

  1. We can find a way to save the state of relevant OpenGL flags and put them back in after we reset the context. Considering that I’m not very familiar with low-level OpenGL, I have no idea what sort of unwanted side effects this would introduce. However, it may solve not only this, but also other obscure bugs of a similar nature that we may not know about yet.
  2. We could just check to see if gamma correction is needed, and then set the flag again after we restart the context (see the sketch after this list). Considering that we already seem to do this for input on jme3-lwjgl3, it isn’t too earth-shattering. Also, there don’t seem to be many direct calls to OpenGL going on in initContextFirstTime, so this should wrap up a lot of cases (although I am concerned about ARBDebugOutput.glDebugMessageCallbackARB and GLFW.glfwSetJoystickCallback, so those may be something to look into eventually).
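
A minimal sketch of option 2, with placement illustrative rather than final: after the restart path rebuilds the context, push the settings-derived state back in.

    // Re-apply the sRGB flags to the freshly created GL context.
    if (settings.isGammaCorrection()) {
        renderer.setMainFrameBufferSrgb(true);
        renderer.setLinearizeSrgbImages(true);
    }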

Still no idea why the screen blanks when calling AppSettings#setRenderer(String), though. I get the feeling that this is technically a different issue.


This pull request fixes the issue with just a couple of additions to the context restart methods.


This is interesting: I was trying to reproduce the original #1445 issue, but, while I could reproduce it on my Linux machine, it wouldn’t show up on Windows 10 with either jme3-lwjgl or jme3-lwjgl3. Both machines were using an NVIDIA Quadro M4000 GPU (I used both the nvidia-driver-450 and 460 proprietary drivers on the Linux end).

You said you were getting the issue on your Windows machine, right @tlf30?

Yes, I am on Windows with an RTX Titan. I can get out some other GPUs to test with if needed. I’m currently on driver 466.11. I just reran the test to reproduce it, just in case. I also tested with OpenGL 2 and saw the expected darkening of the scene.

Here is the debugging output:

The restart of the context takes place at Apr 14, 2021 5:12:15

At this point, I’m a bit concerned that this issue is beyond me. The issue seems to be specific to either LWJGL or the OpenGL API, and highly dependent on the OS. If someone with a little more experience with low-level OpenGL wants to take a look at it, I would recommend starting at calls to LwjglContext’s createContextAttribs() method.
Until then, a workaround would be to stick with OpenGL 2 or to ask the user to restart the application whenever display settings change.
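
In code, the first workaround is a one-liner on the settings (standard AppSettings constant):

    // Force the OpenGL 2 compatibility renderer to dodge the blanking;
    // per the observations above, colors may still darken after a restart.
    settings.setRenderer(AppSettings.LWJGL_OPENGL2);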

Been doing some more experiments. I tried moving the context restart logic from LwjglDisplay’s runLoop method to the LwjglAbstractDisplay runnable. Essentially, whenever I need to restart the context, I now just restart pretty much the entire LWJGL.

public void run(){
        if (listener == null) {
            throw new IllegalStateException("SystemListener is not set on context!"
                                          + "Must set with JmeContext.setSystemListener().");
        }

        loadNatives();
        logger.log(Level.FINE, "Using LWJGL {0}", Sys.getVersion());
        do {
            if (!initInThread()) {
                logger.log(Level.SEVERE, "Display initialization failed. Cannot continue.");
                return;
            }
            while (true){
                if (renderable.get()){
                    if (Display.isCloseRequested())
                        listener.requestClose(false);

                    if (wasActive != Display.isActive()) {
                        if (!wasActive) {
                            listener.gainFocus();
                            timer.reset();
                            wasActive = true;
                        } else {
                            listener.loseFocus();
                            wasActive = false;
                        }
                    }
                }

                runLoop();

                // A context restart was requested: leave the inner loop so the
                // context is torn down and reinitialized by the outer do/while.
                if (needRestart.get()) {
                    needRestart.set(false);
                    break;
                }
                if (needClose.get())
                    break;
            }
            deinitInThread();
        }
        while (!needClose.get());
    }


As you might notice, the GUI viewport seems to disappear, but at least we are getting something. The only major issue is that, when I start moving the objects, a new set of objects seems to get added every time I do a context restart.

Odds are that either my scene graph objects are being duplicated, or the entire viewport (root node and all) is being duplicated.


Alright, I scrapped what I had from above and tried a new approach, and I think I have it:
First, I took the initContextFirstTime() method from LwjglContext and moved its logic into a new method, initContext(boolean). This method does pretty much the same thing if the parameter is true. However, if it is false, it skips the creation of the renderer (instead just calling its initialize method) and skips the input initialization.

    private void initContext(boolean first) {
        if (!GLContext.getCapabilities().OpenGL20) {
            throw new RendererException("OpenGL 2.0 or higher is "
                    + "required for jMonkeyEngine");
        }

        int[] vers = getGLVersion(settings.getRenderer());
        if (vers != null) {
            // First-time init: build the GL wrappers and the renderer.
            if (first) {
                GL gl = new LwjglGL();
                GLExt glext = new LwjglGLExt();
                GLFbo glfbo;

                if (GLContext.getCapabilities().OpenGL30) {
                    glfbo = new LwjglGLFboGL3();
                } else {
                    glfbo = new LwjglGLFboEXT();
                }

                if (settings.getBoolean("GraphicsDebug")) {
                    gl = (GL) GLDebug.createProxy(gl, gl, GL.class, GL2.class, GL3.class, GL4.class);
                    glext = (GLExt) GLDebug.createProxy(gl, glext, GLExt.class);
                    glfbo = (GLFbo) GLDebug.createProxy(gl, glfbo, GLFbo.class);
                }
                if (settings.getBoolean("GraphicsTiming")) {
                    GLTimingState timingState = new GLTimingState();
                    gl = (GL) GLTiming.createGLTiming(gl, timingState, GL.class, GL2.class, GL3.class, GL4.class);
                    glext = (GLExt) GLTiming.createGLTiming(glext, timingState, GLExt.class);
                    glfbo = (GLFbo) GLTiming.createGLTiming(glfbo, timingState, GLFbo.class);
                }
                if (settings.getBoolean("GraphicsTrace")) {
                    gl = (GL) GLTracer.createDesktopGlTracer(gl, GL.class, GL2.class, GL3.class, GL4.class);
                    glext = (GLExt) GLTracer.createDesktopGlTracer(glext, GLExt.class);
                    glfbo = (GLFbo) GLTracer.createDesktopGlTracer(glfbo, GLFbo.class);
                }
                renderer = new GLRenderer(gl, glext, glfbo);
            }
            // On restart, reuse the existing renderer; just re-run its
            // initialization against the fresh context.
            renderer.initialize();
        } else {
            throw new UnsupportedOperationException("Unsupported renderer: " + settings.getRenderer());
        }
        if (GLContext.getCapabilities().GL_ARB_debug_output && settings.getBoolean("GraphicsDebug")) {
            ARBDebugOutput.glDebugMessageCallbackARB(new ARBDebugOutputCallback(new LwjglGLDebugOutputHandler()));
        }
        renderer.setMainFrameBufferSrgb(settings.isGammaCorrection());
        renderer.setLinearizeSrgbImages(settings.isGammaCorrection());

        if (first) {
            // Init input
            if (keyInput != null) {
                keyInput.initialize();
            }

            if (mouseInput != null) {
                mouseInput.initialize();
            }

            if (joyInput != null) {
                joyInput.initialize();
            }
        }
    }

Now, whenever I restart the context, LwjglDisplay does this:

    @Override
    public void runLoop(){
        // This method is overridden to perform the restart.
        if (needRestart.getAndSet(false)) {
            try {
                createContext(settings);
            } catch (LWJGLException ex) {
                logger.log(Level.SEVERE, "Failed to set display settings!", ex);
            }
            listener.reshape(settings.getWidth(), settings.getHeight());
            if (renderable.get()) {
                reinitContext();
            } else {
                assert getType() == Type.Canvas;
            }
            logger.fine("Display restarted.");
        } else if (Display.wasResized()) {
            int newWidth = Display.getWidth();
            int newHeight = Display.getHeight();
            settings.setResolution(newWidth, newHeight);
            listener.reshape(newWidth, newHeight);
        }

        super.runLoop();
    }

That said, I remembered the duplication issue from before, so I went and checked the memory usage, and it looks like there is an increase every time I do a restart, so something isn’t being cleared. However, the controls I attached to the boxes in my example aren’t triggering more often than they should, so whatever is being duplicated at least isn’t being processed like before.

While trying to find a better stress test, I ran into TestLeakingGL. Apparently, it is supposed to create and destroy 900 spheres every tick (the documentation says 400, but the code generates 900). Ideally, memory usage shouldn’t change. However, when I ran it, memory steadily increased and the FPS continued to drop. It makes me wonder whether the engine is failing to properly clear the objects. This applies to both jme3-lwjgl and jme3-lwjgl3.
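
The churn pattern of that test, heavily simplified (the real TestLeakingGL lives in jme3-examples and does a bit more; `solidColor` stands in for a shared, preloaded material):

    @Override
    public void simpleUpdate(float tpf) {
        // Create a batch of meshes and drop every reference immediately;
        // ideally the GC plus NativeObjectManager reclaim all of them.
        for (int i = 0; i < 900; i++) {
            Sphere sphere = new Sphere(4, 4, 1f);
            Geometry geom = new Geometry("sphere" + i, sphere);
            geom.setMaterial(solidColor);
        }
    }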

Besides the memory issue, though, context restarting now seems to be working great on jme3-lwjgl, so there’s that!

EDIT: Change seems to work on jme3-lwjgl3 as well.
EDIT 2:

I’ve pushed the changes to GitHub, if anyone is interested in checking them out. However, I’m a little hesitant to create a PR until we can track down this memory issue. I’m pretty sure it is unrelated to my changes, but it would be good to look into it.
As a side note, a pull request from here would supersede #1524.

EDIT 3: I’ve been looking into the memory issue, and TestLeakingGL hasn’t worked since before JME 3.1. Anything before that is hard to test, considering that JME 3.0 was a rather different project back then.


Aloha,
GLRenderer, in its postFrame method, calls NativeObjectManager’s deleteUnused method, which in turn deletes some NativeObjects. However, this has a hardcoded limit of 100 objects per frame, which is why the test you mentioned is actually flawed (or the core is, but 100 is reasonable). The NativeObjectManager uses a ReferenceQueue so it can delete GL objects once their jME counterparts have been GC’ed. It does so by using the NativeObject’s createDestructableClone, which creates a shallow clone of that object (not actually a shallow clone, just an object of the same class with the same id, so it can later be properly deleted from GL), and wrapping that, together with a WeakReference to the original NativeObject, in a NativeObjectRef. That is a PhantomReference whose referent is an object referenced only by the original NativeObject. It is also put into a map so the NativeObjectRef itself is not GC’ed on the next GC.

Then, whenever the renderer’s postFrame method is called and a GC has happened beforehand, objects are polled from the ReferenceQueue and the destructable clones are used to remove the NativeObjectRef from the map, as well as to actually delete the GL object (its class and id are enough for that). No more than 100 objects are ever deleted per frame, and manually deleted objects (via dispose()) are deleted first. So in theory, for VertexBuffers for example, their ByteBuffer / IntBuffer / whatever should get freed properly, but the VertexBuffer object itself, as well as its counterpart in GL, will never be destroyed if you constantly leak more than 100 NativeObjects that have actually been used by the renderer (you can create 1000 VertexBuffers per frame, not add them to anything, and they will be GC’ed properly).

You can make that hardcoded limit a public static instead of private static final (somewhat like the Unsafe toggle it already offers) and set it to 900 in the mentioned test to see whether memory then gets GC’ed properly.

Then again, in a real game you would never create and destroy more than 100 objects per frame over many frames, at least not with the way jME is set up, if you want good performance.
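
A sketch of that tweak (the exact field name in NativeObjectManager may differ; treat this as illustrative):

    // In NativeObjectManager, widen the per-frame deletion cap.
    // before:
    //     private static final int MAX_REMOVES_PER_FRAME = 100;
    // after, now tunable from tests such as TestLeakingGL:
    public static int maxRemovesPerFrame = 900;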

greetings from the shire!


@Ali_RS and I actually already found it. So, apparently, if I set the limit high enough, I end up creating and destroying 1800 objects every frame (each of the spheres has two objects in the object manager, it seems). Nevertheless, it doesn’t seem like the objects I destroy are properly garbage-collected. Even when just creating a bunch of meshes every frame and doing nothing with them, the memory usage climbs. Odds are, there is something going on with the Java Mesh object itself that may not even have anything to do with the native objects.


…and eventually throws an OutOfMemoryError?

If not, no leak.


The default heap size was large enough that I usually ended up stopping it before it got to more than a few GB. I retested it just now: no leak.
I already posted my apologies in the issue thread. I’m going ahead with the context restart PR.


What I can see is that the heap size increases upon GC. This means the time between two GCs will increase over time, which in turn means more objects will be created between two GCs (on the heap, as well as GL objects that the driver manages). This will also cause the driver to allocate more memory, because it has to keep around all those buffers that were created for a single frame but are only deleted much later, after the next GC. And this is another problem, because increasing the number of removes per frame to 2000 does not account for deleting all the objects that were destroyed at once. So over time it will also take longer to delete the objects in the queue, while new objects are created during deletion; it might be even more objects this time, in case the heap size increased again.
Just a guess: you will not see an OutOfMemoryError because the Java objects are GC’ed properly, but you might see a performance decrease because the GL driver has to manage an increasing number of objects, to the point where maybe your RAM is filled and it starts paging things in and out.
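
One quick way to watch the heap side of this from inside the test, using only standard java.lang.Runtime calls (purely diagnostic):

    // Log the heap in use each frame; steady growth between GCs here,
    // alongside growing driver memory, matches the behavior described above.
    long usedBytes = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    System.out.printf("heap in use: %d MB%n", usedBytes / (1024 * 1024));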

EDIT: OK, I just read through the GitHub issue, and from my side there’s no need to apologize. I respect every motivation towards improving jME :smiley:
