Irregular exception in texture.Image, caused by garbage collector?

Apollo · September 28, 2021, 5:51am

Hi,
I am trying to understand an exception that has been occurring irregularly for a couple of days. I also don’t know exactly which change in my code caused it, so I’d like to ask whether you are familiar with it and can give me a hint.

This is the stack trace:

com.jme3.texture.Image.toString(Image.java:1166)
java.lang.String.valueOf(String.java:2994)
java.lang.StringBuilder.append(StringBuilder.java:131)
com.jme3.util.NativeObjectManager.deleteNativeObject(NativeObjectManager.java:136)
com.jme3.util.NativeObjectManager.deleteUnused(NativeObjectManager.java:188)
com.jme3.renderer.opengl.GLRenderer.postFrame(GLRenderer.java:1073)
com.jme3.system.lwjgl.LwjglAbstractDisplay.runLoop(LwjglAbstractDisplay.java:189)
com.jme3.system.lwjgl.LwjglDisplay.runLoop(LwjglDisplay.java:197)
com.jme3.system.lwjgl.LwjglAbstractDisplay.run(LwjglAbstractDisplay.java:232)
java.lang.Thread.run(Thread.java:748)

and this is line #1073 in GLRenderer (jME version should be 3.1.0).

    @Override
    public void postFrame() {
        objManager.deleteUnused(this);
        OpenCLObjectManager.getInstance().deleteUnusedObjects();
        gl.resetStats();
    }

So, this gives me at least some indication. The error happens irregularly and is therefore hard to reproduce, but it is very annoying.

I can only assume that it is caused by the step-wise loading of scenery content. I have a human-readable input file, where each line loads an object, generates a node, applies a decal, etc. I go through the file almost line by line and make changes to the scenegraph as I go along. The loading itself therefore happens at a slow pace, but always on the JME thread:

   Thread t=new Thread(() ->
   {
      while (!isDone)
      {
         try
         {
            visual.enqueue(() ->
            {
               long tik=System.currentTimeMillis();
               while (!isDone && System.currentTimeMillis()-tik<10L)
               {
                  try
                  {  processNextStatement();
                  }
                  catch (Exception e)
                  {  LOGGER.warning("Exception while reading file '"+inputFileName+"': "+e.getMessage());
                  }
               }
            });
         }
         catch (NullPointerException e)
         {  LOGGER.warning("NullPointerException while reading file: "+inputFileName+". Loading may not have completed.");
            e.printStackTrace();
         }

         //try {Thread.sleep(20);} catch (InterruptedException e) {};
         Thread.yield();
      }
      isStreaming=false;
   });
   t.start();

Am I missing something obvious?

tonihele · September 28, 2021, 6:24am

Hmm, what is the actual exception you get?

Apollo · September 28, 2021, 6:48am

Sorry, forgot to mention. It is a NullPointerException.

tonihele · September 28, 2021, 7:13am

It’s hard to find the actual code in Image.java for your snapshot to see which value is null. But is it enough to just fix the NPE in toString? That should never throw an NPE anyway, in my opinion. Or is there something else wrong?

Apollo · September 28, 2021, 7:51am

I’m afraid it will become even more puzzling when I show the code of Image (see below). I guess it is the call to format.name(), which should not throw any Exception unless format is null.

[Screenshot of the Image.java source code]

tonihele · September 28, 2021, 7:57am

Hmm yeah, and it shouldn’t be null. At least internally, Image.java always sets the format via the setter, and the setter has a sanity check against null.

You could try changing the format field in Image.java from protected to private, to see whether it is accessed from somewhere else.
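To illustrate the point about the setter and the protected field, here is a minimal sketch. ToyImage, LeakyImage, and the Format enum are hypothetical stand-ins, not the engine's classes, and the exception type used in the setter is an assumption. It shows how a subclass can bypass a null-checking setter by writing a protected field directly, and how a toString() that avoids calling format.name() stays safe even then.

```java
// Hypothetical stand-ins for illustration; this is NOT the engine's Image class.
class ToyImage {
    enum Format { RGBA8, RGB8 }

    // protected: subclasses (and same-package code) can write this field directly
    protected Format format = Format.RGBA8;

    void setFormat(Format format) {
        // Sanity check (assumed shape): reject null before storing
        if (format == null) {
            throw new IllegalArgumentException("format may not be null");
        }
        this.format = format;
    }

    @Override
    public String toString() {
        // String.valueOf(Object) yields "null" instead of throwing,
        // whereas format.name() would throw an NPE for a null format
        return getClass().getSimpleName() + "[format=" + String.valueOf(format) + "]";
    }
}

class LeakyImage extends ToyImage {
    // Bypasses the setter's null check by assigning the protected field
    void corrupt() {
        format = null;
    }
}
```

Making the field private would turn the assignment in corrupt() into a compile error, which is what makes the protected-to-private experiment a quick way to find outside writers.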

sgold · September 28, 2021, 8:03am

I believe the NPE is an after-effect of this code in NativeObjectManager.java:

                    throw new IllegalArgumentException("The " + obj + " NativeObject is not "
                            + "registered in this NativeObjectManager");

What strikes me as odd is that (in 3.1.0-stable) “Image.java” has only 1065 lines of source code. There’s no line 1166. So perhaps the jme3-core version isn’t really 3.1.0.

For me, the obvious question is whether you could upgrade to a more recent engine, such as JME 3.4.0-stable.

Apollo · September 28, 2021, 8:08am

Maybe that’s true. I thought I was using 3.1.0-alpha, but I may be wrong.
How can I find the exact version actually?

Nevertheless, there seem to be some changes in Image. I will upgrade to a new version and see if the error persists.

sgold · September 28, 2021, 8:19am

“How can I find the exact version actually?”

Two ways:

If using the JME SDK (or Netbeans) there should be a “Dependencies” node under the project in the “Projects” tab.
At runtime, you can print various static fields of com.jme3.system.JmeVersion, including JmeVersion.FULL_NAME.
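As a rough sketch of the runtime option: in a project that has the engine on its classpath, printing JmeVersion.FULL_NAME directly is enough. The variant below reads the same field via reflection only so that it also compiles and runs without the engine present. JmeVersionProbe and its fallback message are hypothetical names made up for this example.

```java
import java.lang.reflect.Field;

// Hypothetical helper; only com.jme3.system.JmeVersion and FULL_NAME come from the thread.
final class JmeVersionProbe {

    /** Returns the engine's full version name, or a fallback text if it cannot be read. */
    static String fullName() {
        try {
            Class<?> versionClass = Class.forName("com.jme3.system.JmeVersion");
            Field field = versionClass.getField("FULL_NAME");
            // FULL_NAME is a static field, so no instance is needed
            return String.valueOf(field.get(null));
        } catch (ReflectiveOperationException | LinkageError e) {
            // Engine not on the classpath, or the field is missing in this version
            return "jME version unavailable (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(fullName());
    }
}
```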

tonihele · September 28, 2021, 8:19am

This is what I was afraid of too. That there is a real problem and the toString NPE just masks it…

Apollo · September 28, 2021, 8:29am

FYI, my current version is 3.3-6654. Not so old actually.

sgold · September 30, 2021, 9:42pm

If you ever find a convenient way to reproduce the underlying issue, I’d like to take a look at it.

In pursuit of reproducibility, you might try hastening garbage collection by strategically setting variables to null and invoking System.gc().
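A minimal sketch of that idea, using only the standard library: drop the last strong reference, request a collection, and watch a ReferenceQueue to see whether the object was actually reclaimed. GcProbe and its parameters are hypothetical. Note that System.gc() is only a request, so the method may return false even after several attempts.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

// Hypothetical helper for experiments; not part of the engine.
final class GcProbe {

    /**
     * Drops the only strong reference to a buffer, then requests garbage
     * collection up to maxAttempts times. Returns true if the buffer's
     * phantom reference was enqueued, i.e. the collection was observed.
     */
    static boolean provokeCollection(int maxAttempts) {
        ReferenceQueue<byte[]> queue = new ReferenceQueue<>();
        byte[] payload = new byte[1 << 20];
        PhantomReference<byte[]> ref = new PhantomReference<>(payload, queue);
        payload = null; // drop the only strong reference

        for (int i = 0; i < maxAttempts; i++) {
            System.gc(); // a hint to the JVM, not a guarantee
            try {
                // Wait up to 100 ms for the reference to show up in the queue
                if (queue.remove(100) == ref) {
                    return true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return false;
    }
}
```

In an application, the same pattern would mean detaching the scenery, clearing any lists that still hold it, and then calling System.gc() periodically while the render loop keeps running.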

Apollo · October 1, 2021, 5:20am

Thank you. I will give it a try on the weekend. Since I have a lot of scenery content, I will also take a closer look at the texture count and memory consumption and try to pinpoint which part of the scenery is causing the issue.

Apollo · October 3, 2021, 10:04am

@sgold a few updates from today:

I can now almost reliably reproduce the error within 5-10 minutes. I jump between 4 different airports very heavily and call System.gc() every ten seconds. This seems to provoke the exception.
There is no guarantee however, sometimes it just works and sometimes it does not. (Probably I should try it only on Mondays.)

Additionally, I found another thread. The difference is that I do not manually dispose of anything, but I think the same thing could be happening. I don’t want to jinx it, but the additional call to ref2.clear() seems to fix the problem on my side too. I’ll keep you informed in any case.
I would appreciate your thoughts however, because I still do not understand the problem 100%.

Retzinsky · October 3, 2021, 12:54pm

Ah yes, this very much looks like the same issue I was having. The “NativeObject is not registered in this NativeObjectManager” crash sometimes manifests as the NPE you are getting. As you can see the problem is a tricky one to get to the root of because of the obfuscating nature of how the NativeObjectManager works.

An important thing to note is that your code is manually disposing of some NativeObjects, because there are numerous calls to it within the engine itself. FilterPostProcessor.cleanup() for example disposes of its NativeObjects explicitly.

pspeed · October 3, 2021, 5:02pm

Which is only called if you remove the FPP… and if you are constantly adding and removing FPP then you are definitely taking your life into your own hands.

I think most apps probably create it once and remove it once (at shutdown).

I’d be curious to know how many of these are done “all the time” and how many are done during normal shutdown.

Apollo · October 3, 2021, 5:41pm

@Retzinsky Thanks for joining the discussion too. I am curious whether you experienced the problem after your proposed fix. Were you able to isolate the issue or make it exactly reproducible?

@pspeed Maybe we should not call it “manually disposing” stuff, because in a way every process which removes stuff from the scenegraph and removes all references (ArrayLists etc.) suffers from the same problem. I think that it is not connected to the FPP. Maybe one issue is that the objects are rendered in multiple viewports?

pspeed · October 3, 2021, 5:56pm

Well, I think we need to distinguish between manually disposing and “normal GC disposing”… because they are certainly different. And the errors you indicate seem impossible for naturally GC’ed objects to encounter (how could something be trying to call toString() on an image that nothing references, for example).

So, the question is: does your app manually, as in your code calls it, dispose of objects hoping to be GC-friendly? Or are 100% of your objects reclaimed naturally by the garbage collector?

Are there other things that you add/remove that a simpler app might just enable/disable (post processors being one example but I’m sure there are others)?

I’m not saying we shouldn’t be allowed to do these things but it would definitely help narrow down the issues.

sgold · October 3, 2021, 6:53pm

So you (maybe) solved the issue by modifying the Engine, specifically “NativeObjectManager.java”?

(Before this, I didn’t realize that @retzinsky was talking about modifying the Engine!)

I’m convinced that clearing the reference is a good idea, to avoid the race condition you apparently encountered. I’ll open an issue at GitHub to that effect.

EDIT: explicitly deleting a native object should clear its phantom reference · Issue #1614 · jMonkeyEngine/jmonkeyengine · GitHub
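The following toy model sketches why clearing the reference on explicit deletion avoids the “not registered” error. It is not the engine's NativeObjectManager: ToyNativeManager and all of its methods are hypothetical, and it only assumes the general pattern described in this thread (phantom references, a per-frame cleanup pass, and an IllegalArgumentException for unknown objects).

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT the engine's NativeObjectManager; all names are hypothetical.
final class ToyNativeManager {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Phantom reference -> native id that is still registered here
    private final Map<Reference<?>, Integer> registered = new HashMap<>();

    /** Registers a handle; its id is released once the handle has been collected. */
    Reference<?> register(Object handle, int nativeId) {
        PhantomReference<Object> ref = new PhantomReference<>(handle, queue);
        registered.put(ref, nativeId);
        return ref;
    }

    /** Explicit deletion: unregister AND clear, so the GC never enqueues the reference. */
    void deleteNow(Reference<?> ref) {
        if (registered.remove(ref) == null) {
            throw new IllegalArgumentException("object is not registered in this manager");
        }
        ref.clear(); // a cleared reference is not enqueued when its referent dies
    }

    /** Per-frame cleanup: releases ids of collected handles, returns how many. */
    int deleteUnused() {
        int released = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            if (registered.remove(ref) == null) {
                // Reached only if a reference was unregistered but still got enqueued,
                // which is what deleteNow() prevents by calling clear()
                throw new IllegalArgumentException("object is not registered in this manager");
            }
            released++;
        }
        return released;
    }

    int registeredCount() {
        return registered.size();
    }
}
```

Without the clear() call, a reference that is still reachable somewhere could be enqueued after its entry was already removed, and the next cleanup pass would then hit the exception branch, depending on when the garbage collector happens to run.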

Retzinsky · October 3, 2021, 7:25pm

Sometimes I’m changing the kind of camera I’m using and the ViewPort along with it and was only doing what seemed like good housekeeping in destroying the FPP and making another later as required.

It hasn’t happened since the fix. I was able to make it sufficiently reproducible that I’m happy the solution works.