I’ve occasionally been having a problem that crashes my game server.
07:22:25,916 ERROR [StateCollector] Collection error
at java.nio.Buffer.nextPutIndex(Buffer.java:521) ~[?:1.8.0_172]
at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:169) ~[?:1.8.0_172]
at com.jme3.network.serializing.serializers.ByteSerializer.writeObject(ByteSerializer.java:51) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
at com.jme3.network.serializing.serializers.ArraySerializer.writeArray(ArraySerializer.java:124) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
at com.jme3.network.serializing.serializers.ArraySerializer.writeObject(ArraySerializer.java:109) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
at com.jme3.network.serializing.serializers.FieldSerializer.writeObject(FieldSerializer.java:202) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
at com.jme3.network.serializing.Serializer.writeClassAndObject(Serializer.java:458) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
at com.jme3.network.base.MessageProtocol.messageToBuffer(MessageProtocol.java:73) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
at com.jme3.network.base.DefaultServer$Connection.send(DefaultServer.java:582) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
at com.simsilica.ethereal.net.StateWriter.endMessage(StateWriter.java:437) ~[sim-ethereal-1.2.1-SNAPSHOT.jar:?]
at com.simsilica.ethereal.net.StateWriter.flush(StateWriter.java:451) ~[sim-ethereal-1.2.1-SNAPSHOT.jar:?]
at com.simsilica.ethereal.NetworkStateListener.endFrameBlock(NetworkStateListener.java:226) ~[sim-ethereal-1.2.1-SNAPSHOT.jar:?]
at com.simsilica.ethereal.zone.StateCollector.collect(StateCollector.java:262) [sim-ethereal-1.2.1-SNAPSHOT.jar:?]
at com.simsilica.ethereal.zone.StateCollector$Runner.run(StateCollector.java:313) [sim-ethereal-1.2.1-SNAPSHOT.jar:?]
Any thoughts on how to troubleshoot this problem? I turned off message splitting with this
Your messages are getting too big for JME’s poopy Serializer. Since it’s based off of buffers instead of streams, the only way to write objects is to guess at some size and hope you don’t run out of RAM for picking a size too big. SpiderMonkey picks 32767 as the max message size… mostly because the data size in the protocol was already a signed short.
One of my big mistakes in life was not rewriting the Serializer when I rewrote the rest of SpiderMonkey… but now it is what it is.
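To make the buffer limitation concrete, here is a minimal sketch of what the stack trace above is showing: a relative `put()` on a fixed-size `ByteBuffer` fails as soon as the payload is one byte larger than the buffer. The class name `FixedBufferDemo` and the `fits` helper are made up for illustration; only the 32767 figure comes from the post, and this is not the Serializer's actual code.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of jME or SimEthereal.
class FixedBufferDemo {
    // Assumed cap, taken from the 32767 figure mentioned in the post.
    static final int MAX_MESSAGE_SIZE = 32767;

    /** Returns true if payloadSize bytes fit in the fixed-size buffer, false on overflow. */
    static boolean fits(int payloadSize) {
        ByteBuffer buffer = ByteBuffer.allocate(MAX_MESSAGE_SIZE);
        try {
            for (int i = 0; i < payloadSize; i++) {
                buffer.put((byte) 0); // relative put, like the one at the top of the trace
            }
            return true;
        } catch (BufferOverflowException e) {
            // A buffer cannot grow; a stream-based writer would not hit this limit.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fits(32767)); // exactly fills the buffer
        System.out.println(fits(32768)); // one byte too many
    }
}
```

The point is only that the writer has no way to recover once the guessed size is exceeded, which is why an oversized message surfaces as an exception deep inside `writeObject`.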
Well, you should fix your real problem, then… since you’ve tied SimEthereal’s hands.
Which new release? I’m still not sure which release you are running so I can’t comment on what’s new that might help fix/resolve/debug this issue. I’m pretty sure I haven’t fixed anything directly related to your issue. (A link to the previous thread could be helpful, by the way.)
The point is that even if I did somehow find and fix something then it’s still an uphill battle for you to use it anyway… so I’m unlikely to get much feedback on whether something I do fixes it or not. That’s all.
I’m going to try some tests to see if I can reproduce it locally.
So in local testing, if I create a giant number of objects (for me)… like 200+ in the local zone… everything works fine until I leave the space that can see that zone.
I then get an error about bad things happening in the ack processing or something like that. Upping the message size to 32000 fixes that (and is small enough to avoid any issues with SpiderMonkey). So there seems to be something I can try to fix.
However, these parameters tend to make me worry about your setup. Like number of objects, whether you’ve tweaked the update rates, or if you are sending object updates faster than 60 Hz, etc.
In the normal setup, objects will be updated at some application-defined frequency… I guess usually 60 FPS or less on the server. By default, the state collector then bundles and sends these 20 times per second… so it will try to send three frames per message.
Somehow, in your setup, you are managing to exceed 32767 bytes in message size for 1/20th of a second’s worth of data. Considering that 80 or so constantly moving/rotating objects can fit in under 1500 bytes, that seems pretty crazy.
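As a rough back-of-the-envelope check on those numbers, the sketch below works through the arithmetic. It assumes the per-object cost scales linearly from the "80 objects in under 1500 bytes" figure, which is a simplification and not something the protocol guarantees; the class and method names are hypothetical.

```java
// Hypothetical helper for the arithmetic in the post; not SimEthereal code.
class MessageBudget {
    /** Frames bundled into each message: server update rate divided by send rate. */
    static int framesPerMessage(int serverFps, int sendsPerSecond) {
        return serverFps / sendsPerSecond;
    }

    /** Average bytes per object, assuming cost scales linearly with object count. */
    static double bytesPerObject(int totalBytes, int objectCount) {
        return (double) totalBytes / objectCount;
    }

    /** How many objects it would take to fill maxBytes at the given per-object cost. */
    static long objectsToFill(int maxBytes, double bytesPerObject) {
        return (long) Math.floor(maxBytes / bytesPerObject);
    }

    public static void main(String[] args) {
        int frames = framesPerMessage(60, 20);       // 3 frames per message
        double perObject = bytesPerObject(1500, 80); // 18.75 bytes, an upper bound
        long needed = objectsToFill(32767, perObject);
        System.out.println(frames + " frames/message, " + perObject
                + " bytes/object, ~" + needed + " objects to hit the cap");
    }
}
```

Under that linear assumption it would take well over a thousand moving objects to reach 32767 bytes, which is why roughly 100 objects overflowing the message points at something else going wrong.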
Now, clearly there is something bad going on when objects leave visibility… and I’m not willing to rule out that it is a cumulative error. (Though local testing has not indicated that it is.)
@pspeed this whole thread is déjà vu for me. Remember when I stress tested your library in production with 1500 simultaneous users about 3 years ago and found this bug? I spent a few weeks finding the root cause and patching my version of the code. I do seem to recall you fixing it at some point because today I am running your latest version.
It’s possible… it also could have been something different.
Locally I’m able to replicate the problem by rapidly spawning objects. One per 0.1 second with a 30 second decay. Overall that’s a churn of about 300 objects where every 0.1 second 1 is created and 1 is removed… very taxing on the network code.
I’ve committed a fix to master that is available to anyone who builds from head.
First, thank you, I really appreciate your help… I fully admit that debugging this thing is a bit beyond my skills as a programmer.
When we previously had issues, the size >= 128 line ‘fixed’ it, however I never had the opportunity to test with more players. Since I released my game, I’ve been getting more players and then encountered this error.
I have not updated any of the update rates.
We previously had 80 objects max when I tested this the last time you asked about it; however, with 5 players it’s possible that it’s bumped up a little from there, but honestly not much.
I simply don’t create/destroy game objects quickly enough to get more than around 100 at any given time.
However, I do frequently change zones, very frequently in fact.
Twitch to illustrate the game. Zone switching happens a fair amount, especially on larger maps. Perhaps it’s an issue with the scale I’m using? Each tile is 16 units and my grid size is 256… the map in the video is 55x55 tiles, I think?
From the discussion so far it sounds like I should update to the latest master and set my max message size to 32000. Does that sound correct?
Again, thank you for your help pspeed, your code has taught me so much, I am super grateful.
Definitely try master if you can. Your current message size is also way too big so yeah, if you set it at all then definitely lower it to 32000 or so.
Splits were causing the issue faster because the code was burning through message IDs faster. You may find that you don’t need to worry about that anymore and can go back to the default. That being said, if you aren’t experiencing performance issues with the larger message size then you might as well leave it big (32000 or so) to avoid splitting at all.
Just a note: I’ve just committed some additional changes which should help a lot in these cases.
In my own local testing I have 300 active objects where one is being created every 0.1 seconds and one is being destroyed every 0.1 seconds. So there is a constant churn of about 300 objects.
I was seeing a lot of strange issues when crossing zone boundaries. Missing baseline messages, etc.
One thing that was nagging at me was that as the ACK lists grew (because of message lag or whatever), that array would take up more and more of the object state messages. This concerned me because SimEthereal tries to be a really tight protocol (I count every bit). I had the idea that mostly (always?) these ACK lists would be one contiguous set of values. I can’t really count on “always” so I decided to write a Set implementation that internally keeps track of ranges.
Thus a new class: IntRangeSet (and unit test suite)
I then converted the internal ACK tracking to use it.
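For readers who want a feel for the idea, here is a minimal sketch of a set of ints that stores ranges internally. It is not the actual `IntRangeSet` from SimEthereal; the class name `SimpleIntRangeSet`, its methods, and the `TreeMap` representation are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified range-backed int set; not SimEthereal's IntRangeSet.
class SimpleIntRangeSet {
    // Maps each range's start to its inclusive end; ranges are disjoint and non-adjacent.
    private final TreeMap<Integer, Integer> ranges = new TreeMap<>();

    boolean contains(int value) {
        Map.Entry<Integer, Integer> e = ranges.floorEntry(value);
        return e != null && value <= e.getValue();
    }

    /** Adds value, merging with neighboring ranges. Returns false if already present. */
    boolean add(int value) {
        if (contains(value)) {
            return false;
        }
        int start = value;
        int end = value;
        // Extend the range that ends right before value, if there is one.
        Map.Entry<Integer, Integer> left = ranges.floorEntry(value);
        if (left != null && left.getValue() == value - 1) {
            start = left.getKey();
        }
        // Absorb the range that starts right after value, if there is one.
        if (value != Integer.MAX_VALUE) {
            Integer rightEnd = ranges.remove(value + 1);
            if (rightEnd != null) {
                end = rightEnd;
            }
        }
        ranges.put(start, end);
        return true;
    }

    /** Number of contiguous ranges currently stored. */
    int rangeCount() {
        return ranges.size();
    }

    /** Total number of ints in the set. */
    long size() {
        long total = 0;
        for (Map.Entry<Integer, Integer> e : ranges.entrySet()) {
            total += (long) e.getValue() - e.getKey() + 1;
        }
        return total;
    }

    public static void main(String[] args) {
        SimpleIntRangeSet acks = new SimpleIntRangeSet();
        for (int id = 1; id <= 300; id++) {
            acks.add(id);
        }
        System.out.println(acks.size() + " ids in " + acks.rangeCount() + " range(s)");
    }
}
```

With contiguous IDs, the stored state stays at a single start/end pair no matter how many IDs are pending, which is the property that makes the ranges cheap to put on the wire.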
When looking at the SentState message reading/writing, I noticed it has been calculating its header size wrong all this time… I also noticed that it was only using 8 bits for the array size. Back when SimEthereal always threw an exception at >= 128 IDs, this would never have been a problem… but now that I let the buffer grow based on message lag, anything over 255 ACKs in the array would cause issues.
Fortunately, all of that code has been replaced and now only ranges are sent… and in all of my testing, only ever one range. So instead of 4 * (ACK count) bytes of header, it takes up 7 bytes of header. (And I could probably reduce that further by a byte or three.)
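To illustrate why an 8-bit size field breaks past 255 entries, here is a small sketch that writes a count into a single byte and reads it back. This is not the SentState code, just a generic demonstration; `ByteLengthDemo` and `roundTripLength` are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical demo of an 8-bit length field; not the SentState implementation.
class ByteLengthDemo {
    /** Writes count as a single byte and reads it back as an unsigned value. */
    static int roundTripLength(int count) {
        ByteBuffer buf = ByteBuffer.allocate(1);
        buf.put((byte) count);   // the cast keeps only the low 8 bits
        buf.flip();
        return buf.get() & 0xFF; // interpret the byte as unsigned (0..255)
    }

    public static void main(String[] args) {
        System.out.println(roundTripLength(255)); // survives the round trip
        System.out.println(roundTripLength(256)); // wraps around to 0
        System.out.println(roundTripLength(300)); // wraps around to 44
    }
}
```

A reader that trusts the wrapped count would read too few entries and then misinterpret the rest of the message, which fits the kind of strange downstream errors described in this thread.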
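The savings can be put in numbers with a tiny sketch. It uses only the two figures from the post (4 bytes per ACK in the old layout, 7 bytes for the range-based header observed in testing) and does not model the real message layout; the names are hypothetical.

```java
// Hypothetical comparison of ACK header sizes using the figures quoted in the post.
class AckHeaderSize {
    static final int BYTES_PER_ACK = 4;      // old layout: one int per ACK
    static final int RANGE_HEADER_BYTES = 7; // new layout, as reported for a single range

    /** Header bytes spent on ACKs in the old per-ID layout. */
    static int oldHeaderBytes(int ackCount) {
        return BYTES_PER_ACK * ackCount;
    }

    /** Bytes saved by the range header for a given number of pending ACKs. */
    static int bytesSaved(int ackCount) {
        return oldHeaderBytes(ackCount) - RANGE_HEADER_BYTES;
    }

    public static void main(String[] args) {
        for (int acks : new int[] {2, 20, 128, 300}) {
            System.out.println(acks + " ACKs: " + oldHeaderBytes(acks)
                    + " bytes before, " + RANGE_HEADER_BYTES + " bytes now");
        }
    }
}
```

For a single pending ACK the old layout is actually smaller (4 bytes versus 7), but the range header stays constant while the old one grows with every unacknowledged message.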
Bottom line: the new code is much more stable, maybe a little faster, and a lot more line-efficient on the network.
…I can also run my 300 cycling objects test without any issues. A good sign.
Edit: do note that to build SimEthereal right now, you will also need to build SimMath. Should be as easy as gradle install in both projects (assuming you’re using gradle for your projects).