SimEthereal, BufferOverflowException


#1

I’ve occasionally been having a problem that crashes my game server.

07:22:25,916 ERROR [StateCollector] Collection error
java.nio.BufferOverflowException
	at java.nio.Buffer.nextPutIndex(Buffer.java:521) ~[?:1.8.0_172]
	at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:169) ~[?:1.8.0_172]
	at com.jme3.network.serializing.serializers.ByteSerializer.writeObject(ByteSerializer.java:51) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
	at com.jme3.network.serializing.serializers.ArraySerializer.writeArray(ArraySerializer.java:124) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
	at com.jme3.network.serializing.serializers.ArraySerializer.writeObject(ArraySerializer.java:109) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
	at com.jme3.network.serializing.serializers.FieldSerializer.writeObject(FieldSerializer.java:202) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
	at com.jme3.network.serializing.Serializer.writeClassAndObject(Serializer.java:458) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
	at com.jme3.network.base.MessageProtocol.messageToBuffer(MessageProtocol.java:73) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
	at com.jme3.network.base.DefaultServer$Connection.send(DefaultServer.java:582) ~[jme3-networking-3.3.0-SNAPSHOT.jar:3.3-6587]
	at com.simsilica.ethereal.net.StateWriter.endMessage(StateWriter.java:437) ~[sim-ethereal-1.2.1-SNAPSHOT.jar:?]
	at com.simsilica.ethereal.net.StateWriter.flush(StateWriter.java:451) ~[sim-ethereal-1.2.1-SNAPSHOT.jar:?]
	at com.simsilica.ethereal.NetworkStateListener.endFrameBlock(NetworkStateListener.java:226) ~[sim-ethereal-1.2.1-SNAPSHOT.jar:?]
	at com.simsilica.ethereal.zone.StateCollector.collect(StateCollector.java:262) [sim-ethereal-1.2.1-SNAPSHOT.jar:?]
	at com.simsilica.ethereal.zone.StateCollector$Runner.run(StateCollector.java:313) [sim-ethereal-1.2.1-SNAPSHOT.jar:?]

Any thoughts on how to troubleshoot this problem? I turned off message splitting with this

getService(EtherealHost.class).getStateListener(conn).setMaxMessageSize(65535); 

because of other problems we previously encountered.

Thanks!


#2

Your messages are getting too big for JME’s poopy Serializer. Since it’s based off of buffers instead of streams, the only way to write objects is to guess at some size and hope you don’t run out of RAM for picking a size too big. SpiderMonkey picks 32767 as the max message size… mostly because the data size in the protocol was already a signed short.
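
To make the buffer point concrete, here is a minimal sketch using only `java.nio`, not the actual SpiderMonkey code. The class and method names are made up for illustration, and the assumption is simply that the serializer writes into a fixed-size buffer capped at the signed-short maximum.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of jME or SimEthereal.
final class FixedBufferDemo {
    // Tries to write payloadSize bytes into a buffer of bufferSize bytes.
    // Returns true if the payload fits, false if the buffer overflows.
    static boolean fits(int bufferSize, int payloadSize) {
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        try {
            buffer.put(new byte[payloadSize]);
            return true;
        } catch (BufferOverflowException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int limit = Short.MAX_VALUE; // 32767, the largest signed short
        System.out.println("32000 bytes fit: " + fits(limit, 32000));
        System.out.println("65535 bytes fit: " + fits(limit, 65535));
    }
}
```

A buffer cannot grow the way a stream can, so anything past the guessed size fails with the same exception type as in the stack trace above.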

One of my big mistakes in life was not rewriting the Serializer when I rewrote the rest of SpiderMonkey… but now it is what it is.

Well, you should fix your real problem, then… since you’ve tied SimEthereal’s hands.


#3

SimEthereal crashes if I let it split messages, as per our previous messages regarding this (probably over a year ago now).

So my real problem is that SimEthereal has a message splitting bug I don’t know how to fix?


#4

In my timeline, a year ago might as well be a decade. :)

Are the messages getting too big because you have tons of objects or because bad connections are letting ACKs accumulate?

Edit: also are you running the latest SimEthereal or an older version?


#5

I get around 90 physics objects maximum at any given time.

I am running an older version because I didn’t want to update to the latest 4 days before releasing my game. (I think I am using version 1.2.1, from before you redid the time sync stuff.)

In the above error message I had 5 players joined.


#6

Did you address message splitting at all in the new release?


#7

> Did you address message splitting at all in the new release?

Did you check the release notes?


#8

Which new release? I’m still not sure which release you are running, so I can’t comment on what new changes might help fix/resolve/debug this issue. I’m pretty sure I haven’t fixed anything directly related to your issue. (A link to the previous thread could be helpful, by the way.)

The point is that even if I did somehow find and fix something then it’s still an uphill battle for you to use it anyway… so I’m unlikely to get much feedback on whether something I do fixes it or not. That’s all.

I’m going to try some tests to see if I can reproduce it locally.


#9

Missed this before. So ignore my comments about not knowing which version you are running.


#10

So in local testing, if I create a giant number of objects (for me)… like 200+ in the local zone… everything works fine until I leave the space that can see that zone.

I then get an error about bad things happening in the ack processing or something like that. Upping the message size to 32000 fixes that (and is small enough to avoid any issues with SpiderMonkey). So there seems to be something I can try to fix.

However, these parameters tend to make me worry about your setup. Like number of objects, whether you’ve tweaked the update rates, or if you are sending object updates faster than 60 Hz, etc.

In the normal setup, objects will be updated at some application-defined frequency… I guess usually 60 FPS or less on the server. By default, the state collector then bundles and sends these 20 times per second… so will try to send three frames per message.

Somehow, in your setup, you are managing to exceed 32767 bytes in message size for 1/20th of a second’s worth of data. Considering that 80 or so constantly moving/rotating objects can fit in under 1500 bytes, that seems pretty crazy.
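
The arithmetic behind those numbers can be sketched like this. It is only a back-of-the-envelope estimate: the class is hypothetical, and the second method assumes message size scales linearly with object count, which the real protocol does not promise.

```java
// Hypothetical helper for rough estimates, not SimEthereal code.
final class MessageBudget {
    // Frames bundled into one state message when the game updates at
    // updateHz and the state collector sends at sendHz.
    static int framesPerMessage(int updateHz, int sendHz) {
        return updateHz / sendHz;
    }

    // Rough object count that would fill limitBytes, ASSUMING size scales
    // linearly from a known data point (knownObjects in knownBytes).
    static int objectsAtLimit(int knownObjects, int knownBytes, int limitBytes) {
        return (int) ((long) limitBytes * knownObjects / knownBytes);
    }

    public static void main(String[] args) {
        System.out.println(framesPerMessage(60, 20));         // 3
        System.out.println(objectsAtLimit(80, 1500, 32767));  // 1747
    }
}
```

Under that linear assumption it would take well over a thousand busy objects to fill one message, so around 90 objects hitting the limit points at something other than plain object state.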

Now, clearly there is something bad going on when objects leave visibility… and I’m not willing to rule out that it is a cumulative error. (Though local testing has not indicated that it is.)


#11

For what it’s worth, I have locally solved the watchdog overflow problem (at least in prototype form), i.e., seeing this exception:

If that was your old problem then it may be fixed by the upcoming changes.

If your old problem was different then a link to the old thread/messages would be helpful.


#12

@pspeed this whole thread is déjà vu for me. Remember when I stress tested your library in production with 1500 simultaneous users about 3 years ago and found this bug? I spent a few weeks finding the root cause and patching my version of the code. I do seem to recall you fixing it at some point because today I am running your latest version.


#13

It’s possible… it also could have been something different.

Locally I’m able to replicate the problem by rapidly spawning objects. One per 0.1 second with a 30 second decay. Overall that’s a churn of about 300 objects where every 0.1 second 1 is created and 1 is removed… very taxing on the network code.
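
For anyone wanting to check the 300 figure, here is a small sketch of the same spawn/decay schedule. The class name and the millisecond-based simulation are illustrative assumptions, not the actual test code.

```java
import java.util.ArrayDeque;

// Hypothetical churn model, not the real stress test.
final class ChurnEstimate {
    // Steady-state live object count: one spawn per interval, fixed lifetime.
    static long steadyStateCount(long spawnIntervalMs, long lifetimeMs) {
        return lifetimeMs / spawnIntervalMs;
    }

    // Steps through time, expiring old objects and spawning one per interval,
    // and returns how many objects are alive after the last spawn.
    static int simulate(long spawnIntervalMs, long lifetimeMs, long durationMs) {
        ArrayDeque<Long> expiryTimes = new ArrayDeque<>();
        for (long now = 0; now < durationMs; now += spawnIntervalMs) {
            while (!expiryTimes.isEmpty() && expiryTimes.peekFirst() <= now) {
                expiryTimes.pollFirst(); // object decayed
            }
            expiryTimes.addLast(now + lifetimeMs); // new object spawned
        }
        return expiryTimes.size();
    }

    public static void main(String[] args) {
        System.out.println(steadyStateCount(100, 30_000)); // 300
        System.out.println(simulate(100, 30_000, 120_000)); // 300
    }
}
```

Once the first 30 seconds have passed, every spawn is matched by a removal, so the count holds at 300 while the set of IDs keeps changing.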

I’ve committed a fix to master that is available to anyone who builds from head.

Changes can be seen here:

Locally it fixes my problem.

This stress test is a bit unrealistic but it’s good for tracking down issues… as I already have two more problems to look into triggered by this test (but not a more realistic test).

Edit: note that most of the changes in the diff are the addition of trace logging. The actual fix is just to make the watchdog max variable based on message lag conditions.


#14

Hello Pspeed,

First, thank you, I really appreciate your help … I fully admit that debugging this thing is a bit beyond my skills as a programmer.

When we previously had issues, the size >= 128 line ‘fixed’ it; however, I never had the opportunity to test with more players. Since I released my game, I’ve been getting more players and then encountered this error.

I have not updated any of the update rates.

We previously had 80 objects max when I tested this the last time you asked about it. With 5 players it’s possible that it’s bumped up a little from there, but honestly not much.

I simply don’t create/destroy game objects quickly enough to get more than around 100 at any given time.

However, I do frequently change zones, very frequently in fact.

Here is a video to illustrate the game: https://www.twitch.tv/videos/385772848. Zone switching happens a fair amount, especially on larger maps. Perhaps it’s an issue with the scale I’m using? Each tile is 16 units and my grid size is 256 … the map in the video is 55x55 tiles I think?

From the discussion so far it sounds like I should update to the latest master and set my max message size to 32000. Does that sound correct?

Again, thank you for your help pspeed, your code has taught me so much, I am super grateful.

Wobblytrout


#15

Your game always looks so cool.

Definitely try master if you can. Your current message size is also way too big so yeah, if you set it at all then definitely lower it to 32000 or so.

Splits were causing the issue faster because the code was burning through message IDs faster. You may find that you don’t need to worry about that any more and can go back to the default. That being said, if you aren’t experiencing performance issues with the larger message size then you might as well leave it big (32000 or so) to avoid splitting at all.
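
As a rough illustration of why splitting burns through IDs faster, here is a sketch that ASSUMES every split part consumes its own message ID and that state goes out 20 times per second. Both the class and the 6000-byte payload are made up for the example.

```java
// Hypothetical estimate only; the real ID bookkeeping lives in SimEthereal.
final class SplitCount {
    // Number of messages needed to carry payloadBytes when each message
    // holds at most maxMessageSize bytes (ceiling division).
    static int messagesNeeded(int payloadBytes, int maxMessageSize) {
        return (payloadBytes + maxMessageSize - 1) / maxMessageSize;
    }

    // Message IDs consumed per second, assuming one ID per split part.
    static int idsPerSecond(int payloadBytes, int maxMessageSize, int sendHz) {
        return messagesNeeded(payloadBytes, maxMessageSize) * sendHz;
    }

    public static void main(String[] args) {
        System.out.println(idsPerSecond(6000, 1500, 20));  // 80
        System.out.println(idsPerSecond(6000, 32000, 20)); // 20
    }
}
```

With the same payload, a small max size uses four times as many IDs per second in this example, so any limit on outstanding IDs is reached that much sooner.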


#16

Thanks man, it means a lot … I couldn’t have made it without your help.

I will update all my stuff to master, thank you!

Wobblytrout


#17

Just a note: I’ve just committed some additional changes which should help a lot in these cases.

In my own local testing I have 300 active objects where one is being created every 0.1 seconds and one is being destroyed every 0.1 seconds. So there is a constant churn of about 300 objects.

I was seeing a lot of strange issues when crossing zone boundaries. Missing baseline messages, etc.

One thing that was nagging at me was that as the ACK lists grew (because of message lag or whatever), that array would take up more and more of the object state messages. This concerned me because SimEthereal tries to be a really tight protocol (I count every bit). I had the idea that mostly (always?) these ACK lists would be one contiguous set of values. I can’t really count on “always” so I decided to write a Set implementation that internally keeps track of ranges.

Thus a new class: IntRangeSet (and unit test suite)

I then converted the internal ACK tracking to use it.
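
For readers curious what a range-tracking set can look like, here is a minimal sketch. It is NOT the real IntRangeSet from SimEthereal: the class name, the methods, and the `TreeMap` layout are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a set of ints stored as merged inclusive ranges.
final class RangeSetSketch {
    // Maps each range start to its inclusive end; ranges never touch or overlap.
    private final TreeMap<Integer, Integer> ranges = new TreeMap<>();

    boolean contains(int value) {
        Map.Entry<Integer, Integer> e = ranges.floorEntry(value);
        return e != null && value <= e.getValue();
    }

    // Adds a value, merging with neighboring ranges. Returns false if present.
    boolean add(int value) {
        if (contains(value)) {
            return false;
        }
        int start = value;
        int end = value;
        // Extend the range that ends directly before this value, if any.
        Map.Entry<Integer, Integer> left = ranges.floorEntry(value);
        if (left != null && left.getValue() == value - 1) {
            start = left.getKey();
        }
        // Absorb the range that starts directly after this value, if any.
        if (value != Integer.MAX_VALUE) {
            Integer rightEnd = ranges.remove(value + 1);
            if (rightEnd != null) {
                end = rightEnd;
            }
        }
        ranges.put(start, end);
        return true;
    }

    // Number of separate ranges currently stored.
    int rangeCount() {
        return ranges.size();
    }

    // Total number of values across all ranges.
    long size() {
        long total = 0;
        for (Map.Entry<Integer, Integer> e : ranges.entrySet()) {
            total += (long) e.getValue() - e.getKey() + 1;
        }
        return total;
    }
}
```

Adding IDs 1 through 300 in order leaves a single stored range, which is the property that makes a contiguous run of ACKs cheap to track and send.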

When looking at the SentState message reading/writing, I noticed it has been calculating its header size wrong all this time… I also noticed that it was only using 8 bits for the array size. When SimEthereal was always throwing exceptions >= 128 IDs before this would never have been a problem… but now that I let the buffer grow based on message lag, anything over 255 ACKs in the array would cause issues.

Fortunately, all of that code has been replaced and now only ranges are sent… and in all of my testing, only ever one range. So instead of 4*ACK count bytes of header, it takes up 7 bytes of header. (And I could probably reduce that further by a byte or three.)
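
Both numbers are easy to check with a small sketch. The class below is hypothetical and only mirrors the figures quoted in this post (an 8-bit array size, 4 bytes per ACK, 7 bytes for one range); it is not the SentState code.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the header arithmetic, not SimEthereal code.
final class AckHeaderSketch {
    // Header bytes for a single range, as quoted in the post.
    static final int SINGLE_RANGE_HEADER_BYTES = 7;

    // Writes count into one byte and reads it back as an unsigned value,
    // which is what an 8-bit array size field can actually represent.
    static int countAsUnsignedByte(int count) {
        ByteBuffer buf = ByteBuffer.allocate(1);
        buf.put((byte) count);
        buf.flip();
        return buf.get() & 0xFF;
    }

    // Header bytes when every ACK is sent as its own 4-byte int.
    static int listHeaderBytes(int ackCount) {
        return 4 * ackCount;
    }

    public static void main(String[] args) {
        System.out.println(countAsUnsignedByte(255)); // 255
        System.out.println(countAsUnsignedByte(300)); // 44
        System.out.println(listHeaderBytes(300));     // 1200
        System.out.println(SINGLE_RANGE_HEADER_BYTES);
    }
}
```

A count of 300 silently reads back as 44 through an 8-bit field, and 300 individual ACKs would cost 1200 bytes of header against 7 for one range.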

Bottom line: the new code is much more stable, maybe a little faster, and a lot more line-efficient on the network.

…I can also run my 300 cycling objects test without any issues. A good sign.

Edit: do note that to build SimEthereal right now, you will also need to build SimMath. Should be as easy as gradle install in both projects (assuming you’re using gradle for your projects).


#18

Oh wow, that is awesome. Thank you pspeed!

I will update my game as soon as I can so I can try things out. Hopefully this weekend I’ll have enough time.

Thanks


#19

When I use gradle for dependencies, does ‘snapshot’ track the latest commit? Or do you not publish builds of inter-release commits to the jcenter/?-repositories?


#20

Good work by the way @pspeed