Refactoring SpiderMonkey

We are two students who are examining jMonkeyEngine as part of a Software Re-engineering course (it looks like our professor has chosen jMonkeyEngine before). One of the assignments is to refactor parts of the code (we have chosen the networking package) to make it more maintainable. We would like to make our efforts useful for you, the core maintainers, as well. Our question for you is: what is something that you would refactor if you had the time for it?

Tagging @pspeed as he is (AFAIK) the primary developer for SpiderMonkey.

Also, it’s not exactly refactoring, but I would be very happy to see built-in support for messages larger than 32 KB. Right now, attempting to send something too large causes a BufferOverflowException. You could add logic to split the bytes of a serialized message into several smaller “wrapper” messages (provided by the API in the backend; the user of the API never directly interfaces with this class), which are then automatically reassembled on receipt before the message is deserialized.

Right now, any developer wanting large messages (which are useful for a variety of things such as images and map data) needs to either write a separate networking system or write the aforementioned message chunking. I’ve been meaning to write something like this myself and submit a PR but I haven’t found the time lately.

EDIT: Based on the link in your post it seems like your course is mainly for maintaining software and not just refactoring. IMHO adding features counts as maintaining.

This person is casting about as they also sent me an e-mail. SpiderMonkey is fine and doesn’t need refactoring.

The message splitting you ask for is tough to put down in the core layers, as you would need to manage it on some level to keep it from swamping your other messages. Also, in these cases it’s best to compress the content when you can… another thing that’s easier for you to do before making the message. And once you’ve done that, simple message splitting is one static method on your end. On the other hand, it’s a whole bunch of nonsense inside the framework.

One day I will rewrite serialization to not use Buffers at all because it’s fairly ridiculous. It’s the one thing I kept when I rewrote spider monkey because swapping it out would have broken a bunch of folks. I’ve regretted it almost every day.

And wouldn’t that be a refactor? :chimpanzee_confused:

No, it’s swapping out one piece that would break literally every spidermonkey user. The rest of the networking code would hardly change… 99% would stay exactly the same, basically. That’s not really a refactoring of the networking library.

One of the things OP didn’t include in the post in this topic that they sent me in an e-mail is that they want to refactor without affecting existing users.

For the record, here is the e-mail correspondence:

On 3/9/2016 7:31 AM, Michiel Haisma wrote:

Dear Paul Speed,

We were examining the source code of the JME networking module, and
found it to be mostly built by you. As part of a course on Software
Re-engineering, we would like to ask you: what do you think are some
things that could be improved in the networking package, such that the
internal structure of the code would be improved, but no external
behavior would be changed?

When we do some refactorings on the networking package, we will open
a PR for this on the GH page, but we thought there might have been
some things on your mind that you would like to change, but didn’t
have any time for.

We would love to hear from you.

Kind Regards,

Michiel Haisma
Rowan Bottema

Master Students of
Delft University of Technology

My response:

I rewrote and redesigned it from the ground up already and those designs
and requirements still hold up. Other than serialization, (which is
impossible to fix without breaking everyone’s code), there is nothing
I’d change about the architecture at this point.

-Paul

It would be easy enough to make large messages get automatically gzipped or to add a flag similar to setReliable(). In my specific use case I’m just taking a message wrapper for a large already compressed byte array and splitting it up into several messages.
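The compress-before-sending idea mentioned above can be sketched with the JDK’s java.util.zip classes; the `Gzip` class and its method names here are illustrative, not part of SpiderMonkey:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class Gzip {

    // Compress a payload before chunking/sending it; worthwhile for text or
    // map data, pointless for already-compressed formats like JPEG.
    public static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Reverse of compress(), used on the receiving end before deserializing.
    public static byte[] decompress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) > 0) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }
}
```

Since this works on plain byte arrays, it composes naturally with any chunking done afterwards.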

Whenever I have used message splitting, I’ve needed to write a custom serializer + deserializer for my data, along with custom chunking on both the client and the server. Am I missing some much simpler way to implement this, or is there some built-in static method I’m not aware of?

EDIT: If the issue with buffers is that changing them would break everything, how is it any different from the changes to the registering of messages in alpha2?

Nothing broke with registering messages. If you want the old behavior you only have to remove the service that is sending the serializer data automatically. Then you get exactly the old behavior that there always was. Since 99.999999% of the questions regarding SpiderMonkey resolved to “did you remember to register messages on the client”… I chose to default to the version that won’t trip up new users. Old users merely get to stop doing something that was a pain to do anyway.

Switching from buffers to something else would break EVERYONE… like, completely rewrite all of your message classes and serializers. We will do it but it won’t be done lightly.

But using Buffers was a dumb approach that only got dumber with age. There were some maybe good reasons originally but they are completely blown away by lessons learned.

When I’ve done it, I just had a utility method to split a byte[] array into multiple messages and one to put it back together again. The messages themselves were simple byte[] array with an indicator for last message. Well, there was a message ID too so I knew where to collect them for rejoining.
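The split/rejoin utility described here might look something like the following sketch. The `Chunk` class, `CHUNK_SIZE`, and the method names are hypothetical stand-ins for the actual message classes you would register with the serializer:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Keep each piece well under the serializer's 32 KB buffer limit.
    public static final int CHUNK_SIZE = 16 * 1024;

    // One piece of a larger payload: id ties the pieces together,
    // last marks the final piece so the receiver knows when to rejoin.
    public static class Chunk {
        public final int id;
        public final boolean last;
        public final byte[] data;

        public Chunk(int id, boolean last, byte[] data) {
            this.id = id;
            this.last = last;
            this.data = data;
        }
    }

    // Split a payload into CHUNK_SIZE pieces. (A real implementation would
    // also need to handle empty payloads and preserve chunk ordering.)
    public static List<Chunk> split(int id, byte[] payload) {
        List<Chunk> chunks = new ArrayList<>();
        for (int off = 0; off < payload.length; off += CHUNK_SIZE) {
            int len = Math.min(CHUNK_SIZE, payload.length - off);
            byte[] part = new byte[len];
            System.arraycopy(payload, off, part, 0, len);
            chunks.add(new Chunk(id, off + len >= payload.length, part));
        }
        return chunks;
    }

    // Rejoin pieces collected on the receiving end, in order.
    public static byte[] join(List<Chunk> chunks) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        for (Chunk c : chunks) {
            bos.write(c.data, 0, c.data.length);
        }
        return bos.toByteArray();
    }
}
```

The receiving end would collect chunks per id in a map and call `join()` when the `last` flag arrives.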

The thing is, to build this into the core would require somewhere to keep all of that stuff while it’s being built. That being said, now that there is a service model, you could probably write one to do it. Some kind of streaming service would be nice to have.

…it doesn’t belong down in the message layer, though.

Put another way… here is the crux of the problem down inside spider monkey:

How big do we make the Buffer so that we can serialize your message to tell that it’s too big for the buffer? And then what if it’s too big for that buffer?

That’s the problem with using java.nio.Buffer for serialization.

Anything that needs to send something of arbitrary size will have to do it through some different mechanism that doesn’t serialize through a Buffer. If you write a good streaming service using the new service model I’m sure that it would be great for consideration. All of the challenges you will have to deal with would have been challenges that the SM message layer would somehow have had to deal with… but you’d have the benefit of getting to define a streaming interface, at least.

Would it be considered acceptable if it used java.io.* rather than nio? I know that nio performs faster, but I doubt it would be much of an issue on modern multicore computers if each stream had its own thread. If this is okay, then it would probably be as simple as:

DataOutputStream dos = new DataOutputStream(socket.getOutputStream());

while (!socket.isClosed()) {
    // Length-prefix each serialized message so the receiver knows how much to read
    byte[] messageData = Serializer.serialize(getNextMessage());
    dos.writeInt(messageData.length);
    dos.write(messageData);
    dos.flush();
}

(of course that code assumes that the existing serializer is structured in a way that the modification to a byte array instead of a buffer would be easy enough, and it is also missing a lot of the code needed to manage threading, but that stuff is basically just boilerplate)
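The receiving side of that length-prefixed framing could be written the same way; this `Framing` class is a self-contained sketch of the idea, not existing SpiderMonkey API:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class Framing {

    // Write one length-prefixed frame, matching the write loop above.
    public static void writeFrame(DataOutputStream out, byte[] data) throws IOException {
        out.writeInt(data.length);
        out.write(data);
        out.flush();
    }

    // Read one length-prefixed frame; readFully blocks until the whole
    // frame has arrived, which is fine on a dedicated reader thread.
    public static byte[] readFrame(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] data = new byte[length];
        in.readFully(data);
        return data;
    }
}
```

A reader thread would loop on `readFrame()` and hand each byte array to the deserializer.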

Also, if there end up being new serializer methods that return byte[] arrays (perhaps built with a ByteArrayOutputStream) rather than buffers, it may be useful to deprecate the buffer methods for a few versions and have both available to give people time to switch.

Yes, of course that’s possible… but it requires completely rewriting all of the serializer package. Basically, every serializer already written would have to be rewritten. Every existing custom serializer anyone has written would have to be rewritten.

…and if I’m going to do that, I’ll do even better and use bit streams. If we’re going to replace it then we might as well replace it with something that is really better in every way. I’ll even replace the crappy serializer registration with something better. I’ve grown to hate that entire package… really, really loathe it. Its only saving grace is that it is 100x easier to keep than to replace, and all of my personal needs can be met on top of the existing serializer with little effort.

Ultimately, you’re still going to want to break big things into smaller chunks to send them… and then have something on the other end to collect them. Otherwise you have to allocate huge amounts of RAM just to hold the traffic. Better to let the user be involved in that, really.

So a streaming service would be a nice thing to have regardless. You’d have to have some way to coordinate the “hey, you’ve got a stream” handshake on the receiving end, but other than that it’s pretty straightforward. It could be: grab the streaming service, create a stream, and get an ID. Send that ID to the other end in a regular message to let it know it needs to start sucking data. Then at that point, the sending end could be writing to a regular OutputStream and the receiving end could be reading from a regular InputStream (with message magic under the covers).

…that service could adapt to whatever we do with serializer in the future.
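As a rough illustration of that stream-ID handshake, here is an in-memory stand-in; a real service would move the bytes as ordinary messages under the covers, and none of these names exist in SpiderMonkey:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical streaming service: the sender opens a stream and announces its
// ID in a regular message; the receiver then accepts the stream by ID and
// reads from a plain InputStream.
public class InMemoryStreamingService {
    private final Map<Integer, PipedInputStream> pending = new HashMap<>();

    // Sender side: returns an OutputStream to write the bulk data to.
    public synchronized OutputStream openStream(int streamId) throws IOException {
        PipedInputStream in = new PipedInputStream(64 * 1024);
        pending.put(streamId, in);
        return new PipedOutputStream(in);
    }

    // Receiver side: called after the "you've got a stream" message arrives.
    public synchronized InputStream acceptStream(int streamId) {
        return pending.remove(streamId);
    }
}
```

The point is only the shape of the API: plain streams at the edges, with the transport hidden behind them.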

Random:
For an example of the bit streams that will someday be the foundation of a new serializer, you can look here:

By the way, this right here misunderstands how the network layer works in a pretty fundamental way. Never once does anything at the SM client/server level get access to a socket. Messages would be impossible to broadcast, you’d be blocking whole data paths, etc… just bad news.

Under the covers there is a kernel that routes messages appropriately and buffers the responses in receiver thread pools and such.

Messages will always be sent in discrete chunks. We’re just talking about how those chunks get made.

I was writing that example more to show how a message could be encoded over the network rather than the specific implementation.

Ok, just making sure.

Messages are serialized and added to an outbound queue where a separate thread makes sure they get written. That’s why send() is basically instantaneous and does not block the application (as it would if we sent it on the same thread).
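That serialize-and-queue pattern might be sketched as follows; `OutboundQueue` and its method names are illustrative, and a real writer thread would block on `queue.take()` rather than polling:

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OutboundQueue {
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

    // Called from the application thread; returns immediately instead of
    // blocking on network I/O.
    public void send(byte[] serializedMessage) {
        queue.add(serializedMessage);
    }

    // Called repeatedly by a dedicated writer thread. Writes one queued
    // message with the same length-prefixed framing as earlier; returns
    // false if the queue was empty.
    public boolean writeOne(DataOutputStream out) throws IOException {
        byte[] data = queue.poll();
        if (data == null) {
            return false;
        }
        out.writeInt(data.length);
        out.write(data);
        out.flush();
        return true;
    }
}
```

Separating `send()` from the actual write is what makes sending appear instantaneous to the caller.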

I was thinking of something similar as well except where the message object is added to a queue and serialized on the network thread. Would a system like the following be good?:

  1. One ServerSocket thread on server
  2. Accepts connections and then creates new Kernel object to read, write, and process messages
  3. Each Kernel gets its own thread
  4. Read kernel just reads from a DataInputStream similar to my previous example (on its own thread, so blocking isn’t an issue)
  5. Read kernel passes data to a message processing queue where the messages are deserialized and listeners called
  6. Write kernel keeps an ArrayList of queued messages and serializes + sends them as needed using interrupts
  7. The same read+write kernels are used on the client and the server. The difference is how they are constructed.

In this case, although Sockets are being used, there isn’t an issue since each socket is a private field of a Kernel object so I don’t think that this violates the rule of keeping those kinds of objects private.

I don’t know the NIO library nor do I have the time to learn it so I can’t implement anything using that right now.

Another possibility (albeit perhaps not a good one) is to have the serializer write directly to the DataOutputStream of the socket so that messages can start being sent as they are serialized. This is what I used in addition to the thread described above when I implemented a non-JME game that needed a networking framework.

Why not just add a setBufferSize() method? If an NIO buffer’s size can’t be changed on the fly, it should be easy enough to construct a new one. That way, each user can choose what they need.

The other limitation is that the message size is sent as two bytes… so you are limited to 65535 max anyway.

No matter what… every single thing about your networked game will be happier if you break up your huge data into manageable chunks. Whether that’s 32k or 64k or 128k doesn’t matter. There could be any number of things that would like to get a message through on that channel that you’ve blocked with the 2 meg image you are sending as one giant stream. Everything will be happier if there is some max size, and there is always going to be a case where you want to send something bigger.

So why not implement a streaming service to handle that case?

Fair enough. I was trying to think of a temporary solution until something better can be implemented. Also, doesn’t SpiderMonkey support multiple channels, each with its own connection (I never saw anything about this on the wiki but I’ve seen people talk about it on the forums)? If so, then it would be easy enough to have a separate buffer size per channel.

Yeah, but there still could be reasons to limit the size of messages. Right now, if SpiderMonkey drops a connection then you are done. Someday I’d like to implement a reconnect under the covers. That’s going to be pretty impossible if channels are tied up sending megabytes of data.

The service described is not hard to write. Someone just needs to do it. Volunteers welcome. It’s easier to do than anything proposed so far by a longshot.

If there are two channels, the main one and a bulk one and the main one disconnects, what prevents the main one from reconnecting while the bulk one sends data? Web browsers do this all the time (continuing to browse while a large file is downloading).

Are you referring to the service I described or the idea of using bitstreams (I’m not even sure what those are – Google gives nothing) to re-implement the serializer?