GZIPSerializer creates too big buffers

Hi,

I have a question about the GZIPSerializer/ZIPSerializer. Is there a reason why the whole temp buffer is written to the output buffer?
The output array ends up a lot smaller if only the used portion of the temp buffer is written, like this:

public void writeObject(ByteBuffer buffer, Object object) throws IOException
{
    if (!(object instanceof GZIPCompressedMessage)) return;
    Message message = ((GZIPCompressedMessage)object).getMessage();

    ByteBuffer tempBuffer = ByteBuffer.allocate(512000);
    Serializer.writeClassAndObject(tempBuffer, message);

    ByteArrayOutputStream byteArrayOutput = new ByteArrayOutputStream();
    GZIPOutputStream gzipOutput = new GZIPOutputStream(byteArrayOutput);

    // ORIGINAL
    //gzipOutput.write(tempBuffer.array());

    // MODIFIED
    tempBuffer.flip();
    gzipOutput.write(tempBuffer.array(), 0, tempBuffer.limit());

    gzipOutput.flush();
    gzipOutput.finish();
    gzipOutput.close();

    buffer.put(byteArrayOutput.toByteArray());
}
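
The difference is that array() returns the entire 512000-byte backing array no matter how much of it was actually written, so the original line compresses mostly unused zeroes. An equivalent sketch of the fix that skips flip() and reads position() directly (assuming the buffer has an accessible backing array):

    // Without flip(): position() is the number of bytes actually written into
    // tempBuffer, so compress only that prefix instead of the whole backing array.
    int used = tempBuffer.position();
    gzipOutput.write(tempBuffer.array(), 0, used);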

regards,

Alrik

(shrug) I think probably no one uses this class.

I use it, and I think there are enough other developers who use it.

Are there any alternatives? What do you use?

In all of my messages, I’ve only ever needed compression once… and in that case, data-specific compression was better and I compressed the data instead of compressing the message every time I needed to send it.

Code that uses SpiderMonkey is meant to be fast… so compression almost never comes up. Code that uses SpiderMonkey also needs to understand that messages can only be a max of 64k… often the kind of data that needs compression is unbounded, so SpiderMonkey messages are unsuitable in the first place, or the data will already be broken up… in which case it’s better to compress before breaking it up.
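
To illustrate that last point, here is a rough sketch of compressing the whole payload once and then splitting the result into message-sized chunks (the class name, helper, and 60000-byte chunk size are made up for illustration; this is not SpiderMonkey API):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.zip.GZIPOutputStream;

    public class CompressThenSplit {

        // Stay under the 64k message ceiling; the exact margin is an assumption.
        private static final int MAX_CHUNK = 60000;

        // Compress the whole payload once, then cut the compressed result into
        // chunks that each fit inside a single message.
        public static List<byte[]> compressAndSplit(byte[] payload) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
                gzip.write(payload);
            }
            byte[] compressed = out.toByteArray();

            List<byte[]> chunks = new ArrayList<byte[]>();
            for (int offset = 0; offset < compressed.length; offset += MAX_CHUNK) {
                int length = Math.min(MAX_CHUNK, compressed.length - offset);
                byte[] chunk = new byte[length];
                System.arraycopy(compressed, offset, chunk, 0, length);
                chunks.add(chunk);
            }
            return chunks;
        }
    }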

It doesn’t matter how you use the compression feature. It’s an improvement, so why not simply implement it? You can save up to 50% of the data.

Good question, source code is here:

Because usually when doing real-time networking, you care more about speed than saving a few bytes… and compression is slow.

Edit: and to be clear: 90% of SpiderMonkey messages in a typical game are less than 100 bytes… compression is completely wasted in those cases. If you find you are sending lots of huge messages then you are doing something wrong.

How do you send your block data in Mythruna if compression is so slow and bad, and sending lots of huge messages is wrong?
pull request

I already answered this question. I do it in a data-specific way (run length encoded + gzipped) but I don’t do it per message… because then it would have to be done EVERY TIME I sent it… versus just doing it when it changes. Even better, I can store it compressed, too.

Edit: and note as also mentioned earlier, in my case, even after run length encoding and zipping the data, it might be too big for a single message and must be split. And this split has to happen after compression. So the data in the messages is already compressed and there is no reason (in fact many reasons not) to use the GZip serializer.

…and this is likely to be true for any messages large enough for zipping to really matter more than speed.
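
A rough sketch of that data-specific approach (byte-wise run-length encoding, then gzip), just to illustrate the idea; this is not Mythruna’s actual code:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPOutputStream;

    public class BlockDataCompressor {

        // Byte-wise run-length encoding: each run becomes a (count, value) pair.
        public static byte[] runLengthEncode(byte[] data) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int i = 0;
            while (i < data.length) {
                byte value = data[i];
                int run = 1;
                while (i + run < data.length && data[i + run] == value && run < 255) {
                    run++;
                }
                out.write(run);   // run length 1..255 fits in one unsigned byte
                out.write(value);
                i += run;
            }
            return out.toByteArray();
        }

        // RLE first, then gzip; the result can be stored and reused until the
        // block data actually changes, instead of compressing per message.
        public static byte[] compressBlocks(byte[] blockData) throws IOException {
            byte[] rle = runLengthEncode(blockData);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
                gzip.write(rle);
            }
            return out.toByteArray();
        }
    }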

It takes about 15-30 microseconds for my six-year-old i7 950 CPU to pack 100 bytes (not really a problem if you offload it to a secondary core using an async method; that time is nothing if your program runs over the Internet). The question is also how many times per second these packets are sent. The risk of broken packets increases with size, and that might trigger retransmissions if you’re using TCP.
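
A minimal sketch of that kind of offloading (one dedicated thread doing the gzip off the main loop; the class and method names are made up):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.zip.GZIPOutputStream;

    public class AsyncCompressor {

        // One dedicated worker thread keeps compression off the main/game thread.
        private final ExecutorService worker = Executors.newSingleThreadExecutor();

        // Compress on the worker thread; the caller gets a future with the packed bytes.
        public CompletableFuture<byte[]> compressAsync(final byte[] data) {
            return CompletableFuture.supplyAsync(() -> {
                try {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
                        gzip.write(data);
                    }
                    return out.toByteArray();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, worker);
        }

        public void shutdown() {
            worker.shutdown();
        }
    }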

What you should consider before using GZIP or Deflate is how well your data actually compresses. My evil random method here doesn’t exactly generate good patterns, so you will end up with larger packets, but you might get a better result if you’re sending text, for example, or low-numbered integers instead of floats.

I don’t know what kind of game the OP is writing, so I suppose you need to do some real-world testing first. I wrote a game earlier that downloaded worlds via this API, and it was faster if I packed the data, but I never packed commands.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

public class GzipTest {

    private static final Random RANDOM = new Random();

    public static void main(String[] args) throws IOException {
        testGzip();
        testDeflate();
    }

    public static void testGzip() throws IOException {
        final List<Long> times = new ArrayList<Long>();
        final List<Integer> sizes = new ArrayList<Integer>();
        for (int i = 0; i < 200000; i++) {

            final byte[] bs = getData();
            final long start = System.nanoTime();

            try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {

                try (GZIPOutputStream gzipOutputStream = new GZIPOutputStream(outputStream)) {
                    gzipOutputStream.write(bs);
                    gzipOutputStream.close();

                    final long end = System.nanoTime();

                    times.add(end - start);
                    sizes.add(outputStream.toByteArray().length);
                }
            }
        }

        long totalTimes = 0;
        for (Long n : times) {
            totalTimes += n;
        }

        long totalSize = 0;
        for (Integer n : sizes) {
            totalSize += n;
        }

        System.out.println("GZIP; Average: " + (totalTimes / (long) times.size()) + ", size: " + (totalSize / (long) sizes.size()));

    }

    public static void testDeflate() throws IOException {
        final List<Long> times = new ArrayList<Long>();
        final List<Integer> sizes = new ArrayList<Integer>();
        for (int i = 0; i < 200000; i++) {

            final byte[] bs = getData();
            final long start = System.nanoTime();

            try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {

                try (DeflaterOutputStream deflateOutputStream = new DeflaterOutputStream(outputStream)) {
                    deflateOutputStream.write(bs);
                    deflateOutputStream.close();

                    final long end = System.nanoTime();

                    times.add(end - start);
                    sizes.add(outputStream.toByteArray().length);
                }
            }
        }

        long totalTimes = 0;
        for (Long n : times) {
            totalTimes += n;
        }

        long totalSize = 0;
        for (Integer n : sizes) {
            totalSize += n;
        }

        System.out.println("Deflate; Average: " + (totalTimes / (long) times.size()) + ", size: " + (totalSize / (long) sizes.size()));

    }

    public static byte[] getData() {
        final byte[] bs = new byte[100];

        for (int i = 0; i < bs.length; i++) {
            bs[i] = (byte) RANDOM.nextInt(255);
        }

        return bs;
    }
}

Output:

GZIP; Average: 25391, size: 123
Deflate; Average: 27737, size: 111

BTW: your microbenchmark is meaningless… even by the standards of microbenchmarks, which are already misleading.

You need to measure things in the aggregate (outside your loops) and then compare it to something else. On its own the data doesn’t really mean anything.

It would also need to be run several times in order to “warm up” HotSpot (which would potentially improve your times).
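
For example, a rough sketch of measuring in the aggregate with a warm-up pass (compressOnce is a hypothetical helper that does one gzip pass over the data; getData() is the method from the benchmark above):

    // Hypothetical helper: one gzip pass over the data; the size is returned so
    // the JIT cannot dead-code-eliminate the work.
    static int compressOnce(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        }
        return out.size();
    }

    static void benchmark() throws IOException {
        long blackhole = 0;

        // Warm-up: let HotSpot JIT-compile the code paths before timing anything.
        for (int i = 0; i < 20000; i++) {
            blackhole += compressOnce(getData());
        }

        // Measure in the aggregate: one clock read around the whole batch,
        // not one per iteration.
        long start = System.nanoTime();
        for (int i = 0; i < 200000; i++) {
            blackhole += compressOnce(getData());
        }
        long totalMillis = (System.nanoTime() - start) / 1000000;
        System.out.println("Total: " + totalMillis + " ms (" + blackhole + ")");
    }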

For anything less than the size of an MTU, it makes no sense to compress it at all. It’s a waste of time (however small, it’s still a waste) and it definitely creates unnecessary garbage.

For a tiny buffer like that, I’m surprised GZip even zips it at all.
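
One way to encode that rule of thumb (the 1400-byte threshold is just a guess at a typical MTU, not a SpiderMonkey constant, and a real implementation would also need a flag so the receiver knows whether the payload was compressed):

    // Only bother compressing payloads that would not fit in a single datagram
    // anyway; anything smaller just burns CPU and creates garbage for no benefit.
    private static final int MTU_THRESHOLD = 1400;

    static byte[] maybeCompress(byte[] payload) throws IOException {
        if (payload.length <= MTU_THRESHOLD) {
            return payload;                      // too small to be worth it
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(payload);
        }
        byte[] compressed = out.toByteArray();
        // If compression did not actually help (e.g. random or already-compressed
        // data), keep the original.
        return compressed.length < payload.length ? compressed : payload;
    }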