ImageRaster lets you get and set pixels on jME3 images

@Momoko_Fan is still away on vacation so some questions might be left unanswered until he gets back, fyi.

@kwando said:
Is @zarch's ImagePainter available for the jme3 ImageRaster implementation?


No. I'm waiting to find out what's happening with it before I do anything more. At the moment the jme3 approach performs significantly worse (taking up to twice as long) in some very common cases, so I'm reluctant to convert over to it even though I'd much rather be using a standard interface.

There were some changes to ImageRaster. First, it now supports Android, but that also requires some changes on the part of the user: you now use ImageRaster.create() instead of new ImageRaster() to create a new instance. Sorry for the inconvenience. Second, component reading was changed to set the buffer position and then use relative get/set, rather than the indexed versions of get/set. The performance has not been tested yet; however, it should be faster based on what @zarch said earlier.
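To make the second change concrete, here is a minimal sketch of the two access styles using plain java.nio.ByteBuffer rather than ImageRaster itself. The class and method names are hypothetical, and the 4-bytes-per-pixel layout is an assumption for illustration; this is not the actual ImageRaster code.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of jME3.
class BufferAccessDemo {
    static final int BYTES_PER_PIXEL = 4; // assumption: 4 one-byte components per pixel

    // Indexed (absolute) access: every get() carries its own index.
    static int readIndexed(ByteBuffer buf, int pixel) {
        int base = pixel * BYTES_PER_PIXEL;
        int c0 = buf.get(base) & 0xFF;
        int c1 = buf.get(base + 1) & 0xFF;
        int c2 = buf.get(base + 2) & 0xFF;
        int c3 = buf.get(base + 3) & 0xFF;
        return (c0 << 24) | (c1 << 16) | (c2 << 8) | c3;
    }

    // Position once, then relative get(): each call advances the position.
    // Note that this changes the buffer's position as a side effect.
    static int readPositioned(ByteBuffer buf, int pixel) {
        buf.position(pixel * BYTES_PER_PIXEL);
        int c0 = buf.get() & 0xFF;
        int c1 = buf.get() & 0xFF;
        int c2 = buf.get() & 0xFF;
        int c3 = buf.get() & 0xFF;
        return (c0 << 24) | (c1 << 16) | (c2 << 8) | c3;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(2 * BYTES_PER_PIXEL);
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, (byte) (i + 1));
        }
        System.out.println(Integer.toHexString(readIndexed(buf, 1)));
        System.out.println(Integer.toHexString(readPositioned(buf, 1)));
    }
}
```

Both methods return the same packed value; which one is faster is exactly what would need measuring.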

I’ve not looked at the changes but I did an update and build then re-ran my speed test. Here are the results:



[java]
Sep 10, 2012 8:58:00 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode ABGR8 to ABGR8
Sep 10, 2012 8:58:00 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 107,927,063 (107,927) New 173,836,979 (173,836)
Sep 10, 2012 8:58:00 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode ABGR8 to BGR8
Sep 10, 2012 8:58:00 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 95,341,431 (95,341) New 161,138,176 (161,138)
Sep 10, 2012 8:58:00 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode ABGR8 to Luminance8
Sep 10, 2012 8:58:00 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 85,233,541 (85,233) New 152,669,606 (152,669)
Sep 10, 2012 8:58:00 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode ABGR8 to RGBA8
Sep 10, 2012 8:58:01 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 99,084,760 (99,084) New 201,516,303 (201,516)
Sep 10, 2012 8:58:01 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode ABGR8 to RGB8
Sep 10, 2012 8:58:01 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 94,998,918 (94,998) New 159,730,300 (159,730)
Sep 10, 2012 8:58:01 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode BGR8 to ABGR8
Sep 10, 2012 8:58:01 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 96,609,120 (96,609) New 161,118,664 (161,118)
Sep 10, 2012 8:58:01 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode BGR8 to BGR8
Sep 10, 2012 8:58:02 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 92,451,532 (92,451) New 152,588,556 (152,588)
Sep 10, 2012 8:58:02 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode BGR8 to Luminance8
Sep 10, 2012 8:58:02 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 78,748,605 (78,748) New 146,252,513 (146,252)
Sep 10, 2012 8:58:02 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode BGR8 to RGBA8
Sep 10, 2012 8:58:02 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 96,791,333 (96,791) New 192,716,628 (192,716)
Sep 10, 2012 8:58:02 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode BGR8 to RGB8
Sep 10, 2012 8:58:02 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 92,063,691 (92,063) New 155,325,058 (155,325)
Sep 10, 2012 8:58:02 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode Luminance8 to ABGR8
Sep 10, 2012 8:58:03 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 84,493,581 (84,493) New 143,265,955 (143,265)
Sep 10, 2012 8:58:03 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode Luminance8 to BGR8
Sep 10, 2012 8:58:03 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 78,552,283 (78,552) New 138,670,095 (138,670)
Sep 10, 2012 8:58:03 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode Luminance8 to Luminance8
Sep 10, 2012 8:58:03 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 67,882,744 (67,882) New 131,867,261 (131,867)
Sep 10, 2012 8:58:03 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode Luminance8 to RGBA8
Sep 10, 2012 8:58:03 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 87,393,085 (87,393) New 177,470,740 (177,470)
Sep 10, 2012 8:58:03 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode Luminance8 to RGB8
Sep 10, 2012 8:58:03 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 79,037,385 (79,037) New 135,035,432 (135,035)
Sep 10, 2012 8:58:03 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode RGBA8 to ABGR8
Sep 10, 2012 8:58:04 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 100,557,777 (100,557) New 193,281,279 (193,281)
Sep 10, 2012 8:58:04 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode RGBA8 to BGR8
Sep 10, 2012 8:58:04 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 96,456,625 (96,456) New 184,435,974 (184,435)
Sep 10, 2012 8:58:04 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode RGBA8 to Luminance8
Sep 10, 2012 8:58:04 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 85,373,127 (85,373) New 179,167,996 (179,167)
Sep 10, 2012 8:58:04 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode RGBA8 to RGBA8
Sep 10, 2012 8:58:05 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 100,695,563 (100,695) New 225,909,783 (225,909)
Sep 10, 2012 8:58:05 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode RGBA8 to RGB8
Sep 10, 2012 8:58:05 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 95,484,620 (95,484) New 183,228,022 (183,228)
Sep 10, 2012 8:58:05 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode RGB8 to ABGR8
Sep 10, 2012 8:58:05 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 98,328,590 (98,328) New 160,727,521 (160,727)
Sep 10, 2012 8:58:05 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode RGB8 to BGR8
Sep 10, 2012 8:58:05 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 91,586,394 (91,586) New 153,491,818 (153,491)
Sep 10, 2012 8:58:05 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode RGB8 to Luminance8
Sep 10, 2012 8:58:06 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 79,065,303 (79,065) New 146,232,101 (146,232)
Sep 10, 2012 8:58:06 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode RGB8 to RGBA8
Sep 10, 2012 8:58:06 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 96,975,648 (96,975) New 192,894,939 (192,894)
Sep 10, 2012 8:58:06 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Transcode RGB8 to RGB8
Sep 10, 2012 8:58:06 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 91,448,008 (91,448) New 152,153,885 (152,153)
[/java]



My machine must be running a little slow, as the old timings are slightly worse too (1 to 2% worse), but whatever change you’ve made has slowed it down even more (15 to 20% worse than the timings we had before).



For example:

[java]INFO: Transcode RGB8 to RGB8
Aug 16, 2012 5:11:17 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 89,781,070 (89,781) New 132,468,836 (132,468)[/java]



Has gone to:



[java]INFO: Transcode RGB8 to RGB8
Sep 10, 2012 8:58:06 PM jme3test.texture.ImageAndImagePainter speedTest
INFO: Time taken: Old 91,448,008 (91,448) New 152,153,885 (152,153)[/java]



Your approach is fundamentally slower, so I don’t know how much you are going to be able to achieve. The best way would be to split out the image raster functionality using interfaces: have yours as the generic “fallback” solution and then use optimised implementations for the common formats.



I could make the changes easily enough…but…
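For anyone following along, the interface split suggested above could look roughly like this. It is only a sketch: PixelReader, GenericReader, Rgba8Reader and PixelReaders are hypothetical names, not jME3 classes, and the component offsets are assumptions about how each 8-bit format is laid out in memory.

```java
import java.nio.ByteBuffer;

// Hypothetical interface: read one pixel packed as 0xRRGGBBAA.
interface PixelReader {
    int readRgba(ByteBuffer data, int pixelIndex);
}

// Generic fallback: works for any byte-per-component layout described by offsets.
final class GenericReader implements PixelReader {
    private final int bytesPerPixel;
    private final int rOff, gOff, bOff, aOff; // -1 means the component is absent

    GenericReader(int bytesPerPixel, int rOff, int gOff, int bOff, int aOff) {
        this.bytesPerPixel = bytesPerPixel;
        this.rOff = rOff;
        this.gOff = gOff;
        this.bOff = bOff;
        this.aOff = aOff;
    }

    private static int component(ByteBuffer d, int base, int off, int missing) {
        return off < 0 ? missing : d.get(base + off) & 0xFF;
    }

    @Override
    public int readRgba(ByteBuffer data, int pixelIndex) {
        int base = pixelIndex * bytesPerPixel;
        int r = component(data, base, rOff, 0);
        int g = component(data, base, gOff, 0);
        int b = component(data, base, bOff, 0);
        int a = component(data, base, aOff, 0xFF); // opaque when there is no alpha
        return (r << 24) | (g << 16) | (b << 8) | a;
    }
}

// Optimised path for one common format: a single 4-byte read.
// Assumes the buffer uses the default BIG_ENDIAN byte order.
final class Rgba8Reader implements PixelReader {
    @Override
    public int readRgba(ByteBuffer data, int pixelIndex) {
        return data.getInt(pixelIndex * 4);
    }
}

// Picks the specialised reader where one exists, the fallback otherwise.
final class PixelReaders {
    static PixelReader forFormat(String format) {
        switch (format) {
            case "RGBA8":      return new Rgba8Reader();
            case "RGB8":       return new GenericReader(3, 0, 1, 2, -1);
            case "BGR8":       return new GenericReader(3, 2, 1, 0, -1);
            case "Luminance8": return new GenericReader(1, 0, 0, 0, -1);
            default: throw new IllegalArgumentException("Unsupported format: " + format);
        }
    }
}
```

Callers only ever see PixelReader, so a format can be moved from the fallback to a hand-tuned implementation without touching the calling code.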


With the recent changes, there are actual speed improvements… My profiler is showing a 100% speed improvement vs. before the improvements (For the byte component formats only).



In any case, the whole purpose for getPixel/setPixel was never actually performance, but compatibility with most image formats, so that things like lightmap generators could be done without having to know the image format to read it. If you’re making a system like ImagePainter, it is much faster to do so by using a standard image format like RGBA8, and having a backing byte array which only when needed writes to the underlying ByteBuffer. Doing so avoids the expensive JNI calls that write/read one byte at a time.

@Momoko_Fan said:
With the recent changes, there are actual speed improvements... My profiler is showing a 100% speed improvement vs. before the improvements (For the byte component formats only).

In any case, the whole purpose for getPixel/setPixel was never actually performance, but compatibility with most image formats, so that things like lightmap generators could be done without having to know the image format to read it. If you're making a system like ImagePainter, it is much faster to do so by using a standard image format like RGBA8, and having a backing byte array which only when needed writes to the underlying ByteBuffer. Doing so avoids the expensive JNI calls that write/read one byte at a time.


Interesting, I'll do another update and do some more testing when I get a chance. What are you using to do your speed comparisons?

Given this is a game engine, performance is important. Whether or not you have a need for performance, other people (myself included) do.

Your suggestion of using a buffer in a standard image format and copying only pays off in situations where you have more than 3x overdraw, as otherwise the overhead of copying into and out of the backing byte array (once to write to the backing array, once to read it, once to write to the real destination) is larger than any potential savings. That's leaving aside the cost of the extra memory allocations and suchlike. Consider, for example, an Android application which might decide to use RGBA4444 or RGB565. With an intermediate format you would allocate and read/write twice as much data as you need, only to then downsample it to the final format. Or, even worse, imagine the final format was something like a 32-bit floating point one: all the potential accuracy of those floats has been thrown away and clipped at 8 bits per channel.

In my case I have maybe around 1.3x overdraw, so a backing array very definitely would not be more efficient. In fact, I'd be surprised if any serious algorithm performed that level of overdraw, since you would be repeatedly drawing and then throwing away pixels, which is very wasteful.
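A small sketch of the two costs of an 8-bit intermediate format mentioned above: a float channel loses precision (and anything above 1.0 is clipped) when it passes through 8 bits, and an RGB565 target only keeps 16 of the 32 bits an RGBA8 intermediate would hold. The class and method names are hypothetical and the packing layout is the usual 5-6-5 bit split, assumed here for illustration.

```java
// Hypothetical demo class, not part of jME3.
class IntermediateFormatCost {
    static final int RGBA8_BYTES_PER_PIXEL = 4;
    static final int RGB565_BYTES_PER_PIXEL = 2;

    // Push a float channel through an 8-bit intermediate and back again.
    static float through8Bits(float v) {
        float clamped = Math.max(0f, Math.min(1f, v)); // values outside [0, 1] are clipped
        int quantised = Math.round(clamped * 255f);    // only 256 distinct levels survive
        return quantised / 255f;
    }

    // Pack 8-bit R, G, B into 16 bits: 5 bits red, 6 bits green, 5 bits blue.
    static short toRgb565(int r, int g, int b) {
        return (short) (((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
    }

    public static void main(String[] args) {
        // Two different float inputs that become indistinguishable after 8-bit quantisation.
        System.out.println(through8Bits(0.500f) == through8Bits(0.501f));
        // Intermediate bytes per pixel versus final bytes per pixel.
        System.out.println(RGBA8_BYTES_PER_PIXEL + " vs " + RGB565_BYTES_PER_PIXEL);
    }
}
```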

By overdraw I assume you mean writing over the same pixel multiple times? That’s not the reason why a byte array would help. A good situation to have a backing array is when all or most pixels are going to be manipulated in a single frame.



Let me explain why it would help. The methods ByteBuffer.get() and ByteBuffer.put() need to go down to the JNI layer in order to write these bytes into native/direct memory, so each time you call them you incur a fixed performance cost. The way to avoid it is by using a backing array: each getPixel or setPixel call manipulates that array, and only when you actually need to update the Image’s ByteBuffer do you write it all in one go. This way you have a single bulk ByteBuffer.get() or ByteBuffer.put() call for potentially millions of getPixel or setPixel calls, and thus avoid the aforementioned fixed performance cost.
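As a rough sketch of the backing-array idea described above: BackedRaster is a hypothetical class, not the ImageRaster implementation, and it assumes a 4-bytes-per-pixel layout with the heap array kept in the same layout as the target buffer so the bulk transfers are plain copies.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of jME3: pixel edits go to a heap array and the
// target buffer is only touched by one bulk read and one bulk write.
final class BackedRaster {
    private static final int BYTES_PER_PIXEL = 4; // assumption: RGBA8-style layout

    private final ByteBuffer target; // e.g. an image's direct ByteBuffer
    private final byte[] backing;    // heap copy, same layout as target
    private final int width;

    BackedRaster(ByteBuffer target, int width) {
        this.target = target;
        this.width = width;
        this.backing = new byte[target.capacity()];
        ByteBuffer src = target.duplicate(); // leaves target's own position/limit untouched
        src.clear();
        src.get(backing);                    // single bulk read
    }

    void setPixel(int x, int y, int r, int g, int b, int a) {
        int i = (y * width + x) * BYTES_PER_PIXEL;
        backing[i] = (byte) r;
        backing[i + 1] = (byte) g;
        backing[i + 2] = (byte) b;
        backing[i + 3] = (byte) a;
    }

    // Returns the pixel packed as 0xRRGGBBAA, read from the heap array only.
    int getPixel(int x, int y) {
        int i = (y * width + x) * BYTES_PER_PIXEL;
        return ((backing[i] & 0xFF) << 24)
             | ((backing[i + 1] & 0xFF) << 16)
             | ((backing[i + 2] & 0xFF) << 8)
             | (backing[i + 3] & 0xFF);
    }

    // Push all pending edits to the target in a single bulk put.
    void flush() {
        ByteBuffer dst = target.duplicate();
        dst.clear();
        dst.put(backing);
    }
}
```

Typical use would be many setPixel calls followed by one flush() before the image is next uploaded. The cost is a heap array as large as the image, and whether it is a net win would still need measuring.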

So you mean doing bulk put(byte[]) operations? To do that efficiently presumably the backing buffer would need to be in the same format as the direct buffer being written to? That’s an interesting idea and might well make quite a difference. I might do a few speed tests on that.

@zarch said:
So you mean doing bulk put(byte[]) operations? To do that efficiently presumably the backing buffer would need to be in the same format as the direct buffer being written to? That's an interesting idea and might well make quite a difference. I might do a few speed tests on that.

That's faster for sure; we already use it for software skinning and for the batch node updates. The drawback is that you use heap memory for that backing array.

Yes, absolutely. I’ll need to look at the performance vs memory tradeoff, but it’s definitely worth checking out. (I didn’t follow @momoko_fan at first, since he said to always keep the backing buffer in a standard format, which clearly wouldn’t help in this case: unless I missed something, both buffers need to be in the same format.)

@zarch said:
Yes, absolutely. I'll need to look at the performance vs memory tradeoff, but it's definitely worth checking out. (I didn't follow @momoko_fan at first, since he said to always keep the backing buffer in a standard format, which clearly wouldn't help in this case: unless I missed something, both buffers need to be in the same format.)

I'm not completely sure what the image format should look like, so I may be talking nonsense... but can't you convert your backing array before the bulk put?
With some byte shifting magic?

See my earlier post (http://hub.jmonkeyengine.org/groups/development-discussion-jme3/forum/topic/imageraster-lets-you-get-and-set-pixels-on-jme3-images/?topic_page=4&num=15#post-190902) for why I think that would be a bad idea.



There is a difference between avoiding multiple put calls to direct memory (which may well be worthwhile) and going through an intermediate format (almost certainly not worthwhile).



It’s not a problem anyway, just need the raster to support both direct and non-direct memory.
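One way a raster could handle both kinds of memory, sketched with plain java.nio calls. BufferFill is a hypothetical name and the 4096-byte chunk size is an arbitrary assumption: heap buffers expose their array through hasArray()/array(), while direct buffers are written with bulk puts from a small staging array.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of jME3.
final class BufferFill {
    private static final int CHUNK = 4096; // arbitrary staging size

    // Fill the whole buffer with one byte value, whatever memory backs it.
    static void fill(ByteBuffer buf, byte value) {
        if (buf.hasArray()) {
            // Non-direct (heap) buffer: work on the backing array itself.
            byte[] array = buf.array();
            int start = buf.arrayOffset();
            Arrays.fill(array, start, start + buf.capacity(), value);
        } else {
            // Direct buffer: stage the bytes on the heap and write them in bulk.
            byte[] chunk = new byte[Math.min(buf.capacity(), CHUNK)];
            Arrays.fill(chunk, value);
            ByteBuffer dst = buf.duplicate(); // keep the caller's position/limit intact
            dst.clear();
            while (dst.hasRemaining()) {
                dst.put(chunk, 0, Math.min(chunk.length, dst.remaining()));
            }
        }
    }
}
```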

Speaking of buffers, I remember reading a while ago that there have been some improvements to direct buffer access through some compiler magic. I think it was around the time Java 7 was released. It is supposed to make multiple reads/writes faster.



Could someone perhaps confirm this? Otherwise I’m gonna have to dig that article/info up again. Pretty sure it was from a HotSpot dev, on some Oracle blog or something. A legit source, in other words.

Hmm, Google didn’t find anything on a quick search. Can you see if you can dig that up? Thanks.

I couldn’t find it. Disregard my post.