Fastest contains method?

So here's a random question. What kind of collection has the fastest "contains(object)" method? I need a collection that can give me the fastest answer to whether it contains an object. I'm assuming the entries would be sorted, but what type of sort would be best? Thanks!!

You got bottle neck in collections? Are you sure? I'm not sure how fast are List and Set collections, but I guess they are pretty fast. Furthermore, if you have sorted List you could implement binary search, that could be faster then List.contains()

If you're dealing with lots of objects in collections, it is really important to choose the right ones! So drfuglys question is an important one, Kova.

Fastest would be HashSet (and HashMap and alike). But that's not sorted. You'd get O(1) complexity for contains (if configured properly).

A TreeSet (/Map) is sorted. For the complexity the Comparator does not really matter (if not dependent on the number of entries), but it should be fast for general performance, of course. You'd get O(log(n)) complexity for contains.

Worst case are LinkedList, ArrayList etc. - everything where you'd need to search through the whole (half of) the list. Contains has complexity O(n) there.

In game development you even have an additional performance measure: amount of produced garbage! ArrayList is known to cause little garbage if you constantly add and remove entries. HashSet is known to produce a lot of garbage (like TreeSet). So you see the issue…

Different collections have their strengths in different areas (e.g. LinkedList is fast for removing entries in the middle, compared to ArrayList). So it's important to know what must be fast.

There are even optimized collections around in the net, claimed to be faster than the jdk stuff.

may I just ask how much is a lot? 50? 100? 1000? more?

Where possible you want to avoid doing lookups at all as they can be very expensive in a loop.  Are you sure you need a contains at all?

Kova said:

may I just ask how much is a lot? 50? 100? 1000? more?

1000 and more :)

drfugly said:

Take the hashcode of the object and store it into an int[] and do a binary search. [...] How does a hashset do its job?

That's what a HashSet does (approximately) :D

darkfrog said:

Where possible you want to avoid doing lookups at all as they can be very expensive in a loop.  Are you sure you need a contains at all?

Darkfrog is right, in some cases you can replace it by a simple boolean flag in the object you inserted into that collection.

oh, so you mean add a boolean value to the object, then instead of adding it to a collection just check that flag… hmmmm this sounds interesting… I'm not sure if i can pull that off, but i guess i should really look into that!

edit1: after looking at my implementation long and hard, i'm not really sure how i can use a boolean flag since the two objects being compared aren't the same instance. They are two seperate instances that would have the same field values.

edit2: ok, so there must be a huge difference between my hashcode/binary search method and hash set. My method took an average of 59000~ns. Using a hashset takes 7000ns… time to start searching their source code… Or is this simply sun pre compiling?

I would say it is their implementation… If I remember correctly, when I looked at their data structures code I saw a pretty tidy and clean code.

I would suggest you just use them since they are already there? Why reproduce the code? (I mean, I understand the value of learning, but if your concern is performance already, just use the best solution ;))

when you're right, well you're right!

if possible, you can use an array or arraylist using the objects id as its index in the array. that way, you read the nth element and check if its null. if it isn't possible, you can use a hashset and the object id as the hashcode.

(object id -> i use a simple counter that counts how many objects i've created and gives every object a unique number)

but if you get problems in contains, your algorithm itself may be … eeeeeviil

well its an AStar algoritim, so checking for containment happens ALOT :slight_smile:

the problem with using the object id in an array, is how big that array can get and how much memory it might take up (i guess probably not so much if its an array of booleans) but since this is a path finding a map of lets say 100x100 would need an array containing 10,000 elements… this can get messy fast…

edit: hold on… my closed and open lists can already get this big… might be on the verge of a breakthrough… MUCHO GRACIAS HAMSTER!!

edit2: well i forgot something very basic… arrays are slower than hashes… (doh) but using this boolean array could be a nice memory saver vs the hashset for about the same performance (only a couple ns got tacked on by using the boolean array, i can probably make that smaller but not having the array be at 10,000 at all times.

Ehem, I think you misunderstood that trick with the boolean. I try to explain it again with the A* example:

You don't need a "closed" list at all. Instead of closed.add(node) and if ( closed.contains(node) ) you do

node.closed = true and if ( node.closed )… got it?

The idea I think Irrisor is trying to convey is a more object-oriented approach to this problem.

You obviously have the object or you couldn't call the contains method, so rather than store the object in a map, store an extra variable in the object that defines the function of the "contains".

oh sorry for being unclear. I get you irrosor. The problem is that with my contains(node) and add(node) the two nodes probably won't be of the same instance. They would clones of each other, so when if i do node.closed = true when i add it to the closed list, i would be referring to a different object when using contains(node). I'm trying to use hamster's trick now because it uses the hashcode of the node. Since those are based on field values it doesn't matter if the two objects are of the same instance.

Thanks everyone for helping me out!

why are there clones? multithreading?

hmm… that's actually a good question that  i haven't really thought about… hmmm i wonder if i could create a collection or a pool of some sorts so that i never create the same object twice…