String or Int?

Hi, I'm sorta going crazy over this! lol



Should I use ints, or should I use Strings? Merits of each:



Int

Can use == rather than .equals(…);

ProgrammableSound uses ints rather than Strings

Encourages the use of constants



String

Easier to work with than ints

Easier debugging

More variety of patterns to choose from
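For reference, here is the "== rather than .equals(…)" point as a tiny sketch (the class and constant names are made up for illustration). Ints are compared by value with ==. On Strings, == compares object references, so two equal Strings that were created separately can fail an == check.

```java
final class EqualityDemo {
    static final int ANGRY = 1;      // hypothetical int constant

    public static void main(String[] args) {
        int state = 1;
        System.out.println(state == ANGRY);            // true: ints compare by value

        String fromCode = "Angry";
        String fromFile = new String("Angry");         // same characters, different object
        System.out.println(fromFile == fromCode);      // false: == compares references
        System.out.println(fromFile.equals(fromCode)); // true: equals() compares characters
    }
}
```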



And that's about it. I really need to come up with an answer soon, as they seem to be level pegging at the moment. I don't want to develop something and then realise that I should have used the other.



Thx, DP

I almost always prefer ints. They take up significantly less space and are faster to work with.



The only reason I might take a second look at Strings in this case is because of parsing XML files… Do references to these constants need to be defined there?



If that's the case, I would consider defining some String-based versions for the XML file only. The XML parser would then have some custom logic to translate them to the int constants. This might be a good idea for the String-based solution as well, for similar reasons (you can use '==' on the results, it ensures the XML is properly formatted, and it allows the GC to discard the newly created String objects).
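A minimal sketch of that translation step, assuming hypothetical state names ANGRY and HAPPY and Java 5 generics. The parser looks up the String it just read and hands back the program's own constant, so later code can compare with == and the parser's copy can be garbage-collected.

```java
import java.util.HashMap;
import java.util.Map;

final class StateNames {
    // Hypothetical canonical constants; the rest of the code compares against these with ==
    static final String ANGRY = "ANGRY";
    static final String HAPPY = "HAPPY";

    private static final Map<String, String> CANONICAL = new HashMap<String, String>();
    static {
        CANONICAL.put(ANGRY, ANGRY);
        CANONICAL.put(HAPPY, HAPPY);
    }

    /** Returns our own constant equal to the parsed name, or rejects the XML. */
    static String canonical(String parsed) {
        String constant = CANONICAL.get(parsed);
        if (constant == null) {
            throw new IllegalArgumentException("Unknown state name in XML: " + parsed);
        }
        return constant;
    }
}
```

String.intern() would also give you a canonical copy, but it accepts any name at all, so it won't catch a typo in the XML the way an explicit lookup does.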



In either case, I'd just make sure the constants are very well documented and explained, including in the methods that take them as arguments. It can be frustrating to have an ambiguous parameter in a method signature. If people aren't sure where it comes from, they might not use the constants at all and just supply their own values (e.g. using '1' directly in their code instead of 'CONSTANT_A').
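One way to do that documentation, as a sketch (Creature, setMood and the mood values are hypothetical): link the constants from the parameter's Javadoc, and reject any value that isn't one of them so a wrong number fails loudly instead of silently.

```java
final class Creature {
    /** Mood constant: the creature is angry. */
    public static final int ANGRY = 1;
    /** Mood constant: the creature is happy. */
    public static final int HAPPY = 2;

    private int mood = HAPPY;

    /**
     * Sets the creature's mood.
     *
     * @param mood one of {@link #ANGRY} or {@link #HAPPY}; pass the
     *             constant, not its raw int value
     * @throws IllegalArgumentException if {@code mood} is not one of
     *             the constants above
     */
    public void setMood(int mood) {
        if (mood != ANGRY && mood != HAPPY) {
            throw new IllegalArgumentException("Unknown mood: " + mood);
        }
        this.mood = mood;
    }

    /** Returns the current mood, one of {@link #ANGRY} or {@link #HAPPY}. */
    public int getMood() {
        return mood;
    }
}
```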



My .02.





–K

How about enums and templates (generics)? Until 1.5, I would recommend ints instead of Strings, due to the speed. (You can also convert a String to an int with hashCode().)


The only reason I might take a second look at Strings in this case is because of parsing XML files.. Do references to these constants need to be defined there?


Yeah, they would be. In the XML specs, the names of the state machines would have to be ints, and so would the states and inner states.

I'll probably add a tag as a reference in each XML spec:


<someMachine>
  <variable name="CONSTANT A" value="0" />
  <variable name="CONSTANT B" value="1" />
 
  <state name="CONSTANT A">
    ..........
  </state>
</someMachine>



It would be something like that; what do you think? If I can't convert the value to an int, I'll get the String's hashcode like badmi suggested. But is that hashcode unique for each String value?
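The fallback described above could look roughly like this (IdParser and toId are made-up names). The caveats are baked in: String hash codes can collide with each other, and a name's hash code can even equal an explicit numeric value used elsewhere in the same file.

```java
final class IdParser {
    /** Uses the value as an int if it parses, otherwise falls back to its hash code. */
    static int toId(String value) {
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            // Not a number: fall back to the hash code.
            // Caution: different names may map to the same id.
            return value.hashCode();
        }
    }
}
```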

E.g. are DP and dp the same?

Also, could, say, DP and oipoiandf by some off chance produce the same hashcode?

I’m guessing your XML-parsing code will easily be able to take whatever the user has typed in and convert it to something.







E.g., in your code, assume you want to define the constants like this:


public static final int ANGRY = 1;
public static final int HAPPY = 2;



In your XML specification (whatever tells the user how to write the XML; it could be a schema, or just a design doc), you wouldn't make any reference to '1' or '2', just 'ANGRY' and 'HAPPY'. It shouldn't matter what the actual ints are:

<someMachine>
  <state name="ANGRY">
    ..........
  </state>

  <state name="HAPPY">
    ...........
  </state>
</someMachine>



Your parsing code will look at those state names and convert them to the appropriate ints. If it can't match a name up to one of your constants, the XML parsing fails because it's an invalid file. You would throw an exception or something and tell the user the file was bad. That should be easy for them to fix if you make the error message explicit.
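A sketch of such a parser using the JDK's built-in DOM API (javax.xml.parsers). The constants, the MachineParser name and the error wording are assumptions for illustration. Each state name is looked up in a table, and anything that doesn't match is reported together with the list of valid names.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MachineParser {
    // Hypothetical constants; the XML never mentions these numbers
    static final int ANGRY = 1;
    static final int HAPPY = 2;

    private static final Map<String, Integer> STATE_IDS = new HashMap<String, Integer>();
    static {
        STATE_IDS.put("ANGRY", ANGRY);
        STATE_IDS.put("HAPPY", HAPPY);
    }

    /** Converts a state name from the XML into its int constant, or rejects the file. */
    static int stateId(String name) {
        Integer id = STATE_IDS.get(name);
        if (id == null) {
            throw new IllegalArgumentException("Invalid state name \"" + name
                    + "\"; expected one of " + STATE_IDS.keySet());
        }
        return id;
    }

    /** Parses the machine XML and returns the ids of its state elements in document order. */
    static int[] parseStates(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Malformed XML (or parser setup failure): report it as a bad file
            throw new IllegalArgumentException("Could not parse machine XML: " + e.getMessage(), e);
        }
        NodeList states = doc.getElementsByTagName("state");
        int[] ids = new int[states.getLength()];
        for (int i = 0; i < ids.length; i++) {
            Element state = (Element) states.item(i);
            ids[i] = stateId(state.getAttribute("name"));
        }
        return ids;
    }
}
```

With this layout, adding a new state only means adding one entry to STATE_IDS; the parsing loop stays the same.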

As a side note, you could put those constants in an enum (whether or not you're using Java 5; before 1.5 that means a hand-written "typesafe enum" class). The enum would have a list of possible values, and would have a method to convert a String value to the appropriate int. This would help keep your XML parsing code a little cleaner and (hopefully) independent of new constants being added.

Just for shits and giggles (and as an example), I've typed out a possible enum here for reference:


/**
 *  MoodState is an enum that defines the 'mood' of an AI.
 *
 *  <H2>ISSUES</H2>
 *  If the class is made Serializable, make sure to add a
 *  'readResolve' method to handle translating the object
 *  appropriately.  Otherwise, deserialization (e.g. via RMI)
 *  could create new instances of the enum, and == checks
 *  will fail.
 */
public class MoodState {

      // These 'ids' are made public so that external apps
      //   can use them in switch statements.
    public static final int UNKNOWN_ID = 0;
    public static final int ANGRY_ID = 1;
    public static final int HAPPY_ID = 2;

    public static final String UNKNOWN_NAME = "Unknown";
    public static final String ANGRY_NAME = "Angry";
    public static final String HAPPY_NAME = "Happy";

    public static final MoodState UNKNOWN = new MoodState(UNKNOWN_ID, UNKNOWN_NAME);
    public static final MoodState ANGRY = new MoodState(ANGRY_ID, ANGRY_NAME);
    public static final MoodState HAPPY = new MoodState(HAPPY_ID, HAPPY_NAME);


    public static final MoodState[] POSSIBLE_VALUES =
        {UNKNOWN, ANGRY, HAPPY};



    private final int myId;
    private final String myName;

    /**
     *  Private constructor so nobody else can create this enum.
     */
    private MoodState(int id, String name) {
        myId = id;
        myName = name;
    }

    /**
     *  Returns the unique id of this enum.
     */
    public int getId() {
        return myId;
    }

    /**
     *  Returns the name of this enum.
     */
    public String getName() {
        return myName;
    }

    /**
     *  Returns a String value of this enum.
     */
    public String toString() {
        return myName;
    }



    /**
     *  Returns the MoodState that has the given id,
     *  or MoodState.UNKNOWN.
     */
    public static MoodState fromInt(int id) {
          // Could also loop through POSSIBLE_VALUES instead,
          //   but the switch statement is faster.
        switch(id) {
            case ANGRY_ID: return ANGRY;
            case HAPPY_ID: return HAPPY;
            default: return UNKNOWN;  // Could throw an exception instead
        }
    }

    /**
     *  Returns a MoodState that has the given name, or
     *  MoodState.UNKNOWN.  The name will be treated
     *  as if it were case-sensitive.
     */
    public static MoodState fromString(String name) {
        for(int i = 0; i < POSSIBLE_VALUES.length; i++) {
              // Call equals() on the constant so a null name returns UNKNOWN
            if(POSSIBLE_VALUES[i].getName().equals(name)) {
                return POSSIBLE_VALUES[i];
            }
        }

          // Could also throw an exception, if desired..
        return UNKNOWN;
    }

    /**
     *  Returns a MoodState that has the given name, or
     *  MoodState.UNKNOWN.  The name will be treated
     *  as if it were case-insensitive.
     */
    public static MoodState fromStringIgnoreCase(String name) {
        for(int i = 0; i < POSSIBLE_VALUES.length; i++) {
              // Call equalsIgnoreCase() on the constant so a null name returns UNKNOWN
            if(POSSIBLE_VALUES[i].getName().equalsIgnoreCase(name)) {
                return POSSIBLE_VALUES[i];
            }
        }

          // Could also throw an exception, if desired..
        return UNKNOWN;
    }


    // etc...

}





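As for the hash-code thing: case does matter, so 'DP' and 'dp' get different hash codes. But two different Strings are NOT guaranteed to generate different hash codes. An int only has 2^32 possible values while the number of possible Strings is unlimited, so collisions are unavoidable (for example, "Aa" and "BB" both hash to 2112). A hash code on its own can't serve as a unique id.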
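A quick check of both of DP's questions (the class name is just for illustration). Case changes the hash code, but distinct Strings can still share one: "Aa" and "BB" collide, and so does any pair of Strings built from those two-character blocks in different orders, such as "AaBB" and "BBAa".

```java
final class HashCollisions {
    public static void main(String[] args) {
        // Case matters: these two differ
        System.out.println("DP".hashCode());  // 2188
        System.out.println("dp".hashCode());  // 3212
        // ...but different Strings can still share a hash code
        System.out.println("Aa".hashCode());  // 2112
        System.out.println("BB".hashCode());  // 2112
    }
}
```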

Here's the 'hashCode' method from String:


    /**
     * Returns a hash code for this string. The hash code for a
     * <code>String</code> object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using <code>int</code> arithmetic, where <code>s[i]</code> is the
     * <i>i</i>th character of the string, <code>n</code> is the length of
     * the string, and <code>^</code> indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }



I haven't studied the algorithm that much, but to prove the case-sensitivity part: 'val[off++]' is a char from the String's internal character array, and it is added to the hash code as an int. 'A' and 'a' have different int values (65 and 97), so 'DP' and 'dp' end up with different hash codes.
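To see the formula from that Javadoc at work, here is a re-implementation using only public String methods (charAt instead of the internal value/offset/count fields; ManualHash is an illustrative name). It should agree with String.hashCode() for any String, since the formula is part of the documented contract.

```java
final class ManualHash {
    /** Computes s[0]*31^(n-1) + ... + s[n-1] using int (wrap-around) arithmetic. */
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Each char is widened to its int code, e.g. 'A' = 65 and 'a' = 97
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```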

Hope this helps.


--K

I think it's best to go with the hashcode approach, as it seems to be the best choice IMO. However, I can foresee a problem:



Say the name of an Entity was hashcoded to become an int, so it's stored as an int rather than a String. Now say I want to get the name of that entity back. The user knows that he entered a String rather than an int, so he is completely blind to the int. If he called something like getID(), it should return a String rather than an int.



So what I need is a dehashcoder :)



Is this possible? And if that is still faster than using Strings, then I'm most definitely going to go in that direction: code everything as an int and convert it back to a String when needed.



Thx shmooh, DP

Thanks to the advice of Mojo and his great college professor ;), I made a little benchmark.



That benchmark stated the following:



with 1,000,000 loops and a simple:



if ("hello".equals("hello")) {
}



That came to 42 ms; with == it was 0 ms.

Now, I only do a max of 10 comparisons anyway. So Mojo said to take it to 100 and see, and they both returned 0 ms.

Now, I don't foresee a problem with it at the moment, but if a problem were to arise in the future, I will keep this fix in mind.

DP

PS. "If you ever code anything that you think is really clever, redo it, you did something stupid" -- Mojo's professor :)

With == it was 0 because the comparison had no effect, so the compiler (or the JIT) removed the statement as dead code.

So I have to make that if statement do something? Like setting an int to i or something?


Yes, probably. You may also want to use that int somewhere outside the loop (print it, say). The compiler might be smart enough to optimize the work out if nobody ever sees the result.

And try doing more than just 10 or 100. Try 10,000 as well. If it only takes 3 seconds instead of 1, that doesn't sound like a big deal on the surface. But remember that this gets called over and over again by your code, even if it's not in immediate succession.

Side note: if you're running on Windows, System.currentTimeMillis() is not all that accurate on the very small scale.
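A rough sketch of the benchmark with the fixes discussed above, assuming Java 5 or later for System.nanoTime() (EqualsBenchmark and its methods are illustrative names). The result of each comparison is counted and published through a static field so it can't be discarded as dead code, a few warm-up rounds run first so the JIT has compiled the loops, and the second String is created with new String(...) so equals() has to compare characters instead of returning early on identical references (which is what "hello".equals("hello") does, since both literals are the same object). Treat the numbers as a ballpark only; the JIT may still hoist a loop-invariant comparison.

```java
final class EqualsBenchmark {
    // Accumulates results so the JIT can't treat the comparisons as unused
    static int sink;

    static long timeEquals(String a, String b, int loops) {
        int matches = 0;
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            if (a.equals(b)) {
                matches++;
            }
        }
        long elapsed = System.nanoTime() - start;
        sink += matches;
        return elapsed;
    }

    static long timeIntEquals(int a, int b, int loops) {
        int matches = 0;
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            if (a == b) {
                matches++;
            }
        }
        long elapsed = System.nanoTime() - start;
        sink += matches;
        return elapsed;
    }

    public static void main(String[] args) {
        String a = "hello";
        String b = new String("hello");  // equal contents, different object
        int loops = 1000000;
        // Warm-up rounds give the JIT a chance to compile both loops first
        for (int round = 0; round < 5; round++) {
            timeEquals(a, b, loops);
            timeIntEquals(1, 1, loops);
        }
        System.out.println("equals(): " + timeEquals(a, b, loops) / 1000000.0 + " ms");
        System.out.println("int ==  : " + timeIntEquals(1, 1, loops) / 1000000.0 + " ms");
        System.out.println("(sink = " + sink + ")");
    }
}
```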

Also remember that it's not just the speed of the app, but the memory usage. Ints are simply smaller and cleaner than Strings.

As far as the hashCode thing goes, I'm not sure what you have in mind, but I would encourage you not to go that way. I mean, what benefit would there be to having the String already, then turning it into an int? There's not much point in making it more complicated than it has to be. Just my .02. (Also remember Mojo's prof's quote. :) )

And uhm, I don't think de-hashing is possible in general: many different Strings share each hash code, so there's no way to tell which one you started from. You probably don't want to count on it in any case. If you're finding that you need to do that, then you should probably consider altering the design.
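If you really do need to get from an int back to a name, one alternative design is to hand out the ids yourself and remember the name for each one. A minimal sketch (NameRegistry, idFor and nameFor are hypothetical): the ids are small, unique and reversible, unlike hash codes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NameRegistry {
    private final Map<String, Integer> idsByName = new HashMap<String, Integer>();
    private final List<String> namesById = new ArrayList<String>();

    /** Returns the id for a name, assigning the next free id the first time it is seen. */
    int idFor(String name) {
        Integer id = idsByName.get(name);
        if (id == null) {
            id = namesById.size();
            namesById.add(name);
            idsByName.put(name, id);
        }
        return id;
    }

    /** Returns the name registered under an id ("de-hashing" is just a list lookup). */
    String nameFor(int id) {
        return namesById.get(id);
    }
}
```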


--K

My point to DP was to take a look at the problem first. He really wanted to use Strings for everything, but was worried about the cost of the String.equals calls. So he was trying to convert Strings to ints, but was creating all sorts of other issues for himself. I just wanted him to measure the speed of String.equals and decide whether the difference in time was acceptable, or whether he really needed to switch to ints.

Thanks, pascal. ;)

I'm with Badmi. If you can use enums, you should IMO. Sun claims they are as fast as ints, and you can even do a sort of bitwise OR if you use an EnumSet.

They even translate to strings easily for your XML.
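For comparison, a sketch of the Java 5 route (Mood and EnumDemo are illustrative names). name() and valueOf(...) give the String form for the XML, enums work directly in switch statements, and EnumSet covers the "bitwise OR" case; its Javadoc describes it as backed by a bit vector and intended as a typesafe alternative to int bit flags.

```java
import java.util.EnumSet;

enum Mood {
    ANGRY, HAPPY, SCARED
}

final class EnumDemo {
    static String describe(Mood mood) {
        switch (mood) {           // enums work directly in switch statements
            case ANGRY: return "grr";
            case HAPPY: return "yay";
            default:    return "...";
        }
    }

    public static void main(String[] args) {
        // XML attribute -> enum; valueOf throws IllegalArgumentException for unknown names
        Mood fromXml = Mood.valueOf("ANGRY");
        // enum -> XML attribute
        String forXml = Mood.HAPPY.name();
        // A set of flags, roughly like OR-ing int constants together
        EnumSet<Mood> bad = EnumSet.of(Mood.ANGRY, Mood.SCARED);
        System.out.println(describe(fromXml) + " " + forXml + " " + bad.contains(Mood.ANGRY));
    }
}
```

Note that valueOf is case-sensitive, so the XML has to use the exact constant names.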



just my 2 cents.