Editor's note: Sometimes, the most interesting discussions begin when someone says, "This may be a stupid question, but ...". If the person asking the question has taken the time to think about the problem before asking, the question is often not stupid at all. The uncertainty points out an ambiguity in the specs, holes in the docs, or a search for how more experienced programmers might address a particular problem. From time to time, we will print one of the "(Not So) Stupid Questions" we receive and invite our readers to answer the question in the feedback section. This one began in a different form as a suggestion from Vladimir V. Ostromensky. He sent us some sample code somewhat like the code we present below with a question about String equality. We have adapted and extended his initial question.
Remember that new people are joining the Java community all the time and may be looking for help from those with more experience. Also, those who began with Java as their first language can benefit from those coming to the community with experience in other languages. As always, answer the questions with kindness. You are also welcome to submit your questions to
.
This may be a stupid question, but... "Some side-effects of String equality don't make sense"
One of our readers submitted the following code, which had us scrambling for our javadocs and a copy of the Java Language Specification. Compile the following:
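A minimal sketch consistent with the output below and with the javap excerpt in the comments might look like this. The class name StringTester comes from that excerpt; the print format and the args check are assumptions.

```java
public class StringTester {
    public static void main(String[] args) {
        String aString = "myValue";   // literal: resolved to the interned instance
        String bString = "myValue";   // same literal, so the same pooled object
        String cString = "";
        if (args.length > 0) {
            cString = args[0];        // built at run time by the launcher, not pooled
        }
        System.out.println("a.equals(b): " + aString + ".equals(" + bString + ") is " + aString.equals(bString));
        System.out.println("a==b: " + aString + " == " + bString + " is " + (aString == bString));
        System.out.println("a.equals(c): " + aString + ".equals(" + cString + ") is " + aString.equals(cString));
        System.out.println("a==c: " + aString + " == " + cString + " is " + (aString == cString));
    }
}
```

Run it as java StringTester myValue.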
When run with myValue as the command-line argument, this produces the following output:
a.equals(b): myValue.equals(myValue) is true
a==b: myValue == myValue is true
a.equals(c): myValue.equals(myValue) is true
a==c: myValue == myValue is false
So the two constants, aString and bString, are not only equivalent, they're the same object, yet cString is equivalent but is a different object. My question is:
What's the deal with String equality?
First thoughts:
We can see that aString and bString are the same object. Doesn't the spec tell us that:
1. All Strings are immutable.
2. All Strings are held in a "String pool", with one unique instance of each string of characters.
In other words, doesn't this explain the object equality of aString and bString? They're the same run of characters, so there's one object in the String pool pointed to by both aString and bString.
The fact that aString and cString are equivalent but are different objects seems to indicate that point #2 is not entirely true. Since cString consists of the characters myValue, just like aString and bString, it should point to the same member of the String pool, and thus have pointer equality, right?
So aString, bString, and cString have identical values, and yet they are not all the same object. It seems that cString is not working with the String pool, as it is a unique object with the same value as the other two.
Strings are unusual in Java because developers often treat them as primitives when they are really objects. Part of this confusion comes up for new developers because a String can be instantiated without using new, as in String a = "hello". But more of this possible confusion is coming with autoboxing and autounboxing. Are there going to be problems with equals in the future that arise from boxing and unboxing?
This brings me to three questions:
1. What is going on with String equality?
And the two follow-on questions:
2. Why might this be the desired behavior? and
3. Are we going to have more problems with equals starting in J2SE 1.5 that result from the autoboxing and autounboxing?
(Not So) Stupid Questions is where we feature the questions you want to ask but aren't sure how.
Take A Look At The Class Disassembled...
2004-07-23 13:14:02 markgowdy
Hi
You may find the following interesting in explaining why a==c is false. If you compile the java file and then run javap -c StringTester, you get the following excerpt:
C:\>javap -c StringTester
Compiled from "StringTester.java"
public class StringTester extends java.lang.Object{
public StringTester();
Code:
0: aload_0
1: invokespecial #1; //Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: ldc #2; //String myValue
2: astore_1
3: ldc #2; //String myValue
5: astore_2
6: ldc #3; //String
8: astore_3
9: aload_0
I do not have any problems with the String equality behaviour. If you really understand what is going on in the compiler, you won't have any problems either.
Before compiling the source the compiler makes some "source rewritings".
Every occurrence of a String declaration gets replaced.
For example:
String a = "Hello world";
gets:
String a = SomeStringPool.somestaticvariableX;
Every occurrence of the String literal "Hello world" in the source gets replaced by this static variable.
This can be made at compile time, saving much memory and even time at runtime.
What I have stated above is just the way I think it is working. I do not really know, but I see no inconsistencies.
The mystery lies with String[] args
2004-04-08 07:03:12 swapnonil
When you initialize an array of String elements, the array itself is an object and each element refers to a String object.
String[] args = new String[]{"Hello","How"};
Here three objects are involved: args itself, plus the two String elements in the array.
Considering the above to be true, when you accept parameters from the command line by declaring public static void main(String[] args), Java inserts each command-line parameter into the array as a new String object.
For example
public class Test
{
    public static void main(String[] args)
    {
        System.out.println("Args "+args[0]+","+args[1]);
        System.out.println(args[0].equals(args[1]));
        System.out.println(args[0]==args[1]);
    }
}
-------------------------------------------------------------------
Running java Test hello hello gives the following output:
Args hello,hello
true
false
This is the same reason aString and cString do not have pointer equality (==): they are two different objects.
Hope this solves your question.
String literals are interned when the class is loaded
2004-04-07 17:26:39 brucechapman
It's not quite as simple as the compiler optimizing the values to a single String. The JVM (probably the class loader) then interns the Strings loaded from the class files.
The api docs for String.intern() say
"All literal strings and string-valued constant expressions are interned."
When you see the == operator in Java code, never, ever, ever say to yourself "equals". Always say "identical to". That is what that operator means.
(I find I usually "sound out" what I'm reading when looking at code: so having the right words to use is very important. Otherwise, it can lead to subtle mis-assumptions over time.)
Now, when you are talking about primitive types, it so happens that "identical to" and "equals" mean the same thing.
But with Object types, "identical to" means one and only one thing: they are the same object. On the other hand, "equals" is defined by .equals(), and can have whatever twisted meaning the class writer can come up with. In the case of java.lang.String, it means "the strings have the same contents".
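A quick sketch of that difference, using two hypothetical classes: Plain inherits Object.equals(), which is the same test as ==, while Valued overrides equals() (and hashCode()) to compare contents, the way String does.

```java
public class IdentityVsEquals {
    // Hypothetical class that does NOT override equals(), so it inherits
    // Object.equals(), which returns true only for the very same object.
    static final class Plain {
        final int value;
        Plain(int value) { this.value = value; }
    }

    // Hypothetical class that overrides equals() and hashCode() to compare contents.
    static final class Valued {
        final int value;
        Valued(int value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Valued && ((Valued) o).value == value;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(value);
        }
    }

    public static void main(String[] args) {
        Plain p1 = new Plain(1), p2 = new Plain(1);
        Valued v1 = new Valued(1), v2 = new Valued(1);
        System.out.println(p1 == p2);      // false: two distinct objects
        System.out.println(p1.equals(p2)); // false: inherited identity test
        System.out.println(v1 == v2);      // false: still two distinct objects
        System.out.println(v1.equals(v2)); // true: overridden to compare contents
    }
}
```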
To answer your questions:
> 1. What is going on with String equality?
String equality? If you are talking about String equal()-ity, then you are talking about .equals() and it means "do these two strings have the same contents". Just remember not to confuse String identity (==) with string equality (.equals).
> 2. Why might this be the desired behavior?
So long as we accept that == means "identical to" (meaning the very-same object), then this is the desired behavior.
To take a step back, have you ever wondered why it makes sense to have an .equals() method on java.lang.Object? This method allows us to compare Apples and Oranges and get a true-or-false result.
A better question might be: What is going on with Object.equals()? The javadoc for that method describes the *technical* contract for that method; but what is the *semantic* or *social* contract for that method? Does the existence of that method promote assumptions that are untrue?
> 3. Are we going to have more problems with equals starting in J2SE 1.5 that result from the autoboxing and autounboxing?
Sure are. Check out this code (hopefully the cut-and-paste works):
---
int a = 1;
Integer A = a;
int b = 1;
Integer B = b;
The output (compiled with the JDK1.5 beta) is:
---
a==b true
A==B false
a==B true
A==b true
A.equals(B) true
A.equals(b) true
B.equals(a) true
---
:D how delicious.
=Matt
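The println statements seem to have been lost from the listing above. A self-contained sketch that prints the same labels might look like this (the class name BoxingEquality and the extra 1000 case are hypothetical additions). One caveat: JLS 3rd edition §5.1.7 requires boxing an int between -128 and 127 to yield identical objects, and javac compiles boxing to Integer.valueOf, which caches that range, so on a released JDK 5 or later A==B prints true for 1. The surprise can still appear for larger values such as 1000.

```java
public class BoxingEquality {
    public static void main(String[] args) {
        int a = 1;
        Integer A = a;  // autoboxing, compiled to Integer.valueOf(a)
        int b = 1;
        Integer B = b;

        System.out.println("a==b " + (a == b));           // int == int: numeric comparison
        System.out.println("A==B " + (A == B));           // Integer == Integer: identity; 1 is in the cached range
        System.out.println("a==B " + (a == B));           // B is unboxed, numeric comparison
        System.out.println("A==b " + (A == b));           // A is unboxed, numeric comparison
        System.out.println("A.equals(B) " + A.equals(B));
        System.out.println("A.equals(b) " + A.equals(b)); // b is boxed for the call
        System.out.println("B.equals(a) " + B.equals(a));

        Integer C = 1000, D = 1000;                        // outside the guaranteed cache range
        System.out.println("C==D " + (C == D));           // not guaranteed; typically false with default JVM settings
        System.out.println("C.equals(D) " + C.equals(D));
    }
}
```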
In J2SE 1.5, transitivity under == is broken
2004-04-22 06:47:17 roytock
Yikes... a very good example for autoboxing. So under 1.5, transitivity under == has been broken, and Matt has an example: A==a==B but A!=B. That's pretty evil.
Understanding Equality
2004-04-13 05:34:34 sven
Not surprising to find that
a==B true
A==b true
It seems autoboxing either uses equals to compare the boxed values (otherwise there would be no sense in comparing; it would be false by default) or unboxes the object and uses == on the primitive type; the result is the same either way.
More surprising to most of us will be that even if a==b && a==B && b==A, still A!=B. I see a new flood of questions approaching fast...
... as you can see even equality is relative.
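One way to tell which of the two happens: per the JLS, when one operand of == is numeric and the other is convertible to a numeric type, binary numeric promotion applies, which unboxes the wrapper. A sketch (the class name MixedEquality is hypothetical) shows this: a value outside the Integer cache range still compares true, and a null Integer throws NullPointerException instead of returning false.

```java
public class MixedEquality {
    public static void main(String[] args) {
        int a = 1000;
        Integer B = 1000;
        // int == Integer: B is unboxed and the two ints are compared numerically
        System.out.println(a == B);
        Integer missing = null;
        try {
            // unboxing null fails before any comparison happens
            System.out.println(a == missing);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: == unboxed the Integer");
        }
    }
}
```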
What the JLS says
2004-04-07 16:41:07 spudbean
> 2. All Strings are held in a "String pool", with one unique instance of each string of characters.
This is not true. Only strings that the compiler knows about at compile time (i.e., strings that end up in the .class file) are "interned" into the string pool this way. See section 3.10.5 of the language spec:
http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#19369
To quote from that section:
> This example illustrates six points:
>
> - Literal strings within the same class (§8) in the same package (§7) represent references to the same String object (§4.3.1).
> - Literal strings within different classes in the same package represent references to the same String object.
> - Literal strings within different classes in different packages likewise represent references to the same String object.
> - Strings computed by constant expressions (§15.28) are computed at compile time and then treated as if they were literals.
> - Strings computed at run time are newly created and therefore distinct.
> - The result of explicitly interning a computed string is the same string as any pre-existing literal string with the same contents.
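The points quoted above can be checked with a short sketch (the class names LiteralIdentity and Other are hypothetical). It compares a literal against a literal in another class, a constant expression built from a final local, a concatenation computed at run time, and the interned result of that concatenation.

```java
public class LiteralIdentity {
    // A second class, so literals can be compared across class boundaries.
    static class Other {
        static String value() { return "myValue"; }
    }

    public static void main(String[] args) {
        String literal = "myValue";
        final String constPart = "my"; // constant variable: final with a constant initializer
        String varPart = "my";          // not final, so not a constant variable

        System.out.println(literal == Other.value());                 // literals in different classes
        System.out.println(literal == constPart + "Value");           // constant expression, folded at compile time
        System.out.println(literal == varPart + "Value");             // computed at run time: a new object
        System.out.println(literal == (varPart + "Value").intern());  // explicitly interned
    }
}
```

This prints true, true, false, true, matching the first, fourth, fifth and sixth points in the quote.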
Like all objects
2004-04-07 15:37:31 kmonsen
Strings behave like all objects: when you check with ==, you check whether the two references point to the same object. If you override equals (which String does), you typically check the contents.
If String were to behave any other way, it would need a native constructor that worked closely with the JVM.
I think it's the desired behavior in the sense that very few people actually care about the object identity of strings, so it just doesn't matter.
Strings are a design mistake.
2004-04-07 11:53:30 zander
I see each and every Java newbie make this mistake, and I have explained it too many times already, so I think this really is a design mistake.
To be plain about it: Strings should be treated either as base types or as objects; this in-between is very non-intuitive.
From an efficiency and a logic POV, Strings should be base types. Plain and simple.
This means that "a" == new String("a") should be true. Having this option via intern() means it's done the wrong way around; doing the natural action takes too many characters (i.e. they got the hint, but failed to use it).
I'm not familiar enough with autoboxing to know how this affects things; I hope some of this problem can be alleviated.
Naturally the change I suggest above opens a huge can of worms; so I'll stick to how it works and just start the default talk when a new Java programmer asks me "Why the *&^( does this not work?"
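The intern() route mentioned above can be sketched as follows (the class name InternDemo is hypothetical): a String created with new is a distinct object, but interning it returns the pooled instance that the literal already refers to.

```java
public class InternDemo {
    public static void main(String[] args) {
        String literal = "myValue";
        // new String(...) always creates a distinct object
        String runtime = new String("myValue");
        System.out.println(literal == runtime);          // different objects
        System.out.println(literal.equals(runtime));     // same contents
        // intern() returns the pooled instance with the same contents
        System.out.println(literal == runtime.intern());
    }
}
```

This prints false, true, true.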
Strings are a design mistake.
2004-04-08 06:00:50 jwenting
There is nothing wrong with "a" == new String("a") NOT being true.
There's no guarantee that they should be after all.
If you make that true by default then you'd have to make it impossible to have multiple instances of ANY class containing the same data OR you'd have to remove String from the Object hierarchy entirely and make it into something else (maybe a primitive?).
It might be easier to remove the overloading on the assignment operators for String, thus removing the ambiguity of people thinking Strings are primitives when really they're not.
As this would break on last estimate 100% of the existing Java codebase worldwide, costing a fortune to fix, it would be a lot cheaper to just keep hammering newbies that == should never be used to compare Strings or any other Objects unless you have a very good reason and know full well what you're getting yourself into.
Well, the two original objects are the same for efficiency reasons. Some strings probably occur a fair number of times in your class files, it seems only sensible that they are the same reference (for performance issues).
The second object is different, also for efficiency reasons. If we had to make sure that only one instance of a string ever existed, then for every string we would basically have to call String.intern(). Given that most strings that aren't in compiled code probably have a fairly short lifetime, this would be inefficient. It is very confusing, though, but as Java chose not to implement operator overloading (for quite good reasons: there are cases when you want pointer equality and nothing else), it's always going to look a bit kludgey.
Also
* Are we going to have more problems with equals starting in J2SE 1.5 that result from the autoboxing and unboxing?
I can't see why. HashMap etc. all use .equals(), which works fine for non-primitive objects. When you actually pull the value out of the map and turn it into a primitive, equals will be true, even if the objects are stored in different locations.
The only 'problem' I can foresee is if somebody decides to use something like IdentityHashMap with boxing/unboxing, but they need to be shot anyway. (And, note, I can't imagine WHY you would do such a thing.)
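A sketch of that IdentityHashMap case (the class name BoxedKeys is hypothetical): HashMap finds a boxed key by equals(), while IdentityHashMap only finds it when the lookup key is the very same Integer object, which JLS §5.1.7 guarantees only for small values such as 1.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class BoxedKeys {
    public static void main(String[] args) {
        Map<Integer, String> byEquals = new HashMap<>();
        Map<Integer, String> byIdentity = new IdentityHashMap<>();

        byEquals.put(1000, "x");
        byIdentity.put(1000, "x");
        byIdentity.put(1, "y");

        // HashMap compares keys with equals(): a freshly boxed 1000 finds the entry
        System.out.println(byEquals.get(1000));
        // IdentityHashMap compares keys with ==: 1 is boxed from the cache, so it is found
        System.out.println(byIdentity.get(1));
        // for 1000 the lookup key is usually a different Integer object (not guaranteed either way)
        System.out.println(byIdentity.get(1000));
    }
}
```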
> Well, the two original objects are the same for efficiency reasons. Some strings probably occur a fair number of times in
> your class files, it seems only sensible that they are the same reference (for performance issues).
Actually, the string is only present in the class file once, so the optimization is done at compile time. The first two strings are actually two references to the exact same object (hence the '==' being true).
Stating your version is only going to confuse people further, so sorry for being an ass and correcting you :)
True, but the JVM actually interns all String literals in loaded classes. I didn't mean to confuse anybody ;)
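(brief example; A and B are both public classes, so they go in separate files)
// A.java
public class A {
    public static void main(String[] args) {
        B b = new B();
        String c = "foo";
        String d = b.toString();
        // prints true: the "foo" literals in A and B resolve to the same interned String
        System.out.println(c == d);
    }
}

// B.java
public class B {
    public String toString() {
        return "foo";
    }
}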