Summary
My observations from compiling a few examples and analyzing the byte code:- The compiled code selects the right block to execute with just two calls: a String.hashCode() and a String.equals() method call. In a regular "if" statement there is one String.equals() call for each case.
- The ordering of the cases in the switch statement doesn't matter anymore in terms of performance, which helps readability. Both javac and Eclipse JDT generate the same instructions regardless of the order in the source code.
- The Eclipse 3.7.3 compiler produces more compact code for my examples compared to javac.
- The new switch statement has often been portrayed as a performance improvement over using multiple "if" statements. However, to evaluate if the "switch" expression matches a case label, the generated byte code traverses the input string twice: once in String.hashCode(), then again in String.equals(). This is certainly an improvement over the worst case in a chained "if" statement: an input string that matches a large prefix of most of the tested constants, when the number of compared strings is larger than 2. In the average case the actual performance depends on the length of strings involved, the length of common prefixes, and the overall distribution of each value in a representative data set. I'd argue that the performance improvement is in speeding up human understanding of the code base, not execution speed of the code.
Detailed Analysis
My sample code:
public String myswitch(String s) {
switch(s) {
case "a":
return "aa";
case "b":
return "bb";
default:
return null;
}
}
The Eclipse JDT compiler produces the following byte code:
// Method descriptor #22 (Ljava/lang/String;)Ljava/lang/String;
// Stack: 2, Locals: 3
public java.lang.String myswitch(java.lang.String s);
0 aload_1 [s]
1 dup
2 astore_2
3 invokevirtual java.lang.String.hashCode() : int [23]
6 lookupswitch default: 62
case 97: 32
case 98: 44
32 aload_2
33 ldc <String "a"> [29]
35 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [31]
38 ifne 56
41 goto 62
44 aload_2
45 ldc <String "b"> [35]
47 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [31]
50 ifne 59
53 goto 62
56 ldc <String "aa"> [37]
58 areturn
59 ldc <String "bb"> [39]
61 areturn
62 aconst_null
63 areturn
I was surprised that the compiler wouldn't emit a tableswitch (http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.tableswitch) instruction at pc 6, which allows for a more compact representation and implementation of the switch compared to lookupswitch (http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.tableswitch), since the values are consecutive. For this example it would have made a difference of 4 bytes, but then again, the chances of getting consecutive values for hash codes are slim.
Javac produces equivalent code for the detection of which branch to execute, but then creates another switch that defines which block to execute. The benefit of this approach is not clear to me, as it blows up code size by 30% for this simple example. The same observation about tableswitch vs lookupswitch applies to the code generated by javac.
// Method descriptor #31 (Ljava/lang/String;)Ljava/lang/String;
// Stack: 2, Locals: 4
public java.lang.String myswitch(java.lang.String s);
0 aload_1 [s]
1 astore_2
2 iconst_m1
3 istore_3
4 aload_2
5 invokevirtual java.lang.String.hashCode() : int [4]
8 lookupswitch default: 61
case 97: 36
case 98: 50
36 aload_2
37 ldc <String "a"> [5]
39 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [6]
42 ifeq 61
45 iconst_0
46 istore_3
47 goto 61
50 aload_2
51 ldc <String "b"> [7]
53 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [6]
56 ifeq 61
59 iconst_1
60 istore_3
61 iload_3
62 lookupswitch default: 94
case 0: 88
case 1: 91
88 ldc <String "aa"> [8]
90 areturn
91 ldc <String "bb"> [9]
93 areturn
94 aconst_null
95 areturn
No comments:
Post a Comment