Stefan's Server-Side Stuff: Java 7: A closer look at "switch" statements with Strings

I am working on an EcIipse plugin that upgrades Java source code to use new language constructs introduced in JDK7. So today I took a closer look today at the JLS7 extension to the switch statement (see http://docs.oracle.com/javase/specs/jls/se7/html/jls-14.html#jls-14.11), which allows String constants as case labels.

Summary

My observations from compiling a few examples and analyzing the byte code:

The compiled code selects the right block to execute with just two calls: a String.hashCode() and a String.equals() method call. In a regular "if" statement there is one String.equals() call for each case.
The ordering of the cases in the switch statement doesn't matter anymore in terms of performance, which helps readability. Both javac and Eclipse JDT generate the same instructions regardless of the order in the source code.
The Eclipse 3.7.3 compiler produces more compact code for my examples compared to javac.
The new switch statement has often been portrayed as a performance improvement over using multiple "if" statements. However, to evaluate if the "switch" expression matches a case label, the generated byte code traverses the input string twice: once in String.hashCode(), then again in String.equals(). This is certainly an improvement over the worst case in a chained "if" statement: an input string that matches a large prefix of most of the tested constants, when the number of compared strings is larger than 2. In the average case the actual performance depends on the length of strings involved, the length of common prefixes, and the overall distribution of each value in a representative data set. I'd argue that the performance improvement is in speeding up human understanding of the code base, not execution speed of the code.

Detailed Analysis

My sample code:

public String myswitch(String s) {

switch(s) {

case "a":

return "aa";

case "b":

return "bb";

default:

return null;

}

The Eclipse JDT compiler produces the following byte code:

// Method descriptor #22 (Ljava/lang/String;)Ljava/lang/String;

// Stack: 2, Locals: 3

public java.lang.String myswitch(java.lang.String s);

0 aload_1 [s]

1 dup

2 astore_2

3 invokevirtual java.lang.String.hashCode() : int [23]

6 lookupswitch default: 62

case 97: 32

case 98: 44

32 aload_2

33 ldc <String "a"> [29]

35 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [31]

38 ifne 56

41 goto 62

44 aload_2

45 ldc <String "b"> [35]

47 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [31]

50 ifne 59

53 goto 62

56 ldc <String "aa"> [37]

58 areturn

59 ldc <String "bb"> [39]

61 areturn

62 aconst_null

63 areturn

I was surprised that the compiler wouldn't emit a tableswitch (http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.tableswitch) instruction at pc 6, which allows for a more compact representation and implementation of the switch compared to lookupswitch (http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.tableswitch), since the values are consecutive. For this example it would have made a difference of 4 bytes, but then again, the chances of getting consecutive values for hash codes are slim.

Javac produces equivalent code for the detection of which branch to execute, but then creates another switch that defines which block to execute. The benefit of this approach is not clear to me, as it blows up code size by 30% for this simple example. The same observation about tableswitch vs lookupswitch applies to the code generated by javac.

// Method descriptor #31 (Ljava/lang/String;)Ljava/lang/String;

// Stack: 2, Locals: 4

public java.lang.String myswitch(java.lang.String s);

0 aload_1 [s]

1 astore_2

2 iconst_m1

3 istore_3

4 aload_2

5 invokevirtual java.lang.String.hashCode() : int [4]

8 lookupswitch default: 61

case 97: 36

case 98: 50

36 aload_2

37 ldc <String "a"> [5]

39 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [6]

42 ifeq 61

45 iconst_0

46 istore_3

47 goto 61

50 aload_2

51 ldc <String "b"> [7]

53 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [6]

56 ifeq 61

59 iconst_1

60 istore_3

61 iload_3

62 lookupswitch default: 94

case 0: 88

case 1: 91

88 ldc <String "aa"> [8]

90 areturn

91 ldc <String "bb"> [9]

93 areturn

94 aconst_null

95 areturn

Stefan's Server-Side Stuff

Wednesday, April 18, 2012

Java 7: A closer look at "switch" statements with Strings

Summary

Detailed Analysis

No comments:

Post a Comment

About Me