Wednesday, April 18, 2012

Java 7: A closer look at "switch" statements with Strings

I am working on an EcIipse plugin that upgrades Java source code to use new language constructs introduced in JDK7. So today I took a closer look today at the JLS7 extension to the switch statement (see http://docs.oracle.com/javase/specs/jls/se7/html/jls-14.html#jls-14.11), which allows String constants as case labels.

Summary

My observations from compiling a few examples and analyzing the byte code:
  1. The compiled code selects the right block to execute with just two calls: a String.hashCode() and a String.equals() method call. In a regular "if" statement there is one String.equals() call for each case. 
  2. The ordering of the cases in the switch statement doesn't matter anymore in terms of performance, which helps readability. Both javac and Eclipse JDT generate the same instructions regardless of the order in the source code. 
  3. The Eclipse 3.7.3 compiler produces more compact code for my examples compared to javac.
  4. The new switch statement has often been portrayed as a performance improvement over using multiple "if" statements. However, to evaluate if the "switch" expression matches a case label, the generated byte code traverses the input string twice: once in String.hashCode(), then again in String.equals(). This is certainly an improvement over the worst case in a chained "if" statement: an input string that matches a large prefix of most of the tested constants, when the number of compared strings is larger than 2. In the average case the actual performance depends on the length of strings involved, the length of common prefixes, and the overall distribution of each value in a representative data set. I'd argue that the performance improvement is in speeding up human understanding of the code base, not execution speed of the code.

Detailed Analysis

My sample code:
public String myswitch(String s) {
switch(s) {
case "a":
return "aa";
case "b":
return "bb";
default:
return null;
}
}

The Eclipse JDT compiler produces the following byte code:
  // Method descriptor #22 (Ljava/lang/String;)Ljava/lang/String;
  // Stack: 2, Locals: 3
  public java.lang.String myswitch(java.lang.String s);
     0  aload_1 [s]
     1  dup
     2  astore_2
     3  invokevirtual java.lang.String.hashCode() : int [23]
     6  lookupswitch default: 62
          case 97: 32
          case 98: 44
    32  aload_2
    33  ldc <String "a"> [29]
    35  invokevirtual java.lang.String.equals(java.lang.Object) : boolean [31]
    38  ifne 56
    41  goto 62
    44  aload_2
    45  ldc <String "b"> [35]
    47  invokevirtual java.lang.String.equals(java.lang.Object) : boolean [31]
    50  ifne 59
    53  goto 62
    56  ldc <String "aa"> [37]
    58  areturn
    59  ldc <String "bb"> [39]
    61  areturn
    62  aconst_null
    63  areturn

I was surprised that the compiler wouldn't emit a tableswitch (http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.tableswitch) instruction at pc 6, which allows for a more compact representation and implementation of the switch compared to lookupswitch (http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.tableswitch), since the values are consecutive. For this example it would have made a difference of 4 bytes, but then again, the chances of getting consecutive values for hash codes are slim.

Javac produces equivalent code for the detection of which branch to execute, but then creates another switch that defines which block to execute. The benefit of this approach is not clear to me, as it blows up code size by 30% for this simple example. The same observation about tableswitch vs lookupswitch applies to the code generated by javac.

  // Method descriptor #31 (Ljava/lang/String;)Ljava/lang/String;
  // Stack: 2, Locals: 4
  public java.lang.String myswitch(java.lang.String s);
     0  aload_1 [s]
     1  astore_2
     2  iconst_m1
     3  istore_3
     4  aload_2
     5  invokevirtual java.lang.String.hashCode() : int [4]
     8  lookupswitch default: 61
          case 97: 36
          case 98: 50
    36  aload_2
    37  ldc <String "a"> [5]
    39  invokevirtual java.lang.String.equals(java.lang.Object) : boolean [6]
    42  ifeq 61
    45  iconst_0
    46  istore_3
    47  goto 61
    50  aload_2
    51  ldc <String "b"> [7]
    53  invokevirtual java.lang.String.equals(java.lang.Object) : boolean [6]
    56  ifeq 61
    59  iconst_1
    60  istore_3
    61  iload_3
    62  lookupswitch default: 94
          case 0: 88
          case 1: 91
    88  ldc <String "aa"> [8]
    90  areturn
    91  ldc <String "bb"> [9]
    93  areturn
    94  aconst_null
    95  areturn

No comments:

Post a Comment