Stefan's Server-Side Stuff

Tuesday, September 29, 2015

Resource Leaks and OpenJDK

On October 18th, 2009, I filed a bug report along with a fix on java.net concerning nine resource leaks in six JAX-P classes in the Java JDK. I came across the old bug report today, and thought I'd take a look if they have been fixed. Unfortunately, after six years, the issues still exist.

All issues follow the same bug pattern, as the code has been reused via copy and paste:

Properties props = new Properties();
props.load(new FileInputStream(file));

Why is this a problem? The FileInputStream closes the underlying file descriptor from the operating system in the finalize() method, right? It does indeed: eventually (see Section 12.6 of the Java Language Specification), and that's exactly the problem, especially in a server-like application. If the heap is large, the garbage collection may run infrequently; if the open rate is higher than the reclaim rate, the application may exhibit byzantine failures in the attempt to access OS resources. The stack trace won't help in this case, because it will show the resource allocation trace, not the where the leak occurred. Long story short: don't let this happen to you! The try-with-resource block solves this elegantly:

Properties props = new Properties();
try(FileInputStream fix = new FileInputStream(file)) {
props.load(new FileInputStream(file));

}

By following the call traces of the Properties.load method in OpenJDK 8, I counted 8 proper try-with-resources, 15 leaks, 13 leaks in the exception case, 18 instances of correct handling, and three instances of broken release-logic that end up releasing the file descriptor, but not other streams. In other words, almost half the call sites got it wrong. Luckily, the vast majority are in the CORBA, JAX-P, the repackaged aging xerces and tools modules, which aren't used as much. Unless your application uses XML, which is how I found it six years ago.

The instances I counted under "broken release logic" follow the bug pattern below (extracted from java.util.logging.LogManager):

try (final InputStream in = new FileInputStream(fname)) {
final BufferedInputStream bin = new BufferedInputStream(in);
readConfiguration(bin);
}

The precious FileInputStream is closed, but it's closed underneath the BufferedInputStream that wraps it, assuming that the latter doesn't use system resources itself. The correct way to code it is like this:

try (final BufferedInputStream bin = new BufferedInputStream(new FileInputStream(fname))) {
readConfiguration(bin);
}

Depending on how Oracle reacts to the new bug report, I may contribute more than just the JAX-P fixes.

Friday, April 27, 2012

Java 7: Multi-Catch Statements under the Looking Glass

In the last post I evaluated the benefits of using switch statements on string literals over nested if statements. Today, I am going to do the same for multi-catch blocks, also a recent addition to the JLS. This feature is formally described in the JLS 7, chapter 14.20 at http://docs.oracle.com/javase/specs/jls/se7/html/jls-14.html#jls-14.20.

Short Summary

Use multi-catch blocks everywhere you can, it reduces class file size, will reduce resident memory in the JVM when the class gets loaded, and most importantly, make the code easier to understand.

Impact on Code Readability

I produced a refactoring with my eclipse plugin that introduced multi catch blocks in all appropriate places in the entire OpenJDK 7 code base, producing a change set spanning 230 files, each with up to 5 combined catch expressions. As I am reviewing the changes for submission to the Openjdk team, it takes me 10 to 20 seconds to check that all catch blocks have the same sequence of statements, which is immediately obvious in the multi-catch construct.

If we assume $120 per developer hour and 20 seconds to understand that all catch blocks are identical, that's will set you back 66 cents every time a developer has to comprehend that code snippet.

A complication I faced when writing the Eclipse refactoring is that it is not sufficient to identify identical blocks, and simply combine the Exception into one catch clause. Consider this code snippet, for example:

1 try{
2 // some File Operation here
3 } catch (FileNotFoundException e) {
4 } catch( IOException e) {
5 }
Combining lines 3 and 4 into

3 } catch (FileNotFoundException | IOException e) {
produces a compile error, since FileNotFoundException is derived from IOException, and the latter is all that needs to be handled. So in some cases, the multi-catch refactoring produces simple traditional catch clauses with one exception only. The example above will end up reading like this, which was likely to be the original intend anyway:

1 try{
2 // some File Operation here
3 } catch( IOException e) {
4 }

Impact on Class File Size

What is the minimum benefit of using multi catch in term of class file size? The following code snippet uses reflection as an example of an API that relies heavily on checked exceptions, and would benefit most.

package p;
import java.lang.reflect.InvocationTargetException;

public class C {

public static void main(String[] args) {

try {
C.class.getDeclaredMethod("main", null).invoke(null, null);
} catch (NoSuchMethodException | SecurityException | IllegalAccessException | IllegalArgumentException | InvocationTargetException e) {
}
}
}
Eclipse JDT produces the following byte code:

// Method descriptor #15 ([Ljava/lang/String;)V
// Stack: 3, Locals: 2
public static void main(java.lang.String[] args);
0 ldc <Class p.C> [1]
2 ldc <String "main"> [16]
4 aconst_null
5 invokevirtual java.lang.Class.getDeclaredMethod(java.lang.String, java.lang.Class[]) : java.lang.reflect.Method [17]
8 aconst_null
9 aconst_null
10 invokevirtual java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) : java.lang.Object [23]
13 pop
14 goto 18
17 astore_1
18 return
Exception Table:
[pc: 0, pc: 14] -> 17 when : java.lang.NoSuchMethodException
[pc: 0, pc: 14] -> 17 when : java.lang.SecurityException
[pc: 0, pc: 14] -> 17 when : java.lang.IllegalAccessException
[pc: 0, pc: 14] -> 17 when : java.lang.IllegalArgumentException
[pc: 0, pc: 14] -> 17 when : java.lang.reflect.InvocationTargetException When using separate catch blocks, the amount of code increases by almost 50%, since the compiler doesn't detect that it produced the same code sequence for each catch block. This is the least benefit case, if the blocks are actually duplicate, the saving is much higher.

// Method descriptor #15 ([Ljava/lang/String;)V
// Stack: 3, Locals: 2
public static void main(java.lang.String[] args);
0 ldc <Class p.C> [1]
2 ldc <String "main"> [16]
4 aconst_null
5 invokevirtual java.lang.Class.getDeclaredMethod(java.lang.String, java.lang.Class[]) : java.lang.reflect.Method [17]
8 aconst_null
9 aconst_null
10 invokevirtual java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) : java.lang.Object [23]
13 pop
14 goto 34
17 astore_1
18 goto 34
21 astore_1
22 goto 34
25 astore_1
26 goto 34
29 astore_1
30 goto 34
33 astore_1
34 return
Exception Table:
[pc: 0, pc: 14] -> 17 when : java.lang.IllegalAccessException
[pc: 0, pc: 14] -> 21 when : java.lang.IllegalArgumentException
[pc: 0, pc: 14] -> 25 when : java.lang.reflect.InvocationTargetException
[pc: 0, pc: 14] -> 29 when : java.lang.NoSuchMethodException
[pc: 0, pc: 14] -> 33 when : java.lang.SecurityException
The moral of the story is: multi-catch blocks are not only syntactic sugar that result in the byte code, it's a feature that helps both the developer and the compiler to reduce code. Finally an optimization without a trade-off!

Wednesday, April 18, 2012

Java 7: A closer look at "switch" statements with Strings

I am working on an EcIipse plugin that upgrades Java source code to use new language constructs introduced in JDK7. So today I took a closer look today at the JLS7 extension to the switch statement (see http://docs.oracle.com/javase/specs/jls/se7/html/jls-14.html#jls-14.11), which allows String constants as case labels.

Summary

My observations from compiling a few examples and analyzing the byte code:

The compiled code selects the right block to execute with just two calls: a String.hashCode() and a String.equals() method call. In a regular "if" statement there is one String.equals() call for each case.
The ordering of the cases in the switch statement doesn't matter anymore in terms of performance, which helps readability. Both javac and Eclipse JDT generate the same instructions regardless of the order in the source code.
The Eclipse 3.7.3 compiler produces more compact code for my examples compared to javac.
The new switch statement has often been portrayed as a performance improvement over using multiple "if" statements. However, to evaluate if the "switch" expression matches a case label, the generated byte code traverses the input string twice: once in String.hashCode(), then again in String.equals(). This is certainly an improvement over the worst case in a chained "if" statement: an input string that matches a large prefix of most of the tested constants, when the number of compared strings is larger than 2. In the average case the actual performance depends on the length of strings involved, the length of common prefixes, and the overall distribution of each value in a representative data set. I'd argue that the performance improvement is in speeding up human understanding of the code base, not execution speed of the code.

Detailed Analysis

My sample code:

public String myswitch(String s) {

switch(s) {

case "a":

return "aa";

case "b":

return "bb";

default:

return null;

}

The Eclipse JDT compiler produces the following byte code:

// Method descriptor #22 (Ljava/lang/String;)Ljava/lang/String;

// Stack: 2, Locals: 3

public java.lang.String myswitch(java.lang.String s);

0 aload_1 [s]

1 dup

2 astore_2

3 invokevirtual java.lang.String.hashCode() : int [23]

6 lookupswitch default: 62

case 97: 32

case 98: 44

32 aload_2

33 ldc <String "a"> [29]

35 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [31]

38 ifne 56

41 goto 62

44 aload_2

45 ldc <String "b"> [35]

47 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [31]

50 ifne 59

53 goto 62

56 ldc <String "aa"> [37]

58 areturn

59 ldc <String "bb"> [39]

61 areturn

62 aconst_null

63 areturn

I was surprised that the compiler wouldn't emit a tableswitch (http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.tableswitch) instruction at pc 6, which allows for a more compact representation and implementation of the switch compared to lookupswitch (http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.tableswitch), since the values are consecutive. For this example it would have made a difference of 4 bytes, but then again, the chances of getting consecutive values for hash codes are slim.

Javac produces equivalent code for the detection of which branch to execute, but then creates another switch that defines which block to execute. The benefit of this approach is not clear to me, as it blows up code size by 30% for this simple example. The same observation about tableswitch vs lookupswitch applies to the code generated by javac.

// Method descriptor #31 (Ljava/lang/String;)Ljava/lang/String;

// Stack: 2, Locals: 4

public java.lang.String myswitch(java.lang.String s);

0 aload_1 [s]

1 astore_2

2 iconst_m1

3 istore_3

4 aload_2

5 invokevirtual java.lang.String.hashCode() : int [4]

8 lookupswitch default: 61

case 97: 36

case 98: 50

36 aload_2

37 ldc <String "a"> [5]

39 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [6]

42 ifeq 61

45 iconst_0

46 istore_3

47 goto 61

50 aload_2

51 ldc <String "b"> [7]

53 invokevirtual java.lang.String.equals(java.lang.Object) : boolean [6]

56 ifeq 61

59 iconst_1

60 istore_3

61 iload_3

62 lookupswitch default: 94

case 0: 88

case 1: 91

88 ldc <String "aa"> [8]

90 areturn

91 ldc <String "bb"> [9]

93 areturn

94 aconst_null

95 areturn