Consider the following code:
public class test {
public void g() {
int i1, i2, i3, i4 ;
i1 = i2 = i3 = 0 ;
i4 = 9 ;
i1 = i4;
}
}
The last two assignment statements result in this assembly language code:
bipush 9
istore 4
iload 4
istore_1
That is, we push 9 on the stack, store it in local variable 4, fetch the contents of local variable 4 (which must be 9) and then store that to local variable 1. This sequence of instructions is equivalent to the shorter:
bipush 9
dup
istore 4
istore_1
Or rather, it's equivalent apart from its effect on the stack. The
second sequence uses one more word on the stack than the first. One of
the security mechanisms implemented by the Java Virtual Machine is a
check on the number of words of stack used. The compiler encodes the
maximum stack size into every method. In Jasmin
this is represented using the '.limit stack' directive which
appears at the start of each method in the D-Java output.
Now, the compiler can work out the stack size from its knowledge of the entire method. Our peephole optimiser doesn't have this advantage. It may be that the instructions we're trying to optimise occur when the stack is already at its maximum allowed height. Adding an extra item to the stack will then cause a runtime exception.
Since we don't know for sure if this will happen we have to err on the
side of caution and increase the stack limit. However, when we're performing
this optimisation we've already
processed the line containing the .limit directive. To
work around this I use the following optimisation rule:
istore %1
iload %1
=
dup
; increase stack limit by 1
istore %1
Then, after the optimisation is complete, the jopt script
uses awk to replace the comment with a suitable .limit
directive. Fortunately, Jasmin uses the last .limit
directive that it finds in any method. In this case the final optimised
code is:
bipush 9
dup
.limit stack 3
istore 4
istore_1
Incidentally, the Java code in the example above could be rewritten as:
i1 = i4 = 9 ;
In this case the compiler generates the same code as our final optimisation, except that it knows that the stack limit of 2 is sufficient.
The example used a local integer variable. The same sort of technique can
be applied to other types of local variable, though dup2 has
to be used instead of dup for longs and doubles. Similar
optimisations are also possible for class variables. Instance variables
are a different matter and are considered on the next page.
Here are the results of timing tests for 1.5 million iterations:
Instructions Bytes Unoptimised Optimised
saved time (ms) time (ms)
put/getstatic Z 2 864 760
put/getstatic C 2 879 762
put/getstatic I 2 869 770
put/getstatic F 2 869 761
put/getstatic L; 2 890 773
put/getstatic S 2 889 762
put/getstatic [ 2 867 763
put/getstatic J 2 985 878
put/getstatic D 2 958 870
istore/iload 1 667 657
fstore/fload 1 665 653
astore/aload 1 728 657
lstore/lload 1 679 747
dstore/dload 1 749 748
The only combination which doesn't give improved performance after optimisation is the long local variable.