Removing duplicate pushes to the stack need not be limited to constants. If we find instances where the same variable is pushed to the stack we can eliminate them too. Consider the following code, which uses an instance variable as a method argument:
public class test {
int k = 0 ;
public void f(int i, int j) {
}
public void g() {
f(k, k);
}
}
Compiling and disassembling this we find that the method call has been rendered as:
aload_0 aload_0 getfield test/k I aload_0 getfield test/k I invokevirtual test/f(II)V
The peephole optimiser can detect the duplicated aload/getfield
pair and replace the second occurrence with a dup. Since the
aload_0 is a one byte instruction and getfield
takes three bytes, this results in a net saving of three bytes.
In this case the object owning the instance variable is 'this', which is
available to the method f() as its first local variable.
Hence the aload_0 instruction. Only the first four local
variables can be accessed using this efficient form of aload.
If the reference to the object owning the instance variable were not in one
of the first four local variables a two byte aload would
need to be used. In extreme cases, with more than 256 local variables,
there is a three byte aload.
For example:
public class test {
int k = 0 ;
public void f(int i, int j) {
}
public void g() {
test a, b, c, d ;
a = b = c = d = new test() ;
f(d.k, d.k);
}
}
In this case the reference to the object owning the instance variable being pushed on the stack is in the fourth local variable, resulting in the slightly different assembly language code for the method call:
aload_0 aload 4 getfield test/k I aload 4 getfield test/k I invokevirtual test/f(II)V
All of the possible cases can be handled by one optimisation rule:
aload%1
getfield %2 I
aload%1
getfield %2 I
=
aload%1
getfield %2 I
dup
However, the examples above only used integer variables, where the last
operand of getfield is I. There are a
further seven case to handle: byte, character, float, object, short,
boolean and array variables. In these cases the second operands are
B, C, F, L...;,
S, Z, and [... respectively.
Moreover, there are two additional types, long and double (operands
J and D), which
require slightly different treatment. Since longs and doubles consume
two words on the stack we have to use the dup2 instruction
to replace the second aload/getfield pair.
Once again I have tested the results of the optimisations by measuring
the time taken to perform a million calls to the method g().
The time for the same number of method calls without any arguments
has been subtracted.
Variable Bytes Unoptimised Optimised
saved time (ms) time (ms)
int (aload_0) 3 1272 812
int (aload 15) 4 1380 871
long (aload_0) 3 1349 842
long (aload 15) 4 1973 921
Clearly, in all cases there are savings in the number of bytes and in execution time.