On the previous page we considered how to optimise a store followed by a fetch using the same local or class variable. We might consider using the same procedure for instance variables. Take, for example, this code:
public class test {
int i ;
public void g() {
int j ;
i = 9 ;
j = i ;
}
}
When this is compiled and disassembled we find that the last two assignments generate this assembly language code:
aload_0
bipush 9
putfield test/i I
aload_0
getfield test/i I
istore_1
To optimise this we have to identify the store and the subsequent
fetch, and ensure that they refer to the same variable. The problem is
that the instructions making up the store (aload_0 and
putfield test/i I) are wrapped around the instruction (or
instructions) to get the value to be stored (bipush 9 in this
case).
At first I thought that this rule would work:
aload%1
%2
putfield %3 I
aload%1
getfield %3 I
=
aload%1
%2
; increase stack limit by 1
dup_x1
putfield %3 I
The dup_x1 instruction takes the item on top of the stack
(which is 9 in the above example) and puts a copy of it under the
second-to-top item. When the top item and the object reference
below it are consumed by the putfield the duplicate of the
top item is left on the stack as if it had been fetched.
The number of bytes saved will depend on whether the aload
is the one byte or two byte form. Replacing the aload (one or
two bytes) and putfield (three bytes) with a dup_x1
(one byte) will save three or four bytes. (You may have noticed that
I haven't considered the wide form of any instructions. I
don't think they're sufficiently common to merit much attention.)
Now, the code to obtain the value to be stored can be more
than just a bipush. In some cases
this can cause the optimisation rule to generate incorrect code. For
example:
public class test {
public int i ;
public int f() {
return 8 ;
}
public void g() {
test b = new test() ;
int j ;
b.i = f() ;
j = i ;
}
}
Here the value to be stored is the return value of a method call.
Also, it is being stored in the instance variable, i, of one
test object, but the value that's being fetched in the next
statement is from the same variable in a different test object.
Here's what javac generates for the two assignment statements:
aload_1
aload_0
invokevirtual test/f()I
putfield test/i I
aload_0
getfield test/i I
istore_2
The code to fetch the value to be stored consists of two instructions
(aload_0 and invokevirtual test/f()I). The
simple optimisation rule generates a false match. This results in the
incorrect code:
aload_1
aload_0
invokevirtual test/f()I
.limit stack 3
dup_x1
putfield test/i I
istore_2
The correct method is called (this.f()) and the return value is
put under the first two stack items. Then the value is stored in the
correct instance variable (b.i), leaving the value at the
top of the stack. But it isn't the value we wanted! It should be
this.i, but instead it's b.i,
The problem is that the optimisation rule isn't sufficiently precise.
The '%2' should only match instructions which put
an integer value on the stack. To make the rule work we need to specify
examples of such instructions explicitly. The following rule will
certainly work:
aload%1
bipush %2
putfield %3 I
aload%1
getfield %3 I
=
aload%1
bipush %2
; increase stack limit by 1
dup_x1
putfield %3 I
In similar vein we could use iconst_%2, sipush %2,
ldc %2 and iload%2. In each case we have a
single instruction which pushes an integer onto the stack. Actually,
ldc %2 can push a float or a String, but it's use here is
clear from the context.
Booleans are a simpler case, as they only require
iconst_%2 and iload%2 to be handled.
It becomes quite tedious to have to consider all these special cases. The original (but wrong) substitution at least had the advantage of being concise. After adding these special cases for integers and booleans to the rules file I got bored and haven't handled any other types.
There some other special cases which are sufficiently common to merit attention. Firstly, there is the construction of an object followed by a call to one of its methods. This seems to be quite a common sequence, as objects are first created, stored to an instance variable and then used.
public class test {
public test t ;
public void f() {
}
public void g() {
t = new test() ;
t.f() ;
}
}
The code generated by javac for this is:
aload_0
new test
dup
invokespecial test/<init>()V
putfield test/t Ltest;
aload_0
getfield test/t Ltest;
invokevirtual test/f()V
which can be rewritten as:
aload_0
new test
dup
invokespecial test/<init>()V
.limit stack 4
dup_x1
putfield test/t Ltest;
invokevirtual test/f()V
aload%1
aload%2
putfield %3 L%4;
aload%1
getfield %3 L%4;
=
aload%1
aload%2
; increase stack limit by 1
dup_x1
putfield %3 L%4;
Once again I have made some performance measurements:
Instructions Bytes Unoptimised Optimised
saved time (ms) time (ms)
bipush 3 1921 1278
new object 3 1132 587
1,500,000 iterations were used in the first case, 500,000 in the second.
Results for sipush and ldc are much the same
as for bipush and are not shown. Actually, I'd have expected
the timings for the new object case to be similar as well, but even
allowing for the different number of iterations they clearly aren't.