Java Bytecode
Java Virtual Machine Basic Properties
The Java Virtual Machine is a stack based virtual machine. It has the following properties:
-
Parameters and local variables are stored in slots, with parameters being first. This also includes the
this
parameter, which is always in the first slot for instance methods. Compilers can also introduce synthetic local variables that do not correspond to any variable in the source. -
On exception, the current stack is cleared and the thrown exception is pushed to the stack.
-
Each instruction must have a consistent prior stack size. This means if an instruction is the target of multiple instructions, each instruction that have it as a target must have the same post stack size. For instance, this is invalid:
LDC 1 ILOAD 1 if_icmpeq [if-equal] ILOAD 2 [if-equal] RETURN
Because RETURN have an inconsistent prior stack size: if the
if_icmpeq
branch is taken, the expected stack size is 0; otherwise, the expected stack size is 1 (becauseILOAD 2
push anint
to the stack). The bytecode below is valid:LDC 1 ILOAD 1 if_icmpeq [if-equal] ILOAD 2 POP [if-equal] RETURN
since the expected stack size prior to RETURN is 0 for both branches of
if_icmpeq
. -
The stack is typed, and will cause verification errors if we were to try to call a method with invalid types on the stack. For instance, this is invalid:
ILOAD 1 INVOKESTATIC [MyClass.method(float) -> float]
since we are calling a method that expects a
float
with anint
. This is also invalid:ALOAD 1 [Object] INVOKEVIRTUAL [MyClass.method() -> void]
Since we are trying to call a
MyClass
method onObject
(which may or may not be aMyClass
instance). To fix it, we need to cast it first:ALOAD 1 [Object] CHECKCAST [MyClass] INVOKEVIRTUAL [MyClass.method() -> void]
Java Virtual Machine instruction listing
A full explanation of the Java Virtual Machine’s instruction set can be viewed at the Oracle JVM Documentation and a summary can be read on the List of Java bytecode instruction wikipedia article.
Generating Java bytecode instructions
Java bytecode instructions are generated with the ASM library.
Functions are created by creating a JavaPythonClassWriter
(which is a ClassWriter
that overrides getCommonSuperClass
to prevent TypeNotPresent
errors) instance,
getting the MethodVisitor for its functional interface method,
adapting the method visitor with MethodVisitorAdapters.adapt
(which sorts try blocks for us and gives us better error messages),
and then generating the Java bytecode using that method visitor.
It looks like this in the code:
ClassWriter classWriter = new JavaPythonClassWriter(ClassWriter.COMPUTE_MAXS | ClassWriter.COMPUTE_FRAMES);
classWriter.visit(Opcodes.V11, Modifier.PUBLIC, internalClassName, null, Type.getInternalName(Object.class),
new String[] { methodDescriptor.getDeclaringClassInternalName() });
// ... Create fields and constructor ...
MethodVisitor methodVisitor = classWriter.visitMethod(Modifier.PUBLIC,
methodDescriptor.getMethodName(),
methodDescriptor.getMethodDescriptor(),
null,
null);
MethodVisitorAdapters.adapt(methodVisitor, methodDescriptor);
// ... Visit parameters ...
methodVisitor.visitCode();
// ... Create bytecode ...
methodVisitor.visitMaxs(0, 0);
methodVisitor.visitEnd();
To generate a particular opcode instruction, identify what kind of instruction it is:
-
Instructions that either conditionally or unconditionally jump to a label are created with
visitJumpInsn
, which take the label to either conditionally or unconditionally jump to. -
Instructions that load parameters or local variables are created with
visitVarInsn
, which take anint
to identify which slot to load the parameter/local variable from. We can use theLocalVariableHelper
onStackMetadata
to obtain the slot number of parameters and local variables. -
Instructions that operate on the stack are created with
visitInsn
(for instance, popping a value or adding the top twoint
on the stack). No parameter (beside the opcode to generate) is needed for these opcodes. -
Instructions that call methods are created with
visitMethodInsn
, which takes.-
The internal name of the declaring class. This can be retrieved with
Type.getInternalName
. -
The method name.
-
The method descriptor string that describe the parameter and return types of the method. This can be received with
Type.getMethodDescriptor(Type returnType, Type… parameterTypes)
(theType
object of a class can be received withType.getType
). -
A boolean that is true if the method is defined on an interface, false otherwise.
-
-
Instructions that read or set fields are created with
visitFieldInsn
, which takes-
The internal name of the class this field belongs to. This can be retrieved with
Type.getInternalName
. -
The field’s name.
-
The field’s type descriptor string. This can be received with
Type.getDescriptor
.
-
-
Instructions that operate on types (i.e.
CHECKCAST
,INSTANCEOF
andNEW
) are generated withvisitTypeInsn
. It takes the internal name of the class to cast/instanceof/new as an argument (this can be retrieved withType.getInternalName
). -
The instruction that load constants onto the stack can be generated with
visitLDC
; it takes the constant to load (which must be a primitive type, Type object, or a String) as its only argument.