Java Bytecode

Java Virtual Machine Basic Properties

The Java Virtual Machine is a stack-based virtual machine. It has the following properties:

  • Parameters and local variables are stored in numbered slots, with parameters first. For instance methods, the this parameter always occupies the first slot (slot 0). Compilers can also introduce synthetic local variables that do not correspond to any variable in the source.

  • When an exception is caught, the current stack is cleared and the thrown exception is pushed onto the stack.

  • Each instruction must have a consistent prior stack size. This means that if an instruction can be reached from multiple instructions (for example, as a jump target and by falling through), each of those instructions must leave the stack at the same size. For instance, this is invalid:

    LDC 1
    ILOAD 1
    IF_ICMPEQ [if-equal]
    ILOAD 2
    [if-equal]
    RETURN

    because RETURN has an inconsistent prior stack size: if the IF_ICMPEQ branch is taken, the stack size is 0; otherwise, the stack size is 1 (because ILOAD 2 pushes an int onto the stack). The bytecode below is valid:

    LDC 1
    ILOAD 1
    IF_ICMPEQ [if-equal]
    ILOAD 2
    POP
    [if-equal]
    RETURN

    since the stack size prior to RETURN is 0 on both paths out of IF_ICMPEQ.

  • The stack is typed, and verification fails if we try to call a method with the wrong types on the stack. For instance, this is invalid:

    ILOAD 1
    INVOKESTATIC [MyClass.method(float) -> float]

    since we are calling a method that expects a float with an int. This is also invalid:

    ALOAD 1 [Object]
    INVOKEVIRTUAL [MyClass.method() -> void]

    since we are trying to call a MyClass method on an Object (which may or may not be a MyClass instance). To fix it, we need to cast it first:

    ALOAD 1 [Object]
    CHECKCAST [MyClass]
    INVOKEVIRTUAL [MyClass.method() -> void]
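
The consistent-stack-size rule described above can be illustrated with a small simulation. The sketch below is not how the JVM verifier is implemented; it only tracks the net stack effect of each instruction and checks that every instruction is always reached with the same stack size. The class StackDepthCheck, its Insn type, and the two example methods are hypothetical names introduced for illustration, and the per-instruction stack effects are entered by hand.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class StackDepthCheck {
    /** A simplified instruction: net stack effect, optional jump target, fall-through flag. */
    public static final class Insn {
        final String name;          // for readability only
        final int stackDelta;       // pushes minus pops
        final int jumpTarget;       // index of the jump target, or -1 if none
        final boolean fallsThrough; // false for instructions such as RETURN

        public Insn(String name, int stackDelta, int jumpTarget, boolean fallsThrough) {
            this.name = name;
            this.stackDelta = stackDelta;
            this.jumpTarget = jumpTarget;
            this.fallsThrough = fallsThrough;
        }
    }

    /** Returns true if every reachable instruction is always entered with the same stack size. */
    public static boolean hasConsistentDepths(Insn[] code) {
        if (code.length == 0) {
            return true;
        }
        int[] depthBefore = new int[code.length];
        Arrays.fill(depthBefore, -1); // -1 marks "not reached yet"
        Deque<Integer> worklist = new ArrayDeque<>();
        depthBefore[0] = 0;
        worklist.push(0);
        while (!worklist.isEmpty()) {
            int index = worklist.pop();
            Insn insn = code[index];
            int depthAfter = depthBefore[index] + insn.stackDelta;
            if (depthAfter < 0) {
                return false; // popped more values than were on the stack
            }
            int[] successors = { insn.jumpTarget, insn.fallsThrough ? index + 1 : -1 };
            for (int next : successors) {
                if (next < 0) {
                    continue; // no successor of this kind
                }
                if (next >= code.length) {
                    return false; // control would run off the end of the method
                }
                if (depthBefore[next] == -1) {
                    depthBefore[next] = depthAfter;
                    worklist.push(next);
                } else if (depthBefore[next] != depthAfter) {
                    return false; // two paths reach `next` with different stack sizes
                }
            }
        }
        return true;
    }

    /** The invalid listing from the text: RETURN is reached with size 0 and size 1. */
    public static Insn[] invalidExample() {
        return new Insn[] {
            new Insn("LDC 1", +1, -1, true),
            new Insn("ILOAD 1", +1, -1, true),
            new Insn("IF_ICMPEQ", -2, 4, true), // jumps to RETURN at index 4
            new Insn("ILOAD 2", +1, -1, true),
            new Insn("RETURN", 0, -1, false),
        };
    }

    /** The valid listing from the text: POP restores size 0 before RETURN. */
    public static Insn[] validExample() {
        return new Insn[] {
            new Insn("LDC 1", +1, -1, true),
            new Insn("ILOAD 1", +1, -1, true),
            new Insn("IF_ICMPEQ", -2, 5, true), // jumps to RETURN at index 5
            new Insn("ILOAD 2", +1, -1, true),
            new Insn("POP", -1, -1, true),
            new Insn("RETURN", 0, -1, false),
        };
    }

    public static void main(String[] args) {
        System.out.println(hasConsistentDepths(invalidExample())); // false
        System.out.println(hasConsistentDepths(validExample()));   // true
    }
}
```

The real verifier also tracks the type of every stack entry and local variable, which is what rejects the INVOKESTATIC and INVOKEVIRTUAL examples above; this sketch only models stack size.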

Java Virtual Machine instruction listing

A full explanation of the Java Virtual Machine’s instruction set can be found in the Oracle JVM documentation, and a summary is available in the "List of Java bytecode instructions" Wikipedia article.

Generating Java bytecode instructions

Java bytecode instructions are generated with the ASM library. A function is created in four steps: create a JavaPythonClassWriter instance (a ClassWriter that overrides getCommonSuperClass to prevent TypeNotPresentException errors), get the MethodVisitor for its functional interface method, adapt that method visitor with MethodVisitorAdapters.adapt (which sorts try blocks for us and gives us better error messages), and then generate the Java bytecode using the adapted method visitor. It looks like this in the code:

ClassWriter classWriter = new JavaPythonClassWriter(ClassWriter.COMPUTE_MAXS | ClassWriter.COMPUTE_FRAMES);
classWriter.visit(Opcodes.V11, Modifier.PUBLIC, internalClassName, null, Type.getInternalName(Object.class),
        new String[] { methodDescriptor.getDeclaringClassInternalName() });

// ... Create fields and constructor ...

MethodVisitor methodVisitor = classWriter.visitMethod(Modifier.PUBLIC,
        methodDescriptor.getMethodName(),
        methodDescriptor.getMethodDescriptor(),
        null,
        null);

// Keep the adapted visitor; all later calls must go through it
methodVisitor = MethodVisitorAdapters.adapt(methodVisitor, methodDescriptor);

// ... Visit parameters ...

methodVisitor.visitCode();

// ... Create bytecode ...

methodVisitor.visitMaxs(0, 0);
methodVisitor.visitEnd();

To generate a particular opcode instruction, identify what kind of instruction it is:

  • Instructions that conditionally or unconditionally jump to a label are created with visitJumpInsn, which takes the label to jump to.

  • Instructions that load or store parameters or local variables are created with visitVarInsn, which takes an int identifying the slot of the parameter/local variable. We can use the LocalVariableHelper on StackMetadata to obtain the slot number of parameters and local variables.

  • Instructions that operate only on the stack are created with visitInsn (for instance, popping a value or adding the top two ints on the stack). No parameter (besides the opcode to generate) is needed for these opcodes.

  • Instructions that call methods are created with visitMethodInsn, which takes:

    • The internal name of the declaring class. This can be retrieved with Type.getInternalName.

    • The method name.

    • The method descriptor string that describes the parameter and return types of the method. This can be retrieved with Type.getMethodDescriptor(Type returnType, Type... parameterTypes) (the Type object of a class can be retrieved with Type.getType).

    • A boolean that is true if the method is defined on an interface, false otherwise.

  • Instructions that read or set fields are created with visitFieldInsn, which takes:

    • The internal name of the class this field belongs to. This can be retrieved with Type.getInternalName.

    • The field’s name.

    • The field’s type descriptor string. This can be retrieved with Type.getDescriptor.

  • Instructions that operate on types (e.g. CHECKCAST, INSTANCEOF and NEW) are generated with visitTypeInsn. It takes the internal name of the class to cast/instanceof/new as an argument (this can be retrieved with Type.getInternalName).

  • The instruction that loads constants onto the stack is generated with visitLdcInsn; it takes the constant to load (for example, a boxed primitive such as Integer or Float, a Type object, or a String) as its only argument.
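
Several of the methods above take internal names and descriptor strings. To show what those strings look like, the sketch below builds them with JDK classes only, so it runs without ASM on the classpath. It is an illustration of the string formats, not a replacement for Type.getInternalName, Type.getDescriptor or Type.getMethodDescriptor; the class DescriptorStrings and its helper methods are hypothetical names that exist in neither ASM nor this project.

```java
import java.lang.invoke.MethodType;

public class DescriptorStrings {
    /** Internal name: the binary class name with '.' replaced by '/'. */
    public static String internalName(Class<?> type) {
        return type.getName().replace('.', '/');
    }

    /** Method descriptor: "(" + parameter descriptors + ")" + return descriptor. */
    public static String methodDescriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    /** Field (type) descriptor, e.g. "I" for int or "Ljava/lang/String;" for String. */
    public static String fieldDescriptor(Class<?> type) {
        // A no-argument method returning `type` has the descriptor "()" followed by
        // the type's own descriptor, so dropping the first two characters leaves it.
        return methodDescriptor(type).substring(2);
    }

    public static void main(String[] args) {
        System.out.println(internalName(Object.class));            // java/lang/Object
        System.out.println(methodDescriptor(float.class, float.class)); // (F)F
        System.out.println(methodDescriptor(void.class));          // ()V
        System.out.println(fieldDescriptor(String.class));         // Ljava/lang/String;
        System.out.println(fieldDescriptor(int.class));            // I
    }
}
```

For example, the MyClass.method(float) -> float call from the first section would be passed to visitMethodInsn with the descriptor "(F)F", and MyClass.method() -> void with "()V".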