JPyInterpreter Architecture

The architecture of JPyInterpreter is composed of several components spread across Python and Java. The Python components are:

  • jvm_setup.py, which sets up the JVM and the hooks JPyInterpreter uses to communicate with CPython (to look up packages, call native methods, and converting CPython object to Java object and vice-versa).

  • python_to_java_bytecode_translator.py, which acts as the Python’s frontend to JPyInterpreter. Users supply a CPython function to translate (and an optional Java functional interface it implements) to translate_python_bytecode_to_java_bytecode, which firsts converts that function to a PythonCompiledFunction, and then passes it to PythonBytecodeToJavaBytecodeTranslator to translate the function. It also provides the translate_python_class_to_java_class function, which is given a user supplied CPython class, converts it to a PythonCompiledClass, and passes it to PythonClassTranslator to translate the class.

The Java components are:

  • PythonBytecodeToJavaBytecodeTranslator, which acts as the entrypoint for function translation. It is responsible for:

    • Setting up the JavaPythonClassWriter and MethodVisitor used for bytecode generation.

    • Creating and setting fields on the generated class objects (See PythonBytecodeToJavaBytecodeTranslator for details).

    • Do the leg work of moving/translating Java parameters (ex: int) into PythonLikeObject.

    • Setup cells variables.

    • Delegating to PythonGeneratorTranslator when it detects the function being translated is a generator.

    • Using FlowGraph to calculate the StackMetadata of each instruction.

    • Calling the implement method on every opcode in the Opcode list with the StackMetadata for the opcode and the FunctionMetadata for the overall function.

  • PythonGeneratorTranslator is like PythonBytecodeToJavaBytecodeTranslator but for generators. It breaks a single generator function into multiple advance functions, and generates each advance function bytecode independently.

  • FlowGraph, which calculates the StackMetadata that corresponds to each Opcode. It is responsible for unifying the StackMetadata from all jump sources for each Opcode that is a jump target. For instance, if two sources with the same target have post StackMetadata of …​ int and …​ bool respectively, FlowGraph will unify that to …​ int (since bool is a subclass of int, for better or worse).

  • StackMetadata stores metadata about the stack and local variables. Each Opcode get its own StackMetadata instance. It is mostly used to perform optimizations; for instance, if we detect the top two items on the stack are int and int for the BINARY_ADD instruction, we can change the (normally complex due to Python semantics) BINARY_ADD bytecode into a single method call.

  • FunctionMetadata stores metadata about the function (for instance, the MethodVisitor to use to generate bytecode). Each Opcode gets the same FunctionMetadata instance.

  • Opcode are the interface between CPython opcodes and the Implementors. Each describe a particular operation, and usually (but not always) correspond to a CPython opcode. Some CPython opcodes map to the same Opcode implementation.

  • Implementors are responsible for generating the Java bytecode corresponding to CPython bytecode. They can be found in the implementors package.

The overall process of compiling a function looks like this:

A diagram showing how JPyInterpreter classes interact

Types

The builtin types for JPyInterpreter can be found in the types package. They all implement PythonLikeObject, the interface the bytecode uses to represent arbitrary objects. If type flow analysis determines a more specific type can be used (via StackMetadata), the more specific type is used directly instead. PythonLikeObject have several methods:

  • __getAttributeOrNull: returns the attribute with the given name if it exists, otherwise returns null. This is NOT __getattribute__ (which is implemented by $method$__getattribute__ instead). This is more akin to self.__dict__[attribute]. The default $method$__getattribute__ uses it to get the attribute (with additional magic to handle descriptors, see the Python descriptor tutorial for more detail).

  • __getAttributeOrError: returns the attribute with the given name if it exists, otherwise raises AttributeError. Used in bytecode generation to lookup methods on types.

  • __setAttribute: sets the attribute with the given name to the given value. This is NOT __setattr__ (which is implemented by $method$__setattr__ instead). This is more akin to self.__dict__[attribute] = value. The default $method$__setattr__ uses it to set the attribute.

  • __deleteAttribute: deletes the attribute with the given name. This is NOT __delattr__ (which is implemented by $method$__delattr__ instead). This is more akin to del self.__dict__[attribute]. The default $method$__delattr__ uses it to delete the attribute.

  • __getType: returns the type of the object. Used to implement type(object).

  • __getGenericType: returns the generic type of the object (ex: list[int]). Used for typeflow analysis.

  • $method$<method-name>: the builtin methods on every object in Python. The $method$<method-name> naming is to allow custom classes to override them (custom classes prefix method names with $method$ to not clash with Java method names).

PythonBytecodeToJavaBytecodeTranslator

The entrypoint for function translation, and the glue code for the many subsystems of the translator. It is responsible for setting up the JavaPythonClassWriter, MethodVisitor and configuring the class' fields. The fields it configures are:

  • co_consts: Static; a List<PythonLikeObject> that stores constants used in the bytecode.

  • co_names: Static; a List<PythonString> that stores names used in the bytecode.

  • co_varnames: Static; a List<PythonString> that stores variable names used in the bytecode.

  • __globals__: Static; a Map<String, PythonLikeObject> used to read and store globals.

  • __spec_getter__: Static; a BiFunction<PythonLikeTuple, PythonLikeDict, ArgumentSpec<PythonLikeObject>> that maps default arguments (which are per function) to an ArgumentSpec that can be used to set parameters.

  • __defaults__: Instance; a PythonLikeTuple that stores default positional arguments.

  • __kwdefaults__: Instance; a PythonLikeDict that stores default keyword arguments.

  • __annotations__: Instance; a PythonLikeDict that stores type annotations on the function.

  • __closure__: Instance; a PythonLikeTuple that stores the function’s closure (i.e. the free variable cells).

  • __qualname__: Instance; a PythonString that stores the qualified name of the function.

  • __spec__: Instance; an ArgumentSpec that can be used to receive parameter (and correctly handle default arguments).

  • __interpreter__: Instance; the PythonInterpreter this function runs in (used to perform imports and lookup unknown globals).

If a Python function cannot be translated for any reason (ex: native code), the following fields are also added:

  • __code__: Static; an opaque pointer to the function’s CPython code object (used in the construct to make the wrapped CPython function).

  • __function__: Instance; a PythonObjectWrapper that wraps the CPython function (used to call the CPython function).