JVM Internals: The ClassLoader
Welcome to a series of writings where I’ll be going over JVM internals and how different parts work. Today we’ll be covering how the classloader works.
The JVM’s ClassLoader
A ClassLoader is a fundamental part of the VM. It’s responsible for loading, linking and initialising classes at runtime. It’s what allows a Java application to load code from various sources, such as the filesystem, network locations and more. The class loading mechanism is built on three principles: the delegation model, unique namespaces and visibility constraints.
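The delegation model is easy to observe from plain Java. The sketch below walks the parent chain of a loader up to the bootstrap loader, which Java code sees as null. LoaderChain and chain are names made up for this example, and the loader class names printed will differ between JDK versions.

```java
import java.util.ArrayList;
import java.util.List;

// Walks the parent chain of a class loader to show the delegation hierarchy.
final class LoaderChain {

    // Returns the class name of each loader from `start` upwards, ending with
    // a marker for the bootstrap loader (represented as null in Java code).
    static List<String> chain(ClassLoader start) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        names.add("bootstrap (null)");
        return names;
    }

    public static void main(String[] args) {
        chain(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        // Core classes are defined by the bootstrap loader, so this prints null.
        System.out.println(String.class.getClassLoader());
    }
}
```

A load request normally travels up this chain first, and a loader only tries to find the class itself when its parent cannot.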
The Entry Point
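The process of defining a class from a byte array begins on the Java side with java.lang.ClassLoader.defineClass. That method is not itself native; it delegates to private native helpers such as defineClass1, which enter HotSpot through JVM_DefineClassWithSource. Native code can reach the same machinery through the JNI DefineClass function, and that is the entry point traced here: JNI_DefineClass at line 275 of src/hotspot/share/prims/jni.cpp. Both paths end up in the same SystemDictionary call.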
// Convert Java arguments to VM native format
TempNewSymbol class_name = nullptr;
if (name != nullptr) {
  // Create a Symbol* from the UTF-8 class name string
  // This gets interned in the global SymbolTable
  class_name = SymbolTable::new_symbol(name);
}
// Get a GC-safe handle to the ClassLoader oop
Handle class_loader(THREAD, JNIHandles::resolve(loaderRef));
// Wrap the raw byte buffer in a stream for parsing
ClassFileStream st((u1*)buf, bufLen, nullptr, ClassFileStream::verify);
// Hand off to SystemDictionary to handle class creation
Klass* k = SystemDictionary::resolve_from_stream(&st, class_name,
                                                 class_loader,
                                                 cl_info,
                                                 CHECK_NULL);
This function acts as a bridge from Java to HotSpot and its internals. It first converts the arguments into the VM’s native format: the const char* name parameter becomes a Symbol*, and the jobject loaderRef is resolved into a GC-safe Handle pointing to the internal classloader oop (ordinary object pointer). It then wraps the raw class file bytes in a ClassFileStream for reading.
Finally, it hands class creation and loading off to SystemDictionary, which handles class management.
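To see this path exercised from Java, here is a minimal sketch that builds a tiny class file by hand and passes it to defineClass. ByteLoader and emptyClass are hypothetical names for this example, and the hand-written bytes describe an empty public class, not the output of javac. Defining the same bytes in two loaders also shows the unique namespace principle.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class DefineClassDemo {

    // Loader that exposes the protected ClassLoader.defineClass method.
    static final class ByteLoader extends ClassLoader {
        ByteLoader(ClassLoader parent) { super(parent); }

        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Emits a minimal class file: a public class with no fields or methods.
    static byte[] emptyClass(String internalName) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version (Java 8)
            out.writeShort(5);                                   // constant_pool_count = entries + 1
            out.writeByte(1); out.writeUTF(internalName);        // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(0);                                   // methods_count
            out.writeShort(0);                                   // attributes_count
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = emptyClass("Example");
        ClassLoader parent = DefineClassDemo.class.getClassLoader();
        Class<?> a = new ByteLoader(parent).define("Example", bytes);
        Class<?> b = new ByteLoader(parent).define("Example", bytes);
        System.out.println(a.getName());  // Example
        System.out.println(a == b);       // false: each loader has its own namespace
    }
}
```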
SystemDictionary
The SystemDictionary is basically a big registry for all loaded classes. It’s a hash table that maps a (ClassLoaderData*, Symbol*) pair to a Klass*. Our call from jni.cpp lands in SystemDictionary::resolve_from_stream at line 388 of systemDictionary.cpp, which then calls resolve_class_from_stream.
// Get or create ClassLoaderData for this loader
ClassLoaderData* loader_data = register_loader(class_loader);
// For non-parallel-capable loaders, acquire the loader lock
// This prevents concurrent class definition
Handle lockObject = get_loader_lock_or_null(class_loader);
ObjectLocker ol(lockObject, THREAD);
// Parse the bytecode stream into an InstanceKlass
InstanceKlass* k = KlassFactory::create_from_stream(st,
                                                    class_name,
                                                    loader_data,
                                                    cl_info,
                                                    CHECK_NULL);
// Add the new class to the system
define_instance_class(k, class_loader, THREAD);
This function sets up the definition process. For older non-parallel-capable classloaders, it acquires a monitor on the ClassLoader oop itself via ObjectLocker ol(lockObject, THREAD). This prevents class definition for that specific loader from happening concurrently, mainly to avoid race conditions.
The function’s main job is to hand off the heavy lifting of parsing to a factory pattern by calling KlassFactory::create_from_stream. This is where the raw bytes actually get processed. After the factory successfully returns a newly created InstanceKlass*, the SystemDictionary finishes the process by calling define_instance_class, which puts the new class into the system.
Parsing the Class File
The KlassFactory uses a ClassFileParser to transform the byte stream into a structured in-memory representation. The main logic lives in ClassFileParser::parse_stream, which carefully follows the .class file specification.
// Read and verify magic number (0xCAFEBABE)
stream->guarantee_more(8, CHECK);
const u4 magic = stream->get_u4_fast();
guarantee_property(magic == JAVA_CLASSFILE_MAGIC,
                   "Incompatible magic value");
// Read version numbers
_minor_version = stream->get_u2_fast();
_major_version = stream->get_u2_fast();
// Read constant pool size and allocate
u2 cp_size = stream->get_u2_fast();
_cp = ConstantPool::allocate(_loader_data, cp_size, CHECK);
// Parse all constant pool entries
parse_constant_pool_entries(stream, _cp, cp_size, CHECK);
It reads the 0xCAFEBABE magic number first (line 83 of classFileParser.cpp), checks version info, then allocates a ConstantPool metadata object. The most complex part is parse_constant_pool_entries, which goes through the constant pool via a massive switch statement starting at line 152.
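The same first few reads can be mimicked in Java with a DataInputStream, which reads big-endian values just like the class file format requires. This is only a sketch of the header step under that assumption; ClassHeader is a made-up type and it stops before the constant pool entries.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Reads the fixed-size start of a class file: magic, version, constant pool count.
final class ClassHeader {
    final int minor;
    final int major;
    final int constantPoolCount;

    private ClassHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassHeader parse(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            int magic = in.readInt();
            if (magic != 0xCAFEBABE) {
                throw new ClassFormatError(
                    "Incompatible magic value " + Integer.toHexString(magic));
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int cpCount = in.readUnsignedShort();
            return new ClassHeader(minor, major, cpCount);
        } catch (IOException e) {
            // Truncated input surfaces here as an EOFException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0x00, 0x00,                                         // minor_version
            0x00, 0x34,                                         // major_version 52
            0x00, 0x05                                          // constant_pool_count
        };
        ClassHeader h = parse(header);
        // Prints: 52.0, cp count 5
        System.out.println(h.major + "." + h.minor + ", cp count " + h.constantPoolCount);
    }
}
```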
When it hits a JVM_CONSTANT_Utf8 tag, it reads the bytes and puts them into a Symbol* in the global SymbolTable. There’s a batching optimisation here. It gathers multiple strings before calling SymbolTable::new_symbols to reduce locking overhead (see lines 256-264).
case JVM_CONSTANT_Utf8: {
  // Read the UTF-8 string length
  u2 utf8_length = cfs->get_u2_fast();
  const u1* utf8_buffer = cfs->current();
  // Skip over the string data
  cfs->skip_u1_fast(utf8_length);
  // Try to find if this symbol already exists
  Symbol* result = SymbolTable::lookup_only(utf8_buffer,
                                            utf8_length);
  if (result == nullptr) {
    // New symbol, batch it up for allocation
    names[names_count] = (const char*)utf8_buffer;
    lengths[names_count] = utf8_length;
    indices[names_count] = index;
    names_count++;
    // Flush batch when full
    if (names_count == SymbolTable::symbol_alloc_batch_size) {
      SymbolTable::new_symbols(_loader_data, cp,
                               names_count, names, lengths);
      names_count = 0;
    }
  } else {
    // Symbol already exists, just use it
    cp->symbol_at_put(index, result);
  }
  break;
}
Other tags like JVM_CONSTANT_Class or JVM_CONSTANT_Methodref hold indices rather than data. A Class entry points directly at a Utf8 entry, while a Methodref points at a Class entry and a NameAndType entry, which in turn lead back to Utf8 entries. After processing the constant pool, the parser calls other functions like parse_fields and parse_methods to build up the complete picture of the class structure.
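The lookup-then-batch idea from the Utf8 case can be modelled in a few lines of Java. This is a toy stand-in, not how HotSpot implements its SymbolTable: SymbolTableSketch, its methods and the batch size of 8 are all assumptions made for illustration. Known names are served by a lock-free lookup, and misses are collected and inserted together under one lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for a global symbol table; all names here are hypothetical.
final class SymbolTableSketch {
    static final int BATCH_SIZE = 8; // illustrative value, not HotSpot's

    private final ConcurrentHashMap<String, String> table = new ConcurrentHashMap<>();
    private int batchInserts = 0;

    // Lock-free probe: returns the canonical instance, or null if not interned yet.
    String lookupOnly(String name) {
        return table.get(name);
    }

    // Interns a whole batch under a single lock acquisition and stores the
    // canonical instances into the matching constant pool slots.
    synchronized void newSymbols(List<String> names, List<Integer> indices, String[] pool) {
        batchInserts++;
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            String existing = table.putIfAbsent(name, name);
            pool[indices.get(i)] = (existing != null) ? existing : name;
        }
    }

    synchronized int batchInserts() {
        return batchInserts;
    }

    // Use known symbols directly and batch up the misses.
    String[] internAll(String[] utf8Entries) {
        String[] pool = new String[utf8Entries.length];
        List<String> names = new ArrayList<>();
        List<Integer> indices = new ArrayList<>();
        for (int index = 0; index < utf8Entries.length; index++) {
            String existing = lookupOnly(utf8Entries[index]);
            if (existing != null) {
                pool[index] = existing;
                continue;
            }
            names.add(utf8Entries[index]);
            indices.add(index);
            if (names.size() == BATCH_SIZE) {   // flush batch when full
                newSymbols(names, indices, pool);
                names.clear();
                indices.clear();
            }
        }
        if (!names.isEmpty()) {                 // flush whatever is left
            newSymbols(names, indices, pool);
        }
        return pool;
    }

    public static void main(String[] args) {
        SymbolTableSketch symbols = new SymbolTableSketch();
        String[] first = symbols.internAll(new String[] {"Example", "x", "I"});
        String[] second = symbols.internAll(new String[] {new String("x")});
        System.out.println(first[1] == second[0]);   // true: both slots share one instance
        System.out.println(symbols.batchInserts());  // 1: the second call hit the table
    }
}
```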
Here’s a concrete example. When parsing a simple class like the one below, the parser first validates the magic number at the start of the class file, then reads through the constant pool, extracting all the symbolic references.
public class Example {
    static int x = 42;
    static { System.out.println("init"); }
}
It creates Symbol* entries for “Example”, “x”, “I” (the int type signature) and so on. These symbols are interned in the global SymbolTable so they can be shared across all classes.
InstanceKlass
The final product of the parser is the InstanceKlass. This is the main C++ metadata object representing a Java class. It’s allocated in Metaspace and holds everything the VM needs: pointers to its constant pool and methods, its superclass, its implemented interfaces and, importantly, its current initialisation state.
// Possible class states as it progresses through loading
enum ClassState {
  allocated,            // Memory allocated, nothing else done
  loaded,               // Basic structure created
  linked,               // Verification and preparation complete
  being_initialized,    // Running <clinit> right now
  fully_initialized,    // Ready to use
  initialization_error  // Something went wrong
};
You can see the state definitions in instanceKlass.cpp starting at line 70. A class moves forward through these states as it’s prepared for use, ending in fully_initialized or, if initialisation fails, in initialization_error.
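One way to picture the progression is as a small state machine. The Java model below is purely illustrative: the allowed transitions are my reading of the state list in this article, not something taken from HotSpot, and ClassStateSketch is a hypothetical name.

```java
// Toy model of the lifecycle states; the transition rules are an assumption.
final class ClassStateSketch {

    enum State {
        ALLOCATED, LOADED, LINKED, BEING_INITIALIZED, FULLY_INITIALIZED, INITIALIZATION_ERROR
    }

    private State state = State.ALLOCATED;

    synchronized State state() {
        return state;
    }

    // Moves to the next state, rejecting anything that skips a step or goes back.
    synchronized void advance(State next) {
        if (!isLegal(state, next)) {
            throw new IllegalStateException(state + " -> " + next);
        }
        state = next;
    }

    static boolean isLegal(State from, State to) {
        switch (from) {
            case ALLOCATED:
                return to == State.LOADED;
            case LOADED:
                return to == State.LINKED;
            case LINKED:
                return to == State.BEING_INITIALIZED;
            case BEING_INITIALIZED:
                return to == State.FULLY_INITIALIZED || to == State.INITIALIZATION_ERROR;
            default:
                return false; // both end states are terminal in this model
        }
    }

    public static void main(String[] args) {
        ClassStateSketch k = new ClassStateSketch();
        k.advance(State.LOADED);
        k.advance(State.LINKED);
        k.advance(State.BEING_INITIALIZED);
        k.advance(State.FULLY_INITIALIZED);
        System.out.println(k.state()); // FULLY_INITIALIZED
    }
}
```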
Linking
The newly created InstanceKlass isn’t ready for use yet. The linking phase, started by InstanceKlass::link_class_impl, validates it and integrates it into the VM.
Verification is the first step. A call to verify_code hands off to the Verifier (see line 423 of instanceKlass.cpp). This does static analysis of each method’s bytecode. For modern class files (version 50.0 and above), it uses the compiler-provided StackMapTable attribute to do a fast, single-pass check for type safety, stack integrity and valid control flow. This prevents malformed bytecode from breaking the VM.
bool InstanceKlass::link_class_impl(TRAPS) {
  // Nothing to do if the class is already linked
  if (is_linked()) {
    return true;
  }
  // Link superclass first (recursively)
  if (super() != nullptr) {
    super()->link_class_impl(CHECK_false);
  }
  // Verify all methods have valid bytecode
  if (!is_rewritten()) {
    bool verify_ok = verify_code(THREAD);
    if (!verify_ok) {
      return false;
    }
  }
  // Build vtable and itable for dynamic dispatch
  vtable().initialize_vtable_and_check_constraints(CHECK_false);
  itable().initialize_itable_and_check_constraints(CHECK_false);
  // Mark as linked
  set_init_state(linked);
  return true;
}
Preparation allocates memory for static fields and sets them to their default values (such as 0 or null). In HotSpot this is done during InstanceKlass creation.
Resolution is where HotSpot is clever: it’s lazy. By default, the ConstantPool initially holds only symbolic references. The first time an instruction like invokevirtual runs, a VM runtime routine resolves the symbolic name to a direct reference (such as a Method* or a vtable index) and records it in the ConstantPoolCache. All later executions of that instruction use the cached result, skipping the expensive lookup process.
After these steps, link_class_impl builds the class’s vtable and itable via vtable().initialize_vtable_and_check_constraints(...) at line 440, then finally updates the class state to linked.
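The resolve-once behaviour described for lazy resolution can be sketched as a cache slot that starts out symbolic and is filled on first use. This is an analogy in plain Java, not HotSpot’s ConstantPoolCache: Entry and find are hypothetical, and a reflective lookup stands in for the VM’s slow path.

```java
import java.lang.reflect.Method;
import java.util.function.Supplier;

// Toy model of lazy resolution; Entry is hypothetical, not a HotSpot type.
final class LazyResolutionSketch {

    static final class Entry<T> {
        private final String symbolicName;   // what the slot starts out with
        private final Supplier<T> resolver;  // the expensive lookup
        private T resolved;                  // cached result, null until first use
        private int lookups = 0;

        Entry(String symbolicName, Supplier<T> resolver) {
            this.symbolicName = symbolicName;
            this.resolver = resolver;
        }

        // First call runs the resolver; later calls return the cached result.
        synchronized T get() {
            if (resolved == null) {
                lookups++;
                resolved = resolver.get();
            }
            return resolved;
        }

        synchronized int lookups() {
            return lookups;
        }

        String symbolicName() {
            return symbolicName;
        }
    }

    // Stand-in for the slow path: a reflective lookup of a no-argument method.
    static Method find(Class<?> owner, String name) {
        try {
            return owner.getMethod(name);
        } catch (NoSuchMethodException e) {
            throw new NoSuchMethodError(owner.getName() + "." + name);
        }
    }

    public static void main(String[] args) throws Exception {
        Entry<Method> length =
            new Entry<>("java/lang/String.length:()I", () -> find(String.class, "length"));
        System.out.println(length.get().invoke("init"));   // 4, resolved on this call
        System.out.println(length.get().invoke("hello"));  // 5, served from the cache
        System.out.println(length.lookups());              // 1
    }
}
```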
Initialisation
This final phase runs the class’s static logic. It’s triggered by the first active use of the class and handled by InstanceKlass::initialize_impl.
void InstanceKlass::initialize_impl(TRAPS) {
  // Get lock on the class mirror (java.lang.Class object)
  Handle h_init_lock(THREAD, init_lock());
  ObjectLocker ol(h_init_lock, CHECK);
  // Wait if another thread is already initializing
  while (is_being_initialized() &&
         !is_reentrant_initialization(THREAD)) {
    ol.wait_uninterruptibly(CHECK);
  }
  // Check if already initialized by another thread
  if (is_initialized()) {
    return;
  }
  // Mark as being initialized by this thread
  set_init_state(being_initialized);
  set_init_thread(THREAD);
  // Initialize superclass first
  if (super() != nullptr && super()->should_be_initialized()) {
    super()->initialize(THREAD);
  }
  // Run static initializer <clinit> if it exists
  if (class_initializer() != nullptr) {
    call_class_initializer(THREAD);
  }
  // Mark as fully initialized and wake waiting threads
  set_initialization_state_and_notify(fully_initialized, CHECK);
}
Synchronisation is crucial here. The current thread must acquire a monitor on the class’s java.lang.Class mirror object. Other threads trying to initialise the same class will block in a while loop until the first thread finishes (see lines 1158-1170 of instanceKlass.cpp).
Recursive Initialisation ensures the superclass is initialised first by recursively calling super_klass->initialize(THREAD) at line 1277.
Running <clinit> is where the magic happens. The VM finds the special compiler-generated method named <clinit>, which contains all static field initialisers and static {} blocks. The function call_class_initializer at line 1295 then uses JavaCalls::call to have the JVM interpreter or JIT-compiled code execute the bytecode of this method.
For our Example class above, the <clinit> method would contain bytecode to store 42 into field x and then invoke System.out.println.
Completion is the final step. When <clinit> returns successfully, set_initialization_state_and_notify at line 1300 atomically updates the class’s _init_state to fully_initialized and calls ol.notify_all(CHECK) to wake up any threads waiting on the initialisation lock.
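The ordering is visible from Java without any VM internals. In this small sketch (InitOrderDemo, Base and Derived are names invented for the example), a class literal alone does not run any static initialiser, while the first read of a static field initialises the superclass and then the class itself.

```java
import java.util.ArrayList;
import java.util.List;

final class InitOrderDemo {
    private static final List<String> LOG = new ArrayList<>();

    static class Base {
        static { LOG.add("Base.<clinit>"); }
    }

    static class Derived extends Base {
        static int x = 42;
        static { LOG.add("Derived.<clinit>"); }
    }

    // Snapshot of which static initialisers have run so far.
    static List<String> log() {
        return new ArrayList<>(LOG);
    }

    // Reading a non-constant static field is an active use of Derived.
    static int touch() {
        return Derived.x;
    }

    public static void main(String[] args) {
        Class<?> literal = Derived.class; // refers to the class without initialising it
        System.out.println(literal.getSimpleName() + " before use: " + log()); // Derived before use: []
        System.out.println(touch());                                           // 42
        System.out.println("after use: " + log()); // after use: [Base.<clinit>, Derived.<clinit>]
    }
}
```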
Putting It All Together
Let’s trace through what happens when you call ClassLoader.defineClass with the bytecode for our Example class.
First, JNI_DefineClass converts the Java arguments into native VM structures. It wraps your byte array in a ClassFileStream and calls into SystemDictionary::resolve_from_stream.
The SystemDictionary checks if the class is already loaded. If not, it acquires the appropriate locks (depending on whether your classloader is parallel-capable) and calls KlassFactory::create_from_stream.
The factory creates a ClassFileParser which reads through your bytecode. It validates that the magic number is 0xCAFEBABE, checks the class file version, then parses the constant pool. Each CONSTANT_Utf8 entry like “Example”, “x”, “I” gets converted into a Symbol* and interned in the global symbol table.
After the constant pool, the parser processes fields and methods, building up the complete InstanceKlass structure. It allocates this in Metaspace and returns it to the SystemDictionary.
The SystemDictionary then calls define_instance_class, which first calls link_class_impl. This runs the verifier on all your methods to ensure the bytecode is valid. It prepares storage for static fields and sets up the vtable and itable.
At this point, the class is linked but not initialised. The first time someone accesses the static field x, the VM triggers initialisation. It acquires the lock on the Example.class mirror, runs the <clinit> method (which stores 42 and prints “init”), then marks the class as fully initialised.
Any other threads trying to use Example during this time will block on the class mirror lock, then wake up once initialisation completes. They’ll see a fully initialised class ready to use.
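That blocking behaviour can be demonstrated with a deliberately slow static initialiser. In the sketch below (InitOnceDemo and Slow are made-up names), several threads read the same static field at once; the initialiser runs a single time and every thread sees the finished value.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class InitOnceDemo {
    static final AtomicInteger CLINIT_RUNS = new AtomicInteger();

    static class Slow {
        static final int VALUE; // assigned in the static block, so not a compile-time constant
        static {
            CLINIT_RUNS.incrementAndGet();
            try {
                Thread.sleep(100); // keep other threads waiting for a moment
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            VALUE = 42;
        }
    }

    // Starts `threads` threads that all read Slow.VALUE at once; returns what each saw.
    static int[] race(int threads) {
        int[] seen = new int[threads];
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen[id] = Slow.VALUE; // waits until initialisation has completed
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        for (int v : race(4)) {
            System.out.println(v);              // 42 for every thread
        }
        System.out.println(CLINIT_RUNS.get());  // 1
    }
}
```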
Thread Safety
The entire class loading process is carefully designed to be thread-safe. Multiple threads can try to load the same class simultaneously, but the VM ensures only one thread actually performs the work.
For the bootstrap and platform loaders, the SystemDictionary uses a PlaceholderTable to track classes currently being loaded. When a thread starts loading a class, it adds a placeholder entry. Other threads attempting to load the same class will see the placeholder and wait.
For custom classloaders, the locking strategy depends on whether they’re parallel-capable. Non-parallel loaders use a simple lock on the loader object itself. Parallel-capable loaders can load different classes concurrently but still need coordination for the same class.
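On the Java side, the difference between the two kinds of loader shows up in ClassLoader.getClassLoadingLock. The sketch below uses two hypothetical loaders, SerialLoader and ParallelLoader; registerAsParallelCapable and getClassLoadingLock are the real protected ClassLoader methods. A non-parallel loader locks on itself, while a parallel-capable one hands out a separate lock object per class name.

```java
final class LoaderLockDemo {

    // Not registered as parallel-capable: the loading lock is the loader itself.
    static final class SerialLoader extends ClassLoader {
        Object lockFor(String className) {
            return getClassLoadingLock(className);
        }
    }

    // Registered as parallel-capable: each class name gets its own lock object.
    static final class ParallelLoader extends ClassLoader {
        static {
            // Must run before any instance of this loader class is created.
            registerAsParallelCapable();
        }

        Object lockFor(String className) {
            return getClassLoadingLock(className);
        }
    }

    public static void main(String[] args) {
        SerialLoader serial = new SerialLoader();
        ParallelLoader parallel = new ParallelLoader();
        System.out.println(serial.lockFor("Example") == serial);                   // true
        System.out.println(parallel.lockFor("Example") == parallel);               // false
        System.out.println(parallel.lockFor("Example") == parallel.lockFor("Example")); // true
        System.out.println(parallel.lockFor("Example") == parallel.lockFor("Other"));   // false
    }
}
```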
The initialisation phase has its own locking via the class mirror. This prevents multiple threads from running <clinit> simultaneously, which could lead to seeing partially initialised state.
Conclusion
The JVM classloader is a sophisticated system that transforms raw bytecode into live, executable classes. By understanding how JNI_DefineClass flows through SystemDictionary to ClassFileParser, how InstanceKlass tracks state and how linking and initialisation work, you can better debug class loading issues and appreciate the engineering that makes Java work.
Thank you for reading <3