The Java Class File Format – an Overview

The Java Class File Format is the format in which Java classes are stored when Java source files are compiled by the Java compiler. A class file describes the structure of the class (its name, superclass, interfaces, fields and methods) and contains the compiled bytecode, all in a format the JVM can load and execute.

A complete description can be found in chapter 4 of the Java Virtual Machine Specification.

The Java Class File Format is a binary format that consists of a stream of 8-bit bytes. The JVM specification defines the data types u1, u2 and u4 as unsigned 8-, 16- and 32-bit numbers, i.e. one, two and four bytes. Multi-byte values are stored in big-endian order (most significant byte first). That means if the specification says something like “u2 access_flags”, the next two bytes contain the access flags.
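To make the three data types concrete, here is a minimal sketch of how they could be decoded from a byte array. The class name ClassFileTypes and its helper methods are made up for this illustration; they are not part of the JVM specification or of the repository.

```java
class ClassFileTypes {

    // u1: a single byte, interpreted as an unsigned value (0..255)
    static int u1(byte[] data, int offset) {
        return data[offset] & 0xFF;
    }

    // u2: two bytes, most significant byte first (0..65535)
    static int u2(byte[] data, int offset) {
        return (u1(data, offset) << 8) | u1(data, offset + 1);
    }

    // u4: four bytes, most significant byte first.
    // Returned as a long because Java's int is signed.
    static long u4(byte[] data, int offset) {
        return ((long) u2(data, offset) << 16) | u2(data, offset + 2);
    }

    public static void main(String[] args) {
        // Sample bytes chosen for illustration: one u4, one u2, one u1
        byte[] data = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x10, 0x34};
        System.out.printf("u4 = 0x%08X%n", u4(data, 0));
        System.out.printf("u2 = %d%n", u2(data, 4));
        System.out.printf("u1 = %d%n", u1(data, 6));
    }
}
```

The masking with 0xFF matters because Java's byte type is signed; without it, a byte such as 0xCA would be read as a negative number.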

A class file has the following sections. Some of them are of fixed size, some are variable.

Section        Size
Magic          fixed
Version        fixed
Constant Pool  variable
Access Flags   fixed
This Class     fixed
Super Class    fixed
Interfaces     variable
Fields         variable
Methods        variable
Attributes     variable

Most of these terms are self-explanatory. I will not go into detail now, as there will be a separate blog post about each of them.
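Without going into the details of each section, the fixed-size fields at the very start of the file can already be read in order. The following is only a rough sketch: the class ClassFileHeader is a hypothetical helper, while the field order (magic, minor version, major version, constant pool count) follows the ClassFile structure in the JVM specification.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ClassFileHeader {
    final long magic;            // u4
    final int minorVersion;      // u2
    final int majorVersion;      // u2
    final int constantPoolCount; // u2, number of constant pool entries plus one

    ClassFileHeader(long magic, int minorVersion, int majorVersion, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads the leading fixed-size fields; DataInputStream reads big-endian values
    static ClassFileHeader read(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            long magic = in.readInt() & 0xFFFFFFFFL; // mask to treat the int as unsigned
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            int cpCount = in.readUnsignedShort();
            return new ClassFileHeader(magic, minor, major, cpCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample header bytes for a class compiled for Java 8 (major version 52)
        byte[] start = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34, 0, 0x10};
        ClassFileHeader h = read(start);
        System.out.printf("magic=0x%08X version=%d.%d constant_pool_count=%d%n",
                h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
    }
}
```

Everything after the constant pool count is variable in size, which is why the later sections cannot be located by a fixed offset and have to be parsed one after the other.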

Step 0: Display All the Bytes!

Let's move slowly in the beginning; the material will get more advanced as we work our way through the specification anyway. So the first step is to print the bytes of a Java class file, just to get an overview.

To keep the byte count small, let's look at the bytes of the following small Java class:

package at.lemme.classfilereader; 

public class EmptyClass { 
}

The method that reads all the bytes is quite simple. It reads up to bytesInARow bytes into an array, then appends each of them to the output using String.format("%02X ", b) and ends the row with a line break:

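import java.io.IOException;
import java.io.InputStream;

// Renders the stream as rows of hex values, one row per read of up to bytesInARow bytes.
private static String bytesToString(InputStream is, int bytesInARow) throws IOException {
    byte[] buffer = new byte[bytesInARow];
    int length = 0;
    StringBuilder sb = new StringBuilder();
    // read() returns the number of bytes actually read, or -1 at the end of the stream
    while ((length = is.read(buffer)) > 0) {
        for (int i = 0; i < length; i++) {
            // two uppercase hex digits per byte, followed by a space
            sb.append(String.format("%02X ", buffer[i]));
        }
        sb.append("\n");
    }
    return sb.toString();
}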

The %02X pattern simply means: display the number as an uppercase hexadecimal integer (X), with a minimum width of 2, zero-padded (0).
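A few sample values show how the pattern behaves. This is just an illustrative sketch; HexFormatDemo and its hex method are names invented for this example. Note that a byte is boxed to a Byte when passed to String.format, and the formatter then treats a negative value as an unsigned 8-bit number, whereas a negative int is printed with all 32 bits.

```java
class HexFormatDemo {

    // Formats a single byte as two uppercase hex digits
    static String hex(byte b) {
        return String.format("%02X", b);
    }

    public static void main(String[] args) {
        System.out.println(hex((byte) 0x0A));                          // zero-padded to width 2
        System.out.println(hex((byte) 0xCA));                          // negative byte, still two digits
        System.out.println(String.format("%X", 10));                   // no padding without the 02
        System.out.println(String.format("%02X", (int) (byte) 0xCA));  // negative int: eight digits
        System.out.println(String.format("%02X", (byte) 0xCA & 0xFF)); // masked int: two digits
    }
}
```

So passing the byte directly, as bytesToString does, is fine; the masking with 0xFF only becomes necessary once the value has been widened to an int.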

The class PrintClassFile can be found in the GitHub repository. The result of printing the 305 bytes of the class is the following:

CA FE BA BE 00 00 00 34 00 10 0A 00 03 00 0D 07 
00 0E 07 00 0F 01 00 06 3C 69 6E 69 74 3E 01 00 
03 28 29 56 01 00 04 43 6F 64 65 01 00 0F 4C 69 
6E 65 4E 75 6D 62 65 72 54 61 62 6C 65 01 00 12 
4C 6F 63 61 6C 56 61 72 69 61 62 6C 65 54 61 62 
6C 65 01 00 04 74 68 69 73 01 00 25 4C 61 74 2F 
6C 65 6D 6D 65 2F 63 6C 61 73 73 66 69 6C 65 72 
65 61 64 65 72 2F 45 6D 70 74 79 43 6C 61 73 73 
3B 01 00 0A 53 6F 75 72 63 65 46 69 6C 65 01 00 
0F 45 6D 70 74 79 43 6C 61 73 73 2E 6A 61 76 61 
0C 00 04 00 05 01 00 23 61 74 2F 6C 65 6D 6D 65 
2F 63 6C 61 73 73 66 69 6C 65 72 65 61 64 65 72 
2F 45 6D 70 74 79 43 6C 61 73 73 01 00 10 6A 61 
76 61 2F 6C 61 6E 67 2F 4F 62 6A 65 63 74 00 21 
00 02 00 03 00 00 00 00 00 01 00 01 00 04 00 05 
00 01 00 06 00 00 00 2F 00 01 00 01 00 00 00 05 
2A B7 00 01 B1 00 00 00 02 00 07 00 00 00 06 00 
01 00 00 00 03 00 08 00 00 00 0C 00 01 00 00 00 
05 00 09 00 0A 00 00 00 01 00 0B 00 00 00 02 00 
0C
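To produce a dump like the one above yourself, the method can be wrapped in a small command-line program. The following sketch is not the PrintClassFile class from the repository; ClassFileDump, the dump helper and the path argument are assumptions made for this example, and the row width of 16 is chosen to match the output shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class ClassFileDump {

    // Same logic as the bytesToString method shown earlier
    static String bytesToString(InputStream is, int bytesInARow) throws IOException {
        byte[] buffer = new byte[bytesInARow];
        int length;
        StringBuilder sb = new StringBuilder();
        while ((length = is.read(buffer)) > 0) {
            for (int i = 0; i < length; i++) {
                sb.append(String.format("%02X ", buffer[i]));
            }
            sb.append("\n");
        }
        return sb.toString();
    }

    // Hypothetical convenience wrapper for dumping bytes that are already in memory
    static String dump(byte[] data, int bytesInARow) {
        try (InputStream is = new ByteArrayInputStream(data)) {
            return bytesToString(is, bytesInARow);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileDump <path-to-class-file>");
            return;
        }
        // The path is supplied by the caller, e.g. the compiled EmptyClass.class
        try (InputStream is = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(bytesToString(is, 16));
        }
    }
}
```

One caveat: InputStream.read may return fewer bytes than the buffer holds even before the end of the stream, so with some stream types a row could come out shorter than bytesInARow. For a small local file this is unlikely to matter.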

The first 4 bytes (u4) of the class represent the “Magic” section, which we will handle in the next blog post.

Author: Thomas Lemmé
