Sunday 5 August 2012

What's Inside a Class File?

The Java class file is a binary stream of 8-bit bytes. Data items are stored sequentially in the class file with no padding between adjacent items, which helps keep class files compact.
A class file contains everything a JVM needs to know about one Java class or interface. Each item in a class file has a type, a name, and a count.
Below is a description of the major components of a class file:

1) A constant number
----------------------------
The first four bytes of every Java class file are the number 0xCAFEBABE, known as the magic number. It makes class files easy to tell apart from other files: if a file doesn’t start with 0xCAFEBABE, it definitely isn’t a Java class file. A file format’s designers can choose any arbitrary number for this purpose, as long as it isn’t already in wide use.
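The check described above can be sketched in a few lines of Java. This is only an illustration: the class name MagicCheck and the method hasClassFileMagic are made up for this post, and the sketch relies on the fact that multi-byte items in a class file are stored high byte first, which is also ByteBuffer's default byte order.

```java
import java.nio.ByteBuffer;

class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;

    // Returns true when the first four bytes, read big-endian, equal 0xCAFEBABE.
    static boolean hasClassFileMagic(byte[] bytes) {
        if (bytes.length < 4) {
            return false; // too short to hold the magic number
        }
        return ByteBuffer.wrap(bytes).getInt() == MAGIC;
    }
}
```

Passing the magic check only says the file might be a class file; everything after the first four bytes still has to be validated.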

2) minor_version and major_version
----------------------------
The next four bytes of the class file contain the minor and major version numbers: a two-byte minor_version followed by a two-byte major_version. As Java technology evolves, new features may occasionally be added to the Java class file format. Each time the class file format changes, the version numbers change as well. A JVM will generally be able to load class files with a given major version number and a range of minor version numbers. JVMs must reject class files with version numbers outside their valid range.
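As a rough sketch of how the two version numbers could be read, assuming the whole file (or at least its first eight bytes) is already in a byte array. VersionReader and readVersion are illustrative names, not part of any library.

```java
import java.nio.ByteBuffer;

class VersionReader {
    // Returns {minor_version, major_version} from the start of a class file.
    static int[] readVersion(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
        buf.getInt();                            // skip the 4-byte magic number
        int minor = buf.getShort() & 0xFFFF;     // bytes 4-5, treated as unsigned
        int major = buf.getShort() & 0xFFFF;     // bytes 6-7, treated as unsigned
        return new int[] {minor, major};
    }
}
```

For example, a file compiled for Java 8 carries major version 52 (0x34), so a header of CA FE BA BE 00 00 00 34 yields minor 0 and major 52.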

3) constant_pool_count and constant_pool
----------------------------
The constant pool contains the constants associated with the class or interface defined by the file. Constants such as literal strings, final variable values, class names, and method names are stored in the constant pool. A count, constant_pool_count, precedes the actual list, constant_pool. The count is one more than the number of slots in the list, so valid constant pool indexes run from 1 to constant_pool_count - 1.
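Every constant pool entry starts with a one-byte tag that tells the reader how many bytes follow. The sketch below walks the pool and keeps only the CONSTANT_Utf8 strings, skipping everything else. ConstantPoolWalker and readUtf8Entries are illustrative names, the buffer is assumed to be positioned at constant_pool_count, and the tag values and sizes are the ones I know from the JVM specification, so treat the table as a sketch rather than a complete parser.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ConstantPoolWalker {
    // Reads the pool starting at constant_pool_count and returns the Utf8
    // strings by pool index; slots holding other kinds of entries stay null.
    static String[] readUtf8Entries(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;      // constant_pool_count
        String[] utf8 = new String[count];        // valid indexes: 1 .. count-1
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;
            switch (tag) {
                case 1: {                         // Utf8: u2 length + bytes
                    byte[] raw = new byte[buf.getShort() & 0xFFFF];
                    buf.get(raw);
                    // Simplification: class files use "modified UTF-8";
                    // plain UTF-8 decodes ordinary text the same way.
                    utf8[i] = new String(raw, StandardCharsets.UTF_8);
                    break;
                }
                case 3: case 4:                   // Integer, Float
                case 9: case 10: case 11:         // Fieldref, Methodref, InterfaceMethodref
                case 12: case 17: case 18:        // NameAndType, Dynamic, InvokeDynamic
                    skip(buf, 4);
                    break;
                case 5: case 6:                   // Long, Double
                    skip(buf, 8);
                    i++;                          // these take up two pool slots
                    break;
                case 7: case 8: case 16:          // Class, String, MethodType
                case 19: case 20:                 // Module, Package
                    skip(buf, 2);
                    break;
                case 15:                          // MethodHandle
                    skip(buf, 3);
                    break;
                default:
                    throw new IllegalArgumentException("unknown constant pool tag " + tag);
            }
        }
        return utf8;
    }

    private static void skip(ByteBuffer buf, int n) {
        buf.position(buf.position() + n);
    }
}
```

After the call returns, the buffer is positioned at the first byte after the constant pool, which is where the access flags start.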

4) access_flags
----------------------------
The first two bytes after the constant pool, the access flags, reveal several pieces of information. They indicate whether the file defines a class or an interface, and which modifiers were used in the declaration of the class or interface.
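The access flags are a bit mask, so each piece of information is recovered by testing one bit. Here is a small sketch that decodes a few of them. ClassAccessFlags and describe are illustrative names, and only four of the defined flags are handled.

```java
import java.util.ArrayList;
import java.util.List;

class ClassAccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;

    // True when the file defines an interface rather than a class.
    static boolean isInterface(int flags) {
        return (flags & ACC_INTERFACE) != 0;
    }

    // Lists the modifiers this sketch knows about, then "class" or "interface".
    static List<String> describe(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & ACC_PUBLIC) != 0) names.add("public");
        if ((flags & ACC_FINAL) != 0) names.add("final");
        if ((flags & ACC_ABSTRACT) != 0) names.add("abstract");
        names.add(isInterface(flags) ? "interface" : "class");
        return names;
    }
}
```

For example, describe(0x0601) gives [public, abstract, interface], which is what a public interface typically looks like. Bits that the sketch does not know about are simply ignored.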


5) this_class
----------------------------
The next two bytes are the this_class item, an index into the constant pool. The constant pool entry at position this_class must be a CONSTANT_Class_info table, which has two parts: a tag and a name_index. The tag will have the value CONSTANT_Class. The constant pool entry at position name_index will be a CONSTANT_Utf8_info table containing the fully qualified name of the class or interface.

The this_class item provides a glimpse of how the constant pool is used. By itself, the this_class item is just an index into the constant pool. When a JVM looks up the constant pool entry at position this_class, it will find an entry that identifies itself via its tag as a CONSTANT_Class_info. The JVM knows CONSTANT_Class_info entries always have an index into the constant pool, called name_index, following their tag. So the virtual machine looks up the constant pool entry at position name_index, where it should find a CONSTANT_Utf8_info entry that contains the fully qualified name of the class or interface. 
The figure below depicts this process:

[Figure: resolving this_class through the constant pool]
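The two-step lookup can also be sketched in code. To keep it short, the constant pool is modelled here as a plain Object array holding two made-up helper types, ClassInfo and Utf8Info. These stand in for CONSTANT_Class_info and CONSTANT_Utf8_info and are not how a real JVM represents the pool.

```java
class ThisClassLookup {
    // Simplified stand-in for a CONSTANT_Class_info entry.
    static final class ClassInfo {
        final int nameIndex;
        ClassInfo(int nameIndex) { this.nameIndex = nameIndex; }
    }

    // Simplified stand-in for a CONSTANT_Utf8_info entry.
    static final class Utf8Info {
        final String value;
        Utf8Info(String value) { this.value = value; }
    }

    // Step 1: follow classIndex to a ClassInfo entry.
    // Step 2: follow its nameIndex to the Utf8Info entry holding the name.
    static String resolveClassName(Object[] pool, int classIndex) {
        Object entry = pool[classIndex];
        if (!(entry instanceof ClassInfo)) {
            throw new IllegalArgumentException("entry " + classIndex + " is not a class entry");
        }
        int nameIndex = ((ClassInfo) entry).nameIndex;
        Object name = pool[nameIndex];
        if (!(name instanceof Utf8Info)) {
            throw new IllegalArgumentException("entry " + nameIndex + " is not a Utf8 entry");
        }
        return ((Utf8Info) name).value;
    }
}
```

With a pool where index 1 holds the Utf8 entry "java/lang/String" and index 2 holds a class entry whose name_index is 1, resolveClassName(pool, 2) returns "java/lang/String". Note that inside the class file the fully qualified name is written with slashes instead of dots.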



6) super_class
----------------------------
super_class is a two-byte index into the constant pool. The constant pool entry at position super_class will be a CONSTANT_Class_info entry that refers to the fully qualified name of this class’s superclass. Because java.lang.Object is the root of the class hierarchy in Java programs, the super_class constant pool index will be valid for every class except Object. For Object, super_class is zero. For interfaces, the constant pool entry at position super_class is java.lang.Object.

7) interfaces_count and interfaces
----------------------------
The component that follows super_class starts with interfaces_count, a count of the number of superinterfaces directly implemented by the class or interface defined in this file. Immediately following the count is interfaces, an array that contains one index into the constant pool for each superinterface directly implemented by this class or interface. Each superinterface is represented by a CONSTANT_Class_info entry in the constant pool that refers to the fully qualified name of the interface. Only direct superinterfaces, those that appear in the implements clause of the class or the extends clause of the interface declaration, appear in this array. The superinterfaces appear in the array in the order in which they appear (left to right) in the implements or extends clause.
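Reading this component is a matter of reading a two-byte count followed by that many two-byte indexes. A minimal sketch, assuming the buffer is already positioned at interfaces_count (InterfacesReader and readInterfaceIndexes are illustrative names):

```java
import java.nio.ByteBuffer;

class InterfacesReader {
    // Returns the constant pool indexes of the direct superinterfaces,
    // in the same order as they are stored in the class file.
    static int[] readInterfaceIndexes(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;         // interfaces_count
        int[] indexes = new int[count];
        for (int i = 0; i < count; i++) {
            indexes[i] = buf.getShort() & 0xFFFF;    // index of a CONSTANT_Class_info
        }
        return indexes;
    }
}
```

Each returned index would then be resolved to a name through the constant pool in the same way as this_class.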

8) fields_count and fields
----------------------------
Following the interfaces component in the class file is a description of the fields declared by this class or interface. This component starts with fields_count, a count of the number of fields, including both class and instance variables. Following the count is a list of variable-length field_info tables, one for each field. The only fields that appear in the fields list are those declared by the class or interface defined in the file. No fields inherited from superclasses or superinterfaces appear in the fields list.

9) methods_count and methods
----------------------------
This component starts with methods_count, a two-byte count of the number of methods in the class or interface. The count includes only the methods declared by this class or interface, not methods inherited from superclasses or superinterfaces. It does include the instance initialization methods (<init>) and any class initialization method (<clinit>) generated by the compiler. Following the method count are the methods themselves, described in a list of method_info tables. The method_info table contains several pieces of information about the method, including the method’s name and descriptor.
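In the class file format, field_info and method_info tables share the same layout: access flags, a name index, a descriptor index, and then a list of attributes. That means one routine can read either list. The sketch below does this and skips over the attributes of each member. MemberTableReader and readMembers are illustrative names, and the buffer is assumed to be positioned at fields_count or methods_count.

```java
import java.nio.ByteBuffer;

class MemberTableReader {
    // Reads a fields or methods list and returns, for each member,
    // {access_flags, name_index, descriptor_index}.
    static int[][] readMembers(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;            // fields_count or methods_count
        int[][] members = new int[count][];
        for (int i = 0; i < count; i++) {
            int accessFlags = buf.getShort() & 0xFFFF;
            int nameIndex = buf.getShort() & 0xFFFF;    // Utf8 entry with the name
            int descriptorIndex = buf.getShort() & 0xFFFF; // Utf8 entry with the descriptor
            int attributesCount = buf.getShort() & 0xFFFF;
            for (int a = 0; a < attributesCount; a++) {
                buf.getShort();                         // attribute_name_index, unused here
                int length = buf.getInt();              // attribute_length (four bytes)
                buf.position(buf.position() + length);  // skip the attribute body
            }
            members[i] = new int[] {accessFlags, nameIndex, descriptorIndex};
        }
        return members;
    }
}
```

The variable length of each table comes entirely from the attributes, which is why the sketch has to read every attribute_length to find where the next member starts.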

10) attributes_count and attributes
----------------------------
The last component in the class file is the attributes list, which gives general information about the particular class or interface defined by the file. The attributes component starts with attributes_count, a count of the number of attribute_info tables appearing in the subsequent attributes list. The first item in each attribute_info table is an index into the constant pool of a CONSTANT_Utf8_info table that gives the attribute’s name.
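A short sketch of listing the attribute names follows. It assumes the Utf8 strings of the constant pool are already available as a String array indexed by pool position, and that the buffer is positioned at attributes_count. AttributeLister and readAttributeNames are illustrative names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class AttributeLister {
    // Returns the names of the attributes in the list, in file order.
    static List<String> readAttributeNames(ByteBuffer buf, String[] utf8) {
        int count = buf.getShort() & 0xFFFF;            // attributes_count
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int nameIndex = buf.getShort() & 0xFFFF;    // index of the name's Utf8 entry
            int length = buf.getInt();                  // attribute_length (four bytes)
            buf.position(buf.position() + length);      // skip the attribute body
            names.add(utf8[nameIndex]);
        }
        return names;
    }
}
```

A class compiled from Foo.java would typically carry a SourceFile attribute here, whose two-byte body is itself a constant pool index pointing at the string "Foo.java".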




