Focused on virtual machine and compiler research

Part 5.1: Field Parsing

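In ClassFileParser::parseClassFile(), once the constant pool, the superclass and the interfaces have been parsed, parse_fields() is called to parse the field information. The call looks like this: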

u2 java_fields_count = 0;
// Fields (offsets are filled in later)
FieldAllocationCount fac;
Array<u2>* fields = parse_fields(class_name,
                                     access_flags.is_interface(),
                                     &fac, &java_fields_count,
                                     CHECK_(nullHandle));

Before parse_fields() is called, a variable fac of type FieldAllocationCount is defined. The class is defined as follows:

Source: classFileParser.cpp

class FieldAllocationCount: public ResourceObj {
 public:
  u2 count[MAX_FIELD_ALLOCATION_TYPE];
 
  FieldAllocationCount() {
    for (int i = 0; i < MAX_FIELD_ALLOCATION_TYPE; i++) { // MAX_FIELD_ALLOCATION_TYPE is 10
      count[i] = 0;
    }
  }
 
  FieldAllocationType update(bool is_static, BasicType type) {
    FieldAllocationType atype = basic_type_to_atype(is_static, type);
    // Make sure there is no overflow with injected fields.
    assert(count[atype] < 0xFFFF, "More than 65535 fields");
    count[atype]++;
    return atype;
  }
};

The count array records how many fields there are of each allocation type. The types are the values of the FieldAllocationType enum, defined as follows:

enum FieldAllocationType {
  STATIC_OOP,                // 0 Oops
  STATIC_BYTE,               // 1 Boolean, Byte, char
  STATIC_SHORT,              // 2 shorts
  STATIC_WORD,               // 3 ints
  STATIC_DOUBLE,             // 4 aligned long or double

  NONSTATIC_OOP,             // 5
  NONSTATIC_BYTE,            // 6
  NONSTATIC_SHORT,           // 7
  NONSTATIC_WORD,            // 8
  NONSTATIC_DOUBLE,          // 9

  MAX_FIELD_ALLOCATION_TYPE, // 10
  BAD_ALLOCATION_TYPE = -1
};

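The counters cover five categories, each tracked separately for static and non-static fields, so that the memory required can later be computed from the number of fields in each category. The categories are:

  • Oop: reference types (objects and arrays)
  • Byte: 8-bit types (boolean and byte)
  • Short: 16-bit types (char and short; the mapping table below places T_CHAR here, despite the comment in the enum)
  • Word: 32-bit types (int and float)
  • Double: 64-bit types (long and double)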
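To make the link between the counters and the memory size concrete, here is a minimal sketch. It is not HotSpot's layout code: the struct and function names are hypothetical, the size of a reference is an assumption passed in by the caller, and padding, alignment and object headers are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the five per-category counters (one such set would
// exist for static fields and one for instance fields); not the HotSpot type.
struct CategoryCounts {
  uint16_t oops = 0, bytes = 0, shorts = 0, words = 0, doubles = 0;
};

// Raw payload size in bytes, ignoring padding, alignment and object headers.
// oop_size is an assumption supplied by the caller (for example 4 or 8).
inline std::size_t raw_payload_bytes(const CategoryCounts& c, std::size_t oop_size) {
  return c.oops * oop_size + c.bytes * 1u + c.shorts * 2u + c.words * 4u + c.doubles * 8u;
}
```

The real computation also has to order and align the fields, which is why the parser only collects counts at this stage.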

update() increments the counter for the corresponding category. The BasicType enum it takes is defined as follows:

Source location: utilities/globalDefinitions.hpp
enum BasicType {
  T_BOOLEAN     =  4,
  T_CHAR        =  5,
  T_FLOAT       =  6,
  T_DOUBLE      =  7,
  T_BYTE        =  8,
  T_SHORT       =  9,
  T_INT         = 10,
  T_LONG        = 11,
  T_OBJECT      = 12,
  T_ARRAY       = 13,
  T_VOID        = 14,
  T_ADDRESS     = 15, // the returnAddress type, i.e. the return address used by the ret instruction
  T_NARROWOOP   = 16,
  T_METADATA    = 17,
  T_NARROWKLASS = 18,
  T_CONFLICT    = 19, // for stack value type with conflicting contents
  T_ILLEGAL     = 99
};

basic_type_to_atype() converts a BasicType value to the corresponding FieldAllocationType value:

static FieldAllocationType _basic_type_to_atype[2 * (T_CONFLICT + 1)] = {
  BAD_ALLOCATION_TYPE, //                  0
  BAD_ALLOCATION_TYPE, //                  1
  BAD_ALLOCATION_TYPE, //                  2
  BAD_ALLOCATION_TYPE, //                  3
  ///////////////////////////////////////////////////////////
  NONSTATIC_BYTE ,     // T_BOOLEAN     =  4,
  NONSTATIC_SHORT,     // T_CHAR        =  5,
  NONSTATIC_WORD,      // T_FLOAT       =  6,
  NONSTATIC_DOUBLE,    // T_DOUBLE      =  7,
  NONSTATIC_BYTE,      // T_BYTE        =  8,
  NONSTATIC_SHORT,     // T_SHORT       =  9,
  NONSTATIC_WORD,      // T_INT         = 10,
  NONSTATIC_DOUBLE,    // T_LONG        = 11,
  NONSTATIC_OOP,       // T_OBJECT      = 12,
  NONSTATIC_OOP,       // T_ARRAY       = 13,
  ///////////////////////////////////////////////////////////
  BAD_ALLOCATION_TYPE, // T_VOID        = 14,
  BAD_ALLOCATION_TYPE, // T_ADDRESS     = 15,
  BAD_ALLOCATION_TYPE, // T_NARROWOOP   = 16,
  BAD_ALLOCATION_TYPE, // T_METADATA    = 17,
  BAD_ALLOCATION_TYPE, // T_NARROWKLASS = 18,
  BAD_ALLOCATION_TYPE, // T_CONFLICT    = 19,

  BAD_ALLOCATION_TYPE, //                  0
  BAD_ALLOCATION_TYPE, //                  1
  BAD_ALLOCATION_TYPE, //                  2
  BAD_ALLOCATION_TYPE, //                  3
  ///////////////////////////////////////////////////////////
  STATIC_BYTE ,        // T_BOOLEAN     =  4,
  STATIC_SHORT,        // T_CHAR        =  5,
  STATIC_WORD,         // T_FLOAT       =  6,
  STATIC_DOUBLE,       // T_DOUBLE      =  7,
  STATIC_BYTE,         // T_BYTE        =  8,
  STATIC_SHORT,        // T_SHORT       =  9,
  STATIC_WORD,         // T_INT         = 10,
  STATIC_DOUBLE,       // T_LONG        = 11,
  STATIC_OOP,          // T_OBJECT      = 12,
  STATIC_OOP,          // T_ARRAY       = 13,
  ///////////////////////////////////////////////////////////
  BAD_ALLOCATION_TYPE, // T_VOID        = 14,
  BAD_ALLOCATION_TYPE, // T_ADDRESS     = 15,
  BAD_ALLOCATION_TYPE, // T_NARROWOOP   = 16,
  BAD_ALLOCATION_TYPE, // T_METADATA    = 17,
  BAD_ALLOCATION_TYPE, // T_NARROWKLASS = 18,
  BAD_ALLOCATION_TYPE, // T_CONFLICT    = 19,
};

static FieldAllocationType basic_type_to_atype(bool is_static, BasicType type) {
  assert(type >= T_BOOLEAN && type < T_VOID, "only allowable values");
  FieldAllocationType result = _basic_type_to_atype[  type + (is_static ? (T_CONFLICT + 1) : 0)  ];
  assert(result != BAD_ALLOCATION_TYPE, "bad type");
  return result;
}

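basic_type_to_atype() is straightforward: the table has two halves of T_CONFLICT + 1 entries each, the first for non-static fields and the second for static fields, and the index is type, plus T_CONFLICT + 1 when the field is static.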
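The mapping can also be read as "pick a size category, then pick the static or non-static half". The sketch below expresses that with a switch rather than the two-part array HotSpot uses. The enums are simplified stand-ins with the same numeric values as those quoted above, and to_atype is a hypothetical name.

```cpp
#include <cassert>

// Simplified stand-ins for the HotSpot enums quoted above (same numeric values).
enum BasicType { T_BOOLEAN = 4, T_CHAR, T_FLOAT, T_DOUBLE, T_BYTE, T_SHORT,
                 T_INT, T_LONG, T_OBJECT, T_ARRAY, T_VOID };
enum FieldAllocationType {
  STATIC_OOP, STATIC_BYTE, STATIC_SHORT, STATIC_WORD, STATIC_DOUBLE,
  NONSTATIC_OOP, NONSTATIC_BYTE, NONSTATIC_SHORT, NONSTATIC_WORD, NONSTATIC_DOUBLE,
  MAX_FIELD_ALLOCATION_TYPE, BAD_ALLOCATION_TYPE = -1
};

// Same mapping as the table, written as a switch: choose the size category
// first, then shift into the non-static half when the field is not static.
inline FieldAllocationType to_atype(bool is_static, BasicType type) {
  int category;
  switch (type) {
    case T_OBJECT:  case T_ARRAY: category = STATIC_OOP;    break;
    case T_BOOLEAN: case T_BYTE:  category = STATIC_BYTE;   break;
    case T_CHAR:    case T_SHORT: category = STATIC_SHORT;  break;
    case T_FLOAT:   case T_INT:   category = STATIC_WORD;   break;
    case T_DOUBLE:  case T_LONG:  category = STATIC_DOUBLE; break;
    default:                      return BAD_ALLOCATION_TYPE;
  }
  return static_cast<FieldAllocationType>(is_static ? category : category + NONSTATIC_OOP);
}
```

A table lookup avoids the branch, which is presumably why the original code uses an array; the switch is only easier to read.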

1. Allocating memory for fields

To allocate memory for the fields, ClassFileParser::parse_fields() contains the following call:

 u2* fa = NEW_RESOURCE_ARRAY_IN_THREAD(
             THREAD, u2, total_fields * (FieldInfo::field_slots + 1));

The NEW_RESOURCE_ARRAY_IN_THREAD macro is defined as follows:

#define NEW_RESOURCE_ARRAY_IN_THREAD(thread, type, size)\
    (type*) resource_allocate_bytes(thread, (size) * sizeof(type))

After macro expansion, the call is equivalent to:

u2* fa = (u2*) resource_allocate_bytes(THREAD, (total_fields * (FieldInfo::field_slots + 1)) * sizeof(u2));

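FieldInfo is a class, and field_slots is an enum constant defined inside it with the value 6. The call reserves total_fields * (FieldInfo::field_slots + 1) slots of sizeof(u2) bytes each, because the data is stored in the following layout: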

f1: [access, name index, sig index, initial value index, low_offset, high_offset]
f2: [access, name index, sig index, initial value index, low_offset, high_offset]
       ...
fn: [access, name index, sig index, initial value index, low_offset, high_offset]
     [generic signature index]
     [generic signature index]
     ...

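That is, with n fields, each field takes six u2 slots. Each field may additionally have a generic signature index, so a buffer large enough for the worst case (one extra slot per field) is allocated for temporary storage. Later, an array of exactly the required size is allocated and the data is copied over, so no space is wasted on fields that have no generic signature index.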
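The sizing rule and the two index calculations can be sketched as follows. This is an illustration only: it uses std::vector instead of a resource array, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kFieldSlots = 6;  // mirrors FieldInfo::field_slots as described above

// Worst-case temporary buffer: six slots per field plus one possible
// generic signature slot per field.
inline std::vector<uint16_t> make_temp_field_array(int total_fields) {
  return std::vector<uint16_t>(static_cast<std::size_t>(total_fields) * (kFieldSlots + 1), 0);
}

// Index of the first slot of field n.
inline int field_base(int n) { return n * kFieldSlots; }

// Index where the generic signature indexes start (after all fixed-size records).
inline int generic_signature_base(int total_fields) { return total_fields * kFieldSlots; }
```

For three fields this reserves 21 slots: 18 for the fixed records and up to 3 for generic signature indexes.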

In the class file, a field is stored in the following format:

field_info {
    u2             access_flags;
    u2             name_index;
    u2             descriptor_index;
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

access_flags, name_index and descriptor_index correspond to access, name index and sig index in each entry above. initial value index stores the constant value index (if the field is a constant). low_offset and high_offset are covered in detail later.
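As a rough illustration of reading the fixed part of a field_info structure, the sketch below decodes the four leading u2 items from a byte buffer. ByteReader and FieldHeader are hypothetical stand-ins, not HotSpot's ClassFileStream, and attribute parsing is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified stand-in for a class file stream.
class ByteReader {
 public:
  ByteReader(const uint8_t* data, std::size_t size) : data_(data), size_(size), pos_(0) {}

  // Class files store multi-byte values in big-endian order.
  uint16_t read_u2() {
    if (size_ - pos_ < 2) throw std::out_of_range("truncated class file");
    uint16_t v = static_cast<uint16_t>((data_[pos_] << 8) | data_[pos_ + 1]);
    pos_ += 2;
    return v;
  }

 private:
  const uint8_t* data_;
  std::size_t size_;
  std::size_t pos_;
};

struct FieldHeader {
  uint16_t access_flags, name_index, descriptor_index, attributes_count;
};

// Reads the four fixed u2 items that open every field_info structure.
inline FieldHeader read_field_header(ByteReader& in) {
  FieldHeader h;
  h.access_flags = in.read_u2();
  h.name_index = in.read_u2();
  h.descriptor_index = in.read_u2();
  h.attributes_count = in.read_u2();
  return h;
}
```

This sketch checks bounds on every read; the parser quoted later checks once for all eight bytes with guarantee_more(8) and then uses the unchecked get_u2_fast().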

The resource_allocate_bytes() function that is called, together with the functions it delegates to, is shown below:

extern char* resource_allocate_bytes(Thread* thread, size_t size, AllocFailType alloc_failmode) {
  return thread->resource_area()->allocate_bytes(size, alloc_failmode);
}
char* allocate_bytes(size_t size, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) {
   return (char*)Amalloc(size, alloc_failmode);
}
void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) {
    // ARENA_AMALLOC_ALIGNMENT must be a power of 2
    assert(is_power_of_2(ARENA_AMALLOC_ALIGNMENT) , "should be a power of 2");
    // The macro expands to:
    // ((((size_t)(x)) + (((size_t)((2*BytesPerWord))) - 1)) & (~((size_t)(((size_t)((2*BytesPerWord))) - 1))))
    x = ARENA_ALIGN(x);

    if (!check_for_overflow(x, "Arena::Amalloc", alloc_failmode))
      return NULL;

    if (_hwm + x > _max) {
      return grow(x, alloc_failmode);
    } else {
      char *old = _hwm;
      _hwm += x;
      return old;
    }
}

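The space is ultimately allocated in a ResourceArea; every thread has a _resource_area field. The Amalloc() function called here is very similar to the Amalloc_4() function covered earlier in the discussion of releasing Handles, so it is not described further.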
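The align-then-bump pattern in Amalloc() can be sketched with a toy arena. BumpArena is a hypothetical class: it has a single fixed chunk, returns nullptr instead of growing, and omits the overflow check. Its alignment of 2 * sizeof(void*) is an assumption meant to mirror 2 * BytesPerWord.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round x up to a multiple of alignment (alignment must be a power of two).
inline std::size_t align_up(std::size_t x, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (x + alignment - 1) & ~(alignment - 1);
}

// Toy single-chunk arena; it fails instead of growing when the chunk is full.
class BumpArena {
 public:
  explicit BumpArena(std::size_t capacity) : buf_(capacity), hwm_(0) {}

  void* allocate(std::size_t size) {
    size = align_up(size, 2 * sizeof(void*));  // assumed to mirror 2 * BytesPerWord
    if (size > buf_.size() - hwm_) return nullptr;
    void* old = buf_.data() + hwm_;
    hwm_ += size;  // bump the high-water mark past the new block
    return old;
  }

  std::size_t used() const { return hwm_; }

 private:
  std::vector<char> buf_;
  std::size_t hwm_;  // high-water mark: offset of the next free byte
};
```

Allocation is just an addition and a comparison, which is what makes arenas attractive for short-lived parser data.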

_resource_area is defined as follows:

// Thread local resource area for temporary allocation within the VM
ResourceArea* _resource_area;

The field is initialized when the Thread object is created; the constructor contains the following call:

set_resource_area(new (mtThread)ResourceArea()); // initialize the _resource_area field

ResourceArea derives from Arena. Memory allocated from a ResourceArea can be released through a ResourceMark, much like HandleArea and HandleMark.
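The mark-and-release idea can be illustrated with an RAII guard. ToyArena and ToyMark are hypothetical names, and the arena here only tracks an offset; this is a sketch of the pattern, not of ResourceMark's actual implementation.

```cpp
#include <cassert>
#include <cstddef>

// Toy arena that only tracks an offset; enough to show the mark/rollback idea.
class ToyArena {
 public:
  std::size_t allocate(std::size_t size) { std::size_t old = top_; top_ += size; return old; }
  std::size_t top() const { return top_; }
  void rollback_to(std::size_t mark) { assert(mark <= top_); top_ = mark; }

 private:
  std::size_t top_ = 0;
};

// RAII guard in the spirit of ResourceMark: remembers the arena's position on
// construction and restores it on destruction.
class ToyMark {
 public:
  explicit ToyMark(ToyArena& arena) : arena_(arena), saved_(arena.top()) {}
  ~ToyMark() { arena_.rollback_to(saved_); }
  ToyMark(const ToyMark&) = delete;
  ToyMark& operator=(const ToyMark&) = delete;

 private:
  ToyArena& arena_;
  std::size_t saved_;
};
```

Everything allocated while the guard is alive is released in one step when it goes out of scope, so individual allocations never need to be freed.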

2. Reading the fields

Next, look at how ClassFileParser::parse_fields() reads the fields:

// The generic signature slots start after all other fields' data.
  int generic_signature_slot = total_fields * FieldInfo::field_slots;
  int num_generic_signature = 0;
  for (int n = 0; n < length; n++) {
    cfs->guarantee_more(8, CHECK_NULL);  // access_flags, name_index, descriptor_index, attributes_count
    // Read the field's access flags
    AccessFlags access_flags;
    jint flags = cfs->get_u2_fast() & JVM_RECOGNIZED_FIELD_MODIFIERS;
    access_flags.set_flags(flags);
    // Read the field's name index
    u2 name_index = cfs->get_u2_fast();
    int cp_size = _cp->length(); // number of entries in the constant pool

    Symbol*  name = _cp->symbol_at(name_index);
    // Read the descriptor index
    u2 signature_index = cfs->get_u2_fast();
    Symbol*  sig = _cp->symbol_at(signature_index);

    u2     constantvalue_index = 0;
    bool   is_synthetic = false;
    u2     generic_signature_index = 0;
    bool   is_static = access_flags.is_static();
    FieldAnnotationCollector parsed_annotations(_loader_data);
    // Read the field's attributes
    u2 attributes_count = cfs->get_u2_fast();
    if (attributes_count > 0) {
      parse_field_attributes(attributes_count, is_static, signature_index,
                             &constantvalue_index, &is_synthetic,
                             &generic_signature_index, &parsed_annotations,
                             CHECK_NULL);
      if (parsed_annotations.field_annotations() != NULL) {
        if (_fields_annotations == NULL) {
          _fields_annotations = MetadataFactory::new_array<AnnotationArray*>(
                                             _loader_data, length, NULL,
                                             CHECK_NULL);
        }
        _fields_annotations->at_put(n, parsed_annotations.field_annotations());
        parsed_annotations.set_field_annotations(NULL);
      }
      if (parsed_annotations.field_type_annotations() != NULL) {
        if (_fields_type_annotations == NULL) {
          _fields_type_annotations = MetadataFactory::new_array<AnnotationArray*>(
                                                  _loader_data, length, NULL,
                                                  CHECK_NULL);
        }
        _fields_type_annotations->at_put(n, parsed_annotations.field_type_annotations());
        parsed_annotations.set_field_type_annotations(NULL);
      }

      if (is_synthetic) {
        access_flags.set_is_synthetic();
      }
      if (generic_signature_index != 0) {
        access_flags.set_field_has_generic_signature();
        fa[generic_signature_slot] = generic_signature_index;
        generic_signature_slot ++;
        num_generic_signature ++;
      }
    } // done reading the field's attributes

    FieldInfo* field = FieldInfo::from_field_array(fa, n);
    field->initialize(access_flags.as_short(),
                      name_index,
                      signature_index,
                      constantvalue_index);
    BasicType type = _cp->basic_type_for_signature_at(signature_index);

    // Remember how many oops we encountered and compute allocation type
    FieldAllocationType atype = fac->update(is_static, type);
    field->set_allocation_type(atype);

    // After field is initialized with type, we can augment it with aux info
    if (parsed_annotations.has_any_annotations())
       parsed_annotations.apply_to(field);
  } // end of the for loop

The values read for each field are stored into fa. FieldInfo::from_field_array() is implemented as follows:

static FieldInfo* from_field_array(u2* fields, int index) {
    return ((FieldInfo*)(fields + index * field_slots));
}

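This computes the address of the six u2 slots belonging to the field at position index and casts it to FieldInfo*, so that the six values can be read and written conveniently through the FieldInfo class. FieldInfo is defined as follows: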

// This class represents the field information contained in the fields
// array of an InstanceKlass.  Currently it's laid on top an array of
// Java shorts but in the future it could simply be used as a real
// array type.  FieldInfo generally shouldn't be used directly.
// Fields should be queried either through InstanceKlass or through
// the various FieldStreams.
class FieldInfo VALUE_OBJ_CLASS_SPEC {
  u2  _shorts[field_slots];
         ...
}

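The class has no virtual functions, and the elements of _shorts are u2 values, 16 bits each, so its memory layout is exactly the field layout described above. The methods of the class simply operate on the _shorts array.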
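The same "typed window over a u2 array" idea can be sketched without the pointer cast. FieldView below is a hypothetical view type that stores a pointer into the array instead of reinterpreting the memory as a class; the slot order follows the layout listed earlier, and the constant names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Slot order follows the layout listed earlier in this article.
enum FieldSlot { kAccessFlags = 0, kNameIndex, kSignatureIndex, kInitvalIndex,
                 kLowPacked, kHighPacked, kFieldSlots };

// Hypothetical view type: holds a pointer into the u2 array rather than
// casting the array memory to a class.
class FieldView {
 public:
  FieldView(uint16_t* fields, int index) : slots_(fields + index * kFieldSlots) {}

  void initialize(uint16_t access, uint16_t name, uint16_t sig, uint16_t initval) {
    slots_[kAccessFlags] = access;
    slots_[kNameIndex] = name;
    slots_[kSignatureIndex] = sig;
    slots_[kInitvalIndex] = initval;
    slots_[kLowPacked] = 0;   // offset slots are filled in later
    slots_[kHighPacked] = 0;
  }

  uint16_t name_index() const { return slots_[kNameIndex]; }
  uint16_t signature_index() const { return slots_[kSignatureIndex]; }

 private:
  uint16_t* slots_;
};
```

Writing through the view for field 1 touches slots 6 to 11 of the underlying array and leaves field 0 untouched.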

field->initialize() stores the values read for the field. It is implemented as follows:

void initialize(u2 access_flags,
                  u2 name_index,
                  u2 signature_index,
                  u2 initval_index  ){
    _shorts[access_flags_offset] = access_flags;
    _shorts[name_index_offset] = name_index;
    _shorts[signature_index_offset] = signature_index;
    _shorts[initval_index_offset] = initval_index;

    _shorts[low_packed_offset] = 0;
    _shorts[high_packed_offset] = 0;
}

_cp->basic_type_for_signature_at() derives the type from the field's signature. The functions involved are:

BasicType ConstantPool::basic_type_for_signature_at(int which) {
  return FieldType::basic_type(symbol_at(which));
}

Symbol* symbol_at(int which) {
    assert(tag_at(which).is_utf8(), "Corrupted constant pool");
    return *symbol_at_addr(which);
}

BasicType FieldType::basic_type(Symbol* signature) {
  return char2type(signature->byte_at(0));
}

// Convert a char from a classfile signature to a BasicType
inline BasicType char2type(char c) {
  switch( c ) {
  case 'B': return T_BYTE;
  case 'C': return T_CHAR;
  case 'D': return T_DOUBLE;
  case 'F': return T_FLOAT;
  case 'I': return T_INT;
  case 'J': return T_LONG;
  case 'S': return T_SHORT;
  case 'Z': return T_BOOLEAN;
  case 'V': return T_VOID;
  case 'L': return T_OBJECT;
  case '[': return T_ARRAY;
  }
  return T_ILLEGAL;
}

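symbol_at(), defined in ConstantPool, fetches the Symbol that holds the signature string at constant pool index which; the first character of the signature is enough to determine the field's type. Once the type is known, fac->update() is called to increment the counter for the corresponding category, as described earlier in this article.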
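A few concrete descriptors show why the first character is sufficient. The sketch below uses a hypothetical, coarser classification than BasicType, and std::string in place of Symbol.

```cpp
#include <cassert>
#include <string>

// Hypothetical coarse classification; the real code returns a BasicType.
enum class Kind { Primitive, Reference, Void, Illegal };

inline Kind classify_descriptor(const std::string& descriptor) {
  if (descriptor.empty()) return Kind::Illegal;
  switch (descriptor[0]) {
    case 'B': case 'C': case 'D': case 'F':
    case 'I': case 'J': case 'S': case 'Z':
      return Kind::Primitive;
    case 'L': case '[':
      return Kind::Reference;  // class types and arrays are both object references
    case 'V':
      return Kind::Void;
    default:
      return Kind::Illegal;
  }
}
```

For example, "I" is an int, "Ljava/lang/String;" is a class type, and "[[D" is an array of arrays of double; in each case the rest of the string is not needed to choose the allocation category.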

Next, the field information held temporarily in fa is copied into a new array:

// Now copy the fields' data from the temporary resource array.
  // Sometimes injected fields already exist in the Java source so
  // the fields array could be too long.  In that case the
  // fields array is trimed. Also unused slots that were reserved
  // for generic signature indexes are discarded.
  Array<u2>* fields = MetadataFactory::new_array<u2>(
          _loader_data, index * FieldInfo::field_slots + num_generic_signature,
          CHECK_NULL);
  _fields = fields; // save in case of error
  {
    int i = 0;
    for (; i < index * FieldInfo::field_slots; i++) {
      fields->at_put(i, fa[i]);
    }
    for (int j = total_fields * FieldInfo::field_slots;j < generic_signature_slot; j++) {
      fields->at_put(i++, fa[j]);
    }
    assert(i == fields->length(), "");
  }

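When the fields array is created, its length in u2 elements is index * FieldInfo::field_slots + num_generic_signature, where index is the actual number of fields (which may include injected fields), and num_generic_signature slots are allocated for the generic signature indexes actually present. The data is then copied from fa into fields.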
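The compaction step can be sketched in isolation as follows. This is an illustrative version using std::vector; compact_fields and its parameter names are hypothetical, chosen to match the variables in the excerpt above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kFieldSlots = 6;  // mirrors FieldInfo::field_slots

// Copies the used part of the oversized temporary array into an exactly sized one.
// fa: temporary array sized total_fields * (kFieldSlots + 1)
// index: number of fields actually kept; num_generic: generic signature slots in use
inline std::vector<uint16_t> compact_fields(const std::vector<uint16_t>& fa,
                                            int total_fields, int index, int num_generic) {
  std::vector<uint16_t> fields;
  fields.reserve(static_cast<std::size_t>(index) * kFieldSlots + num_generic);
  // Fixed-size records first.
  for (int i = 0; i < index * kFieldSlots; i++) fields.push_back(fa[i]);
  // Then the generic signature indexes, which start after all reserved records.
  const int gs_base = total_fields * kFieldSlots;
  for (int j = gs_base; j < gs_base + num_generic; j++) fields.push_back(fa[j]);
  return fields;
}
```

With two fields and one generic signature, the 14-slot temporary array shrinks to 13 slots, and the signature index moves from position 12 of the reserved region to directly after the last record.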

Posted on 2020-07-31 14:55 by 鸠摩 (马智)