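Getting a memory address in Java: a closer look at the hashCode method

I have been learning Go recently. Go has pointers: a pointer variable holds the memory address of a value. Anyone who has studied C will know the concept. Go's syntax is close to C's, and it can fairly be called a C-like language, so it is no surprise that Go has pointers too. Placing the address-of operator & in front of a variable yields that variable's memory address.

My main development language is Java, so I started wondering: Java has no pointers, so how do you get the memory address of a variable in Java?

If you could get the memory address, you could tell clearly whether two references are the same object: equal addresses mean the same object, different addresses mean different objects.

Many people say that an object's hashCode method returns the object's memory address. I even found chapter 5 of Core Java, Volume I saying that the hashCode value is the object's memory address.

But is the value returned by hashCode really a memory address? Before answering, let's review some basics.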

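== and equals

In Java, the == operator is the basic way to check whether two objects are the same. Applied to objects it compares references, that is, where the objects live in memory. Object is the root class in Java and every class inherits from it. If a class does not override Object's equals method, equals also tells you whether two references are the same object, because internally it is implemented with ==.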

// Indicates whether some other object is "equal to" this one.
public boolean equals(Object obj) {
    return (this == obj);
}

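Tip: an aside to clear up a common confusion

When learning Java we are told that Java has single inheritance. If every class inherits from Object, how can a new class still extend some other class? The answer is the difference between direct and indirect inheritance. When a class does not explicitly name a superclass with the extends keyword, it directly inherits from Object by default: A -> Object. When a class explicitly names a superclass with extends, it inherits from Object indirectly: A -> B -> Object.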
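The direct and indirect cases can be checked with reflection. This is a minimal sketch; the class names A and B are placeholders taken from the arrows above, and InheritanceChainDemo is a hypothetical wrapper class.

```java
final class InheritanceChainDemo {
    static class B { }            // no extends clause: the direct superclass is Object
    static class A extends B { }  // direct superclass is B; Object is inherited indirectly

    public static void main(String[] args) {
        System.out.println(A.class.getSuperclass() == B.class);      // true
        System.out.println(B.class.getSuperclass() == Object.class); // true
    }
}
```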

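"The same" here means that the two references being compared are one and the same object, that is, their addresses in memory are equal. Sometimes, however, we need to compare whether two objects have the same content: the class has its own notion of "logical equality", and we are not interested in whether the references point to the same object.

For example, are the two strings String a = "Hello" and String b = new String("Hello") the same? There are two possible meanings: are a and b the same object (the same memory address), or are their contents equal? How do we tell these apart?

With ==, you are checking whether they are the same object in memory. String's parent class is also Object, so the inherited equals would compare references too. That is why equals has to be overridden, as the String source code does.

With that, a == b checks whether a and b are the same object, while a.equals(b) compares the contents of a and b. That should be easy to follow.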
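The two comparisons can be tried directly on the strings from the example. This is a small sketch; StringEqualityDemo and sameObject are hypothetical names used only for illustration.

```java
final class StringEqualityDemo {
    /** Reference comparison: true only when both variables refer to the same object. */
    static boolean sameObject(Object x, Object y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "Hello";              // literal, taken from the string pool
        String b = new String("Hello");  // explicitly allocates a new object

        System.out.println(sameObject(a, b));          // false: two distinct objects
        System.out.println(a.equals(b));               // true: same characters
        System.out.println(sameObject(a, b.intern())); // true: intern() returns the pooled instance
    }
}
```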

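String is not the only JDK class that overrides equals. The wrapper types Integer, Long, Double, Float and so on override it as well. So when you use Long or Integer for business parameters in your code and need to compare them, remember to use equals rather than ==.

Using == here leads to surprising pitfalls. Several of these wrapper types keep an internal cache, for example IntegerCache and LongCache. When a value falls within a certain range (-128 to 127 by default), boxing returns the cached object instead of creating a new one.

If you do want to use ==, convert the wrapper values to primitives first and compare those, because == on primitives compares numeric values. Watch out for a NullPointerException (NPE) during the conversion, though: unboxing null throws.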
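A short sketch of the pitfall follows. WrapperCompareDemo and sameValue are hypothetical names. The result of == outside the cached range depends on JVM settings, so it is shown as "typically" rather than guaranteed.

```java
import java.util.Objects;

final class WrapperCompareDemo {
    /** Null-safe value comparison for wrapper types. */
    static boolean sameValue(Long x, Long y) {
        return Objects.equals(x, y);
    }

    public static void main(String[] args) {
        Integer small1 = 127, small2 = 127; // autoboxing goes through Integer.valueOf, which caches -128..127
        Integer big1 = 1000, big2 = 1000;   // outside the guaranteed cache range

        System.out.println(small1 == small2);                   // true: both refer to the cached object
        System.out.println(big1 == big2);                       // typically false with default JVM settings
        System.out.println(big1.equals(big2));                  // true: compares the int values
        System.out.println(big1.intValue() == big2.intValue()); // true: primitive comparison

        Long missing = null;
        System.out.println(sameValue(missing, 1L));             // false, and no NullPointerException
        // long v = missing;  // unboxing null like this would throw NullPointerException
    }
}
```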

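hashCode in Object

equals can compare the contents of two objects, so it can be used to find out whether an object is in a collection. Roughly, you take each element of the collection in turn and compare it with the target using equals. When some element compares equal, you stop and report success; otherwise you report failure.

But comparing this way is inefficient, because the cost grows linearly with the size of the collection. Could we instead use some encoding that gives every object a code value, group objects by that value, and assign the groups to different regions? Then, to look up an object in the collection, we first use its code value to determine which region it is stored in, and only within that region compare contents with equals to find out whether the object is present.

This reduces the number of comparisons per lookup and so makes lookups faster.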
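The grouping idea can be sketched as a toy hash set. This is an illustration only, not how java.util.HashSet is actually written; BucketSetSketch and its methods are hypothetical names, and the bucket count is fixed for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

final class BucketSetSketch {
    private final List<List<Object>> buckets = new ArrayList<>();

    BucketSetSketch(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** Maps a hash code to a bucket index; floorMod keeps negative hash codes in range. */
    private int indexFor(Object o) {
        return Math.floorMod(o.hashCode(), buckets.size());
    }

    /** Adds the element unless an equal one is already stored; returns true if it was added. */
    boolean add(Object o) {
        List<Object> bucket = buckets.get(indexFor(o));
        if (bucket.contains(o)) { // equals() only runs against elements of one bucket
            return false;
        }
        bucket.add(o);
        return true;
    }

    /** Looks only in the bucket selected by the hash code. */
    boolean contains(Object o) {
        return buckets.get(indexFor(o)).contains(o);
    }

    public static void main(String[] args) {
        BucketSetSketch set = new BucketSetSketch(16);
        set.add("apple");
        set.add("banana");
        System.out.println(set.contains("apple"));  // true
        System.out.println(set.contains("cherry")); // false
    }
}
```

Note that the sketch only works if equal objects report equal hash codes; otherwise contains would look in the wrong bucket.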

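In Java, this encoding is the hashCode method. The Object class defines it by default; it is a native method that returns an int.

From the Javadoc we learn that hashCode returns a hash code value for the object, and that it exists for the benefit of hash tables such as HashMap. As far as is reasonably practical, the hashCode defined by Object returns distinct integers for distinct objects. The confusing part is the sentence "This is typically implemented by converting the internal address of the object into an integer", which says that the usual implementation converts the object's internal address into an integer.

If you do not dig further you will conclude that it returns the object's memory address. We would like to look at the implementation, but because this is a native method we cannot see it here. Native methods are not implemented in Java, so to read the source you have to download the full JDK source. It is not visible in the Oracle JDK, but OpenJDK and other open-source JREs include the corresponding C/C++ code. In OpenJDK's Object.c we can see that hashCode is mapped to JVM_IHashCode.

JVM_IHashCode is defined in jvm.cpp as: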

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  JVMWrapper("JVM_IHashCode");
  // as implemented in the classic virtual machine; return 0 if object is NULL
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

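This is a ternary expression. The function that actually computes the hashCode value is ObjectSynchronizer::FastHashCode, which is implemented in synchronizer.cpp; only the key parts of it matter here.

That code shows that the value is really computed by get_next_hash. Searching the same file for get_next_hash gives its key code.

Looking at get_next_hash, there are six schemes for computing the hash value, numbered from 0. They include a random number, a function of the memory address, an incrementing sequence and others. The official default is the last one, which generates a random number. So hashCode may be related to the memory address, but it does not directly represent it; the details depend on the VM version and settings.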
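From the Java side, the value produced by this machinery is what System.identityHashCode returns. The sketch below only checks that the value stays fixed for an object's lifetime and that Object.hashCode is the same number; IdentityHashDemo and usesIdentityHash are hypothetical names.

```java
final class IdentityHashDemo {
    /** True when the object's hashCode() is the JVM-provided identity hash (not overridden). */
    static boolean usesIdentityHash(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        Object o = new Object();
        int first = System.identityHashCode(o);
        System.gc(); // only a hint; the collector may or may not move the object
        int second = System.identityHashCode(o);

        System.out.println(first == second);     // true: the value is fixed once generated
        System.out.println(usesIdentityHash(o)); // true: Object does not override hashCode
    }
}
```

A value that never changes while the collector is free to relocate the object is already a hint that it cannot simply be the current address.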

equals and hashCode

Both equals and hashCode are methods of the Object class. The output of Object's toString method also includes the unsigned hexadecimal value of hashCode.

public String toString() {
    return getClass().getName() + "@" + Integer.toHexString(hashCode());
}
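As a quick check of that format, the default string can be rebuilt by hand. DefaultToStringDemo and defaultToString are hypothetical names, and the hash shown in the comment is only an example value.

```java
final class DefaultToStringDemo {
    /** Rebuilds the text Object.toString() produces for an object that does not override it. */
    static String defaultToString(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(o));
    }

    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(o);                       // e.g. java.lang.Object@1b6d3586 (varies per run)
        System.out.println(defaultToString(o));      // same text as the line above
        System.out.println(Integer.toHexString(-1)); // ffffffff: negative values print as unsigned hex
    }
}
```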

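Because we need to compare object contents, we often override equals. But whenever you override equals you also need to override hashCode. Have you ever wondered why?

If you do not, you violate the general contract of hashCode, and the class will not work properly with hash-based collections such as HashMap and HashSet.

The general contract, as stated in the Javadoc of Object's hashCode method, consists of the following points:

During the execution of an application, as long as the information used by the object's equals comparison is not modified, calling hashCode on the same object multiple times must consistently return the same value.
If two objects are equal according to equals, then calling hashCode on each of them must produce the same integer result.
If two objects are not equal according to equals, calling hashCode on each of them is not required to produce different results. However, producing distinct hash values for unequal objects may improve the performance of hash tables.

In theory, overriding equals without overriding hashCode breaks the second rule: equal objects must have equal hash codes.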
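A class that keeps the second rule typically derives its hash code from exactly the fields that equals compares. A minimal sketch, using a hypothetical GridPoint class:

```java
import java.util.Objects;

final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // built from the same fields equals() compares
    }

    public static void main(String[] args) {
        GridPoint a = new GridPoint(1, 2);
        GridPoint b = new GridPoint(1, 2);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```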

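But a contract is only a shared convention. Suppose we insist on going our own way and override equals without overriding hashCode. What happens?

We define a Student class and override equals but not hashCode. Calling hashCode on a Student then falls through to the hashCode of the superclass Object, which returns an integer derived from a random number.

We create two objects with identical field values and test the result: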

public static void main(String[] args) { Student student1 = new Student("小明
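The listing above is cut off in the source. The following is a hedged reconstruction of what such a test might look like, not the author's code: the fields name and age, the FixedStudent variant and the distinctInHashSet helper are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class StudentHashDemo {
    /** Overrides equals() only; hashCode() is still the identity hash inherited from Object. */
    static final class Student {
        final String name;
        final int age;

        Student(String name, int age) { this.name = name; this.age = age; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Student)) return false;
            Student s = (Student) o;
            return age == s.age && Objects.equals(name, s.name);
        }
    }

    /** Same equality rule, plus a hashCode() computed from the same fields. */
    static final class FixedStudent {
        final String name;
        final int age;

        FixedStudent(String name, int age) { this.name = name; this.age = age; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FixedStudent)) return false;
            FixedStudent s = (FixedStudent) o;
            return age == s.age && Objects.equals(name, s.name);
        }

        @Override
        public int hashCode() { return Objects.hash(name, age); }
    }

    /** Number of elements a HashSet keeps after both objects are added. */
    static int distinctInHashSet(Object first, Object second) {
        Set<Object> set = new HashSet<>();
        set.add(first);
        set.add(second);
        return set.size();
    }

    public static void main(String[] args) {
        Student s1 = new Student("Xiao Ming", 18);
        Student s2 = new Student("Xiao Ming", 18);
        System.out.println(s1.equals(s2));             // true
        System.out.println(distinctInHashSet(s1, s2)); // almost always 2: equal objects, different hash codes

        FixedStudent f1 = new FixedStudent("Xiao Ming", 18);
        FixedStudent f2 = new FixedStudent("Xiao Ming", 18);
        System.out.println(distinctInHashSet(f1, f2)); // 1
    }
}
```

With only equals overridden, the set treats the two students as different keys unless their identity hashes happen to coincide, which is the failure the contract is meant to prevent.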

https://stackoverflow.com/questions/16418713/does-hashcode-number-represent-the-memory-address
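A hash code is a number used for hashing, so that objects can be stored and retrieved. For example, when we add an object to a HashMap, the map uses the key's hashCode implementation to decide which bucket the entry goes into. When we retrieve the object later, the hash code is used again to locate it. Note that a hash code is not an actual memory address; it is a key that lets the map reach the right slot, on average in O(1).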

A hashcode is an integer value that represents the state of the object upon which it was called. That is why an Integer that is set to 1 will return a hashcode of "1", because an Integer's hashcode and its value are the same thing. A character's hashcode is equal to its ASCII character code. If you write a custom type you are responsible for creating a good hashCode implementation that will best represent the state of the current instance.
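The claims in that answer are easy to check for the standard value types. In this sketch ValueHashDemo and stringHash are hypothetical names; stringHash re-implements the documented String.hashCode formula for comparison.

```java
final class ValueHashDemo {
    /** Recomputes String.hashCode(): s[0]*31^(n-1) + ... + s[n-1], in int arithmetic. */
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(Integer.valueOf(1).hashCode());     // 1
        System.out.println(Character.valueOf('A').hashCode()); // 65
        System.out.println("ab".hashCode());                   // 3105, that is 'a' * 31 + 'b'
        System.out.println(stringHash("ab"));                  // 3105
    }
}
```

None of these values has anything to do with where the object is stored; they are computed from the object's state.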

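Is hashCode in Java really an address?

Questions

1. How is hashCode obtained in Java?

2. Is the hashCode value predictable?

(Note: hashCode, the hash value, maps an object to an integer; ideally, different objects get different values.)

Main text

The Javadoc of Object.java#hashCode contains this sentence: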

(This is typically implemented by converting the internal
address of the object into an integer, but this implementation
technique is not required by the
Java&trade; programming language.)

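In other words: the hash value is typically obtained by converting the object's internal address into an integer, although the language does not require this technique.

That made me curious: which address is this "internal address"? Could it be something like the following?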

#include <stdio.h>

int main(void)
{
    char var = 1;
    printf("%p\n", (void *)&var);  /* print the address of var */
    return 0;
}

Console output:

0028FF3F

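In C, the code above prints the memory address of the variable var.

To solve the mystery I had to look at how Object.java#hashCode is actually implemented. A native method is not implemented in Java, so to see the source you have to download the full JDK source (OpenJDK 8 here). In Object.c, the method mapping table shows that hashCode is mapped to a function called JVM_IHashCode.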

static JNINativeMethod methods[] = {
    {"hashCode",    "()I",                    (void *)&JVM_IHashCode},
    {"wait",        "(J)V",                   (void *)&JVM_MonitorWait},
    {"notify",      "()V",                    (void *)&JVM_MonitorNotify},
    {"notifyAll",   "()V",                    (void *)&JVM_MonitorNotifyAll},
    {"clone",       "()Ljava/lang/Object;",   (void *)&JVM_Clone},
};

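Following the trail: what does JVM_IHashCode actually do? This looks familiar. My guess was that the declaration is in jvm.h, so the implementation must be in jvm.cpp.

Sure enough, the guess was right, except that jvm.cpp implements JVM_IHashCode by calling ObjectSynchronizer::FastHashCode. So the job is not finished yet!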

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  JVMWrapper("JVM_IHashCode");
  // as implemented in the classic virtual machine; return 0 if object is NULL
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

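I looked around for a while without finding it, which was awkward. After searching online I learned that it is declared in synchronizer.hpp and implemented in synchronizer.cpp. Thanks to those who walked this path before me!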

// hashCode() generation :
//
// Possibilities:
// * MD5Digest of {obj,stwRandom}
// * CRC32 of {obj,stwRandom} or any linear-feedback shift register function.
// * A DES- or AES-style SBox[] mechanism
// * One of the Phi-based schemes, such as:
// 2654435761 = 2^32 * Phi (golden ratio)
// HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ;
// * A variation of Marsaglia's shift-xor RNG scheme.
// * (obj ^ stwRandom) is appealing, but can result
// in undesirable regularity in the hashCode values of adjacent objects
// (objects allocated back-to-back, in particular). This could potentially
// result in hashtable collisions and reduced hashtable efficiency.
// There are simple ways to "diffuse" the middle address bits over the
// generated hashCode values:
 
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0;
  if (hashCode == 0) {
    // This form uses global Park-Miller RNG.
    // On MP system we'll have lots of RW access to a global, so the
    // mechanism induces lots of coherency traffic.
    value = os::random();
  } else if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations. This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3;
    value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom;
  } else if (hashCode == 2) {
    value = 1; // for sensitivity testing
  } else if (hashCode == 3) {
    value = ++GVars.hcSequence;
  } else if (hashCode == 4) {
    value = cast_from_oop<intptr_t>(obj);
  } else {
    // Marsaglia's xor-shift scheme with thread-specific state
    // This is probably the best overall implementation -- we'll
    // likely make this the default in future releases.
    unsigned t = Self->_hashStateX;
    t ^= (t << 11);
    Self->_hashStateX = Self->_hashStateY;
    Self->_hashStateY = Self->_hashStateZ;
    Self->_hashStateZ = Self->_hashStateW;
    unsigned v = Self->_hashStateW;
    v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
    Self->_hashStateW = v;
    value = v;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD;
  assert(value != markOopDesc::no_hash, "invariant");
  TEVENT(hashCode: GENERATE);
  return value;
}
 
 
intptr_t ObjectSynchronizer::FastHashCode(Thread * Self, oop obj) {
  if (UseBiasedLocking) {
    // NOTE: many places throughout the JVM do not expect a safepoint
    // to be taken here, in particular most operations on perm gen
    // objects. However, we only ever bias Java instances and all of
    // the call sites of identity_hash that might revoke biases have
    // been checked to make sure they can handle a safepoint. The
    // added check of the bias pattern is to avoid useless calls to
    // thread-local storage.
    if (obj->mark()->has_bias_pattern()) {
      // Handle for oop obj in case of STW safepoint
      Handle hobj(Self, obj);
      // Relaxing assertion for bug 6320749.
      assert(Universe::verify_in_progress() ||
             !SafepointSynchronize::is_at_safepoint(),
             "biases should not be seen by VM thread here");
      BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
      obj = hobj();
      assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
    }
  }

  // hashCode() is a heap mutator ...
  // Relaxing assertion for bug 6320749.
  assert(Universe::verify_in_progress() || DumpSharedSpaces ||
         !SafepointSynchronize::is_at_safepoint(), "invariant");
  assert(Universe::verify_in_progress() || DumpSharedSpaces ||
         Self->is_Java_thread() , "invariant");
  assert(Universe::verify_in_progress() || DumpSharedSpaces ||
         ((JavaThread *)Self)->thread_state() != _thread_blocked, "invariant");

  ObjectMonitor* monitor = NULL;
  markOop temp, test;
  intptr_t hash;
  markOop mark = ReadStableMark(obj);

  // object should remain ineligible for biased locking
  assert(!mark->has_bias_pattern(), "invariant");

  if (mark->is_neutral()) {
    hash = mark->hash();              // this is a normal header
    if (hash) {                       // if it has hash, just return it
      return hash;
    }
    hash = get_next_hash(Self, obj);  // allocate a new hash code
    temp = mark->copy_set_hash(hash); // merge the hash code into header
    // use (machine word version) atomic operation to install the hash
    test = obj->cas_set_mark(temp, mark);
    if (test == mark) {
      return hash;
    }
    // If atomic operation failed, we must inflate the header
    // into heavy weight monitor. We could add more code here
    // for fast path, but it does not worth the complexity.
  } else if (mark->has_monitor()) {
    monitor = mark->monitor();
    temp = monitor->header();
    assert(temp->is_neutral(), "invariant");
    hash = temp->hash();
    if (hash) {
      return hash;
    }
    // Skip to the following code to reduce code size
  } else if (Self->is_lock_owned((address)mark->locker())) {
    temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
    assert(temp->is_neutral(), "invariant");
    hash = temp->hash();                  // by current thread, check if the displaced
    if (hash) {                           // header contains hash code
      return hash;
    }
    // WARNING:
    // The displaced header is strictly immutable.
    // It can NOT be changed in ANY cases. So we have
    // to inflate the header into heavyweight monitor
    // even the current thread owns the lock. The reason
    // is the BasicLock (stack slot) will be asynchronously
    // read by other threads during the inflate() function.
    // Any change to stack may not propagate to other threads
    // correctly.
  }

  // Inflate the monitor to set hash code
  monitor = ObjectSynchronizer::inflate(Self, obj, inflate_cause_hash_code);
  // Load displaced header and check it has hash code
  mark = monitor->header();
  assert(mark->is_neutral(), "invariant");
  hash = mark->hash();
  if (hash == 0) {
    hash = get_next_hash(Self, obj);
    temp = mark->copy_set_hash(hash); // merge hash code into header
    assert(temp->is_neutral(), "invariant");
    test = Atomic::cmpxchg(temp, monitor->header_addr(), mark);
    if (test != mark) {
      // The only update to the header in the monitor (outside GC)
      // is install the hash code. If someone add new usage of
      // displaced header, please update this code
      hash = test->hash();
      assert(test->is_neutral(), "invariant");
      assert(hash != 0, "Trivial unexpected object/monitor header usage.");
    }
  }
  // We finally get the hash
  return hash;
}

I did not expect the code to be this long. It is certainly much longer than

int var;
return &var;

So what does this code actually do?
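Stripped of the locking states, the neutral-header path of FastHashCode reads the hash from the object header, returns it if present, and otherwise generates one and installs it with a compare-and-swap. The Java sketch below imitates only that idea; LazyHashHeaderSketch, headerHash and the generator parameter are hypothetical stand-ins, and an AtomicInteger plays the role of the mark word.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

final class LazyHashHeaderSketch {
    private final AtomicInteger headerHash = new AtomicInteger(0); // 0 stands for "no hash yet"

    /** Returns the stored hash, generating and installing one on first use. */
    int fastHashCode(IntSupplier generator) {
        int hash = headerHash.get();
        if (hash != 0) {
            return hash;                 // already installed: just read it back
        }
        int fresh = generator.getAsInt();
        if (fresh == 0) {
            fresh = 0xBAD;               // 0 is reserved for "no hash", as in get_next_hash
        }
        if (headerHash.compareAndSet(0, fresh)) {
            return fresh;                // this thread installed the hash
        }
        return headerHash.get();         // another thread won the race; use its value
    }

    public static void main(String[] args) {
        LazyHashHeaderSketch header = new LazyHashHeaderSketch();
        System.out.println(header.fastHashCode(() -> 42)); // 42: generated and stored
        System.out.println(header.fastHashCode(() -> 99)); // 42: the generator is not consulted again
    }
}
```

Because the value is stored once and read back afterwards, it stays the same no matter how it was generated or where the object later lives.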

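In get_next_hash there are six different hashCode generation strategies, selected by the value of hashCode: 0 to 4, with every other value falling into the final branch.

First (hashCode == 0): the global os::random() random number generator. os::random() is implemented in os.cpp: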

void os::init_random(unsigned int initval) {
  _rand_seed = initval;
}


static int random_helper(unsigned int rand_seed) {
  /* standard, well-known linear congruential random generator with
   * next_rand = (16807*seed) mod (2**31-1)
   * see
   * (1) "Random Number Generators: Good Ones Are Hard to Find",
   *      S.K. Park and K.W. Miller, Communications of the ACM 31:10 (Oct 1988),
   * (2) "Two Fast Implementations of the 'Minimal Standard' Random
   *     Number Generator", David G. Carta, Comm. ACM 33, 1 (Jan 1990), pp. 87-88.
   */
  const unsigned int a = 16807;
  const unsigned int m = 2147483647;
  const int q = m / a;        assert(q == 127773, "weird math");
  const int r = m % a;        assert(r == 2836, "weird math");

  // compute az=2^31p+q
  unsigned int lo = a * (rand_seed & 0xFFFF);
  unsigned int hi = a * (rand_seed >> 16);
  lo += (hi & 0x7FFF) << 16;

  // if q overflowed, ignore the overflow and increment q
  if (lo > m) {
    lo &= m;
    ++lo;
  }
  lo += hi >> 15;

  // if (p+q) overflowed, ignore the overflow and increment (p+q)
  if (lo > m) {
    lo &= m;
    ++lo;
  }
  return lo;
}

int os::random() {
  // Make updating the random seed thread safe.
  while (true) {
    unsigned int seed = _rand_seed;
    unsigned int rand = random_helper(seed);
    if (Atomic::cmpxchg(rand, &_rand_seed, seed) == seed) {
      return static_cast<int>(rand);
    }
  }
}

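According to the comment in the code, the random numbers come from a linear congruential generator (the Park-Miller "minimal standard" generator). For how it works in detail, see the wiki (I will update this later, or perhaps someone more knowledgeable would care to share).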
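The recurrence named in that comment, next_rand = (16807 * seed) mod (2^31 - 1), can be written directly with 64-bit arithmetic. This is a plain port of the formula, not of the overflow-avoiding trick random_helper uses; ParkMillerSketch is a hypothetical name.

```java
final class ParkMillerSketch {
    private static final long A = 16807L;      // multiplier from the HotSpot comment
    private static final long M = 2147483647L; // modulus 2^31 - 1

    /** One step of next = (16807 * seed) mod (2^31 - 1), computed in 64-bit arithmetic. */
    static int next(int seed) {
        return (int) ((A * seed) % M);
    }

    public static void main(String[] args) {
        int seed = 1;
        for (int i = 0; i < 3; i++) {
            seed = next(seed);
            System.out.println(seed); // 16807, then 282475249, then 1622650073
        }
    }
}
```

Each output depends only on the previous one, so anyone who knows the seed can reproduce the whole sequence.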

Second (hashCode == 1): addrBits ^ (addrBits >> 5) ^ GVars.stwRandom. This is the first time a variable related to the address shows up. addrBits is obtained by calling cast_from_oop, which is defined in oopsHierarchy.hpp:

template <class T> inline oop cast_to_oop(T value) {
  return (oop)(CHECK_UNHANDLED_OOPS_ONLY((void *))(value));
}
// The following part comes from oopsHierarchy.hpp
template <class T> inline T cast_from_oop(oop o) {
  return (T)(CHECK_UNHANDLED_OOPS_ONLY((void*))o);
}

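Unfortunately I have not yet traced how these casts work in full; I will update this later. As far as the templates show, cast_from_oop simply casts the oop, a pointer to the object, to the requested integer type.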
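To see what the address-based formula does with concrete numbers, here is a sketch that takes the address as a plain long. Java cannot read real object addresses this way, so the inputs are made-up values; AddressHashSketch, addressHash and the 31-bit HASH_MASK are assumptions standing in for the HotSpot names.

```java
final class AddressHashSketch {
    static final long HASH_MASK = 0x7FFFFFFFL; // assumption: 31 hash bits, standing in for markOopDesc::hash_mask

    /** Scheme 1: fold the address bits and mix in a random value (stwRandom in HotSpot). */
    static long addressHash(long address, long stwRandom) {
        long addrBits = address >> 3; // drop the low bits, which are zero for 8-byte-aligned objects
        long value = addrBits ^ (addrBits >> 5) ^ stwRandom;
        value &= HASH_MASK;
        return value == 0 ? 0xBAD : value; // 0 is reserved for "no hash"
    }

    public static void main(String[] args) {
        System.out.println(addressHash(0x100, 0)); // 33
        System.out.println(addressHash(0x100, 5)); // 36: same address, different random value
        System.out.println(addressHash(0x108, 0)); // 32: the neighbouring 8-byte slot
    }
}
```

Even in this scheme the result is a scrambled function of the address plus a random term, not the address itself.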

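Third (hashCode == 2): a constant, used for sensitivity testing

value = 1;

Fourth (hashCode == 3): an incrementing sequence

value = ++GVars.hcSequence;

Fifth (hashCode == 4): the object's address itself, cast to an integer with cast_from_oop<intptr_t>(obj).

Sixth (any other value): the scheme the comment says is likely to become the official default. It generates a random number with shifts and xors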

// Marsaglia's xor-shift scheme with thread-specific state
// This is probably the best overall implementation -- we'll
// likely make this the default in future releases.
unsigned t = Self->_hashStateX;
t ^= (t << 11);
Self->_hashStateX = Self->_hashStateY;
Self->_hashStateY = Self->_hashStateZ;
Self->_hashStateZ = Self->_hashStateW;
unsigned v = Self->_hashStateW;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
Self->_hashStateW = v;
value = v;
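The same xor-shift step can be ported to Java to see that the object's address never enters the calculation. This is a sketch: XorShiftHashSketch is a hypothetical name, the four int fields stand in for the per-thread _hashStateX to _hashStateW, the seeds are arbitrary placeholders rather than the values HotSpot uses, and the 31-bit mask is an assumption.

```java
final class XorShiftHashSketch {
    static final int HASH_MASK = 0x7FFFFFFF; // assumption: 31 hash bits, standing in for markOopDesc::hash_mask

    private int x, y, z, w; // stand-ins for Self->_hashStateX .. _hashStateW

    XorShiftHashSketch(int x, int y, int z, int w) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
    }

    /** One xor-shift step, followed by the mask and the zero check from get_next_hash. */
    int nextHash() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8)); // >>> mirrors the unsigned shifts of the C++ code
        w = v;
        int value = v & HASH_MASK;
        return value == 0 ? 0xBAD : value;
    }

    public static void main(String[] args) {
        // Seed values are arbitrary placeholders.
        XorShiftHashSketch thread = new XorShiftHashSketch(12345, 67890, 13579, 24680);
        for (int i = 0; i < 3; i++) {
            System.out.println(Integer.toHexString(thread.nextHash()));
        }
    }
}
```

The only inputs are the generator's own state, so two objects allocated at the same address at different times would still receive unrelated hash values.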

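Finally, back to the questions from the beginning.

1. Where does hashCode come from? It turns out there are many sources: an incrementing sequence, random numbers, the memory address. This raises a new question: why not use a timestamp?

2. Can the value be predicted? That is hard to say!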

posted @ 2021-07-07 14:55 CharyGao
