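The previous two posts both touched on binary data, so today I read the part of the official documentation that covers how binaries are implemented and collected my notes here. First, a diagram:

[Figure: mind map of the Erlang binary implementation]

Notes on the mind map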

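  • Binaries and bitstrings share the same internal implementation.
  • Internally, Erlang has four kinds of binary objects: two containers and two references.
  • The containers are refc binaries (reference-counted binaries) and heap binaries.
  • A refc binary consists of two parts: a ProcBin stored on the process heap, and the binary object itself, stored outside all process heaps.
  • A ProcBin holds metadata about the binary data, such as where it is located; the reference counter itself lives in the off-heap binary object.
  • The off-heap binary object can be referenced by any number of ProcBins from any number of processes. It carries a reference counter and can be removed once that counter drops to zero.
  • All ProcBins in a process are part of a linked list, so the garbage collector can track them and decrement the reference counter when a ProcBin disappears.
  • Heap binaries are small binaries, at most 64 bytes, stored directly on the process heap. They are copied during garbage collection and when sent as a message, and need no special handling by the garbage collector.
  • There are two reference types: sub binaries and match contexts.
  • A sub binary is created by split_binary/2 and when a binary is matched out in a binary pattern. It is a reference into part of another binary (a refc or heap binary, never another sub binary). Because no data is copied, matching out a binary is relatively cheap.
  • A match context is similar to a sub binary but is optimized for binary matching; for example, it contains a direct pointer to the binary data. After each field is matched out of the binary, only the position in the match context is advanced.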

Official documentation: http://www.erlang.org/doc/efficiency_guide/binaryhandling.html

Internally, binaries and bitstrings are implemented in the same way.

There are four types of binary objects internally. Two of them are containers for binary data and two of them are merely references to a part of a binary.


The binary containers are called refc binaries (short for reference-counted binaries) and heap binaries.
Refc binaries consist of two parts: an object stored on the process heap, called a ProcBin, and the binary object itself stored outside all process heaps. The binary object can be referenced by any number of ProcBins from any number of processes; the object contains a reference counter to keep track of the number of references, so that it can be removed when the last reference disappears.
All ProcBin objects in a process are part of a linked list, so that the garbage collector can keep track of them and decrement the reference counters in the binary when a ProcBin disappears.

Heap binaries are small binaries, up to 64 bytes, that are stored directly on the process heap. They will be copied when the process is garbage collected and when they are sent as a message. They don't require any special handling by the garbage collector.
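
The two container types can be observed from a running node. The following is a minimal sketch, not something taken from the guide: the module name `bin_containers` and the function `refc_sizes/1` are made up for illustration. It relies on `erlang:process_info(Pid, binary)`, whose result format is an implementation detail that may change between releases, and it assumes each entry is a three-element tuple whose second element is the size in bytes. The binary is built at runtime with `binary:copy/2` so that it is not a compile-time literal.

```erlang
-module(bin_containers).
-export([refc_sizes/1]).

%% Build an N-byte binary at runtime inside a fresh process and return the
%% sizes of the off-heap (refc) binaries that the process references.
%% A heap binary lives on the process heap and should not show up here.
refc_sizes(N) ->
    Parent = self(),
    Ref = make_ref(),
    spawn_link(fun() ->
        Bin = binary:copy(<<0>>, N),
        %% Assumption: entries look like {Id, Size, RefCount}; entries of
        %% any other shape are silently skipped by the comprehension.
        {binary, Info} = erlang:process_info(self(), binary),
        Sizes = [Size || {_Id, Size, _RefCount} <- Info],
        %% Bin is used again here, so it is still live at the call above.
        Parent ! {Ref, byte_size(Bin), Sizes}
    end),
    receive
        {Ref, N, Sizes} -> Sizes
    after 5000 ->
        timeout
    end.
```

Calling `bin_containers:refc_sizes(64)` and `bin_containers:refc_sizes(65)` should, if the 64-byte limit described above applies to the release in use, give a list without and with the new binary respectively.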

There are two types of reference objects that can reference part of a refc binary or heap binary. They are called sub binaries and match contexts.

A sub binary is created by split_binary/2 and when a binary is matched out in a binary pattern. A sub binary is a reference into a part of another binary (refc or heap binary, never into another sub binary). Therefore, matching out a binary is relatively cheap because the actual binary data is never copied.
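
Both ways of producing a sub binary can be tried side by side. This is a minimal sketch with a hypothetical module name (`sub_bin_demo`). It uses `binary:referenced_byte_size/1` to ask how large the underlying binary is; for a sub binary of a large refc binary this is normally the size of the whole original, but the exact value depends on the runtime, so no particular number is promised here.

```erlang
-module(sub_bin_demo).
-export([split_demo/0]).

%% Cut the same 1000-byte binary in two ways and report what we got.
split_demo() ->
    %% Built at runtime; large enough to be stored off-heap.
    Big = binary:copy(<<"x">>, 1000),
    %% Way 1: split_binary/2 returns two parts that refer into Big.
    {Head, Tail} = split_binary(Big, 400),
    %% Way 2: matching out a binary segment in a binary pattern.
    <<Matched:400/binary, _/binary>> = Big,
    #{head_size => byte_size(Head),
      tail_size => byte_size(Tail),
      same_bytes => Head =:= Matched,
      %% Size of the binary that Head actually refers to.
      head_refers_to => binary:referenced_byte_size(Head)}.
```

If `head_refers_to` comes back larger than `head_size`, the 400-byte part is keeping the whole original binary alive, which is the other side of the cheap, copy-free matching described above.
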
A match context is similar to a sub binary, but is optimized for binary matching; for instance, it contains a direct pointer to the binary data. For each field that is matched out of a binary, the position in the match context will be incremented.
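
The style of code that benefits from match contexts is a function that matches a field off the front of a binary and passes the remainder straight to a recursive call. The sketch below is illustrative only; the module name `match_ctx_demo` and the function `sum_bytes/1` are hypothetical. Whether the compiler really keeps a single match context for the whole loop depends on the compiler version; compiling with the `bin_opt_info` option reports what it decided.

```erlang
-module(match_ctx_demo).
-export([sum_bytes/1]).

%% Sum all bytes of a binary by matching one byte at a time.
sum_bytes(Bin) when is_binary(Bin) ->
    sum_bytes(Bin, 0).

%% Rest is only passed on to the next call, so the compiler is free to
%% advance the position in the match context instead of building a new
%% sub binary on every iteration.
sum_bytes(<<Byte, Rest/binary>>, Acc) ->
    sum_bytes(Rest, Acc + Byte);
sum_bytes(<<>>, Acc) ->
    Acc.
```

For example, `match_ctx_demo:sum_bytes(<<1,2,3>>)` returns 6.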

Author: 坚强2002