Stack vs. Heap in .NET: A Comparison (Part 1)

English original:

Even though with the .NET framework we don't have to actively worry about memory management and garbage collection (GC), we still have to keep memory management and GC in mind in order to optimize the performance of our applications. Also, having a basic understanding of how memory management works will help explain the behavior of the variables we work with in every program we write.  In this article I'll cover the basics of the Stack and Heap, types of variables and why some variables work as they do.

There are two places where the .NET framework stores items in memory as your code executes.  If you haven't met them already, let me introduce you to the Stack and the Heap.  Both the stack and heap help us run our code.  They reside in the operating memory on our machine and contain the pieces of information we need to make it all happen.

Stack vs. Heap: What's the difference?

The Stack is more or less responsible for keeping track of what's executing in our code (or what's been "called").  The Heap is more or less responsible for keeping track of our objects (our data, well... most of it - we'll get to that later).

Think of the Stack as a series of boxes stacked one on top of the next.  We keep track of what's going on in our application by stacking another box on top every time we call a method (called a Frame).  We can only use what's in the top box on the stack.  When we're done with the top box (the method is done executing) we throw it away and proceed to use the stuff in the previous box on the top of the stack. The Heap is similar except that its purpose is to hold information (not keep track of execution most of the time) so anything in our Heap can be accessed at  any time.  With the Heap, there are no constraints as to what can be accessed like in the stack.  The Heap is like the heap of clean laundry on our bed that we have not taken the time to put away yet - we can grab what we need quickly.  The Stack is like the stack of shoe boxes in the closet where we have to take off the top one to get to the one underneath it.
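The "top box first" behavior can be watched from ordinary code.  Here is a minimal sketch, assuming two made-up methods (Outer and Inner are hypothetical names, not part of any library) that simply log when they are entered and left.  The method called last is always the one that finishes first.

```csharp
using System;
using System.Collections.Generic;

public static class CallOrderDemo
{
    // Records the order in which the hypothetical methods are entered and left.
    public static readonly List<string> Log = new List<string>();

    public static void Outer()
    {
        Log.Add("enter Outer");   // a new box (frame) goes on top of the stack
        Inner();                  // another box is stacked on top of Outer's box
        Log.Add("leave Outer");   // Outer's box is the top box again
    }

    public static void Inner()
    {
        Log.Add("enter Inner");
        Log.Add("leave Inner");   // Inner's box is thrown away first
    }

    public static void Main()
    {
        Log.Clear();
        Outer();
        // Prints: enter Outer, enter Inner, leave Inner, leave Outer
        Console.WriteLine(string.Join(", ", Log));
    }
}
```

The log only shows the order of calls and returns; it says nothing about how the runtime lays the frames out in memory.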

 

The picture above, while not really a true representation of what's happening in memory, helps us distinguish a Stack from a Heap.
 
The Stack is self-maintaining, meaning that it basically takes care of its own memory management.  When the top box is no longer used, it's thrown out.  The Heap, on the other hand, has to worry about Garbage Collection (GC) - which deals with how to keep the Heap clean (no one wants dirty laundry lying around... it stinks!).

What goes on the Stack and Heap?

We have four main types of things we'll be putting in the Stack and Heap as our code is executing: Value Types, Reference Types, Pointers, and Instructions. 

Value Types:

In C#, all the "things" declared with the following list of type declarations are Value types (because they derive from System.ValueType):

  • bool
  • byte
  • char
  • decimal
  • double
  • enum
  • float
  • int
  • long
  • sbyte
  • short
  • struct
  • uint
  • ulong
  • ushort
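If you want to check a type for yourself, reflection can tell you.  The sketch below uses Type.IsValueType and Type.BaseType; the Color enum and Point struct are hypothetical types made up for the illustration.

```csharp
using System;

public enum Color { Red, Green }          // hypothetical enum, for illustration only
public struct Point { public int X, Y; }  // hypothetical struct, for illustration only

public static class ValueTypeCheck
{
    // True when the runtime treats t as a Value Type.
    public static bool IsValue(Type t)
    {
        return t.IsValueType;
    }

    public static void Main()
    {
        Console.WriteLine(IsValue(typeof(int)));      // True
        Console.WriteLine(IsValue(typeof(decimal)));  // True
        Console.WriteLine(IsValue(typeof(Color)));    // True
        Console.WriteLine(IsValue(typeof(Point)));    // True
        Console.WriteLine(typeof(int).BaseType);      // System.ValueType
        Console.WriteLine(typeof(Color).BaseType);    // System.Enum (which itself derives from System.ValueType)
    }
}
```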

Reference Types:

All the "things" declared with the types in this list are Reference types (and inherit from System.Object... except, of course, for object, which is System.Object itself):

  • class
  • interface
  • delegate
  • object
  • string
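The same kind of check works for the Reference types.  This is only a sketch: Customer, IShape and Notify are hypothetical declarations invented for the example, and "not a Value Type" is used as a rough stand-in for "Reference Type" for the kinds of types in this list.

```csharp
using System;

public class Customer { }          // hypothetical class
public interface IShape { }        // hypothetical interface
public delegate void Notify();     // hypothetical delegate

public static class ReferenceTypeCheck
{
    // For the kinds of types listed above, "not a Value Type" means "Reference Type".
    public static bool IsReference(Type t)
    {
        return !t.IsValueType;
    }

    public static void Main()
    {
        Console.WriteLine(IsReference(typeof(Customer)));    // True
        Console.WriteLine(IsReference(typeof(IShape)));      // True
        Console.WriteLine(IsReference(typeof(Notify)));      // True
        Console.WriteLine(IsReference(typeof(object)));      // True
        Console.WriteLine(IsReference(typeof(string)));      // True
        Console.WriteLine(typeof(Customer).BaseType);        // System.Object
        Console.WriteLine(typeof(object).BaseType == null);  // True: object has no base type
    }
}
```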

Pointers:

The third type of "thing" to be put in our memory management scheme is a Reference to a Type. A Reference is often referred to as a Pointer.  We don't explicitly use Pointers; they are managed by the Common Language Runtime (CLR). A Pointer (or Reference) is different from a Reference Type: when we say something is a Reference Type, it means we access it through a Pointer.  A Pointer is a chunk of space in memory that points to another space in memory.  A Pointer takes up space just like any other thing that we're putting in the Stack and Heap, and its value is either a memory address or null.
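A small sketch may make the Pointer idea more concrete.  Box is a hypothetical class made up for the example.  Treating IntPtr.Size as "roughly how much space a Reference takes" is an assumption about how the CLR typically behaves, not something the language promises.

```csharp
using System;

// Box is a hypothetical class used only for this illustration.
public class Box { public int Value; }

public static class ReferenceDemo
{
    public static bool CopyPointsToSameObject()
    {
        Box a = new Box();   // a holds a Reference to a Box on the Heap
        Box b = a;           // copies the Reference, not the Box
        return object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        Box empty = null;                             // a Reference that points at nothing
        Console.WriteLine(empty == null);             // True
        Console.WriteLine(CopyPointsToSameObject());  // True
        Console.WriteLine(IntPtr.Size);               // size of a native pointer in this process: 4 or 8 bytes
    }
}
```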

 

Instructions:

You'll see how the  "Instructions" work later in this article...

How is it decided what goes where? (Huh?)

Ok, one last thing and we'll get to the fun stuff.

Here are our two golden rules:

  1. A Reference Type always goes on the Heap - easy enough, right? 

  2. Value Types and Pointers always go where they were declared.  This is a little more complex and needs a bit more understanding of how the Stack works to figure out where "things" are declared.
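To see the two rules side by side, here is a short annotated sketch.  Account and its Balance field are hypothetical names invented for the example.  The comments label each declaration according to the two golden rules above; they describe the model used in this article and are not something the code itself can observe.

```csharp
using System;

// Account is a hypothetical class used only for this illustration.
public class Account            // Reference Type: rule 1 says its instances go on the Heap
{
    public int Balance;         // Value Type declared inside a class: lives inside the Account object on the Heap
}

public static class GoldenRules
{
    public static int Demo()
    {
        int bonus = 10;                 // Value Type declared in a method: rule 2 puts it on the Stack
        Account acct = new Account();   // the Account object goes on the Heap (rule 1);
                                        // acct itself is a Pointer declared in this method, so it sits on the Stack (rule 2)
        acct.Balance = 5;
        return acct.Balance + bonus;
    }

    public static void Main()
    {
        Console.WriteLine(Demo());      // 15
    }
}
```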

The Stack, as we mentioned earlier, is responsible for keeping track of where we are in the execution of our code (or what's been called).  When our code makes a call to execute a method, it puts the instructions we have coded (inside the method) on the stack, followed by the method's parameters.  Then, as we go through the code and run into variables within the method they are "stacked" on top of the stack.  This will be easiest to understand by example...

Take the following method.

          public int AddFive(int pValue)
          {
                int result;
                result = pValue + 5;
                return result;
          }

Here's what happens at the very top of the stack.  Keep in mind that what we are looking at is "stacked" on top of many other items already living in the stack:

First the method itself (only bytes needed to execute the logic) is placed on the stack followed by its parameter (we'll talk more about passing parameters later).

 

Next, control (the thread executing the method) is passed to the instructions in the AddFive() part of the stack.

 

As the method executes, we need some memory for the "result" variable and it is allocated on the stack.

 

The method finishes execution and our result is returned.

 

And all memory allocated on the stack is cleaned up by moving a pointer to the available memory address where AddFive() used to live and we go down to the previous method on the stack (not seen here).

In this example, our "result" variable is placed on the stack.  As a matter of fact, every time a Value Type is declared within the body of a method, it will be placed on the stack.

Now, Value Types are also sometimes placed on the Heap.  Remember the rule, Value Types always go where they were declared?  Well, if a Value Type is declared outside of a method, but inside a Reference Type it will be placed within the Reference Type on the Heap.

Here's another example.

If we have the following MyInt class (which is a Reference Type because it is a class):

          public class MyInt
          {
                public int MyValue;
          }

and the following method is executing:

          public MyInt AddFive(int pValue)
          {
                MyInt result = new MyInt();
                result.MyValue = pValue + 5;
                return result;
          }

Just as before, the method itself (only bytes needed to execute the logic) is placed on the stack followed by its parameter.  Next, control (the thread executing the method) is passed to the instructions in the AddFive() part of the stack.

Now is when it gets interesting...

Because MyInt is a Reference Type, it is placed on the Heap and referenced by a Pointer on the Stack.

After AddFive() is finished executing (like in the first example), and we are cleaning up...

we're left with an orphaned MyInt in the heap (assuming the caller did not hold on to the returned reference, there is no longer anyone in the Stack standing around pointing to MyInt)!

This is where the Garbage Collection (GC) comes into play.  Once our program reaches a certain memory threshold and we need more Heap space, our GC will kick off.  The GC will stop all running threads (a FULL STOP), find all objects in the Heap that are not being accessed by the main program and delete them.  The GC will then reorganize all the objects left in the Heap to make space and adjust all the Pointers to these objects in both the Stack and the Heap.  As you can imagine, this can be quite expensive in terms of performance, so now you can see why it can be important to pay attention to what's in the Stack and Heap when trying to write high-performance code.
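You can poke at this behavior with a WeakReference, which lets us ask whether an object is still around without keeping it alive ourselves.  This is a rough sketch, not a description of how the GC is implemented: the MakeOrphan helper is hypothetical, forcing a collection with GC.Collect() is done only for the demonstration, and whether the orphan is already gone after one collection depends on the runtime and build settings.

```csharp
using System;
using System.Runtime.CompilerServices;

public class MyInt
{
    public int MyValue;
}

public static class GcDemo
{
    // Hypothetical helper: creates a MyInt, hands back only a weak reference, and returns.
    // After it returns, nothing on the Stack points to the MyInt any more.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference MakeOrphan()
    {
        MyInt temp = new MyInt();
        temp.MyValue = 8;
        return new WeakReference(temp);   // a weak reference does not keep the object alive
    }

    public static void Main()
    {
        MyInt kept = new MyInt();                        // still referenced from the Stack
        WeakReference keptRef = new WeakReference(kept);
        WeakReference orphanRef = MakeOrphan();

        GC.Collect();                                    // force a collection, for demonstration only
        GC.WaitForPendingFinalizers();

        Console.WriteLine(keptRef.IsAlive);              // True: "kept" is still in use below
        Console.WriteLine(orphanRef.IsAlive);            // usually False, but not guaranteed
        GC.KeepAlive(kept);                              // keeps "kept" reachable up to this point
    }
}
```

In real code we would normally let the GC decide when to run rather than calling GC.Collect() ourselves.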

Ok... That's great, but how does it really affect me?

Good question. 

When we are using Reference Types, we're dealing with Pointers to the type, not the thing itself.  When we're using Value Types, we're using the thing itself.  Clear as mud, right?

Again, this is best described by example.

If we execute the following method:

          public int ReturnValue()
          {
                int x = new int();
                x = 3;
                int y = new int();
                y = x;
                y = 4;
                return x;
          }

We'll get the value 3.  Simple enough, right?

However, if we are using the MyInt class from before

          public class MyInt
          {
                public int MyValue;
          }

and we are executing the following method:

          public int ReturnValue2()
          {
                MyInt x = new MyInt();
                x.MyValue = 3;
                MyInt y = new MyInt();
                y = x;
                y.MyValue = 4;
                return x.MyValue;
          }

What do we get?...    4!

Why?...  How does x.MyValue get to be 4?... Take a look at what we're doing and see if it makes sense:

In the first example everything goes as planned:

          public int ReturnValue()
          {
                int x = 3;
                int y = x;
                y = 4;
                return x;
          }

In the next example, we don't get "3" because both variables "x" and "y" point to the same object in the Heap.

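          public int ReturnValue2()
          {
                MyInt x = new MyInt();   // x must be assigned before use, or the code will not compile
                x.MyValue = 3;
                MyInt y = x;             // y now points to the same object as x
                y.MyValue = 4;
                return x.MyValue;
          }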
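One way to convince yourself that the type of the variable is what makes the difference is to run the same steps against a class and against a struct with the same shape.  This is a minimal sketch; MyIntClass and MyIntStruct are hypothetical names used to keep the two versions apart.

```csharp
using System;

public class MyIntClass { public int MyValue; }    // Reference Type
public struct MyIntStruct { public int MyValue; }  // Value Type with the same shape

public static class CopyDemo
{
    public static int WithClass()
    {
        MyIntClass x = new MyIntClass();
        x.MyValue = 3;
        MyIntClass y = x;      // y points to the same object as x
        y.MyValue = 4;         // changes the one shared object
        return x.MyValue;
    }

    public static int WithStruct()
    {
        MyIntStruct x = new MyIntStruct();
        x.MyValue = 3;
        MyIntStruct y = x;     // y is an independent copy of x
        y.MyValue = 4;         // changes only the copy
        return x.MyValue;
    }

    public static void Main()
    {
        Console.WriteLine(WithClass());   // 4
        Console.WriteLine(WithStruct());  // 3
    }
}
```

The only thing that differs between the two methods is whether the variable holds a Pointer to the data or the data itself.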

Hopefully this gives you a better understanding of a basic difference between Value Type and Reference Type variables in C# and a basic understanding of what a Pointer is and when it is used.  In the next part of this series, we'll get further into memory management and specifically talk about method parameters.

For now...

Happy coding.

Chinese translation:

尽管在.NET framework下我们并不需要担心内存管理和垃圾回收(Garbage Collection),但是我们还是应该了解它们,以优化我们的应用程序。同时,还需要具备一些基础的内存管理工作机制的知识,这样能够有助于解释我们日常程序编写中的变量的行为。在本文中我将讲解栈和堆的基本知识,变量类型以及为什么一些变量能够按照它们自己的方式工作。

在.NET framework环境下,当我们的代码执行时,内存中有两个地方用来存储这些代码。假如你不曾了解,那就让我来给你介绍栈(Stack)和堆(Heap)。栈和堆都用来帮助我们运行代码的,它们驻留在机器内存中,且包含所有代码执行所需要的信息。


* 栈vs堆:有什么不同?

栈大致负责跟踪代码中正在执行(或者说已被“调用”)的内容,而堆则大致负责保存我们的对象(也就是数据,确切地说是其中的大部分,这一点稍后会讲到)。

可以将栈想象成一摞自下而上叠放的盒子。每调用一次方法,我们就在栈顶再放上一个盒子(称为帧,Frame),用来记录应用程序中正在发生的事情,而我们只能使用栈顶的那个盒子。当栈顶的盒子使用完毕(即方法执行完毕)后,我们就把它扔掉,然后继续使用此时位于栈顶的前一个盒子。堆与之类似,只是它的用途是保存信息(大多数时候并不用于跟踪执行过程),因此堆中的任何内容都可以随时访问。与栈不同,堆对可访问的内容没有任何限制。堆就像床上那堆还没来得及收拾的干净衣服,我们可以很快拿到需要的那一件;而栈就像衣柜里叠放的鞋盒,必须先拿走最上面的盒子,才能取到它下面的那个。



以上图片并不是内存中真实的表现形式,但能够帮助我们区分栈和堆。

栈是自行维护的,也就是说它基本上自己负责内存管理:当栈顶的盒子不再被使用时,它就会被扔掉。相反,堆需要考虑垃圾回收(GC),垃圾回收负责保持堆的整洁(没有人愿意看到脏衣服到处乱放,那简直太臭了!)。


* 栈和堆里有些什么?

当我们的代码执行的时候,栈和堆中主要放置了四种类型的数据:值类型(Value Type),引用类型(Reference Type),指针(Pointer),指令(Instruction)。

1.值类型:

在C#中,所有被声明为以下类型的事物被称为值类型:

bool  
byte  
char  
decimal  
double  
enum  
float  
int  
long  
sbyte  
short  
struct  
uint  
ulong  
ushort

2.引用类型:

所有的被声明为以下类型的事物被称为引用类型:

class  
interface  
delegate  
object  
string

3.指针:

在内存管理方案中放置的第三种“事物”是对类型的引用,引用通常也被称为指针。我们不会显式地使用指针,它们由公共语言运行时(CLR)来管理。指针(或引用)不同于引用类型:当我们说某个事物是引用类型时,意思是我们通过指针来访问它。指针是一块内存空间,它指向另一块内存空间。和放在栈和堆中的其他任何事物一样,指针也要占用空间,它的值要么是一个内存地址,要么为空(null)。



4.指令:

在后面的文章中你会看到指令是如何工作的...


* 如何决定放哪儿?

这里有两条黄金规则:

1. 引用类型总是放在堆中。(够简单的吧?)

2. 值类型和指针总是放在它们被声明的地方。(这条稍微复杂点,需要知道栈是如何工作的,然后才能断定是在哪儿被声明的。)

就像我们先前提到的,栈负责跟踪代码执行到了哪里(或者说调用了什么)。当我们的代码调用一个方法时,会把方法中编写的指令放到栈上,紧接着放置方法的参数。然后,随着代码的执行,方法中遇到的变量会依次被“压”到栈顶。通过以下例子很容易理解...

下面是一个方法(Method):
public int AddFive(int pValue)
{
     int result;
     result = pValue + 5;
     return result;
}
           
现在就来看看在栈顶发生了些什么,记住我们所观察的栈顶下实际已经压入了许多别的内容。

首先方法(只包含需要执行的逻辑字节,即执行该方法的指令,而非方法体内的数据)入栈,紧接着是方法的参数入栈。(我们将在后面讨论更多的参数传递)



接着,控制(即执行方法的线程)被传递到堆栈中AddFive()的指令上,



当方法执行时,我们需要在栈上为“result”变量分配一些内存,



方法执行完成,然后方法的结果被返回。



通过将栈指针指向AddFive()方法曾使用的可用的内存地址,所有在栈上的该方法所使用内存都被清空,且程序将自动回到栈上最初的方法调用的位置(在本例中不会看到)。



在这个例子中,我们的"result"变量是被放置在栈上的,事实上,当值类型数据在方法体中被声明时,它们都是被放置在栈上的。

值类型数据有时也被放置在堆上。还记得那条规则吗:值类型总是放在它们被声明的地方。因此,如果一个值类型在方法体外、但在一个引用类型内部被声明,那么它将被放置在堆上的这个引用类型之内。


来看另一个例子:

假如我们有这样一个MyInt类(它是引用类型因为它是一个类类型):
public class MyInt
{
     public int MyValue;
}
然后执行下面的方法:
public MyInt AddFive(int pValue)
{
     MyInt result = new MyInt();
     result.MyValue = pValue + 5;
     return result;
}
就像前面提到的,方法及方法的参数被放置到栈上,接下来,控制被传递到堆栈中AddFive()的指令上。


接着会出现一些有趣的现象...

因为"MyInt"是一个引用类型,它将被放置在堆上,同时在栈上生成一个指向这个堆的指针引用。


在AddFive()方法被执行之后,我们将清空...


我们将剩下孤独的MyInt对象在堆中(栈中将不会存在任何指向MyInt对象的指针!)


这就是垃圾回收器(后简称GC)起作用的地方。当我们的程序达到了一个特定的内存阈值,我们需要更多的堆空间的时候,GC开始起作用。GC将停止所有正在运行的线程,找出在堆中存在的所有不再被主程序访问的对象,并删除它们。然后GC会重新组织堆中所有剩下的对象来节省空间,并调整栈和堆中所有与这些对象相关的指针。你肯定会想到这个过程非常耗费性能,所以这时你就会知道为什么我们需要如此重视栈和堆里有些什么,特别是在需要编写高性能的代码时。

Ok... 这太棒了,但它是如何影响我的?

问得好。

当我们使用引用类型时,我们实际是在处理该类型的指针,而非该类型本身。当我们使用值类型时,我们是在使用值类型本身。听起来很迷糊吧?

同样,例子是最好的描述。

假如我们执行以下的方法:
public int ReturnValue()
{
     int x = new int();
     x = 3;
     int y = new int();
     y = x;
     y = 4;
     return x;
}
我们将得到值3,很简单,对吧?

假如我们首先使用MyInt类
public class MyInt
{
     public int MyValue;
}
接着执行以下的方法:
public int ReturnValue2()
{
     MyInt x = new MyInt();
     x.MyValue = 3;
     MyInt y = new MyInt();
     y = x;
     y.MyValue = 4;
     return x.MyValue;
}
我们将得到什么?...     4!

为什么?...   x.MyValue怎么会变成4了呢?...   看看我们所做的然后就知道是怎么回事了:

在第一例子中,一切都像计划的那样进行着:
public int ReturnValue()
{
     int x = 3;
     int y = x;
     y = 4;
     return x;
}


在第二个例子中,我们没有得到"3"是因为变量"x"和"y"都同时指向了堆中相同的对象。
public int ReturnValue2()
{
     MyInt x = new MyInt();   // x 必须先赋值才能使用,否则代码无法通过编译
     x.MyValue = 3;
     MyInt y = x;             // y 现在与 x 指向同一个对象
     y.MyValue = 4;
     return x.MyValue;
}


希望以上内容能够使你对C#中的值类型和引用类型的基本区别有一个更好的认识,并且对指针及指针是何时被使用的有一定的基本了解。在系列的下一个部分,我们将深入内存管理并专门讨论方法参数。

posted on 2007-11-09 13:11  hzwang