While organizing my notes on Erlang/OTP, I found that there is no way around the proc_lib module; it is fair to call it a foundational OTP module.

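    What proc_lib does, in the words of its documentation: "This module is used to start processes adhering to the OTP Design Principles." In other words, proc_lib starts processes that follow the OTP design principles. (What are the OTP design principles? See http://www.cnblogs.com/me-sa/archive/2011/11/20/erlang0015.html ) Every OTP behaviour creates its processes through proc_lib, which is what makes this module the cornerstone of OTP. As mentioned in the previous post, we can also use it directly to create processes that follow the OTP principles.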

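    proc_lib exports relatively few functions. For an overall picture, the main ones are spawn/spawn_link/spawn_opt, start/start_link, init_ack, initial_call/translate_initial_call, format and hibernate.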

 

What is different about processes created with proc_lib
1. They carry more information

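    Compared with a process created directly with spawn (hereafter a "plain Erlang process"), a process initialized through proc_lib carries extra information: who its parent is (the parent's registered name or pid), its chain of ancestors, and the function it was started with. Below is the runtime metadata of a process created each way: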
=PROGRESS REPORT==== 21-Nov-2011::21:18:49 ===
         application: sasl
          started_at: 'demo@192.168.1.123'
Eshell V5.8.4  (abort with ^G)
(demo@192.168.1.123)1> Fun = fun() -> receive a-> 1/0  after infinity -> ok end end .
#Fun<erl_eval.20.21881191>
(demo@192.168.1.123)2> P = spawn(Fun).
<0.48.0>
(demo@192.168.1.123)3> erlang:process_info(P).
[{current_function,{erl_eval,receive_clauses,8}},
 {initial_call,{erlang,apply,2}},
 {status,waiting}, {message_queue_len,0},
 {messages,[]}, {links,[]},
 {dictionary,[]}, {trap_exit,false},
 {error_handler,error_handler}, {priority,normal},
 {group_leader,<0.29.0>}, {total_heap_size,233},
 {heap_size,233}, {stack_size,10}, {reductions,18},
 {garbage_collection,[{min_bin_vheap_size,46368},
                      {min_heap_size,233},
                      {fullsweep_after,65535},
                      {minor_gcs,0}]},
 {suspending,[]}]
(demo@192.168.1.123)4> P2 = proc_lib:spawn(Fun).
<0.51.0>
(demo@192.168.1.123)5> erlang:process_info(P2).
[{current_function,{erl_eval,receive_clauses,8}},
 {initial_call,{proc_lib,init_p,3}},
 {status,waiting},
 {message_queue_len,0}, {messages,[]},
 {links,[]}, {dictionary,[{'$ancestors',[<0.45.0>]},
              {'$initial_call',{erl_eval,'-expr/5-fun-1-',0}}]},
 {trap_exit,false}, {error_handler,error_handler},
 {priority,normal}, {group_leader,<0.29.0>},
 {total_heap_size,233}, {heap_size,233},
 {stack_size,14}, {reductions,25},
 {garbage_collection,[{min_bin_vheap_size,46368},
                      {min_heap_size,233},
                      {fullsweep_after,65535},
                      {minor_gcs,0}]},
 {suspending,[]}]
(demo@192.168.1.123)6>

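   The difference is in the dictionary: the proc_lib process has '$ancestors' and '$initial_call' entries, and its initial_call is {proc_lib,init_p,3} rather than {erlang,apply,2}. You can also pick a process from a supervision tree, for example with erlang:process_info(whereis(rex)), and compare which entries were added. The natural next question is when and how this information gets into the process. Let's look at a typical piece of proc_lib code: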

spawn(M,F,A) when is_atom(M), is_atom(F), is_list(A) ->
    Parent = get_my_name(),
    Ancestors = get_ancestors(),
    erlang:spawn(?MODULE, init_p, [Parent,Ancestors,M,F,A]).

 % Implementations of the helper functions used above

get_my_name() ->
    case proc_info(self(),registered_name) of
        {registered_name,Name} -> Name;
        _                      -> self()
    end.

get_ancestors() ->
    case get('$ancestors') of
        A when is_list(A) -> A;
        _                 -> []
    end.

proc_info(Pid,Item) when node(Pid) =:= node() ->
    process_info(Pid,Item);
proc_info(Pid,Item) ->
    case lists:member(node(Pid),nodes()) of
        true ->
            check(rpc:call(node(Pid), erlang, process_info, [Pid, Item]));
        _ ->
            hidden
    end.

check({badrpc,nodedown}) -> undefined;
check({badrpc,Error})    -> Error;
check(Res)               -> Res.
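To see what get_my_name/0 and get_ancestors/0 leave behind without eyeballing process_info output, here is a minimal sketch. The module proc_lib_dict_demo and its functions are hypothetical, written for this post. It starts one child with plain spawn and one with proc_lib:spawn, and each child sends its own process dictionary back. Because the child reports from the inside, the caller does not race against init_p, which stores the keys only once the new process runs.

```erlang
-module(proc_lib_dict_demo).
-export([compare/0, child/1]).

%% Child body: send our own process dictionary to Parent, then exit.
child(Parent) ->
    Parent ! {dict, self(), get()},
    ok.

%% Returns {PlainDict, ProcLibDict}: the dictionaries of a child started
%% with erlang:spawn/3 and of one started with proc_lib:spawn/3.
compare() ->
    Plain = spawn(?MODULE, child, [self()]),
    Lib = proc_lib:spawn(?MODULE, child, [self()]),
    {collect(Plain), collect(Lib)}.

%% Wait for the report from one specific child.
collect(Pid) ->
    receive
        {dict, Pid, Dict} -> Dict
    after 5000 ->
        error(timeout)
    end.
```

PlainDict should come back empty. ProcLibDict should contain '$ancestors' (the caller first, then the caller's own ancestors) and '$initial_call', here {proc_lib_dict_demo,child,1}.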

2. Different handling when the process exits

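    A plain Erlang process is considered to have exited normally only when its exit reason is normal. For a process started with proc_lib, the reasons shutdown and {shutdown,Term} also count as normal exits. When an application (and its supervision tree) is stopped, the processes being terminated exit with reason shutdown. If a proc_lib process exits with any other reason, a crash report is generated and written to the default SASL event handler. In the OTP release used here, that means the report only appears when SASL is running. For how to start SASL, see http://www.cnblogs.com/me-sa/archive/2011/11/20/erlang0016.html The crash report includes the information stored when the process was initialized.
  Let's crash a process ourselves and see: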

=PROGRESS REPORT==== 21-Nov-2011::20:47:56 ===
         application: sasl          % SASL is started here so its reports are printed directly in the shell
          started_at: 'demo@192.168.1.123'
Eshell V5.8.4  (abort with ^G)
(demo@192.168.1.123)1> Fun = fun() -> receive a-> 1/0  after infinity -> ok end end . % on receiving a, it evaluates 1/0 and crashes
#Fun<erl_eval.20.21881191>
(demo@192.168.1.123)2> P= spawn(Fun). % first, a plain Erlang process
<0.48.0>
(demo@192.168.1.123)3> P!a.   % send a message to crash it
a
(demo@192.168.1.123)4>          % the shell prints the error below
=ERROR REPORT==== 21-Nov-2011::20:48:50 ===
Error in process <0.48.0> on node 'demo@192.168.1.123' with exit value: {badarith,[{erlang,'/',[1,0]}]}

(demo@192.168.1.123)4> P2= proc_lib:spawn(Fun).   % now a process created with proc_lib
<0.51.0>
(demo@192.168.1.123)5> P2!a.  % send a message to crash it
a
(demo@192.168.1.123)6>
=CRASH REPORT==== 21-Nov-2011::20:49:09 ===   % this CRASH REPORT carries much more information
  crasher:
    initial call: erl_eval:-expr/5-fun-1-/0
    pid: <0.51.0>
    registered_name: []
    exception error: bad argument in an arithmetic expression
      in operator  '/'/2
         called as 1 / 0
    ancestors: [<0.45.0>]
    messages: []
    links: []
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 233
    stack_size: 24
    reductions: 114
  neighbours:
(demo@192.168.1.123)6>
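The crash-report rule can also be checked programmatically. This sketch assumes OTP 21 or later, where proc_lib emits crash reports through logger, so SASL is no longer needed to see them. The module exit_reason_demo and its handler are hypothetical. It installs a temporary logger handler that forwards only events labelled {proc_lib, crash}. It then lets a proc_lib process exit with a given reason and reports whether a crash report appeared.

```erlang
-module(exit_reason_demo).
-export([crash_reported/1, log/2]).

%% logger handler callback: forward proc_lib crash reports to the pid
%% stored under the handler's `config` key, ignore everything else.
log(#{msg := {report, #{label := {proc_lib, crash}}}},
    #{config := #{pid := Pid}}) ->
    Pid ! crash_report,
    ok;
log(_Event, _Config) ->
    ok.

%% Start a proc_lib process that exits with Reason; return true if a
%% proc_lib crash report was logged for it.
crash_reported(Reason) ->
    ok = logger:add_handler(?MODULE, ?MODULE, #{config => #{pid => self()}}),
    try
        %% The child waits for 'go' so the monitor is set up before it exits.
        Pid = proc_lib:spawn(fun() -> receive go -> exit(Reason) end end),
        Ref = erlang:monitor(process, Pid),
        Pid ! go,
        %% The handler runs in the dying process before it exits, so any
        %% crash_report message is already queued when 'DOWN' arrives.
        receive
            {'DOWN', Ref, process, Pid, Reason} -> ok
        after 5000 ->
            error(timeout)
        end,
        receive crash_report -> true after 0 -> false end
    after
        logger:remove_handler(?MODULE)
    end.
```

normal, shutdown and {shutdown, Term} should all give false. Any other reason, such as boom, should give true (and the default handler prints the report as well).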

  

 Examples of process exits from http://learnyousomeerlang.com/errors-and-processes :

Exception source: spawn_link(fun() -> ok end)
Untrapped Result: - nothing -
Trapped Result: {'EXIT', <0.61.0>, normal}
The process exited normally, without a problem. Note that this looks a bit like the result of catch exit(normal), except a PID is added to the tuple to know which process failed.
The process exits right after it is created. Without trapping exits nothing shows up; with trapping, we receive a message saying the process exited normally.

Exception source: spawn_link(fun() -> exit(reason) end)
Untrapped Result: ** exception exit: reason
Trapped Result: {'EXIT', <0.55.0>, reason}
The process has terminated for a custom reason. In this case, if there is no trapped exit, the process crashes. Otherwise, you get the above message.
The process exits with a specific reason. When trapping exits, the message also carries the PID of the process that exited.

Exception source: spawn_link(fun() -> exit(normal) end)
Untrapped Result: - nothing -
Trapped Result: {'EXIT', <0.58.0>, normal}
This successfully emulates a process terminating normally. In some cases, you might want to kill a process as part of the normal flow of a program, without anything exceptional going on. This is the way to do it.
Calling exit does not necessarily mean an abnormal exit: exit(normal) is a normal exit, as this case shows.

Exception source: spawn_link(fun() -> 1/0 end)
Untrapped Result: Error in process <0.44.0> with exit value: {badarith, [{erlang, '/', [1,0]}]}
Trapped Result: {'EXIT', <0.52.0>, {badarith, [{erlang, '/', [1,0]}]}}
The error ({badarith, Reason}) is never caught by a try ... catch block and bubbles up into an 'EXIT'. At this point, it behaves exactly the same as exit(reason) did, but with a stack trace giving more details about what happened.
The process hits a runtime error. Trapping makes little difference apart from how the result is formatted.

Exception source: spawn_link(fun() -> erlang:error(reason) end)
Untrapped Result: Error in process <0.47.0> with exit value: {reason, [{erlang, apply, 2}]}
Trapped Result: {'EXIT', <0.74.0>, {reason, [{erlang, apply, 2}]}}
Pretty much the same as with 1/0. That's normal, erlang:error/1 is meant to allow you to do just that.
Remember the difference between erlang:error, exit and throw?

Exception source: spawn_link(fun() -> throw(rocks) end)
Untrapped Result: Error in process <0.51.0> with exit value: {{nocatch, rocks}, [{erlang, apply, 2}]}
Trapped Result: {'EXIT', <0.79.0>, {{nocatch, rocks}, [{erlang, apply, 2}]}}
Because the throw is never caught by a try ... catch, it bubbles up into an error, which in turn bubbles up into an EXIT. Without trapping exit, the process fails. Otherwise it deals with it fine.
The value is thrown, but nothing catches it.

And that's about it for usual exceptions. Things are normal: everything goes fine. Exceptional stuff happens: processes die, different signals are sent around.

Then there's exit/2. This one is the Erlang process equivalent of a gun. It allows a process to kill another one from a distance, safely. Here are some of the possible calls:

Exception source: exit(self(), normal)
Untrapped Result: ** exception exit: normal
Trapped Result: {'EXIT', <0.31.0>, normal}
When not trapping exits, exit(self(), normal) acts the same as exit(normal). Otherwise, you receive a message with the same format you would have had by listening to links from foreign processes dying.

Exception source: exit(spawn_link(fun() -> timer:sleep(50000) end), normal)
Untrapped Result: - nothing -
Trapped Result: - nothing -
This basically is a call to exit(Pid, normal). This command doesn't do anything useful, because a process cannot be remotely killed with the reason normal as an argument.

Exception source: exit(spawn_link(fun() -> timer:sleep(50000) end), reason)
Untrapped Result: ** exception exit: reason
Trapped Result: {'EXIT', <0.52.0>, reason}
This is the foreign process terminating for reason itself. Looks the same as if the foreign process called exit(reason) on itself.

Exception source: exit(spawn_link(fun() -> timer:sleep(50000) end), kill)
Untrapped Result: ** exception exit: killed
Trapped Result: {'EXIT', <0.58.0>, killed}
Surprisingly, the message gets changed from the dying process to the spawner. The spawner now receives killed instead of kill. That's because kill is a special exit signal. More details on this later.

Exception source: exit(self(), kill)
Untrapped Result: ** exception exit: killed
Trapped Result: ** exception exit: killed
Oops, look at that. It seems like this one is actually impossible to trap. Let's check something.

Exception source: spawn_link(fun() -> exit(kill) end)
Untrapped Result: ** exception exit: killed
Trapped Result: {'EXIT', <0.67.0>, kill}
Now that's getting confusing. When another process kills itself with exit(kill) and we don't trap exits, our own process dies with the reason killed. However, when we trap exits, things don't happen that way.

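    What if the process you want to stop traps exits, so every exit signal just becomes a message, and it is stuck in a loop that never reads those messages? kill is designed for exactly this scenario. It is a special signal that cannot be trapped, which guarantees that the target process really dies. kill is the last resort for getting rid of a process.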

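    Because kill cannot be trapped by design, a process killed this way terminates with reason killed, and killed (not kill) is the reason its linked processes receive.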
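Here is a small sketch of the kill semantics described above. The module kill_demo is hypothetical. The victim traps exits, so an ordinary exit(Pid, please_stop) signal only turns into an {'EXIT', From, please_stop} message and the victim survives. exit(Pid, kill) bypasses trap_exit, and a monitor then reports the reason killed.

```erlang
-module(kill_demo).
-export([run/0]).

%% Victim body: trap exits and report every 'EXIT' message to Owner.
loop(Owner) ->
    receive
        {'EXIT', From, Reason} ->
            Owner ! {trapped, From, Reason},
            loop(Owner)
    end.

%% Returns {TrappedReason, DownReason}.
run() ->
    Owner = self(),
    Victim = spawn(fun() ->
                       process_flag(trap_exit, true),
                       Owner ! {ready, self()},
                       loop(Owner)
                   end),
    receive {ready, Victim} -> ok end,
    Ref = erlang:monitor(process, Victim),
    %% An ordinary exit signal is converted into a message by trap_exit.
    exit(Victim, please_stop),
    Trapped = receive {trapped, Owner, R1} -> R1 after 1000 -> none end,
    %% 'kill' cannot be trapped: the victim dies with reason 'killed'.
    exit(Victim, kill),
    Down = receive {'DOWN', Ref, process, Victim, R2} -> R2 after 1000 -> none end,
    {Trapped, Down}.
```

run/0 should return {please_stop, killed}: the first signal was trapped, and the second one could not be.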

    

2014-8-26 8:49:22 Update: here is an additional test session:

19> self().
<0.81.0>
20> [ spawn_link(fun()-> receive die->exit(order_to_die)  end end)  || P<-lists:seq(1,10)].
[<0.84.0>,<0.85.0>,<0.86.0>,<0.87.0>,<0.88.0>,<0.89.0>,
 <0.90.0>,<0.91.0>,<0.92.0>,<0.93.0>]
21> process_info(self()).
[{current_function,{erl_eval,do_apply,6}},
 {initial_call,{erlang,apply,2}},
 {status,running},
 {message_queue_len,0},
 {messages,[]},
 {links,[<0.86.0>,<0.90.0>,<0.92.0>,<0.93.0>,<0.91.0>,
         <0.88.0>,<0.89.0>,<0.87.0>,<0.84.0>,<0.85.0>,<0.30.0>]},
 {dictionary,[]},
 {trap_exit,false},
 {error_handler,error_handler},
 {priority,normal},
 {group_leader,<0.26.0>},
 {total_heap_size,3573},
 {heap_size,2586},
 {stack_size,24},
 {reductions,9040},
 {garbage_collection,[{min_bin_vheap_size,46422},
                      {min_heap_size,233},
                      {fullsweep_after,65535},
                      {minor_gcs,23}]},
 {suspending,[]}]
22> exit(pid(0,92,0),normal).
true
23> process_info(pid(0,92,0)).
[{current_function,{prim_eval,'receive',2}},
 {initial_call,{erlang,apply,2}},
 {status,waiting},
 {message_queue_len,0},
 {messages,[]},
 {links,[<0.81.0>]},
 {dictionary,[]},
 {trap_exit,false},
 {error_handler,error_handler},
 {priority,normal},
 {group_leader,<0.26.0>},
 {total_heap_size,233},
 {heap_size,233},
 {stack_size,9},
 {reductions,17},
 {garbage_collection,[{min_bin_vheap_size,46422},
                      {min_heap_size,233},
                      {fullsweep_after,65535},
                      {minor_gcs,0}]},
 {suspending,[]}]
24> pid(0,92,0)!die.
** exception exit: order_to_die
25> self().
<0.98.0>
Eshell V6.0  (abort with ^G)
1> self().
<0.33.0>
2> process_flag(trap_exit,true).
false
3> process_info(self()).
[{current_function,{erl_eval,do_apply,6}},
 {initial_call,{erlang,apply,2}},
 {status,running},
 {message_queue_len,0},
 {messages,[]},
 {links,[<0.27.0>]},
 {dictionary,[]},
 {trap_exit,true},
 {error_handler,error_handler},
 {priority,normal},
 {group_leader,<0.26.0>},
 {total_heap_size,987},
 {heap_size,987},
 {stack_size,24},
 {reductions,1557},
 {garbage_collection,[{min_bin_vheap_size,46422},
                      {min_heap_size,233},
                      {fullsweep_after,65535},
                      {minor_gcs,0}]},
 {suspending,[]}]
4> [ spawn_link(fun()-> receive die->exit(order_to_die)  end end)  || P<-lists:seq(1,10)].
[<0.38.0>,<0.39.0>,<0.40.0>,<0.41.0>,<0.42.0>,<0.43.0>,
 <0.44.0>,<0.45.0>,<0.46.0>,<0.47.0>]
5> exit(pid(0,41,0),over).
true
6> self().
<0.33.0>
7> flush().
Shell got {'EXIT',<0.41.0>,over}
ok
8> is_process_alive(pid(0,44,0)).
true
9> process_info(pid(0,44,0)).
[{current_function,{prim_eval,'receive',2}},
 {initial_call,{erlang,apply,2}},
 {status,waiting},
 {message_queue_len,0},
 {messages,[]},
 {links,[<0.33.0>]},
 {dictionary,[]},
 {trap_exit,false},
 {error_handler,error_handler},
 {priority,normal},
 {group_leader,<0.26.0>},
 {total_heap_size,233},
 {heap_size,233},
 {stack_size,9},
 {reductions,17},
 {garbage_collection,[{min_bin_vheap_size,46422},
                      {min_heap_size,233},
                      {fullsweep_after,65535},
                      {minor_gcs,0}]},
 {suspending,[]}]
10> 
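The two shell sessions above can be condensed into a module. link_demo is a hypothetical name. untrapped/0 mirrors the first session: a non-trapping process linked to a worker dies with the worker's reason order_to_die. run/0 mirrors the second session. A process that traps exits first sends exit(W1, normal), which the non-trapping worker ignores. It then sends exit(W2, over), which kills W2 and arrives back as {'EXIT', W2, over}, while the other workers keep running. Both experiments run in fresh processes so the caller is unaffected.

```erlang
-module(link_demo).
-export([untrapped/0, run/0]).

%% Worker from the transcript: exits with order_to_die when told to.
worker() ->
    receive die -> exit(order_to_die) end.

%% First session: a non-trapping process dies together with its worker.
untrapped() ->
    {Pid, Ref} = spawn_monitor(fun() ->
                                   W = spawn_link(fun worker/0),
                                   W ! die,
                                   receive after infinity -> ok end
                               end),
    receive {'DOWN', Ref, process, Pid, Reason} -> Reason
    after 5000 -> timeout
    end.

%% Second session: run observe/0 in a fresh process and return its result.
run() ->
    Caller = self(),
    Pid = spawn(fun() -> Caller ! {self(), observe()} end),
    receive {Pid, Result} -> Result after 5000 -> timeout end.

observe() ->
    process_flag(trap_exit, true),
    [W1, W2 | Rest] = [spawn_link(fun worker/0) || _ <- lists:seq(1, 10)],
    exit(W1, normal),                 % ignored by a non-trapping worker
    W1Alive = is_process_alive(W1),
    exit(W2, over),                   % kills W2; we get an 'EXIT' message
    Trapped = receive {'EXIT', W2, R} -> R after 1000 -> timeout end,
    RestAlive = lists:all(fun erlang:is_process_alive/1, Rest),
    %% Clean up: our own normal exit would not stop the linked workers.
    [exit(W, kill) || W <- [W1 | Rest]],
    {W1Alive, Trapped, RestAlive}.
```

untrapped/0 should return order_to_die, and run/0 should return {true, over, true}.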

  

  

 

 

Starting processes with proc_lib: start/start_link

The gen_server documentation describes start like this:

 The gen_server process calls Module:init/1 to initialize. To ensure a synchronized start-up procedure, start_link/3,4 does not return until Module:init/1 has returned.

gen_server's start/start_link is synchronous. Underneath, proc_lib creates the process and waits until it has finished starting. Here is a typical piece of proc_lib code:

start(M, F, A, Timeout) when is_atom(M), is_atom(F), is_list(A) ->
    Pid = ?MODULE:spawn(M, F, A),
    sync_wait(Pid, Timeout).

After spawning the process, start/4 calls sync_wait to wait synchronously. Its implementation is easy to guess:

sync_wait(Pid, Timeout) ->
    receive
        {ack, Pid, Return} ->
            Return;
        {'EXIT', Pid, Reason} ->
            %% Only seen with start_link when the caller traps exits and
            %% the new process died before calling init_ack.
            {error, Reason}
    after Timeout ->
            %% No init_ack within Timeout ms: kill the new process and
            %% return {error,timeout}.
            unlink(Pid),
            exit(Pid, kill),
            flush(Pid),
            {error, timeout}
    end.
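The timeout branch of sync_wait is easy to trigger. In this sketch the hypothetical slow_init/0 never calls init_ack. proc_lib:start/4 therefore gives up after 100 ms, kills the child and returns {error, timeout}:

```erlang
-module(start_timeout_demo).
-export([run/0, slow_init/0]).

%% Hypothetical init function that never acknowledges its start.
slow_init() ->
    receive after infinity -> ok end.

%% proc_lib:start/4 waits 100 ms for init_ack, then kills the child.
run() ->
    proc_lib:start(?MODULE, slow_init, [], 100).
```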

Once the new process has finished starting, it has to send a reply to end this wait, and proc_lib provides a ready-made function for that: init_ack

init_ack(Parent, Return) ->
    Parent ! {ack, self(), Return},
    ok.

-spec init_ack(term()) -> 'ok'.
init_ack(Return) ->
    [Parent|_] = get('$ancestors'),
    init_ack(Parent, Return).
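Putting start_link and init_ack together gives the synchronous start that gen_server relies on. This is a minimal hand-rolled sketch, not how gen_server is actually written. init_ack_demo, init/1, ping/1 and stop/1 are hypothetical names. start_link/0 returns only after init/1 has called proc_lib:init_ack/2, so the caller never sees a half-initialized process:

```erlang
-module(init_ack_demo).
-export([start_link/0, init/1, ping/1, stop/1]).

%% Returns only after init/1 has called proc_lib:init_ack/2.
start_link() ->
    proc_lib:start_link(?MODULE, init, [self()]).

init(Parent) ->
    %% Real initialization (opening files, sockets, ...) would go here.
    proc_lib:init_ack(Parent, {ok, self()}),
    loop().

loop() ->
    receive
        {ping, From} ->
            From ! {pong, self()},
            loop();
        stop ->
            ok
    end.

ping(Pid) ->
    Pid ! {ping, self()},
    receive {pong, Pid} -> pong after 1000 -> timeout end.

stop(Pid) ->
    Pid ! stop,
    ok.
```

init_ack/1 would work here as well. It looks up the parent as the head of '$ancestors' instead of taking it as an argument.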

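2012-3-31 12:26:35 Update

Here is an example from hotwheels, tolbrino-hotwheels-8dca95a\src\janus_acceptor.erl. Both the success and the failure branch call init_ack, so the caller of start_link is never left waiting: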

acceptor_init(Parent, Port, Module) ->
    State = #state{
               parent = Parent,
               port = Port,
               module = Module
              },
    error_logger:info_msg("Listening on port ~p~n", [Port]),
    case (catch do_init(State)) of
        {ok, ListenSocket} ->
            proc_lib:init_ack(State#state.parent, {ok, self()}),
            acceptor_loop(State#state{listener = ListenSocket});
        Error ->
            proc_lib:init_ack(Parent, Error),
            error
    end.
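Below is a self-contained variant of the acceptor above. It assumes a plain gen_tcp listen socket on 127.0.0.1 in place of the original do_init/1 and #state{} record, and acceptor_demo and its functions are hypothetical. Both outcomes of the listen call are passed back through init_ack. start_link/1 therefore returns either {ok, Pid} or the listen error, for example {error, eaddrinuse} when the port is already taken:

```erlang
-module(acceptor_demo).
-export([start_link/1, acceptor_init/2, port/1]).

start_link(Port) ->
    proc_lib:start_link(?MODULE, acceptor_init, [self(), Port]).

acceptor_init(Parent, Port) ->
    Opts = [binary, {active, false}, {ip, {127, 0, 0, 1}}],
    case (catch gen_tcp:listen(Port, Opts)) of
        {ok, ListenSocket} ->
            proc_lib:init_ack(Parent, {ok, self()}),
            loop(ListenSocket);
        Error ->
            %% start_link/1 hands this value back to the caller
            proc_lib:init_ack(Parent, Error),
            error
    end.

%% Stand-in for acceptor_loop/1: only answers port queries and stop.
loop(ListenSocket) ->
    receive
        {port, From} ->
            From ! {port, self(), inet:port(ListenSocket)},
            loop(ListenSocket);
        stop ->
            gen_tcp:close(ListenSocket)
    end.

%% Ask a running acceptor which port it actually bound (useful with Port 0).
port(Pid) ->
    Pid ! {port, self()},
    receive {port, Pid, Result} -> Result after 1000 -> {error, timeout} end.
```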

Inspecting a process's initial call and formatting crash reports

proc_lib provides two functions for looking up the function a process was started with:

initial_call(Process) -> {Module,Function,Args} | false

translate_initial_call(Process) -> {Module, Function, Arity}
Running proc_lib:initial_call(whereis(rex)). to look at the init function of the rpc server gives {rpc,init,['Argument__1']}.

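To save memory, the actual argument values are not stored; the atom 'Argument__1' stands in for the argument. If the initial function is a fun, the result only identifies the fun and its arity rather than storing the fun itself. Storing it would both get in the way of code upgrades and waste memory. See the session below: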

(demo@192.168.1.123)57> Fun =fun() -> receive X -> X after infinity -> ok end end.
#Fun<erl_eval.20.67289768>
(demo@192.168.1.123)58> P =spawn(Fun).
<0.11746.25>
(demo@192.168.1.123)59> proc_lib:initial_call(P).
false
(demo@192.168.1.123)60> P2 =proc_lib:spawn(Fun).
<0.11749.25>
(demo@192.168.1.123)61> proc_lib:initial_call(P2).
{erl_eval,'-expr/5-fun-1-',[]}

(demo@192.168.1.123)63> proc_lib:translate_initial_call(P).
{proc_lib,init_p,5}
(demo@192.168.1.123)64> proc_lib:translate_initial_call(P2).
{erl_eval,'-expr/5-fun-1-',0}
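Here is the same comparison as a module (initial_call_demo is hypothetical). It uses a named MFA instead of a shell fun. proc_lib:initial_call/1 then shows the atom 'Argument__1' in place of the real argument, and for the plain process translate_initial_call/1 falls back to {proc_lib,init_p,5}, matching the transcript above. Each child signals that it is running before being queried, so '$initial_call' is already in its dictionary.

```erlang
-module(initial_call_demo).
-export([run/0, wait/1]).

%% Child body: tell the parent we are running, then wait to be stopped.
wait(Parent) ->
    Parent ! {running, self()},
    receive stop -> ok end.

run() ->
    Plain = spawn(?MODULE, wait, [self()]),
    Lib = proc_lib:spawn(?MODULE, wait, [self()]),
    %% Make sure both children are running before we inspect them.
    [receive {running, P} -> ok end || P <- [Plain, Lib]],
    Result = {proc_lib:initial_call(Plain),
              proc_lib:translate_initial_call(Plain),
              proc_lib:initial_call(Lib),
              proc_lib:translate_initial_call(Lib)},
    [P ! stop || P <- [Plain, Lib]],
    Result.
```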

proc_lib also provides a format function for formatting crash reports; feel free to experiment with it.

Process hibernation

I will discuss process hibernation when I cover gen_fsm, so I'll skip it here.

 

Another late night tomorrow, so I'm turning in early today. Good night, everyone!

 

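P.S. Yesterday on Weibo, @淘宝褚霸 told me: "C and systems skills are what matter most; once you have those down, Erlang comes naturally." I'm writing it down here to keep it in mind.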