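An inline function suggests that the compiler insert the specified function body in place of every call to that function, saving the time cost of each call. When choosing to inline, you must weigh program size against execution speed, because inlining too many relatively complex functions incurs a large storage cost. Also note that inline expansion of recursive functions can send some compilers into endless compilation. Inline expansion is a special technique for eliminating the inherent time overhead of calling a function. It is generally used for functions that execute quickly, because in that case the call overhead is more prominent.
                                        -- Summary of the Wikipedia article on inline functions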

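From the Wikipedia description, the problem inlining solves is this: when the cost of calling a function is comparable to the cost of executing it, trade space for time to gain execution speed. In terms of implementation, inlining copies code to save the overhead of pushing and popping stack frames. The Erlang compiler can inline functions within an Erlang module; inlining here means replacing a function call with the function body and substituting the actual values for the parameters.
Inlining is not enabled by default in Erlang; you must explicitly pass the compile option (of the form {inline,[{Name,Arity},...]}) or use -compile in the source code.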

%%Example of explicit inlining:
-compile({inline,[pi/0]}).
pi() -> 3.1416.

%% Example of implicit inlining:
-compile(inline).

%% Aggressive inlining - will increase code size.
-compile(inline).
-compile({inline_size,100}).
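The same options can also be passed from outside the source file instead of through -compile attributes. Below is a minimal sketch that assumes a hypothetical helper module named inline_opts_demo; only compile:file/2 and the inline, inline_size and report options come from OTP itself.

```erlang
-module(inline_opts_demo).
-export([compile_with_inline/1]).

%% Hypothetical helper: compile File with implicit inlining enabled and
%% a larger size limit, printing any errors and warnings (report).
%% Returns whatever compile:file/2 returns, e.g. {ok, Module} on success.
compile_with_inline(File) ->
    compile:file(File, [inline, {inline_size, 100}, report]).
```

From the shell this would be called as inline_opts_demo:compile_with_inline("test.erl"). Keeping the options outside the source makes it easy to compare builds with and without inlining without editing the module.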
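Even if a function is compiled inline, the original function is still kept (here it is exported through export_all), so we can call it directly from the Erlang shell.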
-module(test).
-compile(export_all).
-compile({inline,[server_id/0]}).

server_id() ->
    2396.

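Inlining does not necessarily improve runtime efficiency. For example, inlining may increase Beam's stack consumption, which clearly hurts performance for recursive function calls. {inline_size,Size} controls how aggressively functions may be inlined. The default Size is 24, at which the inlined code is about the same size as the code without inlining; only fairly small functions get inlined.

So what exactly does this Size measure? Lines of code? The number of functions? Something else? I asked this question on the erlangqa Q&A site and got an answer from litaocheng: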

http://www.erlangqa.com/?qa=100/inline_size-size-%E4%B8%AD%E7%9A%84size%E6%98%AF%E6%8C%87%E4%BB%80%E4%B9%88

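See the weight/1 function in otp_src/compiler/src/cerl_inline.erl.
Each kind of Erlang expression has a different weight. inline_size refers to the weight value of the function after it is compiled to assembly.
You can get the assembler file your.S with: erlc +\'S\' your.erl

Also note, from cerl_inline.erl: with an inline_size of 30, about 90% of the maximum speedup is already reached; with an inline_size of 100-150, about 98% is reached. Larger values make the code bigger, and performance suffers instead.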

Following that pointer, I found the code of weight/1 in cerl_inline.erl:
default_effort() -> 150.
default_size() -> 24.
default_unroll() -> 1.

%% Base costs/weights for different kinds of expressions. If these are
%% modified, the size limits above may have to be adjusted.

weight(var) -> 0; % We count no cost for variable accesses.
weight(values) -> 0; % Value aggregates have no cost in themselves.
weight(literal) -> 1; % We assume efficient handling of constants.
weight(data) -> 1; % Base cost; add 1 per element.
weight(element) -> 1; % Cost of storing/fetching an element.
weight(argument) -> 1; % Cost of passing a function argument.
weight('fun') -> 6; % Base cost + average number of free vars.
weight('let') -> 0; % Count no cost for let-bindings.
weight(letrec) -> 0; % Like a let-binding.
weight('case') -> 0; % Case switches have no base cost.
weight(clause) -> 1; % Count one jump at the end of each clause body.
weight('receive') -> 9; % Initialization/cleanup cost.
weight('try') -> 1; % Assume efficient implementation.
weight('catch') -> 1; % See `try'.
weight(apply) -> 3; % Average base cost: call/return.
weight(call) -> 3; % Assume remote-calls as efficient as `apply'.
weight(primop) -> 2; % Assume more efficient than `apply'.
weight(binary) -> 4; % Initialisation base cost.
weight(bitstr) -> 3; % Coding/decoding a value; like a primop.
weight(module) -> 1. % Like a letrec with a constant body
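In other words, different expressions carry different weights, and Size is neither the number of lines of code nor the number of functions; it depends on these weights. I won't dig any further here, since the comments in cerl_inline.erl give detailed information: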
Normal execution times for inlining are between 0.1 and 0.3 seconds (on the author's current equipment). The default effort limit of 150 is high enough that most normal programs never hit the limit even once, and for difficult programs, it generally keeps the execution times below 2-5 seconds. Using an effort counter of 1000 will thus have no further effect on most programs, but some programs may take as much as 10 seconds or more. Effort counts larger than 2500 have never been observed even on very ill-conditioned programs.

Size limits between 6 and 18 tend to actually shrink the code, because of the simplifications made possible by inlining. A limit of 16 seems to be optimal for this purpose, often shrinking the executable code by up to 10%. Size limits between 18 and 30 generally give the same code size as if no inlining was done (i.e., code duplication balances out the simplifications at these levels). A size limit between 1 and 5 tends to inline small functions and propagate constants, but does not cause much simplifications do be done, so the net effect will be a slight increase in code size. For size limits above 30, the executable code size tends to increase with about 10% per 100 units, with some variations depending on the sizes of functions in the source code.

Typically, about 90% of the maximum speedup achievable is already reached using a size limit of 30, and 98% is reached at limits around 100-150; there is rarely any point in letting the code size increase by more than 10-15%. If too large functions are inlined, cache effects will slow the program down.
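To make the weight table a little more concrete, here is a toy estimate written for this post. It is only an illustration: the module name weight_sketch and the function estimate/1 are hypothetical, only a few base weights are copied from weight/1, and it simply sums weights over a flat list of expression kinds. The real inliner works on the Core Erlang tree and adds per-element and per-argument costs, so do not expect these numbers to match it exactly.

```erlang
-module(weight_sketch).
-export([estimate/1]).

%% A subset of the base weights listed in cerl_inline.erl's weight/1.
weight(var)     -> 0;
weight(literal) -> 1;
weight(clause)  -> 1;
weight(primop)  -> 2;
weight(apply)   -> 3;
weight(call)    -> 3.

%% Hypothetical helper: add up the base weights of a flat list of
%% expression kinds. This is a rough estimate, not the real algorithm.
estimate(Kinds) ->
    lists:sum([weight(K) || K <- Kinds]).
```

For a body that is a single constant, such as server_id/0 earlier, weight_sketch:estimate([literal]) gives 1, far below the default size limit of 24. A body made of several calls adds 3 per call before any argument costs, so it approaches the limit much faster.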
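If you are interested, you can look up the original paper: "Fast and Effective Procedure Inlining", International Static Analysis Symposium 1997  http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.2438 
I won't follow the algorithm itself, but an experiment is still in order. Following litaocheng's suggestion, I made a demo: a simple function that is called several times from another function (code below). I am working on Windows, and ran c(test,['S']). in the shell to get the assembler code file.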
-module(test).
-export([get_name/0, show/1]).

get_name() ->
    "This is Test Module".

show(A) ->
    A=get_name(),
    A=get_name(),
    A=get_name(),
    A=get_name(),
    A=get_name(),
    A=get_name(),
    io:format("This is ~p ~n",[A]).
The generated assembler code file looks like this:
{module, test}.  %% version = 0

{exports, [{get_name,0},{module_info,0},{module_info,1},{show,1}]}.

{attributes, []}.

{labels, 15}.


{function, get_name, 0, 2}.
{label,1}.
{line,[{location,"test.erl",22}]}.
{func_info,{atom,test},{atom,get_name},0}.
{label,2}.
{move,{literal,"This is Test Module"},{x,0}}.
return.


{function, show, 1, 4}.
{label,3}.
{line,[{location,"test.erl",25}]}.
{func_info,{atom,test},{atom,show},1}.
{label,4}.
{allocate,1,1}.
{move,{x,0},{y,0}}.
{line,[{location,"test.erl",26}]}.
{call,0,{f,2}}.
{test,is_eq_exact,{f,5},[{x,0},{y,0}]}.
{line,[{location,"test.erl",27}]}.
{call,0,{f,2}}.
{test,is_eq_exact,{f,6},[{x,0},{y,0}]}.
{line,[{location,"test.erl",28}]}.
{call,0,{f,2}}.
{test,is_eq_exact,{f,7},[{x,0},{y,0}]}.
{line,[{location,"test.erl",29}]}.
{call,0,{f,2}}.
{test,is_eq_exact,{f,8},[{x,0},{y,0}]}.
{line,[{location,"test.erl",30}]}.
{call,0,{f,2}}.
{test,is_eq_exact,{f,9},[{x,0},{y,0}]}.
{line,[{location,"test.erl",31}]}.
{call,0,{f,2}}.
{test,is_eq_exact,{f,10},[{x,0},{y,0}]}.
{test_heap,2,0}.
{put_list,{y,0},nil,{x,1}}.
{move,{literal,"This is ~p ~n"},{x,0}}.
{line,[{location,"test.erl",32}]}.
{call_ext_last,2,{extfunc,io,format,2},1}.
{label,5}.
{line,[{location,"test.erl",26}]}.
{badmatch,{x,0}}.
{label,6}.
{line,[{location,"test.erl",27}]}.
{badmatch,{x,0}}.
{label,7}.
{line,[{location,"test.erl",28}]}.
{badmatch,{x,0}}.
{label,8}.
{line,[{location,"test.erl",29}]}.
{badmatch,{x,0}}.
{label,9}.
{line,[{location,"test.erl",30}]}.
{badmatch,{x,0}}.
{label,10}.
{line,[{location,"test.erl",31}]}.
{badmatch,{x,0}}.


{function, module_info, 0, 12}.
{label,11}.
{line,[]}.
{func_info,{atom,test},{atom,module_info},0}.
{label,12}.
{move,{atom,test},{x,0}}.
{line,[]}.
{call_ext_only,1,{extfunc,erlang,get_module_info,1}}.


{function, module_info, 1, 14}.
{label,13}.
{line,[]}.
{func_info,{atom,test},{atom,module_info},1}.
{label,14}.
{move,{x,0},{x,1}}.
{move,{atom,test},{x,0}}.
{line,[]}.
{call_ext_only,2,{extfunc,erlang,get_module_info,2}}.
Here is the result after adding the inline option. The demo is deliberately contrived, so the effect on the generated code is easy to see:
{module, test}.  %% version = 0

{exports, [{get_name,0},{module_info,0},{module_info,1},{show,1}]}.

{attributes, []}.

{labels, 10}.


{function, get_name, 0, 2}.
{label,1}.
{line,[{location,"test.erl",23}]}.
{func_info,{atom,test},{atom,get_name},0}.
{label,2}.
{move,{literal,"This is Test Module"},{x,0}}.
return.


{function, show, 1, 4}.
{label,3}.
{line,[{location,"test.erl",26}]}.
{func_info,{atom,test},{atom,show},1}.
{label,4}.
{test,is_eq_exact,{f,5},[{literal,"This is Test Module"},{x,0}]}.
{test_heap,2,1}.
{put_list,{x,0},nil,{x,1}}.
{move,{literal,"This is ~p ~n"},{x,0}}.
{line,[{location,"test.erl",33}]}.
{call_ext_only,2,{extfunc,io,format,2}}.
{label,5}.
{line,[{location,"test.erl",27}]}.
{badmatch,{literal,"This is Test Module"}}.


{function, module_info, 0, 7}.
{label,6}.
{line,[]}.
{func_info,{atom,test},{atom,module_info},0}.
{label,7}.
{move,{atom,test},{x,0}}.
{line,[]}.
{call_ext_only,1,{extfunc,erlang,get_module_info,1}}.


{function, module_info, 1, 9}.
{label,8}.
{line,[]}.
{func_info,{atom,test},{atom,module_info},1}.
{label,9}.
{move,{x,0},{x,1}}.
{move,{atom,test},{x,0}}.
{line,[]}.
{call_ext_only,2,{extfunc,erlang,get_module_info,2}}.
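The two listings above can be produced side by side with a small helper. This is a sketch under assumptions: the module name listing_demo and the output directory names "plain" and "inlined" are made up for illustration; compile:file/2 and the 'S', outdir, inline and report options are standard OTP.

```erlang
-module(listing_demo).
-export([both/1]).

%% Hypothetical helper: write the assembler listing of File twice,
%% once without and once with inlining, into separate directories.
both(File) ->
    [listing(File, Dir, Opts)
     || {Dir, Opts} <- [{"plain", []}, {"inlined", [inline]}]].

%% With the 'S' option the compiler writes <Module>.S and produces
%% no .beam file.
listing(File, Dir, Opts) ->
    %% ensure_dir/1 creates the parent directories of the given path,
    %% so a dummy file name is appended to make Dir itself exist.
    ok = filelib:ensure_dir(filename:join(Dir, "dummy")),
    compile:file(File, ['S', {outdir, Dir}, report | Opts]).
```

After listing_demo:both("test.erl"), comparing plain/test.S with inlined/test.S should show the same kind of difference as above: the repeated call instructions in show/1 collapse once get_name/0 is inlined.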
 
Erlang Assembly Code is officially "not documented"; here are two related articles: