An Analysis of the Process Model in Linux 2.6.32

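1. Introduction

This article analyzes the process model in depth, based on the Linux Kernel 2.6.32 source code. It covers the following:

How the operating system organizes processes
How process states change (with a process state transition diagram)
How processes are scheduled
Some personal views on this operating system's process model

(Linux Kernel 2.6.32 source code: https://elixir.bootlin.com/linux/v2.6.32/source/fs)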

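2. Processes

2.1 What is a process

A process is one execution of a program over some set of data. It is the basic unit of resource allocation and scheduling in the system and the foundation of the operating system's structure. In early, process-oriented computer designs, the process was the basic execution entity of a program; in modern, thread-oriented designs, the process is a container for threads. A program is a description of instructions, data, and how they are organized; a process is a running instance of that program.

Put simply, all runnable software on a computer, usually including the operating system itself, is organized into a number of sequential processes, or processes for short.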

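2.2 How to view processes

A process is an instance of a program in execution. On Windows, opening Task Manager shows the programs that are currently running.

On Linux, the ps command shows basic information about processes: plain ps lists the processes attached to the current terminal, and ps aux lists every process on the system.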
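On Linux, ps gathers its information from the /proc filesystem, where each process has a /proc/<pid>/stat file whose third field is a one-letter state code. The following is a minimal sketch of extracting that field from one such line. The helper name parse_proc_state is hypothetical (it is not a kernel or procps function), and the sample lines in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Extract the one-letter state code from a line in /proc/<pid>/stat format:
 *     pid (comm) state ppid ...
 * The command name may itself contain spaces or parentheses, so we search
 * for the LAST ')' instead of splitting on spaces.
 * Returns the state character, or '\0' if the line is malformed.
 * parse_proc_state is a hypothetical helper used only in this example.
 */
char parse_proc_state(const char *stat_line)
{
    assert(stat_line != NULL);

    const char *close = strrchr(stat_line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}
```

For example, parse_proc_state("1234 (bash) S 1 1234 1234 0") returns 'S' (interruptible sleep), and a line whose command name contains parentheses, such as "42 (a (b) c) R 1 42 42 0", still yields 'R'.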

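3. How the operating system organizes processes

3.1 Components of a process

A process generally consists of three parts: the process control block, the program segment, and the data segment.

3.1.1 Process control block:

Linux describes everything about a process with the task_struct structure, which plays the role of the process control block (PCB); it is defined in include/linux/sched.h. When a process is created, the operating system allocates a new PCB for it. The PCB then stays resident in memory, where it can be read or updated at any time, and is freed when the process terminates. The PCB is part of the process entity: the operating system manages and controls processes through their PCBs, and the PCB is the one definitive sign that a process exists.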

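What a PCB usually contains:

Process description	Process control and management	Resource allocation list	Processor-related information
Process identifier (PID)	Current process state	Code segment pointer	General-purpose register values
User identifier (UID)	Process priority	Data segment pointer	Address register values
	Code entry address	Stack segment pointer	Control register values
	Program's address on external storage	File descriptors	Flag register values
	Time of entry into memory	Keyboard	Status word
	Processor time used	Mouse
	Semaphore usage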

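Partial code (this listing appears to be taken from a kernel later than 2.6.32; fields such as on_rq and plug are not part of the 2.6.32 definition):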

struct task_struct {
    /* process state */
    volatile long state;
    /* pointer to the kernel stack */
    void *stack;
    /* reference count: how many users hold this structure */
    atomic_t usage;
    /* per-process flags: status information other than the run state */
    unsigned int flags;
    unsigned int ptrace;

#ifdef CONFIG_SMP
    struct task_struct *wake_entry;
    int on_cpu;   /* nonzero while the task is executing on a CPU */
#endif
    int on_rq;    /* whether the entity is currently on a run queue */
    int prio, static_prio, normal_prio;   /* dynamic, static, and normal priority */
    /*
     * Three fields describe a process's priority. prio and normal_prio are
     * dynamic priorities; static_prio is the static priority.
     * The static priority is assigned when the process starts and can be
     * changed while it runs with the nice and sched_setscheduler system calls.
     * normal_prio is computed from the static priority and the scheduling
     * policy, so the same static priority can yield a different normal
     * priority under a different policy.
     */
    unsigned int rt_priority;               /* real-time priority */
    const struct sched_class *sched_class;  /* scheduling class: the scheduler operations for this task */
    struct sched_entity se;                 /* scheduling entity */
    struct sched_rt_entity rt;              /* real-time scheduling entity */

#ifdef CONFIG_PREEMPT_NOTIFIERS
    /* list of notifiers called when this task is preempted or rescheduled */
    struct hlist_head preempt_notifiers;
#endif

    /* number of consecutive context switches in which the FPU was used */
    unsigned char fpu_counter;
#ifdef CONFIG_BLK_DEV_IO_TRACE
    unsigned int btrace_seq;
#endif

    unsigned int policy;   /* scheduling policy */
    /*
     * Bitmap of the CPUs this task may run on. Cpumasks provide a bitmap
     * suitable for representing the set of CPU's in a system, one bit
     * position per CPU number. In general, only nr_cpu_ids (<= NR_CPUS)
     * bits are valid.
     */
    cpumask_t cpus_allowed;

#ifdef CONFIG_PREEMPT_RCU
    /* RCU (read-copy-update) is a synchronization mechanism; see
     * http://blog.csdn.net/sunnybeike/article/details/6866473 */
    int rcu_read_lock_nesting;
    char rcu_read_unlock_special;
#if defined(CONFIG_RCU_BOOST) && defined(CONFIG_TREE_PREEMPT_RCU)
    int rcu_boosted;
#endif
    struct list_head rcu_node_entry;
#endif
#ifdef CONFIG_TREE_PREEMPT_RCU
    struct rcu_node *rcu_blocked_node;
#endif
#ifdef CONFIG_RCU_BOOST
    struct rt_mutex *rcu_boost_mutex;
#endif
#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
    /* scheduling statistics, e.g. time spent running on a CPU and time
     * spent waiting on a run queue */
    struct sched_info sched_info;
#endif

    struct list_head tasks;   /* links this task into the list of all processes */
#ifdef CONFIG_SMP
    struct plist_node pushable_tasks;
#endif

    struct mm_struct *mm, *active_mm;   /* mm: the process's memory-management information */
    /*
     * About mm and active_mm:
     * Lazy TLB means that when switching tasks, if the next task will not
     * touch user space, there is no need to flush the TLB.
     * A kernel thread runs in kernel space, its mm pointer is NULL, and it
     * never accesses user space. if (unlikely(!mm)) tests whether the task
     * being switched to is a kernel thread. If it is, then because the kernel
     * requires every task to run on some mm_struct, the mm_struct of the
     * outgoing task (oldmm) is borrowed and stored in active_mm
     * (next->active_mm = oldmm;), which makes the kernel thread an anonymous
     * user of that address space. atomic_inc(&oldmm->mm_count) raises the
     * borrowed mm's reference count, and enter_lazy_tlb then puts the CPU
     * into lazy TLB mode (on MP; on UP this function does nothing).
     * if (unlikely(!prev->mm)) tests whether the outgoing task is a kernel
     * thread; if so, the mm_struct it borrowed earlier must be released.
     * Also, if the task being switched to has the same page tables as the
     * outgoing kernel thread, the entries related to those page tables must
     * be flushed.
     * Note that both ifs test the mm pointer, while the address space that is
     * actually switched to is taken from active_mm.
     * On MP, when CPU #1 requests a TLB flush, another CPU that is running a
     * kernel thread is put into lazy TLB mode and does not need to flush.
     * When it leaves lazy TLB mode, it flushes then if the next task needs
     * different page tables. If that CPU is running an ordinary process that
     * shares the address space with #1, it must flush its TLB immediately.
     *
     * In most cases mm and active_mm are the same. They differ when the task
     * is a kernel thread: then mm = NULL and active_mm = oldmm (the previous
     * task's mm).
     */
#ifdef CONFIG_COMPAT_BRK
    unsigned brk_randomized:1;
#endif
#if defined(SPLIT_RSS_COUNTING)
    struct task_rss_stat rss_stat;   /* RSS is the total memory actually held in RAM for a process */
#endif
    int exit_state;               /* state of the task on exit */
    int exit_code, exit_signal;   /* exit code, and the signal sent on exit */
    int pdeath_signal;
    unsigned int group_stop;
    unsigned int personality;
    unsigned did_exec:1;          /* whether the task is still running its original code or new code loaded by execve */
    unsigned in_execve:1;
    unsigned in_iowait:1;
    unsigned sched_reset_on_fork:1;
    unsigned sched_contributes_to_load:1;

    pid_t pid;    /* process ID */
    pid_t tgid;   /* thread group ID */

#ifdef CONFIG_CC_STACKPROTECTOR
    /* canary value for the compiler's stack protector, used to detect
     * stack overflows that would overwrite the return address */
    unsigned long stack_canary;
#endif
    struct task_struct *real_parent;
    struct task_struct *parent;
    struct list_head children;
    struct list_head sibling;
    struct task_struct *group_leader;   /* thread group leader */

    struct list_head ptraced;           /* list of tasks this task is tracing with ptrace */
    struct list_head ptrace_entry;

    struct pid_link pids[PIDTYPE_MAX];
    struct list_head thread_group;      /* links the threads of one thread group */

    struct completion *vfork_done;

    int __user *set_child_tid;      /* user-space address for CLONE_CHILD_SETTID */
    int __user *clear_child_tid;    /* user-space address for CLONE_CHILD_CLEARTID */
    /* utime: time spent in user mode; stime: time spent in kernel mode.
     * The two scaled values are presumably the same times in a different unit. */
    cputime_t utime, stime, utimescaled, stimescaled;
    cputime_t gtime;
#ifndef CONFIG_VIRT_CPU_ACCOUNTING
    cputime_t prev_utime, prev_stime;
#endif
    unsigned long nvcsw, nivcsw;           /* context switch counts */
    struct timespec start_time;            /* monotonic time */
    struct timespec real_start_time;       /* boot-based time */
    unsigned long min_flt, maj_flt;
    struct task_cputime cputime_expires;   /* expiry times for the CPU timers */
    struct list_head cpu_timers[3];

    const struct cred __rcu *real_cred;
    const struct cred __rcu *cred;
    struct cred *replacement_session_keyring;
    char comm[TASK_COMM_LEN];

    int link_count, total_link_count;      /* symbolic-link nesting depth and total count during path lookup */
#ifdef CONFIG_SYSVIPC   /* inter-process communication */
    struct sysv_sem sysvsem;
#endif
#ifdef CONFIG_DETECT_HUNG_TASK   /* hung-task detection */
    unsigned long last_switch_count;
#endif
    /* task_struct is architecture-independent, so thread_struct holds the
     * architecture-specific state */
    struct thread_struct thread;
    struct fs_struct *fs;
    struct files_struct *files;
    struct signal_struct *signal;
    struct sighand_struct *sighand;
    sigset_t blocked, real_blocked;
    sigset_t saved_sigmask;
    struct sigpending pending;   /* signals received but not yet handled */
    unsigned long sas_ss_sp;
    size_t sas_ss_size;

    int (*notifier)(void *priv);
    void *notifier_data;
    sigset_t *notifier_mask;
    struct audit_context *audit_context;
#ifdef CONFIG_AUDITSYSCALL
    uid_t loginuid;
    unsigned int sessionid;
#endif
    seccomp_t seccomp;
    u32 parent_exec_id;
    u32 self_exec_id;
    spinlock_t alloc_lock;
#ifdef CONFIG_GENERIC_HARDIRQS   /* IRQ handler threads */
    struct irqaction *irqaction;
#endif
#ifdef CONFIG_RT_MUTEXES
    struct plist_head pi_waiters;
    struct rt_mutex_waiter *pi_blocked_on;
#endif
#ifdef CONFIG_DEBUG_MUTEXES   /* mutex deadlock detection */
    struct mutex_waiter *blocked_on;
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
    unsigned int irq_events;
    unsigned long hardirq_enable_ip;
    unsigned long hardirq_disable_ip;
    unsigned int hardirq_enable_event;
    unsigned int hardirq_disable_event;
    int hardirqs_enabled;
    int hardirq_context;
    unsigned long softirq_disable_ip;
    unsigned long softirq_enable_ip;
    unsigned int softirq_disable_event;
    unsigned int softirq_enable_event;
    int softirqs_enabled;
    int softirq_context;
#endif
#ifdef CONFIG_LOCKDEP
# define MAX_LOCK_DEPTH 48UL
    u64 curr_chain_key;
    int lockdep_depth;   /* lock nesting depth */
    unsigned int lockdep_recursion;
    struct held_lock held_locks[MAX_LOCK_DEPTH];
    gfp_t lockdep_reclaim_gfp;
#endif
    void *journal_info;          /* filesystem journal information */
    struct bio_list *bio_list;   /* block I/O list */
#ifdef CONFIG_BLOCK
    struct blk_plug *plug;
#endif

    struct reclaim_state *reclaim_state;
    struct backing_dev_info *backing_dev_info;
    struct io_context *io_context;
    unsigned long ptrace_message;
    siginfo_t *last_siginfo;
    struct task_io_accounting ioac;   /* per-task I/O statistics */
#if defined(CONFIG_TASK_XACCT)
    u64 acct_rss_mem1;
    u64 acct_vm_mem1;
    cputime_t acct_timexpd;
#endif
#ifdef CONFIG_CPUSETS
    nodemask_t mems_allowed;
    int mems_allowed_change_disable;
    int cpuset_mem_spread_rotor;
    int cpuset_slab_spread_rotor;
#endif
#ifdef CONFIG_CGROUPS
    struct css_set __rcu *cgroups;
    struct list_head cg_list;
#endif
#ifdef CONFIG_FUTEX
    struct robust_list_head __user *robust_list;
#ifdef CONFIG_COMPAT
    struct compat_robust_list_head __user *compat_robust_list;
#endif
    struct list_head pi_state_list;
    struct futex_pi_state *pi_state_cache;
#endif
#ifdef CONFIG_PERF_EVENTS
    struct perf_event_context *perf_event_ctxp[perf_nr_task_contexts];
    struct mutex perf_event_mutex;
    struct list_head perf_event_list;
#endif
#ifdef CONFIG_NUMA
    struct mempolicy *mempolicy;
    short il_next;
    short pref_node_fork;
#endif
    atomic_t fs_excl;   /* whether the task holds exclusive filesystem resources; 0 means no */
    struct rcu_head rcu;
    struct pipe_inode_info *splice_pipe;   /* cache of the last pipe used for splice */
#ifdef CONFIG_TASK_DELAY_ACCT
    struct task_delay_info *delays;
#endif
#ifdef CONFIG_FAULT_INJECTION
    int make_it_fail;
#endif
    struct prop_local_single dirties;
#ifdef CONFIG_LATENCYTOP
    int latency_record_count;
    struct latency_record latency_record[LT_SAVECOUNT];
#endif
    unsigned long timer_slack_ns;
    unsigned long default_timer_slack_ns;
    struct list_head *scm_work_list;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
    int curr_ret_stack;                   /* index of the current stored address in ret_stack */
    struct ftrace_ret_stack *ret_stack;   /* stack of return addresses for return function tracing */
    unsigned long long ftrace_timestamp;  /* timestamp of the last schedule */
    atomic_t trace_overrun;
    atomic_t tracing_graph_pause;
#endif
#ifdef CONFIG_TRACING
    unsigned long trace;             /* state flags for use by tracers */
    unsigned long trace_recursion;   /* bitmask and counter of trace recursion */
#endif
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
    struct memcg_batch_info {
        int do_batch;   /* incremented when a batch uncharge is started */
        struct mem_cgroup *memcg;
        unsigned long nr_pages;
        unsigned long memsw_nr_pages;
    } memcg_batch;
#endif
#ifdef CONFIG_HAVE_HW_BREAKPOINT
    atomic_t ptrace_bp_refcnt;
#endif
};
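The tasks field is the piece that ties the PCBs together: each task_struct embeds a list node, and the kernel chains those nodes into one circular doubly linked list, recovering the enclosing structure from a node pointer with container_of. Below is a minimal user-space sketch of that pattern. struct task, list_init, and find_task are simplified stand-ins written for this example, and list_head, list_add_tail, and container_of are re-implemented here in the spirit of the kernel helpers rather than taken from kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space imitation of the kernel's intrusive list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for task_struct: only a pid and the list linkage. */
struct task {
    int pid;
    struct list_head tasks;
};

/* An empty circular list is a head that points at itself. */
void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the circular list. */
void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Walk the list from head and return the task with the given pid, or NULL. */
struct task *find_task(struct list_head *head, int pid)
{
    assert(head != NULL);

    for (struct list_head *pos = head->next; pos != head; pos = pos->next) {
        struct task *t = container_of(pos, struct task, tasks);
        if (t->pid == pid)
            return t;
    }
    return NULL;
}
```

Embedding the node inside the structure means one task can sit on several lists at once (tasks, children, sibling, thread_group) without any extra allocation, which is why task_struct carries so many list_head fields.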

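3.1.2 Program segment

The program segment is the program code that the process scheduler can dispatch to the CPU for execution.

3.1.3 Data segment

A process's data segment holds the raw data that its program processes, as well as the intermediate and final results produced while the program runs.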

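3.2 Process states

In general, a process is in one of three states:

Running: the process is actually using the CPU at this instant;

Ready: the process is runnable but temporarily stopped because another process is running;

Blocked: the process cannot run until some external event occurs.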
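The three states above allow only four direct transitions, which can be written down as a small lookup function. This is a sketch of the classic textbook three-state model, not of the Linux state machine; the enum proc_state and the function can_transition are names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* The three classic states described above (names are illustrative). */
enum proc_state { RUNNING, READY, BLOCKED };

/*
 * Return true if the three-state model allows a direct move from one
 * state to another:
 *   RUNNING -> READY    the scheduler picks another process
 *   RUNNING -> BLOCKED  the process waits for an event (e.g. I/O)
 *   READY   -> RUNNING  the scheduler picks this process
 *   BLOCKED -> READY    the awaited event occurs
 */
bool can_transition(enum proc_state from, enum proc_state to)
{
    assert(from <= BLOCKED && to <= BLOCKED);

    switch (from) {
    case RUNNING: return to == READY || to == BLOCKED;
    case READY:   return to == RUNNING;
    case BLOCKED: return to == READY;
    }
    return false;
}
```

Note that a blocked process never goes straight back to running: when its event arrives it becomes ready and must wait for the scheduler to pick it.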

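3.3 Process state transitions

For the transitions between process states, this article refers to the process state transition diagram from Linux Kernel Development (3rd Edition), found online at http://book.51cto.com/art/201106/270771.htm.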

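4. How processes are scheduled

The Linux process scheduler has evolved from the O(n) scheduler to the O(1) scheduler to CFS. Because CFS has been the scheduler for ordinary processes since Linux Kernel 2.6.23, the rest of this section focuses on it.

The CFS scheduler:

CFS (the Completely Fair Scheduler) is the process scheduler adopted in Linux kernel 2.6.23. Its basic principle is as follows. It sets a scheduling period (sched_latency_ns) with the goal of letting every process run at least once within that period; put another way, no process should wait for the CPU longer than one scheduling period. The period is then divided among the runnable processes. Because processes have different priorities (nice values), the division is weighted. Each process's cumulative, weight-adjusted run time is stored in its vruntime (virtual run time) field, and the process with the smallest vruntime gets to run next.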
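The weighting described above can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: the function names time_slice_ns and vruntime_delta are made up for this example, the weights 2048 and 1024 in the usage note are illustrative rather than entries from the kernel's nice-to-weight table, and the plain integer arithmetic ignores the fixed-point shortcuts the kernel uses.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD 1024ULL   /* weight of a nice-0 task */

/* Share of one scheduling period (in ns) for a task, proportional to its weight. */
uint64_t time_slice_ns(uint64_t period_ns, uint64_t weight, uint64_t total_weight)
{
    assert(total_weight > 0 && weight <= total_weight);
    return period_ns * weight / total_weight;
}

/*
 * Virtual time charged for delta_exec_ns of real run time.
 * A heavier (higher-priority) task is charged less virtual time per
 * nanosecond of real time, so its vruntime grows more slowly.
 */
uint64_t vruntime_delta(uint64_t delta_exec_ns, uint64_t weight)
{
    assert(weight > 0);
    return delta_exec_ns * NICE_0_LOAD / weight;
}
```

With a 6 ms period shared by a task of weight 2048 and a task of weight 1024, time_slice_ns gives them 4 ms and 2 ms respectively, and vruntime_delta charges each of them 2 ms of virtual time for running its full slice. Equal growth of vruntime is what "fair" means here, even though the real CPU time differs.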

The following subsections therefore look at how a process's vruntime changes in three situations.

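4.1 Initial vruntime of a new process

In CFS, each CPU's run queue (cfs_rq) maintains a min_vruntime field that tracks the smallest vruntime among the processes on that queue and never decreases. A new process's initial vruntime is set relative to the min_vruntime of the run queue it joins, which keeps it within a reasonable distance of the existing processes.

The run queue stores the CFS virtual-clock information. struct cfs_rq is defined as follows: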

struct cfs_rq
{
    struct load_weight load;        /* cumulative load weight of all queued processes */
    unsigned long nr_running;       /* number of processes on this run queue */

    u64 min_vruntime;               /* the queue's virtual clock */

    struct rb_root tasks_timeline;  /* root of the red-black tree */
    struct rb_node *rb_leftmost;    /* leftmost node of the red-black tree */

    struct sched_entity *curr;      /* scheduling entity of the currently running process */
        ...
};

The kernel updates min_vruntime in the update_min_vruntime function:

static void update_min_vruntime(struct cfs_rq *cfs_rq)
{
    /* Pick the starting value of vruntime; equivalent to:
     *   if (cfs_rq->curr != NULL)
     *       vruntime = cfs_rq->curr->vruntime;
     *   else
     *       vruntime = cfs_rq->min_vruntime;
     */
    u64 vruntime = cfs_rq->min_vruntime;

    if (cfs_rq->curr)
        vruntime = cfs_rq->curr->vruntime;

    /* Check whether the red-black tree has a leftmost node, i.e. whether
     * any process is waiting on the tree to be scheduled.
     * cfs_rq->rb_leftmost (struct rb_node *) points to the leftmost node,
     * which is the entity that will be scheduled next.
     */
    if (cfs_rq->rb_leftmost)
    {
        /* Get the scheduling entity se of the leftmost node; se holds its
         * vruntime, which is the smallest vruntime of all nodes in the tree. */
        struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost,
                           struct sched_entity,
                           run_node);
        /* If the queue has no curr process, use the leftmost node's
         * vruntime; otherwise use the smaller of cfs_rq->curr->vruntime
         * and se->vruntime.
         */
        if (!cfs_rq->curr)  /* vruntime still holds cfs_rq->min_vruntime */
            vruntime = se->vruntime;
        else                /* vruntime holds cfs_rq->curr->vruntime */
            vruntime = min_vruntime(vruntime, se->vruntime);
    }

    /* ensure we never gain time by being placed backwards.
     * To keep min_vruntime monotonically non-decreasing, update it only
     * when vruntime exceeds the current cfs_rq->min_vruntime.
     */
    cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
#ifndef CONFIG_64BIT
    smp_wmb();
    cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
#endif
}
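The max_vruntime and min_vruntime helpers used above compare 64-bit virtual times through a signed difference, so the comparison stays correct even if the counter wraps around. The following user-space sketch reproduces that idea and the non-decreasing update rule. The comparison helpers follow the style of the kernel's, while next_min_vruntime and its flag parameters are a simplified model invented for this example, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Return whichever value is later in virtual time, tolerating wraparound. */
uint64_t max_vruntime(uint64_t max_vr, uint64_t vruntime)
{
    int64_t delta = (int64_t)(vruntime - max_vr);
    if (delta > 0)
        max_vr = vruntime;
    return max_vr;
}

/* Return whichever value is earlier in virtual time, tolerating wraparound. */
uint64_t min_vruntime(uint64_t min_vr, uint64_t vruntime)
{
    int64_t delta = (int64_t)(vruntime - min_vr);
    if (delta < 0)
        min_vr = vruntime;
    return min_vr;
}

/*
 * Simplified model of the update rule: curr_vr and leftmost_vr are only
 * meaningful when the matching has_* flag is nonzero.
 */
uint64_t next_min_vruntime(uint64_t old_min, int has_curr, uint64_t curr_vr,
                           int has_leftmost, uint64_t leftmost_vr)
{
    uint64_t vr = old_min;

    if (has_curr)
        vr = curr_vr;
    if (has_leftmost)
        vr = has_curr ? min_vruntime(vr, leftmost_vr) : leftmost_vr;

    uint64_t result = max_vruntime(old_min, vr);
    /* Invariant: the queue's virtual clock never moves backwards. */
    assert((int64_t)(result - old_min) >= 0);
    return result;
}
```

For instance, with an old minimum of 100, a current task at 150, and a leftmost queued task at 120, the new minimum is 120; if the only candidate is a queued task at 90, the minimum stays at 100.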

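The red-black tree deserves a mention here.

An R-B Tree, short for Red-Black Tree, is a special kind of binary search tree. Every node carries a storage bit for its color, which is either red or black.

Properties of a red-black tree:
(1) Every node is either black or red.
(2) The root is black.
(3) Every leaf (NIL) is black.
(4) If a node is red, both of its children must be black.
(5) Every path from a node down to its descendant leaves contains the same number of black nodes.

Note:
(01) The leaves in property (3) are the empty (NIL or null) nodes only.
(02) Properties (4) and (5) together ensure that no path is more than twice as long as any other, so a red-black tree is approximately balanced.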
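The five properties can be checked mechanically, which makes them easier to internalize. The sketch below is a user-space validator written for this article, not kernel code: struct node, black_height, and is_red_black are names invented for the example, a NULL pointer stands for a black NIL leaf, and the binary-search ordering of keys is deliberately left out so that only the coloring rules are tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum color { RED, BLACK };

struct node {
    enum color color;
    struct node *left, *right;   /* NULL stands for a black NIL leaf */
};

/*
 * Return the black height of the subtree rooted at n (counting the NIL
 * leaf), or -1 if property (4) or (5) is violated somewhere in it.
 */
int black_height(const struct node *n)
{
    if (n == NULL)
        return 1;                               /* property (3) */

    assert(n->color == RED || n->color == BLACK);   /* property (1) */

    if (n->color == RED) {                      /* property (4) */
        if ((n->left && n->left->color == RED) ||
            (n->right && n->right->color == RED))
            return -1;
    }

    int lh = black_height(n->left);
    int rh = black_height(n->right);
    if (lh < 0 || rh < 0 || lh != rh)           /* property (5) */
        return -1;

    return lh + (n->color == BLACK ? 1 : 0);
}

/* True if the tree rooted at root satisfies the coloring properties. */
bool is_red_black(const struct node *root)
{
    if (root != NULL && root->color != BLACK)   /* property (2) */
        return false;
    return black_height(root) > 0;
}
```

A black root with two red children passes the check; giving one of those red children a red child of its own, or recoloring only one child black, makes it fail.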

[Figure: red-black tree diagram]

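4.2 vruntime of a sleeping process

If a process's vruntime stayed unchanged while it slept, the vruntime of the other processes would keep advancing in the meantime, so when the sleep ended its vruntime would be far smaller than everyone else's. It could then hold the CPU for a long time, which would be unfair to the other processes. CFS therefore resets a process's vruntime when it is woken from sleep, basing the new value on min_vruntime and granting a bounded amount of compensation.

The place_entity function below computes both the vruntime of a new process and the compensation given to a woken process: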

static void
place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
{
        u64 vruntime = cfs_rq->min_vruntime;

        /*
         * The 'current' period is already promised to the current tasks,
         * however the extra weight of the new task will slow them down a
         * little, place the new task so that it fits in the slot that
         * stays open at the end.
         */
        if (initial && sched_feat(START_DEBIT)) /* initial: a newly created process */
                vruntime += sched_vslice(cfs_rq, se);

        /* sleeps up to a single latency don't count. */
        if (!initial) { /* a process waking from sleep */
                unsigned long thresh = sysctl_sched_latency; /* one scheduling period */

                /*
                 * Halve their sleep time's effect, to allow
                 * for a gentler effect of sleepers:
                 */
                if (sched_feat(GENTLE_FAIR_SLEEPERS)) /* if GENTLE_FAIR_SLEEPERS is set */
                        thresh >>= 1; /* compensation drops to half a scheduling period */

                vruntime -= thresh;
        }

        /* ensure we never gain time by being placed backwards. */
        vruntime = max_vruntime(se->vruntime, vruntime);

        se->vruntime = vruntime;
}

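Here sched_vslice = (scheduling period * process weight / total weight of all processes) * NICE_0_LOAD / process weight. In other words, it computes the real CPU time the process should be allotted and then converts that time into vruntime.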
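To make the placement rule concrete, here is a small user-space model of it with numbers plugged in. It is a sketch under stated assumptions: both START_DEBIT and GENTLE_FAIR_SLEEPERS are treated as enabled, the names vslice_ns and place_vruntime are invented for this example, and the period and weights in the usage note are arbitrary illustrative values.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD 1024ULL   /* weight of a nice-0 task */

/* Wraparound-tolerant "later of the two" comparison on virtual times. */
static uint64_t max_vruntime(uint64_t a, uint64_t b)
{
    return ((int64_t)(b - a) > 0) ? b : a;
}

/* The formula above: the task's real-time share, converted to virtual time. */
uint64_t vslice_ns(uint64_t period_ns, uint64_t weight, uint64_t total_weight)
{
    assert(weight > 0 && total_weight >= weight);
    uint64_t slice = period_ns * weight / total_weight;
    return slice * NICE_0_LOAD / weight;
}

/*
 * Simplified placement, assuming START_DEBIT and GENTLE_FAIR_SLEEPERS are on:
 *   new task (initial != 0): start one virtual slice behind the queue minimum
 *   woken task:              credit at most half a scheduling period
 * In both cases the task's own vruntime is never moved backwards.
 */
uint64_t place_vruntime(uint64_t min_vruntime, uint64_t se_vruntime, int initial,
                        uint64_t period_ns, uint64_t weight, uint64_t total_weight)
{
    uint64_t vruntime = min_vruntime;

    if (initial)
        vruntime += vslice_ns(period_ns, weight, total_weight);
    else
        vruntime -= period_ns / 2;

    return max_vruntime(se_vruntime, vruntime);
}
```

With a 20 ms period, a queue minimum of 100 ms, and two equal-weight tasks, a new task is placed at 110 ms. A task that slept for a long time and still has vruntime 50 ms is pulled up to 90 ms, so its advantage is capped at 10 ms, while a task that slept only briefly and has vruntime 95 ms keeps its own value.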

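4.3 vruntime when a process migrates from one CPU to another

On a multi-CPU system the CPUs carry different loads; some are busier than others. Each CPU has its own run queue, and the vruntime of the processes on each queue advances at a different pace. If a process kept its vruntime unchanged when moving from one CPU to another, it could end up either penalized or unfairly favored.

For fairness, when a process leaves one CPU's run queue (dequeue_entity) for a reason other than going to sleep, CFS subtracts that queue's min_vruntime from its vruntime; when the process joins another CPU's run queue (enqueue_entity), CFS adds the new queue's min_vruntime. This keeps the process's vruntime relatively fair after it migrates from one CPU to another.

The main code is as follows: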

static void
dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
...
        /*
         * Normalize the entity after updating the min_vruntime because the
         * update can refer to the ->curr item and we need to reflect this
         * movement in our normalized position.
         */
        if (!(flags & DEQUEUE_SLEEP))
                se->vruntime -= cfs_rq->min_vruntime;
...
}

static void
enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
        /*
         * Update the normalized vruntime before updating min_vruntime
         * through calling update_curr().
         */
        if (!(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_WAKING))
                se->vruntime += cfs_rq->min_vruntime;
...
}
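The subtract-then-add step amounts to carrying the task's offset from the queue minimum across the move. The sketch below models just that arithmetic in user space; normalize_on_dequeue, renormalize_on_enqueue, and migrate_vruntime are hypothetical names for this example, and the numbers in the usage note are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Leaving a queue: keep only the offset relative to that queue's min_vruntime. */
uint64_t normalize_on_dequeue(uint64_t vruntime, uint64_t src_min_vruntime)
{
    return vruntime - src_min_vruntime;
}

/* Joining a queue: re-base the stored offset onto the new queue's min_vruntime. */
uint64_t renormalize_on_enqueue(uint64_t rel_vruntime, uint64_t dst_min_vruntime)
{
    return rel_vruntime + dst_min_vruntime;
}

/* Move a task's vruntime from a source queue to a destination queue. */
uint64_t migrate_vruntime(uint64_t vruntime, uint64_t src_min, uint64_t dst_min)
{
    uint64_t rel = normalize_on_dequeue(vruntime, src_min);
    uint64_t moved = renormalize_on_enqueue(rel, dst_min);

    /* The task's lead (or lag) relative to the queue minimum is preserved. */
    assert(moved - dst_min == vruntime - src_min);
    return moved;
}
```

A task with vruntime 1000500 on a busy queue whose minimum is 1000000 lands at 700 on a queue whose minimum is 200: it was 500 ahead of the minimum before the move and is still 500 ahead after it. Unsigned arithmetic keeps this correct even when the task is behind the source minimum.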

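5. Views on this operating system's process model

The process is the most central concept in an operating system: it is an abstraction of a running program, and every program we want to run executes as a process. As I see it, a process model exists to manage and optimize processes so that resources are used well and programs run sensibly. Reading the source code shows that, to treat processes fairly, the kernel makes a matching adjustment for each situation a process can encounter, so that processes share the CPU fairly within bounded limits.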

6. References

http://book.51cto.com/art/201106/270771.htm

https://blog.csdn.net/melong100/article/details/6329201

https://blog.csdn.net/gatieme/article/details/52067748

Posted 2018-05-01 15:16 by 私语沫言
