Process Priority

2022-03-31, News

1. Process priority

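The kernel uses the 140 integers in [0, 139] to represent 140 priority levels.

Internally the kernel works with this simple range, 0 to 139 inclusive, and here too a lower value means a higher priority. The range 0 to 99 is reserved for real-time processes. The nice values [-20, +19] map onto the range 100 to 139, as shown in Figure 2-14.

A real-time process always has a higher priority than a normal process.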
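The split of the range can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: the helper names prio_is_realtime and prio_higher are hypothetical and do not exist in the kernel; only the constants 100 and 140 come from the text.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RT_PRIO 100                 /* priorities 0..99 are real-time   */
#define MAX_PRIO    (MAX_RT_PRIO + 40)  /* priorities 100..139 are normal   */

/* Hypothetical helper: true if prio lies in the real-time range 0..99. */
bool prio_is_realtime(int prio)
{
    assert(prio >= 0 && prio < MAX_PRIO);
    return prio < MAX_RT_PRIO;
}

/* Hypothetical helper: true if a outranks b (the lower number wins). */
bool prio_higher(int a, int b)
{
    return a < b;
}
```

With these helpers, the lowest real-time priority (99) still outranks the highest normal priority (100), which is the "real-time always wins" rule stated above.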

The priority-related members of the task_struct structure are listed below:

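a) static_prio is the static priority of a normal process (real-time processes do not use it); a smaller value means a higher priority. The static priority is assigned when the process starts. It can be changed with the nice() or sched_setscheduler() system calls; otherwise it stays constant while the process runs.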

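b) rt_priority is the priority of a real-time process (normal processes do not use it). Its value lies in [0, 99], both ends included. Note that for rt_priority a larger value means a higher priority.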

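c) normal_prio is computed from one of the two previous members, static_prio or rt_priority. Think of it this way: static_prio and rt_priority are the "static" priorities of normal and real-time processes respectively, an inherent property of the process. Their "units" differ (as if one side nods for yes and shakes the head for no, while the other does the reverse): for one, a smaller value means a higher priority; for the other, a larger value does. normal_prio unifies the "units" so that a smaller value always means a higher priority. It can therefore be understood as the "static" priority expressed in unified units.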

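d) prio, the dynamic priority, is the effective priority of the process. As the name suggests, it is the value used whenever the system needs to judge a process's priority, and it is the priority the scheduler considers. For a real-time process, prio equals its normal_prio (the priority after the "units" are unified). The effective priority matters most for normal processes: a process can have its priority raised temporarily by changing prio, so the boost does not affect the static priority. Incidentally, a child's prio is initialized from the parent's normal priority (normal_prio, which for a normal process equals its static priority), not from the parent's effective priority. A temporary boost of the parent is therefore not inherited by the child.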

e) policy, the scheduling policy, has five possible values: SCHED_NORMAL, SCHED_IDLE, SCHED_BATCH, SCHED_FIFO, SCHED_RR. A normal process uses one of the first three; a real-time process uses one of the last two.
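The policy of a running process can also be inspected from user space. The following is a minimal sketch, assuming a Linux system with glibc; policy_name, policy_is_realtime and print_own_policy are hypothetical helper names, while sched_getscheduler() and sched_getparam() are the standard calls. In user-space headers SCHED_NORMAL is spelled SCHED_OTHER, and SCHED_BATCH and SCHED_IDLE may only be visible with GNU extensions enabled, hence the guards.

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: map a policy constant to a printable name. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_NORMAL";  /* called SCHED_OTHER in user space */
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
#ifdef SCHED_BATCH
    case SCHED_BATCH: return "SCHED_BATCH";
#endif
#ifdef SCHED_IDLE
    case SCHED_IDLE:  return "SCHED_IDLE";
#endif
    default:          return "unknown";
    }
}

/* Hypothetical helper: the two real-time policies named in the text. */
int policy_is_realtime(int policy)
{
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Print the policy and real-time priority of the calling process.
 * Returns the policy, or -1 on error. */
int print_own_policy(void)
{
    struct sched_param sp;
    int policy = sched_getscheduler(0);       /* pid 0 = calling process */

    if (policy < 0 || sched_getparam(0, &sp) < 0) {
        perror("sched_getscheduler/sched_getparam");
        return -1;
    }
    /* User-visible real-time priorities on Linux lie in 0..99. */
    assert(sp.sched_priority >= 0 && sp.sched_priority <= 99);
    printf("policy=%s rt_priority=%d\n", policy_name(policy), sp.sched_priority);
    return policy;
}
```

For an ordinary shell-launched program, print_own_policy() would typically report SCHED_NORMAL with a real-time priority of 0.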

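The following macros convert between the different representations (MAX_RT_PRIO is the exclusive upper bound of the real-time priority range, and MAX_PRIO is the exclusive upper bound of the priority range for normal processes):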

    #define MAX_USER_RT_PRIO    100
    #define MAX_RT_PRIO         MAX_USER_RT_PRIO
    #define MAX_PRIO            (MAX_RT_PRIO + 40)
    #define DEFAULT_PRIO        (MAX_RT_PRIO + 20)

    /*
     * Convert user-nice values [ -20 ... 0 ... 19 ]
     * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
     * and back.
     */
    #define NICE_TO_PRIO(nice)  (MAX_RT_PRIO + (nice) + 20)
    #define PRIO_TO_NICE(prio)  ((prio) - MAX_RT_PRIO - 20)
    #define TASK_NICE(p)        PRIO_TO_NICE((p)->static_prio)
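The two conversion macros can be exercised outside the kernel. The sketch below copies NICE_TO_PRIO and PRIO_TO_NICE from the listing and wraps them in two hypothetical range-checked functions, nice_to_prio and prio_to_nice, purely for illustration.

```c
#include <assert.h>

#define MAX_RT_PRIO 100
#define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)
#define PRIO_TO_NICE(prio) ((prio) - MAX_RT_PRIO - 20)

/* Hypothetical wrapper: nice value in [-20, 19] -> static priority in [100, 139]. */
int nice_to_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return NICE_TO_PRIO(nice);
}

/* Hypothetical wrapper: static priority in [100, 139] -> nice value in [-20, 19]. */
int prio_to_nice(int prio)
{
    assert(prio >= 100 && prio <= 139);
    return PRIO_TO_NICE(prio);
}
```

The default nice value 0 maps to 120, which matches DEFAULT_PRIO (MAX_RT_PRIO + 20), and the two functions are inverses of each other over the whole range.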

2. Computing process priority

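static_prio is the starting point of the calculation. Assume it has already been set and the kernel now wants to compute the other priorities. The dynamic priority of a process p is computed by the function effective_prio(p):

    p->prio = effective_prio(p);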

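Let us look at how effective_prio is implemented. The function does two things:

1. It sets the normal_prio of process p.

2. It returns the effective priority of the process.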

    /*
     * Calculate the current priority, i.e. the priority
     * taken into account by the scheduler. This value might
     * be boosted by RT tasks, or might be boosted by
     * interactivity modifiers. Will be RT if the task got
     * RT-boosted. If not then it returns p->normal_prio.
     */
    static int effective_prio(struct task_struct *p)
    {
        /* Compute the normal priority. */
        p->normal_prio = normal_prio(p);
        /*
         * If we are RT tasks or we were boosted to RT priority,
         * keep the priority unchanged. Otherwise, update priority
         * to the normal priority:
         */
        if (!rt_prio(p->prio))
            return p->normal_prio;
        return p->prio;
    }

    /*
     * Calculate the expected normal priority: i.e. priority
     * without taking RT-inheritance into account. Might be
     * boosted by interactivity modifiers. Changes upon fork,
     * setprio syscalls, and whenever the interactivity
     * estimator recalculates.
     */
    static inline int normal_prio(struct task_struct *p)
    {
        int prio;

        /* SCHED_DEADLINE is the more recently added deadline policy. */
        if (task_has_dl_policy(p))
            prio = MAX_DL_PRIO - 1;     /* MAX_DL_PRIO is 0, so prio is -1 */
        /*
         * task_has_rt_policy() checks whether the policy is SCHED_FIFO
         * or SCHED_RR. If so, the process is a real-time process and the
         * check returns true; otherwise it returns false.
         */
        else if (task_has_rt_policy(p))
            prio = MAX_RT_PRIO - 1 - p->rt_priority;
        else
            prio = __normal_prio(p);
        return prio;
    }

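The normal priority is computed differently for normal and real-time processes. The calculation in __normal_prio applies only to normal processes. For a real-time process, the normal priority is derived from its rt_priority setting. A higher rt_priority value means a higher real-time priority, while the kernel's internal representation is the reverse: a lower value means a higher priority. The correct internal priority of a real-time process is therefore MAX_RT_PRIO - 1 - p->rt_priority. Note that here, unlike in effective_prio, a real-time process is detected not from its priority value but from the scheduling policy set in task_struct.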

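MAX_RT_PRIO is 100 (one more than the largest priority number of a real-time process), and normal_prio() is in effect the step that unifies the "units". It works as follows: if p is a real-time process, it returns 99 - rt_priority (a larger rt_priority means a higher priority while normal_prio is the opposite, so this converts rt_priority into normal_prio); if p is a normal process, no conversion is needed and the function simply returns its static priority static_prio.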

    /*
     * __normal_prio - return the priority that is based on the static prio
     */
    static inline int __normal_prio(struct task_struct *p)
    {
        return p->static_prio;
    }
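The unification of "units" can be tried out with a small user-space model. This is a sketch, not kernel code: struct task_model, enum model_policy and the model_* functions are hypothetical stand-ins for task_struct and the kernel helpers, and the deadline policy is left out for brevity.

```c
#include <assert.h>

#define MAX_RT_PRIO 100

/* Hypothetical stand-ins for the kernel's policy constants. */
enum model_policy { MODEL_NORMAL, MODEL_BATCH, MODEL_IDLE, MODEL_FIFO, MODEL_RR };

/* Hypothetical, heavily reduced model of task_struct. */
struct task_model {
    enum model_policy policy;
    int static_prio;   /* 100..139, lower value = higher priority  */
    int rt_priority;   /* 0..99,   higher value = higher priority  */
};

/* Real-time detection by policy, mirroring task_has_rt_policy(). */
int model_has_rt_policy(const struct task_model *p)
{
    return p->policy == MODEL_FIFO || p->policy == MODEL_RR;
}

/* Mirror of normal_prio(): convert both kinds of "static" priority
 * to the single lower-is-higher scale. */
int model_normal_prio(const struct task_model *p)
{
    if (model_has_rt_policy(p)) {
        assert(p->rt_priority >= 0 && p->rt_priority < MAX_RT_PRIO);
        return MAX_RT_PRIO - 1 - p->rt_priority;
    }
    return p->static_prio;
}
```

A SCHED_FIFO task with rt_priority 50 ends up at 49 on the unified scale, and one with rt_priority 99 ends up at 0, the highest priority in this model; a normal task simply keeps its static_prio.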

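Why does effective_prio detect real-time processes by priority value rather than with task_has_rt_policy? This is necessary for non-real-time processes that have been temporarily boosted to a real-time priority, which can happen when real-time mutexes (RT-Mutex) are in use.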
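The effect of that choice can be seen in a small model. The sketch below is an assumption-laden illustration: struct boost_model, model_rt_prio and model_effective_prio are hypothetical names, and normal_prio is taken as already computed instead of being recalculated as the kernel does.

```c
#include <assert.h>

#define MAX_RT_PRIO 100
#define MAX_PRIO    (MAX_RT_PRIO + 40)

/* Hypothetical reduced task: only the two fields effective_prio() reads. */
struct boost_model {
    int prio;          /* current (possibly boosted) priority */
    int normal_prio;   /* unified "static" priority           */
};

/* Mirror of rt_prio(): decide by priority value, not by policy. */
int model_rt_prio(int prio)
{
    return prio < MAX_RT_PRIO;
}

/* Mirror of the decision in effective_prio(). */
int model_effective_prio(const struct boost_model *p)
{
    assert(p->prio >= 0 && p->prio < MAX_PRIO);
    if (!model_rt_prio(p->prio))
        return p->normal_prio;   /* not in the RT range: fall back */
    return p->prio;              /* RT or RT-boosted: keep as is   */
}
```

A normal task with normal_prio 120 that was boosted to prio 10 keeps 10 here. Had the check been based on the policy instead, the task would have been classified as non-real-time and reset to 120, silently discarding the boost.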

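To summarize:

a) For a real-time process: prio = effective_prio() = normal_prio, where normal_prio = MAX_RT_PRIO - 1 - rt_priority.

b) For a normal process whose priority has not been boosted: prio = effective_prio() = normal_prio = static_prio.

c) For a normal process whose priority has been boosted: prio = effective_prio() and normal_prio = static_prio. The value of prio has been changed by other functions, so it differs from its initial value.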

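d) The nice value

The nice value also expresses the priority of a normal process. It lies in [-20, 19], and again a smaller value means a higher priority. As mentioned earlier, the priority range of normal processes is [100, 139], which corresponds one-to-one to the nice values: priority = nice + 120. The nice value is not a separate priority mechanism, just another representation of the same priority. The sys_nice() system call sets the static priority static_prio of the process.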

3. Computing the load weight

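The importance of a process is determined not only by its priority but also by the load weight stored in task_struct->se.load. set_load_weight computes the load weight from the process type and its static priority.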

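As noted in the discussion of the order in which processes are scheduled, the parameter that determines a process's position in the run queue is its weight. The weight in turn is determined by the static priority static_prio: the higher the static priority (the smaller the static_prio value), the larger the weight. Static priority and weight are linked through the prio_to_weight array. A process with static priority 100 (nice value -20) has the weight prio_to_weight[0], and a process with static priority k (nice value k - 120) has the weight prio_to_weight[k - 100].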

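The default nice value of a normal process is 0, so its default static priority is 120 and its weight is prio_to_weight[20], which is 1024. NICE_0_LOAD is therefore 1024, and that is where the name NICE_0_LOAD comes from.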

An important rule: for a process with nice value 0, virtual runtime (vruntime) advances at the same rate as real runtime (runtime).
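The lookup and the rule for nice 0 can be checked with a short sketch. The table reproduces the values of the kernel's prio_to_weight array; the scaling formula delta * NICE_0_LOAD / weight is not spelled out in the text above and is stated here as the usual CFS relation. The function names weight_for_static_prio and vruntime_delta are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD 1024

/* Weights for nice -20 .. 19, i.e. static priority 100 .. 139. */
static const unsigned int prio_to_weight[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

/* Hypothetical helper: weight of a normal process with the given static_prio. */
unsigned int weight_for_static_prio(int static_prio)
{
    assert(static_prio >= 100 && static_prio <= 139);
    return prio_to_weight[static_prio - 100];
}

/* Hypothetical helper: how far vruntime advances for delta_exec_ns of
 * real runtime. Heavier tasks advance more slowly. */
uint64_t vruntime_delta(uint64_t delta_exec_ns, int static_prio)
{
    return delta_exec_ns * NICE_0_LOAD / weight_for_static_prio(static_prio);
}
```

For static priority 120 the weight equals NICE_0_LOAD, so vruntime_delta returns its input unchanged. A nice -20 task accumulates vruntime more slowly than real time, and a nice 19 task more quickly.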

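The weight computation also has to take the process type into account. Real-time processes are described as getting twice the weight of a normal process; this appears to reflect older kernels, since the version of set_load_weight shown here has no such branch. SCHED_IDLE processes, on the other hand, always get a very small weight.

The implementation of set_load_weight: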

    static void set_load_weight(struct task_struct *p)
    {
        int prio = p->static_prio - MAX_RT_PRIO;
        struct load_weight *load = &p->se.load;

        /*
         * SCHED_IDLE tasks get minimal weight:
         */
        if (p->policy == SCHED_IDLE) {
            load->weight = scale_load(WEIGHT_IDLEPRIO);
            load->inv_weight = WMULT_IDLEPRIO;
            return;
        }

        /*
         * Here scale_load(w) is defined as (w). The kernel stores not
         * only the weight itself but also the value used for division.
         */
        load->weight = scale_load(prio_to_weight[prio]);
        load->inv_weight = prio_to_wmult[prio];
    }
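The "value used for division" deserves a short illustration. The sketch below assumes that inv_weight is the precomputed quotient 2^32 / weight, so that a division by weight can be replaced by a multiplication and a shift. The function names inverse_weight and mul_by_inverse are hypothetical, and the kernel's own fixed-point handling is more involved than this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: precompute 2^32 / weight (floor).
 * weight must be at least 2 so that the result fits in 32 bits. */
uint32_t inverse_weight(uint32_t weight)
{
    assert(weight >= 2);
    return (uint32_t)((UINT64_C(1) << 32) / weight);
}

/* Hypothetical helper: approximate x / weight as (x * inv) >> 32.
 * x is limited to 32 bits so the 64-bit product cannot overflow. */
uint64_t mul_by_inverse(uint64_t x, uint32_t inv)
{
    assert(x <= UINT32_MAX);
    return (x * inv) >> 32;
}
```

For weight 1024 the inverse is exactly 4194304, and for weight 88761 it is 48388, the same numbers that the kernel keeps in prio_to_wmult for nice 0 and nice -20. When the weight is not a power of two the result is a close approximation of the true quotient and can be off by one.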

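Not only processes but also run queues are associated with a load weight. Every time a process is added to a run queue, the kernel calls inc_nr_running. This ensures that the run queue keeps track of how many processes are running, and the weight of the process is also added to the weight of the run queue: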

    static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
    {
        ....
        inc_nr_running(rq);
    }

    static inline void inc_nr_running(struct rq *rq)
    {
        rq->nr_running++;    /* number of processes on the queue */
        .....
    }

    static inline void update_load_add(struct load_weight *lw, unsigned long inc)
    {
        /* inc is the se->load.weight passed in by the caller */
        lw->weight += inc;
        lw->inv_weight = 0;
    }

When a process is removed from the run queue, the corresponding counterpart functions are called (dec_nr_running and update_load_sub).
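The bookkeeping on both paths can be modeled in user space. This is a sketch under assumptions: struct rq_model, struct load_weight_model and the model_* functions are hypothetical reductions of the kernel structures, and resetting inv_weight to 0 is read here as "the cached inverse is stale and must be recomputed before use".

```c
#include <assert.h>

/* Hypothetical reduction of struct load_weight. */
struct load_weight_model {
    unsigned long weight;
    unsigned long inv_weight;   /* 0 = cached inverse is stale */
};

/* Hypothetical reduction of struct rq. */
struct rq_model {
    unsigned int nr_running;
    struct load_weight_model load;
};

/* Mirror of update_load_add(). */
void model_update_load_add(struct load_weight_model *lw, unsigned long inc)
{
    lw->weight += inc;
    lw->inv_weight = 0;
}

/* Counterpart for removal, modeled on update_load_sub(). */
void model_update_load_sub(struct load_weight_model *lw, unsigned long dec)
{
    assert(lw->weight >= dec);
    lw->weight -= dec;
    lw->inv_weight = 0;
}

/* Add a task with the given weight to the queue. */
void model_enqueue(struct rq_model *rq, unsigned long task_weight)
{
    rq->nr_running++;
    model_update_load_add(&rq->load, task_weight);
}

/* Remove a task with the given weight from the queue. */
void model_dequeue(struct rq_model *rq, unsigned long task_weight)
{
    assert(rq->nr_running > 0);
    rq->nr_running--;
    model_update_load_sub(&rq->load, task_weight);
}
```

Enqueuing a nice 0 task (weight 1024) and a nice 5 task (weight 335) leaves the queue with two runnable tasks and a total weight of 1359; dequeuing the first one brings it back to one task and 335.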
