Process Priority

2022-03-31 News

1. Process priority

The kernel uses the 140 numbers in [0, 139] to represent 140 priority levels.

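Internally, the kernel uses a simpler numeric range, from 0 to 139 inclusive, to represent priority. As with nice values, a lower value means a higher priority. The range 0 to 99 is reserved for real-time processes. The nice values [-20, +19] map to the range 100 to 139, as shown in Figure 2-14.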

Real-time processes always have a higher priority than normal processes.
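The two bands described above can be sketched in a few lines of user-space C. This is only an illustration under the ranges stated in the text; `prio_is_realtime` and `prio_higher_than` are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Band limits as stated in the text: 0..99 real-time, 100..139 normal. */
#define MAX_RT_PRIO 100
#define MAX_PRIO    140

/* Hypothetical helper: true if prio lies in the real-time band [0, 99]. */
static bool prio_is_realtime(int prio)
{
    assert(prio >= 0 && prio < MAX_PRIO);
    return prio < MAX_RT_PRIO;
}

/* Hypothetical helper: a lower internal value means a higher priority. */
static bool prio_higher_than(int a, int b)
{
    return a < b;
}
```

With these helpers, the weakest real-time value 99 still compares as higher priority than the strongest normal value 100.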

The following members of the task_struct structure are related to priority:

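a) static_prio: the static priority of a normal process (real-time processes do not use this field). A lower value means a higher priority. The static priority is assigned when the process is started. It can be changed with the nice() or sched_setscheduler() system calls; otherwise it stays constant while the process runs.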

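b) rt_priority: the priority of a real-time process (normal processes do not use this field). Its value lies in [0, 99], with both 0 and 99 included. Note: for rt_priority, a higher value means a higher priority.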

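c) normal_prio: computed from one of the previous two fields, static_prio or rt_priority. One way to see it: static_prio and rt_priority are the "static" priorities of normal and real-time processes respectively, an inherent property of the process. Their "units" differ (as if in one convention nodding means yes and shaking the head means no, and in the other it is the reverse): for one, a lower value means a higher priority; for the other, a higher value means a higher priority. normal_prio is needed to unify the "units" so that a lower value always means a higher priority. normal_prio can therefore be understood as the "static" priority expressed in a unified unit.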

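d) prio: the dynamic priority. It is the effective priority of the process: whenever the system needs to judge a process's priority, this is the field it uses, and it is the priority the scheduler considers. For a real-time process, the effective priority prio equals its normal_prio (the priority after "unit unification"). The effective priority matters most for normal processes: a process can have its priority raised temporarily by changing prio, so the boost does not affect its static priority. Incidentally, a child's prio is initialized from the parent's normal priority (normal_prio, which equals static_prio for a normal process), not from the parent's effective priority. In other words, if the parent's priority has been raised temporarily, the child does not inherit the boost.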

e) policy: the scheduling policy, with five possible values: SCHED_NORMAL, SCHED_IDLE, SCHED_BATCH, SCHED_FIFO, SCHED_RR. A normal process uses one of the first three; a real-time process uses one of the last two.

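The following macros convert between the different representations (MAX_RT_PRIO specifies the maximum priority of real-time processes, and MAX_PRIO is the maximum priority value for normal processes):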

    #define MAX_USER_RT_PRIO 100
    #define MAX_RT_PRIO      MAX_USER_RT_PRIO
    #define MAX_PRIO         (MAX_RT_PRIO + 40)
    #define DEFAULT_PRIO     (MAX_RT_PRIO + 20)

    /*
     * Convert user-nice values [ -20 ... 0 ... 19 ]
     * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
     * and back.
     */
    #define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)
    #define PRIO_TO_NICE(prio) ((prio) - MAX_RT_PRIO - 20)
    #define TASK_NICE(p)       PRIO_TO_NICE((p)->static_prio)
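As a quick worked check of the conversion macros above, the following user-space sketch copies NICE_TO_PRIO and PRIO_TO_NICE from the listing and wraps them in a hypothetical helper, `nice_round_trip`, which is not part of the kernel.

```c
#include <assert.h>

/* Copied from the listing above so the sketch compiles on its own. */
#define MAX_RT_PRIO 100
#define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)
#define PRIO_TO_NICE(prio) ((prio) - MAX_RT_PRIO - 20)

/* Hypothetical helper: convert nice -> static priority -> nice again. */
static int nice_round_trip(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return PRIO_TO_NICE(NICE_TO_PRIO(nice));
}
```

The end points map as expected: nice -20 gives static priority 100, nice 0 gives 120 (DEFAULT_PRIO), and nice 19 gives 139 (MAX_PRIO - 1).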

2. Computing process priority

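static_prio is the starting point of the computation. Assume it has already been set and the kernel now wants to compute the other priorities. The dynamic priority of process p is computed by the function effective_prio(p):

    p->prio = effective_prio(p);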

Here is how effective_prio is implemented. The function does two things:

1. It sets the normal_prio of process p.

2. It returns the effective priority of the process.

    /*
     * Calculate the current priority, i.e. the priority
     * taken into account by the scheduler. This value might
     * be boosted by RT tasks, or might be boosted by
     * interactivity modifiers. Will be RT if the task got
     * RT-boosted. If not then it returns p->normal_prio.
     */
    static int effective_prio(struct task_struct *p)
    {
        /* Compute the normal priority. */
        p->normal_prio = normal_prio(p);
        /*
         * If we are RT tasks or we were boosted to RT priority,
         * keep the priority unchanged. Otherwise, update priority
         * to the normal priority:
         */
        if (!rt_prio(p->prio))
            return p->normal_prio;
        return p->prio;
    }

    /*
     * Calculate the expected normal priority: i.e. priority
     * without taking RT-inheritance into account. Might be
     * boosted by interactivity modifiers. Changes upon fork,
     * setprio syscalls, and whenever the interactivity
     * estimator recalculates.
     */
    static inline int normal_prio(struct task_struct *p)
    {
        int prio;

        /* SCHED_DEADLINE is a newer real-time scheduling policy. */
        if (task_has_dl_policy(p))
            prio = MAX_DL_PRIO - 1;  /* MAX_DL_PRIO is 0, so prio becomes -1 */
        /*
         * task_has_rt_policy() checks whether the policy is SCHED_FIFO
         * or SCHED_RR. If so, the task is a real-time process and the
         * check returns true; otherwise it returns false.
         */
        else if (task_has_rt_policy(p))
            prio = MAX_RT_PRIO - 1 - p->rt_priority;
        else
            prio = __normal_prio(p);
        return prio;
    }

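The normal priority has to be computed differently for normal and real-time processes. The computation in __normal_prio applies only to normal processes. For a real-time process, the normal priority is derived from its rt_priority. A higher rt_priority means a higher real-time priority, whereas the kernel's internal representation is the opposite: a lower value means a higher priority. The correct internal value for a real-time process is therefore MAX_RT_PRIO - 1 - p->rt_priority. Note that here, unlike in effective_prio, a real-time process is detected not from its priority value but from the scheduling policy set in task_struct.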

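MAX_RT_PRIO is 100 (one more than the largest real-time priority value), and normal_prio() is in effect the unit-unification step. It works as follows: if p is a real-time process, it returns 99 - rt_priority (a higher rt_priority means a higher priority while normal_prio is the reverse, so this converts rt_priority into normal_prio); if p is a normal process, no unification is needed and its static priority static_prio is returned directly.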

    /*
     * __normal_prio - return the priority that is based on the static prio
     */
    static inline int __normal_prio(struct task_struct *p)
    {
        return p->static_prio;
    }
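To make the unit unification concrete, here is a small user-space model of the real-time and normal branches of normal_prio(). It is a sketch, not kernel code: `struct model_task`, `enum model_policy` and `model_normal_prio` are hypothetical names, and the SCHED_DEADLINE branch is left out.

```c
#include <assert.h>

#define MAX_RT_PRIO 100

/* Simplified stand-ins for the scheduling policies (model only). */
enum model_policy { MODEL_NORMAL, MODEL_FIFO, MODEL_RR };

/* Hypothetical, stripped-down task with only the fields discussed above. */
struct model_task {
    enum model_policy policy;
    int static_prio;   /* 100..139, meaningful for normal tasks */
    int rt_priority;   /* 0..99, meaningful for real-time tasks */
};

/* Returns the unified priority: a lower value always means higher priority. */
static int model_normal_prio(const struct model_task *t)
{
    if (t->policy == MODEL_FIFO || t->policy == MODEL_RR) {
        assert(t->rt_priority >= 0 && t->rt_priority < MAX_RT_PRIO);
        return MAX_RT_PRIO - 1 - t->rt_priority;   /* 99 - rt_priority */
    }
    return t->static_prio;
}
```

A real-time task with rt_priority 99 ends up at internal priority 0, the highest possible in this model, while a normal task simply keeps its static_prio.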

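Why does the kernel detect real-time processes in effective_prio by priority value rather than with task_has_rt_policy? This is necessary for non-real-time processes that have been temporarily boosted to a real-time priority, which can happen when real-time mutexes (RT-Mutex) are in use.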
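The effect of checking the priority value instead of the policy can be seen in a small user-space model. This is a sketch under simplifying assumptions: `struct model_task`, `model_rt_prio` and `model_effective_prio` are hypothetical names, and normal_prio is assumed to have been computed already.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RT_PRIO 100

/* Hypothetical, stripped-down task for this model. */
struct model_task {
    int prio;         /* effective (dynamic) priority */
    int normal_prio;  /* unified priority, assumed already computed */
};

/* Mirrors the idea of rt_prio(): is the value in the real-time band? */
static bool model_rt_prio(int prio)
{
    return prio < MAX_RT_PRIO;
}

/* A boosted value in the real-time band is kept; otherwise normal_prio wins. */
static int model_effective_prio(const struct model_task *t)
{
    assert(t->normal_prio >= 0);
    if (!model_rt_prio(t->prio))
        return t->normal_prio;
    return t->prio;
}
```

A normal task whose prio was temporarily raised to 10 keeps that value, even though its policy is not a real-time one; a task whose prio is outside the real-time band falls back to normal_prio.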

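In summary:

a) For a real-time process: prio = effective_prio() = normal_prio, where normal_prio = MAX_RT_PRIO - 1 - rt_priority.

b) For a normal process whose priority has not been boosted: prio = effective_prio() = normal_prio = static_prio.

c) For a normal process whose priority has been boosted: prio = effective_prio() and normal_prio = static_prio. The value of prio has been changed by other functions, so it differs from its initial value.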

d) The nice value

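The nice value is also used to express the priority of a normal process. It lies in [-20, 19], and again a lower value means a higher priority. As noted earlier, the priorities of normal processes span [100, 139], which corresponds one-to-one with the nice values: priority = nice + 120. The nice value is not a separate priority mechanism; it is only another representation of the same priority. The sys_nice() system call sets the static priority static_prio of the process.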
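From user space, the relation priority = nice + 120 can be observed with the POSIX getpriority() call. The sketch below is an assumption-laden illustration: `self_static_prio` is a hypothetical helper, and the value it returns is derived from the nice value rather than read from the kernel's static_prio field.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: derive the static-priority equivalent (nice + 120)
 * of the calling process. Returns -1 if getpriority() fails.
 */
static int self_static_prio(void)
{
    int nice_val;

    /* getpriority() can legitimately return -1, so errno must be checked. */
    errno = 0;
    nice_val = getpriority(PRIO_PROCESS, 0);
    if (nice_val == -1 && errno != 0)
        return -1;

    assert(nice_val >= -20 && nice_val <= 19);
    return nice_val + 120;
}
```

For a process started with the default nice value of 0, this helper would yield 120.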

3. Computing load weights

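The importance of a process is determined not only by its priority but also by the load weight stored in task_struct->se.load. set_load_weight computes the load weight from the process type and its static priority.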

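As discussed for the order in which processes are scheduled, the parameter that influences a process's position in the run queue is its weight. The weight is determined by the static priority static_prio: the higher the static priority (the smaller the static_prio value), the larger the weight. Static priority and weight are tied together through the prio_to_weight array. A process with static priority 100 (nice -20) has weight prio_to_weight[0], and a process with static priority k (nice k-120) has weight prio_to_weight[k-100].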

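The default nice value of a normal process is 0, so its default static priority is 120 and its weight is prio_to_weight[20], which is 1024. NICE_0_LOAD is therefore 1024, and that is where the name NICE_0_LOAD comes from.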
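The lookup described above can be sketched as follows. The table values are reproduced from the kernel's nice-to-weight table as the author of this sketch recalls them and should be checked against the kernel source in use; `model_prio_to_weight` and `weight_for_static_prio` are hypothetical names.

```c
#include <assert.h>

#define MAX_RT_PRIO 100

/* Weights for nice -20 .. 19; index 20 (nice 0) is 1024. */
static const int model_prio_to_weight[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

/* Hypothetical helper: weight of a normal task with the given static priority. */
static int weight_for_static_prio(int static_prio)
{
    assert(static_prio >= 100 && static_prio <= 139);
    return model_prio_to_weight[static_prio - MAX_RT_PRIO];
}
```

Each step of one nice level changes the weight by a factor of roughly 1.25, which is why the table is precomputed rather than derived at run time.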

An important rule: for a process with nice value 0, virtual runtime (vruntime) advances at the same rate as real runtime (runtime).
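This rule follows from scaling the real runtime by NICE_0_LOAD / weight. The sketch below shows that scaling with plain integer division; `model_vruntime_delta` is a hypothetical helper, and the kernel's own computation uses a precomputed inverse rather than a division.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD 1024

/*
 * Hypothetical helper: how far vruntime advances for delta_exec_ns of real
 * runtime, assuming vruntime grows by delta_exec * NICE_0_LOAD / weight.
 */
static uint64_t model_vruntime_delta(uint64_t delta_exec_ns, uint32_t weight)
{
    assert(weight > 0);
    return delta_exec_ns * NICE_0_LOAD / weight;
}
```

With weight 1024 (nice 0) the result equals the real runtime; a heavier task accumulates vruntime more slowly, and a lighter task more quickly.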

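The weight computation also has to take the process type into account. Real-time processes get twice the weight of normal processes, although the set_load_weight listing below contains no such doubling and special-cases only SCHED_IDLE. SCHED_IDLE processes, on the other hand, always get a very small weight: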

The implementation of set_load_weight:

    static void set_load_weight(struct task_struct *p)
    {
        int prio = p->static_prio - MAX_RT_PRIO;
        struct load_weight *load = &p->se.load;

        /*
         * SCHED_IDLE tasks get minimal weight:
         */
        if (p->policy == SCHED_IDLE) {
            load->weight = scale_load(WEIGHT_IDLEPRIO);
            load->inv_weight = WMULT_IDLEPRIO;
            return;
        }

        /* scale_load is defined as: # define scale_load(w) (w) */
        /*
         * The kernel not only computes the weight itself but also
         * stores the value used for division.
         */
        load->weight = scale_load(prio_to_weight[prio]);
        load->inv_weight = prio_to_wmult[prio];
    }
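The stored "value used for division" lets a division by the weight be replaced with a multiplication and a shift. The sketch below assumes the inverse is roughly 2^32 / weight; `model_inv_weight` and `model_div_by_weight` are hypothetical helpers, and the product is assumed to fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: precompute 2^32 / weight (truncated). */
static uint32_t model_inv_weight(uint32_t weight)
{
    assert(weight > 1);   /* weight 1 would not fit in 32 bits */
    return (uint32_t)((UINT64_C(1) << 32) / weight);
}

/*
 * Hypothetical helper: approximate value / weight as
 * (value * inv_weight) >> 32. The caller must keep the product
 * below 2^64.
 */
static uint64_t model_div_by_weight(uint64_t value, uint32_t inv_weight)
{
    return (value * inv_weight) >> 32;
}
```

For weight 1024 the inverse is 4194304, so dividing 2048 by the weight becomes (2048 * 4194304) >> 32, which is 2.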

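Not only processes but also run queues have an associated load weight. Each time a process is added to a run queue, the kernel calls inc_nr_running. This ensures that the run queue keeps track of how many processes are running, and the weight of the process is also added to the weight of the run queue: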

    static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
    {
        ....
        inc_nr_running(rq);
    }

    static inline void inc_nr_running(struct rq *rq)
    {
        rq->nr_running++;   /* count of processes on the queue */
        .....
    }

    static inline void update_load_add(struct load_weight *lw, unsigned long inc)
    {
        /* inc corresponds to the caller's argument se->load.weight */
        lw->weight += inc;
        lw->inv_weight = 0;
    }

When a process is removed from the run queue, the corresponding functions are called (dec_nr_running, update_load_sub).
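The bookkeeping on enqueue and dequeue can be modeled in user space as below. This is a sketch only: `struct model_rq`, `model_enqueue` and `model_dequeue` are hypothetical names that fold the counter update and the load update into one call each, which the kernel does in separate functions.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct load_weight and struct rq. */
struct model_load_weight {
    unsigned long weight;
    unsigned long inv_weight;
};

struct model_rq {
    unsigned int nr_running;
    struct model_load_weight load;
};

static void model_update_load_add(struct model_load_weight *lw, unsigned long inc)
{
    lw->weight += inc;
    lw->inv_weight = 0;   /* cached inverse is no longer valid */
}

static void model_update_load_sub(struct model_load_weight *lw, unsigned long dec)
{
    assert(lw->weight >= dec);
    lw->weight -= dec;
    lw->inv_weight = 0;
}

/* Enqueue: count the task and add its weight to the queue's load. */
static void model_enqueue(struct model_rq *rq, unsigned long task_weight)
{
    rq->nr_running++;
    model_update_load_add(&rq->load, task_weight);
}

/* Dequeue: the mirror image of model_enqueue. */
static void model_dequeue(struct model_rq *rq, unsigned long task_weight)
{
    assert(rq->nr_running > 0);
    rq->nr_running--;
    model_update_load_sub(&rq->load, task_weight);
}
```

Enqueuing a nice 0 task (weight 1024) and a nice 5 task (weight 335) gives a queue load of 1359; dequeuing the first leaves 335.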
