
[LKDP] updated scheduler/initial.tex


From: KDLinux
Subject: [LKDP] updated scheduler/initial.tex
Date: Mon, 12 Aug 2002 05:12:47 -0700 (PDT)

Hi all,
 Here is the updated version of initial.tex from the process
scheduling document.
 Any comments/suggestions are welcome.



=====
-- KD.
www.kirandivekar.cjb.net

\chapter{Initialization}
        \section{start\_kernel Function}
                When a PC is powered on, the boot code initializes all the hardware 
present in the system and makes sure each device is up and running. After this 
initial hardware boot sequence, the kernel comes into the picture when control is 
transferred to the \textit{start\_kernel} \index{start\_kernel} function. This function 
is defined in the file \url{init/main.c}. Only the first CPU calls 
start\_kernel(), while all others call the function 
initialize\_secondary()\footnote{defined in \url{arch/i386/kernel/smpboot.c}}. 
The start\_kernel function performs the initialization of all the operating system's 
subsystems and of the data structures used by the kernel. It calls 
sub-functions to initialize IRQ handling, the process 
scheduler, the softirq system, kernel timers, the signal mechanism and the SMP (symmetric 
multi-processing) mechanism, while the initialize\_secondary function just 
copies the stack pointer and EIP values from the Task State Segment (TSS).

        \begin{verbatim}

                lock_kernel();
                printk(linux_banner);
                setup_arch(&command_line);
                setup_per_cpu_areas();
                printk("Kernel command line: %s\n", saved_command_line);
                parse_options(command_line);
                trap_init();
                init_IRQ();
                sched_init();
                softirq_init();
                /* timer and memory initialization code */
                fork_init();
                signals_init();
                smp_init();

        \end{verbatim}
 
                \subsection{Macro lock\_kernel}
                The macro lock\_kernel is defined in the file 
\url{include/linux/smp_lock.h}. The macro form is used only on non-SMP systems, 
because there are no inter-CPU locks on single-CPU systems. On i386 SMP 
systems, lock\_kernel is an inline function: \url{include/asm-i386/smplock.h}
        \begin{verbatim}
                extern __inline__ void lock_kernel(void)
                {
                        if (!++current->lock_depth)
                                spin_lock(&kernel_flag);
                }
        \end{verbatim}
 
                So on a non-SMP system, the macro expands to 'do\{\}  while(0)' 
and gets optimised away.
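
                To make the depth-counting idea concrete, here is a minimal 
user-space sketch of the same pattern (an illustration only, not kernel code: 
current->lock\_depth is modelled by a thread-local counter and the 
kernel\_flag spinlock by a pthread mutex). The lock is taken only on the 
transition from depth -1 to 0, so nested lock\_kernel() calls from the same 
task are cheap and never self-deadlock.

        \begin{verbatim}
                #include <pthread.h>

                /* models kernel_flag */
                static pthread_mutex_t kernel_flag =
                        PTHREAD_MUTEX_INITIALIZER;
                /* models current->lock_depth, which starts at -1 */
                static __thread int lock_depth = -1;

                static void lock_kernel(void)
                {
                        if (!++lock_depth)      /* -1 -> 0: first entry */
                                pthread_mutex_lock(&kernel_flag);
                }

                static void unlock_kernel(void)
                {
                        if (--lock_depth < 0)   /* 0 -> -1: last exit */
                                pthread_mutex_unlock(&kernel_flag);
                }
        \end{verbatim}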

                \subsection{Functions trap\_init, init\_IRQ}
                The functions trap\_init and init\_IRQ are architecture 
dependent and perform the initialization of the trap and IRQ hardware. They are 
defined in the architecture-dependent section of the kernel code. Let us look at 
them one by one.

                Function trap\_init() \label{init:trap_init}
                \textit{File: }\url{arch/i386/kernel/traps.c}\\
                This function initializes the IDT with an exception 
handler function for each recognized exception. This job is accomplished 
through the set\_trap\_gate and set\_system\_gate macros. The x86 
microprocessors issue 20 different exceptions (0 - 19), and the kernel must provide 
a dedicated exception handler for each exception type. The table in 
~\ref{appendix 1} shows each exception and its corresponding exception handler, 
along with the signals sent by the exception handlers. 
        \begin{verbatim}
                set_trap_gate(0,&divide_error);
                set_trap_gate(1,&debug);
                set_intr_gate(2,&nmi);
                set_system_gate(3,&int3);       /* int3-5 can be called from all */
                set_system_gate(4,&overflow);
                set_system_gate(5,&bounds);
                set_trap_gate(6,&invalid_op);
                .
                .
                .
                set_trap_gate(19,&simd_coprocessor_error);
                set_call_gate(&default_ldt[0],lcall7);
                set_call_gate(&default_ldt[4],lcall27);

        \end{verbatim}
                All these functions map to the \_set\_gate macro with appropriate 
parameters. Please refer to ~\ref{appendix 2} for details about the different types 
of gates. The macro \_set\_gate is shown below:

        \begin{verbatim}
                #define _set_gate(gate_addr,type,dpl,addr) \
                do { \
                  int __d0, __d1; \
                    __asm__ __volatile__ ("movw %%dx,%%ax\n\t" \
                  "movw %4,%%dx\n\t" \
                  "movl %%eax,%0\n\t" \
                  "movl %%edx,%1" \
                  :"=m" (*((long *) (gate_addr))), \
                   "=m" (*(1+(long *) (gate_addr))), "=&a" (__d0), "=&d" (__d1) 
\
                  :"i" ((short) (0x8000+(dpl<<13)+(type<<8))), \
                   "3" ((char *) (addr)),"2" (__KERNEL_CS << 16)); \
                } while (0)
        \end{verbatim}
        % TODO Macro explanation
        % Setup TSS and LDT in GDT for each task
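
                Since the inline assembly above is dense, the following plain-C 
sketch shows what \_set\_gate computes (an illustration of the descriptor 
layout, not the kernel's actual implementation). An IDT gate descriptor is two 
32-bit words: the first holds the kernel code segment selector and bits 0-15 of 
the handler address; the second holds bits 16-31 of the handler address plus 
the present bit, the descriptor privilege level (dpl) and the gate type.

        \begin{verbatim}
                #include <stdint.h>

                #define KERNEL_CS 0x10   /* __KERNEL_CS on i386 */

                /* type: 14 = interrupt gate, 15 = trap gate;
                   dpl 3 + trap gate = "system gate" */
                static void set_gate_c(uint32_t *gate_addr, unsigned type,
                                       unsigned dpl, uint32_t handler)
                {
                        gate_addr[0] = (KERNEL_CS << 16)
                                     | (handler & 0xffff);
                        gate_addr[1] = (handler & 0xffff0000)
                                     | 0x8000        /* present bit */
                                     | (dpl << 13)
                                     | (type << 8);
                }
        \end{verbatim}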

                Function init\_IRQ() \label{init:init_IRQ}
                \textit{File: }\url{arch/i386/kernel/i8259.c}\\
                This function sets up the interrupt vectors and 
interrupt gates. The Linux kernel uses vectors 0 to 31 for exceptions and 
nonmaskable interrupts, and the remaining vectors for interrupts. 
Precisely, vectors 32 (0x20) to 47 (0x2f) are maskable interrupts caused by IRQs, 
while the remaining vectors, ranging from 48 to 255, may be used to identify 
software interrupts. Of the latter, Linux uses only vector 128 (0x80), 
SYSCALL\_VECTOR, to implement system calls.

        \begin{verbatim}
                for (i = 0; i < NR_IRQS; i++) {
                        /* NR_IRQS = 224,
                           as per include/asm-i386/irq.h */
                        /* FIRST_EXTERNAL_VECTOR = 0x20 i.e. 32,
                           as per include/asm-i386/hw_irq.h */

                        int vector = FIRST_EXTERNAL_VECTOR + i;
                        if (vector != SYSCALL_VECTOR)
                                set_intr_gate(vector, interrupt[i]);
                }
                set_intr_gate(FIRST_DEVICE_VECTOR, interrupt[0]); 
                set_intr_gate(RESCHEDULE_VECTOR, reschedule_interrupt);
                set_intr_gate(INVALIDATE_TLB_VECTOR, invalidate_interrupt);
                set_intr_gate(CALL_FUNCTION_VECTOR, call_function_interrupt);
        \end{verbatim}

                The interrupt gates corresponding to FIRST\_DEVICE\_VECTOR, 
RESCHEDULE\_VECTOR, INVALIDATE\_TLB\_VECTOR and CALL\_FUNCTION\_VECTOR are also set. 
These are special IRQ vectors used by the SMP architecture, generally ranging 
from 0xf0 to 0xff. Refer to \url{include/asm-i386/hw_irq.h} for more details.
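
                The mapping established by the loop above is simple enough to 
check by hand; this small user-space program (illustrative only, with the 
constants copied from the 2.4 i386 headers) prints the vector assigned to each 
external IRQ and shows the single gap left at the system-call vector.

        \begin{verbatim}
                #include <stdio.h>

                #define NR_IRQS               224
                #define FIRST_EXTERNAL_VECTOR 0x20
                #define SYSCALL_VECTOR        0x80

                int main(void)
                {
                        for (int i = 0; i < NR_IRQS; i++) {
                                int vector = FIRST_EXTERNAL_VECTOR + i;
                                if (vector != SYSCALL_VECTOR)
                                        printf("IRQ %3d -> vector 0x%02x\n",
                                               i, vector);
                                else
                                        printf("IRQ %3d -> skipped "
                                               "(system-call gate)\n", i);
                        }
                        return 0;
                }
        \end{verbatim}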

                \subsection{Function sched\_init}
                The process is a basic entity in any Unix-based system. In a 
multitasking environment, a number of processes execute on one or more 
CPUs, and each process gets a fair chance to run on a CPU depending on 
its characteristics. This allocation is done by a special kernel component known 
as the "scheduler". The process scheduler is initialized by calling the function 
sched\_init, defined in \url{kernel/sched.c}.

                \subsubsection{scheduler data structure}
                The runqueue data structure is explained here in order to 
understand the initialization code. The scheduler uses an array of runqueues 
\index{runqueues} as its basic data structure, defined in \url{kernel/sched.c}. 
NR\_CPUS\index{NR\_CPUS}\footnote{defined in \url{include/linux/threads.h}} 
represents the number of CPUs present in the system. Its value is 32 in SMP 
mode and 1 in non-SMP mode.
        \begin{verbatim}
        struct runqueue {
                spinlock_t lock;
                unsigned long nr_running, nr_switches, expired_timestamp;
                signed long nr_uninterruptible;
                task_t *curr, *idle;
                prio_array_t *active, *expired, arrays[2];
                int prev_nr_running[NR_CPUS];
                task_t *migration_thread;
                list_t migration_queue;
        } ____cacheline_aligned;

        static struct runqueue runqueues[NR_CPUS] __cacheline_aligned;
        \end{verbatim}

                The description of the elements of the above structure follows:
        \begin{description}
        \item[lock] Spinlock used to serialize access to this runqueue.
        \item[nr\_running] Total number of runnable processes, i.e. processes in 
the TASK\_RUNNING state.
        \item[task\_t curr, idle] The task currently running on this CPU and the 
CPU's idle task, respectively.\footnote{task\_t is a typedef of task\_struct; see 
\url{include/linux/sched.h}}
        \item[prio\_array\_t arrays] The two priority array structures, each 
containing a priority bitmap of BITMAP\_SIZE bits along with an array of 
MAX\_PRIO linked-list queues.
        \item[migration\_thread] \index{migration\_thread} Migration thread 
associated with the runqueue. Refer to section ~\ref{psched:structs} for more 
details.
        \item[migration\_queue] \index{migration\_queue} Migration queue 
associated with the runqueue. Refer to section ~\ref{psched:structs} for more 
details.
        \end{description}

                \par Each process has a process descriptor associated with it. 
This process information is stored in struct task\_struct, defined in 
\url{include/linux/sched.h}. All process descriptors are linked together by the 
process list, while the runqueue list links together the process descriptors of 
all runnable processes. In both cases, the init\_task process descriptor acts as 
the list header.

                Function sched\_init() \label{init:sched_init}
                \par The function sched\_init initializes the runqueue data 
structure for every CPU. Each runqueue contains two copies of the priority array 
structure, which are initialized as the active and expired arrays. The 
INIT\_LIST\_HEAD macro initializes each entry of the linked-list array (queues) 
of the priority structure, of size MAX\_PRIO (a value greater than any user task 
priority; see \url{include/linux/sched.h} for details), and the priority bitmap 
is cleared.

        \begin{verbatim}
                for (i = 0; i < NR_CPUS; i++) {
                        prio_array_t *array;
                        rq = cpu_rq(i);
                        rq->active = rq->arrays;
                        rq->expired = rq->arrays + 1;
                        spin_lock_init(&rq->lock);
                        INIT_LIST_HEAD(&rq->migration_queue);
                        for (j = 0; j < 2; j++) {
                                array = rq->arrays + j;
                                for (k = 0; k < MAX_PRIO; k++) {
                                        INIT_LIST_HEAD(array->queue + k);
                                        __clear_bit(k, array->bitmap);
                                        /* see include/asm-i386/bitops.h */
                                }
                                /* delimiter for bitsearch */
                                __set_bit(MAX_PRIO, array->bitmap);
                        }
                }
        \end{verbatim}
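
                The purpose of the \_\_set\_bit(MAX\_PRIO, ...) call becomes 
clear when one looks at how the scheduler picks the next task: it scans the 
priority bitmap for the first set bit and dequeues a task from the 
corresponding queue. The permanently set bit at index MAX\_PRIO acts as a 
sentinel so the scan always terminates. The following user-space sketch models 
this lookup (an illustration only; the kernel uses sched\_find\_first\_bit 
and list\_t queues).

        \begin{verbatim}
                #include <stdio.h>

                #define MAX_PRIO     140
                #define BITMAP_WORDS ((MAX_PRIO + 1 + 31) / 32)

                /* bit p set <=> queue[p] is non-empty */
                static unsigned int bitmap[BITMAP_WORDS];

                static void set_bit_model(int p)
                {
                        bitmap[p / 32] |= 1u << (p % 32);
                }

                static int find_first_bit_model(void)
                {
                        for (int p = 0; p <= MAX_PRIO; p++)
                                if (bitmap[p / 32] & (1u << (p % 32)))
                                        return p;
                        return MAX_PRIO;  /* not reached: sentinel is set */
                }

                int main(void)
                {
                        set_bit_model(MAX_PRIO); /* sentinel, as in sched_init */
                        set_bit_model(120);      /* runnable task at prio 120 */

                        int prio = find_first_bit_model();
                        if (prio == MAX_PRIO)
                                printf("no runnable task\n");
                        else
                                printf("next task from queue[%d]\n", prio);
                        return 0;
                }
        \end{verbatim}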

                The sched\_init function then sets up the runqueue of the 
current CPU, pointing its "curr" and "idle" fields at the current (init) task 
and waking it up. It then initializes all the timer vectors by calling the 
function init\_timervecs\footnote{kernel/timer.c} and initializes the bottom 
halves associated with the task queue (TQUEUE\_BH) and the immediate queue 
(IMMEDIATE\_BH). The function wake\_up\_process is explained in 
~\ref{Scheduler Chapter}. Also, refer to section ~\ref{int:bh} for more 
information about \texttt{bottom halves}.

        \begin{verbatim}
                rq = this_rq();
                rq->curr = current;
                rq->idle = current;
                wake_up_process(current);

                init_timervecs();
                init_bh(TQUEUE_BH, tqueue_bh);
                init_bh(IMMEDIATE_BH, immediate_bh);
        \end{verbatim}

                \subsection{Function softirq\_init}
                The concepts of tasklets and softirqs were introduced in kernel 
version 2.4, and the primary tasklet\_struct is defined in 
\url{include/linux/interrupt.h}. Don't forget to read the properties of 
tasklets in that include file.
                Softirqs were introduced to take advantage of the multiple 
processors of an SMP system, allowing each CPU to run a softirq. Softirqs are 
thus a multithreaded analogue of bottom halves\index{bottom half} which can run 
on multiple CPUs at once. The deprecated bottom halves are reimplemented using 
softirqs. Both bottom halves and softirqs are statically registered. The 2.4 
Linux kernel also introduced tasklets, which are dynamically registrable 
softirqs that are guaranteed to run on only one CPU at a time. Refer to section 
~\ref{int:tasklet} for more information about \texttt{Tasklets}.

                Function softirq\_init() \label{init:softirq_init}
                \textit{File: }\url{kernel/softirq.c}\\
                The softirq\_init function initializes all the tasklets 
\index{tasklets} by calling the tasklet\_init function. A global array, struct 
tasklet\_struct bh\_task\_vec[32], is used to initialize these tasklets, which 
are associated with the 32 (0x00-0x1F) legacy bottom-half slots. The function 
tasklet\_init associates the function bh\_action as the handler for all bottom 
halves from 0 to 31.
                \par The function open\_softirq initializes the softirqs 
related to TASKLET\_SOFTIRQ and HI\_SOFTIRQ\footnote{defined in 
\url{include/linux/interrupt.h}}. The function pointers tasklet\_action and 
tasklet\_hi\_action are stored in an array of softirq actions (each consisting 
of a function and data).
        \begin{verbatim}
                static struct softirq_action softirq_vec[32] __cacheline_aligned_in_smp;
        \end{verbatim}

        \begin{verbatim}
                for (i=0; i<32; i++)
                        tasklet_init(bh_task_vec+i, bh_action, i);

                open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
                open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
        \end{verbatim}
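
                The shape of the dispatch that open\_softirq prepares can be 
modelled in a few lines of user-space C (an illustration only, not the kernel's 
code): raising a softirq sets a pending bit, and a do\_softirq-like loop later 
invokes the registered action of every pending entry.

        \begin{verbatim}
                #include <stdio.h>

                struct softirq_action {
                        void (*action)(struct softirq_action *);
                        void *data;
                };

                static struct softirq_action softirq_vec[32];
                static unsigned int pending; /* bit n set => softirq n raised */

                static void open_softirq(int nr,
                                void (*action)(struct softirq_action *),
                                void *data)
                {
                        softirq_vec[nr].action = action;
                        softirq_vec[nr].data = data;
                }

                static void demo_action(struct softirq_action *a)
                {
                        printf("softirq ran, data=%p\n", a->data);
                }

                int main(void)
                {
                        open_softirq(0, demo_action, NULL); /* HI_SOFTIRQ = 0 */
                        pending |= 1u << 0;                 /* raise it */

                        /* do_softirq-like dispatch over pending bits */
                        unsigned int p = pending;
                        pending = 0;
                        for (int nr = 0; p; nr++, p >>= 1)
                                if (p & 1)
                                        softirq_vec[nr].action(&softirq_vec[nr]);
                        return 0;
                }
        \end{verbatim}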

                \subsection{Function fork\_init}
                Function fork\_init() \label{init:fork_init}
                \textit{File: }\url{kernel/fork.c}\\
                The fork\_init function calls 
kmem\_cache\_create\footnote{Refer to \url{mm/slab.c} for more details of slab-based 
memory allocation} to initialize the slab cache for task\_structs. 
It also determines the maximum number of threads depending upon the physical 
memory available. The name parameter passed is \texttt{task\_struct}, which can 
be found in the file \url{/proc/slabinfo}. The max\_threads value is used to 
set the resource limits for the init\_task.\footnote{The init\_task is defined 
in a special file containing only data, \url{arch/i386/kernel/init_task.c}, which 
calls the macro INIT\_TASK defined in \url{include/linux/init_task.h}}

        \begin{verbatim}
                task_struct_cachep =
                       kmem_cache_create("task_struct",
                                          sizeof(struct task_struct),0,
                                          SLAB_HWCACHE_ALIGN, NULL, NULL);
                if (!task_struct_cachep)
                       panic("fork_init(): cannot create task_struct SLAB 
cache");

                max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 8;

                init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
                init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
        \end{verbatim}

        \begin{verbatim}
                /* Please note:
                 * the value for mempages is set in the function setup_arch()
                 * in arch/i386/kernel/setup.c 
                 */
                #define THREAD_SIZE (2*PAGE_SIZE)  /* include/asm-i386/thread_info.h */
                /* Effectively, max_threads = mempages / (2*8); */
        \end{verbatim}
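
                A worked example makes the formula concrete (the 512 MB figure 
is an assumption for illustration): with 4 KB pages, a 512 MB machine has 
mempages = 131072 and THREAD\_SIZE/PAGE\_SIZE = 2, so max\_threads = 131072 / 2 
/ 8 = 8192. Equivalently, at most one eighth of physical memory can be consumed 
by the 8 KB kernel stacks of all threads.

        \begin{verbatim}
                #include <stdio.h>

                #define PAGE_SIZE   4096UL
                #define THREAD_SIZE (2 * PAGE_SIZE)

                int main(void)
                {
                        /* assumed machine: 512 MB of RAM */
                        unsigned long mempages = (512UL << 20) / PAGE_SIZE;
                        unsigned long max_threads =
                                mempages / (THREAD_SIZE / PAGE_SIZE) / 8;

                        printf("max_threads = %lu\n", max_threads); /* 8192 */
                        return 0;
                }
        \end{verbatim}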

                \subsection{Function signals\_init}
                The signals\_init function also calls kmem\_cache\_create, to 
initialize the slab cache related to signals. The code can be found in 
\url{kernel/signal.c}. The name parameter passed is \texttt{sigqueue}, which 
can be found in the file \url{/proc/slabinfo}.

        \begin{verbatim}
        sigqueue_cachep =
                kmem_cache_create("sigqueue",
                                  sizeof(struct sigqueue),
                                  __alignof__(struct sigqueue),
                                  SIG_SLAB_DEBUG, NULL, NULL);
        if (!sigqueue_cachep)
                panic("signals_init(): cannot create sigqueue SLAB cache");
        \end{verbatim}

                \subsection{Function smp\_init}
                The smp\_init function is architecture dependent and is used to 
perform SMP initialization of all CPUs. Its code invokes the functions 
smp\_boot\_cpus and do\_boot\_cpu from \url{arch/i386/kernel/smpboot.c}. These 
functions perform basic SMP-related initialization. Refer to section 
\ref{smp:init} for more details on SMP initialization and scheduling.

        \begin{verbatim}
                smp_boot_cpus();
                smp_threads_ready=1;
                smp_commence();
        \end{verbatim}

                \par After completing all this initialization work, what does 
the kernel do? Correct, it sits idle. The function 
\textit{cpu\_idle}\label{init:idle} is architecture dependent and is defined in 
\url{arch/i386/kernel/process.c}. The kernel waits for some process to get 
scheduled using the \textit{schedule()} function.
        \begin{verbatim}
                while(1) {
                        while (!need_resched())
                                idle();
                        schedule();
                }
        \end{verbatim}
