Understanding the Linux Kernel, Chapter 4: Interrupts and Exceptions

Interrupts and Exceptions

classification

  • Interrupts
    • Maskable interrupts
    • Nonmaskable interrupts
  • Exceptions
    • Processor-detected exceptions
      • Faults
      • Traps
      • Aborts
    • Programmed exceptions

IRQs and Interrupts

A hardware device controller that can issue interrupt requests (IRQs) usually has a single output line called the Interrupt Request (IRQ) line; more sophisticated devices use several IRQ lines. All IRQ lines are connected to the input pins of a Programmable Interrupt Controller (PIC).

Each IRQ line can be selectively disabled. The PIC can be told to stop issuing interrupts that refer to a given IRQ line, or to resume issuing them. Disabled interrupts are not lost; the PIC sends them to the CPU as soon as they are enabled again.

Selective enabling/disabling of IRQs is not the same as global masking/unmasking of maskable interrupts. When the IF flag of the eflags register is clear, each maskable interrupt issued by the PIC is temporarily ignored by the CPU. The cli and sti assembly language instructions clear and set that flag, respectively.
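The difference between the two mechanisms can be sketched with a tiny model. This is only an illustration: the toy_pic structure, the 16-line width, and the "lowest number wins" ordering are assumptions made for the example, not how a real PIC is programmed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a PIC with 16 IRQ lines. */
struct toy_pic {
    uint16_t masked;   /* bit n set: IRQ line n is selectively disabled */
    uint16_t pending;  /* bit n set: IRQ n was raised and not yet delivered */
};

/* Returns the IRQ that could be delivered right now, or -1 if none.
 * if_flag models the IF flag of eflags (0 after cli, 1 after sti). */
int toy_next_irq(const struct toy_pic *pic, int if_flag)
{
    assert(pic != 0);
    if (!if_flag)
        return -1;  /* global masking: the CPU ignores every maskable IRQ */

    /* selective disabling: a masked IRQ stays pending, it is not lost */
    uint16_t ready = pic->pending & (uint16_t)~pic->masked;
    for (int n = 0; n < 16; n++)
        if (ready & (1u << n))
            return n;
    return -1;
}
```

A pending IRQ on a masked line is reported only after the mask bit is cleared, and nothing is reported at all while if_flag is 0.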

The Advanced Programmable Interrupt Controller(APIC)

Designed for multiprocessor systems: each CPU includes a local APIC, and all local APICs are connected to an external I/O APIC, giving rise to a multi-APIC system.


distributing interrupts
The I/O APIC uses the Interrupt Redirection Table to determine, for each IRQ, the interrupt vector and priority, the destination processor, and how the processor is selected. All of these entries are programmable.

Interrupt requests coming from external hardware devices can be distributed among the available CPUs in two ways:

  • Static distribution

The IRQ signal is delivered to the local APICs listed in the corresponding Redirection Table entry.

  • Dynamic distribution

The IRQ signal is delivered to the local APIC of the processor that is executing the process with the lowest priority.

generating interprocessor interrupts

When a CPU wishes to send an interrupt to another CPU, it stores the interrupt vector and the identifier of the target’s local APIC in the Interrupt Command Register (ICR) of its own local APIC. Interprocessor interrupts (IPIs) are actively used by Linux to exchange messages among CPUs.

Exceptions

Interrupt Descriptor Table

A system table called Interrupt Descriptor Table (IDT) associates each interrupt or exception vector with the address of the corresponding interrupt or exception handler.

idtr CPU register
Stores the base address and the limit (maximum length) of the IDT.

Three types of descriptors are stored in the IDT (Intel classification)
Linux uses interrupt gates to handle interrupts and trap gates to handle exceptions.

  • Task gate

Includes the TSS(Task State Segment) selector of the process that must replace the current one when an interrupt signal occurs.

  • Interrupt gate

Includes the Segment Selector and the offset inside the segment of an interrupt or exception handler. While transferring control to the handler, the processor clears the IF flag, thus disabling maskable interrupts.

  • Trap gate

Same as an interrupt gate, except that the processor does not modify the IF flag while transferring control to the handler.
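As a rough illustration of what an interrupt or trap gate contains, the sketch below packs a 32-bit protected-mode gate descriptor into a 64-bit value. The function and enum names are made up for this example, and the selector value used in the usage note is arbitrary; only the bit positions follow the Intel descriptor format.

```c
#include <assert.h>
#include <stdint.h>

/* Type field values of 32-bit gates (Intel encoding). */
enum gate_type { GATE_TASK = 0x5, GATE_INTERRUPT = 0xE, GATE_TRAP = 0xF };

/* Hypothetical helper: builds the 8-byte gate descriptor. */
uint64_t make_gate(uint32_t offset, uint16_t selector,
                   enum gate_type type, unsigned dpl)
{
    assert(dpl <= 3);
    uint64_t d = 0;
    d |= (uint64_t)(offset & 0xFFFFu);    /* bits 0-15: handler offset, low half */
    d |= (uint64_t)selector << 16;        /* bits 16-31: Segment Selector */
    d |= (uint64_t)type << 40;            /* bits 40-43: gate type */
    d |= (uint64_t)dpl << 45;             /* bits 45-46: DPL of the gate */
    d |= (uint64_t)1 << 47;               /* bit 47: present */
    d |= (uint64_t)(offset >> 16) << 48;  /* bits 48-63: handler offset, high half */
    return d;
}

/* Extracts the DPL field from a gate descriptor. */
unsigned gate_dpl(uint64_t d)
{
    return (unsigned)(d >> 45) & 3u;
}
```

For instance, make_gate(0x12345678, 0x60, GATE_INTERRUPT, 0) yields 0x12348E0000605678, where 0x8E is the familiar "present, DPL 0, interrupt gate" byte. A task gate stores a TSS selector and ignores the offset, so this helper only really models interrupt and trap gates.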

Hardware handling of Interrupts and Exceptions

privilege check of exception handler

Makes sure the interrupt was issued by an authorized source. First, it compares the Current Privilege Level (CPL), which is stored in the two least significant bits of the cs register, with the Descriptor Privilege Level (DPL) of the Segment Descriptor included in the GDT. Raises a “General protection” exception if the CPL is lower than the DPL, because the interrupt handler cannot have a lower privilege than the program that caused the interrupt. For programmed exceptions, makes a further security check: compares the CPL with the DPL of the gate descriptor included in the IDT and raises a “General protection” exception if the DPL is lower than the CPL. This last check makes it possible to prevent access by user applications to specific trap or interrupt gates.
(In other words: for programmed exceptions, the current code must be at least as privileged as the gate's DPL requires in order to use the gate, and the handler that the gate points to must be at least as privileged as the interrupted code so that it can resolve the problem.)

In conclusion, the DPL of the handler's Segment Descriptor is the numerically lowest CPL (the most privileged code) that the handler may interrupt. Conversely, the DPL of the IDT gate descriptor is the numerically highest CPL (the least privileged code) that may invoke the gate through a programmed exception.
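The two comparisons can be written down as a small predicate. This is a minimal sketch of the checks described above, with a hypothetical function name; it ignores everything else the control unit does.

```c
#include <assert.h>

/* Returns 1 if the control transfer is allowed, 0 if the CPU would raise
 * a "General protection" exception.
 *   cpl             current privilege level (0 = kernel, 3 = user)
 *   handler_seg_dpl DPL of the Segment Descriptor of the handler's code segment
 *   gate_dpl        DPL of the gate descriptor in the IDT
 *   programmed      nonzero for programmed exceptions (int, int3, into, bound) */
int interrupt_allowed(unsigned cpl, unsigned handler_seg_dpl,
                      unsigned gate_dpl, int programmed)
{
    assert(cpl <= 3 && handler_seg_dpl <= 3 && gate_dpl <= 3);

    /* the handler cannot be less privileged than the interrupted code */
    if (cpl < handler_seg_dpl)
        return 0;

    /* only programmed exceptions are checked against the gate's DPL */
    if (programmed && gate_dpl < cpl)
        return 0;

    return 1;
}
```

With these rules a User Mode int $0x80 through a gate with DPL 3 passes, a User Mode int through a gate with DPL 0 fails, and a hardware interrupt arriving in User Mode passes regardless of the gate's DPL.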

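prepare stage
find the handler (check privileges) ---> save state, switching stack if the privilege level changes (the new stack comes from the TSS) ---> jump to the handler

return from handler
reload the state saved on the stack (switching back to the old stack if it was changed) ---> clear any of the ds, es, fs, and gs segment registers whose selector refers to a segment with a DPL lower than the new CPL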

Nested Execution of Interrupts and Exceptions

for exception

Because the “Page Fault” exception handler never gives rise to further exceptions, at most two kernel control paths associated with exceptions (the first one caused by a system call invocation, the second one caused by a Page Fault) may be stacked, one on top of the other.

for interrupts

An interrupt handler may preempt both other interrupt handlers and exception handlers. Conversely, an exception handler never preempts an interrupt handler. The only exception that can be triggered in Kernel Mode is “Page Fault”, which we just described. But interrupt handlers never perform operations that can induce page faults, and thus, potentially, a process switch.

On multiprocessor systems, several kernel control paths may execute concurrently. Moreover, a kernel control path associated with an exception may start executing on a CPU and, due to a process switch, migrate to another CPU (no process switch can take place while an interrupt handler is running).

Initializing the Interrupt Descriptor Table

Interrupt, Trap, and System Gates

the Linux classification of gate descriptors in the IDT

  • Interrupt gate

An Intel interrupt gate that cannot be accessed by a User Mode process (the gate’s DPL field is equal to 0), which means that user code cannot invoke this gate.

  • System gate

An Intel trap gate that can be accessed by a User Mode process (the gate’s DPL field is equal to 3). The three Linux exception handlers associated with the vectors 4, 5, and 128 are activated by means of system gates, so the three assembly language instructions into, bound, and int $0x80 can be issued in User Mode.

  • System interrupt gate

An Intel interrupt gate that can be accessed by a User Mode process (the gate’s DPL field is equal to 3). The exception handler associated with the vector 3 is activated by means of a system interrupt gate, so the assembly language instruction int3 can be issued in User Mode.

  • Trap gate

An Intel trap gate that cannot be accessed by a User Mode process (the gate’s DPL field is equal to 0). Most Linux exception handlers are activated by means of trap gates.

  • Task gate

An Intel task gate that cannot be accessed by a User Mode process (the gate’s DPL field is equal to 0). The Linux handler for the “Double fault” exception is activated by means of a task gate.

set gate in IDT

set_intr_gate(n,addr);
set_system_gate(n,addr);
set_system_intr_gate(n,addr);
set_trap_gate(n,addr);
set_task_gate(n,gdt);
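The five helpers differ only in the Intel gate type and the DPL they store. The sketch below models that mapping on a toy table; every toy_ name is hypothetical, and the numeric types (14 = interrupt gate, 15 = trap gate, 5 = task gate) are the Intel type codes, not a copy of the kernel source.

```c
#include <assert.h>

/* Toy IDT entry: keeps only the fields that distinguish the Linux gates. */
struct toy_gate {
    unsigned type;       /* 14 interrupt gate, 15 trap gate, 5 task gate */
    unsigned dpl;        /* 0: kernel only, 3: reachable from User Mode */
    unsigned long addr;  /* handler address, or GDT entry for a task gate */
};

struct toy_gate toy_idt[256];

static void toy_set_gate(int n, unsigned type, unsigned dpl, unsigned long addr)
{
    assert(n >= 0 && n < 256);
    toy_idt[n].type = type;
    toy_idt[n].dpl = dpl;
    toy_idt[n].addr = addr;
}

void toy_set_intr_gate(int n, unsigned long addr)        { toy_set_gate(n, 14, 0, addr); }
void toy_set_system_intr_gate(int n, unsigned long addr) { toy_set_gate(n, 14, 3, addr); }
void toy_set_trap_gate(int n, unsigned long addr)        { toy_set_gate(n, 15, 0, addr); }
void toy_set_system_gate(int n, unsigned long addr)      { toy_set_gate(n, 15, 3, addr); }
void toy_set_task_gate(int n, unsigned long gdt_entry)   { toy_set_gate(n, 5, 0, gdt_entry); }
```

Calling toy_set_system_gate(128, addr) leaves a trap gate with DPL 3 at vector 128, which is what lets int $0x80 be issued in User Mode.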

Preliminary Initialization of the IDT

During kernel initialization, the setup_idt( ) assembly language function starts by filling all 256 entries of idt_table with the same interrupt gate, which refers to the ignore_int( ) interrupt handler. ignore_int() is a null handler: it only prints an “Unknown interrupt” message and returns.

The ignore_int( ) handler should never be executed. The occurrence of “Unknown interrupt” messages on the console or in the log files denotes either a hardware problem (an I/O device is issuing unforeseen interrupts) or a kernel problem (an interrupt or exception is not being handled properly).

Exception Handling

Most exceptions issued by the CPU are interpreted by Linux as error conditions. When one of them occurs, the kernel sends a signal to the process that caused the exception to notify it of an anomalous condition.

three steps

  • Save the contents of most registers in the Kernel Mode stack (this part is coded in assembly language)
  • Handle the exception by means of a high-level C function
  • Exit from the handler by means of the ret_from_exception( ) function.

handle the Double fault

The “Double fault” exception is handled by means of a task gate instead of a trap or system gate, because it denotes a serious kernel misbehavior.
Thus, the exception handler that tries to print out the register values does not trust the current value of the esp register. When such an exception occurs, the CPU fetches the Task Gate Descriptor stored in the entry at index 8 of the IDT. This descriptor points to the special TSS segment descriptor stored in the 32nd entry of the GDT. Next, the CPU loads the eip and esp registers with the values stored in the corresponding TSS segment. As a result, the processor executes the doublefault_fn() exception handler on its own private stack.

Saving the Registers for the Exception handler

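The assembly stub of each exception saves the error code (or a null value if the CPU did not supply one) and the address of the high-level C handler on the stack, then jumps to the assembly code labeled error_code, which:

  • saves the registers and prepares the arguments needed to invoke the handler
  • invokes the handler whose address was stored on the stack.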

Entering and leaving the Exception handler

steps:

  • do_exception_name() ---> do_trap() (stores the error code and the exception vector in current->thread, then sends a signal to the process)
  • check whether the exception occurred in User Mode or in Kernel Mode; if in Kernel Mode:
    • case 0: kernel bug; invoke die() to print the contents of the CPU registers on the console and call do_exit() to terminate the current process.
    • case 1: invalid argument passed to a system call.
  • jump to the code labeled ret_from_exception.

Interrupt Handling

three main classes of interrupts

  • I/O interrupts

    An I/O device requires attention; the corresponding interrupt handler must query the device to determine the proper course of action.

  • Timer interrupts

    Some timer, either a local APIC timer or an external timer, has issued an interrupt; this kind of interrupt tells the kernel that a fixed-time interval has elapsed. These interrupts are handled mostly as I/O interrupts.

  • Interprocessor interrupts

    A CPU issued an interrupt to another CPU of a multiprocessor system.

I/O Interrupt Handling

Several devices might share the same IRQ line, which is achieved in two ways:

  • IRQ sharing (the interrupt handler executes several interrupt service routines (ISRs), one for each device sharing the line)
  • IRQ dynamic allocation

Linux divides the actions to be performed following an interrupt into three classes:

  • Critical

Actions such as acknowledging an interrupt to the PIC, reprogramming the PIC or the device controller, or updating data structures accessed by both the device and the processor. These must be performed as soon as possible, so they are executed within the interrupt handler immediately, with maskable interrupts disabled.

  • Noncritical

Actions such as updating data structures that are accessed only by the processor (for instance, reading the scan code after a keyboard key has been pushed). These actions can also finish quickly, so they are executed by the interrupt handler immediately, with the interrupts enabled.

  • Noncritical deferrable

Actions such as copying a buffer’s contents into the address space of a process. These may be delayed for a long time interval without affecting the kernel operations; the interested process will just keep waiting for the data.

basic action of I/O handler:

  • Save the IRQ value and the register’s contents on the Kernel Mode stack.
  • Send an acknowledgment to the PIC that is servicing the IRQ line
  • Execute the interrupt service routines (ISRs) associated with all the devices that share the IRQ.
  • Terminate by jumping to the ret_from_intr() address.

IRQ data structures

unexpected interrupt

An interrupt is unexpected either if there is no ISR associated with the IRQ line, or if no ISR associated with the line recognizes the interrupt as raised by its own hardware device.

1.irq_desc_t

Every interrupt vector has its own irq_desc_t descriptor. All such descriptors are grouped together in the irq_desc array.

status
Stores flags describing the IRQ line status.

IRQ_INPROGRESS
IRQ_DISABLED // The IRQ line has been deliberately disabled by a device driver.
IRQ_PENDING
IRQ_REPLAY
IRQ_AUTODETECT
IRQ_WAITING
IRQ_LEVEL
IRQ_MASKED
IRQ_PER_CPU

depth

The depth field and the IRQ_DISABLED flag of the irq_desc_t descriptor specify whether the IRQ line is enabled or disabled.

Every time the disable_irq() or disable_irq_nosync() function is invoked, the depth field is increased; if depth was equal to 0 before the increase, the function disables the IRQ line and sets its IRQ_DISABLED flag. Conversely, each invocation of the enable_irq() function decreases the field; if depth becomes 0, the function enables the IRQ line and clears its IRQ_DISABLED flag.
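The nesting behaviour of the depth field can be sketched as follows. The structure and function names are hypothetical stand-ins for the example, and the model leaves out locking and the actual PIC programming.

```c
#include <assert.h>

/* Toy version of the two fields involved. */
struct toy_irq_desc {
    unsigned depth;  /* number of outstanding disable requests */
    int disabled;    /* models the IRQ_DISABLED flag */
};

void toy_disable_irq(struct toy_irq_desc *d)
{
    /* only the first disable request actually disables the line */
    if (d->depth++ == 0)
        d->disabled = 1;
}

void toy_enable_irq(struct toy_irq_desc *d)
{
    assert(d->depth > 0);  /* unbalanced enable would be a bug */
    /* only the last enable request actually re-enables the line */
    if (--d->depth == 0)
        d->disabled = 0;
}
```

Two disable calls followed by one enable call leave the line disabled; it is enabled again only after the second enable call.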

handler

Points to the PIC object (a hw_irq_controller descriptor, described under hw_interrupt_type below) that services the IRQ line.

action

Identifies the interrupt service routines (ISRs) to be invoked when the IRQ occurs. The field points to the first element of the list of irqaction descriptors associated with the IRQ. Each irqaction descriptor describes one device sharing this IRQ line (see below).

2.hw_interrupt_type

PIC objects, consisting of the PIC name and seven PIC standard methods.

3.irqaction

Each irqaction refers to a specific hardware device and a specific interrupt.

handler

Points to the interrupt service routine(ISR) for an I/O device. This is the key field that allows many devices to share the same IRQ.

flags

This field includes a few flags that describe the relationships between the IRQ line and the I/O device.

SA_INTERRUPT//The handler must execute with interrupts disabled
SA_SHIRQ//The device permits its IRQ line to be shared with other devices
SA_SAMPLE_RANDOM

next

Points to the next element of a list of irqaction descriptors. The elements in the list refer to hardware devices that share the same IRQ.

4.irq_stat

the irq_stat array includes NR_CPUS entries, one for every possible CPU in the system. Each entry of type irq_cpustat_t includes a few counters and flags used by the kernel to keep track of what each CPU is currently doing.

IRQ distribution in multiprocessor systems

TPR(task priority register)

arbitration priority registers

If two or more CPUs share the same (lowest) task priority, the interrupt is assigned by arbitration, based on the arbitration priority registers of the local APICs.

IRQ affinity of a CPU

IRQ affinity allows interrupts to be routed to specific CPUs by hand when the automatic distribution is unfair. Linux 2.6 makes use of a special kernel thread called kirqd to correct, if necessary, the automatic assignment of IRQs to CPUs.

By modifying the Interrupt Redirection Table entries of the I/O APIC, it is possible to route an interrupt signal to a specific CPU. This can be done by invoking the set_ioapic_affinity_irq() function, or by changing the CPU bitmap mask in the /proc/irq/n/smp_affinity file (n denotes the interrupt vector).

Multiple Kernel Mode stacks

If the thread_union structure is 8 KB, the Kernel Mode stack of the current process is used for every type of kernel control path. Conversely, if it is 4 KB, there are three types of stack:

  • The exception stack

Used when handling exceptions (including system calls). It is contained in the per-process thread_union data structure.

  • The hard IRQ stack

Used when handling interrupts. There is one hard IRQ stack for each CPU in the system, and each stack is contained in a single page frame.

  • The soft IRQ stack

Used when handling deferrable functions (softirqs or tasklets). There is one soft IRQ stack for each CPU in the system, and each stack is contained in a single page frame.

All hard IRQ stacks are contained in the hardirq_stack array, while all soft IRQ stacks are contained in the softirq_stack array. Each array element is a union of type irq_ctx that spans a single page. The thread_info structure is stored at the bottom of this page, while the spare memory locations are used for the stack.
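Because the thread_info structure sits at the bottom of a page-aligned block, it can be located from any stack pointer inside that block by masking off the low bits. The sketch below illustrates the idea with made-up toy_ types; it assumes a 4 KB, page-aligned stack area and a trivially small thread_info.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_THREAD_SIZE 4096u  /* assumed: one page frame per stack */

/* Stand-in for thread_info; the real structure has more fields. */
struct toy_thread_info {
    int preempt_count;
};

/* Stand-in for irq_ctx: thread_info at the lowest addresses,
 * the stack growing down from the top of the same page. */
union toy_irq_ctx {
    struct toy_thread_info tinfo;
    uint32_t stack[TOY_THREAD_SIZE / sizeof(uint32_t)];
};

/* Address of the thread_info that belongs to the stack containing sp. */
uintptr_t toy_thread_info_addr(uintptr_t sp)
{
    assert(sp != 0);
    return sp & ~(uintptr_t)(TOY_THREAD_SIZE - 1);
}
```

For example, a stack pointer of 0xC0105F40 maps to the thread_info at 0xC0105000.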

execution of interrupt handler

functions interrupt[n]

interrupt[n] is used to initialize entries in IDT.

Initialization of the IDT
for (i = 0; i < NR_IRQS; i++)
    if (i+32 != 128)   /* vector 128 (0x80) is reserved for system calls */
        set_intr_gate(i+32, interrupt[i]);

The interrupt array is built through a few assembly language instructions.

interrupt[n] stores the address of the following code
pushl $n-256
jmp common_interrupt

The kernel represents all IRQs through negative numbers, because it reserves positive interrupt numbers to identify system calls.
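A small sketch of this encoding: the stub pushes n-256, which is always negative for n in 0..255, and the IRQ number can be recovered later by keeping only the low byte. The function names are hypothetical, and treating the saved value as a 32-bit signed integer is an assumption of the example.

```c
#include <assert.h>
#include <stdint.h>

/* Value pushed on the stack by the interrupt[n] stub. */
int32_t toy_encode_irq(unsigned n)
{
    assert(n < 256);
    return (int32_t)n - 256;  /* negative, so it cannot be mistaken for a system call number */
}

/* Recovers n from the saved value by masking the low 8 bits. */
unsigned toy_decode_irq(int32_t orig_eax)
{
    return (unsigned)orig_eax & 0xFFu;
}
```

For IRQ 6 the pushed value is -250 (0xFFFFFF06 in two's complement), and masking with 0xFF gives back 6.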

common_interrupt

The code labeled common_interrupt saves the registers, calls the do_IRQ() function, and then jumps to ret_from_intr to return.

common_interrupt
common_interrupt:
 SAVE_ALL
 movl %esp,%eax
 call do_IRQ
 jmp ret_from_intr

eax points to the stack location containing the last register value pushed on by SAVE_ALL.

The do_IRQ() function

The do_IRQ() function is invoked to execute all interrupt service routines associated with an interrupt. It is declared as follows:

do_IRQ()
__attribute__((regparm(3))) unsigned int do_IRQ(struct pt_regs *regs)

The regparm keyword instructs the function to go to the eax register to find the value of the regs argument.

steps performed by do_IRQ()

  • Call irq_enter() to increase the hardirq counter in preempt_count, which represents the number of nested interrupt handlers.
  • If thread_union is 4 KB, switch to the hard IRQ stack if needed.
  • Invoke the __do_IRQ() function, passing to it the pointer regs and the IRQ number obtained from the regs->orig_eax field.
  • If the stack was switched to the hard IRQ stack, switch back.
  • Call the irq_exit() macro to decrease the counter and check whether deferrable functions are waiting to be executed.

The __do_IRQ() function

The __do_IRQ() function receives as its parameters an IRQ number (through the eax register) and a pointer to the pt_regs structure where the User Mode register values have been saved.

steps

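  • Acknowledge the interrupt (ack method of the PIC object) so that the local CPU accepts no further interrupts of this type until the handler terminates; they can still be accepted by other CPUs. The function itself runs with local interrupts disabled.
  • Set a few flags of the irq_desc_t descriptor.
  • Check whether the IRQ line is disabled (IRQ_DISABLED) or is already being handled by another CPU (IRQ_INPROGRESS); if so, do nothing further.
  • Otherwise, set the IRQ_INPROGRESS flag and invoke handle_IRQ_event().
  • Invoke the end method of the PIC object (irq_desc_t->handler->end).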

handle_IRQ_event()

  • Enables the local interrupts with the sti assembly language instruction if the SA_INTERRUPT flag is clear.
  • Executes each interrupt service routine of the interrupt.(call each action->handler in list irq_desc_t->action)
  • Disables local interrupts with the cli assembly language instruction.
  • Returns 0 if no interrupt service routine has recognized the interrupt, 1 otherwise.
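The core of the steps above, walking the list of irqaction descriptors and reporting whether any ISR claimed the interrupt, can be sketched like this. The toy_ types are simplified stand-ins, the two example ISRs are invented for the illustration, and the enabling and disabling of local interrupts is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified irqaction: one node per device sharing the IRQ line. */
struct toy_irqaction {
    int (*handler)(int irq, void *dev_id);  /* returns 1 if the device raised the IRQ */
    void *dev_id;
    struct toy_irqaction *next;
};

/* Runs every ISR in the list; returns 1 if at least one recognized the interrupt. */
int toy_handle_irq_event(int irq, struct toy_irqaction *action)
{
    int handled = 0;
    assert(irq >= 0);
    for (; action != NULL; action = action->next)
        if (action->handler(irq, action->dev_id))
            handled = 1;
    return handled;
}

/* Hypothetical ISR of a device that did not raise the interrupt. */
int toy_isr_not_mine(int irq, void *dev_id)
{
    (void)irq; (void)dev_id;
    return 0;
}

/* Hypothetical ISR of the device that did; dev_id points to a hit counter. */
int toy_isr_mine(int irq, void *dev_id)
{
    (void)irq;
    *(int *)dev_id += 1;
    return 1;
}
```

Every ISR on the list is invoked even after one of them has claimed the interrupt, which is why each ISR must first check whether its own device actually needs attention.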

SA_INTERRUPT

The SA_INTERRUPT flag of the main IRQ descriptor determines whether interrupts must be enabled or disabled when the do_IRQ( )function invokes an ISR.

Dynamic allocation of IRQ lines

There is a way in which the same IRQ line can be used by several hardware devices even if they do not allow IRQ sharing. The trick is to serialize the activation of the hardware devices so that just one owns the IRQ line at a time.

example
//request_irq() creates a new irqaction descriptor and inserts it into the
//list of the IRQ by invoking setup_irq(); it returns 0 on success
int err = request_irq(6, floppy_interrupt, SA_INTERRUPT|SA_SAMPLE_RANDOM, "floppy", NULL);

//when the operations on the device have concluded, release the IRQ line
free_irq(6, NULL);

Interprocessor Interrupt Handling

Interprocessor interrupts allow a CPU to send interrupt signals to any other CPU in the system.

three kinds of interprocessor interrupts

  • CALL_FUNCTION_VECTOR(vector 0xfb)

Sent to all CPUs but the sender, forcing those CPUs to run a function passed by the sender. The corresponding interrupt handler is named call_function_interrupt( ).

  • RESCHEDULE_VECTOR(vector 0xfc)

When a CPU receives this type of interrupt, the corresponding handler, named reschedule_interrupt(), limits itself to acknowledging the interrupt; the rescheduling itself is done automatically when returning from the interrupt.

  • INVALIDATE_TLB_VECTOR(vector 0xfd)

Sent to all CPUs but the sender, forcing them to invalidate their Translation Lookaside Buffers.

ops
send_IPI_all( )
//Sends an IPI to all CPUs (including the sender)
send_IPI_allbutself( )
//Sends an IPI to all CPUs except the sender
send_IPI_self( )
//Sends an IPI to the sender CPU
send_IPI_mask()
//Sends an IPI to a group of CPUs specified by a bit mask

Softirqs and Tasklets

Softirqs and tasklets are strictly correlated, because tasklets are implemented on top of softirqs. Softirqs are statically allocated (i.e., defined at compile time), while tasklets can also be allocated and initialized at runtime. Softirqs can run concurrently on several CPUs, even if they are of the same type. Thus, softirqs are reentrant functions. Tasklets' execution is controlled more strictly by the kernel. Tasklets of the same type are always serialized: in other words, the same type of tasklet cannot be executed by two CPUs at the same time. However, tasklets of different types can be executed concurrently on several CPUs.

terms
softirq

The term softirq, which appears in the kernel source code, often denotes both kinds of deferrable functions.

interrupt context

The term interrupt context specifies that the kernel is currently executing either an interrupt handler or a deferrable function.

Softirqs

The index of a softirq determines its priority: a lower index means higher priority because softirq functions will be executed starting from index 0.

Data structures used for softirqs

1.softirq_vec

softirq_vec array, includes 32 elements of type softirq_action. The priority of a softirq is the index of the corresponding softirq_action element inside the array.

2.softirq_action

The softirq_action data structure consists of two fields: an action pointer to the softirq function and a data pointer to a generic data structure that may be needed by the softirq function.

3.preempt_count

Stored in the thread_info structure (of the current process or of the irq_ctx union); used to keep track both of kernel preemption and of nesting of kernel control paths.

There is a good reason for the name of the preempt_count field: kernel preemptability has to be disabled either when it has been explicitly disabled by the kernel code (preemption counter not zero) or when the kernel is running in interrupt context.

The in_interrupt() macro checks the hardirq and softirq counters in the current_thread_info()->preempt_count field. If either one of these two counters is positive, the macro yields a nonzero value, otherwise it yields the value zero.
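A minimal sketch of this check, assuming the subfield layout given in the book (bits 0-7 preemption counter, bits 8-15 softirq counter, bits 16-27 hardirq counter). The TOY_ macro names are invented for the example and are not the kernel's own definitions.

```c
#include <assert.h>

/* Assumed layout of preempt_count (see the lead-in). */
#define TOY_PREEMPT_MASK 0x000000FFu  /* bits 0-7: preemption counter */
#define TOY_SOFTIRQ_MASK 0x0000FF00u  /* bits 8-15: softirq counter */
#define TOY_HARDIRQ_MASK 0x0FFF0000u  /* bits 16-27: hardirq counter */

/* Nonzero if either the hardirq or the softirq counter is positive. */
int toy_in_interrupt(unsigned preempt_count)
{
    return (preempt_count & (TOY_HARDIRQ_MASK | TOY_SOFTIRQ_MASK)) != 0;
}

/* Kernel preemption is allowed only when the whole field is zero. */
int toy_preemptible(unsigned preempt_count)
{
    assert((TOY_PREEMPT_MASK & TOY_SOFTIRQ_MASK) == 0);
    return preempt_count == 0;
}
```

A value of 1 (preemption explicitly disabled once) is not interrupt context but is not preemptible either, while 1 << 16 (one interrupt handler running) is both interrupt context and not preemptible.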

4.irq_cpustat_t->__softirq_pending

per-CPU 32-bit mask describing the pending softirqs.

The do_softirq() function

steps

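  • If in_interrupt() yields a nonzero value (the function was invoked in interrupt context, or softirqs are currently disabled), return immediately.
  • Execute local_irq_save to save the state of the IF flag and disable interrupts on the local CPU.
  • If needed, switch to the soft IRQ stack of the local CPU (softirq_ctx array).
  • Invoke __do_softirq().
  • Restore the original kernel stack if it was switched in step 3.
  • Execute local_irq_restore to restore the state of the IF flag.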

The __do_softirq() function

The __do_softirq() function reads the softirq bit mask of the local CPU and executes the deferrable functions corresponding to every set bit. Because new pending softirqs may be raised while it runs, it performs only a fixed number of iterations and leaves the rest to the ksoftirqd kernel thread.

steps

  • Initializes the iteration counter to 10.
  • Copies the softirq bit mask of the local CPU.
  • invoke local_bh_disable() to increase the softirq counter.
  • Clears the softirq bitmap of the local CPU, so that new softirqs can be activated
  • Executes local_irq_enable() to enable local interrupts.
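  • For each bit set in the copied mask, call the corresponding softirq function softirq_vec[n].action.
  • Execute local_irq_disable() to disable local interrupts.
  • Copy the softirq bit mask of the local CPU again, decrease the iteration counter, and jump back to step 4 if the mask is not zero and the counter is still positive.
  • If there are more pending softirqs, invoke wakeup_softirqd() to wake up the kernel thread that takes care of the softirqs for the local CPU. Finally, decrease the softirq counter, thus reenabling deferrable functions.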
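The loop described by these steps can be sketched in user-space C. This is a single-CPU model with invented toy_ names: it keeps the bounded number of passes and the hand-off to ksoftirqd, and leaves out interrupt enabling/disabling and the softirq counter.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_RESTART 10

typedef void (*toy_softirq_fn)(void);

toy_softirq_fn toy_softirq_vec[32];  /* index = priority, 0 is highest */
uint32_t toy_pending;                /* pending mask (per-CPU in the kernel) */
int toy_ksoftirqd_woken;             /* set when work is handed to ksoftirqd */
int toy_runs;                        /* how many times an example softirq ran */

void toy_raise_softirq(unsigned nr)
{
    assert(nr < 32);
    toy_pending |= (uint32_t)1 << nr;
}

void toy_do_softirq(void)
{
    int max_restart = TOY_MAX_RESTART;
    uint32_t pending = toy_pending;      /* local copy of the mask */

    while (pending != 0) {
        toy_pending = 0;                 /* new activations can now be recorded */
        for (unsigned nr = 0; nr < 32; nr++)
            if ((pending & ((uint32_t)1 << nr)) && toy_softirq_vec[nr])
                toy_softirq_vec[nr]();   /* lowest index runs first */
        pending = toy_pending;           /* anything raised meanwhile? */
        if (--max_restart == 0)
            break;                       /* give up after a fixed number of passes */
    }
    if (pending != 0)
        toy_ksoftirqd_woken = 1;         /* leave the rest to the kernel thread */
}

/* Example softirq that runs once. */
void toy_count_once(void) { toy_runs++; }

/* Example softirq that keeps reactivating itself (softirq 3 is arbitrary). */
void toy_self_raising(void) { toy_runs++; toy_raise_softirq(3); }
```

With toy_count_once registered the function runs the softirq a single time; with toy_self_raising it stops after ten passes and sets toy_ksoftirqd_woken, which is the situation the ksoftirqd threads exist for.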

The ksoftirqd kernel threads

ksoftirqd
for(;;) {
 set_current_state(TASK_INTERRUPTIBLE);
 schedule();
 /* now in TASK_RUNNING state */
 while (local_softirq_pending()) {
 preempt_disable();
 do_softirq();
 preempt_enable();
 cond_resched();
 }
}

When awakened, the kernel thread checks the local_softirq_pending() softirq bit mask and invokes, if necessary, do_softirq().

high frequency softirqs

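A softirq may reactivate itself, or be reactivated at a high rate by interrupts, so that softirqs keep arriving faster than they can be handled.

The solution is the per-CPU ksoftirqd kernel threads: after a bounded number of passes, the remaining softirqs are left to these threads, which compete for the CPU like any other process.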

Tasklets

Tasklets are built on top of two softirqs, HI_SOFTIRQ and TASKLET_SOFTIRQ.

Tasklets and high-priority tasklets are stored in the tasklet_vec and tasklet_hi_vec arrays, respectively. Both of them include NR_CPUS elements of type tasklet_head, and each element consists of a pointer to a list of tasklet descriptors(struct tasklet_struct).

tasklet_struct
state

  • TASKLET_STATE_SCHED//means has been scheduled for execution
  • TASKLET_STATE_RUN

ops

//disable a tasklet (by increasing the count field of tasklet_struct)
tasklet_disable_nosync()//returns immediately
tasklet_disable()//returns only after any running instance of the tasklet has terminated

//reenable
tasklet_enable()

//activate the tasklet
tasklet_schedule()
tasklet_hi_schedule()

//execute tasklets; registered in softirq_vec and invoked by do_softirq()
tasklet_hi_action()
tasklet_action()

tasklet_action()

  • disable local interrupts
  • get the local CPU number n
  • store the address of the list pointed to by tasklet_vec[n] and set tasklet_vec[n] to NULL
  • enable local interrupts
  • for each tasklet in the list, execute the tasklet function unless the tasklet is disabled or is already executing.
    • In multiprocessor systems, check the TASKLET_STATE_RUN flag of the tasklet (this avoids running the same tasklet on another CPU at the same time).

Notice that, unless the tasklet function reactivates itself, every tasklet activation triggers at most one execution of the tasklet function.
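The "at most one execution per activation" rule follows from the TASKLET_STATE_SCHED flag: scheduling an already scheduled tasklet does nothing. The sketch below models this on a single CPU with invented toy_ names; it uses plain integer operations where the kernel uses atomic bit operations, and it omits the TASKLET_STATE_RUN handling.

```c
#include <assert.h>

#define TOY_STATE_SCHED 1u

struct toy_tasklet {
    unsigned state;               /* TOY_STATE_SCHED when pending */
    unsigned count;               /* nonzero: tasklet is disabled */
    void (*func)(unsigned long);
    unsigned long data;
};

unsigned long toy_hits;           /* updated by the example function below */

/* Returns 1 if the tasklet was newly scheduled, 0 if it was already pending. */
int toy_tasklet_schedule(struct toy_tasklet *t)
{
    if (t->state & TOY_STATE_SCHED)
        return 0;
    t->state |= TOY_STATE_SCHED;
    return 1;
}

/* Returns 1 if the tasklet function was executed. */
int toy_tasklet_run(struct toy_tasklet *t)
{
    assert(t->func != 0);
    if (!(t->state & TOY_STATE_SCHED))
        return 0;                 /* nothing pending */
    if (t->count != 0)
        return 0;                 /* disabled: stays scheduled for later */
    t->state &= ~TOY_STATE_SCHED; /* cleared before running, so func may reschedule */
    t->func(t->data);
    return 1;
}

/* Example tasklet function. */
void toy_bump(unsigned long by) { toy_hits += by; }
```

Scheduling the same tasklet twice before it runs results in a single call of its function.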

Work Queues

Work queues allow kernel functions to be activated (much like deferrable functions) and later executed by special kernel threads called worker threads.

differences between work queues and deferrable functions

Deferrable functions run in interrupt context, while functions in work queues run in process context (so they can block). However, a function in a work queue is executed by a kernel thread, so there is no User Mode address space to access (it cannot access user data).

Work queue data structures

1.workqueue_struct

Contains an array of NR_CPUS elements (struct cpu_workqueue_struct).

2.work_struct

Represents a pending function stored in the worklist field of cpu_workqueue_struct.

Work queue functions

ops
//create
create_workqueue("workqueue_name");//returns the address of the newly created workqueue_struct descriptor and creates n worker threads (n = number of CPUs)
create_singlethread_workqueue();//creates just one worker thread

//destroy
destroy_workqueue()

//insert work
queue_work();//inserts the function into the work queue, unless it is already there
//insert work into the worklist only after a timer expires
queue_delayed_work()//takes an extra parameter: the delay before the work is inserted

//cancel a delayed work item that has not yet been inserted into the worklist
cancel_delayed_work()

//block the calling process until all functions pending in the work queue terminate
flush_workqueue();//does not wait for work inserted after flush_workqueue() was invoked
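The "no repeated work" behaviour of queue_work() can be sketched with a pending flag and a FIFO list. The toy_ names are invented, the model is single-threaded, and the worker thread is reduced to a function that drains the list.

```c
#include <assert.h>
#include <stddef.h>

struct toy_work {
    int pending;                  /* 1 while the item sits in a work queue */
    void (*func)(void *);
    void *data;
    struct toy_work *next;
};

struct toy_wq {
    struct toy_work *head, *tail; /* FIFO list of pending work */
};

/* Returns 1 if the work was inserted, 0 if it was already queued. */
int toy_queue_work(struct toy_wq *wq, struct toy_work *w)
{
    if (w->pending)
        return 0;
    w->pending = 1;
    w->next = NULL;
    if (wq->tail)
        wq->tail->next = w;
    else
        wq->head = w;
    wq->tail = w;
    return 1;
}

/* What a worker thread would do when woken up; returns the number of functions run. */
int toy_run_workqueue(struct toy_wq *wq)
{
    int n = 0;
    while (wq->head) {
        struct toy_work *w = wq->head;
        wq->head = w->next;
        if (!wq->head)
            wq->tail = NULL;
        w->pending = 0;           /* cleared first, so func may requeue itself */
        assert(w->func != NULL);
        w->func(w->data);         /* process context: allowed to block */
        n++;
    }
    return n;
}

/* Example work function: data points to an int counter. */
void toy_add_one(void *data) { *(int *)data += 1; }
```

Queuing the same item twice before the worker runs executes its function once; after it has run, the item can be queued again.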

The predefined work queue

The kernel offers a predefined work queue called events, which can be freely used by every kernel developer.

ops

Functions executed in the predefined work queue should not block for a long time: because the execution of the pending functions in the work queue list is serialized on each CPU, a long delay negatively affects the other users of the predefined work queue.

Returning from Interrupts and Exceptions

several issues must be considered before return

  • Number of kernel control paths being concurrently executed
  • Pending process switch requests
  • Pending signals
  • Single-step mode
  • Virtual-8086 mode

A few flags are used to keep track of pending process switch requests, of pending signals, and of single step execution; they are stored in the flags field of the thread_info descriptor.

flow of return from interrupt(exception)

A difference(between interrupt and exception) exists only if support for kernel preemption has been selected as a compilation option: in this case, local interrupts are immediately disabled when returning from exceptions.

posted @ 2024-02-14 15:14  A2023