VxWorks-SMP.ppt
Slide 1: Chapter 2 - VxWorks SMP Architecture (GPP-VE, Revision 1)

Slide 2: Agenda
- VxWorks SMP
- HW requirements
- System startup
- BSP
- Spinlocks
- Wind kernel spinlock
- CPU affinity
- Atomic memory operations
- Core information and management
- RTP applications
- Cache subsystem
- Tasks and interrupt locks
- Task variables

Slide 3: Objectives
By the end of this chapter, you will be able to:
- List the basic building blocks of VxWorks 6.6 SMP
- Follow the startup sequence
- Articulate the differences between UP and SMP BSPs
- Understand when and how to use spinlocks
- Identify the special Wind spinlock
- Know how to use CPU affinity and what its limitations are
- Be familiar with atomic memory operations
- Gather management information about cores
- Know what to expect when migrating from UP to SMP

Slide 4: VxWorks SMP - What is it?
- The Wind River platform includes a device operating system and middleware integrated with appropriate processors and boards, as well as a complete development environment (Workbench) that addresses all aspects of the device software development process.
- VxWorks 6.6 includes:
  - The VxWorks SMP operating system
  - A BSP and SMP-compatible device driver architecture (VxBus)
  - An SMP-ready development environment with debugger and tools that provides an end-to-end development suite

Slide 5: VxWorks SMP OS
What are the hardware requirements for VxWorks SMP?
- Shared memory: each core views the same physical memory subsystem.
- Core control: when VxWorks boots on a single core, this core acts as the bootstrap core; the other cores are held in reset or a wait state until they are released. The bootstrap core must have a way to determine the core inventory and to release the other cores from the wait/reset state.
- Cache: an L1 cache is available for each core and is used to keep the caches coherent between cores (cache snooping).
- Interrupt control: a specific interrupt can be routed to a selected core.
- Atomic operations: the spinlock is the fundamental building block of SMP, and spinlocks are implemented with atomic operations. Core synchronization cannot operate without atomic operations.
- Independent cores: for the first VxWorks SMP release (6.6) it is assumed that all cores are fully independent and thus do not share any hardware resources.
Note that other OSs, such as Linux, may have different hardware requirements; however, all of the above are mandatory for VxWorks SMP.

Slide 6: VxWorks SMP - How to Build an Image
- VxWorks SMP is offered as an add-on; customers who are using UP will have no performance issue, as they will use the exact same OS.
- The build macro VXBUILD=SMP must be specified in order to build VxWorks SMP binaries. This build macro causes the preprocessor macro _WRS_VX_SMP to be defined.
- Prebuilt SMP binaries are included in the VxWorks installation (/target/proj/_smp).
- To ensure VxWorks UP to VxWorks SMP compatibility:
  - APIs unsupported by VxWorks SMP (e.g. intLock()) must not be used
  - Only public interfaces must be used
  - Any definition that varies based on the _WRS_VX_SMP macro must not reside in a public header file (see the sketch below)
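As an illustration of the last rule, the minimal sketch below guards a purely hypothetical configuration constant on _WRS_VX_SMP; the header name and the constant are not from the slides, and such a definition would have to live in a private header or source file, never in a public one.

/* myModuleP.h - hypothetical private header, for illustration only */
#ifdef _WRS_VX_SMP
#define MY_MODULE_MAX_WORKERS 4    /* hypothetical value used by the SMP build */
#else
#define MY_MODULE_MAX_WORKERS 1    /* hypothetical value used by the UP build */
#endif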

Slide 7: VxWorks SMP - System Startup
- When an SMP target board is powered on, only a single core, the bootstrap core, is active. It is responsible for executing the bulk of the target hardware initialization code and for initializing the VxWorks OS.
- The non-bootstrap cores are enabled near the end of the usrRoot() function (the entry point of the root task). Thus, all the cores are enabled just prior to the initialization of the customer's kernel applications.
- In the current VxWorks SMP release (6.6) the system boot does not execute in parallel on multiple cores, so the boot time will not improve compared to UP systems.

Slide 8: VxWorks SMP - System Startup (startup-sequence diagram; no surviving text)

Slide 9: VxWorks SMP - BSP
- The Board Support Package (BSP) is the board-dependent glue code between the operating system and the underlying hardware. As such, it needs to provide different services to a UP and an SMP VxWorks.
- The differences between a UP and an SMP BSP include:
  - Build environment
  - CPU inventory
  - CPU identification
  - Bootstrapping
  - Reboot handling
  - VxBus support
  - Interrupt routing and assignment

Slide 10: VxWorks SMP - BSP differences between UP and SMP
- Build environment: the build macro VXBUILD=SMP should be defined, which results in _WRS_VX_SMP being defined.
- CPU inventory: the SMP system determines how many cores to enable by looking at configuration information from three subsystems:
  - Architecture: the maximum number of cores supported by the architecture
  - Configuration: the number of CPUs that the VxWorks runtime is configured for
  - BSP: the number of actual cores that are present
  VxWorks queries each of the above subsystems and uses the smallest value returned as the number of cores to enable for SMP.

Slide 11: VxWorks SMP - BSP differences between UP and SMP
- CPU identification: VxWorks needs to be able to uniquely identify each core in the system. The vxCpuIndexGet() API performs whatever hardware-dependent activities are needed to determine the core index of the core that is performing the operation. The bootstrap core is number 0.
- Bootstrapping: each BSP's bootstrap code needs to contain logic that allows it to determine whether or not it is the bootstrap core, and to change its execution path accordingly: run, if it is the bootstrap core, or busy-wait until released by the bootstrap processor (see the sketch below). The WIND_CPU_STATE parameter is an architecture-dependent data structure that is used by the bootstrap core to configure the other cores; it includes the program counter and stack pointer of each core. (See appendix A for more details about the WIND_CPU_STATE structure.)
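The following is a minimal, purely conceptual sketch of the decision the bootstrap logic has to make. The function name sysCpuBootSelect() is hypothetical and not part of any real BSP, and actual bootstrap code runs much earlier than ordinary kernel facilities are available, so this only shows the control flow described on the slide; vxCpuIndexGet() is the API named above, and the header name is an assumption.

#include <vxWorks.h>
#include <vxCpuLib.h>          /* assumed header for vxCpuIndexGet() */

/* Hypothetical illustration of the bootstrap-versus-secondary decision. */
void sysCpuBootSelect (void)
    {
    if (vxCpuIndexGet () == 0)
        {
        /* Bootstrap core: run the hardware initialization and boot VxWorks. */
        }
    else
        {
        /* Non-bootstrap core: busy-wait until the bootstrap core releases us,
         * typically after publishing a WIND_CPU_STATE (program counter and
         * stack pointer) for this core.
         */
        }
    }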

Slide 12: VxWorks SMP - BSP differences between UP and SMP
- Reboot handling: when VxWorks decides to reboot the system, it invokes the BSP-provided function sysToMonitor(). Only the BSP has a full understanding of the overall hardware layout; as such, the BSP must perform whatever actions are necessary to reboot the system. This may involve a BIOS call or direct writes to hardware registers.
- VxBus support: only VxBus-compatible drivers can be used with the SMP configuration of VxWorks. See chapter 5 in this training material for more information about VxBus.

Slide 13: VxWorks SMP - BSP differences between UP and SMP
- Interrupt routing and assignment: in the current release (6.6) an interrupt can be routed to only one, pre-defined core. In future releases, however, an interrupt from a device could potentially be delivered simultaneously to more than one core; this would reduce interrupt latency by allowing the core with the lowest latency to run the ISR first.
- In VxBus, interrupt routing and assignment are handled through a VxBus-compliant interrupt controller. See chapter 5 for more information about VxBus.

Slide 14: VxWorks SMP - Spinlock
- A spinlock provides a facility for short-term mutual exclusion and synchronization in an SMP system.
- A spinlock is a busy-wait: a core waits for the spinlock to become available and acquires it once it is available.
- All other cores must wait for the spinlock to be released before they may acquire it.

Slide 15: VxWorks SMP - Spinlock
- Since a spinlock is owned by the core (not the task), preemption of the task holding the spinlock is forbidden from the time the acquisition begins until the spinlock is released.
- Blocking a task that holds a spinlock will reduce performance, increase system latency, and may result in a deadlock.
(Diagrams: blocking scenario vs. non-blocking scenario)

Slide 16: VxWorks SMP - Spinlock
- VxWorks spinlocks use a FIFO to manage spinlock requests; each request is served "fairly" and will not starve. This feature is exclusive to Wind River SMP and is not available in Linux SMP, for example.
- An SMP system is inherently less deterministic than a UP system. The spinlock is by nature a non-deterministic operation, and since it forms the basis for all synchronization primitives, it makes SMP less deterministic than UP. The VxWorks deterministic spinlock reduces some of this indeterminism, but does not eliminate it.
- VxWorks spinlocks operate as full memory barriers on acquisition and release. Thus all memory accesses made while a spinlock is held are performed in strict order.
- Tasks cannot be deleted while they hold a spinlock; the delete is deferred until the spinlock is given, at which time the task is deleted.

Slide 17: VxWorks SMP - Spinlock
There are two types of spinlocks:
- ISR-callable spinlocks: used to address contention between ISRs/tasks and ISRs.
  - Disable interrupts on the local core while taken
  - Disable task preemption on the local core while taken
  - Interrupts and tasks on other cores are not affected
- Task-only spinlocks: used to address contention between tasks.
  - Disable task preemption on the local core while taken
  - Interrupts and tasks on other cores are not affected

Slide 18: VxWorks SMP - Spinlock
ISR-callable spinlock APIs (a usage sketch follows this slide):
- spinLockIsrInit() - initializes an ISR-callable spinlock
- spinLockIsrTake() - acquires an ISR-callable spinlock
- spinLockIsrGive() - relinquishes ownership of an ISR-callable spinlock
Task-only spinlock APIs:
- spinLockTaskInit() - initializes a task-only spinlock
- spinLockTaskTake() - acquires a task-only spinlock
- spinLockTaskGive() - relinquishes ownership of a task-only spinlock
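A minimal sketch of how the ISR-callable spinlock APIs listed above might protect a counter shared between tasks and ISRs on different cores. The header name spinLockLib.h, the spinlockIsr_t type, and the 0 flags argument to spinLockIsrInit() are assumptions based on the VxWorks 6.6 spinlock library; the counter itself is purely illustrative.

#include <vxWorks.h>
#include <spinLockLib.h>            /* assumed header for the spinlock APIs */

LOCAL spinlockIsr_t sharedLock;     /* assumed ISR-callable spinlock type */
LOCAL UINT32        sharedCount = 0;

void sharedCountInit (void)
    {
    spinLockIsrInit (&sharedLock, 0);   /* 0 = default options (assumed) */
    }

/* Callable from either task or ISR context on any core. */
void sharedCountBump (void)
    {
    spinLockIsrTake (&sharedLock);      /* interrupts and preemption off on this core */
    sharedCount++;                      /* keep the critical section very short */
    spinLockIsrGive (&sharedLock);
    }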

Slide 19: VxWorks SMP - Wind Spinlock (kernel)
- The VxWorks kernel uses a spinlock, the Wind kernel spinlock, to protect the kernel's critical sections. For example, whenever the kernel needs to move a task from a pend queue to the ready queue, the Wind kernel spinlock is acquired.
- The Wind kernel spinlock does not disable interrupts while the lock is held (to minimize interrupt latency).
- Each core has a queue used to defer jobs (aka "work queue").
- If a task, or another ISR, is interrupted while holding the Wind kernel spinlock, the requested operation is deferred by queuing a job onto the kernel work queue. The task or ISR that was interrupted completes its operation while holding the Wind kernel spinlock; before the spinlock is released, it drains the job queue.

Slide 20: Read/Write Semaphores*
- Read/write semaphores provide enhanced performance for applications that can effectively make use of the differentiation between read and write access to a resource.
- A read/write semaphore can be taken in either read mode or write mode.
- A task holding the semaphore in write mode has exclusive access to the resource.
- A read/write semaphore allows multiple readers in a critical section, thus allowing more than one task to read a resource in a truly concurrent manner.
- The maximum number of tasks that can take a read/write semaphore in read mode can be specified when the semaphore is created.
* Read/write semaphores are not SMP-specific; they are introduced in the 6.6 release.

Slide 21: Read/Write Semaphores
- A semaphore is created with semRWCreate(), with a maximum number of readers; there is always only one writer.
- The semaphore can be taken in read mode with semRTake() by multiple tasks simultaneously.

Slide 22: Read/Write Semaphores
- When the semaphore is taken in write mode, it blocks any further access to the critical section and waits until the readers exit the critical section.
- When a read/write semaphore becomes available, precedence is given to pended tasks that require write access (regardless of task priority).
- Read/write semaphores increase concurrent execution in SMP at the cost of semaphore bookkeeping overhead.
(A usage sketch follows this slide.)
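A minimal sketch of the read/write semaphore calls named on the previous slides. semRWCreate() and semRTake() come from the slides; the write-mode take is written here as semWTake(), the SEM_Q_PRIORITY option, the semLib.h header, and the choice of 4 maximum readers are all assumptions based on the standard VxWorks 6.6 semaphore library, and the "configuration table" being protected is hypothetical.

#include <vxWorks.h>
#include <semLib.h>                 /* assumed header for the semaphore APIs */

LOCAL SEM_ID configRWSem;           /* protects a hypothetical configuration table */

void configSemInit (void)
    {
    /* Allow up to 4 concurrent readers (value chosen for illustration). */
    configRWSem = semRWCreate (SEM_Q_PRIORITY, 4);
    }

void configRead (void)
    {
    semRTake (configRWSem, WAIT_FOREVER);   /* read mode: concurrent with other readers */
    /* ... read the shared data ... */
    semGive (configRWSem);
    }

void configWrite (void)
    {
    semWTake (configRWSem, WAIT_FOREVER);   /* write mode (assumed API name): exclusive */
    /* ... modify the shared data ... */
    semGive (configRWSem);
    }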

Slide 23: VxWorks SMP - CPU Affinity - What
- VxWorks SMP provides the ability to assign tasks to a specific CPU; this is known as task CPU affinity.
- Interrupt masking provides mutual exclusion only on the local core: for example, if task A disables interrupts on core 1 (out of 4 cores), all other cores can still execute ISRs.

Slide 24: VxWorks SMP - CPU Affinity - Why
Task CPU affinity is used when:
- The software architecture requires that a task be executed on a specific core.
- Tasks are frequently contending for the same spinlock, which increases busy-waiting; placing these tasks on the same CPU will increase system performance and free execution time on the other cores (see the example on the next slide).
- The device specifications require that a set of tasks not run concurrently, for example when task A must never run concurrently with task B.
- It helps with cache locality, which helps with performance, since it avoids moving tasks between CPUs and hence avoids cache misses and invalidations.
CPU affinity is inherited between tasks: if task A spawns task B, then tasks A and B share the same CPU affinity. CPU affinity is not inherited when the created task is an RTP's initial task.

Slide 25: VxWorks SMP - CPU Affinity - Why
- Tasks A and B not affiliated with a core: the tasks can be executed at random on any core, although they share the same critical section.
- Tasks A and B affiliated with core 0.
(Diagram slide comparing the two placements.)

Slide 26: VxWorks SMP - CPU Affinity - How
Task CPU affinity routines:
- taskCpuAffinitySet() - sets the CPU affinity for a task
- taskCpuAffinityGet() - returns the CPU affinity for a task
Both APIs take a CPU set variable of type cpuset_t (typedef unsigned int cpuset_t;). Each bit in a cpuset_t variable corresponds to a specific core index, with the first bit representing core 0. A cpuset_t is manipulated via macros (see the sketch below).
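A minimal sketch of setting a task's CPU affinity with the routines and the cpuset_t type described above. The macro names CPUSET_ZERO() and CPUSET_SET(), the taskLib.h and cpuset.h headers, and the use of core index 1 are assumptions based on the standard VxWorks 6.6 CPU-set facilities.

#include <vxWorks.h>
#include <taskLib.h>        /* assumed header for taskCpuAffinitySet() and taskIdSelf() */
#include <cpuset.h>         /* assumed header for cpuset_t and the CPUSET_* macros */

STATUS pinCallingTaskToCore1 (void)
    {
    cpuset_t affinity;

    CPUSET_ZERO (affinity);         /* start with an empty CPU set (assumed macro) */
    CPUSET_SET  (affinity, 1);      /* bit 1 = core index 1 (assumed macro)        */

    return taskCpuAffinitySet (taskIdSelf (), affinity);
    }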

Slide 27: VxWorks SMP - CPU Affinity - Implications
- If a task has a CPU affinity, then the scheduling algorithm will not guarantee that the N highest-priority tasks execute (N = number of cores); however, the highest-priority ready task will always be running.
- Example: the N highest-priority tasks are 1, 2, 3 and 4; however, the tasks that will run are 1, 3, 4 and 5. Task 2 cannot run because it is excluded by task 1.

Slide 28: VxWorks SMP - CPU Affinity (diagram slide; no surviving text)

Slide 29: VxWorks SMP - Interrupt CPU Affinity
- Interrupts from devices can be routed to any one of the CPUs through a programmable interrupt controller (PIC).
- By default, interrupts are routed to the bootstrap core (core 0).
- Interrupt CPU affinity can be useful to (statically) load-balance interrupt handling among several cores.
- Runtime assignment of interrupts to a specific core occurs at boot time, when the system reads the interrupt configuration information from the BSP.

Slide 30: VxWorks SMP - Atomic Memory Operations
- Atomic memory operations are a set of simple operations that can be applied to a single data item in memory atomically, which means the operation is not interrupted by any other operation on the data item.
- They can be used as a simpler alternative to spinlocks, for example for updating a single data element (see the sketch below).
- The caller must ensure that the location has memory access attributes and an alignment that allow atomic memory access.
- Atomic memory routines are available in user space and from the kernel.
- Atomic operators are divided into four logical groups: arithmetic, logical, read/write, and compare-and-swap.

Slide 31: VxWorks SMP - Atomic Memory Operations (table of atomic operators; no surviving text)
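As a sketch of the "simpler alternative to spinlocks" case from slide 30, the fragment below updates a shared counter with a single atomic operation instead of taking a lock. The routine name vxAtomicInc() and the vxAtomicLib.h header are assumptions based on the VxWorks 6.6 atomic operator library (only vxCas is named explicitly in this material), and the counter is illustrative.

#include <vxWorks.h>
#include <vxAtomicLib.h>            /* assumed header for atomic_t and the atomic operators */

LOCAL atomic_t packetsSeen = 0;     /* hypothetical statistics counter shared by all cores */

/* Safe from any task or ISR on any core: no spinlock is needed for a
 * single, properly aligned data item.
 */
void packetSeen (void)
    {
    (void) vxAtomicInc (&packetsSeen);  /* assumed operator from the arithmetic group */
    }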

Slide 32: VxWorks SMP - Atomic Memory Operations
The vxCas operator is the most complex of the atomic operators. It is designed to be used to update a data structure by:
- Reading a value from the data structure
- Updating the value according to the needs of the algorithm
- Writing the value back, but only if the data structure has been left unchanged since the original read occurred

Slide 33: VxWorks SMP - Atomic Memory Operations
vxCas() returns TRUE if the swap is actually executed, FALSE otherwise. Example:

#include <vxWorks.h>                /* BOOL, TRUE/FALSE */
#include <stdlib.h>                 /* rand() */
#include <vxAtomicLib.h>            /* atomic_t, vxCas() */

BOOL vxCas_example (void)
    {
    atomic_t atomicVar;             /* atomic variable */
    atomic_t old;                   /* value to compare with */
    atomic_t new;                   /* value to swap with */
    BOOL     casDone;               /* return value of the API */

    /* setup */
    old       = (atomic_t) (0 - rand ());
    new       = (atomic_t) rand ();
    atomicVar = (atomic_t) 0;

    /* If the value of atomicVar equals old, assign new to atomicVar and return TRUE. */
    /* If the value of atomicVar does not equal old, just return FALSE. */
    casDone = vxCas (&atomicVar, old, new);

    return casDone;
    }
