Chapter 1
1.1 Real-Time Operating Systems
Real-time computing may be defined as that type of computing in which the correctness of the system depends not only on the logical result of the computation, but also on the time at which the results are produced. A real-time system must satisfy bounded response-time constraints or risk severe consequences, including failure. Real-time systems are classified as hard, firm, and soft. In hard real-time systems, failure to meet response-time constraints leads to system failure. Firm real-time systems have hard deadlines, but can tolerate a certain low probability of missing a deadline. Systems in which performance is degraded but not destroyed by failure to meet response-time constraints are called soft real-time systems. An RTOS differs from a general-purpose OS in that it gives the user direct access to the microprocessor and peripherals, which helps applications meet their deadlines.
1.2 Basic requirements:
The following are the basic requirements of an RTOS:
(i) Multi-threading and preemptibility
To support multiple tasks in real-time applications, an RTOS must be multi-threaded and preemptible. The scheduler should be able to preempt any thread in the system and give the resource to the thread that needs it most. An RTOS should also handle multiple levels of interrupts, i.e., the RTOS should be preemptible not only at the thread level but at the interrupt level as well.
(ii) Thread priority
In order to achieve preemption, an RTOS should be able to determine which thread needs a resource the most, i.e., the thread with the earliest deadline to meet. Ideally, this should be done at run-time. In practice, however, most RTOSs are not deadline-driven. Instead, each thread is assigned a priority level: deadline information is converted to priority levels, and the OS allocates resources according to the priority levels of threads. Although this approach to allocating resources among competing threads is prone to error, in the absence of another solution the notion of priority levels is used in an RTOS.
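The conversion of deadline information into priority levels can be sketched as follows. This is only a minimal illustration, assuming a deadline-monotonic rule (the shorter the relative deadline, the higher the priority). The function name, the thread names, and the 0 to 31 priority range are hypothetical and are not taken from any particular RTOS.

```python
def assign_priorities(deadlines_ms, max_priority=31):
    """Map relative deadlines to fixed priorities: shortest deadline gets the highest level."""
    if len(deadlines_ms) > max_priority + 1:
        raise ValueError("more threads than distinct priority levels")
    # Sort thread names by deadline; sorted() is stable, so ties keep insertion order.
    ordered = sorted(deadlines_ms, key=deadlines_ms.get)
    return {name: max_priority - rank for rank, name in enumerate(ordered)}


# Hypothetical threads with relative deadlines in milliseconds.
deadlines = {"logger": 500, "motor_control": 5, "display": 100}
priorities = assign_priorities(deadlines)
print(priorities)
```

Once the mapping is fixed, the scheduler only compares priority numbers at run-time; the deadlines themselves are no longer consulted, which is why a poor mapping can lead to missed deadlines.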
(iii) Predictable thread synchronization mechanisms
For multiple threads to communicate with each other in a timely fashion, predictable inter-thread communication and synchronization mechanisms are required. The ability to lock and unlock resources should also be supported, to preserve data integrity.
(iv) Priority inheritance
When using priority scheduling, it is important that the RTOS has a sufficient number of priority levels, so that applications with stringent priority requirements can be implemented. Unbounded priority inversion occurs when a higher-priority task must wait for a low-priority task to release a resource while the low-priority task is itself kept off the processor by a medium-priority task. The RTOS can prevent this by temporarily giving the lower-priority task the same priority as the higher-priority task that it is blocking (called priority inheritance). In this case, the blocking task can finish its work with the resource without being preempted by a medium-priority task. The designer must make sure that the RTOS being used prevents unbounded priority inversion.
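The effect of priority inheritance can be illustrated with a small tick-based simulation. This is a sketch under simplifying assumptions, not a model of any real scheduler: the three tasks L, M, and H, their release times, and their work amounts are hypothetical, and a task that uses the lock is assumed to hold it for its whole execution.

```python
def simulate(inherit):
    """Run a three-task scenario tick by tick; return each task's finish time."""
    # Hypothetical task set: base priority, release tick, CPU ticks needed,
    # and whether the task holds the shared lock for its whole execution.
    tasks = {
        "L": {"prio": 1, "release": 0, "work": 3, "lock": True},
        "M": {"prio": 2, "release": 2, "work": 4, "lock": False},
        "H": {"prio": 3, "release": 1, "work": 2, "lock": True},
    }
    owner, finish, tick = None, {}, 0

    def effective_priority(name, ready):
        prio = tasks[name]["prio"]
        if inherit and name == owner:
            # The lock owner inherits the highest priority among blocked waiters.
            waiters = [tasks[w]["prio"] for w in ready if w != name and tasks[w]["lock"]]
            prio = max([prio] + waiters)
        return prio

    while len(finish) < len(tasks):
        ready = [n for n, t in tasks.items() if t["release"] <= tick and n not in finish]
        # A lock user is blocked while another task owns the lock.
        runnable = [n for n in ready if not (tasks[n]["lock"] and owner not in (None, n))]
        if not runnable:
            tick += 1
            continue
        current = max(runnable, key=lambda n: effective_priority(n, ready))
        if tasks[current]["lock"]:
            owner = current
        tasks[current]["work"] -= 1
        tick += 1
        if tasks[current]["work"] == 0:
            finish[current] = tick
            if owner == current:
                owner = None
    return finish


without = simulate(inherit=False)
with_inheritance = simulate(inherit=True)
print(without["H"], with_inheritance["H"])
```

In this scenario H finishes at tick 9 without inheritance, because M runs ahead of the lock holder L, and at tick 5 with inheritance, because L is boosted and releases the lock before M gets the processor.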
(v) Predefined latencies
An OS that supports a real-time application needs to have information about the timing of
its system calls. The behavior metrics to be specified are:
(a) Task switching latency: Task or context-switching latency is the time to save the context of
a currently executing task and switch to another task. It is important that this latency be short.
(b) Interrupt latency: This is the time elapsed between the execution of the last instruction of the interrupted task and the first instruction in the interrupt handler, or simply the time from the interrupt to the start of its handler. This is a metric of system response to an external event.
(c) Interrupt dispatch latency: This is the time to go from the last instruction in the interrupt
handler to the next task scheduled to run. This indicates the time needed to go from interrupt
level to task level.
Realtime applications depend on the operating system to handle multiple events within fixed time constraints. The more responsive the OS, the more “room” a realtime application has to maneuver when meeting its deadlines.
1.3 Introduction to QNX RTOS:
The QNX Operating System is ideal for realtime applications. It provides multitasking, priority-driven preemptive scheduling, and fast context switching - all essential ingredients of a realtime system. The QNX RTOS has been deployed in embedded systems for over 20 years, in mission- and life-critical systems, medical instruments, aviation and space systems, process-control systems, and in-car devices. In the subsequent chapters we examine the different features of QNX.
In Chapter-2 we take a look at the system architecture of QNX, which provides core real-time services with massive scalability and portability. In the same chapter we describe the QNX microkernel, which is the heart of the system. The microkernel itself implements only a small set of core services; the rest of the operating system is provided by optional cooperating processes.
In Chapter-3 we take a look at the Process manager of the QNX systems.
Chapter-4 deals with the Device Manager.
Chapter-5 is dedicated to the study of File-system Manager and Resource manager.
In Chapter-6 we conclude with a summary of the features of the QNX RTOS and its applications.
Chapter 2
2.1 QNX RTOS
The main responsibility of an operating system is to manage a computer's resources. All activities in the system - scheduling application programs, writing files to disk, sending data across a network, and so on - should function together as seamlessly and transparently as possible. The QNX Operating System is ideal for embedded realtime applications. It can be scaled to very small sizes and provides multitasking, threads, priority-driven preemptive scheduling, and fast context-switching - all essential ingredients of a realtime system. Moreover, the QNX/Neutrino OS delivers these capabilities with a POSIX-standard API; there's no need to forgo standards in order to achieve a small OS. QNX/Neutrino is also remarkably flexible. Developers can easily customize the OS to meet the needs of their applications. From a "bare-bones" configuration of a kernel with a few small modules to a full-blown network-wide system equipped to serve hundreds of users, QNX/Neutrino lets you set up your system to use only those resources you require to tackle the job at hand.
QNX/Neutrino achieves its unique degree of efficiency, modularity, and simplicity through two fundamental principles:
- microkernel architecture
- message-based interprocess communication
The QNX RTOS has a client-server architecture consisting of a microkernel and optional cooperating processes. The microkernel implements only the core services, like threads, signals, message passing, synchronization, scheduling and timer services. The microkernel itself is never scheduled. Its code is executed only as the result of a kernel call, the occurrence of a hardware interrupt or a processor exception. Additional functionality is implemented in cooperating processes, which act as server processes and respond to the requests of client processes (e.g. an application process). Examples of such server processes are the file system manager, process manager, device manager, network manager, etc. While the kernel runs at privilege level 0 of the Intel processor, the managers and device drivers run at levels 1 and 2 (to perform I/O operations). Application processes, on the other hand, run at privilege level 3, and can therefore only execute general instructions of the processor.
The QNX RTOS is a message-based operating system. Message passing is the fundamental means of inter-process communication (IPC) in this RTOS. The message passing service is based on the client-server model: the client (e.g. an application process) sends a message to a server (e.g. the device manager), which replies with the result. Many of the QNX Neutrino RTOS API calls use the message passing mechanism. For example, when an application process wants to open a file, the system call is translated into a message that is sent to the file system manager. The file system manager (after accessing the disk via its device drivers) replies with a file handle. This message passing mechanism is network transparent, i.e., the system can be seamlessly distributed over several nodes without requiring any changes in the application code. When passing messages between client and server, the QNX RTOS uses a mechanism called “client-driven priority”. This means that a server process inherits the priority level of the client process requesting a service. When the client’s request has been serviced, the server process can regain its original priority level. When multiple clients are requesting a service from a server, the server process assumes the priority level of the highest-priority client process. This is to avoid priority inversion. A client-server architecture has many advantages, one of which is robustness. Every manager (except for the process manager) and device driver runs in its own virtual memory address space, resulting in a robust and reliable system. The price to pay is performance: execution of a system call involves a few context switches (with an overhead caused by memory protection), resulting in somewhat lower performance.
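The client-driven priority rule described above can be captured in a toy model. This is a sketch of the rule only, not the QNX API: the Server class, its method names, and the priority numbers are hypothetical.

```python
class Server:
    """Toy model of client-driven priority; not the QNX API."""

    def __init__(self, base_priority):
        self.base_priority = base_priority
        self.pending = []  # priorities of clients with outstanding requests

    @property
    def priority(self):
        # Run at the highest pending client priority, else at the original level.
        return max(self.pending) if self.pending else self.base_priority

    def request(self, client_priority):
        self.pending.append(client_priority)

    def complete(self, client_priority):
        self.pending.remove(client_priority)


server = Server(base_priority=10)
history = [server.priority]
server.request(15)
history.append(server.priority)
server.request(22)
history.append(server.priority)
server.complete(22)
history.append(server.priority)
server.complete(15)
history.append(server.priority)
print(history)
```

The recorded history shows the server rising to the priority of its most urgent client and returning to its original level once all requests have been serviced.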
QNX consists of a small kernel in charge of a group of cooperating processes. As the following illustration shows, the structure looks more like a team than a hierarchy, as several players of equal rank interact with each other and with their "quarterback" kernel.
The QNX Microkernel coordinating the system managers.
Conventional RTOSs use a single flat memory architecture where hard-to-detect programming errors like corrupt C pointers can cause programs to overwrite each other or the kernel. The inevitable result: system failure. A QNX-based system, however, can intelligently recover from software faults, even in drivers and other critical programs, without rebooting, because every OS component runs in its own MMU-protected address space. More importantly, because of the QNX RTOS’s elegant, efficient design, full MMU protection doesn’t come at the expense of performance. With fast context-switch speeds and low latencies, QNX delivers reliable real-time performance.
2.2 The QNX Microkernel
A microkernel OS is structured as a tiny kernel that provides the minimal services used by a team of optional cooperating processes, which in turn provide the higher-level OS functionality. The microkernel itself lacks file systems and many other services normally expected of an OS - those services are provided by optional processes. The real goal in designing a microkernel OS is not simply to "make it small." A microkernel OS embodies a fundamental change in the approach to delivering OS functionality. To call any kernel a "microkernel" simply because it happens to be small would miss the point entirely. QNX is a microkernel implementation of the core POSIX 1003.1, 1003.1a, 1003.1b, 1003.1c, and 1003.1d features used in embedded real-time systems, along with the fundamental QNX message-passing services. The POSIX features that aren't implemented in the microkernel (file and device I/O, for example) are provided by optional processes and DLLs (dynamically linked libraries).
Successive QNX microkernels have seen a reduction in the code required to implement a given kernel call. The object definitions at the lowest layer in the kernel code have become more specific, allowing greater code reuse (such as folding various forms of POSIX signals, realtime signals, and QNX pulses into common data structures and code to manipulate those structures).
At its lowest level, Neutrino contains a few fundamental objects and the highly tuned routines that manipulate them. The Neutrino microkernel is built from this foundation.
2.3 System processes
All QNX services, except those provided by the Microkernel, are handled via standard QNX processes. A typical QNX configuration has the following system processes:
- Process Manager (Proc)
- Device Manager (Dev)
- File system Manager (Fsys)
- Network Manager (Net)
2.3.1 Process manager:
When Neutrino is used without the Process Manager, the runtime memory model that is produced when application threads are linked in consists of a single address space with multiple threads of execution. This image represents a single multi-threaded process with process ID 1. On an Intel processor, the threads in the process execute at privilege level 1. When the threads enter the microkernel via a kernel call, the microkernel executes at privilege level 0 (ring 0).
Creating multiple processes (each of which may contain multiple threads) requires the Process Manager, which adds another 32K of code and provides three new capabilities in addition to those provided by the microkernel:
- process management - manages process creation, destruction, and process attributes such as user ID (uid) and group ID (gid).
- memory management - manages a range of memory protection capabilities, DLLs, and interprocess POSIX shared-memory primitives.
- pathname management - manages the pathname space into which resource managers may attach.
2.3.2 Device manager:
The QNX Device Manager (Dev) is the interface between processes and terminal devices. These terminal devices are located in the I/O namespace with names starting with /dev. For example, a console device on QNX would have a name such as: /dev/con1.
2.3.3 File-System manager:
QNX/Neutrino provides a rich variety of filesystems. Like most service-providing processes in the OS, these filesystems execute outside the kernel; applications use them by communicating via messages generated by the shared-library implementation of the POSIX API.
2.3.4 Network manager:
The Network Manager is responsible for propagating the QNX messaging primitives across a local area network. The standard messaging primitives used in local messaging are used unmodified in remote messaging. The Network Manager (Net) gives QNX users a seamless extension of the operating system's powerful messaging capabilities. Communicating directly with the Microkernel, the Network Manager enhances QNX's message-passing IPC by efficiently propagating messages to remote machines. In addition, the Network Manager offers three advanced features:
- increased throughput via load balancing
- fault tolerance via redundant connectivity
- bridging between QNX networks
The Network Manager does not have to be built into the operating system image. It may be started and stopped at any time to provide or remove network messaging capabilities. When the Network Manager starts, it registers with the Process Manager and Microkernel. This activates existing code within the two that interfaces to the Network Manager. This means that network messaging and remote process creation are not just a layer added onto the operating system. Network messaging is integrated into the very heart of the messaging and process-management primitives themselves.
In the following sections we look at the above-mentioned system processes.
Chapter 3
3.1 Process Manager
The Process Manager works closely with the Microkernel to provide essential operating system services. Although it shares the same address space as the Microkernel (and is the only process to do so), the Process Manager runs as a true process. As such, it is scheduled to run by the Microkernel like all other processes and it uses the Microkernel's message-passing primitives to communicate with other processes in the system.
The Process Manager is responsible for creating new processes in the system and managing the most fundamental resources associated with a process. These services are all provided via messages. For example, if a running process wants to create a new process, it does so by sending a message containing the details of the new process to be created. Note that since messages are network-wide, you can easily create a process on another node by sending the process-creation message to the Process Manager on that node.
3.2 Process creation primitives
QNX supports three process-creation primitives:
- fork()
- exec()
- spawn()
Both fork() and exec() are defined by POSIX, while the implementation of spawn() is unique to QNX.
fork(): The fork() primitive creates a new process that is an exact image of the calling process. The new process shares the same code as the calling process and inherits a copy of all of the calling process's data.
exec(): The exec() primitive replaces the calling process image with a new process image. There's no return from a successful exec(), because the new process image overlays the calling process image. It's common practice in POSIX systems to create a new process - without removing the calling process - by first calling fork(), and then having the child of the fork() call exec().
spawn(): The spawn() primitive creates a new process as a child of the calling process. It can avoid the need to fork() and exec(), resulting in a faster and more efficient means for creating new processes. Unlike fork() and exec(), which by their very nature operate on the same node as the calling process, the spawn() primitive can create processes on any node in the network.
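The common POSIX practice of calling fork() and then exec() in the child can be sketched as follows. This example uses Python's os module, which wraps the POSIX calls, and assumes a POSIX host; it does not use QNX's spawn(). The function name fork_and_exec and the child command (the Python interpreter exiting with status 7) are hypothetical choices made for illustration.

```python
import os
import sys


def fork_and_exec(argv):
    """Create a child with fork(), replace its image with exec(), and wait for it."""
    pid = os.fork()
    if pid == 0:
        # Child: a successful exec never returns, because the new image
        # overlays this one. Exit with 127 only if exec itself fails.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)
    # Parent: continues to exist and collects the child's exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


exit_code = fork_and_exec([sys.executable, "-c", "import sys; sys.exit(7)"])
print(exit_code)
```

The parent process survives the call and only the child's image is replaced, which is the pattern that a single spawn() call is meant to shorten.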
3.3 The life cycle of a process
A process goes through four phases:
1. Creation
2. Loading
3. Execution
4. Termination
3.3.1 Creation
Creating a process consists of allocating a process ID for the new process and setting up the information that defines the environment of the new process. Most of this information is inherited from the parent of the new process.
3.3.2 Loading
The loading of process images is done by a loader thread. The loader code resides in the Process Manager, but the thread runs under the process ID of the new process. This lets the Process Manager handle other requests while loading programs.
3.3.3 Execution
Once the program code has been loaded, the process is ready for execution; it begins to compete with other processes for CPU resources. All processes run concurrently with their parents. In addition, the death of a parent process does not automatically cause the death of its child processes.
3.3.4 Termination
A process is terminated in either of two ways:
- a signal whose defined action is to cause process termination is delivered to the process
- the process invokes exit(), either explicitly or by default action when returning from main()
Termination involves two stages. In the first stage, a termination thread in the Process Manager is run. This code is in the Process Manager, but the thread runs with the process ID of the terminating process. This thread closes all open file descriptors and releases the following:
- any virtual circuits held by the process
- all memory allocated to the process
- any symbolic names
- any major device numbers (I/O managers only)
- any interrupt handlers
- any proxies
- any timers
In the second stage, after the termination thread has run, notification of process termination is sent to the parent process (this phase runs inside the Process Manager).
3.4 Process manager responsibilities
- message passing - the Microkernel handles the routing of all messages among all processes throughout the entire system.
- scheduling - the scheduler is a part of the Microkernel and is invoked whenever a process changes state as the result of a message or interrupt.
The process manager builds on kernel calls that support the following:
- threads
- message passing
- signals
- timers
- interrupt handlers
- semaphores
- mutual exclusion locks (mutexes)
- condition variables (condvars)
The entire OS is built upon these calls. Neutrino is fully preemptable, even while passing messages between processes; it resumes the message pass where it left off before preemption.
The minimal complexity of the Neutrino microkernel significantly helps place an upper bound on the longest non-preemptable code path through the kernel, while the small code size makes addressing complex multiprocessor issues a tractable problem. Services were chosen for inclusion in the microkernel on the basis of having a short execution path. Operations requiring significant work (e.g. process loading) were assigned to external processes/threads, where the effort to enter the context of that thread would be insignificant compared to the work done within the thread to service the request.
Rigorous application of this rule to dividing the functionality between the kernel and external processes destroys the myth that a microkernel OS must incur higher runtime overhead than a monolithic kernel OS. Given the work done between context switches (implicit in a message pass), and the very quick context-switch times that result from the simplified kernel, the time spent performing context switches becomes “lost in the noise” of the work done to service the requests communicated by the message passing between the processes that make up the QNX OS. The following diagram illustrates the preemptability and interruptibility of the kernel through various stages of processing a message-pass request.
Interrupts are disabled, or preemption is held off, for only very brief intervals. The exit case exhibits the variability of 14 to 40 opcodes only when “error” cases (page faults and memory violations) in application processing occur. For the normal, non-error case, the work to exit the microkernel is only 14 opcodes.
When building an application (real-time, embedded, graphical, or otherwise), the developer may want several algorithms within the application to execute concurrently. Within QNX/Neutrino, this concurrency is achieved by using the POSIX thread model, which defines a process as containing one or more threads of execution.
A thread can be thought of as the minimum “unit of execution”, the unit of scheduling and execution in the microkernel. A process, on the other hand, can be thought of as a “container” for threads, defining the “address space” within which threads will execute. A process will always contain at least one thread. Depending on the nature of the application, threads might execute independently with no need to communicate between the algorithms (unlikely), or they may need to be tightly coupled, with high-bandwidth communications and tight synchronization. To assist in this communication and synchronization, QNX provides a rich variety of IPC and synchronization services.
Neutrino and the process manager can be configured to provide a mix of threads and processes (as defined by POSIX) to create at least the following “concurrent execution” environments:
- A team of threads, all running within the address space of a single process. This is comparable to the classical "real-time kernel" runtime model. Without the assistance of the process manager, QNX can create this runtime environment.
- A team of processes, each containing one thread, with no memory protection between processes.
- A team of processes, each with one thread, with each process running in a separate, MMU-protected address space. This is the typical UNIX-without-threads runtime environment. The optional process manager is required for this environment.
- A team of processes, each containing a team of cooperating threads, with all the processes MMU-protected from each other. This is the modern runtime environment found in UNIX, Windows NT, etc. The optional process manager is required for this environment.
3.5.1 When scheduling decisions are made
The execution of a running thread is temporarily suspended whenever the microkernel is entered as the result of a kernel call, exception, or hardware interrupt. A scheduling decision is made whenever the execution state of any thread changes - it doesn't matter which processes the threads might exist within. Threads are scheduled globally across all processes.
Normally, the execution of the suspended thread will resume, but the scheduler will perform a context switch from one thread to another whenever the running thread:
- is blocked
- is preempted
- yields.
(a) When thread is blocked
The running thread will block when it must wait for some event to occur (response to an IPC request, wait on a mutex, etc.). The blocked thread is removed from the ready queue and the highest priority ready thread is then run. When the blocked thread is subsequently unblocked, it is placed on the end of the ready queue for that priority level.
(b) When thread is preempted
The running thread will be preempted when a higher-priority thread is placed on the ready queue (it becomes READY, as the result of its block condition being resolved). The preempted thread remains at the start of the ready queue for that priority and the higher-priority thread runs.
(c) When thread yields
The running thread voluntarily yields the processor (sched_yield()) and is placed on the end of the ready queue for that priority. The highest-priority thread then runs (which may still be the thread that just yielded).
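The three cases above (block, preempt, yield) can be modeled with one FIFO queue per priority level. This is an illustrative sketch, not the kernel's data structure: the ReadyQueues class, its method names, and the thread names and priorities are hypothetical.

```python
from collections import deque


class ReadyQueues:
    """One FIFO queue per priority; the running thread is the head of the highest one."""

    def __init__(self):
        self.queues = {}  # priority -> deque of thread names

    def _top(self):
        levels = [p for p, q in self.queues.items() if q]
        return self.queues[max(levels)] if levels else None

    def make_ready(self, name, priority):
        # A newly ready (or unblocked) thread joins the end of its priority's queue.
        self.queues.setdefault(priority, deque()).append(name)

    def running(self):
        top = self._top()
        return top[0] if top else None

    def block(self):
        # The running thread leaves the ready queue entirely.
        top = self._top()
        return top.popleft() if top else None

    def yield_cpu(self):
        # The running thread moves to the end of the queue for its priority.
        top = self._top()
        if top:
            top.rotate(-1)


rq = ReadyQueues()
rq.make_ready("A", 10)
rq.make_ready("B", 10)
trace = [rq.running()]       # A runs first
rq.yield_cpu()
trace.append(rq.running())   # A yields, so B runs
rq.make_ready("C", 20)
trace.append(rq.running())   # C preempts B; B stays at the head of level 10
rq.block()
trace.append(rq.running())   # C blocks, so B resumes
rq.block()
trace.append(rq.running())   # B blocks, so A runs
rq.make_ready("B", 10)
trace.append(rq.running())   # B is unblocked behind A, so A keeps running
print(trace)
```

Preemption needs no explicit operation in this model: the preempted thread simply stays at the head of its own queue while a higher-priority queue is non-empty.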
To meet the needs of various applications, QNX provides three scheduling algorithms:
- FIFO scheduling
- round-robin scheduling
- adaptive scheduling
Each thread in the system may run using any one of these methods. They are effective on a per-thread basis, not on a global basis for all threads and processes on a node. These scheduling algorithms apply only when two or more threads that share the same priority are READY (i.e. the threads are directly competing with each other). If a higher-priority thread becomes READY, it immediately preempts all lower-priority threads. Although a thread inherits its scheduling algorithm from its parent process, the thread can request to change the algorithm applied by the kernel.
FIFO scheduling: In FIFO scheduling, a thread selected to run continues executing until it:
- voluntarily relinquishes control (e.g. it blocks)
- is preempted by a higher-priority thread
Round-robin scheduling: In round-robin scheduling, a thread selected to run continues executing until it:
- voluntarily relinquishes control
- is preempted by a higher-priority thread
- consumes its timeslice
A timeslice is the unit of time assigned to every process. Once it consumes its timeslice, a thread is preempted and the next READY thread at the same priority level is given control. A timeslice is 50 milliseconds.
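Round-robin behavior among threads of equal priority can be sketched with a simple queue rotation. This is a minimal illustration that assumes no thread blocks and no higher-priority thread arrives; the function name and the per-thread work amounts are hypothetical, while the 50 ms timeslice is the value stated above.

```python
from collections import deque


def round_robin(work_ms, timeslice_ms=50):
    """Return the sequence of (thread, ms run) slices for equal-priority threads."""
    queue = deque(work_ms.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(remaining, timeslice_ms)
        timeline.append((name, ran))
        if remaining > ran:
            # Timeslice consumed: the thread goes to the back of the queue.
            queue.append((name, remaining - ran))
    return timeline


# Hypothetical CPU demand per thread, in milliseconds.
timeline = round_robin({"A": 120, "B": 50, "C": 70})
print(timeline)
```

Under FIFO scheduling the same threads would instead run to completion one after another, since nothing forces a running thread off the processor at the end of a timeslice.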
Adaptive scheduling: In adaptive scheduling, a thread behaves as follows:
- If the thread consumes its timeslice (i.e. it doesn't block), its priority is reduced by 1. This is known as priority decay. A "decayed" thread won't continue decaying, even if it consumes yet another timeslice without blocking - it will drop only one level below its original priority.
- If the thread blocks, it immediately reverts to its original priority.
Adaptive scheduling can be used in environments where potentially compute-intensive background threads share the computer with interactive users. Adaptive scheduling gives the compute-intensive threads sufficient access to the CPU, yet retains fast interactive response for other threads. It is rarely used in real-time control systems.
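The priority decay rule of adaptive scheduling can be written down in a few lines. This is a sketch of the rule as described above, with a hypothetical AdaptiveThread class and priority values; it is not kernel code.

```python
class AdaptiveThread:
    """Tracks the priority of one thread under the adaptive rule."""

    def __init__(self, priority):
        self.original = priority
        self.current = priority

    def consumed_timeslice(self):
        # Decay by one level, but never more than one below the original priority.
        self.current = self.original - 1

    def blocked(self):
        # Blocking immediately restores the original priority.
        self.current = self.original


thread = AdaptiveThread(priority=10)
levels = [thread.current]
thread.consumed_timeslice()
levels.append(thread.current)
thread.consumed_timeslice()
levels.append(thread.current)
thread.blocked()
levels.append(thread.current)
print(levels)
```

A compute-bound thread therefore settles one level below its original priority, while a thread that blocks regularly, as interactive threads do, keeps its full priority.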
3.6 QNX IPC:
IPC plays a fundamental role in the transformation of QNX/Neutrino from an embedded realtime kernel into a full-scale POSIX operating system. As various service-providing processes are added to the Neutrino microkernel, IPC is the “glue” that connects those components into a cohesive whole.
Although message passing is the primary form of IPC in QNX/Neutrino, several other forms are available as well. Unless otherwise noted, those other forms of IPC are built over QNX message passing. The strategy is to create a simple, robust IPC service that can be tuned for performance through a simplified code path in the microkernel; more “feature cluttered” IPC services can then be implemented from these. As part of the engineering effort that went into defining the Neutrino microkernel, the focus on message passing as the fundamental IPC primitive was deliberate. As a form of IPC, message passing is synchronous and copies data.
3.6.1 Synchronous message passing
A thread that does a MsgSendv() to another thread (which could be within another process) will be blocked until the target thread does a MsgReceivev(), processes the message, and executes a MsgReplyv(). If a thread executes a MsgReceivev() without a previously sent message pending, it will block until another thread executes a MsgSendv().
A thread undergoing state changes in a typical send-receive-reply transaction.
This inherent blocking synchronizes the execution of the sending thread, since the act of requesting that the data be sent also causes the sending thread to be blocked and the receiving thread to be scheduled for execution - this happens without requiring explicit work by the kernel to determine which thread to run next (as would be the case with most other forms of IPC). Execution and data move directly from one context to another.
Possible process states in a QNX system
Data queuing capabilities are omitted from these messaging primitives because queueing could be implemented when needed within the receiving thread. The sending thread is often prepared to wait for a response; queueing is unnecessary overhead and complexity (i.e. it slows down the non-queued case). As a result, the sending thread doesn't need to make a separate, explicit blocking call to wait for a response, as it would if some other IPC form had been used.
While the send and receive operations are blocking and synchronous, MsgReplyv() (or MsgError()) doesn't block. Since the client thread is already blocked waiting for the reply, no additional synchronization is required, so a blocking MsgReplyv() isn't needed. This allows a server to reply to a client and continue processing while the kernel and/or networking code asynchronously passes the reply data to the sending thread and marks it ready for execution. As most servers will tend to do some processing to prepare to receive the next request (at which point they block again), this works out well.
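The blocking behavior of the send-receive-reply cycle can be imitated with ordinary threads. This is only an analogy in Python, not the MsgSendv(), MsgReceivev(), and MsgReplyv() calls themselves: the Channel class and its methods are hypothetical, and queue.Queue is used here purely as a blocking hand-off, not as a claim about how the kernel moves data.

```python
import queue
import threading


class Channel:
    """Toy send/receive/reply channel between a client and a server thread."""

    def __init__(self):
        self._requests = queue.Queue()

    def send(self, message):
        # The client blocks here until the server has replied.
        reply_box = queue.Queue(maxsize=1)
        self._requests.put((message, reply_box))
        return reply_box.get()

    def receive(self):
        # The server blocks here until some client has sent a message.
        return self._requests.get()

    @staticmethod
    def reply(reply_box, result):
        # Replying does not block: the client is already waiting for it.
        reply_box.put(result)


def server(channel, count):
    for _ in range(count):
        message, reply_box = channel.receive()
        Channel.reply(reply_box, message.upper())


channel = Channel()
worker = threading.Thread(target=server, args=(channel, 2))
worker.start()
replies = [channel.send("open"), channel.send("read")]
worker.join()
print(replies)
```

Because the client cannot continue until the reply arrives, the two threads stay synchronized without any extra locking in the application code.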
Chapter 4
4.1 The Device Manager
The QNX Device Manager (Dev) is the interface between processes and terminal devices. These terminal devices are located in the I/O namespace with names starting with /dev. For example, a console device on QNX would have a name such as: /dev/con1
QNX programs access terminal devices using the standard read(), write(), open(), and close() functions. A terminal device is presented to a QNX process as a bidirectional stream of bytes that can be read or written by the process. The Device Manager regulates the flow of data between an application and the device. Some processing of this data is performed by Dev according to parameters in a terminal control structure (called termios), which exists for each device. The termios parameters control low-level functionality such as:
- line-control discipline (including baud rate, parity, stop bits, and data bits)
- echoing of characters
- input line editing
- recognizing, and acting on, breaks and hangups
- software and hardware flow control
- translation of output characters
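The kind of control that termios parameters give over a terminal device can be tried out through the POSIX termios interface. This sketch uses Python's termios module on a pseudo-terminal, which assumes a POSIX host with pseudo-terminal support; it does not talk to the QNX Device Manager, and the choice of flags to change is only an example.

```python
import os
import termios

# Open a pseudo-terminal pair so the example does not depend on a real device.
master_fd, slave_fd = os.openpty()

# tcgetattr returns [iflag, oflag, cflag, lflag, ispeed, ospeed, cc].
attrs = termios.tcgetattr(slave_fd)
attrs[3] &= ~(termios.ECHO | termios.ICANON)  # no echo, no input line editing
termios.tcsetattr(slave_fd, termios.TCSANOW, attrs)

# Read the settings back to confirm that the device now reports them.
lflag = termios.tcgetattr(slave_fd)[3]
echo_enabled = bool(lflag & termios.ECHO)
line_editing_enabled = bool(lflag & termios.ICANON)
print(echo_enabled, line_editing_enabled)

os.close(slave_fd)
os.close(master_fd)
```

The application itself still uses plain read() and write() on the device; the termios settings only change how the data is processed on its way through.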
System consoles are managed by the Dev.con driver process. The display adapter and the screen, plus the system keyboard, are collectively referred to as the console.
QNX permits multiple sessions to be run concurrently on consoles by means of virtual consoles. The Dev.con console driver process typically manages more than one set of I/O queues to Dev, which are made available to user processes as a set of terminal devices with names like /dev/con1, /dev/con2, etc. From the application's point of view, there “really are” multiple consoles available to be used. Of course, there's only one physical screen and keyboard, so only one of these virtual consoles is actually displayed at any one time. The keyboard is “attached” to whichever virtual console is currently visible.
4.4 Serial devices
Serial communication channels are managed by the Dev.ser driver process. This driver can manage more than one physical channel; it provides terminal devices with names such as /dev/ser1, /dev/ser2, etc. When we start Dev.ser, we can specify command-line arguments that determine which - and how many - serial ports are installed. Dev.ser is an example of a purely interrupt-driven I/O server. After initializing the hardware, the process itself goes to sleep. Received interrupts place input data directly into the input queue. The first output character on an idle channel is transmitted to the hardware when Dev issues the first kick call into the driver. Subsequent characters are transmitted by the appropriate interrupt being received.
4.5 Parallel devices
Parallel printer ports are managed by the Dev.par driver process. When we start Dev.par, we specify a command-line argument that determines which parallel port is installed. Dev.par is an output-only driver, so it has no input or canonical input queues. Dev.par is an example of a completely non-interrupt I/O server. The parallel printer process normally remains RECEIVE-blocked, waiting for data to appear in its output queue and a kick from Dev. When data is available to print, Dev.par runs in a busy-wait loop (at relatively low adaptive priority), while waiting for the printer hardware to accept characters. This low-priority busy-wait loop ensures that overall system performance isn't affected, yet on average produces the maximum possible throughput to the parallel device.
Chapter 5
5.1 The File system Manager
QNX/Neutrino provides a rich variety of file systems. Like most service-providing processes in the OS, these file systems execute outside the kernel; applications use them by communicating via messages generated by the shared-library implementation of the POSIX API. The file systems are resource managers. Each file system adopts a portion of the pathname space and provides file system services through the standard POSIX API (open(), close(), read(), write(), lseek(), etc.).
This implementation means that:
- filesystems may be started and stopped dynamically
- multiple filesystems may run concurrently
- applications are presented with a single unified pathname space and interface, regardless of the configuration and number of underlying filesystems.
5.2 File system classes
The available filesystems can be categorized into four classes:
Image: A special filesystem that presents the modules in the image and is always present.
Block: Traditional filesystems that operate on block devices like hard disks and CD-ROM drives. This includes the POSIX, DOS, and CD-ROM filesystems.
Flash: Non-block-oriented filesystems designed explicitly for the characteristics of Flash memory devices. This includes the Flash filesystem.
Network: Filesystems that provide network file access to the filesystems on remote host computers. This includes the NFS and CIFS filesystems.
Since it's common to run many filesystems under Neutrino, they have been designed as a family of drivers and DLLs to maximize code reuse. This means the cost of adding a filesystem is typically smaller than might otherwise be expected. Once an initial filesystem is running, the incremental memory cost for additional filesystems is minimal, since only the code to implement the new filesystem protocol is added to the system.
In QNX, a file is an object that can be written to, read from, or both. QNX implements at least six types of files; five of these are managed by Fsys:
- Regular files - consist of randomly accessible sequences of bytes and have no other predefined structure.
- Directories - contain the information needed to locate regular files; they also contain status and attribute information for each regular file.
- Symbolic links - contain a pathname to a file or directory that is to be accessed in place of the symbolic link file. These files are often used to provide multiple paths to a single file.
- Pipes and FIFOs - serve as I/O channels between cooperating processes.
- Block special files - refer to devices, such as disk drives, tapes, and disk drive partitions. These files are normally accessed in a manner that hides the hardware characteristics of the device from applications.
The sixth filetype, the character special file, is managed by the Device Manager. Other filetypes may be managed by other managers.
5.3 File access
Access to regular files and directories is controlled by mode bits stored in the file's inode. These bits permit read, write, and execute capability based on effective user and group IDs. There are three access qualifiers:
- user only
- group only
- others
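The check described above can be sketched as follows. This is a simplified model, not the QNX implementation: it uses the standard POSIX mode-bit constants from Python's stat module, consults exactly one class of bits (user, else group, else others), and ignores superuser privileges and supplementary groups. The function name may_access and the sample IDs are hypothetical.

```python
import stat

def may_access(mode, file_uid, file_gid, euid, egid, want):
    """Return True if the caller may perform `want` ('r', 'w' or 'x').

    Only one class of bits is consulted: user, else group, else others.
    """
    user_bit, group_bit, other_bit = {
        "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
        "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
        "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
    }[want]
    if euid == file_uid:
        return bool(mode & user_bit)
    if egid == file_gid:
        return bool(mode & group_bit)
    return bool(mode & other_bit)

mode = 0o640  # rw- for user, r-- for group, --- for others
owner_can_write = may_access(mode, 100, 20, euid=100, egid=20, want="w")
group_can_write = may_access(mode, 100, 20, euid=101, egid=20, want="w")
group_can_read = may_access(mode, 100, 20, euid=101, egid=20, want="r")
other_can_read = may_access(mode, 100, 20, euid=102, egid=30, want="r")
```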
5.4 Regular files and Directories
QNX views a regular file as a randomly accessible sequence of bytes that has no other predefined internal structure. Application programs are responsible for understanding the structure and content of any specific regular file. Regular files constitute the majority of files found in file systems. File systems are supported by the File system Manager and are implemented on top of the block special files that define disk partitions.
A directory is a file that contains directory entries. Each directory entry associates a filename with a file. A filename is the symbolic name that lets you identify and access a file.
In QNX, file data can be referenced by more than one name. Each filename is called a link. There are two kinds of links: hard links, which we refer to simply as “links,” and symbolic links. In order to support links for each file, the filename is separated from the other information that describes a file. The non-filename information is kept in a storage table called an inode (for “information node”). If a file has only one link (i.e. one filename), the inode information (i.e. the non-filename information) is stored in the directory entry for the file. If the file has more than one link, the inode is stored as a record in a special file named /.inodes, and the file's directory entries point to that inode record.
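The difference between the two kinds of links can be observed through the standard POSIX calls, here via Python's os module. This is a generic sketch run in a temporary directory on whatever filesystem the host provides, so it shows the link semantics rather than QNX's on-disk /.inodes layout. The filenames are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "report.txt")
    hard = os.path.join(d, "report-link.txt")
    soft = os.path.join(d, "report-symlink.txt")

    with open(original, "w") as f:
        f.write("data")

    os.link(original, hard)      # hard link: a second filename for the same inode
    os.symlink(original, soft)   # symbolic link: a separate file holding a pathname

    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(original).st_nlink
    symlink_target = os.readlink(soft)

    os.remove(original)          # remove one name; the file data survives
    with open(hard) as f:
        survived = f.read()
    dangling = not os.path.exists(soft)   # the symlink's target name is gone
```

Removing one hard link leaves the data reachable through the remaining name, whereas the symbolic link stores only a pathname and is left dangling.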
5.5 Extents
In QNX, regular files and directory files are stored as a sequence of extents. An extent is a contiguous set of blocks on disk. Files that have only a single extent store the extent information in the directory entry. But if more than one extent is needed to hold the file, the extent location information is stored in one or more linked extent blocks. Each extent block can hold location information for up to 60 extents.
Figure: A file consisting of multiple consecutive regions on a disk, called extents in QNX.
When the Filesystem Manager needs to extend a file whose last extent is full, it first tries to extend the last extent, even if only by one block. But if the last extent can't be extended, a new extent is allocated to extend the file. To allocate new extents, the Filesystem Manager uses a “first fit” policy. A special table in the Filesystem Manager contains an entry for each block represented in the /.bitmap file. Each of these entries defines the largest contiguous free extent in the area defined by its corresponding block. The Filesystem Manager chooses the first entry in this table large enough to satisfy the request for a new extent.
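The “first fit” choice can be sketched in a few lines. The table below is a hypothetical stand-in for the Filesystem Manager's internal table: each entry holds the size, in blocks, of the largest contiguous free extent in one bitmap area. The function name and the sample values are assumptions made for illustration.

```python
def first_fit(largest_free, blocks_needed):
    """Return the index of the first table entry whose largest free extent
    can hold `blocks_needed` blocks, or None if no entry is large enough."""
    for index, free_blocks in enumerate(largest_free):
        if free_blocks >= blocks_needed:
            return index
    return None

# Hypothetical table: largest contiguous free run (in blocks) per bitmap area.
table = [3, 0, 12, 40, 12]
chosen = first_fit(table, 10)
too_big = first_fit(table, 100)
```

Note that first fit takes entry 2 (12 blocks) for a 10-block request even though entry 4 would fit equally well; the policy trades placement quality for a short search.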
Besides files with more than one link, there are two other situations in which a file can have an entry in the /.inodes file:
- If a file's filename is longer than 16 characters, the inode information is stored in the /.inodes file, making room for a 48-character filename in the directory entry.
- If a file has had more than one link and all links but one have been removed, the file continues to have a separate /.inodes file entry. This is done because the overhead of searching for the directory entry that points to the inode entry would be prohibitive (there are no back links from inode entries to directory entries).
5.6 High-performance disk access
The Filesystem Manager has several features that contribute to high-performance disk access:
- elevator seeking
- buffer cache
- multi-threading
- client-driven priority
- temporary files
- ramdisks
5.6.1 Elevator seeking
Elevator seeking minimizes the overall seek time required to read or write data from or to disk. Outstanding I/O requests are ordered such that they can all be performed with one sweep of the disk head assembly, from the lowest to the highest disk address. Elevator seeking also has integrated enhancements to ensure that multi-sector I/O is performed whenever possible.
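A minimal sketch of this ordering, assuming requests are identified by block address: pending requests are sorted for a single low-to-high sweep, and consecutive addresses are merged into runs that could each be issued as one multi-sector transfer. The function names and the merge rule are illustrative assumptions, not the Filesystem Manager's actual algorithm.

```python
def elevator_order(requests):
    """Sort pending block addresses for a single low-to-high sweep."""
    return sorted(requests)

def coalesce(sorted_blocks):
    """Merge consecutive block addresses into (start, count) runs so that
    each run can be issued as one multi-sector transfer."""
    runs = []
    for block in sorted_blocks:
        if runs and block == runs[-1][0] + runs[-1][1]:
            start, count = runs[-1]
            runs[-1] = (start, count + 1)
        else:
            runs.append((block, 1))
    return runs

pending = [98, 12, 13, 200, 14, 97]
sweep = elevator_order(pending)
transfers = coalesce(sweep)
```

Six requests that arrived in arbitrary order become three transfers performed in one pass of the head.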
5.6.2 Buffer cache
The buffer cache is an intelligent buffer between the Filesystem Manager and the disk driver. The buffer cache attempts to store filesystem blocks in order to minimize the number of times the Filesystem Manager has to access the disk. By default, the size of the cache is determined by total system memory, but you can specify a different size via an option to Fsys.
Read operations are synchronous. Write operations, on the other hand, are usually asynchronous. When the data enters the cache, the Filesystem Manager replies to the client process to indicate that the data is written. The data is then written to the disk as soon as possible, typically less than five seconds later.
Applications can modify write behavior on a file-by-file basis. For example, a database application can cause all writes for a given file to be performed synchronously. This would ensure a high level of file integrity in the face of potential hardware or power problems that might otherwise leave a database in an inconsistent state.
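The text does not say which interface QNX offers for per-file synchronous writes, so the sketch below uses the portable POSIX fsync() call (through Python's os module) as a stand-in: the function returns only after the operating system reports that the data has reached the device. The function name write_durably and the file name are hypothetical.

```python
import os
import tempfile

def write_durably(path, data):
    """Write `data` to `path` and return only after it has been flushed
    from the cache to the storage device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.write(fd, data)
        os.fsync(fd)   # block until cached data for this file is on the device
    finally:
        os.close(fd)
    return written

with tempfile.TemporaryDirectory() as d:
    record_path = os.path.join(d, "ledger.db")
    n = write_durably(record_path, b"balance=42\n")
    with open(record_path, "rb") as f:
        stored = f.read()
```

A database would typically apply this only to files whose consistency matters, since every such write waits for the disk instead of returning as soon as the data is cached.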
5.6.3 Multi-threading
The Filesystem Manager is a multi-threaded process. That is, it can manage several I/O requests simultaneously. This allows the Filesystem Manager to fully exploit potential parallelism since it can do both of the following:
- access several devices in parallel
- satisfy I/O requests from the buffer cache while other I/O requests that access physical disks are in progress
5.6.4 Client-driven priority
The Filesystem Manager may have its priority driven by the priority of the processes that send it messages. When the Filesystem Manager receives a message, its priority is set to that of the process that sent the message.
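The idea can be modeled with a toy server whose priority is set from each incoming message. This is only a conceptual sketch: the Message and Server classes, the process names, and the priority values are all invented for illustration, and a real system adjusts scheduling priority in the kernel rather than in a field.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    sender_priority: int
    request: str

class Server:
    """Toy server whose priority floats to that of the current sender."""
    def __init__(self, base_priority):
        self.priority = base_priority
        self.log = []

    def receive(self, msg):
        self.priority = msg.sender_priority   # take on the client's priority
        self.log.append((msg.sender, self.priority))

fsys = Server(base_priority=10)
fsys.receive(Message("logger", 5, "write"))
fsys.receive(Message("control_loop", 25, "read"))
```

The effect is that a request from a high-priority client is serviced at high priority, while work done on behalf of a low-priority client cannot crowd out more urgent processes.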
5.6.5 Temporary files
QNX has a performance option for opening temporary files that are written and then reread in a short period of time. For such files, the Filesystem Manager attempts to keep the data blocks in the cache and will write the blocks to disk only if absolutely necessary.
5.6.6 Ramdisks
The Filesystem Manager has an integrated ramdisk capability that allows up to 8 MB of memory to be used as a simulated disk. Since the Filesystem Manager uses highly efficient multipart messaging, data moves from the ramdisk directly to the application buffers.
The Filesystem Manager is able to bypass the buffer cache because the ramdisk is built in, not implemented as a driver. Because they eliminate the delays of physical hardware and don't rely on the filesystem cache, ramdisks provide greater determinism in read/write operations than hard disks.
5.7 Filesystem robustness
The QNX filesystem achieves high throughput without sacrificing reliability. This has been accomplished in several ways.
While most data is held in the buffer cache and written after only a short delay, critical filesystem data is written immediately. Updates to directories, inodes, extent blocks, and the bitmap are forced to disk to ensure that the filesystem structure on disk is never corrupt (i.e. the data on disk should never be internally inconsistent).
Sometimes all of the above structures must be updated. For example, if you move a file to a directory and the last extent of that directory is full, the directory must grow. In such cases, the order of operations has been carefully chosen such that if a catastrophic failure occurs with the operation only partially completed (e.g. a power failure), the filesystem, upon rebooting, would still be “sane.” At worst, some blocks may have been allocated, but not used.
5.8 Filesystem recovery
Even in the best systems, true catastrophes such as these may happen:
- Bad blocks may develop on a disk because of power surges or brownouts.
- A naive or malicious user with access to superuser privileges might reinitialize the filesystem (via the dinit utility).
- An errant program (especially one run in a non-QNX environment) may ignore the disk partitioning information and overwrite a portion of the QNX partition.
So that as many files as possible can be recovered if such events ever occur, unique “signatures” are written on the disk to aid in the automatic identification and recovery of the critical filesystem pieces. The inodes file (/.inodes), as well as each directory and extent block, all contain unique patterns of data that the chkfsys utility can use to reassemble a truly damaged filesystem.
5.9 Resource managers
In order to give QNX/Neutrino a great degree of flexibility, to minimize the runtime memory requirements of the final system, and to cope with the wide variety of devices that may be found in a custom embedded system, QNX allows user-written processes to act as resource managers that can be started and stopped dynamically.
Neutrino resource managers are responsible for presenting an interface to various types of devices. This may involve managing actual hardware devices (like serial ports, parallel ports, network cards, and disk drives) or virtual devices (like /dev/null, a network filesystem, and pseudo-ttys). In other operating systems, this functionality is traditionally associated with device drivers. But unlike device drivers, Neutrino's resource managers don't require any special arrangements with the kernel. In fact, a resource manager looks just like any other user-level program.
Because QNX/Neutrino is a distributed, microkernel OS with virtually all non-kernel functionality provided by user-installable programs, a clean and well-defined interface is required between client programs and resource managers. All resource manager functions are documented; there's no private interface between the kernel and a resource manager. In fact, a resource manager is basically a user-level server program that accepts messages from other programs and, optionally, communicates with hardware. Again, the power and flexibility of the native QNX/Neutrino IPC services allow the resource manager to be decoupled from the OS.
One of the salient features of Neutrino is the ability to use threads. By using multiple threads, a resource manager can be structured so that several threads wait for messages and then handle them simultaneously. This thread management is another convenient function provided by the resource manager shared library. Besides keeping track of both the number of threads created and the number of threads waiting, the library also takes care of maintaining the optimal number of threads.
The binding between the resource manager and the client programs that use the associated resource is done through a flexible mechanism called pathname space mapping.
In pathname space mapping, an association is made between a pathname and a resource manager. The resource manager sets up this pathname space mapping by informing the QNX/Neutrino Process Manager that it is the one responsible for handling requests at (or below, in the case of filesystems), a certain mountpoint. This allows the Process Manager to associate services (i.e. functions provided by resource managers) with pathnames.
For example, a serial port may be managed by a resource manager called Devc.ser, but the actual resource may be called /dev/ser1 in the pathname space. Therefore, when a program requests serial port services, it typically does so by opening a serial port - in this case /dev/ser1.
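Pathname space mapping can be pictured as a table from mountpoints to managers. The sketch below is a toy model, not the Process Manager's data structure: the PathnameSpace class is hypothetical, and the longest-matching-mountpoint rule is an assumption chosen to reflect that a manager handles requests "at or below" its mountpoint.

```python
class PathnameSpace:
    """Toy stand-in for the Process Manager's pathname table."""
    def __init__(self):
        self.mounts = {}

    def attach(self, mountpoint, manager):
        """Record that `manager` handles requests at or below `mountpoint`."""
        self.mounts[mountpoint.rstrip("/") or "/"] = manager

    def resolve(self, path):
        """Return (manager, remainder) for the longest matching mountpoint."""
        best = None
        for mountpoint in self.mounts:
            prefix = mountpoint if mountpoint.endswith("/") else mountpoint + "/"
            if path == mountpoint or path.startswith(prefix):
                if best is None or len(mountpoint) > len(best):
                    best = mountpoint
        if best is None:
            raise FileNotFoundError(path)
        return self.mounts[best], path[len(best):].lstrip("/")

space = PathnameSpace()
space.attach("/", "filesystem")
space.attach("/dev/ser1", "Devc.ser")
serial = space.resolve("/dev/ser1")
regular = space.resolve("/home/user/notes.txt")
lookalike = space.resolve("/dev/ser10")
```

An open of /dev/ser1 is routed to the serial manager, while any path without a more specific mountpoint falls through to the filesystem attached at the root.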
All communications between the client program and the resource manager are done through native QNX/Neutrino IPC messaging. This allows for a number of unique features:
- A well-defined interface to application programs. In a development environment, this allows a very clean division of labor for the implementation of the client side and the resource manager side.
- A simple interface to the resource manager. Because all interactions with the resource manager go through QNX/Neutrino IPC, and there are no special "back door" hooks or arrangements with the OS, the writer of a resource manager can focus on the task at hand, rather than worry about all the special considerations needed in other operating systems.
- Free network transparency. Because the underlying QNX/Neutrino IPC messaging mechanism is inherently network-distributed without any additional effort required by the client or server (resource manager), programs can seamlessly access resources on other nodes in the network without even being aware that they're going over a network.
The resource manager architecture contains three parts:
1. A channel has to be created, so that client programs can connect to the resource manager to send it messages.
2. The pathname (or pathnames) that the resource manager is going to be responsible for is registered with the Process Manager, so that it can resolve open requests for that particular pathname to this resource manager.
3. Messages are received and processed.
This message-processing structure is required for each and every resource manager. However, QNX provides a set of convenient library functions to handle this functionality (and other key functionality as well). By supporting pathname space mapping, having a well-defined interface to resource managers, and providing a set of libraries for common resource manager functions, QNX/Neutrino offers the developer considerable flexibility and simplicity in developing "drivers" for new hardware - a critical feature for many embedded systems.
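The three parts listed above can be sketched with ordinary threads and queues. This is a conceptual model only: a queue stands in for the channel, a dictionary stands in for the Process Manager's registry, and the message format, the class name ToyResourceManager, and the path /dev/demo are all invented for the example. It does not use the QNX resource manager library.

```python
import queue
import threading

class ToyResourceManager:
    def __init__(self, pathname, registry):
        self.channel = queue.Queue()          # part 1: create a channel
        registry[pathname] = self.channel     # part 2: register the pathname
        self.store = bytearray()

    def serve(self):
        """Part 3: receive messages and reply to each until told to stop."""
        while True:
            kind, payload, reply = self.channel.get()
            if kind == "write":
                self.store += payload
                reply.put(len(payload))
            elif kind == "read":
                reply.put(bytes(self.store))
            else:                             # "stop" or anything unknown
                reply.put(None)
                if kind == "stop":
                    return

def send(registry, pathname, kind, payload=b""):
    """Client side: find the channel by pathname, send, wait for the reply."""
    reply = queue.Queue(maxsize=1)
    registry[pathname].put((kind, payload, reply))
    return reply.get(timeout=5)

registry = {}                                 # stands in for the Process Manager
manager = ToyResourceManager("/dev/demo", registry)
worker = threading.Thread(target=manager.serve)
worker.start()

written = send(registry, "/dev/demo", "write", b"hello")
contents = send(registry, "/dev/demo", "read")
send(registry, "/dev/demo", "stop")
worker.join()
```

The client never calls the manager directly; it only knows a pathname and exchanges messages, which is what lets the manager be started, stopped, or replaced independently.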
Chapter 6
Conclusion:
Tiny yet powerful, the QNX microkernel lies at the heart of the QNX RTOS. As we have seen, QNX delivers core real-time services for embedded applications, including message passing, POSIX thread services, mutexes, condition variables, semaphores, signals, and scheduling. It can be smoothly extended to support file systems, networking and other OS-level capabilities with off-the-shelf, service-providing modules.
QNX also scales without special coding. Systems can be designed around a single-processor architecture or, for greater scalability, built as dual-processor systems that act as true, tightly coupled symmetric multiprocessors.
In conclusion, its small size, scalability, flexibility and reliability make QNX well suited to real-time applications.
Bibliography
· William Stallings, “Operating Systems: Internals and Design Principles,” 4th Edition, Prentice-Hall, Inc.
· http://www.qnx.com/nc
· http://swd.de/documents/manuals
· http://rtoq4viop.com