Chapter 4 - DSpace at Waseda University


Analyzing Real-Time Performance Problems in Embedded Linux

(Japanese title, translated: Analysis of Problems Concerning Kernel Real-Time Performance in Embedded Linux)

A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE
AND THE COMMITTEE ON GRADUATE STUDIES
OF WASEDA UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF ENGINEERING

July 2010

Ki Duk Kwon


Abstract

Embedded systems are now used in many fields, including home appliances, smartphones, PDAs, and automobiles. Embedded systems, which are usually designed for a specific purpose, are advancing rapidly along with embedded software technology. Among embedded software technologies, embedded operating systems have changed especially quickly. Even when an embedded system currently operates without problems, an unexpected error can still occur. No system is one hundred percent perfect, and even commercial embedded systems can suffer kernel problems.

When developing embedded systems, problems can usually be categorized into two groups: user level [59] and kernel level. A user-level problem is not that hard to fix, since many development and debugging tools are available. In contrast, a problem at the kernel level is much more difficult to fix. This is because tools for kernel development usually provide only minimal functions, and in many cases those functions do not help fix the problem. Because of these characteristics of embedded systems, it is not easy to propose a solution for a problem that occurs during project development. Moreover, the structure of embedded systems is rapidly becoming more complex. Therefore, a framework for performance measurement and analysis is urgently needed in order to analyze and solve problems in these systems.

In this dissertation, we propose a system architecture called the Kernel Analysis System (KAS), which analyzes the event log of an embedded kernel. KAS identifies kernel problems quickly and has three main layers. The first is the Detection Layer, in which KAS finds problems by checking all events that occurred in the kernel and counting them. The second is the Separation Layer, in which KAS separates the events related to the problems from all executed events. The third is the Analysis Layer, in which KAS analyzes the problems by calculating the running time of every event and the number of error occurrences in order to determine the cause of the problems. KAS cannot fix every problem. So far, KAS has been tested only on the analysis of HRTimer problems, and it cannot yet analyze other problems. However, by applying KAS to the kernel timer, one of the most important and difficult areas of the kernel, we show that a developer or administrator can analyze timer problems quickly and efficiently.


Contents

Abstract

1. Introduction
1.1 Motivation
1.2 Challenge
1.3 Contribution
1.4 Outline

2. Background
2.1 Embedded System
2.2 Linux Kernel
2.3 Embedded Linux
2.4 System Monitoring
2.5 Event Log
2.6 Problem Analysis
2.7 Linux Timer

3. Related Work
3.1 Event Log
3.1.1 Event Logging
3.1.2 Event Log Monitoring
3.2 System Monitoring
3.3 Performance Analysis Tools
3.4 Linux Trace Toolkit Next Generation

4. System Framework
4.1 Introduction
4.2 Kernel Analysis System
4.2.1 Detection Layer
4.2.1.1 When developing Embedded System
4.2.1.2 Used by users
4.2.1.3 Process Flow of DL
4.2.2 Separation Layer
4.2.3 Analysis Layer
4.3 Kernel Analysis System Algorithm
4.3.1 Important function and parameter
4.4 Trace Point and Event Log
4.5 LTTng and KAS
4.6 Summary

5. Case Study
5.1 Timer Latency
5.2 Preemptive vs. Non-preemptive
5.2.1 Preemptive Kernel
5.2.2 Non-preemptive Kernel
5.3 High Resolution Timer
5.4 Latency Policy
5.5 Evaluation
5.5.1 Result of KAS
5.5.1.1 Result of DL
5.5.1.2 Result of SL
5.5.1.3 Result of AL
5.5.2 Analysis of HRTimer Latency
5.6 Summary

6. Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
6.2.1 Real-Time Architecture of KAS
6.2.2 KAS and CABI

Appendix
A.1 RTOS
A.2 RT-Linux
A.3 Real-Time Scheduling
A.4 CABI

Bibliography

Acknowledgements

Publication List


List of Tables

2.1 General monitoring tools by Resource
4.1 Important parameters in KAS
4.2 Important functions in KAS
5.1 Result of Detection Layer
5.2 Result of Analysis Layer
5.3 Result of Analysis Layer


List of Figures

2.1 User-space vs. kernel-space
2.2 Process of general event logging
2.3 The execution of local timer soft interruption handler
3.1 Flow chart of the event log
3.2 Example of the free command
3.3 Process viewer for Linux
3.4 Mevalet viewer’s execution
3.5 LTTV viewer’s execution
3.6 Event logging sequence of LTTng
4.1 Normal method of event logging and analysis
4.2 Kernel Analysis System Architecture
4.3 Event log process flow of KAS
4.4 Problem definition
4.5 Process flow of detection layer
4.6 Event log separated in SL
4.7 Process flow of Separation Layer
4.8 Process flow of analysis layer
4.9 Dependency of each module in KAS
4.10 Pseudo code of KAS
4.11 Parameter and event log
4.12 Relation between trace point and event log
4.13 An example showing the usage of trace point in LTTng and KAS
4.14 Basic trace point offered by LTTng
4.15 Procedure of problem analysis process of LTTng and KAS
5.1 Task Preemption Latency Model
5.2 Priority Task Latency Model
5.3 Process of interrupt of preemptive kernel
5.4 Process of interrupt of non-preemptive kernel
5.5 Hrtimer latency model
5.6 Source code of setitimer
5.7 Set up setitimer
5.8 Periodic process of HRTimer is set by 100μs
5.9 Source code of DL for line information
5.10 Result of Separation Layer
5.11 Event log of part where delay occurred
5.12 One of the reasons of HRTimer latency
5.13 Result of analysis of HRTimer latency
5.14 Kernel source of softirq modified HRTimer
5.15 Result of experiment on Linux-RT and general Linux in 100μs
5.16 Result of an experiment on Linux-RT and general Linux in 1ms
5.17 Result of an experiment of HRTimer latency in Linux-2.6.24
5.18 Result of experiment of HRTimer latency in Linux-2.6.24-changed-softirq
5.19 Result of experiment of HRTimer latency in Linux-2.6.24-rt-patched
6.1 Real-Time Architecture of KAS
6.2 KAS and CABI
A.1 Classification of real-time scheduling algorithm
A.2 Control the consumptions of the resources by CABI


Chapter 1

Introduction

Today, thanks to the rapid development of hardware, both embedded hardware and embedded software have advanced dramatically. In addition, much research on embedded systems is under way, especially in ubiquitous systems and cyber-physical systems (CPS) [62]. Smartphones [57] and PDA phones, for example, are spreading rapidly. A PDA phone, which is a mobile phone with PDA functions, is equipped with a high-performance CPU and a general-purpose operating system (OS) to provide various functions, including multimedia features. A smartphone, which is a mobile phone offering advanced PC-like capabilities, supports not only PDA functions but also remote control, Internet access, and user-friendly interfaces such as a touch screen and handwriting input. Moreover, because it supports wireless Internet, functions such as e-mail, web browsing, fax, banking services, and games have become available. Some smartphones have already begun to standardize their functions or are equipped with their own OS. Because of these innovations, people prefer smartphones, and the role of the smartphone is growing as it substitutes for computers.

In terms of hardware, there is Moore's law, one of the most important laws in the history of computer hardware. Moore's law describes a long-term trend in which the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years. Although Moore's law has held for the past several decades, it now seems to be approaching its limit. Over the past years the efficiency of semiconductor manufacturing has greatly increased, and the internal structure of semiconductors has become highly integrated. Therefore, there is only limited room to integrate them further. Software, on the other hand, has a great deal of development potential. Even on the same hardware, a developer can install different kinds of software. Moreover, even for the same software, the performance [4] and the errors that occur differ depending on the developer. Therefore, software is becoming more important as time goes by.

In general, software consists of the OS, middleware, and applications. Because each group has different features and functions, each is important in its own right. However, the OS is especially important because it controls and manages all the hardware of an embedded system so that applications keep running without problems. It is therefore fundamental to computer science. Many OSes, such as Unix, Linux, and Windows, are currently used on personal computers, and there are many OSes for embedded systems as well.

For example, there are Android [53] from Google, iOS 4 [54] from Apple, embedded Windows from Microsoft, and embedded Linux. Embedded Linux in particular has been widely used because it retains many powerful features of general Linux (multitasking, a variety of network environments, different types of file systems, and system scalability) and is also available for free. However, embedded systems have disadvantages compared with general systems: for example, there are not enough skilled developers, and there are many hardware constraints. The embedded systems market also demands a fast development cycle, which is hard to achieve because of the lack of skilled developers. Embedded systems have become one of the most important areas across all industries. Therefore, debugging [55] various problems and improving the performance of embedded systems will also be an important area in the near future.

1.1 Motivation

Embedded OSes have recently come into use in many fields, such as home appliances, mobile phones, and PDAs. There are important reasons why embedded OSes are so widely used. A general-purpose OS must perform a wide variety of functions, whereas an embedded OS needs to perform only the minimum required functions. Configuring the system with only minimal resources therefore lowers its cost. Moreover, embedded OSes are usually built for a specific purpose. For example, satellite or missile control requires the stability of a real-time system. In such cases, an embedded OS such as a real-time operating system (RTOS) [14] is more suitable than a general-purpose OS.

Some people think that an embedded system can be developed faster than a general system. However, we should not focus only on rapid application development (RAD) as practiced in software engineering. Fast development can be valuable, but it sometimes causes huge losses. For instance, an error in an electric power system resulted in a power outage across the whole of New York City. People also tend to think that the quality of a program is tied to the length of the development period. However, if development is delayed longer than expected, quality easily degrades, and shortening the development period usually causes many problems as well. Developers therefore need strong analysis skills to complete a project without problems. In this dissertation, we focus on how to trace problems in the kernel and how to solve them efficiently.

1.2 Challenge

When developing embedded systems, problems can usually be categorized into two groups: the user level [59] and the kernel level. Many debugging tools are available for solving user-level problems. In contrast, a problem at the kernel level [7] is much more difficult to fix. This is because tools for kernel development usually provide only minimal functions, and in many cases those functions do not help fix the problem. Moreover, even when an embedded system currently operates without problems, an unexpected error can still occur. No system is one hundred percent perfect, and even commercial embedded systems can suffer kernel problems. Such problems are often not found during development, so they can still appear after the system has been commercialized as a product. Furthermore, even a small unexpected error can inconvenience many users and may lead to a life-threatening problem.

An embedded system project is usually complex and, compared with a general software project, requires developers with a high level of understanding of both hardware and software. In addition, an embedded system project has many hardware constraints. Because of these characteristics of embedded systems, it is not easy to propose a solution for a problem that occurs during project development. Various solutions to these problems have recently been proposed.

Among the many approaches to problem analysis, we describe how to use the event log.

• The kernel processes many events, such as memory-related events, system call events, and network-related events, in a very short time. These events help developers analyze problems and suggest solutions. Hence, we redefine the logging information in order to analyze this event information.

• Generally, there are two main ways to analyze event information. The first is to visualize kernel events. This approach can analyze a problem approximately but not exactly. The other is to print the event log in text mode, which is very efficient when a developer needs to analyze a problem exactly. Therefore, we use the advantages of text mode to find problems and suggest solutions.

• Finding problems in kernel mode requires much more time and effort than in user mode [63]. Therefore, a tool for detecting and fixing kernel problems is very important in embedded system development.
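The advantage of text mode noted in the list above can be illustrated with a minimal Python sketch. It assumes a hypothetical log format with one event per line, written as a timestamp in seconds followed by an event name; this format, the event names, and the helper names `parse_log` and `late_gaps` are illustrative assumptions and not the output format of any real tracer. Because every record carries an exact timestamp, the interval between two occurrences of an event can be computed exactly rather than estimated from a picture.

```python
from typing import List, Tuple

# Hypothetical text-mode log: "<timestamp in seconds> <event name>" per line.
SAMPLE_LOG = """\
0.000100 timer_expire
0.000150 syscall_entry
0.000200 timer_expire
0.000450 timer_expire
0.000550 timer_expire
"""

def parse_log(text: str) -> List[Tuple[int, str]]:
    """Parse each non-empty line into (timestamp in microseconds, event name)."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        stamp, name = line.split(maxsplit=1)
        events.append((round(float(stamp) * 1_000_000), name))
    return events

def late_gaps(events: List[Tuple[int, str]], name: str,
              period_us: int, tolerance_us: int) -> List[Tuple[int, int]]:
    """Return (timestamp, gap) for occurrences of `name` that arrive later
    than period_us + tolerance_us after the previous occurrence."""
    stamps = [t for t, n in events if n == name]
    return [(cur, cur - prev)
            for prev, cur in zip(stamps, stamps[1:])
            if cur - prev > period_us + tolerance_us]

events = parse_log(SAMPLE_LOG)
print(late_gaps(events, "timer_expire", period_us=100, tolerance_us=20))
```

In this sample, the only interval that exceeds the assumed 100 μs period plus tolerance is the one ending at 450 μs, which a text-based analysis pinpoints to the microsecond.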

During embedded system development, there are many important factors, such as hardware, development time, and development cost. Like these factors, the analysis tool is one of the most important elements of the development environment for embedded systems.


1.3 Contribution

When developing a project, a variety of SDKs and debugging methods are available [12] [23] [38] [40]. When developing embedded systems, however, developers must solve problems with limited tools and debugging methods [36]. Debugging and performance tuning [41] [56] are an important part of system development. For debugging and performance tuning, the system developer needs the error messages generated by file system errors, network errors [34], hard drive errors, and memory errors in the system. In addition, it is important to determine where a problem happened and what it is. The proposed system, which uses the event log, makes the following contributions.

• Problems can be found easily among huge event logs.

• The error data is separated from the whole data, so that only the error data needs to be analyzed.

• The cause of a problem can be determined by analyzing the event logs.

We aim to suggest solutions to kernel problems and to improve performance by finding and analyzing problems effectively.
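The three contributions above correspond to the Detection, Separation, and Analysis layers described in the abstract. The following Python sketch shows one way such a pipeline could be organized; it is not the KAS implementation. The `Event` record, the function names, the rule of comparing event counts with expected counts, and the duration limit are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Event:
    name: str      # event type, e.g. "timer" (hypothetical)
    start_us: int  # start time in microseconds
    end_us: int    # end time in microseconds

def detect(events: List[Event], expected_counts: Dict[str, int]) -> List[str]:
    """Detection: count every event type and report the types whose
    count differs from the expected count (assumed detection rule)."""
    counts = Counter(e.name for e in events)
    return [name for name, expected in expected_counts.items()
            if counts[name] != expected]

def separate(events: List[Event], suspects: List[str]) -> List[Event]:
    """Separation: keep only the events that belong to a suspect type."""
    return [e for e in events if e.name in suspects]

def analyze(events: List[Event], limit_us: int) -> Dict[str, Dict[str, int]]:
    """Analysis: per event type, sum the running time and count the
    events whose duration exceeds limit_us."""
    report: Dict[str, Dict[str, int]] = {}
    for e in events:
        duration = e.end_us - e.start_us
        entry = report.setdefault(e.name, {"total_us": 0, "errors": 0})
        entry["total_us"] += duration
        if duration > limit_us:
            entry["errors"] += 1
    return report

log = [Event("timer", 0, 10), Event("net", 10, 15),
       Event("timer", 100, 160), Event("timer", 200, 210)]
suspects = detect(log, {"timer": 4, "net": 1})  # one timer event is missing
related = separate(log, suspects)
print(analyze(related, limit_us=20))
```

Keeping the layers as separate functions means the expensive analysis step only ever sees the small subset of events that the first two steps have singled out.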

1.4 Outline

The dissertation is structured as follows. Chapter 1 presents the research motivation, challenges, and contributions.

Chapter 2 explains the background knowledge related to our work: embedded systems, the Linux kernel, and embedded Linux. It also describes system monitoring, the event log, and problem analysis.

Chapter 3 reviews and discusses related work, including tools for system monitoring and event log analysis. Chapter 4 proposes a new system framework with three main layers (the Detection Layer, the Separation Layer, and the Analysis Layer) and explains each layer in detail. Chapter 5 analyzes the kernel timer using the proposed Kernel Analysis System, focusing on the high-resolution timer latency problem. Finally, Chapter 6 presents our conclusions and suggests possible future directions.


Chapter 2

Background

2.1 Embedded System

With the development of technologies in the fields of electricity, electronics, and computers, many kinds of applied equipment have entered our daily life: for example, TVs, refrigerators, microwave ovens, washing machines, cellular phones, computers, PDAs, cyber home care systems in apartments, elevator systems, ATMs, and airport traffic control systems. These technologies are closely tied to our daily life and make it more convenient.

An embedded system [26] is an electronic control system that combines hardware and software. The applied equipment we use in daily life, such as electronic devices, home appliances, and control units, consists not only of simple electric circuits but also of microprocessors. An embedded system has built-in programs that perform dedicated functions on these microprocessors. Early embedded systems were very simple. They were built around 8-bit or 16-bit controllers, which are still in use. More recently, the embedded system industry has been adopting more powerful microprocessors and digital signal processing (DSP) chips, and embedded OSes are needed to control these larger systems.

Early embedded systems ran a sequential program without an OS, leaving the sequential flow only when an interrupt occurred. There was therefore no need for an OS, which would only have wasted system resources. Recent embedded systems, however, are larger than before, and networking, multimedia, and similar features have increased their complexity. It is therefore hard to run such a system as a single sequential program. These changes make an OS necessary in embedded systems. Because such systems also cannot ignore real-time requirements, they use real-time OSes. The number of products that adopt a real-time OS is increasing, and in practice many embedded systems use real-time OSes suited to their purposes.

2.2 Linux Kernel

Linux is a member of the large family of Unix-like operating systems. A relative newcomer that experienced sudden, spectacular popularity starting in the late 1990s, Linux joins such well-known commercial Unix operating systems as System V. Linux was initially developed by Linus Torvalds in 1991 as an operating system for IBM-compatible personal computers based on the Intel 80386 microprocessor. Linus remains deeply involved in improving Linux, keeping it up to date with various hardware developments and coordinating the activity of hundreds of Linux developers around the world. The Linux kernel resides in memory and manages memory, processes, and I/O devices. Every system has a kernel, and the kernel's performance affects the performance [4] of the whole system. The kernel is therefore important, particularly in the embedded system industry.

The most important feature of the Linux kernel is that users can modify it themselves. The kernel is distributed in source form and, like other Linux programs, can be obtained through distribution packages, FTP, or BBS user groups. The build environment is easy to set up with a few well-made scripts, and documentation is easy to find on the Internet. This open source policy is one of the reasons why the Linux kernel and its user groups have made a quantum leap. The openness of Linux makes it possible to produce a fast and robust kernel, while other OSes are constrained by commercial considerations.

The Linux kernel has the following strengths [24].

• No royalty: Linux can be downloaded from the Internet free of charge, which decreases development costs.

• Open source: the OS can be extended.

• The Linux system is stable, so the possibility of errors is low.

• Linux can be used on a variety of hardware types.

• Safe: the Linux security model is based on the ideas of UNIX security, which is known for its robustness and proven quality.

• Kernel bugs can be fixed immediately when they occur.

The Linux kernel has the following weaknesses.

• Limited development environment.

• A large number of different Linux distributions.

• Uncertainty about whether open source products can be trusted.

The Linux kernel operates in two major modes: user mode and kernel mode. Figure 2.1 shows the two corresponding spaces. The first is user space, where applications run. The second is kernel space, where kernel modules and device drivers run. User space and kernel space communicate through mechanisms such as system calls and ioctl(). Likewise, kernel space and hardware communicate through hardware interfaces and protocols.

Figure 2.1: User-space vs. kernel-space
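As a small illustration of the boundary shown in Figure 2.1, the Python sketch below crosses from user space into kernel space using ordinary system calls and one ioctl() request. It assumes a Linux (or similar Unix) system where the `FIONREAD` request is supported on pipes; the helper name `pending_bytes` is chosen only for this example.

```python
import array
import fcntl
import os
import termios

def pending_bytes(fd: int) -> int:
    """Ask the kernel, through the FIONREAD ioctl, how many bytes are
    waiting to be read on the file descriptor fd."""
    buf = array.array("i", [0])
    fcntl.ioctl(fd, termios.FIONREAD, buf)  # the kernel writes the count into buf
    return buf[0]

read_fd, write_fd = os.pipe()      # pipe(2): the kernel creates the channel
os.write(write_fd, b"hello")       # write(2): data is copied into kernel space
count = pending_bytes(read_fd)     # ioctl(2): query kernel state for the pipe
data = os.read(read_fd, count)     # read(2): data is copied back to user space
os.close(read_fd)
os.close(write_fd)
print(count, data)
```

Each of these calls is a request from the user-space application to the kernel; the application never touches the pipe buffer directly.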


2.3 Embedded Linux

An embedded OS has to support a development environment that includes middleware, libraries, development tools, and tools for analyzing kernel problems. Linux is a commonly used OS among embedded OSes. Nevertheless, embedded systems such as cellular phones and real-time products have used an RTOS because of their timing constraints, such as those of hard real-time systems. However, the rapidly improving performance of embedded systems exposes the limitations of RTOS-based systems. Linux with strengthened real-time characteristics is therefore attracting attention again.

Embedded Linux simply means Linux used in an embedded system. Early embedded Linux was developed for systems with small memory and low-performance processors. Embedded Linux was therefore minimized in size and function and customized to fit into small memory. These conditions are essential characteristics of embedded Linux. Nevertheless, embedded Linux has been applied in various products.

There are many reasons why Linux is in the spotlight in the embedded system industry. The three main reasons are as follows. First, there are no royalties or licensing costs; the open source license is one of the reasons Linux is what it is today. Second, Linux supports functions that an RTOS cannot provide for devices such as smartphones and PDAs. Embedded devices are gradually changing in response to demands concerning memory size, wireless Internet, hard disks, and so on. As a result, demands that did not exist for an RTOS, such as safety, varied graphical user interfaces (GUIs), memory security, and support for personal information, keep growing. When developing an embedded system on Linux, there is no need to write new programs for these functions, because the various application libraries and device drivers of Linux can be reused. Third, Linux is more flexible than other embedded OSes in that programmers can select their own development and debugging environments [55].

However, embedded Linux is weak in terms of stability because it has no official quality testing. There is also a shortage of embedded system developers. Developing products with embedded Linux requires engineers such as device driver developers, embedded application developers, and GUI program developers.

Even though embedded Linux is smaller and lighter than general Linux, its kernel is larger than that of an RTOS, which used to make embedded Linux difficult to use in embedded systems. However, thanks to high clock speeds, recent embedded systems deliver performance similar to that of a Pentium computer. Embedded Linux is therefore becoming more useful and practical.

2.4 System Monitoring<br />

In general, system monitoring [44] [58] means finding problems in the system. System<br />

monitoring tools show how efficiently the kernel uses system resources, or why a problem has<br />

occurred in the CPU [61], memory, disk I/O, network, and so on. However, finding problems through<br />

system monitoring is not simple. For example, consider a disk problem and the kind of checklist<br />

that is needed to analyze it.<br />

• How much disk space remains?<br />

• How many I/O operations are issued per second?<br />

• How many read and write operations are performed?<br />

• How much data is read and written?<br />

These questions are only a very small part of system monitoring; studying the<br />

behavior of a disk drive requires many more kinds of monitoring.<br />
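
The checklist above can be sketched in a few lines of Python. This is only an illustration: the function names and the two sample lines are made up for this example, and the field positions assume the usual layout of /proc/diskstats on Linux (reads completed, sectors read, writes completed, sectors written, with 512-byte sectors).

```python
import shutil

SECTOR_BYTES = 512  # /proc/diskstats counts sectors of 512 bytes


def parse_diskstats(text):
    """Return {device: counters} from /proc/diskstats-formatted text."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 11:
            continue  # skip lines that do not have the expected fields
        stats[f[2]] = {
            "reads": int(f[3]),
            "bytes_read": int(f[5]) * SECTOR_BYTES,
            "writes": int(f[7]),
            "bytes_written": int(f[9]) * SECTOR_BYTES,
        }
    return stats


def io_rate(before, after, seconds):
    """I/O operations per second between two samples of one device."""
    ops = (after["reads"] - before["reads"]) + (after["writes"] - before["writes"])
    return ops / seconds


def free_fraction(path="/"):
    """Fraction of disk space that remains on the file system holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0


# Two hypothetical samples of the same device, taken 5 seconds apart.
SAMPLE_T0 = "   8       0 sda 1000 0 8000 0 500 0 4000 0 0 0 0"
SAMPLE_T1 = "   8       0 sda 1100 0 8800 0 550 0 4400 0 0 0 0"
```

On a real system the two samples would be read from /proc/diskstats at different times instead of being hard-coded.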

The following monitoring programs are used in the field. The monitoring programs in Table 2.1<br />

are set up automatically when Linux is installed. On the other hand, many<br />

monitoring tools are not installed automatically, such as sar, iostat, nmap, netcat, and ntop.<br />

Table 2.1: General monitoring programs by resource<br />

Resource | Monitoring program<br />

CPU | top, ps, uptime, vmstat, pstree, iostat, sar<br />

Memory | free, vmstat, sar<br />

Disk I/O | df, du, quota, iostat, sar<br />

Network | ping, netstat, traceroute, tcpdump, nmap, netcat, ntop<br />

File | lsof<br />

In general, a system should be monitored [58] regularly while it is working<br />

normally. The system administrator can then analyze problems easily and quickly when<br />

they occur. In other words, system monitoring is necessary even for systems that show no<br />

problems.<br />



2.5 Event Log<br />


In general, an event log [35] is a record of the events that occur while the Linux kernel is running. These events are<br />

recorded in sequential order, and network information is recorded as well. In brief, it<br />

records the facts of “when, where, what, who, and why.” Event logs provide a<br />

basis for analyzing problems. They can also be used to prevent problems before they occur.<br />

In addition, event logs are used to verify problems in real time and to verify network<br />

status. For example, if a Linux system goes down in the middle of an operation, the work in progress is<br />

lost. Event logs help to explain how the failure happened and how to prevent it from recurring.<br />

In general, the analysis of event logs proceeds as follows.<br />

• Collection: To collect logs with various methods.<br />

• Storage: To transmit events to one place and save them.<br />

• Analysis: To analyze events with various methods.<br />

• Finding of the causes: To find the causes of problems on the basis of data analysis.<br />
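
The four steps can be illustrated with a minimal Python sketch. The log format ("timestamp message"), the source names, and the rule used to pick a candidate cause (the event logged just before the first symptom) are assumptions made for this example, not part of any particular logging tool.

```python
from collections import Counter


def collect(sources):
    """Collection: gather raw lines from several (hypothetical) sources."""
    return [(name, line) for name, lines in sources.items() for line in lines]


def store(records):
    """Storage: merge all records into one time-ordered list."""
    events = []
    for source, line in records:
        ts, _, message = line.partition(" ")
        events.append((float(ts), source, message))
    return sorted(events)


def analyze(events):
    """Analysis: count how often each message occurs."""
    return Counter(message for _, _, message in events)


def find_cause(events, symptom):
    """Finding of the causes: return the event logged just before the symptom."""
    for i, (_, _, message) in enumerate(events):
        if message == symptom:
            return events[i - 1] if i > 0 else None
    return None


# Hypothetical input: two sources with "timestamp message" lines.
SOURCES = {
    "kernel": ["1.0 irq 14", "3.0 oom-kill"],
    "net": ["2.0 link down"],
}
events = store(collect(SOURCES))
candidate = find_cause(events, "oom-kill")
```

A real analysis would use a far richer heuristic than "the previous event", but the division into four stages stays the same.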

Figure 2.2 shows the most general way of logging events with an event logging tool. The log<br />

server gathers the event information (such as network events, system calls, and interrupts).<br />

An event log is necessary for finding the cause of a problem and devising a solution, but it is difficult<br />

to analyze. This is because each tool produces a different type of log<br />

and the amount of logged event information is huge. Analyzing logs therefore takes<br />

a lot of time.<br />


Figure 2.2: Process of general event logging<br />

2.6 Problem Analysis<br />

Professional developers advise spending more time on reading code<br />

than on writing it. In other words, more time and effort should go into<br />

improvement, review, and debugging than into writing the code in the first place. Unless the work is a simple hobby or<br />

homework, problem analysis is very important.<br />

In the early UNIX period, a system programmer was also the system manager, and<br />

the two roles were the same. These days, however, the work is divided into specialized fields.<br />

The strength of this division is that people can focus on their own specialty, whereas the<br />

weakness is that problems that span several fields are hard to analyze. A good<br />

problem investigation technique strikes a proper balance among the demand for a fast solution,<br />

the improvement of skills, and the efficient use of experts. When a problem occurs, information<br />

about it must be collected and recorded.<br />

The items to record are as follows.<br />



• The exact time the problem occurred<br />

• Dynamic operating system information<br />

• What we were doing when the problem occurred<br />

• A problem description<br />

• Anything that may have triggered the problem<br />

• Technical investigation (Symptom and Cause)<br />


The symptom in the technical investigation (Symptom and Cause) is the external evidence of a<br />

problem. Symptoms are classified into the following five categories [16].<br />

• Error<br />

• Crash<br />

• Hang (or very slow performance)<br />

• Performance problem<br />

• Unexpected behavior/output<br />

Problems become easier to solve once this information has been collected and<br />

the problems have been classified.<br />
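
As an illustration, the items to record and the five symptom categories could be held in a small data structure like the one below. The class and field names are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Symptom(Enum):
    ERROR = "error"
    CRASH = "crash"
    HANG = "hang (or very slow performance)"
    PERFORMANCE = "performance problem"
    UNEXPECTED = "unexpected behavior/output"


@dataclass
class ProblemReport:
    occurred_at: datetime            # the exact time the problem occurred
    system_info: dict                # dynamic operating system information
    activity: str                    # what we were doing at the time
    description: str                 # a problem description
    possible_triggers: list = field(default_factory=list)
    symptom: Symptom = Symptom.ERROR
    cause: str = ""                  # filled in once the investigation finds it


def group_by_symptom(reports):
    """Classify collected reports by their symptom category."""
    groups = {}
    for report in reports:
        groups.setdefault(report.symptom, []).append(report)
    return groups


# Hypothetical example records.
reports = [
    ProblemReport(datetime(2010, 7, 1, 9, 30), {"load": 4.2}, "copying files",
                  "system stopped responding", ["USB disk attached"], Symptom.HANG),
    ProblemReport(datetime(2010, 7, 2, 14, 0), {"load": 0.3}, "booting",
                  "kernel panic", [], Symptom.CRASH),
]
groups = group_by_symptom(reports)
```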

2.7 Linux Timer<br />

The Linux kernel timer has two main tasks.<br />

• To keep time accurately.<br />

• To manage deadlines.<br />

The Linux kernel drives its timer functions by means of periodic timer interrupts.<br />

The deadline management function is especially useful. For instance, it is used<br />

for retransmission in networks, for retrying devices that do not respond, and for polling<br />

devices that cannot generate interrupts.<br />

There are two main types of Linux timer.<br />

• Global timer<br />

    - Manages the system time.<br />

    - Generates interrupts periodically.<br />

    - Provides the alarm function.<br />

• CPU local timer<br />

    - Runs for a specific CPU.<br />

    - Fires periodically on each CPU.<br />

    - Its interrupt is acknowledged on the local APIC.<br />

Figure 2.3 shows the sequence in which a local timer interrupt is handled. When a local timer<br />

interrupt occurs, its interrupt handler is executed first. The local timer<br />

soft interrupt is then raised, and the<br />

soft interrupt handler runs.<br />



Figure 2.3: The execution of local timer soft interruption handler<br />

In addition, there are many kinds of timers and clocks in the Linux kernel.<br />


• RTC: Every system has a real-time clock that runs independently of the CPU and all other<br />

chips. After booting, the Linux kernel reads the RTC and sets the current time.<br />

• TSC: The 80x86 microprocessor has a clock pin (CLK) that receives the signal of an<br />

external oscillator. At every clock signal on the CLK pin, the 64-bit Time Stamp<br />

Counter register is incremented.<br />

• PIT: The PIT is a counter that triggers an interrupt when it reaches the programmed<br />

count. It has a one-shot mode and a periodic mode. A one-shot timer interrupts only<br />

once and then stops counting. A periodic timer interrupts every time it reaches the<br />

programmed count.<br />

• APIC: The local timer of a CPU. Like the PIT, the APIC timer generates an interrupt once or<br />

periodically. However, it sends the interrupt only to its own processor.<br />

• ACPI PMT: The ACPI Power Management Timer is included on ACPI-based<br />

motherboards. Its clock signal has a frequency of approximately 3.58 MHz, and its<br />

counter is incremented at every clock tick.<br />

• HRTimer: HRTimer [28] provides high-resolution (nanosecond) timers and exploits<br />

the system-dependent timers and clocks.<br />
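
Two of the points in the list can be made concrete with a short Python sketch: converting counter ticks into time, and the difference between one-shot and periodic mode. The function names are made up for this example, and the ACPI PMT frequency is taken as 3,579,545 Hz, the usual exact value behind "approximately 3.58 MHz".

```python
PMT_HZ = 3_579_545  # ACPI PM timer frequency, about 3.58 MHz


def ticks_to_ns(ticks, hz):
    """Convert a tick count of a counter running at hz into nanoseconds."""
    return ticks * 1_000_000_000 // hz


def timer_interrupts(total_ticks, count, periodic):
    """Tick numbers at which a counter programmed with `count` interrupts.

    A one-shot timer fires once and stops; a periodic timer fires every
    time the programmed count is reached again.
    """
    if periodic:
        return list(range(count, total_ticks + 1, count))
    return [count] if count <= total_ticks else []


one_tick_ns = ticks_to_ns(1, PMT_HZ)             # length of one PMT tick
one_shot = timer_interrupts(10, 3, periodic=False)
periodic = timer_interrupts(10, 3, periodic=True)
```

With these definitions one PMT tick lasts about 279 ns, which shows why a nanosecond-resolution facility such as HRTimer depends on the best clock source the hardware offers.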



Chapter 3<br />

Related Work<br />

It has been reported that more than 90 percent of the computer systems in the world are<br />

embedded systems. Embedded systems are widely used in our daily lives. They<br />

have developed and spread ever since the computer was invented. In recent<br />

years, however, much attention has been given to embedded systems because they are becoming<br />

complicated. Semiconductor and network technology has evolved<br />

rapidly. In addition, software technology, such as multimedia and<br />

Internet technology, has advanced considerably. For example, the smartphone now serves as a<br />

music player, movie player, game device, and Internet terminal, far beyond a simple message transfer<br />

function. These functions have increased the complexity of embedded systems [9]. In the past,<br />

embedded systems were simple hardware. Nowadays, however, embedded system hardware<br />

has grown complex because of advanced hardware and the many needs of users. Along with advanced<br />

hardware such as SoC (System on Chip) technology, embedded software has become very<br />

complex. These technologies have brought about many innovations such as the smartphone, PDA,<br />

netbook, and tablet PC.<br />

On the other hand, these innovations make systems complicated and steadily increase the number of errors and<br />

bugs that arise during the development of embedded systems. Moreover, these problems<br />

take a lot of time and effort to fix. How errors and bugs are fixed has also become very important<br />

because it is closely related to the performance and stability of the system.<br />

Therefore, most developers and system managers analyze event logs to figure out the<br />

best solutions to these problems.<br />

3.1 Event log<br />

Event logging and event logs monitoring play an important role in modern IT systems.<br />

Today, many applications, operating systems, network devices, and other system components<br />

are able to log their events to a local or remote log server. For this reason, event logs are an<br />

excellent source for determining the health status of the system, and a number of tools have<br />

been developed over the past 15-20 years for monitoring event logs in real-time [45].<br />

3.1.1 Event logging<br />

The events that occur in a system depend on the status of the system, which is always<br />

changing. When a system component encounters an event, the component can emit an<br />

event message that describes the event. For example, when a disk of a server becomes full,<br />

the server can generate a time-stamped “disk full” message for appending to a local log file<br />

or for sending over the network as an SNMP trap. Event logging is the procedure of storing<br />

event messages to the event log, where the event log is a regular file that is modified by<br />

appending event messages (although databases of event messages are sometimes also called<br />

event logs). A log client is a system component that emits event messages for event logging.<br />

In this thesis, the term event is often used to denote an event message when the meaning is clear<br />

from the context.<br />

In modern IT systems, event logs play an important role:<br />

• Since in most cases event messages are appended to event logs in real-time as they<br />

are emitted by system components, event logs are an excellent source of information<br />

for monitoring the system.<br />

• Information that is stored to the event log can be useful for analysis at a later time,<br />

e.g., for audit procedures or for retrospective incident analysis.<br />

Event logging can take place in various ways. In the simplest case the log client keeps the<br />

event log on a local disk and modifies it when an event occurs. Unfortunately, event logs will<br />

be scattered across the system with this logging strategy, each log possibly requiring separate<br />

monitoring or other analysis. Furthermore, the strategy assumes the presence of a local disk<br />

which is not the case for many network nodes (e.g., switches and routers).<br />

Figure 3.1 shows a centralized logging infrastructure. This flow chart of the event log<br />

shows how useful the logged events are for a system developer.<br />

Figure 3.1: Flow chart of the event log<br />

3.1.2 Event Log Monitoring<br />

Because of the importance of event logs as the source of system health inform<strong>at</strong>ion, many<br />

tools have been developed over the past 15-20 years for monitoring event logs in real-time.<br />

Swatch [Hansen and Atkins, 1993] was the first such tool and is still used by many sites.<br />

Swatch [47] monitors log files by reading every event message line that is appended to the<br />

log file, and compares it with rules where the conditional part of each rule is a regular<br />

expression (rules are stored in a textual configuration file). If the regular expression of a<br />

certain rule matches the event message line, Swatch executes the action part of the rule.<br />

Actions include sending a mail, executing an external program, writing a notification to the<br />

system console, etc. Swatch has also an option for ignoring repeated event messages for a<br />

given time interval.<br />
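
The rule-matching idea described above can be sketched in Python as follows. This is not Swatch's configuration syntax or implementation; the `Rule` class, the `throttle` parameter, and the sample messages are assumptions made for this illustration of regular-expression rules with actions and suppression of repeated messages.

```python
import re


class Rule:
    """A condition (regular expression) paired with an action."""

    def __init__(self, pattern, action, throttle=0.0):
        self.regex = re.compile(pattern)
        self.action = action
        self.throttle = throttle   # seconds during which repeats are ignored
        self.last_fired = {}       # message text -> time of the last action


def monitor(lines, rules):
    """Feed (timestamp, line) pairs through the rules, as if tailing a log."""
    for now, line in lines:
        for rule in rules:
            if not rule.regex.search(line):
                continue
            last = rule.last_fired.get(line)
            if last is not None and now - last < rule.throttle:
                continue  # repeated message inside the throttle interval
            rule.last_fired[line] = now
            rule.action(line)


# Hypothetical usage: the action simply collects alerts in a list.
alerts = []
rules = [Rule(r"disk full", alerts.append, throttle=60)]
monitor(
    [(0, "disk full on /var"), (10, "disk full on /var"),
     (90, "disk full on /var"), (95, "login ok")],
    rules,
)
```

In this run the message at 10 seconds is suppressed because it repeats within the 60-second interval, while the one at 90 seconds triggers the action again.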

Another popular tool for event log monitoring is Logsurfer [Ley and Ellerman, 1996]. Like<br />

Swatch, Logsurfer [46] uses a rule-based approach for event processing, employs regular<br />

expressions for recognizing input events, and monitors log files by comparing appended<br />

message lines with its rules. Apart from executing actions immediately when certain event<br />

messages are observed, Logsurfer also supports contexts and dynamic rules. A context is a<br />

memory-based buffer for storing event messages, and Logsurfer can report the content of a<br />

context through an external program. A dynamic rule is a rule that has been created from<br />

another rule with a special action.<br />

In addition to the commonly used Swatch and Logsurfer, a number of other tools exist for<br />

monitoring event logs in real-time, and the interested reader is referred to the Log analysis<br />

website [48] for more information. Apart from standalone monitoring tools, some systems<br />

and network management platforms like HP OpenView Operations (formerly called ITO)<br />

[64] and Tivoli Risk Manager [65] also have capabilities for monitoring event logs.<br />

Nevertheless, in order to use these capabilities, the whole platform must be deployed,<br />

which is a complex and time-consuming task.<br />

3.2 System Monitoring<br />

System monitoring means checking how a system is working. If a system works very slowly,<br />

the system manager should figure out the cause and how to fix it. This is not rare for a<br />

system manager. System management starts with checking the system’s condition<br />

periodically. Monitoring is very important because its results are needed when<br />

problems occur. In addition, if the manager misinterprets the monitoring results, the error report will be<br />

incorrect and the fix will take a lot of time. For instance, in Linux, many people use the<br />

free command to check memory. Consider the following case. The<br />

system is not running any applications, yet free reports that 503 MB of memory is used. The<br />

manager assumes that there is a problem in the system and reboots it. However, free<br />

still reports the memory as used.<br />

Figure 3.2: Example output of the free command<br />

Reading from a disk is very slow compared with reading from memory. If many people access the system and<br />

execute the “ls” command, which is one of the simplest read commands, the system becomes<br />

very slow. In this case, the system works better if data that has been read from the disk is kept<br />

temporarily in memory. This is called “disk buffering”, and the buffer<br />

cache is widely used for it. If the size of the cache were fixed, then even with a large memory there could<br />

be a memory shortage that causes swapping, which slows the system down. In<br />

Linux, the system automatically assigns free memory to the buffer cache in order to improve the<br />

efficiency of the system. For instance, in Figure 3.2 the usable memory is “free + buffers +<br />

cached”, and the “-/+ buffers/cache:” line reflects this adjustment.<br />
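
The "free + buffers + cached" calculation can be reproduced with a few lines of Python. The sketch below parses text in the format of /proc/meminfo; the sample values are invented for illustration and are not the numbers shown in Figure 3.2.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def usable_kb(info):
    """Memory that applications can still obtain: free + buffers + cached."""
    return info["MemFree"] + info["Buffers"] + info["Cached"]


def used_by_applications_kb(info):
    """Memory that is really in use once buffers and cache are excluded."""
    return info["MemTotal"] - usable_kb(info)


# Hypothetical sample in /proc/meminfo format.
SAMPLE = """MemTotal:  515000 kB
MemFree:    12000 kB
Buffers:    40000 kB
Cached:    300000 kB"""

info = parse_meminfo(SAMPLE)
```

In this sample only 12000 kB is reported as free, yet 352000 kB is usable, which is the situation that misleads the manager in the example above.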

General Linux system management guides and man pages still do not explain these points. The<br />

output has to be interpreted on the basis of the principles of the kernel and the OS. From this<br />

point of view, both system monitoring and the analysis of its results are very important.<br />


The most widely used system monitoring tool is Nagios [66], which is designed for network<br />

monitoring. Nagios can monitor hosts and networks, and local servers can be<br />

monitored remotely through network services. For example, the person monitoring the system can<br />

run Nagios over the network and manage the monitoring reports by connecting to a central<br />

management server.<br />

Another system monitoring tool that is widely used in Linux is the top command. The top<br />

command prints the state of the processes, CPU, and memory, as well as the uptime and<br />

load average. There are two related commands, ntop and htop.<br />

Ntop [67] is free network monitoring software. Ntop displays network usage information<br />

in a similar fashion to the top command output. The current version of ntop features both<br />

command line and web-based user interfaces, and is available on both UNIX and Win32<br />

platforms. Ntop focuses on:<br />

• Traffic measurement,<br />

• Traffic monitoring,<br />

• Network optimiz<strong>at</strong>ion and planning, and<br />

• Detection of network security violations.<br />

Htop [68] is similar to the top command, with a few additional features. The main difference is<br />

that you can use a mouse to interact with the htop output. Figure 3.3 shows htop,<br />

an interactive process viewer for Linux. It is a text-mode application (for console or X<br />

terminals).<br />



Figure 3.3: Process viewer for Linux<br />

3.3 Performance Analysis Tools<br />

Debugging and tuning [60] are among the most important parts of system development.<br />

They remain important after development because unexpected<br />

errors may still occur. Therefore, performance analysis tools play a critical role in finding and fixing<br />

problems.<br />

One of the best-known tools is the Linux Kernel State Tracer (LKST) [11]. It is an event<br />

tracer [21] that records information about the state of the kernel. For instance, it records various<br />

kinds of kernel events such as context switches, signal transmissions, interrupts, memory<br />

allocations, and packet transmissions. Two of its functions are critically important.<br />

• Process root trace: helps to grasp where a problem has happened and what is going<br />

on.<br />

• LKST log tool: a tool for analyzing log data that can suggest<br />

solutions for problems.<br />

Kernel Function Trace (KFT) [2] [49] is a kernel function tracing system. KFT<br />

adds instrumentation to every function entry and exit, captures these callouts, and<br />

generates a trace of events with timing details. KFT is excellent at providing a<br />

good timing overview of kernel procedures. The trace data contains general information<br />

such as the PID, start time, and end time; the times are in time stamp counter (TSC) ticks.<br />
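
To show how entry and exit records with TSC timestamps turn into a timing overview, here is a minimal Python sketch. The record layout (kind, PID, function name, TSC value) and the 1 GHz counter frequency are assumptions for this example and do not reproduce KFT's actual output format.

```python
def durations_us(trace, tsc_hz):
    """Pair entry/exit records per (pid, function) and return durations in microseconds."""
    open_calls = {}   # (pid, function) -> stack of entry timestamps
    result = []
    for kind, pid, func, tsc in trace:
        key = (pid, func)
        if kind == "entry":
            open_calls.setdefault(key, []).append(tsc)
        elif kind == "exit" and open_calls.get(key):
            start = open_calls[key].pop()
            result.append((func, pid, (tsc - start) * 1_000_000 / tsc_hz))
    return result


# Hypothetical trace: one function calling another, timestamps in TSC ticks.
TRACE = [
    ("entry", 1, "do_fork", 1000),
    ("entry", 1, "copy_process", 1500),
    ("exit", 1, "copy_process", 3500),
    ("exit", 1, "do_fork", 5000),
]
timings = durations_us(TRACE, tsc_hz=1_000_000_000)
```

Because the times are raw ticks, the counter frequency of the target CPU must be known before the durations can be compared across machines.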

System Director Mevalet [8], developed by NEC Japan, helps to analyze system<br />

performance. It detects problems early and helps to prevent them. Mevalet<br />

can display a system’s behavior in terms of CPU, disk, and network activity. In addition, Mevalet can<br />

analyze bottlenecks and performance tuning problems in an embedded system.<br />

Applications do not need to be modified because Mevalet is patched in at the OS level, and there is a<br />

wide choice of languages and middleware.<br />

Figure 3.4: Mevalet viewer’s execution<br />



As Figure 3.4 shows, the Mevalet viewer lets a user check a lot of information, such<br />

as process names, CPU processing time, and inter-process communication (IPC).<br />

Finally, there is a utility program, NMON [22], from IBM. NMON is very useful<br />

for monitoring a system because it shows more information than the top command in Linux. It<br />

can be downloaded from the IBM homepage. NMON is designed for AIX and Linux professionals to<br />

monitor and analyze system performance.<br />

It mainly provides the following information.<br />

• CPU usage<br />

• Memory usage<br />

• Ratio of disk I/O, transfers, and ratio of reads to writes<br />

• Free space of file systems<br />

• Ratio of network I/O, transfers, and ratio of reads to writes<br />

Based on this information, the NMON output can be drawn as a graph and saved as a graphic file.<br />

3.4 Linux Trace Toolkit Next Generation<br />

LTTng [3] [5] [20] [32] [39] [42] [59] analyzes a system precisely by means of execution<br />

traces. An execution trace shows a lot of information, such as task handling time, period, and the<br />

assigned process. In addition, LTTng calculates the delay of application programs<br />

or the time a certain program takes to read from disk.<br />

It is very useful for the following purposes.<br />

• To understand system problems.<br />

• To analyze system performance by monitoring the system and application programs.<br />

• To analyze the communication among processes.<br />

Moreover, LTTng differs from strace [69], gprof [70], and DTrace [18] in that it shows the<br />

whole system, including the inside of the kernel.<br />

By using LTTng, we can quickly copy events occurring inside the kernel, such as<br />

thread, fork, interrupt, signal, and memory events, from kernel space to user<br />

space and record them. In addition, by using LTTV (Linux Trace Toolkit Viewer) [3] [5], we can<br />

review the recorded event log visually, and the overhead is reduced from 1.54 to 2.28 [3].<br />

Figure 3.5: LTTV viewer’s execution<br />

Figure 3.5 shows LTTV after event logging. It displays time information with nanosecond resolution.<br />

In addition, it analyzes the events of each CPU.<br />



Figure 3.6: Event logging sequence of LTTng<br />

Figure 3.6 shows the event logging sequence. To log events, a trace point that extracts an<br />

event is first added to the kernel. Then the LTTng daemon is started, events are<br />

logged, and the information is saved.<br />



Chapter 4<br />

Infrastructure Framework<br />

In this chapter, we describe the Kernel Analysis System (KAS), which is designed to solve problems<br />

occurring in the kernel. The infrastructure of KAS is composed of three main layers, which are<br />

as follows.<br />

• Detection Layer (DL): In this layer, problems that occurred in the kernel are found by<br />

using the event log, and their line information (start line and end line) is saved so that it<br />

can easily be used by the other layers. Also, by counting all problem occurrences, it is<br />

possible to check the overall problem occurrence ratio.<br />

• Separation Layer (SL): In this layer, the events logged while a problem occurred are separated from<br />

the whole event log. By separating the event log, a developer or administrator can check<br />

the problem easily and find out which events actually occurred.<br />

• Analysis Layer (AL): In this layer, by using the event log of the problem, we can<br />

easily and quickly detect the cause of the problem by displaying the execution time,<br />

execution count, and total latency of each event.<br />
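
The division of work among the three layers can be illustrated with a minimal Python sketch. It is not the KAS implementation: the event format (timestamp in nanoseconds, event name, duration), the "timer_start"/"timer_end" markers, and the latency limit are all assumptions made for this example.

```python
from collections import defaultdict


def detect(events, limit_ns):
    """Detection Layer: find (start_line, end_line) spans whose latency
    exceeds limit_ns, and the ratio of problem spans to all spans."""
    spans, windows, start = [], 0, None
    for line, (ts, name, _) in enumerate(events):
        if name == "timer_start":
            start = line
        elif name == "timer_end" and start is not None:
            windows += 1
            if ts - events[start][0] > limit_ns:
                spans.append((start, line))
            start = None
    ratio = len(spans) / windows if windows else 0.0
    return spans, ratio


def separate(events, span):
    """Separation Layer: cut the events of one problem out of the whole log."""
    start, end = span
    return events[start:end + 1]


def analyze(problem_events):
    """Analysis Layer: execution count and time per event, plus total latency."""
    per_event = defaultdict(lambda: [0, 0])  # name -> [count, total duration]
    for _, name, duration in problem_events:
        per_event[name][0] += 1
        per_event[name][1] += duration
    latency = problem_events[-1][0] - problem_events[0][0]
    return dict(per_event), latency


# Hypothetical event log: (timestamp_ns, event name, duration_ns).
LOG = [
    (0, "timer_start", 0), (100, "syscall", 50), (200, "timer_end", 0),
    (1000, "timer_start", 0), (1100, "irq", 400), (1600, "irq", 300),
    (2500, "timer_end", 0),
]
spans, ratio = detect(LOG, limit_ns=500)
stats, latency = analyze(separate(LOG, spans[0]))
```

In this sample the second timer window is flagged, and the per-event statistics point to the two interrupts as the events that fill most of the delayed window.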

4.1 Introduction<br />

As mentioned above, embedded systems are widely used in daily life. One of the most<br />

important factors in embedded systems is the real-time characteristic. The real-time<br />

characteristic is the most important measure that distinguishes the embedded kernel from the<br />

general Linux kernel. In other words, an embedded system has a strong real-time characteristic.<br />

An automobile brake system is a simple example of an embedded system. It<br />

is a real-time system that must not allow any delay. Latency in the<br />

brake system can cause a traffic accident. Real-time behavior must also be<br />

assured for home appliances such as microwave ovens and washing machines. Time is an<br />

important factor for the navigation system of an airplane or for a weapon system. Thus,<br />

most embedded systems are built with real-time characteristics as one<br />

of the important factors. From small and light devices to very large ones, most embedded<br />

systems have to guarantee deadlines, and when a delay occurs, there is a high possibility of a<br />

serious accident.<br />

Therefore, there is a need for a tool that analyzes timer latency problems [33] and other<br />

problems occurring in the kernel. An excellent kernel analysis tool is essential for<br />

problem solving and application development. There are many kernel analysis tools<br />

for analyzing the Linux kernel. Some are provided as commercial products and some<br />

as open source. Kernel analysis tools are essential to any kernel, but most of them<br />

are not complete. Linux itself provides only some very basic analysis tools, and it is<br />

impossible to analyze every problem with them.<br />

The choice of analysis tool depends on the development environment. If the main purpose is<br />

network analysis, there are well-known tools such as Ethereal [72], MRTG, and Ntop. In<br />

addition, there are open source programs such as Nagios and JFFNMS [71]. For memory and<br />

other kinds of monitoring, there are nmon, strace, and many other usable tools.<br />

Before solving problems with the tools mentioned above, the first thing that can be<br />

checked very simply in Linux is syslog [48]. By default, the log is recorded in /var/log/messages, but by changing<br />

the settings in /etc/syslog.conf, the log can be saved in another location. The syslog file is a text-based<br />

message log recorded by the syslog daemon. By watching this file periodically, it is<br />

possible to find important hints about common system stability problems such as a lack of disk space,<br />

a lack of memory, I/O errors, and device failures.<br />

As mentioned above, although there are analysis tools and event logging tools, these tools<br />

can only save the logged event information as text or show it in a viewer.<br />

Therefore, in this thesis we propose KAS, which can analyze the cause of a problem quickly<br />

and efficiently by using the event information that has been logged.<br />

4.2 Kernel Analysis System<br />

Normally, kernel analysis tools output the status information of the kernel (CPU<br />

utilization, memory information, time information, etc.) as text. Sometimes, the logged<br />

events are displayed in a viewer, which makes them easier for a developer or<br />

an administrator to see. However, the analysis is still not easy for a developer or a system<br />

administrator. In text mode, it is difficult to find out where the problem occurred<br />

because of the large amount of output. In a viewer, event information is<br />

normally displayed in nanoseconds, which makes it difficult to find the problem and its<br />

cause. Figure 4.1 shows the event analysis method that is normally used. An administrator or<br />

a developer analyzes the events by choosing between two methods, text or viewer.<br />

Figure 4.1: Normal method of event logging and analysis<br />

If problems and their causes could be analyzed quickly and efficiently by using the<br />

event information generated by the kernel, the development time of embedded systems<br />

would decrease and their reliability and stability would increase.<br />

Figure 4.2 shows the architecture of the Kernel Analysis System (KAS). Ten to twenty years ago,<br />

embedded systems were developed mainly for simple tasks, and a single system could not run<br />

many processes. Recently, however, one embedded system must run many programs<br />

(mail, Internet, music player, movie player, games, etc.).<br />

Increasing complexity increases the possibility that problems occur in<br />

the kernel, and the more complex a system is, the harder its problems are to solve. Therefore,<br />

a solution to problems that occur in the kernel can be found by analyzing event<br />

information.<br />



Figure 4.2: Kernel Analysis System Architecture<br />


For example, in order to analyze timer latency, not only timer events but also all information<br />
on other events that occurred in the kernel (for example, system calls, interrupts, threads,<br />
memory, etc.) must be analyzed. If we want to analyze a specific problem, we have to insert a<br />
hook point into the kernel source to log the event information.<br />

Figure 4.3: Event log process flow of KAS<br />

Figure 4.3 shows the flow of the event log process in KAS. With it, we can analyze a problem<br />
more efficiently than with the normal event log analysis method (Figure 4.1).<br />

The advantages of analysis using KAS are as follows.<br />

• Fast problem diagnosis<br />
Normally, diagnosing a problem by reading text or using a viewer consumes a large<br />
amount of time and effort because of the large volume of logs. With KAS, a developer<br />
can find the problem quickly.<br />

• Reduction of development time<br />
When developing a system, analyzing errors, bugs, or performance problems takes more<br />
time than coding. Therefore, if errors and bugs can be found quickly by using KAS,<br />
development time can be reduced.<br />

• Lower occurrence rate of bugs and errors<br />
If an error or a bug can be diagnosed and solved accurately during development, the<br />
occurrence rate of problems decreases. Moreover, because one error or bug can cause<br />
another, solving each problem accurately helps to lower the problem occurrence rate of<br />
the whole system.<br />

• After development<br />
Even if every problem was solved during the development period of an embedded<br />
system, there is an 80% possibility that a problem occurs in the commercialized system.<br />
In other words, problems are highly likely to occur when an embedded system is used by<br />
ordinary customers. Therefore, when a problem occurs in a commercialized embedded<br />
system, a developer or a system manager can minimize the damage quickly by analyzing<br />
the problem with KAS.<br />

• Increase of the system's stability<br />
Solving the problems that occur during the development period improves the stability of<br />
the embedded system.<br />

We propose a system that analyzes kernel events, finds kernel problems, and supports an<br />
effective solution. Because development for an embedded system takes place in a<br />
cross-development environment, it differs from development on a server or PC. Therefore, if<br />
a timer problem occurs, more time and effort are needed to fix it in an embedded system than<br />
in a server or PC environment. For a developer of an embedded system, the system we<br />
propose would improve both the convenience of development and the stability of the system.<br />

4.2.1 Detection Layer<br />

When a problem occurs, the most important thing is to figure out its cause. Thus, when a<br />
problem occurs in the embedded kernel on which an embedded system is built, it is important<br />
to find out where the problem came from. Embedded Linux, which is frequently used in<br />
embedded systems, generates many kinds of events while the system runs. It is hard to know<br />
the best way of debugging when problems occur in such a complicated system, and many<br />
developers and experts are looking for ways to find and solve problems quickly. Solving one<br />
problem during the development period may take six months, or the problem may be solved<br />
immediately, because the appropriate way of debugging depends on the level and features of<br />
the problem. Some problems take a lot of time to analyze, while others can be debugged<br />
simply. However, most real projects do not involve only simple problems. Therefore, the<br />
important issue is how to solve the problem.<br />

• Reproduction of the problematic situation<br />
In order to solve a problem, it is important to know how to reproduce it. How the error<br />
occurs and how it leads to a system-level problem are vital. However, problems in an<br />
embedded system range from simple ones caused by an error in one piece of source code<br />
to complicated ones in which individual problems accumulate and cause a butterfly<br />
effect. For instance, in embedded Linux, a memory leak may not show any symptom<br />
during a short test but may appear in a long, repeated test. In this situation, it is hard to<br />
link cause and result unless the creation of the memory leak is checked. For example, if a<br />
function in a device driver leaks 1 Kbyte each time it is called, it takes a long time for<br />
the system to stop due to lack of memory. Therefore, how fast the problem can be<br />
reproduced is a good stepping-stone for debugging.<br />

• Clear understanding of the problem<br />
Working with various engineers reveals many kinds of developers: some rely on<br />
'Copy & Paste', some are overly particular about aligning the spacing of program lines,<br />
some work fast, and so on. Among these developers, the one who knows the specific<br />
details of the code that was written has the highest debugging ability. Without question,<br />
system engineers need the following to debug a Linux system:<br />
• A deep understanding of the Linux system.<br />
• An understanding of the relationship between hardware and software.<br />
• Patience in solving problems.<br />
• Accurate analysis of the problem.<br />

• Using a debugger<br />
When a problem occurs during development, it must be debugged quickly by using a<br />
debugger. A developer who is skillful with a debugger can debug quickly.<br />

• Approach to problem solving<br />
To understand a problem in a complicated embedded system, it is better to start the<br />
analysis with a minimal set of drivers and applications and approach the problem<br />
gradually than to analyze it while numerous programs, such as applications and device<br />
drivers, are running. When debugging a large OS, it is easier to examine a problem by<br />
analyzing it piece by piece.<br />

We have mentioned various methods for debugging. A skillful developer will know the above<br />
methods very well. However, not every developer is skillful, and even an experienced<br />
developer may spend a long time on a very complicated problem. Therefore, the Detection<br />
Layer (DL) is the step that checks whether a problem exists in a system and, if one has<br />
occurred, finds where it occurred. With respect to timing, a problem may occur while an<br />
embedded system is being developed or while a user is using it.<br />

4.2.1.1 When developing embedded system<br />

Commonly, the embedded devices used by users are products created by developers. These<br />
products are completed after many developers have run numerous tests and fixed many bugs<br />
and errors. Developers therefore need a way to solve numerous problems while developing a<br />
system. Many existing debugging methods can solve these problems, but a method that solves<br />
them more quickly and efficiently is needed. KAS performs event logging with LTTng and<br />
analyzes the problem based on the logged data. Analyzing text-based logs is accurate but<br />
takes too much time, because there is a large amount of data to analyze and knowledge of the<br />
event information is required. When events are logged for 5 to 10 minutes with LTTng, the<br />
amount of trace data ranges from a few gigabytes to over 10 gigabytes. It is possible to check<br />
such logs by reading them from top to bottom to find out why a problem occurred, but this<br />
traditional debugging is the slowest way to find bugs, and no one knows how long it will take<br />
to debug a serious problem. It might take days, weeks, or months. Therefore, it is important<br />
to find the cause of a problem quickly and efficiently when it occurs.<br />

4.2.1.2 Used by users<br />

According to the Ganssle Research Group in the United States of America, “80% of all<br />
embedded systems are delivered late” and “New code generally has 50 to 100 bugs per 1000<br />
lines”. This means that problems can occur in embedded systems after delivery. Generally,<br />
because the development period of an embedded system is short, testing and verification are<br />
insufficient. Therefore, we need to consider how to solve such problems. The answer is to<br />
solve them quickly by using event logs when a problem occurs. Finding the cause of the<br />
problem is the job of a developer or an administrator.<br />

4.2.1.3 Process Flow of DL<br />

In DL, a problem is found from the logged event information. To find a problem by using<br />
KAS, the problem to be found must first be defined (for instance, for a timer: an algorithm<br />
that checks whether the timer has passed its deadline). The timer latency problem is easy to<br />
define because we only need to check whether a process has passed its deadline. In Figure<br />
4.4, HRTimer_Tick means the deadline (expiry time of the task) of a high-resolution timer,<br />
and HRTimer_Latency means the whole time until the high-resolution timer expired<br />
(including latency). Therefore, the timer problem can be defined as whether or not the<br />
running time of a certain process has passed the deadline.<br />

Figure 4.5 shows the processing flow of DL. First, as shown in Figure 4.4, DL defines a<br />
problem; it then checks the whole data from top to bottom to find occurrences of the problem.<br />
If no problem is found at the current position, it continues checking without recording a<br />
result; if there is a problem, it saves the line information and counts how many times the<br />
problem has occurred. When the search reaches the bottom, the next layer, the Separation<br />
Layer, is processed.<br />
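The detection pass described above can be sketched in C as a single top-to-bottom scan.<br />
This is only a minimal illustration, not the KAS source: the `hrtimer_sample` structure, its<br />
field names, and `detect_deadline_misses()` are hypothetical, and the sketch assumes that each<br />
logged timer event has already been reduced to a line number, a deadline (HRTimer_Tick), and<br />
a measured time (HRTimer_Latency), both in nanoseconds.<br />

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one logged HRTimer event. */
struct hrtimer_sample {
    long line;            /* line number of the event in the text log */
    long long tick_ns;    /* HRTimer_Tick: deadline set for the timer */
    long long latency_ns; /* HRTimer_Latency: time until the timer actually expired */
};

/*
 * Scan the whole log from top to bottom. Each sample whose measured
 * time exceeds its deadline counts as a problem; its line number is
 * stored in problem_lines (up to max_lines entries).
 * Returns the total number of problems found.
 */
size_t detect_deadline_misses(const struct hrtimer_sample *samples, size_t n,
                              long *problem_lines, size_t max_lines)
{
    size_t count = 0;

    assert(samples != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (samples[i].latency_ns > samples[i].tick_ns) {
            if (problem_lines != NULL && count < max_lines)
                problem_lines[count] = samples[i].line;
            count++;
        }
    }
    return count;
}
```

The returned count and the stored line numbers correspond to the two results that DL is<br />
described as saving: how many times the problem occurred and where.<br />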


Figure 4.4: Problem definition<br />

Figure 4.5: Process flow of detection layer<br />

4.2.2 Separation Layer<br />

Commonly, to find a problem in the Linux kernel, a developer or administrator analyzes<br />
text-based raw data or graphical information in a data viewer. Of these, analyzing the<br />
text-based data is the most accurate method, but it is inefficient in terms of time. Therefore,<br />
the Separation Layer (SL) separates the event logs related to the problem from the whole<br />
event log, which is a great help to a programmer. An administrator or a developer can use the<br />
separated data to diagnose the problem accurately.<br />

Figure 4.6: Event log separated in SL<br />

Figure 4.6 shows the event logs recorded when the problem occurred, separated from the<br />
whole event log. The separation is based on the line information received from DL: by<br />
reading the entire event log from top to bottom, SL separates the event logs that match the<br />
line information of DL.<br />

Figure 4.7 shows the process flow of SL. After reading the resulting line information (start<br />
line and end line) from DL, SL compares the start line with the lines of the whole event log.<br />
After reading the end line from the line information, SL starts the separation work. When the<br />
line information has been read to the end, KAS executes AL.<br />
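The separation step can be illustrated with a short C sketch. It is an assumption-laden<br />
simplification, not the KAS implementation: the log is held in memory as an array of lines,<br />
and the `line_range` structure and `separate_events()` are hypothetical names for the start<br />
and end line information passed from DL.<br />

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical line range reported by the detection layer (1-based, inclusive). */
struct line_range {
    long start;
    long end;
};

/*
 * Walk the whole log once, top to bottom, and collect pointers to the
 * lines that fall inside any reported range. Ranges are assumed to be
 * sorted by start line and non-overlapping.
 * Returns the number of lines written to out (at most max_out).
 */
size_t separate_events(const char *const *log_lines, size_t n_lines,
                       const struct line_range *ranges, size_t n_ranges,
                       const char **out, size_t max_out)
{
    size_t r = 0, written = 0;

    assert(log_lines != NULL && ranges != NULL && out != NULL);
    for (size_t i = 0; i < n_lines && r < n_ranges; i++) {
        long line_no = (long)i + 1;

        if (line_no < ranges[r].start)
            continue;                      /* before the current range */
        if (written < max_out)
            out[written++] = log_lines[i]; /* inside the current range */
        if (line_no >= ranges[r].end)
            r++;                           /* range finished, move to the next one */
    }
    return written;
}
```

Because the ranges are sorted, one pass over the log is enough, which matters when the<br />
trace is several gigabytes long.<br />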


Figure 4.7: Process flow of Separation Layer<br />

4.2.3 Analysis Layer<br />

Normally, analysis tools provide a log but do not analyze problems. The Analysis Layer (AL)<br />
of KAS likewise does not itself determine the exact cause of a problem; that remains the job<br />
of an administrator or a developer. However, by examining the result of AL, it is possible to<br />
find out where, why, and when the problem occurred, using the statistics computed in AL.<br />
Figure 4.8 shows the process flow of AL. First, AL reads the results of DL and SL. Next, it<br />
checks how many times each event occurred and how long each event executed. It also<br />
calculates and saves the latency of the detected problem in the event logs. Of course, if a<br />
developer or an administrator needs more information than is mentioned in this thesis, it can<br />
be obtained by modifying the AL program.<br />
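The per-event statistics described above can be sketched as follows. This is a hedged<br />
illustration only: the `event` and `event_stat` structures, the `MAX_EVENT_KINDS` limit, and<br />
`analyze_events()` are hypothetical, and the latency calculation that AL also performs is left<br />
out.<br />

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_EVENT_KINDS 64

/* One separated event: name and execution time, as described for KAS. */
struct event {
    const char *event_name;
    double event_time;
};

/* Per-event statistics (hypothetical layout). */
struct event_stat {
    const char *event_name;
    size_t count;       /* how many times the event occurred */
    double total_time;  /* summed execution time */
};

/*
 * Accumulate count and total time per distinct event name.
 * Returns the number of distinct names stored in stats
 * (at most MAX_EVENT_KINDS; further new names are ignored).
 */
size_t analyze_events(const struct event *events, size_t n,
                      struct event_stat stats[MAX_EVENT_KINDS])
{
    size_t kinds = 0;

    assert(events != NULL && stats != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t k = 0;

        /* Linear search for an existing entry with the same name. */
        while (k < kinds && strcmp(stats[k].event_name, events[i].event_name) != 0)
            k++;
        if (k == kinds) {
            if (kinds == MAX_EVENT_KINDS)
                continue;
            stats[kinds].event_name = events[i].event_name;
            stats[kinds].count = 0;
            stats[kinds].total_time = 0.0;
            kinds++;
        }
        stats[k].count++;
        stats[k].total_time += events[i].event_time;
    }
    return kinds;
}
```

A linear search is used for brevity; with many distinct event names a hash table would be<br />
the more natural choice.<br />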

Figure 4.8: Process flow of analysis layer<br />

4.3 KAS Algorithm<br />

The processing order of the layers is fixed in KAS. Because the result of each layer is used<br />
by another layer, the processing of one layer must finish before the next layer runs.<br />

Figure 4.9: Dependency of each module in KAS<br />

Figure 4.9 shows the dependency of each layer. The result of DL is used by SL, and the<br />
results of DL and SL are used by AL. Of course, when an administrator or a developer<br />
analyzes a problem, they need to use the results from every layer to analyze it accurately. In<br />
some cases, the cause of a problem can be found just by analyzing the results from SL, but<br />
for an accurate analysis it is recommended to use the results from every layer.<br />

Figure 4.10: Pseudo code of KAS<br />


In Figure 4.10, the pseudo code shows the relation between the layers, which was also<br />
explained with Figure 4.9. First, in the detection layer, KAS checks whether a kernel problem<br />
occurred. If it did, the detection_problem() function saves the location of the errors and the<br />
number of times they occurred. Next, in the separation layer, the separation_data() function<br />
separates the problem events from the whole event log by using position_data (line<br />
information). After that, the save_separation_data() function saves the information. Finally,<br />
in the analysis layer, the analysis() function analyzes the information, and the<br />
analysis_save_data() function unifies and saves the analyzed data. A solution can be found<br />
more easily and effectively by analyzing the cause of the problem using the results from the<br />
three steps defined above.<br />
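The ordering constraint between the layers can be made concrete with a small C sketch. It<br />
does not reproduce the pseudo code of Figure 4.10: the `kas_state` structure, the<br />
`problems_in_log` argument, `kas_run()`, and the stub bodies are all assumptions made for<br />
illustration, and only three of the function names are borrowed from the text.<br />

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state; the real KAS data structures are not shown in the text. */
struct kas_state {
    int problems_found;   /* set by the detection layer */
    bool separated;       /* set by the separation layer */
    bool analyzed;        /* set by the analysis layer */
};

/* Stubs standing in for the layer functions named in the text. */
static int detection_problem(struct kas_state *s, int problems_in_log)
{
    s->problems_found = problems_in_log;
    return s->problems_found;
}

static void separation_data(struct kas_state *s)
{
    s->separated = s->problems_found > 0;
}

static void analysis(struct kas_state *s)
{
    s->analyzed = s->separated;
}

/* Run the layers strictly in order: DL, then SL, then AL. Returns true if AL ran. */
bool kas_run(struct kas_state *s, int problems_in_log)
{
    assert(s != NULL);
    s->problems_found = 0;
    s->separated = false;
    s->analyzed = false;

    if (detection_problem(s, problems_in_log) == 0)
        return false;      /* nothing detected: SL and AL have no input */
    separation_data(s);    /* uses the line information from DL */
    analysis(s);           /* uses the results of DL and SL */
    return s->analyzed;
}
```

The early return reflects the dependency described above: SL and AL have nothing to work<br />
on unless DL has recorded at least one problem.<br />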

4.3.1 Important Function and Parameter<br />

In this section we explain the important parameters and functions used by KAS.<br />

We first introduce the important parameters among those declared by KAS. The most<br />
important parameter is event_name. KAS reads this variable before analysis to decide which<br />
events it will analyze. Next is the event_time variable, which stores the execution time of<br />
every event. By looking at event_time, one can determine how much time each event used.<br />
Finally, there is the event_description variable, which stores the information on each event<br />
other than its name and time (PID, syscall_id, CPU_id, etc.).<br />


Table 4.1: Important parameters in KAS<br />

Parameter | Description<br />
char *event_name; | Name of the logged event<br />
double event_time; | Processing time of each event<br />
char *event_description; | Information on each event (PID, syscall_id, CPU_id, etc.)<br />

Figure 4.11 shows how the parameters explained in Table 4.1 match actual event information<br />
values.<br />
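To show how the three parameters could be filled from one text log line, here is a minimal C<br />
sketch. The line layout "name: seconds description" is an assumption made for this example<br />
and is not the actual LTTng text format; `kas_event` and `parse_event_line()` are hypothetical,<br />
and fixed-size buffers replace the `char *` members of Table 4.1 to keep the sketch free of<br />
memory management.<br />

```c
#include <assert.h>
#include <stdio.h>

/* Fixed-size variant of the three parameters in Table 4.1. */
struct kas_event {
    char event_name[64];
    double event_time;
    char event_description[256];
};

/*
 * Parse one line of the assumed form "<name>: <seconds> <description>".
 * The description is optional. Returns 1 on success and 0 if the line
 * does not contain at least a name and a time.
 */
int parse_event_line(const char *line, struct kas_event *ev)
{
    assert(line != NULL && ev != NULL);
    ev->event_description[0] = '\0';

    /* %63[^:] reads the name up to the colon; %255[^\n] takes the rest of the line. */
    int fields = sscanf(line, " %63[^:]: %lf %255[^\n]",
                        ev->event_name, &ev->event_time, ev->event_description);
    return fields >= 2;
}
```

The field widths in the format string match the buffer sizes, so an overlong name or<br />
description is truncated instead of overflowing the structure.<br />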

Figure 4.11: parameter and event log<br />

Table 4.2 shows the three most important functions. As mentioned above, KAS is largely<br />
divided into three layers. The problem_detection() function is the main function in DL, the<br />
problem_separation_data() function is the main function in SL, and, lastly, the<br />
problem_analysis() function is the main function in AL.<br />

Table 4.2: Important functions in KAS<br />

Function | Description<br />
void problem_detection(); | Function that finds the problem that occurred.<br />
void separation_data(); | Function that separates the problem event log from the whole event log.<br />
void problem_analysis(); | Function that analyzes event logs by using the result of SL.<br />

4.4 Trace Point and Event Log<br />

Most commonly used system performance tools and kernel analysis tools can log events after<br />
trace points are added to the kernel source. As Figure 4.12 shows, a trace point is added to<br />
the kernel source, the added trace point is recognized while the event log tracer daemon is<br />
running, and the added event is logged.<br />

Figure 4.12: Relation between trace point and event log<br />

When KAS needs an event log that is not yet available, a trace point is added to LTTng. Then<br />
the event log daemon of LTTng logs the event. Figure 4.13 is an example of adding a trace<br />
point. First, the kernel source is modified to trace the event information from the kernel.<br />
After that, the event name that was added to the kernel source is added to the event_name.h<br />
file. In this way, the event can be logged with LTTng and KAS can analyze the event logs.<br />

Figure 4.13: An example showing the usage of trace point in LTTng and KAS.<br />

4.5 LTTng and KAS<br />

As briefly mentioned in Chapter 3, LTTng is a performance monitoring tool that is currently<br />
used by various corporations and research centers such as Ecole Polytechnique de Montreal,<br />
Google, IBM Research, Autodesk, Wind River, MontaVista and STMicroelectronics. The<br />
tool can log a wide range of information from the Linux kernel. If further information is<br />
wanted, the kernel source can be modified.<br />

Figure 4.14 lists the event information logged by LTTng by default. Related information is<br />
combined into one group, and the specific events are described within the group.<br />


There are 18 such groups, and 118 events are defined within them. By default, therefore, 118<br />
events can be analyzed. However, other events can be logged by adding a trace point.<br />

Figure 4.14: Basic trace point offered by LTTng<br />

Figure 4.15: Procedure of problem analysis process of LTTng and KAS<br />

Figure 4.15 illustrates the process flow before and after KAS. Normally, LTTng performs<br />
event logging and the event logs are then analyzed with the Linux Trace Toolkit Viewer<br />
(LTTV). The proposed system instead analyzes the event logs with KAS after event logging.<br />
Whether the analysis is done with LTTV or KAS, the problem is finally analyzed by a<br />
developer or an administrator. In addition, KAS's dependency on LTTng is low: when another<br />
event logging tool logs the events, it is possible to modify the DL of KAS and keep the<br />
analysis in SL and AL.<br />

4.6 Summary<br />

In this chapter, we described the infrastructure of KAS. KAS is composed of three layers,<br />
and each layer saves its result of event log analysis. By analyzing the results of every layer,<br />
it is possible to analyze precisely a problem that occurred in the Linux kernel. We therefore<br />
proposed KAS as an event log analysis method for analyzing kernel problems efficiently.<br />



Chapter 5<br />

Case Study<br />

Kernels such as Linux, Windows, Mac OS, and microkernels operate based on time.<br />
Therefore, the timer is the most important element of a kernel. A timer delay can be a<br />
problem of the kernel itself, but it can also be a problem of middleware or an application. In<br />
an RTOS or an embedded system in particular, the timer is even more important.<br />

In this thesis, we measured the latency of the High Resolution Timer (HRTimer) [6] in the<br />
Linux kernel. We detected and analyzed the latency of HRTimer in the kernel by using KAS.<br />
The results show that the proposed system makes it possible to find a problem quickly and<br />
analyze it accurately.<br />



5.1 Timer Latency<br />

When a process calls a function of the Linux kernel, it uses a system call. When hardware<br />
calls into the Linux kernel, it uses an interrupt. When the kernel receives an interrupt, it stops<br />
the current process and runs an interrupt handler; the interrupt is thus given priority. A<br />
request from an interrupt handler with higher priority stops the lower priority task, which<br />
resumes when the higher priority task has finished. All kernels are driven by interrupts<br />
(hardware or software interrupts). Because a timer also operates inside the kernel, it too is<br />
driven by interrupts: the timer controller generates interrupts periodically. Commonly, Linux<br />
timer interrupts use a global timer interrupt and a local timer interrupt.<br />

Timer latency means missing a deadline. There are two reasons for timer latency. First,<br />
latency arises because many tasks must run after an interrupt occurs. Figure 5.1 shows each<br />
latency that accrues from the hardware interrupt until the task is scheduled.<br />

Figure 5.1 Task Preemption Latency Model<br />

• Interrupt Latency: The latency from the occurrence of a hardware interrupt until the<br />
Interrupt Service Routine (ISR) starts [30]. It includes hardware latency, interrupt disable<br />
latency, interrupt vectoring latency, and interrupt dispatch latency.<br />

• Interrupt Service Routine Latency: The time from the occurrence of the interrupt until the<br />
interrupt service routine runs.<br />

• Scheduler Latency: The time from the end of interrupt service routine handling until the<br />
scheduler is reached.<br />

• Scheduling Latency: The latency from the start of the scheduler until it ends.<br />

• Task Preemption Latency: The time from stopping the lower priority task until the higher<br />
priority task starts.<br />

Second, a delay may occur while a task is running when a task with a higher priority than<br />
the currently running task arrives. Figure 5.2 shows the latency due to priority. Latency of<br />
the kind shown in Figure 5.2 occurs frequently in a preemptive kernel.<br />

Figure 5.2 Priority Task Latency Model<br />

5.2 Preemptive vs. Non-preemptive<br />

Linux kernels before version 2.6 are non-preemptive [15] [27]; from version 2.6 on, the<br />
kernel can be configured as preemptive or non-preemptive. In a non-preemptive kernel, a<br />
process cannot be stopped once it has entered kernel mode [31] from user mode [63]. In<br />
contrast, a preemptive kernel allows a process running in kernel mode to be stopped forcibly<br />
by the scheduling policy or by another interrupt. FCFS (First-Come-First-Served) is the<br />
representative non-preemptive scheduling policy and Round-Robin is the representative<br />
preemptive one.<br />

5.2.1 Preemptive Kernel<br />

For embedded kernels [1], as for all other operating systems, preemption is important. A<br />
Linux kernel that has the preemption function is a preemptive kernel. As a real-time<br />
characteristic, the preemptive kernel aims to guarantee the deadline of a high-priority task.<br />
Because the response time of the real-time kernel in embedded systems is directly related to<br />
the safety and reliability of the systems, it is necessary to minimize the response time by<br />
using the preemptive kernel.<br />

Figure 5.3 Process of interrupt of preemptive kernel<br />

Figure 5.3 shows the order of interrupt handling in the preemptive kernel. If the interrupt of<br />
a high priority task occurs while a low priority task is running, the low priority task goes to<br />
sleep and the high priority task starts running. The preemptive kernel resumes the low<br />
priority task after the high priority task ends.<br />

5.2.2 Non-preemptive Kernel<br />

Non-preemptive sections are very important in both general-purpose operating systems and<br />
RTOSs, because priorities are not honored inside a non-preemptive section. The performance<br />
of an RTOS is determined by the response time in non-preemptive sections: even if a high<br />
priority task makes a request, it cannot be served immediately, which causes problems with<br />
response time. For example, when the latency of a certain non-preemptive section is 10<br />
seconds, the real-time task can run only after 10 seconds. In general Linux, the problem of<br />
non-preemptive sections is therefore handled by locking critical sections.<br />

Figure 5.4 Process of interrupt of non-preemptive kernel



Figure 5.4 shows interrupt processing in the non-preemptive kernel. Unlike in the preemptive<br />
kernel, a low priority task keeps running even if a high priority task arrives while it is<br />
working. After the low priority task ends, the kernel runs the high priority task.<br />

5.3 High Resolution Timer<br />

Since Linux already has a timer subsystem (kernel/timers.c), why is a second timer<br />
subsystem needed? Normally, the finest-grained time supported by the timer in the Linux<br />
kernel is 1 ms. However, embedded Linux needs much finer-grained time. Therefore, many<br />
system engineers have tried to integrate high-resolution and high-precision features into the<br />
existing timer framework, but the general Linux timer cannot provide microsecond accuracy.<br />
HRTimer provides microsecond resolution with lower overhead and controls time more<br />
elaborately than other timers. HRTimer cannot be used on every system: it requires hardware<br />
support. The HRTimer system allows a user space program to be woken up from a timer<br />
event with better accuracy when using the POSIX timer APIs. Without this system, the best<br />
accuracy that can be obtained for timer events is 1 jiffy, which depends on the setting of HZ<br />
in the kernel. In the 2.4 kernel, HZ was set to 100, which means that the best accuracy<br />
obtainable for a timer wakeup in user space was 10 milliseconds.<br />
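The 10 ms figure follows directly from the tick rate. As a worked example (the HZ = 1000<br />
case is added here only for comparison and assumes a kernel configured with that value; it<br />
corresponds to the 1 ms granularity mentioned at the start of this section):<br />

```latex
\[
t_{\mathrm{jiffy}} = \frac{1}{HZ}\ \mathrm{s}, \qquad
HZ = 100 \;\Rightarrow\; t_{\mathrm{jiffy}} = 10\ \mathrm{ms}, \qquad
HZ = 1000 \;\Rightarrow\; t_{\mathrm{jiffy}} = 1\ \mathrm{ms}.
\]
```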

To use HRTimer, the following steps are needed:<br />

• Verify that the kernel supports this feature for your target processor (and board).<br />

• Configure support for it in the Linux kernel.<br />

• Set CONFIG_HIGH_RES_TIMERS=y in the kernel config.<br />

• Compile the kernel.<br />

The timer APIs that support microsecond resolution are as follows:<br />

• timer_settime(): sets the time until the next expiration of the timer specified by timerid.<br />

• setitimer(): the system provides each process with an interval timer. When the timer<br />
expires, a signal is sent to the process, and the timer (potentially) restarts.<br />

• nanosleep(): sets up a high-resolution sleep.<br />

• ualarm(): causes the SIGALRM signal to be generated for the calling process after the<br />
specified number of microseconds.<br />

• usleep(): causes the calling thread to be suspended from execution until either the<br />
specified number of real-time microseconds has elapsed or a signal is delivered.<br />

HRTimer does not generate timer interrupts periodically; the period of an HRTimer is set by<br />
the programmer.<br />

5.4 Latency Policy<br />

In this thesis, we describe the timer latency model and the policy used to measure HRTimer latency. Figure 5.5 shows the model of the HRTimer latency. When a local APIC timer interrupt occurs, the hardirq handler runs first and software interrupts (softirqs) are raised after it. When the HRTimer has expired, its processing ends with the kernel_timer_itimer_expired event.<br />

Figure 5.5: HRTimer latency model<br />

We define the HRTimer latency model as follows:<br />

• $T_{lapic}$: the time from the HRTimer hardware interrupt until the hardirq handler has finished.<br />

• $T_{softirq}$: the time from the raising of the softirq until the softirq handler is processed.<br />

• $T_{expired}$: the time from the end of the softirq handler until the HRTimer expires.<br />

When a timer is delayed, the following formulas are used to check how the delay occurred.<br />

• Formula (1): $T_{time}$ is the execution time of the HRTimer, that is, the total of the HRTimer processing time and the HRTimer latency.<br />

$$T_{time} = T_{lapic} + T_{softirq} + T_{expired} \tag{1}$$

• Formula (2): checks whether latency occurred by comparing $HRT_{tick}$ with $T_{time}$ ($HRT_{tick}$ is the time set by the programmer). If $HRT_{latency} > 0$, the HRTimer is considered to have experienced latency.<br />

$$HRT_{latency} = T_{time} - HRT_{tick} \tag{2}$$

5.5 Evaluation<br />

This section describes the experimental setup and the evaluation of HRTimer latency. The system has a 1.83 GHz Intel Pentium 4 uniprocessor and 1 GB of RAM, and runs Linux kernel 2.6.24.<br />

First of all, we apply the LTTng patch to the Linux kernel in order to collect event logs. We use the setitimer() system call so that a SIGALRM signal is sent to the process when the timer expires; setitimer() thus interrupts the calling process at a chosen future time. Figure 5.6 shows the setitimer interface. The first argument selects the timer type: ITIMER_REAL, ITIMER_VIRTUAL, or ITIMER_PROF. In this experiment, we use ITIMER_REAL, a real-time timer that counts down regardless of whether the process is running and generates SIGALRM when it times out. The second argument sets the time value, and the timeout is generated after that time has elapsed. setitimer() is also more accurate than alarm() and allows an exact time value to be set.<br />


Figure 5.6: Interface of setitimer<br />

Figure 5.7: Setup of setitimer<br />

Figure 5.7 shows how setitimer operates. setitimer takes two time values: it_value sets the time until the first expiration, and it_interval sets the period of every expiration after the first one.<br />
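The same two values can be tried from user space. The following is a minimal sketch, not the measurement program of this thesis: it uses Python's `signal.setitimer`, which wraps the setitimer() system call, and the helper names `tick_latencies_ns` and `run_itimer` are our own. Latencies measured this way include interpreter overhead, so they only illustrate the mechanism.

```python
import signal
import time


def tick_latencies_ns(stamps_ns, start_ns, period_ns):
    """Latency of each expiry relative to its ideal time start + (k + 1) * period."""
    return [t - (start_ns + (k + 1) * period_ns) for k, t in enumerate(stamps_ns)]


def run_itimer(period_s: float, cycles: int):
    """Arm ITIMER_REAL periodically and return per-expiry latencies in ns (Unix only)."""
    stamps = []

    def on_alarm(signum, frame):
        stamps.append(time.monotonic_ns())

    old = signal.signal(signal.SIGALRM, on_alarm)
    start = time.monotonic_ns()
    # Second argument plays the role of it_value, third of it_interval.
    signal.setitimer(signal.ITIMER_REAL, period_s, period_s)
    try:
        while len(stamps) < cycles:
            signal.pause()  # sleep until the next signal arrives
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer first
        signal.signal(signal.SIGALRM, old if old is not None else signal.SIG_DFL)
    return tick_latencies_ns(stamps[:cycles], start, round(period_s * 1e9))
```

Calling `run_itimer(0.001, 100)` would arm a 1 ms periodic timer and return 100 latency values.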

We set $HRT_{tick}$ to 100 μs and to 1 ms, and repeated each measurement for 100,000 cycles under heavy background load. Figure 5.8 shows the repetition process when the HRTimer period is set to 100 μs. The timer runs continuously from the first period to the n-th period, and the HRTimer latency is calculated separately for every cycle.<br />

Figure 5.8: Periodic process of HRTimer with a period of 100 μs<br />

The $HRT_{latency}$ analysis is based on the following loop:<br />

• read the hardirq time of the HRTimer<br />

• read the softirq time of the HRTimer<br />

• read the itimer_expired time<br />

• compute formula (1) and formula (2)<br />
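The loop above can be sketched as follows. This is an illustration under our own assumptions: the `Period` record and the function names are hypothetical, and the three component times are taken as already extracted from the event log.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Component times of one timer period, in nanoseconds (hypothetical record)."""
    t_lapic_ns: int
    t_softirq_ns: int
    t_expired_ns: int


def t_time_ns(p: Period) -> int:
    """Formula (1): T_time = T_lapic + T_softirq + T_expired."""
    return p.t_lapic_ns + p.t_softirq_ns + p.t_expired_ns


def hrt_latency_ns(p: Period, hrt_tick_ns: int) -> int:
    """Formula (2): HRT_latency = T_time - HRT_tick."""
    return t_time_ns(p) - hrt_tick_ns


def analyse(periods, hrt_tick_ns):
    """Return (latency, latency_occurred) for every period."""
    result = []
    for p in periods:
        latency = hrt_latency_ns(p, hrt_tick_ns)
        result.append((latency, latency > 0))
    return result
```

For a period whose three components add up to 200,032 ns with a 100 μs tick, formula (2) gives a latency of 100,032 ns; the component values used to reach that sum are made up for the example.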

5.5.1 Result of KAS<br />

In this section, we explain the results of the analysis by KAS. Each layer outputs its result as text data.<br />

5.5.1.1 Result of DL<br />

Table 5.1 shows the results from the Detection Layer (DL). Latency is expressed as $T_{time} - HRT_{tick}$, that is, the HRTimer latency. We treat a period as a latency occurrence when $HRT_{latency} \geq 100\,\mu s$, and record the running number of such occurrences as the latency count in Table 5.1. DL records not only the latency time but also the line information (start line and end line) where the latency occurred.<br />

Figure 5.9 shows the source code that saves the line information of each latency. latency_start_end[start_end_num][] is a two-dimensional array: the first index, start_end_num, is the latency count, and the second dimension stores the start line and end line of the latency.<br />
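The bookkeeping described here can be sketched in a few lines. This is not the code of Figure 5.9; it is a minimal reconstruction of the idea, assuming each period is given as a (latency, start line, end line) triple, and the function name `detect` is our own.

```python
THRESHOLD_NS = 100_000  # a period counts as a latency occurrence at >= 100 us


def detect(periods, threshold_ns=THRESHOLD_NS):
    """periods: iterable of (latency_ns, start_line, end_line).

    Returns (rows, latency_start_end): rows mirror the layout of Table 5.1,
    latency_start_end holds [start_line, end_line] per detected latency.
    """
    latency_start_end = []
    rows = []
    for latency_ns, start, end in periods:
        if latency_ns >= threshold_ns:
            latency_start_end.append([start, end])
            rows.append((latency_ns, len(latency_start_end), start, end))
        else:
            rows.append((latency_ns, None, None, None))
    return rows, latency_start_end
```

Periods below the threshold keep their latency value but get no count and no line information, as in the rows marked "-" in Table 5.1.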


Table 5.1: Result of Detection Layer<br />

HRTimer latency (ns)   Latency count   Start line   End line<br />
100,032                1               1,203        1,588<br />
116,693                2               5,606        5,810<br />
1,423                  -               -            -<br />
…                      …               …            …<br />
100,806                15              18,445       18,548<br />
950                    -               -            -<br />
93                     -               -            -<br />
5,393                  -               -            -<br />
100,465                16              31,220       31,310<br />
1,423                  -               -            -<br />
548                    -               -            -<br />
46                     -               -            -<br />
103,898                17              45,101       45,204<br />
432                    -               -            -<br />

Figure 5.9: Source code of DL for line information<br />


5.5.1.2 Result of SL<br />

SL reads the latency_start_end values, which are the location information saved by DL, together with the event log. It then separates the data belonging to each latency from the whole event log.<br />

Figure 5.10: Result of Separation Layer<br />

Figure 5.10 shows the result of SL. It extracts every event that occurred between smp_apic_timer_interrupt_entry and smp_apic_timer_interrupt_exit, which mark the entry and exit of the HRTimer interrupt.<br />
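Both steps of SL can be sketched briefly. The code is an illustration only, under the assumptions that the event log is a list of lines, that line numbers are 1-based and inclusive, and that each event is identified by its name; the function names are hypothetical.

```python
ENTRY = "smp_apic_timer_interrupt_entry"
EXIT = "smp_apic_timer_interrupt_exit"


def separate(log_lines, latency_start_end):
    """Cut out the log slice for every [start_line, end_line] pair (1-based, inclusive)."""
    return [log_lines[start - 1:end] for start, end in latency_start_end]


def between_markers(event_names, entry=ENTRY, exit_=EXIT):
    """Keep only events from an entry marker to its exit marker, inclusive."""
    inside = False
    kept = []
    for name in event_names:
        if name == entry:
            inside = True
        if inside:
            kept.append(name)
        if name == exit_:
            inside = False
    return kept
```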

5.5.1.3 Result of AL<br />

AL is the last layer of KAS. It computes statistics for every delay that occurred in HRTimer. The statistics are output for every period, and they make it possible to trace the latency.<br />


Table 5.2: Result of Analysis Layer<br />

Event name                    Execution times   Consumption (ns)<br />
kernel_arch_syscall_exit      114               129,687<br />
kernel_arch_syscall_entry     114               269,678<br />
kernel_sched_try_wakeup       8                 16,628<br />
kernel_timer_itimer_expired   1                 688<br />
kernel_softirq_raise          2                 2,976<br />
kernel_softirq_exit           3                 2,257<br />
kernel_softirq_entry          3                 4,182<br />
kernel_timer_set              4                 4,013<br />
kernel_timer_update_time      1                 1,955<br />
kernel_send_signal            2                 2,939<br />
kernel_irq_exit               3                 9,675<br />
kernel_irq_entry              3                 15,591<br />
mm_page_free                  9                 18,643<br />
mm_page_alloc                 351               553,913<br />
fs_writev                     2                 1,616<br />
fs_write                      1                 1,010<br />
fs_read                       4                 3,317<br />
fs_ioctl                      4                 3,126<br />
fs_pollfd                     24                19,678<br />
fs_select                     66                54,945<br />
kernel_sched_schedule         6                 9,645<br />
input_event                   7                 30,820<br />
net_socket_call               85                53,884<br />
net_socket_sendmsg            85                68,806<br />
net_dev_receive               6                 4,761<br />
net_dev_xmit                  6                 15,093<br />

Table 5.2 shows the statistics for a period in which latency occurred: event names, execution times, and consumption time. "Execution times" is the number of times each event was called in the period, and "Consumption" is the total time of each event during the period, in nanoseconds. From the execution times and consumption times in Table 5.2, we can see that events such as mm_page_alloc, net_socket_call, and net_socket_sendmsg have notably high values.<br />
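The per-event statistics can be sketched as a simple aggregation. The sketch assumes that each log record of a latency period has already been reduced to an (event name, duration in ns) pair; how KAS actually attributes time to an event is not shown here, and the function names are our own.

```python
from collections import defaultdict


def summarise(events):
    """events: iterable of (event_name, duration_ns).

    Returns {event_name: (execution_times, consumption_ns)}.
    """
    count = defaultdict(int)
    total = defaultdict(int)
    for name, duration_ns in events:
        count[name] += 1
        total[name] += duration_ns
    return {name: (count[name], total[name]) for name in count}


def top_by_consumption(stats, n=3):
    """Names of the n events with the largest total consumption."""
    return sorted(stats, key=lambda name: (-stats[name][1], name))[:n]
```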

Table 5.3: Result of Analysis Layer: (a) execution times when no latency occurred, (b) execution times when latency occurred<br />

Event name                    (a) Execution times   (b) Execution times<br />
kernel_arch_syscall_entry     37                    92<br />
kernel_arch_syscall_exit      37                    91<br />
net_socket_recvmsg            0                     2<br />
net_socket_sendmsg            0                     85<br />
net_dev_xmit                  0                     85<br />
mm_page_alloc                 3                     359<br />
mm_page_free                  3                     20<br />
…                             …                     …<br />
kernel_softirq_entry          1                     6<br />
kernel_softirq_exit           1                     6<br />
kernel_timer_itimer_expired   1                     1<br />

Table 5.3 also shows results of the Analysis Layer. Case (a) gives the execution times when no latency occurred, and case (b) gives the execution times when latency occurred under the network stress and I/O stress programs. By comparing case (a) with case (b), we can determine which events cause the latency. In this result, the events net_socket_sendmsg, net_dev_xmit, and mm_page_alloc were executed far more often in case (b). In particular, the mm_page_alloc event caused the largest latency. In this way, we can find the events that are responsible for the latency.<br />
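The comparison of case (a) and case (b) amounts to a difference of counts per event. A minimal sketch, with a hypothetical function name and an arbitrary cut-off `min_increase` that is our own assumption:

```python
def increased_events(no_latency, with_latency, min_increase=10):
    """Events whose execution count grew by at least min_increase, largest growth first.

    Both arguments map event names to execution counts.
    """
    names = set(no_latency) | set(with_latency)
    diffs = {n: with_latency.get(n, 0) - no_latency.get(n, 0) for n in names}
    selected = [n for n in names if diffs[n] >= min_increase]
    return sorted(selected, key=lambda n: (-diffs[n], n))
```

Applied to the counts of Table 5.3 with a cut-off of 50, the first three names returned are mm_page_alloc, net_dev_xmit, and net_socket_sendmsg.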

5.5.2 Analysis of HRTimer Latency<br />

In this section, we analyze the cause of delay in HRTimer and describe a solution based on the result of the KAS analysis.<br />

Figure 5.11: Event log of the part where the delay occurred<br />

Figure 5.11 shows the part of the event log where HRTimer latency occurred. According to the result of AL, events such as net_socket_sendmsg, net_dev_xmit, and mm_page_alloc occurred many times. These events were handled before the HRTimer event (HRTIMER_SOFTIRQ) was processed, and according to our analysis of the kernel source, the softirqs associated with them have higher priority than the softirq of HRTimer.<br />

Figure 5.12 shows the result of analyzing the event log in Figure 5.11, and illustrates how the HRTimer latency arises. The softirq handlers are executed one after another, and the HRTimer softirq (HRTIMER_SOFTIRQ) is executed among them. We can see that run_hrtimer_softirq() ran only after net_dev_xmit (NET_TX_SOFTIRQ) and mm_page_alloc (BLOCK_SOFTIRQ), which have higher priority than the HRTimer softirq.<br />

Figure 5.12: One of the reasons for HRTimer latency<br />

Figure 5.13 shows the interrupt processing on a timeline. While a softirq with high priority is in progress, a softirq with low priority cannot be executed. Only after the high-priority softirq has finished is the low-priority one (the HRTimer softirq) executed.<br />


Figure 5.13: Result of the analysis of HRTimer latency<br />

Figure 5.14 shows the kernel source after changing the priority of the softirq that caused the HRTimer delay. We ran the experiment again after raising the priority of the HRTimer softirq above that of the network softirq and the block softirq in interrupt.h.<br />

Figure 5.14: Kernel source of softirq modified for HRTimer<br />
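The effect of the reordering can be illustrated with a small model. This is not the kernel code of Figure 5.14: the softirq order below is written from our recollection of include/linux/interrupt.h in 2.6.24 and should be treated as an assumption, the handler costs are invented, and moving HRTIMER_SOFTIRQ directly in front of NET_TX_SOFTIRQ is only one possible placement. The model assumes that pending softirqs are served in order of their index.

```python
# Assumed softirq order (lowest index is served first).
ORIGINAL_ORDER = [
    "HI_SOFTIRQ", "TIMER_SOFTIRQ", "NET_TX_SOFTIRQ", "NET_RX_SOFTIRQ",
    "BLOCK_SOFTIRQ", "TASKLET_SOFTIRQ", "SCHED_SOFTIRQ", "HRTIMER_SOFTIRQ",
]


def raise_priority(order, name, before):
    """Return a new order with `name` moved to just in front of `before`."""
    rest = [s for s in order if s != name]
    i = rest.index(before)
    return rest[:i] + [name] + rest[i:]


def wait_before(order, pending_cost_ns, name):
    """Handler time of all pending softirqs that are served before `name`."""
    total = 0
    for softirq in order:
        if softirq == name:
            return total
        total += pending_cost_ns.get(softirq, 0)
    raise ValueError(f"{name} is not in the softirq order")
```

With invented costs of 15 μs for NET_TX_SOFTIRQ and 50 μs for BLOCK_SOFTIRQ, the HRTimer softirq waits 65 μs in the original order and does not wait at all once it is moved in front of them.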

After changing the priority of the softirq, we repeated the experiment with a 100 μs period in the same environment as before. Figure 5.15 shows the data analyzed by KAS after the repeated experiments. We compared real-time-patched Linux and general-purpose Linux, with both the unmodified and the modified softirq priority. According to Figure 5.15(a), there were 608 delays in Linux-rt-patched and 5,546 delays in general-purpose Linux. In Figure 5.15(b), the number of delays was 1,607 in Linux-rt-patched and 1,620 in general-purpose Linux.<br />

Figure 5.15: Results of the experiment on Linux-RT and general Linux with a 100 μs period. (a) Linux-2.6.24-rt-patched and Linux-2.6.24-not_changed-softirq. (b) Linux-2.6.24-rt-patched and Linux-2.6.24-changed-softirq.<br />

Figure 5.16 shows the results of the experiment on Linux-rt-patched and general-purpose Linux with a 1 ms period. In Figure 5.16(a), there were 229 delays in Linux-rt-patched and 524 delays in general Linux. In Figure 5.16(b), there were 289 delays in Linux-rt-patched and 225 delays in general-purpose Linux. Comparing the HRTimer delays at 100 μs and at 1 ms, the incidence of delay is clearly much lower at 1 ms. In addition, general-purpose Linux shows fewer delays in Figure 5.16(b) than in Figure 5.16(a). This is the result of modifying the softirq priority, which KAS identified as one of the causes of latency.<br />

Figure 5.16: Results of the experiment on Linux-RT and general Linux with a 1 ms period. (a) Linux-2.6.24-rt-patched and Linux-2.6.24-not_changed-softirq. (b) Linux-2.6.24-rt-patched and Linux-2.6.24-changed-softirq.<br />

Figure 5.17 shows the result of the HRTimer latency experiment on Linux-2.6.24, and Figure 5.18 shows the result on Linux-2.6.24 with the changed softirq priority, in the same environment as Figure 5.17. The delay has decreased compared with the general-purpose Linux kernel.<br />

Figure 5.17: Result of the HRTimer latency experiment on Linux-2.6.24 with heavy background load<br />

Figure 5.18: Result of the HRTimer latency experiment on Linux-2.6.24-changed-softirq with heavy background load<br />

Figure 5.19: Result of the HRTimer latency experiment on Linux-2.6.24-rt-patched with heavy background load<br />

Figure 5.19 shows the result of the HRTimer latency experiment on Linux-2.6.24-rt-patched with heavy background load. This result was obtained in the same environment as Figure 5.17 after applying the real-time patch to Linux-2.6.24.<br />


5.6 Summary<br />

We analyzed HRTimer latency by using KAS, which is not a perfect solution for timer latency. Latency in the kernel does not have a single cause. It arises from a combination of factors such as the relation between processes and the dependencies between hardware interrupts (hardirq) [10] [19] and software interrupts (softirq). Because there are so many causes of latency, the problem cannot be solved by one perfect solution. In our high-resolution timer latency experiments and event log analysis, the most common cause was the priority of the softirq.<br />

KAS was evaluated on the HRTimer problem, and it cannot yet analyze other kinds of problems. Nevertheless, the case study showed that KAS is useful for analyzing the kernel timer, which is one of the most important and difficult problem areas in the kernel. With KAS, a developer or a system administrator can analyze timer problems quickly and efficiently.<br />


Chapter 6<br />

Conclusions and Future Work<br />

In this chapter, we present our conclusions, point out the flaws of KAS, and describe future work.<br />

6.1 Conclusions<br />

Embedded systems are widely used in various fields. Embedded Linux in particular has many advantages because it inherits the strengths of Linux. However, embedded Linux sometimes faces complex problems, because embedded systems are rapidly becoming more complex. Some problems appear only after the system has been released. Therefore, not only the developer of the system but also its user may be harmed by unexpected problems.<br />

Embedded systems contain many errors and bugs. Some of them are solved easily, but there are also many complicated problems such as memory leaks and timer latency. In addition, it is not easy to find where a problem occurred, or to determine how a developer or system manager should fix it in order to improve the performance of the system.<br />

Every embedded kernel generates event information about IRQs, system calls, I/O, memory, and the network. This event information is one of the most useful resources for analyzing the cause of a problem in an embedded system and improving its performance. However, the volume of event information is huge, so analyzing all of it is neither easy nor effective.<br />

We propose KAS, a new system architecture that analyzes the event log of the embedded kernel. KAS quickly detects problems in the kernel and separates the related events from the whole event information. It then analyzes them statistically and provides the developer or system manager with a recommended solution. In a case study, we tested HRTimer latency and found that its cause is usually the softirq priority. However, solving the priority problem alone does not settle the HRTimer latency problem completely, since the latency is sometimes caused by hardirq latency or by latency at interrupt locking. Nevertheless, the case study showed how effectively and quickly problems in an embedded kernel can be analyzed by using KAS.<br />

6.2 Future Work<br />

In this section, we describe the problems of KAS and explain how to improve it. At present, KAS does not support real-time analysis. In addition, it should be improved so that it can be connected to an accounting system that is able to control processes.<br />

6.2.1 Real-Time Architecture of KAS<br />

KAS starts its analysis only after the event log has been completely recorded. This can be a problem because all the event log information must be saved first. In addition, problems cannot be prevented in advance, because KAS detects them from the recorded event log. For these reasons, KAS has two main problems. First, analysis is not easy because the event log is huge. Second, problems cannot be prevented in advance.<br />

To improve KAS, we need to reduce the amount of event log information by saving, in real time, only the part related to a problem instead of saving everything. This would reduce the workload of the system and save analysis time in the SL and AL of KAS.<br />
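One possible way to keep only the part of the log around a problem is a bounded buffer of recent events that is copied out when a problem is detected. The sketch below is our own illustration of that idea, not a design taken from KAS; the class name, the function name, and the `is_problem` predicate are hypothetical.

```python
from collections import deque


class RecentEvents:
    """Bounded buffer that always holds the most recent `capacity` events."""

    def __init__(self, capacity: int):
        self._buf = deque(maxlen=capacity)

    def record(self, event):
        self._buf.append(event)  # the oldest event is dropped automatically

    def snapshot(self):
        return list(self._buf)


def monitor(events, is_problem, capacity):
    """Save a snapshot of recent events each time is_problem(event) is true."""
    recent = RecentEvents(capacity)
    saved = []
    for event in events:
        recent.record(event)
        if is_problem(event):
            saved.append(recent.snapshot())
    return saved
```

Memory use stays constant regardless of how long the system runs, because only `capacity` events are held at any time.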

Figure 6.1 shows the real-time architecture model of KAS. The KAS daemon checks for problems in real time. If a problem is detected, KAS saves only the part of the event log in which the problem occurred and analyzes it by using SL and AL.<br />

Figure 6.1: Real-Time Architecture of KAS<br />

6.2.2 KAS and CABI<br />

We tested HRTimer latency in the case study. The result shows that some processes in the Linux kernel monopolize the CPU or cause timer latency. Therefore, if KAS finds a process that uses too much CPU, it sends the process id to an accounting system. The accounting system [50] can manage the CPU usage based on the process id to lower the timer latency in embedded systems. Currently, we have integrated KAS and the CABI system [51] [52] in the Linux kernel; however, we have not yet run experiments to measure the timer latency with the integrated system.<br />

Figure 6.2: KAS and CABI<br />

Figure 6.2 shows the integrated system with KAS and CABI. KAS finds the process that causes the timer latency and sends its process id to CABI. CABI then manages the CPU usage.<br />


Appendix<br />

A.1 RTOS<br />

Commonly, the goal of an operating system (OS) is to provide a convenient environment in which a user can run programs. In other words, an OS is a system program that makes computer systems easy to use and uses computer hardware efficiently. The OS is therefore the core software of a computer, and it plays a very important role in controlling hardware, software, and data. A real-time operating system (RTOS) can be defined in many ways, but it is an OS that guarantees that interrupts are processed within a bounded time, which makes it suitable for real-time applications such as embedded applications. In embedded systems, OSes can be broadly divided into real-time and non-real-time OSes. VxWorks, pSOS, VRTX, QNX, OSE, Nucleus, and μC/OS-II are good examples of commercial real-time OSes. All of these real-time OSes support preemptive multitasking and the POSIX API. In a preemptive kernel, each task has a priority, and a high-priority task runs in preference to low-priority tasks. Like other OSes, a real-time OS has a kernel mode and a user mode. Real-time OSes also provide an integrated development environment (IDE) and debugging tools, which make it easier for developers to develop software. The drawback is that these real-time OSes require royalty payments, which increases the development cost of the system and the cost of the product.<br />

The characteristics of an RTOS are as follows.<br />

• It supports multithreading and preemption.<br />

• It guarantees the priority of each process.<br />

• It supports synchronization among threads.<br />

• The behavior of the OS must be known and predictable (interrupt latency, system call processing time, and the time for which the OS and drivers mask interrupts).<br />

Real-time systems also have deadlines, and according to their time constraints they can be divided into three types.<br />

• Hard real-time system: hardware or software that must operate within the confines of a stringent deadline. If a deadline is missed, the result is financial loss and damage to users.<br />

• Soft real-time system: failure to meet a deadline is considered neither an application failure nor a system failure. The system can tolerate some occasional deadline misses.<br />

Because of these characteristics, a real-time system places constraints on both hardware and software. The hardware must provide reliability, fault tolerance, and scalability. The software, that is, the real-time OS, must provide real-time task scheduling, task synchronization, interrupt priorities, and a real-time clock, according to the purpose of the platform.<br />

A.2 RT-Linux<br />

RT-Linux, which adds real-time characteristics to general-purpose Linux, was started by Victor Yodaiken at New Mexico Tech. Because the Linux kernel has weak real-time characteristics, a real-time kernel was created to run real-time applications. The Linux kernel itself is left largely unchanged, and a real-time module is patched into it. In this way, real-time applications can be executed with minimal modification of the sources. After the real-time module is patched in, the processes of the Linux kernel are assigned a lower priority than the real-time applications.<br />

General-purpose Linux schedules tasks by time slice. If a task with a higher priority than the current task becomes ready, the current task is not stopped; the high-priority task waits until it receives a time slice. For this reason, general-purpose Linux has weak real-time characteristics, whereas RT-Linux has better real-time characteristics than the standard Linux kernel. RT-Linux supports both soft real-time and hard real-time; however, its support for hard real-time is not perfect.<br />

A.3 Real-time scheduling<br />

A real-time system is characterized by the time it takes to process data delivered from a sensor and to deliver the result to an actuator. If the CPU processes only one task, scheduling is very simple, and only the performance of the CPU affects whether the deadline is met. Because the CPU processes many tasks with different characteristics at the same time, the scheduling problem becomes complicated. The purpose of scheduling is to prevent a real-time task from missing its deadline because of tasks with weaker real-time requirements. To address the real-time scheduling problem, hundreds of algorithms have been developed for different task characteristics, as shown in Figure A.1.<br />

Figure A.1: Classification of real-time scheduling algorithms<br />

In real-time scheduling, tasks are divided into periodic, aperiodic, and sporadic tasks according to their time characteristics. A periodic task is a process that is repeated with a fixed cycle. An aperiodic task has no time characteristic; a task that outputs status information when ordered by an administrator is an example. A sporadic task has a certain time characteristic like a periodic task, but it is not known when it will be executed. Tasks can also be divided into critical and non-critical tasks according to the seriousness of missing a deadline. Moreover, they can be divided into preemptive and non-preemptive tasks according to whether the CPU is yielded when a high-priority task arrives. An ordinary task yields the CPU to a high-priority task, that is, it is preemptive. However, while a task is executing a critical section, it cannot be preempted.<br />

Priorities are assigned to tasks by either a static-priority method or a dynamic-priority method. A static priority, once assigned by the scheduler, never changes, whereas a dynamic priority changes at run time according to the processes that arrive. The dynamic-priority method can schedule more effectively than the static-priority method. The priority-based algorithms most used in commercial real-time OSes, such as VxWorks or pSOS, are fixed-priority scheduling and the Earliest Deadline First (EDF) method. Both are preemptive scheduling methods, and both assume a single processor.<br />

First, as the name suggests, fixed-priority scheduling gives a fixed priority to every task. The open question was how to assign the fixed priorities. In 1973, Liu and Layland proposed an algorithm called rate-monotonic scheduling, which grants higher priority to tasks with shorter periods (on the premise that every task is periodic and the deadline of each task is equal to its period). Rate-monotonic scheduling is simple and has a useful mathematical property: a task set is always schedulable when its utilization factor (UF) is below the Liu and Layland bound, which approaches about 0.69 as the number of tasks grows. For these reasons it has been widely used for several decades.<br />
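The Liu and Layland utilization bound for rate-monotonic scheduling of n periodic tasks is n(2^(1/n) − 1); it equals 1 for a single task and decreases toward ln 2 ≈ 0.693 as n grows. The check can be sketched as follows, assuming each task is given as a (computation time, period) pair; the function names are our own. The test is sufficient but not necessary: a task set above the bound may still be schedulable.

```python
def rm_bound(n: int) -> float:
    """Liu and Layland bound n * (2**(1/n) - 1) for n periodic tasks."""
    if n <= 0:
        raise ValueError("need at least one task")
    return n * (2 ** (1 / n) - 1)


def utilization(tasks) -> float:
    """Utilization factor: sum of computation_time / period over all tasks."""
    return sum(c / t for c, t in tasks)


def rm_schedulable(tasks) -> bool:
    """Sufficient rate-monotonic test: utilization does not exceed the bound."""
    return utilization(tasks) <= rm_bound(len(tasks))
```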

EDF is an algorithm that grants higher priority to the task whose deadline is nearer. For this reason, it is also called the deadline-driven algorithm. It can be used to schedule not only periodic tasks but also all other tasks on a single-processor architecture, and it has been proved mathematically to be optimal.<br />
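The selection rule of EDF and its schedulability condition for periodic tasks can be sketched briefly. The condition used here, total utilization not exceeding 1, holds for preemptive scheduling on a single processor when each deadline equals the period; the data layout and function names are our own assumptions.

```python
def edf_pick(ready):
    """ready: list of (task_name, absolute_deadline). Returns the name to run next."""
    if not ready:
        return None
    return min(ready, key=lambda task: task[1])[0]


def edf_schedulable(tasks) -> bool:
    """tasks: list of (computation_time, period); deadlines equal periods."""
    return sum(c / t for c, t in tasks) <= 1.0
```

A task set with utilization between the rate-monotonic bound and 1 is accepted by this test although the rate-monotonic bound does not guarantee it.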

A.4 CABI<br />

CABI is a system that manages CPU resources in Linux. Linux does not restrict the resource consumption of its processes. For example, when malicious application programs are downloaded and executed, they can easily consume a large amount of the CPU capacity. Multimedia applications need finer-grained control and CPU reservation. Requirements such as CPU QoS are increasing even in the embedded systems area. To solve this problem, CABI (CPU Accounting and Blocking Interfaces, since renamed Common resource Accounting and Blocking Interfaces) [50] [51] was proposed. It is a general-purpose resource monitoring and restriction system that prevents excessive use of resource capacity by a process or a group of processes. CABI is implemented in the Linux kernel.<br />

CABI was designed with the following three issues in mind [51]:

• Simplicity

CABI should be simple and generic so that it can be used in a variety of OS services such as security enhancement, class-based accounting, overload monitoring, and processor reservation.

• Accuracy

CABI should monitor the CPU capacity used by each process very accurately in order to make application execution more stable. A fine-grained timer is used to achieve accurate monitoring.


• Portability

CABI should be implementable in a variety of operating systems. The system confines its interface to a few hooks in the host kernel.

Figure A.2: Controlling resource consumption with CABI

Figure A.2 shows an example of CABI controlling process groups. CABI gathers related processes into one object and controls CPU utilization per object. If the CPU utilization of the Audio & Video application object is set to 60%, the MPEG, Mailer and Browser processes included in that object can together use at most 60% of the total CPU capacity. CABI is a resource monitoring and restriction system whose purpose is to improve system reliability and security. It is generic enough to offer various services that require CPU resource control, such as security improvement, overload control, and class-based accounting [50].
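The group budget idea in this example can be modelled roughly as follows. This is a hypothetical sketch and not CABI's actual interface: the class name, method names, accounting period, and process IDs are all assumptions introduced for illustration. It only shows how charging CPU time against a shared per-period budget caps a whole group of processes.

```python
class AccountingObject:
    """Hypothetical model of a process group sharing one CPU budget."""

    def __init__(self, name, share_percent, period_us):
        if not 0 < share_percent <= 100:
            raise ValueError("share_percent must be in 1..100")
        self.name = name
        self.period_us = period_us
        # CPU time the whole group may use in one period (integer math).
        self.budget_us = period_us * share_percent // 100
        self.used_us = 0
        self.members = set()

    def attach(self, pid):
        """Add a process to the group."""
        self.members.add(pid)

    @property
    def blocked(self):
        """True once the group has used up its budget for this period."""
        return self.used_us >= self.budget_us

    def charge(self, pid, ran_us):
        """Account CPU time used by a member.

        Returns True if the group may keep running in this period.
        """
        if pid not in self.members:
            raise KeyError(pid)
        self.used_us += ran_us
        return not self.blocked

    def replenish(self):
        """Start a new period with a full budget."""
        self.used_us = 0


if __name__ == "__main__":
    # 60% of a hypothetical 10 ms accounting period.
    av = AccountingObject("audio_video", 60, 10_000)
    for pid in (101, 102, 103):  # stand-ins for MPEG, Mailer, Browser
        av.attach(pid)
    print(av.charge(101, 3_000), av.charge(102, 2_000), av.charge(103, 1_000))
    print("blocked:", av.blocked)
```

In a real kernel the charging would be driven by scheduler hooks and a fine-grained timer, as the design issues above suggest; here it is driven by explicit calls so the accounting logic stays visible.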



Bibliography



[1] L. Abeni, A. Goel, C. Krasic, J. Snow, and J. Walpole, “A measurement-based analysis of the real-time performance of the Linux kernel”, In Real-Time Technology and Applications Symposium (RTAS 2002), Sept. 2002.

[2] Tim Bird, “Learning the kernel and finding performance problems with KFI”, In CELF International Technical Conference, 2005.

[3] Mathieu Desnoyers and Michel R. Dagenais, “The LTTng tracer: A low impact performance and behavior monitor for GNU/Linux”, In OLS (Ottawa Linux Symposium), July 2006.

[4] Robert W. Wisniewski, Reza Azimi, Mathieu Desnoyers, Maged M. Michael, Jose Moreira, Doron Shiloach, and Livio Soares, “Experiences Understanding Performance in a Commercial Scale-Out Environment”, 13th International Euro-Par Conference, 2007.

[5] LTTng project, http://ltt.polymtl.ca/.

[6] G. Anzinger, High resolution timers project, http://high-res-timers.sourceforge.net/.

[7] K. Yaghmour and M. R. Dagenais, “Measuring and characterizing system behavior using kernel-level event logging”, In Proceedings of the USENIX Annual Technical Conference, pp. 13-26, 2000.

[8] System Director Mevalet, http://www.nec.co.jp/cced/mevalet.

[9] Mathieu Desnoyers and Michel Dagenais, “Low disturbance embedded system tracing with Linux Trace Toolkit Next Generation”, In ELC (Embedded Linux Conference) 2006.

[10] Martin Bligh, Mathieu Desnoyers and Rebecca Schultz, “Linux Kernel Debugging on Google-sized clusters”, Proceedings of the Linux Symposium, Ottawa, Ontario, Canada, June 2007.

[11] Linux Kernel State Tracer, http://lkst.sourceforge.net.

[12] Debugging with Data Display Debugger, User Guide and Reference Manual, First Edition, for DDD Version 3.3.9, 15 January 2004.

[13] Kernel Function Trace, http://eLinux.org/Kernel_Function_Trace.

[14] Nucleus, “An Introduction to the Real-time OS & Nucleus PLUS Training Guide”, Accelerated Technology.

[15] Yu-Chung Wang and K.-J. Lin, “Enhancing the Real-Time Capability of the Linux Kernel”, In Proceedings of the IEEE Real Time Computing Systems and Applications, Hiroshima, Japan, October 1998.

[16] Mark Wilding and Dan Behman, “Self-Service Linux: Mastering the Art of Problem Determination”.

[17] Ki Duk Kwon, Joon Mo Jung and Sang Hong Kwon, “A Dynamic Voltage Scaling Algorithm for Aperiodic Tasks”, The Korea Academia-Industrial cooperation Society (KAIS), vol. 7, no. 5, pp. 866-874, October 2006.

[18] Bryan M. Cantrill, Michael W. Shapiro, and Adam H. Leventhal, “DTrace: Dynamic instrumentation of production systems”, In USENIX04, 2004.

[19] SystemTAP: Vara Prasad, William Cohen, Frank Ch. Eigler, Martin Hunt, Jim Keniston, and Brad Chen, “Locating system problems using dynamic instrumentation”, In OLS05 (Ottawa Linux Symposium), 2005.

[20] Mathieu Desnoyers and Michel R. Dagenais, “Deploying LTTng on Exotic Embedded Architectures”, Embedded Linux Conference 2009.


[21] C. Yuan, N. Lao, J.-R. Wen, J. Li, Z. Zhang, Y.-M. Wang, W.-Y. Ma, “Automated Known Problem Diagnosis with Event Traces”, Microsoft Research Technical Report MSR-TR-2005-81, Jun. 2005.

[22] Stephen Atkins, IBM pSeries Technical Support, IBM Software Group, http://www.ibm.com/developerworks/aix/library/au-nmon_analyser/.

[23] Lauterbach, “Integrated Run and Stop Mode Debugging for Embedded System”, Embedded System Conference 2007.

[24] The Linux Advantage (Join the Linux revolution), Sage Software, Inc.

[25] IA-PC HPET (High Precision Event Timer) Specification, Intel Corporation, 2004.

[26] Embedded World, http://www.embeddedworld.co.kr/english/.

[27] Luis Henriques, “Threaded IRQs on Linux PREEMPT-RT”, In International Workshop on Operating Systems Platforms for Embedded Real-Time Applications, pp. 23-32, Dublin, Ireland, June 2009.

[28] T. Gleixner and D. Niehaus, “Hrtimers and beyond: Transforming the Linux timer subsystem”, in Proc. Linux Symposium, Ottawa, Ontario, Canada, July 2006.

[29] B. Srinivasan, S. Pather, R. Hill, F. Ansari, and D. Niehaus, “A firm real-time system implementation using commercial off-the-shelf hardware and free software”, In 4th Real-Time Technology and Applications Symposium, Denver, June 1998.

[30] Mathieu Desnoyers and M. R. Dagenais, “Tracing for hardware, driver, and binary reverse engineering in Linux”, CodeBreaks Journal, vol. 1, no. 1, 2007.

[31] Gabriel Matni and M. Dagenais, “Automata-based approach for kernel trace analysis”, in Proceedings of the 22nd IEEE Canadian Conference on Electrical and Computer Engineering, (St. John's, Newfoundland, Canada), May 2009.

[32] Mathieu Desnoyers and M. Dagenais, “LTTng, filling the gap between kernel instrumentation and a widely usable kernel tracer”, in Proceedings of the 3rd Annual Linux Foundation Collaboration Summit, (San Francisco, California), April 2009.

[33] Jean-Hughes Deschenes, Mathieu Desnoyers, and M. Dagenais, “Tracing Time Operating System State Determination”, The Open Software Engineering Journal, vol. 2, pp. 40-44, 2008.

[34] Eric Clement and M. Dagenais, “Traces synchronization in distributed networks”, Journal of Computer Systems, Networks, and Communications, vol. 2009, 2009.

[35] M. Dagenais, R. Moore, R. Wisniewski, K. Yaghmour, and T. Zanussi, “Efficient and accurate tracing of events in Linux clusters”, in Proceedings of the 2003 High Performance Computing Systems and Applications & OSCAR Symposium, (Sherbrooke, Quebec, Canada), pp. 291-294, May 2003.

[36] Ki Duk Kwon, Midori Sugaya and Tatsuo Nakajima, “KTAS: Analysis of Timer Latency for Embedded Linux Kernel”, International Journal of Advanced Science and Technology (IJAST), vol. 18, May 2010.

[37] E. Merlo, M. Dagenais, P. Bachand, J. S. Sormani, S. Gradara, and G. Antoniol, “Investigating large software system evolution: The Linux kernel”, in Proceedings of the 26th Annual International Computer Software and Applications Conference, (Oxford, England), pp. 421-426, August 2000.


[38] Magdalena Balazinska, E. Merlo, M. R. Dagenais, B. Laguë, and K. Kontogiannis, “Advanced clone analysis to support object-oriented system refactoring”, in Proceedings of the 7th Working Conference on Reverse Engineering, (Brisbane, Australia), November 2000.

[39] Karim Yaghmour and M. R. Dagenais, “The Linux Trace Toolkit”, in Actes de la conférence Linux Expo, (Montreal, Quebec, Canada), April 2000.

[40] Y. Blaquière, M. Dagenais, and Y. Savaria, “A new accurate and hierarchical timing analysis approach”, in Proceedings of the IEEE European Design Automation Conference, February 1993.

[41] Kwon Ki Duk, Sugaya Midori, Ohno Yuuki and Nakajima Tatsuo, “Performance analysis of information explosion by using LTTng”, Information Processing Society of Japan (IPSJ), 5-299, March 2008.

[42] Ohno Yuuki, Sugaya Midori and Kwon Ki-Duk, “Performance analysis of distributed applications in the information explosion era”, Information Processing Society of Japan (IPSJ), 5-147, March 2008.

[43] Kiduk Kwon, Midori Sugaya, Tatsuo Nakajima, “Analysis of High Resolution Timer Latency Using Kernel Analysis System in Embedded System”, 12th IEEE Symposium on Object/component/service-oriented Real-time distributed Computing, co-located with the First International Workshop on Software Technologies for Future Dependable Distributed Systems (STFSSD 2009), pp. 122-126, March 2009.

[44] Martin Schulz, Brian S. White, Sally A. McKee, Hsien-Hsin Lee, and Jürgen Jeitner, “Owl: Next Generation System Monitoring”, In Proceedings of Computing Frontiers (CF'05), Ischia, IT, May 2005.

[45] R. Vaarandi, “Tools and Techniques for Event Log Analysis”, PhD Thesis, Tallinn University of Technology, 2005.

[46] J. E. Prewett, “Analyzing cluster log files using logsurfer”, In Proc. Annual Conf. on Linux Clusters, 2003.

[47] Swatch, http://sial.org/howto/logging/swatch.

[48] Syslog, http://www.loganalysis.org.

[49] Nicholas Mc Guire, “Kernel Function Instrumentation – KFT”, Distributed & Embedded System Lab, Lanzhou University, December 31, 2006.

[50] Midori Sugaya, Shuichi Oikawa, Tatsuo Nakajima, “Accounting System: A fine-grained CPU resource protection mechanism for Embedded System”, IEEE International Symposium on Object-oriented Real-time distributed Computing (ISORC) 2006: 72-84.

[51] CABI, http://osrg.dcl.info.waseda.ac.jp/~doly/cabi/.

[52] Midori Sugaya, Yuki Ohno, Andrej van der Zee, Tatsuo Nakajima, “A Lightweight Anomaly Detection System for Information Appliances”, IEEE International Symposium on Object-oriented Real-time distributed Computing (ISORC) 2009.

[53] Android, http://developer.android.com/sdk/index.html.

[54] iOS4, http://developer.apple.com/technologies/iphone/whats-new.html.

[55] Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt, “Debugging in the (Very) Large: Ten Years of Implementation and Experience”, Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP '09).


[56] S. Balakrishnan, Ravi Rajwar, M. Upton, K. Lai, “The impact of performance asymmetry in emerging multicore architectures”, Computer Architecture, 2005. ISCA '05 Proceedings 32nd International Symposium, pp. 506-517, 4-8 June 2005.

[57] Min Hong Yun, Woo Sik Kim, Jae Ho Lee, Do Hyoung Kim, Sun Ja Kim, “Embedded Linux Solution for Smartphone System”, Electronics and Telecommunications Research Institute (ETRI), vol. 21, no. 1, February 2006.

[58] Robert W. Wisniewski, Peter F. Sweeney, Kartik Sudeep, Matthias Hauswirth, “Performance and environment monitoring for whole-system characterization and optimization”, 2004.

[59] Mathieu Desnoyers and M. Dagenais, “LTTng: Tracing across execution layers, from the hypervisor to user-space”, in Proceedings of the 2008 Linux Symposium, (Ottawa, Canada), July 2008.

[60] H. Tokuda and M. Kotera, “A Real-Time Tool Set for the ARTS Kernel”, Proceedings of 9th IEEE Real-Time Systems Symposium, Dec. 1988.

[61] Clifford W. Mercer and Ragunathan Rajkumar, “An Interactive Interface and RT-Mach Support for Monitoring and Controlling Resource Management”, In Proceedings of the Real-Time Technology and Applications Symposium, May 1995.

[62] Edward A. Lee, “Cyber-Physical Systems – Are Computing Foundations Adequate?”, NSF Workshop On Cyber-Physical Systems: Research Motivation, Techniques and Roadmap, October 16-17, 2006.

[63] Benjamin Poirier, R. Roy, and M. Dagenais, “Unified kernel and user space distributed tracing for message passing analysis”, in Proceedings of the First International Conference on Parallel, Distributed and Grid Computing for Engineering, (Pecs, Hungary), April 2009.

[64] HP, “HP OpenView Operations for Windows Troubleshooting Guide”, February 2004.

[65] IBM Corporation Software Group, “IBM Tivoli Risk Manager”, 2004.

[66] Ethan Galstad, “Nagios® Version 2.x Documentation”, November 2006.

[67] Aiko Pras, João Paulo Almeida, Yohannes Albertino Ramlie, “An Overview - NTOP – Network TOP”, University of Twente, Netherlands, June 2000.

[68] http://htop.sourceforge.net/.

[69] http://sourceforge.net/projects/strace/.

[70] http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html.

[71] http://www.jffnms.org/.

[72] http://www.ethereal.com/.


Acknowledgements

I would first like to thank all the members of the Distributed Computing & Ubiquitous Laboratory of Waseda University. The seminars and discussions I had with members of the laboratory during my PhD were truly helpful. In particular, I would have had a hard time proceeding with my research without Professor Nakajima’s help. I would like to give special thanks to the professor for the answers to my questions and for the help I received while writing this dissertation. I also want to thank Midori Sugaya for offering good advice when I was writing the thesis and doing experiments.

I would like to send my thanks to Alexandre Courbot and Yuki Ohno, who gave me a great deal of help when I was writing the thesis. I also want to thank Mr. Yong-Gu Kang, who helped me personally while I was writing the PhD thesis.

I would like to thank all of the great friends I made during my stay at Waseda University. I would also like to thank my parents, my wife and our lovely daughter for their moral support.

Finally, I would like to express my deep gratitude to the Hasekawa scholarship foundation, which sponsored my PhD study from April 2007 to March 2010.



Publication List


Category: title, journal or venue of publication, date of publication, co-authors (including the applicant)

Journal papers

1. Ki Duk Kwon, Midori Sugaya and Tatsuo Nakajima, “KTAS: Analysis of Timer Latency for Embedded Linux Kernel”, International Journal of Advanced Science and Technology (IJAST), vol. 19, June 2010.

International conferences

1. Kiduk Kwon, Midori Sugaya, Tatsuo Nakajima, “Analysis of Embedded Linux Using Kernel Analysis System”, The 6th IEEE International Conferences on Embedded Software and Systems (ICESS), pp. 417-422, May 2009.

2. Kiduk Kwon, Midori Sugaya, Tatsuo Nakajima, “Analysis of High Resolution Timer Latency Using Kernel Analysis System in Embedded System”, 12th IEEE Symposium on Object/component/service-oriented Real-time distributed Computing, co-located with the First International Workshop on Software Technologies for Future Dependable Distributed Systems (STFSSD 2009), pp. 122-126, March 2009.

3. Tatsuo Nakajima, Hiroo Ishikawa, Yuki Kinebuchi, Midori Sugaya, Lei Sun, Alexandre Courbot, Andrej van der Zee, Aleksi Aalto, and Kwon Ki Duk, “An Operating System Architecture for Future Information Appliances”, The 6th IFIP WG 10.2 International Workshop, SEUS 2008, October 2008. Lecture Notes in Computer Science (LNCS), Vol. 5287 / 2008, pp. 292-303.

Domestic conferences

1. Kwon Ki Duk, Sugaya Midori, Ohno Yuuki and Nakajima Tatsuo, “Performance analysis of information explosion by using LTTng”, Information Processing Society of Japan (IPSJ), 5-299, March 2008.

2. Ohno Yuuki, Sugaya Midori and Kwon Ki-Duk, “Performance analysis of distributed applications in the information explosion era”, Information Processing Society of Japan (IPSJ), 5-147, March 2008.

3. Ohno Yuuki, Sugaya Midori, Kwon Ki-Duk and Nakajima Tatsuo, “Anomaly Detection System Based on Resource Monitoring” (original title: リソースモニタリングによる異常検出システム), The 6th Dependability System Workshop (DSW’08 Summer), pp. 71-76, 2008.

Books

1. Ki-Duk Kwon, Je-Jung Yu, Bong-Kyu Seo, “Brew Mobile Programming”, YoungJin Publishing Company, August 2003.
