The CMS Trigger Supervisor: - HEPHY
Doctoral Thesis<br />
Departament d’Enginyeria Electrònica<br />
Universitat Autònoma de Barcelona<br />
<strong>The</strong> <strong>CMS</strong> <strong>Trigger</strong> <strong>Supervisor</strong>:<br />
Control and Hardware Monitoring System of the <strong>CMS</strong><br />
Level-1 <strong>Trigger</strong> at CERN<br />
Ildefons Magrans de Abril<br />
Director:<br />
Dr. Claudia-Elisabeth Wulz<br />
Tutor:<br />
Dr. Montserrat Nafría Maqueda<br />
March 2008
Dr. Claudia-Elisabeth Wulz, <strong>CMS</strong>-<strong>Trigger</strong> Group leader of the Institute for High Energy Physics in Vienna, and<br />
Deputy <strong>CMS</strong> <strong>Trigger</strong> Project Manager<br />
CERTIFIES<br />
That the dissertation <strong>The</strong> <strong>CMS</strong> <strong>Trigger</strong> <strong>Supervisor</strong>: Control and Hardware Monitoring System of the <strong>CMS</strong> Level-<br />
1 <strong>Trigger</strong> at CERN, presented by Ildefons Magrans de Abril in fulfilment of the degree of Doctor en Enginyeria<br />
Electrònica, has been performed under her supervision.<br />
Bellaterra, March 2008.<br />
Dr. Claudia-Elisabeth Wulz
Abstract<br />
<strong>The</strong> experiments <strong>CMS</strong> (Compact Muon Solenoid) and ATLAS (A Toroidal LHC ApparatuS) at the Large<br />
Hadron Collider (LHC) are the greatest exponents of the rising complexity of High Energy Physics (HEP) data<br />
handling instrumentation. Tens of millions of readout channels, tens of thousands of hardware boards and a<br />
similar number of connections are representative figures. However, the hardware volume is not the only<br />
dimension of complexity: the unprecedentedly large number of research institutes and scientists that form the<br />
international collaborations, and the long design, development, commissioning and operational phases, are<br />
additional factors that must be taken into account.<br />
<strong>The</strong> Level-1 (L1) trigger decision loop is an excellent example of these difficulties. This system is based on<br />
pipelined logic designed to analyze, without deadtime, the data from each LHC bunch crossing occurring every<br />
25 ns, using special coarsely segmented trigger data from the detectors. <strong>The</strong> L1 trigger is responsible for<br />
reducing the rate of accepted crossings to below 100 kHz. While the L1 trigger is taking its decision, the full<br />
high-precision data of all detector channels are stored in the detector front-end buffers, which are only read out if<br />
the event is accepted. <strong>The</strong> Level-1 Accept (L1A) decision is communicated to the sub-detectors through the<br />
Timing, <strong>Trigger</strong> and Control (TTC) system. <strong>The</strong> L1 decision loop hardware system was built by more than ten<br />
research institutes with a development and construction period of nearly ten years, featuring more than fifty<br />
VME crates, and thousands of boards and connections.<br />
In this context, it is mandatory to provide software tools that ease the integration and the short-, medium- and<br />
long-term operation of the experiment. This research work proposes solutions, based on web services technologies, to<br />
simplify the implementation and operation of software control systems to manage hardware devices for HEP<br />
experiments. <strong>The</strong> main contribution of this work is the design and development of a hardware management<br />
system intended to enable the operation and integration of the L1 decision loop of the <strong>CMS</strong> experiment (<strong>CMS</strong><br />
<strong>Trigger</strong> <strong>Supervisor</strong>, TS).<br />
<strong>The</strong> TS conceptual design proposes a hierarchical distributed system which fits well with the web services<br />
based model of the <strong>CMS</strong> Online SoftWare Infrastructure (OSWI). <strong>The</strong> functional scope of this system covers the<br />
configuration, testing and monitoring of the L1 decision loop hardware, and its interaction with the overall <strong>CMS</strong><br />
experiment control system and the rest of the experiment. Together with the technical design aspects, the project<br />
organization strategy is discussed.<br />
<strong>The</strong> main topic follows an initial investigation into the use of the Extensible Markup Language (XML) as a<br />
uniform data representation format for a software environment to implement hardware management systems for<br />
HEP experiments. This model extends the usage of XML beyond the boundaries of the control and monitoring<br />
related data and proposes its usage also for the code. This effort, carried out in the context of the <strong>CMS</strong> <strong>Trigger</strong><br />
and Data Acquisition project, improved the team's overall knowledge of XML technologies, created a pool of<br />
ideas and helped to anticipate the main TS requirements and architectural concepts.<br />
Visual summary<br />
<strong>The</strong> following outline is a visual summary of the PhD thesis. It reproduces the text boxes of the original<br />
diagram, whose main ideas were connected by labeled arrows. <strong>The</strong> author’s contributions to peer reviewed<br />
journals (p), international conferences (c) and supervised master theses (t) are also indicated next to each box.<br />
Motivation (Chapter 1):<br />
•Unprecedented complexity related to the implementation of hardware control systems for the last generation of<br />
high energy physics experiments: very large hardware systems, human collaborations, and design, development<br />
and operational periods.<br />
Generic solution (Chapter 2) [39]p:<br />
•Development model: XML for data and code; interpreted code. [50]p<br />
Concrete case and main thesis goal (Chapters 1, 3):<br />
•Control and monitoring system for the Level-1 (L1) trigger decision loop.<br />
First lessons (Chapters 2, 3):<br />
•Web services and XDAQ middleware as suitable technologies.<br />
•Experience of developing a hardware management system for the <strong>CMS</strong> experiment.<br />
Conceptual design of the control system for the <strong>CMS</strong> L1 decision loop (<strong>Trigger</strong> <strong>Supervisor</strong>, TS)<br />
(Chapter 3) [56]p, [60]c:<br />
•Requirements.<br />
•Project organization.<br />
•Layered design: Framework, System, Services.<br />
Framework design (Chapter 4) [89]c, [90]t:<br />
•Baseline technology survey.<br />
•Additional developments.<br />
•Performance measurements.<br />
System design (Chapter 5) [95]c:<br />
•Design guidelines.<br />
•Distributed software system architecture.<br />
Services design (Chapter 6) [97]t:<br />
•Configuration, interconnection test and GUI services.<br />
<strong>The</strong>sis achievements (Chapters 7, 8):<br />
•New software environment model: confirms XML and XDAQ.<br />
•TS design and project organization as a successful experience for future experiments.<br />
•A building block of the <strong>CMS</strong> experiment.<br />
•A contribution to the <strong>CMS</strong> operation.<br />
•Proposal for a uniform <strong>CMS</strong> experiment control system.<br />
Contents<br />
ABSTRACT<br />
VISUAL SUMMARY<br />
CONTENTS<br />
ACRONYMS<br />
CHAPTER 1 INTRODUCTION<br />
1.1 CERN AND THE LARGE HADRON COLLIDER<br />
1.2 THE COMPACT MUON SOLENOID DETECTOR<br />
1.3 THE TRIGGER AND DAQ SYSTEM<br />
1.3.1 Overview<br />
1.3.2 <strong>The</strong> Level-1 trigger decision loop<br />
1.3.2.1 Calorimeter <strong>Trigger</strong><br />
1.3.2.2 Muon <strong>Trigger</strong><br />
1.3.2.3 Global <strong>Trigger</strong><br />
1.3.2.4 Timing <strong>Trigger</strong> and Control System<br />
1.4 THE <strong>CMS</strong> EXPERIMENT CONTROL SYSTEM<br />
1.4.1 Run Control and Monitoring System<br />
1.4.2 Detector Control System<br />
1.4.3 Cross-platform DAQ framework<br />
1.4.4 Sub-system Online Software Infrastructure<br />
1.4.5 Architecture<br />
1.5 RESEARCH PROGRAM<br />
1.5.1 Motivation<br />
1.5.2 Goals<br />
CHAPTER 2 UNIFORM MANAGEMENT OF DATA ACQUISITION DEVICES WITH XML<br />
2.1 INTRODUCTION<br />
2.2 KEY REQUIREMENTS<br />
2.3 A UNIFORM APPROACH FOR HARDWARE CONFIGURATION CONTROL AND TESTING<br />
2.3.1 XML as a uniform syntax<br />
2.3.2 XML based control language<br />
2.4 INTERPRETER DESIGN<br />
2.4.1 Polymorphic structure<br />
2.5 USE IN A DISTRIBUTED ENVIRONMENT<br />
2.6 HARDWARE MANAGEMENT SYSTEM PROTOTYPE<br />
2.7 PERFORMANCE COMPARISON<br />
2.8 PROTOTYPE STATUS<br />
CHAPTER 3 TRIGGER SUPERVISOR CONCEPT<br />
3.1 INTRODUCTION<br />
3.2 REQUIREMENTS<br />
3.2.1 Functional requirements<br />
3.2.2 Non-functional requirements<br />
3.3 DESIGN<br />
3.3.1 Initial discussion on technology<br />
3.3.2 Cell<br />
3.3.3 <strong>Trigger</strong> <strong>Supervisor</strong> services<br />
3.3.3.1 Configuration<br />
3.3.3.2 Reconfiguration<br />
3.3.3.3 Testing<br />
3.3.3.4 Monitoring<br />
3.3.3.5 Start-up<br />
3.3.4 Graphical User Interface<br />
3.3.5 Configuration and conditions database<br />
3.4 PROJECT COMMUNICATION CHANNELS<br />
3.5 PROJECT DEVELOPMENT<br />
3.6 TASKS AND RESPONSIBILITIES<br />
3.7 CONCEPTUAL DESIGN IN PERSPECTIVE<br />
CHAPTER 4 TRIGGER SUPERVISOR FRAMEWORK<br />
4.1 CHOICE OF AN ADEQUATE FRAMEWORK<br />
4.2 REQUIREMENTS<br />
4.2.1 Requirements covered by XDAQ<br />
4.2.2 Requirements not covered by XDAQ<br />
4.3 CELL FUNCTIONAL STRUCTURE<br />
4.3.1 Cell operation<br />
4.3.2 Cell command<br />
4.3.3 Factories and plug-ins<br />
4.3.4 Pools<br />
4.3.5 Controller interface<br />
4.3.6 Response control module<br />
4.3.7 Access control module<br />
4.3.8 Shared resource manager<br />
4.3.9 Error manager<br />
4.3.10 Xhannel<br />
4.3.11 Monitoring facilities<br />
4.4 IMPLEMENTATION<br />
4.4.1 Layered architecture<br />
4.4.2 External packages<br />
4.4.2.1 Log4cplus<br />
4.4.2.2 Xerces<br />
4.4.2.3 Graphviz<br />
4.4.2.4 ChartDirector<br />
4.4.2.5 Dojo<br />
4.4.2.6 Cgicc<br />
4.4.2.7 Logging collector<br />
4.4.3 XDAQ development<br />
4.4.4 <strong>Trigger</strong> <strong>Supervisor</strong> framework<br />
4.4.4.1 <strong>The</strong> cell<br />
4.4.4.2 Cell command<br />
4.4.4.3 Cell operation<br />
4.4.4.4 Factories, pools and plug-ins<br />
4.4.4.5 Controller interface<br />
4.4.4.6 Response control module<br />
4.4.4.7 Access control module<br />
4.4.4.8 Error management module<br />
4.4.4.9 Xhannel<br />
4.4.4.9.1 CellXhannelCell<br />
4.4.4.9.2 CellXhannelTb<br />
4.4.4.10 CellToolbox<br />
4.4.4.11 Graphical User Interface<br />
4.4.4.12 Monitoring infrastructure<br />
4.4.4.12.1 Model<br />
4.4.4.12.2 Declaration and definition of monitoring items<br />
4.4.4.13 Logging infrastructure<br />
4.4.4.14 Start-up infrastructure<br />
4.5 CELL DEVELOPMENT MODEL<br />
4.6 PERFORMANCE AND SCALABILITY MEASUREMENTS<br />
4.6.1 Test setup<br />
4.6.2 Command execution<br />
4.6.3 Operation instance initialization<br />
4.6.4 Operation state transition<br />
CHAPTER 5 TRIGGER SUPERVISOR SYSTEM<br />
5.1 INTRODUCTION<br />
5.2 DESIGN GUIDELINES<br />
5.2.1 Homogeneous underlying infrastructure<br />
5.2.2 Hierarchical control system architecture<br />
5.2.3 Centralized monitoring, logging and start-up systems architecture<br />
5.2.4 Persistency infrastructure<br />
5.2.4.1 Centralized access<br />
5.2.4.2 Common monitoring and logging databases<br />
5.2.4.3 Centralized maintenance<br />
5.2.5 Always-on system<br />
5.3 SUB-SYSTEM INTEGRATION<br />
5.3.1 Building blocks<br />
5.3.1.1 <strong>The</strong> TS node<br />
5.3.1.2 Common services<br />
5.3.1.2.1 Logging collector<br />
5.3.1.2.2 Tstore<br />
5.3.1.2.3 Monitor collector<br />
5.3.1.2.4 Mstore<br />
5.3.2 Integration<br />
5.3.2.1 Integration parameters<br />
5.3.2.1.1 OSWI parameters<br />
5.3.2.1.2 Hardware setup parameters<br />
5.3.2.2 Integration cases<br />
5.3.2.2.1 Cathode Strip Chamber Track Finder<br />
5.3.2.2.2 Global <strong>Trigger</strong> and Global Muon <strong>Trigger</strong><br />
5.3.2.2.3 Drift Tube Track Finder<br />
5.3.2.2.4 Resistive Plate Chamber<br />
5.3.2.2.5 Global Calorimeter <strong>Trigger</strong><br />
5.3.2.2.6 Hadronic Calorimeter<br />
5.3.2.2.7 <strong>Trigger</strong>, Timing and Control System<br />
5.3.2.2.8 Luminosity Monitoring System<br />
5.3.2.2.9 Central cell<br />
5.3.2.3 Integration summary<br />
5.4 SYSTEM INTEGRATION<br />
5.4.1 Control system<br />
5.4.2 Monitoring system<br />
5.4.3 Logging system<br />
5.4.4 Start-up system<br />
5.5 SERVICES DEVELOPMENT PROCESS<br />
CHAPTER 6 TRIGGER SUPERVISOR SERVICES<br />
6.1 INTRODUCTION<br />
6.2 CONFIGURATION<br />
6.2.1 Description<br />
6.2.2 Implementation<br />
6.2.2.1 Central cell<br />
6.2.2.2 <strong>Trigger</strong> sub-systems<br />
6.2.2.3 Global <strong>Trigger</strong><br />
6.2.2.3.1 Command interface<br />
6.2.2.3.2 Configuration operation and database<br />
6.2.2.4 Sub-detector cells<br />
6.2.2.5 Luminosity monitoring system<br />
6.2.3 Integration with the Run Control and Monitoring System<br />
6.3 INTERCONNECTION TEST<br />
6.3.1 Description<br />
6.3.2 Implementation<br />
6.3.2.1 Central cell<br />
6.3.2.2 Sub-system cells<br />
6.4 MONITORING<br />
6.4.1 Description<br />
6.5 GRAPHICAL USER INTERFACES............................................................................................................. 109<br />
6.5.1 Global <strong>Trigger</strong> control panel ...................................................................................................... 109<br />
CHAPTER 7 HOMOGENEOUS SUPERVISOR AND CONTROL SOFTWARE INFRASTRUCTURE<br />
FOR THE <strong>CMS</strong> EXPERIMENT AT SLHC................................................................................................... 111<br />
7.1 INTRODUCTION .................................................................................................................................... 111<br />
7.2 TECHNOLOGY BASELINE ...................................................................................................................... 111<br />
7.3 ROAD MAP ........................................................................................................................................... 112<br />
7.4 SCHEDULE AND RESOURCE ESTIMATES ................................................................................................ 113<br />
CHAPTER 8 SUMMARY AND CONCLUSIONS.................................................................................... 115<br />
8.1 CONTRIBUTIONS TO THE <strong>CMS</strong> GENETIC BASE...................................................................................... 116<br />
8.1.1 XSEQ........................................................................................................................................... 116<br />
8.1.2 <strong>Trigger</strong> <strong>Supervisor</strong> ...................................................................................................................... 117<br />
8.1.3 <strong>Trigger</strong> <strong>Supervisor</strong> framework ....................................................................................................117<br />
8.1.4 <strong>Trigger</strong> <strong>Supervisor</strong> system........................................................................................................... 118<br />
8.1.5 <strong>Trigger</strong> <strong>Supervisor</strong> services ........................................................................................................ 118<br />
8.1.6 <strong>Trigger</strong> <strong>Supervisor</strong> Continuation ................................................................................................ 119<br />
8.2 CONTRIBUTION TO THE <strong>CMS</strong> BODY ..................................................................................................... 119<br />
8.3 FINAL REMARKS................................................................................................................................... 119<br />
APPENDIX A TRIGGER SUPERVISOR SOAP API............................................................................ 121<br />
A.1 INTRODUCTION .................................................................................................................................... 121<br />
A.2 REQUIREMENTS ................................................................................................................................... 121<br />
A.3 SOAP API ........................................................................................................................................... 121<br />
A.3.1 Protocol....................................................................................................................................... 121<br />
A.3.2 Request message.......................................................................................................................... 123<br />
A.3.3 Reply message ............................................................................................................................. 124<br />
A.3.4 Cell command remote API .......................................................................................................... 125<br />
A.3.5 Cell Operation remote API ......................................................................................................... 125<br />
A.3.5.1 OpInit ...................................................................................................................................................... 125<br />
A.3.5.2 OpSendCommand.................................................................................................................................... 126<br />
A.3.5.3 OpReset ................................................................................................................................................... 127<br />
A.3.5.4 OpGetState .............................................................................................................................................. 128<br />
A.3.5.5 OpKill...................................................................................................................................................... 129<br />
ACKNOWLEDGEMENTS.............................................................................................................................. 131<br />
REFERENCES.................................................................................................................................................. 133<br />
Acronyms<br />
ACM	Access Control Module
AJAX	Asynchronous JavaScript and XML
ALICE	A Large Ion Collider Experiment
API	Application Program Interface
ATLAS	A Toroidal LHC Apparatus
aTTS	Asynchronous Trigger Throttle System
BX	Bunch crossing
BU	Builder Unit
CCC	Central Crate Cell
CCI	Control Cell Interface
CERN	Conseil Européen pour la Recherche Nucléaire
CGI	Common Gateway Interface
CKC	ClocK crate cell
CMS	Compact Muon Solenoid
CSC	Cathode Strip Chamber
CSCTF	Cathode Strip Chamber Track Finder
CVS	Concurrent Versions System
DAQ	Data Acquisition
DCC	DTTF Central Cell
DCS	Detector Control System
DB	DataBase
DBWG	CMS DataBase Working Group
DIM	Distributed Information Management System
DOM	Document Object Model
DT	Drift Tube
DTSC	Drift Tube Sector Collector
DTTF	Drift Tube Track Finder
ECAL	Electromagnetic CALorimeter
ECS	Experiment Control System
ERM	Error Manager
EVM	EVent Manager
FDL	Final Decision Logic
FED	Front-end Device
FLFM	First Level Function Manager
FM	Function Manager
FPGA	Field Programmable Gate Array
FRL	Front-end Readout Link board
FSM	Finite State Machine
FTE	Full Time Equivalent
FU	Filter Unit
GCT	Global Calorimeter Trigger
GMT	Global Muon Trigger
GT	Global Trigger
GTFE	Global Trigger Front-end
GTL	Global Trigger Logic
GUI	Graphical User Interface
HAL	Hardware Access Library
HCAL	Hadronic CALorimeter
HF	Forward Hadronic calorimeter
HLT	High Level Trigger
HTML	HyperText Markup Language
HTTP	HyperText Transfer Protocol
HEP	High Energy Physics
HW	HardWare
I2O	Intelligent Input/Output
JSP	Java Server Pages
LEP	Large Electron Positron collider
LHC	Large Hadron Collider
LHCb	Large Hadron Collider beauty experiment
LMS	Luminosity Monitoring System
LMSS	Luminosity Monitoring Software System
LUT	Look Up Table
L1	Level-1
L1A	Level-1 Accept signal
ORCA	Object Oriented Reconstruction for CMS Analysis
OSWI	Online SoftWare Infrastructure
PCI	Peripheral Component Interconnect bus standard
PSB	Pipeline Synchronizing Buffer
PSI	PVSS SOAP Interface
PVSS	ProzessVisualisierungs- und SteuerungsSystem
RC	Run Control
RCM	Response Control Module
RCMS	Run Control and Monitoring System
RCT	Regional Calorimeter Trigger
RF2TTC	TTC machine interface
RPC	Resistive Plate Chamber and Remote Procedure Call
RU	Readout Unit
SRM	Shared Resources Manager
SW	SoftWare
SCADA	Supervisory Controls And Data Acquisition
SDRAM	Synchronous Dynamic Random Access Memory
SEC	Service Entry Cell
SLHC	Super LHC
SLOC	Source Lines Of Code
SOAP	Simple Object Access Protocol
SRM	Shared Resource Module
SSCS	Sub-detectors Supervisory and Control Systems
sTTS	Synchronous Trigger Throttle System
TCS	Trigger Control System
TFC	Track Finder Cell
TIM	TIMing module
TOTEM	TOTal cross section, Elastic scattering and diffraction dissociation at the LHC
TPG	Trigger Primitive Generator (HF, HCAL, ECAL, RPC, CSC and DT)
TriDAS	Trigger and Data Acquisition System
TS	Trigger Supervisor
TSCS	Trigger Supervisor Control System
TSMS	Trigger Supervisor Monitoring System
TSLS	Trigger Supervisor Logging System
TSM	Task Scheduler Module
TSSS	Trigger Supervisor Start-up System
TTC	Timing, Trigger and Control System
TTCci	CMS version of the TTC VME interface module
TTCrx	A Timing, Trigger and Control Receiver ASIC for LHC Detectors
TTS	Trigger Throttle System
UA1	Underground Area 1 experiment
UDP	User Datagram Protocol
UML	Unified Modeling Language
URL	Uniform Resource Locator
VME	Versa Module Europa bus standard
WSDL	Web Service Description Language
W3C	World Wide Web Consortium
XDAQ	Cross-platform DAQ framework
XML	EXtensible Markup Language
XPath	XML Path language
XSD	XML Schema Document
XSEQ	Cross-platform SEQuencer
Chapter 1<br />
Introduction<br />
1.1 CERN and the Large Hadron Collider<br />
At CERN, the European laboratory for particle physics, the fundamental structure of matter is studied using particle accelerators. The acronym CERN comes from the earlier French title “Conseil Européen pour la Recherche Nucléaire”. CERN is located on the Franco-Swiss border west of Geneva. It was founded in 1954 and is currently funded by 20 European member states. CERN employs just under 3000 people, of whom only a fraction are particle physicists. This reflects the role of CERN: it does not so much perform particle physics itself as offer its research facilities to particle physicists in Europe and, increasingly, the whole world. About half of the world’s particle physicists, some 6500 researchers from over 500 universities and institutes in some 80 countries, use CERN’s facilities.
The latest of these facilities, designed and currently being built at CERN, is the Large Hadron Collider (LHC) [1]. It is housed in a tunnel of 26.7 km circumference located underground at a depth ranging from 50 to 150 meters (Figure 1-1). The tunnel was formerly used for the Large Electron Positron (LEP) collider. The LHC consists of a superconducting magnet system with two beam channels designed to bring two proton beams into collision at a centre-of-mass energy of 14 TeV. It will also be able to provide collisions of heavy nuclei (Pb-Pb) at a centre-of-mass energy of 2.76 TeV per nucleon.
When the two counter-rotating proton bunches cross, protons within the bunches can collide, producing new particles in inelastic interactions. Such inelastic interactions are also referred to as “events”. The probability for such inelastic collisions to take place is determined by the cross section for proton-proton interactions and by the density and frequency of the proton bunches. The related quantity, which is a characteristic of the collider, is called the luminosity. The design luminosity of the LHC is 10^34 cm^−2 s^−1. The proton-proton inelastic cross section σ_inel depends on the proton energy. At the LHC centre-of-mass energy of 14 TeV, σ_inel is expected to be 70 mb (70·10^−27 cm^2). Therefore, the number of inelastic interactions per second (event rate) is the product of the cross section and the luminosity: N_inel = σ_inel·L = 7·10^8 s^−1. As the bunch crossing rate is 40 MHz, and bearing in mind that during normal operation of the LHC not all bunches are filled (only 2808 out of 3564), the average number of events per filled bunch crossing can be calculated as 7·10^8 · 25·10^−9 · 3564/2808 ≈ 22.
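The rate arithmetic above can be verified in a few lines (a sketch using the nominal LHC design values quoted in the text):

```python
# Nominal LHC design parameters quoted above.
L = 1e34             # design luminosity [cm^-2 s^-1]
sigma_inel = 70e-27  # inelastic p-p cross section [cm^2] (70 mb)
bx_period = 25e-9    # bunch-crossing period [s] (40 MHz)
n_bunches = 3564     # bunch slots per orbit
n_filled = 2808      # filled bunch slots per orbit

# Inelastic event rate: N_inel = sigma_inel * L (~7e8 events/s).
n_inel = sigma_inel * L

# Average events per *filled* bunch crossing; the 3564/2808 factor
# restricts the average to the filled bunch slots.
pileup = n_inel * bx_period * n_bunches / n_filled

print(f"{n_inel:.1e} events/s, ~{pileup:.0f} events per filled crossing")
```

This reproduces the quoted figures of about 7·10^8 inelastic interactions per second and about 22 events per filled bunch crossing.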
The main LHC functional parameters that are most important from the experimental point of view are reported in Table 1-1. At the energy scale and raw data rate aimed at by the LHC, the design of the detectors faces a number of new implementation challenges. LHC detectors must be capable of isolating and reconstructing the interesting events, as only a few events can be recorded out of the 40 million bunch crossings each second. Another technical challenge is the extremely hostile radiation environment.
Figure 1-1: Schematic illustration of the LHC ring with the four experimental points.<br />
Design luminosity (L)	10^34 cm^−2 s^−1
Bunch crossing (BX) rate	40 MHz
Number of bunches per orbit	3564
Number of filled bunches per orbit	2808
Average number of events per bunch crossing	22
Table 1-1: Main LHC functional parameters that are most important from the experimental point of view.<br />
<strong>The</strong>re are four collision points spread over the LHC ring which house the main LHC experiments. <strong>The</strong> two<br />
largest, Compact Muon Solenoid (<strong>CMS</strong>, [2]) and A Toroidal LHC ApparatuS (ATLAS, [3]) are general purpose<br />
experiments that take different approaches, in particular to the detection of muons.<br />
CMS is built around a very-high-field solenoid magnet with a massive iron return yoke; its relative compactness derives from the fact that muons are detected by their bending over a relatively short distance in the very high magnetic field. The ATLAS experiment is substantially bigger and relies essentially on an air-cored toroidal magnet system for the measurement of muons.
Two more special-purpose experiments have been approved to start operation at the switch-on of the LHC machine: A Large Ion Collider Experiment (ALICE, [4]) and the Large Hadron Collider beauty experiment (LHCb, [5]). ALICE is a dedicated heavy-ion detector that will exploit the unique physics potential of nucleus-nucleus interactions at LHC energies, while the LHCb detector is dedicated to the study of CP violation and other rare phenomena in the decays of beauty particles.
1.2 <strong>The</strong> Compact Muon Solenoid detector<br />
<strong>The</strong> <strong>CMS</strong> detector is a general-purpose quasi-hermetic detector. This kind of particle detector is designed to<br />
observe all possible decay products of an interaction between subatomic particles in a collider by covering as<br />
large an area around the interaction point as possible and incorporating multiple types of sub-detectors. <strong>CMS</strong> is<br />
called “hermetic” because it is designed to let as few particles as possible escape.<br />
There are three main components of a particle physics collider detector. From the inside out, the first is a tracker, which measures the momenta of charged particles as they curve in a magnetic field. Next come calorimeters, which measure the energy of most charged and neutral particles by absorbing them in dense material, and finally a muon system, which detects the particles that are not stopped in the calorimeters.
<strong>The</strong> concept of the <strong>CMS</strong> detector was based on the requirements of having a very good muon system whilst<br />
keeping the detector dimensions compact. In this case, only a strong magnetic field would guarantee good<br />
momentum resolution for high momentum muons. Studies showed that the required magnetic field could be<br />
generated by a superconducting solenoid. It is also a particularity of <strong>CMS</strong> that the solenoid surrounds the<br />
calorimeter detectors.<br />
Figure 1-2 shows a schematic drawing of the <strong>CMS</strong> detector and its components that will be described in detail in<br />
the subsequent sections. Figure 1-3 shows a transverse slice of the detector. Trajectories of different kinds of<br />
particles and the traces they leave in the different components of the detector are also shown.<br />
The coordinate system adopted by CMS has its origin at the nominal collision point inside the experiment, the y-axis pointing vertically upward, and the x-axis pointing radially inward toward the centre of the LHC. Thus, the z-axis points along the beam direction toward the Jura mountains from LHC Point 5. The azimuthal angle (φ) is measured from the x-axis in the x-y plane, and the polar angle (θ) from the z-axis. Pseudorapidity is defined as η = −ln tan(θ/2). The momentum and energy measured transverse to the beam direction, denoted by p_T and E_T respectively, are computed from the x and y components.
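As an illustration of these definitions (a small sketch, not CMS software), the transverse momentum, angles and pseudorapidity can be computed directly from the Cartesian momentum components:

```python
import math

def kinematics(px, py, pz):
    """Return (p_T, phi, theta, eta) for a momentum vector in the CMS frame."""
    pt = math.hypot(px, py)               # transverse momentum from the x, y components
    phi = math.atan2(py, px)              # azimuthal angle, measured from the x-axis
    theta = math.atan2(pt, pz)            # polar angle, measured from the z-axis
    eta = -math.log(math.tan(theta / 2))  # pseudorapidity: eta = -ln tan(theta/2)
    return pt, phi, theta, eta

# A particle moving along +x (theta = 90 degrees) has eta = 0.
pt, phi, theta, eta = kinematics(10.0, 0.0, 0.0)
print(pt, phi, eta)
```

A particle emitted perpendicular to the beam axis thus has η = 0, while large |η| corresponds to directions close to the beam line.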
Figure 1-2: Drawing of the complete <strong>CMS</strong> detector, showing both the scale and complexity.
Figure 1-3: Slice through <strong>CMS</strong> showing particles incident on the different sub-detectors.<br />
Tracker<br />
<strong>The</strong> tracking system [6] records the helix traced by a charged particle that curves in a magnetic field by<br />
localizing it in space in finely-segmented layers of detecting material composed of silicon. <strong>The</strong> degree to which<br />
the particle curves is inversely proportional to its momentum perpendicular to the beam, while the degree to<br />
which it drifts in the direction of the beam axis gives its momentum in that direction.<br />
Calorimeters<br />
The calorimeter system is installed inside the coil. It slows particles down and absorbs their energy, allowing that energy to be measured. It is divided into two parts: the Electromagnetic Calorimeter (ECAL, [7]), made of lead tungstate (PbWO4) crystals, absorbs particles that interact electromagnetically by producing electron/positron pairs and bremsstrahlung 1 ; and the Hadronic Calorimeter (HCAL, [8]), made of interleaved copper absorber and plastic scintillator plates, detects hadrons, which interact via the strong nuclear force.
Muon system<br />
Of all the known stable particles, only muons and neutrinos pass through the calorimeter without losing most or<br />
all of their energy. Neutrinos are undetectable, and their existence must be inferred, but muons (which are<br />
charged) can be measured by an additional tracking system outside the calorimeters.<br />
A redundant and precise muon system was one of the first requirements of CMS [9]. The ability to trigger on and reconstruct muons, which are an unmistakable signature of a large number of the new physics processes CMS is designed to explore, is central to the concept. The muon system consists of three technologically different components: Resistive Plate Chambers (RPC), Drift Tubes (DT) and Cathode Strip Chambers (CSC).
1 Bremsstrahlung is electromagnetic radiation produced by the deceleration of a charged particle, such as an electron, when<br />
deflected by another charged particle, such as an atomic nucleus.
<strong>The</strong> muon system of <strong>CMS</strong> is embedded in the iron return yoke of the magnet. It makes use of the bending of<br />
muons in the magnetic field for transverse momentum measurements of muon tracks identified in association<br />
with the tracker. <strong>The</strong> large thickness of absorber material in the return yoke helps to filter out hadrons, so that<br />
muons are practically the only particles apart from neutrinos able to escape from the calorimeter system. <strong>The</strong><br />
muon system consists of 4 stations of muon chambers in the barrel region (Figure 1-3 shows how the 4 stations<br />
correspond to 4 layers of muon chambers) and disks in the forward region.<br />
1.3 <strong>The</strong> <strong>Trigger</strong> and DAQ system<br />
1.3.1 Overview<br />
The CMS Trigger and Data Acquisition (DAQ) system is designed to collect and analyze the detector information at the LHC bunch crossing frequency of 40 MHz. The rate of events to be recorded for offline processing and analysis is of the order of 100 Hz. At the design luminosity of 10^34 cm^−2 s^−1 there will be on average around 22 proton collisions per bunch crossing, producing approximately 1 MB of zero-suppressed data 2 in the CMS readout system. The Level-1 (L1) trigger is designed to reduce the incoming data rate to a maximum of 100 kHz by processing fast trigger information coming from the calorimeters and the muon chambers, and selecting events with interesting signatures. The DAQ system must therefore sustain a maximum input rate of 100 kHz, corresponding to an average data flow of 100 GB/s from about 650 data sources, and must provide enough computing power for the software-based High Level Trigger (HLT) to reduce the rate of stored events by a factor of 1000.
In <strong>CMS</strong> all events that pass the Level-1 trigger are sent to a computer farm (Event Filter) that performs physics<br />
selections, using the offline reconstruction software, to filter events and achieve the required output rate. <strong>The</strong><br />
design of the <strong>CMS</strong> Data Acquisition system and of the High Level trigger is described in detail in the Technical<br />
Design Report [10]. <strong>The</strong> architecture of the <strong>CMS</strong> <strong>Trigger</strong> and DAQ system is shown schematically in Figure 1-4.<br />
Figure 1-4: Overview of the <strong>CMS</strong> <strong>Trigger</strong> and DAQ system architecture.<br />
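The data-flow figures quoted above are mutually consistent, as a quick check shows (a sketch using the nominal design numbers, not part of the DAQ software):

```python
# Nominal CMS Trigger/DAQ design figures from the text.
l1_accept_rate = 100e3   # maximum L1 accept rate [Hz]
event_size = 1e6         # zero-suppressed event size [bytes] (~1 MB)
hlt_reduction = 1000     # rate reduction factor of the High Level Trigger

daq_throughput = l1_accept_rate * event_size   # bytes/s into the DAQ builder
storage_rate = l1_accept_rate / hlt_reduction  # events/s written for offline analysis

print(f"{daq_throughput / 1e9:.0f} GB/s, {storage_rate:.0f} Hz to storage")
```

This recovers the 100 GB/s event-building throughput and the order-100 Hz storage rate stated in the text.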
1.3.2 <strong>The</strong> Level-1 trigger decision loop<br />
The L1 trigger [11] is custom pipelined hardware logic designed to analyze the bunch crossing data every 25 ns without deadtime, using special coarsely segmented trigger data from the muon systems and the calorimeters. The L1 trigger reduces the rate of accepted crossings to below 100 kHz.
The L1 trigger has local, regional and global components. At the bottom end, the Local Triggers, also called Trigger Primitive Generators (TPG), are based on energy deposits in calorimeter trigger towers 3 and on track segments or hit patterns in muon chambers, respectively. Regional Triggers combine their information and use pattern logic to determine ranked and sorted trigger objects such as electron or muon candidates in limited spatial regions. The rank is determined as a function of energy or momentum and quality, which reflects the level of confidence attributed to the L1 trigger parameter measurements, based on detailed knowledge of the detectors and trigger electronics and on the amount of information available. The Global Calorimeter and Global Muon Triggers determine the highest-rank calorimeter and muon objects across the entire experiment and transfer them to the Global Trigger, the top entity of the L1 trigger hierarchy.

2 Zero suppression consists of eliminating leading zeros. This encoding is performed by the on-detector readout electronics to reduce the data volume.
3 Each trigger tower identifies a detector region with an approximate (η,φ)-coverage of 0.087 × 0.087 rad.

Figure 1-5: The Level-1 trigger decision loop.
While the L1 trigger is taking its decision, the full high-precision data of all detector channels are stored in analog or digital buffers, which are only read out if the event is accepted. The L1 decision loop takes 3.2 μs, or 128 bunch crossings, which corresponds to the depth of the front-end buffers. The Level-1 Accept (L1A) decision is communicated to the sub-detectors through the Timing, Trigger and Control (TTC) system. Figure 1-5 shows a diagram of the L1 decision loop.
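The latency figure follows directly from the pipeline depth, as a one-line check confirms (a sketch; the numbers are the design values quoted above):

```python
bx_period_ns = 25   # bunch-crossing period [ns] (40 MHz)
buffer_depth = 128  # front-end pipeline buffer depth [bunch crossings]

# Maximum L1 decision latency: buffer depth times the crossing period.
latency_us = buffer_depth * bx_period_ns / 1000
print(latency_us)  # 3.2
```

Any decision taking longer than this would overwrite detector data still waiting in the front-end pipelines, which is why the 3.2 μs budget is a hard constraint on the L1 hardware.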
1.3.2.1 Calorimeter <strong>Trigger</strong><br />
The first step of the Calorimeter Trigger pipeline is formed by the TPGs. For triggering purposes the calorimeters are subdivided into trigger towers. The TPGs sum the transverse energies measured in ECAL crystals or HCAL readout towers to obtain the trigger tower E_T and attach the correct bunch crossing number. The TPG electronics is integrated with the calorimeter readout. The trigger primitives are transmitted through high-speed serial links to the Regional Calorimeter Trigger (RCT, [12]), which determines candidates for electrons or photons, jets and isolated hadrons, and calculates energy sums in calorimeter regions of 4 × 4 trigger towers. These objects are forwarded to the Global Calorimeter Trigger (GCT, [13]), where the best four objects of each category are selected and sent to the Global Trigger.
1.3.2.2 Muon <strong>Trigger</strong><br />
All three components of the muon systems (DT, CSC and RPC) take part in the trigger. <strong>The</strong> barrel DT chambers<br />
provide local trigger information in the form of track segments in the φ-projection and hit patterns in the η-<br />
projection. <strong>The</strong> endcap CSCs deliver 3-dimensional track segments. All chamber types also identify the bunch<br />
crossing of the corresponding event. <strong>The</strong> Regional Muon <strong>Trigger</strong> joins segments to complete tracks and assigns<br />
physical parameters. It consists of the DT Sector Collector (DTSC, [14]), DT Track Finders (DTTF, [15]) and<br />
CSC Track Finders (CSCTF, [16]). In addition, the RPC trigger chambers, which have excellent timing<br />
resolution, deliver their own track candidates based on regional hit patterns. <strong>The</strong> Global Muon <strong>Trigger</strong> (GMT,<br />
[17]) then combines the information from the three sub-detectors, achieving an improved momentum resolution<br />
and efficiency compared to the stand-alone systems.<br />
1.3.2.3 Global <strong>Trigger</strong><br />
<strong>The</strong> Global <strong>Trigger</strong> (GT, [18]) takes the decision to accept an event for further evaluation by the HLT based on<br />
trigger objects delivered by the GCT and GMT. <strong>The</strong> GT has five basic stages: input, logic, decision, distribution<br />
and readout. Three Pipeline Synchronizing Buffer (PSB) input boards receive the calorimeter trigger objects<br />
from the GCT and align them in time. <strong>The</strong> muons are received from the GMT through the backplane. An<br />
additional PSB board can receive direct trigger signals from sub-detectors or the TOTEM experiment [19] for<br />
special purposes such as calibration. <strong>The</strong>se signals are called “technical triggers”. <strong>The</strong> core of the GT is the<br />
Global <strong>Trigger</strong> Logic (GTL) board, in which algorithm calculations are performed. <strong>The</strong> most basic algorithms<br />
consist of applying p_T or E_T thresholds to single objects, or of requiring the jet multiplicities to exceed defined<br />
values. Since location and quality information is available, more complex algorithms based on topological<br />
conditions can also be programmed into the logic. <strong>The</strong> number of algorithms that can be executed in parallel is<br />
128, and up to 64 technical trigger bits may in addition be received directly from a dedicated PSB board. <strong>The</strong> set<br />
of algorithm calculations performed in parallel is called “trigger menu”.<br />
<strong>The</strong> results of the algorithm calculations are sent to the Final Decision Logic (FDL) board in the form of one bit<br />
per algorithm. Up to eight final ORs can be applied and correspondingly eight L1A signals can be issued. For<br />
normal physics data taking a single trigger mask is applied, and the L1A decision is taken accordingly. <strong>The</strong> rest<br />
of L1As are used for commissioning, calibration and tests of individual sub-systems 4 .<br />
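As an illustration of the final-OR stage, the following Python sketch derives one L1A bit per TCS/DAQ partition from the 128 algorithm bits and the 64 technical trigger bits. The function and the mask representation are invented for this example; the real logic is implemented in the FDL firmware.

```python
def final_or(algo_bits, tech_bits, masks):
    """Compute one L1A bit per TCS/DAQ partition.

    algo_bits: 128-bit int, one bit per algorithm result
    tech_bits: 64-bit int, one bit per technical trigger
    masks:     list of up to 8 (algo_mask, tech_mask) pairs,
               one per partition (illustrative representation)
    """
    l1a = []
    for algo_mask, tech_mask in masks:
        fired = (algo_bits & algo_mask) | (tech_bits & tech_mask)
        l1a.append(1 if fired else 0)
    return l1a

# Partition 0 watches algorithm bit 3 only; partition 1 watches technical bit 0.
masks = [(1 << 3, 0), (0, 1 << 0)]
print(final_or(algo_bits=1 << 3, tech_bits=0, masks=masks))  # [1, 0]
```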
<strong>The</strong> distribution of the L1A decision to the sub-systems is performed by two L1A OUT output boards, provided<br />
that it is authorized by the <strong>Trigger</strong> Control System described in Section 1.3.2.4. A TIMing module (TIM) is also<br />
necessary to receive the LHC machine clock and to distribute it to the boards.<br />
Finally, the Global <strong>Trigger</strong> Front-end (GTFE) board sends the GT data records to the DAQ Event Manager<br />
(EVM, Section 1.4.3), located in the surface control room. <strong>The</strong>se records consist of the GPS event time received<br />
from the machine, the total L1A count, the bunch crossing number in the range from 1 to 3564, the orbit number,<br />
the event number for each TCS/DAQ partition, all FDL algorithm bits and other information.<br />
1.3.2.4 Timing, <strong>Trigger</strong> and Control System<br />
<strong>The</strong> Timing, <strong>Trigger</strong> and Control (TTC) system provides for the distribution of L1A and fast control signals (e.g.<br />
synchronization and reset commands, and test and calibration triggers) to the detector front-ends depending on<br />
the status of the sub-detector readout systems and the data acquisition. <strong>The</strong> status is derived from signals<br />
provided by the <strong>Trigger</strong> Throttle System (TTS). <strong>The</strong> TTC system consists of the <strong>Trigger</strong> Control System (TCS,<br />
[20]) module and the Timing, <strong>Trigger</strong> and Control distribution network [21].<br />
<strong>The</strong> TCS allows different sub-systems to be operated independently if required. For this purpose the experiment<br />
is subdivided into 32 partitions. A partition represents a major component of a sub-system. Each partition must<br />
be assigned to a partition group, also called a TCS partition. Within such a TCS partition all connected partitions<br />
operate concurrently. For commissioning and testing up to eight TCS partitions are available, which each receive<br />
their own L1A signals distributed in different time slots allocated by a priority scheme or in round robin mode.<br />
During normal physics data taking there is only one single TCS partition.<br />
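The allocation of L1As among TCS partitions in round-robin mode can be pictured as follows. This is a simplified sketch: the actual TCS allocates time slots in hardware and also supports a priority scheme.

```python
from itertools import cycle

def allocate_slots(active_partitions, n_slots):
    """Assign consecutive L1A time slots to the active TCS partitions
    in round-robin order (illustrative only)."""
    rr = cycle(active_partitions)
    return [next(rr) for _ in range(n_slots)]

# Three of the up-to-eight TCS partitions are active during commissioning.
print(allocate_slots(["DT", "CSC", "RPC"], 7))
# ['DT', 'CSC', 'RPC', 'DT', 'CSC', 'RPC', 'DT']
```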
4 <strong>The</strong> sub-system concept includes the sub-detectors and the Level-1 trigger sub-systems.
Sub-systems may either be operated centrally as members of a partition or privately through a Local <strong>Trigger</strong><br />
Controller (LTC). Switching between central and local mode is performed by the TTCci (TTC <strong>CMS</strong> interface)<br />
module, which provides the interface between the respective trigger control module and the destinations for the<br />
transmission of the L1A signal and other fast commands for synchronization and control. At the destinations the<br />
TTC signals are received by TTC receivers (TTCrx).<br />
<strong>The</strong> TCS, which resides in the Global <strong>Trigger</strong> crate, is connected to the LHC machine through the TIM module,<br />
to the FDL through the GT backplane, and to 32 TTCci modules through the L1A OUT boards. <strong>The</strong> TTS, to<br />
which it is also connected, has a synchronous (sTTS) and an asynchronous branch (aTTS). <strong>The</strong> sTTS collects<br />
status information from the front-end electronics of 24 sub-detector partitions and up to eight tracker and preshower<br />
front-end buffer emulators 5 . <strong>The</strong> status signals, coded in four bits, denote the conditions “disconnected”,<br />
“overflow warning”, “synchronization loss”, “busy”, “ready” and “error”. <strong>The</strong> signals are generated by the Fast<br />
Merging Modules (FMM) through logical operations on up to 32 groups of four sTTS binary signals and are<br />
received by four conversion boards located in a 6U crate next to the GT central crate. <strong>The</strong> aTTS runs under<br />
control of the DAQ software and monitors the behavior of the readout and trigger electronics. It receives and<br />
sends status information concerning the 8 DAQ partitions, which match the TCS partitions. It is coded in a<br />
similar way as for the sTTS.<br />
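The merging performed by the FMMs can be thought of as a worst-case reduction over the partition states. In the sketch below the severity ordering is an assumption made for illustration; the real FMM logic operates on the four-bit hardware codes.

```python
# Assumed severity order, most severe first (illustrative only; the
# actual FMM priorities are defined by the hardware design).
SEVERITY = ["disconnected", "error", "synchronization loss",
            "busy", "overflow warning", "ready"]

def merge_states(states):
    """Merge the sTTS states of several inputs into one summary state
    by taking the most severe one (lowest index in SEVERITY)."""
    return min(states, key=SEVERITY.index)

print(merge_states(["ready", "busy", "ready"]))  # busy
```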
Depending on the meaning of the status signals different protocols are executed. For example, if excessive<br />
trigger rates cause warnings about resource usage, pre-scale factors may be applied in the FDL to the algorithms<br />
causing them. A loss of synchronization would initiate a reset procedure. General trigger rules for minimal<br />
spacing of L1As are also implemented in the TCS. <strong>The</strong> total deadtime at the maximum L1 trigger output rate of<br />
100 kHz is estimated to be below 1%. Deadtime and monitoring counters are provided by the TCS.<br />
1.4 <strong>The</strong> <strong>CMS</strong> Experiment Control System<br />
<strong>The</strong> <strong>CMS</strong> Experiment Control System (ECS) is a complex distributed software system that manages the<br />
configuration, monitoring and operation of all equipment involved in the different activities of the experiment:<br />
<strong>Trigger</strong> and DAQ system, detector operations and the interaction with the outside world. This software system<br />
consists of the Run Control and Monitor System (R<strong>CMS</strong>), the Detector Control System (DCS), a distributed<br />
processing environment (XDAQ) and the sub-system Online SoftWare Infrastructure (OSWI). <strong>The</strong>se<br />
components are described in the following sections.<br />
1.4.1 Run Control and Monitoring System<br />
<strong>The</strong> Run Control and Monitoring System (R<strong>CMS</strong>) ([10], pp.191-208; [22]) is one of the principal components of<br />
the ECS and the one that provides the interface to control the overall experiment in data taking operations. This<br />
software system configures and controls the online software of the DAQ components and the sub-detector<br />
control systems.<br />
<strong>The</strong> R<strong>CMS</strong> system has a hierarchical structure with eleven main branches, one per sub-system, e.g. HCAL, the<br />
central DAQ or the L1 trigger. <strong>The</strong> basic element in the control tree is the Function Manager (FM). It consists of<br />
a finite state machine and a set of services. <strong>The</strong> state machine model has been standardized for the first level of<br />
FMs in the control tree. <strong>The</strong>se nodes are the interface to the sub-detector control software (Section 1.4.4).<br />
<strong>The</strong> R<strong>CMS</strong> system is implemented in the R<strong>CMS</strong> framework, which provides a uniform API to common tasks<br />
like storage and retrieval from the process configuration database, state-machine models for process control, and<br />
access to the monitoring system. <strong>The</strong> framework provides also a set of services which are accessible to the FM’s.<br />
<strong>The</strong> services comprise a security service for authentication and user account management, a resource service for<br />
storing and delivering configuration information of online processes, access to remote processes via resource<br />
proxies, error handlers, a log message application to collect, store and distribute messages, and the “job control”<br />
to start, stop and monitor processes in a distributed environment.<br />
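A Function Manager can be sketched as a finite state machine plus services. The state and transition names below are illustrative, not the standardized CMS state model.

```python
# Illustrative transition table: (current state, input) -> next state.
TRANSITIONS = {
    ("Initial", "configure"): "Configured",
    ("Configured", "start"):  "Running",
    ("Running", "stop"):      "Configured",
    ("Configured", "halt"):   "Initial",
}

class FunctionManager:
    def __init__(self):
        self.state = "Initial"

    def fire(self, transition):
        """Apply a transition, rejecting inputs not allowed in the
        current state."""
        key = (self.state, transition)
        if key not in TRANSITIONS:
            raise RuntimeError(f"{transition!r} not allowed in {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state

fm = FunctionManager()
fm.fire("configure")
fm.fire("start")
print(fm.state)  # Running
```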
5 Buffer emulator: Hardware system responsible for emulating the status of the front-end buffers and vetoing trigger<br />
decisions based on this status.
<strong>The</strong> R<strong>CMS</strong> services are implemented in the programming language Java as web applications. <strong>The</strong> controller<br />
Graphical User Interface (GUI) is based on Java Server Pages technology (JSP, [23]). <strong>The</strong> eXtensible Markup<br />
Language (XML, [24]) data format and the Simple Object Access Protocol (SOAP, [25]) are used for<br />
inter-process communication. Finally, the job control is implemented in C++ using the XDAQ framework<br />
(Section 1.4.3).<br />
1.4.2 Detector Control System<br />
<strong>The</strong> Detector Control System (DCS) ([10], pp. 209-222) is responsible for operating the auxiliary detector<br />
infrastructures: high and low voltage controls, cooling facilities, supervision of all gas and fluids sub-systems,<br />
control of all racks and crates, and the calibration systems. <strong>The</strong> DCS also plays a major role in the protection of<br />
the experiment from any adverse event. <strong>The</strong> DCS runs as a slave of the R<strong>CMS</strong> system during the data-taking<br />
process. Many of the functions provided by the DCS are needed at all times; as a result, the DCS must also function<br />
as the master outside data-taking periods.<br />
<strong>The</strong> DCS is organized in a hierarchy of nodes. <strong>The</strong> topmost point of the hierarchy offers global commands like<br />
“start” and “stop” for the entire detector. <strong>The</strong> commands are propagated towards the lower levels of the<br />
hierarchy, where the different levels interpret the commands received and translate them into the corresponding<br />
commands specific to the system they represent. As an example, a global “start” command is translated into a<br />
“HV ramp-up” command for a sub-detector. Correspondingly, a summary of the lower level states defines the<br />
state of the upper levels. As an example, the state “HV on” of a sub-detector is summarized as “running” in the<br />
global state. <strong>The</strong> propagation of commands ends at the lowest level at the “devices” which are representations of<br />
the actual hardware.<br />
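The command translation and state summarization described above can be sketched as a small tree of nodes. The tree, the translation rule and the summarization rule are invented for this example; PVSS-based DCS hierarchies are far richer.

```python
class DCSNode:
    def __init__(self, name, children=(), translate=None):
        self.name = name
        self.children = list(children)
        self.translate = translate or {}
        self.state = "off"

    def command(self, cmd):
        # Translate the incoming command, e.g. "start" -> "HV ramp-up".
        local = self.translate.get(cmd, cmd)
        if not self.children:                 # a "device" at the lowest level
            self.state = "HV on" if local == "HV ramp-up" else local
            return
        for child in self.children:
            child.command(local)
        # Summarize the children: "running" only if all report a good state.
        good = all(c.state in ("HV on", "running") for c in self.children)
        self.state = "running" if good else "error"

tracker = DCSNode("tracker HV", translate={"start": "HV ramp-up"},
                  children=[DCSNode("channel 1"), DCSNode("channel 2")])
top = DCSNode("CMS", children=[tracker])
top.command("start")
print(top.state)  # running
```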
A commercial <strong>Supervisor</strong>y Control And Data Acquisition (SCADA) system, PVSS II [26], was chosen by all<br />
LHC experiments as the supervisory system of their corresponding DCS systems. PVSS II is a development<br />
environment for a SCADA system which offers many of the basic functionalities needed to fulfill the tasks<br />
mentioned above.<br />
1.4.3 Cross-platform DAQ framework<br />
<strong>The</strong> XDAQ framework ([10], pp. 173-190; [27]) is a domain-specific middleware 6 designed for high energy<br />
physics data acquisition systems [28]. <strong>The</strong> framework includes a collection of generic components to be used in<br />
various application scenarios and specific environments with a limited customization effort. One of them is the<br />
event builder [29] that consists of three collaborating components, a Readout Unit (RU), a Builder Unit (BU) and<br />
an EVent Manager (EVM). <strong>The</strong> logical components and interconnects of the event builder are shown<br />
schematically in Figure 1-6.<br />
An event enters the system as a set of fragments distributed over the Front-end Devices (FED’s). It is the task of<br />
the EVB to collect the fragments of an event, assemble them and send the full event to a single processing unit.<br />
To this end, a builder network connects ~500 Readout Units (RUs) to ~500 Builder Units (BUs). <strong>The</strong> event<br />
data is read out by sub-detector specific hardware devices and forwarded to the Readout Units. <strong>The</strong> RUs<br />
temporarily store the event fragments until the reception of a control message to forward a specific event fragment<br />
to a Builder Unit. A Builder Unit collects the event fragments belonging to a single collision event from all RUs<br />
and combines them to a complete event. <strong>The</strong> BU exposes an interface to event data processors, called the filter<br />
units (FU). This interface can be used to make event data persistent or to apply event-filtering algorithms. <strong>The</strong><br />
EVM interfaces to the L1 trigger readout electronics and controls the event building process by mediating<br />
control messages between RU’s and BU’s.<br />
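The assembly step performed by a Builder Unit can be sketched as follows. This is a toy model: the fragment routing decided by the EVM, error handling and flow control are all omitted.

```python
from collections import defaultdict

class BuilderUnit:
    """Toy Builder Unit: collects the fragments of each event and
    assembles them into a full event once all FEDs have reported."""
    def __init__(self):
        self.pending = defaultdict(dict)   # event_id -> {fed_id: data}
        self.full_events = {}

    def receive(self, event_id, fed_id, data, n_feds):
        self.pending[event_id][fed_id] = data
        if len(self.pending[event_id]) == n_feds:
            frags = self.pending.pop(event_id)
            # Concatenate fragments in FED order to build the full event.
            self.full_events[event_id] = b"".join(frags[f] for f in sorted(frags))

bu = BuilderUnit()
for fed, frag in [(0, b"aa"), (1, b"bb"), (2, b"cc")]:
    bu.receive(event_id=42, fed_id=fed, data=frag, n_feds=3)
print(bu.full_events[42])  # b'aabbcc'
```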
All DAQ components, i.e. the Event Managers (8), Readout Units (~500), Builder Units (~4000) and Filter<br />
Units (~4000), are supervised by the R<strong>CMS</strong> system.<br />
6 Middleware is a software framework intended to facilitate the connection of other software components or applications.<br />
It consists of a set of services that allow multiple processes running on one or more machines to interact across a network.
[Figure 1-6 appears here. Its labels read: Readout Units buffer event fragments; event data fragments are stored<br />
in separate physical memory systems; the Event Manager interfaces between RUs, BUs and the <strong>Trigger</strong>; Builder<br />
Units assemble event fragments; full event data are stored in a single physical memory system associated to a<br />
processing unit; events are processed and stored persistently by the collection of Filter Units.]<br />
Figure 1-6: Logical components and interconnects of the event builder.<br />
1.4.4 Sub-system Online Software Infrastructure<br />
In addition to the sub-system DCS sub-tree and the Readout Units tailored to fit the specific front-end<br />
requirements, the sub-system Online SoftWare Infrastructure (OSWI) consists of Linux device drivers, C++<br />
APIs to control the hardware at a functional level, scripts to automate testing and configuration sequences,<br />
standalone graphical setups and web-based interfaces to remotely operate the sub-system hardware.<br />
Graphical setups were developed using a broad spectrum of technologies: the Java programming language [30],<br />
the C++ language with the Qt library [31], or the Python scripting language [32]. Web-based applications were<br />
developed either with the Java programming language and the Tomcat server [33], or with the C++ language and<br />
the XDAQ middleware.<br />
Most of the sub-detectors implemented their supervisory and control systems with C++ and the XDAQ<br />
middleware. <strong>The</strong>se distributed systems are mainly intended to download and upload parameters in the front-end<br />
electronics. <strong>The</strong> sub-detector control systems also expose a SOAP API in order to integrate with the R<strong>CMS</strong>.<br />
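The SOAP integration amounts to exchanging XML envelopes over HTTP. A minimal envelope carrying a hypothetical Configure command could be built as below; the command element and its parameter are invented for illustration, not the actual RCMS/XDAQ message schema.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def make_soap_command(command, parameters):
    """Build a minimal SOAP envelope carrying a control command
    (illustrative message layout)."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    cmd = ET.SubElement(body, command)
    for name, value in parameters.items():
        ET.SubElement(cmd, name).text = str(value)
    return ET.tostring(env, encoding="unicode")

msg = make_soap_command("Configure", {"runNumber": 1234})
print(msg)
```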
1.4.5 Architecture<br />
Figure 1-7 shows the architecture of the <strong>CMS</strong> Experiment Control System which integrates the online software<br />
systems presented in Sections 1.4.2, 1.4.3, and 1.4.4.<br />
Up to eight instances of the R<strong>CMS</strong>, or R<strong>CMS</strong> sessions, can exist concurrently. Each of them operates a subset of<br />
the <strong>CMS</strong> sub-detectors. An R<strong>CMS</strong> session consists of a central Function Manager (FM) that coordinates the<br />
operation of the sub-system FMs involved in the session. An R<strong>CMS</strong> session normally involves a number of<br />
sub-detectors, DAQ components and the L1 trigger.<br />
<strong>The</strong> sub-detector FM operates the sub-detector supervisory and control systems, which in turn configure the<br />
sub-detector front-end electronics. <strong>The</strong> DAQ FM configures and controls the DAQ software and hardware<br />
components in order to set up a distributed system able to read out the event fragments from the sub-detectors,<br />
and to build, filter and record the most promising events.<br />
[Figure 1-7 appears here. Its labels show up to eight Run Control sessions, each with sub-detector, DAQ and<br />
<strong>Trigger</strong> FMs; the DCS Panel, DCS <strong>Supervisor</strong> and DCS servers; per sub-detector a DCS and an XDAQ front-end<br />
crate; the XDAQ RUs, BUs, FUs and EVMs; and the OSWI operating the GT, GMT, RCT, GCT and CSCTF<br />
trigger crates.]<br />
Finally, the L1 trigger FM drives the configuration of the L1 decision loop. <strong>The</strong> L1 trigger generates L1As that<br />
are distributed to the 32 sub-detector partitions according to the configuration of the TTC system. Up to eight<br />
exclusive subsets of the sub-detector partitions or DAQ partitions can be handled independently by the TTC<br />
system. Each R<strong>CMS</strong> session controls the configuration of one DAQ partition. <strong>The</strong>refore, the L1 decision loop is<br />
a shared infrastructure among the different sessions. A software facility to control it must be able to serve up to<br />
8 R<strong>CMS</strong> sessions concurrently while avoiding inconsistent configuration operations among sessions. <strong>The</strong> design<br />
of the L1 decision loop hardware management system is the main subject of this PhD thesis.<br />
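The consistency requirement can be illustrated with a reservation scheme in which a session must own the shared resource before configuring it. This is a deliberately simplified sketch; the mechanism actually adopted for the Trigger Supervisor is developed in the following chapters.

```python
import threading

class SharedTriggerResource:
    """Toy consistency guard: a session must hold the reservation
    before it may configure the shared L1 decision loop."""
    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None
        self.config = {}

    def reserve(self, session):
        with self._lock:
            if self.owner not in (None, session):
                raise RuntimeError(f"resource held by {self.owner}")
            self.owner = session

    def configure(self, session, key, value):
        with self._lock:
            if self.owner != session:
                raise RuntimeError("reserve the resource first")
            self.config[key] = value

    def release(self, session):
        with self._lock:
            if self.owner == session:
                self.owner = None

loop = SharedTriggerResource()
loop.reserve("session-1")
loop.configure("session-1", "trigger_menu", "physics_v1")
loop.release("session-1")
print(loop.config)  # {'trigger_menu': 'physics_v1'}
```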
Figure 1-7: Architecture of the <strong>CMS</strong> Experiment Control System.<br />
1.5 Research program<br />
1.5.1 Motivation<br />
<strong>The</strong> design and development of a software system to operate DAQ hardware devices includes the definition of<br />
sequences containing read, write, test and exception handling operations for initialization and parameterization<br />
purposes. <strong>The</strong>se sequences, for instance, are responsible for downloading firmware code and for setting tunable<br />
parameters like threshold values or parameters to compensate for the accrued radiation damage. Mechanisms to<br />
execute tests on hardware devices and for detecting and diagnosing faults are also needed.<br />
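A typical element of such sequences is a write followed by a read-back verification, with an exception raised for diagnosis when the two disagree. The Board class and the register name below are hypothetical.

```python
class Board:
    """Stand-in for a hardware board with addressable registers."""
    def __init__(self):
        self.registers = {}

    def write(self, reg, value):
        self.registers[reg] = value

    def read(self, reg):
        return self.registers.get(reg)

def set_and_verify(board, reg, value):
    """Write a parameter, read it back and raise a diagnostic
    error if the verification fails."""
    board.write(reg, value)
    readback = board.read(reg)
    if readback != value:
        raise IOError(f"{reg}: wrote {value}, read back {readback}")
    return readback

b = Board()
set_and_verify(b, "muon_pt_threshold", 14)
print(b.read("muon_pt_threshold"))  # 14
```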
However, choosing a programming language, reading the hardware application notes and defining configuration,<br />
testing and monitoring sequences is not enough to deal with the complexity of the latest generation of HEP<br />
experiments. <strong>The</strong> unprecedented number of hardware items, the long periods of preparation and operation, and<br />
last but not least the human context, are three complexity dimensions that need to be added to the conceptual<br />
design process.<br />
Number<br />
Fabjan and Fischer [34] have observed that the availability of the ever increasing sophistication, reliability and<br />
convenience in data handling instrumentation has led inexorably to detector systems of increased complexity.<br />
<strong>CMS</strong> and ATLAS are the greatest exponents of this rising complexity. <strong>The</strong> progression in channel numbers,<br />
event rates, bunch crossing rates, event sizes, and data rates in three well known big experiments which belong to
the decades 1980s (UA1), 1990s (H1) and 2000s (<strong>CMS</strong>) is shown in Table 1-2. <strong>The</strong> huge number of channels,<br />
the highly configurable data handling instrumentation (DHI) based on FPGAs, and the distributed nature of this<br />
hardware system were unprecedented requirements to cope with during the conceptual design.<br />
Experiment                      UA1       H1         <strong>CMS</strong><br />
Tracking [channels]             10^4      10^4       10^8<br />
Calorimeter [channels]          10^4      5·10^4     6·10^5<br />
Muons [channels]                10^4      2·10^5     10^6<br />
Bunch crossing interval [ns]    3400      96         25<br />
Raw data rate [bit·s^-1]        10^9      3·10^11    4·10^15<br />
Tape write rate [Hz]            10        10         100<br />
Mean event size [byte]          100k      125k       1M<br />
Table 1-2: Data acquisition parameters for UA1 (1982), H1 (1992) and <strong>CMS</strong> [35].<br />
Time<br />
<strong>The</strong> preparation and operation of HEP experiments typically spans over a period of many years (e.g. 1992, <strong>CMS</strong><br />
Letter of intent [36]). During this time the hardware and software environments evolve. Throughout all phases,<br />
integrators have to deal with system modifications [28]. In such a heterogeneous and evolving environment, a<br />
considerable development effort is required to design and implement new interfaces, synchronize and integrate<br />
them with all other sub-systems, and support the configuration and control of all parts.<br />
<strong>The</strong> long operational phases also influence the discussion about the convenience of using commercial<br />
components rather than in-house solutions. <strong>The</strong>re is simply not enough manpower to build all components in-house.<br />
However, the use of commercial components has a number of risks: First, a selected component may turn<br />
out to have insufficient performance or scalability, or simply have too many bugs to be usable. Significant<br />
manpower is therefore spent on selecting components, and on validating selected components. Another<br />
significant risk with commercial components is that the running time of the <strong>CMS</strong> experiment, at least 15 years<br />
starting from 2008, is much larger than the lifetime of most commercial software products [37].<br />
Human<br />
Despite the necessary and highly hierarchical structure of a collaboration of more than 2000 people, different<br />
sub-systems might implement solutions based on heterogeneous platforms and interfaces. <strong>The</strong>refore, the design of a<br />
hardware management system should maximize the range of technologies that can be integrated. A second aspect<br />
of the human context that should guide the system design is that only some of the software project members are<br />
computing professionals: most are trained as physicists, and they often work only part-time on software.<br />
1.5.2 Goals<br />
This research work, carried out in the context of the <strong>Trigger</strong> and Data Acquisition (TriDAS) project of the <strong>CMS</strong><br />
experiment at the Large Hadron Collider, proposes web-based technological solutions to simplify the<br />
implementation and operation of software control systems to manage hardware devices for high energy physics<br />
experiments. <strong>The</strong> main subject of this work is the design and development of the <strong>Trigger</strong> <strong>Supervisor</strong>, a hardware<br />
management system that enables the integration and operation of the Level-1 trigger decision loop of the <strong>CMS</strong><br />
experiment. An initial investigation about the usage of the eXtensible Markup Language (XML) as a uniform data<br />
representation format for a software environment to implement hardware management systems for HEP<br />
experiments was also performed.
Chapter 2<br />
Uniform Management of Data Acquisition<br />
Devices with XML<br />
2.1 Introduction<br />
In this chapter, a novel software environment model, based on web technologies, is presented. This research was<br />
carried out in the context of the <strong>CMS</strong> TriDAS project in order to better understand the difficulties of building a<br />
hardware management system for the L1 decision loop. This research was motivated by the unprecedented<br />
complexity in the construction of hardware management systems for HEP experiments.<br />
<strong>The</strong> proposed model is based on the idea that a uniform approach to manage the diverse interfaces and operations<br />
of the data acquisition devices would simplify the development of a configuration and control system and should<br />
save development time. A uniform scheme would be advantageous for large installations, like those found in<br />
HEP experiments [2][3][4][5][38] due to the diversity of front-end electronic modules, in terms of configuration,<br />
functionality and multiplicity (e.g. Section 1.3).<br />
2.2 Key requirements<br />
This chapter proposes to work toward an environment to define hardware devices and their behavior at a logical<br />
level. <strong>The</strong> approach should facilitate the integration of various different hardware sub-systems. <strong>The</strong> design<br />
should at least fulfill the following key requirements.<br />
• Standardization: <strong>The</strong> running time of the <strong>CMS</strong> experiment is expected to be at least 15 years which is a<br />
much larger period than the lifetime of most commercial software products. To cope with this, the<br />
environment should maximize the usage of standard technologies. For instance, we believe that standard<br />
C++ with its standard libraries and XML-based technologies will still be used 10 years from now.<br />
• Extensibility: A mechanism to define new commands and data for a given interface must exist, without the<br />
need to change either control or controlled systems that are not concerned by the modification.<br />
• Platform independence: <strong>The</strong> specification of commands and configuration parameters must not impose a<br />
specific format of a particular operating system or hardware platform.<br />
• Communication technology independence: Hardware devices are hosted by different sub-systems that<br />
expose different capabilities and types of communication abilities. Choosing the technology that is most<br />
suitable for a certain platform must not require an overall system modification.<br />
• Performance: <strong>The</strong> additional benefits of any new infrastructure should not imply a loss of execution<br />
performance compared to similar solutions which are established in the HEP community.
2.3 A uniform approach for hardware configuration control and<br />
testing<br />
Taking into account the above requirements, we present a model for the configuration, control and testing<br />
interface of data acquisition hardware devices [39]. <strong>The</strong> model, shown in Figure 2-1, builds upon two principles:<br />
1) <strong>The</strong> use of the eXtensible Markup Language (XML [24]) as a uniform syntax for describing hardware devices,<br />
configuration data, test results and control sequences.<br />
2) An interpreted, run-time extensible, high-level control language for these sequences that provides<br />
independence from specific hosts and interconnect systems to which devices are attached.<br />
This model, as compared to other approaches [40], enforces the uniform use of XML syntax to describe<br />
configuration data, device specifications, and control sequences for configuration and control of hardware<br />
devices. This means that control sequences can be treated as data, making it easy to write scripts that manipulate<br />
other scripts and embed them into other XML documents. In addition, the unified model makes it possible to use<br />
the same concepts, tools, and persistency mechanisms, which simplifies the software configuration management<br />
of large projects 7 .<br />
Figure 2-1: Abstract description of the model.<br />
2.3.1 XML as a uniform syntax<br />
When designing systems composed of heterogeneous platforms and/or evolving systems, platform independence<br />
is provided by a uniform syntax, using a single data representation to describe hardware devices, configuration<br />
data, test results, and control sequences. A solution based on the XML syntax presents the following advantages.<br />
• XML is a W3C (World Wide Web Consortium) non-proprietary, platform independent standard that plays<br />
an increasingly important role in the exchange of data. A large set of compliant technologies, like XML<br />
schema [42], DOM [43] and XPath [44] are defined. In addition, tools that support programming become<br />
available through projects like Apache [45].<br />
• XML structures can be formally specified and extended, following a modularized approach, using an XML<br />
schema definition.<br />
7 Software Configuration Management is the set of activities designed to control change by identifying the work products<br />
that are likely to change, establishing relationships among them, defining mechanisms for managing different versions of<br />
these work products, controlling the changes imposed, and auditing and reporting on the changes made [41].
• XML documents can be directly transmitted using any kind of protocol, including HTTP [46]. In this case,<br />
SOAP [25], an XML-based protocol, can be used.<br />
• XML documents can be automatically converted into documentation artifacts by means of an XSLT<br />
transformation [47]. <strong>The</strong>refore, system documentation can be automatically and consistently maintained.<br />
• XML is widely used for nonevent information in HEP experiments: “XML is cropping up all over in online<br />
configuration and monitoring applications” [48].<br />
On the other hand, XML has one big drawback: by default it uses a textual data representation, which causes much<br />
more network traffic when transferring data. Even Base64- or uuencoded byte arrays are approximately 1.5 times<br />
larger than a binary format. Furthermore, additional processing time is required for translating between XML<br />
and native data representations. <strong>The</strong>refore, the current approach is not well suited for devices generating<br />
abundant amounts of real-time data, but is still valid for configuration, monitoring and slow control purposes.<br />
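The size penalty is easy to quantify: Base64 maps every 3 input bytes to 4 output characters, i.e. a factor of 4/3 before XML element framing and line wrapping are added on top, which is where overall figures around 1.5 come from.

```python
import base64

payload = bytes(range(256)) * 4          # 1024 bytes of binary data
encoded = base64.b64encode(payload)

ratio = len(encoded) / len(payload)
print(ratio)  # about 1.34 before any XML framing

# Wrapping the encoded data in an XML element adds the tag overhead on top.
xml = b"<data>" + encoded + b"</data>"
assert len(xml) > len(encoded) > len(payload)
```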
Figure 2-2: Example program in XSEQ exemplifying the basic features of the language.<br />
2.3.2 XML based control language<br />
A control language (XSEQ: cross-platform sequencer) that processes XML documents to operate hardware<br />
devices has been syntactically and semantically specified. <strong>The</strong> language is XML based and has the following<br />
characteristics:<br />
• Extensibility: <strong>The</strong> syntax has been formally specified using XML schema. A schema document contains the<br />
core syntax of the language, describing the basic structures and constraints on XSEQ programs (e.g. variable<br />
declarations and control flow). <strong>The</strong> basic language can be extended in order to cope with user specific<br />
requirements. Those extensions are also XML schema documents, whose elements are instances of abstract<br />
elements of the core XML schema. This mechanism is one of the most important features of the language<br />
because it facilitates a modular integration of different user requirements and eases resource sharing (code<br />
and data). <strong>The</strong> usage and advantages of this feature will be discussed in Section 2.4.1.
Uniform Management of Data Acquisition Devices with XML 16<br />
• Imperative and object-oriented programming styles: The language provides standard imperative constructs,<br />
like most other programming languages, to carry out conditionals, sequencing and iteration. It is<br />
also possible to use the main object-oriented programming concepts like encapsulation, inheritance,<br />
abstraction and polymorphism.<br />
• Exception handling with error recovery mechanisms.<br />
• Local execution of remote sequences with parameter passing by reference.<br />
• Non-typed scoped variables.<br />
Additional functionalities have been added to the core syntax in the form of modular XML schema extensions, in<br />
order to fit frequently encountered use cases in data acquisition environments:<br />
• Transparent access to PCI and VME devices: This extension facilitates the configuration and control of<br />
hardware devices, following a common interface for both bus systems. This interface is designed to facilitate<br />
its extension in order to cope with future technologies.<br />
• File system access.<br />
• SOAP messaging: This allows inclusion of control sequences and configuration data into XML messages.<br />
<strong>The</strong> messages can be directly transported between remote hosts in a distributed programming environment.<br />
• DOM and XPath interface to facilitate integration in an environment where software and hardware device<br />
configuration are fully XML driven.<br />
• System command execution interface with redirected standard error and standard output to internal string<br />
objects.<br />
In Figure 2-2 an XSEQ program is shown where basic features of the language are exemplified. In Figure 2-3 an<br />
example is given of how the hardware access is performed following the proposed model. Device specifications,<br />
configuration data and control sequences are XML documents. In this example, configuration data are retrieved<br />
through an XPath query from a configuration database.<br />
Figure 2-3: Example of a program in XSEQ, which shows how the model is applied. Device specifications<br />
(register_table.xml), configuration data (retrieved from a configuration database accessible through an XPath<br />
query) and control sequences are all based on the uniform use of XML.
2.4 Interpreter design<br />
To enable code sharing among different platforms, we have chosen a purely interpreted approach that allows<br />
control sequences to run independently of the underlying platform in a single compile/execution cycle. In<br />
addition, the interpreted approach is characterized by small program sizes and an execution environment that<br />
provides controlled and predictable resource consumption, making it easily embeddable in other software<br />
systems.<br />
An interpreter [49] for XSEQ programs has been implemented in C++ under Linux. <strong>The</strong> pattern of the interpreter<br />
is based on the following concepts:<br />
• <strong>The</strong> source format is a DOM document already validated against the XSEQ XML schema document and the<br />
required extensions. This simplifies interpreter implementation and separates the processing into two<br />
independent phases: 1) syntactic validation and 2) execution.<br />
• Every XML command has a C++ class representation that inherits from a single class named XseqObject.<br />
• A global context accessible to all instruction objects. It contains: 1) the execution stack, which stores non-static<br />
variables; 2) the static stack, which stores static variables and is useful to retain information from<br />
previous executions; 3) the code cache, which maintains already validated DOM trees in order to accelerate<br />
the interpretation process; 4) the dynamic factory, which facilitates run-time extension of the interpreter; and 5)<br />
debug information to properly trace the execution and find errors.<br />
2.4.1 Polymorphic structure<br />
Every class inherits from the single abstract class XseqObject, and each has information about how to perform its<br />
task. For example, the XSEQ conditional command is represented by the XseqIf class. This class inherits from the<br />
XseqObject class, and the execution algorithm is implemented in the overridden eval() method.<br />
Figure 2-4: Example of a program in XSEQ, which exemplifies the use of the tag. It dynamically extends<br />
the interpreter (semantics) in order to execute new commands (syntax) defined in an XSD<br />
document.
C++ classes that implement the functionality of each syntactic language extension are grouped and compiled as<br />
shared libraries. Such libraries can be dynamically linked to the running interpreter. They are associated with a<br />
concrete syntactic language extension by means of the special XSEQ command . This facility allows keeping<br />
syntax language extensions, defined in XML schema modules, separate from the run-time interpreter extensions.<br />
It enables two different sub-systems with similar requirements but different platforms<br />
to share code by simply assigning different interpreter extensions to the same language extension. Figure<br />
2-4 exemplifies the use of the tag.<br />
2.5 Use in a distributed environment<br />
The interpreter is also available as an XDAQ pluggable module (Section 1.4.3). XDAQ includes an executive<br />
component that provides applications with the necessary functions for communication, configuration, control and<br />
monitoring. All configuration, control and monitoring commands can be performed through the SOAP/HTTP<br />
protocol.<br />
In Figure 2-5 the use of the interpreter in an XDAQ framework is shown. This is the basic building block that<br />
facilitates the deployment of the model in a distributed environment.<br />
Figure 2-5: Use of the interpreter in an XDAQ framework.<br />
To operate this application, the user must provide in XML format the configuration of the physical and logical<br />
properties of the system and its components. <strong>The</strong> configuration process defines the available web services as<br />
XSEQ scripts.<br />
Once the running application is properly configured, the client can send commands through SOAP messages.<br />
Depending on the received command, the corresponding XSEQ script is executed. The SOAP message itself can<br />
be processed using the language extension to manipulate SOAP messages. Such functionality is useful when<br />
parameters must be remotely passed. Finally, every XSEQ program ends by returning a SOAP message that will<br />
be forwarded by the executive to the client.<br />
2.6 Hardware management system prototype<br />
<strong>The</strong> architecture of a hypothetical hardware management system for the <strong>CMS</strong> experiment is shown in Figure 2-6.<br />
A number of application scenarios were integrated [50]. Hardware modules belonging to the Global <strong>Trigger</strong> [18],<br />
the Silicon-Tracker sub-detector [6] and the Data Acquisition system [10] participated in this demonstrator.<br />
The basic building block presented in Section 2.5 was implemented for every different platform that played the<br />
role of hardware module host. The same infrastructure was used to develop a central node, which was in charge<br />
of buffering all calls from clients, coordinating the operation of all sub-system control nodes, and forwarding the<br />
responses from the different sub-system control nodes back to the client.<br />
Hardware modules were quite heterogeneous in terms of configuration, functionality and multiplicity. In<br />
addition, the control software sub-system for every sub-detector was independent from the others. <strong>The</strong>refore, a<br />
diverse set of control software sub-systems existed. This offered a heterogeneous set of interfaces that had to be<br />
understood by a common configuration and control system.<br />
Control sequences executed by the sub-system control nodes depended on a set of language extensions. <strong>The</strong><br />
language was augmented, following a modular approach, by means of the XML schema technology (Section<br />
2.4.1). For a given language extension the interpreter was associated with a platform specific support. Some<br />
language extensions were shared by several sub-systems. For instance, platform 2 and platform 3 were operating<br />
the GT crate through different PCI to VME interfaces. <strong>The</strong> tag was used for binding a common GT<br />
language extension to a specific interpreter extension that knew how to use the concrete PCI to VME interface.<br />
<strong>The</strong> tag was also used to share code between platform 3 and platform 4 in order to test PCI and VME<br />
memory boards. <strong>The</strong> default language extension to execute system commands was used to operate the Fast<br />
Merging Module board (FMM, [51]) and to forward the standard output and the standard error to XSEQ string<br />
objects. Finally, a driver to read and write registers from and to a flash memory embedded into a PCI board was<br />
implemented following the chip application notes.<br />
The homogeneous use of XML syntax to describe data, control sequences, and language extensions allowed a<br />
distributed storage of any of these documents, which could be simply accessed through their URLs. Interpreter<br />
run-time extensions could also be remotely linked and, therefore, a local binary copy was not necessary. Another<br />
advantage of this approach was that both hardware and software configuration schemes were unified, since the<br />
online software of the data acquisition system was also fully XML driven.<br />
<strong>The</strong> default SOAP extension of the control language made it possible to manipulate, send, and receive SOAP<br />
messages.<br />
Figure 2-6: Hardware management system based on the XSEQ software environment.
2.7 Performance comparison<br />
Timing measurements have been performed on a desktop PC (Intel D845WN chipset) with a Pentium IV<br />
processor (1.8 GHz), 256 MB SDRAM memory (133 MHz), and running Linux Red Hat 7.2, with kernel version<br />
2.4.9–31.1.<br />
<strong>The</strong> main objective of this section is to present a comparison of the existing interpreter implementation with a<br />
Tcl interpreter [52], focusing on the overhead induced by the interpreter approach when accessing hardware<br />
devices. Tcl has been chosen as a reference because it is a well-established scripting language in the HEP<br />
community, and it shares many features with XSEQ: it is simple, easily extensible and embeddable.<br />
For both interpreters the same hardware access library (HAL [53]) has been used to implement the necessary<br />
extensions. This library has also been used to implement a C++ binary version of the test program for reference<br />
purposes.<br />
<strong>The</strong> test is a loop that reads consecutive memory positions of a memory module. In order to properly identify the<br />
interpreter overhead and to decouple it from the driver overhead, the real hardware access has been disabled and<br />
a dummy driver emulates all accesses. <strong>The</strong> results are shown in Table 2-1.<br />
XSEQ: 16.9 μs | Tcl: 16 μs | C++: 2.63 μs<br />
Table 2-1: Comparison of average execution times (memory read) for XSEQ, Tcl and C++.<br />
The results indicate that the overhead induced by the interpreted approach lies in the same order of<br />
magnitude as that of the Tcl interpreter. Execution times of XSEQ can be further reduced with customized language<br />
extensions that encapsulate a specific macro behavior. For instance, a loop command with a fixed number of<br />
iterations has been implemented. This command reduces the timing of the test program to 5.3 μs. However,<br />
flexibility is reduced, because the macro command cannot be modified at run time.<br />
2.8 Prototype status<br />
In this chapter a uniform model based on XML technologies for the configuration, control and testing of data<br />
acquisition hardware was presented. It matches well the extensibility and flexibility requirements of a long<br />
lifetime experiment that is characterized by an ever-changing environment.<br />
<strong>The</strong> following chapters present the design and development details of the Level-1 trigger hardware management<br />
system or <strong>Trigger</strong> <strong>Supervisor</strong>. Theoretically, this would have been an ideal opportunity to apply XSEQ. However, the<br />
prototype status of the software, the limited resources and the reduced development time were decisive reasons to<br />
exclude this technological option from the initial survey.<br />
Therefore, the XSEQ project did not reach its final goal, which is the same as that of any other software<br />
project: to be used. On the other hand, this effort, carried out in the context of the <strong>CMS</strong> <strong>Trigger</strong> and Data<br />
Acquisition project, improved the overall team knowledge of XML technologies, created a pool of ideas and<br />
helped to anticipate the difficulties of building a hardware management system for the Level-1 trigger.
Chapter 3<br />
<strong>Trigger</strong> <strong>Supervisor</strong> Concept<br />
3.1 Introduction<br />
<strong>The</strong> <strong>Trigger</strong> <strong>Supervisor</strong> (TS) is an online software system. Its purpose is to set up, test, operate and monitor the<br />
L1 decision loop (Section 1.3.2) components on one hand, and to manage their interplay and the information<br />
exchange with the Run Control and Monitoring System (R<strong>CMS</strong>, Section 1.4.5) on the other. It is conceived to<br />
provide a simple and homogeneous client interface to the online software infrastructure of the trigger sub-systems.<br />
Facing a large number of trigger sub-systems and a potentially highly heterogeneous environment<br />
resulting from different sub-system Application Program Interfaces (API), it is crucial to simplify the task of<br />
implementing and maintaining a client that allows operating several trigger sub-systems either simultaneously or<br />
in standalone mode.<br />
An intermediate node, lying between the client and the trigger sub-systems, which offers a simplified API to<br />
perform control, monitoring and testing operations, will ease the design of this client. This layer provides a<br />
uniform interface to perform hardware configurations, monitor the hardware behavior or to perform tests in<br />
which several trigger sub-systems participate. In addition, this layer coordinates the access of different users to<br />
the common L1 trigger resources.<br />
<strong>The</strong> operation of the L1 decision loop will necessarily be within the broader context of the experiment operation.<br />
In this context, the R<strong>CMS</strong> will be in charge of offering a control window from which an operator can run the<br />
experiment, and in particular the L1 trigger system. On the other hand, it is also necessary to be able to operate<br />
the L1 trigger system independently of the other experiment sub-systems. This independence of the TS will be<br />
mainly required during the commissioning and maintenance phases. Once the TS is accessed through R<strong>CMS</strong>, a<br />
scientist working on a data taking run will be presented with a graphical user interface offering choices to<br />
configure, test, run and monitor the L1 trigger system. Configuring includes setting up the programmable logic<br />
and physics parameters such as energy or momentum thresholds in the L1 trigger hardware. Predefined and<br />
validated configuration files are stored in a database and are proposed as defaults. Tests of the L1 trigger system<br />
after configuration are optional. Once the TS has determined that the system is configured and operational, a run<br />
may be started through R<strong>CMS</strong> and the option to monitor can be selected. For commissioning periods more<br />
options are available in the TS, namely the setting up of different TCS partitions and separate operation of sub-systems.<br />
The complexity of the TS is a representative example of the discussion presented in Section 1.5.1: 64 crates,<br />
O(10³) boards with an average of 15 MB of downloadable firmware and O(10²) configurable registers per board,<br />
8 independent DAQ partitions, and O(10³) links that must be periodically tested in order to assure correct<br />
connection and synchronization are figures of merit of the numeric complexity dimension; the human dimension<br />
of the project complexity is represented by a European, Asian and American collaboration of 27 research<br />
institutes in experimental physics. <strong>The</strong> long development and operational periods of this project are also<br />
challenging due to the fast pace of the technology evolution. For instance, although the TS project just started in<br />
August 2004, we have already observed how one of the trigger sub-systems has been fully replaced (Global
Calorimeter <strong>Trigger</strong>, [13]) and recently a number of proposals to upgrade the trigger sub-systems for the Super<br />
LHC (SLHC, [54]) have been accepted [55].<br />
This chapter presents the conceptual design of the <strong>CMS</strong> <strong>Trigger</strong> <strong>Supervisor</strong> (TS, [56]). This design was approved<br />
by the <strong>CMS</strong> collaboration in March 2005 as the baseline design for the L1 decision loop hardware management<br />
system. <strong>The</strong> conceptual design is not the final design but the seed of a successful project that lasted four years<br />
from conception to completion and involved people from all <strong>CMS</strong> sub-systems. Because the conceptual design<br />
takes into account the challenging context of the last generation of HEP experiments, in addition to the<br />
functional and non-functional requirements, the description model and concrete solution can serve as an example for<br />
future experiments of how to deal with the initial steps of designing a hardware management system.<br />
3.2 Requirements<br />
3.2.1 Functional requirements<br />
<strong>The</strong> TS is conceived to be a central access point that offers a high level API to facilitate setting a concrete<br />
configuration of the L1 decision loop, to launch tests that involve several sub-systems or to monitor a number of<br />
parameters in order to check the correct functionality of the L1 trigger system. In addition, the TS should provide<br />
access to the online software infrastructure of each trigger sub-system.<br />
1) Configuration: <strong>The</strong> most important functionality offered by the TS is the configuration of the L1 trigger<br />
system. It has to facilitate setting up the content of the configurable items: FPGA firmware, LUTs,<br />
memories and registers. This functionality should hide from the controller the complexity of operating the<br />
different trigger sub-systems in order to set up a given configuration.<br />
2) High Level <strong>Trigger</strong> (HLT) Synchronization: In order to properly configure the HLT, it is necessary to<br />
provide a mechanism that propagates the L1 trigger configuration to the HLT, so as to assure a consistent<br />
overall trigger configuration.<br />
3) Test: <strong>The</strong> TS should offer an interface to test the L1 trigger system. Two different test services should be<br />
provided: the self test, intended to check each trigger sub-system individually, and the interconnection test<br />
service, intended to check the connection among sub-systems. Interconnection and self test services involve<br />
not only the trigger sub-systems but also the sub-detectors themselves (Section 3.3.3.3).<br />
4) Monitoring: <strong>The</strong> TS interface must enable the monitoring of the necessary information that assures the<br />
correct functionality of the trigger sub-systems (e.g., measurements of L1 trigger rates and efficiencies,<br />
simulations of the L1 trigger hardware running in the HLT farm), sub-system specific monitoring data (e.g.,<br />
data read through spy memories), and information for synchronization purposes.<br />
5) User management: During the experiment commissioning the different sub-detectors are tested<br />
independently, and many of them might be tested in parallel. In other words, several run control sessions,<br />
running concurrently, need to access the L1 trigger system (Section 1.4.5). <strong>The</strong>refore, it is necessary that the<br />
TS coordinates the access to the common resources (e.g., the L1 trigger sub-systems). In addition, it is<br />
necessary to control the access to the L1 trigger system hierarchically in order to determine which<br />
users/entities (controllers) can have access to it and what privileges they have. A complete access control<br />
protocol has to be defined that should include identification, authentication, and authorization processes.<br />
Identification includes the processes and procedures employed to establish a unique user/entity identity<br />
within a system. Authentication is the process of verifying the identification of a user/entity. This is<br />
necessary to protect against unauthorized access to a system or to the information it contains. Typically,<br />
authentication takes place using a password. Authorization is the process of deciding if a requesting<br />
user/entity is allowed to have access to a system service. A hierarchical list of users with the corresponding<br />
level of access rights as well as the necessary information to authenticate them should be maintained in the<br />
configuration database. The lowest-level user should only be allowed to monitor. A medium-level user, such<br />
as a scientist responsible for the data taking during a running period of the experiment, may manage<br />
partition setups, select predefined L1 trigger menus and change thresholds, which are written directly into<br />
registers on the electronics boards. In addition to all the previously cited privileges the highest-level user or<br />
super user should be allowed to reprogram logic and change internal settings of the boards. In addition to<br />
coordinating the access of different users to common resources, the TS must also ensure that operations<br />
launched by different users are compatible.<br />
6) Hierarchical start-up mechanism: In order to maximize sub-system independence and client decoupling<br />
(Section 3.2.2, Point 3), a hierarchical start-up mechanism must be available (Section 3.3.3.5 describes the<br />
operational details). As will be described later, the TS should be organized in a tree-like structure, with a<br />
central node and several leaves. <strong>The</strong> first run control session or controller should be responsible for starting<br />
up the TS central node, and in turn this should offer an API that provides start-up of the TS leaves and the<br />
online software infrastructure of the corresponding trigger sub-system.<br />
7) Logging support: <strong>The</strong> TS must provide logging mechanisms in order to support the users carrying out<br />
troubleshooting activities in the event of problems. Logbook entries must be time-stamped and should<br />
include all necessary information such as the details of the action and the identity of the user responsible.<br />
The log registry should be available online and should also be recorded for offline use.<br />
8) Error handling: An error management scheme, compatible with the global error management architecture,<br />
is necessary. It must provide a standard error format, and remote error handling and notification<br />
mechanisms.<br />
9) User support: A graphical user interface (GUI) should be provided. This should allow a standalone<br />
operation of the TS. It would also help the user to interact with the TS and to visualize the state of a given<br />
operation or the monitoring information. From the main GUI it should be possible to open specific GUIs for<br />
each trigger sub-system. Those should be based on a common skeleton to be filled in by the trigger<br />
sub-system developers, following a given methodology described in a document that will be provided. An<br />
adequate online help facility should be available to help the user operate the TS, since many of the users of<br />
the TS would not be experienced and may not have received detailed training.<br />
10) Multi user: During the commissioning and maintenance phases, several run control sessions run<br />
concurrently. Each of them is responsible for operating a different TCS partition. In addition, the TS should<br />
allow standalone operations (not involving the R<strong>CMS</strong>), for instance, to execute tests or monitor the L1<br />
trigger system. Therefore, it is necessary that several clients can be served in parallel by the TS.<br />
11) Remote operation: <strong>The</strong> possibility to program and operate the L1 trigger components remotely is essential<br />
due to the distributed nature of the <strong>CMS</strong> Experiment Control System (Section 1.4.5). It is important also to<br />
consider that, unlike in the past, most scientists can in general not be present in person at the experiment<br />
location during data taking and also during commissioning, but have to operate and supervise their systems<br />
remotely.<br />
12) Interface requirements: In order to facilitate the integration, the implementation and the description of the<br />
controller-TS interface a web service based approach [57] should be followed. <strong>The</strong> chosen communication<br />
protocol to send commands and state notifications should be the same as for most <strong>CMS</strong> sub-systems, and<br />
especially the same as already chosen for run control, data acquisition and slow control. <strong>The</strong>refore Simple<br />
Object Access Protocol (SOAP) [25] and the representation format Extensible Markup Language (XML)<br />
[24] for exchanged data should be selected. <strong>The</strong> format of the transmitted data and the SOAP messages is<br />
specified using the XML schema language [42], and the Web Services Description Language (WSDL) [58]<br />
is used to specify the location of the services and the methods the service exposes. To overcome the<br />
drawback that XML uses a textual data representation, which causes much network traffic to transfer data, a<br />
binary serialization package provided within the <strong>CMS</strong> online software project and I2O messaging [59] could<br />
be used for devices generating large amounts of real-time data.<br />
Due to the long time required to finish the execution of configuration and test commands, an asynchronous<br />
protocol is necessary to interface the TS. This means that the receiver of the command replies immediately<br />
acknowledging the reception, and that this receiver sends another message to the sender once the command<br />
is executed. An asynchronous protocol improves the usability of the system because the controller is not<br />
blocked until the completion of the requested command.<br />
3.2.2 Non-functional requirements<br />
1) Low-level infrastructure independence: <strong>The</strong> design of the TS should be independent of the online<br />
software infrastructure (OSWI) of any sub-system as far as possible. In other words, the OSWI of a concrete
sub-system should not drive any important decision in the design of the TS. This requirement is intended to<br />
minimize the TS redesign due to the evolution of the OSWI of any sub-system.<br />
2) Sub-system control: <strong>The</strong> TS should offer the possibility of operating a concrete trigger sub-system.<br />
<strong>The</strong>refore, the design should be able to provide at the same time a mechanism to coordinate the operation of<br />
a number of trigger sub-systems, and a mechanism to control a single trigger sub-system.<br />
3) Controller decoupling: <strong>The</strong> TS must operate in different environments: inside the context of the common<br />
experiment operation, but also independently of the other <strong>CMS</strong> sub-systems, such as during the phases of<br />
commissioning and maintenance of the experiment, or during the trigger sub-system integration tests. Due to<br />
the diversity of operational contexts, it is useful to facilitate the access to the TS through different<br />
technologies: R<strong>CMS</strong>, Java applications, web browser or even batch scripts. In order to allow such a<br />
heterogeneity of controllers, the TS design must be totally decoupled from the controller, and the following<br />
requirements should be taken into account:<br />
a. <strong>The</strong> logic of the TS should not be split between a concrete controller and the TS itself;<br />
b. <strong>The</strong> technology choice to develop the TS should not depend on the software frameworks used<br />
to develop a concrete controller.<br />
In addition, the logic and technological decoupling from the controller increases the evolution potential and<br />
decreases the maintenance effort of the TS. It also increases development and debug options, and reduces<br />
the complexity of operating the L1 trigger system in a standalone way.<br />
4) Robustness: Due to 1) the key role of the TS in the overall <strong>CMS</strong> online software architecture, and 2) the<br />
fact that a malfunction can result in significant losses of physics data as well as economic losses, the TS<br />
should be considered a critical system [60], and design decisions therefore had to be taken accordingly.<br />
5) Reduced development time: The schedule constraints are also a non-functional requirement. The project<br />
development phase only started in May 2005, a first demonstrator of the TS system was expected to be<br />
ready four months later, and an almost final system had to be drafted for the second phase of the Magnet<br />
Test and Cosmic Challenge that took place in November 2006. The aim was that the TS would be able to<br />
follow the monthly increasing deployment of <strong>CMS</strong> experiment components during the Global Run exercises<br />
that started in May 2007.<br />
6) Flexibility: <strong>The</strong> TS has to be designed as an open system capable of adopting non-foreseen functionalities<br />
or services required to operate the L1 decision loop or just specific sub-systems. <strong>The</strong>se new capabilities<br />
must be added in a non-disruptive way, without requiring major developments.<br />
7) Human context awareness: <strong>The</strong> TS design and development has to take into account the particular human<br />
context of the L1 trigger project. <strong>The</strong> available human resources in all sub-systems were limited and their<br />
effort was split among hardware debugging, physics related tasks and software development including<br />
online, offline and hardware emulation. In this context, most collaboration members were confronted with a<br />
heterogeneous spectrum of tasks. In addition, the most common professional profiles were hardware experts<br />
and experimental physicists with no software engineering academic background. <strong>The</strong> resources assigned to<br />
the TS project were also very limited; initially and for more than one year, one single person had to cope<br />
with the design, development, documentation and communication tasks. An additional Full Time Equivalent<br />
(FTE) joined the project after this period, and a number of students have collaborated for a few<br />
months on small tasks.
Design 25<br />
Figure 3-1: Architecture of the <strong>Trigger</strong> <strong>Supervisor</strong>. [Figure: on the controller side, a run control (RC) session and optional trigger sub-system GUIs access the TS central node control cell via SOAP; the central node operates, via SOAP (HTTP, I2O, custom), one customizable control cell (TS leaf) per sub-system; every cell publishes its interface in a WSDL document, and each leaf drives the OSWI of its sub-system. Customizing the cells is TS responsibility; the OSWI is trigger sub-system responsibility.]<br />
3.3 Design<br />
<strong>The</strong> TS architecture is composed of a central node, in charge of coordinating access to the different sub-systems,<br />
namely the trigger sub-systems and the sub-detectors concerned by the interconnection test service (Section<br />
3.3.3.3), and a customizable TS leaf (Section 3.3.2) for each of them, which offers the central node a well-defined<br />
interface to operate the OSWI of each sub-system. Figure 3-1 shows the architecture of the TS.<br />
Each node of the TS can be accessed independently, fulfilling the requirement outlined in Section 3.2.2, Point 2).<br />
<strong>The</strong> available interfaces and location for each of those nodes are defined in a WSDL document. Both the central<br />
node and the TS leaves are based on a single common building block, the “control cell”. Each sub-system group<br />
will be responsible for customizing a control cell and keeping the consistency of the available interface with the<br />
interface described in the corresponding WSDL file.<br />
<strong>The</strong> presented design is not driven by the available interface of the OSWI of a concrete sub-system (Section<br />
3.2.2, Point 1) ). This improves the evolution potential of both the low-level infrastructure and the TS.<br />
Moreover, the design of the TS is logically and technologically decoupled from any controller (Section 3.2.2,<br />
Point 3) ). In addition, the distributed nature of the TS design facilitates a clear separation of responsibilities and<br />
a distributed development. <strong>The</strong> common control cell software framework could be used in a variety of different<br />
control network topologies (e.g., N-level tree or peer to peer graph).<br />
3.3.1 Initial discussion on technology<br />
<strong>The</strong> development of a distributed software system like the TS requires the usage of distributed programming<br />
facilities. An initial technological survey pointed to a possible candidate: a C++ based cross-platform data<br />
acquisition framework called XDAQ developed in-house by the <strong>CMS</strong> collaboration (Section 1.4.3). <strong>The</strong> OSWI<br />
of many sub-systems was already based on this distributed programming framework (Section 1.4.4). It was<br />
therefore an obvious option to develop the TS. <strong>The</strong> following reasons backed up this technological option:<br />
• <strong>The</strong> software frameworks used in both the TS and the sub-systems are homogeneous.<br />
• I2O messages could be used as a faster alternative to messages following the SOAP communication<br />
protocol.
<strong>Trigger</strong> <strong>Supervisor</strong> Concept 26<br />
• Monitoring and security packages are available.<br />
• XDAQ development was practically finished, and its API was already considered stable when the<br />
conceptual design was approved.<br />
3.3.2 Cell<br />
Figure 3-2: Architecture of the control cell.<br />
<strong>The</strong> architecture of the TS is characterized by its tree topology, where all tree nodes are based on a common<br />
building block, the control cell. Figure 3-2 shows the architecture of the control cell. <strong>The</strong> control cell is a<br />
program that offers the necessary functionalities to coordinate the control operations over other software<br />
systems, for instance the OSWI of a concrete trigger sub-system, an information server, or even another control<br />
cell. Each cell can work independently of the rest (fulfilling the requirement of Section 3.2.2, Point 2) ), or inside<br />
a more complex topology.<br />
<strong>The</strong> following points describe the components of the control cell.<br />
1) Control Cell Interface (CCI): This is the external interface of the control cell. Different protocols should<br />
be available. An HTTP interface could be provided using the XDAQ facilities; this should facilitate a first<br />
entry point from any web browser. A second interface based on SOAP should also be provided in order to<br />
ease the integration of the TS with the run control or any other controller that requires a web service<br />
interface. Future interface extensions are foreseen (e.g., an I2O interface should be implemented). Each<br />
control cell should have an associated WSDL document that will describe its interface. <strong>The</strong> information<br />
contained in that document instructs any user/entity how to properly operate with the control cell.<br />
2) Access Control Module (ACM): This module is responsible for identifying and authenticating every user<br />
or entity (controller) attempting to access the cell, and for providing an authorization protocol. <strong>The</strong> access control<br />
module should have access to a user list, which should provide the necessary information to identify and<br />
authenticate, and the privileges assigned to each controller. Those privileges should be used to check<br />
whether or not an authenticated controller is allowed to execute a given operation.<br />
3) Task Scheduler Module (TSM): This module is in charge of managing the command requests and<br />
forwarding the answer messages. <strong>The</strong> basic idea is that a set of available operations exists that can be<br />
accessed by a given controller. Each operation corresponds to a Finite State Machine (FSM). <strong>The</strong> default set<br />
of operations is customizable and extensible. <strong>The</strong> TSM is also responsible for preventing the launching of<br />
operations that could enter into conflict with other running operations (e.g., simultaneous self test operations
within the same trigger sub-system, interconnection test operations that cannot be parallelized). <strong>The</strong><br />
extension and/or customization of the default set of operations could change the available interface of the<br />
control cell. In that case, the corresponding WSDL should be updated.<br />
4) Shared Resources Manager (SRM): This module is in charge of coordinating access to shared resources<br />
(e.g., the configuration database, other control cells, or a trigger sub-system online software infrastructure).<br />
Independent locking services for each resource are provided.<br />
5) Error Manager (ERM): This module manages all errors generated in the context of the control cell that<br />
have not been solved locally, as well as those errors that could not be<br />
resolved in a control cell immediately controlled by this one. Both the error format and the remote error<br />
notification mechanism will be based on the global <strong>CMS</strong> distributed error handling scheme. <strong>The</strong> control<br />
over which operations can be executed is distributed among the ACM for user access level control (e.g., a<br />
user with monitoring privileges cannot launch a self test operation), the TSM for conflicting operation<br />
control (e.g., to avoid running in parallel operations that could disturb each other), and the command<br />
code of each operation (e.g., to check that a given user is allowed to set up the requested configuration).<br />
More details are given in Section 3.3.3.1.<br />
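The interplay between the TSM and the SRM can be illustrated with a minimal sketch. The real control cell is implemented in C++ on top of XDAQ; the Python below is only illustrative, and all names (classes, methods, resource identifiers) are hypothetical: each operation declares the shared resources it needs, and the TSM refuses to launch it unless the SRM can lock all of them.<br />

```python
class SharedResourceManager:
    """SRM sketch: an independent lock per named resource."""
    def __init__(self):
        self._locked = set()

    def try_lock(self, resources):
        """Atomically lock all requested resources, or none of them."""
        if self._locked & set(resources):
            return False          # at least one resource is busy
        self._locked |= set(resources)
        return True

    def release(self, resources):
        self._locked -= set(resources)


class TaskSchedulerModule:
    """TSM sketch: launches operations unless they conflict with running ones."""
    def __init__(self, srm):
        self.srm = srm
        self.running = {}

    def launch(self, op_id, resources):
        if not self.srm.try_lock(resources):
            return "conflict"     # e.g. two self tests on the same sub-system
        self.running[op_id] = resources
        return "launched"

    def finish(self, op_id):
        self.srm.release(self.running.pop(op_id))


srm = SharedResourceManager()
tsm = TaskSchedulerModule(srm)
print(tsm.launch("selftest-1", ["GT_crate"]))   # launched
print(tsm.launch("selftest-2", ["GT_crate"]))   # conflict: same hardware
tsm.finish("selftest-1")
print(tsm.launch("selftest-2", ["GT_crate"]))   # launched
```

The all-or-nothing locking mirrors the requirement that conflicting operations (e.g., simultaneous self tests within the same trigger sub-system) must not be started in parallel.<br />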
3.3.3 <strong>Trigger</strong> <strong>Supervisor</strong> services<br />
<strong>The</strong> <strong>Trigger</strong> <strong>Supervisor</strong> services are the final functionalities offered by the TS. <strong>The</strong>se services emerge from the<br />
collaboration of several nodes of the TS tree. In general, the central node is always involved in all services<br />
coordinating the operation of the necessary TS leaves. <strong>The</strong> goal of this section is to describe, for each different<br />
service, what the default operations are in both the central node of the TS and in the TS leaves, and how the<br />
services emerge from the collaboration of these distributed operations. Note that a control cell operation<br />
is always a Finite State Machine (FSM). <strong>The</strong> main reason for using FSMs to define the TS services is that FSMs<br />
are a well-known model in HEP for defining control systems. <strong>The</strong>y are therefore a suitable tool to communicate and<br />
discuss ideas with the rest of the collaboration.<br />
3.3.3.1 Configuration<br />
This service is intended to perform the hardware configuration of the L1 trigger system, which includes the<br />
setting of registers or Look-Up Tables (LUT’s) and downloading the L1 trigger logic into the programmable<br />
logic devices of the electronics boards. <strong>The</strong> configuration service requires the collaboration of the central node of<br />
the TS and all the TS leaves. Each control cell involved implements the operation represented in Figure 3-3.<br />
Figure 3-3: Configuration operation. [Figure: FSM with states Not configured, Configuring, Configured, Enabling, Enabled and Error; the transitions are ConfigurationServiceInit(), Configure(Key), Reconfigure(Key) and Enable(), with Error() leading from the transition states to the Error state.]<br />
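The configuration operation of Figure 3-3 can be sketched as a table-driven state machine. This is an illustrative Python sketch, not the actual C++ operation class of the control cell; the state and command names follow the figure, while everything else is hypothetical.<br />

```python
class ConfigurationOperation:
    """FSM sketch of the configuration operation (states as in Figure 3-3).
    Transition states such as Configuring and Enabling model the asynchronous
    interface: command code executes while the FSM sits in them."""

    TRANSITIONS = {
        # (current state, command) -> (transition state, final state)
        ("Not configured", "Configure"): ("Configuring", "Configured"),
        ("Configured", "Enable"):        ("Enabling", "Enabled"),
        ("Enabled", "Reconfigure"):      ("Configuring", "Configured"),
    }

    def __init__(self):
        self.state = "Not configured"    # state after ConfigurationServiceInit()

    def fire(self, command, ok=True):
        key = (self.state, command)
        if key not in self.TRANSITIONS:
            raise ValueError(f"{command} not allowed in state {self.state}")
        transition, final = self.TRANSITIONS[key]
        self.state = transition           # command code would run here
        self.state = final if ok else "Error"
        return self.state


op = ConfigurationOperation()
print(op.fire("Configure"))    # Configured
print(op.fire("Enable"))       # Enabled
print(op.fire("Reconfigure"))  # Configured
```

A failing command (`ok=False`) drives the FSM from the transition state into the Error state, as in the figure.<br />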
Figure 3-4: Configuration service. [Figure: a run control session (R<strong>CMS</strong> responsibility) sends Configure(TS_key) to the central node of the TS; the central node (TS responsibility) maps the TS key to sub-system keys and sends Configure(TCS_key), Configure(GM_key), Configure(GC_key), etc. to the GT/TCS, Global Muon and Global Calorimeter leaves; each leaf (sub-system responsibility) maps its key to concrete parameters such as the BC table and the throttle logic.]<br />
Due to the asynchronous interface, it is also necessary to define transition states such as Configuring and<br />
Enabling, which indicate that a transition is in progress. All commands are executed while the FSM is in a<br />
transition state. If applicable, an error state is invoked from the transition state. Figure 3-4 shows how the<br />
different nodes of the TS collaborate in order to fully configure the L1 trigger system.<br />
A key<sup>8</sup> is assigned to each node. Each key maps into a row of a database table that contains the configuration<br />
information of the system. <strong>The</strong> sequence of steps that a controller of the TS should follow in order to properly<br />
use the configuration service is as follows.<br />
• Send a ConfigurationServiceInit() command to the central node of the TS.<br />
• Once the operation reaches the Not configured state, the next step is to send a Configure(Key) command,<br />
where Key identifies a set of sub-system keys, one per trigger sub-system that is to be configured. <strong>The</strong><br />
Configure(Key) command initiates the configuration operation in the relevant TS leaves. <strong>The</strong> configure<br />
command in the configuration operation of each TS leaf will check whether or not the user is allowed to set<br />
the configuration identified by a given sub-system key. This means that each trigger sub-system has the full<br />
control over who and what can be configured. This also means that the list of users in the central node of the<br />
TS will be replicated in the TS leaves.<br />
• Once the configuration operation of the TS leaves reaches the Configured state, the configuration operation<br />
in the central node of the TS jumps to the Configured state.<br />
• Send an Enable command. This fourth step is just a switch-on operation.<br />
From the point of view of the L1 trigger system, everything is ready to run the experiment once the configuration<br />
operation reaches the Enabled state.<br />
Each trigger sub-system has the responsibility to customize the configuration operation of its own control cell<br />
and thus has to implement the commands of the FSM. <strong>The</strong> central node of the TS owns the data that relates a<br />
given L1 trigger key to the trigger sub-system keys.<br />
<strong>The</strong> presented configuration service is flexible enough to allow a full or a partial configuration of the L1 trigger<br />
system. In the second case, the Key identifies just a subset of sub-system keys, one per trigger sub-system that is<br />
to be configured, and/or each sub-system key identifies just a subset of all the parameters that can be configured<br />
for a given trigger sub-system. <strong>The</strong> configuration database consists of separate databases for each sub-system<br />
and for the central node. Each trigger sub-system is then responsible for populating the configuration database<br />
and for assigning key identifiers to sets of configuration parameters.<br />
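The key mechanism can be sketched as follows. All key names and the in-memory table are hypothetical stand-ins for the configuration database owned by the central node; the point is that an L1 trigger key resolves to one sub-system key per trigger sub-system to be configured, and a partial configuration simply resolves to a subset.<br />

```python
# Central-node table: L1 trigger key -> sub-system keys (hypothetical values).
L1_KEYS = {
    "physics_v1": {"GT/TCS": "TCS_key_7", "GMT": "GM_key_3", "GCT": "GC_key_5"},
    "gt_only":    {"GT/TCS": "TCS_key_7"},   # partial configuration
}

def configure(l1_key, leaves):
    """Forward Configure(sub-system key) to every concerned TS leaf."""
    subkeys = L1_KEYS[l1_key]
    return {name: leaves[name](key) for name, key in subkeys.items()}

# Each TS leaf maps its own key to concrete configuration actions.
leaves = {name: (lambda key, n=name: f"{n} configured with {key}")
          for name in ("GT/TCS", "GMT", "GCT")}

print(configure("physics_v1", leaves))  # all three sub-systems configured
print(configure("gt_only", leaves))     # only the GT/TCS leaf configured
```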
<sup>8</sup> Key: Name that uniquely identifies the configuration of a given system.
3.3.3.2 Reconfiguration<br />
This section complements Section 3.3.3.1. A reconfiguration of the L1 trigger system may become necessary, for<br />
example if thresholds have to be adapted due to a change in luminosity conditions. <strong>The</strong> new configuration table<br />
must be propagated to the filter farm, as it was required in Section 3.2.1, Point 2). <strong>The</strong> following steps show how<br />
a controller of the TS should behave in order to properly reconfigure the L1 trigger system using the<br />
configuration service.<br />
• Once the L1 trigger system is configured, the configuration operation in the central node of the TS will be in<br />
the Enabled state.<br />
• Send a Reconfigure(Key) command. <strong>The</strong> following steps show how this command behaves.<br />
o Stop the generation of L1A signals.<br />
o Send a Configure(Key) command as in Section 3.3.3.1, and<br />
o Jump to the Configured state.<br />
• <strong>The</strong> controller is also responsible for propagating the configuration changes to the filter farm hosts in charge<br />
of the HLT and the L1 trigger simulation through the configuration/conditions database (Section 3.2.1, Point<br />
2).<br />
• Send an Enable command: This signal will be sent by the controller to confirm the propagation of<br />
configuration changes to the filter farm hosts in charge of the HLT and the L1 trigger simulation. This<br />
command will be in charge of resuming the generation of L1A signals. Run control is in charge of<br />
coordinating the configuration of the TS and the HLT. <strong>The</strong>re is no special interface between the central node<br />
of the TS and the HLT.<br />
3.3.3.3 Testing<br />
<strong>The</strong> TS offers two different test services: the self test service and the interconnection test service. <strong>The</strong> following<br />
sections describe both.<br />
<strong>The</strong> self test service checks that each individual sub-system is able to operate as foreseen. If anything fails during<br />
the test of a given sub-system, an error report is returned, which can be used to define the necessary corrective<br />
actions. <strong>The</strong> self test service can involve one or more sub-systems. In the second, more complex case, the self<br />
test service requires the collaboration of the central node of the TS and all the corresponding TS leaves. Each<br />
control cell involved implements the same self test operation. <strong>The</strong> self test operation running in each control cell<br />
is a FSM with only two states: halted and tested. This is the sequence of steps that a controller of the TS<br />
should follow in order to properly use the self test service.<br />
• Send a SelfTestServiceInit() command. Once the self test operation is initiated, the operation reaches<br />
the halted state (initial state).<br />
• Send a RunTest(LogLevel) command, where the parameter LogLevel specifies the level of detail of the<br />
error report. An additional parameter, type, in the RunTest() command might be used to distinguish among<br />
different types of self test.<br />
<strong>The</strong> behavior of the RunTest() command depends on whether it is the self test operation of the central node of<br />
the TS, or a self test operation in a TS leaf. In the central node of the TS, the RunTest() command is used to<br />
follow the above sequence for each TS leaf, and collect all error reports coming from the TS leaves. In the case<br />
of a TS leaf, the RunTest() command will implement the test itself and will generate an error report that will be<br />
forwarded to the central node of the TS. It is important to note that the error report will be generated in a<br />
standard format specified in an XML Schema Document (XSD). This should ease the automation of test reports.<br />
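The idea of a standard, machine-readable test report can be sketched as below. The element and attribute names are hypothetical, since the real format is fixed by a <strong>CMS</strong>-wide XSD; the sketch only illustrates how the LogLevel parameter could control the level of detail in the report.<br />

```python
from xml.etree import ElementTree as ET

def make_report(subsystem, findings, log_level="ERROR"):
    """Build a self-test report; LogLevel filters the detail included."""
    root = ET.Element("selfTestReport", subsystem=subsystem)
    for severity, message in findings:
        if log_level == "ERROR" and severity != "ERROR":
            continue                     # suppress non-errors at low detail
        item = ET.SubElement(root, "item", severity=severity)
        item.text = message
    return ET.tostring(root, encoding="unicode")

findings = [("ERROR", "link 3 CRC mismatch"), ("WARNING", "temperature high")]
print(make_report("GMT", findings))                     # errors only
print(make_report("GMT", findings, log_level="DEBUG"))  # full detail
```

Because every leaf emits the same schema, the central node can aggregate the reports mechanically.<br />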
<strong>The</strong> interconnection test service is intended to check the connections among sub-systems. In each test, several<br />
trigger sub-systems and sub-detectors can participate as sender/s or receiver/s.<br />
Figure 3-5 shows a typical scenario for participants involved in an interconnection test. <strong>The</strong> example shows the<br />
interconnection test of the <strong>Trigger</strong> Primitive Generators and the Global <strong>Trigger</strong> logic.
Figure 3-5: Typical scenario of an interconnection test. [Figure: a detector front-end feeds the TPG of a trigger sub-system, which acts as sender towards the Global <strong>Trigger</strong> (the receiver) over the trigger links; the TCS issues the Start(L1A) signal, and the data are also read out by the DAQ over the optical S-Link.]<br />
<strong>The</strong> interconnection test service requires the collaboration of the central node of the TS and some of the TS<br />
leaves. Each control cell involved will implement the operation represented in Figure 3-6.<br />
Figure 3-6: Interconnection test operation. [Figure: FSM with states Not tested, Preparing, Ready for test, Testing, Tested and Error; the transitions are ConTestServiceInit(), Prepare_test(Test_id) and Start_test(), with Error() leading from the Preparing and Testing states to the Error state.]<br />
This is the sequence of steps that a controller of the TS should follow in order to properly use the interconnection<br />
test service.<br />
• Send a ConTestServiceInit() command.<br />
• Once the operation reaches the Not tested state, the next step is to send a Prepare_test(Test_id). This<br />
command, implemented in the central node of the TS, will perform the following steps:<br />
o Retrieve from the configuration database the relevant information for the central node of the TS.<br />
o Send a ConTestServiceInit() command to sender/s and receiver/s.<br />
o Send Prepare_test() command to sender/s and receiver/s.<br />
o Wait for Ready_for_test signal from all senders/receivers.<br />
• Once the operation reaches the Ready for test state, the next step is to send a Start_test() command.<br />
• Wait for results.<br />
This is the sequence of steps that the TS leaves acting as senders/receivers should follow when they receive the<br />
Prepare_test(Test_id) command from the central node of the TS.<br />
• Retrieve from the configuration database the relevant information for the leaf (e.g., which role: sender or<br />
receiver, test vectors to be sent or to be expected).<br />
• Send a Ready_for_test signal to the central node of the TS.<br />
• Wait for the Start_test() command.
• Do the test, and generate the test report to be forwarded to the central node of the TS (if the TS leaf is a<br />
receiver).<br />
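The leaf-side logic can be sketched as follows, with hypothetical test vectors standing in for the information retrieved from the configuration database via Test_id: the sender emits a known pattern over the trigger links, and the receiver compares what arrives with what it expects in order to produce the test report.<br />

```python
def run_interconnection_test(test_db, test_id, link):
    """One leaf acting as sender, one as receiver, over a stand-in 'link'."""
    sender_vectors = test_db[test_id]["send"]     # sender role: data to emit
    expected = test_db[test_id]["expect"]         # receiver role: expectation
    received = [link(v) for v in sender_vectors]  # transmission over the link
    mismatches = [(i, e, r) for i, (e, r) in enumerate(zip(expected, received))
                  if e != r]
    return {"test_id": test_id, "passed": not mismatches,
            "mismatches": mismatches}

# Hypothetical test definition: TPG-to-Global-Trigger pattern check.
test_db = {"TPG_to_GT_1": {"send": [0x1A, 0x2B, 0x3C],
                           "expect": [0x1A, 0x2B, 0x3C]}}
print(run_interconnection_test(test_db, "TPG_to_GT_1", lambda v: v))
print(run_interconnection_test(test_db, "TPG_to_GT_1",
                               lambda v: v & 0x0F))  # a broken link
```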
In contrast to the configuration service, the central node of the TS can already check whether a given user can<br />
launch interconnection test operations. However, the TSM of each TS leaf will still be in charge of checking<br />
whether acting as a sender/receiver is in conflict with an already running operation. Each sub-detector must also<br />
customize a control cell in order to facilitate the execution of interconnection tests that involve the TPG modules.<br />
3.3.3.4 Monitoring<br />
<strong>The</strong> monitoring service is implemented by an operation running in a concrete TS leaf, or as a collaborative<br />
service where an operation running in the central node of the TS supervises the monitoring operations<br />
running in a number of TS leaves.<br />
<strong>The</strong> basic monitoring operation is a FSM with only two states: monitoring and stop. Once the monitoring<br />
operation is initiated, the monitoring process is started. At this point, any controller can retrieve items by sending<br />
pull commands. A more advanced monitoring infrastructure should be offered in a second development phase<br />
where a given controller receives monitoring updates following a push approach. This second approach<br />
facilitates the implementation of an alarm mechanism.<br />
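Both approaches can be sketched in a few lines (all names hypothetical): in the pull approach the controller requests items explicitly, while in the push approach it subscribes a callback, which also serves as a simple alarm mechanism.<br />

```python
class MonitoringOperation:
    """Two-state monitoring sketch: 'monitoring' or 'stop'."""
    def __init__(self):
        self.state = "stop"
        self.items = {}
        self.subscribers = []

    def start(self):
        self.state = "monitoring"

    def update(self, name, value):
        if self.state != "monitoring":
            return                       # updates ignored while stopped
        self.items[name] = value
        for callback in self.subscribers:
            callback(name, value)        # push approach

    def pull(self, name):
        return self.items.get(name)      # pull approach


mon = MonitoringOperation()
alarms = []
mon.subscribers.append(
    lambda n, v: alarms.append(n) if n == "temperature" and v > 60 else None)
mon.start()
mon.update("temperature", 75)
print(mon.pull("temperature"))  # 75: the controller pulls the item
print(alarms)                   # ['temperature']: push-based alarm fired
```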
3.3.3.5 Start-up<br />
From the point of view of a controller (run control session or standalone client), the whole L1 trigger system is<br />
one single resource, which can be started by sending three commands. Figure 3-7 shows how this process is<br />
carried out. This approach will simplify the implementation of the client.<br />
Figure 3-7: Start-up service. [Figure: the run control session (R<strong>CMS</strong> responsibility) sends Start(TS_URL) to the job control daemon (JC) of the central node, then Config_trigger_sw(TS_config_data) and Startup_trigger(TS_start_key) to the central node itself; the central node (TS responsibility) repeats the same three-command sequence towards the JC of each leaf, e.g. Start(GT_URL), Config_trigger_sw(GT_config_data) and Startup_trigger(GT_start_key) for the Global <strong>Trigger</strong>; finally each leaf (sub-system responsibility) starts its OSWI.]<br />
<strong>The</strong> first client that wishes to operate with the TS must follow these steps:<br />
• Send a Start(TS_URL) command to the job control daemon in charge of starting up the central node of the<br />
TS, where TS_URL identifies the Uniform Resource Locator from where the compiled central node of the TS<br />
can be retrieved.<br />
• Send a Config_trigger_sw(TS_config_data) command to the central node of the TS in order to properly<br />
configure it. Steps 1 and 2 are separated to facilitate an incremental configuration process.
• Send a Startup_trigger(TS_start_key) command to the central node of the TS. This command will send<br />
the same sequence of three commands to each TS leaf, but now the command parameters are retrieved from<br />
the configuration database register identified with the TS_start_key index.<br />
<strong>The</strong> Config_trigger_sw(TSLeaf_config_data) command that is received by the TS leaf is in charge of<br />
starting up the corresponding online software infrastructure.<br />
<strong>The</strong> release of the TS nodes is also hierarchic. Each node of the TS (i.e., TS central node and TS leaves) will<br />
maintain a counter of the number of controllers that are operating on it. When a controller wishes to stop<br />
operating a given TS node, it has to request the value of the reference counter from the TS node. If it is equal to<br />
1, the controller will send a Release_node command and will wait for the answer. When a TS node receives a<br />
Release_node command it will behave like the controller outlined above in order to release the unnecessary<br />
software infrastructure.<br />
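The hierarchic start-up and reference-counted release can be sketched together. The structure below is a hypothetical simplification (the real sequence goes through the job control daemons and the configuration database): each node starts its children only once, and tears everything down only when its last controller releases it.<br />

```python
class TSNode:
    """Sketch of a TS node with hierarchic start-up and release."""
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)
        self.controllers = 0       # reference counter of active controllers
        self.started = False

    def startup(self):
        self.controllers += 1
        if not self.started:       # Start + Config_trigger_sw + Startup_trigger
            self.started = True
            for child in self.children:
                child.startup()    # same three-command sequence per leaf

    def release(self):
        if self.controllers == 1:  # last controller: Release_node cascades
            self.controllers = 0
            self.started = False
            for child in self.children:
                child.release()
        else:
            self.controllers -= 1


leaves = [TSNode("GT"), TSNode("GMT"), TSNode("GCT")]
central = TSNode("central", leaves)
central.startup()
print([leaf.started for leaf in leaves])  # [True, True, True]
central.release()
print([leaf.started for leaf in leaves])  # [False, False, False]
```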
3.3.4 Graphical User Interface<br />
Together with the basic building block of the TS, the control cell, an interactive graphical environment to interact<br />
with it should be provided. It should feature a display to help the user/developer to operate the control cell in<br />
order to cope with the requirement outlined in Section 3.2.1, Point 9). Two different interfaces are foreseen:<br />
• HTTP: <strong>The</strong> control cell should provide an HTTP interface that allows full operation of the control cell and<br />
visualization of the state of any running operation. <strong>The</strong> HTTP interface should provide an additional entry<br />
point to the control cell (Section 3.3.2), bypassing the ACM, in order to offer a larger flexibility in the<br />
development and debug phases.<br />
• Java: A generic controller developed in Java should provide to the user an interactive window to operate the<br />
control cell through a SOAP interface. This Java application should also be an example of how to interact<br />
with the monitoring operations offered by the control cell, and graphically represent the monitored items.<br />
This Java controller can be used by the R<strong>CMS</strong> developers as an example of how to interact with the<br />
TS.<br />
3.3.5 Configuration and conditions database<br />
In this design, a dedicated configuration/conditions database per sub-system is foreseen. Different sets of<br />
firmware for the L1 trigger electronics boards and default parameters such as thresholds should be predefined<br />
and stored in the database. <strong>The</strong> information should be validated with respect to the actual hardware limitations<br />
and compatibility between different components. However, as it is shown in Figure 3-1, all these databases share<br />
the same database server provided by the <strong>CMS</strong> DataBase Working Group (DBWG). <strong>The</strong> general <strong>CMS</strong> database<br />
infrastructure, which the TS will use, includes the following components:<br />
• HW infrastructure: Servers.<br />
• SW infrastructure: Likely based on Oracle, scripts and generic GUIs to populate the databases, methodology<br />
to create customized GUIs to populate sub-system specific configuration data.<br />
Each trigger sub-system should provide the specific database structures for storing configuration data, access<br />
control information and interconnection test parameters. Custom GUIs to populate these structures should also<br />
be delivered.<br />
3.4 Project communication channels<br />
<strong>The</strong> development of the <strong>Trigger</strong> <strong>Supervisor</strong> required the collaboration of all trigger sub-systems, sub-detectors<br />
and the R<strong>CMS</strong>. Other parties of the <strong>CMS</strong> collaboration are also involved in this project: the Luminosity<br />
Monitoring System (LMS), the High Level <strong>Trigger</strong> (HLT), the Online Software Working Group (OSWG) and<br />
the DataBase Working Group (DBWG). A consistent configuration of the <strong>Trigger</strong> Primitive Generator (TPG)<br />
modules of each sub-detector, the automatic update of the L1 trigger pre-scales as a function of information<br />
obtained from the LMS, the adequate configuration of the HLT and the agreement in the usage of software tools<br />
and database technologies enlarged the number of involved parties during the development of the TS. Due to the<br />
large number of involved parties and sub-system interfaces, a significant effort was dedicated to documentation<br />
and communication.
Project development 33<br />
One of the problems in defining the communication channels is that they may concern different classes of<br />
consumers with fairly different backgrounds and languages: electronics engineers, physicists, programmers and<br />
technicians. Consumers can be roughly divided between the TS team and the rest. For internal use, the TS<br />
members use Unified Modeling Language (UML) [61] descriptions to model and document the status of the<br />
TS software framework: concurrency, communication mechanism, access control, task scheduling and error<br />
management. This model is kept consistent with the status of the TS software framework. This additional effort<br />
is worthwhile because it accelerates the learning curve of new team members, who are able to contribute<br />
effectively to the project in a shorter period of time; it also helps to detect and remove errors, and it can be used as<br />
discussion material with other software experts, for instance to discuss the database interface with the DBWG or<br />
to justify to the OSWG an upgrade of a core library. But this approach is no longer valid when the consumer is<br />
not a software expert. Project managers, electronics engineers and physicists must also contribute. Periodic<br />
demonstrators with all involved parties have proved to be powerful communication channels. This simple<br />
approach has facilitated the understanding of the TS by a wide range of experts and has helped in the continuous<br />
process of understanding the requirements. A practical way to communicate the status of the project has also<br />
facilitated the maintenance of a realistic development plan and manpower forecast calendar.<br />
3.5 Project development<br />
<strong>The</strong> development of the TS was divided into three main development layers: the framework, the system and the<br />
services. <strong>The</strong> framework is the software infrastructure that facilitates the main building block or control cell, and<br />
the integration with the specific sub-system OSWI. <strong>The</strong> system is a distributed software architecture built out of<br />
these building blocks. Finally, the services are the L1 trigger operation capabilities implemented on top of the<br />
system as a collaboration of finite state machines running in each of the cells. <strong>The</strong> decomposition of the project<br />
development tasks into three layers has the following advantages:<br />
1) Project development coordination: The division of the project development effort into three conceptually<br />
different layers facilitates the distribution of tasks between a central team and the sub-systems. In a context<br />
of limited human resources, the central team can focus on tasks with a project-wide scope, such as<br />
project organization, communication, design and development of the TS framework, coordination of<br />
sub-system integration, and sub-system support. The tasks assigned to the sub-systems are those that<br />
require expert knowledge of the sub-system hardware: developing the sub-system TS cells according<br />
to the models proposed by the central team, and developing the sub-system cell<br />
operations required by the central team in order to build the configuration and test services.<br />
2) Hardware and software upgrades: Periodic software platform and hardware upgrades are foreseen during<br />
the long operational life of the experiment. A baseline layer that hides these upgrades and provides a stable<br />
interface avoids the propagation of code modifications to higher conceptual layers. <strong>The</strong>refore, the code and<br />
number of people involved in updating the TS after each SW/HW upgrade are limited and well localized.<br />
3) Flexible operation capabilities: A stable distributed architecture built on top of the baseline layer is the<br />
first step towards providing a simple methodology to create new services to operate the L1 decision loop<br />
(Section 3.2.2, Point 6)). The simplicity of this methodology is necessary because the people in charge of<br />
defining how the experiment is operated are in general not software experts but particle physicists with<br />
almost full-time management responsibilities.
[Figure 3-8 content: periodic demonstrators serve as a communication channel with all involved parties: the<br />
trigger sub-systems and sub-detectors, the luminosity monitoring system, the High Level Trigger, the Run Control<br />
and Monitoring System, the Database Working Group and the Online Software Working Group. The diagram maps<br />
the development layers Services, System and Framework to Chapters 6, 5 and 4 respectively, while the Prototype,<br />
SW Context, Concept and HW Context are covered by Chapters 3 and 1.]<br />
Figure 3-8: <strong>Trigger</strong> <strong>Supervisor</strong> project organization and communication schemes.<br />
The TS framework, presented in Chapter 4, consists of the distributed programming facilities required to build<br />
the distributed software system known as the TS system. The TS system, presented in Chapter 5, is a set of nodes<br />
and the communication channels among them; it serves as the underlying infrastructure that facilitates the<br />
development of the TS services presented in Chapter 6. Figure 3-8 shows a simplified diagram of the project<br />
organization, the communication channels and the contents of Chapters 1, 3, 4, 5 and 6.<br />
3.6 Tasks and responsibilities<br />
The development of the TS framework, system and services can be further divided into a number of tasks. Due<br />
to the limited resources of the central TS team, and in some cases due to the required expertise in concrete<br />
sub-system hardware, these tasks are distributed among the trigger sub-systems and the TS team.<br />
Central team responsibilities<br />
The tasks assigned to the central team are those with a project-wide scope, such as project organization,<br />
communication, design and development of common infrastructure, coordination of sub-system integration,<br />
and sub-system support. The following list describes these tasks.<br />
1) <strong>Trigger</strong> <strong>Supervisor</strong> framework development: <strong>The</strong> creation of the basic building blocks that form the TS<br />
system and that facilitate the integration of the different sub-systems is a major task which requires a<br />
continuous development process from the prototype to the periodic upgrades in coordination with the<br />
OSWG and DBWG.<br />
2) Coordination: The central team is responsible for discussing and proposing to each sub-system a model for<br />
integration with the TS system. The central team also develops the central cell and coordinates<br />
the different sub-systems in order to create the TS services.<br />
3) Sub-system support: It is important to provide adequate support to the sub-systems in order to ease the<br />
integration process and the usage of the TS framework. To this end, the project web page [62] was<br />
regularly updated with the latest version of the user’s guide [63] and the latest presentations, a series of<br />
workshops [64][65][66] was organized, and a web-based support management tool was set up [67].
4) Software configuration management: A set of configuration management actions were proposed by the<br />
central team in order to improve the communication of the system evolution and the coordination among<br />
sub-system development groups. A common Concurrent Versions System 9 (CVS) repository for all the<br />
online software infrastructure of the L1 trigger was created, which facilitates the production and<br />
coordination of L1 trigger software releases. A generic Makefile 10 was adopted to homogenize the build<br />
process of the L1 trigger software. This allowed a more automatic deployment of the L1 trigger online<br />
software infrastructure, and prepared it for integration with the DAQ online software.<br />
5) Communication: <strong>The</strong> central team was also responsible for communicating with all involved parties<br />
according to Section 3.4. <strong>The</strong> communication effort consisted of periodic demonstrators, the framework<br />
internal documentation and presentations in the collaboration meetings.<br />
Sub-system responsibilities<br />
The tasks assigned to the sub-systems were those that required expert knowledge of the sub-system hardware.<br />
<strong>The</strong>se tasks consisted of developing the sub-system TS cells according to the models proposed by the central<br />
team, and the development of the sub-system cell operations required by the central team in order to build the<br />
configuration and test services.<br />
Shared responsibilities<br />
Due to an initial lack of human resources in the sub-system teams, some sub-system cells were initially<br />
prototyped by the central team: GT, GMT, and DTTF. At a later stage, the bulk of these developments was<br />
transferred to the corresponding sub-systems.<br />
3.7 Conceptual design in perspective<br />
<strong>The</strong> TS conceptual design presented in this chapter consists of functional and non-functional requirements, a<br />
feasible architecture that fulfills these requirements and the project organization details. <strong>The</strong>se three points<br />
define the project concept. Some initial technical aspects have also been presented in order to prove the<br />
feasibility of the design: XDAQ as baseline infrastructure and GUI technologies, the usage of FSM’s, services<br />
implementation details and so on.<br />
Over three years the project scope has not been altered, proving the suitability of the initial conceptual ideas.<br />
However, some technical details have evolved towards different solutions, some have disappeared and a few<br />
have been added. The following chapters describe the final technical details of the Trigger Supervisor.<br />
9 <strong>The</strong> Concurrent Versions System (CVS), also known as the Concurrent Versioning System, is an open-source version<br />
control system that keeps track of all work and all changes in a set of files, typically the implementation of a software project,<br />
and allows several (potentially widely-separated) developers to collaborate (Wikipedia).<br />
10 In software development, make is a utility for automatically building large applications. Files specifying instructions for<br />
make are called Makefiles (Wikipedia).
Chapter 4<br />
<strong>Trigger</strong> <strong>Supervisor</strong> Framework<br />
4.1 Choice of an adequate framework<br />
The conceptual design of the Trigger Supervisor presented in Chapter 3 outlines a distributed software control<br />
system with a hierarchical topology, where each node relies on a common architecture. Such a distributed<br />
system requires a distributed programming framework 11 that facilitates the necessary tools<br />
and services for remote communication, system process management, memory management, error management,<br />
logging and monitoring. A suitable solution had to cope with the functional and non-functional requirements<br />
presented in Chapter 3.<br />
As discussed in Section 1.4, the <strong>CMS</strong> Experiment Control System (ECS) is based on three main distributed<br />
programming frameworks, namely XDAQ, DCS and R<strong>CMS</strong>, which as official projects of the <strong>CMS</strong> collaboration<br />
will be maintained and supported during an operational phase of the order of ten years. <strong>The</strong> choice was therefore<br />
limited to these frameworks. Other external projects were not considered because their long-term maintenance<br />
could not be assured.<br />
Among them, XDAQ had proven to be the most complete and the best suited to facilitate the fast development<br />
required in Section 3.2.2, Point 5):<br />
• <strong>The</strong> Online SoftWare Infrastructure (OSWI) of all sub-systems is mainly formed by libraries written in C++<br />
running on an x86/Linux platform. <strong>The</strong>se are intended to hide hardware complexity from software experts.<br />
<strong>The</strong>refore, a distributed programming framework based on C++ would simplify the model of integration<br />
with the sub-system OSWI’s.<br />
• When the survey took place, XDAQ was already a mature product with an almost final API which<br />
facilitated the upgrading effort.<br />
• XDAQ provides infrastructure for monitoring, logging and database access.<br />
<strong>The</strong> R<strong>CMS</strong> and PVSSII/JCOP frameworks were not selected due to the additional complexity of the overall<br />
architecture. First, R<strong>CMS</strong> is written in Java and therefore the integration of C++ libraries would require an<br />
additional effort. Besides, RCMS was being completely redeveloped when the survey took place. Regarding PVSSII,<br />
it could have been adopted if the sub-system C++ code had been run within a Distributed Information<br />
Management (DIM) server [70], which could have provided an adequate remote interface to PVSSII [71].<br />
However, the usage of two distributed programming frameworks (PVSSII and DIM) on two different<br />
platforms (PVSSII runs on Windows and DIM on Linux) would have resulted in an undesirably complex<br />
architecture.<br />
11 A software framework is a reusable software design that can be used to simplify the implementation of a specific type of<br />
software. If this is implemented in an object oriented language, this consists of a set of classes and the way their instances<br />
collaborate [68][69].
Despite the fact that XDAQ was the best available option, it was not an out-of-the-box solution to implement the<br />
<strong>Trigger</strong> <strong>Supervisor</strong> and therefore further development was needed. Section 4.2 describes the requirements of the<br />
<strong>Trigger</strong> <strong>Supervisor</strong> framework. Section 4.3 describes the functional architecture. Section 4.4 discusses the<br />
implementation details. Section 4.5 presents a concrete usage guide of the framework. Finally, the performance<br />
and scalability issues are presented in Section 4.6.<br />
4.2 Requirements<br />
This section presents the requirements of a suitable software framework to develop the TS. It is shown how the<br />
functional (Section 3.2.1) and non-functional (Section 3.2.2) requirements associated with the conceptual design<br />
motivate a number of additional developments which are not covered by XDAQ.<br />
4.2.1 Requirements covered by XDAQ<br />
The basic software infrastructure necessary to implement the TS should fulfill a number of requirements in order<br />
to serve as the core framework of the TS system. The following list presents the requirements which<br />
were properly covered by XDAQ:<br />
1) Web services centric: <strong>The</strong> <strong>CMS</strong> online software, and more exactly, the Run Control and Monitoring<br />
System (R<strong>CMS</strong>) is extensively using web services technologies ([10], p. 202). XDAQ is also a web services<br />
centric infrastructure. <strong>The</strong>refore, it simplifies the integration with R<strong>CMS</strong> (Section 3.2.1, Point 12) ).<br />
2) Logging and error management: According to Sections 3.2.1, Point 7) and 3.2.1, Point 8), the TS<br />
framework should provide facilities for logging and error management in a distributed environment. XDAQ<br />
provides this infrastructure compatible with the <strong>CMS</strong> logging and error management schemes.<br />
3) Monitoring: According to Section 3.2.1, Point 4), the TS framework should provide infrastructure for<br />
monitoring in a distributed environment.<br />
4.2.2 Requirements not covered by XDAQ<br />
Additional infrastructure had to be designed and developed to cope with the requirements of the conceptual<br />
design:<br />
1) Synchronous and asynchronous protocols: The TS framework should facilitate the development of<br />
distributed systems featuring both synchronous and asynchronous communication among nodes (Section<br />
3.2.1, Point 12)).<br />
2) Multi-user: The nodes of a distributed system implemented with the TS framework should allow<br />
concurrent access by multiple clients (Section 3.2.1, Point 10)).<br />
However, the main additional developments were motivated by the human context (Section 3.2.2, Point 7)) of<br />
the project and by time constraints (Section 3.2.2, Point 5)). This section presents a number of desirable<br />
requirements grouped under a few generic guidelines.<br />
Simplify integration and support effort: The resources of the central TS team were very limited. Therefore, it<br />
was necessary to provide infrastructure that simplified the software integration and reduced the need for<br />
sub-system support.<br />
3) Finite State Machine (FSM) based control system: A framework that guides the sub-system developer<br />
reducing the degrees of freedom during the customization process would simplify the software integration<br />
and would reduce the support tasks. A control system model based on Finite State Machines (FSM) is well<br />
known in HEP. It was proposed in Section 3.3 as a feasible model to implement the final services of the<br />
<strong>Trigger</strong> <strong>Supervisor</strong>. FSM’s have been used in other experiment control systems [72][73][74], and are<br />
currently being used by the CMS DCS [75] and other CERN experiments [76][77]. On the other hand, a<br />
well-known model alone is not enough: a concrete FSM had to be provided, with a clear specification of all states<br />
and transitions, their expected behavior, and the input/output parameter data types and names. The more complete<br />
this specification is, the easier the sub-system coordination becomes and the clearer the separation of<br />
responsibilities among sub-systems. Some more concrete implementation details, shown in the<br />
implementation section, like a clear separation of the error management, are intended to ease the
customization and the maintenance phases. In addition, the usage of a well-known model accelerates<br />
the learning curve and therefore the integration process.<br />
4) Simple access to external services: A framework should provide facilities to access Oracle relational<br />
databases, XDAQ applications, and remote web-based services (i.e. SOAP-based, HTTP/CGI based<br />
services) in a simple and homogeneous way. This infrastructure would ease the development of the FSM<br />
transition methods, for instance when it is necessary to access the configuration database.<br />
5) Homogeneous integration methodology independent of the concrete sub-system OSWI: <strong>The</strong> TS<br />
framework should facilitate a common integration methodology independent of the available OSWI and the<br />
hardware setup.<br />
6) Automatic creation of graphical user interfaces: In order to reduce the integration development time, a<br />
framework should provide a mechanism to automatically generate a GUI to control the sub-system<br />
hardware. This should also ensure a common look and feel for all sub-system graphical setups,<br />
so that an operator of the L1 trigger system can learn faster how to operate any sub-system.<br />
7) Single integration software infrastructure: A single visible software framework would simplify the<br />
understanding of the integration process for the sub-systems.<br />
Simplify software tasks during the operational phase: <strong>The</strong> framework architecture should take into account<br />
that support and maintenance tasks are foreseen during the experiment operational phase.<br />
8) Homogeneous online software infrastructure: In addition to simplifying the understanding of the<br />
integration process for the sub-systems, a single integration software infrastructure would ease the creation of<br />
releases, user support and maintenance tasks.<br />
A common technological approach, shared with the Trigger Supervisor, to design and develop sub-system expert<br />
tools, such as graphical setups or command-line utilities to control a concrete piece of hardware, would also help<br />
to simplify the overall maintenance effort of the whole L1 trigger OSWI.<br />
9) Layered architecture: From the maintenance point of view, any additional development on top of XDAQ<br />
had to be designed such that it is easy to upgrade to new XDAQ versions or even to other distributed<br />
programming frameworks.<br />
4.3 Cell functional structure<br />
<strong>The</strong> “cell” is the main component of the additional software infrastructure motivated by the requirements not<br />
covered by XDAQ. This component serves as the main facility to integrate the sub-system’s OSWI with the<br />
<strong>Trigger</strong> <strong>Supervisor</strong>. Figure 4-1 shows the functional structure of the cell for a stable version of the TS<br />
framework.<br />
This functional structure is more detailed than, and differs in several respects from, the cell presented in the<br />
conceptual design chapter. The following sections describe this architecture in detail.<br />
4.3.1 Cell Operation<br />
A cell operation is essentially an FSM running inside the cell which can be remotely operated. In general, FSM’s<br />
are applied to HEP control problems where it is necessary to monitor and control the stable state of a system.<br />
The TS services outlined in Chapter 3 were suitable candidates for this model.
[Figure 4-1 content: the cell exposes an HTTP/CGI (GUI) interface and a SOAP interface guarded by the Access<br />
Control and Response Control modules. Inside, an operations factory and a command factory create operation<br />
and command plug-ins, which are held in the operations pool and commands pool; control panel plug-ins extend<br />
the GUI, and monitorable item handlers feed the monitoring data source. An error management module supervises<br />
execution. Monitor, database, cell and XDAQ xhannels connect to external services, and commands and operations<br />
access the sub-system hardware drivers.]<br />
Figure 4-1: Architecture of the main component of the TS framework: <strong>The</strong> cell.<br />
To use a cell operation it is necessary to initialize an operation instance. <strong>The</strong> cell facilitates a remote interface to<br />
create instances of cell operations. Figure 4-2 shows a cell operation example with one initial state (S1), several<br />
normal states (S2 and S3), transitions between state pairs (arrows), and one event name assigned to each<br />
transition (e₁, e₂, e₃ and e₄).<br />
Operation events are issued by the controller in order to change the current state. The state changes when a<br />
transition named after the issued event, with its origin in the current state, is successfully executed. A transition<br />
named after the event eᵢ has two customizable methods: cᵢ and fᵢ. The method cᵢ returns a boolean value, and the<br />
method fᵢ defines the functionality assigned to a successful transition. If cᵢ returns false, the current state<br />
does not change and fᵢ is not executed. If cᵢ returns true, fᵢ is executed and afterwards the state of<br />
the FSM changes.<br />
A first aspect to note is that each transition has two functions (fᵢ, cᵢ). This design has been chosen to enforce a<br />
customization style that simplifies the implementation, the understanding and the maintenance of the transition<br />
code (fᵢ), whilst facilitating a progressive improvement of the preliminary system check code (cᵢ). For<br />
instance, reading from a database and configuring a board would be a sequence of actions defined by the<br />
transition code, whilst checking that the board is plugged in and that the database is reachable, among other<br />
possible error conditions, would be defined in the check method.<br />
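The check/action pairing described above can be sketched in C++ as a small state machine. This is an illustrative sketch only: the class and method names (CellOperation, addTransition, fireEvent) are hypothetical and do not correspond to the actual TS framework API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Sketch of a cell operation: every transition pairs a check method (c_i)
// with an action method (f_i); the state advances only when c_i returns
// true and f_i has been executed.
class CellOperation {
public:
    using Check  = std::function<bool()>;
    using Action = std::function<void()>;

    explicit CellOperation(std::string initial) : state_(std::move(initial)) {}

    // Register a transition "from --event--> to" with its check/action pair.
    void addTransition(const std::string& from, const std::string& event,
                       const std::string& to, Check c, Action f) {
        table_[{from, event}] = Edge{to, std::move(c), std::move(f)};
    }

    // Issue an event: run c_i; only on success run f_i and change state.
    bool fireEvent(const std::string& event) {
        auto it = table_.find({state_, event});
        if (it == table_.end() || !it->second.check()) return false;
        it->second.action();
        state_ = it->second.target;
        return true;
    }

    const std::string& state() const { return state_; }

private:
    struct Edge { std::string target; Check check; Action action; };
    std::string state_;
    std::map<std::pair<std::string, std::string>, Edge> table_;
};
```

In this style, the database read and board configuration from the example above would live in the Action, while reachability and plugged-in checks would live in the Check.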
[Figure 4-2 content: states S1, S2 and S3 connected by transitions labeled e₁ to e₄. Transition rule: eᵢ: if (cᵢ)<br />
then { fᵢ, move to next state } else { do not move }. Default warning object: warning level = 1000, warning<br />
message = “no message”.]<br />
Figure 4-2: Cell operation.
Each operation has a warning object which provides a way to monitor the status of the operation. As it is<br />
updated with the execution of every new event, the warning object can also be used to provide feedback about<br />
the success level of the transition execution. A warning object contains a warning level and a warning message.<br />
The warning message is destined for human operators, and the warning level is a numeric value that can<br />
be processed by a remote machine controller.<br />
A number of operation-specific parameters can be set. All of them are accessible during the definition and<br />
execution of any of the fᵢ and cᵢ methods. The values of the parameters can be set by the controller when the<br />
operation is initialized or when the controller sends an event. The type of a parameter can be signed or unsigned<br />
integer, string or boolean. The return message, after executing the transition methods, always includes a<br />
payload and the operation warning object. The payload data type can be any of the parameter types.<br />
Standard operations are provided with the TS framework for the implementation of the configuration and<br />
interconnection test services. <strong>The</strong> transition methods for these operations are left empty and each sub-system is<br />
responsible for defining this code. <strong>The</strong> TS services, presented in Chapter 6, appear as a coordinated collaboration<br />
of the different sub-system specific operations. Additional operations can be created by each sub-system to ease<br />
concrete commissioning and debugging tasks. For instance, an operation can be implemented to move data from<br />
memories to spy buffers in order to check the proper information processing in a number of stages.<br />
In order to simplify the understanding of the cell operation model, the intermediate states (Section 3.3.3.1),<br />
representing the execution of the transition methods, are not visible in Figure 4-2. However, each transition has<br />
a hidden state which indicates that the transition methods are being executed.
4.3.2 Cell command<br />
A cell command is a functionality of the cell which can be remotely called. Every command splits its<br />
functionality into two methods: precondition() and code(). The method precondition() returns a<br />
boolean value and the method code() defines the command functionality. If precondition() returns<br />
false, the code() method is not executed; if it returns true, the code() method is executed.<br />
<strong>The</strong> cell commands can have an arbitrary number of typed parameters which can be used within the command<br />
methods.<br />
Like the cell operation, the command has a warning object. This is used to provide better feedback on the<br />
success level of the command execution. The warning object can be modified during the execution of the<br />
precondition() and/or code() methods.<br />
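The precondition()/code() split and the warning object can be sketched as a small template-method hierarchy. The names below (CellCommand, Warning, ResetBoard) and the Warning layout are hypothetical illustrations, not the real TS classes; the default level 1000 and message “no message” are taken from Figure 4-2.

```cpp
#include <string>

// Sketch of the warning object carried by commands and operations.
struct Warning {
    int level = 1000;                   // numeric level for machine controllers
    std::string message = "no message"; // text destined for human operators
};

// Sketch of a cell command: code() runs only if precondition() holds.
class CellCommand {
public:
    virtual ~CellCommand() = default;

    // Template method mirroring the execution rule described above.
    bool execute() {
        if (!precondition()) return false;
        code();
        return true;
    }

    const Warning& warning() const { return warning_; }

protected:
    virtual bool precondition() = 0; // e.g. board plugged in, DB reachable
    virtual void code() = 0;         // the actual command functionality
    Warning warning_;                // may be updated by either method
};

// Hypothetical plug-in command that checks a (simulated) crate before acting.
class ResetBoard : public CellCommand {
public:
    explicit ResetBoard(bool boardPresent) : present_(boardPresent) {}
    bool wasReset = false;

protected:
    bool precondition() override {
        if (!present_) { warning_.level = 3000; warning_.message = "board absent"; }
        return present_;
    }
    void code() override {
        wasReset = true;               // stand-in for real hardware access
        warning_.level = 0;
        warning_.message = "ok";
    }

private:
    bool present_;
};
```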
4.3.3 Factories and plug-ins<br />
A number of operations and commands are provided with the TS framework. These can be extended with<br />
operation and command plug-ins. The operation factory and command factory create instances of<br />
the available plug-ins at the request of an authorized controller. Several instances of the same operation or<br />
command can be operated concurrently.<br />
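A factory of this kind is commonly realized with a registry of creator functions. The sketch below is a generic illustration of the idea under assumed names (CommandFactory, registerPlugin, ReadSpyBuffer); it is not the actual TS factory interface.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Minimal command interface for the sketch.
struct Command {
    virtual ~Command() = default;
    virtual std::string name() const = 0;
};

// Sketch of a command factory: plug-ins register a creator under a type
// name; the factory instantiates them on request.
class CommandFactory {
public:
    using Creator = std::function<std::unique_ptr<Command>()>;

    void registerPlugin(const std::string& type, Creator c) {
        creators_[type] = std::move(c);
    }

    // Returns a fresh instance, or nullptr for an unknown type.
    std::unique_ptr<Command> create(const std::string& type) const {
        auto it = creators_.find(type);
        return it == creators_.end() ? nullptr : it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// Hypothetical sub-system plug-in.
struct ReadSpyBuffer : Command {
    std::string name() const override { return "ReadSpyBuffer"; }
};
```

Because each create() call yields a new instance, several instances of the same plug-in can be operated concurrently, as the text requires.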
4.3.4 Pools<br />
The cell operation and command pools are internal cell structures which store all operation and command<br />
instances, respectively. Each instance of an operation or a command is identified by a unique name<br />
(operation_id and command_id). This identifier is used to retrieve and operate a specific instance.<br />
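The pool amounts to an identifier-to-instance map with uniqueness enforced on insertion. A minimal sketch, with hypothetical names (OperationPool, Operation), assuming the pool owns its instances:

```cpp
#include <map>
#include <memory>
#include <string>

// Stand-in for a real cell operation instance.
struct Operation {
    std::string type;
};

// Sketch of the operations pool: each created instance is stored under a
// unique operation_id used later to retrieve and drive it.
class OperationPool {
public:
    // Store a new instance; fails (returns false) if the id is already taken.
    bool add(const std::string& id, std::unique_ptr<Operation> op) {
        return pool_.emplace(id, std::move(op)).second;
    }

    // Retrieve an instance by its id, or nullptr if unknown.
    Operation* find(const std::string& id) {
        auto it = pool_.find(id);
        return it == pool_.end() ? nullptr : it->second.get();
    }

    // Destroy an instance when the controller is done with it.
    bool remove(const std::string& id) { return pool_.erase(id) != 0; }

private:
    std::map<std::string, std::unique_ptr<Operation>> pool_;
};
```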
4.3.5 Controller interface<br />
Compared to the functional design presented in the conceptual design (Section 3.3.2), the input interfaces were<br />
limited to SOAP and HTTP/CGI (Common Gateway Interface 12 ). <strong>The</strong> I2O high performance interface was not<br />
12 <strong>The</strong> Common Gateway Interface (CGI) is a standard protocol for interfacing information servers, commonly a web server.<br />
Each time a request is received, the server analyzes what the request asks for, and returns the appropriate output. CGI can use<br />
the HTTP protocol as transport layer (HTTP/CGI).
added to the definitive architecture because in the end it was only necessary to serve slow control requests. The<br />
possibility to extend the input interface with a sub-system-specific protocol was also dropped because none of<br />
the sub-systems required it.<br />
Both interfaces (SOAP and HTTP/CGI) facilitate the initialization, destruction and operation of any available<br />
command and operation. <strong>The</strong> HTTP/CGI interface also provides access to all monitoring items in the sub-system<br />
cell and other cells belonging to the same distributed system (Section 4.4.4.12). <strong>The</strong> HTTP/CGI interface is<br />
automatically generated during the compilation phase. This simplifies the sub-system development effort and<br />
homogenizes the look and feel of all sub-system GUIs. This human-to-machine interface can be extended with<br />
control panel plug-ins (Section 4.4.4.11). A control panel is also a web-based graphical setup facilitated by the<br />
HTTP/CGI interface but with a customized look and feel. <strong>The</strong> default and automatically generated GUI provides<br />
access to the control panels.<br />
<strong>The</strong> second interface is a SOAP-based machine-to-machine interface. It is intended to facilitate the integration of<br />
the TS with the R<strong>CMS</strong> and to provide a communication link between cells. Appendix A presents a detailed<br />
specification of this interface.<br />
4.3.6 Response control module<br />
The Response Control Module (RCM) was not introduced in the conceptual design chapter. This cell functional<br />
module handles both synchronous and asynchronous responses to the controller. The<br />
synchronous protocol is intended to assure exclusive usage of the cell, while the asynchronous mode enables<br />
multi-user access and enhanced overall system performance.<br />
4.3.7 Access control module<br />
The Access Control Module (ACM) is intended to identify and authorize a given controller. A new controller<br />
trying to gain access to a cell has to identify itself with a user name and a password. The ACM<br />
checks this information in the users database and grants the controller a session identifier, which<br />
is stored and accessible from any cell. The session identifier is the key to those services that are<br />
granted to a concrete user, and it has to be sent with every new controller request.
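The login-then-authorize flow can be sketched as follows. Everything here (AccessControl, login, authorize, the session-id format) is a hypothetical illustration of the described flow, not the TS ACM implementation, and real code would of course not store plaintext passwords.

```cpp
#include <map>
#include <set>
#include <string>

// Sketch of the ACM flow: a controller logs in with user name and
// password, receives a session identifier, and must present that
// identifier with every subsequent request.
class AccessControl {
public:
    void addUser(const std::string& user, const std::string& pass) {
        users_[user] = pass;
    }

    // Returns a session id on success, or an empty string on failure.
    std::string login(const std::string& user, const std::string& pass) {
        auto it = users_.find(user);
        if (it == users_.end() || it->second != pass) return "";
        std::string sid = user + "-session-" + std::to_string(++counter_);
        sessions_.insert(sid);   // stored so any cell could look it up
        return sid;
    }

    // Every controller request must carry a valid session id.
    bool authorize(const std::string& sessionId) const {
        return sessions_.count(sessionId) != 0;
    }

private:
    std::map<std::string, std::string> users_;
    std::set<std::string> sessions_;
    int counter_ = 0;
};
```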
4.3.8 Shared resource manager<br />
The Shared Resource Manager (SRM), outlined in the conceptual design (Section 3.3.2), is no longer solely<br />
responsible for coordinating the access to internal and external resources. In the final design, the concurrent<br />
access to common resources, like the sub-system hardware driver or the communication ports with external<br />
entities, is coordinated by each individual entity. The main reason for this approach is that it is not possible to<br />
assure that all requests pass through the cell.<br />
4.3.9 Error manager<br />
The Error Manager (ERM) is meant to detect any exceptional situation that occurs while a command or<br />
operation transition method is executed and that the method is not able to solve locally. In this case, the<br />
ERM takes control of the method execution and sends the reply message back to the controller with textual<br />
information about what went wrong during the execution of the command or operation transition. This message<br />
is embedded in the warning object of the reply message (Appendix A).<br />
4.3.10 Xhannel<br />
<strong>The</strong> xhannel infrastructure has been designed to gain access to external resources from the cell command and<br />
operation methods. It provides a simple and homogeneous interface to a wide range of external services: other<br />
cells, XDAQ applications and web services. This infrastructure eases the definition of the command and<br />
operation transition methods by simplifying the process of creating SOAP and HTTP/CGI messages, processing<br />
the responses and handling synchronous and asynchronous protocols.
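The homogeneity the xhannel provides can be sketched with one narrow interface and one concrete transport per service type. The class names (Xhannel, CellXhannel, HttpXhannel) and the send signature are assumptions for illustration; the replies are placeholders where a real implementation would perform a SOAP or HTTP round trip.

```cpp
#include <map>
#include <string>

// Sketch of the xhannel idea: one narrow interface hides whether the
// peer is another cell, a XDAQ application or a web service.
class Xhannel {
public:
    virtual ~Xhannel() = default;
    // Send a request and block for the reply (synchronous protocol);
    // an asynchronous variant would take a callback instead.
    virtual std::string send(const std::string& command,
                             const std::map<std::string, std::string>& params) = 0;
};

// One concrete transport per external service type.
class CellXhannel : public Xhannel {              // SOAP to another cell
public:
    std::string send(const std::string& command,
                     const std::map<std::string, std::string>&) override {
        return "soap-reply:" + command;           // placeholder for a SOAP round trip
    }
};

class HttpXhannel : public Xhannel {              // HTTP/CGI to a web service
public:
    std::string send(const std::string& command,
                     const std::map<std::string, std::string>&) override {
        return "http-reply:" + command;           // placeholder for an HTTP request
    }
};

// A transition method only ever sees the common interface.
std::string queryPeer(Xhannel& ch) { return ch.send("GetStatus", {}); }
```

This is why the transition methods stay simple: they call send() and process the reply, never building SOAP or HTTP/CGI messages by hand.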
Implementation 43<br />
4.3.11 Monitoring facilities<br />
<strong>The</strong> TS monitoring infrastructure consists of a methodology to declare cell monitoring items and of additional<br />
infrastructure that facilitates the definition of the code to be executed every time an item is<br />
checked. <strong>The</strong> TS monitoring infrastructure is based on the XDAQ monitoring components.<br />
4.4 Implementation<br />
<strong>The</strong> TS framework is the implementation of the additional infrastructure identified in the discussion of Section 4.2<br />
and formalized with a functional design in Section 4.3. <strong>The</strong> layered architecture of Figure 4-3 shows how the TS<br />
framework is implemented on top of the XDAQ middleware and a number of external software packages 13. <strong>The</strong><br />
TS framework, together with the XDAQ middleware, is used to implement the <strong>Trigger</strong> <strong>Supervisor</strong> system.<br />
4.4.1 Layered architecture<br />
<strong>The</strong> L1 trigger OSWI has the layered structure shown in Figure 4-3. In this organization, the TS framework lies<br />
between a specific sub-system OSWI on the upper side, and the XDAQ middleware and other external packages<br />
on the lower side. Figure 4-4 shows the package level description of the L1 trigger OSWI. Each layer of Figure<br />
4-3 is represented by a box in Figure 4-4 and each box includes a number of packages. <strong>The</strong> dependencies among<br />
packages are also presented in Figure 4-4. Sections 4.4.2 to 4.4.4 present each of the layers outlined in Figure<br />
4-3.<br />
Figure 4-3: Layered description of a Level-1 trigger online software infrastructure.<br />
4.4.2 External packages<br />
This section describes the external packages used by the TS and XDAQ frameworks. <strong>The</strong> C++ classes contained<br />
in these packages are used to enhance the developments described in Section 4.4.<br />
4.4.2.1 Log4cplus<br />
Inserting user notifications, also known as “log statements”, into the code is a method for debugging it (Section<br />
3.2.1, Point 7)). It may also be the only practical way to debug multi-threaded applications and large distributed applications.<br />
Log4cplus is a C++ logging software framework modeled after the Java log4j API [78]. It provides precise<br />
context about a running application. Once inserted into the code, the generation of logging output requires no<br />
human intervention. Moreover, log output can be saved in a persistent medium to be studied at a later time.<br />
<strong>The</strong> Log4cplus package is used to facilitate the debugging of the TS system and to have a persistent register of<br />
the run time system behavior. This facilitates the development of post-mortem analysis tools. Logging facilities<br />
are also used to document and to monitor alarm conditions.<br />
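As a minimal stand-in for the idea (this is deliberately not the log4cplus API), a logger with severity levels writing every record to a persistent sink might look like:<br />

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Minimal stand-in for the idea of log statements: each record carries a
// severity and a message and is appended to a persistent sink (here a
// stream standing in for a file). This is not the log4cplus API.
enum class Level { DEBUG, INFO, WARN, ERROR };

class MiniLogger {
public:
    explicit MiniLogger(std::ostream& sink, Level threshold = Level::INFO)
        : sink_(sink), threshold_(threshold) {}
    void log(Level lvl, const std::string& msg) {
        if (lvl < threshold_) return;  // suppress records below the threshold
        static const char* names[] = {"DEBUG", "INFO", "WARN", "ERROR"};
        sink_ << names[static_cast<int>(lvl)] << ": " << msg << '\n';
    }
private:
    std::ostream& sink_;
    Level threshold_;
};
```

Once such statements are in place, no human intervention is needed to produce the output, and the sink can be replayed later for post-mortem analysis, which is the property the TS logging system relies on.<br />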
13 A software package in object-oriented programming is a group of related classes with a strong coupling. A software<br />
framework can consist of a number of packages.
<strong>Trigger</strong> <strong>Supervisor</strong> Framework 44<br />
Figure 4-4: Software packages of the Level-1 trigger online software Infrastructure.<br />
4.4.2.2 Xerces<br />
Xerces [79] is a validating XML parser written in C++. Xerces makes it easy for C++ applications to read and write<br />
XML data. An API is provided for parsing, generating, manipulating, and validating XML documents. Xerces<br />
conforms to the XML 1.1 [80] recommendation. Xerces is used to ease the parsing of the SOAP request<br />
messages in order to extract the command and parameter names, the parameter values and other message<br />
attributes.<br />
4.4.2.3 Graphviz<br />
Graphviz [81] is a C++ framework for graph filtering and rendering. This library is used to draw the finite state<br />
machine of the cell operations.<br />
4.4.2.4 ChartDirector<br />
ChartDirector [82] is a C++ framework which enables a C++ application to synthesize charts using standard<br />
chart layers. This package is used to present the monitoring information.<br />
4.4.2.5 Dojo<br />
Dojo [83] is a collection of JavaScript functions. Dojo eases building dynamic capabilities into web pages and<br />
any other environment that supports JavaScript. <strong>The</strong> components provided by Dojo can be used to make web<br />
sites more usable, responsive and functional. <strong>The</strong> Dojo toolkit is used to implement the TS graphical user<br />
interface.
4.4.2.6 Cgicc<br />
Cgicc [84] is a C++ library that simplifies the processing of HTTP/CGI requests on the server side (the cell<br />
in our case). This package is used by the CellFramework, Ajaxell and sub-system cell packages to ease the<br />
implementation of the TS web-based graphical user interface.<br />
4.4.2.7 Logging collector<br />
<strong>The</strong> logging collector or log collector [85] is a software component that belongs to the R<strong>CMS</strong> framework<br />
(Section 1.4.1). It is designed and developed to collect logging information from log4j compliant applications<br />
and to forward these logging statements to several consumers at the same time. <strong>The</strong>se consumers can be: Oracle<br />
database, files or a real-time message system. <strong>The</strong> log collector is not a component of the TS framework, but it is<br />
used as a building block of the TS logging system, which is itself a component of the TS system.<br />
4.4.3 XDAQ development<br />
XDAQ (pronounced Cross DAQ) was introduced in Section 1.4.3 as a domain-specific middleware designed for<br />
high energy physics data acquisition systems. It provides platform independent services, tools for local and<br />
remote inter-process communication, configuration and control, as well as technology independent data storage.<br />
To achieve these goals, the framework is built upon industrial standards, open protocols and libraries.<br />
This distributed programming framework is designed according to the object-oriented model and implemented<br />
using the C++ programming language. This infrastructure facilitates the development of scalable distributed<br />
software systems by partitioning applications into smaller functional units that can be distributed over multiple<br />
processing units. In this scheme each computing node runs a copy of an executive that can be extended at runtime<br />
with binary components. A XDAQ-based distributed system is therefore designed as a set of independent,<br />
dynamically loadable modules 14, each one dedicated to a specific sub-task. <strong>The</strong> executive simply acts as a<br />
container for such modules, and loads them according to an XML configuration provided by the user.<br />
A collection of C++ utilities is available to enhance the development of XDAQ components: logging, data<br />
transmission, exception handling facilities, remote access to configuration parameters, thread management,<br />
memory management and communication among XDAQ applications.<br />
Some core components are loaded by default in the executive in order to provide basic functionalities. <strong>The</strong> main<br />
components of the XDAQ environment are the peer transports. <strong>The</strong>se implement the communication among<br />
XDAQ applications. Another default component is the Hyperdaq web interface application which turns an<br />
executive into a browsable web application that can visualize its internal data structure [86].<br />
<strong>The</strong> framework supports two data formats, one based on the I2O [87] specification and the other on XML. I2O<br />
messages are binary packets with a maximum size of 256 KB. I2O messages are primarily intended for the<br />
efficient exchange of binary information, e.g. data acquisition flow. Despite its efficiency, the I2O scheme is not<br />
universal and lacks flexibility. A second type of communication has been chosen for tasks that require higher<br />
flexibility such as configuration, control and monitoring. This message-passing protocol, called Simple Object<br />
Access Protocol (SOAP) relies on the standard Web protocol (HTTP) and encapsulates data using the eXtensible<br />
Markup Language (XML). SOAP is a means to exchange structured data in the form of XML-based messages<br />
among computers over HTTP.<br />
XDAQ uses SOAP for a concept called Remote Procedure Calls (RPC). This means that the SOAP message<br />
contains an XML tag that is associated with a function call, a so called callback, at the receiver side. That way a<br />
controller can execute procedures on remote XDAQ nodes.<br />
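A toy version of this tag-to-callback binding might look as follows (the registry is illustrative; in XDAQ the binding is performed by the framework, not by hand like this):<br />

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Sketch of the RPC idea: the XML tag carried by a SOAP message selects a
// registered callback on the receiver, which produces the reply payload.
using Callback = std::function<std::string(const std::string&)>;

class RpcDispatcher {
public:
    void bind(const std::string& tag, Callback cb) { callbacks_[tag] = std::move(cb); }
    std::string dispatch(const std::string& tag, const std::string& payload) const {
        auto it = callbacks_.find(tag);
        if (it == callbacks_.end()) return "fault: unknown command";
        return it->second(payload);  // invoke the callback bound to the tag
    }
private:
    std::map<std::string, Callback> callbacks_;
};
```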
<strong>The</strong> XDAQ framework is divided into three packages: Core Tools, Power Pack and Work Suite. <strong>The</strong> Core Tools<br />
package contains the main classes required to build XDAQ applications, the Power Pack package consists of<br />
pluggable components to build DAQ applications, and the Work Suite package contains additional infrastructure,<br />
totally independent of XDAQ, which is intended to perform some related data acquisition tasks.<br />
14 XDAQ component, module and application are equivalent concepts.
XDAQ example<br />
A XDAQ application is a C++ class which extends the base class xdaq::Application. It can be loaded into a<br />
XDAQ executive at run-time. Unlike ordinary C++ applications, a XDAQ application does not have a main()<br />
method as an entry point; instead, it has several methods to control specific aspects of its execution. Each of<br />
these methods can be assigned to a RPC in order to facilitate its remote execution.<br />
At startup, a XDAQ executive can be configured by passing the path of a configuration file as a command line<br />
argument. <strong>The</strong> configuration file contains the configuration information of the XDAQ executive. This file uses<br />
XML to hierarchically structure the configuration information in three levels:<br />
• Partition: Each configuration file contains exactly one partition that is a collection of XDAQ executives<br />
hosting XDAQ applications.<br />
• Context: Each context defines one XDAQ executive uniquely identified by its URL, which is composed of<br />
host name and port. A partition may contain an arbitrary number of contexts. <strong>The</strong> <xc:Module> tag inside the<br />
<xc:Context> tag specifies the location of the shared libraries that have to be loaded in order to make the applications<br />
available.<br />
• Application: <strong>The</strong> <xc:Application> tag uniquely identifies a XDAQ application. Each context can be<br />
composed of an arbitrary number of XDAQ applications. Applications can define properties using the<br />
<properties> tag. <strong>The</strong> application properties can be accessed at run-time.<br />
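The three-level structure above might look as follows. This is a schematic sketch only: the element names follow the XDAQ configuration schema as commonly documented, but the class names, identifiers, hosts and paths are placeholders, not the actual GT configuration:<br />

```xml
<!-- Schematic XDAQ configuration: one partition, two contexts.
     Attribute details may differ between XDAQ releases; paths are placeholders. -->
<xc:Partition xmlns:xc="http://xdaq.web.cern.ch/xdaq/xsd/2004/XMLConfiguration-30"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns:xsd="http://www.w3.org/2001/XMLSchema"
              xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/">
  <xc:Context url="http://host1:port1">
    <xc:Application class="SubsystemCell" id="13" instance="0" network="local">
      <properties xmlns="urn:xdaq-application:SubsystemCell" xsi:type="soapenc:Struct">
        <name xsi:type="xsd:string">GT</name>
      </properties>
    </xc:Application>
    <xc:Module>file://.../libCell.so</xc:Module>
  </xc:Context>
  <xc:Context url="http://host2:port2">
    <xc:Application class="TStore" id="14" instance="0" network="local"/>
  </xc:Context>
</xc:Partition>
```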
<strong>The</strong> cell is implemented as a XDAQ component or application. Figure 4-5 shows the configuration file of the<br />
Global <strong>Trigger</strong> (GT) cell. <strong>The</strong> GT cell runs on the first host and is configured with a number of properties. <strong>The</strong><br />
GT cell is compiled into one library located in the path given by the <xc:Module> tag. A second executive runs on a<br />
different host and contains one single application named Tstore.<br />
[Figure 4-5 listing: the XML markup of the configuration file was lost in extraction. The file defines one partition with two contexts: the first holds the GT cell application, whose properties include the name “GT” and a xhannel list file (“file://…”), together with a module tag pointing to the cell library (“file://…/libCell.so”); the second context holds the Tstore application.]<br />
Figure 4-5: Example of XDAQ configuration file: GT cell configuration file.<br />
4.4.4 <strong>Trigger</strong> <strong>Supervisor</strong> framework<br />
<strong>The</strong> TS framework is the software layer built on top of XDAQ and the external packages. This software layer<br />
fills the gap between XDAQ and a solution that copes with the project-related human factors (Section<br />
3.2.2, Point 7)), time constraints (Section 3.2.2, Point 5)) and the functional requirements not yet covered (Section<br />
3.2.1, Point 10)) discussed in the TS conceptual design. This solution has been developed according to the<br />
requirements discussed in Section 4.2 and the functional architecture presented in Section 4.3.<br />
<strong>The</strong> components of the TS framework can be divided into two groups: the TS core framework and the sub-system<br />
customizable components. <strong>The</strong> TS core framework is the main infrastructure used by the customizable<br />
components. Figure 4-6 shows a Unified Modeling Language (UML) diagram of the most important classes of<br />
the TS core framework and a possible scenario of derived or customizable sub-system classes.<br />
This section presents the structure of the classes contained in the TS framework. <strong>The</strong>ir description follows<br />
the same structure as the cell functional description presented in Section 4.3. <strong>The</strong> implementation of<br />
each functional module is described as a collaboration of classes using the UML. <strong>The</strong> main classes that<br />
collaborate to form the cell functional modules are contained in the CellFramework package. This section<br />
also presents a number of packages developed specifically for this project: the CellToolbox package and a new<br />
library designed and developed to implement the TS Graphical User Interface. Finally, the database interfaces<br />
and the integration of the XDAQ monitoring and logging infrastructures are presented.<br />
4.4.4.1 <strong>The</strong> cell<br />
A SubsystemCell class (or sub-system cell) is a C++ class that inherits from the CellAbstract class, which in<br />
turn is a descendant of the xdaq::Application class. <strong>The</strong> fact that a sub-system cell is a XDAQ application<br />
allows the sub-system cell to be added to a XDAQ partition, thus making it browsable through the XDAQ<br />
HTTP/CGI interface. <strong>The</strong> XDAQ SOAP Remote Procedure Call (RPC) interface is also available to the sub-system<br />
cell. <strong>The</strong> RPC interface, implemented in the CellAbstract class, allows remote usage of the cell<br />
operations and commands. <strong>The</strong> CellAbstract class is also responsible for the dynamic creation of<br />
communication channels between the cell and external services, also known as “xhannels”. <strong>The</strong> xhannel run-time<br />
setup is done according to an XML file known as the “xhannel list”. <strong>The</strong> CellAbstract class implements a GUI,<br />
accessible through the XDAQ HTTP/CGI interface, which can be extended with custom graphical setups called<br />
“control panels”.<br />
[Figure 4-6 diagram: UML class diagram of the TS core framework (xdaq::Application; CellAbstract with addCommand(), addOperation() and addChannel(); CellAbstractContext; CellCommandPort with run(msg); CellXhannel and CellXhannelRequest; the CellCommandFactory, CellOperationFactory and CellPanelFactory; CellCommand, CellOperation, CellPannel and CellWarning; CellToolbox, Ajaxell and DataSource) together with the sub-system customizable classes (SubsystemCell, SubsystemContext, SubsystemOperation, SubsystemCommand, SubsystemPanel, the sub-system monitoring handlers and the sub-system OSWI).]<br />
Figure 4-6: Components of the TS framework and sub-system customizable classes.<br />
4.4.4.2 Cell command<br />
A cell command, presented in Section 4.3.2, is an internal method of the cell that can be executed by an external<br />
entity or controller. <strong>The</strong>re are a few default commands that allow a controller to remotely instantiate, control and<br />
kill cell operations. <strong>The</strong>se commands are presented in the following section. It is also possible to extend the<br />
default cell commands with sub-system specific ones.<br />
Figure 4-7 shows a UML diagram of the TS framework components involved in the creation of the cell<br />
command concept. <strong>The</strong> CellCommand class inherits from the CellObject class which provides access to the<br />
CellAbstractContext object and to the Logger object. <strong>The</strong> CellAbstractContext object is a shared object<br />
among all instances of CellObject in a given cell, in particular among all CellCommand and CellOperation<br />
instances. <strong>The</strong> CellAbstractContext provides access to the factories and to the xhannels.<br />
[Figure 4-7 diagram: CellObject, holding a Logger and the CellAbstractContext (xhannels, factories), is specialized by CellCommand (paramList, run(), virtual init(), code() and precondition()) with an associated CellWarning (message, level); CellSubsystemCommand derives from CellCommand and uses the SubsystemContext with its hardware driver.]<br />
Figure 4-7: UML diagram of the main classes involved in the creation of the cell command concept.<br />
Through a dynamic cast, it is also possible to access a sub-system specific descendant of the CellAbstractContext class (or just cell<br />
context). In some cases, the sub-system cell context gives access to a sub-system hardware driver. <strong>The</strong>refore, all<br />
CellCommand and CellOperation instances can control the hardware. <strong>The</strong> CellObject interface also facilitates<br />
access to the logging infrastructure through the logger object. Each CellCommand or CellOperation object has a<br />
CellWarning object.<br />
<strong>The</strong> CellCommand has one public method named run(). When this method is called, a sequence of three virtual<br />
methods is executed. <strong>The</strong>se virtual methods have to be implemented in the specific CellSubsystemCommand<br />
class: 1) the init() method initializes those objects that will be used in the precondition() and code()<br />
methods (Section 4.3.2); 2) the precondition() method checks the necessary conditions to execute the<br />
command; and 3) the code() method defines the functionality of the command. <strong>The</strong> warning message and level<br />
can be read or written within any of these methods. Finally, the run() method returns the reply SOAP message<br />
which embeds a serialized version in XML of the code() method result and warning objects.<br />
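The run() sequence can be sketched as a template method (the classes below are illustrative stand-ins, not the TS framework itself, and the warning level for a failed precondition is an arbitrary example value):<br />

```cpp
#include <cassert>
#include <string>

// Illustrative re-creation of the run() sequence: run() chains init(),
// precondition() and code(); the reply carries the result together with
// the warning message and level.
class Command {
public:
    virtual ~Command() = default;
    std::string run() {                 // template method, as in CellCommand::run()
        init();
        if (!precondition()) {
            warningLevel = 1000;        // example level for a failed check
            warningMessage = "precondition failed";
            return "";
        }
        return code();
    }
    std::string warningMessage;
    int warningLevel = 0;
protected:
    virtual void init() {}
    virtual bool precondition() { return true; }
    virtual std::string code() = 0;     // the command's actual functionality
};

// A toy sub-system command: echoes its parameter if it is non-empty.
class EchoCommand : public Command {
public:
    explicit EchoCommand(std::string p) : param_(std::move(p)) {}
protected:
    bool precondition() override { return !param_.empty(); }
    std::string code() override { return "echo:" + param_; }
private:
    std::string param_;
};
```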
4.4.4.3 Cell operation<br />
Figure 4-8 shows a UML diagram of the TS framework components involved in the creation of the cell operation<br />
concept.<br />
[Figure 4-8 diagram: CellOperation inherits from CellObject (which holds a Logger and the CellAbstractContext with its xhannels and factories) and from toolbox::lang::class; it holds a paramList, the apply() and virtual initFSM() methods, a CellFSM and a CellWarning (message, level); the predefined commands OpInit, OpSendCommand, OpGetState, OpReset and OpKill act on it; CellSubsystemOperation derives from CellOperation and uses the SubsystemContext with its hardware driver.]<br />
Figure 4-8: UML diagram of the TS framework components involved in the creation of the cell operation<br />
concept.
Like the CellCommand class, the CellOperation class is a descendant of the CellObject class. <strong>The</strong>refore, it has<br />
access to the logger object and to the cell context. <strong>The</strong> CellOperation class inherits also from<br />
toolbox::lang::class. This XDAQ class facilitates a loop that will run in an independent thread executing a<br />
concrete job defined in the CellOperation::job() method. This is known as the “cell operation work-loop”.<br />
An important member of the CellOperation class is the CellFSM attribute. This attribute implements the FSM<br />
defined in Section 4.3.1. <strong>The</strong> initialization code of the CellFSM class is defined in the initFSM() method of the<br />
CellSubsystemOperation class. This method defines the states, the transitions and the (f_i, c_i) methods associated with<br />
each transition.<br />
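A toy finite state machine in this spirit might register transitions with their associated methods like this (illustrative and much simpler than CellFSM; state and event names are examples):<br />

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy FSM: initFSM()-style code registers transitions keyed by
// (state, event), each with an action callback, and fireEvent()
// moves the machine along.
class Fsm {
public:
    void addTransition(const std::string& from, const std::string& event,
                       const std::string& to, std::function<void()> action) {
        transitions_[{from, event}] = {to, std::move(action)};
    }
    void setState(const std::string& s) { state_ = s; }
    const std::string& state() const { return state_; }
    bool fireEvent(const std::string& event) {
        auto it = transitions_.find({state_, event});
        if (it == transitions_.end()) return false;  // event not allowed here
        it->second.second();                         // run the transition method
        state_ = it->second.first;
        return true;
    }
private:
    std::map<std::pair<std::string, std::string>,
             std::pair<std::string, std::function<void()>>> transitions_;
    std::string state_;
};
```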
An external controller can interact with the CellOperation infrastructure through a set of predefined cell<br />
commands: OpInit, OpSendCommand, OpGetState, OpReset and OpKill. <strong>The</strong> OpInit::code() method triggers in<br />
the cell the creation of a new CellOperation object. Once the CellOperation object is created, the operation<br />
work-loop starts. This work-loop periodically reads new events from a given queue. If a new<br />
event arrives, it is passed to the CellFSM object. This queue avoids losing any event and assures that the<br />
events are served in order. <strong>The</strong> rest of the predefined commands are considered events acting on existing operation<br />
objects. <strong>The</strong>refore, the code() method of these commands just pushes the command itself onto the operation<br />
queue.<br />
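The queueing idea can be sketched as follows (single-threaded for brevity; in the real cell the draining loop is the dedicated work-loop thread, and the events would be handed to the FSM rather than collected):<br />

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Sketch of the operation work-loop idea: commands arriving for an existing
// operation are only queued; a loop pops them one by one, so no event is
// lost and the arrival order is preserved.
class OperationQueue {
public:
    void push(const std::string& event) { events_.push(event); }  // from Op* commands
    // Drain the queue in arrival order, returning the processing sequence.
    std::vector<std::string> drain() {
        std::vector<std::string> served;
        while (!events_.empty()) {
            served.push_back(events_.front());  // hand the event to the FSM here
            events_.pop();
        }
        return served;
    }
private:
    std::queue<std::string> events_;
};
```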
4.4.4.4 Factories, pools and plug-ins<br />
Figure 4-9 shows the components involved in the creation of the factory, the pool and the plug-in concepts.<br />
<strong>The</strong>re are three types of factories: command, operation and panel factories. <strong>The</strong> factories are responsible for<br />
controlling the creation, destruction and operation of the respective items (operations, commands or control<br />
panels). Sub-system specific commands, operations and panels are also called plug-ins. <strong>The</strong> available<br />
commands, operations and panels in the factories can be extended at run-time using the CellAbstract::add()<br />
method.<br />
[Figure 4-9 diagram: CellAbstractContext aggregates the CellOperationFactory (createFromOperation(), add()), the CellCommandFactory (createFromCommand(), add()) and the CellPanelFactory (createPanel(), add()), which instantiate CellOperation, CellCommand and CellPannel objects; CellAbstract offers addCommand(), addOperation() and addChannel(); the sub-system side mirrors this with SubsystemContext, SubsystemOperation, SubsystemCommand, SubsystemPanel and SubsystemCell.]<br />
Figure 4-9: TS framework components involved in the creation of the factory, the pool and the plug-in<br />
concepts.
<strong>The</strong> factories also play the role of pools. Each factory keeps track of the created objects and is responsible for<br />
assigning a unique identifier to each of them. After the object creation, this identifier is embedded in the reply<br />
SOAP message and sent back to the controller (Section 4.4.4.5 and Appendix A).<br />
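Combining the factory, pool and plug-in roles, a sketch might look like this (illustrative names; the real factories return objects rather than bare identifiers and embed the identifier in the SOAP reply):<br />

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Sketch of the factory/pool/plug-in ideas: plug-ins register a creator
// under a name at run time; the factory creates objects on request, keeps
// track of them, and hands out unique identifiers.
struct Operation { virtual ~Operation() = default; };

class OperationFactory {
public:
    void add(const std::string& name, std::function<std::unique_ptr<Operation>()> make) {
        creators_[name] = std::move(make);      // run-time extension, like add()
    }
    // Create an instance and return the identifier sent back to the controller.
    int createFromOperation(const std::string& name) {
        auto it = creators_.find(name);
        if (it == creators_.end()) return -1;   // unknown plug-in
        int id = nextId_++;
        pool_[id] = it->second();               // the factory also acts as a pool
        return id;
    }
    std::size_t poolSize() const { return pool_.size(); }
private:
    std::map<std::string, std::function<std::unique_ptr<Operation>()>> creators_;
    std::map<int, std::unique_ptr<Operation>> pool_;
    int nextId_ = 1;
};
```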
4.4.4.5 Controller interface<br />
Figure 4-10 shows the components involved in the creation of the cell Controller Interface (CI). As shown in<br />
Section 4.4.4.1, sub-system cells are XDAQ applications and are therefore able to expose both a HTTP/CGI and<br />
a SOAP interface. <strong>The</strong> cell HTTP/CGI interface is defined in the CellAbstract class by overriding the<br />
default() virtual method of the xdaq::Application class. This method parses the input HTTP/CGI request<br />
which is available as a Cgicc input argument (Section 4.4.2.6). <strong>The</strong> HTTP/CGI response is written into the<br />
Cgicc output argument at the end of the default() method and is sent back by the executive to the browser. <strong>The</strong><br />
TS GUI is presented in Section 4.4.4.11.<br />
[Figure 4-10 diagram: CellAbstract, derived from xdaq::Application and associated with the CellAbstractContext, exposes addCommand(), addOperation() and addChannel() and the callbacks xoap::MessageReference guiResponse(xoap::MessageReference msg), xoap::MessageReference command(xoap::MessageReference msg) and void Default(xgi::Input* in, xgi::Output* out); it uses Ajaxell, and SubsystemCell derives from it.]<br />
Figure 4-10: Components involved in the creation of the controller interface.<br />
A second interface is the SOAP interface. A non-customized cell is able to serve the default commands, which<br />
allow a controller to instantiate, control and kill cell operations. <strong>The</strong> cell SOAP interface and the callback routine assigned<br />
to each SOAP command are defined in the CellAbstract class. This interface is enlarged when a new command<br />
is added using the CellAbstract::addCommand() method.<br />
All SOAP commands are served by the same callback method CellAbstract::command(). This method uses the<br />
CommandFactory object to create a CellCommand object and executes the command public method<br />
CellCommand::run() (Section 4.4.4.2). <strong>The</strong> SOAP message object returned by the run() method is forwarded<br />
by the executive to the controller. Section 4.4.4.6 discusses in more detail the implementation of the synchronous<br />
and asynchronous interaction with the controller, and Appendix A presents the SOAP API from the controller<br />
point of view.<br />
4.4.4.6 Response control module<br />
Figure 4-11 shows a UML diagram of the classes involved in the implementation of the Response Control<br />
Module (RCM). <strong>The</strong> RCM implements the details of the communication protocols with a cell client or controller.<br />
A given controller has two possible ways to interact with the cell: synchronous and asynchronous (Appendix A).<br />
When the controller requests a synchronous execution of a cell command, it assumes that the reply message will<br />
be sent back once the command execution has finished. <strong>The</strong> second way to interact with the cell is the<br />
asynchronous one. In this case, an empty acknowledge message is sent back immediately to the controller<br />
and a second message is sent back when the execution of the command is completed. <strong>The</strong><br />
asynchronous protocol allows implementing cell clients with an improved response time and facilitates the multi-user<br />
(or multi-client) functional requirement outlined in Section 3.2.1, Point 10). <strong>The</strong> asynchronous protocol<br />
facilitates the multi-user interface because the single-user SOAP interface provided by the XDAQ executive is<br />
freed immediately. However, the synchronous protocol is interesting for a controller that wants to<br />
block the access to a given cell whilst it is using the cell.<br />
[Figure 4-11 diagram: CellAbstract (addCommand(), addOperation(), addChannel()) uses the CellCommandFactory held by the CellAbstractContext to instantiate CellCommand objects; CellCommandPort exposes run(msg); CellCommand inherits from CellObject and toolbox::lang::class, and a SoapMessenger with send(xoap::message) delivers the replies.]<br />
Figure 4-11: UML diagram of the classes involved in the implementation of the Response Control Module.<br />
It was shown in Section 4.4.4.5 that all SOAP commands are served by the same callback routine defined in the<br />
method CellAbstract::command(). This method uses the CommandFactory object to create a CellCommand<br />
object and then executes the method CellCommand::run() which returns the SOAP reply message (Section<br />
4.4.4.2). In the synchronous case, the CellCommand::run() method returns just after executing the code()<br />
method. In the asynchronous case, the CellCommand::run() method returns immediately after starting the<br />
execution of the code() method which continues running in a dedicated thread. <strong>The</strong> asynchronous SOAP reply<br />
message is sent back to the controller by this thread when the code() method finishes. <strong>The</strong> thread is facilitated<br />
by the cell command inheritance from the toolbox::lang::class class. Figure 4-12 shows a simplified<br />
sequence diagram of the interaction between a controller and a cell using synchronous and asynchronous SOAP<br />
message protocols.
[Figure 4-12 diagram: in the asynchronous exchange, the controller sends a SOAP message (async=true, cid=xyz) to the cell, which creates CellCommand 1, immediately returns a SOAP acknowledge reply (ack, cid=xyz) and later sends the SOAP reply with the result (result, cid=xyz); in the synchronous exchange, the controller sends a SOAP message (async=false, cid=xyz), the cell creates CellCommand 2 and returns the SOAP reply with the result only when run() has finished.]<br />
Figure 4-12: Simplified sequence diagram of the interaction between a controller and a cell using<br />
synchronous and asynchronous SOAP messages.<br />
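The two protocols can be sketched as follows (illustrative; the real implementation exchanges SOAP messages and obtains the thread through toolbox::lang::class, neither of which is modeled here):<br />

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <thread>

// Sketch of the two reply protocols for a single command: synchronously,
// the caller blocks until code() finishes; asynchronously, an
// acknowledgement returns immediately and the real reply is delivered
// later through a callback, from a dedicated thread.
std::string runSync(const std::function<std::string()>& code) {
    return code();                      // reply only when execution is done
}

std::string runAsync(const std::function<std::string()>& code,
                     const std::function<void(const std::string&)>& onReply,
                     std::thread& worker) {
    worker = std::thread([code, onReply] { onReply(code()); });
    return "ack";                       // immediate acknowledgement message
}
```

The caller of runAsync() must eventually join the worker thread; in the cell, the thread itself sends the second SOAP reply and then terminates.<br />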
4.4.4.7 Access control module<br />
<strong>The</strong> Access Control Module (ACM) is not implemented in version 1.3 of the TS framework, although a<br />
placeholder is available. <strong>The</strong> run() method of the CellCommandPort object (Figure 4-6) is meant to hide the access<br />
control complexity.<br />
4.4.4.8 Error management module<br />
<strong>The</strong> Error Management Module (EMM) catches all software exceptional situations not handled in the command<br />
and operation transition methods. When such a method is executed due to a synchronous request message, the<br />
CellAbstract::command() method is responsible for catching any software exception. If one is caught, the<br />
method builds the reply message with the warning level equal to 3000 (Appendix A) and the warning message<br />
specifying the software exception. When the command or operation transition method is executed after an<br />
asynchronous request, all possible exceptions are caught in the same thread where the code() method runs. In<br />
this second case, the thread itself builds the reply message with the adequate warning information.<br />
In case the cell dies during the execution of a given synchronous request, this will be detected on the client side<br />
because the socket connection between the client and cell would be broken. If the request is sent in asynchronous<br />
mode, the request message is sent through a socket which is closed just after receiving the acknowledge<br />
message. In this case, the reply message is sent through a second socket opened by the cell. <strong>The</strong>refore, the client<br />
is not automatically informed if the cell dies, and it is the client’s responsibility to implement a time-out or a<br />
periodic “ping” routine to check that the cell is still alive.<br />
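A client-side time-out of the kind suggested here might be sketched as follows (illustrative, not part of the TS framework): the client waits on a condition variable for the second reply and gives up after a deadline instead of hanging forever if the cell died.<br />

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>

// Sketch of a client waiting for the asynchronous reply with a time-out.
class ReplyWaiter {
public:
    // Called when the cell's second message (the real reply) arrives.
    void deliver(const std::string& reply) {
        std::lock_guard<std::mutex> lock(m_);
        reply_ = reply;
        arrived_ = true;
        cv_.notify_all();
    }
    // Returns true if a reply arrived before the time-out expired.
    bool waitFor(std::chrono::milliseconds timeout, std::string& out) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return arrived_; })) return false;
        out = reply_;
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool arrived_ = false;
    std::string reply_;
};
```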
4.4.4.9 Xhannel<br />
<strong>The</strong> xhannel infrastructure was implemented to simplify the access from a cell to external web service providers<br />
(SOAP, HTTP, etc.), such as other cells. <strong>The</strong> cell xhannels are designed to hide the concrete details of<br />
the remote service provider protocol and to provide a homogeneous and simple interface. This infrastructure<br />
eases decoupling the development of external services from the cell customization process.<br />
Four different xhannels are provided: CellXhannelCell, the xhannel to other cells; CellXhannelTB, the xhannel to<br />
Oracle-based relational databases; CellXhannelXdaqSimple, the xhannel to access XDAQ applications through<br />
a SOAP interface; and CellXhannelMonitor, the xhannel to access monitoring information collected in a XDAQ<br />
collector. Table 4-1 outlines the purpose of each of the xhannels.<br />
Xhannel class name: Purpose (external service)<br />
CellXhannelCell: To interact with other cells (Section 4.4.4.9.1)<br />
CellXhannelTB: To interact with a Tstore application (Section 4.4.4.9.2)<br />
CellXhannelXdaqSimple: To interact with a XDAQ application<br />
CellXhannelMonitor: To interact with a monitor collector (Section 4.4.4.12)<br />
Table 4-1: Cell xhannel types and their purpose.<br />
Each CellXhannel class has an associated CellXhannelRequest class. <strong>The</strong> CellXhannel classes are in charge of<br />
hiding the sending and receiving process, whilst the CellXhannelRequest classes are in charge of creating the<br />
SOAP or HTTP request messages and of parsing the replies. All concrete xhannel and request classes inherit<br />
from the CellXhannel and CellXhannelRequest base classes, respectively.<br />
4.4.4.9.1 CellXhannelCell<br />
<strong>The</strong> CellXhannelCell class provides access to the services offered by remote cells, whilst the<br />
CellXhannelRequestCell class is used to create the SOAP request messages and to parse the replies. <strong>The</strong><br />
CellXhannelCell class can handle both synchronous and asynchronous interaction modes. <strong>The</strong> asynchronous<br />
reply is caught because the CellXhannelCell is also a XDAQ application which is loaded in the same executive<br />
as the cell. A callback method in charge of processing all the asynchronous replies assigns them to the<br />
corresponding CellXhannelRequestCell object.<br />
A usage example is shown in Figure 4-13. First, the CellXhannel pointer is obtained from the CellContext.<br />
Second, the CellXhannel object is used to create the request and the message (line 5). Third, the request is sent<br />
to the remote cell using the CellXhannelCell (line 7). And finally, when the reply is received (line 12), the<br />
request is destroyed (line 16).<br />
<strong>The</strong> definition of all available xhannels in a cell is made in a XML configuration file called “xhannel list”. When<br />
the cell is started up, this file is processed and the xhannel objects are attached to the CellContext. Figure 4-14<br />
shows an example of an xhannel list file. <strong>The</strong> xhannel list should be referenced from the sub-system configuration<br />
file as shown in Figure 4-5.
Implementation 55<br />
1 CellXhannelCell* pXC = dynamic_cast<CellXhannelCell*>(contextCentral->getXhannel("GT"));<br />
2 CellXhannelRequestCell* req = dynamic_cast<CellXhannelRequestCell*>(pXC->createRequest());<br />
3 map param;<br />
4 bool async = true;<br />
5 req->doCommand(currentSid_, async, "checkTriggerKey", param);<br />
6 try {<br />
7 pXC->send(req);<br />
8 } catch (xcept::Exception& e) {<br />
9 pXC->remove(req);<br />
10 XCEPT_RETHROW(CellException, "Error sending request to Xhannel GT", e);<br />
11 }<br />
12 while (!req->hasResponse()) sleepmillis(100);<br />
13 try {<br />
14 LOG4CPLUS_INFO(getLogger(), "GT key is " + req->commandReply()->toString());<br />
15 } catch (xcept::Exception& e) {<br />
16 pXC->remove(req);<br />
17 XCEPT_RETHROW(CellException, "Parsing error in the GT reply", e);<br />
18 }<br />
19 pXC->remove(req);<br />
Figure 4-13: Example of how to use the xhannel to send SOAP messages to the GT cell.<br />
[XML listing omitted: the markup was lost in extraction. <strong>The</strong> file defines three xhannels: “DB” of type tstore, “MON” of type monitor and “GT” of type cell.]<br />
Figure 4-14: Example of xhannel list file. This file corresponds to the central cell of the TS system and defines<br />
xhannels to the monitor collector, to a Tstore application and to the GT cell.<br />
4.4.4.9.2 CellXhannelTB<br />
<strong>The</strong> CellXhannelTB class is another instance of the xhannel infrastructure. It simplifies the development of<br />
command and operation transition methods that need to interact with an Oracle database server.<br />
[Diagram omitted: cells 1 to 3, each in its own XDAQ executive, use a CellXhannelTB (SOAP) to reach a Tstore application running in a further XDAQ executive, which accesses the Oracle DB through OCCI.]<br />
Figure 4-15: Recommended architecture to access a relational database from a cell.<br />
CellXhannelTB provides read and write (insert and update) access to the database. Figure 4-15 shows the<br />
recommended architecture to access a relational database from a cell using this communication channel.<br />
<strong>The</strong> CellXhannelTB sends SOAP requests to an intermediate XDAQ application named Tstore, which is<br />
delivered with the XDAQ Power Pack package. Tstore allows reading and writing XDAQ table structures in an<br />
Oracle relational database. Tstore is the agreed solution for the <strong>CMS</strong> experiment as intermediate node between<br />
the sub-system online software and the central <strong>CMS</strong> database server. It is designed to efficiently manage<br />
multiple connections with a central database server. <strong>The</strong> communication between Tstore and the server uses the<br />
Oracle C++ Call Interface (OCCI).<br />
4.4.4.10 CellToolbox<br />
<strong>The</strong> CellToolbox package contains a number of classes intended to simplify the implementation of the cell.<br />
Table 4-2 presents the CellToolbox class list.<br />
Class name: Functionality<br />
CellException: Definition of the TS framework exception<br />
CellToolbox: Several methods to create and parse SOAP messages<br />
CellLogMacros: Macros to insert log statements<br />
HttpMessenger: To send a HTTP request<br />
SOAPMessenger: To send a SOAP message<br />
Table 4-2: Class list of the CellToolbox package.<br />
4.4.4.11 Graphical User Interface<br />
When a XDAQ executive is started up, a number of core components are loaded in order to provide basic<br />
functionalities. One of the main core components is Hyperdaq. It facilitates a web interface which turns an<br />
executive into a browsable web application able to provide access to the internal data structure of any XDAQ<br />
application loaded in the same executive [86]. Any XDAQ application can customize its own web interface by<br />
overriding the default() virtual method of the xdaq::Application class (4.4.4.5). <strong>The</strong> web interface<br />
customization process requires developing Hypertext Markup Language (HTML) and JavaScript [88] code<br />
embedded in C++. Mixing three languages in the same code carries a learning-curve cost: developers must<br />
learn two new languages, their syntax and best practices, as well as the testing and debugging methodology<br />
using a web browser.
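The mixing of languages described above can be illustrated with a self-contained sketch. A default()-style web-interface callback typically assembles HTML and JavaScript as C++ strings; renderStatusPage below is a made-up helper, not a TS framework or XDAQ method:

```cpp
#include <sstream>
#include <string>

// Build the HTML page a web-interface callback would stream back to the
// browser: C++ string handling, HTML structure and JavaScript behaviour
// all live in the same function.
std::string renderStatusPage(const std::string& cellName, int activeOperations) {
    std::ostringstream out;
    out << "<html><head><title>" << cellName << "</title>\n"
           "<script>function refreshPage() { window.location.reload(); }</script>\n"
           "</head><body>\n"
        << "<h1>Cell " << cellName << "</h1>\n"
        << "<p>Active operations: " << activeOperations << "</p>\n"
           "<button onclick=\"refreshPage()\">Refresh</button>\n"
           "</body></html>";
    return out.str();
}
```

Even in this short example, the developer must keep the C++ escaping, the HTML nesting and the JavaScript syntax consistent at the same time, which is exactly the cost the Ajaxell library is meant to remove.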
[Screenshot annotations: command execution control; operation execution control; possible events; fish-eye interface (logging, configuration database, support, …); control panels; operation parameters; monitoring information visualization; operation FSM.]<br />
Figure 4-16: Screenshot of the TS GUI. <strong>The</strong> GUI is accessible from a web browser and integrates the many<br />
services of the cell in a desktop-like fashion.<br />
Ajaxell [89] is a C++ library intended to smooth this learning curve. It provides a set of graphical<br />
objects named widgets, such as sliding windows, drop-down lists, tabs, buttons, dialog boxes and so on.<br />
<strong>The</strong>se widgets ease the development of web interfaces with a look-and-feel and responsiveness similar to the<br />
stand-alone tools executed locally or through remote terminals (Java Swing, Tcl/Tk or C++ Qt; see Section<br />
1.4.4). <strong>The</strong> web interface of the cell implemented in the CellAbstract::default() method uses the Ajaxell<br />
library. This is an out-of-the-box solution which does not require any additional development by the sub-systems.<br />
Figure 4-16 shows the TS GUI. It provides several controls: i) to execute cell commands; ii) to<br />
initialize, operate, and kill cell operations; iii) to visualize monitoring information retrieved from a monitor<br />
collector; iv) to access the logging record for audit trails and post-mortem analysis; v) to populate the L1<br />
trigger configuration database; vi) to request support; and vii) to download documentation.<br />
<strong>The</strong> cell web interface fulfills the requirement of automating the generation of a graphical user interface (Section<br />
4.2.2). <strong>The</strong> default TS GUI can be extended with “control panels”. A control panel is a sub-system specific<br />
graphical setup, normally intended for expert operations of the sub-system hardware. <strong>The</strong> control panel<br />
infrastructure allows developing expert tools with the TS framework. This possibility opens the door for the<br />
migration of existing standalone tools (Section 1.4.4) to control panels, and therefore contributes to the<br />
harmonization of the underlying technologies for both the expert tools and the TS. This homogeneous<br />
technological approach has the following benefits: i) smoothing the learning curve of the operators, ii)<br />
simplifying the overall L1 trigger OSWI maintenance, and iii) enhancing the sharing of code and<br />
experience.<br />
<strong>The</strong> implementation of a sub-system control panel is equivalent to developing a SubsystemPanel class which<br />
inherits from the CellPanel class (Figure 4-6). This development consists of defining the<br />
SubsystemPanel::layout() method following the guidelines of the TS framework user’s guide and using the<br />
widgets of the Ajaxell library [90]. <strong>The</strong> example of the Global <strong>Trigger</strong> control panel is presented in Section<br />
6.5.1.<br />
4.4.4.12 Monitoring infrastructure<br />
<strong>The</strong> monitoring infrastructure allows the users of a distributed control system implemented with the TS<br />
framework to be aware of the state of the cells or of any of their components (e.g. CellContext, CellOperation,<br />
etc.). Once a monitoring item is declared and defined for one of the cells, it can be retrieved from any node of the
system. <strong>The</strong> TS framework uses the monitoring infrastructure of XDAQ plus one additional class<br />
(DataSource) to assist in the definition of the code that updates the monitoring data. <strong>The</strong> monitoring<br />
infrastructure has the following characteristics:<br />
• An interface to declare and define monitoring items (integers, strings and tables).<br />
• Centralized collection of monitoring data coming from monitoring items that belong to different cells of the<br />
distributed system.<br />
• An HTTP/CGI interface in the central collector for consumers of monitoring data.<br />
• Visualization of monitoring item history through tables and graphs from the GUI of any cell.<br />
4.4.4.12.1 Model<br />
<strong>The</strong> XDAQ monitoring model is no longer based on FSMs as proposed in Section 3.3.3.4. Figure 4-17 shows a<br />
distributed monitoring system implemented with the TS framework. A central node known as monitor collector<br />
polls the monitoring information from each of the cells that have an associated monitor sensor. <strong>The</strong> monitor sensor<br />
forwards the requests to the cell and sends the updated monitoring information back to the collector. <strong>The</strong><br />
collector is responsible for storing this information and for providing a HTTP/CGI interface. <strong>The</strong> GUIs of the<br />
cells use the collector interface to read updated monitoring information from any cell.<br />
4.4.4.12.2 Declaration and definition of monitoring items<br />
<strong>The</strong> creation of monitoring items for a given cell consists of the monitoring items declaration and the monitoring<br />
update code definition. <strong>The</strong> declaration of a new monitoring item is accomplished by declaring this item in a<br />
XML file called “flashlist”. One of these files exists per cell. <strong>The</strong> declaration step also requires inserting the path<br />
to this file in the configuration file of the corresponding monitor sensor application and also of the central<br />
collector (Figure 4-18). <strong>The</strong> definition step consists of creating the update code of the monitoring items using the<br />
DataSource class. <strong>The</strong> following sections present one example.<br />
[Diagram omitted: cells with attached monitor sensors (one connected to an external system over PCI-to-VME), a central monitor collector with Mstore, Tstore and the monitoring DB (OCCI); sensors are polled over SOAP and data is served over HTTP.]<br />
Figure 4-17: Distributed monitoring system implemented in the TS framework. <strong>The</strong> monitor collector polls<br />
the cell sensor through the sensor SOAP interface, and the system cells read monitoring data stored in the<br />
collector using the HTTP/CGI interface.
[XML listing omitted: the markup was lost in extraction. <strong>The</strong> surviving content references the Subsystem cell, a flag set to true and the flashlist path ${XDAQ_ROOT}/trigger/subsystem/ts/client/xml/flashlist1.xml.]<br />
Figure 4-18: Sub-system cell configuration file configures cell sensor with one flashlist named flashlist1.xml.<br />
Declaration<br />
Figure 4-19 presents an example of a flashlist. This file declares three monitoring items: item1 of type string,<br />
item2 of type int (integer) and table of type table. <strong>The</strong> monitoring items belong to the items group (or<br />
“infospace”) named monitorsource (see below: definition of monitoring items). <strong>The</strong> name of the infospace is<br />
the same as the name of the DataSource descendant class that is used to define the update code of the monitoring<br />
items.<br />
A dedicated tag in the flashlist embeds the definition of the parameters that the monitor<br />
collector will use to poll monitoring information from the sensors. <strong>The</strong> most important attributes are:<br />
• Attribute every: Defines the sampling period in seconds.<br />
• Attribute history: If true, the monitor collector stores the history of past values.<br />
• Attribute range: Defines the size of the monitoring history in time units.<br />
Definition<br />
<strong>The</strong> classes involved in the definition of the monitoring item are shown in the UML diagram of Figure 4-20. <strong>The</strong><br />
monitor collector is responsible for periodically sending SOAP messages to the cell sensors requesting updated<br />
monitoring data. Each monitor sensor translates the SOAP request into an internal event that is forwarded to all<br />
objects created inside a given XDAQ executive that belong to a descendant class of xdata::ActionListener.
<strong>The</strong> DataSource class is a descendant of xdata::ActionListener. It is therefore able to process the incoming<br />
events by overriding the actionPerformed(xdata::Event&) method. This method is responsible for executing<br />
the MonitorableItem::refresh() method which gets the updated value for the monitoring item. A sub-system<br />
specific descendant of the DataSource is meant to contain the refresh methods for each of the monitoring items<br />
of the cell. <strong>The</strong> DataSource class is responsible also for creating the infospace object with the same name<br />
declared in the flashlist (Figure 4-19).<br />
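The registration-and-refresh pattern just described can be modelled in a few lines of self-contained C++. MonitorRegistry and its methods below are illustrative stand-ins for DataSource and MonitorableItem, not the framework API:

```cpp
#include <functional>
#include <map>
#include <string>

// Registry of named monitoring items, each paired with the callback that
// recomputes its value; refreshAll() plays the role of actionPerformed()
// reacting to a monitor sensor request.
class MonitorRegistry {
public:
    // Register an item together with the function that refreshes its value.
    void add(const std::string& name, std::function<std::string()> refresh) {
        items_[name] = std::move(refresh);
    }
    // Invoke every refresh handler and return the updated values.
    std::map<std::string, std::string> refreshAll() {
        std::map<std::string, std::string> values;
        for (auto& item : items_)
            values[item.first] = item.second();  // call the refresh handler
        return values;
    }
private:
    std::map<std::string, std::function<std::string()>> items_;
};
```

A sub-system would register one handler per item declared in its flashlist, typically reading the hardware driver kept in the cell context.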
[XML listing omitted: the markup was lost in extraction. <strong>The</strong> flashlist declares item1 (string), item2 (int) and table (table) in the monitorsource infospace.]<br />
Figure 4-19: Declaration of monitoring items using a flashlist.<br />
[UML diagram omitted: DataSource inherits from xdata::ActionListener and xdaq::Application; it holds monitorables_ (a std::map of MonitorableItem objects), infospaceName_ and infospace_, and implements actionPerformed(xdata::Event&). Each MonitorableItem holds name_, serializable_ and refreshFunctional_, and offers refresh(). CellAbstract, CellAbstractContext, SubsystemContext and the sub-system monitoring handlers complete the picture.]<br />
Figure 4-20: Components of the TS framework involved in the definition of monitoring items.
4.4.4.13 Logging infrastructure<br />
Each cell of a distributed control system implemented with the TS framework can send logging statements to a<br />
common logging database. Logging records can also be retrieved and visualized from any cell. Figure 4-21<br />
shows the logging model for a distributed control system implemented with the TS framework.<br />
<strong>The</strong> architecture of the data logging model consists of the following components:<br />
• Logging database: A relational database stores the logging information that is sent from the logging<br />
collector. <strong>The</strong> logging database is set up according to the schema proposed for the entire <strong>CMS</strong> experiment.<br />
• Logging collector: <strong>The</strong> logging collector is part of the R<strong>CMS</strong> framework (Section 4.4.2.7). It is a hub that<br />
accepts logging messages via UDP protocol 15 . <strong>The</strong> collector filters the logging messages by logging level, if<br />
necessary, and relays them to other applications, databases or other instances of logging collector.<br />
• Logging console: A XDAQ application named XS included with the Work Suite package (Section 4.4.3) is<br />
used as logging console to retrieve the logging information from the database. This application lists logging<br />
sessions according to their cell session identifier, i.e. the identifier of the session that a given<br />
controller has initiated with a distributed control system implemented with the TS framework. <strong>The</strong><br />
logging console is able to display the logging messages. In addition, the user can filter the logging messages<br />
of each session using keywords.<br />
• Logging Macros: <strong>The</strong> TS framework provides macros to notify a log from inside the command and<br />
operation transition methods. <strong>The</strong>se macros accept a cell session identifier, a logger object and a message<br />
string. <strong>The</strong> cell session identifier is accessible in any command and operation. <strong>The</strong> logger object is<br />
accessible from any class descendant of CellObject class (Section 4.4.4.2).<br />
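The record such a macro would emit can be sketched in a self-contained way as follows; formatLogRecord and the record layout are illustrative assumptions, not the framework's actual macro names or format:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Format one logging statement, stamped with the cell session identifier so
// the logging console can later group and filter records per session.
std::string formatLogRecord(const std::string& sessionId,
                            const std::string& logger,
                            const std::string& message) {
    std::ostringstream out;
    out << "[INFO][session " << sessionId << "][" << logger << "] " << message;
    return out.str();
}

// Macro in the spirit of the framework's logging macros: it takes the session
// identifier, a logger name and a message string.
#define CELL_LOG_INFO(sessionId, logger, message) \
    (std::cout << formatLogRecord((sessionId), (logger), (message)) << std::endl)
```

Inside a command or operation transition method, such a macro would be called with the session identifier and logger that the surrounding CellObject makes accessible.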
[Diagram omitted: cells send log statements over UDP to log collectors, which relay them to a Chainsaw console, an XML file and, through a further collector, to the logging DB (OCCI); the XS console reads the records over HTTP.]<br />
Figure 4-21: Logging model of a distributed control system implemented with the <strong>Trigger</strong> <strong>Supervisor</strong><br />
framework.<br />
15 User Datagram Protocol (UDP) is one of the core protocols (together with TCP) of the Internet protocol suite. Using UDP,<br />
programs on networked computers can send short messages sometimes known as datagrams to one another. UDP is<br />
sometimes called the Universal Datagram Protocol. UDP does not guarantee reliability or ordering in the way that the<br />
Transmission Control Protocol (TCP) does.
4.4.4.14 Start-up infrastructure<br />
<strong>The</strong> start-up infrastructure of the TS framework consists of one component, the job control (Section 1.4.1). This<br />
is a XDAQ application included as a component of the R<strong>CMS</strong> framework. <strong>The</strong> purpose of the job control<br />
application is to launch and terminate XDAQ executives. Job control is a small XDAQ application running on a<br />
XDAQ executive, which is launched at boot time. It exposes a SOAP API which allows launching further<br />
XDAQ executives, each with its own set of environment variables, and terminating them. A distributed system<br />
implemented with the TS framework has a job control application running at all times in every host of the<br />
cluster. In this context, a central process manager would coordinate the operation of all job control applications<br />
running in the cluster.<br />
4.5 Cell development model<br />
<strong>The</strong> TS framework, together with XDAQ and the external packages, forms the software infrastructure that<br />
facilitated the development of a single distributed software system to control and monitor all trigger sub-systems<br />
and sub-detectors. This section describes how to implement a cell to operate a given sub-system hardware. <strong>The</strong><br />
integration of this node into a complex distributed control and monitoring system is exemplified with the TS<br />
system presented in Chapter 5.<br />
[Diagram omitted: install framework, do cell, prepare cell context, prepare xhannels, then an iterative loop of do command, do operation, do monitoring item, do control panel, and compile &amp; test.]<br />
Figure 4-22: Usage model of the TS framework.<br />
Figure 4-22 schematizes the development model associated with the TS framework. It consists of a number of<br />
initial steps common to all control nodes, and an iterative process intended to customize the functionalities of the<br />
node according to the specific operation requirements.<br />
• Install framework: <strong>The</strong> TS and XDAQ frameworks have to be installed in the CERN Scientific Linux<br />
machine where the cell should run. <strong>The</strong> installation details are described in the <strong>Trigger</strong> <strong>Supervisor</strong><br />
framework user’s guide [63].<br />
• Do cell: Developing a cell consists of defining a class descendant of CellAbstract (Section 4.4.4.1).<br />
• Prepare cell context: <strong>The</strong> cell context, presented in Section 4.4.4.2, is an object shared among all<br />
CellObject objects that form a given cell. <strong>The</strong> CellAbstractContext object contains the Logger, the<br />
xhannels and the factories. <strong>The</strong> cell context can be extended in order to store sub-system specific shared<br />
objects like a hardware driver. To extend the cell context it is necessary to define a class descendant of<br />
CellAbstractContext (e.g. SubsystemContext in Figure 4-6). <strong>The</strong> cell context object has to be created in<br />
the cell constructor and assigned to the context_ attribute. <strong>The</strong> cell context attribute can be accessed from<br />
any CellObject object, for instance a cell command or operation.<br />
• Prepare xhannel list file: <strong>The</strong> preparation of the xhannel list consists of defining the external web service<br />
providers that will be used by the cell: other cells, Tstore application to access the configuration database or<br />
any other XDAQ application (Section 4.4.4.9). Once the cell is running, the xhannels are accessible through<br />
the cell context object.
Performance and scalability measurements 63<br />
• Do plug-in: Additional cell operations (Section 4.4.4.3), commands (Section 4.4.4.2), monitoring items<br />
(Section 4.4.4.12) and control panels (Section 4.4.4.11) can be gradually implemented when they are<br />
required. <strong>The</strong> details are described in the corresponding sections and in the TS framework user’s guide [63].<br />
4.6 Performance and scalability measurements<br />
This section presents performance and scalability measurements of the TS framework. This discussion focuses<br />
on the most relevant framework factors that affect the ability to build a distributed control system complex<br />
enough to cope with the operation of O(10^2) VME crates and assuming that each crate is directly operated by one<br />
cell. <strong>The</strong>se factors are the remote execution of cell commands and operations using the TS SOAP API (Appendix<br />
A). <strong>The</strong> measurements are neither meant to evaluate external developments (i.e. monitoring, database, logging<br />
and start-up infrastructures) nor the responsiveness of the TS GUI which was presented in [90].<br />
4.6.1 Test setup<br />
Timing and scalability tests have been carried out in the <strong>CMS</strong> PC cluster installed in the underground cavern.<br />
<strong>The</strong> tests ran on 20 identical rack-mounted PCs (Dell PowerEdge SC2850, 1U dual Xeon 3 GHz, hyper-threading<br />
and 64-bit capable) equipped with 1 GB memory and connected to the Gigabit Ethernet private<br />
network of the <strong>CMS</strong> cluster. All hosts run CERN Scientific Linux version 3.0.9 [91] with kernel version<br />
2.4.21.40.EL.cernsmp and version 1.3 of the <strong>Trigger</strong> <strong>Supervisor</strong> framework.<br />
<strong>The</strong> most relevant factors of the cell commands and operations are presented. In order to evaluate the scalability<br />
of each factor under test, five distributed control system configurations have been set up. Table 4-3<br />
summarizes the setups.<br />
Setup name | # of hosts | # of level-0 cells | # of level-1 cells | # of level-2 cells | Total # of cells | Notes<br />
Central | 1 | 1 | 0 | 0 | 1 |<br />
Central_10Level1 | 11 | 1 | 10 | 0 | 11 |<br />
Central_10Level1_10Level2 | 20 | 1 | 10 | 10 | 21 | Level-2 cells are all in the same branch<br />
Central_10Level1_20Level2 | 20 | 1 | 10 | 20 | 31 | Level-2 cells are distributed in 2 branches<br />
Central_10Level1_100Level2 | 20 | 1 | 10 | 100 | 111 | Level-2 cells are equally distributed in 10 branches<br />
Table 4-3: System configuration setups.<br />
Each table row specifies a test setup. A test setup consists of a number of cells organized in a hierarchical way.<br />
<strong>The</strong>re is always 1 level-0 cell or central cell which coordinates the operation of up to 10 level-1 cells and,<br />
depending on the setup, the level-1 cells also coordinate a number of level-2 cells. Figure 4-23 presents the example<br />
of the Central_10Level1_20Level2 setup architecture. This setup consists of 1 central cell, 10 level-1 cells<br />
controlled by the central cell, and 20 level-2 cells, 10 controlled by the first level-1 cell and 10 by the second.<br />
4.6.2 Command execution<br />
This section measures the remote execution of cell commands. This study has been carried out with the<br />
central_10Level1 setup. <strong>The</strong>se tests measure the necessary time for the central cell to remotely execute a number<br />
of commands in the first level-1 cell. Each measurement starts when the first request message is sent from the central<br />
cell and finishes when the last reply arrives.<br />
<strong>The</strong> first exercise measures the time to execute commands which have a code() method that does nothing.<br />
Figure 4-24 shows the test results.
[Diagram omitted: the central cell talks SOAP (CellXhannelCell) and HTTP/CGI to level-1 cells 1 to 10; level-1 cells 1 and 2 each control a branch of 10 level-2 cells.]<br />
Figure 4-23: Central_10Level1_20Level2 test setup architecture, consisting of 1 central cell, 10 level-1 cells<br />
controlled by the central cell, and 20 level-2 cells, 10 controlled by the first level-1 cell and 10 by the second.<br />
<strong>The</strong> first conclusion that can be extracted from Figure 4-24 is that in both synchronous and asynchronous<br />
communication cases, the execution time scales linearly. A second conclusion is that there is a small time<br />
overhead due to the asynchronous protocol. For instance, the execution of 256 commands in synchronous mode<br />
takes 1.81 seconds whilst the execution of the same number of commands in asynchronous mode takes 1.94<br />
seconds. This overhead is due to the additional complexity of handling the asynchronous protocol in both the<br />
client (central cell) and the server (first level-1 cell). In synchronous mode the average time to execute a<br />
command is 7 ms, slightly better than the 7.7 ms obtained in asynchronous mode.<br />
[Plot omitted: “Remote command execution with delta = 0”; execution time in seconds (0 to 2.5) versus number of messages (0 to 300), for synchronous and asynchronous SOAP.]<br />
Figure 4-24: Summary of performance tests to study the remote execution of cell commands between the<br />
central cell and a level-1 cell.<br />
However, this overhead becomes negligible when the performance test presents a more realistic<br />
scenario. In this scenario the remote command executes a delay (delta). This delay in the code() method<br />
emulates for instance a hardware configuration sequence or a database access. Figure 4-25 summarizes the<br />
results of performance tests intended to study the remote execution of 256 cell commands between the central<br />
cell and a level-1 cell in synchronous and asynchronous mode (Y axis) and as a function of delta (X axis).<br />
<strong>The</strong> results in synchronous mode increase approximately linearly with the level-1 cell command delay (delta)<br />
whilst the results in asynchronous mode remain constant when delta increases. <strong>The</strong> performance advantage is
[Plot omitted: “Remote execution of 256 commands as a function of delta”; execution time in seconds (0 to 30) versus delta time in seconds (0 to 0.12), for synchronous and asynchronous SOAP.]<br />
Figure 4-25: Summary of performance tests to study the remote execution of 256 cell commands between<br />
the central cell and a level-1 cell in synchronous and asynchronous mode.<br />
visible down to 2 messages and for deltas as small as 20 milliseconds. This demonstrates the suitability of the<br />
asynchronous protocol for improving the overall performance of a given controller. This feature is particularly<br />
valuable during the configuration of the trigger sub-systems because the asynchronous protocol allows<br />
starting the configuration process in parallel in all the trigger sub-systems. <strong>The</strong>refore, the overall configuration<br />
time is approximately the configuration time of the slowest sub-system rather than the sum of all<br />
configuration times.<br />
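The parallel fan-out that yields this "slowest sub-system" bound can be sketched with plain threads. This is a self-contained illustration of the idea, not TS framework code:

```cpp
#include <functional>
#include <thread>
#include <vector>

// Start every configuration task concurrently and wait for all of them:
// the total wall-clock time is approximately the duration of the slowest
// task, not the sum of all task durations.
void configureAllInParallel(const std::vector<std::function<void()>>& tasks) {
    std::vector<std::thread> workers;
    workers.reserve(tasks.size());
    for (const auto& task : tasks) workers.emplace_back(task);
    for (auto& worker : workers) worker.join();
}
```

In the TS, the asynchronous SOAP protocol plays the role of the threads: the central cell dispatches all configuration requests before waiting for any reply.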
4.6.3 Operation instance initialization<br />
This section discusses the performance and scalability of the cell operation initialization. <strong>The</strong> test setups used for<br />
these measurements are: Central_10Level1, Central_10Level1_10Level2, Central_10Level1_20Level2 and<br />
Central_10Level1_100Level2. Each test consists of measuring the overall time necessary to initialize an<br />
operation in each node of the configuration setup. <strong>The</strong> measurement includes the operation initialization in the<br />
central cell plus the remote initialization in the sibling cells. <strong>The</strong> test finishes when the last reply message arrives<br />
at the central cell.<br />
Figure 4-26: Total time to initialize an operation instance in all cells of a setup as a function of the number<br />
of cells.<br />
Figure 4-26 shows the results of measuring the total time to initialize an operation instance in each cell as a<br />
function of the number of cells in the setup, and Figure 4-27 shows the same measurement as a function of the<br />
number of cell levels in the setup. <strong>The</strong> tests were only done in the synchronous case because the operation<br />
initialization request is only available in synchronous mode (cell blocked). This interface constraint was set in<br />
order to assure that no operation events were received before the operation instance was created.<br />
Figure 4-27: Total time to initialize an operation instance in all cells of a setup as a function of the number<br />
of cell levels. It is interesting to note that, due to the synchronous protocol, the number of cells in the setup<br />
defines the total initialization time. E.g. the Central_10Level1_20Level2 and Central_10Level1_100Level2 setups<br />
have different total initialization times despite having the same number of levels (3).<br />
<strong>The</strong> results show that the average time to initialize a cell operation is 13.4 ms. We can also conclude that the<br />
overall time to initialize one operation in each cell scales linearly with the number of cells, independently of the<br />
number of cell levels.<br />
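This linear behaviour can be written down as a rough model (an extrapolation from the 13.4 ms average quoted above, not TS code):

```python
# Rough linear model of the total operation initialization time (assumed,
# not TS code): with a blocking synchronous request the times of the
# individual cells add up, regardless of how cells are arranged in levels.

PER_CELL_S = 0.0134  # average initialization time per cell (13.4 ms)

def total_init_time(n_cells):
    return n_cells * PER_CELL_S

# Approximate cell counts of the four test setups (central cell + siblings).
for n_cells in (11, 21, 31, 111):
    print(n_cells, "cells ->", round(total_init_time(n_cells), 3), "s")
```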
4.6.4 Operation state transition<br />
This section discusses the performance and scalability of the cell operation transition. <strong>The</strong> test setups used for<br />
these measurements are again: Central_10Level1, Central_10Level1_10Level2, Central_10Level1_20Level2 and<br />
Central_10Level1_100Level2. Each test consists of measuring the overall time necessary to execute an operation<br />
transition in each node of the configuration setup. <strong>The</strong> measurement includes the operation transition in the<br />
central cell plus the remote execution of an operation transition in the sibling cells. <strong>The</strong> test finishes when the<br />
last reply message arrives at the central cell. All cell operation transition methods have an internal delay of 1<br />
second. This time lapse, defined in milliseconds, is called “delta” and is meant to emulate a hardware<br />
configuration sequence and/or a database access.<br />
Figure 4-28: Total time to execute an operation transition in all cells of a setup as a function of the number<br />
of cells and in synchronous mode.<br />
Figure 4-28 shows the results of measuring the total time to execute an operation transition in all cells of a setup<br />
as a function of the number of cells in the setup and in synchronous mode. This figure shows that, in<br />
synchronous mode, the overall execution time scales linearly with the number of cells and is therefore<br />
independent of the number of cell levels, as shown in Figure 4-29.<br />
Figure 4-29: Total time to execute an operation transition in all cells of a setup as a function of the cell levels<br />
and in synchronous mode. It is interesting to note that, due to the synchronous protocol, the number of cells<br />
in the setup defines the total execution time. E.g. the Central_10Level1_20Level2 and<br />
Central_10Level1_100Level2 setups have different total execution times despite having the same number of<br />
levels (3).<br />
Figure 4-30 shows the results of measuring the total time to execute an operation transition in all cells of a setup<br />
as a function of the number of cells and in asynchronous mode. This figure shows that, in asynchronous mode,<br />
the overall execution time is, for all test cases, much shorter than in the synchronous case. The overall time<br />
equals the sum of the worst case in each level (1 second per level of the test setup). Figure 4-31 shows that in<br />
asynchronous mode the overall execution time scales linearly with the number of levels.<br />
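The two scaling laws can be summarised in a small sketch (assumed model, not TS code), where every transition method contributes a fixed delta of 1 second:

```python
# Assumed scaling model for operation transitions (not TS code): each
# transition method has an internal delay ("delta") of 1 second.

DELTA = 1.0  # seconds per cell transition

def sync_transition_time(n_cells):
    # Synchronous mode: cells execute their transitions one after another.
    return n_cells * DELTA

def async_transition_time(n_levels):
    # Asynchronous mode: all cells of a level run in parallel, and levels
    # run in sequence, so only the number of levels matters.
    return n_levels * DELTA

print(sync_transition_time(111))  # Central_10Level1_100Level2, synchronous
print(async_transition_time(3))   # the same setup, asynchronous
```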
Figure 4-30: Total time to execute an operation transition in all cells of a setup as a function of the number<br />
of cells and in asynchronous mode.
Figure 4-31: Total time to execute an operation transition in all cells of a setup as a function of the number<br />
of cell levels and in asynchronous mode.
Chapter 5<br />
<strong>Trigger</strong> <strong>Supervisor</strong> System<br />
5.1 Introduction<br />
<strong>The</strong> TS system is a distributed software system, initially outlined in the TS conceptual design chapter (Section<br />
3.3). It consists of a set of nodes and the communication channels among them. <strong>The</strong> TS system is designed to<br />
facilitate a stable platform, despite hardware and software upgrades, on top of which the TS services can be<br />
implemented following a well defined methodology. This approach implements the “flexibility” non-functional<br />
requirement discussed in Section 3.2.2, Point 6).<br />
This chapter is organized in the following sections: Section 5.1 is the introduction; in Section 5.2 the system<br />
design guidelines are discussed; in Section 5.3 the system building blocks, the sub-system integration strategies<br />
and an overview of the system architecture are presented; Section 5.4 describes the TS control, monitoring,<br />
logging and start-up systems. Finally, the service development process associated with the TS system is<br />
discussed in Section 5.5.<br />
5.2 Design guidelines<br />
<strong>The</strong> TS system design principles, presented in this section, have two main sources of inspiration: i) the software<br />
infrastructure presented in Chapter 4, which consists of a number of external packages, the XDAQ middleware<br />
and the TS framework; ii) the functional and non-functional requirements described in the TS conceptual design,<br />
with special attention to the “human context awareness” non-functional requirement (Section 3.2.2, Point 7),<br />
which already guided the design decisions of the TS framework.<br />
5.2.1 Homogeneous underlying infrastructure<br />
<strong>The</strong> design of the TS system is solely based on the software infrastructure presented in Chapter 4, which consists<br />
of a number of external packages, the XDAQ middleware and the TS framework. A homogeneous underlying<br />
software infrastructure simplifies the support and maintenance tasks during the integration and operational<br />
phases. Moreover, the concrete usage of the TS framework was encouraged in order to profit from a number of<br />
facilities designed and developed to fulfill additional functional requirements and to cope with the project human<br />
factors and the reduced development time (Section 4.2.2).<br />
5.2.2 Hierarchical control system architecture<br />
<strong>The</strong> TS control system has a hierarchical topology with a central cell that coordinates the operation of the lower<br />
level sub-system central cells. <strong>The</strong>se second level cells are responsible for operating the sub-system crate or for<br />
coordinating a third level of sub-system cells that finally operate the sub-system crates. A hierarchical TS control<br />
system eases the implementation of the following system level features:
1) Distributed development: Each sub-system always has one central cell exposing a well defined interface.<br />
This cell hides the implementation details of the sub-system control infrastructure from the TS central cell.<br />
This approach simplifies the role of the TS system coordinator, who then just needs to worry about the<br />
interface definition exposed by each sub-system central cell. <strong>The</strong> respective sub-system software maintainer<br />
takes care of implementing this interface. At the sub-system level, the development of the sub-system control<br />
infrastructure is further divided into smaller units following the same approach. This development<br />
methodology eased the central coordination tasks by dividing the overall system complexity into much<br />
simpler sub-systems which could be developed with minimal central coordination.<br />
2) Sub-system control: <strong>The</strong> hierarchical design facilitates the independent operation of a given sub-system.<br />
This is possible by operating the corresponding sub-system central cell interface. This feature fulfills the<br />
non-functional requirement outlined in Section 3.2.2, Point 2).<br />
3) Partial deployment: <strong>The</strong> hierarchical design simplifies the partial deployment of the TS system by just<br />
deploying certain branches of the TS system. This is useful, for instance, to create a sub-system test setup.<br />
4) Graceful degradation: <strong>The</strong> hierarchical design facilitates a graceful degradation in line with the<br />
“Robustness” non-functional requirement stated in Section 3.2.2, Point 4). If something goes wrong during<br />
the system operation, only one branch of the hierarchy needs to be restarted.<br />
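The branch-wise forwarding behind these four features can be illustrated with a minimal sketch (hypothetical Python classes and cell names, not the TS implementation):

```python
# Hypothetical sketch of the hierarchical control idea: a command sent to
# a sub-system's central cell is forwarded down its branch only, so one
# branch can be operated, deployed or restarted independently.

class Cell:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def execute(self, command, log):
        log.append((self.name, command))   # act locally first
        for child in self.children:        # then forward down the branch
            child.execute(command, log)

dttf = Cell("DTTF", [Cell("TFC1"), Cell("TFC2")])
central = Cell("Central", [dttf, Cell("GT")])

log = []
dttf.execute("configure", log)  # operate just the DTTF branch
print([name for name, _ in log])  # ['DTTF', 'TFC1', 'TFC2']
```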
5.2.3 Centralized monitoring, logging and start-up systems architecture<br />
<strong>The</strong> TS framework uses the monitoring, logging and start-up infrastructure provided by the XDAQ middleware<br />
and the R<strong>CMS</strong> framework. This infrastructure is characterized by enforcing a centralized architecture. <strong>The</strong>refore,<br />
the TS monitoring, logging and start-up systems cannot be purely hierarchical systems, as proposed in Section<br />
3.3.3, due to the trade-off of reusing existing components.<br />
5.2.4 Persistency infrastructure<br />
<strong>The</strong> TS system requires a database infrastructure to store and retrieve configuration, monitoring and logging<br />
information. <strong>The</strong> following points present the design guidelines for this infrastructure.<br />
5.2.4.1 Centralized access<br />
A <strong>CMS</strong> wide architectural decision enforces the centralization of common services to access the persistency<br />
infrastructure. <strong>The</strong>se common access points should facilitate a simple interface to the persistency infrastructure<br />
and should be responsible for managing the connections to the persistency server. <strong>The</strong> <strong>CMS</strong> database task force<br />
recommends using one single Tstore (Section 4.4.4.9.2) application for all nodes of the TS system.<br />
5.2.4.2 Common monitoring and logging databases<br />
<strong>The</strong> TS monitoring and logging systems (Sections 5.4.2 and 5.4.3) are based on XDAQ and R<strong>CMS</strong><br />
infrastructure. In this context, single monitor and logging collector applications periodically gather the<br />
monitoring and logging information, respectively, and facilitate an HTTP/CGI interface to any possible<br />
information consumer. <strong>The</strong>se collectors are also responsible for storing the gathered information in the L1 trigger<br />
monitoring and logging databases. <strong>The</strong>se two databases are common to all L1 trigger sub-systems.<br />
5.2.4.3 Centralized maintenance<br />
All TS databases are maintained in the central <strong>CMS</strong> database server (Oracle database 10g Enterprise Edition<br />
Release 10.2.0.2, [92]) which is under the responsibility of the <strong>CMS</strong> and the CERN-IT database services.<br />
5.2.5 Always on system<br />
<strong>The</strong> TS configuration and monitoring services are used to operate the L1 trigger when the experiment is running<br />
but are also used during the integration, commissioning and test operations of the L1 trigger in standalone mode.<br />
In addition, the TS services to test each of the L1 trigger sub-systems and to check the inter sub-system
connections and synchronization are required outside the experiment running periods. <strong>The</strong>refore, the TS system<br />
should always be available.<br />
5.3 Sub-system integration<br />
Figure 5-1 shows an overview of the TS system with the central node controlling twelve TS nodes, one per<br />
sub-system, including all L1 trigger sub-systems and sub-detectors: the Global <strong>Trigger</strong> (GT), the Global Muon<br />
<strong>Trigger</strong> (GMT), the Drift Tube Track Finder (DTTF), the Cathode Strip Chamber Track Finder (CSCTF), the<br />
Global Calorimeter <strong>Trigger</strong> (GCT), the Regional Calorimeter <strong>Trigger</strong> (RCT), the Electromagnetic Calorimeter<br />
(ECAL), the Hadronic Calorimeter (HCAL), the Drift Tube Sector Collector (DTSC), the Resistive Plate<br />
Chambers (RPC), the Tracker and the Luminosity Monitoring System (LMS). This is the entry point for any<br />
controller that wishes to access only sub-system specific services. For some sub-systems, an additional level of<br />
TS nodes can be controlled by the sub-system central node.<br />
Figure 5-1: Overview of the <strong>Trigger</strong> <strong>Supervisor</strong> system.<br />
5.3.1 Building blocks<br />
<strong>The</strong> following sections present the building blocks used to build the TS system. <strong>The</strong> main role is played by the<br />
cell. In addition, the XDAQ and R<strong>CMS</strong> frameworks contribute a number of secondary elements.<br />
5.3.1.1 <strong>The</strong> TS node<br />
<strong>The</strong> TS node, shown in Figure 5-2, is the basic unit of a distributed system implemented with the TS framework.<br />
It has three main components: the cell, the monitor sensor and the job control. <strong>The</strong> cell is the element that has to<br />
be customized (Section 4.5); the monitor sensor is an XDAQ application intended to interact with the monitor<br />
collector, forwarding update requests to the cell and sending the updated monitoring information back to the<br />
monitor collector (Section 4.4.4.12). Finally, the job control is a building block of the start-up system (Section<br />
4.4.4.14).<br />
<strong>The</strong> cell has two input ports exposing respectively the cell SOAP (s) and HTTP/CGI (h) interfaces and four<br />
output ports corresponding to the monitoring (mx), database (dx), cell (cx) and XDAQ (xx) xhannels (Section<br />
4.4.4.9). <strong>The</strong> functionality of the cell is meant to be customized according to the specific needs of each sub-system.<br />
<strong>The</strong> customization process consists of implementing control panel, command and operation plug-ins, and<br />
adding monitoring items (Section 4.5). Those cells intended to directly control a sub-system crate should also<br />
embed the sub-system crate hardware driver (Section 4.5).<br />
Figure 5-2: Components of a TS node. (s: SOAP interface, h: HTTP/CGI interface, xe: XDAQ executive,<br />
op: Operation plug-ins, c: Command plug-ins, m: monitoring item handlers, d: hardware driver, cp: control<br />
panel plug-in).<br />
<strong>The</strong> sub-system cells are meant to act as abstractions of the corresponding sub-system hardware. <strong>The</strong>se black<br />
boxes expose a stable SOAP API regardless of hardware and/or software upgrades. This facilitates a stable<br />
platform on top of which the TS services (Chapter 6) can be implemented. This approach allows largely<br />
decoupling the evolution of sub-system hardware and software platforms from changes in the operation<br />
capabilities offered by the TS.<br />
5.3.1.2 Common services<br />
<strong>The</strong> common services of the TS system, shown in Figure 5-3, are unique nodes of the distributed system which<br />
are used by all TS nodes. <strong>The</strong>se nodes are the logging collector, the Tstore, the monitor collector and the Mstore.<br />
Figure 5-3: Common service nodes. (tc: Tomcat server, u: UDP interface, x: XML local file, j: JDBC<br />
interface, xe: XDAQ executive, s: SOAP interface, o: OCCI interface, h: HTTP/CGI interface).<br />
5.3.1.2.1 Logging collector<br />
<strong>The</strong> logging collector or log collector [85] is a software component that belongs to the R<strong>CMS</strong> framework. It is a<br />
web application written in Java and running on a Tomcat server. It is designed and developed to collect logging<br />
information from log4j compliant applications and to distribute these logs to several consumers. <strong>The</strong>se<br />
consumers can be: an Oracle database, files, other log collectors or a real time message system. <strong>The</strong> log<br />
collector is part of the TS logging infrastructure (Section 4.4.4.13).<br />
5.3.1.2.2 Tstore<br />
<strong>The</strong> Tstore is an XDAQ application delivered with the XDAQ Power Pack package. Tstore provides a SOAP<br />
interface which allows reading and writing XDAQ table structures in an Oracle database (Section 4.4.4.9.2). <strong>The</strong><br />
<strong>CMS</strong> DataBase Working Group (DBWG) stated that having one single Tstore application for all cells of the TS<br />
system already assures a suitable management of the database connections.<br />
5.3.1.2.3 Monitor collector<br />
<strong>The</strong> monitor collector is also an XDAQ application delivered with the XDAQ Power Pack package. This XDAQ<br />
application periodically pulls from all TS system sensors the monitoring information of all items declared in the<br />
sub-system flashlist files. <strong>The</strong> collection of each flashlist can be performed at regular intervals by providing the<br />
collector with a snapshot of the corresponding data values at retrieval time. Optionally, a history of data values<br />
can be buffered in memory at the collector node. This buffered data can be made persistent for later retrieval. <strong>The</strong><br />
interface between sensor and collector is SOAP. <strong>The</strong> collector also provides an HTTP/CGI interface to read the<br />
monitoring information coming from the whole TS system. <strong>The</strong> monitor collector is part of the TS monitoring<br />
infrastructure (Section 4.4.4.12).<br />
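The pull cycle can be modelled roughly as follows (an illustrative sketch with invented names, not the XDAQ monitoring API):

```python
# Illustrative model of the monitor collector's pull cycle (not XDAQ code):
# the collector pulls a snapshot of every sensor's flashlist items and
# optionally keeps a bounded history of past values in memory.
from collections import deque

class Sensor:
    def __init__(self, items):
        self.items = items
    def snapshot(self):
        return dict(self.items)  # current values of the declared items

class Collector:
    def __init__(self, sensors, history_depth=10):
        self.sensors = sensors
        self.history = deque(maxlen=history_depth)  # bounded buffer
    def pull(self):
        flashlists = {name: s.snapshot() for name, s in self.sensors.items()}
        self.history.append(flashlists)  # could later be made persistent
        return flashlists

sensors = {"GT": Sensor({"temperature": 41.5}),
           "DTTF": Sensor({"crate_on": True})}
collector = Collector(sensors)
print(collector.pull())
```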
5.3.1.2.4 Mstore<br />
<strong>The</strong> Mstore application is an XDAQ application delivered with the Work Suite package of XDAQ. This<br />
application takes flashlist data from a monitor collector and forwards it to a Tstore application for persistent<br />
storage in a database.<br />
5.3.2 Integration<br />
All sub-systems use the same building blocks, presented in Section 5.3.1, to integrate with the TS system.<br />
However, each sub-system follows a particular integration model which depends on a number of parameters<br />
related to either the sub-system Online SoftWare Infrastructure (OSWI) or to the sub-system hardware setup.<br />
This section presents the definition of all integration parameters, the description of the most relevant integration<br />
models and finally a summary of all the integration exercises.<br />
5.3.2.1 Integration parameters<br />
This section presents the sub-system infrastructure parameters which were relevant during the integration<br />
process with the TS system. <strong>The</strong>se have been separated into those related to the OSWI and those related to the<br />
sub-system hardware setup.<br />
5.3.2.1.1 OSWI parameters<br />
Usage of HAL<br />
This parameter defines the low level software infrastructure to access the sub-system custom hardware boards.<br />
<strong>The</strong> <strong>CMS</strong> recommendation to access VME boards is the Hardware Access Library (HAL [53]). HAL is a library<br />
that provides user-level access to VME and PCI modules in the C++ programming language. Most of the<br />
sub-systems follow the <strong>CMS</strong> recommendation to access VME boards, with the exception of the RCT and the GCT. In<br />
the GCT case, board control is provided by a USB interface and the GCT software infrastructure uses a USB<br />
access library. In the RCT case, a sub-system specific driver and user level C++ libraries were developed.<br />
C++ API<br />
On top of HAL or the sub-system specific hardware access library or driver, most of the sub-systems have<br />
developed a C++ library which offers a high level C++ API to control the hardware from a functional point of<br />
view.<br />
XDAQ application<br />
Some sub-systems have developed their own XDAQ application to remotely operate their hardware setups<br />
(Section 1.4.4). In some of these cases the sub-system XDAQ application is the visible interface to the hardware<br />
from the point of view of the cell.<br />
Scripts<br />
In addition to the compiled applications (i.e. C++ and XDAQ applications), some sub-systems have opted for an<br />
additional degree of flexibility by enhancing their OSWI with interpreted scripts. Python and HAL sequences are<br />
being used. Scripts are used to define test procedures but also to define configuration sequences. <strong>The</strong>se<br />
configuration scripts used to mix the configuration code with the configuration data. In the final system,<br />
configuration data is retrieved separately from the configuration database. However, during the commissioning<br />
phase, some sub-systems retrieve configuration scripts from the configuration database. This is an acceptable
practice because it helps to decouple the continuous firmware updates from the maintenance of a consistent<br />
configuration database.<br />
5.3.2.1.2 Hardware setup parameters<br />
Bus adapter<br />
From the hardware point of view, the L1 trigger sub-system hardware is hosted in VME crates controlled by an<br />
x86/Linux machine. With few exceptions, the interface between the PC and the VME crate is done with a PCI to<br />
VME bus adapter [93].<br />
Hardware crate types and number<br />
<strong>The</strong>se parameters tell us how many different types of crates and how many units of each type have to be<br />
controlled. It was decided to have a one-to-one relationship between cells and crates. In other words, each cell<br />
controls one single crate and each crate is controlled by only one cell. This approach enhances the reusability of<br />
the same sub-system cells in different hardware setups. For instance:<br />
1) During the debugging phases in the home institute laboratory, and during the initial commissioning<br />
exercises, when just one or a few crates were available, a single cell controlling one single crate was developed<br />
in order to enhance the board debugging process. Afterwards, this cell was reused as a part of a more<br />
complex control system.<br />
2) During the system deployment in its final location, when the complete hardware setup must be controlled,<br />
all individual cells implemented during the debugging and commissioning exercises were reused and<br />
integrated into the corresponding sub-system control system.<br />
Exceptions to this rule are the GT, the GMT and the RPC integration models. Board level cells were discarded<br />
due to the higher complexity of the resulting distributed control system: the control of one single crate with a<br />
number of boards would already require a central cell coordinating the operations of as many cells as boards.<br />
Hardware crate sharing<br />
This parameter tells us whether or not a given sub-system crate is shared by more than one sub-system. This has<br />
to be taken into account because sharing a crate also means sharing the bus adapter.<br />
5.3.2.2 Integration cases<br />
<strong>The</strong> TS sub-systems presented in the following sections are examples of the main different integration cases.<br />
Each integration case corresponds to a different L1 trigger sub-system or sub-detector, and it is defined by the<br />
parameters presented in Sections 5.3.2.1.1 and 5.3.2.1.2. <strong>The</strong> result of each integration case is a set of building<br />
blocks and the communication channels among them.<br />
5.3.2.2.1 Cathode Strip Chamber Track Finder<br />
<strong>The</strong> hardware setup of the CSCTF is one single VME crate controlled by a PCI to VME bus adapter. <strong>The</strong> OSWI<br />
consists of C++ classes built on top of the HAL library. <strong>The</strong>se classes offer a high level abstraction of the VME<br />
boards and facilitate their configuration and monitoring.<br />
<strong>The</strong> integration model for the CSCTF represents the simplest integration case. One single cell running in the<br />
CSCTF host was enough. <strong>The</strong> customization process of the CSCTF cell is based on using the C++ classes of the<br />
CSCTF OSWI to operate the crate.<br />
5.3.2.2.2 Global <strong>Trigger</strong> and Global Muon <strong>Trigger</strong><br />
<strong>The</strong> integration of the GT and the GMT represents a special case because, despite being two different<br />
sub-systems, they share the same crate.<br />
<strong>The</strong> integration model followed in this concrete case, shown in Figure 5-4, contradicts the rule of one cell per<br />
crate. In this case two cells access the same crate. Compared to the single cell integration model, this approach<br />
has several advantages:<br />
1) Smaller complexity: During the initial development process, we realized that the overall complexity of two<br />
individual cells was smaller than the complexity of one single cell. <strong>The</strong>refore, this solution was easier to<br />
maintain.<br />
2) Enhanced distributed development: <strong>The</strong> development work to integrate the GT and GMT sub-systems can<br />
be more easily split between two different developers working independently.<br />
3) Homogeneous architecture: <strong>The</strong> Interconnection test service between GT and GMT can be logically<br />
implemented like any other interconnection test service between two sub-systems hosted in different crates.<br />
Concerning the OSWI, it consists of C++ classes built on top of HAL. <strong>The</strong>refore, the definition of the cells’<br />
command and operation transition methods is based on using this API.<br />
Figure 5-4: Integration model used by the GT and GMT.<br />
5.3.2.2.3 Drift Tube Track Finder<br />
<strong>The</strong> DTTF hardware setup consists of six identical track finder crates, one central crate and one clock crate. Due<br />
to limitations of the device driver specifications, it is not possible to have more than three PCI to VME interfaces<br />
per host. <strong>The</strong>refore, the six track finder crates are controlled by two hosts. An additional host controls the clock<br />
crate and the central crate. <strong>The</strong> OSWI is based on C++ classes built on top of HAL.<br />
Figure 5-5 shows the integration model followed by the DTTF. As usual, each crate is controlled by one cell.<br />
<strong>The</strong>re are four different cells: 1) track finder cell (TFC) which is in charge of controlling a track finder crate, 2)<br />
clock crate cell (CKC), 3) the central crate cell (CCC) and 4) the DTTF central cell (DCC) which is in charge of<br />
coordinating the operation of all other cells. <strong>The</strong> DCC provides a single access point to operate all DTTF crates<br />
and simplifies the implementation of the TS central cell.<br />
<strong>The</strong> customization process of the DTTF crate cells (i.e. TFC, CKC and CCC) uses the C++ class libraries of the<br />
DTTF OSWI. <strong>The</strong>refore, all crate cells must run in the same hosts where the PCI to VME interfaces are plugged in.<br />
Figure 5-5: Integration model for the DTTF.<br />
5.3.2.2.4 Resistive Plate Chamber<br />
<strong>The</strong> OSWI of the RPC <strong>Trigger</strong> system consists of three different XDAQ applications that are used to control<br />
three different types of crates: 1) twelve RPC <strong>Trigger</strong> crates, 2) one RPC Sorter crate and 3) one RPC CCS/DCC<br />
crate.<br />
<strong>The</strong> integration model of the RPC with the TS is shown in Figure 5-6. In this case, the hardware interface is<br />
facilitated by XDAQ applications and these applications are operated by one cell, the RPC cell.<br />
Figure 5-6: RPC integration model.<br />
5.3.2.2.5 Global Calorimeter <strong>Trigger</strong><br />
<strong>The</strong> Global Calorimeter <strong>Trigger</strong> (GCT) hardware setup consists of one main crate and three data source card<br />
crates. <strong>The</strong> particularity of this hardware setup is that all boards are controlled independently through a USB
interface. <strong>The</strong>refore, it is possible to control the four crates from one single host because the limitation of the<br />
CAEN driver does not exist.<br />
<strong>The</strong> OSWI consists of a C++ class library, a Python language extension and XDAQ applications. <strong>The</strong> low-level<br />
OSWI for both the data source crates and the main crate is based on a C++ class library built on top of a USB<br />
driver. A second component of the GCT software is the Python extension, which allows Python<br />
programs to be written to create complex configuration and test sequences or simple hardware debugging routines<br />
without having to compile C++ code. <strong>The</strong> third component is an XDAQ application which allows remote access<br />
to the boards in the data source crates.<br />
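The role of the Python extension can be illustrated with a short sketch. All names below (write_register, configure_main_crate, the board identifiers) are hypothetical, and a dictionary stands in for the USB-attached hardware; the point is only that a configuration sequence is plain Python that composes low-level calls without recompiling C++ code:

```python
# Sketch of a GCT-style configuration sequence written in Python.
# In the real system the low-level calls are provided by a Python
# extension wrapping the C++/USB class library; here a dictionary
# stands in for the hardware so the sketch is self-contained.
# All names (write_register, configure_main_crate, ...) are hypothetical.

registers = {}  # stand-in for board registers reached over USB

def write_register(board, name, value):
    """Low-level call; in reality implemented by the C++ extension."""
    registers[(board, name)] = value

def read_register(board, name):
    return registers[(board, name)]

def configure_main_crate(lut_values):
    """A configuration sequence: plain Python, no C++ compilation needed."""
    for board in ("leaf0", "leaf1", "wheel"):
        write_register(board, "mode", 1)          # enable processing mode
    for addr, value in enumerate(lut_values):
        write_register("wheel", f"lut_{addr}", value)

configure_main_crate([3, 1, 4])
```

Debugging routines can reuse the same primitives interactively, which is the main benefit claimed for the extension.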
Figure 5-7 shows the integration model followed by the GCT. This integration model maximizes the usage of the<br />
existing infrastructure. It consists of one single cell, which embeds a Python interpreter in order to execute<br />
Python sequences to configure the main crate. This same cell coordinates the operation of the data source crates<br />
through the remote SOAP interface of the GCT XDAQ applications.<br />
Figure 5-7: GCT integration model (d: Python interpreter, Python configuration sequences, Python extension and USB driver).<br />
5.3.2.2.6 Hadronic Calorimeter<br />
<strong>The</strong> HCAL sub-detector has its own supervisory and control system which is responsible for the configuration,<br />
control and monitoring of the sub-detector hardware and for handling the interaction with R<strong>CMS</strong> (Section 1.4.4).<br />
In addition to this infrastructure, a HCAL cell will provide the interface to the central cell to set the configuration<br />
key of the trigger primitive generator (TPG) hardware and to participate in the interconnection test service<br />
between the HCAL TPG and the RCT. <strong>The</strong> HCAL cell also exposes a SOAP interface that makes it easier for the<br />
HCAL supervisory software to read the information that is set by the central cell. <strong>The</strong> HCAL integration model<br />
is shown in Figure 5-8. This model is equally valid for the ECAL sub-detector.
Figure 5-8: HCAL integration model.<br />
5.3.2.2.7 <strong>Trigger</strong>, Timing and Control System<br />
<strong>The</strong> TTC hardware setup (Section 1.3.2.4) consists of one crate per sub-system, with as many TTCci boards as<br />
there are TTC partitions assigned to the sub-system. Table 5-1 shows the TTC partitions and TTCci boards assigned to<br />
each sub-system. Some sub-systems share the same TTC crate. This is the case of: 1) DTTF and DTSC, 2) RCT<br />
and GCT, and 3) CSC and CSCTF. <strong>The</strong> GT has no TTCci board because the GTFE board receives the TTC<br />
signals from the TCS directly through the backplane.<br />
Sub-system | # of partitions | Partition names | # of TTCci<br />
Pixels | 2 | BPIX, FPIX | 2<br />
Tracker | 4 | TIB/TID, TOB, TEC+, TEC- | 4<br />
ECAL | 6 | EB+, EB-, EE+, EE-, SE+, SE- | 6<br />
HCAL | 5 | HBHEa, HBHEb, HBHEc, HO, HF | 5<br />
DT | 1 | DT | 1<br />
DTTF | 1 | DTTF | 1<br />
RPC | 1 | RPC | 1<br />
CSCTF | 1 | CSCTF | 1<br />
CSC | 2 | CSC+, CSC- | 2<br />
GT | 1 | GT | 0<br />
RCT | 1 | RCT | 1<br />
GCT | 1 | GCT | 1<br />
Totem and Castor | 2 | Totem, Castor | 2<br />
Totals | 28 | | 27<br />
Table 5-1: TTC partitions.
<strong>The</strong> Integration model for the TTCci infrastructure is shown in Figure 5-9. Every TTCci board is controlled by<br />
one TTCci XDAQ application. <strong>The</strong> central cell of each L1 trigger sub-system interacts with the TTCci XDAQ<br />
application through a TTCci cell. <strong>The</strong> TTCci cell retrieves the TTCci configuration information and passes it to<br />
the TTCci XDAQ application.<br />
<strong>The</strong> sub-detector TTCci boards are operated slightly differently. <strong>The</strong> sub-detector supervisory software interacts<br />
directly with the TTCci XDAQ application. <strong>The</strong> sub-detector central cell also has a TTCci cell which controls<br />
the TTCci XDAQ applications running in the sub-detector supervisory software tree. This additional control path<br />
is necessary to run TTC interconnection tests between the TCS module, located in the GT crate, and TTCci<br />
boards that belong to sub-detectors. <strong>The</strong> sub-detector TTCci cells can control more than one TTCci XDAQ<br />
application.<br />
<strong>The</strong> configuration of the L1 trigger sub-systems TTCci boards is driven by the TS. On the other hand, the<br />
configuration of the sub-detector TTCci boards is driven by the corresponding sub-detector supervisory software.<br />
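The TTCci control path described above can be sketched as follows: a TTCci cell fetches a board configuration and forwards it to the TTCci XDAQ application as a SOAP message. The envelope layout, the Configure element and the configuration text are hypothetical, and the transport (an HTTP POST in the real system) is stubbed out:

```python
# Sketch of the TTCci control path: a TTCci cell fetches a board
# configuration and forwards it to the TTCci XDAQ application as a
# SOAP message. Element names and the configuration text are
# hypothetical; the HTTP transport is stubbed out.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def fetch_ttcci_configuration(key):
    """Stand-in for the database lookup done through the Tstore xhannel."""
    return f"BGO 0x30;DELAY 12;KEY {key}"

def build_configure_message(config_text):
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    cmd = ET.SubElement(body, "Configure")   # hypothetical command element
    cmd.text = config_text
    return ET.tostring(env, encoding="unicode")

message = build_configure_message(fetch_ttcci_configuration("TSC_KEY_1"))
```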
Figure 5-9: TTCci integration model.<br />
5.3.2.2.8 Luminosity Monitoring System<br />
<strong>The</strong> Luminosity Monitoring System (LMS) provides beam luminosity information. <strong>The</strong> LMS cell uses the<br />
monitoring xhannel (Section 4.4.4.12) to retrieve information from the L1 trigger monitoring collector. This<br />
information is sent periodically to an LMS XDAQ application which gathers luminosity information from several<br />
sources and distributes it to a number of consumers, for instance the luminosity database. Figure 5-10 shows the<br />
LMS integration model.
Figure 5-10: LMS integration model.<br />
5.3.2.2.9 Central cell<br />
<strong>The</strong> central cell coordinates the operation of the sub-system central cells using the cell xhannel interface (Section<br />
4.4.4.9). Figure 5-11 shows the integration model of the central cell with the rest of sub-system central cells.<br />
Figure 5-11: Central cell integration model.<br />
5.3.2.3 Integration summary<br />
Table 5-2 summarizes the most important parameters that define the integration model for each of the sub-systems,<br />
including L1 trigger sub-systems and sub-detectors.
Subsystem | Online software related parameters | | | | HW setup parameters | | TS system parameters |<br />
 | HAL | C++ API | XDAQ apps. | Scripts | Crates (type/#) | Shared crates | Cells (type/#) | Integration case<br />
GT | Yes | Yes | No | Yes (HAL) | 1 | GT/GMT | 1 | Section 5.3.2.2.2<br />
GMT | Yes | Yes | No | Yes (HAL) | 1 | GT/GMT | 1 | Section 5.3.2.2.2<br />
GCT | No (Usb) | Yes | Yes | Yes (Python) | GC A (3), GC B (1) | No | GC A (3), GC B (1), CN (1) | Section 5.3.2.2.5<br />
DTTF | Yes | Yes | No | No | D A (6), D B (1), D C (1) | DTTF crates host DTSC receiver board | D A (6), D B (1), D C (1), CN (1) | Section 5.3.2.2.3<br />
CSCTF | Yes | Yes | No | No | 1 | No | 1 | Section 5.3.2.2.1<br />
RCT | No | Yes | No | No | R A (18), R B (1) | No | R A (18), R B (1), CN (1) | Section 5.3.2.2.3<br />
DTSC | Yes | Yes | No | No | D A (10) | Receiver optical board in DTTF crate | DT A (10), CN (1) | Section 5.3.2.2.3<br />
RPC | Yes | Yes | Yes | No | RP A (12), RP B (1), RP C (1) | No | 1 | Section 5.3.2.2.4<br />
ECAL | Yes | Yes | Yes | No | NA | NA | 1 | Section 5.3.2.2.6<br />
HCAL | Yes | Yes | Yes | No | NA | NA | 1 | Section 5.3.2.2.6<br />
Tracker | NA | NA | NA | NA | NA | NA | 1 | Section 5.3.2.2.6<br />
LMS | NA | NA | Yes | NA | NA | NA | 1 | Section 5.3.2.2.8<br />
TTC | Yes | Yes | Yes | No | 7 | DTTF/DTSC, RCT/GCT, CSCTF/CSC | 8 | Section 5.3.2.2.7<br />
CC | NA | NA | NA | No | NA | NA | 1 | Section 5.3.2.2.9<br />
Table 5-2: Summary of integration parameters.<br />
5.4 System integration<br />
<strong>The</strong> TS system is formed by the integration of the local-scope distributed systems presented in Section 5.3.2. <strong>The</strong><br />
TS system itself can be described as four distributed systems with an overall scope: the TS control system, the<br />
TS monitoring system, the TS logging system and the TS start-up system. <strong>The</strong> following sections describe, for each<br />
of the four systems, the node structure and the communication channels among the nodes.<br />
5.4.1 Control system<br />
<strong>The</strong> TS control system (TSCS) and the TS monitoring system (TSMS) are the main distributed systems with an<br />
overall scope. <strong>The</strong>se two systems facilitate the development of the configuration, test and monitoring services
outlined in the conceptual design. Figure 5-12 shows the TSCS. It consists of the sub-system cells, one Tstore<br />
application, the sub-system relational databases and the communication channels among all these nodes.<br />
Figure 5-12: Architecture of the TS control system. (s: SOAP interface, h: HTTP/CGI interface, d: hardware<br />
driver, cx: cell xhannel interface (SOAP), dx: Tstore xhannel interface (SOAP), o: OCCI interface).<br />
<strong>The</strong> TSCS is a purely hierarchical control system where each node can communicate only with the nodes of the<br />
immediately lower level. <strong>The</strong> central node of the TSCS uses its cell xhannel interface to coordinate the operation of the<br />
sub-system central cells. Sub-system central cells are responsible for coordinating the operation of all sub-system<br />
crates. <strong>The</strong> crate operation is done through an additional level of cells when the sub-system has more than one<br />
crate, or directly when the sub-system is contained in one single crate. Each sub-system has its own relational<br />
database that can be accessed from the sub-system cell using the Tstore xhannel interface. All database queries<br />
sent through the Tstore xhannel are centralized in the Tstore application. This node’s task is to manage the<br />
connections with the database server and to translate the SOAP request messages into OCCI requests<br />
understandable by the Oracle database server (Section 4.4.4.9.2).<br />
<strong>The</strong> TSCS can be remotely controlled using the TS SOAP interface (Appendix A) or using the TS GUI. Both<br />
interfaces are accessible from any node of the TSCS. On the other hand, not all services are available in all the<br />
nodes. <strong>The</strong> central node of the TSCS facilitates access to the global level services, the sub-system central nodes<br />
facilitate the access to the sub-system level services and finally the crate cells facilitate the access to the crate<br />
level services. <strong>The</strong> TS services are discussed in Chapter 6.<br />
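The strictly hierarchical structure of the TSCS can be sketched in a few lines: each cell only talks to the cells one level below it through its cell xhannels, so a configure request issued at the central node fans out level by level. Class and method names here are illustrative, not the actual TS framework API:

```python
# Sketch of the strictly hierarchical TSCS: each cell communicates only
# with the cells one level below it through its cell xhannels.
# Class and method names are illustrative, not the real TS API.

class Cell:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)   # cell xhannels to the level below
        self.configured = False

    def configure(self, key):
        # a cell first configures the level below, then itself
        for child in self.children:
            child.configure(key)
        self.configured = True
        return key

crate_cells = [Cell(f"crate{i}") for i in range(3)]
dttf_cell = Cell("DTTF", crate_cells)      # sub-system central cell
gt_cell = Cell("GT")                       # single-crate sub-system
central = Cell("central", [dttf_cell, gt_cell])
central.configure("TSC_KEY_1")
```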
5.4.2 Monitoring system<br />
<strong>The</strong> TS monitoring, logging and start-up systems are not hierarchical. <strong>The</strong>se systems depend heavily on<br />
existing infrastructure provided by the XDAQ middleware or the R<strong>CMS</strong> framework. <strong>The</strong> usage model for this<br />
infrastructure is characterized by a centralized architecture (Section 4.4.4.12).<br />
<strong>The</strong> TS Monitoring System (TSMS), shown in Figure 5-13, is a distributed application intended to facilitate the<br />
development of the TS monitoring service. <strong>The</strong> TSMS consists of the same cells that participate in the TSCS, the<br />
sensor applications associated to each cell, one monitor collector application, one Mstore application, the Tstore<br />
application and the monitoring relational database.<br />
A TSCS cell that wishes to participate in the TSMS has to provide a descendant of the DataSource class.<br />
This class defines the code intended to create the updated monitoring information. <strong>The</strong> monitor collector<br />
periodically requests from the sensors of the TSMS, through a SOAP interface, the updated monitoring<br />
information of all items declared in the flashlist files (Section 4.4.4.12). <strong>The</strong> Mstore application is responsible for
embedding the collected monitoring information into a SOAP message and sending it to the Tstore<br />
application in order to be stored in the monitoring database. A user of the TSCS can visualize any monitoring<br />
item of the TSMS with a web browser connected to the HTTP/CGI interface of any cell.<br />
Figure 5-13: Architecture of the TS monitoring system.<br />
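The DataSource customization can be sketched as follows. The real interface is C++ and part of the TS framework; this Python sketch uses hypothetical names (CrateTemperature, refresh, the flashlist item name) purely to show the shape of the hook that produces updated monitoring values:

```python
# Sketch of how a cell joins the TSMS: it provides a descendant of a
# DataSource-like class whose update hook produces fresh values for the
# items declared in a flashlist. The real interface is C++; all names
# here are hypothetical.

class DataSource:
    """Base class: the sensor calls refresh() when the collector polls."""
    def refresh(self):
        raise NotImplementedError

class CrateTemperature(DataSource):
    def __init__(self):
        self.readings = iter([41.5, 42.0])   # stand-in for a hardware probe

    def refresh(self):
        # return {flashlist item name: updated value}
        return {"crate_temperature": next(self.readings)}

sensor_sources = [CrateTemperature()]
snapshot = {k: v for src in sensor_sources for k, v in src.refresh().items()}
```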
5.4.3 Logging system<br />
Figure 5-14 shows the TS Logging System (TSLS). <strong>The</strong> logging records are generated by any node of the TSCS<br />
and stored in the logging database. <strong>The</strong> TSLS also provides a filtering GUI embedded in the TS GUI of any<br />
cell. It allows any user to follow the execution flow of the TS system.<br />
<strong>The</strong> TS logging collector is responsible for filtering the logging information and for sending it to its final<br />
destinations including the TS logging database. <strong>The</strong> persistent storage of logging records in the logging database<br />
facilitates the development of post-mortem analysis tools. <strong>The</strong> TS logging collector can also send the TS logging<br />
records to a number of destinations: i) a central <strong>CMS</strong> logging collector intended to gather all logging information<br />
from the <strong>CMS</strong> online software infrastructure, ii) an XML file and iii) a GUI-based log viewer (Chainsaw [94]).<br />
5.4.4 Start-up system<br />
Figure 5-15 shows the TS Start-up System (TSSS). <strong>The</strong> TSSS makes it possible to remotely start up the TSCS or any<br />
subset of its nodes. <strong>The</strong> TSSS consists of one job control application in each host of the TS cluster. Each job<br />
control application exposes a SOAP interface which allows starting or killing an application in the same host.<br />
<strong>The</strong> job control applications are installed as operating system services and are started at boot time. A central<br />
process manager coordinates the operation of the job control applications in order to start/stop the TS nodes.<br />
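The start-up scheme can be sketched as one job-control service per host plus a central manager that drives them. The SOAP transport is replaced by direct calls and all names (JobControl, StartupManager, the host and application names) are illustrative:

```python
# Sketch of the TSSS: one job-control service per host exposes
# start/kill, and a central process manager drives them. SOAP transport
# is replaced by direct calls; names are illustrative.

class JobControl:
    """Stand-in for the per-host job control application."""
    def __init__(self, host):
        self.host = host
        self.running = set()

    def start(self, app):
        self.running.add(app)

    def kill(self, app):
        self.running.discard(app)

class StartupManager:
    def __init__(self, job_controls):
        self.job_controls = {jc.host: jc for jc in job_controls}

    def start_nodes(self, plan):
        # plan: {host: [application names to launch on that host]}
        for host, apps in plan.items():
            for app in apps:
                self.job_controls[host].start(app)

manager = StartupManager([JobControl("host1"), JobControl("host2")])
manager.start_nodes({"host1": ["central-cell"], "host2": ["dttf-cell", "tstore"]})
```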
5.5 Services development process<br />
<strong>The</strong> <strong>Trigger</strong> <strong>Supervisor</strong> Control System (TSCS) and the <strong>Trigger</strong> <strong>Supervisor</strong> Monitoring System (TSMS) provide<br />
a stable layer on top of which the TS services have been implemented following a well defined methodology<br />
[95]. Figure 5-16 schematizes the TS service development model associated with the TS system. <strong>The</strong> following<br />
description explains each of the steps involved in the creation of a new service.
• Entry cell definition: <strong>The</strong> first step to implement a service is to designate the cell of the TSCS that<br />
facilitates the client interface. This cell is known as Service Entry Cell (SEC). When the service involves<br />
more than one sub-system, the SEC is the TS central cell. When the scope of the service is limited to a given<br />
sub-system, the SEC is the sub-system central cell. Finally, when the service scope is limited to a single<br />
crate, the SEC is the corresponding crate cell.<br />
• Operation states: <strong>The</strong> second step is to identify the operation states. <strong>The</strong>se represent the stable states of the<br />
system under control that are to be monitored during the execution of the operation. For instance, a<br />
configuration operation intended to set up one single crate could have as many states as boards, and the<br />
successful configuration of each board could be represented as a different operation state.<br />
• Operation transition: Once the FSM states are known, the next step is to define the possible transitions<br />
among stable states and for each transition identify an event that triggers this transition.<br />
• Operation transition methods: For each FSM transition, the conditional and functional methods and<br />
associated parameters have to be defined. <strong>The</strong>se methods actually perform the system state change. In case the<br />
SEC is a crate cell, these methods use the hardware driver, located in the cell context, to modify the crate<br />
state. When the SEC is a central cell, these methods use the xhannel infrastructure to operate lower level<br />
cells and XDAQ applications, and to read monitoring information. New services may require new<br />
operations, commands and monitoring items in lower level cells. <strong>The</strong> developer of the SEC is responsible<br />
for coordinating the required developments in the lower level cells.<br />
• Service test: <strong>The</strong> last step of the process is to test the service.<br />
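The steps above amount to defining an FSM as data: stable states, the transitions between them, and a conditional plus a functional method per transition. The sketch below is illustrative (the real operations are C++ plug-ins in the TS framework, and the names here are hypothetical):

```python
# Sketch of the service development steps as data: stable states,
# transitions, and a conditional plus functional method per transition.
# Names are illustrative; the real operations are C++ plug-ins.

class Operation:
    def __init__(self, initial):
        self.state = initial
        self.transitions = {}   # (state, event) -> (target, cond, func)

    def add_transition(self, src, event, dst, cond, func):
        self.transitions[(src, event)] = (dst, cond, func)

    def fire(self, event, **params):
        dst, cond, func = self.transitions[(self.state, event)]
        if not cond(**params):        # conditional method guards the move
            return False
        func(**params)                # functional method changes the system
        self.state = dst
        return True

op = Operation("halted")
op.add_transition("halted", "configure", "configured",
                  cond=lambda key: key.startswith("TSC"),
                  func=lambda key: None)
ok = op.fire("configure", key="TSC_KEY_1")
```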
Although changes to the L1 decision loop hardware and associated software platforms are expected during the<br />
operational life of the experiment, these changes may occur independently of the requirement of new services or<br />
the evolution of existing ones. <strong>The</strong> TS system is a software infrastructure that provides a stable abstraction of<br />
the L1 decision loop despite hardware and software upgrades.<br />
<strong>The</strong> stable layer of the TS system enables the development of new services to be coordinated simply by following a<br />
well defined methodology, with very limited knowledge of the TS framework internals and independently of<br />
hardware and software platform upgrades. This approach to coordinating the development of new L1 operation<br />
capabilities fits the professional background and experience of managers and technical coordinators well.<br />
Chapter 6 presents the result of applying this methodology to implement the configuration and interconnection<br />
test services outlined in Section 3.3.3.<br />
Figure 5-14: Architecture of the TS logging system.<br />
Figure 5-15: Architecture of the TS start-up system.<br />
Figure 5-16: TS services development model: entry cell definition, operation states, operation transitions, operation transition methods, service test.<br />
Chapter 6<br />
<strong>Trigger</strong> <strong>Supervisor</strong> Services<br />
6.1 Introduction<br />
<strong>The</strong> TS services are the final <strong>Trigger</strong> <strong>Supervisor</strong> functionalities developed on top of the TS control and<br />
monitoring systems. <strong>The</strong>y have been implemented following the TS services development process described in<br />
Section 5.5. <strong>The</strong> functional descriptions outlined in Section 3.3.3 were initial guidelines. <strong>The</strong> logging and start-up<br />
systems directly provide the corresponding final services and do not require any further customization beyond<br />
the system integration presented in Section 5.4.<br />
Guided by the “controller decoupling” non-functional requirement presented in Section 3.2.2, Point 3), the TS<br />
services were totally implemented on top of the TS system and did not require the implementation of any<br />
functionality on the controller side. This approach to implement the TS services simplified the development of<br />
controller applications, and it eased the deployment and maintenance of the TS system and services.<br />
<strong>The</strong> goal of this chapter is to describe for each different service: the functionality seen by an external controller,<br />
the internal implementation details from the TS system point of view, and finally, the service operational use<br />
cases.<br />
This chapter has been organized in the following sections: Section 6.1 is the introduction, the configuration<br />
service is presented in Section 6.2, Section 6.3 is dedicated to the interconnection test service, Section 6.4<br />
describes the monitoring service, and finally, Section 6.5 presents the graphical user interfaces.<br />
6.2 Configuration<br />
6.2.1 Description<br />
<strong>The</strong> TS configuration service facilitates setting up the L1 trigger hardware. It defines the content of the<br />
configurable items: FPGA firmware, LUTs, memories and registers. Figure 6-1 illustrates the client point of<br />
view to operate the L1 trigger with this service. In general, the TS Control System (TSCS) provides two<br />
interfaces to access the TS services: a SOAP based protocol for remote procedure calls (Appendix A) and the TS<br />
GUI based on the HTTP/CGI protocol (Section 4.4.4.11 and Section 6.5). Both interfaces to the central cell<br />
expose all TS services. <strong>The</strong> following description presents the service operation instructions without the SOAP<br />
or HTTP/CGI details.<br />
Up to eight remote clients can use this service simultaneously in order to set up the L1 trigger and the TTC<br />
system (Sections 1.3.2 and 1.4.5). <strong>The</strong> first client that connects to the central cell initiates a configuration<br />
operation and executes the first transition configure with a key assigned to the TSC_KEY parameter. <strong>The</strong> key<br />
corresponds to a full configuration of the L1 trigger which is common for all DAQ partitions. When the<br />
configure transition finalizes, the L1 trigger system should be in a well defined working state. Additional clients<br />
attempting to operate with the configuration service have to initiate another configuration operation and also to
execute the configure transition. To avoid configuration inconsistencies, these additional clients have to provide<br />
the same configuration TSC_KEY parameter, otherwise they are not allowed to reach the configured state.<br />
All clients can execute the partition transition with a second key assigned to the TSP_KEY parameter and the<br />
run number assigned to the Run Number parameter. This key identifies the configurable parameters of the L1<br />
decision loop which are exclusive to the DAQ partition that the corresponding client is controlling. <strong>The</strong><br />
following list presents these parameters:<br />
• TTC vector: This 32 bit vector identifies the TTC partitions assigned to a DAQ partition.<br />
• DAQ partition: This number from 0 to 7 defines the DAQ partition.<br />
• Final-Or vector: This vector defines which algorithms of the trigger menu (128 bits) and technical triggers<br />
(64 bits) should be used to trigger a DAQ partition.<br />
• BX Table: This table defines which bunch crossings should be used for triggering and which fast and<br />
synchronization signals should be sent to the TTC partitions belonging to one DAQ partition.<br />
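The widths given above (a 32-bit TTC vector, a partition number from 0 to 7, a 128-bit algorithm mask plus a 64-bit technical-trigger mask) can be captured in a small sketch. The container and function name are illustrative, not the actual TSP_KEY representation:

```python
# Sketch of the DAQ-partition parameters listed above, with the widths
# given in the text: a 32-bit TTC vector, a partition number 0-7 and a
# 128+64 bit Final-OR vector. The container and checks are illustrative.

def make_partition_key(ttc_vector, daq_partition, algo_mask, tech_mask):
    assert 0 <= ttc_vector < 2**32        # one bit per TTC partition
    assert 0 <= daq_partition <= 7        # eight DAQ partitions
    assert 0 <= algo_mask < 2**128        # trigger menu algorithms
    assert 0 <= tech_mask < 2**64         # technical triggers
    return {"ttc_vector": ttc_vector,
            "daq_partition": daq_partition,
            "final_or": (algo_mask, tech_mask)}

key = make_partition_key(0b101, 0, algo_mask=1 << 10, tech_mask=0)
```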
Figure 6-1: Client point of view of the TS configuration service (operation states: halted, configured, partitioned, enabled, suspended).<br />
<strong>The</strong> enable transition starts the corresponding DAQ partition controller in the TCS module. <strong>The</strong> suspend<br />
transition temporarily stops the partition controller without resetting the associated counters. <strong>The</strong> resume<br />
transition facilitates the recovery of the normal running state. Finally, the stop transition, which can be executed<br />
from either the suspended or enabled states, stops the DAQ partition controller and resets all associated<br />
counters.<br />
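The transition rules described above can be written out as a table of legal moves between the stable states. The target state after stop is an assumption (the text only says the controller is stopped and its counters reset):

```python
# The transition rules described above as a table: which events are
# legal in each stable state of the configuration operation.
# The target state after 'stop' is an assumption.

TRANSITIONS = {
    ("halted", "configure"): "configured",
    ("configured", "partition"): "partitioned",
    ("partitioned", "enable"): "enabled",
    ("enabled", "suspend"): "suspended",
    ("suspended", "resume"): "enabled",
    ("enabled", "stop"): "partitioned",    # assumption: stop returns here
    ("suspended", "stop"): "partitioned",  # stop allowed from both states
}

def next_state(state, event):
    return TRANSITIONS.get((state, event))   # None if the move is illegal

path = ["halted"]
for event in ("configure", "partition", "enable", "suspend", "resume", "stop"):
    path.append(next_state(path[-1], event))
```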
6.2.2 Implementation<br />
<strong>The</strong> configuration service requires the collaboration of the TSCS nodes, the Luminosity Monitoring Software<br />
System (LMSS), the sub-detectors supervisory and control systems (SSCS), and the usage of the L1 trigger<br />
configuration databases. All involved nodes are shown in Figure 6-2.
Figure 6-2: Distributed software and hardware system involved in the implementation of the TS<br />
configuration and interconnection test services.<br />
6.2.2.1 Central cell<br />
<strong>The</strong> role of the central cell in the configuration services is twofold: to facilitate the remote client interface<br />
presented in Section 6.2.1 and to coordinate the operation of all involved nodes. Both the interface to the client<br />
and the system coordination are defined by the configuration operation installed in the central cell (Figure 6-1).<br />
This section describes the stable states, and the functional (f_i) and conditional (c_i) methods of the central cell<br />
configuration operation transitions (Section 4.3.1).<br />
Initialization()<br />
This method stores the session_id parameter in an internal variable of the configuration operation instance.<br />
This number will be propagated to lower level cells when a cell command or operation is instantiated. <strong>The</strong><br />
session_id is attached to every log record in order to help identify which client directly or indirectly executed a<br />
given action in a cell of the TSCS.<br />
Configure_c()<br />
<strong>The</strong> conditional method of the configure transition checks whether this is the first configuration operation<br />
instance. If this is the case, this method disables the isConfigured flag, iterates over all cell xhannels accessible<br />
from the central cell and initiates a configuration operation in all trigger sub-system central cells with the same<br />
session_id provided by the client. If one of these configuration operations cannot be successfully started this<br />
method returns false, the functional method of the configure transition is not executed and the operation state<br />
stays halted. This method does not retrieve information from the configuration database.<br />
In case this is not the first configuration operation instance, this method checks whether the parameter TSC_KEY is<br />
equal to the TSC_KEY variable stored in the cell context. If the keys differ, the configure transition is not executed<br />
and the operation state stays halted. Otherwise, this method enables the isConfigured flag, returns true and the<br />
functional method of the configure transition is executed.
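The branching above can be sketched as follows (a minimal Python sketch; op, start_subsystem_op and the key arguments are illustrative stand-ins, not the actual TS framework interfaces):<br />

```python
def configure_c(op, start_subsystem_op, tsc_key, stored_tsc_key):
    """Conditional method of the configure transition (sketch).

    Returns True if the functional method may run; False leaves the
    operation in the 'halted' state.
    """
    if op.get("first_instance", True):
        op["isConfigured"] = False
        # Fan out: start a configuration operation in every trigger
        # sub-system central cell, propagating the client's session_id.
        for cell in op["subsystem_cells"]:
            if not start_subsystem_op(cell, op["session_id"]):
                return False  # one start failed: stay halted
        return True
    # Later instances: proceed only if the client supplies the same key.
    if tsc_key != stored_tsc_key:
        return False  # key mismatch: stay halted
    op["isConfigured"] = True
    return True
```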
<strong>Trigger</strong> <strong>Supervisor</strong> Services 90<br />
Configure_f()<br />
<strong>The</strong> functional method for this transition performs the following steps:<br />
1. If the isConfigured flag is false, the method executes steps 2, 3, 4 and 5. Otherwise this method does<br />
nothing.<br />
2. To read, from the TSC_CONF table of the central cell configuration database (Figure 6-3), the row with the<br />
unique identifier equal to TSC_KEY. This row contains as many identifiers as sub-systems have to be<br />
configured (sub-system keys). If a sub-system shall not be configured, the corresponding position in the<br />
TSC_KEY row is left empty.<br />
3. To execute the configure transition in each sub-system central cell sending as a parameter the sub-system<br />
key. This transition is not executed in those sub-systems with an empty key. Section 6.2.2.2 presents the<br />
configuration operation of the sub-system central and crate cells.<br />
4. To store in the cell context the current TSC_KEY.<br />
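Steps 2 to 4 can be sketched as follows (the TSC_CONF row is modelled as a plain dictionary mapping sub-system names to keys; the configure callable stands in for the remote configure transition of each sub-system central cell):<br />

```python
def configure_f(tsc_conf_row, configure, cell_context, tsc_key, is_configured):
    """Functional method of the configure transition (sketch).

    An empty key in the row means 'do not configure this sub-system'.
    Returns the list of sub-systems that were configured.
    """
    if is_configured:
        return []                       # step 1: nothing to do
    configured = []
    for subsystem, key in tsc_conf_row.items():
        if not key:                     # empty key: skip this sub-system
            continue
        configure(subsystem, key)       # step 3: remote configure transition
        configured.append(subsystem)
    cell_context["TSC_KEY"] = tsc_key   # step 4: remember the current key
    return configured
```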
[Figure 6-3 schema: the main table TSC_CONF holds one row per TSC_KEY, with one sub-system key per column (GT_KEY, GMT_KEY, DTTF_KEY, CSCTF_KEY, GCT_KEY, RCT_KEY, RPCTrig_KEY, ECAL_TPG_KEY, HCAL_TPG_KEY, DT_TPG_KEY); each key points into a sub-system table such as GT_CONF, GMT_CONF, DTTF_CONF or CSCTF_CONF, and GTL_CONFIG further references GTL_KEY, GTL_FW_KEY, GTL_REG_KEY, GTL_SEQ_KEY and URL_TRIG_MENU.]<br />
Figure 6-3: <strong>The</strong> L1 configuration database is organized hierarchically. <strong>The</strong> main table is named<br />
TSC_CONF.<br />
Partition_c()<br />
This method performs the following steps:<br />
1. To read from the TSP_CONF table (Figure 6-4) the row with the unique identifier equal to TSP_KEY. This row<br />
points to the hardware configuration parameters that affect only the given DAQ partition, namely: the 32-bit<br />
TTC vector, the DAQ partition identifier, the 128 + 64 bit Final-Or vector and the bunch crossing<br />
table.<br />
2. To use the GT cell commands to check that the DAQ partition and the TTC partitions are not being used. If<br />
there is an inconsistency, this method returns false, the functional method of the partition transition is not<br />
executed and the operation state stays configured. Section 6.2.2.3.1 presents the GT cell commands.<br />
Partition_f()<br />
This method performs the following steps:<br />
1. To read from TSP_CONF table the row with the unique identifier equal to TSP_KEY.<br />
2. To execute the GT cell commands (Section 6.2.2.3.1) in order to:<br />
a. Set up the DAQ partition dependent parameters retrieved in the first step.<br />
b. Reset the DAQ partition counters.<br />
c. Assign the Run Number parameter to the DAQ partition.
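The partition check above can be condensed into a sketch (the row fields follow the TSP_CONF columns of Figure 6-4, with TTC_VECTOR as a 32-bit mask; daq_in_use and ttc_in_use_mask are illustrative stand-ins for the GT cell status queries):<br />

```python
def partition_c(row, daq_in_use, ttc_in_use_mask):
    """Conditional method of the partition transition (sketch).

    Returns False, leaving the operation in the 'configured' state,
    if the TSP_CONF row is missing or the DAQ/TTC partitions are busy.
    """
    if row is None:
        return False
    if row["DAQ_PARTITION"] in daq_in_use:
        return False                     # DAQ partition already in use
    if row["TTC_VECTOR"] & ttc_in_use_mask:
        return False                     # a requested TTC partition is busy
    return True
```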
[Figure 6-4 schema: table TSP_CONF with columns TSP_KEY, TTC_VECTOR, FIN_OR, DAQ_PARTITION and BC_TABLE.]<br />
Figure 6-4: <strong>The</strong> database table that stores DAQ partition dependent parameters is named TSP_CONF.<br />
Enable_c()<br />
This method checks whether this is the first configuration operation instance. If this is the case, this method<br />
disables the isEnabled flag. Otherwise, this method enables the isEnabled flag and checks in all trigger<br />
sub-system central cells that the configuration operation is in the configured state.<br />
Enable_f()<br />
<strong>The</strong> functional method of the enable transition performs the following steps:<br />
1. If the isEnabled flag is disabled, the method executes steps 2 and 3. Otherwise this method only executes<br />
step 3.<br />
2. To execute the enable transition in the configuration operation of all sub-system central cells. This enables<br />
the trigger readout links with the DAQ system and the LMS software.<br />
3. To execute the GT cell commands to start the DAQ partition controller in the TCS module.<br />
Suspend_c()<br />
This method checks nothing.<br />
Suspend_f()<br />
This method executes in the GT cell a number of commands that simulate a busy sTTS signal (Section 1.3.2.4) in<br />
the corresponding DAQ partition. <strong>The</strong> procedure stops the generation of L1A’s and TTC commands in this DAQ<br />
partition. Section 6.2.2.3 presents these commands.<br />
Resume_c()<br />
This method checks nothing.<br />
Resume_f()<br />
This method executes in the GT cell a command that disables the simulated busy sTTS signal that was enabled in<br />
the functional method of the suspend transition. Section 6.2.2.3.1 presents these commands.<br />
Stop_c()<br />
This method checks nothing.<br />
Stop_f()<br />
This method executes in the GT cell the command to stop a given DAQ partition (Section 6.2.2.3.1).<br />
Destructor()<br />
This method is executed when the remote client finishes using the configuration operation service and destroys<br />
the configuration operation instance. <strong>The</strong> destructor method of the last configuration operation destroys the<br />
configuration operations running in the sub-system central cells. This stops the trigger readout links with the<br />
DAQ system and the LMS software.
6.2.2.2 <strong>Trigger</strong> sub-systems<br />
Each trigger crate is configured by a configuration operation running on a dedicated cell for that crate (Section<br />
5.3.2.1.2). A configuration operation provided by the sub-system central cell coordinates the operation over all<br />
crate cells. When a trigger sub-system consists of one single crate, the central cell and the crate cell are the same.<br />
A complete description of all integration scenarios was presented in Section 5.3.2.2.<br />
Figure 6-5 shows the configuration operation running in all trigger sub-system cells. <strong>The</strong> description of the<br />
functional and conditional methods depends on whether it is a crate cell or not. This is a generic description that<br />
can be applied to any trigger sub-system. It is not meant to provide the specific hardware configuration details of<br />
a concrete trigger sub-system. Specific sub-system configuration details can be checked in the code itself [96].<br />
This section describes the stable states, and the functional (f_i) and conditional (c_i) methods of the trigger<br />
sub-system cell configuration operation transitions. This description includes the sub-system central and crate cell<br />
cases.<br />
[Figure 6-5 state machine: OpInit(“configuration”, “session_id”, “opid”) creates the operation in the halted state; configure(“opid”, KEY) leads to configured, enable(“opid”) to enabled, suspend(“opid”) to suspended, and resume(“opid”) back to enabled.]<br />
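The state machine of Figure 6-5 can be sketched as a transition table (a minimal Python sketch; the conditional argument stands for the c_i methods described below, which may veto a transition):<br />

```python
# Allowed transitions of the sub-system configuration operation,
# with states and triggers as in Figure 6-5.
TRANSITIONS = {
    ("halted", "configure"): "configured",
    ("configured", "enable"): "enabled",
    ("enabled", "suspend"): "suspended",
    ("suspended", "resume"): "enabled",
}

def step(state, transition, conditional=lambda: True):
    """Run one transition; if it is undefined for the current state or
    the conditional method returns False, the operation stays put."""
    target = TRANSITIONS.get((state, transition))
    if target is None or not conditional():
        return state
    return target
```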
Initialization()<br />
This method stores the session_id parameter in an internal variable of the configuration operation instance. If<br />
the current operation instance was started by the central cell, the session_id is the same as the one provided by<br />
the central cell client.<br />
Configure_c()<br />
If the operation runs in the trigger sub-system central cell, this method iterates over all available cell xhannels<br />
and initiates a configuration operation in all crate cells and TTCci cells (if the trigger sub-system has a TTCci<br />
board). If the operation runs in a crate cell, this method checks if the hardware is accessible using the hardware<br />
driver.<br />
If one of these configuration operations cannot be successfully started or the hardware is not accessible, this<br />
method returns false, the functional method of the configure transition is not executed and the operation state<br />
stays halted.<br />
Configure_f()<br />
Figure 6-5: <strong>Trigger</strong> sub-system configuration operation.<br />
<strong>The</strong> functional method for this transition performs the following steps:<br />
1. To read from the trigger sub-system configuration database the row with the unique identifier equal to KEY.<br />
If the operation runs in the trigger sub-system central cell, this row contains as many identifiers as crate<br />
cells. If a crate cell is not going to be configured, the corresponding position in the KEY row is left empty. If<br />
the operation runs in a crate cell, this row contains configuration information, links to firmware or look up<br />
table (LUT) files and/or references to additional configuration database tables. Section 6.2.2.3 presents the<br />
GT configuration database example.<br />
2. If the operation runs in the trigger sub-system central cell, this method executes in each crate cell and TTCci<br />
cell the configure transition sending as a parameter the crate or TTCci key. If the operation runs in a crate<br />
cell, the configuration information is retrieved from the configuration database using the database xhannel.<br />
<strong>The</strong> crate is configured with this information using the hardware driver.
Enable_c()<br />
If the operation runs in the trigger sub-system central cell, this method iterates over all available cell xhannels<br />
and checks if the current state is configured. If the operation runs in a crate cell, this method checks if the<br />
hardware is accessible using the hardware driver.<br />
If one of these configuration operations is not in the configured state or the hardware is not accessible, this<br />
method returns false, the functional method of the enable transition is not executed and the operation state<br />
stays configured.<br />
Enable_f()<br />
If the operation runs in the trigger sub-system central cell, this method iterates over all available cell xhannels<br />
and executes the enable transition. If the operation runs in a crate cell, this method configures the hardware in<br />
order to enable the readout link with the DAQ system.<br />
Suspend_c()<br />
If the operation runs in the trigger sub-system central cell, this method iterates over all available cell xhannels<br />
and checks if the current state is enabled. If the operation runs in a crate cell, this method checks if the hardware<br />
is accessible using the hardware driver.<br />
If one of these configuration operations is not in the enabled state or the hardware is not accessible, this method<br />
returns false, the functional method of the suspend transition is not executed and the operation state stays<br />
enabled.<br />
Suspend_f()<br />
If the operation runs in the trigger sub-system central cell, this method iterates over all available cell xhannels<br />
and executes the suspend transition. If the operation runs in a crate cell, this method configures the hardware in<br />
order to disable the readout link with the DAQ system.<br />
Resume_c()<br />
If the operation runs in the trigger sub-system central cell, this method iterates over all available cell xhannels<br />
and checks if the current state is suspended. If the operation runs in a crate cell, this method checks if the<br />
hardware is accessible using the hardware driver.<br />
If one of these configuration operations is not in the suspended state or the hardware is not accessible, this<br />
method returns false, the functional method of the resume transition is not executed and the operation state<br />
stays suspended.<br />
Resume_f()<br />
If the operation runs in the trigger sub-system central cell, this method iterates over all available cell xhannels<br />
and executes the resume transition. If the operation runs in a crate cell, this method configures the hardware in<br />
order to enable again the readout link with the DAQ system.<br />
Destructor()<br />
<strong>The</strong> destructor method of the trigger sub-system central cell configuration operation is executed by the destructor<br />
method of the last TS central cell configuration operation. If the operation runs in the trigger sub-system central<br />
cell, this method iterates over all available cell xhannels and destroys all configuration operations. If the<br />
operation runs in a crate cell, this method configures the hardware in order to disable the readout link with the<br />
DAQ system.<br />
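The central-versus-crate branching that recurs in all of these methods can be sketched with the enable transition as an example (Python; xhannels and hardware are illustrative stand-ins for the cell xhannel list and the hardware driver):<br />

```python
def enable_f(is_central, xhannels, hardware):
    """Functional method of the enable transition (sketch).

    In the sub-system central cell the call fans out over the cell
    xhannels; in a crate cell it acts on the hardware driver directly.
    """
    if is_central:
        for xhannel in xhannels:
            xhannel("enable")            # forward the transition
        return "forwarded"
    hardware["readout_link"] = True      # enable the DAQ readout link
    return "enabled"
```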
6.2.2.3 Global <strong>Trigger</strong><br />
<strong>The</strong> GT cell operates the GT where L1A decisions are taken based on trigger objects delivered by the GCT and<br />
the GMT (Section 1.3.2.3). <strong>The</strong> GT cell plays a special role in the configuration of the L1 trigger. It facilitates a<br />
set of cell commands used by the central cell configuration operation and an implementation of the trigger<br />
sub-system configuration operation presented in Section 6.2.2.2. This section presents the interface of the GT cell<br />
[97] involved in the configuration and the interconnection test services.
6.2.2.3.1 Command interface<br />
<strong>The</strong> GT command interface is used by the configuration and interconnection test operations running in the<br />
central cell, and also by the GT control panel (Section 6.5.1). <strong>The</strong> command interface has been mostly designed<br />
according to the needs of these clients. <strong>The</strong> commands can be classified as a function of the GT boards: <strong>Trigger</strong><br />
Control System (TCS), Final Decision Logic (FDL) and Global <strong>Trigger</strong> Logic (GTL).<br />
FDL commands<br />
<strong>The</strong> FDL is one of the GT modules that are configured during the partition transition of the central cell<br />
configuration operation (Section 6.2.2.1). For instance, to set up the Final-Or of the FDL for a given DAQ<br />
partition, to monitor the L1A rate counters for each of the 192 L1A’s (FDL slice) coming from the GTL or to<br />
apply a pre-scaler to a certain algorithm or technical trigger.<br />
Number of slice (xdata::UnsignedShort): the number of FDL slices depends on the firmware. Currently 192<br />
slices are foreseen on the FDL, so valid values are [0:191].<br />
DAQ partition (xdata::UnsignedShort): the number of DAQ partitions is 8, so valid values are [0:7].<br />
Pre-scale factor (xdata::UnsignedLong): the pre-scaler value for a slice is held in a 16 bit register, so the range<br />
of valid values is [0:65535].<br />
Update step size (xdata::UnsignedLong): the update step size is held in a 16 bit register, so the range of valid<br />
values is [0:65535].<br />
Bit for refresh rate (xdata::UnsignedShort): each of 8 bits refers to a different multiplicity defined in the FDL<br />
firmware, so valid values are [0:7].<br />
Table 6-1: Description of parameters used in FDL commands.<br />
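The valid ranges of Table 6-1 can be captured in a small validation helper (a sketch; the parameter names are shortened for illustration):<br />

```python
# Valid ranges for the FDL command parameters of Table 6-1;
# the upper bounds follow from the register widths quoted there.
FDL_PARAM_RANGES = {
    "slice": range(0, 192),               # 192 FDL slices in current firmware
    "daq_partition": range(0, 8),         # 8 DAQ partitions
    "prescale_factor": range(0, 65536),   # 16-bit register
    "update_step_size": range(0, 65536),  # 16-bit register
    "refresh_rate_bit": range(0, 8),      # one of 8 multiplicity bits
}

def validate(params):
    """Return the names of parameters whose values are out of range."""
    return [name for name, value in params.items()
            if value not in FDL_PARAM_RANGES[name]]
```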
SetFinOrMask:<br />
Description: Each slice can be added to the Final-Or of one or more DAQ partitions. This command adds or<br />
removes a specific slice to or from a DAQ partition’s Final-Or according to the ”Enable for Final-Or” parameter.<br />
Parameters: Number of slice, Number of DAQ partition, Enable for Final-Or<br />
Return value: Slice number: ”Number of slice” ”enabled/disabled” for Final-Or in DAQ partition<br />
number: ”Number of DAQ partition”<br />
GetFinOrMask:<br />
Description: Reads out whether a slice is currently part of the Final-Or of a certain DAQ partition.<br />
Parameters: Number of slice, Number of DAQ partition<br />
Return value: xdata::Boolean<br />
SetVetoMask:<br />
Description: Each slice can suppress a L1A for one or more DAQ partitions. This command enables or disables<br />
that mechanism for a given slice and DAQ partition.<br />
Parameters: Number of slice, Number of DAQ partition, Enable for veto<br />
Return value: Slice number: ”Number of slice” ”enabled/disabled” as veto for DAQ partition number:<br />
”Number of DAQ partition”<br />
GetVetoMask:<br />
Description: Reads out if a certain slice is currently defined as veto for a certain DAQ partition.<br />
Parameters: Number of slice, Number of DAQ partition<br />
Return value: xdata::Boolean<br />
SetPrescaleFactor:<br />
Description: To control L1A rates that are too high, a pre-scale factor can be applied individually for each<br />
slice. Setting the factor to 0 or 1 does no pre-scaling.<br />
Parameters: Number of slice, Pre-scale factor<br />
Return value: Pre-scale factor of slice number: ”Number of slice” set to: ”Pre-scale factor”<br />
GetPrescaleFactor:<br />
Description: Reads out the pre-scale factor for a certain FDL slice.<br />
Parameters: Number of slice<br />
Return value: xdata::UnsignedLong<br />
ReadRateCounter:<br />
Description: Reads out the rate counter for a certain slice.<br />
Parameters: Number of slice<br />
Return value: xdata::UnsignedLong<br />
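The pre-scaling semantics noted for SetPrescaleFactor (a factor of 0 or 1 means no pre-scaling) can be illustrated as follows; the keep-every-n-th-L1A interpretation is an assumption for illustration:<br />

```python
def prescale(l1a_count, factor):
    """Number of L1As that pass a slice's pre-scaler (sketch).

    A factor of 0 or 1 disables pre-scaling; a factor n is assumed
    to keep every n-th L1A.
    """
    if factor in (0, 1):
        return l1a_count     # no pre-scaling applied
    return l1a_count // factor
```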
SetUpdateStepSize:<br />
Description: Sets the common step size for the reset period of all rate counters.<br />
Parameters: Update step size<br />
Return value: Update step size set to: ”Update step size”<br />
SetUpdatePeriod:<br />
Description: Sets the update period of the rate counters for a certain slice, based on the common update step<br />
size. <strong>The</strong> update period is chosen by setting a register. Each register bit corresponds to a factor the common<br />
update period is multiplied with. An array in the code of the command maps bit numbers to multiplicities.<br />
Parameters: Number of slice, Bit for refresh rate<br />
Return value: Update period of slice number: ”Number of slice” set to: ”multiplicity”<br />
GetNumberOfAlgos:<br />
Description: Depending on the version of the firmware of the FDL chip, the number of Technical <strong>Trigger</strong>s<br />
(TT’s) may differ. This command gives back the number of TT’s currently implemented.<br />
Parameters: none<br />
Return value: xdata::UnsignedShort<br />
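The bit-to-multiplicity mapping described for SetUpdatePeriod might be sketched as follows; the powers of two in BIT_TO_MULTIPLICITY are hypothetical, since the actual array lives in the command code:<br />

```python
# Hypothetical bit-to-multiplicity map; the real array is defined in
# the SetUpdatePeriod command and may hold different factors.
BIT_TO_MULTIPLICITY = [1, 2, 4, 8, 16, 32, 64, 128]

def update_period(common_step_size, refresh_rate_bit):
    """Update period of a slice's rate counter: the common step size
    multiplied by the factor selected by the register bit."""
    return common_step_size * BIT_TO_MULTIPLICITY[refresh_rate_bit]
```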
TCS commands<br />
<strong>The</strong> <strong>Trigger</strong> Control System module (TCS) controls the distribution of L1A’s (Section 1.3.2.4). <strong>The</strong>refore, it<br />
plays a crucial role with respect to Data Acquisition and readout of the trigger components. <strong>The</strong> TCS command<br />
interface of the GT cell is used by the configuration operation running in the central cell (Section 6.2.2.1) and by<br />
the GT control panel (Section 6.5.1). This interface provides very fine grained control over the TCS module.<br />
Assigning TTC partitions to DAQ partitions, assigning time slots, controlling the random trigger generator and<br />
the generation of fast and synchronization signals, and loading predefined bunch crossing tables separately for<br />
each DAQ partition are tasks the command interface has to cope with.<br />
Commands of the TCS can be grouped into commands affecting more than one DAQ partition controller (PTC)<br />
and PTC dependent commands. <strong>The</strong> first group of commands therefore contains the prefix ”Master” whereas<br />
commands of the second group start with ”Ptc”. <strong>The</strong> second group of commands has the number of the PTC as a<br />
common parameter.<br />
DAQ partition (xdata::UnsignedShort): the number of DAQ partitions is 8, so valid values are [0:7].<br />
Number of PTC (xdata::UnsignedShort): for each DAQ partition there is a PTC implemented on the TCS chip,<br />
so valid values are [0:7].<br />
Detector partition (xdata::UnsignedShort): refers to one of 32 TTC partitions. Valid values are [0:31].<br />
Time slot (xdata::UnsignedShort): the time slot for a PTC is calculated from an 8 bit value. Valid values are<br />
[0:255].<br />
Random trigger frequency (xdata::UnsignedLong): the random frequency is calculated from a 16 bit register<br />
value. Valid values are [0:65535].<br />
Table 6-2: Description of parameters used in TCS commands.<br />
MasterSetAssignPart:<br />
Description: This command assigns a TTC partition to a DAQ partition. In case the TTC partition is already<br />
part of a DAQ partition it will be assigned to the new partition anyway.<br />
Parameters: Detector partition, DAQ partition<br />
Return value: Detector partition ”Detector partition” assigned to DAQ partition: ”DAQ partition”.<br />
MasterGetAssignPart:<br />
Description: Returns the number of the DAQ partition a certain TTC partition is part of.<br />
Parameters: Detector partition<br />
Return value: xdata::UnsignedShort<br />
MasterSetAssignPartEn:<br />
Description: This command enables or disables a TTC partition. Before a TTC partition can be assigned to a<br />
DAQ partition it has to be enabled.<br />
Parameters: Detector partition, Enable partition<br />
Return value: Detector partition enabled/disabled<br />
MasterGetAssignPartEn:<br />
Description: Reads out whether or not a certain TTC partition is enabled.<br />
Parameters: Detector partition<br />
Return value: xdata::Boolean<br />
MasterStartTimeSlotGen:<br />
Description: Depending on the registers that define the time slots for every DAQ partition, the time slot<br />
generator switches between the DAQ partitions in round robin mode. This command starts the time slot generator.<br />
Parameters: none<br />
Return value: Time slot generator started.<br />
PtcGetTimeSlot:<br />
Description: Returns the current time slot assignment for a certain PTC.<br />
Parameters: Number of PTC<br />
Return value: xdata::UnsignedShort<br />
PtcStartRnd<strong>Trigger</strong>:<br />
Description: Starts the random trigger generator for a specified PTC.<br />
Parameters: Number of PTC<br />
Return value: Random trigger generator started for DAQ partition controller ”number of PTC”<br />
PtcStopRnd<strong>Trigger</strong>:<br />
Description: Stops the random trigger generator for a specified PTC.<br />
Parameters: Number of PTC<br />
Return value: Random trigger generator stopped for DAQ partition controller ”number of PTC”<br />
PtcRndFrequency:<br />
Description: Sets the frequency of triggers generated by the random trigger generator for a specified PTC.<br />
Parameters: Number of PTC, Random trigger frequency<br />
Return value: Random frequency of partition group: ”number of PTC” set to: ”random trigger frequency”<br />
PtcGetRndFrequency:<br />
Description: Reads out the frequency of the random trigger generator for a PTC.<br />
Parameters: Number of PTC<br />
Return value: xdata::UnsignedLong<br />
PtcStartRun:<br />
Description: Starts a run for a PTC, by first resetting and starting the PTC and then sending a start run<br />
command pulse.<br />
Parameters: Number of PTC<br />
Return value: Run started for PTC: ”number of PTC”<br />
PtcStopRun:<br />
Description: Stops a run for a PTC.<br />
Parameters: Number of PTC<br />
Return value: Run stopped for PTC: ”number of PTC”<br />
PtcCalibCycle:<br />
Description: Starts a calibration cycle for the specified PTC.<br />
Parameters: Number of PTC<br />
Return value: Calibration cycle for DAQ partition ”number of PTC” started.<br />
PtcResync:<br />
Description: Manually starts a resynchronization procedure for the specified PTC.<br />
Parameters: Number of PTC<br />
Return value: Resynchronization procedure for DAQ partition ”number of PTC” initialized.<br />
PtcTracedEvent:<br />
Description: Manually sends a traced event for a specified PTC.<br />
Parameters: Number of PTC<br />
Return value: Traced event initiated for DAQ partition ”number of PTC”.<br />
PtcHwReset:<br />
Description: Manually sends a hardware reset to the PTC.<br />
Parameters: Number of PTC<br />
Return value: Hardware for DAQ partition ”number of PTC” has been reset.<br />
PtcResetPtc:<br />
Description: Resets the state machine of the PTC.<br />
Parameters: Number of PTC<br />
Return value: PTC ”number of PTC” reset.<br />
Other commands<br />
This section describes a number of commands not specifically implemented for a certain type of GT module but<br />
rather used during the initialization, for debugging or for filling the database with register data.<br />
Item (xdata::String): refers to a register item, defined in the HAL “AddressTable” file for a module. If the<br />
specified item is not found, HAL will throw an exception that is caught in the command.<br />
Offset (xdata::UnsignedInteger): the offset to the register address specified by an Item parameter according to<br />
the HAL “AddressTable” file. In case the offset gets too large, a HAL exception caught by the command will<br />
indicate that.<br />
Board serial number (xdata::String): only serial numbers of GT modules that are initialized will be accepted.<br />
<strong>The</strong> GetCrateStatus command returns a list of boards in the crate.<br />
Bus adapter (xdata::String): the GT cell only accepts bus adapters of type ”DUMMY” and ”CAEN”.<br />
“Module Mapper” File (xdata::String): the full path to the HAL “ModuleMapper” file has to be specified. If the<br />
file is not found, a HAL exception caught by the command will inform the user about that.<br />
“AddressTableMap” File (xdata::String): the full path to the HAL “AddressTableMap” file has to be specified.<br />
If the file is not found, a HAL exception caught by the command will inform the user about that.<br />
Table 6-3: Description of parameters used in the auxiliary commands.<br />
GtCommonRead:<br />
Description: This command was written to read out register values from any GT module in the crate. This is<br />
useful for debugging. When the offset parameter is used correctly, lines of memories can also be read out.<br />
Parameters: Item, Offset, Board serial number<br />
Return value: xdata::UnsignedLong<br />
GtCommonWrite:<br />
Description: Generic write access for all GT modules.<br />
Parameters: Item, Value, Offset, Board serial number<br />
Return value: Register value for item: ”Item” set to: ”Value” (offset=”Offset”) for board with serial<br />
number: ”board serial number”<br />
GtInitCrate:<br />
Description: <strong>The</strong> initialization of the GT crate object (GT crate software driver) is done during start-up<br />
of the cell application. If the creation of the crate object did not work correctly, or if another type of bus adapter<br />
or different HAL files should be used, this command is used. Only if the ”reinitialize crate” parameter is set to<br />
true is a new CellGTCrate object instantiated.<br />
Parameters: Module Mapper File, AddressTableMap File, Bus adapter, Reinitialize crate<br />
Return value: <strong>The</strong> GT crate has been initialized with ”bus adapter” bus adapter.<br />
Board with serial nr.: ”board1 serial number” in slot nr. ”board1 slot number”<br />
Board with serial nr.: ”board2 serial number” in slot nr. ”board2 slot number”<br />
GtGetCrateStatus:<br />
Description: <strong>The</strong> crate object dynamically creates associative maps during its initialization, in which<br />
information about the modules in the crate is put. This information can be read out using this command.<br />
Parameters: Module Mapper File, AddressTableMap File, Bus adapter, Reinitialize crate<br />
Return value: <strong>The</strong> GT crate has been initialized with ”bus adapter” bus adapter.<br />
Board with serial nr.: ”board1 serial number” in slot nr. ”board1 slot number”<br />
Board with serial nr.: ”board2 serial number” in slot nr. ”board2 slot number”<br />
GtInsertBoardRegistersIntoDB:<br />
Description: This command reads out all registers for a specified GT module that are in the configuration<br />
database and inserts a row of values with a unique identifier and optionally a description into the corresponding<br />
GT configuration database table.<br />
Parameters: Board serial number, Primary Key, Description<br />
Return value: Register values have been read from the hardware and inserted into table ”Name of Register<br />
Table” with Primary Key: ”Primary Key”<br />
6.2.2.3.2 Configuration operation and database<br />
<strong>The</strong> configuration operation of the GT cell is interesting for two reasons: it is responsible for configuring<br />
the GT hardware that is common to all DAQ partitions, and it serves as an example of a configuration operation<br />
defined for a trigger sub-system crate cell (Section 6.2.2.2). This section describes in detail the<br />
functional method of the configure transition for this operation and the GT configuration database.<br />
Figure 6-6: Flow diagram of the configure transition functional method.<br />
Configure_f()<br />
<strong>The</strong> flow diagram for this method is shown in Figure 6-6. <strong>The</strong> method performs the following steps:<br />
1. To retrieve a row from the main table of the GT configuration database named GT_CONFIG (Figure 6-7).<br />
This row is identified by the key that is given as a parameter to the operation. If a certain board should not<br />
be configured at all, the corresponding entry in the GT_CONFIG table has to be left empty.<br />
2. To loop over all boards in the GT crate in order to log those not found.
<strong>Trigger</strong> <strong>Supervisor</strong> Services 102<br />
Figure 6-7: Main table of the GT configuration database.<br />
3. For all boards that are initialized, the BOARD_FIRMWARE table, shown in Figure 6-8, is retrieved. An<br />
attempt is made to load new firmware if the version number of the current firmware does not match the<br />
firmware version of the configuration.<br />
4. <strong>The</strong> same loop is executed over all possible board memories found in the BOARD_MEMORIES table. Empty<br />
links are omitted just like above.<br />
Figure 6-8: Each BOARD_CONFIG table references a set of sub tables.
5. <strong>The</strong> register table for each board is retrieved. If this table is empty because of a missing link, a warning<br />
message is issued, because loading registers is essential to put the hardware into a well defined state.<br />
6. Finally, the method attempts to download a sequencer file for every board. This sequencer file can be used to<br />
write values into a set of registers.<br />
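The six steps above can be condensed into a short sketch (illustrative Python is used for brevity, although the TS framework is implemented in C++; the GT_CONFIG content, the board names and the log convention are all invented):<br />

```python
# Toy in-memory stand-in for the GT configuration database of Figures 6-6 to 6-8.
GT_CONFIG = {
    "key42": {
        "L1_BOARD": {"firmware": "v2.1",
                     "registers": {"MASK": 0xFF},
                     "memories": {"BX_TABLE": [0, 1]},
                     "sequencer": None},
        "SPARE_BOARD": None,  # empty entry: this board must not be configured
    }
}

def configure_f(key, crate, log):
    row = GT_CONFIG[key]                          # 1. retrieve the GT_CONFIG row
    for board in crate:                           # 2. loop over crate boards
        cfg = row.get(board["name"])
        if cfg is None:
            log.append("skip %s (empty entry)" % board["name"])
            continue
        if board["firmware"] != cfg["firmware"]:  # 3. reload firmware on mismatch
            board["firmware"] = cfg["firmware"]
            log.append("firmware %s -> %s" % (board["name"], cfg["firmware"]))
        board["memories"] = dict(cfg["memories"]) # 4. load board memories
        if not cfg["registers"]:                  # 5. registers are essential
            log.append("WARNING: no register table for %s" % board["name"])
        else:
            board["registers"] = dict(cfg["registers"])
        if cfg["sequencer"]:                      # 6. optional sequencer file
            log.append("sequencer for %s" % board["name"])
    return log
```

A board whose GT_CONFIG entry is empty is skipped and only logged, mirroring step 2 above.<br />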
6.2.2.4 Sub-detector cells<br />
HCAL and ECAL sub-detectors have just one cell each (Section 5.3.2.2.6). <strong>The</strong> configuration operation<br />
customized by the sub-detector cells is the same as for the trigger cells (Section 6.2.2.2). <strong>The</strong> configuration<br />
operation of the sub-detector cell only performs work during the execution of the functional method of its<br />
configure transition. This method sets the sub-detector TPG configuration key in an internal variable of the<br />
sub-detector cell. However, the sub-detector cell is not responsible for actually setting up the hardware. Instead,<br />
when the sub-detector FM requests the configuration of the TPG (Section 1.4.5), the sub-detector supervisory<br />
system performs the following sequence:<br />
1. It reads the key using a dedicated cell command of the sub-detector cell.<br />
2. It uses this key to retrieve the hardware configuration from the sub-detector configuration database.<br />
3. It configures the TPG hardware.<br />
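This three-step sequence can be sketched as follows (illustrative Python; the cell command name, the database content and the hardware representation are assumptions, not the actual sub-detector API):<br />

```python
class SubDetectorCell:
    """Holds the TPG configuration key set by the configure transition."""
    def __init__(self):
        self.tpg_key = None
    def get_tpg_key(self):            # hypothetical dedicated cell command
        return self.tpg_key

# invented sub-detector configuration database content
SUBDET_CONFIG_DB = {"tpg_key_7": {"thresholds": [1, 2, 3]}}

def configure_tpg(cell, hardware):
    key = cell.get_tpg_key()          # 1. read the key from the cell
    config = SUBDET_CONFIG_DB[key]    # 2. retrieve the hardware configuration
    hardware.update(config)           # 3. configure the TPG hardware
    return hardware
```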
6.2.2.5 Luminosity monitoring system<br />
<strong>The</strong> Luminosity Monitoring System (LMS) cell implements a configuration operation which resets the LMS<br />
software (Section 5.3.2.2.8) during the functional method of its enable transition. This method announces that<br />
the trigger system is running and that the LMS readout software can be started. <strong>The</strong> destructor method of the LMS<br />
configuration operation stops the LMS software. <strong>The</strong>refore, the LMS system remains enabled as long as there is at<br />
least one configuration operation instance running in the central cell.<br />
6.2.3 Integration with the Run Control and Monitoring System<br />
<strong>The</strong> Experiment Control System (ECS) presented in Section 1.4 coordinates the operation of all detector<br />
sub-systems, among them the L1 decision loop. <strong>The</strong> interface between the central node of the ECS and each of<br />
the sub-systems is the First Level Function Manager (FLFM), which is basically a finite state machine.<br />
Figure 6-9 shows the state diagram of the FLFM. It consists of solid and dashed ellipses that symbolize states. <strong>The</strong><br />
solid ellipses are steady states that are exited only if a command arrives from the central node of the ECS or an<br />
error occurs. <strong>The</strong> dashed ellipses are transitional states, which execute instructions on the sub-system<br />
supervisors and self-trigger a transition to the next steady state upon completion of the work. <strong>The</strong> command<br />
Interrupt may force the transition to Error from a transitional state. <strong>The</strong> transitions themselves are instantaneous<br />
and guaranteed to succeed, as no execution of instructions takes place. <strong>The</strong> entry point to the state machine is the<br />
Initial state [98].<br />
This FLFM has to be customized by each sub-system. This customization consists of implementing the code of<br />
the main transitional states. For the L1 decision loop, the code for the Configuring, Starting, Pausing,<br />
Resuming and Stopping states has been defined. This definition uses the TS SOAP API described in Appendix A<br />
to access the TS configuration service. In this context, the FLFM acts as a client of the TS.<br />
During the configuring state, the FLFM instantiates a configuration operation in the central cell of the TS and<br />
executes the configure and the partition transitions.<br />
During the starting state, the FLFM executes the enable transition.<br />
During the pausing state, the FLFM executes the suspend transition.<br />
During the resuming state, the FLFM executes the resume transition.<br />
Finally, the FLFM stopping state executes the stop transition.<br />
<strong>The</strong> parameters TSC_KEY, TSP_KEY and Run Number are passed during the corresponding transitions<br />
(Section 6.2.2.1).
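The mapping from FLFM transitional states to TS transitions can be sketched as follows (illustrative Python; the real FLFM drives the TS through the SOAP API of Appendix A, and the RUN_NUMBER parameter name and the stub interface below are invented stand-ins):<br />

```python
# FLFM transitional state -> TS transitions, with the parameters each one needs.
FLFM_ACTIONS = {
    "Configuring": [("configure", ["TSC_KEY"]), ("partition", ["TSP_KEY"])],
    "Starting":    [("enable", ["RUN_NUMBER"])],
    "Pausing":     [("suspend", [])],
    "Resuming":    [("resume", [])],
    "Stopping":    [("stop", [])],
}

class TSClientStub:
    """Records the transitions the FLFM would execute on the central cell."""
    def __init__(self):
        self.calls = []
    def execute(self, transition, params):
        self.calls.append((transition, params))

def run_transitional_state(state, ts, keys):
    for transition, wanted in FLFM_ACTIONS[state]:
        ts.execute(transition, {k: keys[k] for k in wanted})
```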
Figure 6-9: Level-1 function manager state diagram.
Interconnection test 105<br />
6.3 Interconnection test<br />
6.3.1 Description<br />
Due to the large number of communication channels between the trigger primitive generator (TPG) modules of<br />
the sub-detectors and the trigger system, and between the different trigger sub-systems, it is necessary to provide<br />
an automatic testing mechanism. <strong>The</strong> interconnection test service of the <strong>Trigger</strong> <strong>Supervisor</strong> is intended to<br />
automatically check the connections between sub-systems.<br />
From the client point of view, the interconnection test service is another operation running in the TS central cell.<br />
Figure 6-10 shows the state machine of the interconnection test operation. <strong>The</strong> client of the interconnection test<br />
service initiates an interconnection test operation in the central cell and executes the first transition prepare with<br />
a key assigned to the IT_KEY parameter and an optional second string assigned to the custom parameter. This<br />
transition prepares the L1 trigger hardware and the TS system for the starting of the interconnection test. <strong>The</strong><br />
start transition enables the starting of the test. Finally, the client executes the analyze transition to get the test<br />
result from the sub-system central cells.<br />
Figure 6-10 content: OpInit(“interconnectionTest”, “session_id”, “opid”) creates the operation in the halted<br />
state; prepare(IT_KEY, “custom”) leads to prepared, start(“opid”) to started, analyze(“opid”) to analyzed,<br />
and resume(“opid”) returns the operation from analyzed to started.<br />
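The client-side sequence can be sketched with a toy stand-in for the central cell operation (illustrative Python; the state names and transitions follow Figure 6-10, while the object interface and the result payload are invented):<br />

```python
class InterconnectionTestOp:
    """Toy stand-in for the interconnection test operation of the central cell."""
    def __init__(self, session_id):
        self.session_id = session_id
        self.state = "halted"
    def prepare(self, it_key, custom=""):
        assert self.state == "halted"
        self.it_key, self.custom, self.state = it_key, custom, "prepared"
    def start(self):
        assert self.state == "prepared"
        self.state = "started"
    def analyze(self):
        assert self.state == "started"
        self.state = "analyzed"
        return {"result": "ok"}       # invented result payload
    def resume(self):
        assert self.state == "analyzed"
        self.state = "started"

# client-side sequence: OpInit(...), then prepare, start, analyze
op = InterconnectionTestOp("session_1")
op.prepare("IT_KEY_1", custom="gt-to-gmt")
op.start()
result = op.analyze()
```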
6.3.2 Implementation<br />
<strong>The</strong> following sections describe how the TS interconnection test service is formed by the collaboration of<br />
different cell operations installed in different cells of the TS system. In addition, this service requires the<br />
collaboration of the Sub-detectors <strong>Supervisor</strong>y and Control Systems (SSCS), and the usage of the L1 trigger<br />
configuration databases (Figure 6-2). A single operation suffices in the TS central cell. However, every<br />
interconnection test requires specific operations for the concrete sender and receiver sub-system central cells and<br />
crate cells.<br />
6.3.2.1 Central cell<br />
<strong>The</strong> role of the central cell in the interconnection test service is similar to the role played in the configuration<br />
service: to facilitate the remote client interface presented in Section 6.3.1 and to coordinate the operation of all<br />
involved nodes. Both the interface to the client and the system coordination are defined by the interconnection<br />
test operation installed in the central cell (Figure 6-10). This section describes the stable states, and the<br />
functional (f_i) and conditional (c_i) methods of the central cell interconnection test operation transitions.<br />
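The common mechanics behind these method pairs can be sketched as follows (illustrative Python; the dispatch by method-name suffix is an assumption made for compactness, not the actual framework implementation):<br />

```python
def attempt_transition(op, name, target):
    """Run <name>_c; only if it returns True run <name>_f and change the state."""
    if not getattr(op, name + "_c")():
        return op.state               # conditional failed: the state is unchanged
    getattr(op, name + "_f")()
    op.state = target
    return op.state

class ToyOp:
    def __init__(self, ready):
        self.state = "halted"
        self.ready = ready
    def prepare_c(self):
        return self.ready             # e.g. "could all sub-system ops be started?"
    def prepare_f(self):
        pass                          # the actual work of the transition
```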
Initialization()<br />
This method stores the session_id parameter in an internal variable of the interconnection test operation<br />
instance. This number will be propagated to lower level cells when a cell command or operation is instantiated.<br />
<strong>The</strong> session_id is attached to every log record in order to help identify which client directly or indirectly<br />
executed a given action in a cell of the TSCS.<br />
Prepare_c()<br />
This method performs the following steps:<br />
Figure 6-10: Interconnection test operation.<br />
1. To read the IT_KEY row from the IT_CONF database table shown in Figure 6-11. This row contains two keys<br />
(TSC_KEY and TSP_KEY) and the cell operation names that have to be initiated in each of the central cells of<br />
those sub-systems involved in the interconnection test.<br />
2. To initiate the corresponding operation in the required trigger sub-system central cells with the same<br />
session_id provided by the central cell client. This method also initiates a configuration operation in the<br />
central cell. If one of these operations cannot be successfully started then this method returns false, the<br />
functional method of the prepare transition is not executed and the operation state stays halted.<br />
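These two steps can be sketched over a toy IT_CONF row (illustrative Python; the column names follow Figure 6-11, but the row content and the init_operation callback are invented):<br />

```python
# Toy IT_CONF table; a None in a *_IT_CLASS column means the sub-system
# does not take part in this particular interconnection test.
IT_CONF = {
    "it_key_1": {"TSC_KEY": "tsc_1", "TSP_KEY": "tsp_1",
                 "GT_IT_CLASS": "GtSenderTest",
                 "GMT_IT_CLASS": "GmtReceiverTest",
                 "RCT_IT_CLASS": None},
}

def prepare_c(it_key, session_id, init_operation):
    row = IT_CONF[it_key]                         # step 1: read the IT_KEY row
    started = []
    for column, op_class in sorted(row.items()):  # step 2: spawn sub-system ops
        if column.endswith("_IT_CLASS") and op_class:
            ok = init_operation(column[:-len("_IT_CLASS")], op_class, session_id)
            if not ok:
                return False, started             # stay halted on any failure
            started.append(op_class)
    return True, started
```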
Prepare_f()<br />
This method performs the following steps:<br />
1. To execute the configure and the partition transitions with the TSC_KEY and TSP_KEY keys respectively in<br />
the central cell configuration operation. This configures the TCS module in order to deliver the required<br />
TTC commands to the sender and/or to the receiver sub-systems. By reconfiguring the BX table of a given<br />
DAQ partition, the TCS can periodically send any sequence of TTC commands to a set of TTC partitions<br />
(i.e. senders or receivers or both). <strong>The</strong> usual configuration use case is that senders are waiting for a BC0<br />
signal 16 to start sending patterns, whilst the receiver systems do not need any TTC signal. <strong>The</strong> configuration<br />
operation is also used to configure intermediate trigger sub-systems in order to work in transparent mode.<br />
2. To execute the prepare transition in the interconnection test operation of each trigger sub-system central<br />
cell, passing the custom string parameter. This parameter is intended to be used by the sub-system<br />
interconnection test operation (Section 6.3.2.2).<br />
Start_c()<br />
This method checks if the interconnection test operation state of each trigger sub-system central cell is in<br />
prepared state. This method also checks if the configuration operation of the central cell is in partitioned state.<br />
If one of these operations is not in the expected state, this method returns false, the functional method of the<br />
start transition is not executed and the operation state stays prepared.<br />
Start_f()<br />
This method performs the following steps:<br />
IT_CONF table columns: IT_KEY (key), TSC_KEY, TSP_KEY, GT_IT_CLASS, GMT_IT_CLASS, DTTF_IT_CLASS,<br />
CSCTF_IT_CLASS, GCT_IT_CLASS, RCT_IT_CLASS, RPCTrig_IT_CLASS, ECAL_IT_CLASS, HCAL_IT_CLASS, DTSC_IT_CLASS.<br />
Figure 6-11: Main database table used by the central cell interconnection test operation.<br />
16 This TTC command signals the beginning of an LHC orbit.
1. To execute the start transition in the interconnection test operation of each trigger sub-system central cell.<br />
This enables input and output buffers on the receiver and sender sides respectively.<br />
2. To execute the enable transition in the configuration operation of the central cell. This enables the delivery<br />
of TTC commands to the sender and receiver sub-systems.<br />
Analyze_c()<br />
This method checks if the interconnection test operation state of each trigger sub-system central cell is in<br />
started state. This method also checks if the configuration operation of the central cell is in enabled state. If<br />
one of these operations is not in the expected state, this method returns false, the functional method of the<br />
analyze transition is not executed and the operation state stays started.<br />
Analyze_f()<br />
This method performs the following steps:<br />
1. To execute the suspend transition in the configuration operation of the central cell. This temporarily stops the<br />
delivery of TTC commands to the sender and receiver sub-systems.<br />
2. To execute the analyze transition in the interconnection test operation of each trigger sub-system central<br />
cell. This method retrieves the test result from the sub-systems and disables the input and output buffers on<br />
the receiver and sender sides respectively. Usually, the sender returns nothing and the receiver returns the<br />
result after comparing the expected patterns with the actual received patterns.<br />
Resume_c()<br />
This method checks in the interconnection test operation of each trigger sub-system central cell that the current<br />
state is analyzed. This method also checks if the configuration operation of the central cell is in suspended state.<br />
If one of these operations is not in the expected state, this method returns false, the functional method of the<br />
resume transition is not executed and the operation state stays analyzed.<br />
Resume_f()<br />
This method performs the following steps:<br />
1. To execute the resume transition in the interconnection test operation of each trigger sub-system central cell.<br />
This enables input and output buffers on the receiver and sender sides respectively.<br />
2. To execute the resume transition in the configuration operation of the central cell. This enables the delivery of<br />
TTC commands to the sender and receiver sub-systems.<br />
6.3.2.2 Sub-system cells<br />
<strong>The</strong> interconnection test operation interface running in the trigger sub-system cells is almost the same as the one<br />
running in the TS central cell (Figure 6-10), with the difference that the IT_KEY parameter does not exist. This<br />
section describes the stable states, and the functional (f_i) and conditional (c_i) methods of the sub-system cells<br />
interconnection test operation transitions. This description includes the crate cell and the trigger sub-system<br />
central cell cases. <strong>The</strong> following method descriptions do not match a concrete interconnection test example but<br />
describe the relevant aspects common to all the cases.<br />
Initialization()<br />
This method stores the session_id parameter in an internal variable of the interconnection test operation instance.<br />
This number will be propagated to lower level cells when a cell command or operation is instantiated. <strong>The</strong><br />
session_id is attached to every log record in order to help identify which client directly or indirectly executed a<br />
given action in a cell of the TSCS.<br />
Prepare_c()<br />
If the operation runs in the sub-system central cell, this method reads the custom parameter and initiates the<br />
interconnection test operation in the crate cells involved in the test. If the operation runs in a crate cell, this<br />
method checks if the hardware is accessible. If an operation cannot be started in the crate cells or the hardware is<br />
not accessible, this method returns false, the functional method of the prepare transition is not executed and the<br />
operation state stays halted.
Prepare_f()<br />
This method reads the custom parameter and executes the necessary actions to prepare the sub-system to perform<br />
the test according to this parameter. If the operation runs in the sub-system central cell, this method executes the<br />
prepare transition in the required interconnection test operation running in the lower level crate cells. If the<br />
operation runs in a crate cell, this method prepares the patterns to be sent or to be received.<br />
Start_c()<br />
If the operation runs in the sub-system central cell, this method checks if the current state of the interconnection<br />
test operation running in the crate cells is prepared. If the operation runs in a crate cell, this method checks if the<br />
hardware is accessible. If one of these checks fails, this method returns false, the functional method of the start<br />
transition is not executed and the operation state stays prepared.<br />
Start_f()<br />
If the operation runs in the sub-system central cell, this method executes the start transition in the interconnection<br />
test operation running in the lower level crate cells. If the operation runs in a crate cell, this method enables the<br />
input or the output buffers depending on whether the crate is on the receiver or on the sender side.<br />
Analyze_c()<br />
If the operation runs in the sub-system central cell, this method checks if the current state of the interconnection<br />
test operation running in the crate cells is started. If the operation runs in a crate cell, this method checks if the<br />
hardware is accessible. If one of these checks fails, this method returns false, the functional method of the<br />
analyze transition is not executed and the operation state stays started.<br />
Analyze_f()<br />
If the operation runs in the sub-system central cell, this method executes the analyze transition in the<br />
interconnection test operation running in the lower level crate cells, gathers the results and returns them to the<br />
central cell. If the operation runs in a crate cell, this method compares the expected patterns, prepared during the<br />
prepare transition, against the received ones and returns the result to the sub-system central cell.<br />
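The receiver-side comparison can be sketched as follows (illustrative Python; the pattern representation and the result format are invented):<br />

```python
def analyze_patterns(expected, received):
    """Compare the prepared patterns against the captured ones, word by word."""
    mismatches = [(i, e, r)
                  for i, (e, r) in enumerate(zip(expected, received))
                  if e != r]
    passed = not mismatches and len(expected) == len(received)
    return {"passed": passed, "mismatches": mismatches}
```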
Resume_c()<br />
If the operation runs in the sub-system central cell, this method checks if the current state of the interconnection<br />
test operation running in the crate cells is analyzed. If the operation runs in a crate cell, this method checks if the<br />
hardware is accessible. If one of these checks fails, this method returns false, the functional method of the<br />
resume transition is not executed and the operation state stays analyzed.<br />
Resume_f()<br />
If the operation runs in the sub-system central cell, this method executes the resume transition in the<br />
interconnection test operation running in the lower level crate cells. If the operation runs in a crate cell, this<br />
method re-enables the input or the output buffers depending on whether the crate is on the receiver or on the<br />
sender side.<br />
6.4 Monitoring<br />
6.4.1 Description<br />
<strong>The</strong> TS monitoring service provides access to the monitoring information of the L1 decision loop hardware. This<br />
service is implemented using the TSMS presented in Section 5.4.2. <strong>The</strong> HTTP/CGI interface of the monitor<br />
collector provides remote access to the monitoring information.<br />
Event data based monitoring system<br />
A second source of monitoring information is the event data. For instance, the GTFE board is designed to gather<br />
monitoring information from almost all boards of the GT and to send this information as an event fragment every<br />
time that the GT receives a L1A. <strong>The</strong>refore, an online monitoring system for the GT could be based on<br />
extracting this data from the corresponding event fragment. This approach would be very convenient because<br />
every event would contain precise monitoring information of the L1 hardware status for the corresponding bunch
crossing (BX). In addition, this approach would not require the development of a complex monitoring software<br />
infrastructure. On the other hand, we would face two limitations:<br />
• <strong>The</strong> GT algorithm rates are accumulated in the Final Decision Logic (FDL) board and the current version of<br />
the GTFE board cannot access its memories and registers. <strong>The</strong> only way to read out the rate counters is<br />
through VME access.<br />
• <strong>The</strong> GTFE board will send event fragments only when the DAQ infrastructure is running.<br />
<strong>The</strong>se limitations could be overcome using the TS monitoring service. This is meant to be an “always on”<br />
infrastructure (Section 5.2.5) and to provide an HTTP/CGI interface to access all monitoring items, and<br />
specifically the GT algorithm rate counters. <strong>The</strong>refore, the TS monitoring service is the only feasible approach to<br />
read out the GT algorithm rates and to achieve an “always on” external system depending on this information.<br />
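An external client of the monitor collector could look as follows (illustrative Python; the URL, the fetch CGI parameter and the name=value response format are all invented, since the concrete HTTP/CGI interface is not specified here):<br />

```python
from urllib.parse import urlencode

def rate_counter_url(collector_url, items):
    """Build the query URL for a set of monitoring items (hypothetical CGI)."""
    return "%s?%s" % (collector_url, urlencode({"fetch": ",".join(items)}))

def parse_rates(response_text):
    """Parse an assumed 'name=value' per-line response into float counters."""
    rates = {}
    for line in response_text.splitlines():
        name, _, value = line.partition("=")
        if value:
            rates[name.strip()] = float(value)
    return rates
```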
6.5 Graphical user interfaces<br />
<strong>The</strong> HTTP/CGI interface of every cell facilitates the generic TS web-based GUI presented in Section 4.4.4.11.<br />
This is automatically generated and provides a homogeneous look and feel to control any sub-system cell<br />
independent of the operations, commands and monitoring customization details. <strong>The</strong> generic TS GUI of the<br />
DTTF, GT, GMT and RCT was extended with control panel plug-ins. <strong>The</strong> following section presents the Global<br />
<strong>Trigger</strong> control panel example [90].<br />
6.5.1 Global <strong>Trigger</strong> control panel<br />
<strong>The</strong> GT control panel is integrated into the generic TS GUI of the GT cell. It uses the GT cell software in order<br />
to get access to the GT hardware. This control panel has the following features:<br />
• Monitoring and control of the GT hardware: <strong>The</strong> GT Control Panel implements the most important<br />
functionalities to monitor and control the GT hardware. That includes monitoring of the counters and the<br />
TTC detector partitions assigned to DAQ partitions, setting the time slots, enabling and disabling the TTC<br />
sub-detectors for a given DAQ partition, setting the FDL board mask, starting a run, stopping a run, starting<br />
random triggers, stopping random triggers, changing the frequency and step size for random triggers, and<br />
resynchronizing and resetting each of the DAQ partitions.<br />
• Configuration database population tool: <strong>The</strong> GT Control Panel allows hardware experts to create<br />
configuration entries in the configuration database without the need of any knowledge of the underlying<br />
database schema.<br />
• Access control integration: <strong>The</strong> GT Control Panel supports different access control levels. Depending on<br />
the user logged in (i.e. an expert, a shifter or a guest) the panel visualizes different information and allows<br />
different tasks to be performed.<br />
• <strong>Trigger</strong> menu generation: the GT Control Panel allows the visualization and modification of the trigger<br />
menu. <strong>The</strong> trigger menu is the high-level description of the algorithms that will be used to select desired<br />
physics events. For each algorithm it is possible to visualize and modify the name, algorithm number, prescale<br />
factor, algorithm description and condition properties (i.e. threshold, quality, etc.)<br />
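A trigger menu entry as exposed by the panel could be represented as follows (illustrative Python; the field names mirror the text above, while the class layout and the helper function are invented):<br />

```python
from dataclasses import dataclass, field

@dataclass
class Algorithm:
    """One trigger menu entry: name, number, prescale, description, conditions."""
    name: str
    number: int
    prescale_factor: int = 1
    description: str = ""
    conditions: dict = field(default_factory=dict)  # e.g. {"threshold": 7}

def set_prescale(menu, name, prescale_factor):
    """Modify the prescale factor of one algorithm, as the panel would."""
    for algo in menu:
        if algo.name == name:
            algo.prescale_factor = prescale_factor
            return algo
    raise KeyError(name)
```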
Figure 6-12 presents a view of the GT control panel where it is shown which TTC partitions (32 columns) are<br />
assigned to each of the eight DAQ partitions (8 rows). <strong>The</strong> red color means that a given TTC partition is not<br />
connected.
Figure 6-12: GT control panel view showing the current partitioning state.
Chapter 7<br />
Homogeneous <strong>Supervisor</strong> and Control<br />
Software Infrastructure for the <strong>CMS</strong> Experiment<br />
at SLHC<br />
This chapter presents a project proposal to homogenize the supervisory control, data acquisition, and control<br />
software infrastructure for an upgraded <strong>CMS</strong> experiment at the SLHC. Its advantage is a unique, modular<br />
development platform enabling an efficient use of manpower and resources.<br />
7.1 Introduction<br />
This proposal aims to develop the <strong>CMS</strong> Experiment Control System (ECS) based on a new supervisory and<br />
control software framework. We propose a homogeneous technological solution for the <strong>CMS</strong> infrastructure of<br />
<strong>Supervisor</strong>y Control And Data Acquisition (SCADA [99]). <strong>The</strong> current <strong>CMS</strong> software control system consists of<br />
the Run Control and Monitoring System (R<strong>CMS</strong>), the Detector Control System (DCS), the <strong>Trigger</strong> <strong>Supervisor</strong><br />
(TS), and the Tracker, ECAL, HCAL, DT and RPC sub-detector supervisory systems. This infrastructure is<br />
based on three major supervisor and control software frameworks: PVSSII (Section 1.4.2), R<strong>CMS</strong> (Section<br />
1.4.1) and TS (Chapter 4). In addition, each sub-detector has created its own SCADA software.<br />
A single SCADA software framework used by all <strong>CMS</strong> sub-systems would have advantages for the<br />
maintenance, support and operation tasks during the experiment life-cycle:<br />
1) Overall design strategy optimization: <strong>The</strong>re is an evident similarity in technical requirements for controls<br />
amongst the different levels of the experiment control system. A common SCADA framework will allow an<br />
overall optimization of requirements, design and implementation.<br />
2) Support and maintenance resources: <strong>The</strong> project should enable an efficient use of resources. A common<br />
SCADA infrastructure for <strong>CMS</strong> will manage the increasing complexity of the experiment control and reduce<br />
the effects of current and future constraints on manpower.<br />
3) Accelerated learning curve: Operators and developers will benefit from a common SCADA infrastructure<br />
due to: 1) One-time learning cost, 2) Moving between <strong>CMS</strong> control levels and sub-systems will not imply a<br />
change in technology.<br />
This project proposal is based on the evolution of the software infrastructure used to integrate the L1 trigger sub-systems.<br />
Section 7.2 presents the project technology baseline and the criteria for its selection. Section 7.3<br />
presents an overview of the project road map. Finally, Section 7.4 outlines the project schedule and the required<br />
human resources.<br />
7.2 Technology baseline<br />
<strong>The</strong> design and development of the unique underlying supervisory and control infrastructure should initially start<br />
from the software framework currently used to implement the L1 trigger control software system, i.e. the TS
framework. <strong>The</strong> following paragraphs describe the principal objective criteria for which this technological<br />
baseline has been chosen:<br />
1) Proven technology: It is used in the implementation of a supervisory and control system that coordinates<br />
the operation of all L1 trigger sub-systems, the TTC system, the LMS and to some extent the ECAL, HCAL,<br />
DT and RPC sub-detectors. This solution was successfully used during the second phase of the Magnet Test<br />
and Cosmic Challenge, has been used in the monthly commissioning exercises of the <strong>CMS</strong> Global Runs and<br />
is the official solution for the experiment operation.<br />
2) Homogeneous TriDAS infrastructure and support: <strong>The</strong> TS framework is based on XDAQ, which is the<br />
same middleware used by the DAQ event builder (Section 1.4.3). This component is a key part of the DAQ<br />
system and as such it is not likely to evolve towards a different underlying middleware. <strong>The</strong>refore, a<br />
supervisory and control software framework based on the XDAQ middleware could profit from a long term,<br />
in-house supported solution. In addition, a SCADA infrastructure based on the XDAQ middleware would<br />
homogenize the underlying technologies for the DAQ and for the supervisory control infrastructure that<br />
would automatically reduce the overall support and maintenance effort.<br />
3) Simplified coordination and support tasks: <strong>The</strong> TS framework is designed to reduce the gap between<br />
software experts and experimental physicists and to reduce the learning curve. Examples are the usage of<br />
well known models in HEP control systems like finite state machines or homogeneous integration<br />
methodologies independent of the concrete sub-system Online SoftWare Infrastructure (OSWI) and<br />
hardware setup, or the automatic creation of graphical user interfaces. <strong>The</strong> latter is a development<br />
methodology characterized by a modular upgrading process and one single visible software framework.<br />
4) C++: <strong>The</strong> OSWI of all sub-systems is mainly formed by libraries written in C++ running on x86/Linux<br />
platforms. <strong>The</strong>se are intended to hide hardware complexity from software experts. <strong>The</strong>refore, a SCADA<br />
infrastructure based on C++, like the TS framework, would simplify the complexity of the integration<br />
architecture.<br />
7.3 Road map<br />
This project aims to reach the technological homogenization of the <strong>CMS</strong> Experiment Control System following a<br />
progressive and non-disruptive strategy. This shall allow a gradual and smooth transition from the current<br />
SCADA infrastructure to the proposed one. An adequate approach could have the following project tasks:<br />
1) L1 trigger incremental development: Continue with the current development and maintenance process in<br />
the L1 trigger using the proposed framework.<br />
2) Sub-detector control and supervisory software integration: This task involves the incremental adoption<br />
of a common software framework for all sub-detectors in order to homogenize the control and supervisory<br />
software of <strong>CMS</strong>. <strong>The</strong> participating sub-detectors are ECAL, HCAL, DT, CSC, RPC, and Tracker.<br />
Currently, this step is partially achieved because all sub-detectors are partially integrated with the TS system<br />
in order to: 1) Automate the pattern tests between the sub-detector TPG’s and the regional trigger systems,<br />
2) Check configuration consistency between L1 trigger and the trigger primitive generators.<br />
3) L1 trigger emulators supervisory system: This task involves the upgrade of the supervisory software of<br />
the L1 trigger emulators to the proposed common framework. <strong>The</strong> hardware emulators of the L1 trigger<br />
have been deployed as components of the <strong>CMS</strong>SW framework [100]. This task does not involve any change<br />
in the emulator code or in the <strong>CMS</strong>SW framework.<br />
4) High Level <strong>Trigger</strong> (HLT) supervisory system: This task involves the upgrade of the supervisory<br />
software of the HLT to the proposed common framework. In this way the components of the HLT (filter<br />
units, slice supervisors, and storage managers) will be launched, configured and monitored as the other<br />
software components of the <strong>CMS</strong> online software [101]. This task does not involve any change on the<br />
supervised components.<br />
5) Event builder supervisory system: This task involves the deployment of the event builder supervisory<br />
system as nodes of the proposed framework. <strong>The</strong> event builder supervisory software will launch all software<br />
components, will configure and will monitor the Front-End Readout Links (FRL), the Front-End Driver<br />
Network (FED Builder Network), and the different slices of Event Managers (EVM), Builder Units (BU)
and Readout Units (RU). This task does not involve the modification of the event builder components<br />
(Section 1.4.3).<br />
6) Experiment Control System feasibility study and final homogenization step: This is the last stage of the<br />
homogenization process. This task involves the feasibility study to change the top layer of the ECS and,<br />
afterwards, its substitution by components of the proposed framework. This means the substitution of the<br />
Function Managers by the nodes of the proposed SCADA software. This task also involves the feasibility<br />
study and homogenization of the top software layer of the DCS in order to be supervised, controlled and<br />
monitored by the ECS (Section 1.4.2).<br />
7.4 Schedule and resource estimates<br />
Schedule and resource estimates have been approximated with the COCOMO II model [102], assuming the<br />
delivery of 50000 new Source Lines Of Code (SLOC), the modification of 10000 SLOC and the reuse of 30000<br />
SLOC, with the model parameters rated for a project of average complexity. <strong>The</strong> SLOC effort has been<br />
estimated using the development experience with the TS and R<strong>CMS</strong> frameworks. Additional assumptions are a<br />
development team working in an in-house environment, with extensive experience with related systems and a<br />
thorough understanding of how the system under development will contribute to the objectives of <strong>CMS</strong>.<br />
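As an illustration of the estimation method, the nominal-parameter form of the COCOMO II equations can be sketched in a few lines. <strong>The</strong> coefficients below are the published nominal values, not the calibrated cost drivers behind the thesis estimate, and the 0.5 and 0.3 adaptation factors for modified and reused code are assumptions, so the output only approximates the figures of Tables 7-1 and 7-2.<br />

```python
# Hedged sketch of a COCOMO II estimate with nominal coefficients.
# The adaptation factors (0.5 for modified, 0.3 for reused code) are
# illustrative assumptions, not the calibration used in the thesis.

def cocomo2(new_ksloc, modified_ksloc, reused_ksloc,
            a=2.94, b=1.0997, c=3.67):
    # Equivalent size in KSLOC: modified and reused code count only partially.
    size = new_ksloc + 0.5 * modified_ksloc + 0.3 * reused_ksloc
    effort = a * size ** b                               # person-months
    schedule = c * effort ** (0.28 + 0.2 * (b - 0.91))   # months
    return effort, schedule

effort, schedule = cocomo2(50, 10, 30)   # 50k new, 10k modified, 30k reused SLOC
print(f"effort ~ {effort:.0f} person-months, schedule ~ {schedule:.0f} months")
```

With these nominal values the sketch lands in the same range as the total effort quoted below; the schedule comes out shorter, as expected for a calculation that ignores the rated cost drivers.<br />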
<strong>The</strong> four project phases are: 1) Inception: This phase includes the analysis of requirements, system definitions,<br />
specification and prototyping of user interfaces, and cost estimation; 2) Elaboration: This period is meant to<br />
define the software architecture and test plan; 3) Construction: this includes the coding and testing phases; 4)<br />
Transition: this last phase includes the final release delivery and set up of support and maintenance<br />
infrastructure.<br />
Table 7-1 shows the schedule for the project phases and the required resources per phase in person-months. This<br />
estimate includes the resources to deliver the infrastructure stated in Section 7.3: all templates, standard elements<br />
and functions required to achieve a homogeneous system and to reduce as much as possible the development<br />
effort for the sub-system integration developers. This estimate does not include the sub-system integration,<br />
which follows the transition phase.<br />
Phase Phase effort (Person-months) Schedule (Months)<br />
Inception 16 3<br />
Elaboration 64 8<br />
Construction 199 14<br />
Transition 32 13<br />
Table 7-1: Project phases schedule and associated effort in person-months.<br />
We summarize in Table 7-2 the top-level resource and schedule estimate of the project.<br />
Total effort (Person-months) 311<br />
Schedule (months) 38<br />
Table 7-2: Top-level estimate for elaboration and construction.
Chapter 8<br />
Summary and Conclusions<br />
<strong>The</strong> life span of the last generation of HEP experiment projects is of the same order of magnitude as a human<br />
being’s life, and both the experiment’s and the human being’s life phases share a number of analogies:<br />
During the conception period of a HEP experiment, key people discuss the feasibility of a new project. For<br />
instance, the initial informal discussions about <strong>CMS</strong> started in 1989 and continued for nearly three years. This<br />
period finished with a successful conceptual design (<strong>CMS</strong> Letter of intent, 1992). In a similar way, the<br />
conception of a human being would follow a dating period and the decision of having a common life project.<br />
Right after the conceptual design, the research and prototyping phase starts. During this period, research and<br />
prototyping tasks are performed in order to prove the feasibility of the initial design. A successful culmination<br />
of this period is the release of a number of Technical Design Reports (TDR’s) describing the design details, the<br />
project schedule and organization. For the <strong>CMS</strong> experiment this period lasted until the year 2002. This second<br />
period is similar to human childhood and infancy, where the child grows up, experiments with her<br />
environment, learns the basic knowledge for life and roughly plans what she wants to be when she<br />
grows up.<br />
<strong>The</strong> next stage in the life of a HEP experiment is the development phase. During this time, the building blocks<br />
described in the individual TDR’s are produced. For the <strong>CMS</strong> experiment this period lasted approximately until<br />
early 2007. Following the analogy of the human being, this period could be similar to the education period<br />
spent in high school and college, where the adolescent studies several different subjects.<br />
Before being operational, the building blocks produced during the development phase need to be assembled and<br />
commissioned. <strong>The</strong> <strong>CMS</strong> commissioning exercises started in 2006 with the Magnet Test and Cosmic Challenge and<br />
continued during 2007 with a monthly, incremental commissioning exercise known as the Global Run.<br />
This is similar to what happens to recent graduates starting their careers with a traineeship in a company or<br />
research institute: they learn how to use the knowledge acquired during their education in order to perform<br />
a concrete task.<br />
After a successful commissioning period the experiment is ready for operation. <strong>The</strong> <strong>CMS</strong> experiment is expected<br />
to be operational for at least 20 years. During this phase, periodic sub-system upgrades will be necessary to cope<br />
with radiation damage or with new requirements due to the SLHC luminosity upgrade. This period would be like<br />
the adult professional life, when the person is fully productive and needs to periodically undergo medical checks<br />
or refresh her knowledge in order to keep up with the continuous evolution of the job market.<br />
Finally, the experiment will be decommissioned at the end of its operational life. <strong>The</strong> analogy also works in this<br />
case, because at the end of a successful career a person will also retire.<br />
<strong>The</strong> long life span is not the only complexity dimension of the last generation of HEP experiments that finds a<br />
good analogy in the metaphor of the human being. <strong>The</strong> sheer number of collaborating sub-systems is<br />
likewise astonishing on both sides.<br />
We have discussed the time scale and complexity similarities between human beings and HEP experiments, but<br />
we can still go further in this analogy and ask: “What is the experiment’s genetic material?” In other words, what<br />
is the seed of a HEP experiment project? It cannot be people, because only a few collaboration members stay<br />
during the whole lifetime of the experiment. <strong>The</strong> answer is that the experiment’s genetic material is the<br />
knowledge consisting of successful ideas applied in past experiments and of novel contributions from other<br />
fields which promise improved results. This set of ideas is a potential future HEP experiment.<br />
And people? Where do the members of the collaboration fit? In this analogy, the scientists, engineers and<br />
technicians are responsible for transmitting and expressing the experiment’s genetic material. In other words, the<br />
collaboration members are the hosts of the experiment DNA and are also responsible for its expression in actual<br />
experiment body parts. <strong>The</strong>refore, even though some people are better able than others to transmit and express<br />
the experiment DNA, no one is indispensable.<br />
<strong>The</strong> metaphor between the most advanced HEP experiments and human beings serves the author to explain<br />
how this thesis contributed to <strong>CMS</strong>, and to the HEP and scientific communities. <strong>The</strong> following sections<br />
summarize the contributions of this work to both the <strong>CMS</strong> body or experiment, and the <strong>CMS</strong> DNA or knowledge<br />
base of the <strong>CMS</strong> collaboration and HEP communities.<br />
8.1 Contributions to the <strong>CMS</strong> genetic base<br />
This work encompasses a number of ideas intended to enhance the expression of a concrete <strong>CMS</strong> body part, the<br />
control and hardware monitoring system of the L1 trigger or <strong>Trigger</strong> <strong>Supervisor</strong> (TS). A successful final design<br />
was reached not just by gathering a detailed list of functional requirements. It was necessary to understand the<br />
complexity of the task, and the most promising technologies had to be proven.<br />
<strong>The</strong> unprecedented number of hardware items, the long periods of preparation and operation, and the human and<br />
political context were presented as three complexity dimensions related to building hardware management<br />
systems for the latest generation of HEP experiments. <strong>The</strong> understanding of the problem context and associated<br />
complexity, together with the experience acquired with an initial generic solution, guided us to the conceptual<br />
design of the <strong>Trigger</strong> <strong>Supervisor</strong>.<br />
8.1.1 XSEQ<br />
An initial generic solution to the thesis problem context proposed a software environment to describe<br />
configuration, control and test systems for data acquisition hardware devices. <strong>The</strong> design followed a model that<br />
matched well the extensibility and flexibility requirements of a long lifetime experiment that is characterized by<br />
an ever-changing environment. <strong>The</strong> model builds upon two points: 1) the use of XML for describing hardware<br />
devices, configuration data, test results, and control sequences; and 2) an interpreted, run-time extensible, high-level<br />
control language for these sequences that provides independence from a specific host platform and from<br />
interconnect systems to which devices are attached. <strong>The</strong> proposed approach has several advantages:<br />
• <strong>The</strong> uniform usage of XML assures a long-term technological investment and reduced in-house<br />
development, thanks to the large existing body of standards and tools.<br />
• <strong>The</strong> interpreted approach enables the definition of platform independent control sequences. <strong>The</strong>refore, it<br />
enhances the sub-system platform upgrade process.<br />
<strong>The</strong> syntax of an XML-based programming language (XSEQ, XML-based sequencer) was defined. It was shown<br />
how an adequate use of XML schema technology facilitated the decoupling of syntax and semantics, and<br />
therefore enhanced the sharing of control sequences among heterogeneous sub-system platforms.<br />
An interpreter for this language was developed for the CERN Scientific Linux (SLC3) platform. It was shown<br />
that the performance of an interpreter for an XML-based programming language oriented to hardware control<br />
could be at least as good as that of an interpreter for a HEP standard language for hardware control.<br />
<strong>The</strong> model implementation was integrated into a distributed programming framework specifically designed for<br />
data acquisition in the <strong>CMS</strong> experiment (XDAQ). It was shown that this combination could be the architectural<br />
basis of a management system for DAQ hardware. A feasibility study of this software defined a number of<br />
standalone applications for different <strong>CMS</strong> hardware modules and a hardware management system to remotely<br />
access these heterogeneous sub-systems through a uniform web service interface.
8.1.2 <strong>Trigger</strong> <strong>Supervisor</strong><br />
<strong>The</strong> experience acquired during this initial research together with the L1 trigger operation requirements seeded<br />
the conceptual design of the <strong>Trigger</strong> <strong>Supervisor</strong>. It consists of a set of functional and non-functional<br />
requirements, the architecture design together with a few technological proposals, and the project tasks and<br />
organization details.<br />
<strong>The</strong> functional purpose of the TS is to coordinate the operation of the L1 trigger and to provide a flexible<br />
interface that hides the burden of this coordination. <strong>The</strong> required operation capabilities had to simplify the<br />
process of configuring, testing and monitoring the hardware. Additional functionalities were required for<br />
troubleshooting, error management, user support, access control and start-up purposes. <strong>The</strong> non-functional<br />
requirements were also discussed. <strong>The</strong>se take into account the magnitude of the infrastructure under control, the<br />
implications related to the periodic hardware and software upgrades necessary in a long-lived experiment like<br />
<strong>CMS</strong>, the particular human and political context of the <strong>CMS</strong> collaboration, the required long term support and<br />
maintenance, the limitations of the existing <strong>CMS</strong> online software infrastructure and the particularities of the<br />
operation environment of the <strong>CMS</strong> Experiment Control System.<br />
<strong>The</strong> design of the TS architecture fulfills the functional and non-functional requirements. This architecture<br />
identifies three main development layers: the framework, the system and the services. <strong>The</strong> framework is the<br />
software infrastructure that provides the main building block, the cell, and the integration with the specific sub-system<br />
OSWI. <strong>The</strong> system is a distributed software architecture built out of these building blocks. Finally, the<br />
services are the L1 trigger operation capabilities implemented on top of the system as a collaboration of finite<br />
state machines running in each of the cells.<br />
<strong>The</strong> decomposition of the project development tasks into three layers enhances the coordination of the<br />
development tasks; and helps to keep a stable system, in spite of hardware and software upgrades, on top of<br />
which new operation capabilities can be implemented without software engineering expertise.<br />
8.1.3 <strong>Trigger</strong> <strong>Supervisor</strong> framework<br />
<strong>The</strong> TS framework is the lowest level layer of the TS. It consists of the basic software infrastructure delivered to<br />
the sub-systems to facilitate their integration. This infrastructure is based on the XDAQ middleware and a few<br />
external libraries. XDAQ was chosen among the <strong>CMS</strong> officially supported distributed programming frameworks<br />
(namely XDAQ, R<strong>CMS</strong> and JCOP) as the baseline solution because it offered the best trade-off between<br />
infrastructure completeness and fast sub-system integration. Although XDAQ was the best available option,<br />
further development was needed to reach the usability required by a community of customers with no software<br />
engineering background and limited time dedicated to software integration tasks.<br />
<strong>The</strong> cell is the main component of the additional software infrastructure. This component is an XDAQ application<br />
that needs to be customized by each sub-system in order to integrate with the <strong>Trigger</strong> <strong>Supervisor</strong>. <strong>The</strong><br />
customization process has the following characteristics:<br />
• Based on Finite State Machines (FSM): <strong>The</strong> integration of a sub-system with the TS consists of defining<br />
FSM plug-ins. A FSM model was chosen because this is a well known approach to define control systems<br />
for HEP experiments and therefore it would accelerate the customer’s learning curve. FSM plug-ins wrap<br />
the usage of the sub-system OSWI and offer a stable remote interface despite software platform and<br />
hardware upgrades.<br />
• Simple: Additional facilities were also delivered to the sub-systems in order to simplify the customization<br />
process. <strong>The</strong> most important one is the xhannel API. It provides a simple and homogeneous interface to a<br />
wide range of external services: other cells, XDAQ applications and web services.<br />
• Automatically generated GUI: A mechanism to automatically generate the cell GUI reduced the<br />
customization time and facilitated a common look and feel for all sub-systems’ graphical setups. <strong>The</strong><br />
common look and feel improved the learning curve for new L1 trigger operators.<br />
• Remote interface: <strong>The</strong> cell provided a human and a machine interface based on the HTTP/CGI and the<br />
SOAP protocols respectively, fitting well the web services based model of the <strong>CMS</strong> Online SoftWare<br />
Infrastructure (OSWI). This interface facilitated the remote operation of the sub-system specific FSM plug-ins.<br />
This interface could also be enlarged with custom functionalities using command plug-ins.
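<strong>The</strong> FSM-based customization model described above can be sketched as a minimal state machine. All names below (FSMPlugin, the states, the handlers) are illustrative assumptions, not the actual TS framework API; the handlers stand in for calls into the sub-system OSWI.<br />

```python
# Minimal sketch of an FSM plug-in in the spirit of the cell customization
# model. Names are illustrative, not the real TS framework API; the handlers
# stand in for the sub-system specific hardware access they would wrap.

class FSMPlugin:
    def __init__(self, initial, transitions):
        # transitions: (current_state, event) -> (next_state, handler)
        self.state = initial
        self.transitions = transitions

    def fire(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        next_state, handler = self.transitions[key]
        handler()                 # wraps the sub-system OSWI call
        self.state = next_state

# A hypothetical sub-system cell going halted -> configured -> enabled.
log = []
plugin = FSMPlugin("halted", {
    ("halted", "configure"):  ("configured", lambda: log.append("load LUTs")),
    ("configured", "enable"): ("enabled",    lambda: log.append("start triggers")),
})
plugin.fire("configure")
plugin.fire("enable")
print(plugin.state, log)
```

Because the remote interface only exposes the FSM events, the hardware access wrapped inside the handlers can change without affecting clients, which is the stability property argued above.<br />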
8.1.4 <strong>Trigger</strong> <strong>Supervisor</strong> system<br />
<strong>The</strong> intermediate layer of the TS is the TS System (TSS). It provides a stable layer on top of which the TS<br />
services have been implemented. <strong>The</strong> TS system is designed to require little maintenance and to provide a<br />
methodology for developing services which can fit present and future experiment operational requirements. In this<br />
scheme, the development of new services requires very limited knowledge of the internals of the TS<br />
framework; it only needs to follow a well defined methodology. <strong>The</strong> stable TS system together with the<br />
associated methodology makes it possible to accommodate new functionalities in a non-disruptive way, without<br />
requiring major developments.<br />
<strong>The</strong> TSS consists of four distributed software systems with well defined functionalities: TS Control System<br />
(TSCS), TS Monitoring System (TSMS), TS Logging System (TSLS) and TS Start-up System (TSSS). <strong>The</strong><br />
following points describe the design principles:<br />
• Reduced number of basic building blocks: <strong>The</strong> TSS is based solely on the sub-system cells and on already<br />
existing monitoring, logging and start-up components provided by the XDAQ and R<strong>CMS</strong> frameworks.<br />
Reusing XDAQ and R<strong>CMS</strong> components minimized the development effort and at the same time guaranteed<br />
long-term support and maintenance. A reduced number of basic building blocks also helped to<br />
communicate architectural concepts.<br />
• Nodes and connections without logic: <strong>The</strong> TSCS is a collection of nodes and the communication channels<br />
among them. It does not include the logic of the L1 decision loop operation capabilities. This is<br />
implemented one layer above following a well defined methodology. <strong>The</strong> improved modularity obtained by<br />
decoupling the stable infrastructure (TSCS) from the L1 trigger operation capabilities eases the distribution<br />
of development tasks. Sub-system experts and technical coordinators were responsible for maintaining<br />
and/or implementing L1 trigger operation capabilities, whilst the TS central team focused on assuring a<br />
stable TSCS.<br />
• Hierarchical control system: It is shown how a hierarchical topology for the TSCS enhances distributed<br />
development, facilitates the independent operation of a given sub-system, simplifies a partial deployment<br />
and provides graceful system degradation.<br />
• Well defined sub-system integration model: <strong>The</strong> integration of each sub-system is done according to<br />
guidelines proposed by the TS central team. <strong>The</strong>se are intended to maximize the deployment of the TSS in<br />
different set-ups, and to ease the hardware evolution without affecting the services layer that provides<br />
the L1 trigger operation capabilities.<br />
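<strong>The</strong> hierarchical topology and graceful degradation argued above can be illustrated with a toy control tree; the tree layout below is chosen for illustration only and does not reproduce the actual TSCS deployment.<br />

```python
# Toy sketch of a hierarchical control tree with graceful degradation:
# a node that cannot be configured is reported, but its siblings proceed.
# The tree layout is illustrative, not the actual TSCS deployment.

class Node:
    def __init__(self, name, children=(), healthy=True):
        self.name = name
        self.children = list(children)
        self.healthy = healthy

    def configure(self):
        """Configure this node, then recurse; collect failures instead of aborting."""
        failed = [] if self.healthy else [self.name]
        for child in self.children:
            failed += child.configure()
        return failed

central = Node("central", [
    Node("GT"),
    Node("GMT", [Node("DTTF"), Node("CSCTF", healthy=False)]),
    Node("GCT"),
])
print("unreachable nodes:", central.configure())
```

A partial deployment is simply a smaller tree, and an unavailable sub-system degrades the report rather than the whole operation, which mirrors the properties listed above.<br />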
8.1.5 <strong>Trigger</strong> <strong>Supervisor</strong> services<br />
<strong>The</strong> TS services are the L1 decision loop operation capabilities. <strong>The</strong> current services cover the final functionalities<br />
identified during the conceptual design. <strong>The</strong>se have been implemented on top of the TS system, according to<br />
the proposed methodology. <strong>The</strong> following services were presented:<br />
• Configuration: This is the main service provided by the TS. It facilitates the configuration of the L1<br />
decision loop. Up to eight remote clients can use this service simultaneously without risking inconsistent<br />
configurations of the L1 decision loop. <strong>The</strong> configuration information (e.g. firmware, LUT’s, registers) is<br />
retrieved from the configuration database using a database identifier provided by the client. R<strong>CMS</strong> uses the<br />
remote interface provided by the central node of the TS in order to configure the L1 decision loop.<br />
• Interconnection test: It is intended to automatically check the connections between sub-systems. From the<br />
client point of view, the interconnection test service is another operation running in the TS central cell.<br />
• Logging and start-up services: <strong>The</strong>y are provided by the corresponding TS logging and start-up systems<br />
and did not require any further customization process.<br />
• Monitoring: This service, facilitated by the TS monitoring system, provides access to the monitoring<br />
information of the L1 decision loop hardware. It is designed to be an “always on” source of monitoring<br />
information, regardless of the availability of the DAQ system.<br />
• Graphical User Interface (GUI): This service is facilitated by the HTTP/CGI interface of every cell. It is<br />
automatically generated and provides a homogeneous look and feel to control any sub-system cell
independent of the operations, commands and monitoring customization details. It was also shown that the<br />
generic TS GUI could be extended with subsystem specific control panels.<br />
8.1.6 <strong>Trigger</strong> <strong>Supervisor</strong> Continuation<br />
A continuation line for the TS was presented. <strong>The</strong> project proposal is intended to homogenize the <strong>Supervisor</strong>y<br />
Control And Data Acquisition infrastructure (SCADA) for the <strong>CMS</strong> experiment. A single SCADA software<br />
framework used by all <strong>CMS</strong> sub-systems would have advantages for the maintenance, support and operation<br />
tasks during the experiment operational life. <strong>The</strong> proposal is based on the evolution of the TS framework. A<br />
tentative schedule and resource estimates were also presented.<br />
8.2 Contribution to the <strong>CMS</strong> body<br />
<strong>The</strong> main initial goal of this PhD thesis was to build a tool to operate the L1 trigger decision loop and to integrate<br />
it in the overall Experiment Control System. This objective has been achieved: <strong>The</strong> <strong>Trigger</strong> <strong>Supervisor</strong> has<br />
become a real body part of the <strong>CMS</strong> experiment and it serves its purpose.<br />
Periodic demonstrators brought the TS to its first joint operation with the Experiment Control System in<br />
November 2006, during the second phase of the Magnet Test and Cosmic Challenge ([103], p. 9). It has<br />
continued improving and serving every monthly commissioning exercise since May 2007, and it is the official tool<br />
of the <strong>CMS</strong> experiment to operate the L1 decision loop ([104], p. 190).<br />
Using the introductory analogy, the <strong>CMS</strong> Experiment Control System would be the experiment brain, and the<br />
<strong>Trigger</strong> <strong>Supervisor</strong> a specialized brain module, just like the human brain is thought to be divided into specialized<br />
units, for instance to turn sounds into speech or to recognize a face. <strong>The</strong> development of the <strong>CMS</strong> <strong>Trigger</strong><br />
<strong>Supervisor</strong> can be seen as the expression of a newly added genetic material in the <strong>CMS</strong> DNA.<br />
This thesis also has an important influence on how the <strong>CMS</strong> experiment is being controlled. <strong>The</strong> operation of the<br />
<strong>CMS</strong> experiment is shaped by the way the configuration and monitoring services of the TS allow operating the<br />
L1 decision loop.<br />
Continuing with the analogy, if the TS is a specialized brain module, the TS system would be the static neural<br />
net and the TS services would be the behavior patterns stored in it. <strong>The</strong> possibility to adopt new operation<br />
capabilities on top of a stable architecture, without requiring major upgrades, suits a long-lived experiment well,<br />
just like the human brain, which keeps an almost invariant neural architecture but is able to learn and adapt to its<br />
environment.<br />
8.3 Final remarks<br />
This thesis contributes to the <strong>CMS</strong> knowledge base and by extension to the HEP and scientific communities. <strong>The</strong><br />
motivation and goals, a generic solution and finally a successful design for a distributed control system are<br />
discussed in detail. This new <strong>CMS</strong> genetic material has achieved its full expression and has become a <strong>CMS</strong> body<br />
part, the <strong>CMS</strong> <strong>Trigger</strong> <strong>Supervisor</strong>. This is the maximum impact we could initially expect inside the <strong>CMS</strong><br />
Collaboration.<br />
A more complicated question is the impact of the exposed material outside the <strong>CMS</strong> collaboration. Answering<br />
this question is like asking how well the added <strong>CMS</strong> genetic material will spread. To an<br />
important extent, the chances of successfully propagating the knowledge written in this thesis depend on how well<br />
adapted <strong>CMS</strong> is to its environment; in other words, on how successful <strong>CMS</strong> will be in fulfilling its physics goals.<br />
Appendix A<br />
<strong>Trigger</strong> <strong>Supervisor</strong> SOAP API<br />
A.1 Introduction<br />
This appendix specifies the SOAP Application Program Interface (API) exposed by a <strong>Trigger</strong> <strong>Supervisor</strong> (TS) cell.<br />
<strong>The</strong> audience for this specification is mainly application developers who require the remote execution of cell<br />
commands and/or operations (e.g. the developer of the L1 trigger function manager, in order to use the TS<br />
services provided by the TS central cell).<br />
A.2 Requirements<br />
• Command and operation control: <strong>The</strong> protocol should allow the remote initialization, operation and<br />
destruction of cell operations and the execution of commands.<br />
• Controller identification: <strong>The</strong> protocol should enforce the identification of the controller in the cell in<br />
order to be able to classify all the logging records as a function of the controller.<br />
• Synchronous and asynchronous communication: <strong>The</strong> protocol should allow both synchronous and<br />
asynchronous communication modes. <strong>The</strong> synchronous protocol is intended to assure exclusive usage of<br />
the cell. <strong>The</strong> asynchronous mode should enable multi-user access and achieve an enhanced overall system<br />
performance.<br />
• XDAQ data type serialization: <strong>The</strong> protocol should be able to encode different data types like integer,<br />
string or boolean. <strong>The</strong> encoding scheme should be compatible with the XDAQ encoding/decoding data type<br />
from/to XML.<br />
• Human and machine interaction mechanism: <strong>The</strong> protocol should embed a warning message and level in<br />
each reply message. <strong>The</strong> warning information should facilitate a machine comprehension of the request<br />
success level.<br />
A.3 SOAP API<br />
A.3.1 Protocol<br />
<strong>The</strong> cell SOAP protocol allows both synchronous and asynchronous communication between the controller and<br />
the cell. Figure A-1 shows a UML sequence diagram that exemplifies the synchronous communication protocol<br />
between a controller and a cell. In that case, the controller is blocked until the reply message arrives. This<br />
protocol also blocks the cell. <strong>The</strong>refore, additional requests coming from other controllers will not be served<br />
until the cell has replied to the former controller.<br />
Figure A-2 shows a UML sequence diagram that exemplifies the asynchronous communication protocol between<br />
a controller and a cell. In the asynchronous case, the controller is blocked only for a few milliseconds per request,<br />
until it receives the acknowledgement message. <strong>The</strong> asynchronous reply is received in a parallel thread that listens on the<br />
corresponding port. In this case, the overall response time as a function of the number of SOAP request<br />
messages (n) grows as O(1) instead of O(n) (synchronous case): the total response time is only slightly<br />
longer than the longest remote call.<br />
Figure A-1: UML sequence diagram of a synchronous SOAP communication between a controller and a<br />
cell.<br />
Figure A-2: UML sequence diagram of an asynchronous SOAP communication between a controller and a<br />
cell.<br />
On the cell side, each asynchronous request opens a new thread where the command is executed. <strong>The</strong>refore,<br />
several controllers are allowed to remotely execute commands concurrently in the same cell.<br />
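<strong>The</strong> O(1) versus O(n) scaling argued above can be illustrated with threads standing in for asynchronous requests; this is a timing sketch, not the actual SOAP transport.<br />

```python
# Timing sketch of the synchronous vs asynchronous protocols: n simulated
# remote calls of 0.1 s each. The synchronous controller waits about
# n * 0.1 s, the asynchronous one roughly the duration of the longest call.
import threading
import time

def remote_call(duration=0.1):
    time.sleep(duration)      # stands in for the cell executing a command

def synchronous(n):
    start = time.monotonic()
    for _ in range(n):
        remote_call()
    return time.monotonic() - start

def asynchronous(n):
    start = time.monotonic()
    threads = [threading.Thread(target=remote_call) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:         # the reply listener thread would collect results here
        t.join()
    return time.monotonic() - start

print(f"sync:  {synchronous(5):.2f} s")
print(f"async: {asynchronous(5):.2f} s")
```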
Whatever communication mechanism is used, the reply message embeds the warning information. <strong>The</strong> warning<br />
level provides the request success level to the controller. <strong>The</strong> warning message completes this information with a<br />
human-readable message.
SOAP API 123<br />
A.3.2 Request message<br />
Figure A-3 shows an example of a request message. This request executes the command ExampleCommand in a<br />
given cell.<br />
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"<br />
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><br />
<SOAP-ENV:Body><br />
<ExampleCommand async="true" cid="1" sid="controller1"><br />
<param name="aParam" xsi:type="xsd:integer">3</param><br />
<callbackFun>CommandResponse</callbackFun><br />
<callbackUrl>http://centralcell.cern.ch:50001</callbackUrl><br />
<callbackUrn>urn:xdaq-application:lid=13</callbackUrn><br />
</ExampleCommand><br />
</SOAP-ENV:Body><br />
</SOAP-ENV:Envelope><br />
Figure A-3: SOAP request message example.<br />
<strong>The</strong> first XML tag (or just tag) inside the body of the SOAP message (i.e. ExampleCommand) identifies the cell<br />
command to be executed in the remote cell. <strong>The</strong> attribute async takes a boolean value and tells the cell whether<br />
this request has to be executed synchronously or asynchronously. <strong>The</strong> cid attribute is set by the controller and<br />
the same value is set by the cell in the reply message cid. This mechanism allows a controller to identify<br />
request-reply pairs in an asynchronous communication (cid is not necessary in the synchronous communication<br />
case). <strong>The</strong> sid attribute identifies a concrete controller. <strong>The</strong> value of this attribute is added into all log message<br />
generated by the execution of the command. It is therefore possible to trace the actions of each individual<br />
controller by analyzing the logging statements.<br />
<strong>The</strong> asynchronous communication modality requires the specification of three additional tags: callbackFun,<br />
callbackUrl and callbackUrn. <strong>The</strong> values of these tags uniquely identify the controller-side callback that will<br />
handle the asynchronous reply.<br />
When async is equal to false (i.e. synchronous communication), the cid attribute and the callbackFun,<br />
callbackUrl and callbackUrn tags are not needed.<br />
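The request structure described above can be sketched as follows; this is an illustrative Python fragment (not part of the Trigger Supervisor code base) that builds only the command element and omits the enclosing SOAP envelope and namespace declarations:

```python
import xml.etree.ElementTree as ET

def build_request(command, asynchronous, cid=None, sid=None,
                  callback_fun=None, callback_url=None, callback_urn=None):
    """Build the command element of a request message.  Tag and attribute
    names (async, cid, sid, callbackFun, callbackUrl, callbackUrn) follow
    the description in the text."""
    elem = ET.Element(command)
    elem.set("async", "true" if asynchronous else "false")
    if asynchronous:
        # cid and the callback tags are only needed for asynchronous calls
        elem.set("cid", str(cid))
        ET.SubElement(elem, "callbackFun").text = callback_fun
        ET.SubElement(elem, "callbackUrl").text = callback_url
        ET.SubElement(elem, "callbackUrn").text = callback_urn
    if sid is not None:
        elem.set("sid", str(sid))
    return elem

req = build_request("ExampleCommand", True, cid=3, sid="controller1",
                    callback_fun="CommandResponse",
                    callback_url="http://centralcell.cern.ch:50001",
                    callback_urn="urn:xdaq-application:lid=13")
print(ET.tostring(req, encoding="unicode"))
```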
<strong>The</strong> parameters of the command are set using the tag param. <strong>The</strong> name of the parameter is defined with the<br />
attribute name. <strong>The</strong> type of the parameter is defined with the attribute xsi:type and its value is set inside the tag.<br />
Table A-1 presents the list of possible types and their correspondence with the class that facilitates the<br />
marshalling process 17 .<br />
xsi:type attribute      XDAQ class<br />
xsd:integer             xdata::Integer<br />
xsd:unsignedShort       xdata::UnsignedShort<br />
xsd:unsignedLong        xdata::UnsignedLong<br />
xsd:float               xdata::Float<br />
xsd:double              xdata::Double<br />
xsd:Boolean             xdata::Boolean<br />
xsd:string              xdata::String<br />
Table A-1: Correspondence between xsi:type data types and the class that facilitates the marshalling process.<br />
17 In the context of data transmission, marshalling or serialization is the process of transmitting an object across a network<br />
connection in binary form. <strong>The</strong> series of bytes can be used to deserialize or unmarshall an object that is identical in its<br />
internal state to the original one.
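On the controller side, the mapping of Table A-1 can be mimicked with a small unmarshalling sketch; the parameter name runNumber and the Python conversions below are illustrative (inside the cell the xdata classes of Table A-1 play this role):

```python
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

# Illustrative mapping from the xsi:type attribute of a param tag to a
# Python conversion, following the rows of Table A-1.
XSI_TO_PY = {
    "xsd:integer": int,
    "xsd:unsignedShort": int,
    "xsd:unsignedLong": int,
    "xsd:float": float,
    "xsd:double": float,
    "xsd:Boolean": lambda s: s.strip().lower() == "true",
    "xsd:string": str,
}

def unmarshal_param(param_elem):
    """Return (name, converted value) for one param element."""
    convert = XSI_TO_PY[param_elem.get(XSI + "type")]
    return param_elem.get("name"), convert(param_elem.text)

# The parameter name 'runNumber' is purely illustrative.
xml = ('<param xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
       'name="runNumber" xsi:type="xsd:integer">3</param>')
print(unmarshal_param(ET.fromstring(xml)))  # ('runNumber', 3)
```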
A.3.3 Reply message<br />
Figure A-4 shows an example of a reply message. This message is the asynchronous response sent by the cell<br />
after executing the command ExampleCommand requested with the request message of Figure A-3.<br />
&lt;soap-env:Envelope&gt;<br />
&lt;soap-env:Header/&gt;<br />
&lt;soap-env:Body&gt;<br />
&lt;CommandResponse cid="..."&gt;<br />
&lt;payload&gt;Hello World!&lt;/payload&gt;<br />
&lt;warning level="..."&gt;Warning message&lt;/warning&gt;<br />
&lt;/CommandResponse&gt;<br />
&lt;/soap-env:Body&gt;<br />
&lt;/soap-env:Envelope&gt;<br />
Figure A-4: SOAP reply message example.<br />
A.3.4 Cell command remote API<br />
<strong>The</strong> SOAP API for cell commands has already been presented to exemplify the request and reply messages in<br />
Sections A.3.2 and A.3.3.<br />
A.3.5 Cell Operation remote API<br />
<strong>The</strong> SOAP API for cell operations consists of a number of request messages which allow a controller to remotely<br />
instantiate an operation, reset it, execute a transition, get its state and finally kill the operation instance. <strong>The</strong><br />
following sections present the request and reply messages for all relevant cases.<br />
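The lifecycle just listed can be sketched as a controller-side helper. Only OpInit is named in the text below; the other message names (OpSendCommand, OpGetState, OpKill) and the transport callable are assumptions for illustration, exercised here with a fake transport instead of real SOAP:

```python
class CellOperationClient:
    """Illustrative controller-side wrapper for the cell operation API."""

    def __init__(self, post):
        # post: callable (message_name, tags) -> reply dict, standing in
        # for the real SOAP transport
        self.post = post

    def init(self, class_name, op_id=None):
        tags = {"operation": class_name}
        if op_id is not None:
            tags["opId"] = op_id          # optional instance identifier
        reply = self.post("OpInit", tags)
        # the reply carries the identifier finally assigned by the cell
        return reply["operation"]

    def send_command(self, op_id, transition):
        return self.post("OpSendCommand",
                         {"operation": op_id, "command": transition})["payload"]

    def get_state(self, op_id):
        return self.post("OpGetState", {"operation": op_id})["payload"]

    def kill(self, op_id):
        return self.post("OpKill", {"operation": op_id})["payload"]

# Minimal fake transport to exercise the flow.
def fake_post(name, tags):
    if name == "OpInit":
        return {"operation": tags.get("opId", "op_42")}
    if name == "OpGetState":
        return {"payload": "halted"}
    return {"payload": "Ok"}

client = CellOperationClient(fake_post)
opid = client.init("MTCCIIConfiguration", "my_opid")
print(opid, client.get_state(opid))  # my_opid halted
```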
A.3.5.1 OpInit<br />
Figure A-5: Acknowledge reply message.<br />
Figure A-6 shows the request message to instantiate a new operation.<br />
&lt;soap-env:Envelope&gt;<br />
&lt;soap-env:Header/&gt;<br />
&lt;soap-env:Body&gt;<br />
&lt;OpInit async="false"&gt;<br />
&lt;operation opId="..."&gt;MTCCIIConfiguration&lt;/operation&gt;<br />
&lt;callbackFun&gt;NULL&lt;/callbackFun&gt;<br />
&lt;callbackUrl&gt;NULL&lt;/callbackUrl&gt;<br />
&lt;callbackUrn&gt;NULL&lt;/callbackUrn&gt;<br />
&lt;/OpInit&gt;<br />
&lt;/soap-env:Body&gt;<br />
&lt;/soap-env:Envelope&gt;<br />
Figure A-6: Request message to create an operation instance.<br />
This request example corresponds to a synchronous request. It is therefore not necessary to specify values for<br />
the cid, callbackFun, callbackUrl and callbackUrn tags. <strong>The</strong> operation tag serves to specify the operation class<br />
name, and the opId attribute is an optional attribute defining the instance name or identifier. If opId is not<br />
specified, the cell will assign a random opId to the operation instance.<br />
Figure A-7 shows the reply message to the request of Figure A-6. In this case, the callback function was not<br />
specified (i.e. the callbackFun, callbackUrl and callbackUrn tags were set to NULL). <strong>The</strong>refore, the tag inside the<br />
body is named NULL. Inside the callback tag NULL there are two more tags: payload and operation. <strong>The</strong> payload<br />
tag contains a string with information about the instantiation process. <strong>The</strong> tag operation contains the name (or<br />
identifier) that has been assigned to the operation instance. This identifier is used by the controller to refer to that<br />
operation instance. <strong>The</strong> operation warning object is also embedded in the reply message.
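A controller-side sketch of parsing such a reply body, assuming the tag layout described above (a callback tag wrapping payload and operation); the fragment is illustrative, not Trigger Supervisor code:

```python
import xml.etree.ElementTree as ET

def parse_reply(body_xml):
    """Extract the callback tag name and its child tags (payload,
    operation, ...) from a reply body."""
    callback = ET.fromstring(body_xml)
    result = {"callback": callback.tag}
    for child in callback:
        result[child.tag] = child.text
    return result

# Reply to an OpInit with no callback specified: the body tag is NULL.
reply = ("<NULL>"
         "<payload>InitOperation done</payload>"
         "<operation>my_opid</operation>"
         "</NULL>")
info = parse_reply(reply)
print(info["operation"])  # my_opid
```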
&lt;soap-env:Envelope&gt;<br />
&lt;soap-env:Header/&gt;<br />
&lt;soap-env:Body&gt;<br />
&lt;NULL&gt;<br />
&lt;payload&gt;InitOperation done&lt;/payload&gt;<br />
&lt;operation&gt;my_opid&lt;/operation&gt;<br />
&lt;/NULL&gt;<br />
&lt;/soap-env:Body&gt;<br />
&lt;/soap-env:Envelope&gt;<br />
Figure A-7: Reply message to the operation instantiation request of Figure A-6.<br />
Figure A-9 shows the reply message to the request of Figure A-8. <strong>The</strong> tag payload contains the result of the<br />
transition execution, which depends on the customization process. <strong>The</strong> operation warning object is also embedded<br />
in the reply message.<br />
&lt;soap-env:Envelope&gt;<br />
&lt;soap-env:Header/&gt;<br />
&lt;soap-env:Body&gt;<br />
&lt;NULL&gt;<br />
&lt;payload&gt;Ok&lt;/payload&gt;<br />
&lt;/NULL&gt;<br />
&lt;/soap-env:Body&gt;<br />
&lt;/soap-env:Envelope&gt;<br />
Figure A-9: Reply message to the transition execution request of Figure A-8.<br />
&lt;soap-env:Envelope&gt;<br />
&lt;soap-env:Header/&gt;<br />
&lt;soap-env:Body&gt;<br />
&lt;NULL&gt;<br />
&lt;payload&gt;Operation reset Ok&lt;/payload&gt;<br />
&lt;/NULL&gt;<br />
&lt;/soap-env:Body&gt;<br />
&lt;/soap-env:Envelope&gt;<br />
&lt;soap-env:Envelope&gt;<br />
&lt;soap-env:Header/&gt;<br />
&lt;soap-env:Body&gt;<br />
&lt;NULL&gt;<br />
&lt;payload&gt;halted&lt;/payload&gt;<br />
&lt;/NULL&gt;<br />
&lt;/soap-env:Body&gt;<br />
&lt;/soap-env:Envelope&gt;<br />
&lt;soap-env:Envelope&gt;<br />
&lt;soap-env:Header/&gt;<br />
&lt;soap-env:Body&gt;<br />
&lt;NULL&gt;<br />
&lt;payload&gt;Operation killed&lt;/payload&gt;<br />
&lt;/NULL&gt;<br />
&lt;/soap-env:Body&gt;<br />
&lt;/soap-env:Envelope&gt;<br />
Acknowledgements<br />
First of all I want to thank Claudia-Elisabeth Wulz, Joao Varela, Wesley Smith and Sergio Cittolin for granting<br />
me the privilege to lead the conceptual design and development effort of the <strong>Trigger</strong> <strong>Supervisor</strong> project.<br />
My special thanks to Marc Magrans de Abril for being the “always on” motor of the project, for his continuous<br />
will to improve, for the never ending flow of ideas and most important for being my brother and strongest<br />
support.<br />
This thesis work could not have reached its full expression without the hard work of so many <strong>CMS</strong> collaboration<br />
members: managers, sub-system cell developers and TS central team members built the bridge between a dream<br />
and a reality.<br />
I am also grateful for the very careful reading of the manuscript by Marco Boccioli, Iñaki García Echebarría,<br />
Joni Hahkala, Elisa Lanciotti, Raúl Murillo García, Blanca Perea Solano and Ana Sofía Torrentó Coello. <strong>The</strong>ir<br />
suggestions improved the English and made this document readable for others besides myself.<br />
Many thanks to all my colleagues at the High Energy Physics Institute of Vienna as it was always a pleasure to<br />
work with them.<br />
Last but not least, I wish to thank my family for the unconditional support.
References<br />
[1] P. Lefèvre and T. Petterson (Eds.), “<strong>The</strong> Large Hadron Collider, conceptual design”, CERN/AC/95-05.<br />
[2] <strong>CMS</strong> Collaboration, “<strong>The</strong> Compact Muon Solenoid”, CERN Technical Proposal, LHCC 94-38, 1995.<br />
[3] ATLAS Collaboration, “ATLAS Technical Proposal,” CERN/LHCC 94-43.<br />
[4] ALICE Collaboration, “ALICE - Technical Proposal for A Large Ion Collider Experiment at the CERN<br />
LHC”, CERN/LHCC 95-71.<br />
[5] LHCb Collaboration, “LHCb Technical proposal”, CERN/LHCC 98-4.<br />
[6] <strong>CMS</strong> Collaboration, “<strong>The</strong> Tracker System Project, Technical Design Report”, CERN/LHCC 98-6.<br />
[7] <strong>CMS</strong> Collaboration, “<strong>The</strong> Electromagnetic Calorimeter Project, Technical Design Report”,<br />
CERN/LHCC 97-33. <strong>CMS</strong> Addendum CERN/LHCC 2002-27.<br />
[8] <strong>CMS</strong> Collaboration, “<strong>The</strong> Hadron Calorimeter Technical Design Report”, CERN/LHCC 97-31.<br />
[9] <strong>CMS</strong> Collaboration, “<strong>The</strong> Muon Project, Technical Design Report”, CERN/LHCC 97-32.<br />
[10] <strong>CMS</strong> Collaboration, “<strong>The</strong> <strong>Trigger</strong> and Data Acquisition Project, Volume II, Data Acquisition & High-<br />
Level <strong>Trigger</strong>, Technical Design Report,” CERN/LHCC 2002-26.<br />
[11] <strong>CMS</strong> Collaboration, “<strong>The</strong> TriDAS Project - <strong>The</strong> Level-1 <strong>Trigger</strong> Technical Design Report”,<br />
CERN/LHCC 2000-38.<br />
[12] P. Chumney et al., “Level-1 Regional Calorimeter <strong>Trigger</strong> System for <strong>CMS</strong>”, in Proc. of Computing in<br />
High Energy Physics and Nuclear Physics, La Jolla, CA, USA, 2003.<br />
[13] J.J. Brooke et al., “<strong>The</strong> design of a flexible Global Calorimeter <strong>Trigger</strong> system for the Compact Muon<br />
Solenoid experiment”, <strong>CMS</strong> Note 2007/018.<br />
[14] R. Martinelli et al., “Design of the Track Correlator for the DTBX <strong>Trigger</strong>”, <strong>CMS</strong> Note 1999/007<br />
(1999).<br />
[15] J. Erö et al., “<strong>The</strong> <strong>CMS</strong> Drift Tube Track Finder”, <strong>CMS</strong> Note (in preparation).<br />
[16] D. Acosta et al., “<strong>The</strong> Track-Finder Processor for the Level-1 <strong>Trigger</strong> of the <strong>CMS</strong> Endcap Muon<br />
System”, in Proc. of the 5th Workshop on Electronics for LHC Experiments, Snowmass, CO, USA, Sept.<br />
1999, CERN/LHCC/99-33 (1999).<br />
[17] H. Sakulin, “Design and Simulation of the First Level Global Muon <strong>Trigger</strong> for the <strong>CMS</strong> Experiment at<br />
CERN”, PhD thesis, University of Technology, Vienna (2002).<br />
[18] C.-E. Wulz, “Concept of the <strong>CMS</strong> First Level Global <strong>Trigger</strong> for the <strong>CMS</strong> Experiment at LHC”, Nucl.<br />
Instr. Meth. A 473/3 231-242 (2001).<br />
[19] TOTEM Collaboration, paper to be published in Journal of Instrumentation (JINST).<br />
[20] <strong>CMS</strong> <strong>Trigger</strong> and Data Acquisition Group, “<strong>CMS</strong> L1 <strong>Trigger</strong> Control System”, <strong>CMS</strong> Note 2002/033.<br />
[21] B. G. Taylor, “Timing Distribution at the LHC”, in Proc. of the 8th Workshop on Electronics for LHC<br />
and Future Experiments, Colmar, France (2002).<br />
[22] V. Brigljevic et al., “Run control and monitor system for the <strong>CMS</strong> experiment”, in Proc. of Computing<br />
in High Energy and Nuclear Physics 2003, La Jolla, CA (2003).
[23] JavaServer Pages Technology, http://java.sun.com/products/jsp/<br />
[24] W3C standard, “Extensible Markup Language (XML)”, http://www.w3.org/XML<br />
[25] W3C standard, “Simple Object Access Protocol (SOAP)”, http://www.w3.org/TR/SOAP<br />
[26] PVSS II system from ETM, http://www.pvss.com<br />
[27] J. Gutleber and L. Orsini, “Software architecture for processing clusters based on I2O,” in Cluster<br />
Computing, New York, Kluwer Academic Publishers, Vol. 5, pp. 55–65 (2002).<br />
[28] J. Gutleber, S. Murray and L. Orsini, “Towards a homogeneous architecture for high-energy physics<br />
data acquisition systems”, Comput. Phys. Commun. 153, Issue 2 (2003) 155-163.<br />
[29] V. Brigljevic et al., “<strong>The</strong> <strong>CMS</strong> Event Builder”, in Proc. of Computing in High-Energy and Nuclear<br />
Physics, La Jolla CA, March 24-28 (2003).<br />
[30] P. Glaser et al., “Design and Development of a Graphical Setup Software for the <strong>CMS</strong> Global <strong>Trigger</strong>”,<br />
IEEE Transactions on Nuclear Science, Vol. 53, No. 3, June 2006.<br />
[31] Qt Project, http://trolltech.com/products/qt<br />
[32] Python Project, http://www.python.org/<br />
[33] Tomcat Project, http://tomcat.apache.org/<br />
[34] C. W. Fabjan and H.G. Fischer, “Particle Detectors”, Rep. Prog. Phys., Vol. 43, 1980.<br />
[35] R.E Hughes-Jones et al., “<strong>Trigger</strong>ing and Data Acquisition for the LHC”, in Proc. of the International<br />
Conference on Electronics for Particle Physics, May 1995.<br />
[36] <strong>CMS</strong> Collaboration, “<strong>CMS</strong> Letter of Intent”, CERN/LHCC 92-3, LHCC/I 1, Oct 1, 1992.<br />
[37] K. Holtman, “Prototyping of the <strong>CMS</strong> Storage Management”, Ph.D. <strong>The</strong>sis, Technische Universiteit<br />
Eindhoven, Eindhoven, May 2000.<br />
[38] CDF II Collaboration, “<strong>The</strong> CDF II Detector: Technical Design Report”, FERMILAB-PUB-96/390-E,<br />
1996.<br />
[39] J. Gutleber, I. Magrans, L. Orsini and M. Nafría, “Uniform management of data acquisition devices<br />
with XML”, IEEE Transactions on Nuclear Science, Vol. 51, Nº. 3, June 2004.<br />
[40] M. Elsing and T. Schorner-Sadenius, “Configuration of the ATLAS trigger system,” in Proc. of<br />
Computing in High Energy and Nuclear Physics 2003, La Jolla, CA (2003).<br />
[41] Roger Pressman, “Software Engineering: A Practitioner's Approach”, McGraw-Hill, 2005.<br />
[42] W3C standard, “XML Schema“, http://www.w3.org/XML/Schema<br />
[43] W3C standard, “Document Object Model (DOM)”, http://www.w3.org/DOM/<br />
[44] W3C standard, “XML Path Language (XPath)”, http://www.w3.org/TR/xpath<br />
[45] Apache Project, http://xml.apache.org/<br />
[46] W3C standard, “HTTP - Hypertext Transfer Protocol”, http://www.w3.org/Protocols/<br />
[47] W3C standard, “XSL Transformations (XSLT)”, http://www.w3.org/TR/xslt<br />
[48] G. Dubois-Felsman, “Summary DAQ and <strong>Trigger</strong>”, in Proc.of Computing in High Energy and Nuclear<br />
Physics 2003, La Jolla, CA (2003).<br />
[49] S. N. Kamin, “Programming Languages: An Interpreter-Based Approach”, Reading, MA, Addison-Wesley, 1990.<br />
[50] I. Magrans et al., “Feasibility study of an XML-based software environment to manage data acquisition<br />
hardware devices”, Nucl. Instr. Meth. A 546 324-329 (2005).<br />
[51] E. Cano et al., “<strong>The</strong> Final Prototype of the Fast Merging Module (FMM) for Readout Status Processing<br />
in <strong>CMS</strong> DAQ”, in Proc. of the 10th Workshop on Electronics for LHC Experiments and Future<br />
Experiments, Amsterdam, The Netherlands, September 29 - October 03, 2003.
[52] J. Ousterhout, “Tcl and Tk Toolkit”, Reading, MA, Addison-Wesley, 1994.<br />
[53] HAL Project, http://cmsdoc.cern.ch/~cschwick/software/documentation/HAL/index.html<br />
[54] Albert De Roeck, John Ellis and Fabiola Gianotti, “Physics Motivations for Future CERN<br />
Accelerators”, CERN-TH/2001-023, hep-ex/0112004.<br />
[55] <strong>CMS</strong> SLHC web page, http://cmsdoc.cern.ch/cms/electronics/html/elec_web/common/slhc.html<br />
[56] I. Magrans, C.-E. Wulz and J. Varela, “Conceptual Design of the <strong>CMS</strong> <strong>Trigger</strong> <strong>Supervisor</strong>”, IEEE<br />
Transactions on Nuclear Science, Vol. 53, Nº. 2, November 2005.<br />
[57] W3C Web Services Activity, http://www.w3.org/2002/ws/<br />
[58] W3C standard, “Web Services Description Language (WSDL)”, http://www.w3.org/TR/wsdl<br />
[59] I2O Special Interest Group, “Intelligent I/O (I2O) Architecture Specification v2.0”, 1999.<br />
[60] I. Magrans and M. Magrans, “<strong>The</strong> <strong>CMS</strong> <strong>Trigger</strong> <strong>Supervisor</strong> Project”, in Proc. of the IEEE Nuclear<br />
Science Symposium 2005, Puerto Rico, 23-29 October, 2005.<br />
[61] Unified Modeling Language, http://www.rational.com/uml/<br />
[62] <strong>Trigger</strong> <strong>Supervisor</strong> web page, http://triggersupervisor.cern.ch/<br />
[63] I. Magrans and M. Magrans, “<strong>Trigger</strong> <strong>Supervisor</strong> - User’s Guide”,<br />
http://triggersupervisor.cern.ch/index.php?option=com_docman&task=doc_download&gid=32<br />
[64] <strong>Trigger</strong> <strong>Supervisor</strong> Framework Workshop,<br />
http://triggersupervisor.cern.ch/index.php?option=com_docman&task=doc_download&gid=16<br />
[65] <strong>Trigger</strong> <strong>Supervisor</strong> Interconnection Test Workshop,<br />
http://triggersupervisor.cern.ch/index.php?option=com_docman&task=doc_download&gid=44<br />
[66] <strong>Trigger</strong> <strong>Supervisor</strong> Framework v 1.4 Workshop,<br />
http://indico.cern.ch/getFile.py/access?resId=0&materialId=slides&confId=24530<br />
[67] <strong>Trigger</strong> <strong>Supervisor</strong> Support Management Tool, https://savannah.cern.ch/projects/l1ts/<br />
[68] R.E. Johnson and B. Foote, “Designing reusable classes”, Journal of Object-Oriented Programming,<br />
1(2): pp. 22-35, 1988.<br />
[69] L. Peter Deutsch, “Design reuse and frameworks in the Smalltalk-80 system”, in Software Reusability -<br />
Volume II, Applications and Experience, pp. 57-72, 1989.<br />
[70] C. Gaspar and M. Dönszelmann, “DIM - A Distributed Information Management System for the<br />
DELPHI Experiment at CERN”, in Proc. of the 8 th Conference on Real-Time Computer Applications in<br />
Nuclear, Particle and Plasma Physics, Vancouver, Canada, June 1993.<br />
[71] R. Jacobsson, “Controlling Electronic Boards with PVSS”, in Proc. of the 10th International Conference<br />
on Accelerator and Large Experimental Physics Control Systems, Geneva, 10-14 October 2005, P-01.045-6.<br />
[72] B. Franek and C. Gaspar, “SMI++ Object-Oriented Framework for Designing and Implementing<br />
Distributed Control Systems”, IEEE Transactions on Nuclear Science, Vol. 52, Nº. 4, August 2005.<br />
[73] T. Adye et al., “<strong>The</strong> DELPHI Experiment Control”, in Proc. of the International Conference on<br />
Computing in High Energy Physics 1992, Annecy, France.<br />
[74] A. J. Kozubal, L. R. Dalesio, J. O. Hill and D. M. Kerstiens, “A State Notation Language for Automatic<br />
Control”, Los Alamos National Laboratory report LA-UR-89-3564, November, 1989.<br />
[75] R. Arcidiacono et al., “<strong>CMS</strong> DCS Design Concepts”, in Proc. of the 10th International Conference on<br />
Accelerator and Large Experimental Physics Control Systems, Geneva, Switzerland, 10-14 Oct. 2005.<br />
[76] A. Augustinus et al., “<strong>The</strong> ALICE Control System - a Technical and Managerial Challenge”, in Proc. of<br />
the 9th International Conference on Accelerator and Large Experimental Physics Control Systems,<br />
Gyeongju, Korea, 2003.
[77] C. Gaspar et al., “An Integrated Experiment Control System, Architecture and Benefits: the LHCb<br />
Approach”, in Proc. of the 13th IEEE-NPSS Real Time Conference, Montreal, Canada, May 18-23,<br />
2003.<br />
[78] Log4j Project, http://logging.apache.org/log4j/docs/index.html<br />
[79] Xerces-C++ project, http://xml.apache.org/xerces-c/<br />
[80] W3C recommendation, “XML 1.1 (1st Edition)”, http://www.w3.org/TR/2004/REC-xml11-20040204/<br />
[81] Graphviz Project, http://www.graphviz.org/<br />
[82] ChartDirector Project, http://www.advsofteng.com/<br />
[83] Dojo project, http://dojotoolkit.org/<br />
[84] Cgicc project, http://www.gnu.org/software/cgicc/<br />
[85] Logging Collector documentation, http://cmsdoc.cern.ch/cms/TRIDAS/R<strong>CMS</strong>/<br />
[86] J. Gutleber, L. Orsini et al., “HyperDAQ, Where Data Acquisition Meets the Web”, in Proc. of the 10th<br />
International Conference on Accelerator and Large Experimental Physics Control Systems, Geneva,<br />
Switzerland, 10-14 Oct. 2005.<br />
[87] I2O Special Interest Group, “Intelligent I/O (I2O) Architecture Specification v2.0”, 1999.<br />
[88] ECMA standard-262, “ECMAScript Language Specification”, December 1999.<br />
[89] I. Magrans and M. Magrans, “Enhancing the User Interface of the <strong>CMS</strong> Level-1 <strong>Trigger</strong> Online<br />
Software with Ajax”, in Proc. of the 15 th IEEE-NPSS Real Time Conference, Fermi National<br />
Accelerator Laboratory in Batavia, IL, USA, May 2007.<br />
[90] A. Winkler, “Suitability Study of the <strong>CMS</strong> <strong>Trigger</strong> <strong>Supervisor</strong> Control Panel Infrastructure: <strong>The</strong> Global<br />
<strong>Trigger</strong> Case”, Master <strong>The</strong>sis, Technical University of Vienna, March 2008.<br />
[91] Scientific Linux CERN 3 (SLC3), http://linux.web.cern.ch/linux/scientific3/<br />
[92] Oracle Corp., http://www.oracle.com/<br />
[93] CAEN bus adapter, model: VME64X - VX2718, http://www.caen.it<br />
[94] Apache Chainsaw project, http://logging.apache.org/chainsaw/index.html<br />
[95] I. Magrans and M. Magrans, “<strong>The</strong> Control and Hardware Monitoring System of the <strong>CMS</strong> Level-1<br />
<strong>Trigger</strong>”, in Proc of the IEEE Nuclear Science Symposium 2007, Honolulu, Hawaii, October 29 -<br />
November 2, 2007.<br />
[96] Web interface of the <strong>Trigger</strong> <strong>Supervisor</strong> CVS repository, http://isscvs.cern.ch/cgi-bin/viewcvsall.cgi/TriDAS/trigger/?root=tridas<br />
[97] P. Glaser, "System Integration of the Global <strong>Trigger</strong> for the <strong>CMS</strong> Experiment at CERN", Master thesis,<br />
Technical University of Vienna, March 2007.<br />
[98] A. Oh, “Finite State Machine Model for Level 1 Function Managers, Version 1.6.0”,<br />
http://cmsdoc.cern.ch/cms/TRIDAS/R<strong>CMS</strong>/Docs/Manuals/manuals/level1FMFSM_1_6.pdf<br />
[99] IEEE standard C37.1-1994, “IEEE standard definition, specification, and analysis of systems used for<br />
supervisory control, data acquisition, and automatic control”.<br />
[100] <strong>CMS</strong> Collaboration, “<strong>CMS</strong> physics TDR - Detector performance and software”, CERN/LHCC 2006-<br />
001.<br />
[101] A. Afaq et al., “<strong>The</strong> <strong>CMS</strong> High Level <strong>Trigger</strong> System”, IEEE NPSS Real Time Conference, Fermilab,<br />
Chicago, USA, April 29 - May 4, 2007.<br />
[102] B. Boehm et al., “Software cost estimation with COCOMO II”. Englewood Cliffs, NJ: Prentice-Hall,<br />
2000. ISBN 0-13-026692-2.<br />
[103] <strong>CMS</strong> Collaboration, “<strong>The</strong> <strong>CMS</strong> Magnet Test and Cosmic Challenge (MTCC Phase I and II) -<br />
Operational Experience and Lessons Learnt”, <strong>CMS</strong> Note 2007/005.
[104] <strong>CMS</strong> Collaboration, “<strong>The</strong> Compact Muon Solenoid detector at LHC”, To be submitted to Journal of<br />
Instrumentation.