Computer Diagnostics - Siemens
Computer Diagnostics
Managing detailed health status information of Healthcare’s Imaging Computer Systems
Computer Diagnostics is a crucial success factor for delivering and maintaining high-quality Imaging Computer Systems world-wide.
Introduction to CODIAG: Computer Diagnostics
Siemens Healthcare’s modalities are in use world-wide, 24 hours a day. For our customers it is important that downtime is avoided, or at least planned, and that repairs can be carried out easily. For Siemens itself, it is important to understand the causes of failures and to take them into account when designing subsequent hardware generations.
Today, in many cases, hardware failures cannot be predicted proactively, and complete systems are exchanged in the field and sent back for repair. When the repair centers analyze returned systems, they sometimes find parts that were exchanged in the field but are not qualified as spare parts, or they find no hardware failure at all. These systems could have stayed at the customer’s site and most likely needed only a reset to Siemens’ factory default settings and/or a software re-installation.
Computer Diagnostics (CODIAG) was developed by CV ME to address several shortcomings in the design, manufacturing, service and repair cycle. The main focus of CODIAG is to harmonize the way computer diagnostics is performed in various locations, e.g. at factory shipment, by a service technician in the field or at the repair centers.
Conceptually, CODIAG consists of software, central databases and related services. The CODIAG software analyzes the current system status and compares the resulting report either with target values or with reports from previous runs. The databases store the configuration specification, the factory shipment report and the complete history of all reports from each unit. The services provide additional value, especially for Customer Service, such as proactive maintenance and remote diagnostics support.
For full functionality, CODIAG requires the systems to be connected to the common Remote Service Platform and vendor-specific software to be installed to access the hardware sensors, e.g. FTS Deskview. Without these prerequisites, CODIAG is still beneficial for Healthcare but cannot be used to its full potential.
Special Healthcare requirements<br />
Because Healthcare’s products have to be fully FDA compliant, all software shipped with our computers must be validated and must conform to special requirements that differ from typical mainstream computer usage, such as in data centers.
The customer-perceived quality of a modality depends not only on the gantry but also on the trouble-free operation of the computer systems within it.
CODIAG has been developed to conform to the following mandatory requirements:
• no background tasks: must remain passive unless explicitly called by syngo software or scheduled to run
• runs under all operating systems used by Healthcare
• can be run unattended, without a graphical user interface (GUI)
• execution time shall be configurable and can be kept very short, so that CODIAG can be run frequently without much effort
• shall be usable as a service tool in the field
• output must be structured and parseable for easy post-processing analysis
• covers inventory, monitoring and benchmarking functionality
• must be extensible to include Healthcare-specific test functionality, e.g. for receiver boards or specific data loads
• shall include test functionality that addresses common failures
Also, the diagnostics approach shall use available commercial products and resources and avoid duplicate implementation.
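The requirement that execution time be configurable and very short can be illustrated with a minimal Python sketch, assuming each check is a plain function that returns a status string. The helper `run_with_budget` and the check names are hypothetical and not part of CODIAG; the sketch only shows how a configurable time budget keeps unattended runs short while still producing complete, structured output.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical check: a name plus a function returning "ok", "warning" or "error".
Check = Tuple[str, Callable[[], str]]


def run_with_budget(checks: List[Check], budget_s: float,
                    clock: Callable[[], float] = time.monotonic) -> Dict[str, str]:
    """Run checks in order until the configurable time budget is used up.

    A check that is already running is not interrupted; checks that would
    start after the deadline are reported as "skipped", so the structured
    output always lists every configured check.
    """
    results: Dict[str, str] = {}
    deadline = clock() + budget_s
    for name, check in checks:
        if clock() >= deadline:
            results[name] = "skipped"
        else:
            results[name] = check()
    return results


# Example: a quick unattended run with a 5-second budget.
print(run_with_budget([("fan_speeds", lambda: "ok")], budget_s=5.0))
```

Ordering the checks by priority means that a short budget still covers the most important items first.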
Off-the-shelf software analysis<br />
A thorough survey and evaluation of commercial off-the-shelf (COTS) software was conducted in 2007, before deciding which parts of the diagnostics suite needed to be developed in-house. Key learnings from this survey were as follows:
• most tools are strong in one of the fields (inventory, monitoring or benchmarking) but have shortcomings in the others
• no tool provided structured, future-proof output that could be analyzed
• no tool could perform a comparison against a given requirement
• many monitoring tools needed to run continuously in the background
• no tool provided an API to implement or extend the functionality
• the best tools, with a good mix of functionality, used DOS as operating system and were not usable during normal operation
The survey results were discussed within Healthcare, and it was decided to develop the core logic in-house and to use vendor tools, open-source algorithms and existing Healthcare tools to achieve full functionality. Since the majority of Siemens’ installed base of imaging computers consists of FTS systems, using Deskview to retrieve hardware sensor data was a logical choice. FTS was contracted to modify Deskview to run in passive mode and to consume computing power only when explicitly called. For other platforms and operating systems, different methods are used to retrieve hardware information, e.g. IPMI or direct sensor register calls.
Computer diagnostics tests should work their way up the Diagnostics Pyramid: quick tests first, then more detailed investigation if necessary.
System design<br />
Some root causes of computer system failures are easy to detect; others require extensive benchmarking and analysis. To narrow down a symptom to its root cause as efficiently as possible, a systematic approach has been developed that operates on several levels, shown in the Diagnostics Pyramid below.
An analysis showed that many problems can be detected simply by taking an inventory snapshot of the system and comparing the components, devices and settings found with a given specification. This scan takes only a couple of seconds and, when compared with previous runs, can already reveal a lot of information. The Inventory level basically answers the question: “Is everybody there?”
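The Inventory comparison can be sketched in a few lines of Python, assuming a snapshot and a specification are both flat mappings from an item name to a value. The function `compare_inventory` and the item names are hypothetical; the real specification format is not described here.

```python
from typing import Dict, List


def compare_inventory(spec: Dict[str, str], snapshot: Dict[str, str]) -> Dict[str, List[str]]:
    """Answer "Is everybody there?" for one inventory snapshot.

    spec and snapshot map hypothetical item names (e.g. "disk0.model")
    to the expected and the found value, respectively.
    """
    missing = sorted(k for k in spec if k not in snapshot)
    unexpected = sorted(k for k in snapshot if k not in spec)
    mismatched = sorted(k for k in spec if k in snapshot and snapshot[k] != spec[k])
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


spec = {"cpu.model": "model-A", "ram.size_gb": "8", "disk0.model": "disk-X"}
found = {"cpu.model": "model-A", "ram.size_gb": "4", "usb0.device": "stick"}
print(compare_inventory(spec, found))
# {'missing': ['disk0.model'], 'unexpected': ['usb0.device'], 'mismatched': ['ram.size_gb']}
```

Passing the previous run’s snapshot in place of `spec` turns the same function into a comparison with earlier runs.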
Tests on the Monitoring level take longer but reveal information about the current health status and changing dynamic data, e.g. CPU temperature, fan speeds, number of remapped disk sectors, SMART trips and network errors. They answer the question: “Is everybody doing well?”
The Diagnostics level tests the hardware with load profiles tailored to the particular usage scenario of the system. These tests will typically not be run every day but can be useful during a scheduled service period. They can easily run for several minutes to several hours and can cover CPU stress tests, hard disk performance tests, memory or network tests and especially combinations of these tests. The question to be answered here is: “Is everybody performing okay?”
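The idea of working up the pyramid can be sketched in Python, under the assumption that each level is a function returning a list of findings and that a level which explains the symptom ends the search. The names `climb_pyramid` and `Level` are hypothetical; the sketch illustrates the escalation order only, not CODIAG’s actual control flow.

```python
from typing import Callable, List, Optional, Tuple

# One pyramid level: a name plus a test returning a list of findings (empty = nothing found).
Level = Tuple[str, Callable[[], List[str]]]


def climb_pyramid(levels: List[Level]) -> Tuple[Optional[str], List[str]]:
    """Run the levels from the cheapest (Inventory) to the most expensive (Diagnostics).

    The climb stops at the first level that reports findings, so long-running
    load tests are only started when the quick levels could not explain the symptom.
    """
    for name, test in levels:
        findings = test()
        if findings:
            return name, findings
    return None, []


levels = [
    ("Inventory", lambda: []),                   # seconds: everybody is there
    ("Monitoring", lambda: ["fan2: 0 rpm"]),     # longer: fan failure found
    ("Diagnostics", lambda: ["never reached"]),  # minutes to hours: not run here
]
print(climb_pyramid(levels))  # ('Monitoring', ['fan2: 0 rpm'])
```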
Eventually, CODIAG could evolve into an Expert System. A feature currently planned for the next version is the full integration of CHSBench, which today is a standalone tool.
Internal Architecture<br />
From the start, the core of CODIAG was developed to be as independent of operating system and platform as possible. Based on an analysis of the defects found in our repair center, we defined a set of items that should be inventoried and monitored. The underlying data source for each item varies between hardware platforms and operating systems.
Currently we use three data providers:
• the operating system’s standard information sources, e.g. WMI, process tables, registry etc.
• vendor tools, e.g. FTS Deskview
• the System Category Explorer (SCE), a library interface to access hardware information that is normally not exposed via the other two methods, e.g. hard disk settings, cache settings etc.
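Because the data source of an item differs between platforms, the core logic can stay platform independent by querying providers through one common interface. The Python sketch below assumes each provider is a function from item name to value (or `None`) and that the OS sources are asked first; the priority order, the helper `collect` and all item names are assumptions for illustration, not CODIAG’s actual interface to WMI, Deskview or the SCE.

```python
from typing import Callable, Dict, List, Optional

# A data provider maps an item name to a value, or returns None if it cannot supply it.
Provider = Callable[[str], Optional[str]]


def collect(items: List[str], providers: Dict[str, Provider]) -> Dict[str, Dict[str, Optional[str]]]:
    """Fetch each item from the first provider (in dict order) that can supply it.

    The report records the source of every value, because the same item may come
    from the OS on one platform and from a vendor tool or the SCE on another.
    """
    report: Dict[str, Dict[str, Optional[str]]] = {}
    for item in items:
        report[item] = {"value": None, "source": None}
        for source, provider in providers.items():
            value = provider(item)
            if value is not None:
                report[item] = {"value": value, "source": source}
                break
    return report


# Stand-ins for the three provider types; dict.get returns None for unknown items.
providers = {
    "os": {"os.version": "5.1"}.get,
    "vendor": {"fan1.rpm": "2400"}.get,
    "sce": {"disk0.write_cache": "enabled"}.get,
}
print(collect(["os.version", "fan1.rpm", "disk0.write_cache", "gpu.temp"], providers))
```

Items that no provider can supply are still listed, with an empty source, so a missing sensor shows up in the report instead of disappearing.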
[Figure: Diagnostics Pyramid (axes: Order of Test Execution, Duration/Cost of Tests, Expected Frequency of Faults), from top to bottom:
Expert System: System Analysis, “What is wrong?” “How can it be fixed?”
Diagnosis: Performability Check, “Is everybody fit & performing OK?”
Monitoring (of “technical”, dynamic parameters): Availability Check, “Is everybody doing well?”
Inventory (of “logistical”, static parameters): Visibility Check, “Is everybody there?”]
[Figure: Timeline Tools, FY 08/09 to FY 10/11 and Future: CODIAG (combined with FTS DeskView and other vendor tools), then CODIAG & CHSBench]
[Figure: components shown: Imaging Computer System Hardware, OS, Vendor Tools, System Category Explorer, DiagInv (Inventory, Monitoring), CHSBench (Diagnostics, Performance)]
The information provided is then used by the DiagInv module to create the report and to compare it with expected values or with values from previous runs. Depending on the settings in the configuration specification, DiagInv can also raise errors or warnings when a deviation is found.
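How configured limits could turn readings into warnings and errors is sketched below in Python. It assumes the configuration specification supplies, per item, an allowed range and the severity to raise; the `evaluate` function, the limit format and the item names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical limits from a configuration specification: item -> (low, high, severity).
Limits = Dict[str, Tuple[float, float, str]]


def evaluate(readings: Dict[str, float], limits: Limits) -> List[Tuple[str, str, str]]:
    """Compare monitored readings with configured limits.

    Returns (item, severity, reason) for every deviation; the severity
    ("warning" or "error") comes from the configuration, not from the code.
    """
    findings: List[Tuple[str, str, str]] = []
    for item, (low, high, severity) in sorted(limits.items()):
        value = readings.get(item)
        if value is None:
            findings.append((item, severity, "not reported"))
        elif not low <= value <= high:
            findings.append((item, severity, f"{value} outside [{low}, {high}]"))
    return findings


limits = {
    "cpu.temp_c": (5.0, 85.0, "error"),
    "fan1.rpm": (500.0, 6000.0, "warning"),
    "fan2.rpm": (500.0, 6000.0, "warning"),
}
print(evaluate({"cpu.temp_c": 92.0, "fan1.rpm": 2400.0}, limits))
```

Keeping the severity in the configuration rather than in code lets each business unit decide which deviations matter, without changing the software.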
All configurations and reports are stored in structured eXtensible Markup Language (XML) files that conform to a predefined, versioned XML schema. These files are straightforward to parse for further analysis and processing.
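As an illustration of such post-processing, the Python sketch below uses the standard library’s `xml.etree.ElementTree` to list all report items that are not “ok”. The element and attribute names (`report`, `item`, `name`, `status`, `schemaVersion`) are assumptions; the actual CODIAG schema is not published in this document.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical schema versions this parser understands.
SUPPORTED_VERSIONS = {"1.0"}


def deviations(xml_text: str) -> Dict[str, str]:
    """Return item name -> status for every report item whose status is not "ok".

    Element and attribute names (report, item, name, status, schemaVersion)
    are illustrative assumptions, not the actual CODIAG schema.
    """
    root = ET.fromstring(xml_text)
    version = root.get("schemaVersion")
    if version not in SUPPORTED_VERSIONS:
        # Refuse to guess the meaning of a report written against an unknown schema.
        raise ValueError(f"unsupported schema version: {version}")
    return {
        item.get("name", ""): item.get("status", "")
        for item in root.iter("item")
        if item.get("status") != "ok"
    }


report = """<report schemaVersion="1.0">
  <item name="fan1.rpm" status="ok">2400</item>
  <item name="cpu.temp_c" status="error">92</item>
</report>"""
print(deviations(report))  # {'cpu.temp_c': 'error'}
```

Checking the schema version first is what a versioned schema makes possible: older and newer report formats can be handled explicitly instead of being misread.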
External Architecture<br />
Typically, a Healthcare business unit (BU) decides together with CV ME which system characteristics are to be evaluated and reported. The interval and the particular points in time at which CODIAG is run are also defined together with the BU to ensure minimal interference with normal modality operation. To begin with, CV ME recommends running CODIAG at startup and/or shutdown of the modality, which in most cases would result in daily runs.
Once the reports have been generated, the syngo Autoreport functionality will be used to pick up the files and transfer them to Siemens. Individual subscription servers within the Siemens Healthcare network will then receive these files. CV ME will host a database where these reports are consolidated and stored for further processing.
Architecture model of Computer Diagnostics: DiagInv implements the core logic, comparison functionality and reporting; in addition, inventory and monitoring data are collected. CHSBench can also be used independently for performance benchmarking.
If system management (hp OpenView and the common Remote Service Platform) is available at a site, additional functionality can be deployed. CODIAG can be configured to raise an error or warning on certain conditions, e.g. a fan failure or overheating/throttling of the CPU. Based on system management templates, these warnings can be detected and so-called events generated. These events are transmitted to Healthcare in real time, and the existing processes of the Uptime Service Center can be used to start an investigation, inform the customer, initiate a Remote Diagnostics session to confirm or correct the error, or schedule a site visit by a service technician.
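The template idea can be sketched generically in Python: each template matches a report message and a minimum severity and yields an event. The `Template` class, the event names and the matching rules are hypothetical and do not represent the template format of hp OpenView or of the Remote Service Platform.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


@dataclass
class Template:
    """Hypothetical event template; not the format of any system management product."""
    pattern: str       # regular expression searched in the report message
    min_severity: str  # lowest severity that triggers the event
    event: str         # event type forwarded to the service center


def generate_events(entries: List[Tuple[str, str]], templates: List[Template]) -> List[str]:
    """Turn (severity, message) report entries into events; the first matching template wins."""
    events: List[str] = []
    for severity, message in entries:
        for t in templates:
            if (SEVERITY_RANK[severity] >= SEVERITY_RANK[t.min_severity]
                    and re.search(t.pattern, message)):
                events.append(f"{t.event}: {message}")
                break
    return events


templates = [Template(r"fan\d+", "warning", "FAN_FAILURE"),
             Template(r"throttl", "warning", "CPU_OVERHEAT")]
entries = [("warning", "fan2 speed 0 rpm"),
           ("info", "fan1 speed 2400 rpm"),
           ("error", "CPU throttling active")]
print(generate_events(entries, templates))
```

The informational fan entry produces no event, which keeps the Uptime Service Center from being flooded with normal readings.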
Operations concept and services
The operations concept is based on the uniform and continuous collection of system diagnostics data. This data is the foundation of all further activities and services.
Basic operations concept
[Figure: Basic operations concept across the phases Engineering, Manufacturing, Hospital, Repair and Recycling: ① Customer Requirement Specification, ② Configuration Specification, ③ Factory Reference Snapshot, ④ Field Snapshots taken at configured points in time (e.g. 1.4.2010, 2.4.2010, 3.4.2010, 1.9.2013), ⑥ Repair Snapshot (e.g. 3.9.2013), ⑦ Snapshot Deletion at Recycling]
During the first two phases (engineering and manufacturing), CODIAG is set up first in general for each class of system and subsequently for each individual system.
① In the initial project phase, CV ME receives the customer’s requirement specification and starts the system selection and validation project.
② Hardware engineers create a formal, structured representation of the requirements in XML format, the “Configuration Specification”. This specification contains, for example, the required capacity or throughput of the storage subsystem.
③ During manufacturing, in the final software installation and test phase, the assembled system configuration is captured as the “Factory Reference Snapshot”. This is the most detailed snapshot; it also contains the serial numbers of the system.
The third phase is the continuous daily operation in the hospital; further phases deal with unexpected events.
④ In the hospital, a snapshot similar to the Factory Reference Snapshot is taken at the configured points in time. Depending on the business unit, it can be compared with previous runs and/or with the Configuration Specification or the Factory Reference Snapshot, but this is not mandatory. In any case, the resulting snapshot is prepared for transmission to Siemens headquarters (HQ), and the syngo Autoreport mechanisms take care of the transmission.
⑤ If an error occurs in the field and the system cannot be repaired locally, it is sent back to CV ME’s repair centers. The last Field Snapshot may still be on the system or may already have been transferred to HQ.
⑥ After arrival, a repair snapshot is taken. It can now be compared with the Configuration Specification, the Factory Reference Snapshot and the last Field Snapshot. These comparisons enable a more detailed investigation of what happened to the system than was previously possible. After repair, the system can be deployed in the field again.
⑦ Eventually, if the system cannot be repaired, it is recycled. All snapshots of this particular system are removed from the databases; the base Configuration Specification and the snapshots of other systems still in the field are, of course, unaffected.
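Steps ③ to ⑦ can be summarized in a small Python sketch of a snapshot store. It assumes snapshots are flat item/value mappings keyed by serial number and compares every reference by simple equality, although a real Configuration Specification may also express minimums such as required capacity. The class `SnapshotStore` and its methods are hypothetical stand-ins for the central databases.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # hypothetical item name -> value


class SnapshotStore:
    """In-memory stand-in for the central databases (a sketch, not the real design)."""

    def __init__(self) -> None:
        self.specs: Dict[str, Snapshot] = {}  # system class -> Configuration Specification
        self.history: Dict[str, List[Tuple[str, str, Snapshot]]] = {}  # serial -> [(kind, date, snapshot)]

    def add(self, serial: str, kind: str, date: str, snapshot: Snapshot) -> None:
        """Store a "factory", "field" or "repair" snapshot in arrival order."""
        self.history.setdefault(serial, []).append((kind, date, snapshot))

    def latest(self, serial: str, kind: str) -> Snapshot:
        """Return the most recently added snapshot of the given kind, or an empty one."""
        matches = [snap for k, _, snap in self.history.get(serial, []) if k == kind]
        return matches[-1] if matches else {}

    def repair_report(self, serial: str, system_class: str, repair: Snapshot) -> Dict[str, List[str]]:
        """Step 6: list the items in which the repair snapshot differs from each reference."""
        references = {
            "Configuration Specification": self.specs.get(system_class, {}),
            "Factory Reference Snapshot": self.latest(serial, "factory"),
            "last Field Snapshot": self.latest(serial, "field"),
        }
        return {name: sorted(k for k, v in ref.items() if repair.get(k) != v)
                for name, ref in references.items()}

    def recycle(self, serial: str) -> None:
        """Step 7: delete all snapshots of one system; specifications and other systems stay."""
        self.history.pop(serial, None)


store = SnapshotStore()
store.specs["class-A"] = {"ram.size_gb": "8"}
store.add("SN-1", "factory", "1.4.2010", {"ram.size_gb": "8", "disk0.serial": "D1"})
store.add("SN-1", "field", "1.9.2013", {"ram.size_gb": "8", "disk0.serial": "D1"})
print(store.repair_report("SN-1", "class-A", {"ram.size_gb": "4", "disk0.serial": "D2"}))
store.recycle("SN-1")
```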
Proactive Maintenance<br />
With consistent diagnostics data available, CV ME can offer further services. Statistical and trend analysis of the data will be used to predict components whose failure might be imminent. Customer Service (CS) can then be informed proactively that a service site visit should be scheduled, or a replacement part can be dispatched to the responsible service technician or even to the end customer, if so desired.
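One simple form of such a trend analysis is a least-squares line through a monitored counter, extrapolated to a failure threshold. The Python sketch below assumes daily samples of remapped disk sectors and a hypothetical threshold; the document does not say which statistical methods CV ME actually uses, and a linear trend is only the most basic choice.

```python
from typing import List, Optional, Tuple


def days_until_threshold(samples: List[Tuple[float, float]], threshold: float) -> Optional[float]:
    """Fit a least-squares line to (day, value) samples and estimate the days left
    after the last sample until the trend reaches the threshold.

    Returns None if there are too few samples or the trend is not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        return None  # all samples taken on the same day
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None  # flat or improving: no failure predicted
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, crossing_day - last_day)


# Remapped sectors grow by 2 per day; a hypothetical threshold of 50 is reached on day 20.
samples = [(0, 10), (1, 12), (2, 14), (3, 16)]
print(days_until_threshold(samples, threshold=50))  # 17.0
```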
Remote Diagnostics
The latest hardware generations provide technologies for remotely operating and diagnosing a malfunctioning computer system. Of course, this depends on the severity of the error: if the power supply is broken, for example, remote diagnostics cannot be used either. In many cases, however, skilled technicians can remotely reboot the system, run tests and perform corrective measures.
Reduction of overall system failures
A probability analysis showed that in 48% of all computer failures, a complete system exchange could possibly be avoided if the failure were analyzed remotely.
The use of Remote Diagnostics depends on certain infrastructure at Healthcare’s and the customer’s site, so the deployment of this service is expected to ramp up more slowly over time than the other services, Proactive Maintenance and Guided Local Diagnostics. Still, this service delivers the greatest benefits to CS and the customer.
Guided Local Diagnostics
Since the length of a service technician’s on-site visit is one of the largest cost drivers, CODIAG was designed to perform a broad but shallow analysis first, in a very short time, so that deviations from an expected configuration are detected very efficiently. If Remote Diagnostics or Proactive Maintenance has already identified a failed part in advance, the service technician can replace it with the spare part, then quickly test the repaired system and check whether the configuration is valid.
[Figure: CODIAG services and their benefits.
Proactive Maintenance: continuous system monitoring, transfer of data to Siemens HQ for analysis and predictions.
Guided Local Diagnostics: local analysis tools tailored to Siemens systems and boards, requirement check and factory comparison.
Remote Diagnostics: remote network connection to the system, diagnostics, repair or spare part dispatch, informing the technician.
Benefits shown: increase part exchange and reduce system swaps; reduction of field service dispatch; reduction of on-site maintenance time.]
www.siemens.com