21.08.2013 Views

Computer Diagnostics - Siemens


Computer Diagnostics

Managing detailed health status information of Healthcare’s Imaging Computer Systems


Computer Diagnostics is a crucial success factor for delivering and maintaining high-quality Imaging Computer Systems worldwide.

Introduction to CODIAG: Computer Diagnostics

Siemens Healthcare’s modalities are in use worldwide, 24 hours a day. It is important for our customers that downtime can be avoided, or is at least planned, and that repairs can be carried out easily. For Siemens itself, it is important to understand the causes of failures and to be able to take them into account when designing subsequent hardware generations.

Today, in many cases, hardware failures cannot be predicted proactively, so complete systems are exchanged in the field and sent back for repair. Sometimes, when the repair centers analyze returned systems, they find parts that were exchanged in the field and have not been qualified as spare parts, or they find no hardware failure at all. These systems could have stayed at the customer’s site and most likely needed only a reset to Siemens’ factory default settings and/or a software re-installation.

Computer Diagnostics (CODIAG) was developed by CV ME to address several shortcomings in the design, manufacturing, service and repair cycle. The main focus of CODIAG is to synchronize the way computer diagnostics is performed in various locations, e.g. at factory shipment, by a service technician in the field or at the repair centers.

Conceptually, CODIAG consists of software, central databases and related services. The CODIAG software is used to analyze the current system status and to compare the resulting report either with target values or with reports from previous runs. The databases store the configuration specification, the factory shipment report and the complete history of all reports from each unit. The services, such as proactive maintenance and remote diagnostics support, provide additional value, especially for Customer Service.

For full functionality, CODIAG needs the systems to be connected to the common Remote Service Platform and requires the installation of vendor-specific software to access the hardware sensors, e.g. FTS Deskview. Without these prerequisites, CODIAG is still beneficial for Healthcare, but cannot be used to its full potential.

Special Healthcare requirements

Because Healthcare’s products have to be fully FDA compliant, all software that is shipped with our computers must be validated and must conform to special requirements that differ from typical mainstream computer usage, such as would be common in data centers. CODIAG has been developed to conform to the following mandatory requirements:

• no background tasks; must be passive unless explicitly called from syngo software or scheduled to run

• runs under all operating systems that are used by Healthcare

• can be run unattended, without a graphical user interface (GUI)

• execution time shall be configurable and tailored to be very short, so that it can be used frequently without much effort

• shall be utilized as a service tool in the field

• output must be structured and parseable for easy post-processing analysis

• covers inventory, monitoring and benchmarking functionality

• must be extensible to include Healthcare-specific test functionality, e.g. receiver boards or specific data loads

• shall include test functionality which addresses common failures

Also, the diagnostics approach shall utilize available commercial products and resources and avoid duplicate implementation.

The customer-perceived quality of a modality depends not only on the gantry, but also on trouble-free operation of the computer systems within it.

Off-the-shelf software analysis

A thorough survey and evaluation of commercial off-the-shelf (COTS) software was conducted in 2007, before deciding which parts of the diagnostics suite needed to be developed in-house. Key learnings from this survey were as follows:

• most tools have their strength in one of the fields (inventory, monitoring or benchmarking), but shortcomings in the others

• no tool provided a structured, future-proof output that can be analyzed

• no tool could perform a comparison against a given requirement

• many monitoring tools needed to run continuously in the background

• no tool provided an API to implement or extend functionality

• the best tools, with a good mix of functionality, used DOS as their OS and were not usable during normal operation

The survey results were discussed within Healthcare, and it was decided to develop the core logic in-house and to use vendor tools, open-source algorithms and existing Healthcare tools to achieve full functionality. Since the majority of Siemens’ installed base of imaging computers consists of FTS systems, using Deskview to retrieve hardware sensor data was a logical choice. FTS has been contracted to modify Deskview to run in passive mode and to consume computing power only when explicitly called. For other platforms and operating systems, different methods are used to retrieve hardware information, e.g. IPMI or direct sensor register calls.



Computer diagnostics tests should work their way up the Diagnostics Pyramid: quick tests first, then investigate in more detail, if necessary.

System design

Root causes of computer system failures can sometimes be detected easily, but in other cases require extensive benchmarking and analysis. To narrow a symptom down to its root cause as efficiently as possible, a systematic approach has been developed that operates on several levels, which are shown in the Diagnostics Pyramid below.

An analysis showed that many problems can be detected just by taking an inventory snapshot of the system and comparing the components, devices and settings found with a given specification. This scan takes only a couple of seconds and can already reveal a lot of information when compared with previous runs. The Inventory level basically answers the question: “Is everybody there?”
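The inventory comparison can be illustrated with a minimal Python sketch. It is not CODIAG's implementation: the function name `compare_inventory`, the item names and the values are all hypothetical, and a real specification would come from the XML configuration rather than a dictionary.

```python
from typing import Dict, List


def compare_inventory(spec: Dict[str, str], found: Dict[str, str]) -> List[str]:
    """Return readable deviations between a specification and an inventory scan."""
    deviations = []
    # Items required by the specification: either missing or different.
    for item, expected in sorted(spec.items()):
        if item not in found:
            deviations.append(f"missing: {item} (expected {expected})")
        elif found[item] != expected:
            deviations.append(
                f"changed: {item} (expected {expected}, found {found[item]})"
            )
    # Items present in the system that the specification does not know.
    for item in sorted(set(found) - set(spec)):
        deviations.append(f"unexpected: {item} ({found[item]})")
    return deviations


# Hypothetical example data, not taken from a real system.
spec = {"cpu": "2 x quad-core", "ram_gb": "16", "raid_controller": "model A"}
found = {"cpu": "2 x quad-core", "ram_gb": "8", "usb_stick": "unknown"}

for line in compare_inventory(spec, found):
    print(line)
```

In this example the scan reports a missing RAID controller, a changed memory size and an unexpected USB device, which is the kind of finding the Inventory level is meant to surface within seconds.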

Tests on the Monitoring level take longer, but reveal information about the current health status and changing dynamic data. Examples are CPU temperature, fan speeds, number of remapped disk sectors, SMART trips, network errors etc. These are answers to the question: “Is everybody doing well?”
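As a rough illustration of a Monitoring-level check, the following Python sketch classifies dynamic readings against warning and error limits. The `Limit` class, the metric names and all threshold values are assumptions made for this example; the source does not state which limits CODIAG uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Limit:
    """Hypothetical upper limits for one monitored value."""
    warn_above: float
    error_above: float


def check_monitoring(
    values: Dict[str, float], limits: Dict[str, Limit]
) -> List[Tuple[str, str]]:
    """Return (severity, message) pairs for every reading outside its limits."""
    findings = []
    for name, limit in limits.items():
        if name not in values:
            # A sensor that delivers no reading is itself a finding.
            findings.append(("error", f"{name}: no reading"))
            continue
        value = values[name]
        if value > limit.error_above:
            findings.append(("error", f"{name}: {value} exceeds {limit.error_above}"))
        elif value > limit.warn_above:
            findings.append(("warning", f"{name}: {value} exceeds {limit.warn_above}"))
    return findings


# Illustrative limits and readings only.
limits = {
    "cpu_temp_c": Limit(warn_above=70, error_above=85),
    "remapped_sectors": Limit(warn_above=0, error_above=50),
    "network_errors": Limit(warn_above=10, error_above=100),
}
values = {"cpu_temp_c": 74.0, "remapped_sectors": 0, "network_errors": 250}

findings = check_monitoring(values, limits)
for severity, message in findings:
    print(severity, message)
```

Values such as fan speed, where a reading that is too low is the problem, would need a lower limit as well; the sketch leaves that out to stay short.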

The Diagnostics level tests the hardware according to load profiles tailored to the particular usage scenario of that system. These tests will typically not be run every day, but can be useful during a scheduled service period. They can easily run from several minutes to several hours. They can cover CPU stress tests, hard disk performance tests, memory or network tests and, especially, combinations of these tests. The question to be answered here is: “Is everybody performing okay?”
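The "quick tests first" idea behind the three levels can be sketched in Python as follows. This is an illustration under assumptions: the function `run_pyramid` and the stub tests are hypothetical, and stopping at the first level that reports a finding is one possible policy, not a documented CODIAG behavior.

```python
from typing import Callable, List, Tuple

# A level is a name plus a test function that returns a list of findings.
Level = Tuple[str, Callable[[], List[str]]]


def run_pyramid(levels: List[Level]) -> Tuple[str, List[str]]:
    """Run levels from cheapest to most expensive; stop at the first findings."""
    for name, test in levels:
        findings = test()
        if findings:
            return name, findings
    return "none", []


executed: List[str] = []  # records which levels actually ran


def inventory() -> List[str]:
    executed.append("inventory")
    return []  # everything is present


def monitoring() -> List[str]:
    executed.append("monitoring")
    return ["fan speed below limit"]  # hypothetical finding


def diagnostics() -> List[str]:
    executed.append("diagnostics")
    return []


level, findings = run_pyramid(
    [("inventory", inventory), ("monitoring", monitoring), ("diagnostics", diagnostics)]
)
print(level, findings, executed)
```

Because the Monitoring level already reports a finding, the expensive Diagnostics level is never started in this run.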

Eventually, CODIAG could evolve into an Expert System. The feature currently planned for the next version is the full integration of CHSBench, which is a standalone tool today.

Internal Architecture

From the start, the core of CODIAG was developed to be as independent of operating system and platform as possible. Based on an analysis of the defects found in our repair center, we defined a set of items that should be inventoried and monitored. The underlying data source for each item varies between hardware platforms and operating systems.

Currently we use three data providers:

• the operating system’s standard information sources, e.g. WMI, process tables, registry etc.

• vendor tools, e.g. FTS Deskview

• the System Category Explorer (SCE), a library interface for accessing hardware information that is normally not exposed via the other two methods, e.g. hard disk settings, cache settings etc.
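One way to picture how several data providers feed a single item list is the Python sketch below. The provider functions, item names and the "first provider wins" merge policy are assumptions for illustration; the source does not describe how CODIAG resolves overlapping sources.

```python
from typing import Callable, Dict

# A provider is any callable that returns item name -> value.
Provider = Callable[[], Dict[str, str]]


def collect_items(providers: Dict[str, Provider]) -> Dict[str, Dict[str, str]]:
    """Query each provider once and remember which source delivered each item."""
    merged: Dict[str, Dict[str, str]] = {}
    for source, provider in providers.items():
        for item, value in provider().items():
            # Keep the first answer; later providers only fill gaps (assumed policy).
            merged.setdefault(item, {"value": value, "source": source})
    return merged


# Stand-ins for the three provider kinds named in the text.
def os_provider() -> Dict[str, str]:
    return {"os_version": "example-os 1.0", "ram_gb": "16"}


def vendor_provider() -> Dict[str, str]:
    return {"cpu_temp_c": "54", "ram_gb": "16"}


def sce_provider() -> Dict[str, str]:
    return {"disk_write_cache": "enabled"}


items = collect_items(
    {"os": os_provider, "vendor": vendor_provider, "sce": sce_provider}
)
for name, entry in sorted(items.items()):
    print(name, entry["value"], "from", entry["source"])
```

Keeping the source next to each value makes it easier to see later which provider to blame when an item is missing on a particular platform.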

[Figure: The Diagnostics Pyramid. Axes: Order of Test Execution; Duration/Cost of Tests; Expected Frequency of Faults. Levels from bottom to top:

Inventory (of “logistical”, static parameters): Visibility Check, »Is everybody there?«

Monitoring (of “technical”, dynamic parameters): Availability Check, »Is everybody doing well?«

Diagnosis: Performability Check, »Is everybody fit & performing OK?«

Expert System: System Analysis, »What is wrong?« »How can it be fixed?«

Timeline: FY 08/09, FY 09/10, FY 10/11, Future. Tools: CODIAG (combined with FTS DeskView and other vendor tools); CODIAG & CHSBENCH.]



[Figure: architecture model. Blocks shown: DiagInv (Inventory, Monitoring); OS; Vendor Tools; System Category Explorer; Imaging Computer System Hardware.]

The information provided is then used by the DiagInv module to create the report and to compare it with expected values or with values from previous runs. Depending on the settings in the configuration specification, DiagInv can also raise errors or warnings when a deviation is found.

All configurations and reports are stored in structured eXtensible Markup Language (XML) files that conform to a predefined, versioned XML schema. This makes it straightforward to parse these files for further analysis and processing.
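To show what such post-processing might look like, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The XML layout (`report`, `item`, and the attributes `name`, `value`, `expected`, `severity`, `schemaVersion`) is invented for this example; the actual CODIAG schema is not given in the source.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical report; the real schema is not published in this document.
REPORT_XML = """
<report schemaVersion="1.0" system="example-001">
  <item name="cpu_count" value="2" expected="2" severity="error"/>
  <item name="ram_gb" value="8" expected="16" severity="error"/>
  <item name="bios_version" value="1.2" expected="1.1" severity="warning"/>
  <item name="cpu_temp_c" value="54"/>
</report>
"""


def find_deviations(xml_text: str) -> List[Tuple[str, str, str, str]]:
    """Return (severity, name, value, expected) for items that miss their target."""
    root = ET.fromstring(xml_text)
    deviations = []
    for item in root.iter("item"):
        expected = item.get("expected")
        # Items without a target value are informational only.
        if expected is not None and item.get("value") != expected:
            deviations.append(
                (item.get("severity", "warning"), item.get("name"),
                 item.get("value"), expected)
            )
    return deviations


for deviation in find_deviations(REPORT_XML):
    print(deviation)
```

A versioned schema attribute, as assumed here, would let an analysis script reject or adapt to reports written by a different CODIAG version.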

External Architecture

Typically, a Healthcare business unit (BU) decides together with CV ME on the set of system characteristics to be evaluated and reported. The interval and the particular points in time at which CODIAG shall be run are also defined together with the BU to ensure minimal interference with normal modality operation. As a starting point, CV ME recommends running CODIAG at the start and/or shutdown of the modality, which would, in most cases, result in daily runs.

Once the reports have been generated, the syngo Autoreport functionality will be used to pick up the files and transfer them to Siemens. Individual subscription servers within the Siemens Healthcare network will then receive these files. CV ME will host a database where these reports are consolidated and stored for further processing.

[Figure: Architecture model of Computer Diagnostics. DiagInv implements the core logic, comparison functionality and reporting, and additionally collects inventory and monitoring data. CHSBench can also be used independently for performance benchmarking.]

If system management (hp OpenView and the common Remote Service Platform) is available at a site, additional functionality can be deployed. CODIAG can be configured to raise an error or warning on certain conditions, e.g. a fan failure or overheating/throttling of the CPU. Based on system management templates, these warnings can be detected and so-called events can be generated. These events are transmitted in real time to Healthcare, and the existing processes of the Uptime Service Center can be used to start an investigation, inform the customer, initiate a Remote Diagnostics session to confirm or correct the error, or schedule a site visit by a service technician.



[Figure: lifecycle of specifications and snapshots across Engineering, Manufacturing, Hospital, Repair and Recycling. Labels shown: Customer Requirement Specification (1); Configuration Specification (2); Factory Reference Snapshot; Field Snapshots with example dates 1.4.2010, 2.4.2010, 3.4.2010 and 1.9.2013 (4).]

Operations concept and services

The basis for the operations concept is the uniform and continuous collection of system diagnostics data. This data is the foundation of all further activities and services.

Basic operations concept

During the first two phases (engineering and manufacturing), CODIAG is set up in general for each class of system and subsequently for each individual system.

① In the initial project phase, CV ME receives the customer’s requirement specification and starts the system selection and validation project.

② Hardware engineers create a formal, structured representation of the requirements in XML format as the “Configuration Specification”. This specification contains, for example, the required capacity or throughput of the storage subsystem.

③ During manufacturing, in the final software installation and test phase, the assembled system configuration is recorded as the “Factory Reference Snapshot”. This is the most detailed snapshot and also contains the serial numbers of the system.

The third phase is the continuous daily operation in the hospital, and further phases deal with unexpected events.

④ In the hospital, a snapshot similar to the Factory Reference Snapshot is taken at the configured points in time. Depending on the business unit, it can be compared with previous runs and/or with the Configuration Specification or Factory Reference Snapshot, but this is not mandatory. In any case, the resulting snapshot is prepared for transmission to Siemens headquarters (HQ), and the syngo Autoreport mechanisms take care of the transmission.

⑤ If an error occurs in the field and the system cannot be repaired locally, it is sent back to CV ME’s repair centers. The last Field Snapshot may still be on the system or may already have been transferred to HQ.

⑥ After arrival, a Repair Snapshot is taken. This snapshot can now be compared to the Configuration Specification, the Factory Reference Snapshot and the last Field Snapshot. These comparisons enable a more detailed investigation of what happened to the system than was previously possible. After repair, the system can be deployed in the field again.

⑦ Eventually, if the system cannot be repaired, it will be recycled. All snapshots of this particular system will be removed from the databases, but the base Configuration Specification and the snapshots of other systems still in the field are of course unaffected.
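The comparison of a Repair Snapshot with earlier snapshots, as described in step ⑥, can be sketched in Python. Snapshots are modeled here as plain dictionaries, and the function names, item names and the two result labels are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple

Snapshot = Dict[str, str]


def diff(
    reference: Snapshot, current: Snapshot
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return items whose value differs, as item -> (reference, current)."""
    keys = sorted(set(reference) | set(current))
    return {
        key: (reference.get(key), current.get(key))
        for key in keys
        if reference.get(key) != current.get(key)
    }


def classify_changes(
    factory: Snapshot, last_field: Snapshot, repair: Snapshot
) -> Dict[str, str]:
    """For each item that differs from the factory state, say when it changed."""
    result = {}
    for key in diff(factory, repair):
        if last_field.get(key) == repair.get(key):
            # The last field snapshot already shows the new value.
            result[key] = "before last field snapshot"
        else:
            result[key] = "after last field snapshot"
    return result


# Hypothetical snapshots of one system.
factory = {"disk0_serial": "A123", "ram_gb": "16", "bios": "1.0"}
last_field = {"disk0_serial": "B999", "ram_gb": "16", "bios": "1.0"}
repair = {"disk0_serial": "B999", "ram_gb": "8", "bios": "1.0"}

changes = classify_changes(factory, last_field, repair)
print(changes)
```

In this example the disk was already exchanged while the system was in the field, whereas the memory change only appeared after the last Field Snapshot, which narrows down where to look.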

Proactive Maintenance

With consistent diagnostics data available, CV ME can offer further services. Statistical and trend analysis of the data will be used to predict components where a failure might be imminent. CS can then be informed proactively that a service site visit should be scheduled, or a replacement part can be dispatched to the responsible service technician or even to the end customer, if so desired.
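A very simple form of trend analysis is a straight-line fit through a monitored value over time. The Python sketch below uses ordinary least squares to estimate when a value such as the number of remapped disk sectors would cross a limit. The method, the function names, the data and the limit are all assumptions; the source does not say which statistical techniques CV ME applies.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (day, measured value)


def linear_fit(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need measurements from at least two different days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_limit(points: List[Point], limit: float) -> Optional[float]:
    """Estimate days from the last measurement until the trend reaches the limit."""
    slope, intercept = linear_fit(points)
    if slope <= 0:
        return None  # value is not increasing; no crossing predicted
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - points[-1][0])


# Hypothetical daily counts of remapped sectors.
history = [(0, 10), (1, 12), (2, 14), (3, 16)]
print(days_until_limit(history, limit=30))
```

A linear fit is only a first approximation, since real wear-out often accelerates, but it shows how a history of daily snapshots can turn into a lead time for scheduling a visit or dispatching a part.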

Remote Diagnostics

The latest hardware generations provide technologies that make it possible to operate and diagnose a malfunctioning computer system remotely. Of course this depends on the criticality of the error: if the power supply is broken, for example, remote diagnostics cannot be used. But in many cases skilled technicians can remotely reboot the system, run tests and perform error correction measures.

A probability analysis showed that in 48% of all computer failures, if analyzed remotely, a complete system exchange could possibly be avoided.

The usage of Remote Diagnostics depends on a certain infrastructure at both Healthcare’s and the customer’s site, so the deployment of this service is expected to ramp up more slowly than that of the other services, Proactive Maintenance and Guided Local Diagnostics. Still, this service delivers the greatest benefits to CS and the customer.

Guided Local Diagnostics

Since the length of a service technician’s on-site visit is one of the largest cost drivers, CODIAG was designed to first perform a broad but shallow analysis in a very short time, so that a deviation from the expected configuration is detected very efficiently. If combined with Remote Diagnostics or Proactive Maintenance to identify a failed part ahead of time, the service technician can replace the part, then quickly test the repaired system and check whether the configuration is valid.

[Figure: the three CODIAG services and their intended benefits.

Proactive Maintenance: continuous system monitoring, transfer of data to Siemens HQ for analysis and predictions.

Guided Local Diagnostics: local analysis tools, tailored to Siemens systems and boards, requirement check and factory comparison.

Remote Diagnostics: remote network connection to the system, diagnostics, repair or spare part dispatch, informing the technician.

Benefits named in the figure: reduction of overall system failures; increased part exchange and reduced system swaps; reduction of field service dispatch; reduction of on-site maintenance time.]



www.siemens.com
