11.07.2015 Views

Automated Generation of Failure Modes and Effects Analyses from ...


Automated Generation of Failure Modes and Effects Analyses from AADL Architectural and Error Models
Myron Hecht, Alexander Lam, Russell Howes, and Christopher Vogl, The Aerospace Corporation
Presented to Systems and Software Technology Conference
Salt Lake City, UT
April 2010
© The Aerospace Corporation 2010


Outline
• Motivation
• Background on FMEAs
• Introduction to AADL
• AADL Error Model Annex
• Tool Set for Analyzing Risk and Reliability/Availability
• Automated FMEA Generation Example
• Additional Discussion
• Conclusions


Motivation
• Failure Modes and Effects Analyses (and related Criticality Analyses) are rigorous and comprehensive reliability and safety design evaluations
  – Generally required either by industry standards or Government policies
  – A fundamental element of defense in many product liability lawsuits
• When performed manually, FMEAs are usually done only once during the detailed design phase because of cost and schedule constraints
  – Labor intensive
  – Require senior-level analysts
• If automated, FMEAs would have significant benefits
  – Multiple iterations from conceptual to detailed design
  – Enables early identification of potential problems
    • Single points of failure
    • Unanticipated effects
  – Facilitates tradeoff studies and evaluations of alternatives


Failure Modes and Effects Analysis (FMEA)
• Purpose
  – To determine the effect of hardware and software failures upon the system and equipment failures.
    • Classify effects by impact on mission success and personnel/equipment safety.
    • Identify single points of failure
• History
  – First defined as Military Procedure MIL-P-1629, “Procedures for Performing a Failure Mode, Effects and Criticality Analysis”, November 1949.
  – Further developed and applied by NASA in the 1960s to improve and verify reliability of space program hardware.
  – Since the 1980s, a standard of practice in a wide variety of industries
    • DoD: MIL-STD-1629A
    • Industrial: IEC 60812 (1985)
    • Aviation: SAE ARP 5580 (2001)
    • Automotive: SAE J1739 (2002)
    • Space: ECSS-Q-30-02A


FMEA Methodology
Conventional
• Define Ground Rules and Assumptions
  – Levels of indenture
  – Components to be considered
  – Failure modes by component category
  – Severity Level Definitions
  – Rules for recovery mechanisms and compensating provisions
• For Each Component
  – Postulate failure and failure mode
  – Identify immediate effect of failure
  – Identify next higher level effects and “end effects”
  – Identify compensating provisions
  – Evaluate severity level at end effect
Automated
• Ground rules and assumptions defined by component properties
• Components and failure modes defined in models
• Effects identified through graph tracing


FMEA Output
In Either Worksheet or Tabular Format…
• Identification: Failure Mode identification.
• Item: For software, a process in its context.
• Failure Mode:
  – Immediate Effect:
  – Intermediate Effect: Second level effect.
    • Operator
    • External networks
    • Database
    • Recovery
  – End Effect:
    • System Level (e.g., individual satellites or the constellation through TT&C functions)
    • Payload performance
    • Data to outside users through terrestrial interfaces
• Existing Mitigations: Any existing mitigations present in the architecture or design are identified.
• Severity level:
  – Set under the assumption that existing mitigations work
• Comments:
  – Additional comments documenting assumptions and uncertainties.
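The worksheet fields listed on this slide can be sketched as a small record type. This is only an illustration: the class name `FmeaRow`, the field names, the integer severity scale, and the sample values are assumptions made for the example, not the schema used by the FMEAGEN tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FmeaRow:
    """One FMEA worksheet row; fields mirror the slide's bullets (hypothetical schema)."""
    identification: str
    item: str
    failure_mode: str
    immediate_effect: str
    intermediate_effects: List[str] = field(default_factory=list)
    end_effects: List[str] = field(default_factory=list)
    existing_mitigations: List[str] = field(default_factory=list)
    severity_level: int = 0  # assumed scale; the slides do not define one
    comments: str = ""

    def to_tabular(self) -> List[str]:
        """Flatten the row into one list of cells, joining multi-valued fields."""
        return [
            self.identification,
            self.item,
            self.failure_mode,
            self.immediate_effect,
            "; ".join(self.intermediate_effects),
            "; ".join(self.end_effects),
            "; ".join(self.existing_mitigations),
            str(self.severity_level),
            self.comments,
        ]


# Illustrative values only; they are not taken from the presentation.
example_row = FmeaRow(
    identification="1.1",
    item="telemetry process",
    failure_mode="crash",
    immediate_effect="process stops",
    intermediate_effects=["operator alarm", "database write skipped"],
    end_effects=["loss of payload data"],
    existing_mitigations=["watchdog restart"],
    severity_level=3,
    comments="Severity assumes the watchdog restart works.",
)
cells = example_row.to_tabular()
```

Keeping the multi-valued effects as lists and flattening them only on output lets the same record feed either the worksheet or the tabular layout.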


Introduction to the Architecture Analysis & Design Language (AADL)
• Society of Automotive Engineers (SAE) Aerospace Standard AS5506 (2004)
  – Preceded by more than a decade of development under the DARPA Meta-H program
• Provides a standardized textual and graphical notation for describing software and hardware system architectures and their functional interfaces
  – Architectures (using standard language)
  – Expected program behavior (using behavior annex)
  – Failure and recovery behavior (using error annex)


AADL vs. other OMG Languages for Stochastic Analysis of Risk and Reliability
• Advantages
  – Objects directly represent real-time system hardware and software
  – Standard method for incorporation of quantitative attributes
    • Failure and Recovery Probabilistic Distributions
    • Parameters of those distributions
    • Probabilities and rates for individual transitions
  – Standard methods for representing propagation of failures across multiple components
    • Event ports for failure propagations
    • Guards to enable conditional propagations (important for abstractions and reuse)
• Drawbacks
  – No commercial quality tools
    • Public domain tools are available and usable, but not bug-free


AADL Components (graphical representation)
– Text and XML representations also defined


AADL Error Model Annex
• AADL annex that supports stochastic analysis
• Defines error model
  – State transition diagram that represents normal and failed states
  – Error models can be associated with hardware components, software components, connections, and “system” (composite) components
• Error model consists of
  – State definitions
  – Propagations from and to other components
  – Probability distribution and parameter definitions
  – Allowed state transitions and probabilities


Enabling Features of AADL
• Standard representation of architecture and error models
• Representation of failure propagation through system components
  – Event Ports
  – Guards
  – Propagations
• Error Model properties
  – Working status of states
  – Descriptive information for initial states, effects (subsequent states), and failure modes (transitions)
  – Initial states
  – Terminal States


AADL Error Model Example

    error model example
    features
      ErrorFree: initial error state;
      Failed: error state;
      Fail: error event {Occurrence => poisson lambda};
      Repair: error event {Occurrence => poisson mu};
      Failvisible: in out error propagation {Occurrence => fixed p};
    end example;

    error model implementation example.general
    transitions
      ErrorFree-[Fail]->Failed;
      Failed-[Repair]->ErrorFree;
      ErrorFree-[in Failvisible]->Failed;
      Failed-[out Failvisible]->Failed;
    end example.general;

Diagram labels: Failvisible (in); Failvisible (out), prob p
More information: Feiler (2007)
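The error model above is a two-state machine, and its transition table can be mirrored in a few lines of Python. The sketch below is an illustration, not part of the tool set. It assumes that `poisson lambda` and `poisson mu` denote constant failure and repair rates, and that incoming propagations are ignored when computing availability; under those assumptions the steady-state availability of the component in isolation is mu / (lambda + mu). The function names are hypothetical.

```python
# Transition table copied from the error model implementation above:
# (current state, trigger) -> next state.
TRANSITIONS = {
    ("ErrorFree", "Fail"): "Failed",
    ("Failed", "Repair"): "ErrorFree",
    ("ErrorFree", "in Failvisible"): "Failed",
    ("Failed", "out Failvisible"): "Failed",
}


def next_state(state: str, trigger: str) -> str:
    """Return the successor state.

    Assumption: a trigger with no listed transition leaves the state unchanged.
    """
    return TRANSITIONS.get((state, trigger), state)


def steady_state_availability(lam: float, mu: float) -> float:
    """Long-run fraction of time spent in ErrorFree for constant rates lam and mu.

    Assumption: only the Fail and Repair events are considered, with no
    incoming Failvisible propagations.
    """
    if lam < 0 or mu <= 0:
        raise ValueError("lam must be non-negative and mu must be positive")
    return mu / (lam + mu)


# Illustrative rates (per hour); the presentation gives no numeric values.
availability = steady_state_availability(lam=0.001, mu=0.1)
```

The quantitative tool chain in the presentation handles propagations and composite systems; this closed form covers only the isolated two-state case.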


AADL Tool Set
• Eclipse Development Environment (Ganymede) and Eclipse Modeling Framework (EMF)
• Component plug-ins
  – TopCASED: graphical editor to create AADL architecture diagrams (SEI, The Aerospace Corporation modifications)
  – Error Model Editor: graphical editor to create AADL error model diagrams (newly developed by The Aerospace Corporation)
  – OSATE: AADL generator (SEI, The Aerospace Corporation modifications)
  – ADAPT-M: stochastic Petri net to MoBIUS stochastic analysis network tool (SEI/LAAS Toulouse and The Aerospace Corporation)
  – MoBIUS: quantitative dependability modeling and prediction tool (University of Illinois at Urbana-Champaign)
  – FMEAGEN: FMEA generator (newly developed by The Aerospace Corporation)


AADL Modeling Tool Chain Data Flow
Figure panels: Qualitative Analysis Chain; Quantitative Analysis Chain


Tool Set Screen Shot


FMEA Generation Algorithm Features
• Automatically traces from all working states to failure states
  – Terminates when trace detects a restoration condition or a failure condition
• Not limited to only 3 levels of effects
• Checks to prevent repeated visits to same states
  – Ensures termination
  – Of particular importance for recoverable systems
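The tracing behavior described on this slide can be sketched as a depth-first walk over a state-transition graph. This is a minimal sketch under stated assumptions, not the FMEAGEN algorithm: the function `trace_effects`, the graph encoding, and the sample graph are hypothetical. State and transition names are borrowed loosely from the results table later in the deck, and the `Fail` edge that closes the loop is invented to show the repeated-visit check.

```python
from typing import Dict, List, Set, Tuple

# state -> list of (transition name, next state)
Graph = Dict[str, List[Tuple[str, str]]]


def trace_effects(graph: Graph, start: str, terminal: Set[str]) -> List[List[str]]:
    """Return every effect chain from `start` as [state, transition, state, ...].

    A chain ends at a terminal (restoration or failure) state, or when every
    successor is already on the current path, which guarantees termination
    even for recoverable systems whose graphs contain cycles.
    """
    chains: List[List[str]] = []

    def walk(state: str, path: List[str], seen: Set[str]) -> None:
        successors = [] if state in terminal else graph.get(state, [])
        # Drop successors already visited on this path (repeated-visit check).
        fresh = [(t, s) for t, s in successors if s not in seen]
        if not fresh:
            chains.append(path)
            return
        for transition, nxt in fresh:
            walk(nxt, path + [transition, nxt], seen | {nxt})

    walk(start, [start], {start})
    return chains


# Hypothetical graph for one component with minor and major failure cases.
example_graph: Graph = {
    "ReportDown": [("SBCUSdown", "Down")],
    "Down": [("Failure_case_Minor", "DownMinor"), ("Failure_case_Major", "DownMajor")],
    "DownMinor": [("RecoverMinor", "ReportRecover")],
    "DownMajor": [("RecoverMajor", "ReportRecover")],
    "ReportRecover": [("SBCUSrecover", "HotStandby")],
    "HotStandby": [("Fail", "ReportDown")],  # cycle back: recoverable system
}
chains = trace_effects(example_graph, "ReportDown", terminal={"HotStandby"})
```

Because each chain simply grows until it stops, the number of effect levels is not fixed in advance, which matches the slide's point about not being limited to three levels.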


Example: Supplemental Restraint System
Figure panels: Architectural Model; Error Models


Generation of FMEA from Petri Net of Error Models


Results: Automatically Generated FMEA
Enhanced formatting for presentation purposes


More Complex Error Model


Results: Automatically Generated FMEA
ID | Item | Initial Failure Mode | 1st Level Effect | Transition | 2nd Level Effect | Transition | 3rd Level Effect | Transition | 4th Level Effect | Transition | 5th Level Effect
1.1 | SBCU.Primary_SU | Failure | SU.SBCU_Primary ReportDown | SBCUSdown from SBCU.Primary_SU to SBCU.Primary_SU | SU.SBCU_Primary Down | Failure_case_Minor from SBCU.Primary_SU to SBCU.Primary_SU | SU.SBCU_Primary DownMinor | RecoverMinor from SBCU.Primary_SU to SBCU.Primary_SU | SU.SBCU_Primary ReportRecover | SBCUSrecover from SBCU.Primary_SU to SBCU.Primary_SU | SU.SBCU_Primary HotStandby | SBCUSrecover from SBCU.Primary_SU to SBCU.FMS | FMS.SBCU UsingPrimary
1.2.1 | SBCU.FMS guardin PrimaryDown from SBCU.Primary_SU to SBCU.FMS | FMS.SBCU PrimaryisDown
1.2.2.1 | Failure_case_Major from SBCU.Primary_SU to SBCU.Primary_SU | SU.SBCU_Primary DownMajor | RecoverMajor from SBCU.Primary_SU to SBCU.Primary_SU | SU.SBCU_Primary ReportRecover | SBCUSrecover from SBCU.Primary_SU to SBCU.Primary_SU | SU.SBCU_Primary HotStandby
1.2.2.2 | SBCUSrecover from SBCU.Primary_SU to SBCU.FMS | FMS.SBCU UsingPrimary
1.3 | SBCU.FMS guardin PrimaryDown from SBCU.Primary_SU to SBCU.FMS | FMS.SBCU PrimaryisDown
2.1.1 | SBCU.Backup_SU | Failure | SU.SBCU_Backup ReportDown | SBCUSdown from SBCU.Backup_SU to SBCU.Backup_SU | SU.SBCU_Backup Down | Failure_case_Minor from SBCU.Backup_SU to SBCU.Backup_SU | SU.SBCU_Backup DownMinor | RecoverMinor from SBCU.Backup_SU to SBCU.Backup_SU | SU.SBCU_Backup ReportRecover | SBCUSrecover from SBCU.Backup_SU to SBCU.Backup_SU | SU.SBCU_Backup HotStandby
2.1.2 | SBCUSrecover from SBCU.Backup_SU to SBCU.FMS | FMS.SBCU UsingBackup
2.2 | SBCU.FMS guardin BackupDown from SBCU.Backup_SU to SBCU.FMS | FMS.SBCU Down
2.3 | SPCU.FMS guardin BusDown from SBCU.FMS to SPCU.FMS | FMS.SPCU WaitingForBus
2.4 | SPCU.Primary_SU guardin FMSstandby from SPCU.FMS to SPCU.Primary_SU | SU.SPCU_Primary ColdStandby
2.5.1 | Failure_case_Major from SBCU.Backup_SU to SBCU.Backup_SU | SU.SBCU_Backup DownMajor | RecoverMajor from SBCU.Backup_SU to SBCU.Backup_SU | SU.SBCU_Backup ReportRecover | SBCUSrecover from SBCU.Backup_SU to SBCU.Backup_SU | SU.SBCU_Backup HotStandby
2.5.2 | SBCUSrecover from SBCU.Backup_SU to SBCU.FMS | FMS.SBCU UsingBackup
2.6 | SBCU.FMS guardin BackupDown from SBCU.Backup_SU to SBCU.FMS | FMS.SBCU Down
2.7 | SPCU.FMS guardin BusDown from SBCU.FMS to SPCU.FMS | FMS.SPCU WaitingForBus
2.8 | SPCU.Primary_SU guardin FMSstandby from SPCU.FMS to SPCU.Primary_SU | SU.SPCU_Primary ColdStandby
3.1 | SBCU.Primary_PU | Failure | PU.SBCU Terminated | CPUfail from SBCU.Primary_PU to SBCU.Primary_SU | SU.SBCU_Primary Terminated
3.2 | SBCU.FMS guardin PrimaryTerminated from SBCU.Primary_SU to SBCU.FMS | FMS.SBCU PrimaryisTerminated
4.1 | SBCU.Backup_PU | Failure | PU.SBCU Terminated | CPUfail from SBCU.Backup_PU to SBCU.Backup_SU | SU.SBCU_Backup Terminated
4.2 | SBCU.FMS guardin BackupTerminated from SBCU.Backup_SU to SBCU.FMS | FMS.SBCU Down
4.3 | SPCU.FMS guardin BusDown from SBCU.FMS to SPCU.FMS | FMS.SPCU WaitingForBus
4.4 | SPCU.Primary_SU guardin FMSstandby from SPCU.FMS to SPCU.Primary_SU | SU.SPCU_Primary ColdStandby
5.1 | SPCU.Primary_SU | Failure | SU.SPCU_Primary ReportDown | SPCUSdown from SPCU.Primary_SU to SPCU.Primary_SU | SU.SPCU_Primary Down | Recover from SPCU.Primary_SU to SPCU.Primary_SU | SU.SPCU_Primary ReportRecover | SPCUSrecover from SPCU.Primary_SU to SPCU.Primary_SU | SU.SPCU_Primary ColdStandby | SPCUSrecover from SPCU.Primary_SU to SPCU.FMS | FMS.SPCU UsingPrimary
5.2 | SPCU.FMS guardin PrimaryDown from SPCU.Primary_SU to SPCU.FMS | FMS.SPCU Down
6 | SPCU.Backup_SU | Failure | SU.SPCU_Backup ReportDown | SPCUSdown from SPCU.Backup_SU to SPCU.Backup_SU | SU.SPCU_Backup Down | Recover from SPCU.Backup_SU to SPCU.Backup_SU | SU.SPCU_Backup ReportRecover | SPCUSrecover from SPCU.Backup_SU to SPCU.Backup_SU | SU.SPCU_Backup ColdStandby
7.1 | SPCU.Primary_SU | Failure | SU.SPCU_Primary ReportDown | SPCUSdown from SPCU.Primary_SU to SPCU.Primary_SU | SU.SPCU_Primary Down | Recover from SPCU.Primary_SU to SPCU.Primary_SU | SU.SPCU_Primary ReportRecover | SPCUSrecover from SPCU.Primary_SU to SPCU.Primary_SU | SU.SPCU_Primary ColdStandby
7.2 | SPCU.FMS guardin BackupDown from SPCU.Backup_SU to SPCU.FMS | FMS.SPCU Down
8.1 | SPCU.Primary_PU | Failure | PU.SPCU Terminated | CPUfail from SPCU.Primary_PU to SPCU.Primary_SU | SU.SPCU_Primary Terminated
8.2 | SPCU.FMS guardin PrimaryTerminated from SPCU.Primary_SU to SPCU.FMS | FMS.SPCU PrimaryisTerminated
8.2 | CPUfail from SPCU.Primary_PU to SPCU.Primary_SU | SU.SPCU_Primary Terminated
8.4 | SPCU.FMS guardin PrimaryTerminated from SPCU.Primary_SU to SPCU.FMS | FMS.SPCU PrimaryisTerminated
9.1 | SPCU.Backup_PU | Failure | PU.SPCU Terminated | CPUfail from SPCU.Backup_PU to SPCU.Backup_SU | SU.SPCU_Backup Terminated
9.2 | SPCU.FMS guardin BackupTerminated from SPCU.Backup_SU to SPCU.FMS | FMS.SPCU Down
9.3 | CPUfail from SPCU.Backup_PU to SPCU.Backup_SU | SU.SPCU_Backup Terminated
9.4 | SPCU.FMS guardin BackupTerminated from SPCU.Backup_SU to SPCU.FMS | FMS.SPCU Down


Tool Set Capabilities for Quantitative Evaluation
Figure panels: AADL Architecture and Error Models; Mobius Stochastic Analysis Network Model; Results


Conclusions
• A new generation tool set for quantitative stochastic analysis and qualitative Failure Modes and Effects Analyses (FMEAs) for space systems is under development
  – Based on use of the Architecture Analysis and Design Language (AADL)
  – Graphically oriented
  – Modularized with reusable components
• Automated generation of FMEA/CA enables multiple iterations of analyses throughout all stages of the design
  – Allows design alternatives to be evaluated
    • Strategies for recovering from computing disruptions
    • Handling failure propagation and common mode failures
  – Enables safety and reliability problems to be identified early
    • Of critical importance to all users and stakeholders
    • Significant economic value where product liability is an issue, because of conforming to and exceeding the standard of care


Acronyms
ADAPT: AADL Architectural models to stochastic Petri nets through model Transformation
AADL: Architecture Analysis & Design Language
FMEA: Failure Mode and Effects Analysis
FMEA/CA: FMEA/Criticality Analysis
OSATE: Open Source AADL Tool Environment (software tool integrated into Eclipse)
SAE: Society of Automotive Engineers
SAN: Stochastic Analysis Network
TOPCASED: Toolkit In OPen source for Critical Applications & SystEms Development


References
• A. Rugina, K. Kanoun, M. Kaaniche, “The ADAPT Tool: From AADL Architectural Models to Stochastic Petri Nets through Model Transformation,” 7th European Dependable Computing Conference (EDCC), Kaunas, Lithuania (2008)
• Peter Feiler and Anna Rugina, Dependability Modeling with the Architecture Analysis & Design Language (AADL), Software Engineering Institute report CMU/SEI-2007-TN-043, July 2007, available from www.sei.cmu.edu
• D. D. Deavours, G. Clark, T. Courtney, D. Daly, S. Derisavi, J. M. Doyle, W. H. Sanders, and P. G. Webster, “The Mobius framework and its implementation,” IEEE Trans. on Soft. Eng., vol. 28, no. 10, pp. 956–969, October 2002.
• IEC 60812 (1985), Analysis techniques for system reliability - Procedures for failure mode and effect analysis (FMEA), International Electrotechnical Commission
• SAE AS5506 (2004), Aerospace Standard: Architecture Analysis and Design Language, available online from www.sae.org
• SAE ARP 5580 (2001), Recommended Failure Modes and Effects Analysis (FMEA) Practices for Non-Automobile Applications, Society of Automotive Engineers, available online from www.sae.org
• SAE J1739 (2002), Potential Failure Mode and Effects Analysis in Design (Design FMEA) and Potential Failure Mode and Effects Analysis in Manufacturing and Assembly Processes (Process FMEA) and Effects Analysis for Machinery (Machinery FMEA), Society of Automotive Engineers, available online from www.sae.org
• SEMATECH (1992), “Failure Modes and Effects Analysis (FMEA): A Guide for Continuous Improvement for the Semiconductor Equipment Industry”, Technology Transfer #92020963B-ENG, available online at http://www.ismi.sematech.org/docubase/abstracts/92020963B-ENG.htm

All trademarks, service marks, and trade names are the property of their respective owners
