Protecting Cadence Data with DesignSync - Flirting With Disaster



John Thompson
Member of Technical Staff, Agere Systems
jt99@agere.com

Abstract

Is your Cadence design data safe? Many groups use DesignSync vaults to manage their design data and control access to it, but do not actually keep it safe. Backup procedures are sloppy or incorrect, and human error on the part of the Synchronicity administrator means that you may be flirting with disaster. This presentation will focus on two related topics – preventing data loss in DesignSync, and making sure you can recover data when it is lost. Participants will learn what not to do as Synchronicity vault administrators, and will be shown methods that will protect the integrity of design data, balanced against the needs of the designers to access the information. The methods which will be presented are the result of careful planning, augmented by the lessons learned from several Synchronicity vault loss disasters.

1. Introduction

It should go without saying – your data is important. This is symbolized by Synchronicity’s (now MatrixOne) decision to refer to their data repository as a “vault” where you check information in and out. But unless you keep the vault locked up with strict control over who has the combination, you’re just waiting for problems. And if your data is indeed valuable, but you aren’t insuring it properly with backups that work in coordination with the Synchronicity server, you’re flirting with disaster.

There are several ways that data can be lost or corrupted:

1. There can be a hardware problem that causes data to be written incorrectly, or just plain lost.
2. There can be a software error that causes data to be lost or corrupted.
3. A person – employee or outsider – can improperly gain access, and maliciously remove or alter the data.
4. An employee who improperly has access can unintentionally remove or alter the data.
5. An employee who properly has access can unintentionally remove or alter the data.

The first three are beyond the scope of this presentation, except for dealing with the effects of such damage. This paper will not deal with hardware problems or malicious destruction, except to acknowledge that they exist, as one reason for regular, accurate, restorable backups.

This presentation will deal with human errors, though. In many cases, the first reaction to such errors is to say that they shouldn’t have occurred. However, it is not practical to simply say that human errors should not be made. While proper training is necessary, and can minimize the frequency of human error, it is essential to recognize that mistakes will happen, and to put in procedures which minimize the impact of those inevitable errors.

2. Acronyms, and Terms Used

IT: The Information Technology department within Agere Systems. This group is responsible, among other things, for performing system backups.

gtar: The Free Software Foundation’s tape archive program.

setUID: UNIX programs can have an access bit set so that when they are run, they operate with the privileges of the UNIX owner of the program, rather than those of the person who invoked the program. This can allow an ordinary user to temporarily act as a privileged user, to (for instance) write data to a directory locked down against changes by most users.

dssc: Synchronicity’s command-line program for operating on the workspace and vault.
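As a small illustration of the setUID bit defined above, the following Python sketch reports whether a file's mode has that bit set. The function names are hypothetical and chosen only for this example; nothing here is part of the DesignSync tooling, and the paper does not describe how its own setUID programs were inspected.

```python
import os
import stat


def has_setuid_bit(mode: int) -> bool:
    """Return True if the set-user-ID bit is set in an st_mode value."""
    return bool(mode & stat.S_ISUID)


def runs_as_owner(path: str) -> bool:
    """Return True if `path` is a regular, executable file with the setUID bit.

    Such a program runs with the privileges of its UNIX owner rather than
    those of the user who invoked it.
    """
    st = os.stat(path)
    any_execute = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return (
        stat.S_ISREG(st.st_mode)
        and has_setuid_bit(st.st_mode)
        and bool(st.st_mode & any_execute)
    )
```

A mode such as `0o104755` (a regular file shown by `ls -l` as `-rwsr-xr-x`) has the bit set, while `0o100755` does not.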



3. Initial Environment

We already had a mature, well defined methodology for using Cadence and Synchronicity for our designs. We found early on that it was essential to have dedicated space allocated for the vaults, to minimize the possibility of running out of disk space during normal operation. We also quickly determined that using the setUID protection method for the Synchronicity mirrors was the better option. Using default UNIX group-ownership protection allowed designers to accidentally edit Cadence cells located in the Synchronicity mirror, rather than having them checked out in locked mode, edited, and then checked back in to the vault. We defined a logical structure for the location of Cadence libraries within the Synchronicity vault, helping designers to quickly locate libraries associated with their work, and allowing the EDA department to provide GUIs and scripts to automate the selection, population, and removal of libraries from a workspace. Finally, we had an extensive set of access controls and client-side triggers, to prevent designers from accidentally damaging the Synchronicity vault contents, and to control which designers had edit access in the various design libraries [1].

4. Synchronicity Vault Maintenance

Although we used Synchronicity access controls to restrict users from deleting objects from the vault, this task still needed to be performed periodically. Users create libraries, cells, and views which they later determine are neither needed nor wanted. We felt that limiting the “rmvault” and “rmfolder” commands to a group of experienced vault administrators was sufficient protection. That was not the case.

On at least two occasions, different Synchronicity administrators within Agere issued an improper “rmfolder” command in DesignSync. In one of the cases, by reviewing Synchronicity log files, along with the saved activity log of the administrator, we were able to determine that the path of the folder to be removed had an unintentional space inserted. This was probably caused by an errant cut-and-paste operation. It is suspected, but not proven, that the second case occurred in a similar manner.

The effect of this unintended space was both simple and dramatic. Instead of a single Synchronicity vault-URL path, the rmfolder command was given two paths – the first was a valid reference to a folder within the vault, and the second was an invalid (nonexistent) path of a file object. The server, of course, began to do what it was told.

When the dssc program did not immediately return a prompt, the mistake was noticed, but the ramifications of it were not recognized. The command was aborted with a control-C, and the dssc program returned a prompt. It was believed that problems had been avoided. Approximately three hours later, the Synchronicity server began its deletion of the folder within the vault. A few hours after that, nearly the entire vault had been deleted, as “requested” by a Synchronicity vault administrator.

This problem could have been avoided if it had been recognized immediately that the Synchronicity dssc program was not actually doing any work, but was instead simply waiting for the server to complete the command and return a status. Aborting the dssc command did nothing, other than tell it to not wait for the server to finish performing the rmfolder command. The Synchronicity server process, however, was continuing to deal with the command it had been given. It collected the list of all objects that were to be removed, and then it removed them. Although not obvious, shutting the Synchronicity server down, and restarting it, would have prevented the deletion.
If the server had been faster, or the unintentionally referenced folder had been smaller, this would not have been an option. There is no other mechanism for aborting a command sent to the Synchronicity server.

As with most human errors, this “should not” have happened. But the root of the problem was that we chose to count on experienced users making absolutely no mistakes. As our experience showed, this is not something that should be counted on.

Completely eliminating this problem is difficult, though, if not impossible. It is not practical to completely remove the capability to clean up the Synchronicity vault. After a review of the problem, and possible solutions to it, we came up with the following methodology, implemented through access controls:

1. Users with read-only access to the Synchronicity vault are not allowed to delete anything. (This was not a change.)
2. Designers (users with edit access) are allowed to delete cells and cellviews.
3. Synchronicity administrators are allowed to delete anything up to and including a Cadence library.
4. In cases where deletion is permitted, there is a maximum number of objects that is allowed, and the user is prompted with a list of the objects that are to be deleted, and queried as to whether it’s correct.
5. If a directory of libraries must be removed, for some reason, we must explicitly open up access controls before doing the work. If and when this is done, multiple administrators are involved, and cross-check the commands that are to be executed.

We were able to implement this control because we have a well-defined hierarchy within our Synchronicity vault for where Cadence libraries are kept. At sites where this is not the case, it might be sufficient, and preferable, to allow deletion of objects that are at least N levels down in the Synchronicity Projects directory, for some value or values of N.

Our new methodology actually opens permissions a bit for designers, and lets them clean up after themselves for small items. Along with making work easier for these users, it limits how often administrators, with more powerful delete privileges, and thus more ability to cause problems, have to go in and remove objects from the vault.

5. Backup and Recovery

At our company, there is a separate business group tasked with performing backups. They did not have familiarity with the Synchronicity software, nor did they have experience with backing up a database which needed to be put into a “safe” state by external (to them) means. Because of this, and lessons learned along the way, the backup methodology used at our company had to evolve.

When Synchronicity first provided for “live backups” of their databases, we created a cron script to run during off-peak hours, prior to the anticipated start of backup-to-tape of the disk volume containing the vaults. The sole purpose of the script was to run the Synchronicity “backup” command. This had several shortcomings:

1. Because the Synchronicity backup needed to be complete before the tape backup started, and because the duration of the task was not entirely predictable, we had to schedule it to begin earlier than we wanted. Occasionally, users were adversely affected by the Synchronicity backup being performed.
2. Initially, the IT group simply backed up the entire disk volume where the vaults were located. Although this succeeded in archiving all the data, it did not make it easy to restore the information, because the software package used did not handle hard-links in an ideal manner. Unless you restored at a directory level which included all copies of all hard-linked objects, and restored it in place, you could not guarantee that a given hard-linked file would be restored. The first-encountered instance was backed up, but all subsequent hard-linked copies were backed up only as a reference to the first one encountered. If a sub-directory was restored, or if the directory was restored to a different location, the software failed to properly restore any hard-linked objects where the restored object was the second or third instance that had been encountered in the original backup.
3. Because copies were being made of every single object in the vault, the indexing programs for the backup software had to track literally millions of different objects, storing information about where they were located on tapes, when they were backed up, etc. Although not actually harmful, this information was useless to us, and did clutter up the backup indexes, making searches take longer.

The second issue was initially worked around by having the IT group archive to tape only the Synchronicity “backup” directories. As long as we cleaned out the Synchronicity backup each day, there was only one instance of any given file.
Therefore, there were no problems with reloading the backup to a new location for testing to ensure a successful backup.

We encountered a problem, however, which made us change our backup methodology significantly. While performing a test restoration, we discovered two things: first, the IT department had not turned off the backups of the entire disk volume, and second, the restoration method was prone to human error. When asked to do a restore from tape, so we could test the backups, the most recent completed backup was looked up and used. This was from the backup of the entire disk volume. Compounding the error, the backup was accidentally restored in place, rather than restored in a location off to the side as requested. When the restoration began, the first thing done by the software was to remove the existing directory – which was an active, running, “good” vault.

This should not have occurred, of course. The backup of the entire disk volume should have been stopped; we should have specified the backup we wanted restored; the backup should have been restored into the specified location instead of in place. But the underlying error was that we used a backup method where it was possible for an organization unfamiliar with the specifics of our data to corrupt the data at our request.

To prevent this error in the future, we changed our backup methodology significantly. We now perform backups as follows, using a cron script:

1. The Synchronicity server is issued the “backup” command using dssc.
2. Once this command completes, the script uses gtar to archive the “backup” directory to a file (tarball) located on a file server used expressly for this purpose.
3. Then, cleanup of the Synchronicity backup directory, and removal of any “old” (currently, older than two weeks) backup tarballs occurs.
4. On its own schedule, the IT department performs daily backups of all tarballs on the file server.
5. A script was written to take a tarball, restore it to a test location, and use Synchronicity commands to restore the backup into a running test vault.

Using this method addresses all the issues we had with backups:

1. Because the tarballs are not hard-linked files, there are no issues with the IT backup software.
2. The tarballs are located on a different volume, and therefore a different directory, than the actual Synchronicity vaults. No reasonable human error can cause an active vault to be overwritten accidentally.
3. Because we have a disk volume dedicated to our Synchronicity vaults, there is no significant chance that IT will accidentally or “helpfully” add the vaults to their backup routine. The entire volume is known to be off limits, and not in need of backups.
4. Because we maintain multiple backups on a file server, there is no need to coordinate backup times with the IT department. We can schedule the vault backup to occur at the time best suited for us, and IT can back up the tarballs at the time best suited for them.
5. Although it was more an inconvenience than an actual problem, the backup software is no longer maintaining useless information on the millions of separate objects which make up the vault database.
6. If and when an actual vault restoration needs to be made, we can perform it more quickly, since the backup is on a disk, rather than on one or more tapes.

As noted above, we regularly perform restorations of Synchronicity data to a test vault, to ensure that the backup which was made can be restored properly. This is, we believe, an essential part of any backup methodology. The worst time to discover that your backup didn’t work is when you actually need it.

6. Future Investigations

No methodology should ever be considered final. Our needs are constantly evolving, and our methodology must change, in a controlled manner, to meet those needs. Here is a partial list of changes we are contemplating:

• Incremental Backups. To allow for easier restorations, as well as to simplify the backup procedure, we are currently performing full backups on the Synchronicity vaults every night. The tradeoff of this is, of course, that a backup takes longer. As the vault grows, and as our need to access it at all hours increases, we need to investigate the use of incremental backups, to minimize the time that the vault is unavailable for edits.

• Local Vault Storage. There is less benefit in having the vault data kept on a dedicated file server, since the information is not backed up directly by system software. The advantages of hot-swappable RAID disks may now be outweighed by speed increases, both for backup and normal access, that can be achieved by putting the vaults on the machines running the Synchronicity servers. This is, in fact, the recommended configuration from Synchronicity.

• Automatic Export of Deleted Objects. Instead of deleting the libraries, cells, or views, it would be preferable to replace the rmfolder command with a procedure that would first export the objects being deleted, to allow them to be easily recovered if it’s determined that the deletion was not wanted. We have not yet been successful in devising such a procedure.


• Separate vault owner and administrator. By using setUID programs, and sending commands to the Synchronicity server, Synchronicity ensures that all objects in the vault are owned and controlled by a single user ID. For us, that user is the same as the Synchronicity administrator ID. Although simpler, this leaves a hole in the protections placed on rmfolder. In theory, a Synchronicity administrator could change-directory to a location within the vault itself, rather than a workspace. From there, if he issues rmfolder to remove local files, rather than Synchronicity URLs, access controls would not be run. Doing this, an administrator could remove vault objects unintentionally. (This is effectively identical to the situation where an administrator issues UNIX rm commands to remove files and/or directories.) If the UNIX owner of the vault data was distinct from the Synchronicity administrator of that data, the potential for problems could be virtually eliminated.

7. Summary

Understanding what needs to be backed up, and how to perform such backups safely, is essential to a robust methodology. Don’t assume that the group responsible for performing backups knows how to correctly archive Synchronicity data, and don’t assume that the vault is just a collection of bytes. Synchronicity provides a mechanism for safely making a copy of the data which can then be backed up, and this process must be used. If at all possible, only archive the Synchronicity backup to tape. Copying the entire vault may just take up time and space, but in the worst case, it leaves you open to problems, either preventing a restoration from succeeding, or allowing human error to overwrite your good data. Consider a two-stage backup scheme as well, where you create an archive file of the vault backup, and have that single object backed up onto permanent storage. Most important, whatever mechanism you use, you need to periodically check it, to make sure that it is still working. Unless and until you can restore the data, you don’t really know that you’ve backed it up.

Having a robust backup methodology does not mean you should have to count on it, though. Any commands which can irreversibly affect the vault need to be closely examined, and protections appropriate to your environment need to be placed on those commands, via Synchronicity access controls. The most dangerous command is probably rmfolder. Treat it as the potential powder keg that it is, and do not count on anyone, no matter their experience, operating completely without errors.

8. References

[1] “Customizing Cadence/Synchronicity to Create a Design Reuse Methodology in an Analog/Mixed Signal DFII Environment”, 2003 ICUG Presentation.

9. Acknowledgements

I would like to acknowledge the efforts of the EDA Synchronicity team at Mendota Heights. Our entire methodology would not be possible without them. Doug Spielberg, Dan Galena, and Jeff Hildebrand (who has left Agere for other opportunities) have all contributed to the productive environment we have. Dale Duller, with Agere’s Design Platform Organization, has also provided valuable insight. Finally, I would like to thank James McCollum and Vicki Nelson for their input on this presentation.
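As an illustration of the two-stage backup scheme described in Sections 5 and 7, the following Python sketch runs a backup command, archives the resulting directory to a tarball, empties the backup directory, and prunes tarballs older than two weeks. It is a minimal sketch under stated assumptions, not the cron script used at Agere: the paper does not give the exact dssc invocation, so the backup command is passed in by the caller as `backup_cmd`; Python's `tarfile` module stands in for gtar; and the function names and the `vault-backup-` file naming are hypothetical.

```python
import shutil
import subprocess
import tarfile
import time
from pathlib import Path
from typing import List


def archive_backup(backup_dir: Path, tarball_dir: Path, stamp: str) -> Path:
    """Archive backup_dir into a single tarball under tarball_dir."""
    tarball_dir.mkdir(parents=True, exist_ok=True)
    tarball = tarball_dir / f"vault-backup-{stamp}.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(backup_dir, arcname=backup_dir.name)
    return tarball


def prune_old_tarballs(tarball_dir: Path, max_age_days: float, now: float) -> List[Path]:
    """Delete tarballs whose modification time is older than max_age_days."""
    cutoff = now - max_age_days * 86400
    removed = []
    for tarball in sorted(tarball_dir.glob("vault-backup-*.tar.gz")):
        if tarball.stat().st_mtime < cutoff:
            tarball.unlink()
            removed.append(tarball)
    return removed


def nightly_backup(backup_cmd: List[str], backup_dir: Path,
                   tarball_dir: Path, max_age_days: float = 14) -> Path:
    """Run the vault backup, archive it, then clean up (steps 1-3 of Section 5)."""
    # Step 1: ask the server to write its backup. The real command line is
    # site-specific and supplied by the caller; it is NOT assumed here.
    subprocess.run(backup_cmd, check=True)

    # Step 2: archive the backup directory to one file on the tarball volume.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    tarball = archive_backup(backup_dir, tarball_dir, stamp)

    # Step 3: empty the backup directory and drop tarballs past the age limit.
    for entry in backup_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    prune_old_tarballs(tarball_dir, max_age_days, time.time())
    return tarball
```

Because `check=True` raises if the backup command fails, the archive and cleanup steps are skipped in that case, so a failed backup never causes older tarballs to be pruned. Steps 4 and 5 (IT's tape backup of the tarballs and the test restoration) are deliberately left out, since they depend on tools the paper does not specify.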
