27.12.2014 Views

SUB=MSTR

SUB=MSTR

SUB=MSTR

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Bit Bucket X’27X<br />

27’<br />

Ed Jaffe, edjaffe@phoenixsoftware.com<br />

Brian Peterson, brian_d_peterson@bluecrossmn.com<br />

Sam Knutson, SKnutson@geico.com<br />

Skip Robinson, robinsjo@sce.com<br />

SHARE 114<br />

Session 2208<br />

Seattle, WA<br />

18 March 2006


I’m JES Not That Into You


JES is a Bottleneck for z/OS Networking<br />

• Many installations force their z/OS networking to depend<br />

upon the presence of JES.<br />

• This is both unnecessary and annoying. It is an area in<br />

which I have been openly critical of IBM. z/OS needs to<br />

evolve away from being a hodgepodge of bolt-on<br />

components.<br />

• On other platforms, networking is just one of many things<br />

“baked” into the operating system.<br />

• z/OS components for consoles, serialization, workload<br />

management, recording, catalog, recovery, etc. are “baked”<br />

in. Why not networking (Answer: Probably because these<br />

components were developed outside Poughkeepsie. ☺ )<br />

• Fortunately, this is one of the easiest things to change!


“Problem” z/OS Startup Dependency Chart<br />

MASTER<br />

WLM GRS CONSOLE JES<br />

Others…<br />

VTAM<br />

TCP/IP<br />

TCAS


Some Obvious Drawbacks<br />

• Additional dependencies are never good.<br />

• JES often uses the networking components for NJE and<br />

must wait until VTAM and/or TCP/IP are available. (This<br />

results in a “chicken & egg” type of issue.)<br />

• A problem with JES startup means no network and no<br />

TSO/E!<br />

• This leaves you unable to easily fix what is most likely a<br />

trivial problem.<br />

• Of course, if you have shared DASD with another system, you can<br />

fix some things that way and re-IPL.<br />

• For JES2 only, you cannot get JES2 to come down cleanly<br />

until all address spaces started under JES2 have shut<br />

down. This means you must wait for VTAM and TCP/IP to<br />

come down before you can proceed with shutting down<br />

JES2.


The Solution Bypass JES. Run Under MSTR.<br />

• VTAM can run under MSTR. If it does, it might be ready<br />

before JES needs it for networking.<br />

• TCPIP can run under MSTR. If it does, it might be ready<br />

before JES needs it for networking.<br />

• TCAS can run under MSTR. If it does, you can LOGON to<br />

TSO/E and use ISPF even if JES is down!<br />

• The secret to running under MSTR is to remove the use of<br />

SYSOUT DDs in the JCL procedures.<br />

• Note: If there ever were restrictions about data sets in<br />

user catalogs, they were lifted long ago. You can<br />

reference data sets cataloged in user catalogs from JCL<br />

submitted with <strong>SUB=MSTR</strong>. (I verified this just a couple<br />

of hours ago! ☺ )


Running VTAM Under MSTR<br />

//VTAM PROC<br />

//VTAM EXEC PGM=ISTINM01,REGION=0M,<br />

// DPRTY=(15,15),TIME=1440,PERFORM=8<br />

//VTAMLST DD DISP=SHR,DSN=SYS1.VTAMLST<br />

//VTAMLIB DD DISP=SHR,DSN=SYS1.VTAMLIB<br />

//SISTCLIB DD DISP=SHR,DSN=SYS1.SISTCLIB<br />

//SYSABEND DD SYSOUT=*,HOLD=YES<br />

//DSDBCTRL DD DSN=SYS1.DSDBCTRL,DISP=SHR<br />

//DSDB1 DD DSN=SYS1.DSDB1,DISP=SHR<br />

//DSDB2 DD DSN=SYS1.DSDB2,DISP=SHR<br />

//TRSDB DD DSN=SYS1.TRSDB,DISP=SHR<br />

I found this VTAM procedure on a system delivered<br />

with IBM’s ADCD. Who in this world is going to bother<br />

looking at a SYSABEND if VTAM fails Use IPCS to<br />

look at the SVC dump. Just remove the statement in<br />

red!


Running TCPIP Under MSTR (Before)<br />

//TCPIP PROC PARMS='CTRACE(CTIEZB00)'<br />

//TCPIP EXEC PGM=EZBTCPIP,REGION=0M,TIME=1440,<br />

// PARM='&PARMS'<br />

//SYSPRINT DD SYSOUT=H,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />

//ALGPRINT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />

//CFGPRINT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />

//SYSOUT DD SYSOUT=H,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />

//CEEDUMP DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />

//SYSERROR DD SYSOUT=*<br />

//*SYSTCPD DD DSN=TCPIP.SEZAINST(TCPDATA),DISP=SHR<br />

//PROFILE DD DISP=SHR,DSN=SYS1.TCPPARMS(PROF&SYSCLONE)<br />

I found this TCPIP procedure on a system delivered with<br />

IBM’s ADCD. There are six SYSOUT DD statements.<br />

They must be converted to use data sets.


Running TCPIP Under MSTR (After)<br />

//TCPIP PROC PARMS='CTRACE(CTIEZB00)'<br />

//TCPIP EXEC PGM=EZBTCPIP,REGION=0M,TIME=1440,<br />

// PARM='&PARMS'<br />

//SYSPRINT DD DSN=SYS2.TCPIP.&SYSNAME..SYSPRINT,<br />

// DISP=SHR,FREE=CLOSE<br />

//ALGPRINT DD DSN=SYS2.TCPIP.&SYSNAME..ALGPRINT,<br />

// DISP=SHR,FREE=CLOSE<br />

//SYSOUT DD DSN=SYS2.TCPIP.&SYSNAME..SYSOUT,<br />

// DISP=SHR,FREE=CLOSE<br />

//CEEDUMP DD DSN=SYS2.TCPIP.&SYSNAME..CEEDUMP,<br />

// DISP=SHR,FREE=CLOSE<br />

//SYSERROR DD DSN=SYS2.TCPIP.&SYSNAME..SYSERROR,<br />

// DISP=SHR,FREE=CLOSE<br />

//CFGPRINT DD DSN=SYS2.TCPIP.&SYSNAME..CFGPRINT,<br />

// DISP=SHR,FREE=CLOSE<br />

//*SYSTCPD DD DSN=TCPIP.SEZAINST(TCPDATA),DISP=SHR<br />

//PROFILE DD DISP=SHR,DSN=SYS1.TCPPARMS(PROF&SYSCLONE)


Running TN3270 Under MSTR (Before)<br />

//TN3270 PROC PARMS='CTRACE(CTIEZBTN)'<br />

//TN3270 EXEC PGM=EZBTNINI,REGION=0M,PARM='&PARMS'<br />

//SYSPRINT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />

//SYSOUT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />

//CEEDUMP DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />

//*TNDBCSCN DD DISP=SHR,DSN=TCPIP.SEZAINST(TNDBCSCN)<br />

//*TNDBCSXL DD DISP=SHR,DSN=TCPIP.SEZAXLD2<br />

//*TNDBCSER DD SYSOUT=*<br />

//PROFILE DD DSN=ADCD.Z111.TCPPARMS(TN3270),DISP=SHR<br />

//*SYSTCPD DD ...<br />

I found this TN3270 procedure on a system delivered<br />

with IBM’s ADCD. There are three SYSOUT DD<br />

statements. They must be converted to use data sets.


Running TN3270 Under MSTR (After)<br />

//TN3270 PROC PARMS='CTRACE(CTIEZBTN)'<br />

//TN3270 EXEC PGM=EZBTNINI,REGION=0M,PARM='&PARMS'<br />

//SYSPRINT DD DSN=SYS2.TN3270.&SYSNAME..SYSPRINT,<br />

// DISP=SHR,FREE=CLOSE<br />

//SYSOUT DD DSN=SYS2.TN3270.&SYSNAME..SYSOUT,<br />

// DISP=SHR,FREE=CLOSE<br />

//CEEDUMP DD DSN=SYS2.TN3270.&SYSNAME..CEEDUMP,<br />

// DISP=SHR,FREE=CLOSE<br />

//*TNDBCSCN DD DISP=SHR,DSN=TCPIP.SEZAINST(TNDBCSCN)<br />

//*TNDBCSXL DD DISP=SHR,DSN=TCPIP.SEZAXLD2<br />

//*TNDBCSER DD SYSOUT=*<br />

//PROFILE DD DSN=ADCD.Z111.TCPPARMS(TN3270),DISP=SHR<br />

//*SYSTCPD DD ...<br />

Similar changes can be made for other TCP/IP server<br />

address spaces if needed.


Running TCAS Under MSTR<br />

//TSO PROC MBR=TSOKEY00<br />

//STEP1 EXEC PGM=IKTCAS00,TIME=1440<br />

//PARMLIB DD DSN=ADCD.Z111.PARMLIB(&MBR),<br />

// DISP=SHR,FREE=CLOSE<br />

//PRINTOUT DD SYSOUT=*,FREE=CLOSE<br />

//*<br />

I found the above TCAS procedure on a system delivered<br />

with IBM’s ADCD. The PRINTOUT DD statements must<br />

be converted to use a RECFM=FA, LRECL=133 data set.<br />

//TSO PROC MBR=TSOKEY00<br />

//STEP1 EXEC PGM=IKTCAS00,TIME=1440<br />

//PARMLIB DD DSN=ADCD.Z111.PARMLIB(&MBR),<br />

// DISP=SHR,FREE=CLOSE<br />

//PRINTOUT DD DSN=SYS2.TCAS.&SYSNAME..PRINTOUT,<br />

// DISP=SHR,FREE=CLOSE<br />

//*


Updated Start Commands For These STCs<br />

• The main thing you have to do is put <strong>SUB=MSTR</strong> on the<br />

start command. Otherwise, it will continue to start under<br />

JES.<br />

• If you want to avoid clutter on your log, you can optionally<br />

add MSGLEVEL=(0,0) to the command.<br />

START VTAM,,,(LIST=&SYSCLONE.),<strong>SUB=MSTR</strong><br />

START TCPIP,<strong>SUB=MSTR</strong><br />

START TCAS,MBR=TSOKEY&TSOKEY,<strong>SUB=MSTR</strong><br />

Or...<br />

START VTAM,,,(LIST=&SYSCLONE.),<strong>SUB=MSTR</strong>,MSGLEVEL=(0,0)<br />

START TCPIP,<strong>SUB=MSTR</strong>,MSGLEVEL=(0,0)<br />

START TCAS,MBR=TSOKEY&TSOKEY,<strong>SUB=MSTR</strong>,MSGLEVEL=(0,0)


Revised z/OS Startup Dependency Chart<br />

MASTER<br />

WLM GRS CONSOLE JES VTAM TCPIP TCAS Others…


TSO/E LOGON To Other Than Primary Subsystem<br />

• By default, TSO/E LOGON always goes to the primary<br />

subsystem—even if TCAS itself is running with<br />

<strong>SUB=MSTR</strong>.<br />

• A LOGON pre-prompt exit (IKJEFLD1) allows users to<br />

specify under which subsystem they wish to LOGON.<br />

• There are several similar exits “floating” around. They’ve<br />

been used by clever sysprogs for decades to allow LOGON<br />

to JES2 running as a secondary subsystem (aka Poly-JES).<br />

• I inherited one from the folks at IBM Global Services<br />

back in the 1990s before everything was outsourced to<br />

Brazil.<br />

• [Aside: Those POK sysprogs were among the best I’ve ever known!]<br />

• The exit is available from<br />

http://cbttape.org/ftp/cbt/CBT377.zip


TSO/E LOGON To Other Than Primary Subsystem<br />

• This pre-prompt exit is activated only when the userid<br />

passed as response to the LOGON prompt is prefixed with<br />

“”.<br />

• You are prompted to enter your desired subsystem name.<br />

• After that the LOGON proceeds as normal.


Tricks For Successful LOGON Under MSTR (Part 1)<br />

• For a LOGON under MSTR to work, the TSO/E LOGON<br />

processor must specify JSTCB=YES on the ATTACH<br />

macro. Otherwise, you will experience abend 0B5 (unable<br />

to attach converter) resulting in a couple of SVC dumps<br />

per attempt.<br />

• The following ZAP changes the ATTACH as needed:<br />

NAME IKJEFLA1 IKJEFLB<br />

VER 0924 0000009D,C9D2D1C5C6D3C340<br />

REP 0924 0000009F,C9D2D1C5C6D3C340<br />

• This ZAP has been working without change in offsets since<br />

day one. If the ATTACH macro parameter list ever moves,<br />

a rework of the ZAP should be trivial.<br />

• I most recently installed this ZAP under z/OS 1.12 and all<br />

works as expected.


Tricks For Successful LOGON Under MSTR (Part 2)<br />

• Another annoyance if you LOGON under MSTR is the<br />

appearance of IKJ56457I PROGRAM ERROR and SVC<br />

dump at logoff time.<br />

• This is due to one of the most common occurrences on a<br />

z/OS system—the 33E abend. I’m sure you see these all<br />

the time on your log:<br />

IEA989I SLIP TRAP ID=X33E MATCHED. JOBNAME=DFHSM60 , ASID=0089.<br />

• Because of the TCB structure when logging on under<br />

MSTR, the 33E is seen by TSO/E code and results in the<br />

error message and SVC dump.<br />

• A simple IKJEFLD2 exit (also available in the same CBT<br />

tape package) converts the return code in the JOB<br />

SCHEDULING EXIT LIST from x’24’ (meaning x’33E’<br />

abend occurred) to x’00’ to avoid the message.


What Works And What Doesn’t Under MSTR<br />

• Everything that doesn’t require JES works.<br />

• You have full ISPF editing and dialog support, catalog,<br />

access methods, everything you could possibly want—even<br />

OMVS and Java.<br />

• If you have the right SPOOL Browse software, you might<br />

be able to use it to access information from your primary<br />

or secondary JES even while logged on under MSTR.<br />

• You cannot submit jobs or print anything from your<br />

TSO/E session. (That would require JES.)


Trying To Submit While Logged On Under MSTR


Decrypting z/OS Unix<br />

Crypticisms


BPXMTEXT Gives Quick and Useful z/OS UNIX Help<br />

• BPXMTEXT displays the description and action text for a reason<br />

code returned from the z/OS UNIX kernel, errnojr values<br />

returned from the C/C++ run-time library, and TCP/IP errno values.<br />

• For zFS reason codes (EFxxnnnn), the xx part of the reason code is<br />

not used to display the module name. (It always displays zFS.)<br />

Therefore, you can use EF00nnnn for zFS reason codes.<br />

• BPXMTEXT internally invokes another REXX called /bin/edcmtext<br />

• EDCMTEXT:<br />

• Validates the reason code parameter<br />

• Calls module EDCEJR via LINKMVS<br />

• Parses and displays the output in a useful format<br />

• BPXMTEXT accepts a single argument: the cryptic reason code. If<br />

you pass no parameters, it tells you what it wants.<br />

• Recommendation: Be sure SYS1.BPXEXEC is on your default<br />

SYSEXEC or SYSPROC concatenation under TSO/E!


BPXMTEXT Gives Quick and Useful z/OS UNIX Help


Does Anybody Really Know<br />

What Time It Is


Sysplex Time<br />

• In the beginning (of Sysplex), there was a device called<br />

“Sysplex Timer”<br />

• Dedicated appliance<br />

• In our experience, rock solid<br />

• Provided the time, always correct<br />

• Became too old, replaced by Server Time Protocol (STP)<br />

−<br />

Lots of great reasons and advantages


STP in a Single Box<br />

• STP originally designed to coordinate time amongst<br />

multiple boxes<br />

• At the same time as STP was developed, CPU boxes<br />

became bigger and bigger<br />

• We implemented STP at the same time as we went from<br />

three boxes (z800/z900s) to one box (z10)


STP and POR<br />

• STP basically “runs” on the System Assist Processor<br />

(SAP) engines<br />

• SAP engines don't run during POR<br />

• What happens during/after POR


STP History<br />

• When STP first came out, STP configurations were lost<br />

during POR<br />

• Because of this, the sysprog had to redefine the<br />

configurations<br />

• STP enhanced to save the configurations across POR


The Sad Story<br />

• We did a planned POR - successful<br />

• After the POR, we performed an Activate of the machine<br />

-successful<br />

• After the Activate, we IPLed each LPAR – successful<br />

• After 2.5 hours, we noticed that STP was 630 seconds<br />

FAST - OOPS!<br />

• A PMH has been opened with IBM. Presumably the<br />

problem will be understood soon.


Lessons Learned<br />

• We don't know why this happened<br />

• Single box STP timing networks seem to have been an<br />

afterthought over the years<br />

• After every POR or ACTIVATE, check the STP time and<br />

make sure it's correct<br />

• ESPECIALLY (maybe only) in a single box STP timing<br />

network<br />

• If you encounter anything in your hardware that looks<br />

wrong, open a PMH with IBM as soon as possible.<br />

• You can always close it later if it turns out to be user error.


TO LIVE AND DIE IN LLA


TO LIVE AND DIE IN LLA<br />

• S LLA,<strong>SUB=MSTR</strong>,REUSASID=YES,LLA=00 is the “correct” way to<br />

start LLA at our shop<br />

• ASID reuse was introduced in z/OS 1.9 and we enable in DIAGxx on<br />

some LPARs i.e. REUSASID(YES)<br />

• When a reusable ASID is requested by the START command or the<br />

ASCRE macro, this reusable ASID is assigned if REUSASID(YES)<br />

is specified in DIAGxx. If REUSASID(NO) is specified in DIAGxx,<br />

an ordinary ASID is assigned. The default is REUSASID(NO). The<br />

use of reusable ASIDs might result in system 0D3 abends, if<br />

products or programs have not been upgraded to tolerate reusable<br />

ASIDs. For more information about reusable ASIDs, see z/OS<br />

MVS Programming: Extended Addressability Guide.<br />

Omit the LLA=00 and you find that you only manage link list<br />

• Omit <strong>SUB=MSTR</strong> and from z/OS 1.9 LLA will restart itself and tell<br />

you to include it in the future i.e. CSV209I LIBRARY LOOKASIDE<br />

START WILL BE RETRIED, ADDING "<strong>SUB=MSTR</strong>" WHICH IS<br />

REQUIRED ON THE START LLA COMMAND


TO LIVE AND DIE IN LLA<br />

/* CSVLLA00 FOR TESTPLX */<br />

/*******************************/<br />

/* LLA EXITS GEICO USES CSVLLIX1 TO SUPPORT IBM MODULE FETCH MONITOR */<br />

/* AND OUR OWN LOCAL CODE WHICH MONITORS */<br />

/*******************************/<br />

EXIT1(ON)<br />

EXIT2(OFF)<br />

/******************************/<br />

/* LINK LIST */<br />

/******************************/<br />

LIBRARIES(-LNKLST-)<br />

FREEZE(-LNKLST-)<br />

/******************************/<br />

/* OTHER LIBRARIES WE MANAGE FOR PERFORMANCE */<br />

/******************************/<br />

LIBRARIES(SYS2.IMS.RESLIB,PROD2.IMS1.LOAD)<br />

FREEZE(SYS2.IMS.RESLIB,PROD2.IMS1.LOAD)


TO LIVE AND DIE IN LLA<br />

• CSVLLA00 is not used by default and is not shipped by<br />

IBM<br />

• We normally start LLA in SYS1.PARMLIB(IEACMD00) and<br />

never stop it but if it is stopped and restarted an error<br />

can be made<br />

• One simple change you can do to prevent this is update<br />

the PROC to default to LLA=00<br />

• We manage many libraries beyond –LNKLST- with FREEZE<br />

for performance and need to insure that this is not<br />

discontinued unintentionally


TO LIVE AND DIE IN LLA<br />

• SYS1.IBM.PROCLIB(LLA)<br />

//LLA PROC LLA=<br />

//LLA EXEC PGM=CSVLLCRE,REGION=0M,PARM='LLA=&LLA'<br />

• SYS1.PROCLIB(LLA) modified<br />

//LLA PROC LLA=00<br />

//LLA EXEC PGM=CSVLLCRE,REGION=0M,PARM='LLA=&LLA'<br />

• REGION=0M was added in z/OS 1.9 resolved a common<br />

problem previously discussed by IBM at SHARE of potentially<br />

running out of below the line storage in LLA address space


TO LIVE AND DIE IN LLA<br />

• SMF can only be used to research programs run by PGM= in JCL<br />

• Data Set Audit Facility (DAF) freeware is one tool that can be used<br />

to report on PGM= use from SMF without writing your own SAS<br />

reports or program.<br />

• DAF reads standard IBM SMF records and generates detailed<br />

dataset audit trail reports based upon user supplied selection criteria<br />

• http://sites.google.com/site/michaeljosephcleary<br />

• IBM free tool Module Fetch Monitor (MFM) can be used to<br />

understand program fetch activity<br />

• MFM has a program that executes and collects data. The data can be<br />

viewed from an ISPF dialog application. In addition, there is a batch<br />

interface.<br />

• It is real time data collection. When the program is stopped, the data<br />

is lost. There is no historical data collection interface. The data can<br />

be written to two logs files you could archive them.


TO LIVE AND DIE IN LLA<br />

• MFM is a non-warranty program supported as time permits by<br />

IBM but is used by some very large sites with some good<br />

success stories.<br />

• If you just want to do some spot checking or investigate some<br />

things and have a good systems programmer to work with then<br />

MFM and native IBM SMF data may fill your needs without<br />

spending money for a tool like SoftAudit nee Tivoli License<br />

Compliance Manager for z/OS<br />

• Session 2876 Module Fetch Monitor (MFM) User Experience by<br />

Greg Thompson at SHARE 96 in Long Beach, CA February, 2001<br />

is a good introduction<br />

• Appendix B Optimizing use of LLA and VLF Redbook System z<br />

Mean Time to Recovery Best Practices<br />

• You can get a copy of MFM by sending an e-mail to Peter Relson<br />

relson@us.ibm.com at IBM and signing an agreement<br />

• MFM provides the names of modules used within an MVS system<br />

when called through the MVS Contents Supervisor, and<br />

integrates the information coming from LLA, but only if the<br />

library is under LLA control


TO LIVE AND DIE IN LLA<br />

• Undocumented but very useful D LLA,STATS command is also handy<br />

to help determine if a library is being accessed<br />

LIBRARY: SYSOP.TSO.DATAUTIL.LINKLIB<br />

MEMBERS: 30<br />

MEMBERS FETCHED: 12 MEMBERS IN VLF: 0<br />

DASD FETCHES: 296 VLF RETRIEVES: 808<br />

• This "undocumented" command is documented in the Redbook<br />

Partitioned Data Set Extended Usage Guide (SG24-6106)<br />

• The command is a cheap way to get an indication of usage if you don’t<br />

need any history just a point in time answer<br />

• Detail data is also available from D LLA,STATS using the LIBRARY<br />

and MEMBER keywords<br />

• D LLA,STATS,LIBRARY=yourdsn,MEMBER=member,FETCHED


TO LIVE AND DIE IN LLA<br />

• D LLA,STATS,LIBRARY=mydsn,MEMBER=COPY,FETCHED<br />

CSV630I 12.00.12 LLA STATS DISPLAY 954<br />

LIBRARY: SYSOP.TSO.DATAUTIL.LINKLIB<br />

MEMBERS: 1 MEMBERS FETCHED: 1 MEMBERS IN VLF: 0<br />

TOTAL DASD FETCHES: 47 TOTAL VLF RETRIEVES: 200<br />

MEMBER: COPY<br />

DASD FETCHES: 47 VLF RETRIEVES: 200<br />

AVERAGE: 123.0465 LLA VALUE: 270300<br />

• Wildcards are accepted!<br />

• D LLA,STATS,LIBRARY=mydsn,MEMBER=*,FETCHED


TO LIVE AND DIE IN LLA<br />

• A CSVLLIX1 exit is included as a data gathering point<br />

• I sometimes add in a few lines of code to issue a WTO message<br />

when a particular module or library is accessed placing<br />

additional libraries under LLA control if needed.<br />

USING CSVLLIX1,R8 Establish R8 as code register.<br />

USING LLP1,R1 Addressability to LLP1.<br />

CLC LLP1PDS2(8),=C'CACEMCKI' CA-OPTIMIZER stub<br />

BNE NOT_IT Not the module we care about<br />

LR R9,R1 save that across WTO<br />

WTO 'CACEMCKI CA-OPTIMIZER USED BY THIS JOB', X<br />

ROUTCDE=(11)<br />

LR R1,R9 restore LLP1<br />

NOT_IT EQU *


TO LIVE AND DIE IN LLA<br />

• Fetch activity occurs throughout z/OS and errors in CSVLLIX1 can<br />

lead to an outage<br />

• A loop from an incorrectly repeated and not updated branch caused<br />

an outage on the systems programmer sandbox where it was tested<br />

NOT_APL2 EQU *<br />

CLC LLP1DSN(25),=C'SYS3.TECHASST.DCF.LINKLIB'<br />

BNE NOT_DCF Not the library we care about<br />

WTO 'GEI$DCF TECHASST.DCF.LINKLIB USED BY THIS JOB ',<br />

ROUTCDE=(11),MCSFLAG=(HRDCPY)<br />

B NOT_IT Done<br />

NOT_DCF EQU *<br />

CLC LLP1DSN(25),=C'SYS3.TECHASST.DLF.LINKLIB'<br />

BNE NOT_DCF<br />

Not the library we care about


TO LIVE AND DIE IN LLA<br />

• A bad CSVLLIX1 exit in the link list<br />

• START for LLA in PARMLIB(COMMNDxx) so LLA start cannot be easily<br />

bypassed<br />

• Normal systems to be IPLed now very “broken”<br />

• For emergency recovery capabilities, it is recommended that every<br />

installation have a small isolated “Get-Well” system to help in situations of<br />

finger checks or corrupted shared system data sets. IBM Hot Topics #7<br />

• IBM does not tell you how to build one of these<br />

• IBM does not supply by default a starter system although one can be<br />

ordered with ServerPac and other offerings<br />

• I used my one pack system to remove the exit and restarted<br />

• I moved the start for LLA and VLF to my automation package from<br />

IEACMD00 so an error in CSVLLIX1 would not be so painful<br />

• Where do you get a Resurrection System


TO LIVE AND DIE IN LLA<br />

• Mark Zelden’s ONEPAKnn & TWOPAKnn documentation<br />

and jobs<br />

• Good examples of building a system from scratch.<br />

Many sites use this for local recovery or as part of<br />

a DR process<br />

• Download at Mark’s web site or CBT file #434<br />

http://home.flash.net/~mzelden/mvsutil.html or<br />

http://www.cbttape.org<br />

• ZZSA standalone environment (freeware) another<br />

alternative http://www.cbttape.org/~jjaeger/<br />

• Serverpac "Full System Replacement“ packs can be saved<br />

and used as a rescue system<br />

• Commercial products like SAE from New Era


TO LIVE AND DIE IN LLA<br />

• Recovery preparation from less dire errors is also useful.<br />

Other tools you may want to have in place<br />

• LOGON PROC with no datasets to reach TSO READY<br />

quickly when trouble strikes ($RESCUE)<br />

• LOGON PROC with only IBM vanilla ISPF ($IBMISPF)<br />

• TSO to LOGON under MSTR without JES using Ed<br />

Jaffe’s CBT Tape File 377<br />

• FTP can be used in a pinch to update PDS members, data<br />

sets, submit jobs, and view output<br />

• RPF by Ron Prins CBT Tape File 415 & 417 RPF/E


Stumbling Over VASTLST


Stumbling over VASTLST<br />

• A month ago we performed scheduled rolling IPLs<br />

• All four sysplex members were reIPLed at z/OS 1.9<br />

• At all times at least one member was running<br />

• Three members live on a z10, one on a z9<br />

• The last one (on z9) would not IPL<br />

• Got WAIT064-09 on several tries<br />

• Which means: program check during NIP<br />

• We had made no z/OS changes, not even PTFs<br />

• All members share sysres, PARMLIB, etc.<br />

• Standalone dump sent to IBM for Sev 1 PMR<br />

• Two hours later another IPL was attempted<br />

• This one worked even though we had changed nothing


Stumbling over VASTLST<br />

• Meanwhile I looked at standalone dump myself<br />

• MTRACE showed these final messages:<br />

IEE252I MEMBER VATLST01 FOUND IN SYS1.PARMLIB<br />

IEA168I VATLST01: VATLST DEFAULT USE ATTRIBUTE<br />

OF PRIVATE USED.<br />

IEA168I VATLST01: SYSTEM DEFAULT USE ATTRIBUTE<br />

OF PRIVATE USED.<br />

IEE252I MEMBER VATLST00 FOUND IN SYS1.PARMLIB<br />

• 01: VATDEF IPLUSE(PRIVATE),SYSUSE(PRIVATE)<br />

• Next expected message did not appear:<br />

*IEE252I MEMBER ALLOC00 FOUND IN SYS1.PARMLIB


Stumbling over VASTLST<br />

• Long ago in a galaxy far away, you defined all (or most or<br />

many) of your DASD volumes in VATLSTxx<br />

• VATLSTxx was named in IEASYSxx at IPL<br />

• Volser, device type, mount status if not defaulted<br />

• Eventually DFSMS made individual definitions moot<br />

• All volumes can now be mounted ’PRIVATE’<br />

• SMS decides how volumes are allocated<br />

• But we continued for decades to define all volumes<br />

• We put lots of useful DASD mgmt info in each record<br />

• RESB01,1,2,3390 N SYSDA 780E B MOD 9 2107...<br />

• Over the years VATLST00 got bigger and bigger


Stumbling over VASTLST<br />

• We had just combined VATLST00s into a single one<br />

• Multiple lists from systems in two data centers<br />

• All DASD accessible via DWDM, so why not combine<br />

• VATLST00 grew suddenly from 6K entries to 17.5K!<br />

• OA23645 (R9 FIN): WLM whacks nonresponsive mem<br />

• System took too long to process the giant VATLST00<br />

• Digesting all those entries on a z9 hit sysplex timeout<br />

• After 180 secs, system was partitioned out<br />

• A few hours later, sysplex was quieter, less noise<br />

• NIP managed to squeak through before timeout hit<br />

• Major lesson: we don’t need VATLST00 at all for IPL!<br />

• VATLST01 provides correct default for all volumes


It Takes Two to Make z Ten Go


It Takes Two to Make z Ten Go<br />

• The z10 processor is a mighty beast<br />

• It can slay dragons and munch down wolverines<br />

• Many an LPAR is hardly more than a bite sized morsel<br />

• So why assign more than one logical CPU<br />

• Doesn’t the MP effect cost than more than it’s worth<br />

• What about the old rules of logical CPU sums<br />

• Ah, beware the lure of the bargain bin uniprocessor<br />

• We booked economy seats in some lower profile LPARs<br />

• Ouch!<br />

• Several times in the past year we took sysplex hits<br />

• One member would hang: whole sysplex languished<br />

• Couldn’t logon to TSO or even SMCS console


It Takes Two to Make z Ten Go<br />

• Example of slow death of System A1<br />

• Many sequences of these two messages:<br />

IXC467I RESTARTING PATHOUT STRUCTURE<br />

IXC_CF#2_SMALL LIST 8<br />

USED TO COMMUNICATE WITH SYSTEM A1<br />

RSN: I/O APPARENTLY STALLED<br />

DIAG073: 08200208 001247CA 001247C8 0000000E<br />

001247BD<br />

IXC466I OUTBOUND SIGNAL CONNECTIVITY<br />

ESTABLISHED WITH SYSTEM A1<br />

VIA STRUCTURE IXC_CF#2_SMALL LIST 8<br />

• Problem is that XCF on A1 cannot do its work<br />

• Preempted by a looping high priority task (various)


It Takes Two to Make z Ten Go<br />

• Because A1 is not actually dead, SFM is flummoxed<br />

• Whether to partition out the stalled member<br />

• Meanwhile the whole sysplex grinds to a halt<br />

• Nothing dies but nothing really works<br />

• Even GRS cannot function, entire sysplex goes slo-mo<br />

• One workaround is a second logical CPU<br />

• While one is tied up, the other can/might continue<br />

• Not a guarantee, but it least a fighting chance to<br />

• Kill the looping task<br />

• V XCF OFF the stalled member<br />

• Either action requires entering OS command(s)<br />

• A comatose sysplex is the ultimate performance<br />

degradation


Acknowledgements Knowing and Unknowing<br />

• James Chan, IBM Global Services, Poughkeepsie<br />

• Peter Hunkeler, IBM Switzerland<br />

• Peter Relson, IBM Poughkeepsie

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!