You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Bit Bucket X’27X<br />
27’<br />
Ed Jaffe, edjaffe@phoenixsoftware.com<br />
Brian Peterson, brian_d_peterson@bluecrossmn.com<br />
Sam Knutson, SKnutson@geico.com<br />
Skip Robinson, robinsjo@sce.com<br />
SHARE 114<br />
Session 2208<br />
Seattle, WA<br />
18 March 2006
I’m JES Not That Into You
JES is a Bottleneck for z/OS Networking<br />
• Many installations force their z/OS networking to depend<br />
upon the presence of JES.<br />
• This is both unnecessary and annoying. It is an area in<br />
which I have been openly critical of IBM. z/OS needs to<br />
evolve away from being a hodgepodge of bolt-on<br />
components.<br />
• On other platforms, networking is just one of many things<br />
“baked” into the operating system.<br />
• z/OS components for consoles, serialization, workload<br />
management, recording, catalog, recovery, etc. are “baked”<br />
in. Why not networking (Answer: Probably because these<br />
components were developed outside Poughkeepsie. ☺ )<br />
• Fortunately, this is one of the easiest things to change!
“Problem” z/OS Startup Dependency Chart<br />
MASTER<br />
WLM GRS CONSOLE JES<br />
Others…<br />
VTAM<br />
TCP/IP<br />
TCAS
Some Obvious Drawbacks<br />
• Additional dependencies are never good.<br />
• JES often uses the networking components for NJE and<br />
must wait until VTAM and/or TCP/IP are available. (This<br />
results in a “chicken & egg” type of issue.)<br />
• A problem with JES startup means no network and no<br />
TSO/E!<br />
• This leaves you unable to easily fix what is most likely a<br />
trivial problem.<br />
• Of course, if you have shared DASD with another system, you can<br />
fix some things that way and re-IPL.<br />
• For JES2 only, you cannot get JES2 to come down cleanly<br />
until all address spaces started under JES2 have shut<br />
down. This means you must wait for VTAM and TCP/IP to<br />
come down before you can proceed with shutting down<br />
JES2.
The Solution Bypass JES. Run Under MSTR.<br />
• VTAM can run under MSTR. If it does, it might be ready<br />
before JES needs it for networking.<br />
• TCPIP can run under MSTR. If it does, it might be ready<br />
before JES needs it for networking.<br />
• TCAS can run under MSTR. If it does, you can LOGON to<br />
TSO/E and use ISPF even if JES is down!<br />
• The secret to running under MSTR is to remove the use of<br />
SYSOUT DDs in the JCL procedures.<br />
• Note: If there ever were restrictions about data sets in<br />
user catalogs, they were lifted long ago. You can<br />
reference data sets cataloged in user catalogs from JCL<br />
submitted with <strong>SUB=MSTR</strong>. (I verified this just a couple<br />
of hours ago! ☺ )
Running VTAM Under MSTR<br />
//VTAM PROC<br />
//VTAM EXEC PGM=ISTINM01,REGION=0M,<br />
// DPRTY=(15,15),TIME=1440,PERFORM=8<br />
//VTAMLST DD DISP=SHR,DSN=SYS1.VTAMLST<br />
//VTAMLIB DD DISP=SHR,DSN=SYS1.VTAMLIB<br />
//SISTCLIB DD DISP=SHR,DSN=SYS1.SISTCLIB<br />
//SYSABEND DD SYSOUT=*,HOLD=YES<br />
//DSDBCTRL DD DSN=SYS1.DSDBCTRL,DISP=SHR<br />
//DSDB1 DD DSN=SYS1.DSDB1,DISP=SHR<br />
//DSDB2 DD DSN=SYS1.DSDB2,DISP=SHR<br />
//TRSDB DD DSN=SYS1.TRSDB,DISP=SHR<br />
I found this VTAM procedure on a system delivered<br />
with IBM’s ADCD. Who in this world is going to bother<br />
looking at a SYSABEND if VTAM fails Use IPCS to<br />
look at the SVC dump. Just remove the statement in<br />
red!
Running TCPIP Under MSTR (Before)<br />
//TCPIP PROC PARMS='CTRACE(CTIEZB00)'<br />
//TCPIP EXEC PGM=EZBTCPIP,REGION=0M,TIME=1440,<br />
// PARM='&PARMS'<br />
//SYSPRINT DD SYSOUT=H,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />
//ALGPRINT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />
//CFGPRINT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />
//SYSOUT DD SYSOUT=H,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />
//CEEDUMP DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />
//SYSERROR DD SYSOUT=*<br />
//*SYSTCPD DD DSN=TCPIP.SEZAINST(TCPDATA),DISP=SHR<br />
//PROFILE DD DISP=SHR,DSN=SYS1.TCPPARMS(PROF&SYSCLONE)<br />
I found this TCPIP procedure on a system delivered with<br />
IBM’s ADCD. There are six SYSOUT DD statements.<br />
They must be converted to use data sets.
Running TCPIP Under MSTR (After)<br />
//TCPIP PROC PARMS='CTRACE(CTIEZB00)'<br />
//TCPIP EXEC PGM=EZBTCPIP,REGION=0M,TIME=1440,<br />
// PARM='&PARMS'<br />
//SYSPRINT DD DSN=SYS2.TCPIP.&SYSNAME..SYSPRINT,<br />
// DISP=SHR,FREE=CLOSE<br />
//ALGPRINT DD DSN=SYS2.TCPIP.&SYSNAME..ALGPRINT,<br />
// DISP=SHR,FREE=CLOSE<br />
//SYSOUT DD DSN=SYS2.TCPIP.&SYSNAME..SYSOUT,<br />
// DISP=SHR,FREE=CLOSE<br />
//CEEDUMP DD DSN=SYS2.TCPIP.&SYSNAME..CEEDUMP,<br />
// DISP=SHR,FREE=CLOSE<br />
//SYSERROR DD DSN=SYS2.TCPIP.&SYSNAME..SYSERROR,<br />
// DISP=SHR,FREE=CLOSE<br />
//CFGPRINT DD DSN=SYS2.TCPIP.&SYSNAME..CFGPRINT,<br />
// DISP=SHR,FREE=CLOSE<br />
//*SYSTCPD DD DSN=TCPIP.SEZAINST(TCPDATA),DISP=SHR<br />
//PROFILE DD DISP=SHR,DSN=SYS1.TCPPARMS(PROF&SYSCLONE)
Running TN3270 Under MSTR (Before)<br />
//TN3270 PROC PARMS='CTRACE(CTIEZBTN)'<br />
//TN3270 EXEC PGM=EZBTNINI,REGION=0M,PARM='&PARMS'<br />
//SYSPRINT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />
//SYSOUT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />
//CEEDUMP DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)<br />
//*TNDBCSCN DD DISP=SHR,DSN=TCPIP.SEZAINST(TNDBCSCN)<br />
//*TNDBCSXL DD DISP=SHR,DSN=TCPIP.SEZAXLD2<br />
//*TNDBCSER DD SYSOUT=*<br />
//PROFILE DD DSN=ADCD.Z111.TCPPARMS(TN3270),DISP=SHR<br />
//*SYSTCPD DD ...<br />
I found this TN3270 procedure on a system delivered<br />
with IBM’s ADCD. There are three SYSOUT DD<br />
statements. They must be converted to use data sets.
Running TN3270 Under MSTR (After)<br />
//TN3270 PROC PARMS='CTRACE(CTIEZBTN)'<br />
//TN3270 EXEC PGM=EZBTNINI,REGION=0M,PARM='&PARMS'<br />
//SYSPRINT DD DSN=SYS2.TN3270.&SYSNAME..SYSPRINT,<br />
// DISP=SHR,FREE=CLOSE<br />
//SYSOUT DD DSN=SYS2.TN3270.&SYSNAME..SYSOUT,<br />
// DISP=SHR,FREE=CLOSE<br />
//CEEDUMP DD DSN=SYS2.TN3270.&SYSNAME..CEEDUMP,<br />
// DISP=SHR,FREE=CLOSE<br />
//*TNDBCSCN DD DISP=SHR,DSN=TCPIP.SEZAINST(TNDBCSCN)<br />
//*TNDBCSXL DD DISP=SHR,DSN=TCPIP.SEZAXLD2<br />
//*TNDBCSER DD SYSOUT=*<br />
//PROFILE DD DSN=ADCD.Z111.TCPPARMS(TN3270),DISP=SHR<br />
//*SYSTCPD DD ...<br />
Similar changes can be made for other TCP/IP server<br />
address spaces if needed.
Running TCAS Under MSTR<br />
//TSO PROC MBR=TSOKEY00<br />
//STEP1 EXEC PGM=IKTCAS00,TIME=1440<br />
//PARMLIB DD DSN=ADCD.Z111.PARMLIB(&MBR),<br />
// DISP=SHR,FREE=CLOSE<br />
//PRINTOUT DD SYSOUT=*,FREE=CLOSE<br />
//*<br />
I found the above TCAS procedure on a system delivered<br />
with IBM’s ADCD. The PRINTOUT DD statements must<br />
be converted to use a RECFM=FA, LRECL=133 data set.<br />
//TSO PROC MBR=TSOKEY00<br />
//STEP1 EXEC PGM=IKTCAS00,TIME=1440<br />
//PARMLIB DD DSN=ADCD.Z111.PARMLIB(&MBR),<br />
// DISP=SHR,FREE=CLOSE<br />
//PRINTOUT DD DSN=SYS2.TCAS.&SYSNAME..PRINTOUT,<br />
// DISP=SHR,FREE=CLOSE<br />
//*
Updated Start Commands For These STCs<br />
• The main thing you have to do is put <strong>SUB=MSTR</strong> on the<br />
start command. Otherwise, it will continue to start under<br />
JES.<br />
• If you want to avoid clutter on your log, you can optionally<br />
add MSGLEVEL=(0,0) to the command.<br />
START VTAM,,,(LIST=&SYSCLONE.),<strong>SUB=MSTR</strong><br />
START TCPIP,<strong>SUB=MSTR</strong><br />
START TCAS,MBR=TSOKEY&TSOKEY,<strong>SUB=MSTR</strong><br />
Or...<br />
START VTAM,,,(LIST=&SYSCLONE.),<strong>SUB=MSTR</strong>,MSGLEVEL=(0,0)<br />
START TCPIP,<strong>SUB=MSTR</strong>,MSGLEVEL=(0,0)<br />
START TCAS,MBR=TSOKEY&TSOKEY,<strong>SUB=MSTR</strong>,MSGLEVEL=(0,0)
Revised z/OS Startup Dependency Chart<br />
MASTER<br />
WLM GRS CONSOLE JES VTAM TCPIP TCAS Others…
TSO/E LOGON To Other Than Primary Subsystem<br />
• By default, TSO/E LOGON always goes to the primary<br />
subsystem—even if TCAS itself is running with<br />
<strong>SUB=MSTR</strong>.<br />
• A LOGON pre-prompt exit (IKJEFLD1) allows users to<br />
specify under which subsystem they wish to LOGON.<br />
• There are several similar exits “floating” around. They’ve<br />
been used by clever sysprogs for decades to allow LOGON<br />
to JES2 running as a secondary subsystem (aka Poly-JES).<br />
• I inherited one from the folks at IBM Global Services<br />
back in the 1990s before everything was outsourced to<br />
Brazil.<br />
• [Aside: Those POK sysprogs were among the best I’ve ever known!]<br />
• The exit is available from<br />
http://cbttape.org/ftp/cbt/CBT377.zip
TSO/E LOGON To Other Than Primary Subsystem<br />
• This pre-prompt exit is activated only when the userid<br />
passed as response to the LOGON prompt is prefixed with<br />
“”.<br />
• You are prompted to enter your desired subsystem name.<br />
• After that the LOGON proceeds as normal.
Tricks For Successful LOGON Under MSTR (Part 1)<br />
• For a LOGON under MSTR to work, the TSO/E LOGON<br />
processor must specify JSTCB=YES on the ATTACH<br />
macro. Otherwise, you will experience abend 0B5 (unable<br />
to attach converter) resulting in a couple of SVC dumps<br />
per attempt.<br />
• The following ZAP changes the ATTACH as needed:<br />
NAME IKJEFLA1 IKJEFLB<br />
VER 0924 0000009D,C9D2D1C5C6D3C340<br />
REP 0924 0000009F,C9D2D1C5C6D3C340<br />
• This ZAP has been working without change in offsets since<br />
day one. If the ATTACH macro parameter list ever moves,<br />
a rework of the ZAP should be trivial.<br />
• I most recently installed this ZAP under z/OS 1.12 and all<br />
works as expected.
Tricks For Successful LOGON Under MSTR (Part 2)<br />
• Another annoyance if you LOGON under MSTR is the<br />
appearance of IKJ56457I PROGRAM ERROR and SVC<br />
dump at logoff time.<br />
• This is due to one of the most common occurrences on a<br />
z/OS system—the 33E abend. I’m sure you see these all<br />
the time on your log:<br />
IEA989I SLIP TRAP ID=X33E MATCHED. JOBNAME=DFHSM60 , ASID=0089.<br />
• Because of the TCB structure when logging on under<br />
MSTR, the 33E is seen by TSO/E code and results in the<br />
error message and SVC dump.<br />
• A simple IKJEFLD2 exit (also available in the same CBT<br />
tape package) converts the return code in the JOB<br />
SCHEDULING EXIT LIST from x’24’ (meaning x’33E’<br />
abend occurred) to x’00’ to avoid the message.
What Works And What Doesn’t Under MSTR<br />
• Everything that doesn’t require JES works.<br />
• You have full ISPF editing and dialog support, catalog,<br />
access methods, everything you could possibly want—even<br />
OMVS and Java.<br />
• If you have the right SPOOL Browse software, you might<br />
be able to use it to access information from your primary<br />
or secondary JES even while logged on under MSTR.<br />
• You cannot submit jobs or print anything from your<br />
TSO/E session. (That would require JES.)
Trying To Submit While Logged On Under MSTR
Decrypting z/OS Unix<br />
Crypticisms
BPXMTEXT Gives Quick and Useful z/OS UNIX Help<br />
• BPXMTEXT displays the description and action text for a reason<br />
code returned from the z/OS UNIX kernel, errnojr values<br />
returned from the C/C++ run-time library, and TCP/IP errno values.<br />
• For zFS reason codes (EFxxnnnn), the xx part of the reason code is<br />
not used to display the module name. (It always displays zFS.)<br />
Therefore, you can use EF00nnnn for zFS reason codes.<br />
• BPXMTEXT internally invokes another REXX called /bin/edcmtext<br />
• EDCMTEXT:<br />
• Validates the reason code parameter<br />
• Calls module EDCEJR via LINKMVS<br />
• Parses and displays the output in a useful format<br />
• BPXMTEXT accepts a single argument: the cryptic reason code. If<br />
you pass no parameters, it tells you what it wants.<br />
• Recommendation: Be sure SYS1.BPXEXEC is on your default<br />
SYSEXEC or SYSPROC concatenation under TSO/E!
BPXMTEXT Gives Quick and Useful z/OS UNIX Help
Does Anybody Really Know<br />
What Time It Is
Sysplex Time<br />
• In the beginning (of Sysplex), there was a device called<br />
“Sysplex Timer”<br />
• Dedicated appliance<br />
• In our experience, rock solid<br />
• Provided the time, always correct<br />
• Became too old, replaced by Server Time Protocol (STP)<br />
−<br />
Lots of great reasons and advantages
STP in a Single Box<br />
• STP originally designed to coordinate time amongst<br />
multiple boxes<br />
• At the same time as STP was developed, CPU boxes<br />
became bigger and bigger<br />
• We implemented STP at the same time as we went from<br />
three boxes (z800/z900s) to one box (z10)
STP and POR<br />
• STP basically “runs” on the System Assist Processor<br />
(SAP) engines<br />
• SAP engines don't run during POR<br />
• What happens during/after POR
STP History<br />
• When STP first came out, STP configurations were lost<br />
during POR<br />
• Because of this, the sysprog had to redefine the<br />
configurations<br />
• STP enhanced to save the configurations across POR
The Sad Story<br />
• We did a planned POR - successful<br />
• After the POR, we performed an Activate of the machine<br />
-successful<br />
• After the Activate, we IPLed each LPAR – successful<br />
• After 2.5 hours, we noticed that STP was 630 seconds<br />
FAST - OOPS!<br />
• A PMH has been opened with IBM. Presumably the<br />
problem will be understood soon.
Lessons Learned<br />
• We don't know why this happened<br />
• Single box STP timing networks seem to have been an<br />
afterthought over the years<br />
• After every POR or ACTIVATE, check the STP time and<br />
make sure it's correct<br />
• ESPECIALLY (maybe only) in a single box STP timing<br />
network<br />
• If you encounter anything in your hardware that looks<br />
wrong, open a PMH with IBM as soon as possible.<br />
• You can always close it later if it turns out to be user error.
TO LIVE AND DIE IN LLA
TO LIVE AND DIE IN LLA<br />
• S LLA,<strong>SUB=MSTR</strong>,REUSASID=YES,LLA=00 is the “correct” way to<br />
start LLA at our shop<br />
• ASID reuse was introduced in z/OS 1.9 and we enable in DIAGxx on<br />
some LPARs i.e. REUSASID(YES)<br />
• When a reusable ASID is requested by the START command or the<br />
ASCRE macro, this reusable ASID is assigned if REUSASID(YES)<br />
is specified in DIAGxx. If REUSASID(NO) is specified in DIAGxx,<br />
an ordinary ASID is assigned. The default is REUSASID(NO). The<br />
use of reusable ASIDs might result in system 0D3 abends, if<br />
products or programs have not been upgraded to tolerate reusable<br />
ASIDs. For more information about reusable ASIDs, see z/OS<br />
MVS Programming: Extended Addressability Guide.<br />
Omit the LLA=00 and you find that you only manage link list<br />
• Omit <strong>SUB=MSTR</strong> and from z/OS 1.9 LLA will restart itself and tell<br />
you to include it in the future i.e. CSV209I LIBRARY LOOKASIDE<br />
START WILL BE RETRIED, ADDING "<strong>SUB=MSTR</strong>" WHICH IS<br />
REQUIRED ON THE START LLA COMMAND
TO LIVE AND DIE IN LLA<br />
/* CSVLLA00 FOR TESTPLX */<br />
/*******************************/<br />
/* LLA EXITS GEICO USES CSVLLIX1 TO SUPPORT IBM MODULE FETCH MONITOR */<br />
/* AND OUR OWN LOCAL CODE WHICH MONITORS */<br />
/*******************************/<br />
EXIT1(ON)<br />
EXIT2(OFF)<br />
/******************************/<br />
/* LINK LIST */<br />
/******************************/<br />
LIBRARIES(-LNKLST-)<br />
FREEZE(-LNKLST-)<br />
/******************************/<br />
/* OTHER LIBRARIES WE MANAGE FOR PERFORMANCE */<br />
/******************************/<br />
LIBRARIES(SYS2.IMS.RESLIB,PROD2.IMS1.LOAD)<br />
FREEZE(SYS2.IMS.RESLIB,PROD2.IMS1.LOAD)
TO LIVE AND DIE IN LLA<br />
• CSVLLA00 is not used by default and is not shipped by<br />
IBM<br />
• We normally start LLA in SYS1.PARMLIB(IEACMD00) and<br />
never stop it but if it is stopped and restarted an error<br />
can be made<br />
• One simple change you can do to prevent this is update<br />
the PROC to default to LLA=00<br />
• We manage many libraries beyond –LNKLST- with FREEZE<br />
for performance and need to insure that this is not<br />
discontinued unintentionally
TO LIVE AND DIE IN LLA<br />
• SYS1.IBM.PROCLIB(LLA)<br />
//LLA PROC LLA=<br />
//LLA EXEC PGM=CSVLLCRE,REGION=0M,PARM='LLA=&LLA'<br />
• SYS1.PROCLIB(LLA) modified<br />
//LLA PROC LLA=00<br />
//LLA EXEC PGM=CSVLLCRE,REGION=0M,PARM='LLA=&LLA'<br />
• REGION=0M was added in z/OS 1.9 resolved a common<br />
problem previously discussed by IBM at SHARE of potentially<br />
running out of below the line storage in LLA address space
TO LIVE AND DIE IN LLA<br />
• SMF can only be used to research programs run by PGM= in JCL<br />
• Data Set Audit Facility (DAF) freeware is one tool that can be used<br />
to report on PGM= use from SMF without writing your own SAS<br />
reports or program.<br />
• DAF reads standard IBM SMF records and generates detailed<br />
dataset audit trail reports based upon user supplied selection criteria<br />
• http://sites.google.com/site/michaeljosephcleary<br />
• IBM free tool Module Fetch Monitor (MFM) can be used to<br />
understand program fetch activity<br />
• MFM has a program that executes and collects data. The data can be<br />
viewed from an ISPF dialog application. In addition, there is a batch<br />
interface.<br />
• It is real time data collection. When the program is stopped, the data<br />
is lost. There is no historical data collection interface. The data can<br />
be written to two logs files you could archive them.
TO LIVE AND DIE IN LLA<br />
• MFM is a non-warranty program supported as time permits by<br />
IBM but is used by some very large sites with some good<br />
success stories.<br />
• If you just want to do some spot checking or investigate some<br />
things and have a good systems programmer to work with then<br />
MFM and native IBM SMF data may fill your needs without<br />
spending money for a tool like SoftAudit nee Tivoli License<br />
Compliance Manager for z/OS<br />
• Session 2876 Module Fetch Monitor (MFM) User Experience by<br />
Greg Thompson at SHARE 96 in Long Beach, CA February, 2001<br />
is a good introduction<br />
• Appendix B Optimizing use of LLA and VLF Redbook System z<br />
Mean Time to Recovery Best Practices<br />
• You can get a copy of MFM by sending an e-mail to Peter Relson<br />
relson@us.ibm.com at IBM and signing an agreement<br />
• MFM provides the names of modules used within an MVS system<br />
when called through the MVS Contents Supervisor, and<br />
integrates the information coming from LLA, but only if the<br />
library is under LLA control
TO LIVE AND DIE IN LLA<br />
• Undocumented but very useful D LLA,STATS command is also handy<br />
to help determine if a library is being accessed<br />
LIBRARY: SYSOP.TSO.DATAUTIL.LINKLIB<br />
MEMBERS: 30<br />
MEMBERS FETCHED: 12 MEMBERS IN VLF: 0<br />
DASD FETCHES: 296 VLF RETRIEVES: 808<br />
• This "undocumented" command is documented in the Redbook<br />
Partitioned Data Set Extended Usage Guide (SG24-6106)<br />
• The command is a cheap way to get an indication of usage if you don’t<br />
need any history just a point in time answer<br />
• Detail data is also available from D LLA,STATS using the LIBRARY<br />
and MEMBER keywords<br />
• D LLA,STATS,LIBRARY=yourdsn,MEMBER=member,FETCHED
TO LIVE AND DIE IN LLA<br />
• D LLA,STATS,LIBRARY=mydsn,MEMBER=COPY,FETCHED<br />
CSV630I 12.00.12 LLA STATS DISPLAY 954<br />
LIBRARY: SYSOP.TSO.DATAUTIL.LINKLIB<br />
MEMBERS: 1 MEMBERS FETCHED: 1 MEMBERS IN VLF: 0<br />
TOTAL DASD FETCHES: 47 TOTAL VLF RETRIEVES: 200<br />
MEMBER: COPY<br />
DASD FETCHES: 47 VLF RETRIEVES: 200<br />
AVERAGE: 123.0465 LLA VALUE: 270300<br />
• Wildcards are accepted!<br />
• D LLA,STATS,LIBRARY=mydsn,MEMBER=*,FETCHED
TO LIVE AND DIE IN LLA<br />
• A CSVLLIX1 exit is included as a data gathering point<br />
• I sometimes add in a few lines of code to issue a WTO message<br />
when a particular module or library is accessed placing<br />
additional libraries under LLA control if needed.<br />
USING CSVLLIX1,R8 Establish R8 as code register.<br />
USING LLP1,R1 Addressability to LLP1.<br />
CLC LLP1PDS2(8),=C'CACEMCKI' CA-OPTIMIZER stub<br />
BNE NOT_IT Not the module we care about<br />
LR R9,R1 save that across WTO<br />
WTO 'CACEMCKI CA-OPTIMIZER USED BY THIS JOB', X<br />
ROUTCDE=(11)<br />
LR R1,R9 restore LLP1<br />
NOT_IT EQU *
TO LIVE AND DIE IN LLA<br />
• Fetch activity occurs throughout z/OS and errors in CSVLLIX1 can<br />
lead to an outage<br />
• A loop from an incorrectly repeated and not updated branch caused<br />
an outage on the systems programmer sandbox where it was tested<br />
NOT_APL2 EQU *<br />
CLC LLP1DSN(25),=C'SYS3.TECHASST.DCF.LINKLIB'<br />
BNE NOT_DCF Not the library we care about<br />
WTO 'GEI$DCF TECHASST.DCF.LINKLIB USED BY THIS JOB ',<br />
ROUTCDE=(11),MCSFLAG=(HRDCPY)<br />
B NOT_IT Done<br />
NOT_DCF EQU *<br />
CLC LLP1DSN(25),=C'SYS3.TECHASST.DLF.LINKLIB'<br />
BNE NOT_DCF<br />
Not the library we care about
TO LIVE AND DIE IN LLA<br />
• A bad CSVLLIX1 exit in the link list<br />
• START for LLA in PARMLIB(COMMNDxx) so LLA start cannot be easily<br />
bypassed<br />
• Normal systems to be IPLed now very “broken”<br />
• For emergency recovery capabilities, it is recommended that every<br />
installation have a small isolated “Get-Well” system to help in situations of<br />
finger checks or corrupted shared system data sets. IBM Hot Topics #7<br />
• IBM does not tell you how to build one of these<br />
• IBM does not supply by default a starter system although one can be<br />
ordered with ServerPac and other offerings<br />
• I used my one pack system to remove the exit and restarted<br />
• I moved the start for LLA and VLF to my automation package from<br />
IEACMD00 so an error in CSVLLIX1 would not be so painful<br />
• Where do you get a Resurrection System
TO LIVE AND DIE IN LLA<br />
• Mark Zelden’s ONEPAKnn & TWOPAKnn documentation<br />
and jobs<br />
• Good examples of building a system from scratch.<br />
Many sites use this for local recovery or as part of<br />
a DR process<br />
• Download at Mark’s web site or CBT file #434<br />
http://home.flash.net/~mzelden/mvsutil.html or<br />
http://www.cbttape.org<br />
• ZZSA standalone environment (freeware) another<br />
alternative http://www.cbttape.org/~jjaeger/<br />
• Serverpac "Full System Replacement“ packs can be saved<br />
and used as a rescue system<br />
• Commercial products like SAE from New Era
TO LIVE AND DIE IN LLA<br />
• Recovery preparation from less dire errors is also useful.<br />
Other tools you may want to have in place<br />
• LOGON PROC with no datasets to reach TSO READY<br />
quickly when trouble strikes ($RESCUE)<br />
• LOGON PROC with only IBM vanilla ISPF ($IBMISPF)<br />
• TSO to LOGON under MSTR without JES using Ed<br />
Jaffe’s CBT Tape File 377<br />
• FTP can be used in a pinch to update PDS members, data<br />
sets, submit jobs, and view output<br />
• RPF by Ron Prins CBT Tape File 415 & 417 RPF/E
Stumbling Over VASTLST
Stumbling over VASTLST<br />
• A month ago we performed scheduled rolling IPLs<br />
• All four sysplex members were reIPLed at z/OS 1.9<br />
• At all times at least one member was running<br />
• Three members live on a z10, one on a z9<br />
• The last one (on z9) would not IPL<br />
• Got WAIT064-09 on several tries<br />
• Which means: program check during NIP<br />
• We had made no z/OS changes, not even PTFs<br />
• All members share sysres, PARMLIB, etc.<br />
• Standalone dump sent to IBM for Sev 1 PMR<br />
• Two hours later another IPL was attempted<br />
• This one worked even though we had changed nothing
Stumbling over VASTLST<br />
• Meanwhile I looked at standalone dump myself<br />
• MTRACE showed these final messages:<br />
IEE252I MEMBER VATLST01 FOUND IN SYS1.PARMLIB<br />
IEA168I VATLST01: VATLST DEFAULT USE ATTRIBUTE<br />
OF PRIVATE USED.<br />
IEA168I VATLST01: SYSTEM DEFAULT USE ATTRIBUTE<br />
OF PRIVATE USED.<br />
IEE252I MEMBER VATLST00 FOUND IN SYS1.PARMLIB<br />
• 01: VATDEF IPLUSE(PRIVATE),SYSUSE(PRIVATE)<br />
• Next expected message did not appear:<br />
*IEE252I MEMBER ALLOC00 FOUND IN SYS1.PARMLIB
Stumbling over VASTLST<br />
• Long ago in a galaxy far away, you defined all (or most or<br />
many) of your DASD volumes in VATLSTxx<br />
• VATLSTxx was named in IEASYSxx at IPL<br />
• Volser, device type, mount status if not defaulted<br />
• Eventually DFSMS made individual definitions moot<br />
• All volumes can now be mounted ’PRIVATE’<br />
• SMS decides how volumes are allocated<br />
• But we continued for decades to define all volumes<br />
• We put lots of useful DASD mgmt info in each record<br />
• RESB01,1,2,3390 N SYSDA 780E B MOD 9 2107...<br />
• Over the years VATLST00 got bigger and bigger
Stumbling over VASTLST<br />
• We had just combined VATLST00s into a single one<br />
• Multiple lists from systems in two data centers<br />
• All DASD accessible via DWDM, so why not combine<br />
• VATLST00 grew suddenly from 6K entries to 17.5K!<br />
• OA23645 (R9 FIN): WLM whacks nonresponsive mem<br />
• System took too long to process the giant VATLST00<br />
• Digesting all those entries on a z9 hit sysplex timeout<br />
• After 180 secs, system was partitioned out<br />
• A few hours later, sysplex was quieter, less noise<br />
• NIP managed to squeak through before timeout hit<br />
• Major lesson: we don’t need VATLST00 at all for IPL!<br />
• VATLST01 provides correct default for all volumes
It Takes Two to Make z Ten Go
It Takes Two to Make z Ten Go<br />
• The z10 processor is a mighty beast<br />
• It can slay dragons and munch down wolverines<br />
• Many an LPAR is hardly more than a bite sized morsel<br />
• So why assign more than one logical CPU<br />
• Doesn’t the MP effect cost than more than it’s worth<br />
• What about the old rules of logical CPU sums<br />
• Ah, beware the lure of the bargain bin uniprocessor<br />
• We booked economy seats in some lower profile LPARs<br />
• Ouch!<br />
• Several times in the past year we took sysplex hits<br />
• One member would hang: whole sysplex languished<br />
• Couldn’t logon to TSO or even SMCS console
It Takes Two to Make z Ten Go<br />
• Example of slow death of System A1<br />
• Many sequences of these two messages:<br />
IXC467I RESTARTING PATHOUT STRUCTURE<br />
IXC_CF#2_SMALL LIST 8<br />
USED TO COMMUNICATE WITH SYSTEM A1<br />
RSN: I/O APPARENTLY STALLED<br />
DIAG073: 08200208 001247CA 001247C8 0000000E<br />
001247BD<br />
IXC466I OUTBOUND SIGNAL CONNECTIVITY<br />
ESTABLISHED WITH SYSTEM A1<br />
VIA STRUCTURE IXC_CF#2_SMALL LIST 8<br />
• Problem is that XCF on A1 cannot do its work<br />
• Preempted by a looping high priority task (various)
It Takes Two to Make z Ten Go<br />
• Because A1 is not actually dead, SFM is flummoxed<br />
• Whether to partition out the stalled member<br />
• Meanwhile the whole sysplex grinds to a halt<br />
• Nothing dies but nothing really works<br />
• Even GRS cannot function, entire sysplex goes slo-mo<br />
• One workaround is a second logical CPU<br />
• While one is tied up, the other can/might continue<br />
• Not a guarantee, but it least a fighting chance to<br />
• Kill the looping task<br />
• V XCF OFF the stalled member<br />
• Either action requires entering OS command(s)<br />
• A comatose sysplex is the ultimate performance<br />
degradation
Acknowledgements Knowing and Unknowing<br />
• James Chan, IBM Global Services, Poughkeepsie<br />
• Peter Hunkeler, IBM Switzerland<br />
• Peter Relson, IBM Poughkeepsie