SUB=MSTR

Bit Bucket X’27’ 

Ed Jaffe, edjaffe@phoenixsoftware.com 

Brian Peterson, brian_d_peterson@bluecrossmn.com 

Sam Knutson, SKnutson@geico.com 

Skip Robinson, robinsjo@sce.com 

SHARE 114 

Session 2208 

Seattle, WA 

18 March 2006

I’m JES Not That Into You

JES is a Bottleneck for z/OS Networking 

• Many installations force their z/OS networking to depend 

upon the presence of JES. 

• This is both unnecessary and annoying. It is an area in 

which I have been openly critical of IBM. z/OS needs to 

evolve away from being a hodgepodge of bolt-on 

components. 

• On other platforms, networking is just one of many things 

“baked” into the operating system. 

• z/OS components for consoles, serialization, workload 

management, recording, catalog, recovery, etc. are “baked” 

in. Why not networking? (Answer: Probably because these 

components were developed outside Poughkeepsie. ☺ ) 

• Fortunately, this is one of the easiest things to change!

“Problem” z/OS Startup Dependency Chart 

MASTER 

  WLM   GRS   CONSOLE   JES   Others… 

  Under JES: VTAM, TCP/IP, TCAS

Some Obvious Drawbacks 

• Additional dependencies are never good. 

• JES often uses the networking components for NJE and 

must wait until VTAM and/or TCP/IP are available. (This 

results in a “chicken & egg” type of issue.) 

• A problem with JES startup means no network and no 

TSO/E! 

• This leaves you unable to easily fix what is most likely a 

trivial problem. 

• Of course, if you have shared DASD with another system, you can 

fix some things that way and re-IPL. 

• For JES2 only, you cannot get JES2 to come down cleanly 

until all address spaces started under JES2 have shut 

down. This means you must wait for VTAM and TCP/IP to 

come down before you can proceed with shutting down 

JES2.

The Solution? Bypass JES. Run Under MSTR. 

• VTAM can run under MSTR. If it does, it might be ready 

before JES needs it for networking. 

• TCPIP can run under MSTR. If it does, it might be ready 

before JES needs it for networking. 

• TCAS can run under MSTR. If it does, you can LOGON to 

TSO/E and use ISPF even if JES is down! 

• The secret to running under MSTR is to remove the use of 

SYSOUT DDs in the JCL procedures. 

• Note: If there ever were restrictions about data sets in 

user catalogs, they were lifted long ago. You can 

reference data sets cataloged in user catalogs from JCL 

submitted with SUB=MSTR. (I verified this just a couple 

of hours ago! ☺ )

Running VTAM Under MSTR 

//VTAM PROC 

//VTAM EXEC PGM=ISTINM01,REGION=0M, 

// DPRTY=(15,15),TIME=1440,PERFORM=8 

//VTAMLST DD DISP=SHR,DSN=SYS1.VTAMLST 

//VTAMLIB DD DISP=SHR,DSN=SYS1.VTAMLIB 

//SISTCLIB DD DISP=SHR,DSN=SYS1.SISTCLIB 

//SYSABEND DD SYSOUT=*,HOLD=YES 

//DSDBCTRL DD DSN=SYS1.DSDBCTRL,DISP=SHR 

//DSDB1 DD DSN=SYS1.DSDB1,DISP=SHR 

//DSDB2 DD DSN=SYS1.DSDB2,DISP=SHR 

//TRSDB DD DSN=SYS1.TRSDB,DISP=SHR 

I found this VTAM procedure on a system delivered 

with IBM’s ADCD. Who in this world is going to bother 

looking at a SYSABEND if VTAM fails? Use IPCS to 

look at the SVC dump. Just remove the SYSABEND DD 

statement (shown in red on the slide)!

Running TCPIP Under MSTR (Before) 

//TCPIP PROC PARMS='CTRACE(CTIEZB00)' 

//TCPIP EXEC PGM=EZBTCPIP,REGION=0M,TIME=1440, 

// PARM='&PARMS' 

//SYSPRINT DD SYSOUT=H,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136) 

//ALGPRINT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136) 

//CFGPRINT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136) 

//SYSOUT DD SYSOUT=H,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136) 

//CEEDUMP DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136) 

//SYSERROR DD SYSOUT=* 

//*SYSTCPD DD DSN=TCPIP.SEZAINST(TCPDATA),DISP=SHR 

//PROFILE DD DISP=SHR,DSN=SYS1.TCPPARMS(PROF&SYSCLONE) 

I found this TCPIP procedure on a system delivered with 

IBM’s ADCD. There are six SYSOUT DD statements. 

They must be converted to use data sets.

Running TCPIP Under MSTR (After) 

//TCPIP PROC PARMS='CTRACE(CTIEZB00)' 

//TCPIP EXEC PGM=EZBTCPIP,REGION=0M,TIME=1440, 

// PARM='&PARMS' 

//SYSPRINT DD DSN=SYS2.TCPIP.&SYSNAME..SYSPRINT, 

// DISP=SHR,FREE=CLOSE 

//ALGPRINT DD DSN=SYS2.TCPIP.&SYSNAME..ALGPRINT, 

// DISP=SHR,FREE=CLOSE 

//SYSOUT DD DSN=SYS2.TCPIP.&SYSNAME..SYSOUT, 

// DISP=SHR,FREE=CLOSE 

//CEEDUMP DD DSN=SYS2.TCPIP.&SYSNAME..CEEDUMP, 

// DISP=SHR,FREE=CLOSE 

//SYSERROR DD DSN=SYS2.TCPIP.&SYSNAME..SYSERROR, 

// DISP=SHR,FREE=CLOSE 

//CFGPRINT DD DSN=SYS2.TCPIP.&SYSNAME..CFGPRINT, 

// DISP=SHR,FREE=CLOSE 

//*SYSTCPD DD DSN=TCPIP.SEZAINST(TCPDATA),DISP=SHR 

//PROFILE DD DISP=SHR,DSN=SYS1.TCPPARMS(PROF&SYSCLONE)
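
The data sets named on the converted DD statements have to exist before TCPIP is started with SUB=MSTR. A minimal sketch of a one-time allocation job follows. The job card, the system name SYSA (standing in for &SYSNAME), UNIT and SPACE are all assumptions for illustration; the DCB attributes are simply copied from the original SYSOUT DDs.

```jcl
//ALLOCTCP JOB (ACCT),'ALLOC TCPIP DS',CLASS=A,MSGCLASS=X
//* Hypothetical one-time job, run while JES is available.
//* SYSA stands in for the value of &SYSNAME on the target system.
//* UNIT and SPACE are placeholder assumptions.
//ALLOC    EXEC PGM=IEFBR14
//SYSPRINT DD DSN=SYS2.TCPIP.SYSA.SYSPRINT,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSALLDA,SPACE=(CYL,(5,5)),
//            DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)
//ALGPRINT DD DSN=SYS2.TCPIP.SYSA.ALGPRINT,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSALLDA,SPACE=(CYL,(5,5)),
//            DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)
//CFGPRINT DD DSN=SYS2.TCPIP.SYSA.CFGPRINT,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSALLDA,SPACE=(CYL,(5,5)),
//            DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)
//SYSOUT   DD DSN=SYS2.TCPIP.SYSA.SYSOUT,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSALLDA,SPACE=(CYL,(5,5)),
//            DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)
//CEEDUMP  DD DSN=SYS2.TCPIP.SYSA.CEEDUMP,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSALLDA,SPACE=(CYL,(5,5)),
//            DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)
//* SYSERROR had no DCB on the original DD; the same attributes
//* are assumed here.
//SYSERROR DD DSN=SYS2.TCPIP.SYSA.SYSERROR,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSALLDA,SPACE=(CYL,(5,5)),
//            DCB=(RECFM=VB,LRECL=132,BLKSIZE=136)
```

Because the procedure refers to these data sets with a disposition of SHR rather than MOD, expect each start of the address space to overwrite the previous contents; copy anything you want to keep before restarting.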

Running TN3270 Under MSTR (Before) 

//TN3270 PROC PARMS='CTRACE(CTIEZBTN)' 

//TN3270 EXEC PGM=EZBTNINI,REGION=0M,PARM='&PARMS' 

//SYSPRINT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136) 

//SYSOUT DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136) 

//CEEDUMP DD SYSOUT=*,DCB=(RECFM=VB,LRECL=132,BLKSIZE=136) 

//*TNDBCSCN DD DISP=SHR,DSN=TCPIP.SEZAINST(TNDBCSCN) 

//*TNDBCSXL DD DISP=SHR,DSN=TCPIP.SEZAXLD2 

//*TNDBCSER DD SYSOUT=* 

//PROFILE DD DSN=ADCD.Z111.TCPPARMS(TN3270),DISP=SHR 

//*SYSTCPD DD ... 

I found this TN3270 procedure on a system delivered 

with IBM’s ADCD. There are three SYSOUT DD 

statements. They must be converted to use data sets.

Running TN3270 Under MSTR (After) 

//TN3270 PROC PARMS='CTRACE(CTIEZBTN)' 

//TN3270 EXEC PGM=EZBTNINI,REGION=0M,PARM='&PARMS' 

//SYSPRINT DD DSN=SYS2.TN3270.&SYSNAME..SYSPRINT, 

// DISP=SHR,FREE=CLOSE 

//SYSOUT DD DSN=SYS2.TN3270.&SYSNAME..SYSOUT, 

// DISP=SHR,FREE=CLOSE 

//CEEDUMP DD DSN=SYS2.TN3270.&SYSNAME..CEEDUMP, 

// DISP=SHR,FREE=CLOSE 

//*TNDBCSCN DD DISP=SHR,DSN=TCPIP.SEZAINST(TNDBCSCN) 

//*TNDBCSXL DD DISP=SHR,DSN=TCPIP.SEZAXLD2 

//*TNDBCSER DD SYSOUT=* 

//PROFILE DD DSN=ADCD.Z111.TCPPARMS(TN3270),DISP=SHR 

//*SYSTCPD DD ... 

Similar changes can be made for other TCP/IP server 

address spaces if needed.

Running TCAS Under MSTR 

//TSO PROC MBR=TSOKEY00 

//STEP1 EXEC PGM=IKTCAS00,TIME=1440 

//PARMLIB DD DSN=ADCD.Z111.PARMLIB(&MBR), 

// DISP=SHR,FREE=CLOSE 

//PRINTOUT DD SYSOUT=*,FREE=CLOSE 

//* 

I found the above TCAS procedure on a system delivered 

with IBM’s ADCD. The PRINTOUT DD statement must 

be converted to use a RECFM=FA, LRECL=133 data set. 

//TSO PROC MBR=TSOKEY00 

//STEP1 EXEC PGM=IKTCAS00,TIME=1440 

//PARMLIB DD DSN=ADCD.Z111.PARMLIB(&MBR), 

// DISP=SHR,FREE=CLOSE 

//PRINTOUT DD DSN=SYS2.TCAS.&SYSNAME..PRINTOUT, 

// DISP=SHR,FREE=CLOSE 

//*

Updated Start Commands For These STCs 

• The main thing you have to do is put SUB=MSTR on the 

start command. Otherwise, it will continue to start under 

JES. 

• If you want to avoid clutter on your log, you can optionally 

add MSGLEVEL=(0,0) to the command. 

START VTAM,,,(LIST=&SYSCLONE.),SUB=MSTR 

START TCPIP,SUB=MSTR 

START TCAS,MBR=TSOKEY&TSOKEY,SUB=MSTR 

Or... 

START VTAM,,,(LIST=&SYSCLONE.),SUB=MSTR,MSGLEVEL=(0,0) 

START TCPIP,SUB=MSTR,MSGLEVEL=(0,0) 

START TCAS,MBR=TSOKEY&TSOKEY,SUB=MSTR,MSGLEVEL=(0,0)

Revised z/OS Startup Dependency Chart 

MASTER 

WLM GRS CONSOLE JES VTAM TCPIP TCAS Others…

TSO/E LOGON To Other Than Primary Subsystem 

• By default, TSO/E LOGON always goes to the primary 

subsystem—even if TCAS itself is running with 

SUB=MSTR. 

• A LOGON pre-prompt exit (IKJEFLD1) allows users to 

specify under which subsystem they wish to LOGON. 

• There are several similar exits “floating” around. They’ve 

been used by clever sysprogs for decades to allow LOGON 

to JES2 running as a secondary subsystem (aka Poly-JES). 

• I inherited one from the folks at IBM Global Services 

back in the 1990s before everything was outsourced to 

Brazil. 

• [Aside: Those POK sysprogs were among the best I’ve ever known!] 

• The exit is available from 

http://cbttape.org/ftp/cbt/CBT377.zip

TSO/E LOGON To Other Than Primary Subsystem 

• This pre-prompt exit is activated only when the userid 

passed as response to the LOGON prompt is prefixed with 

“”. 

• You are prompted to enter your desired subsystem name. 

• After that the LOGON proceeds as normal.

Tricks For Successful LOGON Under MSTR (Part 1) 

• For a LOGON under MSTR to work, the TSO/E LOGON 

processor must specify JSTCB=YES on the ATTACH 

macro. Otherwise, you will experience abend 0B5 (unable 

to attach converter) resulting in a couple of SVC dumps 

per attempt. 

• The following ZAP changes the ATTACH as needed: 

NAME IKJEFLA1 IKJEFLB 

VER 0924 0000009D,C9D2D1C5C6D3C340 

REP 0924 0000009F,C9D2D1C5C6D3C340 

• This ZAP has been working without change in offsets since 

day one. If the ATTACH macro parameter list ever moves, 

a rework of the ZAP should be trivial. 

• I most recently installed this ZAP under z/OS 1.12 and all 

works as expected.
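
For anyone who has not applied a ZAP recently, the control statements above can be fed to AMASPZAP with a job along these lines. This is a sketch, not the author's actual job: the job card is hypothetical, and SYS1.LPALIB is only an assumption about where load module IKJEFLA1 lives on your system.

```jcl
//ZAPLOGON JOB (ACCT),'ZAP IKJEFLA1',CLASS=A,MSGCLASS=X
//* Sketch only. SYS1.LPALIB is an assumption: point SYSLIB at the
//* library that actually contains load module IKJEFLA1 on your
//* system, and try it against a copy of that library first.
//ZAP      EXEC PGM=AMASPZAP
//SYSPRINT DD SYSOUT=*
//SYSLIB   DD DISP=OLD,DSN=SYS1.LPALIB
//SYSIN    DD *
 NAME IKJEFLA1 IKJEFLB
 VER  0924 0000009D,C9D2D1C5C6D3C340
 REP  0924 0000009F,C9D2D1C5C6D3C340
/*
```

If the VER does not match, AMASPZAP does not apply the REP, so a moved parameter list shows up as a failed verify rather than a damaged module.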

Tricks For Successful LOGON Under MSTR (Part 2) 

• Another annoyance if you LOGON under MSTR is the 

appearance of IKJ56457I PROGRAM ERROR and SVC 

dump at logoff time. 

• This is due to one of the most common occurrences on a 

z/OS system—the 33E abend. I’m sure you see these all 

the time on your log: 

IEA989I SLIP TRAP ID=X33E MATCHED. JOBNAME=DFHSM60 , ASID=0089. 

• Because of the TCB structure when logging on under 

MSTR, the 33E is seen by TSO/E code and results in the 

error message and SVC dump. 

• A simple IKJEFLD2 exit (also available in the same CBT 

tape package) converts the return code in the JOB 

SCHEDULING EXIT LIST from x’24’ (meaning x’33E’ 

abend occurred) to x’00’ to avoid the message.

What Works And What Doesn’t Under MSTR 

• Everything that doesn’t require JES works. 

• You have full ISPF editing and dialog support, catalog, 

access methods, everything you could possibly want—even 

OMVS and Java. 

• If you have the right SPOOL Browse software, you might 

be able to use it to access information from your primary 

or secondary JES even while logged on under MSTR. 

• You cannot submit jobs or print anything from your 

TSO/E session. (That would require JES.)

Trying To Submit While Logged On Under MSTR

Decrypting z/OS Unix 

Crypticisms

BPXMTEXT Gives Quick and Useful z/OS UNIX Help 

• BPXMTEXT displays the description and action text for a reason 

code returned from the z/OS UNIX kernel, errnojr values 

returned from the C/C++ run-time library, and TCP/IP errno values. 

• For zFS reason codes (EFxxnnnn), the xx part of the reason code is 

not used to display the module name. (It always displays zFS.) 

Therefore, you can use EF00nnnn for zFS reason codes. 

• BPXMTEXT internally invokes another REXX called /bin/edcmtext 

• EDCMTEXT: 

• Validates the reason code parameter 

• Calls module EDCEJR via LINKMVS 

• Parses and displays the output in a useful format 

• BPXMTEXT accepts a single argument: the cryptic reason code. If 

you pass no parameters, it tells you what it wants. 

• Recommendation: Be sure SYS1.SBPXEXEC is on your default 

SYSEXEC or SYSPROC concatenation under TSO/E!
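
BPXMTEXT is normally typed at TSO/E READY or on the ISPF command shell, followed by the reason code. As a hedged illustration of the SYSEXEC requirement, the same call can be made from batch TSO/E. The job card is hypothetical, the SYSEXEC data set name is an assumption about where BPXMTEXT is installed, and EF00nnnn is a placeholder rather than a real reason code.

```jcl
//BPXMTXT  JOB (ACCT),'REASON CODE',CLASS=A,MSGCLASS=X
//* Sketch: run BPXMTEXT under the batch TSO/E terminal monitor.
//* The SYSEXEC data set name is an assumption; use the library
//* that holds BPXMTEXT at your installation.
//* EF00nnnn is a placeholder, not a real reason code.
//TSOBATCH EXEC PGM=IKJEFT01
//SYSEXEC  DD DISP=SHR,DSN=SYS1.SBPXEXEC
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
 BPXMTEXT EF00nnnn
/*
```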


Does Anybody Really Know 

What Time It Is?

Sysplex Time 

• In the beginning (of Sysplex), there was a device called 

“Sysplex Timer” 

• Dedicated appliance 

• In our experience, rock solid 

• Provided the time, always correct 

• Became too old, replaced by Server Time Protocol (STP) 

• Lots of great reasons and advantages

STP in a Single Box 

• STP originally designed to coordinate time amongst 

multiple boxes 

• At the same time as STP was developed, CPU boxes 

became bigger and bigger 

• We implemented STP at the same time as we went from 

three boxes (z800/z900s) to one box (z10)

STP and POR 

• STP basically “runs” on the System Assist Processor 

(SAP) engines 

• SAP engines don't run during POR 

• What happens during/after POR?

STP History 

• When STP first came out, STP configurations were lost 

during POR 

• Because of this, the sysprog had to redefine the 

configurations 

• STP enhanced to save the configurations across POR

The Sad Story 

• We did a planned POR - successful 

• After the POR, we performed an Activate of the machine 

- successful 

• After the Activate, we IPLed each LPAR – successful 

• After 2.5 hours, we noticed that STP was 630 seconds 

FAST - OOPS! 

• A PMH has been opened with IBM. Presumably the 

problem will be understood soon.

Lessons Learned 

• We don't know why this happened 

• Single box STP timing networks seem to have been an 

afterthought over the years 

• After every POR or ACTIVATE, check the STP time and 

make sure it's correct 

• ESPECIALLY (maybe only) in a single box STP timing 

network 

• If you encounter anything in your hardware that looks 

wrong, open a PMH with IBM as soon as possible. 

• You can always close it later if it turns out to be user error.

TO LIVE AND DIE IN LLA


• S LLA,SUB=MSTR,REUSASID=YES,LLA=00 is the “correct” way to 

start LLA at our shop 

• ASID reuse was introduced in z/OS 1.9; we enable it in DIAGxx on 

some LPARs, i.e. REUSASID(YES) 

• When a reusable ASID is requested by the START command or the 

ASCRE macro, this reusable ASID is assigned if REUSASID(YES) 

is specified in DIAGxx. If REUSASID(NO) is specified in DIAGxx, 

an ordinary ASID is assigned. The default is REUSASID(NO). The 

use of reusable ASIDs might result in system 0D3 abends, if 

products or programs have not been upgraded to tolerate reusable 

ASIDs. For more information about reusable ASIDs, see z/OS 

MVS Programming: Extended Addressability Guide. 

• Omit the LLA=00 and you find that you only manage the link list 

• Omit SUB=MSTR and, from z/OS 1.9 on, LLA will restart itself and tell 

you to include it in the future, i.e. CSV209I LIBRARY LOOKASIDE 

START WILL BE RETRIED, ADDING "SUB=MSTR" WHICH IS 

REQUIRED ON THE START LLA COMMAND


/* CSVLLA00 FOR TESTPLX */ 

/*******************************/ 

/* LLA EXITS GEICO USES CSVLLIX1 TO SUPPORT IBM MODULE FETCH MONITOR */ 

/* AND OUR OWN LOCAL CODE WHICH MONITORS */ 

/*******************************/ 

EXIT1(ON) 

EXIT2(OFF) 

/******************************/ 

/* LINK LIST */ 

/******************************/ 

LIBRARIES(-LNKLST-) 

FREEZE(-LNKLST-) 

/******************************/ 

/* OTHER LIBRARIES WE MANAGE FOR PERFORMANCE */ 

/******************************/ 

LIBRARIES(SYS2.IMS.RESLIB,PROD2.IMS1.LOAD) 

FREEZE(SYS2.IMS.RESLIB,PROD2.IMS1.LOAD)


• CSVLLA00 is not used by default and is not shipped by 

IBM 

• We normally start LLA in SYS1.PARMLIB(IEACMD00) and 

never stop it, but if it is stopped and restarted by hand, an 

error can be made 

• One simple change you can make to prevent this is to update 

the PROC to default to LLA=00 

• We manage many libraries beyond -LNKLST- with FREEZE 

for performance and need to ensure that this is not 

discontinued unintentionally


• SYS1.IBM.PROCLIB(LLA) 

//LLA PROC LLA= 

//LLA EXEC PGM=CSVLLCRE,REGION=0M,PARM='LLA=&LLA' 

• SYS1.PROCLIB(LLA) modified 

//LLA PROC LLA=00 

//LLA EXEC PGM=CSVLLCRE,REGION=0M,PARM='LLA=&LLA' 

• REGION=0M was added in z/OS 1.9. It resolved a common 

problem, previously discussed by IBM at SHARE, of potentially 

running out of below-the-line storage in the LLA address space


• SMF can only be used to research programs run by PGM= in JCL 

• Data Set Audit Facility (DAF) freeware is one tool that can be used 

to report on PGM= use from SMF without writing your own SAS 

reports or program. 

• DAF reads standard IBM SMF records and generates detailed 

dataset audit trail reports based upon user supplied selection criteria 

• http://sites.google.com/site/michaeljosephcleary 

• IBM free tool Module Fetch Monitor (MFM) can be used to 

understand program fetch activity 

• MFM has a program that executes and collects data. The data can be 

viewed from an ISPF dialog application. In addition, there is a batch 

interface. 

• It is real time data collection. When the program is stopped, the data 

is lost. There is no historical data collection interface. The data can 

be written to two log files, which you could archive.
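
For the SMF route, the program name from EXEC PGM= is carried in the type 30 records, so a first step is usually to pull just those records out of an already-dumped SMF data set. A minimal sketch using IFASMFDP follows; the job card, both data set names, UNIT and SPACE are hypothetical.

```jcl
//SMFEXTR  JOB (ACCT),'SMF TYPE 30',CLASS=A,MSGCLASS=X
//* Sketch: copy only SMF type 30 records from an already-dumped
//* SMF data set. Both data set names, UNIT and SPACE are
//* hypothetical placeholders.
//DUMP     EXEC PGM=IFASMFDP
//SYSPRINT DD SYSOUT=*
//INDD1    DD DISP=SHR,DSN=SYSX.SMF.DAILY
//OUTDD1   DD DSN=SYSX.SMF.TYPE30,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSALLDA,SPACE=(CYL,(50,50),RLSE)
//SYSIN    DD *
  INDD(INDD1,OPTIONS(DUMP))
  OUTDD(OUTDD1,TYPE(30))
/*
```

The extract can then be handed to whatever reporting tool or home-grown program you use. This only covers programs named on EXEC PGM=, which is exactly the limitation noted above.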


• MFM is a non-warranty program supported as time permits by 

IBM but is used by some very large sites with some good 

success stories. 

• If you just want to do some spot checking or investigate some 

things and have a good systems programmer to work with then 

MFM and native IBM SMF data may fill your needs without 

spending money for a tool like SoftAudit nee Tivoli License 

Compliance Manager for z/OS 

• Session 2876 Module Fetch Monitor (MFM) User Experience by 

Greg Thompson at SHARE 96 in Long Beach, CA February, 2001 

is a good introduction 

• Appendix B Optimizing use of LLA and VLF Redbook System z 

Mean Time to Recovery Best Practices 

• You can get a copy of MFM by sending an e-mail to Peter Relson 

relson@us.ibm.com at IBM and signing an agreement 

• MFM provides the names of modules used within an MVS system 

when called through the MVS Contents Supervisor, and 

integrates the information coming from LLA, but only if the 

library is under LLA control


• Undocumented but very useful D LLA,STATS command is also handy 

to help determine if a library is being accessed 

LIBRARY: SYSOP.TSO.DATAUTIL.LINKLIB 

MEMBERS: 30 

MEMBERS FETCHED: 12 MEMBERS IN VLF: 0 

DASD FETCHES: 296 VLF RETRIEVES: 808 

• This "undocumented" command is documented in the Redbook 

Partitioned Data Set Extended Usage Guide (SG24-6106) 

• The command is a cheap way to get an indication of usage if you don’t 

need any history, just a point-in-time answer 

• Detail data is also available from D LLA,STATS using the LIBRARY 

and MEMBER keywords 

• D LLA,STATS,LIBRARY=yourdsn,MEMBER=member,FETCHED


• D LLA,STATS,LIBRARY=mydsn,MEMBER=COPY,FETCHED 

CSV630I 12.00.12 LLA STATS DISPLAY 954 

LIBRARY: SYSOP.TSO.DATAUTIL.LINKLIB 

MEMBERS: 1 MEMBERS FETCHED: 1 MEMBERS IN VLF: 0 

TOTAL DASD FETCHES: 47 TOTAL VLF RETRIEVES: 200 

MEMBER: COPY 

DASD FETCHES: 47 VLF RETRIEVES: 200 

AVERAGE: 123.0465 LLA VALUE: 270300 

• Wildcards are accepted! 

• D LLA,STATS,LIBRARY=mydsn,MEMBER=*,FETCHED


• A CSVLLIX1 exit is included as a data gathering point 

• I sometimes add in a few lines of code to issue a WTO message 

when a particular module or library is accessed placing 

additional libraries under LLA control if needed. 

USING CSVLLIX1,R8 Establish R8 as code register. 

USING LLP1,R1 Addressability to LLP1. 

CLC LLP1PDS2(8),=C'CACEMCKI' CA-OPTIMIZER stub 

BNE NOT_IT Not the module we care about 

LR R9,R1 save that across WTO 

WTO 'CACEMCKI CA-OPTIMIZER USED BY THIS JOB', X 

ROUTCDE=(11) 

LR R1,R9 restore LLP1 

NOT_IT EQU *


• Fetch activity occurs throughout z/OS and errors in CSVLLIX1 can 

lead to an outage 

• A loop from an incorrectly repeated and not updated branch caused 

an outage on the systems programmer sandbox where it was tested 

NOT_APL2 EQU * 

CLC LLP1DSN(25),=C'SYS3.TECHASST.DCF.LINKLIB' 

BNE NOT_DCF Not the library we care about 

WTO 'GEI$DCF TECHASST.DCF.LINKLIB USED BY THIS JOB ', 

ROUTCDE=(11),MCSFLAG=(HRDCPY) 

B NOT_IT Done 

NOT_DCF EQU * 

CLC LLP1DSN(25),=C'SYS3.TECHASST.DLF.LINKLIB' 

BNE NOT_DCF Not the library we care about


• A bad CSVLLIX1 exit in the link list 

• START for LLA in PARMLIB(COMMNDxx) so LLA start cannot be easily 

bypassed 

• Normal systems to be IPLed now very “broken” 

• For emergency recovery capabilities, it is recommended that every 

installation have a small isolated “Get-Well” system to help in situations of 

finger checks or corrupted shared system data sets. IBM Hot Topics #7 

• IBM does not tell you how to build one of these 

• IBM does not supply by default a starter system although one can be 

ordered with ServerPac and other offerings 

• I used my one pack system to remove the exit and restarted 

• I moved the start for LLA and VLF to my automation package from 

IEACMD00 so an error in CSVLLIX1 would not be so painful 

• Where do you get a Resurrection System?


• Mark Zelden’s ONEPAKnn & TWOPAKnn documentation 

and jobs 

• Good examples of building a system from scratch. 

Many sites use this for local recovery or as part of 

a DR process 

• Download at Mark’s web site or CBT file #434 

http://home.flash.net/~mzelden/mvsutil.html or 

http://www.cbttape.org 

• ZZSA standalone environment (freeware) another 

alternative http://www.cbttape.org/~jjaeger/ 

• ServerPac "Full System Replacement" packs can be saved 

and used as a rescue system 

• Commercial products like SAE from New Era


• Recovery preparation from less dire errors is also useful. 

Other tools you may want to have in place 

• LOGON PROC with no datasets to reach TSO READY 

quickly when trouble strikes ($RESCUE) 

• LOGON PROC with only IBM vanilla ISPF ($IBMISPF) 

• TSO to LOGON under MSTR without JES using Ed 

Jaffe’s CBT Tape File 377 

• FTP can be used in a pinch to update PDS members, data 

sets, submit jobs, and view output 

• RPF by Rob Prins: CBT Tape Files 415 and 417 (RPF/E)
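
A $RESCUE style LOGON procedure can be almost empty. The sketch below is a hypothetical example, not the procedure used at any of the presenters' sites; the DYNAMNBR value is an arbitrary assumption.

```jcl
//$RESCUE  PROC
//* Hypothetical bare-bones LOGON procedure: no DD statements at
//* all, so no data set allocation can fail before READY appears.
//* DYNAMNBR leaves room to ALLOCATE data sets by hand afterwards;
//* the value 100 is an arbitrary assumption.
//$RESCUE  EXEC PGM=IKJEFT01,DYNAMNBR=100,TIME=1440
```

With nothing pre-allocated you land at READY with plain TSO/E only, which is the point: anything else you need is allocated by hand once you are in.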

Stumbling Over VASTLST


• A month ago we performed scheduled rolling IPLs 

• All four sysplex members were reIPLed at z/OS 1.9 

• At all times at least one member was running 

• Three members live on a z10, one on a z9 

• The last one (on z9) would not IPL 

• Got WAIT064-09 on several tries 

• Which means: program check during NIP 

• We had made no z/OS changes, not even PTFs 

• All members share sysres, PARMLIB, etc. 

• Standalone dump sent to IBM for Sev 1 PMR 

• Two hours later another IPL was attempted 

• This one worked even though we had changed nothing


• Meanwhile I looked at standalone dump myself 

• MTRACE showed these final messages: 

IEE252I MEMBER VATLST01 FOUND IN SYS1.PARMLIB 

IEA168I VATLST01: VATLST DEFAULT USE ATTRIBUTE 

OF PRIVATE USED. 

IEA168I VATLST01: SYSTEM DEFAULT USE ATTRIBUTE 

OF PRIVATE USED. 

IEE252I MEMBER VATLST00 FOUND IN SYS1.PARMLIB 

• 01: VATDEF IPLUSE(PRIVATE),SYSUSE(PRIVATE) 

• Next expected message did not appear: 

*IEE252I MEMBER ALLOC00 FOUND IN SYS1.PARMLIB


• Long ago in a galaxy far away, you defined all (or most or 

many) of your DASD volumes in VATLSTxx 

• VATLSTxx was named in IEASYSxx at IPL 

• Volser, device type, mount status if not defaulted 

• Eventually DFSMS made individual definitions moot 

• All volumes can now be mounted ’PRIVATE’ 

• SMS decides how volumes are allocated 

• But we continued for decades to define all volumes 

• We put lots of useful DASD mgmt info in each record 

• RESB01,1,2,3390 N SYSDA 780E B MOD 9 2107... 

• Over the years VATLST00 got bigger and bigger


• We had just combined VATLST00s into a single one 

• Multiple lists from systems in two data centers 

• All DASD accessible via DWDM, so why not combine? 

• VATLST00 grew suddenly from 6K entries to 17.5K! 

• OA23645 (R9 FIN): WLM whacks nonresponsive mem 

• System took too long to process the giant VATLST00 

• Digesting all those entries on a z9 hit sysplex timeout 

• After 180 secs, system was partitioned out 

• A few hours later, sysplex was quieter, less noise 

• NIP managed to squeak through before timeout hit 

• Major lesson: we don’t need VATLST00 at all for IPL! 

• VATLST01 provides correct default for all volumes

It Takes Two to Make z Ten Go


• The z10 processor is a mighty beast 

• It can slay dragons and munch down wolverines 

• Many an LPAR is hardly more than a bite sized morsel 

• So why assign more than one logical CPU? 

• Doesn’t the MP effect cost more than it’s worth? 

• What about the old rules of logical CPU sums? 

• Ah, beware the lure of the bargain bin uniprocessor 

• We booked economy seats in some lower profile LPARs 

• Ouch! 

• Several times in the past year we took sysplex hits 

• One member would hang: whole sysplex languished 

• Couldn’t logon to TSO or even SMCS console


• Example of slow death of System A1 

• Many sequences of these two messages: 

IXC467I RESTARTING PATHOUT STRUCTURE 

IXC_CF#2_SMALL LIST 8 

USED TO COMMUNICATE WITH SYSTEM A1 

RSN: I/O APPARENTLY STALLED 

DIAG073: 08200208 001247CA 001247C8 0000000E 

001247BD 

IXC466I OUTBOUND SIGNAL CONNECTIVITY 

ESTABLISHED WITH SYSTEM A1 

VIA STRUCTURE IXC_CF#2_SMALL LIST 8 

• Problem is that XCF on A1 cannot do its work 

• Preempted by a looping high priority task (various)


• Because A1 is not actually dead, SFM is flummoxed 

• Whether to partition out the stalled member 

• Meanwhile the whole sysplex grinds to a halt 

• Nothing dies but nothing really works 

• Even GRS cannot function, entire sysplex goes slo-mo 

• One workaround is a second logical CPU 

• While one is tied up, the other can/might continue 

• Not a guarantee, but at least a fighting chance to: 

• Kill the looping task 

• V XCF OFF the stalled member 

• Either action requires entering OS command(s) 

• A comatose sysplex is the ultimate performance 

degradation

Acknowledgements Knowing and Unknowing 

• James Chan, IBM Global Services, Poughkeepsie 

• Peter Hunkeler, IBM Switzerland 

• Peter Relson, IBM Poughkeepsie
