Saturday, March 31, 2007

DB2 Dynamic Cache Switch ON

We keep getting problems because of DB2 V7 to V8 Upgrade...Last week we had a problem wherein our DB2 Admin turned of the Dynamic Cache Switch ON and it caused some problems in our time critical application load jobs...

Last week on Sunday, 4 to 5 of our SELECT and Load jobs were down and there were around 88 jobs that were waiting in ESP to run after these abended jobs to complete the entire schedule by Monday 7:00 AM...

We worked closely with the DB2 Admin group and asked them if there were any change in DB2 in this weekend, since these abended jobs were running smoothly all these days...For this, the DB2 group replied back stating that the DYNAMIC CACHE switch on DB2 in production was turned ON, which made us believe that this could be the problem behind the jobs down...

The job was down with the following error:

---------------------------------------------------------------------------------
Error From SYSOUT:
----------------------
READY ,,
DSN SYSTEM (DSN)
DSN
RUN PROGRAM (DSNTIAUL) PLAN (DSNTIAUL) PARMS ('SQL'
DSN ENDED DUE TO ERROR+

SYSTEM ABEND CODE 0C4 REASON CODE 00000038 PSW 077C20019744FCC6 ASIDS:
HOME,= 004E PRIMARY = 00E1 SECONDARY = 00E1

---------------------------------------------------------------------------------
Sample Job:
------------

//ABNDSTEP EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSTSOUT DD SYSOUT=*
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
DSN SYSTEM (DSN)
RUN PROGRAM (DSNTIAUL) +
PLAN (DSNTIAUL) +
PARMS ('SQL') +
LIBRARY ('SYS3.RUNLIB.LOAD')
/*
//SYSREC00 DD DSN=,DISP=(,CATLG,CATLG),
//UNIT=DASD,SPACE=(CYL,(5,5),RLSE),
//DCB=(RECFM=FB,LRECL=562)
//SYSREC01 DD DSN==,DISP=(,CATLG,CATLG),
//UNIT=DASD,SPACE=(CYL,(5,5),RLSE),
//DCB=(RECFM=FB,LRECL=562)
...
...
...
//SYSREC11 DD DSN==,DISP=(,CATLG,CATLG),
//UNIT=DASD,SPACE=(CYL,(5,5),RLSE),
//DCB=(RECFM=FB,LRECL=562)
//SYSPUNCH DD DUMMY
//*
//****Please note that I have shown one of the query of the 11
//****queries. Each query is the same as the one shown below
//****with just difference in the predicates and they would be
//****unloaded to above 11 SYSRECs in the JCLs...All of them
//****are present in the same control card and each query is
//****separated by a semi colon (;)...The actual query is thus
//****4500 lines...
//*
//SYSIN DD *

SELECT
FISC_YR
,FISC_WK
,CAST ('ABT' AS CHAR(03))
,CAST (9999 AS SMALLINT) AS COL1
,CAST (0 AS SMALLINT) AS COL2
,CAST (COL_A AS SMALLINT) AS COL3
,CAST (9999 AS SMALLINT) AS COL4
...
...

,CAST (COALESCE( STDDEV(DURATION_HRS) / 24 ,0) AS DECIMAL(5,1)) AS ST_DEV
,CAST (0 AS DECIMAL(4,1)) AS TOTAL_90
,CAST (0 AS DECIMAL(4,1)) AS TOTAL_98

FROM
(
SELECT FISC_YR,FISC_WK,COL_A,COL_B,COL_C FROM
A
INNER JOIN
B
ON A.COL1 = B.COL1

WHERE A.TYPE = 'D'
AND B.COL_5 = 'ST'

UNION ALL

SELECT FISC_YR,FISC_WK,COL_A,COL_B,COL_C FROM
A
INNER JOIN
B
ON A.COL1 = B.COL1

WHERE
A.TYPE = 'D'
AND B.COL_5 = 'ST'
AND A.STATE = 'TX'

) X

GROUP BY
FISC_YR
, FISC_WK
, X.COL_A
, X.COL_B
, COL3

UNION

SELECT
FISC_YR
,FISC_WK
,CAST ('ABT' AS CHAR(03))
,CAST (9999 AS SMALLINT) AS COL1
,CAST (0 AS SMALLINT) AS COL2
,CAST (COL_A AS SMALLINT) AS COL3
,CAST (9999 AS SMALLINT) AS COL4
...
...
,CAST (COALESCE( STDDEV(DURATION_HRS) / 24 ,0) AS DECIMAL(5,1)) AS ST_DEV
,CAST (0 AS DECIMAL(4,1)) AS TOTAL_90
,CAST (0 AS DECIMAL(4,1)) AS TOTAL_98
FROM
(
SELECT FISC_YR,FISC_WK,COL_A,COL_B,COL_C FROM
A
INNER JOIN
B
ON A.COL1 = B.COL1

WHERE A.TYPE = 'D'
AND B.COL_5 = 'ST'

UNION ALL

SELECT
FISC_YR
,FISC_WK
,COL_A
,COL_B
,COL_C

FROM
A

INNER JOIN
B

ON A.COL1 = B.COL1

WHERE
A.TYPE = 'D'
AND B.COL_5 = 'ST'
AND A.STATE = 'TX'

) X

GROUP BY
FISC_YR
, FISC_WK
, COL4
, X.COL_A
, X.COL_B
, COL3 ;

/*
//*
//*--------------------------------------------------------------------------*
//* ABNDSTP1 STEP WILL ABEND IF ABOVE STEP HAD CODE CODE > 4 *
//*--------------------------------------------------------------------------*
// IF ABNDSTEP.RC GT 4 THEN
//ABNDSTP1 EXEC PGM=ABENDER,PARM=020
// ENDIF
//*
---------------------------------------------------------------------------------
The problem in the above query was the STDDEV function which was trying to refer a column in the inner query...Without the STDDEV the job ran fine...But we couldnt jeopardize on the results by removing the STDDEV function and replacing it with some other function....

Hence we requested DYNAMIC CACHING to be turned OFF in production to the DB2 group...They did and we went ahead and restarted the jobs...Everything ran fine after that...We also completed the entire schedule within 6:00 AM Monday well within the SLA time of Monday 7 AM...


Further the DB2 group has opened up a ticket with IBM for a patch that could be applied to make sure this query works out when the DYNAMIC CACHE is turned ON again in production because according to them the DYNAMIC CACHE turn ON would result in significant performance improvement....

Apparently we were able to replicate the Error in Test environment and DB2 group is taking the abend dumps from the test and sending it to IBM...In the following days, we would try to apply the patch first in Test and run our jobs...If they run fine over in Test with the DYNAMIC CACHE switch Turned ON, then they would apply the patch in production too and turn on the DYNAMIC CACHE switch in production too...

- Suresh Gangadharan

1 comment:

Deepak said...

Good one. You are lucky you identified the problem easily (?)!!