SQL Plan Statistics, Pstack examples

The query in this example deals with two tables: EMP_TEST, the employee table with 1 million rows, and DEPT_TEST, the department table with 100 rows. There is one-to-many relationship between these two tables: each employee belongs to a department. The employees are more or less evenly and randomly distributed between departments, with each department having about 10,000 employees. You can reproduce this test case using the script below.

The query pulls a set of employees with employee IDs in the range from 1 to 1000; therefore, 1000 rows are expected. Also, we need to see department info, so we have to join to the department table. As part of department info, we need to include a column showing how many employees are in the department - this information is not stored in the department table, so the query should derive that on the fly.

The query joins three datasets, EMP_TEST, DEPT_TEST, and result set of a subquery (c). The EMP_TEST table has a unique index (primary key) on empno column and another index on deptno column. The DEPT_TEST table has a unique index (primary key) on deptno column. Both tables have statistics, but there are no column statistics.

Thinking about options to execute the query, there are a few relatively obvious things:

This seems to be a good strategy. Because employee rows are distributed randomly between departments, there is a good chance that 1000 rows will refer to all or almost all departments. If they were all in the same department, or just few of them, we could improve our strategy by running a range scan on the EMP_TEST(deptno) index. But that is not the case, so we have to conclude that the most efficient strategy is the one described above.

So far the SQL optimizer has been following our strategy. Note, that the estimated cardinality 4491 rows for the index is different from the expected 1000 - this can be explained by the absence of column statistics. The cardinality 2495 rows from EMP_TEST seems to be a surprise, given the fact that the EMP_TEST(EMPNO) index is unique. Anyway, even though the estimate of 2495 rows is not perfect, it is not too far from what should be expected during the execution.

The real surprise is how the query optimizer decided to process the subquery. The HASH JOIN (2) operation joins the data set created by the nested loops (3), and another temporary data set created by INDEX FAST FULL SCAN (8) operation of the IX_EMP_DEPTNO index. The IX_EMP_DEPTNO (EMT_TEST(DEPTNO)) index is not unique; there are 1,000,000 entries in this index and on average 10,000 entries per distinct DEPTNO key. Let see what is going to happen when 1000 rows are joined to 1,000,000 rows on the DEPTNO column. Every row in the first data set is joined to a multiple number of entries in the second data set - on average 10,000 - producing a very large internal data set: 1000 rows in (3) x 10,000 entries in (8) = 10,000,000 rows. It doesn't matter what type of join is used, it is what relational algebra dictates. Then finally, the GROUP BY operation (1) is applied on this large internal data set. In our strategy, the GROUP BY was to be executed for the subquery's data set only. The optimizer decided to transform the plan and execute the GROUP BY as the last operation.

Ouch! This is not optimal at all. But the optimizer sees it differently. Look at the cardinality in step (2) - 2495 rows. The optimizer assumes that every row in (3) will be joined to only 1 row in (8). It seems that even without column statistics, the optimizer should be able to better estimate the outcome of the join using statistics on this index, see NUM_ROWS and DISTINCT_KEYS values:

It is out of scope of this article to find out why this happens; it can be a result of missing functionality in the query optimizer. The point is that this plan is reproducible in many versions, including 11g R2; we are going to execute this query and collect the plan statistics to confirm our findings. We are going to use Lab128's built-in support for SQL execution with Plan Statistics enabled. See SQL Plan Statistics for more details. We are also going to use Lab128's Explain Plan window which is capable of combining plan statistics with the traditional presentation of the execution plan.

Up to this point (NESTED LOOPS), the execution was going fast: total elapsed time was .047 second. Number of rows in the internal data set: 1,000. Number of gets: 1,012.

As predicted, bad things started happening on HASH JOIN steps, where every row from the NESTED LOOPS step was joined to EMP_TEST represented by the IX_EMP_DEPTNO index:

Fixing the query.

This example will not be complete without trying to fix the query. The root problem, as we have seen in previous section, was that SQL optimizer merged the subquery and moved GROUP BY to the end of execution. We can try to prevent this transformation, forcing GROUP BY to be a part of subquery. First, let's do the old trick of adding redundant "rownum > 0" condition to the subquery, which worked in many Oracle versions. Second, we will prevent the transformation with an Oracle hint.

Using "rownum > 0". This old trick prevents the subquery from merging with the rest of query.

This run was substantially better; the execution time went down from 60 to 13.4 seconds:

The subquery was executed first, and then the result set was MERGE JOINed to the 1,000-row subset from EMP_TEST. Finally, another MERGE JOIN step joined the result set with the DEPT_TEST table.

You may wonder what these yellow blocks mean. This is how Lab128 highlights steps with substantial (>5%) elapsed time. This helps to identify problematic steps. The "FILTER (rownum > 0)" and "COUNT" steps contributed 67% of elapsed time - these steps were introduced by "rownum > 0" condition. This was surprisingly high. Let's not use this condition and use an Oracle hint instead.

Removing "rownum > 0" actually helped a lot. The execution time improved from 13.4 to 4.6 seconds:

The query has very similar plan, only the overhead caused "rownum > 0" has been eliminated.

Conclusion.

Using Pstack technique for troubleshooting of runaway query.

Once in a while you have a situation when a query has been running for a long time and it seems to never finish. Let's return to our original query and take Pstack snapshots using Lab128's feature described in "Advanced troubleshooting using pstack traces" in Top Processes.

We are going to start the query, wait a bit until the Oracle process executing our query pops up in the Top Processes window, and then select Pstack option in pop-up menu. We will take 20 snapshots; here is the report provided by Lab128:

Host Name:	db12
OS PID:	25684
Database:		Time:	Tue, 10/19/2010 3:20:41 PM, loc

Pstack parsed and aggregated report

In this report function names are converted into Explain Plan operations when possible. Several pstack snapshots have been merged into the tree with the node frequency indicated in () parenthesis. When the call path deviated from other snapshots, a new branch was added to the tree. You should focus on finding the last frequent node having child nodes with significantly reduced frequency. This last node (function) is where the process has spent most of the time. This provides a clue on which step the query was spending time.

0. SELECT FETCH:  (20) 
 1. HASH GROUP BY: Fetch (20) 
  2. HASH JOIN: Fetch (20) 
   3. INDEX UNIQUE SCAN: FetchFastFullScan (20) 
    4. HASH JOIN: InnerProbeHashTable (20) 
     5. HASH JOIN: WalkHashBucket (20) 
      6. HASH GROUP BY: RowP (11) 
       7. qeshLoadRowForGBY (11) 
        8. qeshrGetUHash_Fast (6) 
        8. qeshIHProbeURow_Simple (5) 
         9. HASH GROUP BY: AggregateRecords (2) 
          10. qesaFastMergeNonDist (1) 
          10. qeshrGotoWorkArea (1) 
         9. qeshrPUCompare (3) 
          10. ?? (2) 
      6. kxhrUnpack (7) 
       7. rworupo (6) 
      6. kxhrPUcompare (2)

Pstack raw data

#0  0x00000000024a4faf in rworupo ()
#1  0x0000000003067105 in kxhrUnpack ()
#2  0x0000000002dbd75a in qerhjWalkHashBucket ()
#3  0x0000000002dbd061 in qerhjInnerProbeHashTable ()
#4  0x0000000002e300e1 in qerixFetchFastFullScan ()
#5  0x0000000002e469f4 in rwsfcd ()
#6  0x0000000002dc0bef in qerhjFetch ()
#7  0x0000000002edae3b in qerghFetch ()
#8  0x0000000003003325 in opifch2 ()
#9  0x000000000317201a in kpoal8 ()
#10 0x0000000001362d84 in opiodr ()
#11 0x0000000003b4ee7e in ttcpip ()
#12 0x000000000135e454 in opitsk ()
#13 0x0000000001361374 in opiino ()
#14 0x0000000001362d84 in opiodr ()
#15 0x00000000013547a3 in opidrv ()
#16 0x0000000001f06032 in sou2o ()
#17 0x000000000071437b in opimai_real ()
#18 0x00000000007142cc in main ()


#0  0x00000000024a5059 in rworupo ()
#1  0x0000000003067105 in kxhrUnpack ()
#2  0x0000000002dbd75a in qerhjWalkHashBucket ()
#3  0x0000000002dbd061 in qerhjInnerProbeHashTable ()
#4  0x0000000002e300e1 in qerixFetchFastFullScan ()
#5  0x0000000002e469f4 in rwsfcd ()
#6  0x0000000002dc0bef in qerhjFetch ()
#7  0x0000000002edae3b in qerghFetch ()
#8  0x0000000003003325 in opifch2 ()
#9  0x000000000317201a in kpoal8 ()
#10 0x0000000001362d84 in opiodr ()
#11 0x0000000003b4ee7e in ttcpip ()
#12 0x000000000135e454 in opitsk ()
#13 0x0000000001361374 in opiino ()
#14 0x0000000001362d84 in opiodr ()
#15 0x00000000013547a3 in opidrv ()
#16 0x0000000001f06032 in sou2o ()
#17 0x000000000071437b in opimai_real ()
#18 0x00000000007142cc in main ()
...

The instructions in the report help to interpret the report. We can conclude that the process was spending most of the time in functions called by the "HASH JOIN" and "HASH GROUP BY" steps.

The script to recreate tables used in this article.

This test was conducted on 10g (10.2.0.4) and 11g (11.2.0.1) databases with default SQL Optimizer parameters:

	Value		Value		Value
LAST_ANALYZED	18.10.2010 3:18:16 AM	AVG_LEAF_BLOCKS_PER_KEY	19	MBYTES	16
UNIQUENESS	NONUNIQUE	AVG_DATA_BLOCKS_PER_KEY	4,460	BYTES / ROW	16.8
NUM_ROWS	1,000,000	CLUSTERING_FACTOR	446,028	ROWS / LEAF_BLOCK	512
LEAF_BLOCKS	1,954	SAMPLE_SIZE	1,000,000	EXTENTS	31
DISTINCT_KEYS	100	BLEVEL	2	BLOCKS / EXTENT	66.1