Alternative "Pipelined" Presentation of the Query Execution Plan

Lab128 - Tools for Advanced Oracler Tuning and Monitoring.

Reference Guide.

Alternative "Pipelined" Presentation of the Query Execution Plan.

We all are used to the EXPLAIN PLAN command and the output shown in a tree-like fashion:

select e.EMPNO, e.ENAME, m.ENAME as manager, DNAME 
from dept d, emp e, emp m 
where e.DEPTNO=d.DEPTNO and e.MGR=m.EMPNO and DNAME='SALES'

------------------------------------------------- 
| Id  | Operation                     | Name    |
-------------------------------------------------
|   0 | SELECT STATEMENT              |         |
|   1 |  NESTED LOOPS                 |         |
|   2 |   NESTED LOOPS                |         |
|*  3 |    TABLE ACCESS FULL          | EMP     |
|*  4 |    TABLE ACCESS BY INDEX ROWID| DEPT    |
|*  5 |     INDEX UNIQUE SCAN         | PK_DEPT |
|   6 |   TABLE ACCESS BY INDEX ROWID | EMP     |
|*  7 |    INDEX UNIQUE SCAN          | PK_EMP  |
-------------------------------------------------

In this query, the action starts with a full scan of the EMP table (id=3). Each row then joins to the DEPT table on DEPTNO column, but first it traverses the PK_DEPT index (id=5) using the DEPTNO value from the EMP row, finds the ROWID, and accesses DEPT (id=4). The DEPT row is checked to see if the condition DNAME='SALES' is satisfied (id=4). If not, the control goes back to the EMP table, resuming FULL scan operation and processing next row. If the condition was 'true', the execution proceeds further to join the EMP table. The value of the MGR column of the row from the full scan is used to traverse the PK_EMP index (id=7), and then obtained ROWID is used to access the row in EMP (id=6). Well, now we have all the values for one output row which could be placed in the output buffer. Potentially, the row can be consumed or pipelined to the next processing step. Meanwhile, our execution process returns to the full scan (id=3) and repeats the above steps until all rows in EMP are processed.

We already mentioned that the output rows become available before the entire execution is over, enabling data flow pipelining and eliminating an intermediate storage buffer. In fact, inside the execution process we have already seen pipeline processing: the rows resulting from a full scan were consumed by the join operation between the EMP and DEPT tables, and the rows processed on that step were consumed by the next join operation.

Don't you think it will be more natural to present these pipelined operations in a more linear fashion?

Let's take a look at simpler query:

select e.ENAME, d.DNAME from DEPT d, EMP e
where e.DEPTNO=d.DEPTNO and e.EMPNO=7654;

------------------------------------------------
| Id  | Operation                    | Name    |
------------------------------------------------
|   0 | SELECT STATEMENT             |         |
|   1 |  NESTED LOOPS                |         |
|   2 |   TABLE ACCESS BY INDEX ROWID| EMP     |
|*  3 |    INDEX UNIQUE SCAN         | PK_EMP  |
|   4 |   TABLE ACCESS BY INDEX ROWID| DEPT    |
|*  5 |    INDEX UNIQUE SCAN         | PK_DEPT |
------------------------------------------------

In this query, the row accessed in EMP using the index is pipelined to the join operation with the DEPT table. The execution plan could be shown as following:

PK_EMP (INDEX UNIQUE SCAN) -> EMP (TABLE ACCESS BY INDEX ROWID) -> PK_DEPT (INDEX UNIQUE SCAN) -> DEPT (TABLE ACCESS BY INDEX ROWID)

This can be represented with this picture. Note that the starting operation is in the lower right corner. The NL PIPELINE line was added for clarity purposes only to distinguish the boundaries of each operation.

The transformation applied to the original plan can be explained in this diagram:

This transformation can be recursively applied to any number of NESTED LOOP joins. Let's return to the original query and see how it is going to look:

Can all operations during execution be pipelined? Certainly not. For example, a simple ORDER BY added to this statement will cause all output rows to be buffered, potentially spilling data into a temporary tablespace, and then sorted before making them ready to be used down the pipeline. There are exceptions to this, but in general ORDER BY is not pipelined. The same goes for GROUP BY; consider this example:

select g.LOC, g.CNT, e.ENAME as MANAGER_NAME 
from 
 (
   select LOC, MGR, COUNT(*) as CNT from EMP e, DEPT d 
   where e.DEPTNO=d.DEPTNO group by LOC, MGR having COUNT(*)>3
 ) g, EMP e
where g.MGR=e.EMPNO;

In this query, we are looking for managers with more than 3 employees per location. The inner subquery includes GROUP BY and HAVING clauses. This forces Oracle to collect all rows needed to perform GROUP BY first before proceeding with the join to the EMP (e) table.

This situation can be denoted by an arrow of a different color (HASH GROUP BY step) as shown in this picture:

So far, we have been talking about NESTED LOOP joins. Are other types of joins implemented with pipelining? Let's take the HASH join. During the first phase, an internal structure is created in-memory (or spilled out to a temporary tablespace, if big enough) for the first dataset. This structure has each row easily located by its "address" - a hash value calculated by a hash function applied on values of columns used in the join. The rows for this phase can be fed from the pipeline. The hash join cannot proceed before this structure is created, therefore effectively stopping the pipeline process. Once this phase is completed, rows from the second dataset are "probed" against the hash structure using the same hash function. This operation can be pipelined; therefore, HASH join can be a half-blocking and half-pipelined operation. This semi-pipelined operation can be seen in this picture:

select --+use_hash(d,e)
e.EMPNO, e.ENAME, m.ENAME as manager, DNAME
from dept d, emp e, emp m
where e.DEPTNO=d.DEPTNO and e.MGR=m.EMPNO and DNAME='SALES'

Is this alternative presentation of a plan useful?

Well, this is really experimental stuff. Try it out and let us know. It especially helps in certain cases; one example is a query with many UNIONs. Having nested loops unfolded makes it very easy to find the table where execution starts for each UNIONed subquery. This alternative style of presenting a plan can be appealing to beginners. Experienced users may find unfolded pipelines too long to follow. The best option is toggling between the two styles and taking the best from each of them.

This new style has been pioneered and implemented in Lab128. You can toggle between traditional and alternative presentation using "Pipeline" check box in the Explain Plan window.