Oracle8i Concepts
Release 8.1.5

A67781-01

Library

Product

Contents

Index

Prev Next

24
Optimization of Joins

Will you, won't you, will you, won't you, will you join the dance?

Lewis Carroll

This chapter discusses how the Oracle optimizer executes SQL statements that contain joins, anti-joins, and semi-joins. It also describes how the optimizer can use bitmap indexes to execute star queries, which join a fact table to multiple dimension tables. This chapter includes:

Optimizing Join Statements

To choose an execution plan for a join statement, the optimizer must make these interrelated decisions:

access paths  

As for simple statements, the optimizer must choose an access path to retrieve data from each table in the join statement. (See "Choosing Access Paths".)  

join operations  

To join each pair of row sources, Oracle must perform one of these operations:

  • nested loops

  • sort-merge

  • cluster

  • hash join (not available with rule-based optimization)

 

join order  

To execute a statement that joins more than two tables, Oracle joins two of the tables, and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result.  

Join Operations

The optimizer can use the following operations to join two row sources:

Nested Loops Join

To perform a nested loops join, Oracle follows these steps:

  1. The optimizer chooses one of the tables as the outer table, or the driving table. The other table is called the inner table.

  2. For each row in the outer table, Oracle finds all rows in the inner table that satisfy the join condition.

  3. Oracle combines the data in each pair of rows that satisfy the join condition and returns the resulting rows.

Figure 24-1 shows the execution plan for this statement using a nested loops join:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno; 

Figure 24-1 Nested Loops Join


To execute this statement, Oracle performs these steps:

Sort-Merge Join

Oracle can only perform a sort-merge join for an equijoin. To perform a sort-merge join, Oracle follows these steps:

  1. Oracle sorts each row source to be joined if they have not been sorted already by a previous operation. The rows are sorted on the values of the columns used in the join condition.

  2. Oracle merges the two sources so that each pair of rows, one from each source, that contain matching values for the columns used in the join condition are combined and returned as the resulting row source.

Figure 24-2 shows the execution plan for this statement using a sort-merge join:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno; 

Figure 24-2 Sort-Merge Join


To execute this statement, Oracle performs these steps:

Cluster Join

Oracle can perform a cluster join only for an equijoin that equates the cluster key columns of two tables in the same cluster. In a cluster, rows from both tables with the same cluster key values are stored in the same blocks, so Oracle only accesses those blocks.

Additional Information:

Oracle8i Tuning provides guidelines for deciding which tables to cluster for best performance.  

Figure 24-3 shows the execution plan for this statement in which the EMP and DEPT tables are stored together in the same cluster:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno; 

Figure 24-3 Cluster Join


To execute this statement, Oracle performs these steps:

A cluster join is nothing more than a nested loops join involving two tables that are stored together in a cluster. Since each row from the DEPT table is stored in the same data blocks as the matching rows in the EMP table, Oracle can access matching rows most efficiently.

Hash Join

Oracle can only perform a hash join for an equijoin. Hash join is not available with rule-based optimization. You must enable hash join optimization, using the initialization parameter HASH_JOIN_ENABLED (which can be set with the ALTER SESSION command) or the USE_HASH hint.

To perform a hash join, Oracle follows these steps:

  1. Oracle performs a full table scan on each of the tables and splits each into as many partitions as possible based on the available memory.

  2. Oracle builds a hash table from one of the partitions (if possible, Oracle will select a partition that fits into available memory). Oracle then uses the corresponding partition in the other table to probe the hash table. All partition pairs that do not fit into memory are placed onto disk.

  3. For each pair of partitions (one from each table), Oracle uses the smaller one to build a hash table and the larger one to probe the hash table.

Figure 24-4 shows the execution plan for this statement using a hash join:

SELECT * 
  FROM emp, dept 
  WHERE emp.deptno = dept.deptno; 

Figure 24-4 Hash Join


To execute this statement, Oracle performs these steps:

The initialization parameter HASH_AREA_SIZE controls the amount of memory used for hash join operations and the initialization parameter HASH_MULTIBLOCK_IO_COUNT controls the number of blocks a hash join operation should read and write concurrently.

Additional Information:

See Oracle8i Tuning for more information about these initialization parameters and the USE_HASH hint.  

Choosing Execution Plans for Join Statements

This section describes how the optimizer chooses an execution plan for a join statement:

Note these considerations that apply to the cost-based and rule-based approaches:

Choosing Execution Plans for Joins with the Cost-Based Approach

With the cost-based approach, the optimizer generates a set of execution plans based on the possible join orders, join operations, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost. The optimizer estimates costs in these ways:

With the cost-based approach, the optimizer's choice of join orders can be overridden with the ORDERED hint. If the ORDERED hint specifies a join order that violates the rule for outer join, the optimizer ignores the hint and chooses the order. You can also override the optimizer's choice of join operations with hints.

Additional Information:

See Oracle8i Tuning for information on using hints.  

Choosing Execution Plans for Joins with the Rule-Based Approach

With the rule-based approach, the optimizer follows these steps to choose an execution plan for a statement that joins R tables:

  1. The optimizer generates a set of R join orders, each with a different table as the first table. The optimizer generates each potential join order using this algorithm:

    1. To fill each position in the join order, the optimizer chooses the table with the most highly ranked available access path according to the ranks for access paths shown in Table 23-1. The optimizer repeats this step to fill each subsequent position in the join order.

    2. For each table in the join order, the optimizer also chooses the operation with which to join the table to the previous table or row source in the order. The optimizer does this by "ranking" the sort-merge operation as access path 12 and applying these rules:

    • If the access path for the chosen table is ranked 11 or better, the optimizer chooses a nested loops operation using the previous table or row source in the join order as the outer table.

    • If the access path for the table is ranked lower than 12, and there is an equijoin condition between the chosen table and the previous table or row source in join order, the optimizer chooses a sort-merge operation.

    • If the access path for the chosen table is ranked lower than 12, and there is not an equijoin condition, the optimizer chooses a nested loops operation with the previous table or row source in the join order as the outer table.

  2. The optimizer then chooses among the resulting set of execution plans. The goal of the optimizer's choice is to maximize the number of nested loops join operations in which the inner table is accessed using an index scan. Since a nested loops join involves accessing the inner table many times, an index on the inner table can greatly improve the performance of a nested loops join.

    Usually, the optimizer does not consider the order in which tables appear in the FROM clause when choosing an execution plan. The optimizer makes this choice by applying the following rules in order:

    1. The optimizer chooses the execution plan with the fewest nested-loops operations in which the inner table is accessed with a full table scan.

    2. If there is a tie, the optimizer chooses the execution plan with the fewest sort-merge operations.

    3. If there is still a tie, the optimizer chooses the execution plan for which the first table in the join order has the most highly ranked access path:

    • If there is a tie among multiple plans whose first tables are accessed by the single-column indexes access path, the optimizer chooses the plan whose first table is accessed with the most merged indexes.

    • If there is a tie among multiple plans whose first tables are accessed by bounded range scans, the optimizer chooses the plan whose first table is accessed with the greatest number of leading columns of the composite index.

    • If there is still a tie, the optimizer chooses the execution plan for which the first table appears later in the query's FROM clause.

Views in Outer Joins

For a view that is on the right side of an outer join, the optimzer can use one of two methods, depending on how many base tables the view accesses:

Merging a View That Has a Single Base Table

A view that has one base table and is on the right side of an outer join can be merged into the query block of an accessing statement. (See "Merging the View's Query into the Statement".) View merging is possible even if an expression in the view can return a non-null value for a NULL.

Example:

Consider the view NAME_VIEW, which concatenates first and last names from the EMP table:

CREATE VIEW name_view 
  AS SELECT emp.firstname || emp.lastname AS emp_fullname, emp.deptno 
       FROM emp; 

and consider this outer join statement, which finds the names of all employees in London and their departments, as well as any departments that have no employees:

SELECT dept.deptno, name_view.emp_fullname 
  FROM emp_fullname, dept 
  WHERE dept.deptno = name_view.deptno(+) 
    AND dept.deptloc = 'London'; 

The optimizer merges the view's query into the outer join statement. The resulting statement looks like this:

SELECT dept.deptno, DECODE(emp.rowid, NULL, NULL, emp.firstname || emp.lastname) 
  FROM emp, dept 
  WHERE dept.deptno = emp.deptno(+) 
    AND dept.deptloc = 'London'; 

The transformed statement selects only the employees who work in London.

Pushing the Join Predicate into a View That Has Multiple Base Tables

For a view with multiple base tables on the right side of an outer join, the optimizer can push the join predicate into the view (see "Pushing the Predicate into the View") if the initialization parameter OPTIMIZER_FEATURES_ENABLE is set to TRUE or the accessing query contains the PUSH_JOIN_PRED hint.

Pushing a join predicate is a cost-based transformation that can enable more efficient access path and join methods, such as transforming hash joins into nested loop joins, and full table scans to index scans.

Additional Information:

See Oracle8i Tuning for information about optimizer hints.  

Example:

Consider the view LONDON_EMP, which selects the employees who work in London:

CREATE VIEW london_emp 
  AS SELECT emp.ename 
       FROM emp, dept 
       WHERE emp.deptno = dept.deptno 
         AND dept.deptloc = 'London'; 

and consider this outer join statement, which finds the engineers and accountants working in London who received bonuses:

SELECT bonus.job, london_emp.ename 
  FROM bonus, london_emp 
  WHERE bonus.job IN ('engineer', 'accountant') 
    AND bonus.ename = london_emp.ename(+); 

The optimizer pushes the outer join predicate into the view. The resulting statement (which does not conform to standard SQL syntax) looks like this:

SELECT bonus.job, london_emp.ename 
  FROM bonus, (SELECT emp.ename FROM emp, dept 
                   WHERE bonus.ename = london_emp.ename(+) 
                     AND emp.deptno = dept.deptno 
                     AND dept.deptloc = 'London') 
  WHERE bonus.job IN ('engineer', 'accountant'); 

Optimizing Anti-Joins and Semi-Joins

An anti-join returns rows from the left side of the predicate for which there is no corresponding row on the right side of the predicate. That is, it returns rows that fail to match (NOT IN) the subquery on the right side. For example, an anti-join can select a list of employees who are not in a particular set of departments:

SELECT * FROM emp 
  WHERE deptno NOT IN 
    (SELECT deptno FROM dept 
      WHERE loc = 'HEADQUARTERS'); 

The optimizer uses a nested loops algorithm for NOT IN subqueries by default, unless the initialization parameter ALWAYS_ANTI_JOIN is set to MERGE or HASH and various required conditions are met that allow the transformation of the NOT IN subquery into a sort-merge or hash anti-join. You can place a MERGE_AJ or HASH_AJ hint in the NOT IN subquery to specify which algorithm the optimizer should use.

A semi-join returns rows that match an EXISTS subquery, without duplicating rows from the left side of the predicate when multiple rows on the right side satisfy the criteria of the subquery. For example:

SELECT * FROM dept 
  WHERE EXISTS 
    (SELECT * FROM emp 
      WHERE dept.ename = emp.ename 
        AND emp.bonus > 5000); 

In this query, only one row needs to be returned from DEPT even though many rows in EMP might match the subquery. If there is no index on the BONUS column in EMP, a semi-join can be used to improve query performance.

The optimizer uses a nested loops algorithm for EXISTS subqueries by default, unless the initialization parameter ALWAYS_SEMI_JOIN is set to MERGE or HASH and various required conditions are met. You can place a MERGE_SJ or HASH_SJ hint in the EXISTS subquery to specify which algorithm the optimizer should use.

Additional Information:

See Oracle8i Tuning for information about optimizer hints.  

Optimizing "Star" Queries

One type of data warehouse design centers around what is known as a "star" schema, which is characterized by one or more very large fact tables that contain the primary information in the data warehouse and a number of much smaller dimension tables (or "lookup" tables), each of which contains information about the entries for a particular attribute in the fact table.

A star query is a join between a fact table and a number of lookup tables. Each lookup table is joined to the fact table using a primary-key to foreign-key join, but the lookup tables are not joined to each other.

Cost-based optimization recognizes star queries and generates efficient execution plans for them. (Star queries are not recognized by rule-based optimization.)

A typical fact table contains keys and measures. For example, a simple fact table might contain the measure Sales, and keys Time, Product, and Market. In this case there would be corresponding dimension tables for Time, Product, and Market. The Product dimension table, for example, would typically contain information about each product number that appears in the fact table.

A star join is a primary-key to foreign-key join of the dimension tables to a fact table. The fact table normally has a concatenated index on the key columns to facilitate this type of join.

Additional Information:

See Oracle8i Tuning for more information about dimensions and data warehouses.  

Star Query Example

This section discusses star queries with reference to the following example:

SELECT SUM(dollars) 
  FROM facts, time, product, market 
  WHERE market.stat = 'New York' 
    AND product.brand = 'MyBrand' 
    AND time.year = 1995 
    AND time.month = 'March' 
    /* Joins*/ 
    AND time.key = facts.tkey 
    AND product.pkey = facts.pkey 
    AND market.mkey = facts.mkey; 

Tuning Star Queries

To execute star queries efficiently, you must use cost-based optimization. Begin by gathering statistics (using the DBMS_STATS package or the ANALYZE command) for each of the tables accessed by the query.

Indexing

In the example above, you would construct a concatenated index on the columns tkey, pkey, and mkey. The order of the columns in the index is critical to performance. the columns in the index should take advantage of any ordering of the data. If rows are added to the large table in time order, then tkey should be the first key in the index. When the data is a static extract from another database, it is worthwhile to sort the data on the key columns before loading it.

If all queries specify predicates on each of the small tables, a single concatenated index suffices. If queries that omit leading columns of the concatenated index are frequent, additional indexes may be useful. In this example, if there are frequent queries that omit the time table, an index on pkey and mkey can be added.

Hints

Usually, if you analyze the tables the optimizer will choose an efficient star plan. You can also use hints to improve the plan. The most precise method is to order the tables in the FROM clause in the order of the keys in the index, with the large table last. Then use the following hints:

/*+ ORDERED USE_NL(facts) INDEX(facts fact_concat) */ 

A more general method is to use the STAR hint /*+ STAR */.

Extended Star Schemas

Each of the small tables can be replaced by a join of several smaller tables. For example, the product table could be normalized into brand and manufacturer tables. Normalization of all of the small tables can cause performance problems. One problem is caused by the increased number of permutations that the optimizer must consider. The other problem is the result of multiple executions of the small table joins.

You can solve both of these problems by using denormalized views. For example:

CREATE VIEW prodview AS SELECT /*+ NO_MERGE */ * 
    FROM brands, mfgrs WHERE brands.mfkey = mfgrs.mfkey; 

This hint reduces the optimizer's search space and causes caching of the result of the view.

Star Transformation

The star transformation is a cost-based query transformation aimed at executing star queries efficiently. Whereas the star optimization works well for schemas with a small number of dimensions and dense fact tables, the star transformation may be considered as an alternative if any of the following holds true:

The star transformation does not rely on computing a Cartesian product of the dimension tables, which makes it better suited for cases where fact table sparsity and/or a large number of dimensions would lead to a large Cartesian product with few rows having actual matches in the fact table. In addition, rather than relying on concatenated indexes, the star transformation is based on combining bitmap indexes on individual fact table columns.

The transformation can thus choose to combine indexes corresponding precisely to the constrained dimensions. There is no need to create many concatenated indexes where the different column orders match different patterns of constrained dimensions in different queries.


Attention:

Bitmap indexes are available only if you have purchased the Oracle8i Enterprise Edition. In Oracle8i, bitmap indexes are not available and star query processing uses B*-tree indexes. In the Oracle8i Enterprise Edition, the parallel bitmap index join algorithm is also available for star query processing.

See Getting to Know Oracle8i for more information about the features available in Oracle8i and Oracle8i Enterprise Edition.  


The star transformation works by generating new subqueries that can be used to drive a bitmap index access path for the fact table.

Consider a simple case with three dimension tables, "d1", "d2", and "d3", and a fact table, "fact". The following query:

EXPLAIN PLAN FOR 
  SELECT * FROM fact, d1, d2, d3  
    WHERE fact.c1 = d1.c1 AND fact.c2 = d2.c1 AND fact.c3 = d3.c1 
    AND d1.c2 IN (1, 2, 3, 4) 
    AND d2.c2 < 100 
    AND d3.c2 = 35 

gets transformed by adding three subqueries:

SELECT * FROM fact, d1, d2, d3 
  WHERE fact.c1 = d1.c1 AND fact.c2 = d2.c1 AND fact.c3 = d3.c3 
    AND d1.c2 IN (1, 2, 3, 4) 
    AND d2.c2 < 100 
    AND d3.c2 = 35 
    AND fact.c1 IN (SELECT d1.c1 FROM d1 WHERE d1.c2 IN (1, 2, 3, 4)) 
    AND fact.c2 IN (select d2.c1 FROM d2 WHERE d2.c2 < 100) 
    AND fact.c3 IN (SELECT d3.c1 FROM d3 WHERE d3.c2 = 35) 

In addition, if it is cost effective, one or more of the subqueries may be further optimized by storing its results in a temporary table. Then the subquery is replaced with a subquery on the temporary table. For example, if the first subquery above was selected for this temporary table transformation, a temporary table named ORA_TEMP_1_123 is created and filled with the results of the subquery:

SELECT d1.c1 from d1 where d1.c2 in (1, 2, 3, 4) 

The fully transformed query is now:

SELECT * FROM fact, ORA_TEMP_1_123, d2, d3 
  WHERE fact.c1 = ORA_TEMP_1_123.c1 AND fact.c2 = d2.c1 and fact.c3 = d3.c1 
    AND ORA_TEMP_1_123.c1 IN (1, 2, 3, 4) 
    AND d2.c2 < 100 
    AND d3.c2 = 35 
    AND fact.c1 IN (SELECT ORA_TEMP_1_123.c1 FROM ORA_TEMP_1_123) 
    AND fact.c2 IN (SELECT d2.c1 FROM d2 WHERE d2.c2 < 100) 
    AND fact.c3 IN (SELECT d3.c1 FROM d3 WHERE d3.c2 = 35) 

Given that there are bitmap indexes on fact.c1, fact.c2, and fact.c3, the newly generated subqueries can be used to drive a bitmap index access path in the following way.

For each value of c1 that is retrieved from the first subquery, the bitmap for that value is retrieved from the index on fact.c1 and these bitmaps are merged. The result is a bitmap for precisely those rows in fact that match the condition on d1 in the subquery WHERE clause.

Similarly, the values from the second subquery are used together with the bitmap index on fact.c2 to produce a merged bitmap corresponding to the rows in fact that match the condition on d2 in the second subquery. The same operations apply to the third subquery. The three merged bitmaps can then be ANDed, resulting in a bitmap corresponding to those rows in fact that meet the conditions in all three subqueries simultaneously.

This bitmap can be used to access fact and retrieve the relevant rows. These are then joined to d1, d2, and d3 to produce the answer to the query. No Cartesian product is needed.

Execution Plan

The following execution plan might result from the query above:

SELECT STATEMENT 
 TEMP TABLE GENERATION 
 TEMP TABLE GENERATION 
  HASH JOIN 
   HASH JOIN 
    HASH JOIN 
     TABLE ACCESS              FACT            BY INDEX ROWID 
      BITMAP CONVERSION                        TO ROWIDS 
       BITMAP AND 
        BITMAP MERGE 
         BITMAP KEY ITERATION 
          TABLE ACCESS         D3              FULL 
          BITMAP INDEX         FACT_C3         RANGE SCAN 
        BITMAP MERGE 
         BITMAP KEY ITERATION 
          TABLE ACCESS         ORA_TEMP_1_123  FULL 
          BITMAP INDEX         FACT_C1         RANGE SCAN 
        BITMAP MERGE 
         BITMAP KEY ITERATION 
          TABLE ACCESS         D2              FULL 
          BITMAP INDEX         FACT_C2         RANGE SCAN 
     TABLE ACCESS              ORA_TEMP_1_123  FULL 
    TABLE ACCESS               D2              FULL 
   TABLE ACCESS                D3              FULL 

In this plan the fact table is accessed through a bitmap access path based on a bitmap AND of three merged bitmaps. The three bitmaps are generated by the BITMAP MERGE row source being fed bitmaps from row source trees underneath it. Each such row source tree consists of a BITMAP KEY ITERATION row source which fetches values from the subquery row source tree, which in this example is just a full table access, and for each such value retrieves the bitmap from the bitmap index. After the relevant fact table rows have been retrieved using this access path, they are joined with the dimension tables and temporary tables to produce the answer to the query. The two rows in the execution plan labelled "TEMP TABLE GENERATION" contain the SQL commands used to create and populate the temporary table. These commands are in the OTHER column of the execution plan, which was not displayed in the example above.

The star transformation is a cost-based transformation in the following sense. The optimizer generates and saves the best plan it can produce without the transformation. If the transformation is enabled, the optimizer then tries to apply it to the query and if applicable, generates the best plan using the transformed query. Based on a comparison of the cost estimates between the best plans for the two versions of the query, the optimizer will then decide whether to use the best plan for the transformed or untransformed version.

If the query requires accessing a large percentage of the rows in the fact table, it may well be better to use a full table scan and not use the tranformations. However, if the constraining predicates on the dimension tables are sufficiently selective that only a small portion of the fact table needs to be retrieved, the plan based on the transformation will probably be superior.

Note that the optimizer will generate a subquery for a dimension table only if it decides that it is reasonable to do so based on a number of criteria. There is no guarantee that subqueries will be generated for all dimension tables. The optimizer may also decide, based on the properties of the tables and the query, that the transformation does not merit being applied to a particular query. In this case the best regular plan will be used.

Using Star Transformation

You can enable star transformation by setting the value of the initialization parameter STAR_TRANSFORMATION_ENABLED to TRUE. To use star transformation without temporary tables, set the value of the parameter to TEMP_DISABLE. Use the STAR_TRANSFORMATION hint to make the optimizer use the best plan in which the transformation has been used.

Restrictions on Star Transformation

Star transformation is not supported for tables with any of the following characteristics:

In addition, temporary tables will not be used by star transformation under the following conditions:




Prev

Next
Oracle
Copyright © 1999 Oracle Corporation.

All Rights Reserved.

Library

Product

Contents

Index