Introduction
In this article, we’ll see the basic concepts of
SQL JOINs. In the later part of the article, we’ll focus on the advanced subject of Self-JOIN and some interesting observations on how inner JOIN can be simulated using left JOIN. The author has tried his best to amalgamate various topics in a single concept.
The JOIN
keyword is used in a SQL statement to query data from two or more tables based on a relationship between certain columns in these tables.
Inner JOIN
A JOIN that displays only rows that have a match in both the JOINed tables is known as inner JOIN. This is the default type of JOIN in the Query and View Designer.
Outer JOIN
A JOIN that includes rows even if they do not have related rows in the joined table is an Outer JOIN. You can create three different outer JOINs to specify the unmatched rows to be included:
Left Outer JOIN: In Left Outer JOIN, all rows in the first-named table, i.e. “left” table, which appears leftmost in the JOIN clause, are included. Unmatched rows in the right table do not appear.
Right Outer JOIN: In Right Outer JOIN, all rows in the second-named table, i.e. “right” table, which appears rightmost in the JOIN clause, are included. Unmatched rows in the left table are not included.
Full Outer JOIN: In Full Outer JOIN, all rows in all the joined tables are included, whether they are matched or not.
Additional Notes related to JOIN
The following are three classic examples to demonstrate the cases where Outer JOIN is useful. You must have noticed several instances where
developers write query as given below.
1.
SELECT
t1.*
2.
FROM
Table1 t1
3.
WHERE
t1.ID
NOT
IN
(
SELECT
t2.ID
FROM
Table2 t2)
4.
GO
The query demonstrated above can be easily replaced by Outer JOIN. Indeed, replacing it by Outer JOIN is the best practice. The query that generates the same result as above is shown here using Outer JOIN and WHERE clause in JOIN.
1.
/*
LEFT
JOIN
-
WHERE
NULL
*/
2.
SELECT
t1.*,t2.*
3.
FROM
Table1 t1
4.
LEFT
JOIN
Table2 t2
ON
t1.ID = t2.ID
5.
WHERE
t2.ID
IS
NULL
The above example can also be created using Right Outer JOIN.
NOT Inner JOIN
Remember, the term Not Inner JOIN does not exist in
database terminology. However, when full Outer JOIN is used along with WHERE condition, as explained in the above two examples, it will give you exclusive result to Inner JOIN. This JOIN will show all the results that were absent in Inner JOIN.
Cross JOIN
A cross JOIN devoid of a WHERE clause produces the Cartesian product of the tables involved in the JOIN. The size of a Cartesian product result set is the number of rows in the first table multiplied by the number of rows in the second table. One common example is when a company lists all its products in a pricing table to compare each product with others prices.
Self-JOIN
In this particular case, one table JOINs to itself with one or two aliases to stave off confusion. A self-JOIN can be of any type, as long as the joined tables are the same. A self-JOIN is unique in the sense that it involves a relationship with only one table. A common example is when a company has a hierarchal reporting structure whereby a member of staff reports to another member. Self-JOIN can either be an Outer JOIN or an Inner JOIN.
Self-JOIN is accomplished by using table name aliases to give each instance of the table a separate name. Joining a table to itself can be useful when you want to compare values in a column to other values of the same column. Self-JOIN is a JOIN in which records from a table are combined with other records from the same table when there are matching values in the joined fields. A self-JOIN can either be an inner JOIN or an outer JOIN. A table is joined to itself based upon a field or combination of fields that have duplicate data in different records. The data type of the inter-related columns must be of the same type or cast to the same type.
Now, think of a situation where all the data you require is contained within a single table, but data needed to extract is related to each other in the table itself. Examples of this type of data relate to employee information, where the table may have both an employee’s ID number for each record and also a field that displays the ID number of an employee’s supervisor or manager. To retrieve the data, it is mandatory for the tables to relate/JOIN to itself.
Another example that can be tried on SQL SERVER 2005 sample database AdventureWorks is to find products that are supplied by more than one vendor. Please refer to the sample database for table structure.
Note: Before we continue further let me make it very clear that INNER JOIN should be used where it cannot be used and simulating INNER JOIN using any other JOINs will degrade the performance. If there are scopes to convert any OUTER JOIN to INNER JOIN, it should be done with priority.
Run the following two scripts and observe the result-set. It will be identical.
01.
USE AdventureWorks
02.
GO
03.
/* Example
of
INNER
JOIN
*/
04.
SELECT
p.ProductID, piy.ProductID
05.
FROM
Production.Product p
06.
INNER
JOIN
Production.ProductInventory piy
ON
piy.ProductID = p.ProductID
07.
GO
08.
/* Example
of
LEFT
JOIN
simulating
INNER
JOIN
*/
09.
SELECT
p.ProductID, piy.ProductID
10.
FROM
Production.Product p
11.
LEFT
JOIN
Production.ProductInventory piy
ON
1 = 1
12.
WHERE
piy.ProductID = p.ProductID
13.
GO
After gazing at the identical result the first question that cropped up in my mind was - what is behind the scene plan? Looking at the actual execution plan of the query it is quite evident that even if LEFT JOIN is used in SQL
Server Query Optimizer, it converts to INNER JOIN since results are the same and
performance is better.
Looking at the above scenario it makes me ponder how smart Query Optimizer Engine is and how it might be saving innumerable performance-related issues for sub-optimal queries.
Now let us try to grasp the cause of LEFT JOIN acting as INNER JOIN. When 1= 1 is used in ON clause it is always true and converts LEFT JOIN to CROSS JOIN. However, when WHERE condition’s effect is applied to the above CROSS JOIN it produces a result similar to INNER JOIN in our case. SQL Server Query Optimizer interprets this in advance and uses INNER JOIN right away.
I think a good question to ask in an interview would be
-“How to write an OUTER JOIN that will give you the exact result, execution plan and performance as INNER JOIN?”