ARRAY JOIN Clause
It is a common operation for tables that contain an array column to produce a new table that has a row with each individual array element of that initial column, while values of other columns are duplicated. This is the basic case of what
ARRAY JOIN
clause does.
Its name comes from the fact that it can be looked at as executing
JOIN
with an array or nested data structure. The intent is similar to the
arrayJoin
function, but the clause functionality is broader.
Syntax:
Supported types of
ARRAY JOIN
are listed below:
ARRAY JOIN
- In base case, empty arrays are not included in the result of
JOIN
.
LEFT ARRAY JOIN
- The result of
JOIN
contains rows with empty arrays. The value for an empty array is set to the default value for the array element type (usually 0, empty string or NULL).
Basic ARRAY JOIN Examples
ARRAY JOIN and LEFT ARRAY JOIN
The examples below demonstrate the usage of the
ARRAY JOIN
and
LEFT ARRAY JOIN
clauses. Let's create a table with an
Array
type column and insert values into it:
The example below uses the
ARRAY JOIN
clause:
The next example uses the
LEFT ARRAY JOIN
clause:
ARRAY JOIN and arrayEnumerate function
This function is normally used with
ARRAY JOIN
. It allows counting something just once for each array after applying
ARRAY JOIN
. Example:
In this example, Reaches is the number of conversions (the strings received after applying
ARRAY JOIN
), and Hits is the number of pageviews (strings before
ARRAY JOIN
). In this particular case, you can get the same result in an easier way:
ARRAY JOIN and arrayEnumerateUniq
This function is useful when using
ARRAY JOIN
and aggregating array elements.
In this example, each goal ID has a calculation of the number of conversions (each element in the Goals nested data structure is a goal that was reached, which we refer to as a conversion) and the number of sessions. Without
ARRAY JOIN
, we would have counted the number of sessions as sum(Sign). But in this particular case, the rows were multiplied by the nested Goals structure, so in order to count each session one time after this, we apply a condition to the value of the
arrayEnumerateUniq(Goals.ID)
function.
Using Aliases
An alias can be specified for an array in the
ARRAY JOIN
clause. In this case, an array item can be accessed by this alias, but the array itself is accessed by the original name. Example:
Using aliases, you can perform
ARRAY JOIN
with an external array. For example:
Multiple arrays can be comma-separated in the
ARRAY JOIN
clause. In this case,
JOIN
is performed with them simultaneously (the direct sum, not the cartesian product). Note that all the arrays must have the same size by default. Example:
The example below uses the arrayEnumerate function:
Multiple arrays with different sizes can be joined by using:
SETTINGS enable_unaligned_array_join = 1
. Example:
ARRAY JOIN with Nested Data Structure
ARRAY JOIN
also works with
nested data structures
:
When specifying names of nested data structures in
ARRAY JOIN
, the meaning is the same as
ARRAY JOIN
with all the array elements that it consists of. Examples are listed below:
This variation also makes sense:
An alias may be used for a nested data structure, in order to select either the
JOIN
result or the source array. Example:
Example of using the arrayEnumerate function:
Implementation Details
The query execution order is optimized when running
ARRAY JOIN
. Although
ARRAY JOIN
must always be specified before the
WHERE
/
PREWHERE
clause in a query, technically they can be performed in any order, unless result of
ARRAY JOIN
is used for filtering. The processing order is controlled by the query optimizer.