docs/en/getting-started/example-datasets/opensky.md
+6 -6 (6 additions & 6 deletions)
@@ -7,7 +7,7 @@ title: "Crowdsourced air traffic data from The OpenSky Network 2020"
7
7
8
8
The data in this dataset is derived and cleaned from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 2500 members since 1 January 2019. More data will be periodically included in the dataset until the end of the COVID-19 pandemic.
When this setting has a value greater than than zero only a single replica starts the merge immediately if merged part on shared storage and `allow_remote_fs_zero_copy_replication` is enabled.
290
+
When this setting has a value greater than zero, only a single replica starts the merge immediately if the merged part is on shared storage and `allow_remote_fs_zero_copy_replication` is enabled.
291
291
292
292
:::note Zero-copy replication is not ready for production
293
293
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
Controls how the [query cache](../query-cache.md) handles `SELECT` queries against system tables, i.e. tables in databases `system.*` and `information_schema.*`.
1695
+
1696
+
Possible values:
1697
+
1698
+
- `'throw'` - Throw an exception and don't cache the query result.
1699
+
- `'save'` - Cache the query result.
1700
+
- `'ignore'` - Don't cache the query result and don't throw an exception.
When set to `true` the metadata files are written with `VERSION_FULL_OBJECT_KEY` format version. With that format full object storage key names are written to the metadata files.
5305
-
When set to `false` the metadata files are written with the previous format version, `VERSION_INLINE_DATA`. With that format only suffixes of object storage key names are are written to the metadata files. The prefix for all of object storage key names is set in configurations files at `storage_configuration.disks` section.
5317
+
When set to `false` the metadata files are written with the previous format version, `VERSION_INLINE_DATA`. With that format, only suffixes of object storage key names are written to the metadata files. The prefix for all object storage key names is set in the configuration files, in the `storage_configuration.disks` section.
docs/en/sql-reference/aggregate-functions/reference/uniqcombined.md
+26 -9 (26 additions & 9 deletions)
@@ -15,36 +15,53 @@ The `uniqCombined` function is a good choice for calculating the number of diffe
15
15
16
16
**Arguments**
17
17
18
-
The function takes a variable number of parameters. Parameters can be `Tuple`, `Array`, `Date`, `DateTime`, `String`, or numeric types.
18
+
- `HLL_precision`: The base-2 logarithm of the number of cells in [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog). Optional, you can use the function as `uniqCombined(x[, ...])`. The default value for `HLL_precision` is 17, which is effectively 96 KiB of space (2^17 cells, 6 bits each).
19
+
- `X`: A variable number of parameters. Parameters can be `Tuple`, `Array`, `Date`, `DateTime`, `String`, or numeric types.
19
20
20
-
`HLL_precision` is the base-2 logarithm of the number of cells in [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog). Optional, you can use the function as `uniqCombined(x[, ...])`. The default value for `HLL_precision` is 17, which is effectively 96 KiB of space (2^17 cells, 6 bits each).
21
21
22
22
**Returned value**
23
23
24
24
- A [UInt64](../../../sql-reference/data-types/int-uint.md)-type number.
25
25
26
26
**Implementation details**
27
27
28
-
Function:
28
+
The `uniqCombined` function:
29
29
30
30
- Calculates a hash (64-bit hash for `String` and 32-bit otherwise) for all parameters in the aggregate, then uses it in calculations.
31
-
32
31
- Uses a combination of three algorithms: array, hash table, and HyperLogLog with an error correction table.
33
-
34
-
For a small number of distinct elements, an array is used. When the set size is larger, a hash table is used. For a larger number of elements, HyperLogLog is used, which will occupy a fixed amount of memory.
35
-
32
+
- For a small number of distinct elements, an array is used.
33
+
    - When the set size is larger, a hash table is used.
34
+
- For a larger number of elements, HyperLogLog is used, which will occupy a fixed amount of memory.
36
35
- Provides the result deterministically (it does not depend on the query processing order).
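
The fixed memory footprint of the HyperLogLog stage follows from the figures given under Arguments (2^17 cells, 6 bits each, for the default `HLL_precision` of 17). The arithmetic can be checked with a short Python sketch. The helper name `hll_register_bytes` is hypothetical and is not part of ClickHouse. It counts only the registers and ignores any bookkeeping overhead the real implementation may add.

```python
def hll_register_bytes(precision: int, bits_per_cell: int = 6) -> int:
    """Bytes needed to store 2**precision HyperLogLog cells.

    Hypothetical helper for illustration only. It counts raw register
    storage and nothing else.
    """
    cells = 1 << precision              # 2^precision cells
    return cells * bits_per_cell // 8   # total bits converted to bytes


# Default HLL_precision from the documentation above.
size_kib = hll_register_bytes(17) / 1024
print(f"{size_kib} KiB")
```

Each increment of `HLL_precision` doubles the number of cells, so under these assumptions it also doubles the register storage.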
37
36
38
37
:::note
39
-
Since it uses 32-bit hash for non-`String`type, the result will have very high error for cardinalities significantly larger than `UINT_MAX` (error will raise quickly after a few tens of billions of distinct values), hence in this case you should use [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md#agg_function-uniqcombined64)
38
+
Since it uses a 32-bit hash for non-`String` types, the result will have very high error for cardinalities significantly larger than `UINT_MAX` (the error rises quickly after a few tens of billions of distinct values). In this case, use [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md#agg_function-uniqcombined64) instead.
40
39
:::
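
One way to see why a 32-bit hash limits accuracy is to estimate how many distinct hash values survive when `n` distinct inputs are mapped into a space of 2^32 values. The Python sketch below uses the standard occupancy approximation `m * (1 - exp(-n / m))` and assumes a perfectly uniform hash. It is only an illustration of the information limit, not a model of ClickHouse's estimator or of any correction it applies. The function name `expected_distinct_hashes` is hypothetical.

```python
import math

HASH_SPACE = 2 ** 32  # number of possible values of a 32-bit hash


def expected_distinct_hashes(n: int, m: int = HASH_SPACE) -> float:
    """Expected number of distinct hash values after hashing n distinct inputs.

    Assumes an idealized uniform hash (an assumption, not ClickHouse's hash).
    Uses m * (1 - exp(-n / m)), computed with expm1 for numerical accuracy.
    """
    return m * -math.expm1(-n / m)


for n in (1_000_000, 1_000_000_000, 40_000_000_000):
    surviving = expected_distinct_hashes(n)
    print(f"n={n:>14,}  distinct hashes ~ {surviving:,.0f}  ratio={surviving / n:.4f}")
```

For small `n` almost every input keeps its own hash value, while for tens of billions of inputs nearly the whole hash space is occupied, so the hashes alone can no longer tell the inputs apart.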
41
40
42
-
Compared to the [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) function, the `uniqCombined`:
41
+
Compared to the [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md#agg_function-uniq) function, the `uniqCombined` function:
43
42
44
43
- Consumes several times less memory.
45
44
- Calculates with several times higher accuracy.
46
45
- Usually has slightly lower performance. In some scenarios, `uniqCombined` can perform better than `uniq`, for example, with distributed queries that transmit a large number of aggregation states over the network.
47
46
47
+
**Example**
48
+
49
+
Query:
50
+
51
+
```sql
52
+
SELECT uniqCombined(number) FROM numbers(1e6);
53
+
```
54
+
55
+
Result:
56
+
57
+
```response
58
+
┌─uniqCombined(number)─┐
59
+
│ 1001148 │ -- 1.00 million
60
+
└──────────────────────┘
61
+
```
62
+
63
+
See the example section of [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md#agg_function-uniqcombined64) for an example of the difference between `uniqCombined` and `uniqCombined64` for much larger inputs.
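
As a quick check on the result shown above, the relative error of the estimate can be computed from the two numbers in the example (1,000,000 distinct inputs from `numbers(1e6)` and the returned value 1001148). This is plain arithmetic in Python. It describes this single run only and makes no claim about the function's error bounds in general.

```python
true_count = 1_000_000   # distinct values produced by numbers(1e6)
estimate = 1_001_148     # value returned by uniqCombined(number) in the example

# Relative error of the estimate with respect to the true cardinality.
relative_error = abs(estimate - true_count) / true_count
print(f"relative error: {relative_error:.4%}")
```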