Arrow
| Input | Output | Alias |
|---|---|---|
| ✔ | ✔ | |
Description
Apache Arrow comes with two built-in columnar storage formats.
ClickHouse supports read and write operations for these formats.
Arrow is Apache Arrow's "file mode" format, designed for in-memory random access.
Data types matching
The table below shows the supported data types and how they correspond to ClickHouse data types in INSERT and SELECT queries.
| Arrow data type (INSERT) | ClickHouse data type | Arrow data type (SELECT) |
|---|---|---|
| BOOL | Bool | BOOL |
| UINT8, BOOL | UInt8 | UINT8 |
| INT8 | Int8/Enum8 | INT8 |
| UINT16 | UInt16 | UINT16 |
| INT16 | Int16/Enum16 | INT16 |
| UINT32 | UInt32 | UINT32 |
| INT32 | Int32 | INT32 |
| UINT64 | UInt64 | UINT64 |
| INT64 | Int64 | INT64 |
| FLOAT, HALF_FLOAT | Float32 | FLOAT32 |
| DOUBLE | Float64 | FLOAT64 |
| DATE32 | Date32 | UINT16 |
| DATE64 | DateTime | UINT32 |
| TIMESTAMP, TIME32, TIME64 | DateTime64 | TIMESTAMP |
| STRING, BINARY | String | BINARY |
| STRING, BINARY, FIXED_SIZE_BINARY | FixedString | FIXED_SIZE_BINARY |
| DECIMAL | Decimal | DECIMAL |
| DECIMAL256 | Decimal256 | DECIMAL256 |
| LIST | Array | LIST |
| STRUCT | Tuple | STRUCT |
| MAP | Map | MAP |
| UINT32 | IPv4 | UINT32 |
| FIXED_SIZE_BINARY, BINARY | IPv6 | FIXED_SIZE_BINARY |
| FIXED_SIZE_BINARY, BINARY | Int128/UInt128/Int256/UInt256 | FIXED_SIZE_BINARY |
| DURATION | Interval (Nanosecond/Microsecond/Millisecond/Second) | DURATION |
| INT64 | Interval (Minute/Hour/Day/Week/Month/Quarter/Year) | INT64 |
Arrays can be nested and can have a value of the Nullable type as an argument. Tuple and Map types can also be nested.
The DICTIONARY type is supported for INSERT queries. For SELECT queries, the output_format_arrow_low_cardinality_as_dictionary setting outputs LowCardinality columns as the DICTIONARY type. Note that a LowCardinality dictionary may contain unused values, which can then appear as unused values in the Arrow DICTIONARY on output.
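As a rough illustration of the SELECT side, the sketch below writes a LowCardinality column as an Arrow DICTIONARY. It assumes clickhouse-local is installed; the `category` column, the `numbers(10)` stand-in data, and the `export_dictionary` helper are hypothetical names chosen for the example. The command is wrapped in a function so that nothing runs until it is called.

```shell
# Stand-in data: a LowCardinality(String) column with three distinct values.
# The column alias and the numbers() source are illustrative assumptions.
DICT_QUERY="SELECT toLowCardinality(toString(number % 3)) AS category
FROM numbers(10)
SETTINGS output_format_arrow_low_cardinality_as_dictionary = 1
FORMAT Arrow"

# Hypothetical helper: write the Arrow output to the file given as $1.
export_dictionary() {
    clickhouse-local --query "$DICT_QUERY" > "$1"
}
```

Calling `export_dictionary categories.arrow` would produce a file whose `category` field is expected to be dictionary-encoded; without the setting, the same column is written as a plain string or binary column.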
Unsupported Arrow data types:
FIXED_SIZE_BINARY, JSON, UUID, ENUM.
The data types of ClickHouse table columns do not have to match the corresponding Arrow data fields. When inserting data, ClickHouse interprets data types according to the table above and then casts the data to the data type set for the ClickHouse table column.
Example usage
In the example below we use the forex dataset available in the
ClickHouse SQL playground.
Selecting data
We select one day of EUR/USD exchange rates from the playground and save it
into a local forex_eurusd.arrow file. We query the playground over the HTTP
interface, where the host is sql-clickhouse.clickhouse.com and the user is
demo (which has no password):
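A minimal sketch of such a request follows. The host and user come from the text above; the port 8443, the table name `forex.forex`, its column names, and the chosen day are assumptions made for illustration and should be checked against the playground schema. The request is wrapped in a hypothetical `fetch_forex` function so that nothing is sent until it is called.

```shell
# Playground endpoint. Host and user are from the text; port 8443 is an assumption.
CH_URL='https://sql-clickhouse.clickhouse.com:8443/?user=demo'

# Assumed table and column names; adjust them to the actual forex schema.
QUERY="SELECT datetime, bid, ask
FROM forex.forex
WHERE base = 'EUR' AND quote = 'USD'
  AND datetime >= '2021-01-04' AND datetime < '2021-01-05'
FORMAT Arrow"

# POST the query over HTTP and save the Arrow response body to a local file.
fetch_forex() {
    curl --fail --silent --show-error "$CH_URL" \
        --data-binary "$QUERY" \
        --output forex_eurusd.arrow
}
```

Running `fetch_forex` writes forex_eurusd.arrow to the current directory. The `--fail` flag makes curl return a non-zero exit code on an HTTP error instead of saving the error message as if it were Arrow data.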
Reading the file back
We can now read the local Arrow file back with clickhouse-local using the file
table function. The file is self-describing, so ClickHouse infers the schema
from it automatically:
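One way to do this is sketched below, assuming clickhouse-local is installed and forex_eurusd.arrow is in the current directory. The helper names `describe_arrow` and `preview_arrow` are hypothetical and exist only for this example.

```shell
# File written in the previous step.
ARROW_FILE='forex_eurusd.arrow'

# No structure argument is passed to file(): the schema is read from the Arrow file.
DESCRIBE_QUERY="DESCRIBE file('$ARROW_FILE', 'Arrow')"
PREVIEW_QUERY="SELECT * FROM file('$ARROW_FILE', 'Arrow') LIMIT 5"

# Hypothetical helpers: print the inferred schema and the first rows.
describe_arrow() { clickhouse-local --query "$DESCRIBE_QUERY"; }
preview_arrow() { clickhouse-local --query "$PREVIEW_QUERY"; }
```

`describe_arrow` lists the column names and the ClickHouse types chosen according to the data types table, and `preview_arrow` prints the first five rows.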
Inserting data
To load an Arrow file into a ClickHouse table, pipe it into clickhouse-client
with FORMAT Arrow:
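The following sketch assumes a running ClickHouse server reachable by clickhouse-client with default connection settings. The target table `forex_eurusd`, its column names and types, and the `load_arrow` helper are hypothetical and chosen only to match the earlier example.

```shell
# Hypothetical target table; the column names and types are assumptions.
CREATE_QUERY="CREATE TABLE IF NOT EXISTS forex_eurusd
(
    datetime DateTime64(3),
    bid Decimal(11, 5),
    ask Decimal(11, 5)
)
ENGINE = MergeTree
ORDER BY datetime"

INSERT_QUERY="INSERT INTO forex_eurusd FORMAT Arrow"

# Create the table if needed, then feed the Arrow file ($1) to the client on stdin.
load_arrow() {
    clickhouse-client --query "$CREATE_QUERY" &&
    clickhouse-client --query "$INSERT_QUERY" < "$1"
}
```

Calling `load_arrow forex_eurusd.arrow` loads the file. The table column types do not have to match the Arrow field types exactly, because ClickHouse casts the values to the column types on insert.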
Format settings
| Setting | Description | Default |
|---|---|---|
| input_format_arrow_allow_missing_columns | Allow missing columns while reading Arrow input formats. | 1 |
| input_format_arrow_case_insensitive_column_matching | Ignore case when matching Arrow columns with ClickHouse columns. | 0 |
| input_format_arrow_import_nested | Obsolete setting, does nothing. | 0 |
| input_format_arrow_skip_columns_with_unsupported_types_in_schema_inference | Skip columns with unsupported types during schema inference for the Arrow format. | 0 |
| output_format_arrow_compression_method | Compression method for the Arrow output format. Supported codecs: lz4_frame, zstd, none (uncompressed). | lz4_frame |
| output_format_arrow_fixed_string_as_fixed_byte_array | Use the Arrow FIXED_SIZE_BINARY type instead of Binary for FixedString columns. | 1 |
| output_format_arrow_low_cardinality_as_dictionary | Output LowCardinality columns as the Arrow Dictionary type. | 0 |
| output_format_arrow_string_as_string | Use the Arrow String type instead of Binary for String columns. | 1 |
| output_format_arrow_use_64_bit_indexes_for_dictionary | Always use 64-bit integers for dictionary indexes in the Arrow format. | 0 |
| output_format_arrow_use_signed_indexes_for_dictionary | Use signed integers for dictionary indexes in the Arrow format. | 1 |
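Settings from the table can be attached to a single query with a SETTINGS clause. The sketch below assumes clickhouse-local is installed and uses the `numbers` table function as stand-in data; the `export_zstd` helper is a hypothetical name. It switches the output codec from the default lz4_frame to zstd.

```shell
# Stand-in data from numbers(); the setting applies to this query only.
ZSTD_QUERY="SELECT number
FROM numbers(1000)
SETTINGS output_format_arrow_compression_method = 'zstd'
FORMAT Arrow"

# Hypothetical helper: write the zstd-compressed Arrow output to the file given as $1.
export_zstd() {
    clickhouse-local --query "$ZSTD_QUERY" > "$1"
}
```

Input settings can be applied the same way on the reading side, for example by adding a SETTINGS clause to a query that selects from the file table function.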