Iceberg Support in Velox Backend

Supported Spark version

All Spark versions are supported, but only Spark 3.4 is well tested. Currently, Gluten supports only Iceberg reads.

Support Status

The following values indicate the status of Iceberg support:

| Value | Description |
| --- | --- |
| Offload | Offloaded to the Velox backend |
| PartialOffload | Some operators are offloaded and some fall back |
| Fallback | Falls back to Spark for execution |
| Exception | Cannot fall back under some conditions; an exception is thrown |
| ResultMismatch | A hidden bug may cause a result mismatch, especially in corner cases |

Adding catalogs

Fallback
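As an illustration, a Hadoop-type Iceberg catalog named local (the name used by the queries on this page) could be registered as sketched below. The warehouse path and application name are hypothetical, the Iceberg Spark runtime jar is assumed to be on the classpath, and the Gluten plugin settings are omitted.

```scala
import org.apache.spark.sql.SparkSession

// Register an Iceberg catalog called `local`, backed by a Hadoop warehouse directory.
val spark = SparkSession.builder()
  .appName("gluten-iceberg-example") // hypothetical application name
  .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.local.type", "hadoop")
  .config("spark.sql.catalog.local.warehouse", "/tmp/warehouse") // hypothetical path
  .getOrCreate()
```

Catalog handling stays in Spark, so these settings do not affect which operators are offloaded.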

Creating a table

Fallback

Writing

Fallback

INSERT INTO local.db.table VALUES (1, 'a'), (2, 'b'), (3, 'c');

PartialOffload

The write falls back to Spark, while the read is offloaded.

INSERT INTO local.db.table SELECT id, data FROM source WHERE length(data) = 1;

Reading

Read data

Offload/Fallback

| Table Type | No Delete | Position Delete | Equality Delete |
| --- | --- | --- | --- |
| Unpartitioned | Offload | Offload | Fallback |
| Partitioned | Fallback mostly | Fallback mostly | Fallback |
| Metadata | Fallback | | |

Simple queries such as the following are offloaded.

SELECT count(1) as count, data
FROM local.db.table
GROUP BY data;

If rows are deleted by Spark in merge-on-read mode, position delete files are generated, and the query may still be offloaded.

If rows are deleted by Flink, equality delete files may be generated; in that case the query falls back.

Currently only simple queries are offloaded. For partitioned tables, many operators fall back because of StaticInvoke expressions such as BucketFunction, which are not yet supported.
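The position delete case can be reproduced with a sketch like the one below. It assumes an existing session with the local catalog and the Iceberg SQL extensions enabled, an unpartitioned table, and the Iceberg table properties format-version and write.delete.mode; the id value is hypothetical. The DELETE itself is a write and runs in Spark.

```scala
import org.apache.spark.sql.SparkSession

// Assumes the session is already configured with Gluten and the `local` Iceberg catalog.
val spark = SparkSession.builder().getOrCreate()

// Switch the table to format version 2 with merge-on-read deletes,
// so that a Spark DELETE writes position delete files instead of rewriting data files.
spark.sql(
  """ALTER TABLE local.db.table SET TBLPROPERTIES (
    |  'format-version' = '2',
    |  'write.delete.mode' = 'merge-on-read'
    |)""".stripMargin)

// Hypothetical predicate; this statement is executed by Spark, not by the native engine.
spark.sql("DELETE FROM local.db.table WHERE id = 1")

// A later simple read over the table and its position delete files may be offloaded.
spark.sql("SELECT count(1) AS count, data FROM local.db.table GROUP BY data").show()
```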

DataFrame reads are supported and can now reference tables by name using spark.table:

val df = spark.table("local.db.table")
df.count()
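To see whether a given read was offloaded or fell back, one option is to inspect the physical plan, as in the sketch below. It assumes a session already configured with Gluten and the local catalog. Operator names vary by version; the expectation that native operators carry a Transformer suffix is an assumption, not something stated on this page.

```scala
import org.apache.spark.sql.SparkSession

// Assumes the session is already configured with Gluten and the `local` Iceberg catalog.
val spark = SparkSession.builder().getOrCreate()

// The same aggregation as the SQL example above, written with the DataFrame API.
val counts = spark.table("local.db.table").groupBy("data").count()

// Print the physical plan. Operators that remain plain Spark operators have fallen back;
// offloaded ones are expected (assumption) to appear with a Transformer suffix.
counts.explain()
```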

Read metadata

Fallback

SELECT data, _file FROM local.db.table;

DataType

Timestamptz in ORC format is not supported and throws an exception. The UUID and Fixed types fall back.

Format

PartialOffload

The Parquet and ORC formats are supported. The Avro format is not supported.

SQL

Only SELECT is supported.

Configuration

Catalogs

All catalog options are supported; they are not used by the native engine.

SQL Extensions

Fallback

The spark.sql.extensions option is supported; the SQL command CALL falls back.
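For reference, the Iceberg SQL extensions are typically enabled as sketched below. The class name is the one published by Iceberg; the rest of the session configuration (catalog, Gluten plugin) is assumed to be supplied elsewhere.

```scala
import org.apache.spark.sql.SparkSession

// Enable the Iceberg SQL extensions, which add commands such as CALL.
// Those commands are executed by Spark rather than by the native engine.
val spark = SparkSession.builder()
  .config(
    "spark.sql.extensions",
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  .getOrCreate()
```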

Runtime configuration

Read options

| Spark option | Status |
| --- | --- |
| snapshot-id | Supported |
| as-of-timestamp | Supported |
| split-size | Supported |
| lookback | Supported |
| file-open-cost | Supported |
| vectorization-enabled | Not supported |
| batch-size | Not supported |
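The supported read options are passed through the DataFrame reader, as in the sketch below. It assumes a session already configured with Gluten and the local catalog; the snapshot ID, timestamp, and split size are hypothetical values.

```scala
import org.apache.spark.sql.SparkSession

// Assumes the session is already configured with Gluten and the `local` Iceberg catalog.
val spark = SparkSession.builder().getOrCreate()

// Time travel to a specific snapshot (hypothetical snapshot ID).
val atSnapshot = spark.read
  .format("iceberg")
  .option("snapshot-id", 10963874102873L)
  .load("local.db.table")

// Time travel to a point in time, given in milliseconds since the epoch (hypothetical value).
val asOfTime = spark.read
  .format("iceberg")
  .option("as-of-timestamp", "499162860000")
  .load("local.db.table")

// Override the target split size, in bytes, for this read only.
val smallSplits = spark.read
  .format("iceberg")
  .option("split-size", 64L * 1024 * 1024)
  .load("local.db.table")

atSnapshot.count()
```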
