GPU Acceleration in Velox/Gluten

Unified execution engine leveraging CUDF for hardware-accelerated Spark SQL queries


1. Overview

  • Purpose: Accelerate Velox operators via CUDF APIs, replacing CPU execution when enabled.
  • Status: Experimental (validated on TPC-H SF1). Integrates the RAPIDS ecosystem with Apache Spark via Gluten.
  • Key Benefit: Some queries run up to 8.1x faster on x86 than on the Spark Java engine.

2. Prerequisites

  • CUDA Toolkit: 12.8.0.
  • NVIDIA Drivers: A version compatible with CUDA 12.8.
  • Container Toolkit: Install nvidia-container-toolkit, following NVIDIA's installation guide.
  • System Reboot: Required after driver installation.
  • Environment Setup: Use start_cudf.sh to configure the host.
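A quick way to see which of these prerequisites are already on the host is to check for their command-line tools. The sketch below only tests whether each tool is on PATH; it does not verify versions. The helper name check_tool is hypothetical, and the sketch assumes nvidia-smi comes with the driver, nvcc with the CUDA Toolkit, and nvidia-ctk with nvidia-container-toolkit.

```shell
#!/usr/bin/env bash
# Report whether a prerequisite tool is on PATH (versions are not checked).
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

check_tool nvidia-smi   # installed with the NVIDIA driver
check_tool nvcc         # installed with the CUDA Toolkit
check_tool nvidia-ctk   # installed with nvidia-container-toolkit
check_tool docker       # needed for the container workflow
```

If nvidia-smi is reported as missing right after a driver install, the reboot listed above is the first thing to try.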

3. Implementation Mechanics

  • Operator Conversion:
    • Velox PlanNodes → GPU operators when spark.gluten.sql.columnar.cudf=true.
    • Operators without GPU support fall back to their CPU implementations, which triggers row/columnar data conversion.
  • Debugging: Enable spark.gluten.debug.enabled.cudf=true for operator replacement logs.
  • Memory: GPU memory is handled by a global RMM memory manager, which cannot be aligned with the Spark memory system.
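The two cuDF switches above can be collected in a Spark properties file. This is a minimal sketch, not a complete configuration: only the two spark.gluten.*.cudf keys come from this page. The plugin and off-heap settings are the usual Gluten prerequisites and are included here as assumptions, the 8g size and the file name are arbitrary, and putting the Gluten bundle jar on the classpath is not shown.

```shell
#!/usr/bin/env bash
# Write a properties fragment that enables GPU operator replacement.
conf_file="${TMPDIR:-/tmp}/gluten-cudf.conf"

cat > "$conf_file" <<'EOF'
# Assumed general Gluten settings (not specific to cuDF)
spark.plugins                     org.apache.gluten.GlutenPlugin
spark.memory.offHeap.enabled      true
spark.memory.offHeap.size         8g
# Settings described in this section
spark.gluten.sql.columnar.cudf    true
spark.gluten.debug.enabled.cudf   true
EOF

echo "wrote $conf_file"
echo "example: spark-shell --properties-file $conf_file"
```

Note that a file passed with --properties-file is read instead of conf/spark-defaults.conf, so any settings kept there would need to be copied over.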

4. Docker Deployment

docker pull apache/gluten:centos-9-jdk8-cudf  # Pre-built GPU image
docker run --name gpu_gluten_container --gpus all -it apache/gluten:centos-9-jdk8-cudf
  • Image Includes: Native build cache, Gluten dependencies, Spark 3.4 environment.
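Before starting a long-lived container, it can help to confirm that Docker can hand the GPU to this image. The sketch below assumes nvidia-smi is made available inside the container by the NVIDIA container runtime when --gpus all is passed. The function name and the RUN_GPU_SMOKE_TEST variable are hypothetical; by default the command is only printed, so the snippet is safe to run on a machine without Docker or a GPU.

```shell
#!/usr/bin/env bash
IMAGE="apache/gluten:centos-9-jdk8-cudf"

# Print the smoke-test command; run it only when RUN_GPU_SMOKE_TEST=1.
gpu_smoke_test() {
  if [ "${RUN_GPU_SMOKE_TEST:-0}" = "1" ]; then
    # A throwaway container that lists the GPUs visible to Docker.
    docker run --rm --gpus all "$IMAGE" nvidia-smi
  else
    echo "docker run --rm --gpus all $IMAGE nvidia-smi"
  fi
}

gpu_smoke_test
```

If the real run fails to list any GPU, the driver or nvidia-container-toolkit installation from the prerequisites is the likely cause.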

5. Build & Deployment

Dependencies

The supported OS, Spark version, and Java version are the same as for the Gluten CPU build.

Compilation Commands

When building inside the Docker image, there is no need to run the setup script or build Arrow:

./dev/buildbundle-veloxbe.sh --run_setup_script=OFF --build_arrow=OFF --enable_cudf=ON
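The choice of flags can be scripted so the same wrapper works inside and outside a container. This is a sketch under two assumptions: that the presence of /.dockerenv is a good enough signal for "running in the pre-built image" (any Docker container has this file), and that omitting --run_setup_script and --build_arrow leaves the script to run setup and build Arrow. The function name build_flags is hypothetical.

```shell
#!/usr/bin/env bash
# Pick build flags based on a container marker file.
# The marker defaults to /.dockerenv, which Docker creates in containers.
build_flags() {
  marker="${1:-/.dockerenv}"
  if [ -f "$marker" ]; then
    # Pre-built image: setup and Arrow are already in place.
    echo "--run_setup_script=OFF --build_arrow=OFF --enable_cudf=ON"
  else
    # Assumption: the script's defaults handle setup and the Arrow build.
    echo "--enable_cudf=ON"
  fi
}

# Print the resulting command instead of running it.
echo "./dev/buildbundle-veloxbe.sh $(build_flags)"
```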

6. GPU Operator Support Status

| Operator | Status | Notes |
|----------|--------|-------|
| Scan | ❌ Not supported | In development |
| Project | ⚠️ Partial | TPC-H-compatible functions |
| Filter | ✅ Implemented | Core operator |
| OrderBy | ✅ Implemented | Merged in Velox #12735 |
| Aggregation | ⚠️ Partial | TPC-H-compatible |
| Join | ⚠️ Partial | TPC-H-compatible |
| Spill | ❌ Not supported | In planning |


7. Performance Validation

The GPU performs best on the HashJoin and HashAggregation operators. A single operator such as hash aggregation shows a 5x speedup.


8. Relevant Resources

  1. CUDF Docs - GPU operator APIs.
  2. Gluten GPU Issue #9098 - Development tracker.


Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0. Apache Gluten, Gluten, Apache, the Apache feather logo, and the Apache Gluten project logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other marks mentioned may be trademarks or registered trademarks of their respective owners.

Apache Gluten is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
