High-Bandwidth Memory (HBM) support
Gluten supports allocating memory on HBM. This feature is optional and is disabled by default. It is implemented on top of Memkind library. You can refer to memkind’s readme for more details.
Build Gluten with HBM
Gluten will internally build and link to a specific version of Memkind library and hwloc. Other dependencies should be installed on Driver and Worker node first:
sudo apt install -y autoconf automake g++ libnuma-dev libtool numactl unzip libdaxctl-dev
After the set-up, you can now build Gluten with HBM. Below command is used to enable this feature
cd /path/to/gluten
## The script builds four jars for spark 3.2.2, 3.3.1, 3.4.3 and 3.5.1.
./dev/buildbundle-veloxbe.sh --enable_hbm=ON
Configure and enable HBM in Spark Application
At runtime, MEMKIND_HBW_NODES enviroment variable is detected for configuring HBM NUMA nodes. For the explaination to this variable, please refer to memkind’s manual page. This can be set for all executors through spark conf, e.g. --conf spark.executorEnv.MEMKIND_HBW_NODES=8-15. Note that memory allocation fallback is also supported and cannot be turned off. If HBM is unavailable or fills up, the allocator will use default(DDR) memory.