Running HPL with Volcano on Kubernetes

SiriusKoan

October 10, 2025 · 4 min read

#Kubernetes #HPC

All files mentioned in this post are available in SiriusKoan/hpc-benchmarks-container.

What’s HPL

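HPL (High Performance Linpack) is a benchmark that measures a supercomputer's performance by solving a dense system of linear equations. It is best known as the benchmark behind TOP500, the ranking of the world's fastest supercomputers.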
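To make "solving a system of linear equations" concrete, here is a minimal single-process sketch of the same kind of work: Gaussian elimination with partial pivoting on a small random system, followed by a residual check. It is a toy, not HPL's distributed algorithm. The helper names (`solve_dense`, `max_residual`, `linpack_gflops`) are made up for this example, and the operation count 2/3·n³ + 3/2·n² is the figure HPL uses for reporting as far as I understand it, so treat it as an assumption.

```python
import random


def solve_dense(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting (toy version)."""
    n = len(a)
    # Work on an augmented copy [a | b] so the caller's data is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k to row k.
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate column k from every row below the pivot.
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def max_residual(a, x, b):
    """Largest absolute entry of a*x - b; small values mean the solve is accurate."""
    n = len(a)
    return max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))


def linpack_gflops(n, seconds):
    """Convert a solve time into GFLOP/s using the assumed LINPACK-style flop count."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


random.seed(0)
n = 35  # same order of magnitude as the tiny problem sizes in a sample HPL.dat
a = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
b = [random.uniform(-0.5, 0.5) for _ in range(n)]
x = solve_dense(a, b)
residual = max_residual(a, x, b)
```

HPL does the same factorization on a matrix distributed across many MPI ranks, times it, and reports the result in GFLOP/s after checking the residual.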

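A supercomputer is made up of many smaller machines, and programs can use MPI (Message Passing Interface) to communicate across them. HPL supports MPI, so it can be used to benchmark such clusters.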

HPL is configured through HPL.dat. An example:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
4            # of problems sizes (N)
29 30 34 35  Ns
4            # of NBs
1 2 3 4      NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4            Ps
4            Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
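A quick way to read this file is to work out what it asks HPL to do. The sketch below copies the list-valued parameters from the sample above and computes how many test combinations will run and how many MPI ranks the process grid needs. It also includes a sizing helper based on the common rule of thumb that the N×N matrix of doubles should fill roughly 80% of total memory. That fraction, the `suggest_n` helper, the 64 GiB figure, and the block size 192 are assumptions for illustration, not values from the sample.

```python
from math import isqrt, prod

# Values copied from the sample HPL.dat above.
sample = {
    "Ns": [29, 30, 34, 35],
    "NBs": [1, 2, 3, 4],
    "grids": [(4, 4)],  # (P, Q) pairs
    "PFACTs": [0, 1, 2],
    "NBMINs": [2, 4],
    "NDIVs": [2],
    "RFACTs": [0, 1, 2],
    "BCASTs": [0],
    "DEPTHs": [0],
}


def count_runs(cfg):
    """HPL runs every combination of the listed values, so multiply the list lengths."""
    return prod(len(values) for values in cfg.values())


def ranks_needed(cfg):
    """Each P x Q grid needs P*Q MPI ranks; the largest grid sets the requirement."""
    return max(p * q for p, q in cfg["grids"])


def suggest_n(total_mem_bytes, nb, fraction=0.8):
    """Rule-of-thumb problem size: an N x N matrix of 8-byte doubles using
    `fraction` of memory, rounded down to a multiple of the block size nb."""
    n = isqrt(int(fraction * total_mem_bytes) // 8)
    return n - n % nb


runs = count_runs(sample)      # 4 * 4 * 1 * 3 * 2 * 1 * 3 * 1 * 1
ranks = ranks_needed(sample)   # 4 * 4
n_hint = suggest_n(64 * 2**30, nb=192)  # hypothetical 64 GiB cluster, NB = 192
```

With the sample values this gives 288 combinations on 16 ranks. The sample's Ns and NBs are tiny smoke-test values, so a real measurement would replace them with a much larger N and a tuned NB.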

What’s Volcano

Volcano is a Kubernetes-native batch scheduling system designed specifically for HPC workloads.

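It fills gaps in the native Kubernetes scheduler by supporting scheduling policies and features commonly needed in HPC, such as gang scheduling (deploying all workers of a job together, so that a job never ends up with only some of its workers running), multi-level queues, preemption, and support for heterogeneous devices such as GPUs.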
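The all-or-nothing idea behind gang scheduling can be shown with a toy model. This is not Volcano's actual algorithm. The function `gang_schedule` and the node names are hypothetical, and "slots" stands in for whatever resources a worker needs.

```python
def gang_schedule(free_slots, workers_needed):
    """Place all workers of a job, or none of them.

    free_slots maps a node name to the number of workers it can still host.
    Returns a {node: worker_count} placement, or None if the whole gang
    does not fit right now.
    """
    if sum(free_slots.values()) < workers_needed:
        # All-or-nothing: refuse to start a partial job that would only
        # hold resources while waiting for its missing peers.
        return None
    placement = {}
    remaining = workers_needed
    for node, slots in sorted(free_slots.items()):
        take = min(slots, remaining)
        if take:
            placement[node] = take
            remaining -= take
        if remaining == 0:
            break
    return placement


# A 16-rank MPI job fits on two nodes with 8 free slots each...
fits = gang_schedule({"node-a": 8, "node-b": 8}, 16)
# ...but is held back entirely when only 12 slots are free.
held = gang_schedule({"node-a": 8, "node-b": 4}, 16)
```

This matters for MPI jobs in particular: the ranks cannot make progress until all of them are up, so a partially placed job wastes the resources it holds.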

Volcano also supports many HPC and AI/ML systems, for example the MPI mentioned above, Kubeflow (a platform for developing and deploying ML on Kubernetes), and TensorFlow.

Containerize HPL

There is currently no ready-to-use Docker image for HPL, so I wrote a Dockerfile version based on the approach in ExplorerRay/hpc-container-def.

FROM debian:trixie-slim AS build

# Environment variables for ignoring some harmless errors (from %environment)
ENV PMIX_MCA_gds="^ds12"
ENV PMIX_MCA_psec="^munge"

# Install dependencies and build HPL
RUN apt-get -y update && \
    DEBIAN_FRONTEND=noninteractive apt-get -y install \
        make g++ gcc gfortran wget openmpi-bin libopenmpi-dev libopenblas-dev && \
    apt-get clean autoclean && \
    apt-get autoremove --yes && \
    rm -rf /var/lib/apt/lists/*

RUN cd /opt && \
    wget https://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz && \
    tar -xzf hpl-2.3.tar.gz && \
    cd hpl-2.3/setup && \
    cp Make.Linux_PII_CBLAS ../Make.linux && \
    cd .. && \
    sed -i 's|^TOPdir       = $(HOME)/hpl|TOPdir       = /opt/hpl-2.3|' Make.linux && \
    sed -i 's|^ARCH         = .*$|ARCH         = linux|' Make.linux && \
    sed -i 's|^MPdir        = .*$|MPdir        = /usr/lib/x86_64-linux-gnu/openmpi|' Make.linux && \
    sed -i 's|^MPlib        = .*$|MPlib        = $(MPdir)/lib/libmpi.so|' Make.linux && \
    sed -i 's|^LAdir        = .*$|LAdir        = /usr/lib/x86_64-linux-gnu/openblas-pthread|' Make.linux && \
    sed -i 's|^LAlib        = .*$|LAlib        = $(LAdir)/libopenblas.a|' Make.linux && \
    sed -i 's|^CC           = .*$|CC           = /usr/bin/mpicc|' Make.linux && \
    sed -i 's|^LINKER       = .*$|LINKER       = /usr/bin/mpicc|' Make.linux && \
    echo "Building HPL Benchmarks..." && \
    make arch=linux && \
    rm /opt/hpl-2.3.tar.gz

FROM debian:trixie-slim

ENV PMIX_MCA_gds="^ds12"
ENV PMIX_MCA_psec="^munge"

WORKDIR /opt/hpl-2.3/bin/linux

RUN apt-get -y update && \
    apt-get -y install openmpi-bin libopenmpi-dev libopenblas-dev openssh-server && \
    apt-get clean autoclean && \
    apt-get autoremove --yes && \
    rm -rf /var/lib/apt/lists/*
COPY --from=build /opt/hpl-2.3/bin/linux /opt/hpl-2.3/bin/linux

CMD ["sleep", "infinity"]

The image has been uploaded to Docker Hub.
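Because the image only runs `sleep infinity`, the benchmark has to be launched explicitly from inside a container. The sketch below builds the `mpirun` command line for a given P×Q grid. The helper `xhpl_command` and the hostfile name are hypothetical. The sketch assumes the command is run from the image's working directory, where the build places the `xhpl` binary, and that the container runs as root, which is why `--allow-run-as-root` is included.

```python
import shlex


def xhpl_command(p, q, hostfile=None, as_root=True):
    """Build an Open MPI launch command for HPL on a P x Q process grid."""
    cmd = ["mpirun", "-np", str(p * q)]  # one MPI rank per grid cell
    if as_root:
        # Open MPI refuses to run as root unless this flag is given.
        cmd.append("--allow-run-as-root")
    if hostfile:
        # Optional list of hosts for a multi-node run.
        cmd += ["--hostfile", hostfile]
    cmd.append("./xhpl")
    return cmd


single_node = xhpl_command(4, 4)
multi_node = xhpl_command(4, 4, hostfile="hostfile.txt")  # hypothetical path
command_line = shlex.join(multi_node)
```

The rank count must match the Ps and Qs values in HPL.dat. With the 4×4 grid from the sample, that means 16 ranks.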