Deploying Slurm

The following content was generated by gpt-4o.

Configuring MUNGE

  1. Install MUNGE:

    sudo apt install munge libmunge-dev libmunge2
    
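  2. Generate the MUNGE key on one node only (for example the controller). If a key already exists at /etc/munge/munge.key, mungekey refuses to overwrite it unless --force is passed: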

    sudo -u munge /usr/sbin/mungekey --verbose
    
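  3. Copy the MUNGE key (/etc/munge/munge.key by default) to the same path on every node, owned by the munge user with mode 0600.

  4. Start the MUNGE service: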

    sudo systemctl enable munge
    sudo systemctl start munge
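
Step 3 can be sketched as a small script. This is a minimal sketch, not a definitive procedure: the node names in NODES are hypothetical, and it assumes SSH access with passwordless sudo on every node. The function only prints the commands so they can be reviewed first; the last printed command is the cross-node check described in the MUNGE documentation.

```shell
#!/bin/sh
# Hypothetical node names; replace with your own hosts.
NODES="node01 node02"
KEY=/etc/munge/munge.key

# Print the commands that install the key on one node and verify it.
# Nothing is executed here; pipe the output to `sh` once it looks right.
key_copy_commands() {
    node=$1
    # Stream the key over SSH and restore ownership and permissions.
    echo "sudo cat $KEY | ssh $node 'sudo tee $KEY >/dev/null && sudo chown munge:munge $KEY && sudo chmod 600 $KEY'"
    # Restart the daemon so it picks up the new key.
    echo "ssh $node 'sudo systemctl restart munge'"
    # Encode a credential locally and decode it on the remote node.
    echo "munge -n | ssh $node unmunge"
}

for node in $NODES; do
    key_copy_commands "$node"
done
```

If unmunge on the remote node reports a successful decode, both nodes share the same key.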
    

See also: Installation Guide · dun/munge Wiki

Configuring the MariaDB Database

  1. Install the dependencies:

    sudo apt install mariadb-client mariadb-server libmariadb-dev-compat libmariadb-dev libmariadb3
    
  2. Start MariaDB and set the root password:

    sudo systemctl enable mariadb
    sudo systemctl start mariadb
    sudo mysql_secure_installation
    
  3. Create the Slurm database and user:

    sudo mysql -u root -p
    
    CREATE DATABASE slurm_acct_db;
    CREATE USER 'slurm'@'localhost' IDENTIFIED BY 'your_password';
    GRANT ALL ON slurm_acct_db.* TO 'slurm'@'localhost';
    FLUSH PRIVILEGES;
    EXIT;
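
The database above is only used once the accounting daemon, slurmdbd, is pointed at it, which these steps do not cover. As a rough sketch under assumptions, the function below prints a minimal slurmdbd.conf that matches the database name and user created here. The function name is hypothetical, and the password argument stands in for whatever replaced 'your_password'.

```shell
#!/bin/sh
# Print a minimal slurmdbd.conf for the slurm_acct_db database.
# Argument 1: the password chosen in the CREATE USER statement.
render_slurmdbd_conf() {
    db_pass=$1
    cat <<EOF
AuthType=auth/munge
DbdHost=localhost
SlurmUser=slurm
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=$db_pass
StorageLoc=slurm_acct_db
EOF
}

render_slurmdbd_conf 'your_password'
```

With a source build under /usr/local, the output would likely go to /usr/local/etc/slurmdbd.conf, owned by the slurm user and readable only by it since it holds the password. slurm.conf would also need AccountingStorageType=accounting_storage/slurmdbd for the controller to use it.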
    

Installing Slurm

  1. Install the build dependencies:

    sudo apt install build-essential bzip2
    
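  2. Download the Slurm source and build it:

    wget https://download.schedmd.com/slurm/slurm-25.05.2.tar.bz2
    tar -xjf slurm-25.05.2.tar.bz2 && rm -f slurm-25.05.2.tar.bz2
    cd slurm-25.05.2
    ./configure
    make -j$(nproc)
    sudo make install
    # make install may not install the systemd units generated under etc/;
    # copy them before the build tree is removed
    sudo cp etc/slurmctld.service etc/slurmd.service /etc/systemd/system/
    sudo systemctl daemon-reload
    cd .. && rm -rf slurm-25.05.2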
    
  3. Edit the Slurm configuration file:

    sudoedit /usr/local/etc/slurm.conf
    
    ClusterName=cluster
    SlurmctldHost=controlhost
    SlurmUser=slurm
    SlurmdPort=6818
    SlurmctldPort=6817
    AuthType=auth/munge
    StateSaveLocation=/var/spool/slurm/state
    SlurmdSpoolDir=/var/spool/slurmd
    
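  4. Create the slurm user (a source build does not create it; use the same UID on every node) and the state save directory:

    sudo useradd --system --shell /usr/sbin/nologin slurm
    sudo mkdir -p /var/spool/slurm/state
    sudo chown slurm: /var/spool/slurm/state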
    
  5. Start the Slurm services:

    # Slurm controller (on the control node)
    sudo systemctl enable slurmctld
    sudo systemctl start slurmctld
    # Slurm compute daemon (on each compute node)
    sudo systemctl enable slurmd
    sudo systemctl start slurmd
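
The slurm.conf in step 3 defines no nodes or partitions, and the daemons need both before they can schedule any work. The sketch below prints one NodeName line and one PartitionName line for a single machine. The function name and the partition name "debug" are assumptions for illustration; running slurmd -C on a compute node prints a more complete NodeName line based on the detected hardware.

```shell
#!/bin/sh
# Print a node definition and a default partition containing that node.
# Argument 1: node hostname, argument 2: number of CPUs on the node.
render_node_lines() {
    host=$1
    cpus=$2
    echo "NodeName=$host CPUs=$cpus State=UNKNOWN"
    echo "PartitionName=debug Nodes=$host Default=YES MaxTime=INFINITE State=UP"
}

# Describe the local machine; fall back to uname if hostname is unavailable.
render_node_lines "$(hostname -s 2>/dev/null || uname -n)" "$(nproc)"
```

Append the output to /usr/local/etc/slurm.conf on every node and restart slurmctld and slurmd. sinfo should then list the partition.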
    

See also: Slurm Workload Manager - Quick Start Administrator Guide

Posted 2025-08-19 14:04 by Undefined443