Configuring openlava compute nodes

The following covers installing and deploying openlava, the open-source scheduler. LSF also has a community edition available (LSF Community Edition).


Installation and configuration (install directory: /opt/openlava-4.0/):

yum install tcl-devel      # dependency
yum install ncurses-devel  # dependency
# Unpack openlava-4.0.tar.gz
tar -xzvf openlava-4.0.tar.gz
# Enter the source directory
cd openlava-4.0
# Build and install; the default install location is /opt/openlava-4.0/
./configure
make
make install
# Create the openlava account
useradd -r openlava
# Copy the config files into the install directory
cp -rf config/* /opt/openlava-4.0/etc/
# Set up the environment and fix file permissions and ownership (with multiple nodes, do this on every node!)
chown -R openlava:openlava /opt/openlava-4.0
cp -rf /opt/openlava-4.0/etc/openlava /etc/init.d/
cp -rf /opt/openlava-4.0/etc/openlava.* /etc/profile.d/
chmod 755 /etc/init.d/openlava
chmod 755 /etc/profile.d/openlava.*
chown -R openlava:openlava /etc/init.d/openlava
chown -R openlava:openlava /etc/profile.d/openlava.*
# Enable the service at boot
chkconfig openlava on
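
Since the steps above depend on the install tree being owned by the openlava account, a quick ownership check can catch a missed chown. The following is only a sketch: the helper name not_owned_by is made up for illustration, and it assumes a find that supports the -user test (GNU and BSD find both do).

```shell
#!/bin/sh
# Hypothetical helper: print every path under DIR whose owner is not OWNER.
# OWNER may be a user name or a numeric uid (find -user accepts both).
not_owned_by() {
    owner=$1
    dir=$2
    find "$dir" ! -user "$owner" -print
}
```

For example, `not_owned_by openlava /opt/openlava-4.0` should print nothing if the chown above was applied to the whole tree. If the openlava account does not exist yet, find reports an error instead.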

The following is an example of configuring an additional compute node for the cluster.


Adding a node:

# Do this on every node, then start the openlava service

# Create the openlava account
useradd -r openlava
# Set up the environment and fix file permissions and ownership

cp -rf /public/openlava/openlava-4.0-release/etc/openlava /etc/init.d/
cp -rf /public/openlava/openlava-4.0-release/etc/openlava.* /etc/profile.d/
chown -R openlava:openlava /public/openlava/openlava-4.0-release
chmod 755 /etc/init.d/openlava
chmod 755 /etc/profile.d/openlava.*
chown -R openlava:openlava /etc/init.d/openlava
chown -R openlava:openlava /etc/profile.d/openlava.*
# Enable the service at boot
chkconfig openlava on

# Start openlava on this node:
service openlava start

# Load the environment variables
source /etc/profile.d/openlava.sh

# Test the openlava service
Run lsid, lshosts, and bhosts and check that each reports an ok status
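
The three checks above can be bundled into one pass. This is a minimal sketch, not part of openlava: run_checks is a hypothetical helper, and it assumes each command exits non-zero when it cannot reach its daemon. It only looks at exit codes, so the output of bhosts still needs a manual look to see whether a host is ok or closed.

```shell
#!/bin/sh
# Hypothetical helper: run each command named in the arguments and report
# whether it exited cleanly. Returns non-zero if any command failed or
# could not be found.
run_checks() {
    failed=0
    for cmd in "$@"; do
        if "$cmd" >/dev/null 2>&1; then
            echo "ok      $cmd"
        else
            echo "FAILED  $cmd"
            failed=1
        fi
    done
    return "$failed"
}
```

After sourcing /etc/profile.d/openlava.sh, this would be called as `run_checks lsid lshosts bhosts`.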

Reference: https://www.geek-share.com/detail/2791760708.html


 Issues:

  • Failed in an LSF library call: Failed in sending/receiving a message: Connection reset by peer
    #Run:
    lsadmin reconfig
    badmin reconfig
    badmin mbdrestart

    refs: https://www.ibm.com/support/pages/bsub-command-fails-lsf-library-call-failed-sending-receiving-message

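  •  Checking and switching node state (after a node's state changes, bhosts takes a moment to show the change)
    /etc/init.d/openlava status  # show node status
    /etc/init.d/openlava stop    # stop
    /etc/init.d/openlava restart # restart
  • The configuration files are in the etc directory under the openlava install path
    • lsb.hosts: sets the maximum number of jobs
      • MXJ may exceed the number of cores; with MXJ set to 0 the host status is closed
    • lsf.cluster.openlava: main configuration file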
    • lsb.users: user configuration file (limits on the number of submitted jobs, etc.)
    • After modifying a configuration file, run:
      #Run:
      badmin reconfig
  • batch system daemon not responding ... still trying batch system daemon not responding ... still trying
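
To illustrate the MXJ setting in lsb.hosts mentioned in the configuration notes, the sketch below reads the MXJ value for one host. It assumes the usual LSF-style layout: a section delimited by "Begin Host" and "End Host", with a header row that starts with HOST_NAME and has an MXJ column. The function name mxj_for_host is hypothetical, so check it against your own file before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the MXJ value configured for HOST in an
# lsb.hosts-style file.
#   $1 = path to the file, $2 = host name (or "default")
mxj_for_host() {
    awk -v host="$2" '
        # Track whether we are inside the Host section
        $1 == "Begin" && $2 == "Host" { in_host = 1; next }
        $1 == "End"   && $2 == "Host" { in_host = 0 }
        # The header row tells us which column holds MXJ
        in_host && $1 == "HOST_NAME" {
            for (i = 1; i <= NF; i++) if ($i == "MXJ") col = i
            next
        }
        # Print the MXJ field of the row for the requested host
        in_host && col && $1 == host { print $col }
    ' "$1"
}
```

For an install under /opt/openlava-4.0, this would be called as `mxj_for_host /opt/openlava-4.0/etc/lsb.hosts node01`, where node01 is a placeholder host name. A host with no row of its own prints nothing.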
posted @ 2021-07-16 14:24  Deven_xu