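Job Scheduling System Configuration Summary (Unverified Version)

A verified version will be published on the WeChat official account once it has been tested.

This chapter summarizes the issues from the previous four chapters so that they can be verified later.

===============================let's begin=============================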

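Note: Torque does not support CentOS 8.

1. Install CentOS 7 # burn the installation media, then install

2. Update the yum repositories

wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo # or find the Base repo file in the bak folder under this directory and copy it back out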

yum clean all

yum makecache

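3. Install gcc and other dependencies

yum install gcc gcc-c++ tcl-devel tk-devel make -y

yum clean all

4. Install other related programs (see GitHub Yuan-SW-F) # not urgent

yum install vim -y

5. Mount the disk

mount /dev/sdb1 /abyss # the mount point /abyss must already exist (mkdir -p /abyss)

6. Download Torque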

cd /path; 

mkdir Ypipe; cd Ypipe

mkdir soft; cd soft

wget https://src.fedoraproject.org/lookaside/pkgs/torque/torque-6.1.1.1.tar.gz/sha512/74ff683f56d04a4d08774896c9f9875c68aa2cacfe6c1c8c65246da52396443d3f7497bc8a6a1f06d357f52c65153fc9db00692f514ac30279e4c765547d98c0/torque-6.1.1.1.tar.gz

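yum install libxml2-devel openssl-devel gcc gcc-c++ boost-devel libtool -y

tar -xzf torque-6.1.1.1.tar.gz; cd torque-6.1.1.1 # unpack the source before configuring

./configure

make

make packages # builds the torque-package-*.sh installers used in step 11

make install

Installation succeeded!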
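
The download URL in this step carries a sha512 value in its path. As a rough sketch, assuming that value is the digest of the tarball (this post does not confirm it) and that sha512sum from coreutils is installed, a small helper could check the file before building. The name verify_sha512 is made up for illustration.

```shell
# Succeed only if FILE's SHA-512 digest equals EXPECTED (a hex string).
# Usage: verify_sha512 FILE EXPECTED
verify_sha512() {
    # sha512sum prints "<digest>  <file>"; keep only the digest column
    actual=$(sha512sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}
```

It could be called as verify_sha512 torque-6.1.1.1.tar.gz followed by the sha512 string copied from the URL; a non-zero exit status would suggest a damaged or incomplete download.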

7. Check node information

cat /proc/cpuinfo # used later for the node configuration (the np values)
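
Reading the whole cpuinfo dump by eye is error-prone, so here is a minimal sketch that counts the "processor" entries and prints a line in the "hostname np=N" form used later in server_priv/nodes. It assumes np should equal the number of logical CPUs, which is a common choice but not something this post states. The function names count_procs and nodes_line are hypothetical.

```shell
# Count logical CPUs listed in a cpuinfo-style file (defaults to /proc/cpuinfo).
count_procs() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Print a line in the form "<hostname> np=<count>".
# Usage: nodes_line HOSTNAME [CPUINFO_FILE]
nodes_line() {
    printf '%s np=%s\n' "$1" "$(count_procs "$2")"
}
```

On a node, nodes_line "$(hostname)" would print a line that can be pasted into the nodes file on the server.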

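8. Change the hostname

cat /etc/sysconfig/network
# Created by anaconda
*****9

9. Set the system time zone

ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

10. Set up public keys

ssh-keygen -t rsa # just press Enter three times

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys # do this on every node

Merge the public keys of all nodes to enable passwordless login.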
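
The merge itself is not spelled out here, so the following is only a sketch of one way to do it, assuming each node's id_rsa.pub has already been copied to one machine. The helper name merge_pubkeys and the file names in the usage note are made up.

```shell
# Merge public key files into one authorized_keys-style file, dropping duplicates.
# Usage: merge_pubkeys OUTPUT KEYFILE...
merge_pubkeys() {
    out=$1
    shift
    # awk keeps the first occurrence of each non-empty line and preserves order
    cat "$@" | awk 'NF && !seen[$0]++' > "$out"
    # restrict access to the owner, the conventional mode for authorized_keys
    chmod 600 "$out"
}
```

For example, merge_pubkeys merged_keys node1.pub node2.pub would produce a single file that can then be copied to ~/.ssh/authorized_keys on every node. OUTPUT should not be one of the input files.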

11. Configure the compute nodes

Copy the files to the compute nodes:

for i in *** *** ;do
scp /chenlab/Ypipe/soft/torque-6.1.1.1/contrib/init.d/{pbs_mom,trqauthd} root@$i:/etc/init.d/
scp /chenlab/Ypipe/soft/torque-6.1.1.1/{torque-package-clients-linux-x86_64.sh,torque-package-mom-linux-x86_64.sh} root@$i:/root
done

Run the following on each node individually:

cd /root

./torque-package-clients-linux-x86_64.sh --install

./torque-package-mom-linux-x86_64.sh --install

12. Configure several files

cat /var/spool/torque/server_name

*****9

cat /var/spool/torque/server_priv/nodes
*****9 np=12
*****8 np=122

cat /etc/hosts
192.168.*****9 *****9
192.168.*****8 *****8

cat /var/spool/torque/mom_priv/config
$pbsserver *****9
$logevent 255
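
Since the same two-line mom_priv/config has to exist on every compute node, a small generator may save some typing. This is a sketch under the assumption that the Torque home is /var/spool/torque as in the listings here; write_mom_config is a hypothetical name.

```shell
# Write a minimal pbs_mom config that names the server host.
# Usage: write_mom_config SERVER_HOST [TORQUE_HOME]
write_mom_config() {
    server=$1
    home=${2:-/var/spool/torque}
    mkdir -p "$home/mom_priv"
    # single quotes keep the literal "$pbsserver" and "$logevent" keywords intact
    printf '$pbsserver %s\n$logevent 255\n' "$server" > "$home/mom_priv/config"
}
```

Passing a scratch directory as the second argument makes it possible to inspect the result before touching the real configuration.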

vi /etc/profile

Add the following variables:

TORQUE=/usr/local
MAUI=/usr/local
if [ "$(id -u)" -eq 0 ]; then
PATH=$TORQUE/bin:$TORQUE/sbin:$MAUI/sbin:$MAUI/bin:$PATH
else
PATH=$TORQUE/bin:$MAUI/bin:$PATH
fi
export PATH=/usr/local/bin:$PATH
export PATH=/usr/local/sbin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

### My file contained the three export lines twice; one copy is enough
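
Redundant entries like the ones noted above are easy to pick up when /etc/profile is edited more than once. One possible guard, sketched here with a hypothetical helper named prepend_once, is to add a directory only when it is not already listed.

```shell
# Prepend DIR to the PATH-like variable VAR unless it is already listed.
# Usage: prepend_once VAR DIR
prepend_once() {
    var=$1 dir=$2
    eval "cur=\${$var}"                      # read the current value of VAR
    case ":$cur:" in
        *":$dir:"*) ;;                       # already present: do nothing
        *) eval "export $var=\"\$dir\${cur:+:\$cur}\"" ;;
    esac
}
```

With this in place, prepend_once PATH /usr/local/sbin or prepend_once LD_LIBRARY_PATH /usr/local/lib can be repeated safely.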

cat /etc/ld.so.conf
include ld.so.conf.d/*.conf
/usr/local/lib

/sbin/ldconfig # rebuild the linker cache; /etc/ld.so.conf is read by default

13. Add a user account

useradd usr_name

torque.setup usr_name

If pbs is already running, stop it with ps -e | grep pbs | awk '{print $1}' | xargs kill and then retry

14. Start pbs

for i in pbs_server pbs_sched pbs_mom trqauthd; do sudo service $i start;done

15. If the nodes data is lost, add it again

vi /var/spool/torque/server_priv/nodes

cat /var/spool/torque/mom_priv/config
$pbsserver *****9
$logevent 255

=========================================================

16. Restart pbs

for i in pbs_server pbs_sched pbs_mom trqauthd; do sudo service $i restart;done

=========================================================

17. Submit a job

$ cat zsleep.sh
while true; do
date >> zzz.test
sleep 5
done

$ qsub zsleep.sh
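
The test script above loops forever and relies on server defaults. As a hedged alternative, the sketch below writes a bounded job script with explicit #PBS directives. The helper name make_job, the job name, and the resource values are illustrative assumptions, not settings taken from this cluster.

```shell
# Write a small PBS job script to FILE; QUEUE defaults to "batch".
# Usage: make_job FILE [QUEUE]
make_job() {
    cat > "$1" <<EOF
#!/bin/bash
#PBS -N zsleep
#PBS -q ${2:-batch}
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
cd "\$PBS_O_WORKDIR"      # qsub starts jobs in \$HOME by default
for i in 1 2 3 4 5 6; do  # bounded loop instead of an endless one
    date >> zzz.test
    sleep 5
done
EOF
}
```

After make_job zsleep_bounded.sh, the generated file can be submitted with qsub zsleep_bounded.sh and should finish on its own after about half a minute.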

 

18. Switch to a queue name of your own

qmgr -c 'print server'
qmgr -c "c q abyss"
qmgr -c "s q abyss queue_type=Execution"
qmgr -c "s q abyss enabled=true"
qmgr -c "s s default_queue=abyss"
qmgr -c "s q abyss started=true"
qmgr -c 'print server'
qmgr -c "s q abyss resources_default.nodes = 1"
qmgr -c 'print server'
qmgr -c 'd q batch'
qmgr -c 'print server'
qmgr -c "s q abyss resources_default.walltime = 1000:00:00"
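
The abbreviated qmgr commands above (c q, s q, s s) can be hard to read back later. The sketch below prints the same kind of sequence in long form for any queue name without executing anything, so it can be reviewed first. queue_commands is a hypothetical helper, and the default walltime simply mirrors the value used in this step.

```shell
# Print the qmgr commands that create an execution queue and make it the default.
# Nothing is executed; pipe the output to "sh" on the server to apply it.
# Usage: queue_commands QUEUE [WALLTIME]
queue_commands() {
    q=$1
    wall=${2:-1000:00:00}
    for cmd in \
        "create queue $q" \
        "set queue $q queue_type = Execution" \
        "set queue $q resources_default.nodes = 1" \
        "set queue $q resources_default.walltime = $wall" \
        "set queue $q enabled = True" \
        "set queue $q started = True" \
        "set server default_queue = $q"
    do
        printf 'qmgr -c "%s"\n' "$cmd"
    done
}
```

For example, queue_commands abyss prints seven lines, and queue_commands abyss | sh would apply them on the server.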

19. If a compute node is down, disable SELinux and turn off the firewall

$ cat /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
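SELINUX=disabled # add this line (and comment out SELINUX=enforcing above so the file has a single SELINUX= setting)

service iptables stop # stop the firewall (a default CentOS 7 install runs firewalld instead: systemctl stop firewalld)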

==========================

The node that was down is now free

==========================

Reboot

===========================

$ pbsnodes -a

===========================

Wait a moment and try again

pbsnodes -a

service iptables stop

pbsnodes -a # it works when checked again.
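
When checking repeatedly whether a node has gone from down to free, the full pbsnodes -a output is noisy. Here is a minimal sketch that reduces it to one "node state" pair per line. It assumes the usual layout, in which the node name sits on an unindented line and its attributes follow on indented "key = value" lines. node_states is a hypothetical name.

```shell
# Reduce "pbsnodes -a"-style text on stdin to "<node> <state>" lines.
node_states() {
    awk '
        /^[^ \t]/          { name = $1 }        # unindented line: node name
        /^[ \t]+state = /  { print name, $3 }   # indented "state = <value>"
    '
}
```

It would be used as pbsnodes -a | node_states, and adding | grep -v ' free$' would leave only the nodes that still need attention.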

 

======================================

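Disclaimer: I wrote this document for my own reference. It may still contain bugs; answers to them can be found in the previous few blog posts.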

posted on 2021-02-26 14:02  Yuan-SW-F(abysw)