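Job scheduling system configuration summary (unverified version)
The verified version will be published on the WeChat official account once it has been verified.
This chapter summarizes the issues from the previous four chapters so that they can be verified later.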
===============================let's begin=============================
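Note: Torque does not support CentOS 8.
1. Install CentOS 7 # burn the installation media, then install
2. Update the yum repositories
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos7.repo # or find the Base repo file in the bak folder under this directory and copy it back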
yum clean all
yum makecache
3. Install gcc and other dependencies
yum install gcc gcc-c++ tcl-devel tk-devel make -y
yum clean all
4. Install other related programs (see GitHub: Yuan-SW-F) # not urgent
yum install vim -y
5. Mount the disk
mount /dev/sdb1 /abyss
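6. Download and build Torque
cd /path
mkdir Ypipe; cd Ypipe
mkdir soft; cd soft
wget https://src.fedoraproject.org/lookaside/pkgs/torque/torque-6.1.1.1.tar.gz/sha512/74ff683f56d04a4d08774896c9f9875c68aa2cacfe6c1c8c65246da52396443d3f7497bc8a6a1f06d357f52c65153fc9db00692f514ac30279e4c765547d98c0/torque-6.1.1.1.tar.gz
tar -xzf torque-6.1.1.1.tar.gz # unpack the source before building
cd torque-6.1.1.1
yum install libxml2-devel openssl-devel gcc gcc-c++ boost-devel libtool -y
./configure
make
make install
make packages # generates the torque-package-*.sh installers that step 11 copies to the compute nodes
Installation succeeded!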
7. Check node information
cat /proc/cpuinfo # used later for the node configuration (np values)
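The np value written into the nodes file in step 12 is normally the number of logical CPUs on the node. A minimal sketch for reading it off, assuming the usual /proc/cpuinfo layout with one "processor" line per logical CPU; the helper name count_cpus is hypothetical:

```shell
# count_cpus: hypothetical helper that counts "processor" entries in a
# cpuinfo-style file (defaults to /proc/cpuinfo).
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Print a line in the format used by server_priv/nodes, e.g. "myhost np=12".
if [ -r /proc/cpuinfo ]; then
    echo "$(uname -n) np=$(count_cpus)"
fi
```

The printed line can be pasted into /var/spool/torque/server_priv/nodes on the server, adjusting np if some cores should be held back.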
8. Change the hostname
cat /etc/sysconfig/network
# Created by anaconda
*****9
9. Set the system time zone
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime # -f replaces an existing /etc/localtime
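10. Set up SSH public keys
ssh-keygen -t rsa # press Enter three times to accept the defaults
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys # do this on every node
Merge the public keys of all nodes into every node's authorized_keys to enable passwordless login.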
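One way to do the merge is sketched below, assuming each node's id_rsa.pub has already been collected into a local directory. The helper name merge_keys and the keys/ directory are hypothetical; ssh-copy-id is an alternative when nodes can still log in to each other with a password.

```shell
# merge_keys: hypothetical helper. Concatenates the given public-key files,
# drops blank lines and duplicate keys, and prints the result on stdout.
merge_keys() {
    cat "$@" | grep -v '^[ \t]*$' | sort -u
}

# Example (paths are placeholders): build one authorized_keys file from the
# collected keys, then copy it to ~/.ssh/ on every node.
# merge_keys keys/*.pub > authorized_keys
# chmod 600 authorized_keys
```

sshd rejects an authorized_keys file with loose permissions when StrictModes is on, which is why the chmod is included.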
11. Configure the compute nodes
Copy the files to the compute nodes:
for i in *** *** ;do
scp /chenlab/Ypipe/soft/torque-6.1.1.1/contrib/init.d/{pbs_mom,trqauthd} root@$i:/etc/init.d/
scp /chenlab/Ypipe/soft/torque-6.1.1.1/{torque-package-clients-linux-x86_64.sh,torque-package-mom-linux-x86_64.sh} root@$i:/root
done
Run the following on each node individually:
cd /root
./torque-package-clients-linux-x86_64.sh --install
./torque-package-mom-linux-x86_64.sh --install
12. Configure several files
cat /var/spool/torque/server_name
*****9
cat /var/spool/torque/server_priv/nodes
*****9 np=12
*****8 np=122
cat /etc/hosts
192.168.*****9 *****9
192.168.*****8 *****8
cat /var/spool/torque/mom_priv/config
$pbsserver *****9
$logevent 255
vi /etc/profile
Add the following variables:
TORQUE=/usr/local
MAUI=/usr/local
if [ `id -u` -eq 0 ]; then
PATH=$TORQUE/bin:$TORQUE/sbin:$MAUI/bin:$MAUI/sbin:$PATH
else
PATH=$TORQUE/bin:$MAUI/bin:$PATH
fi
export PATH=/usr/local/bin:$PATH
export PATH=/usr/local/sbin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
cat /etc/ld.so.conf
include ld.so.conf.d/*.conf
/usr/local/lib
/sbin/ldconfig # rebuilds the linker cache; it reads /etc/ld.so.conf by default
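Every hostname in server_priv/nodes has to resolve, so a typo between that file and /etc/hosts is a common reason for a node not showing up. A minimal consistency check, written as a sketch; the helper name missing_hosts is hypothetical and it only looks at the two files, not at DNS:

```shell
# missing_hosts: hypothetical helper. Prints every hostname listed in a
# Torque nodes file that has no entry in a hosts file.
# Usage: missing_hosts NODES_FILE HOSTS_FILE
missing_hosts() {
    awk '
        FILENAME == ARGV[1] {            # first file: the hosts file
            sub(/#.*/, "")               # strip comments
            for (i = 2; i <= NF; i++) known[$i] = 1
            next
        }
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # nodes file: skip comments and blanks
        !($1 in known) { print $1 }
    ' "$2" "$1"
}

# Usage on the server:
# missing_hosts /var/spool/torque/server_priv/nodes /etc/hosts
```

No output means every node name has a matching hosts entry.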
13. Add a user account
useradd usr_name
torque.setup usr_name
If a pbs process is already running, kill it first (for example: pkill pbs) and then retry.
14. Start pbs
for i in pbs_server pbs_sched pbs_mom trqauthd; do sudo service $i start;done
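After starting the services it helps to confirm that the daemons are actually alive. A small sketch, assuming pgrep is available; the function name daemon_status is hypothetical. On a compute node only pbs_mom and trqauthd are expected to run, so "not running" for pbs_server and pbs_sched is normal there.

```shell
# daemon_status: prints one line per Torque daemon saying whether a process
# with exactly that name is currently running on this machine.
daemon_status() {
    for d in pbs_server pbs_sched pbs_mom trqauthd; do
        if pgrep -x "$d" >/dev/null 2>&1; then
            echo "$d: running"
        else
            echo "$d: not running"
        fi
    done
}

daemon_status
```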
15. If the nodes data is lost, add it again
vi /var/spool/torque/server_priv/nodes
cat /var/spool/torque/mom_priv/config
$pbsserver *****9
$logevent 255
=========================================================
16. Restart pbs
for i in pbs_server pbs_sched pbs_mom trqauthd; do sudo service $i restart;done
=========================================================
17. Submit a job
$ cat zsleep.sh
while true; do
    date >> zzz.test
    sleep 5
done
$ qsub zsleep.sh
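The test job above loops forever and has to be removed with qdel. A bounded variant with explicit resource requests is sketched below. The #PBS directives are standard qsub options, but the values (job name, nodes=1:ppn=1, the five-minute walltime) are illustrative, and ITER and INTERVAL are hypothetical variables introduced here only so that the loop terminates.

```shell
# Write a bounded variant of zsleep.sh into the current directory.
cat > zsleep.sh <<'EOF'
#!/bin/sh
#PBS -N zsleep
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
# Torque starts jobs in the home directory; PBS_O_WORKDIR is where qsub ran.
cd "${PBS_O_WORKDIR:-.}" || exit 1
i=0
while [ "$i" -lt "${ITER:-12}" ]; do
    date >> zzz.test
    sleep "${INTERVAL:-5}"
    i=$((i + 1))
done
EOF

# Submit with: qsub zsleep.sh
```

With the defaults the job appends twelve timestamps to zzz.test over about a minute and then exits, which is enough to see it pass through qstat.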
18. Switch to a queue name of your own
qmgr -c 'print server'
qmgr -c "c q abyss"
qmgr -c "s q abyss queue_type=Execution"
qmgr -c "s q abyss enabled=true"
qmgr -c "s s default_queue=abyss"
qmgr -c "s q abyss started=true"
qmgr -c 'print server'
qmgr -c "s q abyss resources_default.nodes = 1"
qmgr -c 'print server'
qmgr -c 'd q batch'
qmgr -c 'print server'
qmgr -c "s q abyss resources_default.walltime = 1000:00:00"
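The same queue setup can be collected into one reviewable list. The sketch below prints the directives in their long form (c q is create queue, s q is set queue, s s is set server) and relies on qmgr reading directives from standard input. The helper name queue_directives is hypothetical, and the queue name and walltime are just the values used above.

```shell
# queue_directives: hypothetical helper that prints the qmgr directives for
# an execution queue and makes it the server's default queue.
# Usage: queue_directives NAME [WALLTIME]
queue_directives() {
    q=$1
    walltime=${2:-1000:00:00}
    cat <<EOF
create queue $q
set queue $q queue_type = Execution
set queue $q resources_default.nodes = 1
set queue $q resources_default.walltime = $walltime
set queue $q enabled = True
set queue $q started = True
set server default_queue = $q
EOF
}

# Review the output first, then apply it on the server with:
# queue_directives abyss | qmgr
queue_directives abyss
```

Deleting the old batch queue is left out on purpose, so that step stays a deliberate manual command.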
19. If a compute node is down, disable SELinux and stop the firewall
$ cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
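# Add the following line (or change SELINUX=enforcing above to disabled); it takes effect after a reboot
SELINUX=disabled
service iptables stop # stop the firewall; if the node runs firewalld (the CentOS 7 default), use: systemctl stop firewalld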
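Instead of appending a second SELINUX= line, the existing one can be rewritten in place. A minimal sketch using GNU sed; the helper name disable_selinux_in is hypothetical, and it should be tried on a copy of the file first:

```shell
# disable_selinux_in: hypothetical helper that rewrites the SELINUX= line of
# the given config file in place. SELINUXTYPE= is left untouched because the
# pattern requires "=" directly after SELINUX.
disable_selinux_in() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# On the node itself (as root):
# disable_selinux_in /etc/selinux/config   # takes effect after a reboot
# setenforce 0                             # permissive mode until the reboot
```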
==========================
The node that was down is now free
==========================
Reboot
===========================
$ pbsnodes -a
===========================
Wait a moment and try again
pbsnodes -a
service iptables stop
pbsnodes -a # on the next check the node was fine
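When repeating this check across several nodes, the full pbsnodes output is noisy. The sketch below reduces it to one "name state" line per node. It assumes the usual layout, where node names start in the first column and attributes are indented "key = value" lines; the function name node_states is hypothetical.

```shell
# node_states: reads `pbsnodes -a` style output on stdin and prints
# "name state" for each node.
node_states() {
    awk '
        /^[^ \t]/ { node = $1 }                          # unindented line: node name
        $1 == "state" && $2 == "=" { print node, $3 }    # indented "state = ..." line
    '
}

# Usage on the server: pbsnodes -a | node_states
```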
======================================
Disclaimer: these notes are for my own reference and may still contain bugs; answers to those bugs can be found in the previous blog posts.
posted on 2021-02-26 14:02 Yuan-SW-F(abysw)