HPC: store data in /archive (long term), submit and run jobs, and kill (cancel) jobs

1. Log in

Once your account has been approved, you can access the HPC cluster.

From within the NYU network:

ssh NYUNetID@prince.hpc.nyu.edu
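
If you are logging in from outside the NYU network, connections typically have to go through the HPC gateway (bastion) host first. A minimal sketch, assuming the gateway hostname is gw.hpc.nyu.edu (check the NYU HPC wiki for the current name):

ssh -J NYUNetID@gw.hpc.nyu.edu NYUNetID@prince.hpc.nyu.edu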

 

2. Store Data in '/archive' (long term)

Files on /scratch are NOT backed up. Always back up your important data to /archive. Note that /archive is only available on the HPC login nodes, not on the compute nodes.

 

https://wikis.nyu.edu/display/NYUHPC/Clusters+-+Prince

 

  cd /archive/k/ky13
  pwd

  Data can be saved under /archive/k/ky13; however, jobs cannot be executed successfully by sbatch from /archive.

 

The best practice is: 1) regularly copy the latest data to /archive/ for long-term storage, and 2) copy data from /archive to /scratch/ when you need to execute scripts on it (see the sketch below).

 

Execute scripts from /scratch/ky13 or /home/ky13, not from /archive.
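
A minimal sketch of that workflow, run from a login node (the my_project directory name below is only an example):

# 1) back up the latest results from /scratch to /archive (long-term storage)
cp -r /scratch/ky13/my_project /archive/k/ky13/

# 2) later, copy the data back to /scratch before running scripts on it
cp -r /archive/k/ky13/my_project /scratch/ky13/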

 

 

Transfer files between your computer and the HPC cluster:

  • A File:
scp /Users/local/data.txt NYUNetID@prince.hpc.nyu.edu:/archive/k/NYUNetID/path/
scp /Users/local/data.txt NYUNetID@prince.hpc.nyu.edu:/archive/k/ky13/
  • A Folder:
scp -r /Users/local/path NYUNetID@prince.hpc.nyu.edu:/archive/k/NYUNetID/path/
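
If rsync is available on both your machine and the cluster, it can be used instead of scp; it skips files that are already up to date and can resume interrupted transfers of large folders. A sketch using the same example paths as above:

rsync -avP /Users/local/path/ NYUNetID@prince.hpc.nyu.edu:/archive/k/NYUNetID/path/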

3. Submit job and Run

 

A simple example

A typical batch script on the NYU Prince cluster looks something like the following examples:

 myscript.s

#!/bin/bash
# the above line tells the shell how to execute this script
#
# job-name
#SBATCH --job-name=Scapy
#
# need 4 nodes
#SBATCH --nodes=4
#SBATCH --cpus-per-task=2
#
# expect the job to finish within 20 hours. If it takes longer than 20 hours, SLURM can kill it
#SBATCH --time=20:00:00
#
# expect the job to use no more than 24GB of memory
#SBATCH --mem=24GB
#
# once job ends, send me an email
#SBATCH --mail-type=END
#SBATCH --mail-user=xxx@xx.com
#
# standard output and standard error are directed to separate files
#SBATCH --output=outlog_%A_%a.out
##SBATCH --error=_%A_%a.err
#SBATCH --error=errlog_%A_%a.out
#
# first we ensure a clean running environment:
module purge
mkdir -p py3.6.3
# and load the module for the software we are using:
module load python3/intel/3.6.3
# create a virtual environment so that new libraries can be installed without sudo permissions:
virtualenv --system-site-packages py3.6.3
source py3.6.3/bin/activate
pip3 install pillow
pip3 install scapy

#source py3.6.3/bin/activate /home/ky13/py3.6.3
cd /scratch/ky13/Experiments/xxx/Pcap2sessions_Scapy/3_pcap_parser/
python3 pcap2sessions_scapy.py

 

 You submit the job with sbatch:

$ sbatch myscript.s

And monitor its progress (discussed further at the link below) with:

$ squeue -u $USER

https://wikis.nyu.edu/display/NYUHPC/Submitting+jobs+with+sbatch
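
Besides squeue, a few other standard SLURM commands are handy for checking on a job. A sketch (1234567 is a hypothetical job ID):

$ scontrol show job 1234567                          # detailed info for a pending or running job
$ sacct -j 1234567 -o JobID,Elapsed,MaxRSS,State     # accounting info once the job has started or finished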

 

Two more examples follow; the first runs a Stata job:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=5:00:00
#SBATCH --mem=2GB
#SBATCH --job-name=myTest
#SBATCH --mail-type=END
#SBATCH --mail-user=bob.smith@nyu.edu
#SBATCH --output=slurm_%j.out
  
module purge
module load stata/14.2
RUNDIR=$SCRATCH/my_project/run-${SLURM_JOB_ID/.*}
mkdir -p $RUNDIR
  
DATADIR=$SCRATCH/my_project/data
cd $RUNDIR
stata -b do $DATADIR/data_0706.do

The second example copies its input file into a per-job run directory before executing:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=5:00:00
#SBATCH --mem=2GB
#SBATCH --job-name=myTest
#SBATCH --mail-type=END
#SBATCH --mail-user=bob.smith@nyu.edu
#SBATCH --output=slurm_%j.out
 
module purge
  
SRCDIR=$HOME/my_project/code
RUNDIR=$SCRATCH/my_project/run-${SLURM_JOB_ID/.*}
mkdir -p $RUNDIR
  
cd $SLURM_SUBMIT_DIR
cp my_input_params.inp $RUNDIR
  
cd $RUNDIR
module load fftw/intel/3.3.5
$SRCDIR/my_exec.exe < my_input_params.inp

 


4. Issues

# submit the job to the NYU cluster (HPC)

> sbatch run_scapy_pcap.sh

# check the job

> squeue -u $USER

 

run_scapy_pcap.sh

 

#!/bin/bash
# the above line tells the shell how to execute this script
#
# job-name
#SBATCH --job-name=Scapy
#
# need 4 nodes
#SBATCH --nodes=4
#SBATCH --cpus-per-task=2
#
# expect the job to finish within 40 hours. If it takes longer than 40 hours, SLURM can kill it
#SBATCH --time=40:00:00
#
# expect the job to use no more than 24GB of memory
#SBATCH --mem=24GB
#
# once job ends, send me an email
#SBATCH --mail-type=END
#SBATCH --mail-user=xxx@xxx.com
#
# standard output and standard error are directed to separate files
#SBATCH --output=outlog_%A_%a.out
##SBATCH --error=_%A_%a.err
#SBATCH --error=errlog_%A_%a.out
#
# first we ensure a clean running environment:
module purge
mkdir -p py3.6.3
# and load the module for the software we are using:
module load python3/intel/3.6.3
# create a virtual environment so that new libraries can be installed without sudo permissions:
virtualenv --system-site-packages py3.6.3
source py3.6.3/bin/activate
pip3 install pillow
pip3 install scapy

#source py3.6.3/bin/activate /home/ky13/py3.6.3
cd /scratch/ky13/Experiments/application_classification_project_201806/Pcap2Sessions_Scapy/3_pcap_parser/
python3 pcap2sessions_scapy.py -i '../../VPN_NonVPN_2016_Dataset/' -o './log.txt'

 

Issue 1: the job requested 4 nodes, but it actually ran on only a single node.

 

 

https://bugs.schedmd.com/show_bug.cgi?id=3214#c4


>sacct -o JobID,ReqMem,MaxVMSize,MaxRSS,MaxRSSTask,State,NodeList -j 9281183

 Note: 9281183 is the job ID. The NodeList column shows which node(s) the job actually ran on.
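
sacct can also list several recent jobs at once by filtering on user and start time, which is handy for comparing memory and node usage across runs. A sketch (the date is only an example):

> sacct -u $USER --starttime=2018-10-01 -o JobID,ReqMem,MaxRSS,Elapsed,State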

 

Solution:

    The code is not implemented as a distributed program, so it always runs on only a single node, even when 4 nodes are requested. The fix is to request a single node with more CPUs:

 

# request a single node with more CPUs instead of 4 nodes
##SBATCH --nodes=4
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
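
If the program can use multiple processes or threads on a single node, the allocated CPU count can be read from the SLURM environment and passed to it. A minimal sketch; the --workers option is hypothetical and only works if the script actually accepts such a flag:

#SBATCH --nodes=1
#SBATCH --cpus-per-task=8

# SLURM exports the number of CPUs allocated to this task
echo "CPUs allocated: ${SLURM_CPUS_PER_TASK}"
# hypothetical --workers option; adapt to whatever the program really supports
python3 pcap2sessions_scapy.py --workers "${SLURM_CPUS_PER_TASK}"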

 

To kill a running job, or remove a queued job from the queue, use scancel:

$ scancel jobid

To cancel ALL of your jobs:

$ scancel -u NetID
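
scancel can also target a subset of jobs rather than everything. A sketch (the --name value matches the job-name used in the scripts above):

$ scancel -u $USER --state=PENDING    # cancel only jobs that are still waiting in the queue
$ scancel -u $USER --name=Scapy       # cancel jobs with a specific job name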

 

 

References:

1. https://wikis.nyu.edu/display/NYUHPC/Scratch+area+cleanup

2. https://wikis.nyu.edu/display/NYUHPC/Cancelling+batch+jobs+at+Prince

 

posted on 2018-10-14 23:53 by Quinn-Yann