CPUs, cores and threads in HPC: #SBATCH setup

Posted on 2020-08-20 Edited on 2021-10-24

A cluster has compute nodes and login nodes. Take our lab's cluster as an example:

[huangsisi@tc6000 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
sugon*       up   infinite      1   idle node1
low          up   infinite      1   idle node1
download     up   infinite      1   idle tc6000

-N: node-oriented output
-l: long format with more information: number of CPUs, memory, temporary disk (also called scratch space), node weight (an internal parameter specifying preferences in nodes for allocations when there are multiple possibilities), features of the nodes (such as processor type for instance) and the reason, if any, for which a node is down.

[huangsisi@tc6000 ~]$ sinfo -Nl
Sun Aug 16 10:38:15 2020
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
node1          1    sugon*        idle  144   4:18:2 206359        0      1   (null) none
node1          1       low        idle  144   4:18:2 206359        0      1   (null) none
tc6000         1  download        idle   12   1:12:1  63889        0      1   (null) none

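node1 is the compute node (no internet access) and tc6000 is the login node (with internet access). Jobs are submitted with sbatch, and Slurm then schedules and runs them on the compute node.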
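When only the per-node figures matter, a small awk filter can pull the NODELIST, CPUS, S:C:T and MEMORY columns out of the sinfo -Nl listing above. This is a minimal sketch that assumes the column layout shown here (other sinfo versions or format options may differ); the function name sinfo_nodes is made up for illustration.

```shell
# Hypothetical helper: print "NODE CPUS S:C:T MEMORY" once per node from
# `sinfo -Nl` output on stdin. Assumes the column order shown above; nodes that
# appear in several partitions (node1 is in both sugon and low) are printed once.
sinfo_nodes() {
  awk '/^NODELIST/ { in_table = 1; next }          # skip the date line and the header
       in_table && !seen[$1]++ { print $1, $5, $6, $7 }'
}
```

On the login node it would be used as sinfo -Nl | sinfo_nodes.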

sinfo information

CPU, core and threads

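The number of CPU cores is the number of cores that physically exist in the hardware; a dual-core CPU, for example, contains two relatively independent core units. The thread count is a logical concept: with hyper-threading, each physical core is presented to the operating system as several logical CPUs.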

Total cores = physical CPUs $\times$ cores per physical CPU
Total logical CPUs = physical CPUs $\times$ cores per physical CPU $\times$ threads per core
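The two formulas can be checked directly against the S:C:T column of sinfo (sockets, i.e. physical CPUs : cores per socket : threads per core). A minimal sketch; the helper name stc_totals is hypothetical.

```shell
# Hypothetical helper: total cores and logical CPUs from a Slurm "S:C:T" string.
stc_totals() {
  local s c t rest
  s=${1%%:*}       # sockets (physical CPUs): text before the first ':'
  rest=${1#*:}     # remaining "C:T"
  c=${rest%%:*}    # cores per socket
  t=${rest#*:}     # threads per core
  echo "cores=$(( s * c )) logical=$(( s * c * t ))"
}
```

For node1, stc_totals 4:18:2 prints cores=72 logical=144; for tc6000, stc_totals 1:12:1 prints cores=12 logical=12.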

tc6000

12 logical CPUs = 1 physical CPU $\times$ 12 cores $\times$ 1 thread

node1

144 logical CPUs = 4 physical CPUs $\times$ 18 cores $\times$ 2 threads

We will need some of node1's figures later when setting up sbatch jobs.

Memory is reported in MB by default. For node1, 206359 MB is about 201 GB, but we actually have 2 TB... presumably it is not fully displayed.

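To get a feel for the orders of magnitude: when you look at file information with ll (usually an alias for ls -l), sizes are shown in bytes unless you add -h.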

1 byte = 8 bits
1 KB = 1024 bytes
1 MB = 1024 KB = 1,048,576 bytes
1 GB = 1024 MB = 1,073,741,824 bytes
1 TB = 1024 GB = 1,099,511,627,776 bytes
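The same 1024-based steps can be done with shell integer arithmetic. A minimal sketch; the helper names are made up for illustration.

```shell
# Hypothetical helpers using the 1024-based units listed above.
mb_to_gb()    { echo $(( $1 / 1024 )); }                # whole GB, rounded down
tb_to_mb()    { echo $(( $1 * 1024 * 1024 )); }
gb_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }
```

mb_to_gb 206359 prints 201, matching the estimate for node1 above, and tb_to_mb 2 prints 2097152, which is about what the MEMORY column would show if 2 TB were configured for the node.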

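We have 3 partitions, sugon, low and download, all defined by ourselves. We usually submit jobs to sugon; the partition marked with an asterisk is the default one for submission.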

A partition is a set of compute nodes (computers dedicated to … computing,) grouped logically. Typical examples include partitions dedicated to batch processing, debugging, post processing, or visualization.

Checking the time limit: since only our lab uses this cluster, it is infinite, which is really comfortable QVQ...


$ sinfo --partition sugon
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
sugon*       up   infinite      1   idle node1

SBATCH parameter settings

Slurm_Cheat_Sheet.pdf

With the information above, these parameters can be set more sensibly:

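#SBATCH --partition: the partition to submit to; we use sugon
#SBATCH --nodelist: run the job on the specified node(s)
#SBATCH --exclude: keep the job off certain nodes, e.g. cpu08 on the research institute's cluster, which seemed to be down recently
#SBATCH --nodes: number of nodes to use; we only have 1
#SBATCH --ntasks: number of tasks, which may be spread over different nodes
#SBATCH --ntasks-per-node: number of tasks per node; since we have only 1 node, ntasks and ntasks-per-node are the same here
#SBATCH --cpus-per-task: number of CPUs (cores) each task uses (default: 1 per task); all CPUs of one task are on the same node
#SBATCH --mem: memory required per node, i.e. for the whole job on our single node (in MB by default)
#SBATCH --mem-per-cpu: memory required per allocated CPU (in MB by default)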
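With --mem-per-cpu, the memory a job asks for grows with the number of CPUs, so it can be worth checking a request against node1's figures in sinfo (144 CPUs, 206359 MB) before submitting. The helper below is hypothetical and assumes Slurm allocates each of the 144 logical CPUs reported by sinfo as one CPU; how CPUs are counted can depend on the cluster's configuration.

```shell
# Hypothetical helper: does a single-node request fit on a node?
# Arguments: ntasks, cpus-per-task, mem-per-cpu (MB), node CPUS, node MEMORY (MB);
# the last two come from the CPUS and MEMORY columns of `sinfo -Nl`.
fits_node() {
  cpus=$(( $1 * $2 ))       # CPUs requested in total
  mem=$(( cpus * $3 ))      # MB requested in total
  echo "cpus=${cpus} mem_mb=${mem}"
  [ "$cpus" -le "$4" ] && [ "$mem" -le "$5" ]   # exit status 0 if both fit
}
```

For the multithreading example later in this post (1 task, 16 CPUs, 2000 MB per CPU), fits_node 1 16 2000 144 206359 prints cpus=16 mem_mb=32000 and succeeds.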

In Slurm, a task is understood as a process, and a multi-process program consists of several tasks. A multithreaded program, by contrast, has only one task, but that task uses several logical CPUs.
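In #SBATCH terms, the two cases just described look roughly like this. This is a sketch, not taken from the original post: the two headers are alternatives (use one or the other in a script), and each asks for 4 CPUs in total.

```shell
# Option A, multi-process (e.g. an MPI program started with srun):
# 4 tasks = 4 processes, 1 CPU each
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1

# Option B, multithreaded (e.g. bowtie2 -p 4):
# 1 task = 1 process using 4 CPUs
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
```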

For a better understanding of ntasks, see what-does-the-ntasks-or-n-tasks-does-in-slurm:

The --ntasks parameter is useful if you have commands that you want to run in parallel within the same batch script. This may be two separate commands separated by an & or two commands used in a bash pipe |.

Example 1

#!/bin/bash

#SBATCH --ntasks=1

srun sleep 10 & 
srun sleep 12 &
wait

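This takes 22 s in total and prints the warning Job step creation temporarily disabled, retrying, because the allocation holds only one task and the second srun has to wait for the first to finish. With #SBATCH --ntasks=2 instead, the two tasks run at the same time and the script takes 12 s; each srun should then ask for a single task (srun --ntasks=1 ...), since by default srun launches as many tasks as the allocation has.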
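A sketch of the corrected example, based on the description above: two tasks in the allocation and one task per srun step. The run_step wrapper and the timing lines are additions for illustration (not from the linked answer) so the script can also be dry-run outside Slurm, where the commands simply run as local background jobs. Depending on the Slurm version, steps may additionally need srun --exact to share an allocation this way; check that on your cluster.

```shell
#!/bin/bash
#SBATCH --ntasks=2

# Hypothetical wrapper: inside a Slurm job, launch the command as a job step
# that uses 1 of the 2 tasks; outside Slurm, just run it locally.
run_step() {
  if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun --ntasks=1 "$@"
  else
    "$@"
  fi
}

start=$(date +%s)
run_step sleep 10 &
run_step sleep 12 &
wait                                    # wait for both background steps
elapsed=$(( $(date +%s) - start ))
# about 12 s if the two steps overlap, about 22 s if they run one after the other
echo "elapsed: ${elapsed} s"
```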

Example 2

#!/bin/bash
#SBATCH --ntasks=8
## more options
echo hello

The script above prints only one line; with srun, as below, it prints 8 lines (one per task).

#!/bin/bash
#SBATCH --ntasks=8
## more options
srun echo hello

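So if we want a tool such as samtools to use more threads, what we should change is #SBATCH --cpus-per-task, and we must also set samtools' own thread option in the script accordingly; otherwise the extra CPUs are allocated but sit idle.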

Multithreading example

Use the ${SLURM_CPUS_PER_TASK} variable to pass the number of cores in the job to commands that accept an argument for the number of cores, CPUs or threads.
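A small sketch of that advice with a fallback: SLURM_CPUS_PER_TASK is only set when --cpus-per-task is given, so defaulting to 1 keeps a script usable when that option is missing or when it is tested outside Slurm. The helper name slurm_threads is hypothetical.

```shell
# Hypothetical helper: number of threads to hand to tools such as bowtie2 -p.
# Falls back to 1 when SLURM_CPUS_PER_TASK is unset.
slurm_threads() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}
```

It would be used as, for example, bowtie2 -p "$(slurm_threads)" ... in place of the bare variable.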

For job arrays, the default file name is “slurm-%A_%a.out”, “%A” is replaced by the job ID and “%a” with the array index. For other jobs, the default file name is “slurm-%j.out”, where the “%j” is replaced by the job ID.
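To preview what a pattern such as the one in the script below expands to, here is a hypothetical helper that substitutes only %x (job name), %A (array master job ID), %a (array index) and %N (node name). Slurm supports more replacement symbols than this, and the values passed in are made up.

```shell
# Hypothetical helper: expand a subset of Slurm's output file-name patterns.
# Arguments: pattern, job name, job ID, array index, node name.
slurm_log_name() {
  printf '%s\n' "$1" | sed \
    -e "s/%x/$2/g" \
    -e "s/%A/$3/g" \
    -e "s/%a/$4/g" \
    -e "s/%N/$5/g"
}
```

For example, slurm_log_name '%x_%A.%a_%N.out' test 1234 7 node1 prints test_1234.7_node1.out.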

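#!/bin/bash
#SBATCH --job-name=test
# log file name: job name (%x), array master job ID (%A), array index (%a), node name (%N)
#SBATCH -o %x_%A.%a_%N.out
#SBATCH --account=huangsisi
#SBATCH --partition=sugon
#SBATCH --nodes=1
# one task (a single process) that gets 16 CPUs for its threads
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
# 2000 MB per CPU, i.e. 16 x 2000 = 32000 MB for the job
#SBATCH --mem-per-cpu=2000
#SBATCH --time=400:00:00
#SBATCH --mail-user=sisih@mit.edu
#SBATCH --mail-type=ALL

# activate the conda environment "rna" (newer conda releases use: conda activate rna)
source activate rna

InDir="/public/home/shared_lab/first11/Fasta_file"

# build a bowtie2 index with the prefix "index" from the reference sequence
bowtie2-build NC_045512.2.fna index
# align the paired-end reads, one thread per allocated CPU
bowtie2 -p "${SLURM_CPUS_PER_TASK}" -x index \
   -1 "$InDir/P3-VERO-P3-1-vero_L4_1.fq.gz" \
   -2 "$InDir/P3-VERO-P3-1-vero_L4_2.fq.gz" -S out.sam

# -F 12 drops reads that are unmapped (flag 4) or whose mate is unmapped (flag 8)
samtools view -@"${SLURM_CPUS_PER_TASK}" -F 12 -bS out.sam > tmp.bam
# samtools sort -@"${SLURM_CPUS_PER_TASK}" tmp.bam > tmp.sort.bam
# bam to fastq
# samtools fastq has no thread argument
#  -1 FILE   write paired reads flagged READ1 to FILE
#  -2 FILE   write paired reads flagged READ2 to FILE
#  -n        keep read names as they are (no /1 and /2 suffixes)
samtools fastq -1 r1.fq -2 r2.fq -n tmp.bam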

Checking the hardware from the shell (Ubuntu-based system)

All CPU information

less -SN /proc/cpuinfo

Number of physical CPUs

[huangsisi@tc6000 ~]$ grep "physical id" /proc/cpuinfo| sort -u| wc -l
2
[huangsisi@tc6000 ~]$ grep "physical id" /proc/cpuinfo| sort -u
physical id     : 0
physical id     : 1

Number of cores per physical CPU

[huangsisi@tc6000 ~]$ grep "cpu cores" /proc/cpuinfo| uniq
cpu cores       : 12

or

[huangsisi@tc6000 ~]$ grep 'core id' /proc/cpuinfo| sort -u| wc -l
12

Number of logical CPUs

[huangsisi@tc6000 ~]$ grep "processor" /proc/cpuinfo| wc -l
48

? That does not match sinfo. It looks like:

48 logical CPUs = 2 physical CPUs $\times$ 12 cores $\times$ 2 threads

I'm inclined to trust this more than sinfo... what a nice surprise!

Next, let's check the compute node.

# log in to the compute node
[huangsisi@tc6000 ~]$ ssh node1
Last login: Mon Jul 27 15:31:57 2020
# number of physical CPUs
[huangsisi@node1 ~]$ grep "physical id" /proc/cpuinfo| sort -u| wc -l
4
[huangsisi@node1 ~]$ grep "physical id" /proc/cpuinfo| sort -u
physical id     : 0
physical id     : 1
physical id     : 2
physical id     : 3
# number of cores per physical CPU
[huangsisi@node1 ~]$ grep "cpu cores" /proc/cpuinfo| uniq
cpu cores       : 18
[huangsisi@node1 ~]$ grep 'core id' /proc/cpuinfo| sort -u| wc -l
18
# number of logical CPUs
[huangsisi@node1 ~]$ grep "processor" /proc/cpuinfo| wc -l
144

node1 agrees with what sinfo reported earlier.
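The grep pipelines above can be folded into one awk pass that prints the topology in the same S:C:T form sinfo uses, which makes comparing the two quick. A minimal sketch, assuming an x86-style /proc/cpuinfo with physical id and core id fields (some architectures omit them). It counts distinct (physical id, core id) pairs rather than distinct core id values, in case core ids are not contiguous. The function name cpu_topology is hypothetical; where available, lscpu reports the same figures as Socket(s), Core(s) per socket and Thread(s) per core.

```shell
# Hypothetical helper: print sockets:cores_per_socket:threads_per_core from a
# /proc/cpuinfo-style file (default /proc/cpuinfo; "-" reads stdin).
cpu_topology() {
  awk -F': *' '
    /^processor/   { logical++ }
    /^physical id/ { phys = $2
                     if (!(phys in sockets)) { sockets[phys] = 1; nsock++ } }
    /^core id/     { key = phys ":" $2
                     if (!(key in cores)) { cores[key] = 1; ncore++ } }
    END {
      if (nsock == 0 || ncore == 0) exit 1   # fields missing: cannot tell
      printf "%d:%d:%d\n", nsock, ncore / nsock, logical / ncore
    }' "${1:-/proc/cpuinfo}"
}
```

Given the grep results above, it should print 4:18:2 on node1 and 2:12:2 on tc6000.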