Building Containers

Reference: https://pawseysc.github.io/containers-bioinformatics-workshop/5.build/index.html

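This post builds images with Docker and then converts them to the Singularity image format with singularity pull.

Goal: build container images

  • RStudio image
  • Conda image
  • An image with software compiled from source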

Using Docker (for better compatibility): the Dockerfile

  • FROM: choose the base image
  • RUN: execute commands
  • ENV: define environment variables
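
As a rough illustration of how the three instructions fit together, the snippet below writes a tiny Dockerfile into a temporary directory without building anything. The base image tag, the installed package and the variable name MY_TOOL_HOME are placeholders chosen for this sketch, not part of the workshop material.

```shell
# Write a minimal Dockerfile to a scratch directory (nothing is built here).
workdir=$(mktemp -d)

cat > "$workdir/Dockerfile" <<'EOF'
# FROM: pick the base image to start from
FROM ubuntu:20.04

# RUN: execute a command while building the image
RUN apt-get update && apt-get -y install wget

# ENV: define an environment variable that persists in the image
ENV MY_TOOL_HOME=/apps
EOF

# Show the instructions that were written.
grep -E '^(FROM|RUN|ENV)' "$workdir/Dockerfile"
```

With Docker installed, running docker build in that directory would turn the file into an image.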

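Key steps

  • docker build: build the image
  • docker run: test and debug
  • docker tag: name the image
  • docker push: share it on a public registry
  • singularity pull (with the docker-daemon: prefix): convert a local Docker image to Singularity format for offline use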
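
The steps above can be strung together in a small script. The sketch below only prints the commands (a dry run), so it can be read and tried without Docker or Singularity installed. The run helper and the repository name myaccount/test-samtools are hypothetical, and the last step uses the docker:// form of singularity pull that appears later in this post.

```shell
# Print each command instead of running it; set DRY_RUN=0 to execute for real.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

image=sam:4                          # local image name and tag
remote=myaccount/test-samtools:1.9   # hypothetical Docker Hub repository

workflow=$(
    run docker build -t "$image" .
    run docker run "$image" samtools
    run docker tag "$image" "$remote"
    run docker push "$remote"
    run singularity pull "docker://$remote"
)
echo "$workflow"
```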

The best place to find useful base images is the Docker Hub online registry. Here is a non-comprehensive list of potentially useful base images (the version tags are not necessarily the most recent ones):

  • OS images, such as ubuntu:18.04, debian:buster and centos:7;
  • R images, in particular the rocker/ repository by the Rocker project:
    • Base R: rocker/r-ver:3.6.1;
    • RStudio: rocker/rstudio:3.6.1;
    • Tidyverse+RStudio: rocker/tidyverse:3.6.1;
  • Python images, such as python:3.8 and python:3.8-slim (a lightweight version);
  • Conda images by Anaconda, such as continuumio/miniconda3:4.8.2;
  • Jupyter images, in particular the jupyter/ repository by Jupyter Docker Stacks
    (unfortunately making extensive use of the latest tag), for instance:
    • Base Jupyter: jupyter/base-notebook:latest;
    • Jupyter with scientific Python packages: jupyter/scipy-notebook:latest;
    • Data science Jupyter, including Python, scientific Python packages, R, Tidyverse and more: jupyter/datascience-notebook:latest.

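RStudio image

$ cd /home/sisih/work/containers-bioinformatics-workshop/exercises/build/r-ggtree

As noted above, we need R and Tidyverse, so we start from the rocker/tidyverse image:

FROM rocker/tidyverse:3.6.1

The ggtree package can be installed from the shell through R:

R -e 'BiocManager::install("ggtree")'

Write this into the Dockerfile:

FROM rocker/tidyverse:3.6.1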

RUN R -e 'BiocManager::install("ggtree")'

Build the image; remember we’re running from the directory where the Dockerfile is (.).

$ sudo docker build -t ggtree:2.0.4 .

Test the installation by printing the ggtree version:

$ sudo docker run ggtree:2.0.4 R -e 'packageVersion("ggtree")'
> packageVersion("ggtree")
[1] ‘2.0.4’

Success!

List the existing images:

$ sudo docker image ls
REPOSITORY         TAG      IMAGE ID       CREATED         SIZE
ggtree             2.0.4    eaef5bae8e39   5 days ago      2.11GB
hello-world        latest   bf756fb1ae65   13 months ago   13.3kB
rocker/tidyverse   3.6.1    5c4c75978566   13 months ago   2.1GB

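By the way, cached images pile up and take a lot of disk space. To delete one:

docker rmi <your-image-id>

Several images can be deleted at once:

docker rmi <your-image-id> <your-image-id> ...

Delete all images in one go:

docker rmi $(docker images -q)

Images can depend on one another, so deletion sometimes has to be forced with -f, for example:

sudo docker rmi -f 378dc77b6f16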

Compiling software into an image: samtools as an example

$ cd ../compile-samtools

Suppose we already know how to install samtools on Ubuntu by compiling it manually:

#!/bin/bash

# Install apt dependencies
apt-get update
apt-get -y install \
gcc \
libbz2-dev \
libcurl4-openssl-dev \
liblzma-dev \
libncurses5-dev \
libncursesw5-dev \
make \
perl \
tar \
vim \
wget \
zlib1g-dev

# Build samtools
mkdir /build
cd /build

wget https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2
tar -vxjf samtools-1.9.tar.bz2

cd samtools-1.9
./configure --prefix=/apps
make
make install

cd htslib-1.9
make
make install
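
  • apt: the Ubuntu package manager; the trailing \ continues a command on the next line for readability
  • /build: create a build directory
  • Download and unpack the source
  • Configure and compile samtools (with /apps as the installation path)
  • Compile htslib (a companion library)

Find the latest Ubuntu version on Docker Hub: https://hub.docker.com/_/ubuntu?tab=tags&page=1&ordering=last_updated

Use the base image ubuntu:20.04

Version 1: install the tool dependencies

Follow the script install_samtools.sh

FROM ubuntu:20.04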

RUN apt-get update
RUN apt-get -y install \
gcc \
libbz2-dev \
libcurl4-openssl-dev \
liblzma-dev \
libncurses5-dev \
libncursesw5-dev \
make \
perl \
tar \
vim \
wget \
zlib1g-dev

$ sudo docker build -t sam:1 .

The build has 3 steps (FROM plus the two RUN instructions)
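
The step count can be checked by counting instructions in the Dockerfile. The sketch below does this for a shortened copy of the version 1 Dockerfile (the package list is trimmed for brevity), assuming only FROM, RUN and ENV instructions occur.

```shell
# Shortened copy of the version 1 Dockerfile, kept in a shell variable.
dockerfile=$(cat <<'EOF'
FROM ubuntu:20.04

RUN apt-get update
RUN apt-get -y install \
    gcc \
    make \
    wget
EOF
)

# Count lines that begin with an instruction keyword; continuation lines
# (after a trailing backslash) belong to the previous instruction.
steps=$(printf '%s\n' "$dockerfile" | grep -cE '^(FROM|RUN|ENV) ')
echo "$steps"
```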

Version 2: install samtools

Copy the remaining commands of the script straight into the Dockerfile, one RUN per line:

[..]

RUN mkdir /build
RUN cd /build
RUN wget https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2
RUN tar -vxjf samtools-1.9.tar.bz2
RUN cd samtools-1.9
RUN ./configure --prefix=/apps
RUN make
RUN make install
RUN cd htslib-1.9
RUN make
RUN make install

Build:

$ sudo docker build -t sam:2 .

  • Thanks to layer caching from the previous docker build, the apt commands finish immediately

Step 1/14 : FROM ubuntu:20.04
---> f63181f19b2f
Step 2/14 : RUN apt-get update
---> Using cache
---> c04e12e376bf
Step 3/14 : RUN apt-get -y install gcc libbz2-dev libcurl4-openssl-dev liblzma-dev libncurses5-dev libncursesw5-dev make perl tar vim wget zlib1g-dev
---> Using cache
---> 79b0541592e2

  • 14 steps in total
  • Step 9 fails

 ---> eb81517ddecb
Step 8/14 : RUN cd samtools-1.9
---> Running in 2109f8f2da98
Removing intermediate container 2109f8f2da98
---> 8d4795073cdc
Step 9/14 : RUN ./configure --prefix=/apps
---> Running in 5bf3b2a258f5
/bin/sh: 1: ./configure: not found
The command '/bin/sh -c ./configure --prefix=/apps' returned a non-zero code: 127

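Each RUN creates a temporary intermediate container.

According to the script, ./configure should be in /build/samtools-1.9, yet it was not found.

To see what happened, use docker run with the -it flags to open an interactive shell in the image produced by the last successful build step. This is an effective way to debug an image build.

Version 3: use && to run commands in the same directory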

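At the end of each step, a line starting with ---> prints the alphanumeric image ID. Take the image ID printed just before the failing step (9/14), 8d4795073cdc (5bf3b2a258f5 is the ID of the container in which the failing step ran, not an image), and start bash in it:

$ sudo docker run -it 8d4795073cdc bash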
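
The image ID passed to docker run above can also be picked out of a saved build log automatically. The sketch below assumes the classic builder output format shown in this post (BuildKit prints its progress differently) and uses a short excerpt of the log as input.

```shell
# Excerpt of the build output shown above, saved as a here-document.
build_log=$(cat <<'EOF'
 ---> eb81517ddecb
Step 8/14 : RUN cd samtools-1.9
 ---> Running in 2109f8f2da98
Removing intermediate container 2109f8f2da98
 ---> 8d4795073cdc
Step 9/14 : RUN ./configure --prefix=/apps
 ---> Running in 5bf3b2a258f5
/bin/sh: 1: ./configure: not found
EOF
)

# Keep only lines of the form "---> <hex id>" and remember the last one.
# "---> Running in ..." lines name containers, not images, and are skipped.
last_image_id=$(printf '%s\n' "$build_log" |
    awk '/^ *---> [0-9a-f]+$/ { id = $2 } END { print id }')

echo "$last_image_id"
```

The printed ID is the image left behind by the last step that succeeded, which is the one to open with docker run -it.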

Now inside a container started from the intermediate image, list /build:

# ls /build

It is empty.

List the root directory:

# ls /
bin dev lib libx32 opt run sbin tmp
boot etc lib32 media proc samtools-1.9 srv usr
build home lib64 mnt root samtools-1.9.tar.bz2 sys var

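Both samtools-1.9.tar.bz2 and samtools-1.9 are in the root directory, whereas the Dockerfile was meant to put them under /build.

Type exit or press Ctrl-D to leave the container's interactive shell.

In a Docker build, every RUN instruction starts from the image's working directory (the root directory / by default, unless WORKDIR is set), not from the last active directory as in a regular Linux shell session.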
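
This behaviour can be imitated without Docker. In the sketch below each sh -c call stands in for one RUN instruction (an analogy only, not what Docker does internally): a cd made in one call does not carry over to the next.

```shell
# Each 'sh -c' starts a fresh shell, much like each RUN starts a fresh container.
start_dir=$(pwd)
scratch=$(mktemp -d)

# "RUN cd <dir>": the directory change is lost when this shell exits.
sh -c "cd '$scratch'"

# "RUN pwd": a new shell, back in the starting directory.
dir_next_step=$(sh -c 'pwd')

echo "started in:   $start_dir"
echo "next step in: $dir_next_step"
```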

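The fix: pack the bash commands that must run from the same directory into a single RUN instruction, joined with the shell operator &&. Note that with && the next command runs only if the previous one finished successfully (without errors). For example:

RUN cd /build && \
    wget https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2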
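
Both properties of && can be tried in a plain shell. In this sketch a single sh -c call plays the role of one RUN instruction, touch stands in for the download, and false stands in for a command that fails; the file names are made up for the demonstration.

```shell
scratch=$(mktemp -d)

# All commands joined by && run in one shell, so the cd is still in effect for touch.
sh -c "cd '$scratch' && touch downloaded.tar.bz2"
ls "$scratch"

# If an earlier command fails, the later ones are skipped
# and the whole chain reports failure.
if sh -c "cd '$scratch' && false && touch should_not_exist"; then
    chain_status=ok
else
    chain_status=failed
fi
echo "chain status: $chain_status"
```

In a Dockerfile, a failed chain makes the RUN step fail, which stops the build at the first error.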

Modify the Dockerfile:

[..]

RUN mkdir /build
RUN cd /build && \
wget https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2 && \
tar -vxjf samtools-1.9.tar.bz2 && \
cd samtools-1.9 && \
./configure --prefix=/apps && \
make && \
make install && \
cd htslib-1.9 && \
make && \
make install

$ sudo docker build -t sam:3 .

samtools is compiled successfully inside the container image:

Successfully built 177fe64fbc1d
Successfully tagged sam:3

Test running samtools from the container image with docker run:

$ sudo docker run sam:3 samtools
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "samtools": executable file not found in $PATH: unknown.

Error: the samtools executable cannot be found in $PATH.

Version 4: locate the samtools executable and add it to PATH

Open an interactive shell again, this time with docker run -it directly on the image sam:3:

$ sudo docker run -it sam:3 bash

From ./configure --prefix=/apps we know the package was installed under /apps.

Use ls to inspect that directory, for instance looking for a bin subdirectory, the usual place for executables.

$ sudo docker run -it sam:3 bash
root@2f0ea0b9beba:/# ls /apps
bin include lib share
root@2f0ea0b9beba:/# ls /apps/bin/
ace2sam interpolate_sam.pl plot-bamstats soap2sam.pl
bgzip maq2sam-long psl2sam.pl tabix
blast2sam.pl maq2sam-short sam2vcf.pl varfilter.py
bowtie2sam.pl md5fa samtools wgsim
export2sam.pl md5sum-lite samtools.pl wgsim_eval.pl
htsfile novo2sam.pl seq_cache_populate.pl zoom2sam.pl

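/apps/bin/ contains the executables, including samtools.

Type exit or press Ctrl-D to close the interactive session.

The last step is therefore to add /apps/bin/ to $PATH.

Writing RUN export PATH=/apps/bin:$PATH in the Dockerfile would not work: the setting is lost as soon as that RUN step ends (the same issue as with the directory earlier). The correct way is to declare the variable with Docker's own ENV instruction:

[..]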

ENV PATH=/apps/bin:$PATH
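
The difference between the two approaches can be imitated in a plain shell. In this sketch, separate sh -c calls stand in for separate RUN steps, and a variable passed in the environment of the command stands in for ENV; it is an analogy, not Docker's actual mechanism.

```shell
# "RUN export PATH=/apps/bin:$PATH" followed by a second RUN:
# the export happens in a shell that exits straight away.
sh -c 'export PATH=/apps/bin:$PATH'
path_after_export=$(sh -c 'echo "$PATH"')

# ENV behaves more like setting the variable for every later command.
path_with_env=$(PATH=/apps/bin:$PATH sh -c 'echo "$PATH"')

case "$path_after_export" in
    /apps/bin:*) echo "export persisted" ;;
    *)           echo "export was lost" ;;
esac
case "$path_with_env" in
    /apps/bin:*) echo "ENV-style setting is visible" ;;
    *)           echo "ENV-style setting is missing" ;;
esac
```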

One last time:

$ sudo docker build -t sam:4 .

Since every step before the ENV instruction is cached, the build is very fast. Test:

$ sudo docker run sam:4 samtools

Program: samtools (Tools for alignments in the SAM format)
Version: 1.9 (using htslib 1.9)

Usage: samtools <command> [options]

Success!

Conda image

$ cd ../conda-samtools

Find miniconda on Docker Hub: https://hub.docker.com/r/continuumio/miniconda3/tags?page=1&ordering=last_updated

FROM continuumio/miniconda3:4.8.2

RUN conda install -y -c bioconda samtools=1.9

Note the -y flag: it answers conda's confirmation prompt automatically, since a build cannot take interactive input.

$ sudo docker build -t samconda:1 .

Successfully built 672d5f0e8429
Successfully tagged samconda:1

Test:

$ sudo docker run samconda:1 samtools

Program: samtools (Tools for alignments in the SAM format)
Version: 1.9 (using htslib 1.9)

Usage: samtools <command> [options]

Success!

Sharing and converting images

Uploading an image to Docker Hub

First, remember to log in! Otherwise the later docker push will fail with:

denied: requested access to the resource is denied

$ sudo docker login
username: sisih
password : secret~

WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

This means there is no need to log in again next time.

Give the image a new name that includes our Docker Hub account, using docker tag:

$ sudo docker tag sam:4 sisih/test-samtools:1.9

Upload the renamed image with docker push:

$ sudo docker push sisih/test-samtools:1.9
The push refers to repository [docker.io/sisih/test-samtools]
f049843be2d1: Pushed
9e67c05ad194: Pushed
432cb97b4f75: Pushed
ef88f84d30ae: Pushed
02473afd360b: Mounted from library/ubuntu
dbf2c0f42a39: Mounted from library/ubuntu
9f32931c9d28: Mounted from library/ubuntu
1.9: digest: sha256:0f842cb99ed65158157fdbec4d0f4744e4dd34e6cf4da41624b4a9a44fa956f2 size: 1786

Success!

You are welcome to take a look at my profile: sisih’s Profile (docker.com)

Converting to the Singularity image format

$ singularity pull docker://sisih/test-samtools:1.9

INFO:    Creating SIF file...

The SIF file is now there:

$ ls
best_practices install_samtools.sh test-samtools_1.9.sif
Dockerfile solutions

Test:

$ singularity exec test-samtools_1.9.sif samtools
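
The name of the SIF file follows from the image reference. The sketch below derives it with shell parameter expansion, assuming Singularity's default naming of <name>_<tag>.sif when no output file is given to singularity pull.

```shell
# Image reference as used with 'singularity pull docker://...'
image_ref="sisih/test-samtools:1.9"

name_and_tag=${image_ref##*/}   # strip the account: test-samtools:1.9
name=${name_and_tag%%:*}        # test-samtools
tag=${name_and_tag##*:}         # 1.9

sif_file="${name}_${tag}.sif"
echo "$sif_file"
```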

Success!

Notes

The root filesystem of a built image is read-only, so to modify files inside it, copy them out of the container first.