According to the NCBI manual, the makeblastdb application produces BLAST databases from FASTA files, and every sequence should be assigned a unique identifier:
the header line must begin with “>”
the identifier must not contain “|”
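You can check a file against these two rules before running makeblastdb with a short awk pass over the header lines. This is my own sketch, not an NCBI tool. The helper name check_fasta_ids and the toy input are hypothetical, and the sketch assumes the ID is the first whitespace-separated token after “>”.

```shell
#!/usr/bin/env sh
# check_fasta_ids (hypothetical helper): report problems related to the two rules
#   - line 1 should be a header, i.e. start with ">"
#   - the ID (first token after ">") should not contain "|"
check_fasta_ids() {
  awk '
    NR == 1 && !/^>/ { print "1: first line does not start with >" }
    /^>/ {
      id = substr($1, 2)              # drop the leading ">"
      if (index(id, "|") > 0)         # "|" found inside the ID
        print NR ": " id
    }
  ' "$1"
}

# Toy input (hypothetical): the first header uses "|", the second one is fine
tmp=$(mktemp)
printf '>CP009803.1|CM002272.1|CP029339.1\nACGT\n>seq2 description\nGGCC\n' > "$tmp"
result=$(check_fasta_ids "$tmp")
echo "$result"   # 1: CP009803.1|CM002272.1|CP029339.1
rm -f "$tmp"
```

On real data this would be `check_fasta_ids dr.fsa`. Each offending header is printed with its line number.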
In the FASTA file of reads I need to build a database from, the IDs look like this:
    sed -n 1p dr.fsa
    >CP009803.1|CM002272.1|CP029339.1
Replace | with _:
    sed -i 's/old_string/new_string/g' filename
    # on macOS (BSD sed), add '' right after -i:
    sed -i '' 's/|/_/g' dr.fsa
The trailing g replaces every match on each line. Without it, only the first match on each line is replaced.
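To see what g does on one of these headers, here is a small comparison. It pipes the header copied from the sample above through sed instead of editing a file.

```shell
#!/usr/bin/env sh
# Compare s/// with and without the g flag on one header line.
# "|" is a literal character in a basic regular expression, so no escaping is needed.
line='>CP009803.1|CM002272.1|CP029339.1'
first_only=$(printf '%s\n' "$line" | sed 's/|/_/')   # only the first "|" is replaced
every=$(printf '%s\n' "$line" | sed 's/|/_/g')       # every "|" is replaced
echo "$first_only"   # >CP009803.1_CM002272.1|CP029339.1
echo "$every"        # >CP009803.1_CM002272.1_CP029339.1
```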
    sed -i '' 's/|/_/g' dr.fsa
    sed -n 1p dr.fsa
    >CP009803.1_CM002272.1_CP029339.1
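`sed -i` takes its argument differently in GNU sed (Linux) and BSD sed (macOS). One option is to skip `-i` and write to a new file instead. You can also limit the substitution to header lines with the `/^>/` address, so sequence lines are never touched. Below is a minimal sketch on a toy file. On the real data, the equivalent would be something like `sed '/^>/s/|/_/g' dr.fsa > dr_fixed.fsa`, where `dr_fixed.fsa` is a hypothetical output name.

```shell
#!/usr/bin/env sh
# Replace "|" with "_" only on header lines (lines starting with ">"),
# writing to a new file so the same command works with GNU and BSD sed.
tmp_in=$(mktemp)
tmp_out=$(mktemp)
printf '>CP009803.1|CM002272.1|CP029339.1\nACGT\n' > "$tmp_in"   # toy input (hypothetical)
sed '/^>/s/|/_/g' "$tmp_in" > "$tmp_out"
header=$(sed -n 1p "$tmp_out")
echo "$header"   # >CP009803.1_CM002272.1_CP029339.1
rm -f "$tmp_in" "$tmp_out"
```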
Next, try building the database with makeblastdb:
    makeblastdb -in dr.fsa -dbtype nucl -out dr -parse_seqids
But it fails: the ID is too long.
    BLAST Database creation error: Near line 17, the local id is too long. Its length is 87 but the maximum allowed local id length is 50. Please find and correct all local ids that are too long.
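The error asks you to "find and correct all local ids that are too long". A short awk helper can list all of them at once, instead of fixing them one makeblastdb run at a time. This sketch assumes the ID is the first token after “>” and takes the limit of 50 from the error message above. The name long_ids and the toy input are hypothetical.

```shell
#!/usr/bin/env sh
# long_ids (hypothetical helper): print "line: id (length)" for every header
# whose ID (the first token after ">") is longer than the given limit.
long_ids() {
  awk -v max="$2" '
    /^>/ {
      id = substr($1, 2)              # drop the leading ">"
      if (length(id) > max + 0)       # force a numeric comparison
        print NR ": " id " (" length(id) ")"
    }
  ' "$1"
}

# Toy input (hypothetical): one 60-character ID and one short ID
tmp=$(mktemp)
long_id=$(printf '%060d' 0)           # sixty "0" characters
printf '>%s\nACGT\n>short_id\nGGCC\n' "$long_id" > "$tmp"
result=$(long_ids "$tmp" 50)
echo "$result"
rm -f "$tmp"
```

On the real file this would be `long_ids dr.fsa 50`.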
So I decided to delete everything from the first _ onward:
    sed -i 's/_.*//' dr.fsa
    # on macOS (BSD sed): sed -i '' 's/_.*//' dr.fsa
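Every `|` was turned into `_` earlier, so cutting at the first `_` keeps only the first accession of each header. Two headers that start with the same accession therefore end up with the same ID, which would explain a duplicate-ID error. The sketch below shows this on two toy headers (the second one is hypothetical) and lists the IDs that now occur more than once.

```shell
#!/usr/bin/env sh
# Truncate header IDs at the first "_" (header lines only), then list
# IDs that now occur more than once. The toy headers are hypothetical.
tmp=$(mktemp)
printf '>CP009803.1_CM002272.1_CP029339.1\nACGT\n>CP009803.1_CP021130.1\nGGCC\n' > "$tmp"

# 1) keep only the part of each header before the first "_"
# 2) pull out the IDs without ">"
# 3) print each ID that appears more than once
dups=$(sed '/^>/s/_.*//' "$tmp" | awk '/^>/ { print substr($1, 2) }' | sort | uniq -d)
echo "$dups"   # CP009803.1
rm -f "$tmp"
```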
Build the database again:
    makeblastdb -in dr.fsa -dbtype nucl -out dr -parse_seqids
Another error: the IDs must be unique.
    BLAST Database creation error: Error: Duplicate seq_ids are found:
    GB|CP021130.1
    make a unique identifier for each sequence
    Building a new DB, current time: 01/25/2022 16:46:10
    New DB name: [my directory]/repeats
    New DB title: dr.fa
    Sequence type: Nucleotide
    Keep MBits: T
    Maximum file size: 1000000000B
    Adding sequences from FASTA; added 26958 sequences in 0.460787 seconds.
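One possible way to satisfy the uniqueness requirement is to append a counter to repeated IDs, so the second occurrence of `X` becomes `X_2`, the third `X_3`, and so on. This is my own workaround, not something the NCBI manual prescribes, and the helper name uniquify_ids and the toy input are hypothetical. Two caveats apply. A suffixed ID could in principle collide with an ID already present in the file. The suffix also adds characters toward the 50-character limit. It's worth re-checking duplicates and lengths afterwards.

```shell
#!/usr/bin/env sh
# uniquify_ids (hypothetical helper): append "_<n>" to the 2nd, 3rd, ...
# occurrence of a header ID. Only the first token of each header changes.
uniquify_ids() {
  awk '
    /^>/ {
      id = substr($1, 2)                  # ID without the leading ">"
      rest = substr($0, length($1) + 1)   # rest of the header, spacing kept
      seen[id]++
      if (seen[id] > 1) id = id "_" seen[id]
      print ">" id rest
      next
    }
    { print }                             # sequence lines pass through unchanged
  ' "$1"
}

# Toy input (hypothetical): CP009803.1 appears twice
tmp=$(mktemp)
printf '>CP009803.1\nACGT\n>CP009803.1\nGGCC\n>CP021130.1\nTTAA\n' > "$tmp"
out=$(uniquify_ids "$tmp")
printf '%s\n' "$out" | grep '^>'
rm -f "$tmp"
```

For the toy input, the headers printed are `>CP009803.1`, `>CP009803.1_2` and `>CP021130.1`. On real data you would redirect the output to a new file, e.g. `uniquify_ids dr.fsa > dr_unique.fsa` (hypothetical file name).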