
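A Unix shell is a command-line interpreter that provides a command-line user interface for Unix-like operating systems. The shell is both an interactive command language and a scripting language, and the operating system uses it to control the execution of the system through shell scripts.

When the shell finishes executing a program, it sends the output to the user on the screen, which is the standard output device. For this reason it is known as the command interpreter.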


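Motivation:
1. I am interested in one region of the original sequence, so I extract that short segment and save it as a FASTA file to use as the reference.
2. Align the FASTQ file to the reference. Most reads will not map, and I only want the ones that do.
3. Visualize the alignment results.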
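Step 1 can be sketched in a few lines of plain Python. This is only a minimal illustration: the function names, the record name, and the toy sequence are all made up for the example, and coordinates are assumed to be 1-based and inclusive.

```python
def extract_region(sequence, start, end):
    """Return the 1-based, inclusive region [start, end] of a sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("region is outside the sequence")
    return sequence[start - 1:end]


def to_fasta(name, sequence, width=60):
    """Format one record as FASTA text, wrapping the sequence at `width` columns."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


genome = "ACGTACGTTTGACCA"  # toy sequence, not real data
region = extract_region(genome, 5, 10)
record = to_fasta("toy_region_5_10", region)  # hypothetical record name
print(record, end="")
```

Writing `record` to a file gives a small reference FASTA that an aligner can then be indexed against.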


Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s of characters to relatively long (e.g. mammalian) genomes.

In short, Bowtie 2 is a tool for aligning sequencing reads to long reference sequences.

Bowtie 2 outputs alignments in SAM format, enabling interoperation with a large number of other tools (e.g. SAMtools, GATK) that use SAM.
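Because SAM is a tab-separated text format, it is easy to inspect by hand. The sketch below keeps only mapped reads by checking the FLAG field, where bit 0x4 means the read is unmapped. The helper names and the three toy lines are invented for illustration; on real data, `samtools view -F 4` does the same filtering.

```python
def is_mapped(sam_line):
    """Return True if a SAM alignment line describes a mapped read."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])      # FLAG is the second mandatory SAM field
    return (flag & 0x4) == 0   # bit 0x4 set means "segment unmapped"


def keep_mapped(sam_lines):
    """Keep header lines (starting with '@') and mapped alignment lines."""
    return [ln for ln in sam_lines if ln.startswith("@") or is_mapped(ln)]


# Toy SAM content, not real alignments.
toy_sam = [
    "@SQ\tSN:toy_ref\tLN:100",
    "read1\t0\ttoy_ref\t7\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGG\tIIIIIIII",
]
mapped = keep_mapped(toy_sam)
for line in mapped:
    print(line)
```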


I have already written about the principles behind SPAdes; see the de Bruijn graph post.

SPAdes takes as input paired-end reads, mate-pairs and single (unpaired) reads in FASTA and FASTQ. After assembly it writes its results as FASTA files, and the scaffolds file is the recommended result.

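SPAdes has the following modules:

  • BayesHammer – corrects read errors in Illumina reads; it works well on both single-cell and standard data sets.
  • IonHammer – corrects read errors in IonTorrent data; it also works well on both types of data set.
  • SPAdes – the iterative short-read genome assembly module; values of K are selected automatically based on the read length and the data set type.
  • MismatchCorrector – improves the mismatch and short indel rates in the resulting contigs and scaffolds; this module uses the BWA tool. MismatchCorrector is turned off by default, but turning it on is recommended.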

Paired-end sequencing

Paired-end sequencing allows users to sequence both ends of a fragment and generate high-quality, alignable sequence data.

Paired-end sequencing facilitates detection of genomic rearrangements and repetitive sequence elements, as well as gene fusions and novel transcripts.

1. Genomic rearrangements
2. Repetitive sequence elements
3. Gene fusions
4. Novel transcripts


My biology background is weak, so some of the units (the ones around mutation rate) got me confused. Ahem, pretending to stay calm.

…a nucleotide substitution rate of per site per year both in poultry and in House Finches, an exceptionally fast rate rivaling some of the highest estimates reported…

Mutation Rate (SNP/Nucleotide/Replication) (Barrick and Lenski, 2013)

1. Substitution rate of … per site per year: SNP/Nucleotide/Year
2. SNP/Nucleotide/Year × y years = SNP/Nucleotide
3. SNP/Nucleotide ÷ r replications = SNP/Nucleotide/Replication
4. SNP/Nucleotide/Replication: the number of mutations per base pair for each replication event
5. SNP/Nucleotide/Replication = mutations/bp/replication
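To convince myself the units line up, here is the same conversion as a tiny function. The function name and the numbers are hypothetical, picked only to make the arithmetic easy to follow; they are not estimates from any paper. I am assuming `years` is the elapsed time and `replications` is the number of replication events in that time.

```python
def per_replication_rate(rate_per_site_per_year, years, replications):
    """Convert a per-site-per-year rate to a per-site-per-replication rate."""
    if replications <= 0:
        raise ValueError("replications must be positive")
    per_site = rate_per_site_per_year * years  # SNP/Nucleotide
    return per_site / replications             # SNP/Nucleotide/Replication


# Hypothetical inputs: 1e-5 substitutions per site per year,
# observed over 2 years and 1000 replication events.
rate = per_replication_rate(1e-5, years=2, replications=1000)
print(rate)
```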


classical sequencing

ribonucleotide: a nucleotide containing ribose as its pentose component.

deoxyribonucleotide: the monomer, or single unit, of DNA (deoxyribonucleic acid).

dideoxynucleotide: a modified deoxyribonucleotide that lacks the 3′-OH group. That group is needed for the next nucleotide to attach to a growing polynucleotide chain during DNA synthesis.

Give DNA polymerase a template and some dideoxynucleotides. Once one is incorporated, the polymerase won't be able to extend the chain because there is no 3′-OH.
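A toy model of chain termination, to make the idea concrete. It assumes the template is written 3′→5′, so the new strand reads 5′→3′ from left to right, and it simply lists every fragment that would end at a given dideoxy base. The function name and the template are made up, and real reactions are random rather than exhaustive like this.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragments(template, dideoxy_base):
    """List every new-strand fragment that ends in the given dideoxy base."""
    # Build the complementary strand the polymerase would synthesize.
    new_strand = "".join(COMPLEMENT[base] for base in template)
    # Synthesis can stop wherever the dideoxy base would be incorporated.
    return [
        new_strand[:i + 1]
        for i, base in enumerate(new_strand)
        if base == dideoxy_base
    ]


template = "TACGGT"  # toy template, assumed to be written 3'->5'
for base in "ACGT":
    print(base, terminated_fragments(template, base))
```

Sorting the fragments from all four reactions by length gives back the order of bases in the new strand, which is the whole trick behind this kind of sequencing.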


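On my first day logged into the lab's AWS server, I modified the .bashrc file in their root directory... Looking back, though, I think (in my own opinion) that I improved their conda management: I added an initialize block that switches the default conda by changing the path. Of course, I asked a few friends for help along the way. I hope everything works normally when they use the server during the day.

In any case, on a senior colleague's advice I also added comments, so that if anything goes wrong my labmates can deal with it easily.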


A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25–50 bp) data sets.
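A minimal sketch of building such a graph from reads: each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The function name and the toy reads are made up for illustration, and real assemblers add a lot on top of this (error correction, graph simplification, and so on).

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # One edge per k-mer occurrence: prefix -> suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


toy_reads = ["ACGTC", "CGTCA"]  # toy reads, not real data
graph = de_bruijn_graph(toy_reads, k=3)
for node, successors in graph.items():
    print(node, "->", successors)
```

Repeated edges are kept on purpose here, so a k-mer seen in several reads shows up several times, which is a rough stand-in for coverage.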


Whole-genome sequencing (WGS) is a comprehensive method for analyzing entire genomes. It is commonly associated with sequencing human genomes.

Next-generation sequencing (NGS) technology is useful for sequencing any species, such as agriculturally important livestock, plants, or disease-related microbes.
