
A beginner guide to SNP calling from high-throughput DNA-sequencing data

2012, Human Genetics

  • the objective is to identify genetic variants such as single nucleotide polymorphisms (SNPs) from high-throughput DNA sequencing (HTS) data.
  • pipeline:
    1. quality control
    2. mapping of short reads to the reference genome
    3. visualization and post-processing of the alignment, including base quality recalibration
    4. SNP calling along with filtering of SNP candidates

Suspected close contacts as the pilot indicator of the growth trend of confirmed population during the COVID-19 pandemic: A simulation approach

Sisi Huang, Anding Zhu, Yan Wang, Yancong Xu, Lu Li, Dexing Kong



Given the actual course of the coronavirus disease 2019 (COVID-19) epidemic, social factors should be taken into account, and the growth trend of the confirmed population needs to be explained. A suitable model is needed not only to simulate the epidemic, but also to forecast its future course and to find a pilot indicator for the outbreak.

The original susceptible-infectious-recovered (SIR) model is modified into the susceptible-infectious-quarantine-confirm-recover model combined with social factors (SIDCRL), which couples natural transmission with social factors such as external interventions and isolation. Numerical simulation is used to reproduce the curves of the cumulative number of confirmed cases and the number of cured patients. Furthermore, we use simulation to investigate the relationship between the suspected close contacts (SCC) and the eventual growth trend of confirmed cases.
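
To make the starting point concrete, here is a minimal sketch of the plain SIR model that the SIDCRL model extends. It is not the SIDCRL model itself, whose equations are not given here. The function name `simulate_sir`, the forward Euler scheme, and the values of `beta` and `gamma` are assumptions chosen purely for illustration.

```python
def simulate_sir(population, infected0, beta, gamma, days, dt=0.1):
    """Integrate the basic SIR equations with forward Euler steps.

    beta  : transmission rate per day (illustrative value, not from the paper)
    gamma : recovery rate per day (illustrative value, not from the paper)
    Returns one (S, I, R) tuple per simulated day, starting at day 0.
    """
    s, i, r = float(population - infected0), float(infected0), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt  # S -> I
            new_recoveries = gamma * i * dt                  # I -> R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical outbreak: one million people, ten initial cases.
curve = simulate_sir(population=1_000_000, infected0=10, beta=0.3, gamma=0.1, days=200)
peak_day = max(range(len(curve)), key=lambda day: curve[day][1])
print(f"infections peak on day {peak_day}")
```

A model with quarantine and confirmation compartments would add further state variables and transition terms to the inner loop in the same way.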

Four representative countries (China, South Korea, Italy, and the United States) are simulated separately. The simulation results fit the observed course of the epidemic, and reasonable predictions are made. In addition, the analysis shows that a growing number of SCC contributes to the outbreak, and the prediction for the United States based on its SCC population highlights the importance of external intervention and active prevention measures.

The simulation verifies the reliability of the model and shows that the observable variable SCC can serve as a pilot indicator of the coronavirus disease 2019 pandemic.

**Keywords:** COVID-19, SIR model, social factors, numerical simulation, suspected close contacts, confirmed case, temporary hospital

The numerical simulation of the SIDCRL model shows an excellent fit to the real data. The simulation results further indicate that an increasing number of SCC contributes to the epidemic outbreak, which highlights the importance of external intervention and active prevention measures in all countries. The paper is well written, and the new model and its simulation results are interesting both theoretically and practically; they offer researchers a new direction for investigating the COVID-19 epidemic.




Tip: Shift + right-click lets you open a Linux terminal in the current directory. Super handy!

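This post introduces an approach, or rather a data structure: the FM index (Full-text index in Minute space), which is built on the Burrows-Wheeler transform (BWT).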

Once the reference sequence has been transformed with the BWT, the FM index and LF mapping make it possible to match reads quickly.
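
The idea can be sketched in a few lines of Python. This is a toy version under stated assumptions: the suffix array is built by naive sorting (real aligners use far more compact constructions), `$` is assumed to be a sentinel smaller than every base, only exact matches are found, and all function names are made up for this illustration.

```python
from collections import Counter


def bwt_and_suffix_array(text):
    """Return the BWT of text + '$' and the matching suffix array (naive sort)."""
    text += "$"  # sentinel, assumed to sort before every other symbol
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    return bwt, suffix_array


def build_fm_index(bwt):
    """Build C (number of symbols smaller than c) and Occ (counts of c in each BWT prefix)."""
    counts = Counter(bwt)
    first_col_start, total = {}, 0
    for symbol in sorted(counts):
        first_col_start[symbol] = total
        total += counts[symbol]
    occ = {symbol: [0] * (len(bwt) + 1) for symbol in counts}
    for pos, symbol in enumerate(bwt):
        for s in occ:
            occ[s][pos + 1] = occ[s][pos] + (s == symbol)
    return first_col_start, occ


def find_read(read, first_col_start, occ, suffix_array):
    """Backward search: return the sorted 0-based positions of exact matches of read."""
    lo, hi = 0, len(suffix_array)  # half-open range of matching suffix-array rows
    for symbol in reversed(read):
        if symbol not in first_col_start:
            return []
        # LF mapping applied to both ends of the current range
        lo = first_col_start[symbol] + occ[symbol][lo]
        hi = first_col_start[symbol] + occ[symbol][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


reference = "ACGTACGTGACG"
bwt, sa = bwt_and_suffix_array(reference)
c_table, occ_table = build_fm_index(bwt)
print(find_read("ACG", c_table, occ_table, sa))  # [0, 4, 9]
```

Each read symbol costs two table lookups, so the search time depends on the read length rather than on the length of the reference.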

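  • Given a reference genome and a set of reads, find at least one "good" local alignment for each read, that is, the read's position in the reference sequence.
  • What makes an alignment "good"?
    • The fewer mismatches, the better
    • A mismatch at a low-quality base is preferable to a mismatch at a high-quality base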
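
One simple way to combine the two criteria is to add up the base qualities at the mismatched positions, so that fewer mismatches and lower-quality mismatches both give a smaller penalty. The sketch below assumes Phred-scaled qualities and a gap-free placement of the read; the function name `mismatch_penalty` and the scoring rule are illustrative assumptions, not the method of any particular aligner.

```python
def mismatch_penalty(read, qualities, reference, position):
    """Sum the Phred qualities of the read bases that disagree with the reference.

    qualities holds one Phred score per read base; a high score means the
    sequencer was confident in that base call. Lower penalties are better.
    """
    window = reference[position:position + len(read)]
    if len(window) < len(read):
        raise ValueError("read runs past the end of the reference")
    return sum(q for base, ref_base, q in zip(read, window, qualities) if base != ref_base)


reference = "ACGTACGT"
# Same single mismatch, but at a low-quality base versus a high-quality base.
print(mismatch_penalty("ACCT", [30, 30, 5, 30], reference, 0))   # 5
print(mismatch_penalty("ACCT", [30, 30, 40, 30], reference, 0))  # 40
```
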
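This post continues from the Markov chain part of the previous post, Global Alignment of Protein Sequence. Markov chains come up in courses on stochastic processes and on computer simulation; here the focus is on their application to gene sequences.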


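  • Label the outcomes of repeated trials by time to obtain a sequence of successive states.

  • Also called a discrete-time Markov chain: a stochastic process that describes transitions from state to state.

  • Markov property (memorylessness): the probability distribution of the next state is determined by the current state alone; earlier events in the sequence have no influence on it.

  • Generalized to continuous time, these are collectively called Markov processes.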





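  • Iteration: $\pi^{(t+1)} = \pi^{(t)} P$. If $\pi^{(t)} \to \pi$ as $t \to \infty$, then $\pi$ is the limiting distribution, and it satisfies $\pi = \pi P$.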

  • Perron-Frobenius theorem



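  • Note that this left eigenvector exists under very weak conditions (one could say it always exists), but it is not necessarily the limit of the iteration from an arbitrary initial vector. Compare the conditions of the Markov chain convergence theorem: the transition probability matrix is aperiodic, and any two states communicate. The unique non-negative solution of $\pi P = \pi$ with $\sum_i \pi_i = 1$ is called the stationary distribution of the Markov chain.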
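
The difference between "a stationary distribution exists" and "the iteration converges to it" can be checked numerically. The sketch below writes the distribution as a row vector `pi` and the transition matrix as a row-stochastic `P`; this notation, the function names, and the nucleotide transition probabilities are all assumptions made for the example.

```python
def step(distribution, transition):
    """One update pi <- pi P, with pi a row vector and P row-stochastic."""
    n = len(distribution)
    return [sum(distribution[i] * transition[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(transition, start=None, tolerance=1e-12, max_iterations=10_000):
    """Iterate pi <- pi P from start (uniform by default) until it stops changing."""
    n = len(transition)
    pi = list(start) if start is not None else [1.0 / n] * n
    for _ in range(max_iterations):
        nxt = step(pi, transition)
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tolerance:
            return nxt
        pi = nxt
    raise RuntimeError("no convergence; the chain may be periodic or reducible")


# Hypothetical transition matrix over the states A, C, G, T (each row sums to 1).
P = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.1, 0.2, 0.1, 0.6],
]
print([round(p, 4) for p in stationary_distribution(P)])

# A periodic two-state chain: (0.5, 0.5) is stationary, yet the iteration
# started from (1, 0) flips between (1, 0) and (0, 1) forever.
flip = [[0.0, 1.0], [1.0, 0.0]]
try:
    stationary_distribution(flip, start=[1.0, 0.0], max_iterations=100)
except RuntimeError as error:
    print(error)
```

All entries of `P` are positive, so that chain is aperiodic with every pair of states communicating, and the iteration converges from any starting distribution.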





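This post covers the different BLAST programs, why amino acid sequences are used for alignment, how gap penalties are handled, and how dynamic programming finds the optimal global alignment, followed by a traceback to recover the alignment itself; the same approach also applies to semi-global and local alignment. The properties of the PAM matrix deserve attention here, since the design of this scoring matrix is very thoughtful. Finally, the post notes that we need to understand how DNA sequences evolve in order to judge whether such a scoring system is reasonable, and that DNA sequence evolution is in fact a Markov chain.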
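
The dynamic programming and traceback steps can be sketched as follows. This is a minimal global alignment in the style of Needleman-Wunsch with a linear gap penalty; the flat `match`/`mismatch` scores stand in for a real substitution matrix such as PAM and, like the function name, are assumptions made for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""

    def pair_score(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # substitution
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


print(global_align("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Local alignment changes two things in this scheme: cell scores are clamped at zero, and the traceback starts from the highest-scoring cell instead of the corner.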

This post introduces gene structure, including open reading frames (ORFs), introns, exons, coding sequences (CDS), untranslated regions (UTRs), complementary DNA (cDNA), and ribosome binding sites (RBS).



This gene structure information is stored in GTF files.
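
A GTF data line has nine tab-separated fields, the last of which holds `key "value";` attribute pairs. The parser below is a minimal sketch: it assumes attribute values contain no semicolons, and the names `GtfRecord` and `parse_gtf_line` as well as the identifiers in the sample line are made up for illustration.

```python
from typing import NamedTuple


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str      # e.g. gene, transcript, exon, CDS
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: dict


def parse_gtf_line(line):
    """Parse one GTF data line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    attributes = {}
    for item in fields[8].split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attributes[key] = value.strip().strip('"')
    return GtfRecord(fields[0], fields[1], fields[2], int(fields[3]), int(fields[4]),
                     fields[5], fields[6], fields[7], attributes)


# Hypothetical exon record.
sample = 'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";'
record = parse_gtf_line(sample)
print(record.feature, record.start, record.end, record.attributes["gene_id"])
```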

