Pairwise comparative analysis of six haplotype assembly methods based on users’ experience

BMC Genomic Data

Table 4 Models, assumptions, and features of 6 HA algorithms

Method	Model or algorithm	Assumptions	Seq Error	Seq Cov	Read Len	Obj Fun
HapCUT2	maximum likelihood-based	Heterozygous sites known in advance; allele count independent		X	X	X
MixSIH	probabilistic mixture model	Sequence error rate independent of fragments and positions; mixture probabilities are equal (pm(0) = pm(1) = 0.5)		X	X	X
PEATH	probabilistic evolutionary algorithm	All input variables are independent, no copy number variation	X			X
WhatsHap	Fixed-parameter tractable algorithm, dynamic programming	Allele with higher alignment score is assumed to be supported by the read, variants are sorted by position, recombination events equally likely at any position		X	X	X
SDhaP	minimum error correction (MEC)		X	X		X
MAtCHap	maximum allele co-occurrence			X		X

The second column gives the model or algorithm used by each haplotype assembly (HA) method, and the third column gives the assumptions it makes. The fourth to seventh columns indicate the features or factors considered by the authors of each method: sequencing error (Seq Error), sequencing coverage or depth (Seq Cov), sequencing read length (Read Len), and objective function (Obj Fun). An “X” means that the HA method in that row incorporates the feature or metric in that column. Blank cells in the assumption column indicate that no clear assumptions could be found in the corresponding paper.
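
For readers who want to query the feature columns programmatically, the following is a minimal Python sketch that transcribes the four feature columns of Table 4 into a lookup structure. It assumes the column alignment of the table as reproduced here is faithful to the published version. The names `FEATURES`, `TABLE_4`, and `methods_with` are illustrative only and are not part of any of the six tools.

```python
# Feature columns of Table 4, in table order (names are illustrative).
FEATURES = ("seq_error", "seq_cov", "read_len", "obj_fun")

# One row per HA method; True corresponds to an "X" in the table,
# False to a blank cell. Transcribed by hand from Table 4.
TABLE_4 = {
    "HapCUT2":  (False, True,  True,  True),
    "MixSIH":   (False, True,  True,  True),
    "PEATH":    (True,  False, False, True),
    "WhatsHap": (False, True,  True,  True),
    "SDhaP":    (True,  True,  False, True),
    "MAtCHap":  (False, True,  False, True),
}


def methods_with(feature):
    """Return the methods whose row has an "X" in the given feature column."""
    idx = FEATURES.index(feature)
    return [name for name, row in TABLE_4.items() if row[idx]]


if __name__ == "__main__":
    for feature in FEATURES:
        print(feature, methods_with(feature))
```

For example, `methods_with("seq_error")` returns the methods marked in the Seq Error column, in the row order of the table.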

ISSN: 2730-6844