Definitive Diagnostics

SCIENTIFIC RATIONALE – WHY DI-HAPLOID AND NOT DIPLOID?
Why is it important that sequenced haploid fragments be reassembled into the correct chromosomal phase?
Because understanding genetic and epigenetic function requires that sequences be correctly assigned to each of the two inherited chromosomes of every pair in the di-haploid genome.
The terms ‘haploid’ and ‘haplotype’ underlie the importance of di-haploidy. Haploid, or haploid genotype, refers to the half of our genetic material that we inherit from one parent. The term ‘haplotype’ is a contraction of ‘haploid genotype’, and was first applied to chromosomal lengths in which genes remain associated during inheritance (Ceppellini et al., 1967).
In the extreme, a haplotype extends the full length of each haploid chromosome. However, ‘haplotype’ has come to be used as an operational term in many situations, including:
- combinations of alleles of single nucleotide polymorphisms (SNPs);
- haploid sequence fragments analysed in high-throughput (next-gen) single molecule sequencing (SMS);
- chromosomal segments comprising multiple genes shown to be linked by family studies.
The lateral extent of these haplotypes is usually not precisely defined. For instance, the extent of a SNP haplotype tends to be set by the heterozygous combinations that occur at the ends of runs of homozygous SNP pairs.
Mendel (1866) was the first to show that inheritance of traits was particulate in that traits were independently inherited. We now know that each parental chromosome is the product of cross-overs (meiotic recombination) between the chromosome pairs.
More precisely, the basic genetic inheritance ‘particle’ of each parental chromosome contributing to an offspring is the haploid segment between cross-over boundaries (Simons, 1989, 1990). This definition was applied to haplotype mapping of unrelated individuals by what is now known as Genome-Wide Association mapping (Simons, 1990).
Haplomics views the genome as a di-haplome. Within each haplome, each parental chromosome in an individual is a haplotype comprising a linear mosaic of contiguous haploid segments bounded by cross-over sites which demarcate the inherited segments. These haploid segments are the basis of Mendel’s particulate inheritance.
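The mosaic picture can be made concrete with a small sketch. It is an illustration only: the names `HaploidSegment` and `build_mosaic`, the origin labels and the coordinates are hypothetical, and it assumes the simplest case in which each cross-over switches the copied homologue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaploidSegment:
    start: int   # first position of the segment (inclusive)
    end: int     # position just past the segment (exclusive)
    origin: str  # which parental homologue the segment was copied from


def build_mosaic(chromosome_length, crossover_sites,
                 origins=("homologue_1", "homologue_2")):
    """Split a transmitted chromosome into haploid segments.

    Assumption: every cross-over site switches the source homologue,
    so segment origins simply alternate along the chromosome.
    """
    sites = sorted(set(crossover_sites))
    if any(s <= 0 or s >= chromosome_length for s in sites):
        raise ValueError("cross-over sites must lie inside the chromosome")
    bounds = [0] + sites + [chromosome_length]
    return [
        HaploidSegment(start, end, origins[i % 2])
        for i, (start, end) in enumerate(zip(bounds, bounds[1:]))
    ]


# Two cross-overs give three contiguous haploid segments.
mosaic = build_mosaic(1000, [250, 600])
for segment in mosaic:
    print(segment)
```

In this sketch the segments, not individual markers, are the units that are inherited intact.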
All current diagnostic DNA testing involves testing islands of DNA that occur at intervals, from the smallest single DNA nucleotide SNPs, to a few tens or at most a few thousands of DNA nucleotides, with gaps in between each test unit. The challenge is then to join the separated markers, from SNPs to sequences, into a continuous DNA chain that reflects the chromosome of origin (chromosome phasing).
Current practice for chromosome phasing depends on bioinformatics (computer algorithms) to make the most informed estimate that two or more DNA test units are on the same chromosome.
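A short sketch shows why such estimates are needed at all. For a run of heterozygous two-allele markers, the unphased result is compatible with many different chromosome pairs. The function name and the 0/1 allele coding are illustrative assumptions, not part of any phasing package.

```python
from itertools import product


def possible_haplotype_pairs(n_het_sites):
    """List every chromosome pair consistent with n heterozygous sites.

    Alleles are coded 0 and 1. The first site is fixed on the first
    haplotype so that each unordered pair is listed once only.
    """
    if n_het_sites < 1:
        raise ValueError("need at least one heterozygous site")
    pairs = []
    for rest in product((0, 1), repeat=n_het_sites - 1):
        hap1 = (0,) + rest
        hap2 = tuple(1 - allele for allele in hap1)  # complementary chromosome
        pairs.append((hap1, hap2))
    return pairs


# The number of candidate pairs doubles with every added heterozygous site.
for n in (1, 2, 3, 10):
    print(n, len(possible_haplotype_pairs(n)))
```

Only one of the listed pairs is the inherited one; an algorithm that works from unphased data alone has to choose among them.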
It is widely assumed that computational advances in maximum likelihood estimation will eventually yield sufficient discriminatory power to assign markers to the true chromosome on which they were inherited; i.e. to the true inherited di-haploid state.
A parallel belief is that the latest sequencing technology (SMS) will eventually provide so much sequencing information (by ‘deep sequencing’) that, in conjunction with ever more tweaking of bioinformatic algorithms, it will be possible to correctly realign the contents of the diploid “soup”; in technical terms, that assignment of Identity-by-state in single individuals will correspond to Identity-by-descent determined by family studies.
While each SMS sequence fragment is haploid in that it comes from a single chromosome, each fragment can only be phased with its neighbours if the overlap sequences are informative by having distinguishing DNA markers, no matter how ‘deep’ the sequencing.
Unfortunately, even in the highly variable sequences of the transplant tissue typing genes, there are ‘deserts’ of invariant sequences much longer than current sequence read lengths which do not permit phase assignment.
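The effect of such deserts can be sketched under idealised assumptions: error-free reads of one fixed length and unlimited sequencing depth. The function name `phase_blocks` and the positions are hypothetical. Two heterozygous markers can be placed in phase only if a single read can contain both, so any gap at least as long as a read ends the phased block.

```python
def phase_blocks(het_positions, read_length):
    """Group heterozygous marker positions into phaseable blocks.

    Assumption: a read covers read_length consecutive positions, so two
    markers fit in one read only if they are less than read_length apart.
    Depth is treated as unlimited; it does not appear in the calculation.
    """
    blocks = []
    for pos in sorted(het_positions):
        if blocks and pos - blocks[-1][-1] < read_length:
            blocks[-1].append(pos)   # linked to the previous marker
        else:
            blocks.append([pos])     # invariant gap too long: phase is lost
    return blocks


markers = [100, 150, 5000, 5040]
print(phase_blocks(markers, read_length=200))
print(phase_blocks(markers, read_length=10000))
```

With 200-position reads the markers fall into two blocks whose relative phase is unknown; only a read longer than the invariant stretch joins them.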
Genetic typing for transplantation is a long-established medical practice which involves matching the patient’s two chromosome haplotypes known from family studies with candidate single donors, for whom no family information is available. More than 40 years of experience world-wide has led to the view that the inferred chromosomal phase approach does not provide an unequivocal, definitive, 2-chromosome haplotype profile. There are early indications that transplant complications (Graft-versus-Host disease) may be increased in patients who are genotype matched but haplotype mismatched.
Haplomics’ view is that phasing of diploid DNA as haplotypes, even of multiple coding regions within a genetic locus, let alone of heterozygous sequences spanning multiple loci, is, in principle, unattainable. Definitive di-haploid sequence, which is the fundamental prerequisite for unassumptive genetic research and diagnostics, requires chromosome separation.
Why is comprehension of chromosome haplotype composition, and definition of inherited haploid segments, a critical requirement of definitive genetics and genomics?
Because genetic function is influenced by sequences at chromosomal locations well outside the structural gene regions of transcription and translation.
At the simplest level, function is influenced by the type of translated protein. Even alternatively spliced isoforms occur on some haplotypes, but not on others. At least some of those ‘pseudogenes’ producing detectable transcripts also exhibit such haplotypic restriction in that they occur on some haplotypes but not on others. Whether splicing is canonical or alternative, sequence elements that influence gene expression may reside tens of kilobases distant from a gene, in the intergenic region, even located within retrotransposons.
The concept of a gene as a ‘cookie cut’ sequence limited to the mRNA transcript is being replaced by ‘genomic regulatory region’ to take account of distant regulatory elements.
The haplotype length and sequence content of genomic regulatory regions between homologous chromosomes may not correspond, so gene expression regulatory elements may also vary, qualitatively, as present or absent, such that genetic locus allele assignment alone is inadequate for evaluation of genetic function.
Expression regulation involves both cis- and trans-chromosome phase sequences. In clinical genetics, where two mutations occur in the same gene but at different locations, haploid analysis is required to distinguish mutations occurring on opposite chromosomes, conferring risk by compound heterozygosity, from the co-occurrence of mutations on the same chromosome, unassociated with known risk.
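The distinction can be illustrated with a minimal sketch in which each chromosome is represented as a set of mutation labels. The labels `m1` and `m2` and both function names are hypothetical. The point is that the diploid (unphased) result is identical in the two cases; only the haploid view separates them.

```python
from collections import Counter


def unphased_genotype(hap1, hap2):
    """Mutation copy numbers as seen by a test that ignores phase."""
    return Counter(hap1) + Counter(hap2)


def classify_phase(hap1, hap2, mut_a, mut_b):
    """Report whether two heterozygous mutations are in cis or in trans."""
    carriers_a = {i for i, hap in enumerate((hap1, hap2)) if mut_a in hap}
    carriers_b = {i for i, hap in enumerate((hap1, hap2)) if mut_b in hap}
    if len(carriers_a) != 1 or len(carriers_b) != 1:
        return "not doubly heterozygous"
    return "cis" if carriers_a == carriers_b else "trans"


# Same chromosome carries both mutations; the other copy is intact.
cis_case = ({"m1", "m2"}, set())
# Opposite chromosomes each carry one mutation: compound heterozygosity.
trans_case = ({"m1"}, {"m2"})

print(unphased_genotype(*cis_case) == unphased_genotype(*trans_case))
print(classify_phase(*cis_case, "m1", "m2"))
print(classify_phase(*trans_case, "m1", "m2"))
```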
In addition to the strong phase dependency of genetic influences on gene expression, epigenetic phenomena are also regulated by phase-specific sequence elements. Also, there is emerging evidence of ‘cross-talk’ between chromosomes affecting both genetic expression and epigenetic phenomena that can only be resolved by distinction between each of the paired chromosomes.
Haplomics predicts that definitive genetic / genomic diagnostics will involve comparison of the haploid components of the two HAPLOMES of each individual in populations of patients and comparison subjects.