Graduate Seminar: Taylor Petty
March 31 @ 3:30 pm - 4:30 pm
Forensic DNA Mixture Deconvolution with Next-Generation Sequencing
Next-generation sequencing (NGS) provides a higher-resolution view of DNA mixtures recovered from crime scenes than the current practice of capillary electrophoresis (CE). Whereas CE merely counts short tandem repeats (STRs) and ignores flanking regions, NGS captures allelic variation and mutation within STRs as well as in the flanking regions. New methods are required to take advantage of this richer data type, and precision is essential, since both false positives and false negatives carry a high social cost in criminal prosecution. The National Institute of Standards and Technology (NIST) has been charged with developing methods that model these data more accurately for the purpose of criminal prosecution, and we are working with millions of rows of data processed in its labs. We develop a Forensic Levenshtein distance (a modified version of the standard Levenshtein distance) for DNA sequences that allows STR drops in addition to the standard Levenshtein edits. We use this metric to fit a Pareto distribution to distances within homozygous mixtures. We then implement an EM algorithm based on this Pareto distribution for a mixture of homozygous alleles at a single locus. Our latest result is a Fiducial Gibbs Sampler that recovers the mixing proportions of homozygous alleles at a single locus. Our goal is to use NGS technology to provide likelihood ratios for whether DNA from a person of interest is present in the DNA sample from the scene. Ongoing work includes extending this method to combine evidence across all loci of a given mixture, as well as training an algorithm that uses hierarchical clustering to determine the number of true allelic peaks in a mixture. Future directions include a more precise maximum likelihood (MLE) method for setting the costs in the metric, and a prototype of scalable open-source software that helps practitioners interpret DNA mixtures using NGS data.
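To make the idea of a Levenshtein distance with STR drops concrete, here is a minimal sketch in Python. It is not the speaker's implementation: the function name `forensic_levenshtein`, the `motif` argument, and the cost values `edit_cost` and `str_cost` are all assumptions chosen for illustration. The sketch extends the usual dynamic program with one extra move, the insertion or deletion of a whole copy of the repeat motif at a reduced cost.

```python
def forensic_levenshtein(a, b, motif, edit_cost=1.0, str_cost=0.5):
    """Edit distance between sequences a and b.

    Standard single-base insertions, deletions and substitutions cost
    `edit_cost`. Inserting or deleting one whole copy of `motif` costs
    `str_cost`. Both cost values are illustrative assumptions.
    """
    m, n, k = len(a), len(b), len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")
    inf = float("inf")
    # d[i][j] = cheapest way to turn a[:i] into b[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:  # delete one base from a
                best = min(best, d[i - 1][j] + edit_cost)
            if j > 0:  # insert one base from b
                best = min(best, d[i][j - 1] + edit_cost)
            if i > 0 and j > 0:  # match or substitute
                step = 0.0 if a[i - 1] == b[j - 1] else edit_cost
                best = min(best, d[i - 1][j - 1] + step)
            if i >= k and a[i - k:i] == motif:  # drop one repeat unit from a
                best = min(best, d[i - k][j] + str_cost)
            if j >= k and b[j - k:j] == motif:  # gain one repeat unit in b
                best = min(best, d[i][j - k] + str_cost)
            d[i][j] = best
    return d[m][n]


if __name__ == "__main__":
    five_repeats = "AGAT" * 5
    four_repeats = "AGAT" * 4
    # One repeat unit apart: a single STR drop rather than four base deletions.
    print(forensic_levenshtein(five_repeats, four_repeats, motif="AGAT"))
```

With these assumed costs, two alleles that differ by one repeat unit are closer than two alleles that differ by a single base substitution. Whether that ordering is appropriate depends on how the costs are set, which the abstract lists as future work.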