Find your genetic relatives quickly and accurately with ISI’s iLASH algorithm – USC Viterbi


Pixabay // Qimono

Admit it: at some point in our childhood, we all wondered if we had a long-lost twin or a famous relative. We may even have gone so far as to get genetic testing done by a company like 23andMe to learn more about our ancestors and distant relatives. Although we often pursue genetic testing for personal reasons, identifying individuals who share DNA could have a major impact on our understanding of the health and structure of the human population as a whole.

This process of studying genetic links has been streamlined by the introduction of iLASH (IBD by LocAlity-Sensitive Hashing), an algorithm that efficiently identifies relevant genetic links among large groups of people, which in turn can inform major advances in population genetics and personalized medicine, among other fields.

What is iLASH?

iLASH was born when Ruhollah Shemirani, a Ph.D. student at USC, teamed up with Jose-Luis Ambite, research team leader at USC’s Information Sciences Institute (ISI) and Associate Research Professor of Computer Science at USC Viterbi, to study how genetic links between individuals can elucidate the genetic causes of disease and the genetic structure of populations. iLASH can also be used for other purposes, such as finding distant relatives through services like 23andMe.

Making genetic connections can make the world feel smaller than you think. Ambite's own experience, like that of many others who have used such services, is an interesting example.

“I share 0.07% of my genome with Greg Ver Steeg, another researcher at ISI,” said Ambite, citing his 23andMe results. “Greg is American (of Dutch descent, many generations ago). I am from Spain. Nonetheless, we share a bit of DNA from a common ancestor.”

In essence, iLASH is a method of estimating IBD, or Identity-By-Descent. “IBD estimation is the process of finding out where and how much each pair of individuals in a genetic dataset share their DNA based on a common ancestry,” Shemirani explained.
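To make "where and how much" concrete, here is a minimal Python sketch that reports the stretches over which two toy haplotypes match and the fraction of positions those stretches cover. It is only an illustration under loud assumptions: haplotypes are plain strings of 0s and 1s, the function name `shared_segments` and the `min_len` threshold are hypothetical, and exact matching of this kind is a simplification that ignores genotyping errors and genetic distance. It is not how iLASH itself works.

```python
def shared_segments(hap_a, hap_b, min_len=5):
    """Return (start, end) index ranges where two haplotypes match exactly.

    Toy assumption: a run of at least `min_len` identical positions counts
    as a shared segment. `end` is exclusive.
    """
    segments = []
    start = None
    n = min(len(hap_a), len(hap_b))
    for i in range(n):
        if hap_a[i] == hap_b[i]:
            if start is None:
                start = i  # a new matching run begins here
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the haplotypes.
    if start is not None and n - start >= min_len:
        segments.append((start, n))
    return segments


def shared_fraction(hap_a, hap_b, min_len=5):
    """Fraction of positions covered by the shared segments."""
    n = min(len(hap_a), len(hap_b))
    covered = sum(end - start for start, end in shared_segments(hap_a, hap_b, min_len))
    return covered / n if n else 0.0


print(shared_segments("0011010111", "0011011100"))  # [(0, 6)]
print(shared_fraction("0011010111", "0011011100"))  # 0.6
```

The segment positions answer "where" and the fraction answers "how much". Comparing every pair this way does not scale to large datasets, which is the problem iLASH addresses.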

IBD estimation is the first step in IBD mapping, a novel technique for identifying the genetic basis of diseases. The process is divided into three steps, each published in its own paper. The first step, published in Nature Communications on June 10, involves estimating the genetic segments shared by pairs of individuals using iLASH. Next, this pairwise matching information is used to create groups of “distant families” using network clustering techniques. The final step focuses on statistical methods to show whether these “distant families” have elevated rates of disease or other traits.
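The second step can be pictured with a short Python sketch. The article does not say which network clustering technique the researchers use, so this example swaps in the simplest possible stand-in: connected components over a graph whose edges are pairs sharing enough DNA. The function name `distant_families`, the input format, the centimorgan unit, and the `min_total_cm` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def distant_families(ibd_segments, min_total_cm=10.0):
    """Group people into clusters linked by sufficient total shared DNA.

    ibd_segments: iterable of (person_a, person_b, length_cm) tuples
    (hypothetical format). Pairs whose summed segment length reaches
    `min_total_cm` are joined; clusters are the connected components.
    """
    # Sum the shared length for each unordered pair.
    total = defaultdict(float)
    for a, b, length_cm in ibd_segments:
        total[tuple(sorted((a, b)))] += length_cm

    # Union-find over everyone who appears in the input.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), cm in total.items():
        root_a, root_b = find(a), find(b)
        if cm >= min_total_cm:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for person in parent:
        groups[find(person)].append(person)
    return sorted(sorted(members) for members in groups.values())


segments = [
    ("ann", "bob", 6.0),
    ("ann", "bob", 7.5),
    ("bob", "cat", 12.0),
    ("dan", "eve", 3.0),
]
print(distant_families(segments))
# [['ann', 'bob', 'cat'], ['dan'], ['eve']]
```

Here ann and cat end up in the same group even though they share no segment directly, because both are linked to bob. The third step would then compare disease rates inside such groups against the rest of the dataset.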

A pioneer in IBD estimation

What sets iLASH apart from other algorithms for genetic analysis? Scalability and accuracy. Because iLASH can perform IBD estimation at large scale, biobanks (collections of biological samples held for research) that were previously infeasible to analyze can now be examined for genetic links at unprecedented speed.

“Before iLASH, it took more than a week (~ 6 days per chromosome) to find genetic links in a data set of 50,000 people,” Shemirani said. “The same data set is analyzed by iLASH in an hour!”

To achieve this, iLASH uses Locality-Sensitive Hashing (LSH), which quickly eliminates unrelated pairs of genetic samples so that the remaining pairs have a high probability of sharing DNA. The algorithm is further sped up by parallel computing, which allows multiple processes to run concurrently, making for an efficient approach to IBD estimation.
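The filtering idea behind LSH can be sketched in a few lines of Python. This is a generic MinHash-with-banding illustration, not the iLASH implementation: the window size, the k-mer length, the hash family, and every name below (`shingles`, `minhash`, `candidate_pairs`) are assumptions chosen for the example, and haplotypes are assumed to be strings of 0s and 1s. Each sample is cut into windows, each window is reduced to a short signature, and only samples whose signatures collide in some band are kept as candidate pairs.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # 2**31 - 1, modulus for the toy hash family


def shingles(window, k=4):
    """Set of overlapping k-mers in a window, encoded as integers."""
    return {int(window[i:i + k], 2) for i in range(len(window) - k + 1)}


def minhash(items, hash_params):
    """Signature: the minimum of h(x) = (a*x + b) mod PRIME per hash function."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in hash_params)


def candidate_pairs(haplotypes, window=16, k=4, n_hashes=12, bands=4, seed=0):
    """Return {(name_1, name_2, window_index)} for pairs that share a bucket."""
    rng = random.Random(seed)
    hash_params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                   for _ in range(n_hashes)]
    rows = n_hashes // bands
    length = min(len(h) for h in haplotypes.values())
    found = set()
    for w, start in enumerate(range(0, length - window + 1, window)):
        buckets = defaultdict(list)
        for name, hap in haplotypes.items():
            sig = minhash(shingles(hap[start:start + window], k), hash_params)
            # Split the signature into bands; each band is a bucket key.
            for b in range(bands):
                buckets[(b, sig[b * rows:(b + 1) * rows])].append(name)
        # Only samples that collide in at least one band are compared further.
        for names in buckets.values():
            for x, y in combinations(sorted(names), 2):
                found.add((x, y, w))
    return found


haps = {
    "A": "0110100110010110" + "0000000000000000",
    "B": "0110100110010110" + "1111111111111111",
    "C": "1111111111111111" + "0101010101010101",
}
print(candidate_pairs(haps))  # {('A', 'B', 0)}
```

In this toy data, A and B are identical in the first window, so they always land in the same buckets, while every other combination has no k-mers in common and never collides. With real data, near-identical windows collide with high probability rather than with certainty, and the surviving candidate pairs would still need to be checked directly.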

As a crucial step, Shemirani and Ambite worked with geneticists and researchers from various institutions to ensure that iLASH is compatible with popular formats used by bioinformaticians who apply information generated from the algorithm to biological and medical research.

“We couldn’t have achieved this without such feedback from real geneticists at the University of Colorado Medical Campus and the Icahn School of Medicine at Mount Sinai,” Shemirani said.

Revolutionizing population genetics

iLASH has significant practical applications in both population genetics and personalized medicine.

In population genetics, the efficiency and accuracy of iLASH as an IBD estimation method are unmatched by other approaches, and the algorithm has already been adopted by experts across the country.

“We can use iLASH for the first time on very large datasets to extract migration patterns and recent fine-scale ancestral structures,” said Shemirani.

In fact, Dr. Gillian Belbin, a researcher at the Icahn School of Medicine at Mount Sinai, used iLASH to analyze the UK Biobank, a genetic dataset of 500,000 people in the UK. Among other things, the study revealed patterns of shared ancestry with Nordic populations in areas that were historically points of contact for Viking populations.

Bringing diversity into the conversation

In medicine, iLASH is not only a powerful tool for researching the genetic origins of rare diseases, but also a promising way to improve our understanding of genetic diversity.

“Helping to discover these rare genetic origins for various diseases is just one of the benefits of such studies,” noted Ambite. “For example, they can help geneticists calibrate risk calculations for genetic diseases for various non-European population groups.”

Where previous analyses were limited to white European populations, iLASH enables researchers to extend existing results to a wider range of populations.

“Including iLASH in genetic study pipelines such as polygenic risk assessments or disease mapping studies will help account for population structure and hidden relationships in the datasets,” said Shemirani. “This will help to partially solve the problems arising from the imbalance or lack of diversity in data sets and studies related to population demographics.”

Another benefit of iLASH is that it is less expensive than many alternatives in medicine, which makes it a much more accessible option.

Going forward

While iLASH has shown great promise in various applications, there is still a lot to be done. Shemirani identified three particular improvements they are currently working on.

The biggest challenge is to create a distributed version of iLASH to meet the increasing demand for scalability. As datasets grow day by day, iLASH needs sufficient resources to process large amounts of data accurately and efficiently.

Shemirani and Ambite also want to create a cloud service for iLASH, though the ethical and security concerns surrounding sensitive genetic data pose a challenge to this goal.

Finally, adding incremental analysis would enable iLASH to be used in commercial environments where new customers are constantly being added and need to be integrated into the existing data set.

While not all of us will find a lost twin or a famous relative, iLASH can help researchers glean critical genetic information to support research in population genetics and medicine, which will benefit all of us in the long term.
