Filling genomics’ biggest gap

By Huawei

How data analytics technology is enabling genome sequencing.

With only one case in history, RPI Deficiency is the rarest genetic disease in the world. It affects nerve cells, causing weakness, numbness, and pain in the hands and feet. The disorder also affects important body functions such as digestion, urination, and circulation.

Identifying rare diseases such as RPI Deficiency is not so straightforward, but genome sequencing can help. The approach can uncover millions of other mutations, so doctors understand and treat a wide range of diseases better.

Healthcare facilities need to be able to handle large amounts of data to make this process possible.


Huawei shares how its high-performance data analytics solution helps to advance genome sequencing.

Uses of genome sequencing


Genome sequencing involves breaking DNA into multiple pieces and arranging the fragments in sequences. Medical experts use this data to understand the genetic composition of viruses, for example.

This method was key in supporting public health efforts during the pandemic. Understanding the genetic code of the coronavirus helped scientists to design effective vaccines.

Researchers can also compare these individual sequences to identify anomalies in a person’s genetic code. Variants in DNA may pose health concerns as they have been linked to several types of cancer and developmental disorders, American Scientist wrote.

Huawei is working with West China Hospital to study how behavioural and environmental factors affect genes. This approach is known as multiomics, and can give doctors a better understanding of how molecular changes contribute to development and disease.

They can then use this information to customise medical treatment. "The innovation will accelerate the widespread application of…big data in precision medicine,” Dr Yu Haopeng, a data scientist at West China Hospital said.

In 2020, Huawei partnered with the hospital to analyse the genes of 100,000 Chinese patients with rare diseases. The project allowed the hospital to create personalised medicine, pinpoint gene mutations, and predict a person’s susceptibility to disease and drug response.

Challenges in genome sequencing


Huawei’s High-Performance Data Analytics Solution helped to shorten the hospital’s genome sequencing analysis time from 24 hours to just seven minutes, but achieving this was no easy feat. The success of this method hinges on three main challenges.

First, when data from each step of the genome sequencing process is stored in different places, scientists risk losing data when transferring them. A network interruption, for instance, would spell trouble.

This could decrease analysis efficiency. Allowing users to access the same piece of data over the different stages of genome sequencing would be ideal.

Second, genome sequencing takes up lots of storage space. Huawei estimates that a person’s genome data is about 100 GB large. This is equivalent to around 208 days of 24/7 browsing, according to Broadband Wherever. Sequencing efforts would have to cater to the hundreds of thousands of patients a hospital serves.

Third, analysing all this data requires intensive computing resources. This process involves mass data transmission, which imposes huge pressures on network bandwidths.

One solution to three problems


To overcome these challenges, genome sequencing requires a platform that can seamlessly integrate the data from each step, support high network bandwidth and computing power, as well as store large amounts of data. High-performance data analytics may be the solution to all three.

Huawei’s powerful computing system allows users to analyse data across the different processes of genome sequencing without the need for migration.

This is thanks to essential technologies like lock-free data structures. Simply put, if one process is interrupted in the middle of genome sequencing, other processes are not locked out from doing their jobs and can proceed as usual. There is no need to put data on hold while waiting for all processes to reboot.

Data migration is also no longer necessary with multi-granularity disk space management. This is an algorithm that pre-allocates disk space to improve the performance of storage devices. Instead of figuring out where data should reside and how it should travel from one process to another, technology has got it all figured out.

Stable high bandwidth and higher storage capacity have also made quicker genome analysis and testing possible.

Huawei says that a storage system should deliver at least six GB/s of data to ensure its accuracy, completeness, and consistency. Huawei’s High Performance Data Analytics Solution offers a 32 GB/s bandwidth, which is more than five times the minimal level designated.

It used to take 13 years to fully sequence the genome of a single person. Now, with the help of High Performance Data Analytics, labs can achieve this within a day.

But supporting such a large database requires high power output, and this is no small fiscal undertaking. The five-year cost of owning Huawei’s solution is 61 per cent lower than that of an average computer data storage server, ESG wrote.

Developments in the genome sequencing industry pose new challenges to high-end storage, but high-performance data analytics can overcome them. A storage solution can help hospitals to develop treatments faster for better patient recovery — which is the ultimate aim of healthcare.

For more information on Huawei High-Performance Data Analytics and how Huawei creates new value together with partners, please click here. Pictures courtesy of Huawei.