Vietnam’s success in sequencing coronavirus genomes without an international reference genome

Vietnamese scientists at the Vietnam Academy of Science and Technology (VAST) have successfully developed a technical process for sequencing the SARS-CoV-2 virus using the PacBio Sequel, a new-generation sequencing machine.

Operating the new generation PacBio Sequel gene sequencing system. (Photo: NDO/Thanh Quy)

This is the result of a project on whole-genome sequencing and de novo assembly of the SARS-CoV-2 virus that causes COVID-19, carried out by the VAST’s Institute of Biotechnology (IBT) to support the fight against COVID-19 in Vietnam.

PacBio Sequel is a state-of-the-art new-generation gene sequencing system, the only one of its kind in Vietnam. It can sequence long DNA fragments faster and more accurately than other devices available in the country.

To carry out the task, the IBT coordinated with the Pasteur Institute in Ho Chi Minh City and the National Institute of Hygiene and Epidemiology (NIHE) to develop a technical process for sequencing the entire SARS-CoV-2 genome using PacBio Sequel sequencing technology.

The project was completed within a year and produced a six-step process for sequencing RNA virus genomes that takes only about 48 hours, compared with about 72 hours for other methods.

The study analysed four virus samples. One was isolated by the HCMC Pasteur Institute from a Vietnamese patient who had returned from Pennsylvania, USA, and entered HCMC on March 17, 2020. The remaining three were provided by NIHE and came from the outbreak at Bach Mai Hospital, collected on March 25 and 28, 2020.

As a result, the whole genomes of four SARS-CoV-2 strains, each over 29,500 nucleotides long, were sequenced, with the genomes assembled into continuous sequences (contigs) at an accuracy of 99.99%.
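To put the accuracy figure in perspective, the short calculation below estimates how many nucleotide positions per genome could still be wrong at that level. It is only a back-of-the-envelope sketch: it assumes that the 99.99% figure is a per-base accuracy and takes 29,500 as the genome length, and the function name is purely illustrative.

```python
def expected_discrepancies(genome_length: int, accuracy: float) -> float:
    """Expected number of incorrect bases, assuming accuracy is measured per base."""
    return genome_length * (1.0 - accuracy)


# Figures taken from the article: about 29,500 nucleotides, 99.99% accuracy.
print(round(expected_discrepancies(29_500, 0.9999), 2))
```

Under these assumptions, the result is roughly three uncertain positions across the whole genome.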

This marks the first time in Vietnam that a genome assembly process has not depended on an international reference genome. Other methods currently used at research institutes in the country must rely on international reference sequences.
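The idea behind assembly without a reference genome (de novo assembly) can be illustrated with a toy example: overlapping reads are stitched together using only the overlaps between them, with no template to align against. The sketch below is a deliberately simplified greedy overlap merger written for illustration. It is not the IBT's pipeline, which the article does not describe in detail, and real long-read assemblers also handle sequencing errors, reverse strands and repeats. All function names and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap.

    Toy illustration only: assumes error-free reads from a single strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Three made-up overlapping reads; no reference sequence is supplied.
reads = ["ATGGCGTAC", "CGTACGTTA", "CGTTAGC"]
print(greedy_assemble(reads))
```

A reference-based method would instead align each read to a known genome, which is not possible for a previously unseen pathogen.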

Analysis of sequences uploaded to GISAID, the database of the Global Initiative on Sharing All Influenza Data, up to August 25, 2020, showed six virus groups present in Vietnam: L, S, V, G, GR and GH.

The analysis also shows that the strain provided by the Pasteur Institute in HCMC belongs to the GH group, which circulates mainly in North America. The three strains provided by NIHE belong to the GR group, indicating European origin, possibly transmitted through a large wave of arrivals in Hanoi in early March 2020.

A comparison of the genome sequences of virus strains circulating in Vietnam up to April 1, 2021, shows eight groups of SARS-CoV-2 in the country under the GISAID classification (S, L, V, G, GR, GH, GV and GRY), with dozens of different variations.

According to the IBT, the genome sequencing data from this research will help determine the origin of the coronavirus and the number of sources of infection within outbreaks. This provides a scientific basis and important information for formulating strategies and plans to effectively prevent and control the spread of the virus in the community.

The successful application of the PacBio Sequel system's long-read sequencing technique to the SARS-CoV-2 virus opens up the possibility of rapid and accurate viral genome sequencing without relying on international reference genome sequences. This will allow Vietnamese scientists to sequence new viral pathogens in the future without needing a reference genome.

Following this success, VAST is ready to cooperate with other medical institutions to sequence SARS-CoV-2 genomes on a large scale in urgent cases.