You don’t want to be a virus in Dr. David Ho’s lab. Pretty much every day since the COVID-19 pandemic began, Ho and his team have done nothing but find ways to stress SARS-CoV-2, the virus that causes the disease. His goal: pressure the virus relentlessly enough that it mutates to survive, so drug developers can understand how the virus might respond to new treatments. As a virologist with decades of experience learning about another obstinate virus, HIV, Ho knows just how to apply that mutation-generating stress, whether by starving the virus, bathing it in antibodies that disrupt its ability to infect cells, or bombarding it with enough promising antiviral drug candidates to make it blink. “We actually have more mutants [of SARS-CoV-2] selected in the lab than I suspect most labs do,” says Ho.
As a result of that work, “we have basically been seeing viral evolution happen in front of our eyes for the past year and a half,” he says. Ho, director of the Aaron Diamond AIDS Research Center at Columbia University, is among the vanguard of researchers aggressively finding ways to dismantle SARS-CoV-2 from the inside out by mining the virus’s genetic code for signs of weakness. The viral genome, it turns out, is an underutilized pool of useful information about the virus’s likes, dislikes and survival strategies, all coded in its 30,000 base pairs. Ho, who built his career around finding ways to control HIV with drugs, once said of understanding that disease, “It’s the virus, stupid.” Among the many lessons that will be taught in public-health classrooms and in genetics labs around the world after the COVID-19 pandemic recedes is a corollary: “It’s the genetics, stupid.”
One of the most powerful ways of fighting a pandemic caused by a never-before-seen virus is by decoding the microbial culprit’s genome. And doing so can, and should, be the top priority of public-health efforts going forward, so scientists can expose how the microbe works, to guide, in real time, the best ways for controlling and ultimately snuffing it out.
Ho’s group was among the first to identify a new mutation in SARS-CoV-2 that was responsible for a growing proportion of new infections diagnosed in New York City in February. He alerted city, state and U.S. Centers for Disease Control and Prevention (CDC) officials, as well as Dr. Anthony Fauci, the White House chief medical adviser on COVID-19. Ho also added these sequences to the public database GISAID (which stands for Global Initiative on Sharing All Influenza Data, reflecting its initial focus on flu), which collects disease-causing genetic codes from researchers around the world. “When we looked at the database, we found these mutations were already there,” Ho says. “It’s just that no one was scrutinizing or interrogating the database on a regular basis.”
That’s already changing, and GISAID is becoming the digital watering hole for public-health, infectious-disease and policy-making experts as concerns about new variants, and what they mean for immunity provided by vaccines, dominate public-health decisions about COVID-19. “People are looking at the database differently from this point on,” says Ho.
That’s also true of genomics more broadly. The COVID-19 pandemic is a hands-on workshop in how genetic information can help us more quickly control a pandemic. Relying on the SARS-CoV-2 code, first made public in January 2020, researchers at academic labs were ready to develop diagnostic tests for the virus within weeks (although regulators were slow to greenlight them). Teams at the National Institute of Allergy and Infectious Diseases and biotech company Moderna, as well as U.S.-based Pfizer and German biotech BioNTech, went to work developing vaccines that rely on a piece of the virus’s genetic code, delivered as mRNA, and set new speed records in coming up with formulas ready to test in people. In under a year, they stunned medical experts when they showed their shots were 94% and 95% efficacious, respectively, in protecting people from symptoms of COVID-19, and the shots became the first COVID-19 vaccines authorized by the U.S. Food and Drug Administration, in December 2020.
Knowing the virus’s genetic footprint, scientists at other pharmaceutical companies developed other types of vaccines, as well as drugs to treat infection. And a year into the pandemic, the same viral genetic blueprint is helping researchers predict how different patients’ immune systems will respond to infection and triage those who might be more prone to getting seriously ill so they can be treated more aggressively early on. It’s also enabling experts like Ho to track deviations in the genetic code that might enable the virus to slip past these drugs and vaccines.
“Genomics and genomic epidemiology have emerged as an incredibly powerful tool in fighting this pandemic,” says Francis deSouza, CEO of Illumina, which makes the genetic-sequencing machines that form the foundation of this field. “And they will be essential to how we fight future biological threats, whether it’s the next coronavirus or antimicrobial resistance or even bioterrorism.”
The COVID-19 response is among the first major pandemic scenarios to benefit from decades of investment in genetic sequencing and genomic science. “In public health, we are using sequencing as a really powerful tool to understand how diseases spread, to understand how diseases and infections behave,” says Dr. Jinlene Chan, deputy secretary for public-health services at the Maryland department of health. However, while the technology and the know-how to stay ahead of viruses like SARS-CoV-2 already exist, the resources to coordinate and exploit that knowledge and expertise remain spotty. Few state public-health departments, for example, were in a position to sequence the samples of SARS-CoV-2 from people in their jurisdiction when the virus made its debut in the winter of 2019–20. “All of the U.S. has been having challenges scaling up genomic surveillance,” says Giovanna Carpi, an assistant professor of biological sciences at Purdue University, whose lab assisted the Indiana state health department in its sequencing efforts. “The infrastructure is not present.”
In April 2021, about a year after the virus began its relentless invasion of the human population, Congress earmarked $1.7 billion for the CDC to ramp up its sequencing efforts to include more consistent genetic sweeps of COVID-19 samples in the country and to fund state and local laboratories to purchase the equipment and hire the staff needed to decode more disease-causing viruses and bacteria more regularly. The funding will seed a stronger genetic-based system of disease reconnaissance, which should pick up new pathogens as well as keep tabs on how quickly these microbes threaten the public’s health.
Currently, the CDC and its partners throughout the country are sequencing 20,000 to 30,000 SARS-CoV-2 genomes a week. It’s more than the 3,000 the agency was sampling at the start of the year, but, experts say, still far from adequate; on average, states are sequencing under 2% of their positive samples.
Researchers, and governments, also still need to figure out a better way to coordinate this effort around the globe. “We haven’t learned this much about any disease so quickly, I would say, in the history of science that I’m aware of,” says Sumit Chanda, the director and a professor of the immunity and pathogenesis program at Sanford Burnham Prebys Medical Discovery Institute. “Genomic technology allowed us to get here. But if we really want to get serious about preparing for the next pandemic, there needs to be a global command and control infrastructure, with transparency from all governments around the world. These viruses don’t know national boundaries, so it does not make sense to have a Balkanized response to the virus.”
“Viruses mutate. It’s what they do,” Fauci likes to remind us. Almost every time a virus makes new copies of itself, it makes a mistake—or a few. The majority of the time, these genetic typos don’t change how the virus acts. But sometimes, by chance, a genetic mutation gives the virus an advantage—it could help the virus infect cells more easily, for example, and thus spread more efficiently from one person to another. Or a mutation might help the virus evade the immune responses generated by vaccines. In such cases, those advantages help the mutant virus outcompete its peers, eventually dominating new frontiers.
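The arithmetic behind that takeover can be sketched in a few lines of Python. This is a toy model, not anything drawn from Ho's or Sanjana's labs: it assumes each infection from the mutant seeds a fixed fraction more new infections per generation than the original virus does, and the function names, the starting share and the size of the advantage are all hypothetical.

```python
def variant_share_over_time(initial_share, advantage, generations):
    """Track a variant's share of all infections, one generation at a time.

    initial_share: fraction of infections caused by the variant at the start.
    advantage: hypothetical relative edge in transmission; 0.5 means each
        variant infection seeds 50% more new infections than the original.
    Returns a list of shares, starting with the initial share.
    """
    shares = [initial_share]
    share = initial_share
    for _ in range(generations):
        variant = share * (1.0 + advantage)   # new infections from the variant
        original = 1.0 - share                # new infections from the original
        share = variant / (variant + original)
        shares.append(share)
    return shares


def generations_to_majority(initial_share, advantage, limit=1000):
    """Return the first generation in which the variant exceeds half of all
    infections, or None if that never happens within `limit` generations."""
    shares = variant_share_over_time(initial_share, advantage, limit)
    for generation, share in enumerate(shares):
        if share > 0.5:
            return generation
    return None
```

With these made-up inputs, calling `generations_to_majority(0.001, 0.5)` asks how long a mutant that starts as one infection in a thousand needs to become the majority. The model ignores chance, immunity and geography, so it only illustrates why even a modest edge compounds quickly.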
That’s what happened soon after SARS-CoV-2 left Wuhan, China, in late 2019. Hitching rides on unsuspecting people traveling from that region to the rest of the world, the virus soon found entire continents of people to infect. That fertile infecting ground pushed the virus to make its first noteworthy mutation, one that helped it to both form a tighter handshake with the cells it was infecting and high-five as many other un-infected human cells as possible as well. “This first variant that spread quickly in early to mid-2020 was just a random mutation—these happen by chance all the time,” says Neville Sanjana, an assistant professor of biology at New York University and the New York Genome Center who has studied SARS-CoV-2’s first major genetic morphing, which experts dubbed D614G, in detail. “Now it’s virtually impossible to get COVID-19 without getting this mutation, so we’ve seen how natural selection and evolution can take a new mutation and bring it to dominance in a population.”
Like a magnet to metal, the spike proteins embedded all around the virus’s surface are attracted to a specific protein receptor lining the surface of human cells. How tightly the virus sticks to this receptor determines in part how infectious the virus is, and certain mutations can affect how closely the virus binds to human cells. Another such mutation, which has since become the dominant strain in new cases of COVID-19 around the world, was discovered in December 2020 by scientists in the Kent region of southeastern U.K., largely because of the country’s highly coordinated genetic surveillance system.
Last November, despite a lockdown across the country, daily case numbers were inching upward in a few areas, including Kent. At the time, health labs were genetically decoding samples from around 5% of positive COVID-19 tests in the country. Researchers at Public Health England noticed that about half the sequences shared similar genetic aberrations, potentially representing a new variant. But they couldn’t be sure whether the changes represented a possible new variant or just random genetic noise.
That’s where the commitment that the U.K. government made in establishing a network of “lighthouse labs” proved invaluable. These local labs test tens of thousands of swabs daily using automated machines trained to detect three genetic signatures of SARS-CoV-2. Oddly, the lighthouse labs in Kent were only detecting two of those signatures in their samples. More complete genomic sequencing of those samples confirmed health officials’ worst fears: they did indeed have a new variant of the virus that had mutated from the D614G strain.
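The screening logic described above, in which a sample is positive on two of the three signatures but silent on the third, can be sketched as a simple filter. This is an illustration under stated assumptions, not the lighthouse labs' software: the target names (`target_1` and so on), the data layout and the function names are all hypothetical.

```python
from collections import Counter


def find_dropout_samples(results):
    """Flag samples in which exactly one of three test targets went undetected.

    results: dict mapping a sample ID to a dict of {target name: detected?}.
    Returns a dict mapping each flagged sample ID to its missing target.
    """
    flagged = {}
    for sample_id, targets in results.items():
        missing = [name for name, detected in targets.items() if not detected]
        # Two of three detected: still a positive test, but with a gap.
        if len(targets) == 3 and len(missing) == 1:
            flagged[sample_id] = missing[0]
    return flagged


def dropout_summary(flagged):
    """Count how often each target is the missing one among flagged samples."""
    return Counter(flagged.values())
```

If one target keeps turning up as the missing one across many samples from the same area, that pattern is a cue to send those samples for full genomic sequencing, which is the step that confirmed the variant in Kent.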
The next step was to understand what the mutation meant. Did the new variant—which they were calling the B.1.1.7—cause more severe symptoms? Did it spread more? For those answers, the results from the high-tech genetic sequencing had to be combined with old-fashioned, boots-on-the-ground (or, increasingly, fingers-on-the-keyboard) epidemiology to match the genetic information with individual cases of disease. Public-health experts connected electronic health records in the National Health Service of anyone who tested positive with contact-tracing information to figure out what proportion of people who came into close contact with someone infected with the B.1.1.7 variant then became infected with it themselves.
Again, the results validated their concerns: it appeared that many who came into contact with people infected with the B.1.1.7 variant also became infected with the same variant, suggesting, says Dr. Tom Frieden, president of Resolve to Save Lives and a former director of the CDC, “with a high degree of confidence that yes, it’s more infectious.”
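The comparison the epidemiologists made, the proportion of close contacts who went on to become infected, grouped by which variant the original case carried, is what the field calls a secondary attack rate. A minimal sketch follows, assuming a simplified record per contact. The `Contact` structure and its field names are hypothetical and do not reflect how National Health Service or contact-tracing data are actually organized.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    index_variant: str      # variant carried by the original (index) case
    became_infected: bool   # whether this close contact later tested positive


def secondary_attack_rates(contacts):
    """Return {variant: share of that variant's close contacts who became infected}."""
    totals = {}
    infected = {}
    for contact in contacts:
        variant = contact.index_variant
        totals[variant] = totals.get(variant, 0) + 1
        if contact.became_infected:
            infected[variant] = infected.get(variant, 0) + 1
    return {v: infected.get(v, 0) / totals[v] for v in totals}
```

A higher rate for one variant than for the others, across enough contacts, is the kind of evidence that supports calling it more infectious. A real analysis would also have to account for differences in age, setting and timing between the groups, which this sketch leaves out.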
Frieden points out that the takeaway from that experience should be that we need to invest both in tech and in people. “Despite the importance of exciting new technology like genetic sequencing, in the end, it comes down to people—do you have the people who are able to collect the data well, analyze it well, interpret it well, disseminate it well and use it well? That’s something you don’t get by spending a ton of money all at once. You get that by building a field.”
The U.K. is emerging as a model for how to construct such a field of genomic disease management. Within months, scientists were able to answer another question about the variant that was on everyone’s mind as more people were getting vaccinated against COVID-19: Would the immunity generated by the vaccines protect against B.1.1.7? The answer, to the relief of public-health officials worldwide, was yes. They tested blood from vaccinated people against lab versions of the B.1.1.7 variant and found that the immune cells present in the vaccinated blood could still neutralize the virus.
That’s been the case with the other major mutations that have contributed to the handful of new variants emerging in the past year—including B.1.351, first identified in South Africa; as well as P.1, first identified in Brazil; and the B.1.617 variants emerging from India.
That said, it might be only a matter of time before variant strains do find a way to evade vaccine-based protection—the more the virus replicates among unvaccinated and unprotected people, the more chances that immunity-evading mutations can pop up. Ho and others are searching for patterns in the virus’s previous mutations to understand in what direction it might morph in the future to ensure that any new COVID-19 treatments are not just effective but also durable.
To create a dynamic map of how the virus is changing, researchers need a deep pool of virus to sequence. “Ideally what you want to do is surveillance sequencing,” says Chanda. “That means going out to hot zones, going into animals, going into the local population, and doing genomic sequencing to see what’s popping up.” The problem is, the public-health labs that would theoretically be doing this work in the U.S. don’t have the resources or expertise to do comprehensive genetic sequencing and read the raw genetic code.
The use of genetics to track disease can be traced to the 1990s, when researchers at the CDC used the most basic DNA-analysis methods to routinely test produce and other food products for bacteria in a national network called PulseNet. Frieden, who at the time was an Epidemic Intelligence Service officer—public health’s version of a disease detective—in New York City’s department of health, conducted one of the CDC’s first genomic infectious disease studies, on an outbreak of drug-resistant tuberculosis in the city. He had to send hundreds of samples to CDC technicians in Atlanta, since New York’s labs weren’t equipped with the proper genetic tools. “I had to fly down to the CDC, and the TB labs at the time were still based in Quonset huts from World War II that had leaking ceilings,” he says. “I had to put 350 images on a big, rutted wooden table and compare them by eye to see if they were similar. It took three weeks.”
By the beginning of the next decade, the human genome was fully sequenced and companies like Illumina had developed sequencing machines that could produce more accurate and detailed maps of any living thing’s genome. The CDC, along with state health departments, began sequencing tuberculosis bacteria and influenza viruses on a regular basis to monitor changes in the pathogens that could hint at more troublesome strains. But the system was still relatively feeble. In 2012, a frank review of the U.S.’s genetic-sequencing capabilities at the time revealed “that there are high school science labs that have more sophisticated genomic tools than the CDC does,” says Frieden, who by then was the agency’s director and commissioned the review. He lobbied Congress for funding, and in 2014 the federal government established the Advanced Molecular Detection program, which relies on high-throughput genetic sequencing to detect and manage outbreaks of infectious disease, with $30 million a year over five years.
That was a start, but the result is still, says Dr. Greg Armstrong, head of the CDC’s molecular detection program, “a very patchy system.” Much of the sequencing in the U.S. occurs in academic labs for research purposes, to better understand diseases from influenza to cancer, or in commercial labs working for pharmaceutical companies to develop smarter drugs to target tumors. “Public health in general has fallen behind in this area,” says Armstrong. “We’ve really been doing a lot of catch-up over the last several years.”
State health labs vary widely in how much genetic-sequencing equipment—and qualified personnel—they have on hand, forcing many to partner with local academic teams to get the job done. The Texas department of health began sequencing SARS-CoV-2 samples last June—but was able to complete only about 50 sequences a week, since there was only one public-health lab in the state capable of conducting such genomic work. So the agency turned to better-equipped academic institutes and private medical centers, including at Baylor College of Medicine, Texas A&M, and Houston Methodist, a hasty stopgap pattern seen in state after state last year.
To build more reliability into these sequencing efforts, the CDC in May 2020 launched the SARS-CoV-2 Sequencing for Public Health Emergency Response, Epidemiology and Surveillance consortium, a group of now more than 200 public-health, academic and commercial labs that agree to sequence COVID-19 samples and share the data on GISAID. Separately, to bolster the public-health contributions, the CDC also asked state and local health departments to send more COVID-19 specimens on a routine basis to the CDC for analysis—the starting ask was five samples every other week. That scaled up to a peak of 750 specimens a week from these public-health labs.
To improve on that, when President Biden took office in January, he proposed a massive investment in public health as part of his American Rescue Plan—and prioritized genomics. The $1.7 billion in funding to the CDC to support the genetic-sequencing network in the U.S. couldn’t come at a more critical time, as public-health experts face the next phase of the pandemic: keeping on top of any new variants and ensuring that vaccines continue to be effective. The bulk of funding will go toward sequencing machines and hiring bioinformatics experts to read and interpret raw genetic data in public-health labs throughout the country. Building that expertise is critical for avoiding the delay of shipping samples to central labs at the CDC.
The Biden plan will also create six Centers of Excellence in Genomic Epidemiology, to further solidify the currently haphazard partnerships between state health departments and academic groups. Those relationships will be essential, says Ho, since “the top-ranked sequencing experts, bioinformatics experts, are largely in academia, and I suspect many may not want to leave their posts.” Some funding will help to create a uniform data system so public-health labs can share and analyze genetic-sequencing information seamlessly.
Private labs will also play a role in expanding the breadth of the sequencing network. Last month, the CDC announced a partnership with North Carolina–based Mako Medical to sequence about 5,000 positive samples of SARS-CoV-2 a week, taken randomly from Mako’s clientele of hospitals, long-term-care facilities, pharmaceutical companies, workplaces and public-health labs in 43 states. With partners like Mako, the CDC is ramping up its ability to sequence any positive samples from places like airports, since travelers are a common vector for introducing new variants of the virus into the country. That strategy is proving useful in the U.K., where sequencing of positive samples from international travelers began in March 2021, with help from commercial labs. “In some ways the sequencing from airports is acting as an early radar system to find out what new variants are spreading around the world,” says Dr. Gareth Williams, a co-founder and the medical director of Oncologica, one of the labs working on the project.
How well other countries learn from the U.K.’s efforts will likely shape the world’s response to the inevitable next eruption of infectious disease. Embedding genomic techniques into the public-health arsenal won’t, on its own, prevent pandemics, but together with proven measures like hand hygiene, social distancing and mask-wearing, it could help contain them and minimize their toll on human health. “We got pretty lucky that [COVID-19] vaccines work as incredibly well as they do,” says Sanford Burnham’s Chanda. “But we can’t just rely on luck. We need to make a global commitment and come up with an organization that has some teeth and has some funding whose job it is to survey, track and share genetic information. We have the tools to do it. We just need the will and leadership and especially the public to demand that the devastation of COVID-19 is something that shouldn’t have happened and that we never want to have happen again.”
—With reporting by Madeline Roache and Simmone Shah