Massively parallel profiling reveals thousands of hidden viral proteins and conserved mechanisms
Scientists at the Broad Institute of MIT and Harvard have identified 4,208 previously unannotated open reading frames (ORFs) across 679 human-associated viral genomes using a high-throughput method called massively parallel ribosome profiling (MPRP).
Many viruses carry genetic instructions that elude detection by standard genome annotation tools. Beyond the well-documented protein-coding regions, viral genomes often encode noncanonical ORFs, short or irregular sequences that may not start with the standard ATG codon.
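To make the idea of a noncanonical ORF concrete, the short Python sketch below enumerates candidate ORFs on a single strand while allowing near-cognate start codons, meaning codons that differ from ATG at one position. It is only an illustration, not the study's pipeline; the function names, the `min_codons` cutoff, and the example sequence are hypothetical.

```python
# Illustrative only: enumerate candidate ORFs on one strand, allowing
# near-cognate start codons. This is NOT the method used in the study.
STOPS = {"TAA", "TAG", "TGA"}


def near_cognate_starts():
    """Return ATG plus every codon differing from ATG at exactly one position."""
    starts = {"ATG"}
    for i in range(3):
        for base in "ACGT":
            starts.add("ATG"[:i] + base + "ATG"[i + 1:])
    return starts - STOPS


def find_orfs(seq, starts, min_codons=2):
    """Return (start, end, start_codon) for each start codon that is followed
    in-frame by a stop codon. `end` is exclusive and includes the stop codon.
    `min_codons` (a hypothetical cutoff) counts codons before the stop."""
    seq = seq.upper()
    orfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] not in starts:
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOPS:
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3, seq[i:i + 3]))
                break
    return orfs


example = "GGCTGAAACCCTAAGG"  # hypothetical sequence with a CTG-initiated ORF
canonical_only = find_orfs(example, {"ATG"})
with_near_cognates = find_orfs(example, near_cognate_starts())
```

In this toy sequence a scan restricted to ATG finds nothing, while the relaxed scan reports one CTG-initiated ORF. On real genomes a relaxed scan returns many candidates that are never translated, which is one reason experimental evidence is needed.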
These elusive sequences are known to impact viral replication, host immune responses, and gene regulation, yet their detection requires experimental methods beyond current computational tools. Existing approaches, including traditional ribosome profiling, have been constrained by the need for pathogen-specific culturing systems and biosafety-level facilities.
In the study, "Pan-viral ORFs discovery using massively parallel ribosome profiling," published in Science, researchers developed MPRP to identify translated regions by profiling synthetic viral sequences across thousands of oligonucleotides.
The analysis encompassed 20,170 oligonucleotides derived from 679 viral genomes, expressed in human HEK293T and A549 cell lines. Each 200-nucleotide fragment represented either wild-type or modified sequences targeting the 5′ untranslated regions (UTRs) and the beginnings of annotated coding regions.
Viral expression was driven by both cap-dependent and IRES-dependent mechanisms under conditions that mimicked viral infection stress.
Reproducibility was high both within cell-type replicates (Pearson's R = 0.92 between two HEK293T runs) and between HEK293T and A549 cells (R = 0.89). MPRP detected 5,381 translated ORFs in total, of which 4,208 were previously unannotated non-canonical ORFs.
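For readers unfamiliar with the statistic, Pearson's R between two replicates can be computed from paired per-oligo measurements as in the minimal sketch below. The article does not say how the measurements were normalized or transformed before correlation, so the inputs here are simply assumed to be two equal-length lists of numbers, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of measurements,
    e.g. per-oligo footprint signal in two replicates (assumed inputs)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Values near 1, such as the 0.92 and 0.89 reported here, indicate that oligos with high signal in one run also tend to have high signal in the other.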
Ribosome-footprint profiles were strongly concentrated at the experimentally inferred initiation codons—including many non-AUG starts—and displayed the expected trinucleotide periodicity.
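Trinucleotide periodicity means that footprints pile up in one reading frame relative to the start codon, because translating ribosomes move one codon at a time. A minimal way to quantify it, assuming footprint reads have already been reduced to P-site positions (an offset-correction step not described in the article), is to count how many positions fall in each frame. The function and variable names below are hypothetical.

```python
from collections import Counter


def frame_fractions(psite_positions, orf_start):
    """Fraction of footprint P-site positions in frames 0, 1 and 2,
    measured relative to a candidate ORF start coordinate."""
    counts = Counter((pos - orf_start) % 3 for pos in psite_positions)
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts[frame] / total for frame in range(3))


# Hypothetical P-site positions for an ORF starting at coordinate 10.
fractions = frame_fractions([10, 13, 16, 19, 11, 22, 25, 28], orf_start=10)
```

A strong excess in frame 0 is the signature expected of genuine translation; roughly equal thirds would suggest background signal rather than translating ribosomes.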
To pinpoint hidden start sites, each gene was synthesized as three distinct 200-nt oligos: a wild-type fragment spanning the 5′UTR and the start of the coding sequence, an identical fragment with the annotated AUG mutated to GCC, and an upstream-extended fragment. A loss of footprint signal in the AUG→GCC mutant confirms the true start location.
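The three-oligo design can be sketched in a few lines of Python. The article gives only the fragment length (200 nt) and the AUG→GCC substitution; the window placement (`utr_len`), the size of the upstream shift (`extension`), and the function name are assumptions made for illustration. Sequences are written in the DNA alphabet, so the start codon appears as ATG.

```python
def design_oligos(genome, start_pos, length=200, utr_len=100, extension=50):
    """Build three oligo variants around an annotated start codon.

    `utr_len` and `extension` are hypothetical parameters: the source does
    not state how the 200-nt window was positioned or how far upstream the
    extended fragment reaches.
    """
    if genome[start_pos:start_pos + 3] != "ATG":
        raise ValueError("start_pos does not point at an ATG codon")
    left = start_pos - utr_len
    if left - extension < 0 or left + length > len(genome):
        raise ValueError("window falls outside the genome sequence")

    wild_type = genome[left:left + length]
    offset = start_pos - left  # position of the ATG inside the oligo
    # Same fragment with the annotated start codon replaced by GCC.
    mutant = wild_type[:offset] + "GCC" + wild_type[offset + 3:]
    # Same-length window shifted upstream to capture more 5' sequence.
    extended = genome[left - extension:left - extension + length]
    return {"wild_type": wild_type, "start_mutant": mutant,
            "upstream_extended": extended}


# Hypothetical toy genome: 150 nt of upstream sequence, ATG, then 200 nt.
toy_genome = "C" * 150 + "ATG" + "A" * 200
oligos = design_oligos(toy_genome, start_pos=150)
```

Comparing footprint signal between the `wild_type` and `start_mutant` oligos is what lets the start codon be tested directly, without needing the live virus.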
Screening also uncovered hundreds of upstream ORFs (uORFs) with pronounced ribosome-stalling peaks consistent with translational repression of the downstream main coding sequences.
Comparison with natural viral infection datasets revealed a strong overlap in ribosome footprints, with synthetic profiling accurately reproducing translation patterns from viruses such as influenza A and hepatitis C. The technique also exposed an internal ORF in the +1 frame of the influenza M1 gene in multiple strains, including the H5N1 bird flu virus linked to recent livestock outbreaks.
Further analysis linked translated noncanonical ORFs to immune recognition. In reanalyzed immunopeptidome datasets, seven peptides derived from newly discovered ORFs were detected on class-I human leukocyte antigen (HLA-I) complexes. Six of these were predicted to bind effectively to host immune receptors. Including noncanonical ORFs increased the total number of mapped HCMV peptides by 7.4%, and several outperformed canonical proteins in peptide yield per length.
To explore how upstream ORFs might modulate viral protein expression, the researchers repeated experiments under conditions that elevate eIF2α phosphorylation, a known translational regulator during stress.
In cells treated with sodium arsenite (to induce phosphorylation of eIF2α), ribosomes bypassed many uORFs and shifted translation toward the main coding sequences. This behavior aligns with a well-characterized stress response in which phosphorylated eIF2α reduces the likelihood of translation initiation at inhibitory uORFs, allowing ribosomes to scan further downstream.
The finding hints at a conserved mechanism by which viral gene expression may be temporally regulated through host stress pathways.
By encoding uORFs that respond to eIF2α phosphorylation, viruses may synchronize the translation of key proteins with specific phases of the host's cellular state, potentially delaying protein production until immune defenses are suppressed or replication machinery is fully available.
This suggests that uORFs are not merely incidental features of viral genomes but may serve as regulatory modules shaped by selective pressures to exploit host translational control systems.
Researchers conclude that MPRP offers a rapid, scalable method for identifying translated regions in viruses, including those that are difficult to culture or require high-containment facilities. By exposing unrecognized viral proteins, the method opens new directions for understanding immune responses and gene regulation and for accelerating vaccine design.
"Within a few weeks, MPRP can detect ORFs in a newly discovered virus, independently of its culturing conditions," the authors write, further suggesting that "...incorporation of noncanonical ORFs into T cell assays has the potential to enhance their sensitivity and facilitate the identification of vaccine targets."
More information: Weingarten-Gabbay et al., Pan-viral ORFs discovery using massively parallel ribosome profiling, Science (2025).
Journal information: Science
© 2025 Science X Network