SeqHub
Company | November 20, 2025 | SeqHub Team | 5 min read

Partnering with JGI's Antonio Camargo to Make Hyper-Prevalent Gut Microbes Discoverable

We're excited to share that Antonio Camargo and the Joint Genome Institute (JGI) have made their latest dataset — UHGV-HyperPrevalent — publicly available on SeqHub:

🔗 seqhub.org/apcamargo/collections/uhgv-hyperprevalent

This dataset accompanies their recent preprint, A genomic atlas of the human gut virome elucidates genetic factors shaping host interactions, which identifies a limited set of viruses of bacteria, called bacteriophages, that are widespread across the microbiomes of diverse human populations.

A continuing collaboration

SeqHub first partnered with JGI to build the OMG and OG datasets (the foundation for Tatta Bio's genomic language models), enabling scientists to search and reuse thousands of curated microbial genomes. Building on that collaboration, the new UHGV-HyperPrevalent release makes it even easier to explore how population-level diversity shapes the functional landscape of the human gut.

About the dataset

The UHGV-HyperPrevalent collection contains proteins encoded by 328 phage genomes that are hyper-prevalent across global human gut metagenomes.

These genomes represent a focused subset of the Unified Human Gastrointestinal Virome (UHGV), a comprehensive catalogue of viral genomes assembled from gut metagenomes spanning diverse human populations.

Unlike most phages, these hyper-prevalent viruses were detected in metagenomic samples from multiple countries and host backgrounds. In the associated study, they were shown to have greater host-switching capacity, allowing them to adapt to different ecological niches and persist across microbiomes with distinct compositions. Much of this capacity is determined by phage-encoded proteins, underscoring how reliable protein annotations can help us understand large-scale ecological observations, such as host range and global prevalence.

Why this matters

Most reference catalogues of the human gut microbiome still capture only a fraction of its diversity. By highlighting hundreds of high-prevalence yet under-represented phages, this work expands our understanding of viral ecology, host interactions, and the unseen genetic diversity shaping gut communities.

Equally important, this release advances open science — pairing a preprint with a fully reusable dataset. By making the underlying genomes and annotations available on SeqHub, the JGI team ensures that others can search, reuse, and build upon their work immediately. It's a model for how transparent data sharing accelerates discovery and strengthens reproducibility across microbiome research.

Explore the data

You can now:

  • Search, filter, and annotate relevant proteins and whole genomes
  • View predicted structures, gene neighborhoods, and additional relevant literature
  • Reuse the data for your own comparative or functional analyses

Access the Dataset and Research

Explore 328 hyper-prevalent phage genomes from the human gut virome and read the accompanying preprint.

Ready to share your datasets?

Join JGI and a growing community of scientists using SeqHub to make their sequence data more discoverable and impactful.
