Inferentus Launching October 1, 2025!
Data analyses à la carte

Gene-centric metagenomics

Code: METAGC01

Short-read gene-centric metagenomics, including contig assembly, detection and functional annotation of protein-coding sequences, gene abundance profiles (KEGG), pairwise dissimilarity metrics, MDS and various visualizations. The client provides raw shotgun metagenomic sequences and sample metadata.


▾ General introduction

Metagenomic sequencing is a well-established and powerful technique in modern microbiology, in which random DNA fragments obtained from a biological sample are sequenced and analyzed. These DNA fragments generally originate from many different taxa and from all genomic regions, thus offering a deep view of a microbial community. Metagenomics has been used to study microbial communities in virtually all ecosystems, ranging from the deep ocean and subsurface sediments to soil, bioreactors and the human gut.

In gene-centric metagenomics, one specifically focuses on the detection and identification of protein-coding genes, or fragments of such genes, in order to determine the likely biochemical functions or metabolic potential of resident microbes. This information becomes very powerful when combined with experimental treatments, or surveys across space or time. For example, one can use gene-centric metagenomics to determine the proportions of microbes capable of photosynthesis, iron respiration or denitrification, or the prevalence of antibiotic resistance genes, separately in each sample. One could then examine whether these variables change across an environmental gradient, or between patient treatment groups. It is important to note that these "functional profiles" summarize the overall microbial community in each sample, and do not resolve which taxon encodes which biochemical function. For many studies this level of information is sufficient. If you need to identify the individual taxa involved in specific biochemical functions, you can instead use genome-resolved metagenomics, which is a separate analysis that we also offer.
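
As a rough illustration of what a "functional profile" is, the following Python sketch converts per-sample gene counts into proportions. The sample names, category labels and counts are hypothetical and chosen only for illustration. Treating the share of annotated gene counts as a proxy for the share of microbes carrying a function is a simplifying assumption, not a description of the actual pipeline.

```python
# Hypothetical counts of annotated genes per functional category and sample.
# All names and numbers are made up for illustration.
counts = {
    "sample_A": {"photosynthesis": 30, "denitrification": 10, "other": 60},
    "sample_B": {"photosynthesis": 5, "denitrification": 45, "other": 50},
}


def functional_profile(sample_counts):
    """Convert raw gene counts into proportions that sum to 1."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample has no counted genes")
    return {category: n / total for category, n in sample_counts.items()}


# One profile per sample; profiles describe the whole community,
# not which taxon carries which function.
profiles = {name: functional_profile(c) for name, c in counts.items()}

for name, profile in profiles.items():
    print(name, profile)
```

Profiles of this kind can then be compared across an environmental gradient or between treatment groups.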

A typical gene-centric metagenomic study proceeds as follows:

  • Collection of small amounts (<1 g) of material from each sample by the researcher.
  • Extraction of DNA from each sample using an in-house or commercial kit. This step is sometimes outsourced to an academic or commercial service provider.
  • DNA fragment size selection, library preparation and sequencing of the fragments. This step is commonly performed by an academic or commercial service provider. The most widespread technology is short-read Illumina sequencing, which yields large numbers of sequences around 150-300 bp long.
  • Sequencing ultimately yields a separate set of DNA sequences for each sample, ranging from thousands to billions of sequences per sample, with each sequence covering some random part of some genome. These data are commonly stored in fastq files, which are delivered by the sequencing service provider to the researcher.
  • Computational analysis of the sequences, including trimming and removal of poor-quality (i.e., likely erroneous) sequences, assembly of sequences into longer contiguous segments ("contigs"), detection and annotation of protein-coding genes, and estimation of gene abundances based on reads mapped to contigs.
  • Statistical analysis, hypothesis testing and visualization of gene/functional profiles. This step generally incorporates additional sample metadata, such as information about treatment groups, chemical measurements at each site, disease symptoms in human subjects, and so on.
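
The statistical step above typically starts from pairwise dissimilarities between the gene abundance profiles of different samples. The Python sketch below shows one common choice, the Bray-Curtis dissimilarity. This metric is an assumption made for illustration, since the text does not name a specific metric, and the site names, gene labels and abundances are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts).

    Returns 0 for identical profiles and 1 for profiles sharing no genes.
    """
    genes = set(u) | set(v)
    numerator = sum(abs(u.get(g, 0) - v.get(g, 0)) for g in genes)
    denominator = sum(u.get(g, 0) + v.get(g, 0) for g in genes)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


def dissimilarity_matrix(profiles):
    """Return sorted sample names and the symmetric pairwise matrix."""
    names = sorted(profiles)
    matrix = [[bray_curtis(profiles[a], profiles[b]) for b in names]
              for a in names]
    return names, matrix


# Hypothetical gene abundances per sample (made-up values).
abundances = {
    "site_1": {"gene_1": 10, "gene_2": 0, "gene_3": 30},
    "site_2": {"gene_1": 10, "gene_2": 20, "gene_3": 10},
    "site_3": {"gene_1": 10, "gene_2": 0, "gene_3": 30},
}

names, matrix = dissimilarity_matrix(abundances)
for name, row in zip(names, matrix):
    print(name, row)
```

A matrix of this kind is the usual input to ordination methods such as MDS, which place samples with similar profiles close together in a low-dimensional plot.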

We are eager to help you with your metagenomic analysis. Simply configure the analysis to your preferences, upload your raw sequences and metadata, and we can handle it from there.

▸ Overview of provided analysis
▸ Input requirements
▸ Examples of data products
▸ Examples of generated figures
▸ Third-party resources used
▸ Relevant publications
▸ Price and billing