FoodSeq

There’s a lot to learn from stool

Obtaining accurate, objective dietary data is challenging. Dietary surveys are burdensome and costly, are subject to recall and social desirability bias, and often fail to account for rare, culturally important food items. To overcome these limitations of traditional dietary assessment, the David Lab has pioneered the use of DNA sequencing to track human diet. FoodSeq is a technique that applies DNA metabarcoding to human stool to obtain objective information about the food taxa consumed. We amplify and sequence plant and animal DNA from human stool using primers that target the chloroplast trnL intron and the mitochondrial 12S rRNA gene, respectively. We use human-targeting blocking primers to reduce detection of human mitochondrial DNA. We enumerate the food species consumed by mapping DNA sequences to a custom database, curated by our lab, of reference sequences from more than 450 plant and animal species and varieties.
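The mapping step can be illustrated with a minimal Python sketch. It assumes reads have already been primer-trimmed, denoised, and merged so that each read spans the full marker, and it assigns a read to a taxon only on an exact match to a reference sequence. The actual FoodSeq pipeline may handle partial or ambiguous matches differently. The reference entries are hypothetical placeholders, not real trnL or 12S sequences, and `assign_reads` and `taxon_richness` are illustrative names, not functions from the lab's code. Counting distinct assigned taxa gives one simple measure of dietary diversity.

```python
from collections import Counter
from typing import Iterable


def assign_reads(reads: Iterable[str], reference: dict[str, str]) -> Counter:
    """Count reads per food taxon by exact sequence match (hypothetical helper).

    Reference keys are assumed to be uppercase; reads are uppercased before
    lookup. Reads with no exact match are tallied under "unassigned".
    """
    counts: Counter = Counter()
    for read in reads:
        counts[reference.get(read.upper(), "unassigned")] += 1
    return counts


def taxon_richness(counts: Counter, min_reads: int = 1) -> int:
    """Number of distinct assigned taxa supported by at least `min_reads` reads."""
    return sum(
        1
        for taxon, n in counts.items()
        if taxon != "unassigned" and n >= min_reads
    )


if __name__ == "__main__":
    # Placeholder sequences and taxon labels, not real marker sequences.
    reference = {
        "ACGTTGCA": "taxon_A",
        "GGCCTTAA": "taxon_B",
        "TTAACCGG": "taxon_C",
    }
    reads = ["ACGTTGCA", "acgttgca", "GGCCTTAA", "CCCCCCCC"]
    counts = assign_reads(reads, reference)
    print(counts)
    print(taxon_richness(counts))
```

Here two reads match `taxon_A`, one matches `taxon_B`, and one is unassigned, so the richness is 2. Raising `min_reads` is one way to discount taxa supported by very few reads.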

With FoodSeq, the David Lab has derived objective dietary information from hundreds of stool samples collected, both domestically and internationally, for a variety of purposes, including microbiome research, cancer research, nutritional epidemiology, and food-as-medicine interventions. We have successfully used FoodSeq in infant stool samples [and in samples from elderly patients], and we have demonstrated its utility in populations that are difficult to survey using traditional dietary intake methods. FoodSeq has revealed distinct dietary patterns among various US populations, as well as dietary patterns unique to dozens of other countries. In some cases where de-identified clinical and demographic data from participants are available, FoodSeq has revealed associations between certain dietary patterns and an increased likelihood of disease.

FoodSeq has key conceptual advantages:

(1) It is robust to differences in socioeconomic status and literacy.

(2) It yields intuitive relationships with food species (rather than invisible nutrients), and yet is naturally standardized across languages and cultures using the universal language of DNA.

(3) It is non-invasive and economical: we estimate our current cost at $60 per sample, which is less than the cost of having an interviewer administer a traditional 24-hour dietary recall.

Would you like to use FoodSeq to measure dietary diversity in a set of individuals? Below is an overview of our pipeline, which other labs can use to apply this technology in their own research. We are also happy to explore potential collaborations. For more information, email us at foodseq@duke.edu to discuss how FoodSeq could assess dietary intake in your research context.

trnL-pipeline

This repository is designed to support assessment of plant dietary intake directly from human stool samples using DNA metabarcoding with the trnL-P6(UAA) marker[1]. It includes:

An experimental protocol to amplify trnL from DNA extracted from human fecal samples (details in trnL-pipeline/protocol)
A computational pipeline to analyze high-throughput trnL amplicon sequencing data (in trnL-pipeline/pipeline)
A reference database of trnL sequences for food plants used by humans to assign trnL reads to a plant taxon (in trnL-pipeline/reference)
An R package of convenience functions written by the David Lab to facilitate running steps in the computational pipeline and subsequent analysis (hosted at https://github.com/ammararuby/MButils)

These methods were developed by the David Lab at Duke University and accompany the manuscript “Diversity of plant DNA in stool is linked to dietary quality, age, and household income”[2].

Recommendations for sequencing: a 2x150 bp (300-cycle) run ensures sufficient overlap between forward and reverse reads to assemble every trnL sequence in the reference. A 2x75 bp (150-cycle) run enables assembly of trnL reads from all but one food plant in the reference: the water chestnut, Eleocharis dulcis, whose unusually long 154 bp trnL-P6 sequence lies 7 standard deviations above the mean reference sequence length.
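The read-length recommendation follows from simple arithmetic: paired reads of length L from an amplicon of length A overlap by 2L − A bases, and merging requires a positive overlap of some minimum size. The Python sketch below assumes `amplicon_length` is the stretch the two reads must jointly cover (whether that includes primer sites depends on your library design) and uses a 12 bp minimum overlap as a stand-in. The value 12 mirrors the default `minOverlap` of DADA2's `mergePairs`, but treat it as an assumption and substitute your merging tool's setting. `length_z_score` shows how an outlier such as the 154 bp E. dulcis sequence can be expressed in standard deviations from the mean reference length. All function names here are hypothetical.

```python
from statistics import mean, stdev


def expected_overlap(read_length: int, amplicon_length: int) -> int:
    """Bases shared by the forward and reverse reads of a 2 x read_length run.

    Zero or negative values mean the two reads never meet. Assumes
    read_length <= amplicon_length; longer reads run into adapter sequence,
    which this sketch ignores.
    """
    return 2 * read_length - amplicon_length


def can_merge(read_length: int, amplicon_length: int, min_overlap: int = 12) -> bool:
    """True if the paired reads overlap by at least `min_overlap` bases.

    The default of 12 is an assumption (it mirrors DADA2's mergePairs default).
    """
    return expected_overlap(read_length, amplicon_length) >= min_overlap


def length_z_score(length: int, reference_lengths: list[int]) -> float:
    """Sample standard deviations between `length` and the mean reference length."""
    return (length - mean(reference_lengths)) / stdev(reference_lengths)


if __name__ == "__main__":
    for read_length in (150, 75):
        overlap = expected_overlap(read_length, 154)
        mergeable = can_merge(read_length, 154)
        print(f"2x{read_length}: overlap {overlap} bp, mergeable={mergeable}")
```

With 2x150 bp reads, a 154 bp sequence yields a 146 bp overlap; with 2x75 bp reads the overlap is −4 bp, so the pair cannot be merged.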