This training course is dedicated to the analysis of prokaryotic
shotgun metagenomic data produced by Illumina sequencing technology.
We will present the bioinformatics steps needed to clean the raw data,
characterize them taxonomically, and compare them by their word (k-mer)
content. We will then cover the different strategies for obtaining
counts on predicted genes. Finally, we will present a few tools for
obtaining a functional annotation of the samples.

By the end of the 2 days of training, trainees will know the scope,
advantages, and limitations of shotgun sequencing data analyses. They
will be able to use the tools presented on the training datasets and to
identify the tools and methods suited to their own analyses.
Reuse
Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".