Abstract: With the current accumulation of metagenome data, it is possible to build an integrated platform for processing rigorously selected metagenomic samples of interest (also referred to here as "metagenomic communities"). Any metagenomic sample could then be searched against this database to find the most similar sample(s). However, current databases that hold a large number of metagenomic samples mostly serve as data repositories rather than well-annotated databases, and offer only a few functions for analysis. In addition, the few available methods for measuring the similarity of metagenomic data can compare only a small, pre-defined set of metagenomes. Scientists have long sought to efficiently calculate similarities between microbial communities in a large repository, to examine how similar these samples are, and to find correlations with the meta-information of these samples. In this work we propose a novel system, Meta-Mesh, which comprises a metagenomic database and a companion analysis platform that can systematically and efficiently analyze and compare metagenomic samples and search for similar ones. For the database, we have collected more than 7,000 high-quality, well-annotated metagenomic samples from the public domain and in-house facilities. The analysis platform provides a set of online tools that accept metagenomic samples, build taxonomical annotations, compare samples from multiple angles, and then search for similar samples in the database using a fast indexing strategy and a scoring function. We also present two case studies, "database search for identification" and "sample clustering based on a similarity matrix", using samples from human-associated habitats to demonstrate the performance of Meta-Mesh in metagenomic analysis. Meta-Mesh can therefore serve as a database and data analysis system for quickly parsing and identifying similar metagenomic samples from a large pool of well-annotated samples.
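To make the idea of a similarity-based database search concrete, the following is a minimal sketch, not the Meta-Mesh implementation. The abstract does not specify the scoring function or the indexing strategy, so this sketch assumes a Bray-Curtis similarity over taxonomic relative-abundance profiles and a plain linear scan in place of an index. The names `Profile`, `bray_curtis_similarity`, and `search_similar` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a sample is summarized as taxon name -> relative abundance.
Profile = Dict[str, float]


def bray_curtis_similarity(a: Profile, b: Profile) -> float:
    """Return 1 - Bray-Curtis dissimilarity (1.0 = identical, 0.0 = no shared taxa).

    This is a stand-in scoring function, not the one used by Meta-Mesh.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total > 0 else 0.0


def search_similar(
    query: Profile, database: Dict[str, Profile], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Score the query against every database sample and return the top_k hits.

    A linear scan is used here for clarity; a real system would use an index
    to avoid scoring every sample.
    """
    scored = [
        (sample_id, bray_curtis_similarity(query, profile))
        for sample_id, profile in database.items()
    ]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:top_k]
```

Scoring every pair of samples with the same function would yield the kind of similarity matrix on which the clustering case study relies.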