Products Content

The NextBio platform offers an unparalleled opportunity to make discoveries by integrating information from multiple life sciences knowledge domains. These domains include high-throughput experimental data, text-based resources, extensive gene and compound annotations, and knowledge-derived entity sets (biogroups). A common semantic framework and a powerful data correlation engine enable users to quickly find associations relevant to their areas of research. By placing their proprietary data alongside public domain data in NextBio, organizations gain a powerful new means of extracting actionable knowledge. Below we provide more detail on the content that makes NextBio such a powerful resource.

Experimental Data

NextBio's integrated database contains publicly available data from a variety of sources, including GEO, caBIG, and ArrayExpress, among others. Results from over 5,000 RNA microarray studies have been added to the platform to date. For Professional and Enterprise users, content is enriched by data from other high-throughput technologies, such as genome-wide association studies (GWAS), ChIP-seq, ChIP-chip, microRNA, DNA methylation, copy-number variation and more. Gene expression and other data entering NextBio undergoes rigorous quality control, passes through our semi-automated processing pipeline, and is then tagged. All data within NextBio is correlated, and billions of correlations are precomputed, enabling instant access to all studies and associations. In total, NextBio currently contains over 35,000 study results and over 6 billion correlated data points spanning multiple data types, diverse therapeutic areas, and many disease and normal tissue states.

Literature, News, and Clinical Trials

NextBio differentiates itself from other biomedical literature search tools by casting a wider net. It delivers results from three text-based resources (the biomedical literature, clinical trials, and health and science news) and presents them alongside matches found in experimental studies. Using a comprehensive ontological framework, NextBio searches abstracts from over 19 million citations, including all full-text content available in PubMed Central, over 87,000 clinical trial listings, and aggregated science news pulled from hundreds of sources. NextBio then displays the findings from its entity extraction algorithms so that users can further refine results by filtering on associated terms. With this enhanced search, NextBio users can answer a broader array of questions and keep more fully abreast of their field of study.


Biogroups

Biogroups are collections of genes or proteins sharing a specific biological function or membership in a common pathway. NextBio contains over 38,000 biogroups to correlate against, enabling researchers to quickly and thoroughly interpret their own findings. By identifying the commonalities among the genes or proteins in their own list of results, researchers get a better picture of the larger cellular events at play in their phenomena of interest. For example, by doing a bioset-to-biogroup analysis, an investigator might determine that their result list is enriched in genes regulated by a specific transcription factor or in proteins comprising the extracellular matrix. Two primary sources of the canonical gene lists represented as biogroups are the Gene Ontology Consortium and the Broad Institute's MSigDB.
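
To make the idea of a bioset-to-biogroup analysis concrete, here is a minimal Python sketch of a generic over-representation test based on the hypergeometric distribution. This is an illustration only and is not a description of NextBio's own scoring method, which is not specified here. The function names, the gene identifiers, and the notion of a fixed gene "universe" are all assumptions made for the example.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from a
    universe of N genes, K of which belong to the biogroup."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def enrichment(bioset, biogroup, universe):
    """Return the overlapping genes and a one-sided enrichment p-value.

    bioset   -- hypothetical list of genes from the user's own results
    biogroup -- hypothetical canonical gene list (e.g. a pathway)
    universe -- all genes that could have appeared in the bioset
    """
    universe = set(universe)
    bioset = set(bioset) & universe
    biogroup = set(biogroup) & universe
    overlap = bioset & biogroup
    p = hypergeom_tail(len(overlap), len(universe), len(biogroup), len(bioset))
    return sorted(overlap), p


# Toy data: 20 placeholder genes, a 5-gene biogroup, a 4-gene result list.
universe = [f"g{i}" for i in range(20)]
biogroup = ["g0", "g1", "g2", "g3", "g4"]
bioset = ["g0", "g1", "g2", "g10"]

shared, p_value = enrichment(bioset, biogroup, universe)
print(shared, p_value)
```

A small p-value suggests the result list shares more genes with the biogroup than chance alone would explain. A real analysis would also correct for testing against many biogroups at once.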


Ontologies

NextBio recognizes all synonyms for genes, diseases, and drugs, so that all relevant information is presented for every query. NextBio incorporates sets of ontologies that include 65,000 disease concepts, 8 million compound clusters, and thousands of tissues, among others. These ontologies come from established sources such as SNOMED-CT®, NCBI's Entrez Gene, and PubChem, to name a few.
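
The synonym recognition described above can be pictured as a lookup from every known name of an entity to one canonical identifier. The following Python sketch shows that idea under simple assumptions: the tiny synonym table and the function names are hypothetical, and a production ontology would be far larger and would handle ambiguous names.

```python
# Hypothetical toy synonym table: canonical gene symbol -> known aliases.
SYNONYMS = {
    "TP53": ["p53", "LFS1"],
    "ERBB2": ["HER2", "NEU"],
}


def build_index(synonyms):
    """Map every name (canonical or alias), case-insensitively, to its canonical symbol."""
    index = {}
    for canonical, aliases in synonyms.items():
        for name in [canonical, *aliases]:
            index[name.casefold()] = canonical
    return index


def normalize(term, index):
    """Return the canonical symbol for a query term, or None if it is unknown."""
    return index.get(term.strip().casefold())


INDEX = build_index(SYNONYMS)
print(normalize("her2", INDEX))
print(normalize(" P53 ", INDEX))
```

Normalizing the query before searching means that results tagged with any alias are retrieved by a single canonical key.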

Cross-species Search

Currently, NextBio supports seven model organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Macaca mulatta, Caenorhabditis elegans, Drosophila melanogaster, and Saccharomyces cerevisiae. Building on these ontologies, NextBio has created a cross-species, ortholog-based translation engine that enables users to retrieve all relevant results for a gene, regardless of organism. This provides the largest possible context in which to interpret gene function and activity in normal tissues or disease states.
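
As a rough illustration of ortholog-based translation, the Python sketch below expands a gene queried in one organism into its counterparts in the others. It assumes a precomputed table of ortholog groups; the table contents, data layout, and function names are hypothetical and do not describe how NextBio's engine is actually built.

```python
# Hypothetical ortholog groups: one dict per group, organism -> gene symbol.
ORTHOLOG_GROUPS = [
    {"Homo sapiens": "TP53", "Mus musculus": "Trp53", "Rattus norvegicus": "Tp53"},
    {"Homo sapiens": "BRCA1", "Mus musculus": "Brca1", "Rattus norvegicus": "Brca1"},
]


def build_lookup(groups):
    """Index each (organism, gene) pair to the ortholog group that contains it."""
    lookup = {}
    for group in groups:
        for organism, gene in group.items():
            lookup[(organism, gene)] = group
    return lookup


def translate(gene, organism, lookup):
    """Return organism -> gene for every known ortholog of the query gene.

    If the gene has no known orthologs, only the query itself is returned.
    """
    group = lookup.get((organism, gene))
    if group is None:
        return {organism: gene}
    return dict(group)


LOOKUP = build_lookup(ORTHOLOG_GROUPS)
print(translate("Trp53", "Mus musculus", LOOKUP))
```

A search can then be run once per organism-specific symbol and the results merged, so that a query for a mouse gene also surfaces studies of its human and rat orthologs.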