FAQ

General FAQs

How do I query an app?

Correlation Engine applications ("apps") can be accessed from the top of every page within Correlation Engine. This area, called the app menu, allows you to query any app whenever you need it.


To query an app:

  1. Go to the top of any page within Correlation Engine.
  2. Enter your term in the query field (under the app menu).
  3. Click the icon for the app you want to query.

If you're already viewing results for a query term, you can use that same query term to query a different app. Just click the icon for that app.

How do I query an app with a bioset?

This process works a little differently.


To query an app with a bioset:

  1. Find a bioset of interest. (You can find biosets you've imported on the My Studies page; you can also find biosets in expanded results from other apps or by browsing the Curated Studies app page.)
  2. Click the mini-QuickView icon next to the bioset's name.
  3. Click the icon for the app you want to query.

You'll also see app icons in results pages: next to a bioset name, or in a menu when you move the cursor over an acceptable query term (e.g., the name of a gene). In these cases, just click the app icon to query it with that particular bioset or query term.

How do I query an app with a sequence region?

To query an app with a sequence region:

  1. Click the "Search sequence regions" link next to the query button, under the app menu.
  2. Choose an organism and a chromosome from the drop-down menus.
  3. Enter the start and stop coordinates. (The region must be less than 10 million bases.)
  4. Click the icon for the app you want to query.
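If you prepare coordinates in a script before entering them in the query field, a check like the following can catch regions that exceed the limit above. This is a minimal Python sketch, not part of Correlation Engine. The name `check_region` is hypothetical, and the sketch assumes 1-based, inclusive start and stop coordinates, which the FAQ does not specify.

```python
MAX_REGION_BASES = 10_000_000  # limit stated in this FAQ ("less than 10 million bases")


def check_region(start, stop):
    """Return the region length in bases, or raise ValueError if the region is invalid.

    Hypothetical helper; assumes 1-based, inclusive coordinates.
    """
    if start < 1 or stop < start:
        raise ValueError("start must be >= 1 and stop must be >= start")
    length = stop - start + 1
    if length >= MAX_REGION_BASES:
        raise ValueError(
            f"region is {length:,} bases; it must be under {MAX_REGION_BASES:,}"
        )
    return length
```

For example, `check_region(1_000_000, 1_500_000)` returns the region length, while a 10-million-base region raises an error before you try to query with it.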

If you want to query with a regular query term, click the "Go back to main search" link.


You can query QuickView, Curated Studies, Disease Atlas, Pharmaco Atlas, Knockdown Atlas, and Genome Browser with a sequence region. If you query Genome Browser with a sequence region, Genome Browser will launch in a new window, focused on that region.

How does Correlation Engine determine the content of query results?

Correlation Engine applications ("apps") use public and private experimental data as their sources, not PubMed or other scientific literature. This means that you may discover information that was absent from, or not well supported by, the original research literature.


For example, if you query Disease Atlas with a gene, the results will display diseases highly correlated with that gene. These results come from experimental studies in which a significant result was found for that gene.


Correlation Engine has scored and ranked all listed diseases using factors such as: study tags, the gene's significance within a study, and the total number of studies for a disease in which that gene was measured.

What are Correlation Engine's data sources?

Correlation Engine uses a combination of public, private, and proprietary information.


Data correlations. The Correlation Engine library of genomic data comes from several public sources, including:

Correlation Engine's curation team also manually curates studies from published literature.


Gene/SNP identifiers. Correlation Engine recognizes commonly used public gene identifiers as well as specific vendor identifiers. Correlation Engine maps individual gene identifiers to standard reference identifiers using the following sources:

To enable seamless comparison across different species, Correlation Engine uses ortholog information from:

Ontologies for semantic tagging. Correlation Engine has developed standardized vocabularies with which to tag its biosets. Sources include:

How were the auto-complete terms selected?

Correlation Engine's auto-complete and tag cloud terms come from the following ontologies and indexes:

How are Correlation Engine results scored and ranked?

Correlation Engine uses proprietary algorithms to calculate and rank the diseases most significantly correlated with a queried gene, SNP, sequence region, bioset, or biogroup.


First, we identify individual biosets that are significantly correlated with your query term. Based on the statistical significance of these correlations, we then rank all of the studies that contain correlated biosets.


For example, when ranking Disease Atlas results for a queried gene, we consider the following:

  1. The total number of disease-specific studies in which the gene was measured;
  2. The number of disease-specific studies in which the gene was found to be significant;
  3. The ranks of the gene in each of the disease studies;
  4. The consistency of the gene's association across the disease studies.

Depending on which app you've queried, we group correlated studies together based on our gene indexes, standardized vocabularies, and semantic tags (e.g., Disease Atlas results are grouped by disease). We call this process "categorization".


During categorization, we apply additional statistical criteria, such as correction for multiple hypothesis testing. Then we rank the diseases by statistical significance. We assign a numerical score of 100 to the most significant result, and normalize the other results' scores to the top-ranked result.
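This FAQ does not say which multiple-testing correction Correlation Engine applies during categorization. As an illustration only, the Python sketch below implements the Benjamini-Hochberg procedure, a widely used false discovery rate correction; treat it as a stand-in, not as Correlation Engine's actual method. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.2])` returns approximately `[0.04, 0.0533, 0.0533, 0.2]`. Results whose adjusted value exceeds a chosen cutoff (0.05 is common, but arbitrary) would then be dropped before ranking.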


For a detailed description of Correlation Engine's methods, please see our paper, "Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data" (Kupershmidt et al., 2010) in PLoS ONE.

Is a low-scoring result nonsignificant?

No. Correlation Engine scores all results relative to the top-ranked result, whose score is set to 100. Although a low-scoring result might have less statistical significance compared to the top-ranked result, it could still have real biological relevance.
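A minimal Python sketch of relative scoring, assuming each result already has a raw significance value where larger means more significant (here -log10 of a p-value, which is an assumption; the FAQ does not describe the raw score) and that normalization is a simple linear rescaling. The function name and the example results are hypothetical.

```python
import math


def normalize_scores(raw_scores):
    """Rescale raw scores so the top-ranked result is 100 and the rest are proportional.

    Assumes larger raw scores mean greater significance (e.g., -log10 p-value).
    """
    top = max(raw_scores.values())
    if top <= 0:
        raise ValueError("need at least one positive raw score")
    return {name: 100.0 * score / top for name, score in raw_scores.items()}


# Hypothetical results and p-values, converted to -log10(p).
p_values = {"result A": 1e-10, "result B": 1e-5, "result C": 1e-3}
raw = {name: -math.log10(p) for name, p in p_values.items()}
scores = normalize_scores(raw)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.0f}")
```

Here result C scores only about 30, even though a p-value of 0.001 may still be meaningful, which is why a low score alone does not imply a nonsignificant result.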

What do the Venn diagrams in my expanded results mean?

The Venn diagram shows you the following information:

  • How many genes are in your queried bioset or biogroup;
  • How many genes are in the correlated bioset or biogroup;
  • How many genes are in both.

The p-value is calculated as the probability that such an overlap would occur by chance (assuming that your bioset or biogroup is not actually correlated with the target bioset or biogroup).
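The FAQ does not name the statistical test behind the Venn diagram p-value. One common way to compute the probability of seeing an overlap at least this large by chance is a one-sided hypergeometric test, sketched below in Python. It assumes both gene lists are drawn from a shared background of measured genes (`n_universe`), a number the Venn diagram itself does not show; the function name is hypothetical.

```python
from math import comb


def overlap_p_value(n_query, n_target, n_overlap, n_universe):
    """P(overlap >= n_overlap) for two random gene sets under a hypergeometric model.

    n_query:    genes in the queried bioset or biogroup
    n_target:   genes in the correlated bioset or biogroup
    n_overlap:  genes in both
    n_universe: assumed background of all measured genes
    """
    max_overlap = min(n_query, n_target)
    total = comb(n_universe, n_target)
    # Sum the probabilities of every overlap from n_overlap up to the maximum possible.
    tail = sum(
        comb(n_query, k) * comb(n_universe - n_query, n_target - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

For example, `overlap_p_value(n_query=200, n_target=300, n_overlap=40, n_universe=20000)` gives a very small value, because only about 3 shared genes would be expected by chance with those set sizes.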

How can I download a list of the studies shown in my query results?

To download a list of studies:

  1. From your results page, click the Correlated Studies tab.
  2. Click the Export Results button.

The Export Results button is available to paid subscribers throughout the Correlation Engine website. It exports all relevant details for the results page you are viewing.

What do the abbreviations in the Supporting Data Types column mean?

Here are the definitions for the Supporting Data Types:

  1. CN: DNA Copy Number
  2. DT: Therapeutic
  3. GM: Germline Mutation
  4. GT: SNP GWAS
  5. HA: Histone Acetylation
  6. HM: Histone Methylation
  7. HU: Histone Ubiquitination
  8. ME: DNA Methylation
  9. MI: miRNA Expression
  10. MU: Mutations/Phenotypic
  11. PD: Protein-DNA Binding
  12. RE: RNA Expression
  13. SM: Somatic Mutation

If you're on a results page, you can also move the mouse cursor over each abbreviation to see its definition.

QuickView FAQs

What is QuickView?

QuickView is a Correlation Engine application ("app") that you can query to get a quick, top-level view of all the data and information Correlation Engine has about a particular gene, SNP, sequence region, biogroup, bioset, phenotype, tissue, or compound.


There are several ways to query QuickView:

  • By default. Go to any non-app page (e.g., Home, My Studies, My Projects), and enter a term into the query field at the top of the page. Then press the Enter key on your keyboard.
  • By clicking its icon in the app menu. Go to the top of any app page and enter a term into the query field. Then click the QuickView icon.
  • By clicking its icon in expanded results. Click the mini-QuickView icon next to any query term listed in a results page. This queries QuickView with that term.
What can my QuickView results tell me?

QuickView has two tabs that organize your results:

  1. Correlation Engine Summary. This tab lists the information that is available only through Correlation Engine:
    • The top five data correlations between your query term and each relevant Correlation Engine app;
    • The top five correlated studies from Curated Studies;
    • The top five hits from Literature;
    • The top five hits from Clinical Trials.

    Click any result name to view its expanded result within that app. To see the full query results for that app, you can click an app name, an app icon, or the Explore Results link.

  2. General Info. This tab displays general biomedical knowledge for your query term. This allows you to view a vast range of public information and Correlation Engine data correlations together in one place.
What are the sources for the General Info tab in QuickView?

QuickView compiles information from PubMed, Gene Ontology, dbSNP, PubChem, MSigDB and other public data sources.

Curated Studies FAQs

What is Curated Studies?

Curated Studies is a Correlation Engine application ("app") that lets you browse or query all datasets that you or Correlation Engine have imported.


To query Curated Studies:

  1. Go to the top of any page within Correlation Engine.
  2. Enter your term in the query field (under the app menu).
  3. Click the Curated Studies icon above the query field.
How do I query Curated Studies with a bioset?

This process works a little differently.

To query Curated Studies with a bioset:

  1. Find a bioset of interest. (You can find biosets you've imported on the My Studies page; you can also find biosets in expanded results from other apps or by browsing the Curated Studies app page.)
  2. Click the mini-QuickView icon next to the bioset's name.
  3. Click the Curated Studies icon above the query field.
What kinds of query terms can I use with Curated Studies?

You can query Curated Studies with just about any query term or keyword. Results will depend on what type of query term you enter.

  • For genes, SNPs, sequence regions, biogroups, or biosets: Your results will display as a ranked list of studies in which your query term was found to be significantly correlated.
  • For concepts (phenotypes, tissues, or compounds): Your results will display as a list of studies that mention, or are tagged with, your query term.

You can also query Curated Studies with keywords that don't belong to the above types of query terms. In this case, Curated Studies will show studies resulting from a text-based search with your query term.


Finally, you can browse studies without using a query term as a starting point from within the app. To do this, go to the Curated Studies home page and click the All Curated Studies button. You'll then see a list of all curated studies that you can filter and browse.

How does Curated Studies differ from other apps?

Results from Curated Studies are only lists of individual studies correlated with or referencing your query term. They are not categorized or grouped in any way. By contrast, Disease Atlas groups studies according to disease, and Genetic Markers groups studies according to gene or SNP.

How do I go back to All Curated Studies from a Curated Studies query result?

Click the "All Curated Studies" button.

Can I use Curated Studies to see if a particular study is being processed in Correlation Engine's analysis pipeline?

You can only do this if you are a Correlation Engine Enterprise customer.


To query Curated Studies for a particular study:

  1. Click the "All Curated Studies" button.
  2. Click the Pending Studies tab to browse studies currently being processed in Correlation Engine's analysis pipeline.

On the Pending Studies page, you can request a particular study for Correlation Engine to curate and import. You can also request that a particular study be raised to high priority.

What can I do with Curated Studies?

You can use Curated Studies to directly inspect public genomic data. This can be useful if you're interested in all studies (general, or correlated to a query term) that have been performed on a specific data type, experimental design, or species.


Because other Correlation Engine apps rank results by statistical significance to your query term, you can also use Curated Studies to look up negative results. For example, you can find out whether a gene of interest was not significant in a particular kind of experimental study.

How do I see the statistics for a specific bioset?

To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific bioset:

  1. Click a study title. This will show the bioset(s) within the study that correlate to your query term.
  2. Click a bioset title. The column types displayed will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score.

Body Atlas FAQs

What is Body Atlas?

Body Atlas is a unique tool that you can use to find normalized gene expression across all tissues, cell types, cell lines, and stem cells in the Correlation Engine library.


A Body Atlas query for a gene, bioset or biogroup will produce a list of tissues and cell types ranked by relevance. You can also sort your results by absolute gene expression or body system, or across all body systems.


A tissue or cell line (biosource) query will result in a list of genes ranked by expression levels in the queried tissue or cell line. You can also view genes ranked by tissue-specific expression, or view cell-line-specific copy number variations and mutations.


Use Body Atlas to:


  • Identify where previously uncharacterized genes are expressed. Use Body Atlas to look for high- or low-expressing cell types and cell lines that can serve as useful model systems for your queried gene.
  • Identify gene expression patterns. Use Body Atlas to look for tissue- or cell-line-specific gene expression levels that can serve as genetic markers for your experiments.
  • Find biogroup information. Biogroup results are assigned a p-value based on the overlap between the biogroup and each tissue or cell type. A directional arrow indicates whether overlapping genes are predominantly up- or down-regulated.
  • Find bioset information. Bioset results are designated as positively or negatively correlated with a tissue or cell type. (This is because biosets contain directional data.)

To query Body Atlas:

  1. Go to the top of any page within Correlation Engine.
  2. Enter your term in the query field (just below the app menu).
  3. Click the Body Atlas icon above the query field.
How do I query Body Atlas with a bioset?

This process works a little differently.

To query Body Atlas with a bioset:

  1. Find a bioset of interest. (You can find biosets you've imported on the My Studies page; you can also find biosets in expanded results from other apps or by browsing the Curated Studies app page.)
  2. Click the mini-QuickView icon next to the bioset's name.
  3. Click the Body Atlas icon above the query field.
What tissues and cell types are covered in Body Atlas?

The Body Atlas biosets have been drawn from all RNA expression studies that have used the Affymetrix GeneChip® Human Genome U133 Plus 2.0 Array for human studies, and the Affymetrix GeneChip® Mouse Genome 430A 2.0 Array for mouse studies.


We incorporated our data as follows:

  • 128 human tissues from 1,068 arrays
  • 170 human cell types from 1,125 arrays
  • 748 human cell lines from 881 arrays
  • 52 stem cells from 141 arrays
  • 151 mouse tissues from 2,730 arrays
  • 409 mouse cell types from 1,585 arrays
How has Correlation Engine normalized gene expression across tissues and cells?

First, we perform a per-chip median normalization on probesets common to all platforms. We then combine these probesets using quantile normalization.
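The two steps above can be sketched with NumPy as follows. This is an illustration, not Correlation Engine's code: rows are the probesets common to all platforms, columns are chips, and the choice of target median (the mean of all chip medians) and the arbitrary ordering of tied values are assumptions. The function names are hypothetical.

```python
import numpy as np


def median_normalize(intensities):
    """Scale each chip (column) so its median equals the mean of all chip medians."""
    medians = np.median(intensities, axis=0)
    target = medians.mean()
    return intensities * (target / medians)


def quantile_normalize(intensities):
    """Give every chip (column) the same distribution: the mean of the sorted columns."""
    order = np.argsort(intensities, axis=0)      # row index of each rank, per chip
    sorted_vals = np.sort(intensities, axis=0)
    reference = sorted_vals.mean(axis=1)         # mean intensity at each rank
    normalized = np.empty_like(intensities, dtype=float)
    for j in range(intensities.shape[1]):
        # The k-th smallest value on chip j receives the k-th reference value.
        normalized[order[:, j], j] = reference
    return normalized


# Hypothetical intensities: 4 common probesets (rows) on 3 chips (columns).
common = np.array([[5.0, 4.0, 3.0], [2.0, 1.0, 4.0], [3.0, 4.0, 6.0], [4.0, 2.0, 8.0]])
normalized = quantile_normalize(median_normalize(common))
```

After these two steps every chip has an identical intensity distribution, while each probeset keeps its rank within its own chip.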


Intensities for probesets unique to particular platforms are rescaled to the same per-chip median; we then fit them by linear interpolation, using the intensities of the common probesets between platforms as a reference.
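One possible reading of this step, sketched with NumPy for a single chip: pair each common probeset's median-scaled intensity with its quantile-normalized value, then linearly interpolate along those pairs to place the platform-unique probesets on the normalized scale. The function name and this interpretation are assumptions. Note that `np.interp` clamps values outside the common probesets' range to the end points, which may differ from what Correlation Engine does.

```python
import numpy as np


def map_unique_probesets(common_before, common_after, unique_values):
    """Place platform-unique probesets on one chip's normalized scale.

    common_before: median-scaled intensities of the common probesets on this chip
    common_after:  the same probesets after quantile normalization
    unique_values: median-scaled intensities of probesets unique to this platform
    """
    order = np.argsort(common_before)
    xp = np.asarray(common_before, dtype=float)[order]
    fp = np.asarray(common_after, dtype=float)[order]
    # Linear interpolation between neighboring common probesets; out-of-range values are clamped.
    return np.interp(unique_values, xp, fp)
```

For example, with common probesets at intensities 1, 2, 3, and 4 that normalize to 10, 20, 30, and 40, a unique probeset at 2.5 maps to 25.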

How does Correlation Engine calculate its scores for biogroup or bioset query results?

The score for a given tissue or cell represents the magnitude of the correlation score between the queried bioset or biogroup and the gene expression bioset for that tissue or cell.


When you query Body Atlas with a biogroup (a nondirectional set of genes), results include a direction column that indicates the sign of the correlation score.


When you query Body Atlas with a bioset that contains directional information (e.g., gene expression fold change for a condition of interest), the results include a correlation column. This column indicates whether correlation was positive or negative.

What are the different ways to view Body Atlas results?

You can view Body Atlas content in two ways. Gene, biogroup or bioset queries display a list of tissues, cell types, cell lines or stem cells related to the query term. Tissue or cell line queries display a list of gene expression levels corresponding to the query term.

Gene, biogroup or bioset queries in Body Atlas will display tissues grouped by body system as the default view. Click the corresponding tab to view cell types, cell lines, and stem cells. Clicking the name of a body system will jump to that group's results.


To rank Body Atlas results strictly by degree of expression or correlation—without categorizing them into groups—choose "Ranks" from the "View by:" menu. Clicking the name of a body system will highlight tissues and cell types that belong to that body system.

Tissue or cell line queries will display a list of expression levels of all genes in the queried biosource or tissue. Click the corresponding tab to view tissue-specific gene expression or a complete list of somatic mutations and copy number changes in a particular cell line.

What is Body Atlas RNA-seq based (GTEx)?

The RNA-seq-based Body Atlas biosets have been drawn from RNA-seq expression studies from the Genotype-Tissue Expression (GTEx) project.

The GTEx project is a publicly funded project that aims to provide a comprehensive atlas of gene expression and regulation across multiple human tissues; additional information is available from the GTEx project.

RNA samples used in GTEx were extracted from normal human tissues, poly-A selected, and sequenced using Illumina 74bp paired-end technology.

For Correlation Engine Body Atlas, we downloaded the raw read data, subjected it to stringent quality control, and processed it using the RNA Express 1.0 pipeline. A subset of 505 high-confidence samples was used.

The RNA Express pipeline was developed at Illumina and is available on BaseSpace.

Tissue-specific gene ranks were derived from differential expression p-values (tissue of interest vs. all tissues). P-values were calculated with the edgeR package.
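The edgeR step itself runs in R and is not reproduced here. Assuming you already have one differential expression p-value per gene for a tissue (tissue of interest vs. all tissues), turning those p-values into ranks can be sketched in Python as below. Breaking ties by gene name is an arbitrary choice, and the function name and example values are hypothetical.

```python
def rank_by_p_value(p_values):
    """Map gene -> rank, where rank 1 is the smallest (most significant) p-value."""
    ordered = sorted(p_values.items(), key=lambda item: (item[1], item[0]))
    return {gene: rank for rank, (gene, _) in enumerate(ordered, start=1)}


ranks = rank_by_p_value({"GENE_B": 0.01, "GENE_A": 0.2, "GENE_C": 1e-6})
```

Here GENE_C, with the smallest p-value, is ranked first for the tissue.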

The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI\Leidos Biomedical Research, Inc. subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to The Broad Institute, Inc. Biorepository operations were funded through a Leidos Biomedical Research, Inc. subcontract to Van Andel Research Institute (10ST1035). Additional data repository and project management were provided by Leidos Biomedical Research, Inc. (HHSN261200800001E). The Brain Bank was supported by supplements to University of Miami grant DA006227. Statistical Methods development grants were made to the University of Geneva (MH090941 & MH101814), the University of Chicago (MH090951, MH090937, MH101825, & MH101820), the University of North Carolina - Chapel Hill (MH090936), North Carolina State University (MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University (MH101810), and to the University of Pennsylvania (MH101822). The datasets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000424.v4.p1.

What tissues and cell types are covered in Body Atlas RNA-seq based (GTEx)?

We incorporated our data as follows:

  • 50 human tissues from 505 RNA-seq samples are represented.
What do the Venn diagrams in expanded Body Atlas results mean?

The Venn diagram shows you the following information:

  • How many genes are in your queried biogroup or bioset;
  • How many genes are differentially expressed in the correlated tissue or cell type (which is treated as a bioset);
  • How many genes are in both.

The p-value is calculated as the probability that such an overlap would occur by chance under the assumption that your bioset is not actually correlated with the tissue or cell type.

How do I use the Body System locator?

When viewing by category, click the name of a body system to jump to that category in the results.


When viewing by rank, click the name of a body system to highlight all tissues or cell types belonging to that category in the results.

How can I export Body Atlas results?

To export Body Atlas results as a .csv file:

  1. Go to your Body Atlas results page.
  2. Click the Export button on the right side of the page.

Genetic Markers FAQs

What is Genetic Markers?

Genetic Markers is a Correlation Engine application ("app") that finds genes and SNPs significantly correlated to a phenotype or compound. Results are ranked in order of statistical significance.


Correlation Engine determines a marker's significance through a meta-analysis that takes into account the marker's rank across all studies tagged with the queried phenotype or compound.
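The FAQ does not detail this meta-analysis. As a rough illustration of rank-based aggregation, the Python sketch below computes a rank product: the geometric mean of each marker's normalized rank (rank divided by the number of markers measured) across the studies that measured it. This is a stand-in technique, not Correlation Engine's method; unlike the factors listed under "How are Correlation Engine results scored and ranked?", it ignores how many studies support each marker. The function name and example data are hypothetical.

```python
from math import exp, log


def rank_product(studies):
    """Geometric mean of normalized ranks per marker; smaller means consistently top-ranked.

    studies: list of dicts, one per study, mapping marker -> (rank, markers_measured).
    """
    log_ranks = {}
    for study in studies:
        for marker, (rank, n_measured) in study.items():
            log_ranks.setdefault(marker, []).append(log(rank / n_measured))
    return {m: exp(sum(v) / len(v)) for m, v in log_ranks.items()}


studies = [
    {"GENE1": (1, 100), "GENE2": (10, 100)},
    {"GENE1": (4, 100), "GENE2": (10, 100)},
]
combined = rank_product(studies)
ranked = sorted(combined, key=combined.get)  # most consistently top-ranked first
```

GENE1 ranks near the top of both studies, so its combined value (0.02) is smaller than GENE2's (0.1) and it is listed first.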


To query Genetic Markers:

  1. Go to the top of any page within Correlation Engine.
  2. Enter your term in the query field (under the app menu).
  3. Click the Genetic Markers icon above the query field.
What kinds of query terms can I use with Genetic Markers?

You can query Genetic Markers with a phenotype or compound. The first tab of the results page (Correlated Genes) will show a list of genes correlated with your query term, ranked by statistical significance. Click the Correlated SNPs tab to view correlated SNPs.


To see a list of studies that support the correlations, click a gene or SNP name.

What types of data are used to rank Genetic Markers results?

Correlation Engine integrates multiple types of genomic data to rank the significance of genes and SNPs tagged with a given phenotype or compound. These data types may include RNA and miRNA expression, SNPs identified through GWAS, epigenetic data, CNVs, and mutation data. In addition, Genetic Markers uses curated data from OMIM, Jackson Labs, and DrugBank.

I queried Genetic Markers with a gene or SNP, but got an error message. How do I find all of the studies or biosets related to a gene or SNP?

To find all of the curated studies that are correlated with a gene or SNP, query Curated Studies. (Genetic Markers only accepts phenotypes or compounds as query terms.)


Querying Curated Studies will return a ranked list of all studies that contain your gene or SNP as a significant result.

I don't see a particular gene or SNP listed in my Genetic Markers results. Does this mean that the gene or SNP is not associated with my query term?

Possibly. A gene or SNP may be absent from results because either a) Correlation Engine has no studies tagged with your query term, or b) studies tagged with your query term are not significantly correlated with the gene or SNP you're interested in.


To find data for a gene or SNP not listed in Genetic Markers results, browse Curated Studies:

  1. Click the Curated Studies icon in the app menu.
  2. Click the All Curated Studies button.
  3. Click the Keyword link in the filter bar.
  4. Enter the name of the gene or SNP. (You can also select the name from the auto-complete menu.)
  5. Click the Apply Filter button.
How do I see the statistics for a specific gene or SNP listed in Genetic Markers results?

To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific gene/SNP:

  1. Click the Correlated Genes or Correlated SNPs tab.
  2. Click a gene or SNP name to see a breakdown of supporting studies by organism and data type.
  3. Click the View Individual Studies button to see all studies tagged with your query term in which the gene or SNP was found to be significant.

To expand each study:

  1. Click a study name. This shows the bioset(s) within the study that correlate to your query term.

To see statistics and scores from a bioset:

  1. Click a bioset name. The column types displayed will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score.
Why is the number of studies shown in Correlated Genes or Correlated SNPs different from the number shown in the Correlated Studies tab?

There are two possibilities. The Correlated Studies view shows all of the public or private studies that contain a significant correlation with your query term. The Correlated Genes and Correlated SNPs views, on the other hand, group these studies by category (in this case, by gene or by SNP).


Studies that appear in the Correlated Studies view might be excluded from the Correlated Genes or Correlated SNPs view if they've failed—when grouped—to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).


Differences between study counts can also occur if private studies—studies you imported or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Correlated Studies view, but are not included in the categorization process.


(If you are a Correlation Engine Enterprise user, studies that belong to Enterprise projects are included in categorized results. However, studies that you have created or imported on your own are not.)

Disease Atlas FAQs

What is Disease Atlas?

Disease Atlas is a Correlation Engine application ("app") that finds diseases, traits, conditions, and surrogate endpoints associated with a gene, sequence region, SNP, biogroup, or bioset. Results are grouped by disease and ranked according to statistical significance.


Disease Atlas categories only include the subset of phenotypes that have been specifically tagged "disease". So while you can query other apps with the phenotype "aging", you won't find "aging" among the categories listed in Disease Atlas results.


To query Disease Atlas:

  1. Go to the top of any page within Correlation Engine.
  2. Enter your term in the query field (under the app menu).
  3. Click the Disease Atlas icon above the query field.
How do I query Disease Atlas with a bioset?

This process works a little differently.


To query Disease Atlas with a bioset:

  1. Find a bioset of interest. (You can find biosets you've imported on the My Studies page; you can also find biosets in expanded results from other apps or by browsing the Curated Studies app page.)
  2. Click the mini-QuickView icon next to the bioset's name.
  3. Click the Disease Atlas icon above the query field.
What kinds of query terms can I use with Disease Atlas?

You can query Disease Atlas with a gene, sequence region, SNP, biogroup, or bioset. The Correlated Diseases tab will show a list of diseases correlated with your query term, grouped into broad disease categories. Clicking the name of a disease category will jump to the results for that category.


To display diseases ranked by statistical significance, choose "Ranks" from the "View by:" menu. In this view, clicking the name of a disease category will highlight results for that category.


To see a list of studies that support the correlations, click a disease name.

I queried Disease Atlas with a disease but got an error message. How do I find all of the studies or biosets related to a disease?

To find all of the curated studies that are correlated with a disease or other phenotype, query Curated Studies. (Disease Atlas only accepts genes, sequence regions, SNPs, biogroups, or biosets as query terms.)


Querying Curated Studies will return a ranked list of all studies that are tagged with your disease or other phenotype.

I don't see a particular disease listed in my Disease Atlas results. Does this mean that the disease is not associated with my query term?

Possibly. A disease may be absent from results because either a) Correlation Engine has no studies tagged with the disease, or b) studies tagged with the disease are not significantly correlated with your query term.


To find data for a disease not listed in Disease Atlas results, browse Curated Studies:

  1. Click the Curated Studies icon in the app menu.
  2. Click the All Curated Studies button.
  3. Click the Keyword link in the filter bar.
  4. Enter the name of the disease. (You can also select the name from the auto-complete menu.)
  5. Click the Apply Filter button.
How do I see statistics for a specific disease in my results?

To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific disease:

  1. Click the Correlated Diseases tab.
  2. Click a disease to see a breakdown of supporting studies by organism and data type.
  3. Click the View Individual Studies button to see all studies that have a significant correlation to your query term that have been tagged with the specific disease.

To expand each study:

  1. Click a study name. This shows the bioset(s) within the study that correlate to your query term.

To see statistics and scores from the expanded bioset:

  1. Click a bioset name. The column types displayed will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score.
How do I use the Disease Category locator?

When viewing by category, click the name of a disease category to jump to that category in the results.


When viewing by rank, click the name of a disease category to highlight all diseases belonging to that category in the results.

Why is the number of studies shown in Correlated Diseases different from the number shown in the Studies For... tab?

There are two possibilities. Both tabs show all of the public and private studies tagged with a disease that contain a significant correlation with your query term.


The Correlated Diseases view, however, shows your results categorized by disease.


Studies that appear in the Studies For... tab might be excluded from the Correlated Diseases tab if they've failed—when grouped by disease—to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).


Differences between study counts can also occur if private studies—studies you imported or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Studies For... tab, but are not included in the categorization process.


(If you are a Correlation Engine Enterprise user, studies that belong to Enterprise projects are included in categorized results. However, studies you have created or imported on your own are not.)

Pharmaco Atlas FAQs

What is Pharmaco Atlas?

Pharmaco Atlas is a Correlation Engine application ("app") that finds compounds and treatments significantly correlated to a gene, sequence region, biogroup, or bioset. Results are ranked in order of statistical significance.


To use Pharmaco Atlas:

  1. Go to the top of any page within Correlation Engine.
  2. Enter your term in the query field (under the app menu).
  3. Click the Pharmaco Atlas icon above the query field.
How do I query Pharmaco Atlas with a bioset?

This process works a little differently.


To query Pharmaco Atlas with a bioset:

  1. Find a bioset of interest. (You can find biosets you've imported in the My Studies page, expand results from other apps, or browse the Curated Studies app page.)
  2. Click the mini-QuickView icon next to the bioset's name.
  3. Click the Pharmaco Atlas icon above the query field.
What kinds of query terms can I use with Pharmaco Atlas?

You can query Pharmaco Atlas with a gene, sequence region, biogroup, or bioset. The Correlated Compounds tab (the first tab of the results page) will show a list of compounds and treatments that are correlated with your query term, ranked by statistical significance.


To see a list of studies that support the correlations, click a compound name.

I queried Pharmaco Atlas with a compound, but got an error message. How do I find all of the studies or biosets related to a compound?

To find all of the curated studies associated with a compound, query Curated Studies with that compound. (Pharmaco Atlas only accepts genes, sequence regions, biogroups, or biosets as query terms.)


Querying Curated Studies will return a ranked list of all studies that are tagged with your compound.

I don't see a particular compound listed in my Pharmaco Atlas results. Does this mean that the compound is not associated with my query term?

Possibly. A compound may be absent from results because either a) Correlation Engine has no studies tagged with the compound, or b) studies tagged with the compound are not significantly correlated with your query term.


To find data for a compound not listed in Pharmaco Atlas results, browse Curated Studies:

  1. Click the Curated Studies icon in the app menu.
  2. Click the All Curated Studies button.
  3. Click the Keyword link in the filter bar.
  4. Enter the name of the compound. (You can also select the name from the auto-complete menu.)
  5. Click the Apply Filter button.
How do I see statistics for a specific compound in my Pharmaco Atlas results?

To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific compound:

  1. Click the Correlated Compounds tab.
  2. Click a compound name to see a breakdown of supporting studies by organism and data type.
  3. Click the View Individual Studies button to see all studies that have a significant correlation to your query term, and which have also been tagged with the specific compound.

To expand a study for more detail:

  1. Click a study name.
  2. This shows the bioset(s) within the study that correlate to your query term.

To see statistics and scores from a bioset:

  1. Click a bioset name.
  2. The column types displayed will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score.
How do I use the Compound Category locator?

When viewing by category, click the name of a compound category to jump to that category in the results.


When viewing by rank, click the name of a compound category to highlight all compounds belonging to that category in the results.

Why is the number of studies shown in the Correlated Compounds tab different from the number shown in the Studies For... tab?

There are two possibilities. Both tabs show all of the public and private studies tagged with a compound that contain a significant correlation with your query term.


The Correlated Compounds tab, however, shows your results categorized by compound and treatment.


Studies that appear in the Studies For... tab might be excluded from the Correlated Compounds tab if they've failed—when grouped by compound—to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).


Differences between study counts can also occur if private studies—studies you imported, or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Studies For... tab, but are not included in the categorization process.


(If you are a Correlation Engine Enterprise user, studies that belong to Enterprise projects are included in categorized results. However, studies that you have created or imported on your own are not.)

Knockdown Atlas FAQs

What is Knockdown Atlas?

Knockdown Atlas is a Correlation Engine application ("app") that finds genes whose perturbation affects your query term. Querying Knockdown Atlas is like performing a knockdown, knockout, or overexpression experiment in reverse: You can see which genetic perturbations affect a gene, and how.


To query Knockdown Atlas:

  1. Go to the top of any page within Correlation Engine.
  2. Enter your term in the query field (under the app menu).
  3. Click the Knockdown Atlas icon above the query field.
How do I query Knockdown Atlas with a bioset?

This process works a little differently.


To query Knockdown Atlas with a bioset:

  1. Find a bioset of interest. (You can find biosets you've imported in the My Studies page, expand results from other apps, or browse the Curated Studies app page.)
  2. Click the mini-QuickView icon next to the bioset's name.
  3. Click the Knockdown Atlas icon above the query field.
What kinds of query terms can I use with Knockdown Atlas?

You can query Knockdown Atlas with a gene, sequence region, biogroup, or bioset. The Perturbed Genes tab (the first tab of the results page) will show a list of genes whose perturbation is correlated with your query term, ranked by statistical significance.


To see a list of studies that support the correlations, click the name of a perturbed gene.

Why is the app called Knockdown Atlas if it also contains results from overexpression studies?

Knockdown Atlas shows results from any genetic perturbation experiment, including knockout, gene silencing, and overexpression experiments. However, knockdowns and knockouts are the predominant type of experiment covered.

How do I find all the genes affected by perturbing a specific gene?

To get a list of studies in which a specific gene is perturbed, query Curated Studies.


To query Curated Studies for a specific perturbed gene:

  1. Go to the top of the page and clear any query terms from the query field.
  2. Click the Curated Studies icon.
  3. Click the All Curated Studies button.
  4. Click the Keyword link in the filter bar.
  5. Enter the name of the perturbed gene. (You can also select a name from the auto-complete menu.)
  6. Click the Apply Filter button.
  7. Click the Advanced link.
  8. In the Experiment Design menu, check the Mutant vs. wildtype box.
  9. Click the Apply Filter button.

(Note: Currently, Correlation Engine does not group these studies by gene.)

I don't see a particular gene listed in my Knockdown Atlas results. Does this mean that perturbation of the gene does not affect my queried gene?

Possibly. A perturbed gene may be absent from results because either a) Correlation Engine has no studies in which the gene was perturbed, or b) studies in which the gene was perturbed are not significantly correlated with your query term.


To find data for a perturbed gene not listed in Knockdown Atlas results, browse Curated Studies:

  1. Click the Curated Studies icon in the app menu.
  2. Click the All Curated Studies button.
  3. Click the Keyword link in the filter bar.
  4. Enter the name of the perturbed gene. (You can also select the name from the auto-complete menu.)
  5. Click the Apply Filter button.
  6. Click the Advanced link.
  7. In the Experiment Design menu, check the Genetic Perturbation box.
  8. Click the Apply Filter button.
How do I see the statistics for a specific genetic perturbation listed in Knockdown Atlas results?

To see the statistics (e.g., p-value, fold change, copy number change, score) for a specific genetic perturbation:

  1. Click the Perturbed Genes tab.
  2. Click a gene name to see a breakdown of supporting studies by organism and data type.
  3. Click the View Individual Studies button to see all studies in which the specific genetic perturbation was found to significantly affect your query term.

To expand each study for more details:

  1. Click a study name.
  2. This shows the bioset(s) within the study that correlate to your query term.

To see statistics and scores from a bioset:

  1. Click a bioset name.
  2. The column types displayed will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score. The direction of the arrow shows the effect on the query term (i.e., an up arrow for up-regulated, a down arrow for down-regulated).
Why is the number of studies shown in the Perturbed Genes tab different from the number shown in the Studies For... tab?

There are two possibilities. Both tabs show all of the public and private studies in which a genetic perturbation significantly affected your query term.


The Perturbed Genes tab, however, shows your results categorized by perturbed gene.


Studies that appear in the Studies For... tab might be excluded from the Perturbed Genes tab if, when grouped by perturbed gene, they fail to meet additional scoring and ranking significance criteria (e.g., correction for multiple hypothesis testing).


Differences between study counts can also occur if private studies—studies you imported or those that others have imported and shared with you—contain significant correlations with your query term. Private studies with correlations are shown in the Studies For... tab, but are not included in the categorization process.


(If you are a Correlation Engine Enterprise user, studies that belong to Enterprise projects are included in categorized results. However, studies that you have created or imported on your own are not.)

Biogroups FAQs

What is Biogroups?

Biogroups is a Correlation Engine application ("app") that shows the biogroups that are highly enriched in your queried bioset, phenotype, or compound. When you query Biogroups with a bioset, you'll receive a ranked list of biogroups that highly overlap with the bioset.


When you query Biogroups with a phenotype or compound, you'll receive a ranked list of biogroups that highly overlap with biosets tagged with your query term.


To query Biogroups with a phenotype or compound:

  1. Go to the top of any page within Correlation Engine.
  2. Enter your query in the query field (under the app menu).
  3. Click the Biogroups icon above the query field.

How do I query Biogroups with a bioset?

This process works a little differently.


To query Biogroups with a bioset:

  1. Find a bioset of interest. (You can find biosets you've imported in the My Studies page, expand results from other apps, or browse the Curated Studies app page.)
  2. Click the mini-QuickView icon next to the bioset's name.
  3. Click the Biogroups icon above the query field.
What does "biogroup" mean?

A biogroup is a collection of genes that are associated with a specific biological function, pathway, or similar criteria. No numerical information is directly associated with a biogroup.


Gene lists represented as biogroups in Correlation Engine come from the following sources:

  • Gene Ontology (biological processes, cellular components, molecular functions)
  • MSigDB (canonical pathways, positional gene sets, regulatory motif gene sets)
  • InterPro (protein families)
  • TargetScan (predicted miRNA targets)
How do I see the statistics for a specific biogroup listed in my Biogroups results?

To see the statistics for a specific biogroup:

  1. Click a biogroup name to see a breakdown of supporting studies by organism and data type.
  2. Click the View Individual Studies button to see studies containing biosets that highly overlap with the specific biogroup, and which have been tagged with your queried phenotype or compound.

To expand each study:

  1. Click a study name.
  2. This shows the bioset(s) within the study that overlap with the biogroup.

To see statistics and scores for a bioset-biogroup correlation:

  1. Click a bioset name.
  2. The Venn diagram shows statistics describing the bioset-biogroup overlap. Use the drop-down menu to include all genes, only up-regulated genes, or only down-regulated genes in the comparison.
  3. The column types displayed in the feature list below the Venn diagram will depend on the data type. For example, a public RNA expression bioset will contain fold change and p-value, while a copy number bioset will show either copy number change or Z-score.
I queried Biogroups with a biogroup, but got an error message. How do I find a list of the genes within a pathway, molecular function, protein family, or biological process?

To find all of the genes contained in a biogroup, query QuickView and click General Info on the results page. (Biogroups only accepts phenotypes, compounds, and biosets as query terms.)

How do I find out which biogroups a gene belongs to?

To find out which biogroups a gene belongs to, query QuickView.


To query QuickView:

  1. Go to the top of the page and enter the name of a gene in the query field.
  2. Click the QuickView icon.
  3. On your results page, click the General Info tab.
  4. View the Transcription Factor Binding Sites biogroups at the top of the page.
  5. Scroll down to the bottom of the page to view the Gene Ontology, Pathways, and Protein Family biogroups.

When you see the QuickView icon next to a gene name, you can also click the icon to query QuickView with that gene.

What do the Venn diagrams in expanded Biogroups results mean?

If you queried with a bioset, the Venn diagram shows you the following information:

  1. How many genes are in your queried bioset;
  2. How many genes are in the correlated biogroup;
  3. How many genes are in both.

If you queried with a phenotype or compound, the diagram shows the overlap between the correlated biogroup and a bioset from a study tagged with your query term.


The p-value is calculated as the probability that such an overlap would occur by chance under the assumption that there is no biological link between your bioset and the biogroup.
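
This FAQ does not name the statistical test behind the Venn diagram p-value. A one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) is a standard way to compute the probability that an overlap at least this large would arise by chance, so the Python sketch below assumes it. The gene universe size (20,000 here), the gene names, and the function names `venn_counts` and `overlap_p_value` are hypothetical.

```python
from math import comb


def venn_counts(bioset_genes, biogroup_genes):
    """Return the three Venn numbers: genes in the bioset, genes in the
    biogroup, and genes in both."""
    bioset, biogroup = set(bioset_genes), set(biogroup_genes)
    return len(bioset), len(biogroup), len(bioset & biogroup)


def overlap_p_value(universe, bioset_size, biogroup_size, overlap):
    """One-sided hypergeometric p-value: the probability of at least
    `overlap` shared genes if the bioset were a random draw of
    `bioset_size` genes from a universe of `universe` genes."""
    total = comb(universe, bioset_size)
    tail = sum(
        comb(biogroup_size, k) * comb(universe - biogroup_size, bioset_size - k)
        for k in range(overlap, min(bioset_size, biogroup_size) + 1)
    )
    return tail / total


# Hypothetical example: a 4-gene bioset and a 3-gene biogroup share 2 genes.
n_bioset, n_biogroup, n_both = venn_counts(
    ["TP53", "MDM2", "CDKN1A", "BAX"], ["TP53", "MDM2", "ATM"]
)
print(n_bioset, n_biogroup, n_both)  # 4 3 2
print(overlap_p_value(20000, n_bioset, n_biogroup, n_both))
```

Restricting the bioset to only up-regulated or only down-regulated genes, as the Venn diagram drop-down allows, would simply change the gene list passed to `venn_counts`.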

How do I export results from a Biogroups query?

Click the Export Results button at the top of a Biogroups results page. This will download a list of all correlated biogroups.


To download a list of the genes common to your queried bioset and a specific biogroup:

  1. Click a biogroup name.
  2. Click the Export Data button.

To download the Venn diagram and associated statistics, click the Export Image button.

How does Biogroups compare to other pathway enrichment analysis tools?

Biogroups performs enrichment analysis using canonical gene lists that represent not just pathways, but also protein families, molecular functions, and biological processes.


In addition, Correlation Engine has developed advanced gene set enrichment analysis algorithms that take into account the direction of each gene within a bioset (e.g., up-/down-regulation or amplification), as well as its rank. For more details, please see our paper, "Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data" (Kupershmidt et al., PLoS ONE, 2010).

Genome Browser FAQs

What is Genome Browser?

Genome Browser is an easy-to-use, interactive application ("app") that you can use to view the physical relationships across biosets and different types of genomic elements. Some of these elements include genes, miRNA targets, CNVs, CpG islands, SNPs, GWAS associations, and LD blocks.

What kinds of species does Genome Browser support?

Genome Browser supports human and mouse. We plan to add additional species in the future.

Why isn't Genome Browser working on my computer?

Genome Browser requires Adobe® Flash® Player 10 to work. Please make sure that your Flash player is the correct version.

How do I use Genome Browser?

To query Genome Browser:

  1. Go to the top of any page within Correlation Engine.
  2. Enter your term in the query field (under the app menu).
  3. Click the Genome Browser icon above the query field. Genome Browser will open in a new window, focused on your query term.

You can also launch Genome Browser without a query term:

  1. Go to the top of any page within Correlation Engine.
  2. Click the Genome Browser icon. Genome Browser will open in a new window, displaying an overview of all human chromosomes.
  3. Click a chromosome to explore it in greater detail.
  4. Type the name of a gene or SNP in the search field at the top of the page and press Enter on your keyboard. The view will reload, focused on the gene or SNP.
How do I view biosets in Genome Browser?

To directly query Genome Browser with a bioset:

  1. Expand any study-level page (for example: Curated Studies, app results, or private studies) to the bioset level.
  2. Click the mini-QuickView icon next to the bioset's name.
  3. Click the Genome Browser icon above the query field.
  4. Genome Browser will open in a new window, with the bioset preloaded as a track. (Note: The first page shown is an overview of all chromosomes. Significant features in the bioset display as colored regions across the genome.)

You can also load biosets as tracks from within Genome Browser:

  1. Click the Find Data tab at the top of the window to browse and filter all available studies.
  2. Click the Add to GB button to make the bioset available as a track when you return to the main Genome Browser tab.
  3. From the Tracks Setup tab, you can also hide or show biosets you've added.
How do I zoom in, zoom out, pan left, and pan right?

Genome Browser has several ways to zoom and pan.


To zoom in and out, do one of the following:

  • Move the slider in the Zoom section of the control strip near the top of the screen.
  • Click the + or - buttons on either side of the zoom slider.
  • Click and drag the edges of the red viewfinder that appears over the cytogenetic map. This will change the borders of the viewing area.
  • Double-click anywhere in the main viewing area to zoom in multiple levels at once.

To pan left and right, do one of the following:

  • Click and drag the main viewing area.
  • Click the left or right arrow buttons in the Pan section of the control strip near the top of the screen.
  • Click and drag the middle of the red viewfinder that appears over the cytogenetic map. (You may need to zoom out in order to move the viewfinder in this way.)
How do I show or hide different tracks?

Click the Tracks Setup tab at the top of the window. This tab will show you a list of standard tracks with boxes that you can check or uncheck, to indicate whether you want to show or hide them.


Any biosets you have selected will also appear here. You can show or hide each bioset as an individual track.


You can also collapse individual tracks without hiding them completely. To do this, click the arrow to the left of the track name in the main viewing area.

Why do some Genome Browser tracks show bars and others show histograms?

When you are zoomed out, individual genomic features are grouped together and shown as histograms. As you zoom in, individual features appear at their proper length and position.


Sparser tracks, such as the CNV track, switch to individual features at a lower zoom level than denser tracks, such as the SNP track.
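
The zoom thresholds Genome Browser uses are not documented here. The Python sketch below illustrates the general idea under assumed parameters: a track is drawn as individual bars when few enough features fall in the current view, and otherwise as a histogram of feature counts. The `max_features` and `n_bins` values and the function name `render_track` are hypothetical.

```python
def render_track(features, view_start, view_end, max_features=200, n_bins=50):
    """Decide how to draw one track for the current view.

    features: list of (start, end) coordinates.
    Returns ("bars", visible_features) or ("histogram", counts_per_bin).
    Assumes view_end > view_start.
    """
    visible = [(s, e) for s, e in features if e >= view_start and s <= view_end]
    if len(visible) <= max_features:
        # Sparse enough: draw each feature at its true length and position.
        return "bars", visible
    width = (view_end - view_start) / n_bins
    counts = [0] * n_bins
    for s, e in visible:
        # Assign each feature to a bin by its midpoint, clamped to the view.
        mid = min(max((s + e) / 2, view_start), view_end)
        counts[min(int((mid - view_start) / width), n_bins - 1)] += 1
    return "histogram", counts


# A sparse CNV-like track and a dense SNP-like track in the same 100-kb view.
cnv = [(1_000, 5_000), (20_000, 40_000), (70_000, 90_000)]
snp = [(i * 100, i * 100 + 1) for i in range(1_000)]
print(render_track(cnv, 0, 100_000)[0])  # bars
print(render_track(snp, 0, 100_000)[0])  # histogram
print(render_track(snp, 0, 10_000)[0])   # bars (zoomed in)
```

Because the decision depends on how many features are visible, a sparse track crosses the threshold earlier as you zoom in than a dense one does.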

Where do the data in standard tracks come from?

The data in standard tracks comes from the following sources:

How can I select single or multiple features within a track to get more information?

There are three ways that you can learn more about a feature or group of features within a region:

  • Click a feature to place alignment lines at its start and end coordinates. This helps you line up features across different tracks and estimate a feature's position.
  • Double-click a feature to open a pop-up window with more details and (when applicable) links to relevant information. This action works for most features.
  • Activate the Selection Tool and highlight a region of interest to open a pop-up window with a detailed list of the features in that region, their coordinates, and other associated information. This is especially useful for reviewing a group of features densely packed within a chromosomal region of interest. (Note: The Selection Tool requires a fairly high zoom level, which can vary by track.)
What do a feature's different colors mean?

Most tracks have a Legend link at their right end that explains the colors in a track. Colors generally indicate that a feature has a special property; for example, a score above a threshold, or association with some phenotype.

Can I use Genome Browser to visualize sequencing reads?

At the moment, this is not a Correlation Engine feature.

Does Correlation Engine support all of the annotation tracks that the UCSC Genome Browser supports?

Not at this time. However, if there are particular UCSC tracks that you think would be useful in your research, please let us know at support@nextbio.com. We're always looking for new ways to expand Correlation Engine's offerings.

What are the LD and GWAS phenotype association tracks?

Each LD track shows blocks of linkage disequilibrium (LD) for a specific HapMap population. You can use the Tracks Setup tab to hide or show different HapMap populations.


Correlation Engine calculates LD block structure using average recombination rate inferred from HapMap SNP data for a given population. LD blocks are found by scanning the data for islands of consecutive markers where the recombination rate remains lower than the average recombination rate for the chromosome.


The minimum size of an island is seven SNPs. Each inferred LD block is displayed as a bar that spans the included SNPs. Not every SNP belongs to an LD block, and the LD block structure changes depending on which population you view.
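
The block-finding rule described above can be sketched in Python as a single scan over markers sorted by position. Two details are assumptions not stated in this FAQ: each marker carries a precomputed local recombination rate, and the chromosome average is the plain mean of those per-marker rates. The `Marker` class and the function name `find_ld_blocks` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Marker:
    snp_id: str
    position: int        # base-pair coordinate on one chromosome
    recomb_rate: float   # assumed precomputed local recombination rate


def find_ld_blocks(markers, min_snps=7):
    """Return LD blocks as lists of consecutive markers whose recombination
    rate stays below the chromosome-wide average rate."""
    markers = sorted(markers, key=lambda m: m.position)
    if not markers:
        return []
    average = mean(m.recomb_rate for m in markers)
    blocks, island = [], []
    for marker in markers:
        if marker.recomb_rate < average:
            island.append(marker)        # extend the current low-rate island
        else:
            if len(island) >= min_snps:  # close the island if it is big enough
                blocks.append(island)
            island = []
    if len(island) >= min_snps:          # island running to the chromosome end
        blocks.append(island)
    return blocks


# Hypothetical rates: 10 low, 3 high, 5 low, 2 high.
rates = [0.1] * 10 + [5.0] * 3 + [0.1] * 5 + [5.0] * 2
markers = [Marker(f"rs{i}", i * 1_000, r) for i, r in enumerate(rates)]
for block in find_ld_blocks(markers):
    # Each block is drawn as a bar from its first to its last SNP.
    print(block[0].snp_id, block[-1].snp_id, len(block))  # rs0 rs9 10
```

The five-SNP island in this example is dropped because it is shorter than the seven-SNP minimum.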


(Note: Correlation Engine is also working on an r²-based LD inference method. We'll make this available as a Genome Browser option once testing is complete.)


Phenotypes are assigned to LD blocks based on any associations with individual SNPs belonging to a block. We consider an LD block to be associated with a given phenotype if the smallest p-value of any SNP belonging to that block is < 10e-5.


We report the closest HapMap population to the population in which an association was discovered, along with relevant links in rollover and pop-up windows for the LD block. An LD block with a qualifying association is colored purple; unassociated LD blocks are orange.


Multiple phenotypes may be associated with a single LD block. Associated SNPs may also be located outside of an LD block; these SNPs are colored cyan and appear in line with the LD blocks.
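
The phenotype-assignment and coloring rules above can be sketched in Python as follows. The input shapes, the function name `annotate_ld_blocks`, and the choice to color an outside-block SNP cyan only when its own p-value passes the same cutoff are assumptions. The cutoff is left as an explicit parameter rather than hard-coded; the text above states it as 10e-5.

```python
def annotate_ld_blocks(blocks, associations, threshold):
    """
    blocks: list of (start, end) coordinates of inferred LD blocks.
    associations: list of (snp_position, phenotype, p_value) GWAS results.
    threshold: p-value cutoff for calling an association.
    Returns (block_annotations, cyan_snps).
    """
    annotations = []
    inside = set()  # indices of associations that fall inside some block
    for start, end in blocks:
        smallest = {}  # phenotype -> smallest p-value among SNPs in this block
        for i, (pos, phenotype, p) in enumerate(associations):
            if start <= pos <= end:
                inside.add(i)
                smallest[phenotype] = min(p, smallest.get(phenotype, p))
        phenotypes = sorted(ph for ph, p in smallest.items() if p < threshold)
        annotations.append({
            "start": start,
            "end": end,
            "phenotypes": phenotypes,  # a block may carry several phenotypes
            "color": "purple" if phenotypes else "orange",
        })
    # Qualifying associated SNPs outside every LD block are drawn in cyan.
    cyan = [a for i, a in enumerate(associations)
            if i not in inside and a[2] < threshold]
    return annotations, cyan


# Hypothetical blocks and GWAS hits; the cutoff here is arbitrary.
blocks = [(100, 200), (300, 400)]
hits = [(150, "T2D", 1e-8), (160, "T2D", 0.01), (350, "Height", 0.2),
        (500, "Asthma", 1e-9)]
annotations, cyan = annotate_ld_blocks(blocks, hits, threshold=1e-5)
print([a["color"] for a in annotations])  # ['purple', 'orange']
print(cyan)  # [(500, 'Asthma', 1e-09)]
```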

Literature FAQs

What is the Literature app?

Literature is a Correlation Engine application ("app") that shows a list of PubMed publications that match your query term. Results are listed in order of relevance. To sort the listed articles by publication date, select Date from the drop-down menu above the results.


Correlation Engine provides innovative filtering options by extracting key biomedical terms from the abstracts (and full text, when available) and displaying them as a tag cloud. To further filter and refine your results, click any term in the tag cloud. To specify which kinds of tags are shown in the tag cloud, click any of the filter categories to the right of the blue "filter terms" arrow (e.g., phenotype, tissue).



You can also filter by keyword. To do this, enter a term into the field below the tag cloud and click Filter. Click the "Clear all" button to return to your original, unfiltered results.


Literature differs from other Correlation Engine genomic apps (such as Disease Atlas) in that its results come from text-based searches, rather than data correlations.

What does the News tab do?

The News tab shows news articles related to your search term, sourced from hundreds of publicly available biology- and health-related news publications.


As with Literature and Clinical Trials results, news articles are listed by relevance. The News page, however, allows you to click the Date link to see the list ordered by date. The filter terms option bar (located just above the tag cloud) lets you view only certain subcategories of terms, such as phenotypes.

How does Correlation Engine rank relevant literature matches for a search?

Correlation Engine indexes over 19 million abstracts from PubMed and over 130,000 full-text publications from PubMed Central. For Literature searches, Correlation Engine uses a number of heuristics, including:

  • An extensive ontology that captures relationships between terms, synonyms, and a term hierarchy;
  • A customized, domain-specific stop word list and analyzer that emphasizes ontology terms;
  • The overall authority of the journal in which the paper was published;
  • Date of publication.
What is a tag cloud?

A tag cloud is a list of relevant terms ("tags") that have been extracted from the text results of your search. These tags are terms that appear throughout the abstracts and article text in your search results. Seeing tags displayed in a tag cloud can help you discover associations you might not have thought of before.


Tags are listed alphabetically within the tag cloud; tags that appear in a larger typeface are more strongly associated with your search term.


By default, Correlation Engine uses only the top 50 results to construct the tag cloud. For a more informative tag cloud, select 200 or 1000 from the drop-down menu at the far right of the filter options bar; that number of results will then be used to construct the tag cloud. (Note: Including more results will also increase computing time.)
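
This FAQ does not describe how tag strength is measured. The Python sketch below assumes a simple proxy: the number of top-ranked results in which each extracted term appears, mapped linearly onto a font-size range, with tags then listed alphabetically. Term extraction is assumed to have already happened; the function name `build_tag_cloud`, `max_tags`, and the font sizes are hypothetical.

```python
from collections import Counter


def build_tag_cloud(results, top_n=50, max_tags=30, min_size=10, max_size=28):
    """
    results: ranked search results, each given as a list of extracted terms.
    top_n: how many top results feed the cloud (50, 200, or 1000 in the UI).
    Returns [(tag, font_size)] in case-insensitive alphabetical order.
    """
    doc_freq = Counter()
    for terms in results[:top_n]:
        doc_freq.update(set(terms))  # count each term once per result
    top = doc_freq.most_common(max_tags)
    if not top:
        return []
    high, low = top[0][1], top[-1][1]
    span = (high - low) or 1  # avoid dividing by zero when all counts tie
    cloud = [(tag, min_size + (count - low) * (max_size - min_size) // span)
             for tag, count in top]
    # Alphabetical order; a larger size means a stronger association.
    return sorted(cloud, key=lambda item: item[0].lower())


results = [["BRCA1", "breast cancer"], ["BRCA1", "ovarian cancer"],
           ["BRCA1", "breast cancer"]]
print(build_tag_cloud(results))
# [('BRCA1', 28), ('breast cancer', 19), ('ovarian cancer', 10)]
```

Raising `top_n` feeds more results into the counts, which mirrors the trade-off noted above: a richer cloud at the cost of more computation.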

What is the ScienceDirect tab?
The ScienceDirect tab shows you ScienceDirect articles matching your query term(s). This tab is only available to Correlation Engine Enterprise customers who subscribe to ScienceDirect.
What is the Matching Sentences tab?

The Matching Sentences tab shows you ScienceDirect articles in which your query terms appear in the same sentence or paragraph. These articles are more likely to directly discuss how your query terms may relate to each other. See Matching Sentences FAQs for more information.


Matching Sentences is only available to Correlation Engine Enterprise customers who subscribe to ScienceDirect.

What is the Section Search tab?

The Section Search tab shows you ScienceDirect articles in which your query terms appear in a designated section. For example, you may specify that your query terms must appear in the Methods, Results, or Discussion sections of an article. See Section Search FAQs for more information.


Section Search is only available to Correlation Engine Enterprise customers who subscribe to ScienceDirect.

Clinical Trials FAQs

What is Clinical Trials?

Clinical Trials is a Correlation Engine application ("app") that shows all clinical trials from ClinicalTrials.gov that match your query term. Results are listed in order of relevance. To sort the listed trials by date of last update, select Date from the drop-down menu above the results.


Correlation Engine provides innovative filtering options by extracting key biomedical terms from the trial descriptions and displaying them in a tag cloud. To further filter and refine your results, click any term in the tag cloud. To specify which kinds of tags are shown in the tag cloud, click any of the filter categories to the right of the blue "filter terms" arrow (e.g., phenotype, tissue).


You can also filter by keyword. To do this, enter a term into the field below the tag cloud and click Filter. Click the "Clear all" button to return to your original, unfiltered results.


Clinical Trials differs from other Correlation Engine genomic apps (such as Disease Atlas) in that its results come from text-based searches, rather than data correlations.

What is a tag cloud?

A tag cloud is a list of relevant terms ("tags") that have been extracted from the text results of your search. These tags are terms that appear throughout the trial descriptions in your search results. Seeing tags displayed in a tag cloud can help you discover associations you might not have thought of before.


Tags are listed alphabetically within the tag cloud; tags that appear in a larger typeface are more strongly associated with your search term.


By default, Correlation Engine uses only the top 50 results to construct the tag cloud. For a more informative tag cloud, select 200 or 1000 from the drop-down menu at the far right of the filter options bar; that number of results will then be used to construct the tag cloud. (Note: Including more results will also increase computing time.)

What does the Studies & Projects page display?
The Studies & Projects page lets you access studies according to your permissions. You can view only your own studies, only the studies you can access through specific projects, only public studies, or all studies.
For a gene search, what does the Gene Details link show?
The Gene Details section provides a summary of gene information. Here you can find information such as alternate names, links to gene orthologs, known transcription factor and miRNA binding sites for the gene, and membership in existing gene ontology lists, pathways and protein families. Additionally, Correlation Engine Professional and Enterprise users can launch the Correlation Engine Genome Browser from this page to view a graphic representation of the gene in chromosomal context.
How does the "auto-complete" function work?
The auto-complete function simplifies the selection of genes, pathways, tissues, authors, SNPs, and other biomedical concepts by providing a drop-down list of matches for you to pick from as you type. In order to provide the most appropriate suggestions, it uses a combination of biological and medical ontologies and other proprietary heuristics. The use of "auto-complete" is optional, and you can simply type in your term and press the Enter key to bypass it.
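
The ontologies and heuristics behind auto-complete are not described here. As a minimal sketch of the basic idea only, the Python below matches a typed prefix against canonical terms and their synonyms and suggests the canonical term. The tiny ontology, the shortest-first ordering, and the function name `autocomplete` are hypothetical.

```python
def autocomplete(prefix, ontology, limit=10):
    """
    ontology: dict of canonical term -> list of synonyms.
    Returns up to `limit` canonical terms whose name or any synonym starts
    with `prefix` (case-insensitive), shortest first.
    """
    typed = prefix.strip().lower()
    if not typed:
        return []
    matches = [
        term for term, synonyms in ontology.items()
        if any(name.lower().startswith(typed) for name in (term, *synonyms))
    ]
    return sorted(matches, key=lambda term: (len(term), term))[:limit]


ontology = {
    "TP53": ["p53", "tumor protein p53"],
    "TP63": ["p63"],
    "MAPK signaling pathway": ["MAPK cascade"],
    "BRCA1": [],
}
print(autocomplete("tp", ontology))      # ['TP53', 'TP63']
print(autocomplete("mapk c", ontology))  # ['MAPK signaling pathway']
```

Matching on synonyms is what lets a user who types a familiar alias land on the canonical term.
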
How does Correlation Engine rank Data Correlation results when I search for my gene of interest?
Correlation Engine ranks all of the studies for a given gene based on the activity of that gene in each individual experiment. For example, if a drug induces the activity of "Gene A" more than any other gene in a dataset (or "bioset"), Gene A will get the highest rank among all genes profiled in that individual study. If you query Gene A in Correlation Engine, that bioset, in which Gene A has rank 1, will be ranked higher than another bioset in which, for instance, Gene A is the fifth most highly induced gene and has rank 5. Correlation Engine's algorithms normalize gene ranks based on platform size and other factors.
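
The normalization Correlation Engine applies is not specified beyond "platform size and other factors." The Python sketch below assumes the simplest version of that idea, dividing each gene's rank by the number of genes on the platform, and then orders biosets by that normalized rank. The data structure and the function name `rank_biosets_for_gene` are hypothetical.

```python
def rank_biosets_for_gene(gene, biosets):
    """
    biosets: dict of bioset name -> (genes ordered from most to least
    induced, number of genes on the platform).
    Returns [(bioset name, normalized rank)], best (smallest) first.
    """
    scored = []
    for name, (ranked_genes, platform_size) in biosets.items():
        if gene in ranked_genes:
            rank = ranked_genes.index(gene) + 1  # 1 = most induced gene
            scored.append((name, rank / platform_size))
    return sorted(scored, key=lambda item: item[1])


biosets = {
    "drug_X": (["GeneA", "GeneB", "GeneC"], 20_000),  # Gene A has rank 1
    "drug_Y": (["GeneD", "GeneE", "GeneF", "GeneG", "GeneA"], 20_000),  # rank 5
    "drug_Z": (["GeneB", "GeneC"], 1_000),  # Gene A absent from this bioset
}
print(rank_biosets_for_gene("GeneA", biosets))
# [('drug_X', 5e-05), ('drug_Y', 0.00025)]
```

Normalizing by platform size keeps a rank of 5 on a 20,000-gene platform from looking worse than a rank of 1 on a much smaller one.
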
How does Correlation Engine rank Data Correlation results for a biogroup of interest?
Biogroups represent any set of genes or proteins that share some biological property, such as function or a common regulatory motif. Correlation Engine uses proprietary rank-based statistics to correlate biogroups with experimental data. If most of the genes encoding proteins involved in the MAPK signaling pathway are highly active in a given bioset (resulting in a correspondingly low, or significant, p-value), that bioset will be ranked highly in the results of a Correlation Engine search for the MAPK signaling pathway.
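Correlation Engine's biogroup statistic is proprietary. As a stand-in, the sketch below uses a standard hypergeometric overlap test: it takes the top-ranked genes of a bioset as the "highly active" set and asks how surprising their overlap with a biogroup is. The cutoff `top_n`, the gene lists, and the function names are illustrative assumptions.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the biogroup, n drawn)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def biogroup_p_value(ranked_genes: list[str], biogroup: set[str], top_n: int) -> float:
    """Enrichment of a biogroup among the top_n ranked genes of one bioset."""
    members = biogroup & set(ranked_genes)  # only genes measured in the bioset count
    overlap = len(members & set(ranked_genes[:top_n]))
    return hypergeom_sf(overlap, len(ranked_genes), len(members), top_n)


ranked = ["MAPK1", "MAPK3", "RAF1", "ACTB", "GAPDH", "TUBB"]
mapk_group = {"MAPK1", "MAPK3", "RAF1", "KRAS"}
print(biogroup_p_value(ranked, mapk_group, top_n=3))  # 0.05
```

A smaller p-value would place this bioset higher in the results for the biogroup, mirroring the behavior described above.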
How does the Correlation Engine rank results for my tissue, phenotype or compound of interest?
Correlation Engine uses a combination of its proprietary rank-based statistics and various meta-analysis techniques to compute the most significant genes and biogroups associated with a tissue, phenotype, or compound under investigation. To perform this calculation, Correlation Engine combines all studies related to a given topic to identify the most significant genes and functional trends. This lets you draw on "collective experimental intelligence". At the top of the Data Correlations results page, you can see a list of the best-correlated genes and biogroups for your search term, with a list of all relevant studies below. When you select a ranked gene or biogroup of interest, you can then access the subset of studies related to your original term that also match the selected gene or biogroup.
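One widely used meta-analysis technique for combining evidence across studies is Fisher's method, sketched below for one gene's p-values from several independent studies. Correlation Engine's own combination of rank-based statistics and meta-analysis techniques is not documented here, so treat this as an illustration of the general idea only.

```python
import math


def fisher_combined_p(p_values: list[float]) -> float:
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom; for even degrees of freedom its upper tail is
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)^i / i!, built incrementally
        series += term
    return math.exp(-half_x) * series


# One gene measured in three studies of the same compound.
print(fisher_combined_p([0.04, 0.10, 0.03]))
```

Fisher's method assumes the studies are independent; strongly related studies (for example, several biosets from one experiment) would overstate the combined significance.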
How does filtering of results work?
On the Data Correlations search page you can use filters to narrow down a large list of matching results to a restricted subset according to organism, data type or keyword. Within text-based search results (Literature, Clinical Trials and News), enter any term in the filter box to narrow results.
What criteria does Correlation Engine use to rank relevant literature matches for a search?

Correlation Engine indexes over 19 million abstracts from PubMed and over 130,000 full-text publications from PubMed Central. For its literature search, Correlation Engine uses a number of heuristics, including:

  • An extensive ontology with relationships between terms, synonyms, and a term hierarchy
  • A customized domain-specific stop word list and analyzer that emphasizes ontology terms
  • The authority of the journal where the paper was published
  • Date of publication
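The first two heuristics, ontology-aware term matching and a domain-specific stop word list, can be illustrated with a toy query analyzer. The miniature `ONTOLOGY` and `STOP_WORDS` below are hypothetical examples, not Correlation Engine's actual ontology or stop word list.

```python
import re

# Hypothetical miniature ontology: canonical term -> synonyms.
ONTOLOGY = {
    "tp53": {"p53", "tumor protein p53"},
    "breast cancer": {"breast carcinoma", "mammary carcinoma"},
}
# Hypothetical domain stop words: frequent in abstracts but uninformative.
STOP_WORDS = {"the", "of", "in", "and", "study", "results", "patients"}


def analyze_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into canonical ontology concepts and remaining keywords."""
    text = query.lower()
    concepts = []
    for canonical, synonyms in ONTOLOGY.items():
        matched = False
        # Try longer variants first so "tumor protein p53" is consumed whole
        # before the shorter synonym "p53" can match inside it.
        for variant in sorted({canonical, *synonyms}, key=len, reverse=True):
            text, count = re.subn(r"\b" + re.escape(variant) + r"\b", " ", text)
            matched = matched or count > 0
        if matched:
            concepts.append(canonical)
    keywords = [w for w in re.findall(r"[a-z0-9]+", text) if w not in STOP_WORDS]
    return concepts, keywords


print(analyze_query("Role of p53 in breast carcinoma patients"))
# (['tp53', 'breast cancer'], ['role'])
```

Mapping synonyms to one canonical concept lets a search for "p53" also match abstracts that say "tumor protein p53", while the stop word list keeps common filler from diluting the ranking.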

Community and My Correlation Engine

Why should I create a personal user profile on Correlation Engine?
A personal profile page on Correlation Engine is an online scientific CV, where you can list positions, degrees, and publications. Your Correlation Engine profile is linked to our comprehensive literature search, so you can easily claim your journal articles as your own. With a personal profile you can save search results and organize them by project. You can also join the Correlation Engine community once you create a personal profile.
Can I control who sees my profile?
Users have complete privacy control over their own profiles. Correlation Engine makes it easy for your profile to be seen by only your groups and contacts, only registered members of Correlation Engine, or all users of Correlation Engine. Read our privacy policy.
What is the Correlation Engine Community?
The Correlation Engine community is made up of users and groups in the Correlation Engine system. Our users come from a broad range of organizations, including research universities, as well as biotechnology and pharmaceutical companies all over the world.
What are Correlation Engine contacts?
Correlation Engine contacts are your personal online scientific community: colleagues, lab mates, and collaborators. These can be people from your own lab or institution as well as Correlation Engine users with similar research interests or backgrounds.
Who can I add as a Correlation Engine contact?
Any registered user can be added as a Correlation Engine contact. People who are not yet members of the Correlation Engine community can also be easily invited to join as a contact.
Who can I search for under "People"?
You can search for all registered users of Correlation Engine who have allowed their profiles to be searchable. Please see our privacy page to set your personal privacy settings. If you do not see a colleague listed on Correlation Engine, you can easily invite them to join the Correlation Engine community.
Why create groups?
Groups are an easy way to collaborate and communicate with a small group of people you work with or with a large number of users whose research interests are similar to your own. Currently, you can share studies and participate in discussions with other group members. In the future, Correlation Engine will add the ability to share publications, bookmarks, and other types of information with group members.
Can I control who can join a group?
As the creator or administrator of a group, you have full control over group privacy and membership. Groups can be public, allowing any member of the Correlation Engine community to join, or private, requiring an invitation from the administrator. Groups can also be made invisible to the public so that they do not appear in search results.
Can I share my data with other group members?
Correlation Engine Professional and Correlation Engine Enterprise users can easily upload their own data, compare it with all public data in the Correlation Engine system, and share it with group members. We value your privacy and data security at Correlation Engine. You have complete control over who sees your own data.
What happens when I archive a message?
Archiving a message removes it from your inbox without deleting it. To see all archived messages, select "archived inbox" from the drop-down menu in the inbox.

Data Import Questions

Data import functionality is available to Correlation Engine Professional and Correlation Engine Enterprise users.

How can I bring in results from BaseSpace Sequence Hub applications?

You can easily upload data files to the Correlation Engine platform as processed data: results of statistical analysis consisting of gene/protein or custom IDs and associated statistics (in text, CSV, or Excel file formats). Correlation Engine lets you import standard statistical columns (fold change/log2 fold change/0-N fold change, p-value, score, rank, correlation) as well as up to 5 custom numeric columns with any user-defined titles.

The gene identifier column should be the left-most column or have the header "Gene name" to be recognized (see the Sample Import files on the left of the import page). At a minimum, your file must contain a list of recognizable identifiers (e.g., a set of genes). For experimental data, we strongly recommend including the associated statistics to improve the quality of correlations with other data in Correlation Engine. You can import files individually, one by one, or zip them into a single file for easier upload. Acceptable formats include text, .csv, and Excel (both .xls and .xlsx files).

How to use BaseSpace Sequence Hub Apps for getting RNA-seq data into Correlation Engine:

For details on uploading the filtered table file from Cufflinks Assembly & DE click here.

For details on uploading the *.deseq.res.csv file from RNAExpress click here.

For details on uploading the Reference FPKM gene values file from RNA-Seq Alignment click here.

For details on uploading the *_ChIP-Seq_peaks.narrowPeak or *_ChIP-Seq_peaks.xls file from ChIP-Seq click here.

Can I import my own data privately?
As a Correlation Engine Professional or Correlation Engine Enterprise user, you can upload, save, and correlate your own data with public data.
What is the acceptable data format?
You can easily upload data files to the Correlation Engine platform as processed data: results of statistical analysis consisting of gene/protein or custom IDs and associated statistics (in text, CSV, or Excel file formats). Correlation Engine lets you import standard statistical columns (fold change/log2 fold change/0-N fold change, p-value, score, rank, correlation) as well as up to 5 custom numeric columns with any user-defined titles. The gene identifier column should be the left-most column or have the header "Gene name" to be recognized (see the Sample Import files on the left of the import page). At a minimum, your file must contain a list of recognizable identifiers (e.g., a set of genes). For experimental data, we strongly recommend including the associated statistics to improve the quality of correlations with other data in Correlation Engine. You can import files individually, one by one, or zip them into a single file for easier upload. Acceptable formats include text, .csv, and Excel (both .xls and .xlsx files).
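Before uploading, you could check a CSV export against the rules above with a short script. This is a minimal sketch, not Correlation Engine's importer: the exact header spellings in `STANDARD_COLUMNS` are assumptions, and it reads CSV text only.

```python
import csv
import io

# Assumed header spellings for the standard statistical columns; the Sample
# Import files on the import page show the names the importer actually expects.
STANDARD_COLUMNS = {"fold change", "log2 fold change", "0-n fold change",
                    "p-value", "score", "rank", "correlation"}
MAX_CUSTOM_COLUMNS = 5


def check_import_table(text: str) -> list[str]:
    """Return problems found in a CSV import table (an empty list means none)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = [h.strip() for h in rows[0]]
    lowered = [h.lower() for h in header]
    # Identifiers come from a "Gene name" column if present, else the left-most column.
    id_col = lowered.index("gene name") if "gene name" in lowered else 0
    data = [r for r in rows[1:] if any(cell.strip() for cell in r)]
    problems = []
    if not any(len(r) > id_col and r[id_col].strip() for r in data):
        problems.append("no gene identifiers found")
    custom = [h for i, h in enumerate(lowered) if i != id_col and h not in STANDARD_COLUMNS]
    if len(custom) > MAX_CUSTOM_COLUMNS:
        problems.append(f"{len(custom)} custom columns; at most {MAX_CUSTOM_COLUMNS} allowed")
    # Every non-identifier column should hold numbers.
    for i, name in enumerate(header):
        if i == id_col:
            continue
        for r in data:
            cell = r[i].strip() if i < len(r) else ""
            if not cell:
                continue
            try:
                float(cell)
            except ValueError:
                problems.append(f"non-numeric value {cell!r} in column {name!r}")
                break
    return problems


print(check_import_table("Gene name,Fold change,p-value\nTP53,2.5,0.001\nBRCA1,-1.8,0.02\n"))  # []
```

To check an Excel file the same way, you could first save it as CSV; the sketch does not read .xls or .xlsx directly.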
What should I upload as associated files?
You can upload reports, presentations, and any other files associated with a given study. These files don't need to be in any particular format but are limited to 1 MB. They are not required to complete data import and can be added at any time.
How does Correlation Engine rank features in my dataset during import?

Correlation Engine uses the standard statistical columns described above to rank features in your gene/protein set. If more than one standard statistical column is present, Correlation Engine automatically uses the first one available in the following order:

  • Fold change/log2 fold change/0-N fold change
    • (Note: log2 and 0-N values are converted to +/-fold change on upload)
  • P-value
  • Score
  • Rank
  • Correlation
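The priority list can be expressed as a small lookup, together with one common convention for turning log2 and 0-N ratios into signed fold changes (a ratio below 1, such as 0.5, becomes -1/0.5 = -2). Both the header spellings and the exact conversion rule are assumptions; the FAQ states only that these values are converted to +/-fold change.

```python
from typing import Optional

# Priority groups, in the order listed above; the first group present wins.
RANKING_PRIORITY = [
    ("fold change", "log2 fold change", "0-n fold change"),
    ("p-value",),
    ("score",),
    ("rank",),
    ("correlation",),
]


def pick_ranking_column(columns: list[str]) -> Optional[str]:
    """Return the header of the column used for ranking, or None."""
    present = {c.strip().lower(): c for c in columns}
    for group in RANKING_PRIORITY:
        for name in group:
            if name in present:
                return present[name]
    return None


def ratio_to_signed_fold_change(ratio: float) -> float:
    """0-N ratio -> +/-fold change: 4.0 stays 4.0, 0.5 becomes -2.0."""
    if ratio <= 0:
        raise ValueError("a 0-N ratio must be positive")
    return ratio if ratio >= 1 else -1.0 / ratio


def log2_to_signed_fold_change(log2_fc: float) -> float:
    """log2 fold change -> +/-fold change: 3.0 becomes 8.0, -1.0 becomes -2.0."""
    return ratio_to_signed_fold_change(2.0 ** log2_fc)


print(pick_ranking_column(["Gene name", "p-value", "log2 fold change"]))  # log2 fold change
```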
What type of gene and protein identifiers does Correlation Engine support?
Correlation Engine recognizes most public and standard commercial platform identifiers, including NCBI Gene IDs, gene symbols, NCBI accession numbers, ENSEMBL IDs, RefSeq identifiers, IPI IDs, and custom IDs from most Affymetrix, Illumina, Agilent, and GE Healthcare platforms.
Can I upload more biosets into an existing study?
Yes, you can upload bioset files into a new study or an existing study, provided the biosets are from the same organism and data type as the target study.
Why should I tag my data?
Tagging is an important process that gives your data semantic structure. Tagging takes just a few seconds, and the benefits are significant: search results improve considerably once the data is tagged. Tagging also places your study in an appropriate context and can contribute to additional computations (Enterprise users). Finally, it helps your colleagues and collaborators quickly understand the biological background of the study.
What criteria should I use to tag my data?
You should tag each of your datasets with the following: 1) the tissue or cell line under study, 2) the phenotype, if applicable, and 3) genetic or chemical modifications (compound or a gene, if applicable). In general, tagging should only describe the main attributes of the experimental design and not of the experimental result or observation (e.g., you shouldn't tag your data with a highly expressed gene you detected in your microarray results).
How does Correlation Engine correlate my data with other data?
Correlation Engine uses proprietary rank-based statistics to compute associations between the data you import and all other experimental data. This allows you to place your experimental results within the context of the world's experiments in order to validate your study results, discover novel associations and trends, and design new experiments. Correlation Engine correlates your data with all biogroups as well, allowing you to discover common features among the genes or proteins that comprise your study. This, in turn, provides a greater understanding of the cellular events contributing to your study's results.
How do I edit studies that I have already imported?
To edit an existing study, click on the "Studies & Projects" link in the left vertical panel of the Correlation Engine homepage. Select the "My Studies" tab, and click on the "Full Study Details" button corresponding to the study of interest. The pencil icon or an "Add" link indicates sections where you can apply changes to "Study Details", "Bioset details", and "associated files". You can also delete any or all biosets within your individual studies.
How do I edit the tags for studies that I have already imported?
To edit tags for an existing study, click the "My Studies & Projects" link in the left vertical panel of the Correlation Engine homepage. Select the "My Studies" tab, and click on the "Full Study Details" button corresponding to the study of interest. Click on "+Add Tags" or "Edit" in the tag section under the "Biosets" tab.

Meta-Analysis Questions

What is Meta-Analysis?
Meta-Analysis enables users to query with a collection of individual biosets to derive a consensus gene signature and/or discover sets of commonly regulated biogroups. This lets you identify the most consistently and highly regulated genes across multiple biosets. Biosets can be mixed and matched from both your private data library and the Correlation Engine public library. Meta-Analysis lets users search up to 150 biosets at a time for correlating genes and biogroups. Alternatively, users can select up to 10 biosets to create a Meta-Analysis query that searches all Correlation Engine biosets for correlating signatures.
How do I run Meta-Analysis?

Find biosets to add to Meta-Analysis by clicking on the name of a study of interest. Click the Meta-Analysis icon that appears to the left of any bioset. The icon's appearance will update to show you that the bioset has been added, and the Meta-Analysis icon at the top of the page will update to display the number of biosets that have been added so far. You can also browse to other pages to find more biosets to add; Meta-Analysis will remember all the biosets you've added.

When you're ready to run Meta-Analysis, click the large Meta-Analysis icon to go to the Meta-Analysis Setup tab. From the Setup tab, you can remove any or all of the biosets from the query. You can also drag and drop individual biosets to reorder how they appear in results.

To view Meta-Analysis results, click a results tab. Results can be viewed as correlated genes, biogroups, biosets, or SNPs (if appropriate).
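The selection behavior described above (add, remove, remove all, reorder, and a running count) can be modeled with a small ordered collection. This is a hypothetical model for illustration, not a Correlation Engine API; the limits of 150 and 10 biosets come from the "What is Meta-Analysis?" answer.

```python
class MetaAnalysisBasket:
    """Hypothetical model of a Meta-Analysis selection; not a Correlation Engine API."""

    MAX_BIOSETS = 150       # limit for finding correlated genes and biogroups
    MAX_QUERY_BIOSETS = 10  # limit for searching all biosets with a Meta-Analysis query

    def __init__(self) -> None:
        self._biosets: list[str] = []  # kept in the order shown in results

    def add(self, bioset_id: str) -> bool:
        """Add a bioset; returns False if it was already added."""
        if bioset_id in self._biosets:
            return False
        if len(self._biosets) >= self.MAX_BIOSETS:
            raise ValueError(f"at most {self.MAX_BIOSETS} biosets can be added")
        self._biosets.append(bioset_id)
        return True

    def remove(self, bioset_id: str) -> None:
        self._biosets.remove(bioset_id)

    def remove_all(self) -> None:
        self._biosets.clear()

    def move(self, bioset_id: str, new_index: int) -> None:
        """Reorder a bioset, like drag and drop on the Setup tab."""
        self._biosets.remove(bioset_id)
        self._biosets.insert(new_index, bioset_id)

    @property
    def count(self) -> int:
        """The number displayed on the Meta-Analysis icon."""
        return len(self._biosets)

    def order(self) -> list[str]:
        return list(self._biosets)

    def can_search_all_biosets(self) -> bool:
        return 0 < self.count <= self.MAX_QUERY_BIOSETS
```

Keeping the selection as an ordered list, rather than a set, is what lets the reordering on the Setup tab carry through to the order of results.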

How do I save the results from Meta-Analysis?
From the results page, click "Export Results" to export the results to an Excel file. Alternatively, you can bookmark the results page for later access.
What can I put into Meta-Analysis?
You can add up to 150 biosets spanning different platforms, organisms, projects and libraries.
Can I change my Meta-Analysis after running it?
Correlation Engine remembers your most recent Meta-Analysis until you sign out. You can continue to add or remove biosets and run the altered query. To start a new query, click the "remove all" link from the Meta-Analysis box.
How do you compute the most significant genes in the Meta-Analysis?
A number of parameters are used to compute the most relevant genes. The two most important are the activity level of a gene in each bioset and its specificity (the number of biosets in which the gene is active).
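The FAQ names the two main parameters but not how they are combined. The sketch below assumes one simple combination, ordering genes by specificity first and breaking ties by mean activity; the activity threshold `min_activity` and the ordering rule are hypothetical.

```python
from collections import defaultdict


def rank_meta_genes(biosets: list[dict[str, float]], min_activity: float = 1.0) -> list[str]:
    """Order genes by specificity first, then by mean activity where active.

    biosets: one dict per bioset mapping gene -> activity score (sign ignored).
    A gene counts as active in a bioset when |activity| >= min_activity.
    """
    active_in = defaultdict(int)       # specificity: number of biosets where active
    activity_sum = defaultdict(float)  # total |activity| over those biosets
    for bioset in biosets:
        for gene, activity in bioset.items():
            if abs(activity) >= min_activity:
                active_in[gene] += 1
                activity_sum[gene] += abs(activity)
    return sorted(active_in,
                  key=lambda g: (-active_in[g], -activity_sum[g] / active_in[g], g))


biosets = [{"A": 3.0, "B": 5.0, "C": 0.2}, {"A": 2.0, "C": 4.0}, {"A": 1.5, "B": 2.0}]
print(rank_meta_genes(biosets))  # ['A', 'B', 'C']
```

Here gene A ranks first despite a lower mean activity than B because it is active in all three biosets.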
How do I see SNP Results?

The SNP Results tab is only available if your query includes at least one bioset containing SNP or mutation data. Biosets of the following data types enable the SNP Results tab:

  • SNP GWAS
  • Somatic Mutation
  • Germline Mutation
How do I interpret the score matrix in Meta-Analysis bioset results?

Each correlated bioset has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the correlated bioset.

Absence of a colored bar means that the correlation is insignificant.

The color and direction of each colored bar depend on whether the biosets involved in the correlation are both directional (e.g., RNA expression or CNVs), both non-directional (e.g., SNPs or mutations), or one of each.

  • Both biosets are directional. The correlation's score is based on the strength of the overlap, or enrichment, between the two biosets. The bar is colored red and appears above the midline if there is an overall positive correlation in the directionality of the overlapping genes—for example, if most of the overlapping genes are down-regulated in both biosets. The bar is colored green and appears below the midline if there is an overall negative correlation in the directionality of the overlapping genes.
  • Both biosets are non-directional. The correlation's score is based on the strength of the overlap, or enrichment, between the two biosets. The bar is colored orange and always appears above the midline, since there is no directionality to the enrichment.
  • One bioset is directional and the other is non-directional. We show a bar representing the score of the strongest overlap, or enrichment, between the two biosets. If the strongest overlap is with up-regulated genes, the bar is colored red and appears above the midline. The bar is colored green and appears below the midline if the strongest overlap is with down-regulated genes.
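The bar rules above amount to a small decision table. The sketch encodes them, plus one assumed way of deciding the "overall" direction for two directional biosets (a majority vote over overlapping genes). Representing an insignificant correlation as `score=None` and the function names are illustrative choices, not part of Correlation Engine.

```python
from typing import Optional


def direction_concordance(a: dict[str, float], b: dict[str, float]) -> int:
    """+1 if most overlapping genes move the same way in both biosets,
    -1 if most move in opposite directions, 0 on a tie or no overlap.
    Values are signed fold changes (positive = up-regulated)."""
    shared = a.keys() & b.keys()
    agree = sum(1 for g in shared if (a[g] > 0) == (b[g] > 0))
    disagree = len(shared) - agree
    return (agree > disagree) - (agree < disagree)


def bioset_bar(score: Optional[float], query_directional: bool,
               other_directional: bool, direction: int = 0) -> Optional[tuple[str, str]]:
    """Color and position of one score-matrix bar, or None when no bar is drawn.

    direction: for two directional biosets, the concordance (+1 or -1);
    for a mixed pair, +1 if the strongest overlap is with up-regulated genes,
    -1 if with down-regulated genes. Ignored for two non-directional biosets.
    """
    if score is None:  # insignificant correlation
        return None
    if not query_directional and not other_directional:
        return ("orange", "above")  # no directionality to the enrichment
    if direction > 0:
        return ("red", "above")
    if direction < 0:
        return ("green", "below")
    raise ValueError("direction must be +1 or -1 when a bioset is directional")


a = {"X": 2.0, "Y": -3.0, "Z": 1.5}
b = {"X": 1.2, "Y": -2.0, "W": 4.0}
print(bioset_bar(5.0, True, True, direction_concordance(a, b)))  # ('red', 'above')
```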
How do I interpret the score matrix in Meta-Analysis biogroup results?

Each correlated biogroup has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the biogroup.

Absence of a colored bar means that the correlation is insignificant.

The color and direction of the colored bars depend on whether the queried bioset is directional (e.g., RNA expression or CNVs) or non-directional (e.g., SNPs or mutations).

  • The queried bioset is directional. Two scores are represented—one as a red bar above the midline and the other as a green bar below the midline. The first score is based on the strength of the overlap, or enrichment, between the biogroup and the up-regulated genes in the queried bioset. The second score is based on the overlap between the biogroup and the down-regulated genes in the queried bioset.
  • The queried bioset is non-directional. The correlation's score is based on the strength of the overlap, or enrichment, between the biogroup and the queried bioset. The bar is colored orange and always appears above the midline, since there is no directionality to the enrichment.
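For a directional queried bioset, the biogroup is scored twice, once against the up-regulated genes and once against the down-regulated genes. The sketch separates a bioset into those two gene sets and maps scores to bars; how each score is computed is not shown, and the function names are hypothetical.

```python
from typing import Optional


def split_by_direction(bioset: dict[str, float]) -> tuple[set[str], set[str]]:
    """Separate a directional bioset (gene -> signed fold change) into up and down genes."""
    up = {g for g, fc in bioset.items() if fc > 0}
    down = {g for g, fc in bioset.items() if fc < 0}
    return up, down


def biogroup_bars(directional: bool, up_score: Optional[float] = None,
                  down_score: Optional[float] = None,
                  score: Optional[float] = None) -> list[tuple[str, str, float]]:
    """Bars drawn for one queried bioset; a None score means insignificant (no bar)."""
    bars = []
    if directional:
        if up_score is not None:    # biogroup vs. up-regulated genes
            bars.append(("red", "above", up_score))
        if down_score is not None:  # biogroup vs. down-regulated genes
            bars.append(("green", "below", down_score))
    elif score is not None:         # single non-directional enrichment
        bars.append(("orange", "above", score))
    return bars


print(biogroup_bars(True, up_score=1.0, down_score=2.0))
# [('red', 'above', 1.0), ('green', 'below', 2.0)]
```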
How do I interpret the score matrix in Meta-Analysis gene results?

Each correlated gene has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the gene.

Absence of a colored bar means that the correlation is insignificant.

(Only the top 5,000 gene features in a queried bioset are considered in order to decrease potential noise.)

The color and direction of the colored bars depend on whether the queried bioset is directional (e.g., RNA expression or CNVs) or non-directional (e.g., SNPs or mutations).

  • The queried bioset is directional. The correlation's score is based on the significance of the measurement made for the gene in the queried bioset. The bar is red and appears above the midline if the gene was up-regulated or amplified. The bar is green and appears below the midline if the gene was down-regulated or deleted.
  • The queried bioset is non-directional. The correlation's score is based on the significance of the measurement for the gene in the queried bioset (or for the associated gene(s) if querying with SNP or mutation biosets). The bar is colored orange and always appears above the midline, since there is no directionality to the measurement.
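The gene-level rules, including the top-5,000 cutoff, can be sketched as a lookup in a ranked feature list. The `(gene, signed value, significance)` tuple layout and the function name are assumptions made for the example.

```python
from typing import Optional

TOP_FEATURES = 5000  # only the top 5,000 gene features are considered


def gene_bar(ranked_features: list[tuple[str, float, float]], gene: str,
             directional: bool = True) -> Optional[tuple[str, str, float]]:
    """ranked_features: (gene, signed value, significance), most significant first.

    Returns (color, position, score) for the gene's bar, or None if no bar is drawn.
    """
    for name, value, significance in ranked_features[:TOP_FEATURES]:
        if name != gene:
            continue
        if not directional:
            return ("orange", "above", significance)
        if value > 0:  # up-regulated or amplified
            return ("red", "above", significance)
        return ("green", "below", significance)  # down-regulated or deleted
    return None  # gene absent from the top features


features = [("TP53", 2.1, 9.0), ("MYC", -1.7, 7.5)]
print(gene_bar(features, "MYC"))  # ('green', 'below', 7.5)
```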
How do I interpret the score matrix in Meta-Analysis SNP results?

Each correlated SNP has an associated score matrix. The height of each vertical bar in the score matrix represents the score of the correlation between the queried bioset and the SNP. (For non-SNP biosets, bar height represents the score of the correlation with the gene(s) associated with the SNP.)

Absence of a colored bar means that the correlation is insignificant.

The color and direction of the colored bars depend on whether the queried bioset is directional (e.g., RNA expression or CNVs) or non-directional (e.g., SNPs or mutations).

  • The queried bioset is directional. The correlation's score is based on the significance of the measurement made for the gene. The bar is red and appears above the midline if the gene was up-regulated or amplified in the queried bioset. The bar is green and appears below the midline if the gene was down-regulated or deleted in the queried bioset.
  • The queried bioset is non-directional. The correlation's score is based on the significance of the measurement for the gene (or for the associated gene(s) if querying with SNP or mutation biosets). The bar is colored orange and always appears above the midline, since there is no directionality to the measurement.

Library-Related Questions

What is the Correlation Engine Library?
The Correlation Engine Library contains all public data studies, organized into projects for easy navigation through all Correlation Engine public content. You can pick any studies or biosets of interest and set up advanced queries, or just browse the available content. None of the studies or biosets in this library can be edited.
What is the "Company X" Library?
This library contains studies and projects proprietary to your organization. Only users within your company can access it (unless an administrator specifies otherwise). Moving data from your private project into this organization-wide library requires special permission; please contact Correlation Engine to arrange this. Within the next several release cycles, we'll enable your organization's Correlation Engine administrator to set these permissions without Correlation Engine assistance.

Enterprise-Related Questions

How can my organization use Correlation Engine?
Through Correlation Engine, your organization can leverage all of its internal large-scale data to benefit the entire R&D team. All imported data within an enterprise is cross-correlated to previously uploaded internal data and to the public data. Correlation Engine provides a secure SaaS solution for enterprise customers. Each enterprise has a customized domain with security controls that can be configured to comply with your enterprise's security policies. All access to this domain is over HTTPS. Each user is authenticated before they can access their organization's version of the enterprise product. An administrator can control which users have access to the system. Data can be associated at the domain level and shared across all users in the domain. The results seen for the same query by users across different domains will vary and are a function of the data that each domain is authorized to access.
How do enterprise users access Correlation Engine?
Each enterprise user needs to be registered for their enterprise domain. Each domain has a unique URL. You should contact your Correlation Engine representative or send an email to nbadmin@nextbio.com to get the URL associated with your domain. Correlation Engine also provides a single sign-on option that transparently logs an enterprise user in.
What is the Enterprise Single sign-on functionality?

The Correlation Engine Single Sign-on (SSO) solution uses a simple scheme of auto-registration and authentication of users from a trusted source, using specific HTTP headers or URL parameters in end-user requests. Through the Correlation Engine single sign-on process, an existing user is transparently logged into the Correlation Engine application. For a new user, Correlation Engine creates a new account in the background and logs in the user without asking for a password.

Correlation Engine provides two solutions for integrating single sign-on (SSO) for an enterprise. The first solution is a proxy-based solution. In this solution, all user requests from the enterprise are directed to Correlation Engine through a trusted proxy, which provides authentication credentials to Correlation Engine for the user. The second solution is a portal-based solution. With this solution a user logs into an internal portal within the enterprise to access Correlation Engine. The trusted portal passes authentication tokens for a user to Correlation Engine.
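The proxy-based option can be sketched as a request handler that trusts identity headers only when the request arrives from the trusted proxy, auto-registering unknown users. The proxy address, the `X-Remote-User` header name, and the in-memory account store are all hypothetical; the FAQ does not name the actual headers or URL parameters.

```python
import secrets
from typing import Optional

TRUSTED_PROXIES = {"10.0.0.5"}  # hypothetical address of the enterprise proxy
USER_HEADER = "X-Remote-User"   # hypothetical header carrying the user's identity

accounts: dict[str, dict] = {}  # stands in for the real user store


def sso_login(remote_addr: str, headers: dict[str, str]) -> Optional[str]:
    """Return a session token for a trusted SSO request, or None to refuse it.

    Real HTTP header lookups are case-insensitive; a plain dict keeps this short.
    """
    if remote_addr not in TRUSTED_PROXIES:
        return None  # identity headers are only trusted from the proxy
    username = headers.get(USER_HEADER)
    if not username:
        return None
    if username not in accounts:  # new user: create the account in the background
        accounts[username] = {"username": username}
    return secrets.token_urlsafe(16)  # logged in without a password prompt
```

The check on the request's source is the important design choice: without it, any client could assert an identity simply by setting the header itself.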

Will my organization's data and user activity on Correlation Engine Enterprise be secure?
Correlation Engine provides a highly secure solution for its enterprise customers. Please refer to the section on security for more details.
How can my organization upload studies in bulk?
Correlation Engine provides simple APIs to enable you to import studies in batch mode. Please refer to the integration section for more details.
Does Correlation Engine provide APIs?
We make a number of APIs available to enable you to bring data into and out of Correlation Engine. Please refer to the integration section for more details.
How can we control data sharing and collaboration among different groups?
Correlation Engine provides a feature where each user can create a private group and collaborate and share data only with users within this group.
Can we keep some data private from other users within an organization?
Users can easily control who views and has access to their data, both within their own organization and outside, through privacy settings. You can share data selectively with other individuals by creating a custom group and giving access to only those users that you choose.

Matching Sentences FAQs

Matching Sentences is only available to Correlation Engine Enterprise customers who subscribe to ScienceDirect.

How do I use Matching Sentences?

Matching Sentences allows you to retrieve ScienceDirect articles in which your query terms occur very close to each other. A list of articles with matching sentences or paragraphs appears in the left pane. Click an article to display its matching sentences or paragraphs in the right pane.


You can also select a radio button to choose one of two ranges within which the search terms must occur in a result:

  • Sentence: The search term(s) must be present within a single sentence. Matching sentences are displayed in the right pane.
  • Paragraph: The search term(s) must be present within a single paragraph. Matching paragraphs are displayed in the right pane.
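The difference between the Sentence and Paragraph ranges can be sketched as splitting an article into units and keeping those that contain every term. The splitting rules are deliberately naive (a real sentence splitter must handle abbreviations such as "et al.") and are assumptions, as is the function name.

```python
import re


def matching_units(text: str, terms: list[str], scope: str = "sentence") -> list[str]:
    """Return the sentences or paragraphs of `text` that contain every term.

    Paragraphs are separated by blank lines; sentences are split naively after
    '.', '!' or '?'. Matching is a case-insensitive substring test.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if scope == "paragraph":
        units = paragraphs
    elif scope == "sentence":
        units = [s.strip() for p in paragraphs
                 for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
    else:
        raise ValueError("scope must be 'sentence' or 'paragraph'")
    wanted = [t.lower() for t in terms]
    return [u for u in units if all(t in u.lower() for t in wanted)]


doc = ("BRCA1 is a tumor suppressor. Mutations raise risk.\n\n"
       "BRCA1 mutations are common in some families.")
print(matching_units(doc, ["BRCA1", "mutations"]))
# ['BRCA1 mutations are common in some families.']
```

With `scope="paragraph"`, both paragraphs match, because the first paragraph contains both terms even though no single sentence in it does.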
What criteria does Correlation Engine use to determine the order in which search results are listed?

Correlation Engine uses a number of heuristics to determine the rank of an article, including:

  • Frequency of occurrence of the search term;
  • Authority of the journal in which the paper was published;
  • Date of publication;
  • Occurrence of the search term in key sections of the article such as title, abstract, keywords, etc.
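The four heuristics can be combined into a single score in many ways. The sketch below uses a weighted sum; the weights, the log scaling of term frequency, the recency formula, and the `Article` fields are all assumptions chosen for illustration, not Correlation Engine's ranking function.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    term_count: int           # occurrences of the search term in the article
    journal_authority: float  # hypothetical journal weight between 0 and 1
    year: int                 # publication year
    in_key_section: bool      # term found in title, abstract, or keywords


# Hypothetical weights; the real heuristics and their weighting are not published.
WEIGHTS = {"frequency": 1.0, "authority": 2.0, "recency": 1.0, "key_section": 1.5}


def article_score(a: Article, current_year: int) -> float:
    frequency = math.log1p(a.term_count)                 # more hits help, with diminishing returns
    recency = 1.0 / (1 + max(0, current_year - a.year))  # newer papers score higher
    return (WEIGHTS["frequency"] * frequency
            + WEIGHTS["authority"] * a.journal_authority
            + WEIGHTS["recency"] * recency
            + WEIGHTS["key_section"] * (1.0 if a.in_key_section else 0.0))


def rank_articles(articles: list[Article], current_year: int) -> list[Article]:
    return sorted(articles, key=lambda a: article_score(a, current_year), reverse=True)
```

For example, with `current_year=2024`, a 2023 article from a high-authority journal with the term in its abstract outranks a 2010 article that mentions the term more often but only in its body text.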
If I enter multiple terms, does the app search for the occurrence of the exact phrase or can the terms appear in different places within a sentence or paragraph or document?
Unless the terms are in quotes, the terms can appear anywhere within a sentence or paragraph, depending on which search range was selected.
Can I use Boolean operators for search?
Yes, Correlation Engine supports Boolean operators like AND, OR, and NOT for the search. The operators should be uppercase.
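A small recursive-descent evaluator shows how uppercase-only operators can coexist with ordinary words (a lowercase "and" is treated as a search term) and how quoted phrases must match exactly. Operator precedence (NOT, then AND, then OR), treating adjacent terms as an implicit AND, and whole-word matching are assumptions; the FAQ does not specify them.

```python
import re

_TOKEN = re.compile(r'"([^"]+)"|(\S+)')
OPERATORS = {"AND", "OR", "NOT"}  # recognized only when written in uppercase


def _tokenize(query: str) -> list[tuple[str, str]]:
    tokens = []
    for m in _TOKEN.finditer(query):
        if m.group(1) is not None:
            tokens.append(("TERM", m.group(1).lower()))  # quoted exact phrase
        elif m.group(2) in OPERATORS:
            tokens.append((m.group(2), ""))
        else:
            tokens.append(("TERM", m.group(2).lower()))  # e.g. lowercase "and"
    return tokens


def matches(query: str, text: str) -> bool:
    """Evaluate a Boolean query against one sentence, paragraph, or document."""
    tokens = _tokenize(query)
    haystack = text.lower()
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def parse_or() -> bool:
        nonlocal pos
        result = parse_and()
        while peek() == "OR":
            pos += 1
            rhs = parse_and()  # always parse, so every token is consumed
            result = result or rhs
        return result

    def parse_and() -> bool:
        nonlocal pos
        result = parse_not()
        while peek() in ("AND", "NOT", "TERM"):  # adjacent terms imply AND
            if peek() == "AND":
                pos += 1
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not() -> bool:
        nonlocal pos
        if peek() == "NOT":
            pos += 1
            return not parse_not()
        if peek() != "TERM":
            raise ValueError("expected a search term")
        term = tokens[pos][1]
        pos += 1
        return re.search(r"\b" + re.escape(term) + r"\b", haystack) is not None

    return parse_or()


print(matches("BRCA1 NOT ovarian", "BRCA1 in ovarian cancer"))  # False
```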
Is there any part of the journal article that is not searched for a match?
Yes. Author list, year, page, volume, and references are not searched for a match. This app is specialized for occurrences of terms within the key text of the article.
Can I specify that one of the terms has to be located in a specific part of the article such as the title or abstract section?
No, but when viewing results, matching sentences or paragraphs are broken down by the location of occurrence within the body and you can selectively copy those hits that match your preference to the clipboard.
How can I use the clipboard?

Use the clipboard to collect matching sentences or paragraphs of interest along with their references. Matching sentences or paragraphs are displayed on the right-hand side of your results, and to the right of each match is an icon. Click the icon to add the sentence or paragraph to the clipboard, along with the article's title and citation information.

To access the clipboard:

  • Click on the View Clipboard link underneath the tabs.
  • All the content you have added to the clipboard is displayed.
  • Click the Remove button next to a sentence or paragraph to remove it from the clipboard.
  • There are also options to email or print the contents of the clipboard.
Is there a way to clean out all the contents of my Clipboard?
When viewing the clipboard, click the Clear ClipBoard link to delete all its contents.
How can I access the full article?
Each article in the list on the left-hand side of your results has a Go to Article button. Click the button to access the article on ScienceDirect.

Section Search FAQs

Section Search is only available to Correlation Engine Enterprise customers who subscribe to ScienceDirect.

How do I use Section Search?
Section Search allows you to search the full text of ScienceDirect articles and restrict results so that your search term(s) appear only within the sections you specify. For example, if you only want to see articles in which your search term appears in the Methods section, click the Methods checkbox. The results will dynamically update.
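Restricting matches to selected sections can be sketched as a filter over articles whose text is already split into named sections. The article structure and the rule that a match in any one selected section is enough are assumptions for the example.

```python
def section_search(articles: list[dict], term: str, sections: list[str]) -> list[str]:
    """Titles of articles whose selected sections contain the term.

    Each article is a dict: {"title": str, "sections": {section name: text}}.
    One matching section out of those selected is enough.
    """
    term = term.lower()
    wanted = {s.lower() for s in sections}
    hits = []
    for article in articles:
        for name, body in article["sections"].items():
            if name.lower() in wanted and term in body.lower():
                hits.append(article["title"])
                break  # count each article once
    return hits


articles = [
    {"title": "Paper 1", "sections": {"Methods": "We used CRISPR knockout.", "Results": "No change."}},
    {"title": "Paper 2", "sections": {"Introduction": "CRISPR is popular.", "Methods": "RNA-seq."}},
]
print(section_search(articles, "CRISPR", ["Methods"]))  # ['Paper 1']
```

Checking an additional box, such as Introduction, widens the set of sections that may contain the term, so Paper 2 would then be returned as well.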
What criteria does Correlation Engine use to determine the order in which search results are listed?

Correlation Engine uses a number of heuristics to determine the rank of an article, including:

  • Frequency of occurrence of the search term;
  • Authority of the journal in which the paper was published;
  • Date of publication;
  • Occurrence of the search term in key sections of the article such as title, abstract, keywords, etc.
Can I use Boolean operators for search?
Yes, Correlation Engine supports Boolean operators like AND, OR, and NOT for the search. The operators should be uppercase.
Can I select years other than those listed in the filter on the left?
The choices displayed under Year, Journals, Authors, and Affiliations are based on the articles returned for the search. If a year is not listed, then it means that there were no search results from that year that matched your query.
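Because the filter options are derived from the result set, a year with no hits has nothing to count and so never becomes a choice. A minimal sketch, assuming each result is a dict with a `year` field:

```python
from collections import Counter


def year_choices(results: list[dict]) -> list[tuple[int, int]]:
    """Return (year, hit count) pairs, newest first, for years present in the results."""
    counts = Counter(r["year"] for r in results)
    return sorted(counts.items(), reverse=True)


print(year_choices([{"year": 2020}, {"year": 2021}, {"year": 2020}]))
# [(2021, 1), (2020, 2)]
```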
How can I access the full article?
Click on an article's title to access the full article on ScienceDirect.