Phenolyzer: Phenotype based gene analyzer

1) What does Phenolyzer do?

Phenolyzer takes as input a discrete list of phenotype terms and generates a list of candidate genes weighted by the chance of being associated with the phenotype, even in the absence of any genotype data.

2) How does Phenolyzer perform?

Below is the comparison between Phenolyzer and other gene/phevor prioritization tools, including Phevor, PosMed, SNP3d, Genecards. Phenolyzer works the best to have the best ROC to find COSMIC cancer genes.

Also, for 14 classic monogenic disease genes, Phenolyzer ranked all of them as TOP1, the best among all these tools.

3) How does Phenolyzer work?

First, Phenolyzer interprets your term into a set of professional disease names;
Second, Phenolyzer finds all the genes having a reported association with all the diseases;
Thirdly, Phenolyzer grows the genes into gene-gene relation databases and get more genes;
Finally, Phenolyzer integrates all the infomation together and give gene scores.

4) What terms does Phenolyzer accept?

1) Disease terms, separate by semicolon or return. For example, 'Alzheimer disease'
2) Phenotype terms, if Phenotype interpretation is selected in advanced options. For example, 'headache; fatigue'

5) What should I do if I don't know a specific disease/phenotype right now?

Choose 'All Diseases' in Advanced Option

6) What other inputs do Phenolyzer accept?

1) A gene list, (Entrez Genes or IDs, lower case is accepted). Example
2) A region file, in '.bed' format. Example

7) What if Phenolyzer doesn't have records for my term?

Phenolyzer needs to match your term in its database. If a term is not found, useually it is because your term is too long. Please try to break your long term into several short terms.

8) What types of databases do Phenolyzer use?

1) Disease Databases: CTD's Medic disease vocabulay, Disease Ontology, OMIM's disease synonyms
2) Gene-Disease Databass: OMIM (Online Mendelian In Man), NCBI's ClinVar, GeneReviews, Orphanet, GWAS Catalog
3) Gene-Gene Relation Databases: HPRD (Human Protein Relation Database), NCBI's Biosystem, HGNC Gene Family, HTRI(Human Transcription Interaction)

9) What are in the output?

Please see the turorial page.

10) How could I understand the network?

The network includes the top50 prioritized genes, and their associated diseases and relations with seed genes.

Round Nodes: Genes, blue as reported, and yellow as predicted. The bigger, the higher the score. The darker, the higher the score.

Round-rectangles: Diseases, the longer and darker, the larger the contribution the disease.

Big Words: The input terms.

Blue edges: The Protein-protein interaction between two genes, based on the HPRD record.

Yellow edges: The In-the-same-Biosystem relation between two genes, based on NCBI's Biosystem record.

Green edges: The In-the-same-Gene-Family relation between two genes, based on HGNC's Gene Family database.

Black edges with Arrows: The transcription interaction between two genes, the direction of the arrow is from the transcription factor to the target.

11) How could I play with the network?

The network is fully interactive, which means it is dynamic.

A simple trial is to double click an element, like clicking a gene. Then only the elements interacting with this gene will be shown.
Additionally, there are a set of swicthes and controllers below. The gene and disease elements could be turned on and off, also the names of genes and diseases could be turned on and off. By default, the disease names are not shown as they are sometimes super long :).
Also, the different types of interactions could be slelectively shown by choosing one in the 'Edges' selection. The Layout is basically how the network is shaped, by default it is force-driven, but by changing Layout one can make it more organized, like 'circle' layout:)
By the way, why not just save a photo of your network and show it to your friends? Just click the Save Photo button:)

12) Why do I have different results for 'Mental Retardation' with 'Mental Disorder'?

As the workflow explained, Phenolyzer is not the god who knows everything you mean. It is just an advanced word matcher, which needs to exactly match your input terms with disease names in the databases. Thus it is encouraged to use multiple terms if you cannot describe something very precisely, in this case, you can just enter 'mental', or you can enter both, like 'mental retardation;mental disorder'.

13) How could I add my own seed genes?

Please compile your seed genes in the format(tab_delimited), and save the file in PHENOLYZER_DIR/lib/compiled_database:
Afterwards please use -addon_gg [your_filename] -addon_gg_weight [weight] in your command line.

Entrez Gene A	Entrez Gene B	Evidence description	Normalized score (0 to 1)	Source (Pubmed ID)
BARD1	DCAF7	Mentha protein interaction	0.126	22990118

14) How could I add my own gene-gene interactions?

Please compile your gene-gene interaction in the format below (tab-delimited), and save the file in PHENOLYZER_DIR/lib/compiled_database:
Afterwards please use -addon_gg [your_filename] -addon_weight [weight] in your command line.

Entrez Gene	Disease name	Reference	Normalized score (0 to 1)	Source
CCR5	acquired immunodeficiency syndrome	PUBMED:21502085	0.00595238095238095	GAD

15) Why are all the scores positive?

1) we want to correctify a typo in the Nature Methods paper, in Page 5 Paragraph 5 Line 3, the 'wTXn-X0' should be 'wTxn - X0w0'. 2) The final score is the linear output instead of the logistic probability. All the weights except the intercept are promised to be positive. Thus to make sure our final score is positive, the negative intercept is deducted.

16) What does it mean if I separate my term 'acute lymphocytic leukemia' as 'acute;lymphocytic;leukemia'?

A delimiter like ';' or Enter means you treat them as different terms. Then any disease name containing 'acute''lymphocytic' or 'leukemia' will be considered. But we have a Weighted Sum system to combine your terms, thus the most specific term will be dominant and its corresponding genes will be prioritized on top. Multiple short terms should be considered when you want a better recall.

17) How could I get the structured output for Cytoscape?

The structured output file is called 'network.json' and it is in the result URL directory. One example is shown: http://phenolyzer.wglab.org/done/1/dIv3Tz6iaSVHA1_J/network.json
You need to modify the URL yourself as there is no direct link for download!

Frequently Asked Questions