
DisGeNET Database: Downloading and Using the Data

Date: 2021-07-01 10:21:17

PMID (PubMed Unique Identifier) is the unique number assigned to each life-science and medical publication indexed by the PubMed search engine.
all_gene_disease_pmid_associations.tsv.gz		=> All Gene-Disease-PMID associations in DisGeNET

The columns in the files are:
geneId 		-> NCBI Entrez Gene Identifier
geneSymbol	-> Official Gene Symbol
DSI		-> The Disease Specificity Index for the gene
DPI		-> The Disease Pleiotropy Index for the gene
diseaseId 	-> UMLS concept unique identifier
diseaseName 	-> Name of the disease	
diseaseType  	-> The DisGeNET disease type: disease, phenotype and group
diseaseClass	-> The MeSH disease class(es)
diseaseSemanticType	-> The UMLS Semantic Type(s) of the disease
score		-> DisGeNET score for the Gene-Disease association
EI		-> The Evidence Index for the Gene-Disease association
YearInitial	-> First time that the Gene-Disease association was reported
YearFinal	-> Last time that the Gene-Disease association was reported
pmid		-> Publication reporting the Gene-Disease association
source		-> Original source reporting the Gene-Disease association
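
As a rough illustration of how this column layout can be consumed, the sketch below groups PMIDs by gene and disease using only the Python standard library. The function name `read_gene_disease_pmid`, the `min_score` threshold, and the sample rows are assumptions made for this example; the rows are invented and are not real DisGeNET records.

```python
import csv
import io
from collections import defaultdict


def read_gene_disease_pmid(handle, min_score=0.0):
    """Group PMIDs by (geneSymbol, diseaseName) for rows with score >= min_score."""
    reader = csv.DictReader(handle, delimiter="\t")
    pmids = defaultdict(set)
    for row in reader:
        # Skip rows below the threshold or without a PMID.
        if float(row["score"]) >= min_score and row["pmid"]:
            pmids[(row["geneSymbol"], row["diseaseName"])].add(row["pmid"])
    return dict(pmids)


# Invented rows that only mimic a subset of the columns listed above.
SAMPLE = (
    "geneId\tgeneSymbol\tdiseaseId\tdiseaseName\tscore\tpmid\tsource\n"
    "1\tGENE_A\tC0000001\tDisease X\t0.40\t111\tSRC1\n"
    "1\tGENE_A\tC0000001\tDisease X\t0.40\t112\tSRC2\n"
    "2\tGENE_B\tC0000002\tDisease Y\t0.10\t113\tSRC1\n"
)

result = read_gene_disease_pmid(io.StringIO(SAMPLE), min_score=0.3)
print(result)

# For the downloaded file, the same function should work on a gzip text handle:
#   import gzip
#   with gzip.open("all_gene_disease_pmid_associations.tsv.gz", "rt", encoding="utf-8") as fh:
#       result = read_gene_disease_pmid(fh, min_score=0.3)
```

Reading row by row keeps memory use low, which may matter because the PMID-level file has one row per gene, disease, and publication.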

4.variant_disease_pmid
all_variant_disease_pmid_associations.tsv.gz 		=> All Variant-Disease-PMID associations in DisGeNET

The columns in the files are:
snpId 		-> dbSNP variant Identifier
chromosome	-> Chromosome of the variant
position	-> Position in chromosome
DSI	-> The Disease Specificity Index for the variant
DPI	-> The Disease Pleiotropy Index for the variant
diseaseId 	-> UMLS concept unique identifier
diseaseName 	-> Name of the disease	
diseaseType  	-> The DisGeNET disease type: disease, phenotype and group
diseaseClass	-> The MeSH disease class(es)
score		-> DisGeNET score for the Variant-Disease association
YearInitial	-> First time that the Variant-Disease association was reported
YearFinal	-> Last time that the Variant-Disease association was reported
pmid		-> Publication reporting the Variant-Disease association
source		-> Original source reporting the Variant-Disease association


5.disease_mappings.tsv.gz				=> Mappings from UMLS concept unique identifier to disease vocabularies: DO, EFO, HPO, ICD9CM, MSH, NCI, OMIM, and ORDO

6.variant_to_gene_mappings.tsv.gz 		=> Variants mapped to their corresponding genes, according to dbSNP.

The columns in the files are:
snpId 		-> dbSNP variant Identifier
geneId		-> NCBI Entrez Gene Identifier
geneSymbol		-> Official Gene Symbol
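
One possible use of this mapping file is to attach gene symbols to variant-disease rows. The sketch below assumes the three columns listed above; the helper names `load_variant_to_gene` and `annotate_variants` and all identifiers in the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_variant_to_gene(handle):
    """Return {snpId: sorted list of gene symbols} from a snpId/geneId/geneSymbol table."""
    genes = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        genes[row["snpId"]].add(row["geneSymbol"])
    return {snp: sorted(symbols) for snp, symbols in genes.items()}


def annotate_variants(variant_rows, snp_to_genes):
    """Add a 'genes' key to each row; variants without a mapped gene get an empty list."""
    return [dict(row, genes=snp_to_genes.get(row["snpId"], [])) for row in variant_rows]


# Invented mapping rows and variant-disease rows, for illustration only.
MAPPING = (
    "snpId\tgeneId\tgeneSymbol\n"
    "rs0000001\t1\tGENE_A\n"
    "rs0000001\t2\tGENE_B\n"
    "rs0000002\t3\tGENE_C\n"
)
variant_rows = [
    {"snpId": "rs0000001", "diseaseName": "Disease X"},
    {"snpId": "rs0000003", "diseaseName": "Disease Y"},
]

snp_to_genes = load_variant_to_gene(io.StringIO(MAPPING))
annotated = annotate_variants(variant_rows, snp_to_genes)
print(annotated)
```

A variant can map to more than one gene, so the sketch stores a list per variant rather than a single symbol.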

7. Mapping files
The file contains the mappings of DisGeNET genes (Entrez Gene Identifiers) to UniProt entries. A second mapping file covers UMLS CUIs to several disease vocabularies.

8. Other files
File with BeFree gene-disease-pmid associations for PubAnnotation

II. Statistics for each source database
[Figure: statistics for each source database]

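III. The DisGeNET scoring system

DisGeNET assigns a score to each association according to the number of sources that report it.

[Figure: DisGeNET score]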

IV. Disease Specificity Index (DSI)
DSI = log2(Nd / NT) / log2(1 / NT)

where:

- Nd is the number of diseases associated to the gene

- NT is the total number of diseases in DisGeNET (13,674)

The DSI ranges from 0 to 1.

 DSI = 0 implies that the gene is associated only to phenotypes.

 

 Example: TNF, associated to more than 1,500 diseases, has a DSI of 0.247, while IDH3A is associated to one disease, with a DSI of 1.
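
The index can be sketched in a few lines of Python. This assumes the form DSI = log2(Nd / NT) / log2(1 / NT), which is how the DisGeNET documentation defines it; the function name and the input checks are choices made for this example. With Nd = 1 the ratio is exactly 1, which agrees with the IDH3A case above.

```python
import math


def disease_specificity_index(n_diseases, n_total):
    """DSI = log2(Nd / NT) / log2(1 / NT), assuming 1 <= Nd <= NT and NT >= 2."""
    if n_total < 2:
        raise ValueError("n_total must be at least 2")
    if not 1 <= n_diseases <= n_total:
        raise ValueError("n_diseases must be between 1 and n_total")
    return math.log2(n_diseases / n_total) / math.log2(1 / n_total)


N_TOTAL = 13674  # total number of diseases quoted in the text

# A gene linked to a single disease versus genes linked to many diseases.
for n in (1, 10, 1500):
    print(n, round(disease_specificity_index(n, N_TOTAL), 3))
```

The value falls as a gene is linked to more diseases, so highly connected genes score closer to 0 and single-disease genes score 1.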

V. Disease Pleiotropy Index (DPI)

DPI = Ndc / NTC

where:

 - Ndc is the number of the different MeSH disease classes of the diseases associated to the gene

- NTC is the total number of MeSH disease classes in DisGeNET (27)

The DPI ranges from 0 to 1.

DPI = 0 implies that the gene is associated only to phenotypes, or that the associated diseases do not map to any MeSH classes.

 

Example: gene KCNE2 is associated to 38 diseases and 10 phenotypes. 36 out of the 38 diseases have a MeSH disease class. The 36 diseases are associated to 10 different MeSH classes. The DPI for KCNE2 is therefore 10/27 ≈ 0.37. In contrast, gene APOE, associated to more than 700 diseases of different disease classes, has a DPI of 1.
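
The KCNE2 arithmetic can be checked with a short sketch. It computes the plain ratio Ndc / NTC, which is the form that matches the stated 0 to 1 range and the value of about 0.37. The function name and the class labels "C01" to "C10" are placeholders, not the actual MeSH classes of KCNE2.

```python
def disease_pleiotropy_index(classes_per_disease, n_total_classes=27):
    """Ratio of distinct MeSH disease classes to the total number of classes.

    classes_per_disease: one collection of class labels per associated disease;
    diseases without a MeSH class contribute an empty collection.
    """
    distinct = {label for classes in classes_per_disease for label in classes}
    return len(distinct) / n_total_classes


# 36 diseases spread over 10 placeholder classes, plus 2 diseases without a class.
labels = [f"C{i:02d}" for i in range(1, 11)]
kcne2_like = [[labels[i % 10]] for i in range(36)] + [[], []]

dpi = disease_pleiotropy_index(kcne2_like)
print(round(dpi, 2))
```

Diseases without a class add nothing to the set, so a gene whose diseases map to no MeSH class gets 0, as described above.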

VI. Vocabulary Mapping

Diseases:

The vocabulary used for diseases in the current release of DisGeNET is the Unified Medical Language System® (UMLS®) vocabulary. The source repositories of gene-disease associations use different disease vocabularies: OMIM® terms for diseases from UniProt, CTD™, and MGD; MeSH terms for CTD™, LHGDN, and RGD; and UMLS® Concept Unique Identifiers (CUIs) for ClinVar. Orphanet identifiers are mapped using Orphanet cross-references. Disease names from GAD and the GWAS Catalog are normalized using the UMLS® Metathesaurus®. The UMLS® Metathesaurus® concept structure is also used to map MIM and MeSH terms to UMLS® CUIs.

 

Genes:

For human genes, HGNC symbols (used for some entries in GAD) and UniProt accession numbers (used by UniProt) are converted to NCBI Entrez gene identifiers using an in-house dictionary that cross-references HGNC, UniProt, and NCBI Gene information. To map mouse and rat Entrez gene identifiers to human Entrez identifiers, we used the orthology files ftp://ftp.informatics.jax.org/pub/reports/HOM_MouseHumanSequence.rpt (from MGD) and ftp://rgd.mcw.edu/pub/data_release/RGD_ORTHOLOGS.txt (from RGD). We discarded the relationships when a human ortholog of the mouse or rat gene could not be found.
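
The ortholog step described above amounts to a dictionary lookup that drops unmapped genes. The sketch below is only an illustration of that idea: the function name `map_to_human`, the lookup table, and every identifier in it are hypothetical, and it does not parse the MGD or RGD files themselves.

```python
def map_to_human(associations, ortholog_to_human):
    """Translate (gene_id, disease_id) pairs to human gene IDs, dropping unmapped genes."""
    mapped = []
    for gene_id, disease_id in associations:
        human_id = ortholog_to_human.get(gene_id)
        if human_id is not None:  # discard when no human ortholog is known
            mapped.append((human_id, disease_id))
    return mapped


# Hypothetical rodent-to-human Entrez ID table and rodent associations.
ortholog_to_human = {"m100": "h1", "r200": "h2"}
rodent_associations = [
    ("m100", "C0000001"),
    ("m999", "C0000002"),  # no human ortholog, so it is dropped
    ("r200", "C0000003"),
]

human_associations = map_to_human(rodent_associations, ortholog_to_human)
print(human_associations)
```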

 

VII. The DisGeNET Association Type Ontology

[Figure: the DisGeNET association type ontology]

VIII. Data attributes

Diseases

·the disease name, provided by the UMLS® Metathesaurus®

·the UMLS® semantic types

·the MeSH class: We classify the diseases according to the MeSH hierarchy using 23 upper-level concepts of the MeSH tree branch C (Diseases) plus three concepts of the F branch (Psychiatry and Psychology: "Behavior and Behavior Mechanisms", "Psychological Phenomena and Processes", and "Mental Disorders").

·the top-level concepts from the Human Disease Ontology

·the DisGeNET disease type: disease, phenotype, and group

Concepts with the following UMLS® semantic types are classified as disease:

- Disease or Syndrome

- Neoplastic Process

- Acquired Abnormality

- Anatomical Abnormality

- Congenital Abnormality

- Mental or Behavioral Dysfunction

 

Concepts with the following UMLS® semantic types are classified as phenotype:

- Pathologic Function

- Sign or Symptom

- Finding

- Laboratory or Test Result

- Individual Behavior

- Clinical Attribute

- Organism Attribute

- Organism Function

- Organ or Tissue Function

- Cell or Molecular Dysfunction

 

These classifications were manually checked. In addition, disease entries referring to disease groups such as "Cardiovascular Diseases", "Autoimmune Diseases", "Neurodegenerative Diseases", and "Lung Neoplasms" were classified as disease group.

 

We removed terms that other sources consider diseases but that are not strictly diseases, such as terms belonging to the following UMLS® semantic types:

 - Gene or Genome

- Genetic Function

- Immunologic Factor

- Injury or Poisoning

 These attributes are shown in the different views of the browser, and they are all shown in the Disease Tab.
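
The semantic-type rules above can be sketched as a small lookup. This assumes that the first list of semantic types marks diseases and the second marks phenotypes. The function name, the set of group names (taken from the examples in the text), and the precedence used when a concept carries several semantic types are assumptions for illustration; the text says the real classification was checked manually.

```python
DISEASE_TYPES = frozenset({
    "Disease or Syndrome", "Neoplastic Process", "Acquired Abnormality",
    "Anatomical Abnormality", "Congenital Abnormality",
    "Mental or Behavioral Dysfunction",
})
PHENOTYPE_TYPES = frozenset({
    "Pathologic Function", "Sign or Symptom", "Finding",
    "Laboratory or Test Result", "Individual Behavior", "Clinical Attribute",
    "Organism Attribute", "Organism Function", "Organ or Tissue Function",
    "Cell or Molecular Dysfunction",
})
REMOVED_TYPES = frozenset({
    "Gene or Genome", "Genetic Function", "Immunologic Factor",
    "Injury or Poisoning",
})
# Only the group names quoted in the text; the full list is not given there.
GROUP_NAMES = frozenset({
    "Cardiovascular Diseases", "Autoimmune Diseases",
    "Neurodegenerative Diseases", "Lung Neoplasms",
})


def classify_disease_type(semantic_types, name=None):
    """Return 'group', 'disease', 'phenotype', or None if the term is dropped."""
    types = set(semantic_types)
    if types & REMOVED_TYPES:  # assumed to take precedence over everything else
        return None
    if name in GROUP_NAMES:
        return "group"
    if types & DISEASE_TYPES:
        return "disease"
    if types & PHENOTYPE_TYPES:
        return "phenotype"
    return None


print(classify_disease_type(["Disease or Syndrome"]))
print(classify_disease_type(["Finding"]))
print(classify_disease_type(["Disease or Syndrome"], name="Lung Neoplasms"))
```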

 

Genes

·the official gene symbol, from the NCBI

·the NCBI Official Full Name

·the UniProt accession

·the top level Panther protein class.

·the top level Reactome pathways.

·the Specificity Index (SI)

·the Pleiotropy Index (PI)

 

Variants

 ·The position in the chromosome

·The reference and alternative alleles

·The class of the variant: SNP, deletion, insertion, indel, somatic SNV, substitution, sequence alteration, and tandem repeat

·The allelic frequency according to the 1000 Genomes Project

·The allelic frequency according to the Exome Aggregation Consortium

·The most severe consequence type according to the VEP

·Links to dbSNP

·Links to ClinVar

·Links to Ensembl

 

Gene-disease associations

·the DisGeNET score

·the DisGeNET Gene-Disease Association Type

·the publication(s) that report the gene-disease association, with the PubMed identifier

·a representative sentence from the publication describing the association between the gene and the disease (If a representative sentence is not found, we provide the title of the paper)

·the original source reporting the Gene-Disease Association

·For some sources, we provide the variant(s) associated to the gene-disease association

   

IX. Cytoscape plugin

DisGeNET Cytoscape App

 

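X. Local downloads

Tab-separated files are available for download (curated gene-disease associations, BeFree gene-disease associations, and publications).

An RDF Linked Dataset is provided.

Large amounts of data can be queried at once, and Python, Perl, and R scripts are supported.

Mapping files are also provided: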

1. UniProt Downloads    DisGeNET genes -> UniProt entries

 2. UMLS CUI  ->  MeSH Identifier

 

 

 

 

  
