Introduction to GTEx | eQTLGen | Blood eQTL

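Contents

What is an eQTL? What data is it computed from, and what does the data look like?

Which genomic regions are eQTLs enriched in?

Some commonly used eQTL databases

What is GTEx? Which release is current? What data does it contain?

What are the milestone GTEx papers?

How do most research groups use GTEx data?

Downloading GTEx/eQTLGen data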

 

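Quick notes

An eQTL links one SNP to one gene. The candidate genes are usually those whose TSS lies near the SNP, upstream or downstream. Blood is treated as the gold standard.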
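To make the "genes near the SNP" rule concrete, here is a minimal sketch that lists candidate cis genes for one SNP. The gene table, the gene names, and the 1 Mb default window are assumptions for illustration only; a real analysis would take TSS coordinates from a gene annotation file.

```python
# Hypothetical gene table: (gene symbol, chromosome, TSS position in bp).
GENES = [
    ("GENE_A", "12", 10_126_104),
    ("GENE_B", "12", 10_900_000),
    ("GENE_C", "12", 14_500_000),
    ("GENE_D", "7", 10_120_000),
]


def cis_candidates(snp_chr, snp_pos, genes, window=1_000_000):
    """Return (gene, signed TSS distance) for genes whose TSS is within +/- window of the SNP."""
    hits = []
    for symbol, chrom, tss in genes:
        if chrom == snp_chr and abs(tss - snp_pos) <= window:
            hits.append((symbol, tss - snp_pos))
    # Closest gene first.
    return sorted(hits, key=lambda hit: abs(hit[1]))


candidates = cis_candidates("12", 10_117_369, GENES)
for symbol, distance in candidates:
    print(symbol, distance)
```

Each SNP-gene pair returned this way is one test in a cis-eQTL scan, so the window size directly controls the multiple-testing burden.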

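Because chromosomes are linear, linkage disequilibrium (LD) complicates every genetic analysis: the SNP you find may not be the causal one, and a neighbor in LD with it may be. The same holds for eQTLs. [If all variants in a region are in perfect LD (r² = 1), they can be treated as a single point, even if their functions differ.]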
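A rough way to see why SNPs in perfect LD cannot be told apart is to compute r² between their genotype vectors. The sketch below uses the squared Pearson correlation of 0/1/2 dosages, which is a common approximation to haplotype-based r², not the exact definition; the genotype vectors are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical dosages (copies of the alternate allele) for 8 individuals.
snp1 = [0, 1, 2, 1, 0, 2, 1, 0]
snp2 = [0, 1, 2, 1, 0, 2, 1, 0]  # identical to snp1: perfect LD
snp3 = [0, 1, 1, 2, 0, 2, 0, 1]  # only partly correlated with snp1

r2_12 = pearson_r(snp1, snp2) ** 2
r2_13 = pearson_r(snp1, snp3) ** 2
print(f"r2(snp1, snp2) = {r2_12:.3f}")
print(f"r2(snp1, snp3) = {r2_13:.3f}")
```

When r² is 1, any association statistic computed for snp1 is identical for snp2, so statistics alone cannot say which of the two is causal.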

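GWAS uses common SNPs. The causal SNP is unknown, but it must have a function [there must be a traceable path from the starting point, the variant, to the end point, the trait].

Risk alleles are enriched among minor alleles: "Our statistical results revealed that risk alleles were enriched in minor alleles, especially for variants with low minor allele frequencies (MAFs < 0.1)."

Thought experiment

What would inheritance and development look like in a haploid organism? Without sexual reproduction there is no recombination. Under asexual reproduction, diversity cannot be maintained and depends only on somatic mutation. The genotype would simply be the allele, so both GWAS and eQTL analysis would be computed per allele.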

 

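What is an eQTL? What data is it computed from, and what does the data look like?

Search Google Images for "eQTL".

The standard figure shows the three genotypes of a SNP on one axis and the expression level of a gene on the other. A nearby SNP-gene pair is a cis-eQTL; a distant one is a trans-eQTL.

Three core elements: SNP, gene, tissue.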
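The data behind that standard figure is just expression values grouped by genotype. A minimal sketch with made-up genotypes and expression values for one SNP-gene pair in one tissue:

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical genotypes and expression values for 10 individuals.
genotypes = ["AA", "AG", "GG", "AG", "AA", "GG", "AG", "AA", "GG", "AG"]
expression = [5.1, 6.0, 7.2, 6.3, 4.8, 7.0, 5.9, 5.0, 6.8, 6.1]

# Group expression values by genotype, as a boxplot would.
groups = defaultdict(list)
for genotype, value in zip(genotypes, expression):
    groups[genotype].append(value)

# Per-genotype summary: (sample count, median, mean).
summary = {g: (len(v), median(v), mean(v)) for g, v in sorted(groups.items())}
for genotype, (n, med, avg) in summary.items():
    print(f"{genotype}: n={n} median={med:.2f} mean={avg:.2f}")
```

If the medians shift steadily from one homozygote through the heterozygote to the other homozygote, the pair looks like an additive eQTL.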

 

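Which genomic regions are eQTLs enriched in?

The distribution resembles an ATAC-seq signal: eQTLs are enriched mainly within 50 kb upstream and downstream of the TSS, with a peak near the TSS.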
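One way to check this enrichment on your own results is to bin eQTLs by their absolute distance to the TSS. The distances and bin edges below are invented for illustration; with real data they would come from a per-pair TSS distance column.

```python
from collections import Counter

# Hypothetical signed distances (bp) from eQTL SNPs to the TSS of their gene.
tss_distance = [-120, 850, -4300, 15200, -48000, 230000,
                -760000, 90, 2600, -9800, 31000, -150]


def bin_label(distance, edges=(10_000, 50_000, 1_000_000)):
    """Assign an absolute distance to a half-open bin [lower, upper)."""
    absolute = abs(distance)
    lower = 0
    for upper in edges:
        if absolute < upper:
            return f"{lower // 1000}-{upper // 1000} kb"
        lower = upper
    return f">={edges[-1] // 1000} kb"


counts = Counter(bin_label(d) for d in tss_distance)
for label, count in counts.most_common():
    print(label, count)
```

Note that the bins have very different widths, so a fair comparison would divide each count by the bin width before drawing conclusions.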

Across tissues, the genotype at the same locus can have opposite relationships with gene expression, which highlights the tissue specificity of eQTLs.

Another highlight of eQTLs is the non-coding genome: "most of the susceptible loci were found in non-coding regions of the genome".

Here we describe “opposite eQTL effects”, i.e., gene expression effects of eQTLs that are in the opposite direction between different tissues, as the biologically meaningful annotations of genes and genetic variants for understanding the GWAS loci.

Reference: Biological characterization of expression quantitative trait loci (eQTLs) showing tissue-specific opposite directional effects
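A simple screen for such opposite effects is to compare the sign of the effect size for SNP-gene pairs that are significant in two tissues. This is only a sketch, not the method of the cited paper: the variant IDs, gene names, and slopes are invented, and it assumes both tissues report the effect for the same allele.

```python
# Hypothetical effect sizes (slopes) keyed by (variant, gene) for two tissues.
liver = {
    ("chr1_1000_A_G", "GENE_A"): 0.42,
    ("chr1_2000_C_T", "GENE_B"): -0.30,
    ("chr2_500_G_A", "GENE_C"): 0.15,
}
blood = {
    ("chr1_1000_A_G", "GENE_A"): -0.35,
    ("chr1_2000_C_T", "GENE_B"): -0.28,
}


def opposite_effects(tissue1, tissue2):
    """Return pairs present in both tissues whose slopes have opposite signs."""
    shared = tissue1.keys() & tissue2.keys()
    return sorted(pair for pair in shared if tissue1[pair] * tissue2[pair] < 0)


flipped = opposite_effects(liver, blood)
print(flipped)
```

Pairs that appear in only one tissue are ignored here; they are tissue-specific in a different sense (present versus absent rather than opposite in direction).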

 

Some commonly used eQTL databases

GTEx

Blood eQTL

eQTLGen

 

What is GTEx? Which release is current? What data does it contain?

The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank. The GTEx Portal provides open access to data including gene expression, QTLs, and histology images.

In short: the focus is tissue-specific gene expression and regulation; nearly 1,000 donors and 54 tissue sites were profiled with WGS, WES, and RNA-Seq; the main data products are gene expression and eQTLs.

As of 2020-09-23, the current release is v8.

All samples are post-mortem human tissues.

 

complex trait heritability/complex trait genetics 

Majority of trait-associated variation is non-coding. [Protein-coding sequence accounts for only about 1-2% of the genome.]

Using expression and epigenetic data to inform missing heritability. [For most traits, known variants explain only a small share of the heritability; how do we find the missing part?]

 

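When you have genotype and gene expression data from the same individuals, and many of them, eQTL analysis is the natural next step: test whether the genotype of a SNP is associated with the expression of nearby genes. Any gene of interest that turns up can then be studied in depth. [Think of the familiar boxplot of expression by genotype.]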
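The association test is usually an additive linear regression of expression on allele dosage (0, 1, or 2 copies of one allele). The sketch below fits that regression by ordinary least squares for one SNP-gene pair using made-up data; it reports the slope, its standard error, and the t statistic, and leaves out covariates and the p-value, which real pipelines include.

```python
from math import sqrt


def eqtl_regression(dosage, expr):
    """Ordinary least squares of expression on dosage; returns (slope, se, t)."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expr) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expr))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares with n - 2 degrees of freedom.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, expr))
    se = sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


# Hypothetical data: dosage of the alternate allele and normalized expression.
dosage = [0, 1, 2, 1, 0, 2, 1, 0, 2, 1]
expr = [5.1, 6.0, 7.2, 6.3, 4.8, 7.0, 5.9, 5.0, 6.8, 6.1]

slope, se, t_stat = eqtl_regression(dosage, expr)
print(f"slope={slope:.3f} se={se:.3f} t={t_stat:.2f}")
```

The slope is the expected change in expression per extra copy of the counted allele, which is why the allele the effect refers to always has to be reported with the result.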

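If the sample size is too small, you can only do a simple allelic expression analysis: check whether one allele of a SNP is specifically or highly expressed in patients, then follow up from there. [A common GWAS downstream analysis: is the risk allele specifically expressed in some tissue?]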
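For a single heterozygous individual, allelic expression can be checked by asking whether RNA-Seq reads covering the SNP split evenly between the two alleles. The sketch below runs an exact two-sided binomial test against a 50/50 expectation using only the standard library. The read counts are invented, and the 50/50 null is an assumption: in practice reference-mapping bias often shifts the expected ratio.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to floating-point noise.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical read counts at one heterozygous SNP.
ref_reads, alt_reads = 30, 10
p_value = binom_two_sided_p(alt_reads, ref_reads + alt_reads)
print(f"alt fraction = {alt_reads / (ref_reads + alt_reads):.2f}, p = {p_value:.4f}")
```

A small p-value suggests allelic imbalance in that sample; whether the favored allele is the risk allele is then a separate lookup against the GWAS result.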

 

What are the milestone GTEx papers?

https://gtexportal.org/home/publicationsPage

The GTEx Consortium atlas of genetic regulatory effects across human tissues - Science, 11 Sep 2020

Cell type–specific genetic regulation of gene expression across human tissues - Science, 11 Sep 2020

These papers are fresh off the press. By handling the different cell types with a statistical deconvolution approach, the second one identifies additional eQTLs.

 

How do most research groups use GTEx data?

See: Mulin Jun Li

 

Downloading eQTLGen data

Beginners should practice on this database first, because its data format is relatively simple.

cis-eQTLs
This page contains the cis-eQTL results. The statistically significant cis-eQTLs and SMR-prioritised genes for several traits are browsable, the other files can be downloaded.

Download the "Significant cis-eQTLs" file. Its first lines look like this:

Pvalue	SNP	SNPChr	SNPPos	AssessedAllele	OtherAllele	Zscore	Gene	GeneSymbol	GeneChr	GenePos	NrCohorts	NrSamples	FDR	BonferroniP
3.2717E-310	rs12230244	12	10117369	T	A	200.7534	ENSG00000172322	CLEC12A	12	10126104	34	30596	0.0	4.1662E-302
3.2717E-310	rs12229020	12	10117683	G	C	200.6568	ENSG00000172322	CLEC12A	12	10126104	34	30596	0.0	4.1662E-302
3.2717E-310	rs61913527	12	10116198	T	C	200.2654	ENSG00000172322	CLEC12A	12	10126104	34	30598	0.0	4.1662E-302

  

Files
-----
File with full cis-eQTL results: 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
File with significant (FDR<0.05) cis-eQTL results: 2019-12-11-cis-eQTLsFDR0.05-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz

Column Names
------------
Pvalue - P-value
SNP - SNP rs ID
SNPChr - SNP chromosome
SNPPos - SNP position
AssessedAllele - Assessed allele, the Z-score refers to this allele
OtherAllele - Not assessed allele
Zscore - Z-score
Gene - ENSG name (Ensembl v71) of the eQTL gene
GeneSymbol - HGNC name of the gene
GeneChr - Gene chromosome
GenePos - Centre of gene position
NrCohorts - Total number of cohorts where this SNP-gene combination was tested
NrSamples - Total number of samples where this SNP-gene combination was tested
FDR - False discovery rate estimated based on permutations
BonferroniP - P-value after Bonferroni correction

Additional information
----------------------
These files contain all cis-eQTL results from eQTLGen, accompanying the article.
19,250 genes that showed expression in blood were tested.
Every SNP-gene combination with a distance <1Mb from the center of the gene and  tested in at least 2 cohorts was included.
Associations where SNP/proxy positioned in Illumina probe were not removed from combined analysis.
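Given the column names in this README, the file can be read with the standard csv module. The sketch below picks the lead SNP per gene from the three sample rows shown earlier. Because the p-values of the strongest hits are tied at the smallest reported value, it ranks by absolute Z-score instead; that ranking choice is mine, not something the README prescribes.

```python
import csv
import io

HEADER = ("Pvalue SNP SNPChr SNPPos AssessedAllele OtherAllele Zscore Gene "
          "GeneSymbol GeneChr GenePos NrCohorts NrSamples FDR BonferroniP").split()
ROWS = [
    "3.2717E-310 rs12230244 12 10117369 T A 200.7534 ENSG00000172322 CLEC12A 12 10126104 34 30596 0.0 4.1662E-302".split(),
    "3.2717E-310 rs12229020 12 10117683 G C 200.6568 ENSG00000172322 CLEC12A 12 10126104 34 30596 0.0 4.1662E-302".split(),
    "3.2717E-310 rs61913527 12 10116198 T C 200.2654 ENSG00000172322 CLEC12A 12 10126104 34 30598 0.0 4.1662E-302".split(),
]
# Rebuild the tab-delimited text exactly as it would appear in the file.
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"


def lead_snp_per_gene(handle):
    """Return {gene symbol: (SNP, |Z|)} keeping the SNP with the largest |Z| per gene."""
    best = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        z = abs(float(row["Zscore"]))
        gene = row["GeneSymbol"]
        if gene not in best or z > best[gene][1]:
            best[gene] = (row["SNP"], z)
    return best


lead = lead_snp_per_gene(io.StringIO(SAMPLE))
print(lead)
```

For the real download, the same function should work on a handle opened with gzip.open(path, "rt"), where path points to the .txt.gz file named in the README.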

  

 

Downloading GTEx files

GTEx Analysis V8

Data available include:

  • BAM files for RNA-Seq, Whole Exome Seq, and Whole Genome Seq
  • Genotype Calls (.vcf) for OMNI SNP Arrays, WES, and WGS
  • OMNI SNP Array Intensity files (.idat and .gtc)
  • Affymetrix Expression Array Intensity files (.cel)
  • Allele Specific Expression (ASE) tables
  • All expression matrices from the Portal, including samples that did not pass the Analysis Freeze QC
  • Sample Attributes
  • Subject Phenotypes

Data format

Download GTEx_Analysis_v8_eQTL_EUR.tar, which contains the data for one population (EUR).

After extraction there are three folders:

eqtls
expression_matrices
expression_covariates

  

eqtls: one set of files per tissue, two files for each tissue

eqtls/Vagina.v8.EUR.egenes.txt.gz:

eqtls/Vagina.v8.EUR.signif_pairs.txt.gz:

Adipose_Subcutaneous.v8.EUR.egenes.txt.gz                         Esophagus_Gastroesophageal_Junction.v8.EUR.signif_pairs.txt.gz
Adipose_Subcutaneous.v8.EUR.signif_pairs.txt.gz                   Esophagus_Mucosa.v8.EUR.egenes.txt.gz
Adipose_Visceral_Omentum.v8.EUR.egenes.txt.gz                     Esophagus_Mucosa.v8.EUR.signif_pairs.txt.gz
Adipose_Visceral_Omentum.v8.EUR.signif_pairs.txt.gz               Esophagus_Muscularis.v8.EUR.egenes.txt.gz
Adrenal_Gland.v8.EUR.egenes.txt.gz                                Esophagus_Muscularis.v8.EUR.signif_pairs.txt.gz
Adrenal_Gland.v8.EUR.signif_pairs.txt.gz                          Heart_Atrial_Appendage.v8.EUR.egenes.txt.gz
Artery_Aorta.v8.EUR.egenes.txt.gz                                 Heart_Atrial_Appendage.v8.EUR.signif_pairs.txt.gz
Artery_Aorta.v8.EUR.signif_pairs.txt.gz                           Heart_Left_Ventricle.v8.EUR.egenes.txt.gz
Artery_Coronary.v8.EUR.egenes.txt.gz                              Heart_Left_Ventricle.v8.EUR.signif_pairs.txt.gz
Artery_Coronary.v8.EUR.signif_pairs.txt.gz                        Kidney_Cortex.v8.EUR.egenes.txt.gz
Artery_Tibial.v8.EUR.egenes.txt.gz                                Kidney_Cortex.v8.EUR.signif_pairs.txt.gz
Artery_Tibial.v8.EUR.signif_pairs.txt.gz                          Liver.v8.EUR.egenes.txt.gz
Brain_Amygdala.v8.EUR.egenes.txt.gz                               Liver.v8.EUR.signif_pairs.txt.gz
Brain_Amygdala.v8.EUR.signif_pairs.txt.gz                         Lung.v8.EUR.egenes.txt.gz
Brain_Anterior_cingulate_cortex_BA24.v8.EUR.egenes.txt.gz         Lung.v8.EUR.signif_pairs.txt.gz
Brain_Anterior_cingulate_cortex_BA24.v8.EUR.signif_pairs.txt.gz   Minor_Salivary_Gland.v8.EUR.egenes.txt.gz
Brain_Caudate_basal_ganglia.v8.EUR.egenes.txt.gz                  Minor_Salivary_Gland.v8.EUR.signif_pairs.txt.gz
Brain_Caudate_basal_ganglia.v8.EUR.signif_pairs.txt.gz            Muscle_Skeletal.v8.EUR.egenes.txt.gz
Brain_Cerebellar_Hemisphere.v8.EUR.egenes.txt.gz                  Muscle_Skeletal.v8.EUR.signif_pairs.txt.gz
Brain_Cerebellar_Hemisphere.v8.EUR.signif_pairs.txt.gz            Nerve_Tibial.v8.EUR.egenes.txt.gz
Brain_Cerebellum.v8.EUR.egenes.txt.gz                             Nerve_Tibial.v8.EUR.signif_pairs.txt.gz
Brain_Cerebellum.v8.EUR.signif_pairs.txt.gz                       Ovary.v8.EUR.egenes.txt.gz
Brain_Cortex.v8.EUR.egenes.txt.gz                                 Ovary.v8.EUR.signif_pairs.txt.gz
Brain_Cortex.v8.EUR.signif_pairs.txt.gz                           Pancreas.v8.EUR.egenes.txt.gz
Brain_Frontal_Cortex_BA9.v8.EUR.egenes.txt.gz                     Pancreas.v8.EUR.signif_pairs.txt.gz
Brain_Frontal_Cortex_BA9.v8.EUR.signif_pairs.txt.gz               Pituitary.v8.EUR.egenes.txt.gz
Brain_Hippocampus.v8.EUR.egenes.txt.gz                            Pituitary.v8.EUR.signif_pairs.txt.gz
Brain_Hippocampus.v8.EUR.signif_pairs.txt.gz                      Prostate.v8.EUR.egenes.txt.gz
Brain_Hypothalamus.v8.EUR.egenes.txt.gz                           Prostate.v8.EUR.signif_pairs.txt.gz
Brain_Hypothalamus.v8.EUR.signif_pairs.txt.gz                     Skin_Not_Sun_Exposed_Suprapubic.v8.EUR.egenes.txt.gz
Brain_Nucleus_accumbens_basal_ganglia.v8.EUR.egenes.txt.gz        Skin_Not_Sun_Exposed_Suprapubic.v8.EUR.signif_pairs.txt.gz
Brain_Nucleus_accumbens_basal_ganglia.v8.EUR.signif_pairs.txt.gz  Skin_Sun_Exposed_Lower_leg.v8.EUR.egenes.txt.gz
Brain_Putamen_basal_ganglia.v8.EUR.egenes.txt.gz                  Skin_Sun_Exposed_Lower_leg.v8.EUR.signif_pairs.txt.gz
Brain_Putamen_basal_ganglia.v8.EUR.signif_pairs.txt.gz            Small_Intestine_Terminal_Ileum.v8.EUR.egenes.txt.gz
Brain_Spinal_cord_cervical_c-1.v8.EUR.egenes.txt.gz               Small_Intestine_Terminal_Ileum.v8.EUR.signif_pairs.txt.gz
Brain_Spinal_cord_cervical_c-1.v8.EUR.signif_pairs.txt.gz         Spleen.v8.EUR.egenes.txt.gz
Brain_Substantia_nigra.v8.EUR.egenes.txt.gz                       Spleen.v8.EUR.signif_pairs.txt.gz
Brain_Substantia_nigra.v8.EUR.signif_pairs.txt.gz                 Stomach.v8.EUR.egenes.txt.gz
Breast_Mammary_Tissue.v8.EUR.egenes.txt.gz                        Stomach.v8.EUR.signif_pairs.txt.gz
Breast_Mammary_Tissue.v8.EUR.signif_pairs.txt.gz                  Testis.v8.EUR.egenes.txt.gz
Cells_Cultured_fibroblasts.v8.EUR.egenes.txt.gz                   Testis.v8.EUR.signif_pairs.txt.gz
Cells_Cultured_fibroblasts.v8.EUR.signif_pairs.txt.gz             Thyroid.v8.EUR.egenes.txt.gz
Cells_EBV-transformed_lymphocytes.v8.EUR.egenes.txt.gz            Thyroid.v8.EUR.signif_pairs.txt.gz
Cells_EBV-transformed_lymphocytes.v8.EUR.signif_pairs.txt.gz      Uterus.v8.EUR.egenes.txt.gz
Colon_Sigmoid.v8.EUR.egenes.txt.gz                                Uterus.v8.EUR.signif_pairs.txt.gz
Colon_Sigmoid.v8.EUR.signif_pairs.txt.gz                          Vagina.v8.EUR.egenes.txt.gz
Colon_Transverse.v8.EUR.egenes.txt.gz                             Vagina.v8.EUR.signif_pairs.txt.gz
Colon_Transverse.v8.EUR.signif_pairs.txt.gz                       Whole_Blood.v8.EUR.egenes.txt.gz
Esophagus_Gastroesophageal_Junction.v8.EUR.egenes.txt.gz          Whole_Blood.v8.EUR.signif_pairs.txt.gz
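Since every file name follows the same Tissue.v8.EUR.kind.txt.gz pattern, a small index from tissue to its two files makes it easy to loop over tissues. A minimal sketch using a few of the names from the listing; for a real directory the names could come from os.listdir("eqtls").

```python
import re

# File name pattern taken from the listing: <tissue>.v8.EUR.<kind>.txt.gz
PATTERN = re.compile(r"^(?P<tissue>.+)\.v8\.EUR\.(?P<kind>egenes|signif_pairs)\.txt\.gz$")

names = [
    "Whole_Blood.v8.EUR.egenes.txt.gz",
    "Whole_Blood.v8.EUR.signif_pairs.txt.gz",
    "Liver.v8.EUR.egenes.txt.gz",
    "Liver.v8.EUR.signif_pairs.txt.gz",
    "Brain_Spinal_cord_cervical_c-1.v8.EUR.egenes.txt.gz",
    "Brain_Spinal_cord_cervical_c-1.v8.EUR.signif_pairs.txt.gz",
    "README.txt",  # hypothetical non-matching name, skipped by the pattern
]


def index_by_tissue(file_names):
    """Return {tissue: {"egenes": file, "signif_pairs": file}} for matching names."""
    index = {}
    for name in file_names:
        match = PATTERN.match(name)
        if match is None:
            continue
        index.setdefault(match["tissue"], {})[match["kind"]] = name
    return index


index = index_by_tissue(names)
for tissue, files in sorted(index.items()):
    print(tissue, sorted(files))
```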

  

expression_matrices: expression data in BED format. After the first four columns, each column is one individual. The files are again split by tissue.

#chr    start   end     gene_id GTEX-111CU      GTEX-111FC      GTEX-111VG      GTEX-111YS      GTEX-1122O      GTEX-1128S      GTEX-11DXX      GTEX-11DZ1      GTEX-11EI6      GTEX-11EM3      GTEX
chr1    29552   29553   ENSG00000227232.5       -0.8416212335729142     -0.1573106846101707     -0.6744897501960817     -0.1414683013821586     -0.5244005127080409     0.37970195786468147
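A minimal sketch for reading such a matrix into a per-gene dictionary of sample values. It assumes the real file is tab-delimited and uses a shortened copy of the two lines above (only the first three sample columns), so the sample string is an illustration rather than the full record.

```python
import csv
import io

# Shortened, tab-delimited copy of the header and first row shown above.
SAMPLE_BED = (
    "#chr\tstart\tend\tgene_id\tGTEX-111CU\tGTEX-111FC\tGTEX-111VG\n"
    "chr1\t29552\t29553\tENSG00000227232.5\t"
    "-0.8416212335729142\t-0.1573106846101707\t-0.6744897501960817\n"
)


def read_expression(handle):
    """Return (sample IDs, {gene_id: {sample_id: value}}) from a BED-style matrix."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[4:]  # columns after chr, start, end, gene_id
    expression = {}
    for row in reader:
        gene_id = row[3]
        expression[gene_id] = dict(zip(samples, map(float, row[4:])))
    return samples, expression


samples, expression = read_expression(io.StringIO(SAMPLE_BED))
print(samples)
print(expression["ENSG00000227232.5"])
```

The values are centered around zero rather than raw counts, which suggests the matrix has already been normalized, so it should be used as given for association testing rather than normalized again.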

  

expression_covariates: covariates, used to remove confounders.
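To illustrate what "removing a confounder" means, the sketch below regresses expression on a single covariate and keeps the residuals. This is a simplification under stated assumptions: real analyses fit all covariates jointly inside the association model, and the covariate and expression values here are invented.

```python
def residualize(values, covariate):
    """Remove the linear effect of one covariate; returns the residuals."""
    n = len(values)
    mean_c = sum(covariate) / n
    mean_v = sum(values) / n
    scc = sum((c - mean_c) ** 2 for c in covariate)
    scv = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values))
    beta = scv / scc
    return [v - mean_v - beta * (c - mean_c) for c, v in zip(covariate, values)]


# Hypothetical covariate (for example a batch or quality score) and expression.
batch = [0, 1, 2, 3, 4]
expr = [1.0, 3.1, 4.9, 7.2, 8.8]

residuals = residualize(expr, batch)
print([round(r, 3) for r in residuals])
```

The residuals have mean zero and no remaining linear relationship with the covariate, so whatever association with genotype is left cannot be explained by that covariate.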

 

 

 

References:

GTEx introduction.pdf - a must-read primer

The Genotype-Tissue Expression Project

 

posted @ 2020-09-23 17:07  Life·Intelligence