RegulomeDB and HaploReg Exercises

Exercise #1

rs2816316 has been associated with Celiac Disease in the European population by two studies (Hunt, ..., van Heel (2008) Nature Genetics and Dubois, ..., van Heel (2010) Nature Genetics). rs2816316 lies thousands of base pairs upstream of the protein-coding gene RGS1, in an intergenic region of the genome. You decide to further investigate this SNP using RegulomeDB and HaploReg.

1. What score does RegulomeDB assign to rs2816316? Is this SNP likely to affect transcription factor binding?
2. Using HaploReg, determine if there are any SNPs in high LD with rs2816316. Are any of these SNPs more likely to be causal?
3. Using RegulomeDB, determine the scores for each of the SNPs in LD with rs2816316 that you think may be causal. Is there a SNP that is likely to affect transcription factor binding? Which SNP(s) would you further investigate?

Exercise #2

You are
interested in studying genetic variants associated with Amyotrophic lateral sclerosis (ALS), which causes muscle atrophy due to the degeneration of motor neurons. Eleven studies have reported 6 SNPs associated with ALS. Since little is known about the disease, you decide to investigate these genetic variants.

1. Using HaploReg, determine if there are enrichments for enhancers in any ENCODE cell types for these ALS SNPs. Are there enrichments in DNase regions?
2. Perform the same analysis using Roadmap epigenomes. Are disease relevant tissue and cell types enriched?

Exercise #3

rs67494 has been associated with Nasopharyngeal carcinoma in the Chinese population (Bei, ..., Zeng (2010) Nature Genetics).

1. What score does RegulomeDB assign to rs67494? Is this SNP likely to affect transcription factor binding?
2. Using HaploReg, determine if there are any SNPs in high LD with rs67494. Are any of these SNPs more likely to be causal?
3. How would your results change if you used the default settings for HaploReg (i.e. European LD)?

SOLUTIONS

Exercise #1

rs2816316 has been associated with Celiac Disease in the European population by two studies (Hunt, ..., van Heel (2008) Nature Genetics and Dubois, ..., van Heel (2010) Nature Genetics). rs2816316 lies thousands of base pairs upstream of the protein-coding gene RGS1, in an intergenic region of the genome. You decide to further investigate this SNP using RegulomeDB and HaploReg.

1. What score does RegulomeDB assign to rs2816316? Is this SNP likely to
affect transcription factor binding?

RegulomeDB assigns rs2816316 a score of 5, which means that there is minimal binding evidence. This SNP is not likely to affect TF binding.

2. Using HaploReg, determine if there are any SNPs in high LD with rs2816316. Are any of these SNPs more likely to be causal?

There are 25 SNPs in LD (r² ≥ 0.8) with rs2816316. Three of them overlap TF binding sites: rs2816305, rs2984920 and rs7535818. These SNPs also overlap DHSs, promoter marks, and enhancer marks in several cell lines.

rs2984920 is a strong candidate, as it overlaps regulatory marks in the most cell lines. It also disrupts a PU.1 motif (log odds drop from 14.5 to 2.9) and overlaps a PU.1 binding site. It is also in the promoter of RGS1.

rs7535818 primarily overlaps POL2 binding sites, which suggests that it does not affect regulation but instead lies in an actively transcribed region. It is also in the promoter/first intron of RGS1.

rs2816305 also overlaps regulatory regions and TF binding sites. It overlaps some motifs, but not those corresponding to TFs with overlapping binding sites. However, it is important to remember that not all TFs are surveyed by ENCODE.

3. Using RegulomeDB, determine the scores for each of the SNPs in LD with rs2816316 that you think may be causal. Is there a SNP that is likely to affect transcription factor binding? Which SNP(s) would you further investigate?

rs2816305 = 1d
rs2984920 = 2a
rs7535818 = 3a

It would be worthwhile to further investigate both rs2984920 and rs2816305. rs2816305 has the lowest RegulomeDB score (that is, the strongest evidence) since it was reported to be an eQTL for RGS1. It does not overlap a motif corresponding to a bound TF but is in a regulatory region. rs2984920 lies in the promoter of RGS1 and overlaps motifs for several bound TFs including PU.1 and NFKB (discovered
by RegulomeDB). rs2984920 and rs2816305 are also in LD, so the eQTL signal from rs2816305 could be due to rs2984920. Both SNPs would be worth investigating further to determine the causal variant.

Exercise #2

You are interested in studying genetic variants associated with Amyotrophic lateral sclerosis (ALS), which causes muscle atrophy due to degeneration of motor neurons. Eleven studies have reported 6 SNPs associated with ALS. Since little is known about the disease, you decide to investigate these genetic variants.

1. Using HaploReg, determine if there are enrichments for enhancers in any ENCODE cell types for these ALS SNPs. Are there enrichments in DNase regions?

Enhancers: HepG2 (Strong Enhancers), HMEC (Strong Enhancers), GM12878 (All Enhancers and Strong Enhancers)
DNase: HFF-Myc, HA-sp, Th2, and GM18507

2. Perform the same analysis using Roadmap epigenomes. Are these disease relevant tissue and cell
types enriched?

Colon: All Enhancers
Penis Foreskin: Strong Enhancers
Brain Substantia Nigra: All Enhancers
Brain Inferior Temporal Lobe: All Enhancers
Brain Cingulate Gyrus: All Enhancers
Skeletal Muscle: Strong Enhancers

Exercise #3

rs67494 has been associated with Nasopharyngeal carcinoma in
the Chinese population (Bei, ..., Zeng (2010) Nature Genetics).

1. What score does RegulomeDB assign to rs67494? Is this SNP likely to affect transcription factor binding?

There is no data for rs67494 in RegulomeDB, so no score is assigned. This SNP is unlikely to affect TF binding.

2. Using HaploReg, determine if there are any SNPs in high LD with rs67494. Are any of these SNPs more likely to be causal?

There are ten SNPs in LD with rs67494. rs9869781 and rs132424 overlap enhancer marks, promoter marks, DNase peaks and TF binding sites and would be worth investigating further.

3. How would your results change if you used the default settings for HaploReg (i.e. European LD)?

If you use European LD, there are only two SNPs in LD with rs67494, and neither is as likely to affect TF binding as rs9869781 and rs13242.
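
Optional: ranking candidate SNPs by RegulomeDB score

The comparison made in the Exercise #1 solution (lower score = stronger regulatory evidence) can also be done in a few lines of Python once the scores have been looked up by hand. The sketch below is only an illustration: it does not query RegulomeDB, the function names score_key and rank_snps are made up for this example, and the category descriptions are a brief summary of the RegulomeDB scoring scheme that should be checked against the RegulomeDB help page.

```python
import re

# Broad meaning of each RegulomeDB category (assumed summary; verify against
# the RegulomeDB help page before relying on the wording).
CATEGORY_MEANING = {
    1: "likely to affect binding and linked to expression of a gene target",
    2: "likely to affect binding",
    3: "less likely to affect binding",
    4: "minimal binding evidence",
    5: "minimal binding evidence",
    6: "minimal binding evidence",
}


def score_key(score):
    """Turn a score such as '1d' or '5' into a sortable (category, subclass) tuple."""
    match = re.fullmatch(r"([1-6])([a-f]?)", score.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised RegulomeDB score: {score!r}")
    return int(match.group(1)), match.group(2)


def rank_snps(scores):
    """Return (rsid, score) pairs ordered from strongest to weakest evidence."""
    return sorted(scores.items(), key=lambda item: score_key(item[1]))


# Scores copied from the Exercise #1 solution.
exercise_1_scores = {
    "rs2816316": "5",
    "rs2816305": "1d",
    "rs2984920": "2a",
    "rs7535818": "3a",
}

for rsid, score in rank_snps(exercise_1_scores):
    category, _ = score_key(score)
    print(f"{rsid}\t{score}\t{CATEGORY_MEANING[category]}")
```

The ranking puts rs2816305 and rs2984920 first, which matches the two SNPs the solution recommends for follow-up. A SNP with no RegulomeDB data, such as rs67494 in Exercise #3, has no score to parse and should simply be left out of the dictionary.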