
Why remove long-LD regions before PCA or GRM estimation (Remove long-LD region)

2022-09-05

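Contents

1. Background
2. Why remove long-LD regions before PCA and similar analyses?
3. Start and end positions of long-LD regions
4. Removing SNPs in long-LD regions with PLINK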

1. Background:

LD: see GWASLab: Linkage disequilibrium (LD)

PCA: see GWASLab: Population stratification and principal component analysis tutorial (Population stratification & PCA)

2. Why remove long-LD regions during QC?

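The human genome contains a number of regions of long-range LD. They include the centromeric regions of the chromosomes as well as a few other regions such as HLA, as shown in the figure below:

[Figure: long-LD regions, hg19]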

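These regions are usually very long (more than 2 Mb), and a single round of LD pruning does not fully remove the SNPs that are in LD with one another. For analyses such as PCA, GRM estimation, or LMM-based GWAS, these regions are therefore usually removed.

Long-LD regions do not necessarily arise from natural selection; other factors such as inversion polymorphism can also produce them. In most cases the cause is simply unknown. If the model does not account for these long-LD regions, the overall results may be affected and come to depend too heavily on a few chromosomal segments, which introduces systematic bias.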

3. Start and end positions of long-LD regions (hg38, hg19, and hg18 reference genome builds)

hg38 version

Chr    Start      Stop
chr1   47761740   51761740
chr1   125169943  125170022
chr1   144106678  144106709
chr1   181955019  181955047
chr2   85919365   100517106
chr2   87416141   87416186
chr2   87417804   87417863
chr2   87418924   87418981
chr2   89917298   89917322
chr2   135275091  135275210
chr2   182427027  189427029
chr2   207609786  207609808
chr3   47483505   49987563
chr3   83368158   86868160
chr5   44464140   51168409
chr5   129636407  132636409
chr6   25391792   33424245
chr6   26726947   26726981
chr6   57788603   58453888
chr6   61109122   61357029
chr6   61424410   61424451
chr6   139637169  142137170
chr7   54964812   66897578
chr7   62182500   62277073
chr8   8105067    12105082
chr8   43025699   48924888
chr8   47303500   47317337
chr8   110918594  113918595
chr9   40365644   40365693
chr9   64198500   64200392
chr9   88958735   88959017
chr10  36671065   43184546
chr10  41693521   41885273
chr11  88127183   91127184
chr12  32955798   41319931
chr12  34639034   34639084
chr14  87391719   87391996
chr14  94658026   94658080
chr17  43159541   43159574
chr20  4031884    4032441
chr20  33948532   36438183
chr22  30060084   30060162
chr22  42980497   42980522
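The hg38 list has only three columns, whereas the hg19 and hg18 lists also carry an ID. The PLINK 1.9 documentation describes the input of --make-set as chromosome, start, end, and set ID, so a fourth column is probably needed before the hg38 list can be used in that way. A minimal sketch, assuming the list was saved together with its header line in a file called hg38_regions.txt (a placeholder name) and borrowing the hildN naming from the hg18 list:

```shell
#!/usr/bin/env bash
set -euo pipefail

# add_set_ids IN OUT: drop a "Chr Start Stop" header line if present and
# append a fourth column (hild1, hild2, ...) to serve as the set ID.
add_set_ids() {
    awk 'NR == 1 && $2 !~ /^[0-9]+$/ { next }   # header: Start is not numeric
         NF >= 3 { n++; print $1, $2, $3, "hild" n }' "$1" > "$2"
}

# Tiny demo on the first three hg38 rows (file names are placeholders).
workdir=$(mktemp -d)
printf '%s\n' 'Chr Start Stop' \
    'chr1 47761740 51761740' \
    'chr1 125169943 125170022' \
    'chr1 144106678 144106709' > "$workdir/hg38_regions.txt"
add_set_ids "$workdir/hg38_regions.txt" "$workdir/high-ld.txt"
cat "$workdir/high-ld.txt"
```

Whether the chr prefix in the first column is accepted depends on the PLINK version and on how chromosomes are coded in your own data, so it is worth checking that before relying on the resulting file.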

hg19 version

Chr  Start      Stop       ID
1    48000000   52000000   1
2    86000000   100500000  2
2    134500000  138000000  3
2    183000000  190000000  4
3    47500000   50000000   5
3    83500000   87000000   6
3    89000000   97500000   7
5    44500000   50500000   8
5    98000000   100500000  9
5    129000000  132000000  10
5    135500000  138500000  11
6    25000000   35000000   12
6    57000000   64000000   13
6    140000000  142500000  14
7    55000000   66000000   15
8    7000000    13000000   16
8    43000000   50000000   17
8    112000000  115000000  18
10   37000000   43000000   19
11   46000000   57000000   20
11   87500000   90500000   21
12   33000000   40000000   22
12   109500000  112000000  23
20   32000000   34500000   24

hg18 version

Chr  Start      Stop       ID
1    48060567   52060567   hild1
2    85941853   100407914  hild2
2    134382738  137882738  hild3
2    182882739  189882739  hild4
3    47500000   50000000   hild5
3    83500000   87000000   hild6
3    89000000   97500000   hild7
5    44500000   50500000   hild8
5    98000000   100500000  hild9
5    129000000  132000000  hild10
5    135500000  138500000  hild11
6    25500000   33500000   hild12
6    57000000   64000000   hild13
6    140000000  142500000  hild14
7    55193285   66193285   hild15
8    8000000    12000000   hild16
8    43000000   50000000   hild17
8    112000000  115000000  hild18
10   37000000   43000000   hild19
11   46000000   57000000   hild20
11   87500000   90500000   hild21
12   33000000   40000000   hild22
12   109521663  112021663  hild23
20   32000000   34500000   hild24
X    14150264   16650264   hild25
X    25650264   28650264   hild26
X    33150264   35650264   hild27
X    55133704   60500000   hild28
X    65133704   67633704   hild29
X    71633704   77580511   hild30
X    80080511   86080511   hild31
X    100580511  103080511  hild32
X    125602146  128102146  hild33
X    129102146  131602146  hild34
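Section 2 describes these regions as typically longer than 2 Mb, yet the hg38 list also contains some very short entries. Before using a region file, it can help to print the length of every row and see which entries are which. A minimal sketch, assuming a whitespace-separated file with chromosome, start, and stop in the first three columns (the function name and the temporary demo file are illustrative only):

```shell
#!/usr/bin/env bash
set -euo pipefail

# region_lengths FILE: print "chr start stop length_Mb" for every data row.
# Rows whose start or stop is not numeric (e.g. a header) are skipped.
region_lengths() {
    awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
             printf "%s %s %s %.2f\n", $1, $2, $3, ($3 - $2) / 1000000
         }' "$1"
}

# Demo on three rows copied from the hg38 list.
demo=$(mktemp)
printf '%s\n' 'Chr Start Stop' \
    'chr1 47761740 51761740' \
    'chr6 25391792 33424245' \
    'chr17 43159541 43159574' > "$demo"
region_lengths "$demo"
```

In this demo the first two rows come out at 4.00 and 8.03 Mb, while the chr17 row rounds to 0.00 Mb.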

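4. Removing SNPs in long-LD regions with PLINK:

PLINK can be used to remove the SNPs that lie in long-LD regions. There are two main steps:

  • Copy the list from the previous section into the file high-ld.txt (remember to remove the header before use), then use --make-set to extract the SNPs inside the regions
  • At analysis time, use --exclude to drop the SNPs listed in the previous step
  • Example code: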

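    # Step 1: write the SNPs that fall inside the high-LD ranges to hild.set
    plink --file mydata --make-set high-ld.txt --write-set --out hild
    # Step 2: exclude those SNPs and write the trimmed dataset
    plink --file mydata --exclude hild.set --recode --out mydatatrimmed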
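To cross-check which variants the first step should pick up, the same range test can be run directly on the variant positions. The following is a minimal sketch and not PLINK's own implementation: it assumes a .map or .bim file whose first four columns are chromosome, variant ID, genetic distance, and base-pair position, treats range ends as inclusive, and ignores a chr prefix when comparing chromosome codes. The function name and the variant IDs in the demo are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# snps_in_regions REGIONS MAP: print the IDs of variants in MAP whose
# position lies inside any range listed in REGIONS (chr, start, stop, ...).
snps_in_regions() {
    awk 'function norm(c) { sub(/^chr/, "", c); return c }
         NR == FNR {                          # first file: load the ranges
             if ($2 ~ /^[0-9]+$/) {           # skips a header line
                 n++; chr[n] = norm($1); lo[n] = $2 + 0; hi[n] = $3 + 0
             }
             next
         }
         {                                    # second file: test each variant
             c = norm($1); bp = $4 + 0
             for (i = 1; i <= n; i++)
                 if (c == chr[i] && bp >= lo[i] && bp <= hi[i]) { print $2; break }
         }' "$1" "$2"
}

# Demo with two hg19 ranges and five made-up variants.
tmp=$(mktemp -d)
printf '%s\n' '6 25000000 35000000 12' '8 7000000 13000000 16' > "$tmp/high-ld.txt"
printf '%s\n' '6 rs_a 0 24999999' '6 rs_b 0 25000000' '6 rs_c 0 31000000' \
    '8 rs_d 0 13000001' 'chr8 rs_e 0 9000000' > "$tmp/mydata.map"
snps_in_regions "$tmp/high-ld.txt" "$tmp/mydata.map"
```

The printed IDs can be compared against the variant list in hild.set; any difference would point to a mismatch in genome build or chromosome coding between the region file and the data.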

References:

Why remove long-LD regions before PCA or GRM estimation (Remove long-LD region)

https://genome.sph.umich.edu/wiki/Regions_of_high_linkage_disequilibrium_(LD)

Price et al. (2008) Long-Range LD Can Confound Genome Scans in Admixed Populations. Am. J. Hum. Genet. 83, 132-135

Updates:

2022-09-05: corrected errors in the description, updated the PCA link, and added the hg38 version