R 脚本实现 xx.hmp.txt格式数据转换为plink格式
001、
dir() dat <- fread("mdp_genotype_test.hmp.txt", header = F, data.table = F) ## 读入文件 genotype <- t(dat[,12:ncol(dat)])[,-1] genotype <- apply(genotype,2,function(x){gsub(pattern = "(^.)", replacement = "\\1 ", x)}) genotype <- apply(genotype,2,function(x){gsub(pattern = "N", replacement = "0", x)}) id <- t(dat[1,][,12:ncol(dat)]) ped <- data.frame(fid = id, id = id, pid = 0, mid = 0, gender = 0, phy = -9, genotype) map <- data.frame(dat$V3, dat$V1, 0, dat$V4)[-1,] write.table(ped, "result.ped", row.names = F, col.names = F, sep = "\t", quote = F) write.table(map, "result.map", row.names = F, col.names = F, sep = "\t", quote = F) dir()

002、plink验证
root@PC1:/home/test# ls result.map result.ped root@PC1:/home/test# plink --file result --recode PLINK v1.90b6.24 64-bit (6 Jun 2021) www.cog-genomics.org/plink/1.9/ (C) 2005-2021 Shaun Purcell, Christopher Chang GNU General Public License v3 Logging to plink.log. Options in effect: --file result --recode 15969 MB RAM detected; reserving 7984 MB for main workspace. .ped scan complete (for binary autoconversion). Warning: Variant 424 quadallelic; setting rarest alleles missing. Performing single-pass .bed write (3093 variants, 281 people). --file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam written. 3093 variants loaded from .bim file. 281 people (0 males, 0 females, 281 ambiguous) loaded from .fam. Ambiguous sex IDs written to plink.nosex . Using 1 thread (no multithreaded calculations invoked). Before main variant filters, 281 founders and 0 nonfounders present. Calculating allele frequencies... done. Total genotyping rate is 0.963828. 3093 variants and 281 people pass filters and QC. Note: No phenotypes present. --recode ped to plink.ped + plink.map ... done.

浙公网安备 33010602011771号