PGP: A PLATFORM FOR COMPREHENSIVE ANALYSIS OF GENOMIC DIVERSITY
DOI:
https://doi.org/10.32782/1998-6475.2023.55.28-33
Keywords:
computational pipeline, bioinformatics, genomes
Abstract
Population genomic projects are essential to the current drive to map the genomic diversity of human populations across the globe. Several barriers still hinder these efforts; among the major challenges limiting their discovery potential are the lack of bioinformatic expertise and of reproducible, standardized population-scale analyses. Scalable, automated, user-friendly pipelines can help researchers with minimal programming skills tackle these issues without extensive training. PopGenPlayground (PGP) is a streamlined, single-command computational pipeline for human population genomics analysis, built on the Snakemake workflow management system. Developed to automate the secondary analysis of a previously published national genome project, it leverages publicly available genomic databases for comparative analysis and annotation of variant calls. PGP is a robust, multi-platform population analysis pipeline that reduces the time and level of expertise needed to perform the core population analyses for a national genome project. It provides a comprehensive secondary analysis tool and can be run on a personal computer or on a remote high-performance computing platform.