PGP: A PLATFORM FOR COMPREHENSIVE ANALYSIS OF GENOMIC DIVERSITY

Authors

  • W. Wolfsberger, Oakland University
  • K. Shchubelka, Uzhhorod National University
  • O. T. Oleksyk, A. Novak Transcarpathian Regional Clinical Hospital
  • Ya. Hasynets, Uzhhorod National University
  • S. Patskun, Uzhhorod National University
  • M. Vakerych, Uzhhorod National University
  • R. Kish, Uzhhorod National University
  • V. Mirutenko, Uzhhorod National University
  • Vl. Mirutenko, Uzhhorod National University
  • C. A. Cotoraci, “Vasile Goldiș” Western University of Arad
  • C. Pop, “Vasile Goldiș” Western University of Arad
  • O. Neagu, “Vasile Goldiș” Western University of Arad
  • C. Baltă, “Vasile Goldiș” Western University of Arad
  • H. Herman, “Vasile Goldiș” Western University of Arad
  • P. Mare, “Vasile Goldiș” Western University of Arad
  • S. Dumitra, “Vasile Goldiș” Western University of Arad
  • H. Papiu, “Vasile Goldiș” Western University of Arad
  • A. Hermenean, “Vasile Goldiș” Western University of Arad
  • T. Oleksyk, Oakland University

DOI:

https://doi.org/10.32782/1998-6475.2023.55.28-33

Keywords:

computational pipeline, bioinformatics, genomes

Abstract

Population genomic projects are essential to the current drive to map the genomic diversity of human populations across the globe. Several barriers still hinder these efforts; among the major challenges limiting their discovery potential are a lack of bioinformatic expertise and the absence of reproducible, standardized population-scale analyses. Scalable, automated, user-friendly pipelines can help researchers with minimal programming skills tackle these issues without extensive training. PopGenPlayground (PGP) is a streamlined, single-command computational pipeline for human population genomics analysis, built on the Snakemake workflow management system. Developed to automate the secondary analysis of a previously published national genome project, it leverages publicly available genomic databases for comparative analysis and annotation of variant calls. PGP is a robust, multi-platform pipeline that reduces the time and expertise needed to perform the core population analyses of a national genome project. It provides a comprehensive secondary analysis tool and can be run on a personal computer or on a remote high-performance computing platform.
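To illustrate what a single-command launch of a Snakemake-based pipeline can look like, here is a minimal Python sketch that assembles a Snakemake invocation for either a laptop or a cluster node. It is not taken from the PGP repository: the helper name `build_snakemake_command`, the default file name `Snakefile`, the core counts, and the choice to pass `--use-conda` are all assumptions made for illustration. Only standard Snakemake command-line options are used.

```python
import shlex
from typing import List, Optional


def build_snakemake_command(
    snakefile: str = "Snakefile",      # assumed workflow file name
    cores: int = 4,                    # assumed default; raise on an HPC node
    configfile: Optional[str] = None,  # optional YAML/JSON config (hypothetical)
    dry_run: bool = False,
) -> List[str]:
    """Assemble a Snakemake command line as an argument list."""
    cmd = [
        "snakemake",
        "--snakefile", snakefile,
        "--cores", str(cores),
        "--use-conda",  # assumption: tools are provisioned via conda environments
    ]
    if configfile is not None:
        cmd += ["--configfile", configfile]
    if dry_run:
        # Show the planned jobs without executing them.
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(...) to execute it.
    print(shlex.join(build_snakemake_command(cores=8, dry_run=True)))
```

Building the command as a list rather than a single string keeps arguments correctly separated when it is handed to `subprocess.run`, and the dry-run option lets a user inspect the planned jobs before committing compute time.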

References

ALEXANDER, D.H., NOVEMBRE, J., LANGE, K. (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Research, 19(9), 1655–1664. DOI: 10.1101/GR.094052.109

ANACONDA SOFTWARE DISTRIBUTION (2020) In: Anaconda Documentation. Anaconda Inc. Available at: https://docs.anaconda.com/

BARTLETT, A., PENDERS, B., LEWIS, J. (2017) Bioinformatics: Indispensable, yet hidden in plain sight? BMC Bioinformatics, 18(1), 1–4. DOI: 10.1186/S12859-017-1730-9

CHANG, C.C., CHOW, C.C., TELLIER, L.C.A.M., VATTIKUTI, S., PURCELL, S.M., LEE, J.J. (2015) Second-generation PLINK: Rising to the challenge of larger and richer datasets. GigaScience, 4(1), 7. DOI: 10.1186/s13742-015-0047-8

DANECEK, P., BONFIELD, J.K., LIDDLE, J., MARSHALL, J., OHAN, V., POLLARD, M.O., WHITWHAM, A., KEANE, T., MCCARTHY, S.A., DAVIES, R.M. (2021) Twelve years of SAMtools and BCFtools. GigaScience, 10(2). DOI: 10.1093/GIGASCIENCE/GIAB008

DELANEAU, O., COULONGES, C., ZAGURY, J.F. (2008) Shape-IT: New rapid and accurate algorithm for haplotype inference. BMC Bioinformatics, 9, 1–14. DOI: 10.1186/1471-2105-9-540

FAIRLEY, S., LOWY-GALLEGO, E., PERRY, E., FLICEK, P. (2020) The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Research, 48(D1), D941–D947. DOI: 10.1093/NAR/GKZ836

KÖSTER, J., MÖLDER, F., JABLONSKI, K.P., LETCHER, B., HALL, M.B., TOMKINS-TINCH, C.H., SOCHAT, V., FORSTER, J., LEE, S., TWARDZIOK, S.O., KANITZ, A., WILM, A., HOLTGREWE, M., RAHMANN, S., NAHNSEN, S. (2021) Sustainable data analysis with Snakemake. F1000Research, 10, 33. DOI: 10.12688/f1000research.29032.2

LANDRUM, M.J., LEE, J.M., BENSON, M., BROWN, G., CHAO, C., CHITIPIRALLA, S., GU, B., HART, J., HOFFMAN, D., HOOVER, J., JANG, W., KATZ, K., OVETSKY, M., RILEY, G., SETHI, A., TULLY, R., VILLAMARIN-SALOMON, R., RUBINSTEIN, W., MAGLOTT, D. R. (2016) ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Research, 44(D1), D862–D868. DOI: 10.1093/nar/gkv1222

MARRAS, G., GASPA, G., SORBOLINI, S., DIMAURO, C., AJMONE-MARSAN, P., VALENTINI, A., WILLIAMS, J.L., MACCIOTTA, N.P.P. (2015) Analysis of runs of homozygosity and their relationship with inbreeding in five cattle breeds farmed in Italy. Animal Genetics, 46(2), 110–121. DOI: 10.1111/AGE.12259

MCGUIRE, A.L., GABRIEL, S., TISHKOFF, S.A., WONKAM, A., CHAKRAVARTI, A., FURLONG, E.E.M., TREUTLEIN, B., MEISSNER, A., CHANG, H., LÓPEZ-BIGAS, N., SEGAL, E., KIM, J.-S. (2020) The road ahead in genetics and genomics. Nature Reviews Genetics, 21(10), 581–596. DOI: 10.1038/s41576-020-0272-6

MCLAREN, W., GIL, L., HUNT, S.E., RIAT, H.S., RITCHIE, G.R.S., THORMANN, A., FLICEK, P., CUNNINGHAM, F. (2016) The Ensembl Variant Effect Predictor. Genome Biology, 17(122). DOI: 10.1186/S13059-016-0974-4

OLEKSYK, T.K., WOLFSBERGER, W.W., WEBER, A.M., SHCHUBELKA, K., OLEKSYK, O.T., LEVCHUK, O., PATRUS, A., LAZAR, N., CASTRO-MARQUEZ, S.O., HASYNETS, Y., BOLDYZHAR, P., NEYMET, M., URBANOVYCH, A., STAKHOVSKA, V., MALYAR, K., CHERVYAKOVA, S., PODOROHA, O., KOVALCHUK, N., RODRIGUEZ-FLORES, J.L., ZHOU, W., MEDLEY, S., BATTISTUZZI, F., LIU, R., HOU, Y., CHEN, S., YANG, H., YEAGER, M., DEAN, M., MILLS, R.E., SMOLANKA, V. (2021) Genome diversity in Ukraine. GigaScience, 10(1), 1–14. DOI: 10.1093/GIGASCIENCE/GIAA159

VAN ASSCHE, R., BROECKX, V., BOONEN, K., MAES, E., DE HAES, W., SCHOOFS, L., TEMMERMAN, L. (2015) Integrating -Omics: Systems Biology as Explored Through C. elegans Research. Journal of Molecular Biology, 427(21), 3441–3451. DOI: 10.1016/J.JMB.2015.03.015

WOLFSBERGER, W.W. (2023) PopGenPlayground (0.1). Available at: https://github.com/wwolfsberger/OU_popgen_playground

Published

2024-09-30