ROUA DATABASE
DOI:
https://doi.org/10.32782/1998-6475.2023.55.89-94
Keywords:
Whole Genome Sequencing, Carpathians, Ukraine, Romania, bioinformatics, genomes
Abstract
We present a multi-layered data resource containing the results of whole-genome sequencing of two human populations from the Carpathian Mountains region, specifically Ukraine’s Transcarpathia and Romania’s Satu Mare and Baia Mare provinces, areas previously underexplored in population genomics. The database contains raw and annotated whole-genome sequence files from 300 individuals from these regions, including annotations of common and unique genetic variants. Sampling followed a protocol designed to capture the genetic diversity of Ukrainians and Romanians, including minority groups such as Wallachians and Roma. The data are hosted on a dedicated web resource. We provide information on how to access the results of primary and secondary analyses of the data, including comparative analyses with previously published populations from Ukraine and with populations from the International Genome Sample Resource and the Human Genome Diversity Project. Free research access to this database contributes to a growing understanding of human genetic diversity in Central Europe. This effort emphasizes the potential for reuse of the generated data and advocates open access to support future research in genomics, bioinformatics, and personalized medicine.
References
ALEXANDER, D.H., NOVEMBRE, J., LANGE, K. (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Research, 19(9), 1655–1664. DOI: 10.1101/gr.094052.109
BERGSTRÖM, A., MCCARTHY, S.A., HUI, R., ALMARRI, M.A., AYUB, Q., DANECEK, P., CHEN, Y., FELKEL, S., HALLAST, P., KAMM, J., BLANCHÉ, H., DELEUZE, J.F., CANN, H., MALLICK, S., REICH, D., SANDHU, M.S., SKOGLUND, P., SCALLY, A., XUE, Y., DURBIN, R., TYLER-SMITH, C. (2020) Insights into human genetic variation and population history from 929 diverse genomes. Science, 367(6484), eaay5012. DOI: 10.1126/science.aay5012
CINGOLANI, P., PLATTS, A., WANG, L.L., COON, M., NGUYEN, T., WANG, L., LAND, S.J., LU, X., RUDEN, D.M. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly, 6(2), 80–92. DOI: 10.4161/FLY.19695
DEPRISTO, M.A., BANKS, E., POPLIN, R., GARIMELLA, K.V., MAGUIRE, J.R., HARTL, C., PHILIPPAKIS, A.A., DEL ANGEL, G., RIVAS, M.A., HANNA, M., MCKENNA, A., FENNELL, T.J., KERNYTSKY, A.M., SIVACHENKO, A.Y., CIBULSKIS, K., GABRIEL, S.B., ALTSHULER, D., DALY, M.J. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5), 491–498. DOI: 10.1038/ng.806
FAIRLEY, S., LOWY-GALLEGO, E., PERRY, E., FLICEK, P. (2020) The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Research, 48(D1), D941–D947. DOI: 10.1093/NAR/GKZ836
KÖSTER, J., MÖLDER, F., JABLONSKI, K.P., LETCHER, B., HALL, M.B., TOMKINS-TINCH, C.H., SOCHAT, V., FORSTER, J., LEE, S., TWARDZIOK, S.O., KANITZ, A., WILM, A., HOLTGREWE, M., RAHMANN, S., NAHNSEN, S. (2021) Sustainable data analysis with Snakemake. F1000Research, 10, 33. DOI: 10.12688/f1000research.29032.2
LANDRUM, M.J., LEE, J.M., BENSON, M., BROWN, G., CHAO, C., CHITIPIRALLA, S., GU, B., HART, J., HOFFMAN, D., HOOVER, J., JANG, W., KATZ, K., OVETSKY, M., RILEY, G., SETHI, A., TULLY, R., VILLAMARIN-SALOMON, R., RUBINSTEIN, W., MAGLOTT, D.R. (2016) ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Research, 44(D1), D862–D868. DOI: 10.1093/nar/gkv1222
LANDRUM, M.J., LEE, J.M., BENSON, M., BROWN, G.R., CHAO, C., CHITIPIRALLA, S., GU, B., HART, J., HOFFMAN, D., JANG, W., KARAPETYAN, K., KATZ, K., LIU, C., MADDIPATLA, Z., MALHEIRO, A., MCDANIEL, K., OVETSKY, M., RILEY, G., ZHOU, G., HOLMES, J.B., KATTMAN, B.L., MAGLOTT, D.R. (2018) ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Research, 46(D1), D1062–D1067. DOI: 10.1093/nar/gkx1153
OLEKSYK, T.K., WOLFSBERGER, W.W., SCHUBELKA, K., MANGUL, S., O’BRIEN, S.J. (2022) The Pioneer Advantage: Filling the blank spots on the map of genome diversity in Europe. GigaScience, 11, 1–7. DOI: 10.1093/GIGASCIENCE/GIAC081
OLEKSYK, T.K., WOLFSBERGER, W.W., WEBER, A.M., SHCHUBELKA, K., OLEKSYK, O.T., LEVCHUK, O., PATRUS, A., LAZAR, N., CASTRO-MARQUEZ, S.O., HASYNETS, Y., BOLDYZHAR, P., NEYMET, M., URBANOVYCH, A., STAKHOVSKA, V., MALYAR, K., CHERVYAKOVA, S., PODOROHA, O., KOVALCHUK, N., RODRIGUEZ-FLORES, J.L., ZHOU, W., MEDLEY, S., BATTISTUZZI, F., LIU, R., HOU, Y., CHEN, S., YANG, H., YEAGER, M., DEAN, M., MILLS, R.E., SMOLANKA, V. (2021) Genome diversity in Ukraine. GigaScience, 10(1), 1–14. DOI: 10.1093/GIGASCIENCE/GIAA159
PRICE, A.L., PATTERSON, N.J., PLENGE, R.M., WEINBLATT, M.E., SHADICK, N.A., REICH, D. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 38(8), 904–909. DOI: 10.1038/ng1847
SOLLIS, E., MOSAKU, A., ABID, A., BUNIELLO, A., CEREZO, M., GIL, L., GROZA, T., GÜNEŞ, O., HALL, P., HAYHURST, J., IBRAHIM, A., JI, Y., JOHN, S., LEWIS, E., MACARTHUR, J.A.L., MCMAHON, A., OSUMI-SUTHERLAND, D., PANOUTSOPOULOU, K., PENDLINGTON, Z., RAMACHANDRAN, S., STEFANCSIK, R., STEWART, J., WHETZEL, P., WILSON, R., HINDORFF, L., CUNNINGHAM, F., LAMBERT, S.A., INOUYE, M., PARKINSON, H., HARRIS, L.W. (2023) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Research, 51(D1), D977–D985. DOI: 10.1093/NAR/GKAC1010
WOLFSBERGER, W.W. (2023) PopGen Playground (0.1). Available from: https://github.com/wwolfsberger/OU_popgen_playground (accessed 10.11.2023).