Transcriptomics and the “curse of dimensionality”: Monte Carlo simulations of ml-models as a tool for analyzing multidimensional data in tasks of searching markers of biological processes
- Authors: Osmak G.J. (1, 2), Pisklova M.V. (1, 2)
- Affiliations:
  1. Chazov National Medical Research Center for Cardiology
  2. Pirogov Russian National Research Medical University
- Issue: Vol 59, No 1 (2025)
- Pages: 154-161
- Section: Bioinformatics
- URL: https://innoscience.ru/0026-8984/article/view/682236
- DOI: https://doi.org/10.31857/S0026898425010117
- EDN: https://elibrary.ru/HCCMTU
- ID: 682236
Abstract
High-throughput transcriptomic methods measure a vast number of features at once, which makes them valuable to researchers. At the same time, they raise the “curse of dimensionality” problem, which places growing demands on data processing and analysis methods. In this study, we propose a new algorithm that combines Monte Carlo methods with machine learning. The algorithm reduces the feature space by highlighting the genes most likely to be associated with the disease under investigation. The approach not only produces a set of “interesting” genes but also assigns each gene a weight that reflects its “importance”. This weight can be used in subsequent statistical analysis, visualization, and interpretation of results. The performance of the algorithm was demonstrated on open transcriptomic data from patients with hypertrophic cardiomyopathy (HCM) (GSE36961 and GSE1145). The analysis identified the genes MYH6, FCN3, RASD1, and SERPINA3, which is in good agreement with the available literature.
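The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: repeatedly draw random gene subsets, fit a simple classifier on each, and average the held-out accuracy over the draws in which a gene took part to obtain its weight. Everything here is an assumption made for illustration and not the authors' method: the function names, the nearest-centroid classifier, the subset size, the number of iterations, and the synthetic data (30 hypothetical genes, of which only gene 0 carries signal).

```python
import random


def centroid(rows, genes):
    """Mean expression of the selected genes over the given samples."""
    return [sum(r[g] for r in rows) / len(rows) for g in genes]


def nearest_centroid_accuracy(X, y, train, test, genes):
    """Fit a nearest-centroid classifier on `train` and score it on `test`."""
    cents = {}
    for label in (0, 1):
        rows = [X[i] for i in train if y[i] == label]
        cents[label] = centroid(rows, genes)
    hits = 0
    for i in test:
        # Squared Euclidean distance to each class centroid.
        dist = {
            label: sum((X[i][g] - c) ** 2 for g, c in zip(genes, cents[label]))
            for label in (0, 1)
        }
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(test)


def stratified_split(y, test_frac, rng):
    """Random train/test split that keeps both classes in each part."""
    train, test = [], []
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = max(1, int(len(idx) * test_frac))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def monte_carlo_gene_weights(X, y, n_iter=400, subset_size=5,
                             test_frac=0.3, seed=0):
    """Weight of a gene = mean held-out accuracy over the draws containing it."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    score_sum = [0.0] * n_genes
    n_draws = [0] * n_genes
    for _ in range(n_iter):
        genes = rng.sample(range(n_genes), subset_size)  # random gene subset
        train, test = stratified_split(y, test_frac, rng)
        acc = nearest_centroid_accuracy(X, y, train, test, genes)
        for g in genes:
            score_sum[g] += acc
            n_draws[g] += 1
    return [s / n if n else 0.0 for s, n in zip(score_sum, n_draws)]


# Synthetic demo (assumed data): 20 controls, 20 cases, 30 genes.
data_rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[data_rng.gauss(0, 1) for _ in range(30)] for _ in y]
for row, label in zip(X, y):
    if label == 1:
        row[0] += 4.0  # only gene 0 differs between the groups

weights = monte_carlo_gene_weights(X, y)
top_gene = max(range(len(weights)), key=weights.__getitem__)
print(top_gene, round(weights[top_gene], 2))
```

In this toy setting the per-gene weights could then feed into the kind of downstream step the abstract mentions, for example ranking genes or weighting a multiple-testing procedure. On real expression matrices such as GSE36961, a library classifier and proper cross-validation would likely replace the hand-written pieces.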
About the authors
G. J. Osmak
Chazov National Medical Research Center for Cardiology; Pirogov Russian National Research Medical University
Author for correspondence.
Email: german.osmak@gmail.com
Russian Federation, Moscow; Moscow
M. V. Pisklova
Chazov National Medical Research Center for Cardiology; Pirogov Russian National Research Medical University
Email: german.osmak@gmail.com
Russian Federation, Moscow; Moscow