iProm-Yeast: Prediction Tool for Yeast Promoters Based on ML Stacking
- Authors: Shujaat M. (1), Yoo S. (2), Tayara H. (3), Chong K.T. (1)
- Affiliations:
  - (1) Department of Electronics and Information Engineering, Jeonbuk National University
  - (2) Department of Electricity Engineering, Vision College of Jeonju
  - (3) School of International Engineering and Science, Jeonbuk National University
- Issue: Vol. 19, No. 2 (2024)
- Pages: 162-173
- Section: Life Sciences
- URL: https://innoscience.ru/1574-8936/article/view/643799
- DOI: https://doi.org/10.2174/0115748936256869231019113616
- ID: 643799
Abstract
Background and Objective: Gene promoters are DNA regulatory elements located near transcription start sites, and they play a crucial role in regulating gene transcription. Despite numerous approaches to promoter prediction, including alignment-, signal-, and content-based methods, accurately identifying promoters remains challenging because their sequences lack explicit features. Consequently, many machine learning and deep learning models for promoter identification have been presented, but their predictive performance remains limited. Most recent investigations have concentrated on identifying sigma or plant promoters, while the accurate identification of Saccharomyces cerevisiae promoters remains underexplored. In this study, we introduce iProm-Yeast, a method for identifying yeast promoters. Using genome sequences from the eukaryotic yeast Saccharomyces cerevisiae, we investigate vector encoding and promoter classification. Additionally, we developed a more difficult negative set by deriving it from promoter sequences rather than from non-promoter regions of the genome. This negative reconstruction approach improves classification and reduces the number of false positive predictions.
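The abstract does not spell out how the negative set is derived from promoter sequences. As a minimal sketch, assuming a segment-substitution scheme (split each promoter into segments, keep some unchanged, and overwrite the rest with random nucleotides), the idea could look like the following. The function name `make_hard_negative` and the defaults of 20 segments with 12 replaced are illustrative assumptions, not values taken from this study.

```python
import random


def make_hard_negative(promoter, n_segments=20, n_replace=12, rng=None):
    """Build a negative sample from a promoter (hypothetical helper).

    The promoter is cut into n_segments nearly equal pieces; n_replace of
    them are overwritten with random nucleotides and the rest are kept, so
    the negative still resembles a promoter in parts of its sequence.
    """
    rng = rng or random.Random()
    length = len(promoter)
    if not 0 <= n_replace <= n_segments <= length:
        raise ValueError("need 0 <= n_replace <= n_segments <= len(promoter)")

    # Segment boundaries computed with integer arithmetic.
    bounds = [i * length // n_segments for i in range(n_segments + 1)]
    segments = [promoter[bounds[i]:bounds[i + 1]] for i in range(n_segments)]

    # Choose which segments to overwrite.
    replace_idx = set(rng.sample(range(n_segments), n_replace))

    pieces = []
    for i, segment in enumerate(segments):
        if i in replace_idx:
            segment = "".join(rng.choice("ACGT") for _ in segment)
        pieces.append(segment)
    return "".join(pieces)


if __name__ == "__main__":
    example = "ACGT" * 20
    print(make_hard_negative(example, rng=random.Random(0)))
```

Negatives built this way keep the length and part of the content of a real promoter, which is what makes them harder to separate from true promoters than arbitrary non-promoter regions.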
Methods: To overcome the problems associated with promoter prediction, we investigate alternative vector encoding and feature extraction methods. These are then combined with several machine learning algorithms and a 1-D convolutional neural network model. Our results show that pseudo-dinucleotide composition is the preferable feature encoding and that a machine learning stacking approach performs well for accurate promoter classification. Furthermore, we provide a negative reconstruction method that uses promoter sequences rather than non-promoter regions, resulting in higher classification performance and fewer false positive predictions.
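To make the feature encoding concrete, the sketch below computes a pseudo-dinucleotide composition vector in its commonly used form: 16 normalized dinucleotide frequencies followed by `lam` sequence-order correlation terms. The abstract does not give the parameters or the physicochemical property table used, so `lam`, `weight`, and `TOY_PROPERTIES` are placeholders introduced here for illustration only.

```python
from itertools import product

# The 16 dinucleotides in a fixed order: AA, AC, AG, ..., TT.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

# PLACEHOLDER property table: two made-up values per dinucleotide.
# A real encoder would use standardized physicochemical properties instead.
TOY_PROPERTIES = {
    d: [i / 15.0, (i % 4) / 3.0] for i, d in enumerate(DINUCLEOTIDES)
}


def pse_dnc(seq, properties, lam=2, weight=0.5):
    """Return a (16 + lam)-dimensional pseudo-dinucleotide composition vector."""
    seq = seq.upper()
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not dinucs or any(d not in properties for d in dinucs):
        raise ValueError("sequence needs length >= 2 and only known dinucleotides")
    if not 1 <= lam < len(dinucs):
        raise ValueError("lam must be >= 1 and smaller than the dinucleotide count")

    # Local composition: frequency of each of the 16 dinucleotides.
    freqs = [dinucs.count(d) / len(dinucs) for d in DINUCLEOTIDES]

    # Sequence-order terms: mean squared property difference between
    # dinucleotides that are `lag` positions apart.
    thetas = []
    for lag in range(1, lam + 1):
        pairs = list(zip(dinucs, dinucs[lag:]))
        total = 0.0
        for a, b in pairs:
            pa, pb = properties[a], properties[b]
            total += sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)
        thetas.append(total / len(pairs))

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    vector = pse_dnc("ACGTACGTAGCTAGCT", TOY_PROPERTIES, lam=3)
    print(len(vector), round(sum(vector), 6))
```

Vectors of this kind can be fed to any of the base classifiers in a stacking ensemble; the choice of base learners and meta-learner is not specified in the abstract.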
Results: Based on 5-fold cross-validation, the proposed predictor, iProm-Yeast, shows good potential for detecting Saccharomyces cerevisiae promoters. It achieved an accuracy (Acc) of 86.27%, a sensitivity (Sn) of 82.29%, a specificity (Sp) of 89.47%, a Matthews correlation coefficient (MCC) of 0.72, and an area under the receiver operating characteristic curve (AUROC) of 0.98. We also performed a cross-species analysis to assess how well iProm-Yeast generalizes to other species.
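For readers unfamiliar with these metrics, the sketch below shows how Acc, Sn, Sp, and MCC follow from the counts of a binary confusion matrix using their standard definitions. The function name and the example counts are illustrative and are not data from this study; AUROC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute Acc, Sn, Sp and MCC from confusion-matrix counts.

    tp/fn: promoters predicted as promoter / non-promoter
    tn/fp: non-promoters predicted as non-promoter / promoter
    """
    total = tp + tn + fp + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative sample")

    acc = (tp + tn) / total   # fraction of all samples classified correctly
    sn = tp / (tp + fn)       # sensitivity: recall on promoters
    sp = tn / (tn + fp)       # specificity: recall on non-promoters

    # MCC is defined as 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts purely to exercise the function.
    print(classification_metrics(tp=8, tn=9, fp=1, fn=2))
```

In a 5-fold cross-validation setting, these values would typically be computed on each held-out fold and then averaged.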
Conclusion: iProm-Yeast is a robust method for accurately identifying Saccharomyces cerevisiae promoters. By combining advanced vector encoding techniques with a negative reconstruction approach, it improves classification accuracy and reduces false positive predictions. In addition, it offers researchers a reliable and precise webserver for studying gene regulation in diverse organisms.
About the authors

Muhammad Shujaat
Department of Electronics and Information Engineering, Jeonbuk National University
Email: info@benthamscience.net

Sunggoo Yoo
Department of Electricity Engineering, Vision College of Jeonju
Email: info@benthamscience.net

Hilal Tayara
School of International Engineering and Science, Jeonbuk National University
Corresponding author.
Email: info@benthamscience.net

Kil Chong
Department of Electronics and Information Engineering, Jeonbuk National University
Corresponding author.
Email: info@benthamscience.net