Identification of potential biomarkers for esophageal squamous cell carcinoma using unsupervised machine learning


Abstract

Esophageal squamous cell carcinoma (ESCC) is known for its high prevalence and aggressiveness. It is often diagnosed at advanced stages because it lacks specific symptoms, highlighting the urgent need for new diagnostic and therapeutic approaches. Reliable biomarkers are pivotal for accurate diagnosis, prognosis, and the development of treatments tailored to individual patient profiles. This study draws on diverse datasets, including microarray, RNA sequencing (RNA-seq), and single-cell RNA sequencing (scRNA-seq), to explore the molecular landscape of ESCC. Because missing data is a persistent challenge in large-scale biological datasets, the study introduces a novel ensemble algorithm for missing-data imputation. The algorithm integrates four techniques: k-nearest neighbor, local least squares, k-means clustering, and missForest. Comparative analyses across eight distinct datasets demonstrate the superior performance and robustness of the proposed imputation method and its ability to enhance data completeness and reliability. The research then turns to biomarker discovery, using various biclustering algorithms to identify groups of genes with coherent expression patterns. In addition, EnsemBic, an ensemble biclustering algorithm, is introduced to improve the reliability and comprehensiveness of biomarker identification. Topological and biological analyses of elite genes within the identified biclusters help pinpoint potential biomarkers linked to ESCC and provide insights into the underlying molecular mechanisms of the disease. Subsequently, community detection algorithms are applied to reveal latent structures within the datasets, uncovering hidden biological communities. Two novel community detection algorithms are developed and evaluated, and the results highlight their efficacy in identifying potential biomarkers.
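
The abstract does not say how the four imputation techniques are combined, so the following is only an illustrative sketch of the general idea of ensemble imputation, not the author's algorithm. It assumes the ensemble is a plain average of the individual estimates and, for brevity, includes only two of the four base methods (k-nearest neighbor and k-means clustering); local least squares and missForest are omitted. All function names and parameters (`knn_impute`, `kmeans_impute`, `ensemble_impute`, `k`, `n_clusters`) are hypothetical, and missing values are represented as `None`.

```python
import math
import random

def column_means(X):
    """Mean of the observed (non-None) values in each column; 0.0 if none observed."""
    means = []
    for col in zip(*X):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def knn_impute(X, k=2):
    """Fill each missing entry with the mean of that column over the k nearest rows.

    Distance is the root mean squared difference over columns observed in both rows.
    Falls back to the column mean when no usable neighbor exists.
    """
    means = column_means(X)
    out = [row[:] for row in X]
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for r, other in enumerate(X):
                if r == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [val for _, val in candidates[:k]]
            out[i][j] = sum(nearest) / len(nearest) if nearest else means[j]
    return out

def kmeans_impute(X, n_clusters=2, n_iter=10, seed=0):
    """Fill missing entries with the matching coordinate of the row's cluster centroid.

    Starts from a column-mean fill, then alternates k-means updates with
    refreshing the originally missing entries.
    """
    rng = random.Random(seed)
    means = column_means(X)
    filled = [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]
    centroids = [filled[i][:] for i in rng.sample(range(len(X)), n_clusters)]
    for _ in range(n_iter):
        # Assign every row to its closest centroid (squared Euclidean distance).
        labels = [
            min(range(n_clusters),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
            for row in filled
        ]
        # Recompute centroids; an empty cluster keeps its previous centroid.
        for c in range(n_clusters):
            members = [filled[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Only entries that were missing in the input are overwritten.
        for i, row in enumerate(X):
            for j, v in enumerate(row):
                if v is None:
                    filled[i][j] = centroids[labels[i]][j]
    return filled

def ensemble_impute(X, imputers):
    """Average the completed matrices produced by each base imputer (assumed rule)."""
    results = [impute(X) for impute in imputers]
    n_rows, n_cols = len(X), len(X[0])
    return [[sum(r[i][j] for r in results) / len(results) for j in range(n_cols)]
            for i in range(n_rows)]

# Toy expression matrix: rows are samples, columns are genes, None marks a gap.
X = [
    [1.0, 2.0, None],
    [1.1, None, 3.0],
    [5.0, 6.0, 7.0],
    [5.2, 6.1, None],
]
completed = ensemble_impute(X, [knn_impute, kmeans_impute])
```

Because every base imputer leaves observed values untouched, averaging changes only the entries that were missing. A weighted average, with weights chosen by masking known values and scoring each method's reconstruction error, would be a natural alternative to the unweighted mean assumed here.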
