Optimized Ensemble Stacking Framework for NLinked Glycosylation Site Prediction in Human Proteins Using Cross Validation Weighted Learning and Feature Augmentation

Files

01_title.pdf (198.44 KB)

02_prelim_pages.pdf (411.55 KB)

03_contents.pdf (383.43 KB)

04_abstract.pdf (366.32 KB)

05_chapter 1.pdf (1.37 MB)

Abstract

N-linked glycosylation is a common and functionally important post-translational modification of human proteins that determines folding, stability, trafficking, and cellular signaling. Identifying exactly where on the protein chain this modification occurs, usually at asparagine (N) residues within a limited sequence motif, is crucial for understanding protein function, designing drugs, and interpreting high-throughput proteomics data. Experimental identification of glycosylation sites produces reliable results but is expensive and time-consuming. Computational prediction offers a rapid, cost-effective complement, yet existing predictive methods suffer from class imbalance (far fewer glycosylated sites than non-glycosylated ones), under-exploitation of heterogeneous features, and sensitivity to noise.

This thesis presents an Optimized Ensemble Stacking Framework for the Prediction of N-Linked Glycosylation Sites in Human Proteins that overcomes these limitations by combining robust model design with careful validation, weighted learning, and rich feature augmentation to produce high-performance, generalizable predictions. To obtain optimized results with machine learning classifiers, various datasets were explored, and three of them, namely UniProtKB, dbPTM, and N-GlycositeAtlas, were selected and combined to form the final dataset for the model.

The proposed framework is organized in layers as a stacked ensemble: multiple diverse base learners extract complementary patterns from protein sequences and derived features, and a powerful meta-learner synthesizes their outputs into a final prediction. The base learners include algorithms with different inductive biases, for instance Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), chosen to capture both nonlinear interactions and linear trends. The meta-learner is implemented using XGBoost because it can handle heterogeneous inputs, has resistance to overfitting through
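
The abstract refers to a limited sequence motif around asparagine and to class imbalance between glycosylated and non-glycosylated sites. The sketch below illustrates both ideas under stated assumptions; it is not taken from the thesis. It assumes the motif is the canonical N-X-S/T sequon (X being any residue except proline) and uses simple inverse-frequency class weights as one common form of weighted learning. The function names, the `flank` window size, and the `X` padding symbol are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Assumption: the "limited sequence motif" is the canonical sequon
# Asn, any residue except Pro, then Ser or Thr.
# The lookahead consumes only the N, so overlapping sequons are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def candidate_sites(sequence, flank=3, pad="X"):
    """Return (1-based position, window) for every sequon asparagine.

    `flank` and `pad` are illustrative choices, not values from the thesis.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    sites = []
    for match in SEQUON.finditer(seq):
        i = match.start()                        # 0-based index of the N
        window = padded[i : i + 2 * flank + 1]   # N sits at the window centre
        sites.append((i + 1, window))
    return sites


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


if __name__ == "__main__":
    # Toy sequence, made up for this example.
    for position, window in candidate_sites("MKNGSAANPTLLNATQ"):
        print(position, window)
    # One positive among four samples: the rare class gets the larger weight.
    print(balanced_class_weights([1, 0, 0, 0]))
```

Not every sequon asparagine is glycosylated in practice, so windows like these would serve only as candidate inputs whose labels come from curated annotations; the weights could then be passed to any classifier that accepts per-class weights.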

URI

http://hdl.handle.net/10603/685410

Collections

Faculty of Computer Science & Applications
