Optimized Ensemble Stacking Framework for N-Linked Glycosylation Site Prediction in Human Proteins Using Cross Validation Weighted Learning and Feature Augmentation
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.guide | Jaimin Undavia | |
| dc.coverage.spatial | ||
| dc.creator.researcher | MUBINA SHAQIL MALIK | |
| dc.date.accessioned | 2026-01-02T04:44:27Z | |
| dc.date.available | 2026-01-02T04:44:27Z | |
| dc.date.awarded | 2026 | |
| dc.date.completed | 2025 | |
| dc.date.registered | 2019 | |
| dc.description.abstract | N-linked glycosylation is a common and functionally important post-translational modification of human proteins that influences their folding, stability, trafficking, and cellular signaling. Identifying exactly where on the protein chain this modification occurs, usually at asparagine (N) residues within the short N-X-S/T sequence motif (where X is any amino acid except proline), is crucial for understanding protein function, for drug design, and for interpreting high-throughput proteomics data. Experimental identification of glycosylation sites produces reliable results but is expensive and time-consuming. Computational prediction offers a rapid, cost-effective complement, yet existing predictive methods suffer from class imbalance (far fewer glycosylated sites than non-glycosylated ones), under-exploitation of heterogeneous features, and sensitivity to noise. This thesis presents an Optimized Ensemble Stacking Framework for the Prediction of N-Linked Glycosylation Sites in Human Proteins that overcomes these limitations by combining robust model design with careful validation, weighted learning, and rich feature augmentation to deliver high-performance, generalizable predictions. To obtain optimized results with machine learning classifiers, various datasets were explored; three of them, UniProtKB, dbPTM and N-GlycositeAtlas, were selected and combined to form the final dataset for the model. The proposed framework is a stacked ensemble: multiple diverse base learners extract complementary patterns from protein sequences and derived features, and a meta-learner synthesizes their outputs into a final prediction. The base learners are algorithms with different inductive biases, namely support vector machines (SVM), logistic regression (LR), random forest (RF) and Extreme Gradient Boosting (XGBoost), chosen to capture both nonlinear interactions and linear trends. The meta-learner is implemented using XGBoost because it can handle heterogeneous inputs and resists overfitting through | |
| dc.description.note | ||
| dc.format.accompanyingmaterial | DVD | |
| dc.format.dimensions | ||
| dc.format.extent | ||
| dc.identifier.researcherid | 0000-0000-0000-0000 | |
| dc.identifier.uri | http://hdl.handle.net/10603/685410 | |
| dc.language | English | |
| dc.publisher.institution | Faculty of Computer Science and Applications | |
| dc.publisher.place | Anand | |
| dc.publisher.university | Charotar University of Science and Technology | |
| dc.relation | ||
| dc.rights | university | |
| dc.source.university | University | |
| dc.subject.keyword | Computer Science | |
| dc.subject.keyword | Computer Science Information Systems | |
| dc.subject.keyword | Engineering and Technology | |
| dc.title | Optimized Ensemble Stacking Framework for N-Linked Glycosylation Site Prediction in Human Proteins Using Cross Validation Weighted Learning and Feature Augmentation | |
| dc.title.alternative | ||
| dc.type.degree | Ph.D. |
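The abstract notes that N-linked sites usually lie at asparagine residues in a short sequence motif. A common first step in site predictors is to enumerate every asparagine in an N-X-S/T sequon (X not proline) and cut a fixed-length window around it for feature extraction. The minimal sketch below assumes that approach; the window length, the `X` padding character, and the names `SEQUON` and `candidate_windows` are illustrative choices, since this record does not state the thesis's actual windowing or encoding.

```python
import re

# N-X-S/T sequon: asparagine, then any residue except proline, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") both match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def candidate_windows(seq, flank=7, pad="X"):
    """Return (0-based position, window) for each sequon asparagine in seq.

    Hypothetical helper: each window is centred on the N and padded with `pad`
    at the termini so that every window has length 2 * flank + 1.
    """
    padded = pad * flank + seq + pad * flank
    # seq[i] sits at padded[i + flank], so the centred window starts at padded[i].
    return [
        (m.start(), padded[m.start() : m.start() + 2 * flank + 1])
        for m in SEQUON.finditer(seq)
    ]


print(candidate_windows("MKNGSAPNPTQNVTL", flank=3))
# -> [(2, 'XMKNGSA'), (11, 'PTQNVTL')]  (the N at position 7 is skipped: N-P-T)
```

Each window would then be labelled positive or negative from experimentally confirmed sites and encoded into features for the classifiers.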
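The abstract describes a two-level stacked ensemble trained with cross-validation and weighted learning. The standard-library sketch below illustrates only the general stacking mechanics, under stated assumptions: out-of-fold scores from each base learner become the meta-learner's inputs, so the meta-learner never sees a score from a model that trained on the same row. To stay self-contained, two hypothetical toy learners (nearest-centroid and k-nearest-neighbour scorers) stand in for the SVM, LR, RF and XGBoost base learners, and a class-weighted logistic regression stands in for the XGBoost meta-learner. Inverse-frequency class weights (the formula scikit-learn uses for `class_weight="balanced"`) are one common remedy for class imbalance and are an assumption here, not necessarily the thesis's weighting scheme. All function names are illustrative.

```python
import math
import random
from collections import Counter
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Hypothetical toy base learners (stand-ins for SVM / LR / RF / XGBoost).
# Each takes training data and returns a scoring function x -> score in [0, 1].
def centroid_learner(Xtr, ytr):
    cp = [mean(col) for col in zip(*[x for x, t in zip(Xtr, ytr) if t == 1])]
    cn = [mean(col) for col in zip(*[x for x, t in zip(Xtr, ytr) if t == 0])]

    def predict(x):
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return dn / (dp + dn) if dp + dn else 0.5

    return predict


def knn_learner(Xtr, ytr, k=3):
    def predict(x):
        d = sorted((sum((a - b) ** 2 for a, b in zip(x, xt)), t) for xt, t in zip(Xtr, ytr))
        return mean(t for _, t in d[:k])  # fraction of positives among k nearest

    return predict


def out_of_fold_features(X, y, learners, k=5, seed=0):
    """Row i, column j: learner j's score for X[i] from a model trained without X[i]'s fold."""
    meta = [[0.0] * len(learners) for _ in X]
    for fold in kfold_indices(len(X), k, seed):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        for j, fit in enumerate(learners):
            predict = fit([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                meta[i][j] = predict(X[i])
    return meta


def weighted_logistic(Z, y, weights, lr=0.5, epochs=500):
    """Stand-in meta-learner: logistic regression by batch gradient descent,
    with each sample's loss scaled by weights[label]."""
    w, b = [0.0] * len(Z[0]), 0.0

    def prob(z):
        return 1 / (1 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))

    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            g = weights[t] * (prob(z) - t)
            gw = [a + g * zi for a, zi in zip(gw, z)]
            gb += g
        w = [wi - lr * gi / len(Z) for wi, gi in zip(w, gw)]
        b -= lr * gb / len(Z)
    return prob


def fit_stack(X, y, learners, k=5, seed=0):
    Z = out_of_fold_features(X, y, learners, k, seed)
    meta = weighted_logistic(Z, y, balanced_class_weights(y))
    base = [fit(X, y) for fit in learners]  # refit base learners on all data
    return lambda x: meta([f(x) for f in base])


# Toy imbalanced data: 6 positives, 18 negatives, two features each.
pos = [(1.0 + 0.1 * i, 1.0 - 0.05 * i) for i in range(6)]
neg = [(0.05 * i, 0.1 * (i % 5)) for i in range(18)]
X, y = pos + neg, [1] * 6 + [0] * 18
model = fit_stack(X, y, [centroid_learner, knn_learner], k=3)
print(model((1.2, 0.9)), model((0.1, 0.1)))
```

In a real pipeline the toy learners would be replaced by actual classifiers (for example scikit-learn's `SVC`, `LogisticRegression` and `RandomForestClassifier`, and `xgboost.XGBClassifier`), whose probability scores fit the same out-of-fold scheme.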
Files

Original bundle (13 files)
- 01_title.pdf (198.44 KB, Adobe Portable Document Format): Attached File
- 02_prelim_pages.pdf (411.55 KB, Adobe Portable Document Format)

License bundle (1 file)