Intelligent Scheduling Strategies and Resource Optimization for Big Data Processing

Abstract

The exponential growth of data in modern computing systems has driven the need for efficient big data processing frameworks. However, challenges remain in optimising big data processing performance, energy consumption, and operational costs, particularly in heterogeneous and cloud environments with dynamic workloads. Despite their scalability, big data processing frameworks such as Hadoop and Apache Spark often suffer from inefficiencies, including suboptimal resource utilisation, increased latency, and higher energy consumption. This thesis explored the main challenges in big data processing, focusing on enhancing performance, optimising energy consumption, and reducing operational costs in cloud and heterogeneous computing environments, and introduced adaptive data placement and intelligent scheduling strategies to address these issues. A key contribution of this thesis is the Adaptive Node-Aware Data Placement method, which optimises Hadoop data distribution by dynamically adjusting allocations based on real-time node throughput, thereby minimising delays caused by straggler nodes and improving data locality. Experiments confirmed the effectiveness of the proposed placement method over Hadoop's default placement strategy.

Building on the data placement optimisations in Hadoop, the subsequent phase of this thesis focused on enhancing job scheduling efficiency in Apache Spark. We evaluated various Apache Spark scheduling strategies, with Multilevel Feedback Queue scheduling showing strong performance for latency-sensitive tasks due to its dynamic priority adjustments. Next, for cloud environments with dynamic workloads, we initially developed a Deep Reinforcement Learning based scheduler using Proximal Policy Optimisation, which adapts to changing resource demands while maintaining high job throughput. Building on this, we introduced a more advanced approach, based on Distributional Deep Reinforcement Learning, to optimise resource allocation and account for workload prediction uncertainties.
