Intelligent Scheduling Strategies and Resource Optimization for Big Data Processing
Abstract
The exponential growth of data in modern computing systems has driven the need for efficient Big Data processing frameworks. However, challenges remain in optimising Big Data processing performance, energy consumption, and operational costs, particularly in heterogeneous and cloud environments with dynamic workloads. Despite their scalability, Big Data processing frameworks such as Hadoop and Apache Spark often suffer from inefficiencies, including suboptimal resource utilisation, increased latency, and higher energy consumption. This thesis explored the main challenges in Big Data processing, focusing on enhancing performance, optimising energy consumption, and reducing operational costs in cloud and heterogeneous computing environments. To address these issues, it introduced adaptive data placement and intelligent scheduling strategies. A key contribution of this thesis is the Adaptive Node-Aware Data Placement method, which optimises Hadoop data distribution by dynamically adjusting allocations based on real-time node throughput, thereby minimising delays caused by straggler nodes and improving data locality. Experiments confirmed the effectiveness of the proposed placement method over Hadoop's default placement strategy.
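The throughput-proportional idea behind node-aware placement can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the node names, throughput figures, and remainder-distribution rule are all hypothetical assumptions.

```python
# Illustrative sketch: assign HDFS-style blocks to nodes in proportion to each
# node's measured throughput, so slow (straggler) nodes receive fewer blocks.
# Node names and throughput figures below are hypothetical.

def place_blocks(node_throughput, num_blocks):
    """Return a mapping node -> number of blocks, proportional to throughput."""
    total = sum(node_throughput.values())
    # Initial proportional share, rounded down.
    alloc = {n: int(num_blocks * t / total) for n, t in node_throughput.items()}
    # Hand any remainder to the fastest nodes first (an arbitrary tie-break).
    remainder = num_blocks - sum(alloc.values())
    for n in sorted(node_throughput, key=node_throughput.get, reverse=True):
        if remainder == 0:
            break
        alloc[n] += 1
        remainder -= 1
    return alloc

# Example: a heterogeneous 3-node cluster (throughput in MB/s, hypothetical).
nodes = {"node-a": 120.0, "node-b": 60.0, "node-c": 20.0}
allocation = place_blocks(nodes, 100)
print(allocation)  # → {'node-a': 60, 'node-b': 30, 'node-c': 10}
```

In a real deployment the throughput estimates would come from runtime monitoring and the allocation would also respect HDFS replication constraints, which this sketch omits.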
Building on the data placement optimisations in Hadoop, the subsequent phase of this thesis focused on enhancing job scheduling efficiency in Apache Spark. We evaluated several Apache Spark scheduling strategies, with Multilevel Feedback Queue scheduling showing strong performance for latency-sensitive tasks due to its dynamic priority adjustments.
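The Multilevel Feedback Queue behaviour described above can be illustrated with a minimal, framework-independent sketch; the queue count, time quanta, and task workloads here are illustrative assumptions, not Spark's actual scheduler internals:

```python
from collections import deque

# Minimal MLFQ sketch: tasks start in the highest-priority queue; a task that
# exhausts its time quantum is demoted one level, so short latency-sensitive
# tasks finish quickly while long tasks drift to lower-priority queues.

QUANTA = [2, 4, 8]  # time slice per priority level (arbitrary units)

def mlfq_run(tasks):
    """tasks: list of (name, total_work). Returns names in completion order."""
    queues = [deque() for _ in QUANTA]
    for name, work in tasks:
        queues[0].append([name, work])
    finished = []
    while any(queues):
        level = next(i for i, q in enumerate(queues) if q)
        task = queues[level].popleft()
        task[1] -= QUANTA[level]          # run for this level's quantum
        if task[1] <= 0:
            finished.append(task[0])      # task completed
        else:
            demoted = min(level + 1, len(queues) - 1)
            queues[demoted].append(task)  # demote the unfinished task
    return finished

order = mlfq_run([("long-etl", 20), ("short-query", 2), ("medium-job", 6)])
print(order)  # → ['short-query', 'medium-job', 'long-etl']
```

Note how the short task completes first even though it was submitted after the long one, which is the latency-sensitivity property cited above.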
For cloud environments with dynamic workloads, we then developed a Deep Reinforcement Learning based scheduler using Proximal Policy Optimisation, which adapts to changing resource demands while maintaining high job throughput. Building on this, we introduced a more advanced approach, based on Distributional Deep Reinforcement Learning, to optimise resource allocation and account for workload prediction uncertainties.
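The core reinforcement-learning loop can be sketched in miniature. This uses a simple REINFORCE-style policy gradient rather than full PPO or a distributional method, and the two-pool environment and reward values are hypothetical assumptions for illustration only:

```python
import math
import random

# Toy policy-gradient sketch (REINFORCE, not PPO): a scheduler learns a
# softmax policy over two hypothetical executor pools, shifting probability
# toward the pool that yields higher reward (e.g. lower job latency).

random.seed(0)
theta = [0.0, 0.0]  # one logit per action (pool-0, pool-1)
ALPHA = 0.1         # learning rate

def policy(logits):
    """Softmax over logits, computed stably."""
    z = [math.exp(l - max(logits)) for l in logits]
    s = sum(z)
    return [p / s for p in z]

def reward(action):
    # Hypothetical environment: pool-1 is faster, so it earns more reward.
    return 1.0 if action == 1 else 0.2

for _ in range(500):
    probs = policy(theta)
    action = 0 if random.random() < probs[0] else 1
    r = reward(action)
    # REINFORCE update for a softmax policy: grad log pi = onehot - probs
    for a in range(2):
        grad = (1.0 if a == action else 0.0) - probs[a]
        theta[a] += ALPHA * r * grad

final_probs = policy(theta)
print(final_probs)  # probability mass concentrates on the faster pool
```

A production scheduler would replace the two-action bandit with a state-dependent neural policy and a clipped PPO objective, and a distributional variant would learn a return distribution per action instead of a scalar estimate; this sketch only shows the direction-of-update intuition.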