Design and Evaluation of Parallel Coded Systems
Abstract
The demand for data and computing is increasing day by day. Meeting the need for faster data retrieval and faster computation reliably requires more data storage points or computation units. Motivated by scalability, availability, and reliability, there has been a paradigm shift from centralized storage (computation) at a large supercomputer to distributed storage (computing) on a large cluster of regular servers to handle complex tasks. In the distributed storage setting, a single file is divided into a number of smaller subfiles, which are then stored across multiple nodes, and file requests are handled by the storage cluster. Similarly, in the distributed compute setting, a single task is fragmented into a number of smaller subtasks, which are processed by the compute cluster. The file request time (task completion time) is determined by the slowest of the parallel subfile requests (subtasks). The lagging subfile requests (subtasks) are referred to as stragglers, and they delay the entire file retrieval (task execution). Straggling servers are one of the challenges in distributed storage and compute systems. Redundancy has emerged as a popular technique to mitigate the impact of stragglers. Redundant subfile requests (compute subtasks) can be sent to a larger set of storage (compute) nodes, such that the responses from a smaller subset suffice for file retrieval (task completion). This approach mitigates stragglers in the face of uncertainty in file retrieval (task execution) times at the storage (compute) nodes. Coding theoretic techniques can be employed to systematically control the redundancy in storage and compute systems...
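The effect of redundancy on stragglers described above can be illustrated with a small simulation. The sketch below is not taken from the thesis: it assumes that each of n parallel requests finishes after an independent exponentially distributed time, that any k of the n responses suffice, and that the extra requests add no load to the servers. The names completion_time and mean_completion_time, and all parameter values, are hypothetical.

```python
import random


def completion_time(times, k):
    """Return the time at which the k fastest of the parallel requests have finished."""
    if not 1 <= k <= len(times):
        raise ValueError("k must be between 1 and the number of requests")
    # The k-th smallest finishing time is when k responses are available.
    return sorted(times)[k - 1]


def mean_completion_time(n, k, rate=1.0, trials=2000, seed=0):
    """Estimate the mean completion time when any k of n requests suffice.

    Assumption (not from the source): request times are i.i.d. exponential
    with the given rate, and redundancy does not slow the servers down.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = [rng.expovariate(rate) for _ in range(n)]
        total += completion_time(times, k)
    return total / trials


if __name__ == "__main__":
    k = 4
    for n in (4, 5, 6, 8):
        print(f"n={n}, k={k}: mean completion time ~ {mean_completion_time(n, k):.3f}")
```

With n equal to k there is no redundancy and the completion time is the maximum of all request times, so a single straggler delays the whole retrieval. With n larger than k, the slowest n - k requests can be ignored. The model deliberately omits the cost of the extra requests, which a real system would have to weigh against the latency gain.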