|
Student name
|
Wesley Oliveira Souza
|
|---|---|
|
Title of the work
|
Navigating the Performance-Resilience Trilemma: HYDRA, a Budget-Aware Antifragile Approach for the Mission-Critical Application Placement in Computing Continuum
|
|
Abstract
|
Application placement in the Computing Continuum must satisfy the increasingly complex demands of modern services. While this paradigm integrates edge and cloud resources, it also introduces a critical trilemma: the simultaneous optimization of low latency, energy efficiency, and high resilience. These objectives often conflict; for instance, distributing replicas to enhance resilience typically increases network latency and energy consumption, whereas consolidating them for energy efficiency compromises fault tolerance. Conventional approaches frequently prioritize performance and efficiency and treat resilience as a static, secondary constraint, which is insufficient for mission-critical and latency-sensitive applications. To address this challenge, this research proposes Hydra, a novel placement approach designed to resolve this trilemma. Hydra uses a hybrid architecture that combines Deep Reinforcement Learning (DRL) with heuristics to dynamically manage application replicas. It also introduces an adaptive resilience mechanism that responds to failures, enhancing service robustness while optimizing the conflicting objectives. A cornerstone of Hydra is its treatment of an application's Service Level Agreement (SLA) error budget as a dynamic and governable resource. This allows the DRL agent to make strategic trade-offs: it can intentionally consume a controlled fraction of the error budget to proactively provision additional service replicas, thus increasing fault tolerance and scalability to preemptively mitigate cascading failures. To validate the feasibility of the approach, we conducted preliminary simulation-based experiments with a partial implementation of Hydra. The results indicate that our method significantly outperforms a state-of-the-art baseline, particularly in maintaining high reliability and performance under heavy load. These initial findings provide evidence that, by strategically managing the SLA error budget, Hydra can achieve superior resilience without compromising predictable, SLA-compliant performance. Future work will build on this foundation to develop a complete and robust solution for next-generation distributed systems.
|
|
Advisor
|
Maycon Leone Maciel Peixoto - Universidade Federal da Bahia (UFBA)
|
|
External member 1 (with affiliation)
|
Helder May Nunes da Silva Oliveira - Instituto de Matemática e Estatística da Universidade de São Paulo (IME-USP)
|
|
Link to Lattes CV
|
http://lattes.cnpq.br/
|
|
Internal member 1 (with affiliation)
|
Cássio Vinicius Serafim Prazeres - Universidade Federal da Bahia (UFBA)
|
|
Link to Lattes CV
|
http://lattes.cnpq.br/
|
|
Alternate external member (with affiliation)
|
Geraldo Pereira Rocha Filho - Universidade Estadual do Sudoeste da Bahia (UESB)
|
|
Link to Lattes CV
|
http://lattes.cnpq.br/
|
|
Alternate internal member (with affiliation)
|
Gustavo Bittencourt Figueiredo - Universidade Federal da Bahia (UFBA)
|
|
Link to Lattes CV
|
http://lattes.cnpq.br/
|
|
Exam date
|
27 November 2025
|
|
Exam time
|
1:00 PM
|