Search results - "Nuria Losada"

Fault tolerance of MPI applications in exascale systems: The ULFM solution

Authors: Keita Teranishi, Patricia González, George Bosilca, Aurelien Bouteiller, Nuria Losada, María Martín

Published in: RUC: Repositorio da Universidade da Coruña
Universidade da Coruña (UDC)

[Abstract] The growth in the number of computational resources used by high-performance computing (HPC) systems leads to an increase in failure rates. Fault-tolerant techniques will become essential for long-running applications executing in future e

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9c2054074400e6515d06b52fa6854a1d
https://doi.org/10.1016/j.future.2020.01.026

Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications

Authors: Nuria Losada, Aurelien Bouteiller, George Bosilca

Published in: FTXS@SC

With the increase in scale and architectural complexity of supercomputers, the management of failures has become integral to successfully executing a long-running high-performance computing application. In many instances, failures have a localized sc

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::fc5cebcc10d1377c72fe04a5df2659c8
https://doi.org/10.1109/ftxs49593.2019.00006

A portable and adaptable fault tolerance solution for heterogeneous applications

Authors: María J. Martín, Nuria Losada, Patricia González, Basilio B. Fraguela

Published in: RUC. Repositorio da Universidade da Coruña
Universitat Oberta de Catalunya (UOC)

[Abstract] Heterogeneous systems have increased their popularity in recent years due to the high performance and reduced energy consumption capabilities provided by using devices such as GPUs or Xeon Phi accelerators. This paper proposes a checkpoint

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::c411114b14430688d10044316ae35310
https://doi.org/10.1016/j.jpdc.2017.01.020

Local Rollback for Resilient MPI Applications With Application-Level Checkpointing and Message Logging

Authors: María Martín, Aurelien Bouteiller, Patricia González, Nuria Losada, George Bosilca

Published in: RUC: Repositorio da Universidade da Coruña
Universidade da Coruña (UDC)

[Abstract] The resilience approach generally used in high-performance computing (HPC) relies on coordinated checkpoint/restart, a global rollback of all the processes that are running the application. However, in many instances, the failure has a mor

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::011b4fcf909698e01b3ce01e9af17ca5
http://hdl.handle.net/2183/27584

Portable Application-level Checkpointing for Hybrid MPI-OpenMP Applications

Authors: Gabriel Rodríguez, María J. Martín, Patricia González, Nuria Losada

Published in: ICCS

As parallel machines increase their number of processors, so does the failure rate of the global system, thus, long-running applications will need to make use of fault tolerance techniques to ensure the successful execution completion. Most of curren

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::36a8ecb7185abef72c234cd681d9b30e
https://doi.org/10.1016/j.procs.2016.05.294

Insights into Application-level Solutions towards Resilient MPI Applications

Authors: María Martín, Patricia González, Nuria Losada

Published in: HPCS

Current petascale systems, formed by hundreds of thousands of cores, are highly dynamic, which causes that hardware failure rates are relatively high. Failure data collected from two large high-performance computing sites have been analysed in [1], s

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::651a255bb6fa3d5b7a35042942cae33b
https://doi.org/10.1109/hpcs.2018.00101

Towards Ad Hoc Recovery for Soft Errors

Authors: Kai Keller, Nuria Losada, Osman Unsal, Leonardo Bautista-Gomez

Published in: UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
FTXS@SC
Recercat. Dipòsit de la Recerca de Catalunya

The coming exascale era is a great opportunity for high performance computing (HPC) applications. However, high failure rates on these systems will hazard the successful completion of their execution. Bit-flip errors in dynamic random access memory (

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::576ce18667b1ef900caf9daaef5cbc86

Resilient MPI applications using an application-level checkpointing framework and ULFM

Authors: Patricia González, María Martín, Nuria Losada, Iván Cores

Published in: RUC. Repositorio da Universidade da Coruña

This is a post-peer-review, pre-copyedit version of an article published in Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-016-1629-7 [Abstract] Future exascale systems, formed by mil

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::69b8e09994e9f5efc41650d7cc2ca7d4
http://hdl.handle.net/2183/20890

I/O Optimization in the Checkpointing of OpenMP Parallel Applications

Authors: Nuria Losada, María Martín, Gabriel Rodríguez, Patricia González

Published in: PDP

Despite the increasing popularity of shared-memory systems, there is a lack of tools for providing fault tolerance support to shared-memory applications. Checkpointing is one of the most popular fault tolerance techniques. However, checkpointing co

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::804dbca1cac3de2dbbb69039f0a8d2c1
https://doi.org/10.1109/pdp.2015.39
