Showing 1-10 of 21 results for search: "Le Fèvre, Valentin"
Authors:
Le Fèvre, Valentin, Herault, Thomas, Robert, Yves, Bouteiller, Aurelien, Hori, Atsushi, Bosilca, George, Dongarra, Jack
Published in:
Parallel Computing, July 2019, 85:1-12
Academic article
Authors:
Bosilca, George, Bouteiller, Aurélien, Hérault, Thomas, Le Fèvre, Valentin, Robert, Yves, Dongarra, Jack
Published in:
International Journal of Networking and Computing, 2022, 12 (1)
This paper revisits distributed termination detection algorithms in the context of High-Performance Computing (HPC) applications. We introduce an efficient variant of the Credit Distribution Algorithm (CDA) and compare it to t…
External links:
https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::5bac12d1175beb84e26600308b368e4f
https://inria.hal.science/hal-03920388/file/ijnc22.pdf
Authors:
Benoit, Anne, Le Fèvre, Valentin, Perotin, Lucas, Raghavan, Padma, Robert, Yves, Sun, Hongyang
Published in:
[Research Report] RR-9340, Inria-Research Centre Grenoble – Rhône-Alpes. 2021
We study the resilient scheduling of moldable parallel jobs on high-performance computing (HPC) platforms. Moldable jobs allow for choosing a processor allocation before execution, and their execution time obeys various speedup models. The objective…
External links:
https://explore.openaire.eu/search/publication?articleId=od______3393::ab93c7472648ec65a4d360364a333ee1
https://inria.hal.science/hal-02614215
Author:
Le Fèvre, Valentin
Published in:
Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Lyon, 2020. English. ⟨NNT : 2020LYSEN019⟩
This thesis focuses on a major problem for the HPC community: resilience. Computing platforms are becoming ever larger in order to reach what we call exascale, i.e., a computing capacity of 10^18 FLOP/s, but they suffer numerous failures. Reducing the exe…
External links:
https://explore.openaire.eu/search/publication?articleId=od_______212::9ed0c9db7e3c758dc65da4d573159a90
https://tel.archives-ouvertes.fr/tel-02947051/file/LE_FEVRE_Valentin_2020LYSEN019_These.pdf
Authors:
Benoit, Anne, Le Fèvre, Valentin, Perotin, Lucas, Raghavan, Padma, Robert, Yves, Sun, Hongyang
Published in:
[Research Report] RR-9340, Inria-Research Centre Grenoble – Rhône-Alpes. 2020
[Research Report] RR-9340, Inria-Research Centre Grenoble – Rhône-Alpes. 2021
This paper focuses on the resilient scheduling of moldable parallel jobs on high-performance computing (HPC) platforms. Moldable jobs allow for choosing a processor allocation before execution, and their execution time obeys various speedup models. The…
External links:
https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::250aea526190e8a3dff8aa9082926af1
https://hal.inria.fr/hal-02614215
Published in:
[Research Report] RR-9235, ROMA (INRIA Rhône-Alpes / LIP Laboratoire de l’Informatique du Parallélisme); LIP-Laboratoire de l’Informatique du Parallélisme. 2018, pp.1-32
Large-scale platforms currently experience errors from two different sources, namely fail-stop errors (which interrupt the execution) and silent errors (which strike unnoticed and corrupt data). This work combines checkpointing and replication for ther…
External links:
https://explore.openaire.eu/search/publication?articleId=od_______212::25c1e9c9818d7b705d5ebb4bc32fdbfa
https://hal.inria.fr/hal-01955859
Authors:
Bosilca, George, Bouteiller, Aurelien, Hérault, Thomas, Le Fèvre, Valentin, Robert, Yves, Dongarra, Jack
Published in:
[Research Report] RR-9181, Inria-Research Centre Grenoble – Rhône-Alpes. 2018, pp.1-28
[Research Report] RR-9181, Inria-Research Centre Grenoble – Rhône-Alpes. 2018, pp.28
This paper revisits distributed termination detection algorithms in the context of high-performance computing applications in task systems. We first outline the need to efficiently detect termination in workflows for which the total number of tasks i…
External links:
https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::f5c3e0f955b99ce870e943200b003fbc
https://inria.hal.science/hal-01811823
Published in:
[Research Report] RR-9152, Inria-Research Centre Grenoble – Rhône-Alpes. 2018, pp.1-36
This report combines checkpointing and replication for the reliable execution of linear workflows. While both methods have been studied separately, their combination has not yet been investigated despite its promising potential to minimize the execution…
External links:
https://explore.openaire.eu/search/publication?articleId=od_______212::4053fc0e3d1644dddbbcfca8328eea59
https://hal.inria.fr/hal-01714978
Academic article