Showing 1 - 10 of 18 for search: '"Valentin Le Fèvre"'

Authors:
George Bosilca, Aurélien Bouteiller, Thomas Herault, Valentin Le Fèvre, Yves Robert, Jack Dongarra
Published in:
International Journal of Networking and Computing. 12:26-46

Published in:
International Journal of Networking and Computing
International Journal of Networking and Computing, Higashi Hiroshima : Dept. of Computer Engineering, Hiroshima University, In press
International Journal of Networking and Computing, Higashi Hiroshima : Dept. of Computer Engineering, Hiroshima University, 2021, 11 (1)
International Journal of Networking and Computing, 2021, 11 (1), pp.1-25. ⟨10.15803/ijnc.11.1_2⟩
International Journal of Networking and Computing, Higashi Hiroshima : Dept. of Computer Engineering, Hiroshima University, 2021, 11 (1), pp.1-25
International audience; This paper focuses on the resilient scheduling of parallel jobs on high-performance computing (HPC) platforms to minimize the overall completion time, or the makespan. We revisit the classical problem while assuming that jobs a

Published in:
IEEE Transactions on Computers
IEEE Transactions on Computers, 2021, pp.1-14. ⟨10.1109/TC.2021.3104747⟩
IEEE Transactions on Computers, Institute of Electrical and Electronics Engineers, 2021, pp.14. ⟨10.1109/TC.2021.3104747⟩
IEEE Transactions on Computers, Institute of Electrical and Electronics Engineers, 2021, pp.1-14. ⟨10.1109/TC.2021.3104747⟩
International audience; We study the resilient scheduling of moldable parallel jobs on high-performance computing (HPC) platforms. Moldable jobs allow for choosing a processor allocation before execution, and their execution time obeys various speedu
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::19d974d0779b97dbf31732619465aba4
https://inria.hal.science/hal-03509760/file/moldable_ieeetc.pdf

Published in:
CLUSTER 2020-IEEE International Conference on Cluster Computing
CLUSTER 2020-IEEE International Conference on Cluster Computing, Sep 2020, Kobe, Japan. pp.1-29
CLUSTER
International audience; This paper focuses on the resilient scheduling of moldable parallel jobs on high-performance computing (HPC) platforms. Moldable jobs allow for choosing a processor allocation before execution, and their execution time obeys v
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b26543ca079367b4e16a168d9b816bfc
https://inria.hal.science/hal-03028773/file/moldable_cluster_hal.pdf

Published in:
[Research Report] RR-9351, Inria-Research Centre Grenoble – Rhône-Alpes. 2020
Resilience 2020-12th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids (colocated with Euro-Par)
Resilience 2020-12th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids (colocated with Euro-Par), Aug 2020, Warsaw, Poland. pp.1-14
Euro-Par 2020: Parallel Processing Workshops
Euro-Par 2020: Parallel Processing Workshops ISBN: 9783030715922
Euro-Par Workshops
International audience; This paper compares several fault-tolerance methods for the detection and correction of floating-point errors in matrix-matrix multiplication. These methods include replication, triplication, Algorithm-Based Fault Tolerance (A
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b7432cdf92e949b255aa235a4a718ee5
https://hal.inria.fr/hal-02867859

Published in:
IPDPS Workshops
APDCM 2020-Workshop on Advances in Parallel and Distributed Computational Models (colocated with IPDPS)
APDCM 2020-Workshop on Advances in Parallel and Distributed Computational Models (colocated with IPDPS), May 2020, New Orleans, LA, United States. pp.1-27
[Research Report] RR-9296, Inria-Research Centre Grenoble – Rhône-Alpes. 2019, pp.31
This paper focuses on the resilient scheduling of parallel jobs on high-performance computing (HPC) platforms to minimize the overall completion time, or makespan. We revisit the problem by assuming that jobs are subject to transient or silent errors,

Published in:
SC 2019-International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'19)
SC 2019-International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'19), Nov 2019, Denver, United States
[Research Report] RR-9278, Inria-Research Centre Grenoble – Rhône-Alpes. 2019
SC
This paper revisits replication coupled with checkpointing for fail-stop errors. Replication enables the application to survive many fail-stop errors, thereby allowing for longer checkpointing periods. Previously published works use replication with
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b87c53e9a90a53f50f7f770d062688c8
https://inria.hal.science/hal-02273142/file/sc-hal.pdf

Authors:
Jack Dongarra, George Bosilca, Valentin Le Fèvre, Aurelien Bouteiller, Atsushi Hori, Thomas Herault, Yves Robert
Published in:
Parallel Computing
Parallel Computing, Elsevier, 2019, 85, pp.1-12. ⟨10.1016/j.parco.2019.02.002⟩
Parallel Computing, 2019, 85, pp.1-12. ⟨10.1016/j.parco.2019.02.002⟩
This paper compares the performance of different approaches to tolerate failures for applications executing on large-scale failure-prone platforms. We study (i) Rigid applications, which use a constant number of processors throughout execution; (
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::39ae5d81a4f0756a9a4aa4ef01906706
https://hal.inria.fr/hal-03360189

Published in:
UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
PMBS@SC
Recercat. Dipòsit de la Recerca de Catalunya
Multi-grid methods are numerical algorithms used in parallel and distributed processing. The main idea of multigrid solvers is to speed up the convergence of an iterative method by reducing the problem to a coarser grid a number of times. Multi-grid m
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4ae8d0cd61d2ecdc104002c0d0bc961b
https://hdl.handle.net/2117/133927

Published in:
ACM Transactions on Parallel Computing
ACM Transactions on Parallel Computing, In press, ⟨10.1145/3338510⟩
ACM Transactions on Parallel Computing, Association for Computing Machinery, In press, ⟨10.1145/3338510⟩
With the ever-growing need for data in HPC applications, congestion at the I/O level becomes critical in supercomputers. Architectural enhancements such as burst buffers and pre-fetching are added to machines but are not sufficient to prevent conge
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8f562e4a214648b85bd293de6cc68f56
https://inria.hal.science/hal-02141576/file/topc.pdf