Showing 1 - 10 of 23 for search: "Ghiasvand, Siavash"
Monitoring the status of large computing systems is essential to identify unexpected behavior and improve their performance and uptime. However, due to the large-scale and distributed design of such computing systems as well as a large number of moni
External link:
http://arxiv.org/abs/2402.05114
This work prioritizes building a modular pipeline that utilizes existing models to systematically restore images, rather than creating new restoration models from scratch. Restoration is carried out at an object-specific level, with each object regen
External link:
http://arxiv.org/abs/2401.05049
Published in:
International Workshop on Data-driven Resilience Research 2022, https://2022.dataweek.de/d2r2-22/
System logs are a common source of monitoring data for analyzing computing systems' behavior. Due to the complexity of modern computing systems and the large size of collected monitoring data, automated analysis mechanisms are required. Numerous mach
External link:
http://arxiv.org/abs/2212.01101
Author:
Ghiasvand, Siavash; Ciorba, Florina M.
In response to the demand for higher computational power, the number of computing nodes in high performance computers (HPC) increases rapidly. Exascale HPC systems are expected to arrive by 2020. With drastic increase in the number of HPC system comp
External link:
http://arxiv.org/abs/1906.04550
Published in:
29th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2016)
The mean time between failures (MTBF) of HPC systems is rapidly decreasing, and current failure recovery mechanisms, e.g., checkpoint-restart, will no longer be able to recover the systems from failures. Early failure detection is a new class of fa
External link:
http://arxiv.org/abs/1901.06918
Author:
Ghiasvand, Siavash; Ciorba, Florina M.
System logs are a valuable source of information for the analysis and understanding of systems behavior for the purpose of improving their performance. Such logs contain various types of information, including sensitive information. Information deeme
External link:
http://arxiv.org/abs/1805.01790
Author:
Ghiasvand, Siavash; Ciorba, Florina M.
Failure rates in high performance computers rapidly increase due to the growth in system size and complexity. Hence, failures became the norm rather than the exception. Different approaches on high performance computing (HPC) systems have been introd
External link:
http://arxiv.org/abs/1706.04345
Author:
Ghiasvand, Siavash; Ciorba, Florina M.
System logs constitute valuable information for analysis and diagnosis of system behavior. The size of parallel computing systems and the number of their components steadily increase. The volume of logs generated by the system is in proportion to thi
External link:
http://arxiv.org/abs/1706.04337
Author:
Vargis, Tom Richard; Ghiasvand, Siavash
Assessing Anonymized System Logs Usefulness for Behavioral Analysis in RNN Models. Tom Richard Vargis (1,*), Siavash Ghiasvand (1,2). (1) Technische Universität Dresden, Germany; (2) Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresd
External link:
https://tud.qucosa.de/id/qucosa%3A92793
https://tud.qucosa.de/api/qucosa%3A92793/attachment/ATT-0/
Toward Resilience in High Performance Computing: A Prototype to Analyze and Predict System Behavior
Author:
Ghiasvand, Siavash
Following the growth of high performance computing systems (HPC) in size and complexity, and the advent of faster and more complex Exascale systems, failures became the norm rather than the exception. Hence, the protection mechanisms need to be impro
External link:
https://tud.qucosa.de/id/qucosa%3A72457
https://tud.qucosa.de/api/qucosa%3A72457/attachment/ATT-0/