The benefits and challenges of sharing glidein factory operations across nine time zones between OSG and CMS
Author: | Peter Kreuzer, Burt Holzman, Igor Sfiligoi, Frank Wuerthwein, S W Teige, M Zvada, Jose Flix, Jose M Hernandez, I Butenas, Rob Quick, J M Dost |
---|---|
Language: | English |
Year of publication: | 2012 |
Subject: | History; Engineering; Large Hadron Collider; Operations research; business.industry; computer.software_genre; Computer Science Applications; Education; Operating system; Multiple time; Factory (object-oriented programming); Single point of failure; Detectors and Experimental Techniques; business; computer; Host (network) |
Description: | For several years, OSG has operated a glideinWMS factory at UCSD for several scientific communities, including CMS analysis, HCC and GLOW. This setup worked well, but it had become a single point of failure. OSG therefore recently added a second instance at Indiana University, serving the same user communities. Similarly, CMS has been operating a glidein factory dedicated to reprocessing activities at Fermilab, with similar results. Recently, CMS decided to host another glidein factory at CERN to increase the availability of the system for analysis, MC and reprocessing jobs. Given the large overlap between this new factory and the three factories in the US, and given that CMS accounts for a significant fraction of the glideins going through the OSG factories, CMS and OSG formed a common operations team that operates all of the above factories. The reasoning behind this arrangement is that most operational issues stem from Grid-related problems and are very similar across factory instances, so solving a problem in one instance very often solves it for all of them. This paper presents our operational experience in addressing both the social and technical issues of running multiple instances of a glideinWMS factory with operations staff spanning multiple time zones on two continents. |
Database: | OpenAIRE |