The benefits and challenges of sharing glidein factory operations across nine time zones between OSG and CMS

Authors: Peter Kreuzer, Burt Holzman, Igor Sfiligoi, Frank Wuerthwein, S W Teige, M Zvada, Jose Flix, Jose M Hernandez, I Butenas, Rob Quick, J M Dost
Language: English
Year of publication: 2012
Subject:
Description: For several years, OSG has operated a glideinWMS factory at UCSD for several scientific communities, including CMS analysis, HCC and GLOW. This setup worked well, but it had become a single point of failure. OSG therefore recently added a second instance at Indiana University, serving the same user communities. Similarly, CMS has been operating a glidein factory dedicated to reprocessing activities at Fermilab, with similar results. Recently, CMS decided to host another glidein factory at CERN to increase the availability of the system for analysis, MC and reprocessing jobs. Given the large overlap between this new factory and the three factories in the US, and given that CMS accounts for a significant fraction of the glideins going through the OSG factories, CMS and OSG formed a common operations team that operates all of the above factories. The reasoning behind this arrangement is that most operational issues stem from Grid-related problems and are very similar across all factory instances, so solving a problem in one instance very often solves it for all of them. This paper presents our operational experience in addressing both the social and the technical issues of running multiple instances of a glideinWMS factory with operations staff spanning multiple time zones on two continents.
Database: OpenAIRE