Multi-tenancy Cloud Access and Preservation

Authors: Yinlin Chen, Soumik Ghosh, Andrea Waldren, James Tuttle, Lee Hunter, Tingting Jiang, William A. Ingram
Publication year: 2020
Subject:
Source: JCDL
DOI: 10.1145/3383583.3398624
Description: Virginia Tech Libraries has developed a cloud-native, microservices-based digital libraries platform to consolidate diverse access and preservation infrastructure into a set of flexible, independent microservices in Amazon Web Services. We have been an implementer of and contributor to various community digital library and repository projects, including DSpace, Fedora, and Samvera. However, the complexity and cost of maintaining disparate application stacks have reduced our capacity to build new infrastructure. Virginia Tech has a long history of participation in and contribution to community-driven open source projects and has, in that time, developed more than a dozen independent applications architected on these stacks. The cost of independently addressing vulnerabilities, which often requires work to mitigate incompatibilities; reworking each application to comply with developing branding guidelines; and developing and improving features has burgeoned, threatening to overwhelm our capacity. As with many of our peers, our maintenance obligations have made continued growth unsustainable and have pushed older applications to near abandonware. We have designed and developed the Digital Libraries Platform to address these concerns, thus reducing our maintenance obligations and the costs associated with feature development across digital libraries. This approach represents a departure from the monolithic architectures of our legacy systems and, as such, shares more infrastructure among individual digital library implementations. The shared infrastructure facilitates rapid inclusion of new and improved features into each digital library instance. New features can be developed independently of any digital library instance and integrated into an instance by including the feature in the React/Amplify template. 
Changes to the template superclass, such as those necessitated by evolving branding guidelines, may be immediately inherited by the template instances that subscribe to it. The platform implements Terraform deployment templates, Lambda serverless functions, and other cloud assets to form a microservices architecture on which multiple template-based sites are built. Individual sites are configured in AWS DynamoDB, Amazon's NoSQL database service, and via modification of the shared template. Additional services provide digital preservation support, including auditing, file fixity validation, replication to external cloud storage providers, file format characterization, and deposit to third-party preservation services. This presentation also discusses the cost of operating these services in AWS and strategies for mitigating those costs. These strategies include containerization, which allows high-cost, asynchronous services to be deployed to local infrastructure to take full advantage of existing infrastructure and advantageous utility pricing while allowing for local redeployment. In the past, developers worked in local, independent environments. New features and fixes were submitted to a central development environment for testing and validation, which significantly slowed development. Migrating development, review, integration, and deployment processes to AWS reduced the time and resource bottlenecks for those processes. Our AWS cost accounting demonstrates an 87% savings over our traditional, on-premises Fedora/Samvera approach. For a team of four software developers, the total cost using a traditional server-based development approach (a t2.medium EC2 instance) is about $133 per month, versus an average of $17 per month for our serverless development approach using AWS Amplify. As the Digital Libraries Platform project expands, we anticipate publishing a set of API documents allowing us and others to reimplement specific microservices independent of the architecture.
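The two monthly figures quoted in the description appear consistent with the 87% savings claim, assuming that percentage is derived from them (the abstract does not state this explicitly). A minimal sketch of the arithmetic follows; the function and variable names are illustrative and not taken from the source.

```python
def percent_savings(baseline: float, alternative: float) -> float:
    """Return the percentage saved by moving from baseline to alternative cost."""
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return (baseline - alternative) / baseline * 100


# Monthly figures quoted in the description (USD, team of four developers).
server_based = 133.0  # server-based development approach on an EC2 instance
serverless = 17.0     # serverless development approach using AWS Amplify

savings = percent_savings(server_based, serverless)
print(f"{savings:.1f}% savings")  # prints "87.2% savings"
```

Rounded to a whole number, the result matches the 87% reported in the description.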
Database: OpenAIRE