Q&A With the CTO: Storing Growing Data While Reducing Environmental Impact
Illumina Chief Technology Officer Alex Aravanis discusses how the company is working toward its sustainability goals
Originally published on Illumina News Center
As the UN Climate Change Conference (COP27) convenes this month in Sharm el-Sheikh, Egypt, Sharon Vidal, global head of Corporate Social Responsibility at Illumina, talks to the company’s chief technology officer, Alex Aravanis, about achieving a sustainable future with intelligent data storage solutions that reduce Illumina’s environmental footprint.
Vidal: The growing volume of data in the expanding field of genomics is increasing data storage needs, along with the associated costs and carbon footprint. What is Illumina doing to address these issues?
Aravanis: The amount of data generated by next-generation sequencing is expanding rapidly. To realize the full potential of genomics in health care, we need more than accessible sequencing and data generation. We also need advanced systems that make that data easy to analyze, systems that optimize how it is stored, and innovative solutions that reduce the environmental footprint of these workflows.
In 2021, we made significant improvements to our platforms and product pipeline to provide highly accurate, comprehensive, and efficient analysis of this data. We launched Illumina Connected Analytics, a secure genomics data platform that helps customers operationalize informatics and drive scientific insights. It enables customers to manage, analyze, and explore large volumes of multiomic data in a scalable, flexible environment with security and privacy at its core. Illumina Connected Analytics gives users access to DRAGEN secondary analysis pipelines, which support custom analyses and deliver mapping and variant calling for whole-genome sequencing 16 times faster than traditional open-source methods. Additionally, DRAGEN Original Read Archive (ORA) technology compresses data in the cloud, reducing our customers’ carbon footprint by up to fivefold.
The amount of genetic data that we securely store in the cloud has grown tremendously—from 1 petabyte to 100 petabytes in just eight years. We anticipate that our stored data will continue to grow, so we turned our attention to finding solutions that securely optimize our data storage and reduce its associated carbon footprint.
Vidal: Illumina has been using Amazon Simple Storage Service (Amazon S3) for more than 10 years, and we’ve recently begun using the Amazon S3 Intelligent-Tiering storage class. Tell us about some of the initial impacts you’ve observed.
Aravanis: We learned that Amazon S3 Intelligent-Tiering would automate storage cost savings by moving data between access tiers as access patterns change, while still maintaining data security. It also improves sustainability by keeping infrequently accessed data on storage technologies designed for efficient long-term retention.
Our customers usually keep a copy of the genomic data they generate and rarely delete data that might be useful in future analyses. As a result, our total data footprint keeps growing, even though some of that data is not being accessed. Amazon S3 Intelligent-Tiering moves data to the most cost-effective access tier with no impact on performance. In large-scale tests on our own data, we’ve seen a 60% reduction in data storage costs. Illumina is now working to apply these techniques to customer data in both BaseSpace Sequence Hub and Illumina Connected Analytics. Combined with our DRAGEN compression technology, this will allow us to give our customers near-instant access to hundreds of thousands of whole-genome sequences at a low, competitive cost.
Vidal: Can you tell us about some of the sustainability wins associated with this project and how they support Illumina’s climate action plans?
Aravanis: In addition to improved performance, we’ve seen a substantial decrease in our environmental footprint. Since we implemented this project in 2021, the carbon emissions associated with our genomic data storage in the cloud have fallen by 90%. We continue to work with AWS to optimize our data storage so that we can further reduce carbon emissions while keeping data analysis accessible for our customers.
Last year, we announced our target of reaching net-zero emissions by 2050 as part of our climate action plan, which aims to have a positive impact not only on our direct operations but also on our value chain. Given our mission to improve human health and our commitment to operate responsibly and sustainably, this initiative was a win for us and a win for our customers. We know that to achieve a more sustainable future, we will need partners like AWS to find innovative solutions. If we work together toward our collective goals, we can bring genomics for good to all.