Best Practices for Cost Optimization with Amazon S3

Video Link

In this video, Alex Crabbe and Jessie Felix, Product Managers with Amazon S3, explain how you can best optimize your costs with Amazon S3.

Introduction

So in terms of the agenda, first, we are going to provide a brief overview of S3, and then we are going to talk about a few steps you can take to optimize your storage costs. You want to start with defining your workload requirements and that’s followed by organizing your data.

You then want to analyze, take action, and measure your storage and storage costs. We will also look at how you can automatically optimize your storage costs using things like S3 Intelligent-Tiering for automatic storage cost savings.

Throughout the discussion, we’ll also provide key takeaways so you can take fast action to optimize your storage costs.

Workload Requirements

From 2m: 46s

The first step to optimizing your storage costs is to define your workload requirements. When we look at our customers who use S3 today, we see that the customers who are able to best articulate and define their workload requirements have the easiest time when it comes to choosing a storage class and therefore optimizing their storage costs.

Fig# 4

From 3m: 11s

So the first step, as we said, to optimizing your storage costs is to really understand your workload requirements. So let’s say you have a media storage workload where you need to stream content to an end user immediately. You’re going to want to prioritize high availability and performance.

Well, let’s say on the other hand, you have a different type of use case, which is, let’s say it’s a machine learning workload. It has a historical dataset that doesn’t require immediate access. In that case, you wouldn’t necessarily want to prioritize performance and availability, and could therefore have the opportunity to optimize your storage costs further.

Or let’s say you have a third type of use case where you have a secondary backup dataset that only requires the resiliency of a single availability zone. We have a storage class that’s right for that that can help you to optimize your storage costs. The bottom line is there are a ton of different use cases that could apply to you, but you’re the one who can best define your specific and unique use case.

You are going to come away with a much clearer understanding of your specific use case and workload, and it’s going to make it a lot easier when it comes time to choose a specific storage class for your workload, and that in turn is going to help you optimize your storage costs.

Fig# 5

From 5m: 00s

So once you’ve defined your workload requirements, you’re also going to want to organize your data. When we look at our customers who use S3 today, we see a lot of different organizational structures across departments, teams, and users. We always recommend that customers organize their data from day one. It’s going to make it much easier later on when it comes time to optimize your storage costs.

Fig# 6

From 5m: 29s

So the first step to organizing your data is to create buckets. Buckets are S3 resources, and with buckets, you can upload your data across all of your business units within your account. To further organize your data, you can add prefixes to your key names after the bucket name to imply a folder structure that keeps things organizationally simple. So for example, let’s say you have two prefixes, one called unit1 and one called internal. You can list out all objects that share those prefixes, or you can write rules that act on those prefixes.
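To make the prefix idea concrete, here is a minimal sketch of how key-name prefixes imply a folder hierarchy. The bucket layout and key names are hypothetical examples, and the helper mimics what an S3 ListObjectsV2 call with a `Prefix` parameter returns:

```python
# Hypothetical object keys in one bucket; the '/' separators imply folders.
keys = [
    "unit1/internal/report-2021.csv",
    "unit1/internal/report-2022.csv",
    "unit1/public/summary.csv",
    "unit2/internal/audit.csv",
]

def list_by_prefix(keys, prefix):
    """Return all object keys sharing a common prefix, mimicking an
    S3 ListObjectsV2 request with a Prefix parameter."""
    return [k for k in keys if k.startswith(prefix)]

# Only the two 'unit1/internal/' reports come back; other prefixes are excluded.
print(list_by_prefix(keys, "unit1/internal/"))
```

Lifecycle rules and access policies can then be scoped to a prefix like `unit1/internal/` rather than to individual objects.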

Object Tagging

You’re also going to want to consider using object tagging. Object tags are key-value pairs, and you can associate up to 10 tags with an individual object to characterize and describe those objects. Some examples include retention information, project name, and data type. So once you’ve created buckets, added prefixes, and implemented object tags, you’re going to have a much easier time when it comes to applying fine-grained access controls and permissions.

For example, you could give controls to a user so that they can only access objects under specific tags, or you could even take it a step further and give that user access to create and apply their own tags.
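As a sketch of the tag structure, here is the shape S3’s PutObjectTagging API expects, with the 10-tag limit enforced. The tag keys and values are hypothetical examples:

```python
def build_tag_set(tags):
    """Convert a dict of tags into the S3 TagSet structure,
    enforcing S3's limit of 10 tags per object."""
    if len(tags) > 10:
        raise ValueError("S3 allows at most 10 tags per object")
    return {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}

# Hypothetical tags capturing retention, project, and data type.
tag_set = build_tag_set({
    "project": "analytics",
    "retention": "7y",
    "data-type": "logs",
})
```

With boto3, this structure would be passed as the `Tagging` argument to `put_object_tagging`, and IAM policies can then condition access on those tag values.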

Fig# 7

From 6m: 54s

The other big thing you definitely want to consider when it comes to organizing your data is object size. That’s because as object size increases for a given storage workload, you’re going to see a corresponding decrease in request costs per GB. So we always recommend to customers that they batch together smaller objects into larger objects, because when you have larger objects, you’re going to have fewer requests. And when you have fewer requests, you’re going to have a lower request cost per GB. At the same time, having larger objects also helps you to take advantage of numerous S3 storage classes.

That’s because some storage classes have minimum capacity charges per object, like S3 Standard-Infrequent Access, which has a minimum capacity charge per object of 128 KB. If you’re uploading data to S3 for the first time, you can take advantage of things like Amazon Kinesis Data Firehose, which can help you set thresholds for your object batching. Or if you have existing objects in S3, you can take advantage of Amazon EMR to help you combine smaller objects into larger objects.
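The request-cost arithmetic behind batching can be sketched in a few lines. The per-request price here is a placeholder for illustration, not a quote of actual S3 pricing; the point is the ratio, not the dollar amounts:

```python
# Assumed illustrative PUT price in USD; actual S3 pricing differs by region.
PUT_PRICE_PER_REQUEST = 0.000005

def put_cost_per_gb(object_size_kb):
    """Cost of the PUT requests needed to upload 1 GB as objects
    of a given size."""
    requests_per_gb = (1024 * 1024) / object_size_kb  # 1 GB in KB
    return requests_per_gb * PUT_PRICE_PER_REQUEST

small = put_cost_per_gb(4)     # many 4 KB objects
large = put_cost_per_gb(4096)  # far fewer 4 MB objects
# Larger objects mean fewer requests, so request cost per GB drops sharply.
```

Batching 4 KB objects into 4 MB objects cuts the number of requests, and therefore the request cost per GB, by a factor of 1024 in this sketch.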

Fig# 8

From 8m: 05s

So once you’ve defined your workload requirements and organized your data, next, you’re going to want to analyze your data. And we always recommend to customers that they use certain tools, like S3 Storage Lens.

Fig# 9

Storage Class Analysis, and S3 Inventory Reports

From 8m: 17s

The first one we’ll look at is S3 Storage Lens. So to recap, Storage Lens was launched about a year ago, right before re:Invent 2020 actually, but the full backstory for Storage Lens begins a few years prior to that, as we had begun collecting feedback from customers that they wanted deeper understanding and deeper insights into their storage usage and activity.

And at the time, we collected feedback from customers saying that they were actually collecting data from S3 using things like access logs and inventory reports, and then storing that data in places like Redshift, Athena, and other big data systems, and using that data to create their own personalized summary insights.

Fig# 10

From 9m: 26s

With Amazon S3 Storage Lens, you get a central, organization-wide view of your storage across all of your accounts and buckets. It provides granular visibility into your storage usage and activity with 29 unique metrics that are all updated daily.

Those metrics are pre-aggregated at the organization as well as the account level, with the ability to drill down to more granular levels like storage class, bucket, and prefix. Storage Lens can help you to answer key questions about your storage, like which of your prefixes are growing fastest, or which of your buckets have recently gone cold and are no longer receiving puts or gets.

Alongside those metrics, Storage Lens provides you with contextual recommendations so that you can come away with best practices to help improve on things like cost efficiency and data protection. And all of your Storage Lens metrics are presented via an interactive dashboard that’s integrated right within the S3 console.

Fig# 11

From 10m: 34s

So with an interactive dashboard that’s integrated right in the S3 console, customers now have easier visibility into their storage usage and activity and can analyze it more readily. Customers also have faster access to insights because S3 Storage Lens is right in the S3 console.

So whether you’re creating a bucket for the first time or if you’re reconfiguring your existing storage, you can always surface fast, easy, and relevant insights related to your storage task at hand, whether that’s insights related to the current state of versioning across your buckets or insights related to encryption on a particular storage footprint.

From 11m: 22s

Now, although customers really like that S3 Storage Lens is located right in the S3 console, and that it’s fast and easy and can surface relevant insights quickly, at the same time, we wanted to provide flexibility to customers who have other ways of observing and monitoring their storage usage and activity. So last week, we launched the Amazon CloudWatch publishing feature, which unlocks CloudWatch for Storage Lens metrics. And we launched this for two types of customers.

The first are customers who use CloudWatch today and simply want to unlock Storage Lens metrics via CloudWatch. The second are customers who want to view their Storage Lens metrics via API so that they can create their own custom applications, as well as help enable AWS observability partners.

Fig# 13

From 12m: 16s

So with CloudWatch unlocked for Storage Lens, you have access to all 29 unique metrics that are updated daily. That’s the same as it was with Storage Lens before the CloudWatch publishing feature launch. But with CloudWatch, you get a unified view across all AWS services right within a single dashboard.

You also have the ability to create custom dashboards and can use things like metric math to create your own expressions within those dashboards. That’s something that wasn’t possible with S3 Storage Lens prior to the CloudWatch publishing feature launch.

You can also now set up alarms to trigger SNS notifications so that you’re always aware of anomalies and outliers across your storage usage and activity metrics. And to reiterate, you can access all of those metrics via API so that you can create your own custom applications, as well as help enable AWS partners.
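As a sketch of what such an alarm might look like, here are parameters in the shape CloudWatch’s PutMetricAlarm API expects. The namespace, metric name, threshold, and SNS topic ARN are all assumptions for illustration; check the metrics actually published from your Storage Lens dashboard for exact names:

```python
# Hypothetical alarm on a Storage Lens metric published to CloudWatch.
alarm = {
    "AlarmName": "storage-lens-incomplete-mpu-bytes",
    "Namespace": "AWS/S3/Storage-Lens",  # assumed Storage Lens namespace
    "MetricName": "IncompleteMultipartUploadStorageBytes",  # assumed name
    "Statistic": "Average",
    "Period": 86400,  # Storage Lens metrics update daily
    "EvaluationPeriods": 1,
    "Threshold": 10 * 1024 ** 3,  # alert above ~10 GB (illustrative)
    "ComparisonOperator": "GreaterThanThreshold",
    # Hypothetical SNS topic that receives the notification.
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:storage-alerts"],
}
# With boto3: boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

The daily period matches the daily update cadence of Storage Lens metrics mentioned above.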

Fig# 14

Storage Class Analysis

From 13m: 15s

So now that we talked a little bit about S3 Storage Lens, let’s look at another tool you can use to help analyze your storage. Storage Class Analysis is best for workloads with predictable access patterns.

With Storage Class Analysis, you can figure out where you have infrequently accessed data so that you can transition that data to a more appropriate, more cost-optimized storage class. What you want to do is run Storage Class Analysis for 30 to 45 days or so, at which point you’ll be able to determine your access patterns. And once you know your access patterns, you can identify your infrequently accessed storage.

At that point, you want to create a lifecycle policy that transitions your storage to a more appropriate storage class. So for example, if you transition your storage from the S3 Standard-Infrequent Access storage class to the S3 Glacier Flexible Retrieval storage class, you could save up to 68% on your storage costs. It’s worth calling out that you can also expire objects and their older versions based on their age. As we mentioned, Storage Class Analysis is best for data with predictable access patterns.
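Here is a minimal sketch of such a lifecycle configuration, in the shape boto3’s `put_bucket_lifecycle_configuration` expects. The prefix and day counts are illustrative assumptions; note that `GLACIER` is the API constant for the S3 Glacier Flexible Retrieval storage class:

```python
# Hypothetical lifecycle configuration: transition cold data and
# expire aged noncurrent versions.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-cold-data",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # scope the rule to one prefix
            "Transitions": [
                # 'GLACIER' = S3 Glacier Flexible Retrieval in the API
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Expire old object versions once they age out.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
        }
    ]
}
# With boto3:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```

The same structure accepts additional transitions, so data can later step down to Glacier Deep Archive in the same rule.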

Fig# 16

From 14m: 44s

So which workloads tend to have predictable access patterns? These include medical records, media streaming, backups, and learning resources. For example, educational or learning resources have predictable access patterns: let’s say a university provides content to students throughout a school year or semester term. That content will be frequently accessed for a duration of time, but once that term is up, the data goes cold or is not accessed at all. That’s a perfect example of the type of workload where you might want to use Storage Class Analysis.

Fig# 17

S3 Inventory reports

From 15m: 26s

So now that we’ve looked at S3 Storage Lens and we talked a little bit about Storage Class Analysis, next, let’s look at S3 Inventory reports, which is great for analyzing and auditing your data.

Fig# 18

Athena

From 15m: 39s

So you can use Amazon Athena to analyze your S3 Inventory reports at the object level. Athena uses standard SQL expressions to analyze your data, delivers results in seconds, and is great for ad hoc data discovery. So for example, let’s say you analyze your inventory reports using Athena and you see that all of your storage is in the S3 Standard storage class.

S3 Standard is best for frequently accessed, short-lived data that needs high-performance object storage. It’s worth noting that the S3 Standard storage class doesn’t have a minimum capacity charge per object, whereas, as we mentioned before, the S3 Standard-Infrequent Access storage class does have a minimum capacity charge per object of 128 KB. So when you analyze your storage and inventory reports using Athena, take note of your object sizes.

Because if you have objects over 128 KB that haven’t been accessed for 30 days, you’re going to want to take advantage of the S3 Standard-Infrequent Access storage class, which still has the same high durability, high throughput, and low latency as S3 Standard, but with a lower per-GB storage price and a per-GB retrieval fee. And by moving to the S3 Standard-Infrequent Access storage class, you can save up to 40% on your storage costs.

Now, taking that a step further, if you see that you have objects of at least 40 KB that haven’t been accessed for 90 days, you can take advantage of the S3 Glacier Flexible Retrieval storage class. And as mentioned before, you can save up to 68% on your storage costs by doing so.
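The size and age thresholds just described can be sketched as a simple decision rule you might apply to inventory records. The record values are hypothetical, and the returned names are the API constants for the storage classes:

```python
def candidate_storage_class(size_kb, days_since_access):
    """Suggest a storage class from the size/age thresholds
    described above (illustrative, not a pricing tool)."""
    if size_kb >= 40 and days_since_access >= 90:
        return "GLACIER"       # S3 Glacier Flexible Retrieval
    if size_kb >= 128 and days_since_access >= 30:
        return "STANDARD_IA"   # S3 Standard-Infrequent Access
    return "STANDARD"          # stay in S3 Standard

print(candidate_storage_class(500, 45))   # -> STANDARD_IA
print(candidate_storage_class(500, 120))  # -> GLACIER
print(candidate_storage_class(16, 120))   # -> STANDARD (below both minimums)
```

In practice you would derive `size_kb` and `days_since_access` from the inventory report columns queried via Athena.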

Fig# 19

From 17m: 26s

So to recap, there are a few ways that you can analyze your S3 storage. One is with S3 Storage Lens, which provides you with deep insights into your storage usage and activity. And now with the latest Amazon CloudWatch publishing feature, you can access those metrics via API so you can create your own custom applications as well as help enable AWS observability partners.

With Storage Class Analysis, you can determine your access patterns to help you identify infrequently accessed storage so that you can transition that storage to a more optimal storage class to help reduce your storage costs. You can also use S3 Inventory reports, which is great for analyzing your storage and auditing your workload.

Fig# 20

From 18m: 16s

So based on what we’ve discussed so far, let’s look at a few key takeaways. The first key takeaway is that you want to define your performance and workload requirements. Depending on the storage class you need, you’re going to be able to optimize your storage costs the right way. In addition to defining your performance requirements, you’re also going to want to organize your data based on your unique business logic.

As I mentioned before, we see a lot of different customers with very different business logic. Some of the things you can do to organize your data are creating buckets, adding prefixes to your key names, taking advantage of object tagging, and applying fine-grained access controls and permissions. When it comes to organizing your data, you also want to be considerate of object size.

From 20m: 12s

As I mentioned earlier, as you increase your object size, you’re going to have fewer requests, and when you have fewer requests, you’re going to be able to decrease your request cost per GB for your given storage workload. At the same time, when you increase your object size, you also have the ability to take advantage of more S3 storage classes.

For example, we mentioned the S3 Standard-Infrequent Access storage class has a minimum capacity charge per object of 128 KB, whereas the S3 Glacier Flexible Retrieval storage class has a minimum capacity charge per object of 40 KB. And the last thing is you definitely want to check out S3 Storage Lens and the new CloudWatch publishing feature.

With the new CloudWatch publishing feature, you can set up alarms to trigger SNS notifications so you’re always aware of anomalies and outliers across your storage usage and activity metrics. And at the same time, you can create your own custom applications and help enable AWS observability partners.

Fig# 21

Using S3 Storage Classes

From 21m: 39s

We’ve launched a wide range of storage classes that help you match your workload requirements with the ideal cost and performance that you need. For example, in 2019, we launched Glacier Deep Archive, the lowest-cost storage in the cloud, and we also launched Glacier Instant Retrieval, which allows you to optimize costs for rarely accessed data that needs immediate retrieval. But it’s not all about optimizing storage cost.

We also launched S3 Intelligent-Tiering that allows you to automatically optimize your storage cost based on your changing access patterns. So let’s talk more about the different flavors of storage and when you should use each of them.

Fig# 22

From 22m: 30s

Okay, now, one of the patterns that we observe in S3 is that the vast majority of data has changing or unknown access patterns.

Fig# 23

From 22m: 42s

And for these workloads, customers are using S3 Intelligent-Tiering to automatically optimize storage cost based on their changing data access patterns. Intelligent-Tiering is the only object storage that automatically moves data between different access tiers when your access patterns change. In fact, I’d go as far as to say that many of our customers today are using Intelligent-Tiering as their default storage class for data lakes and data analytics use cases. We’ll talk more about the Intelligent-Tiering storage class in a few minutes.

Fig# 24

From 23m: 12s

Now, for data that has predictable or known access patterns, customers have different storage classes to choose from to help optimize their cost and performance. So the S3 Standard storage class, as an example, is designed for data that is accessed multiple times over in a single month. To provide you with a real-world example, imagine a new song that’s just been released. You listen to it over and over again because you can’t get enough of it. In that initial release window, that media asset is accessed multiple times and is likely stored in the S3 Standard storage class.

Now, looking to the right, from S3 Standard-Infrequent Access to S3 Glacier Deep Archive, the cost of storing data decreases but the cost of accessing that data increases. So the S3 Standard-Infrequent Access storage class is designed for data that is accessed once or twice per month. So yes, even your favorite song, at some point, will become infrequently accessed.

And now, with the new S3 Glacier Instant Retrieval storage class, it’s optimized for rarely accessed data that you’re accessing once per quarter. And for data that can be accessed in minutes to hours, you can use the S3 Glacier Flexible Retrieval storage class, which now offers free bulk retrievals and is ideal if you need to retrieve large amounts of datasets at no cost. And finally, the Glacier Deep Archive storage class offers you the lowest storage cost in the cloud.

Fig# 25

From 24m: 51s

Additionally, S3 offers storage classes for specialized workloads. For example, One Zone-Infrequent Access is designed for infrequently accessed, re-creatable data that can be stored in a single Availability Zone, and it is priced at a 20% lower storage cost than S3 Standard-Infrequent Access. And S3 on AWS Outposts allows you to use S3 API capabilities in your on-premises environment.

Fig# 26

From 25m: 26s

So now that we’ve talked about the different flavors of storage classes, I do want to spend some time talking about our latest storage class, the S3 Glacier Instant Retrieval, which we announced earlier this week.

Fig# 27

From 25m: 42s

So the S3 Glacier Instant Retrieval storage class is ideal for workloads where you need to make that data immediately accessible to end users. One of the patterns we observe is that customers are accumulating hundreds of petabytes of storage across virtually every industry. A lot of this data is stored for indefinite periods of time, and a lot of these datasets become rarely accessed. But many applications cannot access data asynchronously, so it has to be immediately accessible to their end users.

To give you an example that we’re all familiar with, we all share photos and videos with our friends and family. Usually after you share a picture or a video of your food with your friends and family, that data is accessed multiple times over, but over time it becomes rarely accessed. Or in my case, it becomes rarely accessed after a few minutes. With that said, whenever those images and videos are accessed, the application has to ensure they are immediately available in order to deliver the customer experience that these providers promise.

Fig# 28

From 27m: 00s

So to recap, the S3 Glacier Instant Retrieval storage class offers the lowest storage cost for long-lived data that needs to be immediately accessible. It is designed for the same millisecond access and high-throughput performance as the S3 Standard storage class and as the S3 Standard-Infrequent Access storage class. It is designed for three nines of availability and 11 nines of durability.

Fig# 29

From 27m: 29s

One of the things that Alex talked about earlier was that to optimize your storage cost, you want to use S3 lifecycle policies in combination with our storage classes, and this is really ideal for workloads that have predictable access patterns. And what that means is that you can identify a specific period in time when you should move that data from S3 Standard storage class to a less frequently accessed storage class when it becomes infrequently accessed or rarely accessed.

Fig# 30

From 28m: 01s

S3 lifecycle rules are configurations that you can apply to a bucket, to a prefix, or to a set of objects, and they are based on the object’s creation date. The idea is that as objects get older, access to the data decreases over time. So in this example, we’re configuring a lifecycle policy to move data from the S3 Standard storage class to S3 Glacier Instant Retrieval after 90 days. And what that means is that after 90 days, we expect that object to become rarely accessed. And finally, over time, we can transition that data all the way down to the S3 Glacier Deep Archive storage class for additional cost savings.

To provide you with another example of a customer application, top use cases would be medical images and broadcast content. So imagine an X-ray that I get at the doctor. After I get that X-ray, the technicians will make immediate use of it, but it’s very likely that it will become rarely accessed soon after it is used. It has to be immediately accessible for some time, but over time, it can be lifecycled safely to the S3 Glacier Deep Archive storage class.

Fig# 31

From 29m: 31s

And when we’re thinking about using S3 Lifecycle policies, we want to use them in the most cost-efficient way. Another thing that we announced earlier this week was new actions for S3 Lifecycle configurations. Specifically, you can define rules to move only the largest objects from S3 Standard, S3 Standard-Infrequent Access, or S3 Glacier Instant Retrieval down to S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive.

So we have a lot of customers that want to move only the largest media assets into the archive storage classes and reduce their lifecycle request costs. In addition, you can save further by controlling the number of noncurrent versions that you keep in a versioned bucket.
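The newer lifecycle actions just described can be sketched as a single rule in the shape boto3 expects: an object-size filter so only large objects transition, plus a cap on retained noncurrent versions. The prefix, size cutoff, and day counts are illustrative assumptions:

```python
# Hypothetical rule: archive only large media objects, and keep at most
# three noncurrent versions of each object.
rule = {
    "ID": "archive-large-media",
    "Status": "Enabled",
    "Filter": {
        "And": {
            "Prefix": "media/",
            "ObjectSizeGreaterThan": 5 * 1024 ** 2,  # only objects > 5 MB
        }
    },
    # 'DEEP_ARCHIVE' = S3 Glacier Deep Archive in the API
    "Transitions": [{"Days": 180, "StorageClass": "DEEP_ARCHIVE"}],
    "NoncurrentVersionExpiration": {
        "NewerNoncurrentVersions": 3,  # retain at most 3 noncurrent versions
        "NoncurrentDays": 30,
    },
}
```

Because small objects never transition under this rule, you avoid paying lifecycle transition requests on objects too small to benefit from archive pricing.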

Fig# 32

From 30m: 23s

So far, we’ve defined our workload requirements, we’ve organized our data in a way that makes it easy for us to apply different lifecycle rules and different controls to optimize storage cost, we’ve analyzed our data using S3 Storage Lens and S3 Storage Class Analysis to identify the right time when data becomes infrequently or even rarely accessed, and we’ve taken this information and we’ve applied it by choosing the right storage class that fits that access pattern.

Fig# 33

From 30m: 54s

A few takeaways: you want to use S3 Lifecycle policies and our storage class building blocks for data with known and predictable access patterns, and you want to fine-tune those lifecycle policies to optimize your lifecycle transition request costs. In addition, the new S3 Glacier Instant Retrieval storage class is ideal for data that you’re accessing once per quarter.

Fig# 34

From 31m: 20s

Next, we’re going to talk more about S3 Intelligent-Tiering and how you can benefit from automatic storage cost savings. In fact, since we launched the S3 Intelligent-Tiering storage class, we have passed on customer storage cost savings, now exceeding a total of $250 million.

Fig# 35

From 31m: 42s

And when we look at that number, we actually get really excited about it because we know that delivering that value to you means that you can build new user experiences and new products that help you grow your business. So let’s talk more about the Intelligent-Tiering storage class, what it is, and when exactly you should use it. Intelligent-Tiering is the only cloud storage that automatically optimizes storage costs at a granular object level when data access patterns change. And we’re excited to announce that with the new Archive Instant Access tier, Intelligent-Tiering now automatically optimizes storage costs across three access tiers.

And with the new Archive Instant Access tier, you save 68% on your storage cost relative to the Infrequent Access tier. With these changes, there’s absolutely no change in performance: the Frequent Access, Infrequent Access, and Archive Instant Access tiers all have the same performance. And there’s nothing that you have to do, so there is absolutely no operational overhead. And when your data access patterns change in the Intelligent-Tiering storage class, there are no retrieval fees.

Fig# 36

From 32m: 54s

So when should you use Intelligent-Tiering? Well, you should use Intelligent-Tiering if you have data with changing or unknown access patterns. But what does that mean? Oftentimes, I talk to a lot of customers that tell me, “I know that a dataset becomes infrequently accessed or can appear to be rarely accessed at specific points in time. With that said, we don’t know how access for that data will change in the future.”

To give you an example, we have a lot of customers that are using Intelligent-Tiering for different data platforms that are hosted over the internet. So imagine economic data or meteorological data that is made available over the internet to scientists and researchers. We just don’t know how data access is going to change in the future for these datasets. It could be driven by a worldwide event or something novel that is happening culturally.

Fig# 37

From 33m: 58s

So let’s talk a little bit more about the different use cases that have unknown or changing access patterns. We’ve talked about data lakes, we’ve talked about data analytics, which are the top use cases that we see customers using the Intelligent-Tiering storage class for.

And the reason these use cases are very popular with the Intelligent-Tiering storage class is because we have different users within an organization and different applications accessing a wide range of datasets at different rates. I talk to customers with large data lakes, where you could have different business units creating different ad hoc reports, creating different BI systems, all the way down to data science and machine learning, and we just don’t know what sets of data are going to be accessed and when they’re going to be accessed.

Fig# 38

From 34m: 55s

So before the Archive Instant Access tier, the way Intelligent-Tiering worked was that you paid a small per-object monitoring and automation charge, and in exchange, we keep an eye on access to every object that you store in the Intelligent-Tiering storage class. Any objects that are not accessed for 30 consecutive days are automatically moved to the Infrequent Access tier. And whenever access patterns change in the future, we simply move those objects back to the Frequent Access tier.

So Intelligent-Tiering will automatically move between Frequent and Infrequent Access based on your changing data access patterns.

Fig# 39

From 35m: 37s

And now, with the new Archive Instant Access tier, Intelligent-Tiering behaves in a very similar fashion. All objects transitioned or directly uploaded to Intelligent-Tiering are automatically stored in the Frequent Access tier.

After 30 consecutive days of no access, they’re automatically moved down to the Infrequent Access tier, where you get 40% cost savings relative to the Frequent Access tier.

And now, after 90 consecutive days of no access, you get 68% storage cost savings compared to the Infrequent Access tier. With this new change, I want to emphasize that there’s no change in performance. The Frequent, Infrequent, and Archive Instant Access tiers are all designed for millisecond access and high-throughput performance.

If you’ve taken a look at your Cost Explorer reports in the past couple of days, you should start seeing meaningful storage cost savings from data that has not been accessed for 90 consecutive days. Any objects that haven’t been accessed for 90 consecutive days will automatically start going down to the Archive Instant Access tier.
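The tiering behavior above can be summarized numerically. The base price here is normalized to 1.0 as a placeholder, not actual S3 pricing; the percentages are the 40% and 68% savings figures stated above:

```python
# Normalized Frequent Access tier price per GB-month (placeholder, not
# actual S3 pricing; only the ratios matter here).
BASE = 1.0

infrequent = BASE * (1 - 0.40)             # 40% below Frequent Access
archive_instant = infrequent * (1 - 0.68)  # 68% below Infrequent Access

def tier_price(days_since_access):
    """Relative price of the tier an object lands in after a given
    number of consecutive days without access."""
    if days_since_access >= 90:
        return archive_instant
    if days_since_access >= 30:
        return infrequent
    return BASE
```

So an object untouched for 90+ days costs roughly a fifth of its Frequent Access price in this sketch, with no change in retrieval latency.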

Fig# 40

From 37m: 06s

So why did we build the Archive Instant Access tier? A lot of our customers told us that they’re storing petabytes of data, accumulating petabytes of data that need to be immediately retrievable when they are needed. What this means is that they cannot use asynchronous access tiers to save on storage cost. And now with the new Archive Instant Access tier, you get the lowest storage cost for rarely accessed data with no impact on your performance.

Fig# 41

From 37m: 40s

I want to recap the changes with the new Archive Instant Access tier. First, it’s automatic. The new Archive Instant Access tier automatically optimizes rarely accessed data, and you get 68% lower storage cost for any data that is not accessed for 90 consecutive days. There’s no impact on performance, and there are absolutely no changes to the availability or durability of Intelligent-Tiering. Intelligent-Tiering is designed for three nines of availability and 11 nines of durability.

Fig# 42

From 38m: 15s

I want to shift gears and talk about other innovations that we have launched within the Intelligent-Tiering storage class that have passed on storage cost savings to our customers. In addition to the Archive Instant Access tier, if you have any objects that are smaller than 128 KB that are not eligible for auto-tiering, those objects are no longer monitored or charged.

In addition, if you delete data within 30 days, you will not incur early delete charges within the Intelligent-Tiering storage class. And the reason why we did this is because we have an increasing number of customers that are using Intelligent-Tiering as a default storage class, and they told us that they didn’t want the operational overhead of having to analyze their object size distribution or understand the life span of different objects at different times. And with these changes, customers can now use Intelligent-Tiering for virtually any workload.

Fig# 43

From 39m: 22s

And with the increasing number of customers using Intelligent-Tiering as their default storage class, I often get asked, “What is the most cost-effective way of storing data in Intelligent-Tiering?” I want to call attention to the fact that you can directly upload your data to the Intelligent-Tiering storage class. And the way that you do this is by specifying INTELLIGENT_TIERING in the PUT API request header, and this will guarantee that any newly uploaded objects will be directly stored in the Intelligent-Tiering storage class.
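As a sketch of a direct upload, here is the request shape: in the raw REST API, the storage class rides in the `x-amz-storage-class` header, and with boto3 it is the `StorageClass` parameter on `put_object`. The bucket and key names are hypothetical:

```python
# Hypothetical PUT request that lands the object directly in
# Intelligent-Tiering rather than S3 Standard.
put_params = {
    "Bucket": "my-data-lake",
    "Key": "raw/events/2021/12/01/events.json",
    "Body": b"{}",
    "StorageClass": "INTELLIGENT_TIERING",
}

# Equivalent raw REST header on the PUT Object request:
headers = {"x-amz-storage-class": put_params["StorageClass"]}

# With boto3: boto3.client("s3").put_object(**put_params)
```

Uploading directly avoids paying S3 Standard rates plus a lifecycle transition request before the object starts being monitored.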

Fig# 44

From 39m: 57s

So now let’s talk more about how to use Intelligent-Tiering in combination with your archiving strategy. What if you don’t need immediate access to your data, and it’s okay if that data is accessed within minutes to hours? For these use cases, you want to make use of the Archive Access tier and the Deep Archive Access tier. We built the asynchronous archive access tiers because many of our customers told us that there are subsets of data within their data lakes that are just not accessed by anybody for very long periods of time.

In the example above, data automatically moves to the Archive Access tier after six months of no access, where you save 72% on storage costs relative to the Infrequent Access tier. After a year of no access, the data automatically moves down to the Deep Archive Access tier. A good example: within organizations, you have business analysts and data scientists.

A lot of times, they’ll get a request and work on a downstream set of data, and after they complete that request, the data is often simply not accessed anymore. Many of these datasets may never be used again, and you can use the asynchronous archive access tiers to automatically move those idle datasets down to the Deep Archive Access tier.
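The tiering schedule described above can be sketched as a bucket-level Intelligent-Tiering configuration. The configuration Id below is a placeholder, and six months is approximated as 180 days; applying it would require AWS credentials:

```python
# Sketch: Intelligent-Tiering archive configuration matching the example above:
# Archive Access after ~6 months (180 days) of no access, then
# Deep Archive Access after 1 year (365 days) of no access.
# The configuration Id is hypothetical.

def archive_tiering_config(config_id="archive-after-six-months"):
    """Build the configuration body for the S3
    PutBucketIntelligentTieringConfiguration API."""
    return {
        "Id": config_id,
        "Status": "Enabled",
        "Tierings": [
            {"Days": 180, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 365, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    }

# Applied with boto3 roughly as:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_intelligent_tiering_configuration(
#       Bucket="my-bucket",
#       Id="archive-after-six-months",
#       IntelligentTieringConfiguration=archive_tiering_config(),
#   )
```

Each tiering entry is independent: you can enable just one of the two asynchronous tiers, and the day thresholds can be raised if your datasets stay warm for longer.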

Fig# 45

From 41m: 36s

If your application needs the data to be immediately accessible for longer periods of time, you don’t have to enable the Archive Access tier. In this example, the data that is rarely accessed goes to the Archive Instant Access tier and then after one year of no access, it goes down to the Deep Archive Access tier.

Fig# 46

From 41m: 58s

Highlighting this point, I want to call attention to the fact that the Archive Access and Deep Archive Access tiers are optional: you can choose one or both. The Archive Access tier is ideal for datasets that have not been accessed for a minimum of 90 days, and the Deep Archive Access tier requires a minimum of 180 days of no access; in both cases you can extend that window up to two years. So if your data goes unaccessed for those periods, you should consider using the Archive Access and Deep Archive Access tiers.

Fig# 47

From 42m: 57s

To support these asynchronous workflows, we also launched event notifications to let you know when data is automatically archived within the Intelligent-Tiering storage class. This was a top request from media and entertainment customers who manage databases that have to track which media assets are immediately accessible and which have to be restored.

Fig# 48
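As a hedged sketch of wiring up these notifications, the configuration below publishes Intelligent-Tiering archive events to an SQS queue. The queue ARN is a placeholder, and "s3:IntelligentTiering" is the archive-event type name as we understand it from the S3 event notification documentation:

```python
# Sketch: bucket notification configuration that publishes
# Intelligent-Tiering archive events to an SQS queue.
# The queue ARN is hypothetical; applying this needs AWS credentials
# and an SQS queue policy that allows S3 to send messages.

def archive_event_notification(queue_arn):
    """Build the configuration body for the S3
    PutBucketNotificationConfiguration API."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:IntelligentTiering"],
            }
        ]
    }

# Applied with boto3 roughly as:
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="my-bucket",
#       NotificationConfiguration=archive_event_notification(
#           "arn:aws:sqs:us-east-1:123456789012:archive-events"),
#   )
```

A downstream consumer can then update its asset catalog to mark which objects need a restore before they can be served.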

From 43m: 26s

So to sum up, the simplicity of Intelligent-Tiering really resonates with our customers. In its short history since 2018, we have launched different innovations to help you further optimize your storage costs. Just to name a few, we talked about the asynchronous access tiers that let you save money on datasets that are not accessed for very long periods of time.

We removed the minimum storage duration period from Intelligent-Tiering because the reality is that many of our customers have workflows with short-lived objects. And now we’re launching the Archive Instant Access tier because many of our customers need their data to be immediately accessible at all times.

Fig# 49

From 44m: 15s

I want to leave you with a few customer stories. It’s stories like these, which we hear on a daily and weekly basis, that we take back to the team and that get our engineering teams really excited. In this particular example, we have Stripe, a payments technology company used by customers ranging from startups to large enterprises.

For them, cost is important, and they’ve been able to save 30% month over month using the S3 Intelligent-Tiering storage class without any impact on performance.

Fig# 50

From 45m: 00s

And here we have another example, from Mobileye. Mobileye develops software for autonomous driving, and they use the Intelligent-Tiering storage class because many of their data analytics use cases have unpredictable and changing access patterns.

Fig# 51

From 45m: 22s

Another key use case I want to highlight is from Capital One, and it’s something I talk to customers about a lot. I hear from many customers who tell me, “I have a wide range of buckets, they’re all accessed at different rates by different users, and there’s no clear way for me to work out the ideal rules to define for each bucket.”

So in the case of Capital One, instead of analyzing the access patterns of each individual bucket, they used the Intelligent-Tiering storage class. This matters because reducing that operational overhead means you can focus on your core business rather than on optimizing your storage costs.

Fig# 52

From 46m: 14s

So, a few key takeaways: use Intelligent-Tiering for data that has unknown or changing access patterns. With the introduction of the new Archive Instant Access tier, I want to reiterate that there is no impact on performance, and those of you using Intelligent-Tiering today should start seeing substantial cost savings in your Cost Explorer reports. For data that isn’t accessed for a very long time, we recommend using the asynchronous archive access tiers.

This ensures that even as your data grows, your costs don’t necessarily grow in the same proportion. Finally, for customers who need to know which objects are immediately accessible and which are not, make use of the new event notifications for archive events.

Fig# 53
