S3 | The AWS Blog

Replicating S3 bucket configuration needs workflow discipline

Emiliano Montesdeoca — Tue, 30 Jun 2026 00:00:00 +0000

Replicating S3 data is only part of a multi-Region storage strategy. The bucket configuration around that data is often where drift hides.

The AWS Storage Blog post on replicating Amazon S3 bucket configurations across AWS Regions with AWS Step Functions shows an automation pattern for replaying bucket configuration into a target Region with an auditable workflow.

That is useful, but it also raises an important architecture question: should configuration replication be a workflow, or should it be infrastructure as code?

What changed

The source article describes a Step Functions and Lambda solution that creates a bucket in a target Region and applies configuration from a source bucket. It logs runs to DynamoDB and CloudWatch, which gives operators an audit trail.

This kind of workflow can help when teams need to replicate settings such as encryption, lifecycle, versioning, event notifications, tags, or other operational configuration across Regions.

Why builders should care

Disaster recovery plans often assume that a bucket in another Region is ready because object replication is configured. But during a real failover, missing bucket configuration can break applications or weaken controls.

Examples:

lifecycle rules are missing and costs grow,
event notifications do not trigger downstream processing,
encryption or bucket policy differs from the primary Region,
observability tags are absent,
access points or integration settings are inconsistent.

A repeatable replication workflow can turn those assumptions into something testable.

The trade-off with IaC

For stable environments, infrastructure as code should usually be the source of truth. If the bucket configuration is defined in CDK, CloudFormation, Terraform, or Pulumi, the cleanest replication path is often to deploy the same definition to another Region.

A workflow-based replication tool is valuable when:

buckets already exist and need operational synchronization,
configuration is discovered from a source environment,
teams need an emergency or transitional DR path,
there are many legacy buckets not yet under IaC,
operators need a controlled copy action with audit logs.

The risk is creating a second source of truth. If IaC says one thing and the replication workflow copies another, drift becomes harder to reason about.

What to do next

Before using this pattern, classify buckets into two groups:

IaC-owned buckets where configuration should be generated from code.
Operationally managed buckets where a replication workflow can reduce drift until IaC ownership exists.
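
One way to make that classification mechanical is to key it off a bucket tag. The sketch below assumes bucket names and tags have already been collected into plain Python objects; the tag key "managed-by" and its accepted values are hypothetical conventions, not something the source article prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: IaC-managed buckets carry a tag such as "managed-by".
# Both the tag key and the accepted values are hypothetical.
IAC_TAG_KEY = "managed-by"
IAC_TAG_VALUES = {"cdk", "cloudformation", "terraform", "pulumi"}


@dataclass
class Bucket:
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def classify(buckets: list[Bucket]) -> dict[str, list[str]]:
    """Split bucket names into IaC-owned and operationally managed groups."""
    groups: dict[str, list[str]] = {"iac_owned": [], "operational": []}
    for bucket in buckets:
        owner = bucket.tags.get(IAC_TAG_KEY, "").lower()
        key = "iac_owned" if owner in IAC_TAG_VALUES else "operational"
        groups[key].append(bucket.name)
    return groups
```

Untagged buckets fall into the operational group by default, which keeps legacy buckets visible until someone claims them in code.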

Then run regular DR validation. Do not only check that the target bucket exists. Check whether the target bucket has the policies, notifications, lifecycle rules, encryption settings, tags, and observability hooks needed for the application to run.
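
A DR validation check of this kind can be sketched as a comparison of two configuration snapshots. The example assumes the source and target settings have already been fetched into dictionaries; the setting names are illustrative labels, not API field names.

```python
from __future__ import annotations

from typing import Any

# Hypothetical setting names. In practice each value would come from the
# matching S3 "get bucket ..." call for the source and target buckets.
CHECKED_SETTINGS = (
    "encryption", "versioning", "lifecycle", "notifications", "policy", "tags",
)


def diff_bucket_config(source: dict[str, Any], target: dict[str, Any]) -> dict[str, str]:
    """Return {setting: problem} for every checked setting that differs."""
    problems: dict[str, str] = {}
    for setting in CHECKED_SETTINGS:
        if setting not in source:
            continue  # nothing to replicate for this setting
        if setting not in target:
            problems[setting] = "missing in target"
        elif source[setting] != target[setting]:
            problems[setting] = "differs from source"
    return problems
```

A raw equality check like this will also flag values that legitimately differ between Regions, such as ARNs inside policies or notification targets. Normalizing those before comparison is left out of the sketch.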

The useful takeaway is that S3 resilience is not just object replication. It is also configuration repeatability. Step Functions can provide a controlled workflow for that, as long as builders are clear about the source of truth.

Faster S3 access log queries make storage security more usable

Emiliano Montesdeoca — Mon, 29 Jun 2026 00:00:00 +0000

S3 access logs are valuable, but value delayed often becomes value ignored. If logs are difficult to query, teams use them only after something has already gone wrong.

The AWS Storage Blog post on querying Amazon S3 access logs instantly with CloudWatch and S3 Tables is interesting because it focuses on making access data easier to use in everyday operations.

What changed

The article walks through delivering S3 access log data to CloudWatch Logs and S3 Tables, then using those destinations for dashboards, alarms, and queries. The practical shift is from “logs exist somewhere” to “logs are queryable enough to support investigation and monitoring.”

For builders, that means S3 access patterns can become part of normal observability instead of a separate forensic workflow.

Why it matters

S3 is often where the most important data lives. Application logs, analytics exports, customer files, model artifacts, backups, and operational reports all end up in buckets. Yet many teams still monitor bucket configuration more carefully than bucket access behavior.

Queryable access logs help answer questions that matter:

Which principals are reading sensitive prefixes?
Did access spike after a deployment?
Are clients getting unexpected 403 or 404 responses?
Which workloads are driving request cost?
Did a suspicious IP enumerate objects?
Does real traffic match the lifecycle and replication assumptions?

When these questions are cheap to answer, teams ask them earlier.

The trade-offs

More logging is not free. S3 access log delivery, CloudWatch ingestion, query volume, table storage, retention, and dashboards all have cost implications. The right design depends on the bucket’s risk and traffic profile.

I would not send every low-value development bucket into a high-retention analytics pipeline. I would prioritize buckets that contain customer data, security logs, production artifacts, financial exports, backups, or AI training and retrieval data.

Also, access logs are only one layer. They should complement CloudTrail data events, IAM Access Analyzer, Macie, GuardDuty, S3 Storage Lens, and application-level audit logs. Different signals answer different questions.

What to do next

Pick a production bucket that matters and define three operational queries before building anything. For example:

top readers by prefix over the last hour,
denied requests by principal and source IP,
unusual data transfer volume compared with baseline.
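
To make the first two queries concrete before any pipeline exists, they can be prototyped against a handful of raw log lines. This sketch parses the classic space-delimited S3 server access log text, which is a different route from the CloudWatch Logs and S3 Tables queries the source post describes. It reads only the leading fields, and the time window ("over the last hour") is left to the caller.

```python
import re
from collections import Counter

# Leading fields of an S3 server access log record, in documented order.
# Everything after the HTTP status is ignored here.
LOG_PATTERN = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'(?P<request_uri>"[^"]*"|-) (?P<status>\S+)'
)


def parse(lines):
    """Yield one dict per line that matches the expected layout."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()


def denied_by_principal_and_ip(records):
    """Count 403 responses per (requester, source IP) pair."""
    return Counter(
        (r["requester"], r["ip"]) for r in records if r["status"] == "403"
    )


def top_readers_by_prefix(records, depth=1):
    """Count GET object requests per (key prefix, requester) pair."""
    counts = Counter()
    for r in records:
        if r["operation"] == "REST.GET.OBJECT":
            folders = r["key"].split("/")[:-1]  # drop the object name
            prefix = "/".join(folders[:depth]) or "(root)"
            counts[(prefix, r["requester"])] += 1
    return counts
```

Calling `.most_common(10)` on either result gives a ranked list, which is usually enough to decide whether the query deserves a dashboard panel.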

Then build the logging path and dashboard around those questions. Add alarms only where the signal is actionable; noisy storage alarms quickly become invisible.
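
For the baseline comparison, a deliberately simple rule is often a reasonable starting point. The sketch below flags a window only when its transfer volume exceeds a multiple of the median of earlier windows; the factor of 3 is an arbitrary placeholder to tune per bucket, not a recommended value.

```python
from __future__ import annotations

from statistics import median


def is_unusual(current_bytes: int, baseline_bytes: list[int], factor: float = 3.0) -> bool:
    """Flag the current window when it exceeds factor x the baseline median."""
    if not baseline_bytes:
        return False  # no baseline yet, so stay quiet rather than alarm
    return current_bytes > factor * median(baseline_bytes)
```

Using the median rather than the mean keeps one earlier spike from raising the threshold for every later window.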

The main takeaway is that storage observability should be designed for regular use, not just incident response. Faster S3 access log queries make that much more realistic.

S3 Storage Lens groups make storage cost conversations less generic

Emiliano Montesdeoca — Fri, 26 Jun 2026 00:00:00 +0000

Storage optimization advice often starts too broadly: reduce old data, review access patterns, apply lifecycle policies. That is true, but it is not specific enough for a team to act.

The AWS Storage Blog post on S3 Storage Lens groups is useful because it shows how to group storage by workload-specific criteria instead of looking only at account or bucket totals.

What changed

S3 Storage Lens groups let builders define custom groupings and analyze metrics for targeted slices of S3 data. The source article uses examples such as older application logs and aging image files across multiple buckets and accounts.

That matters because the useful unit of storage ownership is rarely just a bucket. It may be an application, dataset, tenant, media type, compliance class, or product feature.

Why builders should care

S3 cost and hygiene problems are usually ownership problems.

A central cloud team can see that storage is growing. They may not know which logs are safe to expire, which images are user-generated content, which datasets are legally retained, or which prefixes belong to an old migration. Application teams know the context, but they often lack cross-account visibility.

Storage Lens groups can bridge that gap. They help create a view that says, “this workload has 40 TB of logs older than 180 days,” not just “this account has 400 TB in S3.”

That turns a generic optimization request into a practical backlog item.

The trade-offs

Custom groups are only useful if the grouping logic reflects real ownership and lifecycle rules. If tags, prefixes, naming conventions, or account boundaries are inconsistent, the dashboard can give false confidence.

Teams should avoid building too many groups at once. Start with questions that lead to action:

Which data can move to a colder tier?
Which prefixes have no recent access?
Which workloads are growing faster than expected?
Which buckets contain small-object patterns that inflate request costs?
Which teams own the largest retained datasets?

Also remember that visibility is not enforcement. Storage Lens can show the opportunity, but lifecycle rules, retention policies, and application changes implement the fix.

What to do next

Pick one high-cost storage area and define a group around the way the business thinks about it. For example: production application logs older than 90 days, media originals by product, or export datasets by tenant.
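
Before creating a group, it can help to dry-run the grouping logic against a sample object listing. The sketch below writes a filter for "production application logs older than 90 days" and evaluates it locally. The key names mirror the Storage Lens group filter shape as I understand it and should be checked against the API reference; the prefix and suffix values are hypothetical, and the evaluator is a local approximation rather than how Storage Lens computes its metrics.

```python
from dataclasses import dataclass

# Assumed filter shape; prefix and suffix values are hypothetical.
LOG_GROUP_FILTER = {
    "And": {
        "MatchAnyPrefix": ["prod/app-logs/"],
        "MatchAnySuffix": [".log", ".log.gz"],
        "MatchObjectAge": {"DaysGreaterThan": 90},
    }
}


@dataclass
class ObjectInfo:
    key: str
    age_days: int
    size_bytes: int


def matches(obj: ObjectInfo, clause: dict) -> bool:
    """Evaluate one AND clause locally: every condition present must hold."""
    prefixes = clause.get("MatchAnyPrefix")
    if prefixes and not any(obj.key.startswith(p) for p in prefixes):
        return False
    suffixes = clause.get("MatchAnySuffix")
    if suffixes and not any(obj.key.endswith(s) for s in suffixes):
        return False
    age = clause.get("MatchObjectAge", {})
    if "DaysGreaterThan" in age and not obj.age_days > age["DaysGreaterThan"]:
        return False
    if "DaysLessThan" in age and not obj.age_days < age["DaysLessThan"]:
        return False
    return True


def grouped_bytes(objects, clause) -> int:
    """Total size of the objects that fall inside the group."""
    return sum(o.size_bytes for o in objects if matches(o, clause))
```

If the dry run pulls in objects the owning team does not recognize, the naming convention needs fixing before the dashboard can be trusted.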

Then review the metrics with the owning team and decide on one action: lifecycle transition, deletion after retention, object compaction, prefix redesign, or access pattern review.

The practical value of Storage Lens groups is not better charts. It is better conversations between platform, finance, security, and application teams about the specific data that is driving cost and risk.

S3 Files makes Lambda file workflows simpler, but not automatically better

Emiliano Montesdeoca — Wed, 24 Jun 2026 00:00:00 +0000

A lot of Lambda code that works with S3 is not complicated because the business logic is complicated. It is complicated because the function has to download an object, manage /tmp, process the file, upload the result, and clean up after itself.

The AWS Compute Blog post on modernizing Lambda and S3 workloads with Amazon S3 Files shows a different model: mount an S3-backed file system and let the function use normal file paths.

That sounds small. For many workloads, it changes the shape of the code.

What changed

S3 Files lets a Lambda function mount an S3 bucket as a file system at a path such as /mnt/data. Code can open, read, write, list, and organize files using filesystem operations while S3 remains the durable backing store.

The source article uses examples such as image processing, ETL pipelines, and multi-agent shared workspaces. In each case, the function moves away from explicit S3 transfer code and toward direct file I/O.

That removes a surprising amount of glue:

no manual download before processing,
no upload step after writing output,
less /tmp capacity management,
fewer cleanup paths for failed invocations,
simpler handoffs between functions that share a workspace.

Why builders should care

The strongest use case is modernization of existing file-oriented libraries. Many image, document, ML, and data-processing libraries expect file paths. Without S3 Files, Lambda code often has to adapt object storage into temporary local files before real work can start.

S3 Files lets builders keep file-based code and still use S3 as the storage layer. That can make Lambda more attractive for workloads that previously moved to containers only because file handling became awkward.

The shared workspace pattern is also interesting for AI agents. If multiple Lambda functions collaborate on a session, a directory tree can be easier to reason about than a collection of object keys and serialized state blobs.

The trade-offs

I would not replace every S3 API call with file I/O.

Object APIs are still excellent when you want explicit object boundaries, event-driven processing, presigned URLs, lifecycle policies, replication, and direct control over metadata. File systems are easier for some code, but they can also hide transfer and consistency behavior behind familiar syntax.

Builders should validate:

throughput for large files,
behavior under high concurrency,
consistency expectations between producers and consumers,
VPC and mount target requirements,
IAM permissions for mount and write operations,
failure behavior when a function exits while writing.

Also remember that simpler code is not the same as simpler operations. If many functions share a writable workspace, you need naming conventions, cleanup policies, and safeguards against accidental overwrite.
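
A naming convention and an overwrite guard can be sketched with ordinary file operations. The directory layout below (sessions, then session ID, then writer name) is a hypothetical convention. Exclusive-create mode behaves this way on a local file system; whether an S3-backed mount honors it under concurrent writers is an assumption to validate.

```python
from pathlib import Path


def workspace_path(root: Path, session_id: str, writer: str, name: str) -> Path:
    """Build <root>/sessions/<session_id>/<writer>/<name>, rejecting path escapes."""
    for part in (session_id, writer, name):
        if not part or "/" in part or part in {".", ".."}:
            raise ValueError(f"unsafe path component: {part!r}")
    return root / "sessions" / session_id / writer / name


def write_once(path: Path, data: bytes) -> None:
    """Create the file, raising FileExistsError if it is already there."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "xb") as handle:  # "x" = exclusive create
        handle.write(data)
```

Giving each writer its own subdirectory means two functions in the same session never target the same path by accident, and cleanup can remove a whole session directory at once.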

What to do next

Look for Lambda functions where more code is spent on S3 transfers than on the actual business logic. Image resizing, report generation, data conversion, document processing, and agent workspaces are good candidates.

For each candidate, compare two versions:

current S3 API flow with /tmp,
S3 Files flow with mounted paths.

Measure duration, memory, error handling complexity, and cost. The winning design may be different per workload.
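
The two versions might look roughly like this. It is a minimal sketch: `transform` is a stand-in for the real business logic, `s3` is assumed to be a boto3 S3 client (only `download_file` and `upload_file` are used), and the mounted version assumes the bucket is already available at a path such as /mnt/data.

```python
from pathlib import Path


def transform(src: Path, dst: Path) -> None:
    """Stand-in business logic: upper-case a text file."""
    dst.write_text(src.read_text().upper())


def run_with_object_api(s3, bucket: str, key: str, out_key: str,
                        tmp_dir: str = "/tmp") -> None:
    """Version 1: download to local scratch, process, upload, clean up."""
    src = Path(tmp_dir) / "input"
    dst = Path(tmp_dir) / "output"
    try:
        s3.download_file(bucket, key, str(src))
        transform(src, dst)
        s3.upload_file(str(dst), bucket, out_key)
    finally:
        # Scratch space can persist across warm invocations, so remove both files.
        for path in (src, dst):
            path.unlink(missing_ok=True)


def run_with_mount(mount_root: str, rel_in: str, rel_out: str) -> None:
    """Version 2: read and write through the mounted path directly."""
    root = Path(mount_root)
    out_path = root / rel_out
    out_path.parent.mkdir(parents=True, exist_ok=True)
    transform(root / rel_in, out_path)
```

The second function is shorter mainly because transfer and cleanup have moved out of the handler. They still happen somewhere, which is why the measurements matter.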

The useful takeaway is not that file systems are better than object APIs. It is that Lambda now has a cleaner option when the workload is naturally file-shaped.