AI | The AWS Blog

Bedrock managed entitlements make model access a platform control

Emiliano Montesdeoca — Tue, 30 Jun 2026 00:00:00 +0000

AI model access becomes messy as soon as an organization moves beyond one account and one team. Some models are available directly. Others require AWS Marketplace subscriptions. Workload accounts need access, but broad Marketplace permissions are rarely what security teams want.

The AWS Machine Learning Blog post on managed entitlements for Amazon Bedrock models is important because it turns model access into a platform-governance problem instead of an account-by-account chore.

What changed

Managed entitlements let a central account subscribe to supported third-party Bedrock models distributed through AWS Marketplace and share access with member accounts using AWS License Manager. Workload accounts can use the model access without needing direct AWS Marketplace subscription permissions.

This is especially useful for models from providers such as Anthropic, Cohere, AI21 Labs, or Stability AI when they are distributed through Marketplace and used across many accounts.

Why builders should care

A healthy multi-account AI platform needs two things at the same time:

teams can access approved models quickly,
the organization can govern subscriptions, pricing, visibility, and permissions centrally.

Without a central entitlement pattern, every workload account becomes a small procurement and governance island. That slows adoption and creates inconsistency. With managed entitlements, a platform team can subscribe once, distribute access intentionally, and keep workload accounts away from broad Marketplace permissions.

This also helps with private offers. If pricing and terms are negotiated centrally, model access should follow that central agreement rather than being recreated account by account.

The trade-offs

Managed entitlements are not needed for every Bedrock model. Amazon models and some partner models may already be available without Marketplace subscription overhead. Single-account teams may not need this complexity.

For larger organizations, the main design work is governance:

Who approves model subscriptions?
Which accounts receive grants?
How are Regions handled?
How are private offers tracked?
How is model use monitored against budget and policy?
What is the offboarding process when a model is no longer approved?

Access distribution is only one layer. Teams still need IAM permissions, guardrails, logging, evaluation, and cost controls around actual model invocation.

What to do next

Inventory current Bedrock model usage by account. Identify which models require Marketplace subscriptions and which accounts have Marketplace permissions only because they needed model access.
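
A minimal sketch of that inventory step, assuming per-account data has already been exported into plain records. The record fields (`models_used`, `has_marketplace_permissions`, `other_marketplace_products`), the account IDs, and the model names are all hypothetical placeholders, not values from any AWS API.

```python
from dataclasses import dataclass, field


@dataclass
class AccountRecord:
    # All fields are hypothetical; populate them from your own exports.
    account_id: str
    models_used: set
    has_marketplace_permissions: bool
    other_marketplace_products: set = field(default_factory=set)


def marketplace_models_in_use(accounts, marketplace_models):
    """Return {model_id: sorted account ids} for models that need a subscription."""
    usage = {}
    for acct in accounts:
        for model in acct.models_used & marketplace_models:
            usage.setdefault(model, []).append(acct.account_id)
    return {model: sorted(ids) for model, ids in usage.items()}


def permission_cleanup_candidates(accounts, marketplace_models):
    """Accounts whose only apparent reason for Marketplace permissions is model access."""
    return sorted(
        acct.account_id
        for acct in accounts
        if acct.has_marketplace_permissions
        and acct.models_used & marketplace_models
        and not acct.other_marketplace_products
    )


# Illustrative data only.
MARKETPLACE_MODELS = {"example-partner-model-a", "example-partner-model-b"}
ACCOUNTS = [
    AccountRecord("111111111111", {"example-partner-model-a"}, True),
    AccountRecord("222222222222", {"example-first-party-model"}, False),
    AccountRecord(
        "333333333333",
        {"example-partner-model-a", "example-partner-model-b"},
        True,
        {"some-saas-tool"},
    ),
]

if __name__ == "__main__":
    print(marketplace_models_in_use(ACCOUNTS, MARKETPLACE_MODELS))
    print(permission_cleanup_candidates(ACCOUNTS, MARKETPLACE_MODELS))
```

The second function is the more useful output for a security review: it lists accounts that could probably lose Marketplace permissions once access is shared from a central account.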

Then pilot managed entitlements with one approved third-party model and a small set of workload accounts. Validate the subscription flow, grant distribution, regional behavior, billing visibility, and access removal.

The practical takeaway is that AI platforms need the same governance maturity as any other shared platform capability. Managed entitlements give AWS organizations a cleaner control point for model access.

Lambda durable functions fit the messy middle of agent workflows

Emiliano Montesdeoca — Mon, 29 Jun 2026 00:00:00 +0000

Agent workflows are rarely a neat chain of fast API calls. They wait for humans, retry model calls, poll external systems, compensate for failed steps, and burn money when the same expensive operation runs twice.

The AWS Compute Blog post on building fault-tolerant multi-agent AI workflows with AWS Lambda durable functions is useful because it focuses on that messy middle. The healthcare prior authorization example is domain-specific, but the orchestration problem is common.

What changed

Lambda durable functions extend the Lambda programming model with checkpoint and replay operations. The source article highlights patterns such as:

context.step() for checkpointed work,
callback waits for human review or external completion signals,
condition waits for polling long-running systems,
replay behavior that skips completed durable operations,
execution metrics and status events for operational visibility.

That means a long-running workflow can pause without paying compute charges during the wait, then resume from the right point when a callback arrives or a condition changes.
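
To make the checkpoint-and-replay idea concrete, here is a toy simulation in plain Python. It is not the Lambda durable functions SDK: `ToyDurableContext`, its in-memory checkpoint store, and the `extract` and `review` steps are hypothetical stand-ins that only illustrate how a replay can skip steps that already completed.

```python
class ToyDurableContext:
    """Toy stand-in for a durable execution context. Not the Lambda SDK."""

    def __init__(self, checkpoints=None):
        # A real service would persist checkpoints durably; here it is a dict.
        self.checkpoints = {} if checkpoints is None else checkpoints

    def step(self, name, fn):
        # On replay, a completed step returns its stored result without rerunning.
        if name in self.checkpoints:
            return self.checkpoints[name]
        result = fn()
        self.checkpoints[name] = result
        return result


calls = []  # records which step bodies actually executed


def extract():
    calls.append("extract")
    return {"patient": "example"}


def make_flaky_review():
    attempts = {"n": 0}

    def review():
        calls.append("review")
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise TimeoutError("transient failure")
        return "approved"

    return review


def workflow(ctx, review):
    data = ctx.step("extract", extract)
    decision = ctx.step("review", review)
    return data, decision


def run_with_replay():
    store = {}
    review = make_flaky_review()
    try:
        workflow(ToyDurableContext(store), review)
    except TimeoutError:
        pass  # the first run fails after "extract" was checkpointed
    # Replay: a fresh context over the same checkpoint store.
    return workflow(ToyDurableContext(store), review)
```

In this sketch the replay reruns only `review`; `extract` is served from the checkpoint. The step names act as checkpoint keys, so they have to stay stable between runs.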

Why it matters for AI systems

Multi-agent workflows are expensive and non-deterministic. If an extraction agent, reasoning agent, or synthesis agent has already completed, you usually do not want a transient failure later in the flow to rerun everything from the start.

Durable checkpointing directly addresses that. It also makes human-in-the-loop patterns less awkward. A review step can suspend for hours or days without holding a running process open or building separate queue-and-database plumbing for every workflow.

This is not only an AI feature. It is orchestration for any process where work is valuable, waits are long, and retries must be controlled.

The architectural trade-off

The biggest benefit is also the thing to design carefully: workflow logic lives in code.

That can be cleaner than stitching together many small infrastructure pieces, but it requires discipline. Builders need deterministic step boundaries, idempotent external calls, clear timeout behavior, and replay-aware logging. If a step charges a credit card, submits a claim, sends an email, or updates a ticket, every retry must reuse the same idempotency key so the downstream system can recognize the duplicate instead of acting twice.
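
One way to keep a key stable across retries is to derive it from identifiers that do not change when a step reruns. The sketch below assumes a workflow ID and a step name are available; `FakeClaimsApi` is a hypothetical downstream system that deduplicates on the key, included only so the example runs on its own.

```python
import hashlib


def idempotency_key(workflow_id: str, step_name: str) -> str:
    """Derive a stable key from identifiers that do not change across retries."""
    raw = f"{workflow_id}:{step_name}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class FakeClaimsApi:
    """Hypothetical downstream system that deduplicates on the key."""

    def __init__(self):
        self.submissions = {}

    def submit(self, key: str, payload: dict) -> str:
        # A repeated key returns the original receipt instead of a new submission.
        if key not in self.submissions:
            self.submissions[key] = f"receipt-{len(self.submissions) + 1}"
        return self.submissions[key]


def submit_claim(api: FakeClaimsApi, workflow_id: str, payload: dict) -> str:
    key = idempotency_key(workflow_id, "submit_claim")
    return api.submit(key, payload)
```

A key built from a random value or a timestamp inside the step would defeat the purpose, because each retry would then look like a new request.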

You also need to decide when Step Functions is still a better fit. If the team benefits from visual state machines, service integrations, explicit workflow definitions, or non-developer operators inspecting the flow, Step Functions remains strong. Durable functions are attractive when the orchestration is code-heavy, developer-owned, and benefits from staying close to the Lambda handler.

What builders should do next

Do not start by converting every workflow. Start with one painful process that has three properties:

expensive completed steps that should not repeat,
long waits for humans or external systems,
retry behavior that is currently implemented with custom glue.

Then design the durable function around failure cases first. What happens if an agent times out? If the human rejects the result? If the external API accepts a submission but the response is lost? If the workflow exceeds the business deadline?

The source example uses prior authorization, but the pattern applies to code review agents, document processing, procurement approvals, incident remediation, and migration assessment pipelines.

The practical takeaway: durable functions are not just about making Lambda run longer. They are about making long-running workflows resumable, observable, and cheaper to wait on.

Running pgvector on Aurora is a production operations decision

Emiliano Montesdeoca — Thu, 25 Jun 2026 00:00:00 +0000

It is easy to prototype vector search. It is harder to operate it after users, documents, embeddings, and retrieval patterns start changing every day.

The AWS Database Blog post on running pgvector in production on Amazon Aurora PostgreSQL is useful because it moves the conversation away from “can I store embeddings?” and toward “can I keep this retrieval system healthy?”

What changed

The source article covers operational practices for pgvector workloads on Aurora PostgreSQL: choosing index types and distance functions, managing HNSW behavior, using quantization and partitioning, sizing memory, and monitoring the signals that show when the vector store is drifting out of shape.

That is the right level of discussion for production RAG systems. The database is not just a place to put vectors. It is part of the user-facing latency, relevance, and cost profile.

Why builders should care

Aurora PostgreSQL with pgvector is attractive because many teams already understand PostgreSQL. They can keep relational data, metadata filters, access patterns, and embeddings close together. That reduces architecture sprawl for early and mid-sized AI applications.

But familiarity can hide risk. Vector indexes have different maintenance behavior than normal B-tree indexes. Embedding dimensions affect memory. Update and delete patterns can degrade index quality. Query filters can change recall and latency. The database may need to serve both transactional and retrieval traffic.

If you treat pgvector like a small column type, production will teach you otherwise.
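
A back-of-the-envelope sketch of how dimensions drive storage. It uses the per-vector sizes given in the pgvector documentation (4 bytes per dimension plus 8 bytes for `vector`, 2 bytes per dimension plus 8 for `halfvec`) and counts raw vector values only. Index structures, row overhead, and working memory come on top, so treat the result as a lower bound. The vector counts and the 1536-dimension figure are illustrative assumptions.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_dim: int = 4) -> int:
    """Raw storage for vector values only: bytes_per_dim * dimensions + 8 per vector."""
    return num_vectors * (bytes_per_dim * dimensions + 8)


def to_gib(num_bytes: int) -> float:
    return num_bytes / (1024 ** 3)


if __name__ == "__main__":
    for count in (10_000, 5_000_000):
        full = raw_vector_bytes(count, 1536)
        half = raw_vector_bytes(count, 1536, bytes_per_dim=2)
        print(f"{count:>9} vectors: {to_gib(full):.2f} GiB full, {to_gib(half):.2f} GiB half")
```

Under these assumptions a 10,000-vector prototype is a rounding error, while a few million vectors already reach tens of GiB before any index is built.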

The trade-offs

The main decision is managed abstraction versus self-managed control.

Aurora PostgreSQL with pgvector gives control over schema, SQL, transactions, and tuning. That is valuable when retrieval is tightly coupled to application data. Amazon Bedrock Knowledge Bases or other managed retrieval systems reduce operational burden, which can be better when the team does not need direct database-level control.

There is no universal winner. Choose pgvector on Aurora when PostgreSQL integration is a real product advantage. Choose a more managed path when the team mostly wants ingestion, embedding, retrieval, and scaling handled for them.

What to do next

Before putting pgvector-backed retrieval into production, define operational checks:

index type and distance metric per use case,
expected vector count and growth rate,
memory needed to keep hot indexes cached,
update and deletion behavior,
query latency percentiles under realistic filters,
recall evaluation for representative prompts,
vacuum and maintenance expectations,
fallback behavior when retrieval fails or gets slow.
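
The recall check in the list above can be sketched without a database: compute exact neighbors by brute force on a sample, then compare them with the IDs your approximate index returned for the same query. The vectors, IDs, and the `approx` result below are made-up illustrations, and the distance function assumes non-zero vectors.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k nearest vectors by cosine distance."""
    ranked = sorted(vectors, key=lambda vid: cosine_distance(query, vectors[vid]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Illustrative sample: four 2-dimensional vectors keyed by id.
VECTORS = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
QUERY = [1.0, 0.0]

if __name__ == "__main__":
    truth = exact_top_k(QUERY, VECTORS, k=2)
    approx = ["a", "c"]  # pretend this came back from the index
    print(truth, recall_at_k(approx, truth))
```

Running the same comparison with the metadata filters used in production is the part that matters most, since filters are where recall tends to shift.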

Also separate prototype metrics from production metrics. A demo with 10,000 documents says little about a system with millions of vectors, concurrent users, and evolving embeddings.

The practical takeaway is simple: pgvector on Aurora can be a strong architecture choice, but only if the team is ready to operate vector search as a database workload, not as a model configuration checkbox.

Lambda MicroVMs make isolated sandboxes a serverless design choice

Emiliano Montesdeoca — Mon, 22 Jun 2026 00:00:00 +0000

AWS Lambda MicroVMs are interesting because they do not try to replace normal Lambda functions. They fill a different gap: workloads where the unit of isolation is not an event, but a user session, coding environment, agent run, scanner job, or other stateful sandbox.

The AWS announcement frames this around isolated sandboxes with full lifecycle control. That is the right framing. The practical value is not only that Firecracker provides VM-level isolation. It is that AWS is exposing a managed way to create, pause, resume, and retire those environments without asking every product team to become a virtualization platform team.

What changed

Lambda MicroVMs add a serverless compute primitive inside the Lambda family for running code in isolated, stateful execution environments. The source article describes several important properties:

each session can run in its own Firecracker-backed MicroVM,
environments can launch and resume from pre-initialized snapshots,
memory, disk, and running process state can survive during the session,
idle environments can be suspended by policy,
applications get a dedicated endpoint and short-lived request authentication.

That combination matters for applications that cannot fit cleanly into stateless request-response functions. A code interpreter, browser automation sandbox, vulnerability scanner, AI coding assistant, data notebook, or game scripting environment often needs process state and filesystem state between interactions.

Why builders should care

The old decision tree was uncomfortable. Virtual machines gave strong isolation but slow startup and more operations. Containers started quickly but shared a kernel, which raises the bar for safely running untrusted code. Lambda functions were operationally simple but not designed for long-running interactive state.

Lambda MicroVMs create a new middle path. For builders, the design conversation becomes more precise:

Use Lambda functions for event handlers and short stateless tasks.
Use containers when you need packaging flexibility and can manage isolation risk.
Use Lambda MicroVMs when each tenant, user, or agent run needs a dedicated stateful sandbox.

This is especially relevant for AI systems. As more applications let agents write code, execute tools, inspect repositories, or process customer files, isolation becomes part of the product boundary. A prompt injection bug should not become a cross-tenant file access bug.

The trade-offs are still real

MicroVMs reduce a lot of infrastructure work, but they do not remove architecture responsibility.

First, lifecycle policy becomes a cost control. If idle sessions stay warm too long, the bill can drift. If they suspend too aggressively, users feel resume latency. Teams should treat idle duration as a product setting, not a default copied from a sample.
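
A rough way to reason about that setting is to replay observed gaps between user interactions against a candidate timeout. The model below is a deliberate simplification and an assumption about behavior, not a description of how Lambda MicroVMs suspend or bill: a session stays warm through a gap until the timeout elapses, then suspends, and the next interaction counts as one resume.

```python
def idle_policy_summary(gaps_seconds, idle_timeout_seconds):
    """Summarize warm idle time and resume count for one session.

    gaps_seconds: idle gaps between consecutive user interactions.
    """
    warm_idle = 0
    resumes = 0
    for gap in gaps_seconds:
        if gap > idle_timeout_seconds:
            # Warm until the timeout, then suspended; the next request resumes.
            warm_idle += idle_timeout_seconds
            resumes += 1
        else:
            warm_idle += gap
    return {"warm_idle_seconds": warm_idle, "resumes": resumes}


if __name__ == "__main__":
    observed_gaps = [30, 600, 45, 3600]  # illustrative gaps, in seconds
    for timeout in (60, 900):
        print(timeout, idle_policy_summary(observed_gaps, timeout))
```

Feeding real session traces through a function like this turns the timeout into a measurable trade-off between warm idle time and how often users hit a resume.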

Second, snapshot-based startup changes how applications initialize. Code that generates unique state, opens long-lived external connections, or assumes initialization happens once per user action needs careful review.
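
The unique-state problem can be illustrated with a toy model in which a snapshot is simply a copy of process state. `SandboxState`, `on_resume`, and `restore` are hypothetical names for illustration, not part of any Lambda API.

```python
import copy
import uuid


class SandboxState:
    """Toy model of process state captured in a snapshot. Purely illustrative."""

    def __init__(self):
        # Generated during initialization, so it is baked into the snapshot.
        self.init_time_id = uuid.uuid4().hex
        self.session_id = None

    def on_resume(self):
        # Hypothetical hook: regenerate anything that must be unique per session.
        self.session_id = uuid.uuid4().hex


def restore(snapshot):
    """Simulate restoring an environment from a snapshot by copying its state."""
    env = copy.deepcopy(snapshot)
    env.on_resume()
    return env


snapshot = SandboxState()
env_a = restore(snapshot)
env_b = restore(snapshot)

if __name__ == "__main__":
    print(env_a.init_time_id == env_b.init_time_id)  # shared, from the snapshot
    print(env_a.session_id == env_b.session_id)      # regenerated per restore
```

Anything created before the snapshot is shared by every environment restored from it, which is why identifiers, random seeds, and connections usually belong after resume rather than in initialization.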

Third, stateful sandboxes need cleanup rules. Temporary files, credentials, downloaded packages, generated artifacts, and logs can accumulate. Builders should define what survives during a session, what is exported, and what is destroyed.

Finally, security does not stop at VM boundaries. The execution role, outbound network policy, source artifact pipeline, token handling, and tenant mapping are still part of the isolation story.

What to do next

I would start with workloads where the current workaround is obviously expensive: per-user EC2 sandboxes, over-hardened container runners, or Lambda workflows full of awkward /tmp and rehydration logic.

For a proof of concept, validate four things before celebrating:

Cold launch and resume behavior with your real image size and dependencies.
Idle cost profile for normal user behavior, not a synthetic benchmark.
Tenant boundary tests for filesystem, process, network, and IAM access.
Failure cleanup when a session crashes, times out, or is abandoned.

Lambda MicroVMs are not just another Lambda feature. They are AWS acknowledging that the next wave of serverless workloads includes interactive, stateful, sometimes untrusted execution. That is a useful primitive, as long as teams treat it as an isolation architecture rather than a shortcut around security design.