microtica agents
For whoever's asking — and whoever gets asked

Don't file a ticket.
Ask the agent.

Devs ask. DevOps gets asked. Either way, someone waits. The agent investigates your AWS and Kubernetes the way a senior engineer would — and hands back the root cause, the evidence, and the fix in minutes.

Connects to AWS, Kubernetes, and CloudWatch
Case file #INC-4821
prod API latency
opened 14:02 · ECS · eu-west-1
Status: root cause found
Findings: 5 · 2 loose
Confidence
Root cause: deploy #482 exhausted the DB connection pool.
01 The difference

Ten tabs, or one case.

Without the agent · 02:14 · incident open
CloudWatch ECS EC2 RDS CloudTrail ALB Grafana +3
console.aws.amazon.com/cloudwatch/home#metricsV2
zsh — kubectl · aws
$ kubectl get pods -n prod
api-server-7f9d CrashLoopBackOff 14m
$ aws ecs describe-services --cluster prod
… 1,204 lines of JSON …
$ kubectl logs api-server-7f9d --tail=50
#incident-prod
you: looking into it…
pm: any update?? customers paging
10 tabs · kubectl · aws cli · CloudWatch · Slack — 47 min, still guessing
VS
With the agent · one ask · 2 min later
investigation · api-latency · Full depth
Why is the prod API slow since 2pm?
Investigating
Pull ALB p99 latency
Scan api-server logs
Diff recent deploys
Check RDS connections
Case · API latency root cause
13:58 Deploy #482 → api-server
14:01 p99 latency 180ms → 1.4s
14:02 DB pool exhausted in logs
Confidence — high

Fix: raise the DB connection pool or roll back deploy #482.

1 prompt · 1 case — root cause + fix
02 The relief

Production doesn't have to mean panic.

Whether you're the one asking or the one being asked, the bottleneck is the same: the answer lives in the cloud, and someone has to go dig. Now the agent digs.

Stop waiting in the queue.

Your app, your question. Ask directly and get a senior-level investigation back — no ticket, no waiting for whoever knows the infrastructure.

Stop being the queue.

It does the root-cause legwork a senior SRE would, so every “is prod ok?” doesn't have to land on you.

It can't make things worse.

Read-only by default. It gathers evidence and proposes a fix; nothing changes until you approve it.
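
To make "nothing changes until you approve it" concrete, here is a minimal sketch of an approve-before-act gate. It is an illustration only, not Microtica's implementation: `ProposedFix`, `Case`, and `run_fix` are hypothetical names, and the `apply` callback stands in for whatever would actually perform the change.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProposedFix:
    """A change the agent suggests but has not applied (hypothetical type)."""
    description: str
    apply: Callable[[], str]  # stand-in for the code that would make the change
    approved: bool = False    # read-only by default: nothing is pre-approved


@dataclass
class Case:
    """Evidence plus proposed fixes (hypothetical type)."""
    evidence: List[str] = field(default_factory=list)
    fixes: List[ProposedFix] = field(default_factory=list)

    def run_fix(self, index: int) -> str:
        fix = self.fixes[index]
        if not fix.approved:
            # Refuse to act until a human has explicitly approved this fix.
            raise PermissionError(f"fix not approved: {fix.description}")
        return fix.apply()


case = Case(evidence=["DB pool exhausted in logs"])
case.fixes.append(ProposedFix("roll back deploy #482", apply=lambda: "rolled back"))
```

Calling `case.run_fix(0)` raises `PermissionError` until someone sets `case.fixes[0].approved = True`; gathering evidence never goes through that path.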

Close the ten tabs. The case has what you need.

03 Your role

You're still the lead investigator.

The agent does the legwork and assembles the case. You read it, question it, and approve the fix — the call is always yours.

01 Agent does the legwork
It gathers evidence and assembles the findings — you skip the tab-hopping, not the thinking.
02 You decide & approve
Nothing changes until you say so. The diagnosis is a starting point, not an order.
03 Pull in a specialist
Bring an app-code or security agent into the same case when the problem crosses a line.
The case: 5 findings · 2 loose · moderate risk
Agent: does the legwork · App / code: joins if needed · Security: joins if needed · You: decide & approve
04 How it works

Ask, investigate, get a case.

STEP 1

Ask

Describe the problem in plain language. No query syntax, no dashboard-hopping.

STEP 2

Investigate

The agent plans the steps, runs read-only checks, and correlates metrics with logs, events, and deploys.
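
One way to picture the correlation step is as a time-window filter: given when a metric regressed, list the changes that landed shortly before it. The sketch below uses the times from the example case on this page. The function names and the 15-minute window are assumptions for illustration, not a description of how the agent works internally.

```python
from datetime import datetime, timedelta


def at(hhmm: str) -> datetime:
    """Parse a time of day; the date part is a parser default and is ignored."""
    return datetime.strptime(hhmm, "%H:%M")


def changes_before(anomaly_at, events, window=timedelta(minutes=15)):
    """Return labels of events that occurred within `window` before the anomaly."""
    return [
        label
        for ts, label in sorted(events)
        if timedelta(0) <= anomaly_at - ts <= window
    ]


# Timeline taken from the example case above.
events = [
    (at("13:58"), "deploy #482"),
    (at("14:02"), "DB pool exhausted in logs"),
]
latency_regression = at("14:01")

suspects = changes_before(latency_regression, events)
```

Here `suspects` contains only the deploy, because it is the one event that precedes the latency jump. The log line at 14:02 comes after the regression, so it counts as supporting evidence rather than a candidate cause.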

STEP 3

Get a case

Root-cause findings, an evidence timeline, a confidence read, and the fix — with your approval.

05 Capabilities

Built to investigate, not to chat.

Autonomous investigation

Ask once; the agent plans the checks, runs them, and adapts as evidence comes in.

The Case Board

A transparent case: evidence timeline, coverage, confidence balance, recommended fix.

Reads your real cloud

Given read access, it investigates any AWS service in your account — plus your Kubernetes clusters.
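
If you want to sanity-check what "read access" means before connecting anything, a rough heuristic is to flag any allowed IAM action whose verb does not start with `Get`, `List`, or `Describe`. This is a sketch under that assumption: the prefix list is deliberately incomplete (some read actions use other verbs), the function name is hypothetical, and this page does not say which policy Microtica requests. AWS also ships a managed `ReadOnlyAccess` policy.

```python
# Heuristic only: many read actions use other verbs (for example Batch*, Filter*).
READ_PREFIXES = ("Get", "List", "Describe")


def non_read_actions(policy: dict) -> list:
    """Return Allow-ed actions whose verb does not look read-only."""
    flagged = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear as an object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # a single action may appear as a string
            actions = [actions]
        for action in actions:
            verb = action.split(":", 1)[-1]  # "ecs:DescribeServices" -> "DescribeServices"
            if not verb.startswith(READ_PREFIXES):
                flagged.append(action)
    return flagged


# Illustrative policy document, not one supplied by Microtica.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["ecs:DescribeServices", "cloudwatch:GetMetricData"],
         "Resource": "*"},
        {"Effect": "Allow", "Action": "ecs:UpdateService", "Resource": "*"},
    ],
}

flagged = non_read_actions(policy)
```

In this example only `ecs:UpdateService` is flagged. A wildcard such as `s3:*` or `*` would be flagged too, since it grants more than read access.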

Triage or full depth

Pick a quick triage or a full investigation per run — you control how deep it digs, and what it costs.

Memory & skills

Persistent memory and custom skills, so it gets sharper about your infra over time.

Safe & collaborative

Read-only by default, approve-before-act, and shareable with the whole rotation.

06 Use cases

Whatever paged you, start by asking.

Cost spikes

$ Our AWS spend is up 30% this week — why?
The resource or deploy that changed, with the cost trail.

Crash-looping pods

$ The api-server pod in prod is crash-looping.
OOMKilled vs. exit code, the offending deploy, and the fix.
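
For a sense of what "OOMKilled vs. exit code" means in practice, the sketch below reads each container's last termination from pod JSON, using the status fields that `kubectl get pod -o json` returns. The sample pod is illustrative rather than output from a real cluster, and the function name is hypothetical.

```python
def last_terminations(pod: dict) -> list:
    """Summarize each container's most recent termination from pod JSON."""
    summary = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        term = cs.get("lastState", {}).get("terminated")
        if not term:
            continue  # this container has not terminated before
        reason = term.get("reason", "Unknown")
        # OOMKilled means the container was killed for exceeding its memory
        # limit; anything else points at the process's own exit code.
        kind = "out of memory" if reason == "OOMKilled" else "other"
        summary.append((cs["name"], kind, reason, term.get("exitCode")))
    return summary


# Illustrative pod status, trimmed to the fields used above.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "api-server",
                "restartCount": 6,
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            }
        ]
    }
}

result = last_terminations(sample_pod)
```

An `OOMKilled` reason usually shows up with exit code 137, which suggests looking at memory limits before reading application logs. Any other reason sends you to the exit code and the logs first.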

Fargate task failures

$ My ECS service won't stay healthy.
Task metrics, health-check config, IAM, and the cause.

Latency regressions

$ The API got slow around 2pm — what changed?
The metric/log/event correlation pinned to a time range.

Deploy-induced incidents

$ Did the last deploy break checkout?
The change correlated to the error spike.

Permission & connectivity

$ The service can't pull from ECR.
The exact IAM policy or security-group rule to fix.

Find the root cause before the next page.