AWS Cost Alerting
Plan Metadata
- Plan type: plan
- Parent plan: N/A
- Depends on: N/A
- Status: documentation
// TODO @blaine validate this please
System Intent
- What is being built: Contract for infrastructure-level budget alerts, EC2 state notifications, scheduled cost reporting to Discord, and on-demand cost/status reports via Discord slash commands.
- Primary consumer(s): Platform operators and finance/ops visibility workflows.
- Boundary (black-box scope only): AWS Budgets/SNS/EventBridge/Lambda deployment surfaces, IAM/runtime wiring, SSM-backed Discord configuration, and Discord webhook/interaction delivery behavior.
Stage Gate Tracker
- [x] Stage 1 Mermaid approved
- [x] Stage 2 I/O contracts approved
- [x] Stage 3 acceptance criteria and plan-first tests approved
1. Mermaid Diagram
Reference: .agent/skills/create-mermaid-diagram/SKILL.md
```mermaid
flowchart TD
H[Interaction Build Script — main/devops/lambdas/discord_interaction/build.sh] -->|produces discord_interaction_pkg artifact for terraform archive| A[Cost Alerting Infrastructure Wiring — main/devops/cost_alerts.tf]
A -->|provisions alerts lambda wiring and ec2 budget trigger routes| C[Discord Alerts Handler — main/devops/lambdas/discord_alerts/handler.py]
A -->|provisions report lambda wiring and daily schedule routes| D[Daily Cost Report Handler — main/devops/lambdas/discord_alerts/cost_report.py]
A -->|provisions interaction lambda and function URL output| E[Discord Interaction Handler — main/devops/lambdas/discord_interaction/handler.py]
A -->|defines required SSM parameter names| F[Discord Config Wiring — main/devops/cost_alerts.tf]
F -->|DISCORD_WEBHOOK_URL env value| C
F -->|DISCORD_WEBHOOK_URL env value| D
F -->|DISCORD_APP_ID and DISCORD_PUBLIC_KEY env values| E
D -->|aws cost and instance query request| G[Shared AWS Report Builder — main/devops/lambdas/discord_alerts/aws_report.py]
E -->|slash command report query request| G
classDef unchanged fill:#d3d3d3,stroke:#666,stroke-width:1px;
classDef updated fill:#ffe58a,stroke:#666,stroke-width:1px;
classDef deleted fill:#f4a6a6,stroke:#666,stroke-width:1px;
classDef created fill:#a8e6a3,stroke:#666,stroke-width:1px;
class A,C,D,E,F,G,H unchanged;
```
File status note: this is a documentation-only update, so all runtime nodes above are marked unchanged. External providers are intentionally omitted as black boxes. Terraform .tf implementation details are documented in the contract sections below, not in this Mermaid diagram.
2. Black-Box Inputs and Outputs
Keep this short. Define types in JSON-style blocks and capture each flow with path-level rows.
AWS Resource Inventory
The infrastructure deployment in main/devops/cost_alerts.tf is expected to manage these AWS resources:
| terraform address | aws resource kind | purpose |
|---|---|---|
data.aws_ssm_parameter.discord_webhook | SSM Parameter | read webhook URL used by alert and report lambdas |
data.aws_ssm_parameter.discord_public_key | SSM Parameter | read Discord interaction signature verification key |
data.aws_ssm_parameter.discord_app_id | SSM Parameter | read Discord application id used by interaction followups |
resource.aws_sns_topic.cost_alerts | SNS Topic | budget alert event bus |
resource.aws_sns_topic_policy.cost_alerts | SNS Topic Policy | allows AWS Budgets service to publish notifications |
resource.aws_sns_topic_subscription.discord_alerts | SNS Subscription | route SNS budget events to alerts lambda |
resource.aws_cloudwatch_event_rule.ec2_state | EventBridge Rule | capture EC2 state changes |
resource.aws_cloudwatch_event_target.ec2_to_discord | EventBridge Target | send EC2 events to alerts lambda |
resource.aws_cloudwatch_event_rule.daily_cost_report | EventBridge Rule | daily schedule for cost report |
resource.aws_cloudwatch_event_target.daily_cost_to_lambda | EventBridge Target | trigger daily cost report lambda |
resource.aws_budgets_budget.monthly | AWS Budget | monthly cost thresholds and notifications |
resource.aws_lambda_function.discord_alerts | Lambda Function | process budget and EC2 state alerts |
resource.aws_lambda_function.cost_report | Lambda Function | generate and post daily cost summary |
resource.aws_lambda_function.discord_interaction | Lambda Function | process Discord slash commands |
resource.aws_lambda_function_url.discord_interaction | Lambda Function URL | public endpoint for Discord interactions |
data.archive_file.discord_alerts and data.archive_file.discord_interaction | Archive Build Artifact | package lambda deployment zips used by Terraform resources |
resource.aws_iam_role.* and resource.aws_iam_role_policy* | IAM Role/Policy | execution permissions for each lambda |
resource.aws_lambda_permission.* | Lambda Permission | allow SNS/EventBridge/public URL invoke paths |
output.discord_interaction_url | Terraform Output | value used to configure Discord interactions endpoint |
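As a post-apply check for the inventory above, the managed (non-data) resources can be compared against `terraform state list`. This is a minimal sketch, assuming the resources live in the root module (no `module.` prefix) and that state addresses omit the `resource.` prefix used in the table; `EXPECTED_RESOURCES` and `missing_resources` are hypothetical names, and the wildcard IAM/permission rows are left out.

```python
from __future__ import annotations

# Managed resources from the inventory table (root-module addresses assumed).
# Wildcard rows (IAM roles/policies, Lambda permissions) are skipped in this sketch.
EXPECTED_RESOURCES = frozenset({
    "aws_sns_topic.cost_alerts",
    "aws_sns_topic_policy.cost_alerts",
    "aws_sns_topic_subscription.discord_alerts",
    "aws_cloudwatch_event_rule.ec2_state",
    "aws_cloudwatch_event_target.ec2_to_discord",
    "aws_cloudwatch_event_rule.daily_cost_report",
    "aws_cloudwatch_event_target.daily_cost_to_lambda",
    "aws_budgets_budget.monthly",
    "aws_lambda_function.discord_alerts",
    "aws_lambda_function.cost_report",
    "aws_lambda_function.discord_interaction",
    "aws_lambda_function_url.discord_interaction",
})


def missing_resources(state_list_output: str) -> list[str]:
    """Return expected addresses that do not appear in `terraform state list` output."""
    present = {line.strip() for line in state_list_output.splitlines() if line.strip()}
    return sorted(EXPECTED_RESOURCES - present)
```

The input could come from `subprocess.run(["terraform", "state", "list"], capture_output=True, text=True, check=True).stdout` run in main/devops; an empty result means every listed resource is tracked in state.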
Global Types
Define shared types used across multiple flows.
```
DiscordWebhookMessage {
content: string
embeds?: list
}
BudgetAlertEvent {
event_source: "aws:sns"
sns_subject?: string
sns_message: string
}
StandardError {
status?: number
code?: string
message: string
}
DiscordSupportInputs {
discord_webhook_url: string
discord_app_id: string
discord_public_key: string
discord_interaction_endpoint_url_configured: boolean
discord_command_scope?: "global" | "guild"
discord_test_guild_id?: string
}
CostAlertingInfrastructureDeploymentInput {
terraform_apply: boolean
aws_region: string
interaction_build_artifact_ready: boolean
required_ssm_parameters: [
"/encache/discord/webhook_url",
"/encache/discord/app_id",
"/encache/discord/public_key"
]
}
CostAlertingInfrastructureDeploymentOutput {
resources_provisioned: {
sns_topic_cost_alerts: string
lambda_discord_alerts: string
lambda_cost_report: string
lambda_discord_interaction: string
event_rule_ec2_state: string
event_rule_daily_cost_report: string
budget_encache_monthly: string
}
deployment_outputs: {
discord_interaction_url: string
}
}
DeploymentReadinessState {
terraform_apply_succeeded: boolean
slash_commands_routable: boolean
}
```
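DiscordWebhookMessage maps directly onto the JSON body of a Discord incoming-webhook POST. Below is a stdlib-only sketch of building and sending it; the 2000-character truncation and the explicit User-Agent header are assumptions of this sketch rather than known behavior of the existing handlers, and the helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

DISCORD_CONTENT_LIMIT = 2000  # Discord's maximum length for message content


def build_webhook_message(content: str, embeds: list | None = None) -> dict:
    """Build a DiscordWebhookMessage, truncating content to Discord's limit."""
    if len(content) > DISCORD_CONTENT_LIMIT:
        content = content[: DISCORD_CONTENT_LIMIT - 1] + "…"
    message: dict = {"content": content}
    if embeds:
        message["embeds"] = embeds
    return message


def build_webhook_request(webhook_url: str, message: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to a Discord incoming webhook."""
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: send an explicit User-Agent instead of the urllib default.
            "User-Agent": "encache-cost-alerts (sketch)",
        },
        method="POST",
    )


def post_webhook(webhook_url: str, message: dict, timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status (Discord answers 204 unless ?wait=true)."""
    with urllib.request.urlopen(build_webhook_request(webhook_url, message), timeout=timeout) as resp:
        return resp.status
```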
Flow: cost-alerting-infrastructure-deployment, N/A
Type Definitions
```
CostAlertingInfrastructureDeploymentContractInput = CostAlertingInfrastructureDeploymentInput
CostAlertingInfrastructureDeploymentContractOutput = CostAlertingInfrastructureDeploymentOutput
```
Paths
| path-name | input | output/expected state change | path-type | notes | updated |
|---|---|---|---|---|---|
cost-alerting-infrastructure-deployment.success | CostAlertingInfrastructureDeploymentContractInput with all required SSM parameters present | CostAlertingInfrastructureDeploymentContractOutput | happy path | terraform apply provisions budget alerting, reporting, and discord interaction surfaces | |
cost-alerting-infrastructure-deployment.missing-interaction-build-artifact | CostAlertingInfrastructureDeploymentContractInput with interaction_build_artifact_ready=false | StandardError | error | data.archive_file.discord_interaction cannot package the lambda when main/devops/.build/discord_interaction_pkg is missing | |
cost-alerting-infrastructure-deployment.missing-discord-ssm-parameters | CostAlertingInfrastructureDeploymentContractInput missing one or more required SSM parameters | StandardError | error | apply must fail before creating lambdas with incomplete env configuration | |
cost-alerting-infrastructure-deployment.interaction-endpoint-not-configured | CostAlertingInfrastructureDeploymentContractOutput with discord_interaction_url produced but not set in the Discord Developer Portal | DeploymentReadinessState terraform_apply_succeeded=true slash_commands_routable=false | operational | terraform apply succeeds, but Discord slash commands stay unroutable until the Interactions Endpoint URL is set | |
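The two error paths above can also be caught before terraform apply runs. A minimal sketch, assuming the caller already knows which SSM parameter names exist (for example from boto3's `ssm.get_parameters(Names=[...])`, which lists unknown names under `InvalidParameters`); the `preflight` helper and its error codes are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

REQUIRED_SSM_PARAMETERS = (
    "/encache/discord/webhook_url",
    "/encache/discord/app_id",
    "/encache/discord/public_key",
)


def preflight(existing_parameters: set[str], artifact_dir: Path) -> dict | None:
    """Return a StandardError-shaped dict if apply should be blocked, else None."""
    missing = [name for name in REQUIRED_SSM_PARAMETERS if name not in existing_parameters]
    if missing:
        return {"code": "missing-discord-ssm-parameters",
                "message": "missing SSM parameters: " + ", ".join(missing)}
    # Mirrors the missing-interaction-build-artifact path: the package folder must
    # exist and be non-empty before data.archive_file can zip it.
    if not artifact_dir.is_dir() or not any(artifact_dir.iterdir()):
        return {"code": "missing-interaction-build-artifact",
                "message": f"{artifact_dir} is missing or empty; run build.sh first"}
    return None
```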
Flow: budget-alert-dispatch, N/A
Type Definitions
```
BudgetAlertDispatchInput {
sns_records: list<BudgetAlertEvent>
}
BudgetAlertDispatchOutput {
status_code: 200
delivery_attempted: boolean
}
```
Paths
| path-name | input | output/expected state change | path-type | notes | updated |
|---|---|---|---|---|---|
budget-alert-dispatch.success | BudgetAlertDispatchInput with SNS records | BudgetAlertDispatchOutput status_code=200 delivery_attempted=true | happy path | each SNS record message is formatted and sent to Discord webhook | |
budget-alert-dispatch.delivery-failure-logged | BudgetAlertDispatchInput with webhook network failure | BudgetAlertDispatchOutput status_code=200 delivery_attempted=true | subpath | delivery failure is logged in lambda, but handler still returns 200 |
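A sketch of the dispatch loop, assuming the standard Lambda SNS event shape (`Records[].EventSource == "aws:sns"`, `Records[].Sns.Subject`, `Records[].Sns.Message`). The webhook sender is injected as a hypothetical `post` callable so the logged-failure subpath can be exercised without a network; this illustrates the contract and is not the code in handler.py.

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger("discord_alerts")


def format_budget_alert(record: dict) -> str:
    """Turn one SNS record into Discord message text."""
    sns = record.get("Sns", {})
    subject = sns.get("Subject") or "AWS Budget alert"
    return f"**{subject}**\n{sns.get('Message', '')}"


def handle_budget_event(event: dict, post: Callable[[str], None]) -> dict:
    """Send every SNS record to Discord; delivery errors are logged, never raised."""
    delivery_attempted = False
    for record in event.get("Records", []):
        if record.get("EventSource") != "aws:sns":
            continue
        delivery_attempted = True
        try:
            post(format_budget_alert(record))
        except OSError:  # urllib.error.URLError and socket timeouts are OSError subclasses
            logger.exception("Discord webhook delivery failed")
    return {"status_code": 200, "delivery_attempted": delivery_attempted}
```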
Flow: ec2-state-alert-dispatch, N/A
Type Definitions
```
Ec2StateAlertInput {
source: "aws.ec2"
detail-type: "EC2 Instance State-change Notification"
detail.state: string
detail.instance-id: string
}
Ec2StateAlertOutput {
status_code: 200
delivery_attempted: boolean
}
```
Paths
| path-name | input | output/expected state change | path-type | notes | updated |
|---|---|---|---|---|---|
ec2-state-alert-dispatch.success | Ec2StateAlertInput | Ec2StateAlertOutput status_code=200 delivery_attempted=true | happy path | EC2 state change event is formatted and sent to Discord webhook | |
ec2-state-alert-dispatch.unhandled-source | event without source=aws.ec2 | Ec2StateAlertOutput status_code=200 delivery_attempted=false | subpath | lambda logs unhandled source and exits without Discord post |
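Both EC2 paths reduce to a check on the EventBridge `source` field. A hypothetical sketch, assuming the standard EC2 state-change event shape (`detail.instance-id`, `detail.state`, top-level `region`) and an injected `post` callable for the webhook:

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger("discord_alerts")


def format_ec2_state(event: dict) -> str:
    """Render an EventBridge EC2 state-change event as one Discord line."""
    detail = event.get("detail", {})
    instance = detail.get("instance-id", "unknown instance")
    state = detail.get("state", "unknown")
    return f"EC2 instance {instance} is now {state} ({event.get('region', 'unknown region')})"


def handle_ec2_event(event: dict, post: Callable[[str], None]) -> dict:
    """Post EC2 state changes; events from any other source are logged and skipped."""
    if event.get("source") != "aws.ec2":
        logger.info("Unhandled event source: %s", event.get("source"))
        return {"status_code": 200, "delivery_attempted": False}
    try:
        post(format_ec2_state(event))
    except OSError:  # webhook failures are logged, not raised
        logger.exception("Discord webhook delivery failed")
    return {"status_code": 200, "delivery_attempted": True}
```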
Flow: daily-cost-report, N/A
Type Definitions
```
DailyCostReportInput {
scheduled_event: cron(0 14 * * ? *)   // daily at 14:00 UTC; EventBridge rule cron expressions are evaluated in UTC
}
DailyCostReportOutput {
status_code: 200
report_body: string
delivery_attempted: boolean
warning_sections?: list<"cost" | "instance_status">
}
```
Paths
| path-name | input | output/expected state change | path-type | notes | updated |
|---|---|---|---|---|---|
daily-cost-report.success | DailyCostReportInput | DailyCostReportOutput status_code=200 delivery_attempted=true | happy path | Lambda reads Cost Explorer + EC2 and posts summary body to Discord | |
daily-cost-report.query-degraded | DailyCostReportInput with Cost Explorer or EC2 API failure | DailyCostReportOutput status_code=200 warning_sections includes the failed section | subpath | report body includes warning text for the failed section and still renders the remaining sections | |
daily-cost-report.delivery-failure-logged | DailyCostReportInput with webhook network failure | DailyCostReportOutput status_code=200 delivery_attempted=true | subpath | webhook delivery failure is logged but lambda still returns 200 |
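The degraded path implies each report section is queried independently. A minimal sketch of that structure, assuming each section is a zero-argument callable (in practice perhaps wrapping boto3's Cost Explorer `get_cost_and_usage` or EC2 `describe_instances`); the function names and warning wording are hypothetical.

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger("cost_report")


def build_daily_report(sections: dict[str, Callable[[], str]]) -> tuple[str, list[str]]:
    """Render each section independently; a failing query becomes an inline warning."""
    lines: list[str] = []
    warnings: list[str] = []
    for name, render in sections.items():
        try:
            lines.append(render())
        except Exception as exc:  # any query error degrades only this section
            logger.warning("section %s failed: %s", name, exc)
            warnings.append(name)
            lines.append(f"Warning: {name} data unavailable ({exc})")
    return "\n".join(lines), warnings


def run_daily_report(sections: dict[str, Callable[[], str]], post: Callable[[str], None]) -> dict:
    """Build the report, attempt delivery, and always return status 200."""
    body, warnings = build_daily_report(sections)
    try:
        post(body)
    except OSError:
        logger.exception("Discord webhook delivery failed")
    result = {"status_code": 200, "report_body": body, "delivery_attempted": True}
    if warnings:
        result["warning_sections"] = warnings
    return result
```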
Flow: discord-slash-command-interaction, N/A
Type Definitions
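```
DiscordSlashCommandInput {
interaction_type: number   // 1 = PING, 2 = APPLICATION_COMMAND
command_name?: "help" | "cost" | "status"
interaction_token?: string
signature_headers: map<string, string>   // X-Signature-Ed25519 and X-Signature-Timestamp
}
DiscordSlashCommandOutput {
status_code: 200 | 400 | 401
interaction_response_type?: 1 | 4 | 5   // 1 = PONG, 4 = CHANNEL_MESSAGE_WITH_SOURCE, 5 = DEFERRED_CHANNEL_MESSAGE_WITH_SOURCE
followup_delivery?: boolean
}
```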
Paths
| path-name | input | output/expected state change | path-type | notes | updated |
|---|---|---|---|---|---|
discord-slash-command.ping | DiscordSlashCommandInput interaction_type=1 | DiscordSlashCommandOutput status_code=200 interaction_response_type=1 | happy path | Discord interaction health check | |
discord-slash-command.help | DiscordSlashCommandInput command_name=help | DiscordSlashCommandOutput status_code=200 interaction_response_type=4 | happy path | inline command help message | |
discord-slash-command.cost-or-status | DiscordSlashCommandInput command_name=cost or status with interaction_token | DiscordSlashCommandOutput status_code=200 interaction_response_type=5 followup_delivery=true | happy path | deferred response, then async followup PATCH to Discord | |
discord-slash-command.missing-token | DiscordSlashCommandInput command_name=cost or status without interaction_token | DiscordSlashCommandOutput status_code=200 interaction_response_type=4 | error | returns inline error message, no async followup | |
discord-slash-command.self-invoke-failure | DiscordSlashCommandInput command_name=cost or status and lambda invoke fails | DiscordSlashCommandOutput status_code=200 interaction_response_type=4 | error | returns inline failure message, no deferred response | |
discord-slash-command.unknown-command | DiscordSlashCommandInput command_name unsupported | DiscordSlashCommandOutput status_code=200 interaction_response_type=4 | subpath | returns Unknown command inline response | |
discord-slash-command.invalid-signature | DiscordSlashCommandInput with invalid signature | DiscordSlashCommandOutput status_code=401 | error | request rejected before command routing | |
discord-slash-command.unhandled-interaction-type | DiscordSlashCommandInput interaction_type not 1 or 2 | DiscordSlashCommandOutput status_code=400 | error | handler returns Unhandled interaction type body | |
discord-slash-command.followup-delivery-failure-logged | Async followup where Discord PATCH fails | DiscordSlashCommandOutput status_code=200 followup_delivery=false | subpath | failure is logged in async path, lambda still returns 200 |
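The routing paths above can be condensed into one function. This is a sketch, not handler.py: signature checking is injected as `verify` (a real implementation would typically check the Ed25519 signature over timestamp + raw body against discord_public_key, for example with PyNaCl's `VerifyKey`), the async self-invoke is injected as `start_followup`, and the response shape assumes a Lambda function URL event with lowercased header names. All helper names and message texts are hypothetical.

```python
from __future__ import annotations

import json
from typing import Callable

PING, APPLICATION_COMMAND = 1, 2                   # Discord interaction types
PONG, CHANNEL_MESSAGE, DEFERRED_MESSAGE = 1, 4, 5  # Discord interaction response types


def _reply(status: int, body: dict | str) -> dict:
    """Shape a function URL response; dict bodies are sent as JSON."""
    if isinstance(body, str):
        return {"statusCode": status, "headers": {"Content-Type": "text/plain"}, "body": body}
    return {"statusCode": status, "headers": {"Content-Type": "application/json"},
            "body": json.dumps(body)}


def _message(text: str) -> dict:
    """Inline (type 4) reply, used for help and for every user-facing error."""
    return _reply(200, {"type": CHANNEL_MESSAGE, "data": {"content": text}})


def route_interaction(raw_body: str, headers: dict[str, str],
                      verify: Callable[[str, str, str], bool],
                      start_followup: Callable[[str, str], None]) -> dict:
    signature = headers.get("x-signature-ed25519", "")
    timestamp = headers.get("x-signature-timestamp", "")
    if not verify(signature, timestamp, raw_body):
        return _reply(401, "invalid request signature")  # rejected before routing
    interaction = json.loads(raw_body)
    kind = interaction.get("type")
    if kind == PING:
        return _reply(200, {"type": PONG})
    if kind != APPLICATION_COMMAND:
        return _reply(400, "Unhandled interaction type")
    name = interaction.get("data", {}).get("name")
    if name == "help":
        return _message("Available commands: /help, /cost, /status")
    if name in ("cost", "status"):
        token = interaction.get("token")
        if not token:
            return _message(f"Cannot run /{name}: interaction token is missing.")
        try:
            start_followup(name, token)  # e.g. async self-invoke of this Lambda
        except Exception:
            return _message(f"Could not start /{name}; please try again.")
        return _reply(200, {"type": DEFERRED_MESSAGE})
    return _message(f"Unknown command: {name}")
```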
Flow: discord-support-intake, N/A
Type Definitions
```
DiscordSupportIntakeInput {
user_provided_values: DiscordSupportInputs
}
DiscordSupportIntakeOutput {
completeness: "complete" | "incomplete"
missing_fields?: list<string>
}
```
Paths
| path-name | input | output/expected state change | path-type | notes | updated |
|---|---|---|---|---|---|
discord-support-intake.complete-runtime-fields | DiscordSupportIntakeInput with discord_webhook_url, discord_app_id, discord_public_key, and discord_interaction_endpoint_url_configured | DiscordSupportIntakeOutput completeness=complete | happy path | ready for SSM bootstrap and deployment validation | |
discord-support-intake.missing-runtime-fields | DiscordSupportIntakeInput missing one or more runtime-required fields | DiscordSupportIntakeOutput completeness=incomplete | error | block deployment until runtime-required fields are present | |
discord-support-intake.rollout-fields-optional | DiscordSupportIntakeInput without discord_command_scope or discord_test_guild_id | DiscordSupportIntakeOutput completeness=complete | subpath | rollout strategy fields are optional and not consumed by runtime code | |
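The intake rules can be expressed as a small validator. A sketch, assuming intake values arrive as a dict keyed by DiscordSupportInputs field names and that "TBD" or "unknown" counts as missing (matching the discord-inputs-complete-before-apply test). As a design choice of this sketch, the endpoint flag only needs an explicit true/false answer, since the URL exists only after terraform apply. `check_intake` is a hypothetical name.

```python
from __future__ import annotations

RUNTIME_REQUIRED = (
    "discord_webhook_url",
    "discord_app_id",
    "discord_public_key",
    "discord_interaction_endpoint_url_configured",
)
PLACEHOLDERS = {"", "TBD", "UNKNOWN"}


def check_intake(values: dict) -> dict:
    """Classify DiscordSupportInputs as complete or incomplete; rollout fields are ignored."""
    missing = []
    for field in RUNTIME_REQUIRED:
        value = values.get(field)
        if field == "discord_interaction_endpoint_url_configured":
            if not isinstance(value, bool):  # must be answered, true or false
                missing.append(field)
        elif not isinstance(value, str) or value.strip().upper() in PLACEHOLDERS:
            missing.append(field)
    if missing:
        return {"completeness": "incomplete", "missing_fields": missing}
    return {"completeness": "complete"}
```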
3. Acceptance Criteria and Plan-First Tests
| test-name | input | pass criteria | fail criteria | updated |
|---|---|---|---|---|
discord-inputs-complete-before-apply | Completed in-plan Discord intake template in aws-cost-alerting.md | All required values are present and mapped to SSM parameter names | Any required field is missing or marked unknown | |
discord-interaction-package-built-before-apply | main/devops/lambdas/discord_interaction/build.sh executed before terraform apply | main/devops/.build/discord_interaction_pkg exists and terraform archive step succeeds | terraform archive step fails because interaction package folder is missing | |
terraform-deploy-cost-alerting-infra | Terraform apply for main/devops/cost_alerts.tf with required SSM inputs present | Apply completes and creates SNS, EventBridge rules, Lambdas, budget notifications, and interaction URL output | Apply fails or required resources/outputs are missing | |
terraform-deploy-missing-ssm-fails-fast | Terraform apply with missing /encache/discord/webhook_url or /encache/discord/app_id or /encache/discord/public_key | Deployment is blocked and missing-input error is surfaced | Deployment succeeds with unresolved/missing Discord config | |
discord-interactions-endpoint-wired | discord_interaction_url output configured in Discord Developer Portal | /help, /cost, and /status are routable to interaction lambda | Commands remain unavailable or route to wrong endpoint | |
budget-alert-dispatch-to-discord | Budget threshold notification published to encache-cost-alerts SNS topic | Alert appears in the configured Discord channel webhook | No Discord message appears or lambda logs delivery failure | |
daily-cost-report-to-discord | Scheduled or manual invocation of encache-cost-report | Daily report message appears in Discord; warnings may appear inline when AWS queries partially fail | Missing report message | |
discord-command-cost-success | Valid /cost interaction payload with valid signature | Deferred response is returned and followup report is posted | Signature fails, deferred response missing, or followup not delivered | |
discord-command-missing-token | Valid signed /cost or /status payload with missing token | Inline type 4 error response is returned with status 200 | Handler crashes or returns deferred response without token | |
discord-command-unknown-command | Valid signed payload with unknown command name | Inline type 4 Unknown command response is returned with status 200 | Handler returns 400/500 instead of inline unknown command response | |
discord-command-invalid-signature-rejected | Interaction payload with invalid signature headers | Request returns 401 and no followup is sent | Request is accepted despite invalid signature |
Operational prerequisites:
1. Fill the in-plan Discord intake template with values.
2. Run main/devops/lambdas/discord_interaction/build.sh before terraform apply.
3. Confirm the target Discord channel for webhook alerts and reports.
4. Optionally define the command rollout strategy (global vs guild) in the intake template.
4. Pseudocode / Technical Details for Critical Flows (Optional)
Deployment Prerequisites (Technical)
Discord Inputs Required From User
Before implementation or terraform apply, complete this in-plan Discord intake section.
| required-item | used-by | source in Discord | value | updated |
|---|---|---|---|---|
discord_webhook_url | main/devops/lambdas/discord_alerts/handler.py; main/devops/lambdas/discord_alerts/cost_report.py | Channel settings -> Integrations -> Webhooks | TBD by user | |
discord_app_id | main/devops/lambdas/discord_interaction/handler.py | Discord Developer Portal -> General Information -> Application ID | TBD by user | |
discord_public_key | main/devops/lambdas/discord_interaction/handler.py | Discord Developer Portal -> General Information -> Public Key | TBD by user | |
discord_interaction_endpoint_url_configured | Discord slash command routing to encache-discord-interaction function URL | Discord Developer Portal -> Interactions Endpoint URL | TBD by user |
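discord_app_id is needed because the deferred /cost and /status replies are completed by editing the original interaction response. A sketch of that followup request, assuming Discord API v10; per Discord's documentation this endpoint is authorized by the interaction token (valid for 15 minutes) rather than a bot token. `build_followup_request` is a hypothetical helper.

```python
from __future__ import annotations

import json
import urllib.request

DISCORD_API = "https://discord.com/api/v10"


def build_followup_request(app_id: str, interaction_token: str, content: str) -> urllib.request.Request:
    """Prepare a PATCH that replaces the deferred (type 5) response with the report text."""
    url = f"{DISCORD_API}/webhooks/{app_id}/{interaction_token}/messages/@original"
    return urllib.request.Request(
        url,
        data=json.dumps({"content": content}).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "User-Agent": "encache-discord-interaction (sketch)"},
        method="PATCH",
    )
```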
Operational rollout inputs not consumed directly by the current runtime:
| optional-item | purpose | value | updated |
|---|---|---|---|
| discord_command_scope | command rollout strategy only (global or guild) | TBD by user | |
| discord_test_guild_id | faster testing iteration for guild-scoped command registration | optional; recommended | |
Discord Intake Template
Do not commit real secrets. Store secret values in AWS SSM Parameter Store and keep this plan table as status-only.
| field | required | description | example | value |
|---|---|---|---|---|
discord_webhook_url | yes | Incoming webhook URL for the alert/report channel | https://discord.com/api/webhooks/... | TBD |
discord_app_id | yes | Discord Application ID used by interaction followup API | 123456789012345678 | TBD |
discord_public_key | yes | Discord public key used to verify interaction signatures | hex string | TBD |
discord_interaction_endpoint_url_configured | yes | Whether discord_interaction_url Terraform output has been set in Discord Developer Portal | true | TBD |
discord_command_scope | optional | Slash command rollout scope | global or guild | TBD |
discord_test_guild_id | optional | Guild ID for fast command validation | 123456789012345678 | TBD |
discord_alert_channel_name | yes | Human-readable channel for alerts/reports | aws-alerts | TBD |
Discord Setup Checklist
- [ ] Confirm the Discord channel where budget alerts and daily reports should be posted.
- [ ] Create or select incoming webhook for that channel.
- [ ] Create or verify the Discord application for slash commands.
- [ ] Capture Application ID and Public Key from Discord Developer Portal.
- [ ] Set Discord Interactions Endpoint URL to Terraform output `discord_interaction_url`.
- [ ] Optionally decide and record command scope (`global` or `guild`).
- [ ] If using `guild` scope, optionally provide `discord_test_guild_id`.
AWS SSM Parameter Mapping
```bash
aws ssm put-parameter \
  --name /encache/discord/webhook_url \
  --value "<discord_webhook_url>" \
  --type SecureString \
  --overwrite \
  --region us-east-1
aws ssm put-parameter \
  --name /encache/discord/app_id \
  --value "<discord_app_id>" \
  --type String \
  --overwrite \
  --region us-east-1
aws ssm put-parameter \
  --name /encache/discord/public_key \
  --value "<discord_public_key>" \
  --type String \
  --overwrite \
  --region us-east-1
```
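The same mapping can be kept as data so the SSM writes are generated from intake values instead of copied by hand. A sketch that only builds the argv lists for the commands above (names, types, and the us-east-1 default are taken from them); `SSM_MAPPING` and `put_parameter_commands` are hypothetical.

```python
from __future__ import annotations

# Intake field -> (SSM parameter name, SSM parameter type), mirroring the commands above.
SSM_MAPPING = {
    "discord_webhook_url": ("/encache/discord/webhook_url", "SecureString"),
    "discord_app_id": ("/encache/discord/app_id", "String"),
    "discord_public_key": ("/encache/discord/public_key", "String"),
}


def put_parameter_commands(values: dict[str, str], region: str = "us-east-1") -> list[list[str]]:
    """Build `aws ssm put-parameter` argv lists; raise if any mapped value is absent."""
    commands = []
    for field, (name, ssm_type) in SSM_MAPPING.items():
        value = values.get(field)
        if not value:
            raise ValueError(f"intake value missing for {field}")
        commands.append([
            "aws", "ssm", "put-parameter",
            "--name", name,
            "--value", value,
            "--type", ssm_type,
            "--overwrite",
            "--region", region,
        ])
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)`; argument lists avoid shell quoting of the webhook URL, although the value is still briefly visible in the process list.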
Discord Intake Sign-off
| owner | date | notes |
|---|---|---|
TBD | TBD | TBD |
Implementation Prerequisites
- Required SSM parameters must exist in AWS before deployment:
  - `/encache/discord/webhook_url`
  - `/encache/discord/app_id`
  - `/encache/discord/public_key`
- Required build artifact must exist before terraform apply:
  - run `main/devops/lambdas/discord_interaction/build.sh` to populate `main/devops/.build/discord_interaction_pkg`
- Required owner sign-off for Discord intake must be present before deployment validation.
- Flow name: alert-pipeline
- Flow name: slash-command-pipeline
- Flow name: infrastructure-deployment-pipeline
  1. validate required Discord inputs from the in-plan intake template
  2. write Discord inputs to the required SSM parameter names
  3. run main/devops/lambdas/discord_interaction/build.sh
  4. run terraform apply for main/devops/cost_alerts.tf
  5. read the discord_interaction_url output
  6. set the Discord Interactions Endpoint URL in the Discord Developer Portal
  7. run command and webhook smoke tests
- Implementation notes: Runtime readiness requires completed Discord intake values and the interaction lambda build artifact.
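The scriptable part of infrastructure-deployment-pipeline (build, apply, read output) can be sketched as ordered commands. Assumptions: commands run from the repository root, build.sh works when invoked that way, and the Terraform root module is main/devops (`-chdir` and `output -raw` need Terraform 0.14 or later). Intake validation, SSM writes, portal configuration, and smoke tests stay manual here; helper names are hypothetical.

```python
from __future__ import annotations

import subprocess

DEVOPS_DIR = "main/devops"


def deployment_commands() -> list[list[str]]:
    """Ordered, scriptable steps of infrastructure-deployment-pipeline."""
    return [
        ["bash", "main/devops/lambdas/discord_interaction/build.sh"],
        ["terraform", f"-chdir={DEVOPS_DIR}", "apply"],
        ["terraform", f"-chdir={DEVOPS_DIR}", "output", "-raw", "discord_interaction_url"],
    ]


def run_pipeline(dry_run: bool = True) -> str | None:
    """Run build and apply, then return the URL to paste into the Discord Developer Portal."""
    *setup_steps, output_step = deployment_commands()
    if dry_run:
        for cmd in deployment_commands():
            print("would run:", " ".join(cmd))
        return None
    for cmd in setup_steps:
        subprocess.run(cmd, check=True)  # stops on the first failing step
    result = subprocess.run(output_step, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```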
5. Handoff to Related Plan Reconciliation
After all stages are approved, apply .agent/skills/reconcile-plans/SKILL.md to propagate contract updates across linked plans.