
AWS Cost Alerting

Plan Metadata

  • Plan type: plan
  • Parent plan: N/A
  • Depends on: N/A
  • Status: documentation

// TODO @blaine validate this please

System Intent

  • What is being built: Contract for infrastructure-level budget alerts, EC2 state notifications, and scheduled cost reporting to Discord.
  • Primary consumer(s): Platform operators and finance/ops visibility workflows.
  • Boundary (black-box scope only): AWS Budgets/SNS/EventBridge/Lambda deployment surfaces, IAM/runtime wiring, SSM-backed Discord configuration, and Discord webhook/interaction delivery behavior.

Stage Gate Tracker

  • [x] Stage 1 Mermaid approved
  • [x] Stage 2 I/O contracts approved
  • [x] Stage 3 acceptance criteria and plan-first tests approved

1. Mermaid Diagram

Reference: .agent/skills/create-mermaid-diagram/SKILL.md

flowchart TD
  H[Interaction Build Script — main/devops/lambdas/discord_interaction/build.sh] -->|produces discord_interaction_pkg artifact for terraform archive| A[Cost Alerting Infrastructure Wiring — main/devops/cost_alerts.tf]
  A -->|provisions alerts lambda wiring and ec2 budget trigger routes| C[Discord Alerts Handler — main/devops/lambdas/discord_alerts/handler.py]
  A -->|provisions report lambda wiring and daily schedule routes| D[Daily Cost Report Handler — main/devops/lambdas/discord_alerts/cost_report.py]
  A -->|provisions interaction lambda and function URL output| E[Discord Interaction Handler — main/devops/lambdas/discord_interaction/handler.py]
  A -->|defines required SSM parameter names| F[Discord Config Wiring — main/devops/cost_alerts.tf]
  F -->|DISCORD_WEBHOOK_URL env value| C
  F -->|DISCORD_WEBHOOK_URL env value| D
  F -->|DISCORD_APP_ID and DISCORD_PUBLIC_KEY env values| E
  D -->|aws cost and instance query request| G[Shared AWS Report Builder — main/devops/lambdas/discord_alerts/aws_report.py]
  E -->|slash command report query request| G

  classDef unchanged fill:#d3d3d3,stroke:#666,stroke-width:1px;
  classDef updated fill:#ffe58a,stroke:#666,stroke-width:1px;
  classDef deleted fill:#f4a6a6,stroke:#666,stroke-width:1px;
  classDef created fill:#a8e6a3,stroke:#666,stroke-width:1px;

  class A,C,D,E,F,G,H unchanged;

File status note: this is a documentation-only update, so all runtime nodes above are marked unchanged. External providers are intentionally omitted as black boxes. Terraform .tf implementation details are documented in contract sections below, not in this Mermaid diagram.

2. Black-Box Inputs and Outputs

Keep this short. Define types in JSON-style blocks and capture each flow with path-level rows.

AWS Resource Inventory

The infrastructure deployment in main/devops/cost_alerts.tf is expected to manage these AWS resources:

| terraform address | aws resource kind | purpose |
| --- | --- | --- |
| data.aws_ssm_parameter.discord_webhook | SSM Parameter | read webhook URL used by alert and report lambdas |
| data.aws_ssm_parameter.discord_public_key | SSM Parameter | read Discord interaction signature verification key |
| data.aws_ssm_parameter.discord_app_id | SSM Parameter | read Discord application id used by interaction followups |
| resource.aws_sns_topic.cost_alerts | SNS Topic | budget alert event bus |
| resource.aws_sns_topic_policy.cost_alerts | SNS Topic Policy | allows AWS Budgets service to publish notifications |
| resource.aws_sns_topic_subscription.discord_alerts | SNS Subscription | route SNS budget events to alerts lambda |
| resource.aws_cloudwatch_event_rule.ec2_state | EventBridge Rule | capture EC2 state changes |
| resource.aws_cloudwatch_event_target.ec2_to_discord | EventBridge Target | send EC2 events to alerts lambda |
| resource.aws_cloudwatch_event_rule.daily_cost_report | EventBridge Rule | daily schedule for cost report |
| resource.aws_cloudwatch_event_target.daily_cost_to_lambda | EventBridge Target | trigger daily cost report lambda |
| resource.aws_budgets_budget.monthly | AWS Budget | monthly cost thresholds and notifications |
| resource.aws_lambda_function.discord_alerts | Lambda Function | process budget and EC2 state alerts |
| resource.aws_lambda_function.cost_report | Lambda Function | generate and post daily cost summary |
| resource.aws_lambda_function.discord_interaction | Lambda Function | process Discord slash commands |
| resource.aws_lambda_function_url.discord_interaction | Lambda Function URL | public endpoint for Discord interactions |
| data.archive_file.discord_alerts and data.archive_file.discord_interaction | Archive Build Artifact | package lambda deployment zips used by Terraform resources |
| resource.aws_iam_role.* and resource.aws_iam_role_policy* | IAM Role/Policy | execution permissions for each lambda |
| resource.aws_lambda_permission.* | Lambda Permission | allow SNS/EventBridge/public URL invoke paths |
| output.discord_interaction_url | Terraform Output | value used to configure Discord interactions endpoint |

Global Types

Define shared types used across multiple flows.

DiscordWebhookMessage {
  content: string
  embeds?: list
}

BudgetAlertEvent {
  event_source: "aws:sns"
  sns_subject?: string
  sns_message: string
}

StandardError {
  status?: number
  code?: string
  message: string
}

DiscordSupportInputs {
  discord_webhook_url: string
  discord_app_id: string
  discord_public_key: string
  discord_interaction_endpoint_url_configured: boolean
  discord_command_scope?: "global" | "guild"
  discord_test_guild_id?: string
}

CostAlertingInfrastructureDeploymentInput {
  terraform_apply: boolean
  aws_region: string
  interaction_build_artifact_ready: boolean
  required_ssm_parameters: [
    "/encache/discord/webhook_url",
    "/encache/discord/app_id",
    "/encache/discord/public_key"
  ]
}

CostAlertingInfrastructureDeploymentOutput {
  resources_provisioned: {
    sns_topic_cost_alerts: string
    lambda_discord_alerts: string
    lambda_cost_report: string
    lambda_discord_interaction: string
    event_rule_ec2_state: string
    event_rule_daily_cost_report: string
    budget_encache_monthly: string
  }
  deployment_outputs: {
    discord_interaction_url: string
  }
}

DeploymentReadinessState {
  terraform_apply_succeeded: boolean
  slash_commands_routable: boolean
}

Flow: cost-alerting-infrastructure-deployment, N/A

Type Definitions

CostAlertingInfrastructureDeploymentContractInput = CostAlertingInfrastructureDeploymentInput

CostAlertingInfrastructureDeploymentContractOutput = CostAlertingInfrastructureDeploymentOutput

Paths

| path-name | input | output/expected state change | path-type | notes | updated |
| --- | --- | --- | --- | --- | --- |
| cost-alerting-infrastructure-deployment.success | CostAlertingInfrastructureDeploymentContractInput with all required SSM parameters present | CostAlertingInfrastructureDeploymentContractOutput | happy path | terraform apply provisions budget alerting, reporting, and discord interaction surfaces |  |
| cost-alerting-infrastructure-deployment.missing-interaction-build-artifact | CostAlertingInfrastructureDeploymentContractInput with interaction_build_artifact_ready=false | StandardError | error | data.archive_file.discord_interaction cannot package lambda when .build/discord_interaction_pkg is missing |  |
| cost-alerting-infrastructure-deployment.missing-discord-ssm-parameters | CostAlertingInfrastructureDeploymentContractInput missing one or more required SSM parameters | StandardError | error | apply must fail before creating lambdas with incomplete env configuration |  |
| cost-alerting-infrastructure-deployment.interaction-endpoint-not-configured | CostAlertingInfrastructureDeploymentContractOutput with discord_interaction_url produced but not set in Discord portal | DeploymentReadinessState terraform_apply_succeeded=true slash_commands_routable=false | operational | terraform deploy succeeds, but Discord slash commands stay unroutable until endpoint setup |  |
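
The two error paths above could be caught by a small pre-apply check before Terraform runs. The following is a minimal sketch, not part of the documented deployment: `preflight_check` and its error codes are hypothetical names, and the caller is assumed to already know which SSM parameter names exist (for example from a prior lookup).

```python
from typing import Iterable, Optional

# Parameter names taken from CostAlertingInfrastructureDeploymentInput.
REQUIRED_SSM_PARAMETERS = (
    "/encache/discord/webhook_url",
    "/encache/discord/app_id",
    "/encache/discord/public_key",
)


def preflight_check(
    interaction_build_artifact_ready: bool,
    present_ssm_parameters: Iterable[str],
) -> Optional[dict]:
    """Return a StandardError-shaped dict, or None when apply may proceed.

    Hypothetical helper: the error codes reuse the path names from the table.
    """
    if not interaction_build_artifact_ready:
        return {
            "code": "missing-interaction-build-artifact",
            "message": ".build/discord_interaction_pkg is missing; run build.sh first",
        }
    present = set(present_ssm_parameters)
    missing = [name for name in REQUIRED_SSM_PARAMETERS if name not in present]
    if missing:
        return {
            "code": "missing-discord-ssm-parameters",
            "message": "missing SSM parameters: " + ", ".join(missing),
        }
    return None
```

Checking the build artifact first mirrors the order in which the prerequisites are listed; either order would satisfy the contract, since both conditions block the apply.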

Flow: budget-alert-dispatch, N/A

Type Definitions

BudgetAlertDispatchInput {
  sns_records: list<BudgetAlertEvent>
}

BudgetAlertDispatchOutput {
  status_code: 200
  delivery_attempted: boolean
}

Paths

| path-name | input | output/expected state change | path-type | notes | updated |
| --- | --- | --- | --- | --- | --- |
| budget-alert-dispatch.success | BudgetAlertDispatchInput with SNS records | BudgetAlertDispatchOutput status_code=200 delivery_attempted=true | happy path | each SNS record message is formatted and sent to Discord webhook |  |
| budget-alert-dispatch.delivery-failure-logged | BudgetAlertDispatchInput with webhook network failure | BudgetAlertDispatchOutput status_code=200 delivery_attempted=true | subpath | delivery failure is logged in lambda, but handler still returns 200 |  |
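
As a rough illustration of both paths, the sketch below formats each SNS record into a DiscordWebhookMessage and swallows delivery errors after logging them. It is an assumption-laden sketch, not the contents of handler.py: `dispatch_budget_alerts` and the injected `send` callable are hypothetical, and the message layout is invented for the example. Only the `Records[].Sns.Subject` and `Records[].Sns.Message` keys come from the standard SNS-to-Lambda event shape.

```python
import logging

logger = logging.getLogger(__name__)


def dispatch_budget_alerts(event: dict, send) -> dict:
    """Format each SNS record and hand it to `send` (hypothetical webhook poster)."""
    delivery_attempted = False
    for record in event.get("Records", []):
        sns = record.get("Sns", {})
        subject = sns.get("Subject") or "AWS Budget Alert"
        payload = {"content": f"**{subject}**\n{sns.get('Message', '')}"}
        delivery_attempted = True
        try:
            send(payload)
        except Exception:
            # Contract: log the failure, never raise, still report 200.
            logger.exception("Discord webhook delivery failed")
    return {"status_code": 200, "delivery_attempted": delivery_attempted}
```

Injecting `send` keeps the sketch testable without network access; a real handler would presumably POST the payload to the URL held in DISCORD_WEBHOOK_URL.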

Flow: ec2-state-alert-dispatch, N/A

Type Definitions

Ec2StateAlertInput {
  detail-type: "EC2 Instance State-change Notification"
  detail.state: string
  detail.instance-id: string
}

Ec2StateAlertOutput {
  status_code: 200
  delivery_attempted: boolean
}

Paths

| path-name | input | output/expected state change | path-type | notes | updated |
| --- | --- | --- | --- | --- | --- |
| ec2-state-alert-dispatch.success | Ec2StateAlertInput | Ec2StateAlertOutput status_code=200 delivery_attempted=true | happy path | EC2 state change event is formatted and sent to Discord webhook |  |
| ec2-state-alert-dispatch.unhandled-source | event without source=aws.ec2 | Ec2StateAlertOutput status_code=200 delivery_attempted=false | subpath | lambda logs unhandled source and exits without Discord post |  |
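
The source check in the second path might look like the sketch below. This is illustrative only: `dispatch_ec2_state_alert`, the injected `send` callable, and the message wording are assumptions, while the `source`, `detail.state`, and `detail.instance-id` fields follow the EventBridge event described by Ec2StateAlertInput.

```python
import logging

logger = logging.getLogger(__name__)


def dispatch_ec2_state_alert(event: dict, send) -> dict:
    """Post an EC2 state change to Discord via `send`; skip unknown sources."""
    if event.get("source") != "aws.ec2":
        logger.warning("Unhandled event source: %s", event.get("source"))
        return {"status_code": 200, "delivery_attempted": False}

    detail = event.get("detail", {})
    instance_id = detail.get("instance-id", "unknown")
    state = detail.get("state", "unknown")
    try:
        send({"content": f"EC2 instance {instance_id} is now {state}"})
    except Exception:
        # Same policy as the budget flow: log and keep returning 200.
        logger.exception("Discord webhook delivery failed")
    return {"status_code": 200, "delivery_attempted": True}
```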

Flow: daily-cost-report, N/A

Type Definitions

DailyCostReportInput {
  scheduled_event: cron(0 14 * * ? *)
}

DailyCostReportOutput {
  status_code: 200
  report_body: string
  delivery_attempted: boolean
  warning_sections?: list<"cost" | "instance_status">
}

Paths

| path-name | input | output/expected state change | path-type | notes | updated |
| --- | --- | --- | --- | --- | --- |
| daily-cost-report.success | DailyCostReportInput | DailyCostReportOutput status_code=200 delivery_attempted=true | happy path | Lambda reads Cost Explorer + EC2 and posts summary body to Discord |  |
| daily-cost-report.query-degraded | DailyCostReportInput with Cost Explorer or EC2 API failure | DailyCostReportOutput status_code=200 warning_sections includes failed section | subpath | report body includes warning text and continues for remaining sections |  |
| daily-cost-report.delivery-failure-logged | DailyCostReportInput with webhook network failure | DailyCostReportOutput status_code=200 delivery_attempted=true | subpath | webhook delivery failure is logged but lambda still returns 200 |  |
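
The degraded path implies that each report section is fetched independently, so one failing AWS query does not suppress the other. A minimal sketch of that idea follows; `build_daily_report` and its two fetcher arguments are hypothetical stand-ins for the Cost Explorer and EC2 queries, and the warning wording is invented.

```python
import logging

logger = logging.getLogger(__name__)


def build_daily_report(fetch_cost, fetch_instance_status) -> dict:
    """Assemble the report body, replacing any failed section with a warning line."""
    sections = (("cost", fetch_cost), ("instance_status", fetch_instance_status))
    lines = []
    warning_sections = []
    for name, fetch in sections:
        try:
            lines.append(fetch())
        except Exception as exc:
            logger.exception("Report section %s failed", name)
            warning_sections.append(name)
            lines.append(f"Warning: {name} section unavailable ({exc})")
    return {
        "status_code": 200,
        "report_body": "\n".join(lines),
        "warning_sections": warning_sections,
    }
```

Here `warning_sections` is always returned (possibly empty) for simplicity, although the contract marks it optional.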

Flow: discord-slash-command-interaction, N/A

Type Definitions

DiscordSlashCommandInput {
  interaction_type: number
  command_name?: "help" | "cost" | "status"
  interaction_token?: string
  signature_headers: map<string, string>
}

DiscordSlashCommandOutput {
  status_code: 200 | 400 | 401
  interaction_response_type?: 1 | 4 | 5
  followup_delivery?: boolean
}

Paths

| path-name | input | output/expected state change | path-type | notes | updated |
| --- | --- | --- | --- | --- | --- |
| discord-slash-command.ping | DiscordSlashCommandInput interaction_type=1 | DiscordSlashCommandOutput status_code=200 interaction_response_type=1 | happy path | Discord interaction health check |  |
| discord-slash-command.help | DiscordSlashCommandInput command_name=help | DiscordSlashCommandOutput status_code=200 interaction_response_type=4 | happy path | inline command help message |  |
| discord-slash-command.cost-or-status | DiscordSlashCommandInput command_name=cost\|status with interaction_token | DiscordSlashCommandOutput status_code=200 interaction_response_type=5 followup_delivery=true | happy path | deferred response then async followup PATCH to Discord |  |
| discord-slash-command.missing-token | DiscordSlashCommandInput command_name=cost\|status without interaction_token | DiscordSlashCommandOutput status_code=200 interaction_response_type=4 | error | returns inline error message, no async followup |  |
| discord-slash-command.self-invoke-failure | DiscordSlashCommandInput command_name=cost\|status and lambda invoke fails | DiscordSlashCommandOutput status_code=200 interaction_response_type=4 | error | returns inline failure message, no deferred response |  |
| discord-slash-command.unknown-command | DiscordSlashCommandInput command_name unsupported | DiscordSlashCommandOutput status_code=200 interaction_response_type=4 | subpath | returns Unknown command inline response |  |
| discord-slash-command.invalid-signature | DiscordSlashCommandInput with invalid signature | DiscordSlashCommandOutput status_code=401 | error | request rejected before command routing |  |
| discord-slash-command.unhandled-interaction-type | DiscordSlashCommandInput interaction_type not 1 or 2 | DiscordSlashCommandOutput status_code=400 | error | handler returns Unhandled interaction type body |  |
| discord-slash-command.followup-delivery-failure-logged | Async followup where Discord PATCH fails | DiscordSlashCommandOutput status_code=200 followup_delivery=false | subpath | failure is logged in async path, lambda still returns 200 |  |
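
The synchronous paths in this table reduce to a small routing function. The sketch below is one possible reading of the contract, not the actual handler: `route_interaction`, `_inline`, the `invoke_async` callable, and the message texts other than "Unknown command" and "Unhandled interaction type" are hypothetical. Signature verification is assumed to have happened already and is passed in as a boolean, so no ed25519 library is needed here. The `type`, `data.name`, and `token` keys follow Discord's interaction payload.

```python
def _inline(text: str) -> dict:
    """Immediate channel message (interaction response type 4)."""
    return {"status_code": 200, "body": {"type": 4, "data": {"content": text}}}


def route_interaction(interaction: dict, signature_valid: bool, invoke_async) -> dict:
    """Map a verified (or rejected) interaction to the documented response."""
    if not signature_valid:
        return {"status_code": 401, "body": "invalid request signature"}

    interaction_type = interaction.get("type")
    if interaction_type == 1:  # ping health check -> pong
        return {"status_code": 200, "body": {"type": 1}}
    if interaction_type != 2:  # anything other than a slash command
        return {"status_code": 400, "body": "Unhandled interaction type"}

    name = interaction.get("data", {}).get("name")
    if name == "help":
        return _inline("Commands: /help, /cost, /status")
    if name in ("cost", "status"):
        token = interaction.get("token")
        if not token:
            return _inline("Missing interaction token")
        try:
            # Start the async report build before promising a followup.
            invoke_async({"command": name, "token": token})
        except Exception:
            return _inline("Could not start the report")
        return {"status_code": 200, "body": {"type": 5}}  # deferred response
    return _inline("Unknown command")
```

Calling `invoke_async` before returning type 5 is what lets the self-invoke-failure path answer inline instead of leaving a deferred message that never resolves.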

Flow: discord-support-intake, N/A

Type Definitions

DiscordSupportIntakeInput {
  user_provided_values: DiscordSupportInputs
}

DiscordSupportIntakeOutput {
  completeness: "complete" | "incomplete"
  missing_fields?: list<string>
}

Paths

| path-name | input | output/expected state change | path-type | notes | updated |
| --- | --- | --- | --- | --- | --- |
| discord-support-intake.complete-runtime-fields | DiscordSupportIntakeInput with discord_webhook_url discord_app_id discord_public_key discord_interaction_endpoint_url_configured | DiscordSupportIntakeOutput completeness=complete | happy path | ready for SSM bootstrap and deployment validation |  |
| discord-support-intake.missing-runtime-fields | DiscordSupportIntakeInput missing one or more runtime-required fields | DiscordSupportIntakeOutput completeness=incomplete | error | block deployment until runtime-required fields are present |  |
| discord-support-intake.rollout-fields-optional | DiscordSupportIntakeInput without command_scope or test_guild_id | DiscordSupportIntakeOutput completeness=complete | subpath | rollout strategy fields are optional and not consumed by runtime code |  |
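
A completeness check over the intake values could be sketched as follows. `check_intake` is a hypothetical helper, and it makes one assumption the plan does not settle: a field counts as provided when it is neither absent nor an empty string, so an explicit `False` for discord_interaction_endpoint_url_configured is treated as an answered field.

```python
RUNTIME_REQUIRED_FIELDS = (
    "discord_webhook_url",
    "discord_app_id",
    "discord_public_key",
    "discord_interaction_endpoint_url_configured",
)


def check_intake(user_provided_values: dict) -> dict:
    """Return a DiscordSupportIntakeOutput-shaped dict.

    Assumption: None or "" means missing; rollout fields are ignored.
    """
    missing = [
        field
        for field in RUNTIME_REQUIRED_FIELDS
        if user_provided_values.get(field) in (None, "")
    ]
    if missing:
        return {"completeness": "incomplete", "missing_fields": missing}
    return {"completeness": "complete"}
```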

3. Acceptance Criteria and Plan-First Tests

| test-name | input | pass criteria | fail criteria | updated |
| --- | --- | --- | --- | --- |
| discord-inputs-complete-before-apply | Completed in-plan Discord intake template in aws-cost-alerting.md | All required values are present and mapped to SSM parameter names | Any required field is missing or marked unknown |  |
| discord-interaction-package-built-before-apply | main/devops/lambdas/discord_interaction/build.sh executed before terraform apply | main/devops/.build/discord_interaction_pkg exists and terraform archive step succeeds | terraform archive step fails because interaction package folder is missing |  |
| terraform-deploy-cost-alerting-infra | Terraform apply for main/devops/cost_alerts.tf with required SSM inputs present | Apply completes and creates SNS, EventBridge rules, Lambdas, budget notifications, and interaction URL output | Apply fails or required resources/outputs are missing |  |
| terraform-deploy-missing-ssm-fails-fast | Terraform apply with missing /encache/discord/webhook_url or /encache/discord/app_id or /encache/discord/public_key | Deployment is blocked and missing-input error is surfaced | Deployment succeeds with unresolved/missing Discord config |  |
| discord-interactions-endpoint-wired | discord_interaction_url output configured in Discord Developer Portal | /help, /cost, and /status are routable to interaction lambda | Commands remain unavailable or route to wrong endpoint |  |
| budget-alert-dispatch-to-discord | Budget threshold notification published to encache-cost-alerts SNS topic | Alert appears in the configured Discord channel webhook | No Discord message appears or lambda logs delivery failure |  |
| daily-cost-report-to-discord | Scheduled or manual invocation of encache-cost-report | Daily report message appears in Discord; warnings may appear inline when AWS queries partially fail | Missing report message |  |
| discord-command-cost-success | Valid /cost interaction payload with valid signature | Deferred response is returned and followup report is posted | Signature fails, deferred response missing, or followup not delivered |  |
| discord-command-missing-token | Valid signed /cost or /status payload with missing token | Inline type 4 error response is returned with status 200 | Handler crashes or returns deferred response without token |  |
| discord-command-unknown-command | Valid signed payload with unknown command name | Inline type 4 Unknown command response is returned with status 200 | Handler returns 400/500 instead of inline unknown command response |  |
| discord-command-invalid-signature-rejected | Interaction payload with invalid signature headers | Request returns 401 and no followup is sent | Request is accepted despite invalid signature |  |

Operational prerequisites:

  1. Fill the in-plan Discord intake template with values.
  2. Run main/devops/lambdas/discord_interaction/build.sh before terraform apply.
  3. Confirm target Discord channel for webhook alerts and reports.
  4. Optionally define command rollout strategy (global vs guild) in the intake template.

4. Pseudocode / Technical Details for Critical Flows (Optional)

Deployment Prerequisites (Technical)

Discord Inputs Required From User

Before implementation or terraform apply, complete this in-plan Discord intake section.

| required-item | used-by | source in Discord | value | updated |
| --- | --- | --- | --- | --- |
| discord_webhook_url | main/devops/lambdas/discord_alerts/handler.py; main/devops/lambdas/discord_alerts/cost_report.py | Channel settings -> Integrations -> Webhooks | TBD by user |  |
| discord_app_id | main/devops/lambdas/discord_interaction/handler.py | Discord Developer Portal -> General Information -> Application ID | TBD by user |  |
| discord_public_key | main/devops/lambdas/discord_interaction/handler.py | Discord Developer Portal -> General Information -> Public Key | TBD by user |  |
| discord_interaction_endpoint_url_configured | Discord slash command routing to encache-discord-interaction function URL | Discord Developer Portal -> Interactions Endpoint URL | TBD by user |  |

Operational rollout inputs not consumed directly by current runtime:

| optional-item | purpose | value | updated |
| --- | --- | --- | --- |
| discord_command_scope | command rollout strategy only (global or guild) | TBD by user |  |
| discord_test_guild_id | faster testing iteration for guild-scoped command registration | optional; recommended |  |

Discord Intake Template

Do not commit real secrets. Store secret values in AWS SSM Parameter Store and keep this plan table as status-only.

| field | required | description | example | value |
| --- | --- | --- | --- | --- |
| discord_webhook_url | yes | Incoming webhook URL for the alert/report channel | https://discord.com/api/webhooks/... | TBD |
| discord_app_id | yes | Discord Application ID used by interaction followup API | 123456789012345678 | TBD |
| discord_public_key | yes | Discord public key used to verify interaction signatures | hex string | TBD |
| discord_interaction_endpoint_url_configured | yes | Whether discord_interaction_url Terraform output has been set in Discord Developer Portal | true | TBD |
| discord_command_scope | optional | Slash command rollout scope | global or guild | TBD |
| discord_test_guild_id | optional | Guild ID for fast command validation | 123456789012345678 | TBD |
| discord_alert_channel_name | yes | Human-readable channel for alerts/reports | aws-alerts | TBD |

Discord Setup Checklist

  • [ ] Confirm the Discord channel where budget alerts and daily reports should be posted.
  • [ ] Create or select incoming webhook for that channel.
  • [ ] Create or verify the Discord application for slash commands.
  • [ ] Capture Application ID and Public Key from Discord Developer Portal.
  • [ ] Set Discord Interactions Endpoint URL to Terraform output discord_interaction_url.
  • [ ] Optionally decide and record command scope (global or guild).
  • [ ] If using guild scope, optionally provide discord_test_guild_id.

AWS SSM Parameter Mapping

aws ssm put-parameter \
  --name /encache/discord/webhook_url \
  --value "<discord_webhook_url>" \
  --type SecureString \
  --overwrite \
  --region us-east-1

aws ssm put-parameter \
  --name /encache/discord/app_id \
  --value "<discord_app_id>" \
  --type String \
  --overwrite \
  --region us-east-1

aws ssm put-parameter \
  --name /encache/discord/public_key \
  --value "<discord_public_key>" \
  --type String \
  --overwrite \
  --region us-east-1

Discord Intake Sign-off

| owner | date | notes |
| --- | --- | --- |
| TBD | TBD | TBD |

Implementation Prerequisites

  • Required SSM parameters must exist in AWS before deployment:
      • /encache/discord/webhook_url
      • /encache/discord/app_id
      • /encache/discord/public_key
  • Required build artifact must exist before terraform apply:
      • run main/devops/lambdas/discord_interaction/build.sh to populate main/devops/.build/discord_interaction_pkg
  • Required owner sign-off for Discord intake must be present before deployment validation.

  • Flow name: alert-pipeline
    receive budget/scheduled/state event
    transform event into Discord payload
    read webhook URL from DISCORD_WEBHOOK_URL env value (sourced from SSM)
    POST payload to Discord webhook
    on delivery failure, log the error and still return 200

  • Flow name: slash-command-pipeline
    receive Discord interaction HTTP payload
    verify ed25519 signature with DISCORD_PUBLIC_KEY; return 401 if invalid
    if interaction is a ping then return type 1 response
    if command is help then return immediate type 4 response
    if command is cost or status then self invoke lambda async and return deferred type 5 response
    in the async invocation: build AWS report, PATCH deferred Discord message

  • Flow name: infrastructure-deployment-pipeline
    validate required Discord inputs from in-plan intake template
    write Discord inputs to required SSM parameter names
    run main/devops/lambdas/discord_interaction/build.sh
    run terraform apply for main/devops/cost_alerts.tf
    read discord_interaction_url output
    set Discord interactions endpoint URL in Discord Developer Portal
    run command and webhook smoke tests
    
  • Implementation notes: Runtime readiness requires completed Discord intake values and the interaction lambda build artifact.

After all stages are approved, apply .agent/skills/reconcile-plans/SKILL.md to propagate contract updates across linked plans.