AWS CloudWatch + AWS Lambda

Automate Cloud Operations by Integrating AWS CloudWatch with AWS Lambda

Turn real-time monitoring alerts into automated actions without writing custom infrastructure code.

Why integrate AWS CloudWatch and AWS Lambda?

AWS CloudWatch and AWS Lambda are two of the most useful services in the AWS ecosystem, and together they're the engine behind most event-driven cloud operations. CloudWatch continuously monitors metrics, logs, and alarms across your AWS infrastructure, while Lambda executes serverless functions in response to virtually any trigger. By integrating these two services through tray.ai, teams can build automated workflows that detect anomalies, respond to threshold breaches, and coordinate cross-service remediation — all without manual intervention.

Automate & integrate AWS CloudWatch & AWS Lambda

Use case

Automated Infrastructure Remediation on CloudWatch Alarms

When a CloudWatch alarm detects a critical threshold breach — CPU utilization exceeding 90%, memory running low — tray.ai can automatically trigger a Lambda function to fix the issue. That might mean restarting unhealthy EC2 instances, scaling out an Auto Scaling group, or flushing a clogged queue. The entire remediation loop runs in seconds without anyone getting paged.
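
As a rough illustration of the remediation step, the sketch below shows a Lambda handler that maps an alarm to a remediation action. The payload shape (`alarmName`, `state`), the alarm-name prefixes, and the action names are all assumptions made for this example; the real input depends on how the workflow passes alarm context to the function.

```python
# Hypothetical mapping from alarm-name prefix to a remediation action.
REMEDIATIONS = {
    "cpu-high": "scale_out_asg",
    "instance-unhealthy": "restart_instance",
    "queue-backlog": "purge_queue",
}


def choose_remediation(alarm_name, state):
    """Return the action for an alarm, or None if nothing should run."""
    if state != "ALARM":
        return None  # ignore OK and INSUFFICIENT_DATA transitions
    for prefix, action in REMEDIATIONS.items():
        if alarm_name.startswith(prefix):
            return action
    return None


def lambda_handler(event, context):
    # Assumed payload: {"alarmName": "...", "state": "ALARM"}
    alarm_name = event.get("alarmName", "")
    action = choose_remediation(alarm_name, event.get("state", ""))
    # A real function would call the relevant AWS API here; this sketch only
    # reports the decision so the workflow can log it or act on it.
    return {"alarmName": alarm_name, "action": action}
```

Returning the chosen action, rather than acting silently, gives the calling workflow something concrete to include in a notification.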

Use case

Real-Time Log Anomaly Detection and Alerting

CloudWatch Logs Insights queries can surface anomalous patterns, such as repeated 5xx error codes, authentication failures, or unexpected null responses, and tray.ai can route those findings to a Lambda function that enriches the event with additional context before sending notifications. The right team gets a fully contextualized alert rather than a raw log dump, so engineering, security, and operations teams each receive actionable, relevant information as soon as the pattern is detected.
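
The enrichment step might start with something like the sketch below. It assumes the function receives the native CloudWatch Logs subscription payload, which arrives base64-encoded and gzip-compressed under `awslogs.data`; if the workflow forwards a different shape, the decoding step would change. The two regular expressions are hypothetical and would need adapting to your own log format.

```python
import base64
import gzip
import json
import re

# Hypothetical patterns; adjust to your own log format.
PATTERNS = {
    "server_error": re.compile(r"\b5\d{2}\b"),
    "auth_failure": re.compile(r"authentication failed", re.IGNORECASE),
}


def decode_subscription_event(event):
    """Decode the base64-encoded, gzip-compressed CloudWatch Logs payload."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def summarize(event):
    """Count pattern matches so a notification carries context, not raw logs."""
    payload = decode_subscription_event(event)
    counts = {name: 0 for name in PATTERNS}
    for log_event in payload["logEvents"]:
        for name, pattern in PATTERNS.items():
            if pattern.search(log_event["message"]):
                counts[name] += 1
    return {"logGroup": payload["logGroup"], "counts": counts}
```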

Use case

Scheduled Lambda Invocations Triggered by CloudWatch Events

CloudWatch Events (now Amazon EventBridge) supports cron-style scheduling that can invoke Lambda functions at precise intervals for recurring operational tasks. Using tray.ai, teams can manage and extend these scheduled workflows — running nightly database cleanup functions, generating periodic cost reports, or invoking data transformation pipelines. tray.ai adds orchestration logic on top of native scheduling, so you can include conditional branching and downstream notifications.
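
For reference, EventBridge cron expressions use six fields (minutes, hours, day-of-month, month, day-of-week, year), and one of the two day fields must be `?`. The helper names below are made up for this sketch; they only build the expression strings and do not create any rules.

```python
def daily_cron(hour_utc, minute=0):
    """Build an EventBridge cron expression that fires once a day (UTC)."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Fields: minutes hours day-of-month month day-of-week year.
    # Day-of-week is "?" because day-of-month is already given as "*".
    return f"cron({minute} {hour_utc} * * ? *)"


def weekday_cron(hour_utc, minute=0):
    """Same schedule, restricted to Monday through Friday."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Here day-of-month is "?" because day-of-week carries the constraint.
    return f"cron({minute} {hour_utc} ? * MON-FRI *)"
```

For example, `daily_cron(2)` describes a nightly 02:00 UTC cleanup run, and `weekday_cron(8, 30)` a report generated at 08:30 UTC on working days.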

Use case

Cross-Service Incident Escalation Workflows

When CloudWatch detects a service degradation event, tray.ai can trigger a Lambda function that simultaneously creates a PagerDuty incident, posts a message to a designated Slack channel, and opens a ticket in Jira — all from a single alarm. No incident goes unnoticed and the right stakeholders are notified through their preferred channels. The workflow can also include conditional logic to escalate differently based on alarm severity.
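
The severity-based branching could be as simple as a lookup table. In the sketch below, the routing table and the convention of prefixing alarm names with a severity (for example `critical-api-latency`) are assumptions for illustration, not something CloudWatch or tray.ai prescribes.

```python
# Hypothetical routing table: which channels are notified at each severity.
ROUTES = {
    "critical": ["pagerduty", "slack", "jira"],
    "high": ["slack", "jira"],
    "low": ["jira"],
}


def severity_from_alarm(alarm_name):
    """Derive severity from an assumed naming convention such as
    'critical-api-latency'; unknown prefixes fall back to 'low'."""
    prefix = alarm_name.split("-", 1)[0].lower()
    return prefix if prefix in ROUTES else "low"


def plan_escalation(alarm_name):
    """Return the channels to notify for this alarm."""
    return ROUTES[severity_from_alarm(alarm_name)]
```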

Use case

Serverless Cost Monitoring and Budget Enforcement

CloudWatch metrics expose detailed Lambda invocation counts, execution durations, throttle rates, and error rates that, when analyzed together, reveal cost anomalies and runaway functions. tray.ai can build workflows that monitor these metrics against budget thresholds and automatically trigger a Lambda function to disable or throttle a specific function if it exceeds its allocated cost envelope. Finance and engineering teams get automated reports when budget guardrails kick in.
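
To make the budget check concrete, here is a minimal sketch of a cost estimate built from invocation count and average duration. Lambda charges per request and per GB-second of compute; the two price parameters are deliberately left as inputs because rates vary by region and architecture, and the sketch ignores the free tier and billing-granularity rounding.

```python
def estimate_cost(invocations, avg_duration_ms, memory_mb,
                  price_per_request, price_per_gb_second):
    """Estimate cost as request charges plus compute (GB-second) charges."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return invocations * price_per_request + gb_seconds * price_per_gb_second


def over_budget(estimated_cost, budget):
    """True when the estimate exceeds the function's allocated budget."""
    return estimated_cost > budget
```

The invocation count and duration would come from the `Invocations` and `Duration` metrics in the `AWS/Lambda` namespace; the configured memory size comes from the function's configuration rather than from CloudWatch.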

Use case

Automated Log Archiving and Compliance Reporting

CloudWatch Log Groups accumulate large volumes of operational and application logs that must be retained for compliance purposes. tray.ai can schedule a Lambda function to export specific log groups to S3 on a defined cadence, apply lifecycle policies, and trigger downstream notifications to compliance tools. This replaces fragile custom scripts with a managed, observable automation that works with your existing data retention policies.
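
As a sketch of what the export function might compute, the helper below builds the arguments for one UTC day of logs. The keys follow the parameter names of the CloudWatch Logs CreateExportTask operation as exposed by boto3's `create_export_task` (`fromTime` and `to` are epoch milliseconds); the task-name and prefix conventions are assumptions, and the actual API call is left out so the example runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def export_task_params(log_group, bucket, day):
    """Build keyword arguments for exporting one UTC day of a log group to S3."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    name = log_group.strip("/")
    return {
        "taskName": f"export-{name.replace('/', '-')}-{day:%Y-%m-%d}",
        "logGroupName": log_group,
        "fromTime": int(start.timestamp() * 1000),  # epoch milliseconds
        "to": int(end.timestamp() * 1000),
        "destination": bucket,  # S3 bucket name
        "destinationPrefix": f"{name}/{day:%Y/%m/%d}",
    }
```

A date-based prefix keeps each night's export in its own S3 path, which makes lifecycle rules and audits easier to reason about.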

Use case

Dynamic Auto-Scaling Triggered by Custom CloudWatch Metrics

Beyond native AWS auto-scaling policies, teams often need custom scaling logic based on application-specific metrics published to CloudWatch — queue depth, active user sessions, business transaction volume. tray.ai can monitor these custom metrics and invoke Lambda functions that execute complex scaling decisions, modify capacity reservations, or interact with third-party infrastructure tools. You get far more sophisticated scaling strategies than native policies allow.
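
A custom scaling rule based on queue depth might look like the following minimal sketch. The function name, the per-worker throughput figure, and the bounds are all hypothetical inputs; a real function would read the queue depth from the custom CloudWatch metric and then apply the result to the scaling target.

```python
import math


def desired_capacity(queue_depth, msgs_per_worker, min_workers, max_workers):
    """Size the fleet so each worker handles at most msgs_per_worker messages,
    clamped to the allowed range."""
    needed = math.ceil(queue_depth / msgs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

Clamping to a minimum and maximum keeps a metric spike or a bad datapoint from scaling the fleet to zero or beyond what the budget allows.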

AWS CloudWatch & AWS Lambda Challenges

What challenges arise when working with AWS CloudWatch & AWS Lambda, and how does tray.ai help?

Challenge

Managing Event Payload Complexity Between CloudWatch and Lambda

CloudWatch alarm events, log subscription filter events, and scheduled events all have distinct JSON payload structures that need careful mapping before they can be used as Lambda function inputs. Teams often spend significant time writing and maintaining transformation logic to normalize these payloads across different event sources.

How tray.ai Can Help:

tray.ai's visual data mapping interface lets teams inspect, transform, and normalize CloudWatch event payloads into the exact structure a Lambda function expects — no custom transformation code required. When schemas change, updates can be made visually and take effect immediately across all affected workflows.
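
For context, the hand-written normalization code this challenge refers to might resemble the sketch below. It assumes the events arrive in their native shapes: alarm state changes and scheduled events in the EventBridge envelope (distinguished by `detail-type`), and log subscription events wrapped under `awslogs`. The output record's field names are made up for this example.

```python
def normalize(event):
    """Map three native event shapes onto one record: kind, name, time."""
    if "awslogs" in event:
        # Log subscription events: the body is base64 + gzip in awslogs.data
        # and must be decoded before individual log lines can be read.
        return {"kind": "logs", "name": None, "time": None}
    detail_type = event.get("detail-type")
    if detail_type == "CloudWatch Alarm State Change":
        return {
            "kind": "alarm",
            "name": event["detail"]["alarmName"],
            "time": event["time"],
        }
    if detail_type == "Scheduled Event":
        # resources[0] is the ARN of the rule that fired.
        return {
            "kind": "schedule",
            "name": event["resources"][0],
            "time": event["time"],
        }
    raise ValueError("unrecognized event shape")
```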

Challenge

Handling Lambda Execution Timeouts and Retry Logic

Lambda functions invoked by CloudWatch events may time out, throw errors, or need retry logic with exponential backoff — especially during infrastructure incidents when dependent services are themselves degraded. Without proper retry handling, critical remediation functions can silently fail at exactly the wrong moment.

How tray.ai Can Help:

tray.ai provides built-in error handling, retry configuration, and dead-letter queue routing at the workflow level, so Lambda invocation failures are captured, retried on a configurable schedule, and escalated to the appropriate team if they exceed the maximum retry count.
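
To show what retry with exponential backoff means in practice, here is a generic sketch. It is not tray.ai's implementation; `invoke` stands for any callable that performs the Lambda invocation, and the `sleep` parameter is injectable only so the example can be exercised without real delays.

```python
import time


def invoke_with_retry(invoke, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call invoke() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller escalate
            # Waits of base_delay, 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

With the defaults, a call that keeps failing waits 1, 2, and then 4 seconds before the fourth and final attempt raises the error to the caller.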

Challenge

Cross-Account and Cross-Region CloudWatch Event Routing

Large enterprises often operate multiple AWS accounts and regions, making it difficult to build centralized workflows that respond to CloudWatch alarms from different organizational units without complex cross-account IAM configurations and event bus routing rules.

How tray.ai Can Help:

tray.ai can connect to multiple AWS accounts and regions simultaneously using distinct credential sets, so you can run centralized workflow orchestration that spans CloudWatch sources and Lambda execution targets across your entire AWS organization without custom cross-account plumbing.

Challenge

Avoiding Runaway Recursive Lambda Invocations

A poorly configured workflow can create recursive loops where a Lambda function writes to a CloudWatch log, which triggers a metric filter alarm, which invokes the same Lambda function again. Left unchecked, the loop can run up rapidly escalating AWS charges and cause service disruptions within minutes.

How tray.ai Can Help:

tray.ai's workflow engine includes loop detection, execution rate limiting, and configurable cooldown periods between alarm-triggered invocations, giving teams guardrails that catch recursive execution patterns before they turn into costly incidents.
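
The idea behind a cooldown period can be sketched in a few lines. This is a generic illustration, not tray.ai's engine: the class name is hypothetical, and state is held in memory, so a real guard would need a shared store to persist across separate executions.

```python
class CooldownGuard:
    """Allow at most one invocation per alarm within a cooldown window."""

    def __init__(self, cooldown_seconds):
        self.cooldown_seconds = cooldown_seconds
        self._last_run = {}  # alarm name -> time of last allowed invocation

    def allow(self, alarm_name, now):
        """Return True if the alarm may trigger an invocation at time `now`
        (seconds), recording the invocation when it is allowed."""
        last = self._last_run.get(alarm_name)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down: suppress this invocation
        self._last_run[alarm_name] = now
        return True
```

Because a suppressed call does not reset the timer, a loop that fires continuously is capped at one invocation per window instead of being blocked forever.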

Challenge

Maintaining Observability Over Automated Remediation Actions

When Lambda functions are automatically invoked by CloudWatch alarms to perform remediation, it can be hard to maintain a clear audit trail of exactly what actions were taken, when, and against which resources — which makes post-incident reviews and compliance audits painful.

How tray.ai Can Help:

tray.ai logs every workflow execution with full input/output data, timestamps, and execution status. That gives you a complete, queryable audit trail of every CloudWatch-triggered Lambda invocation, the remediation actions taken, and the outcomes returned — independent of AWS CloudTrail.

Start using our pre-built AWS CloudWatch & AWS Lambda templates today

Start from scratch or use one of our pre-built AWS CloudWatch & AWS Lambda templates to quickly solve your most common use cases.

AWS CloudWatch & AWS Lambda Templates

Find pre-built AWS CloudWatch & AWS Lambda solutions for common use cases

Template

CloudWatch Alarm → Lambda Remediation → Slack Notification

This template monitors a CloudWatch alarm for a defined threshold breach, invokes a Lambda function to run automated remediation steps, and sends a detailed Slack notification confirming what happened — including the alarm name, breach value, and remediation outcome.

Steps:

  • Trigger workflow when a CloudWatch alarm transitions to ALARM state
  • Invoke the designated Lambda remediation function with alarm context as payload
  • Parse the Lambda execution response and post a structured summary to a Slack channel

Connectors Used: AWS CloudWatch, AWS Lambda

Template

Scheduled CloudWatch Metrics Report via Lambda to Email

This template uses a CloudWatch Events schedule to invoke a Lambda function at defined intervals, retrieve infrastructure metrics, and generate a formatted summary report delivered via email to engineering and operations stakeholders.

Steps:

  • Trigger workflow on a CloudWatch Events cron schedule (e.g., daily at 8 AM UTC)
  • Invoke Lambda function to query CloudWatch metrics and aggregate data into a report payload
  • Format the report and send it via email or push it to a Confluence page for team visibility

Connectors Used: AWS CloudWatch, AWS Lambda

Template

CloudWatch Log Anomaly → Lambda Enrichment → PagerDuty Incident

This template watches for CloudWatch Logs metric filter matches indicating an anomaly, triggers a Lambda function to enrich the event with additional AWS resource context, and automatically creates a PagerDuty incident with full details pre-populated.

Steps:

  • Detect CloudWatch Logs metric filter alarm triggered by a specific log pattern
  • Invoke Lambda function to retrieve additional context from AWS APIs (e.g., resource tags, recent deployments)
  • Create a PagerDuty incident with enriched details including affected service, severity, and remediation runbook link

Connectors Used: AWS CloudWatch, AWS Lambda

Template

Lambda Error Rate Spike → CloudWatch Alarm → Jira Incident Ticket

This template monitors the CloudWatch Lambda error rate metric and, when a spike crosses a configurable threshold, automatically creates a Jira incident ticket with the function name, error count, time window, and a link to the relevant CloudWatch log group.

Steps:

  • CloudWatch alarm triggers when Lambda error rate exceeds the defined threshold percentage
  • tray.ai retrieves detailed Lambda execution metrics and recent error log samples from CloudWatch Logs
  • Create a Jira issue in the incident tracking project with pre-populated fields and log context attached

Connectors Used: AWS CloudWatch, AWS Lambda
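
The threshold check in this template comes down to simple arithmetic on the `Errors` and `Invocations` metrics that Lambda publishes to CloudWatch. The sketch below is one way to express it; the minimum-invocation floor is an added assumption, included so that one failure out of two calls does not open a ticket.

```python
def error_rate_percent(errors, invocations):
    """Error rate as a percentage; zero invocations counts as 0%."""
    if invocations == 0:
        return 0.0
    return 100.0 * errors / invocations


def should_open_ticket(errors, invocations, threshold_percent, min_invocations=20):
    """Open a ticket only when traffic is meaningful and the rate is too high."""
    if invocations < min_invocations:
        return False
    return error_rate_percent(errors, invocations) > threshold_percent
```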

Template

CloudWatch Budget Alarm → Lambda Throttle Function → Finance Notification

This template watches for CloudWatch billing alarms that signal Lambda cost overruns, invokes a Lambda function to apply concurrency throttling to the offending function, and notifies finance and engineering teams with a cost summary report.

Steps:

  • CloudWatch billing alarm triggers when Lambda invocation costs exceed the budget threshold
  • Invoke Lambda management function to reduce reserved concurrency on the identified over-spending function
  • Send cost summary and throttle action confirmation to finance team via email and engineering team via Slack

Connectors Used: AWS CloudWatch, AWS Lambda
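
The throttle step needs a rule for choosing the new concurrency limit. The sketch below shows one possible rule with a hypothetical helper name and default reduction factor; the resulting value would then be applied through Lambda's reserved concurrency setting (in boto3, `put_function_concurrency` with `ReservedConcurrentExecutions`), which is not called here.

```python
def throttled_concurrency(current_limit, reduction_factor=0.5, floor=1):
    """Return a lower reserved-concurrency value, never below `floor`.

    A floor of at least 1 matters because a reserved concurrency of 0
    stops the function from being invoked at all.
    """
    if not 0 < reduction_factor <= 1:
        raise ValueError("reduction_factor must be in (0, 1]")
    return max(floor, int(current_limit * reduction_factor))
```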

Template

Nightly CloudWatch Log Export → Lambda Archiver → S3 Compliance Store

This template runs on a nightly CloudWatch Events schedule to invoke a Lambda function that exports designated CloudWatch Log Groups to an S3 bucket, applies retention tagging, and sends a completion notification to the compliance team.

Steps:

  • CloudWatch Events cron trigger fires nightly to initiate the log archiving workflow
  • Invoke Lambda archiver function to export specified CloudWatch Log Groups to the designated S3 compliance bucket
  • Send archive completion summary including log groups processed, file sizes, and S3 paths to the compliance team via email

Connectors Used: AWS CloudWatch, AWS Lambda