Detailed postmortem of the outage on November 18, 2025.
At Resend, we're accountable for delivering a reliable and resilient service, regardless of where an incident occurs.
On November 18, our customers experienced a service outage impacting their sending from 11:30 UTC to 14:31 UTC due to a wider Cloudflare outage.
All API traffic to Resend is proxied through Cloudflare. Requests enter Cloudflare's network and are then forwarded to our origin infrastructure in AWS. During this incident, this routing path failed, preventing requests from reaching our infrastructure.
We take full accountability for ensuring our services remain available and are resilient to failures, and this incident did not meet that expectation.
The Cloudflare outage directly affected Resend's API, Email API, SMTP and Dashboard. Because Cloudflare is the entry point for all traffic to api.resend.com, the incident prevented requests from reaching our infrastructure.
Cloudflare returned HTTP 500 errors at the edge before traffic could be routed to our AWS origins. As a result, no requests reached our backend services and all API calls failed. Our SMTP service was also impacted due to its dependency on api.resend.com.
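For illustration, this is roughly how an edge-generated failure can be distinguished from an origin failure when debugging from the outside: Cloudflare-proxied responses typically carry a `cf-ray` header and `server: cloudflare`, so a 500 with those headers but no matching request in the ALB access logs points at the edge layer. The probe below is a minimal sketch, not part of our actual tooling.

```ts
// Hypothetical probe, for illustration only: inspect a failing response's
// headers to see whether it came through (or from) Cloudflare.
async function probe(url: string): Promise<void> {
  const res = await fetch(url, { method: "GET" });
  const servedBy = res.headers.get("server") ?? "unknown";
  const cfRay = res.headers.get("cf-ray");

  if (res.status >= 500 && cfRay !== null) {
    // The response was proxied or generated by Cloudflare. Cross-checking ALB
    // access logs for a matching request shows whether it ever reached the origin.
    console.log(`5xx via Cloudflare: status=${res.status}, server=${servedBy}, cf-ray=${cfRay}`);
  } else {
    console.log(`status=${res.status}, server=${servedBy}`);
  }
}

probe("https://api.resend.com/").catch(console.error);
```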
Our team was first alerted at 11:31 UTC to intermittent requests failing with a 500 response error, originating from the Tokyo (ap-northeast-1) region. While monitoring and investigating the issue, we noticed that monitors for other regions (us-east-1, sa-east-1, and eu-west-1) started to trigger intermittently with the same behavior. At 12:14 UTC, we started working on a workaround to bypass Cloudflare in an attempt to mitigate the incident and restore services.
api.resend.com started returning Internal Server Error responses.
The ALB traffic shows that Cloudflare connectivity was intermittent at first, but then failed for 1.5 hours.
Datadog monitors for our APIs also showed a similar pattern.
When a request is made, it reaches Cloudflare, is processed at the edge, and is then routed using Cloudflare Load Balancing, which forwards the traffic to the appropriate origin based on HTTP method and path. These origins are AWS Application Load Balancers (ALBs) that route traffic to APIs hosted on ECS.
The traffic is split over two internal APIs: the Email API, which serves the /emails routes, and the Resend API, which serves everything else. Cloudflare Load Balancing is used to send all POST /emails requests to the Email API and the rest to the Resend API.
This load-balancing requirement made changing the entry point of our APIs more difficult than a single DNS entry to a different provider.
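To make the split concrete, the origin-selection rule described above boils down to something like the following sketch. The hostnames are placeholders, and the path matching is simplified for illustration.

```ts
// Hypothetical ALB hostnames, for illustration only.
const EMAIL_API_ALB = "email-api-alb.internal.example.com";
const RESEND_API_ALB = "resend-api-alb.internal.example.com";

// The edge routing rule: POST /emails goes to the Email API,
// everything else goes to the Resend API.
function selectOrigin(method: string, path: string): string {
  if (method === "POST" && path.startsWith("/emails")) {
    return EMAIL_API_ALB;
  }
  return RESEND_API_ALB;
}

// Example: selectOrigin("POST", "/emails") -> Email API ALB
//          selectOrigin("GET", "/domains") -> Resend API ALB
```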
After identifying that it was an issue with Cloudflare, we assessed all the services impacted, created a public status page and started working towards a resolution.
We determined that we needed to work on a fix for our API entry points into Cloudflare and restore the critical email-sending path first. We chose to move the API load balancing that was happening on Cloudflare to AWS CloudFront, since we already had the architecture and knowledge required to do so.
We took longer than we should have to deploy and test the solution. Various small issues delayed the fix, such as figuring out CloudFront Function dynamic forwarding to AWS ALBs and header whitelisting. Once we validated that the approach worked in the development environment, the IaC change was reviewed and deployed to production.
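As a rough sketch of what that dynamic forwarding can look like: CloudFront Functions execute JavaScript, so the example below sticks to JS-compatible syntax and assumes the cloudfront-js-2.0 runtime with its cf.updateRequestOrigin() helper for per-request origin selection. The real function, origins, and whitelisted headers differ from what is shown here.

```ts
// Hypothetical CloudFront Function sketch (cloudfront-js-2.0 runtime).
// Hostnames are placeholders, not our real origins.
import cf from "cloudfront";

const EMAIL_API_ALB = "email-api-alb.example.com";
const RESEND_API_ALB = "resend-api-alb.example.com";

function handler(event) {
  const request = event.request;

  // Reproduce the routing rule previously handled by Cloudflare Load Balancing:
  // POST /emails -> Email API, everything else -> Resend API.
  const target =
    request.method === "POST" && request.uri.startsWith("/emails")
      ? EMAIL_API_ALB
      : RESEND_API_ALB;

  // Point this request at the chosen ALB instead of the distribution's
  // default origin (assumes the runtime's origin-modification support).
  cf.updateRequestOrigin({ domainName: target });

  return request;
}
```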
We began testing and validated the critical endpoints. As we finished validating those key endpoints, we noticed that traffic through Cloudflare was starting to return to normal. We decided not to make the switch, to reduce the number of unknown variables at play.
We did not switch traffic over to the CloudFront fallback, but the runbook was created. If the incident were to recur, we could switch to the fallback within 60 seconds. We continued to monitor and then closed the status page.
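For context on what a sub-60-second switch can look like: one common pattern is keeping the API hostname at a low TTL and flipping its CNAME target via the DNS provider's API. The sketch below uses Route 53 purely as an example of that pattern; it is not necessarily the provider or mechanism in our runbook, and the zone ID, hostname, and distribution domain are placeholders.

```ts
// Hypothetical failover step: repoint the API hostname at the CloudFront
// distribution. Illustrative only.
import {
  Route53Client,
  ChangeResourceRecordSetsCommand,
} from "@aws-sdk/client-route-53";

const client = new Route53Client({});

async function failoverToCloudFront(): Promise<void> {
  await client.send(
    new ChangeResourceRecordSetsCommand({
      HostedZoneId: "Z0000000000000EXAMPLE",
      ChangeBatch: {
        Comment: "Incident fallback: bypass Cloudflare, serve API via CloudFront",
        Changes: [
          {
            Action: "UPSERT",
            ResourceRecordSet: {
              Name: "api.example.com",
              Type: "CNAME",
              TTL: 60, // low TTL so the flip takes effect within roughly a minute
              ResourceRecords: [{ Value: "d111111abcdef8.cloudfront.net" }],
            },
          },
        ],
      },
    })
  );
}

failoverToCloudFront().catch(console.error);
```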
To prevent an outage like this from happening again, we're making long-term improvements to our infrastructure.
We're making changes so that provider outages don't translate into Resend outages.
We identified gaps in alerting and escalation that slowed our response.
Hundreds of thousands of developers trust our email infrastructure every day. We see this kind of outage as unacceptable and apologize for the impact it had on our users. Thank you for your trust and for your patience.