Updating Rate Limiting

As your partners in engineering, our primary commitment is to provide a stable, reliable, and performant platform for you to build on and integrate with. We know that your business relies on our APIs, and a disruption on our end is a disruption for you.

The primary cause of incidents is rarely malicious intent; far more often it is an accidental runaway script or a bug in an integration that requests large datasets at a rapid rate. Left unchecked, such traffic can overwhelm our infrastructure and degrade service for every user in the ecosystem.

To mitigate these unwanted spikes and safeguard our platform, and consequently your applications, we are implementing a new, more intelligent stability layer.

Introducing Token Buckets

You may be familiar with traditional rate limiting, which simply counts the number of requests per second. That approach does not suit a complex system where some operations are far more involved than others. On our platform, some API calls are 100x “cheaper” in terms of the resources our systems need to process them. A simple limit (e.g., 50 requests/second) would unnecessarily throttle partners making many cheap requests while still allowing excessive system load from partners making expensive ones.

Our new system is more intelligent. Instead of limiting requests, we limit cost. Here’s how it works:

  • The Bucket:
    Your application will receive a bucket for each agreement that it serves, containing a specific number of tokens (e.g., 2,000 tokens). This is your “burst capacity” per agreement.
  • The Refill:
    Each bucket is refilled with new tokens at a fixed, steady rate (e.g., 30 tokens per second). This is your “sustained rate.”
  • The Cost:
    This is the crucial part. Every endpoint now has a “cost” based on its load on our systems.

    • A simple GET /accounts/{accountNumber} might cost 1 token, while a complex GET /self might cost 5 tokens, and a data-heavy request like GET /invoices/booked/{bookedinvoicenumber} might cost 13 tokens.
  • The Transaction:
    When you make an API call, we check two things:

    • If the bucket has enough tokens to pay the request’s “cost,” the request is processed, and those tokens are deducted from your bucket.
    • If your bucket is empty or has insufficient tokens, the request is rejected with a 429 Too Many Requests error. You must wait for your bucket to refill.

This model is the best of both worlds. It allows you to save up tokens to handle short, high-traffic bursts (up to your bucket’s capacity) while protecting the platform from the kind of sustained, heavy-load traffic that causes outages.
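To make the mechanics concrete, the bucket, refill, cost, and transaction steps above can be sketched in a few lines of Python. This is illustrative only: the capacity, refill rate, and costs mirror the examples in this post, not guaranteed production values, and the real implementation lives on our side of the API.

```python
import time

class TokenBucket:
    """Minimal sketch of a token bucket: burst capacity plus steady refill."""

    def __init__(self, capacity=2000, refill_rate=30.0):
        self.capacity = capacity            # maximum tokens (burst capacity)
        self.refill_rate = refill_rate      # tokens added per second (sustained rate)
        self.tokens = float(capacity)       # the bucket starts full
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Add tokens for the elapsed time, but never exceed capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def try_consume(self, cost):
        """Deduct `cost` tokens and return True if the bucket can pay;
        otherwise return False (the platform responds 429 in that case)."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=2000, refill_rate=30.0)
bucket.try_consume(1)    # a cheap call, e.g. cost 1
bucket.try_consume(13)   # a data-heavy call, e.g. cost 13
```

Note that nothing here depends on the number of requests, only on their summed cost, which is exactly why cheap calls can burst freely while expensive ones drain the bucket faster.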

What This Means for You

As we begin this rollout, you’ll get actionable information in your API response headers with every call:

  • X-CallCost:
    What it is: The “price” (in tokens) for the single request you just made. Example: If this says 8, your call cost 8 tokens.
  • X-RateLimiting:
    What it is: Your bucket’s current status. It shows how many tokens you have left. Example: A value like limit-2000-per-60-seconds: 1450/2000 means:

    • You have 1,450 tokens left…
    • …out of a 2,000 token maximum.

    (And this bucket refills over 60 seconds.)

Use these headers to monitor your usage and optimize your integration.
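If you want to track your budget programmatically, a small helper along these lines can parse the X-RateLimiting value. This is a sketch based solely on the example format shown above; verify the exact header format against your own API responses before relying on it.

```python
import re

def parse_rate_limit(header_value):
    """Parse a value like 'limit-2000-per-60-seconds: 1450/2000'.
    Returns None if the value does not match the expected format."""
    m = re.match(r"limit-(\d+)-per-(\d+)-seconds:\s*(\d+)/(\d+)", header_value)
    if m is None:
        return None
    limit, window, remaining, maximum = (int(g) for g in m.groups())
    return {
        "limit": limit,            # bucket size
        "window_seconds": window,  # refill window
        "remaining": remaining,    # tokens left right now
        "max": maximum,            # same as the bucket size
    }

info = parse_rate_limit("limit-2000-per-60-seconds: 1450/2000")
# info["remaining"] is 1450 of a 2000-token maximum
```

A useful pattern is to log `remaining` alongside X-CallCost for each request, so you can see which endpoints consume most of your budget.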

Consider the Following Changes

The good news is that for the vast majority of our partners operating with standard traffic patterns, we do not expect any changes to be necessary. Your integrations will continue to work as normal. However, this change makes it more important than ever to build resilient applications, so we encourage you to review your code and ensure you are following two best practices:

  1. Implement Robust Error Handling
    Never assume an API call will succeed. Your code should gracefully handle different types of failures, especially:

    • Client Errors (4xx): Check for 429 Too Many Requests. This is now the specific signal that you’ve exhausted your token budget.
    • Server Errors (5xx): Our servers may still have transient issues.
    • Network Timeouts: The request may not reach us at all.

    Use try…catch blocks to manage exceptions and check the HTTP status code for every response to prevent your application from crashing.

  2. Respect Rate Limits with Exponential Backoff
    If you receive a 429 Too Many Requests error, do not immediately retry in a tight loop. This will only drain your refilling tokens and contribute to load.

    The best practice is to implement exponential backoff. This means you wait progressively longer intervals before retrying:

    • Receive a 429 error.
    • Wait 1 second, then retry.
    • Still 429? Wait 2 seconds, then retry.
    • Still 429? Wait 4 seconds, then retry.
    • …and so on, up to a maximum wait time.

    This strategy is more effective, uses your tokens efficiently, and ensures your integration can gracefully recover from high-load situations.
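Both best practices can be combined in a single retry wrapper. The sketch below is illustrative, not a prescribed client: `do_request` stands in for whatever HTTP call your integration makes (raising on network failures, returning a status code otherwise), and the small random jitter added to each wait is our addition beyond the steps above, to keep many clients from retrying in lockstep.

```python
import random
import time

def call_with_backoff(do_request, max_retries=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call `do_request()` (any callable returning an HTTP status code),
    retrying 429s with exponential backoff: waits of 1s, 2s, 4s, ...
    capped at `max_delay`. Other outcomes are classified, never ignored."""
    for attempt in range(max_retries + 1):
        try:
            status = do_request()
        except (TimeoutError, ConnectionError):
            # The request may never have reached the API.
            return "network_error"
        if status == 429:
            if attempt == max_retries:
                return "rate_limited"            # retries exhausted
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, 0.1 * delay))
            continue
        if 500 <= status < 600:
            return "server_error"                # transient; retry later
        if 400 <= status < 500:
            return "client_error"                # fix the request first
        return "ok"

# Example: two 429s, then success on the third attempt.
responses = iter([429, 429, 200])
result = call_with_backoff(lambda: next(responses))
```

Injecting `sleep` as a parameter makes the wrapper easy to unit-test without real waiting, which is worth doing before you depend on it in production.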

Timeline

We will begin rolling out the new cost-based token bucket system in early December. The rollout will be gradual, and we will monitor performance closely. We are making this change to give you a more stable, reliable platform to build on: by preventing accidental overload, we ensure the API stays fast and available for everyone. Thank you for being a valued partner.

Support

If you have any questions, please reach out to API support for immediate assistance. If you are concerned about being rate limited, rest assured that we will monitor the system to ensure bucket sizes are large enough to handle normal load. As always, if you are in doubt, please contact our API support.