ai pricing billing terms-of-service saas

Pricing AI Features: Billing Terms When Your Costs Are Per-Token

AI features break the flat-subscription model. Your LLM costs are variable, per-token, and can change when your provider reprices. Here's how to structure billing terms for the five main AI pricing models — and how to handle upstream cost pass-through without creating enterprise contract friction.

No Boiler

Traditional SaaS has predictable unit economics. You pay for cloud infrastructure on a relatively stable monthly basis. Your customer pays a subscription. Your margin is the difference, and it does not change dramatically from month to month. You can forecast revenue, forecast costs, and know what your margins will look like next quarter.

AI features break this model. Your cost per request is no longer a fraction of a cent for a database query and a page render. It is a token-based charge from your LLM provider that varies by model, by input length, by output length, and by feature. A customer who sends ten queries a day costs you a different amount than a customer who sends ten thousand. A customer who uses your AI summarization feature (short inputs, short outputs) costs less than a customer who uses your AI document analysis feature (long inputs, detailed outputs). And your LLM provider can change its token pricing at any time.

If your billing terms were drafted for a flat subscription model and you are now offering AI features with per-token costs underneath, you have a margin problem that will grow with every customer who adopts the AI features heavily. This post covers how to structure billing terms that account for AI-driven variable costs without creating friction that slows adoption.

Why the Margin Math Is Different

In traditional SaaS, your infrastructure cost per customer is roughly flat. Whether the customer logs in once a day or a hundred times, your incremental cost is negligible. This is why flat subscription pricing works: the cost to serve each customer is predictable and the margin is stable.

AI features introduce a direct, variable cost per interaction. Every time a customer uses an AI feature, you incur a charge from your LLM provider. The charge depends on the model used, the volume of tokens processed (input and output), and any additional processing (embeddings, retrieval, function calls). For a product with heavy AI usage, the LLM API costs can become the largest single line item in your cost of goods sold.

The problem is not that AI is expensive. The problem is that AI costs are variable and unpredictable on a per-customer basis, and most B2B SaaS billing terms are not designed for variable costs. If you offer unlimited AI usage on a flat subscription, your heaviest users will erode your margin while your lightest users subsidize them. If the balance tips toward heavy usage (which it will, because customers who adopt AI features tend to use them more over time, not less), your aggregate margin deteriorates.

This is not a theoretical concern. It is the reason that multiple AI-native SaaS companies have shifted from flat pricing to usage-based or hybrid models within their first year of operation. The billing terms need to match the cost structure.

Five Pricing Models for AI Features

There are five approaches to pricing AI features, each with different billing term requirements.

Per-query or per-action pricing charges the customer for each discrete use of the AI feature. One summarization request, one charge. One document analysis, one charge. The billing terms need to define what constitutes a billable action (is a retry the same action or a new one? does a failed request count?), how actions are metered and reported, and how the customer can monitor their usage in real time. This model is transparent and aligns costs directly, but it creates unpredictability for the customer, which can slow adoption.
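To make the "what counts as a billable action" questions concrete, here is a minimal metering sketch. The rules it encodes (failed requests are not billed; a retry shares the original request ID and is never billed twice) are one reasonable set of choices, not the only ones your terms could adopt.

```python
# Hypothetical metering sketch for per-action pricing. The billing
# rules encoded here are illustrative assumptions, not requirements:
# failed requests are not billable, and retries reuse the original
# request_id so they cannot be billed as a second action.

def record_action(ledger: dict, request_id: str, succeeded: bool) -> None:
    """Meter one AI request into a per-request ledger."""
    if not succeeded:
        return                    # failed request: not billable
    ledger[request_id] = True     # idempotent: a retry with the same id stays one action

ledger = {}
record_action(ledger, "req-1", succeeded=True)
record_action(ledger, "req-1", succeeded=True)   # retry of req-1: still one action
record_action(ledger, "req-2", succeeded=False)  # failure: not billed
billable_actions = len(ledger)                   # -> 1
```

Whatever rules you choose, the point is that they are defined in the terms and enforced identically in the metering code, so the invoice and the contract cannot disagree.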

Token-based or pass-through pricing passes the LLM provider’s token costs through to the customer, typically with a markup. The billing terms need to define how token consumption is calculated, what markup applies, and how the customer can access their consumption data. This model is the most cost-aligned but also the most complex for the customer to understand. Most B2B customers do not think in tokens. Translating token consumption into business-level metrics (queries, documents processed, hours of analysis) makes this model more accessible.
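As a sketch of how pass-through-with-markup translates into a customer charge, consider the function below. The model names, per-token rates, and 30% markup are all invented for illustration; substitute your provider's actual published rates.

```python
# Hypothetical pass-through pricing sketch. PROVIDER_RATES and MARKUP
# are invented example numbers, not real provider pricing.

PROVIDER_RATES = {
    # model: (input $/1K tokens, output $/1K tokens) -- illustrative only
    "fast-model": (0.0005, 0.0015),
    "large-model": (0.0030, 0.0150),
}
MARKUP = 1.30  # 30% markup over upstream cost

def charge_for_request(model: str, input_tokens: int, output_tokens: int) -> float:
    """Customer charge in dollars for one request: upstream cost times markup."""
    in_rate, out_rate = PROVIDER_RATES[model]
    upstream_cost = (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate
    return round(upstream_cost * MARKUP, 6)

# A document-analysis request: long input, shorter output.
doc_charge = charge_for_request("large-model", input_tokens=40_000, output_tokens=2_000)
```

Note that the customer-facing invoice would typically report this as "1 document analyzed," not as 42,000 tokens; the token math stays internal.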

Tiered usage includes a defined allocation of AI usage in the subscription (for example, 1,000 AI queries per month) with overage charges above the threshold. The billing terms need to define the included allocation, how it resets (monthly, annually), what the overage rate is, and how the customer is notified as they approach the threshold. This is the most common model for B2B SaaS with AI features because it gives the customer predictability up to a point while protecting your margin above the threshold.
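The tiered model reduces to a simple piece of arithmetic, sketched below with invented tier numbers (1,000 included queries, $0.05 per overage query); the point is that the allocation, reset period, and overage rate all come from the Order Form.

```python
# Hypothetical tiered-usage invoice line. The included allocation and
# overage rate are example values a real Order Form would specify.

def monthly_ai_charge(queries_used: int,
                      included_queries: int = 1_000,
                      overage_rate: float = 0.05) -> float:
    """Overage charge in dollars. The included allocation is covered by
    the base subscription and produces no separate charge."""
    overage = max(0, queries_used - included_queries)
    return overage * overage_rate

light_user = monthly_ai_charge(800)    # within allocation -> 0.0
heavy_user = monthly_ai_charge(1_400)  # 400 queries over at $0.05 -> 20.0
```

A usage-alert job would run the same calculation against current consumption to trigger the approaching-threshold notifications the terms promise.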

Hybrid pricing combines a base subscription for the core product with a separate usage-based charge for AI features. The billing terms need to clearly separate which components are covered by the base fee and which generate variable charges. This model works well when the AI features are a distinct value-add on top of an existing product. The key drafting requirement: make sure the customer can easily understand which charges are fixed and which are variable. Confusion about what generates variable charges is where billing disputes start.

Flat rate with fair use cap maintains the simplicity of flat subscription pricing but includes a fair use provision that limits AI usage to reasonable levels. The billing terms need to define what “fair use” means with enough specificity to be enforceable: a numerical cap (queries per month, tokens per month, or a percentage above the median usage for the customer’s tier), what happens when the cap is exceeded (throttling, overage charges, or a conversation about upgrading), and how the cap is communicated. The risk with this model is that a vague fair use provision is difficult to enforce. If you tell a customer they exceeded “fair use” but you never defined the threshold, the customer will push back. Define the number.

Rate Limiting as a Contractual Tool

Rate limiting is a technical mechanism (capping the number of API calls or AI requests a customer can make in a given time window) that also serves a contractual purpose. It is the enforcement mechanism for your usage limits, your fair use provisions, and your tier structure.

Your billing terms should reference rate limiting as a feature of the service, not as a punitive measure. Specify that AI features are subject to usage limits as defined in the Order Form or pricing tier. Specify that requests exceeding the applicable rate limit may be queued, throttled, or rejected. Specify that rate limiting does not constitute a service outage or SLA breach.
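One common implementation of the "queued, throttled, or rejected" behavior is a token bucket, sketched below. The capacity and refill rate are invented per-tier examples; the contractual point is that a `False` here is a usage-limit outcome, not an outage.

```python
# Minimal token-bucket rate limiter sketch. Capacity and refill rate
# are illustrative values that would come from the customer's tier.

import time

class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one request slot. False means throttle or reject --
        a defined usage-limit outcome, not an SLA event."""
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_per_sec=1.0)
results = [bucket.allow() for _ in range(7)]  # first 5 pass, then throttled
```

Whether throttled requests are queued for later or rejected with a retry-after signal is a product decision, but either way the terms should name the behavior so it cannot be recharacterized as downtime.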

That last point matters. If your SLA does not carve out rate limiting from your uptime calculation, a customer who hits a rate limit could argue that the service was “unavailable” during the period their requests were throttled. Your SLA exclusions (discussed in the main series post on SLA exclusions) should explicitly address this.

Upstream Cost Pass-Through

Your LLM provider can change its token pricing. If your provider raises rates and your billing terms lock you into a fixed price with your customer, you absorb the difference. Over a multi-year enterprise contract, this can erode your margin significantly.

Your billing terms should include a provision that allows you to adjust AI-related pricing to reflect increases in third-party AI service costs. The provision should specify how much notice the customer receives before the adjustment takes effect (30 to 60 days is standard), that the adjustment is limited to the actual cost increase (not a general price increase mechanism), and that the customer has the right to terminate if the adjustment exceeds a defined threshold (giving the customer an exit if costs become unreasonable).
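The mechanics of such a clause can be checked with a few lines of arithmetic. The 10% termination threshold below is an invented example; the key properties are that the adjustment is capped at the actual upstream increase and that exceeding the threshold triggers the customer's exit right.

```python
# Hypothetical check of a cost pass-through clause. The 10% termination
# threshold is an illustrative assumption, not a standard figure.

def apply_pass_through(current_price: float,
                       old_upstream_cost: float,
                       new_upstream_cost: float,
                       termination_threshold: float = 0.10):
    """Return (new_price, customer_may_terminate)."""
    # Pass-through only: capped at the actual increase, never a general uplift.
    increase = max(0.0, new_upstream_cost - old_upstream_cost)
    new_price = current_price + increase
    may_terminate = (current_price > 0
                     and (increase / current_price) > termination_threshold)
    return new_price, may_terminate

# Upstream cost rises $8 against a $100 price: adjusted, below threshold.
apply_pass_through(100.0, 20.0, 28.0)   # -> (108.0, False)
# A $15 rise exceeds the 10% threshold: the customer may terminate.
apply_pass_through(100.0, 20.0, 35.0)   # -> (115.0, True)
```

Note that a provider price *decrease* produces no adjustment under this sketch; whether decreases must also be passed through is itself a negotiation point.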

This is analogous to the general subprocessor cost pass-through provision discussed in the main series billing terms post, but specific to AI costs. The AI-specific provision matters because AI costs are more volatile than traditional infrastructure costs and the adjustments may be more frequent.

For enterprise customers on negotiated contracts, the cost pass-through provision will be negotiated. Some customers will accept it. Others will want a fixed price for the contract term and will expect you to absorb upstream cost changes as a cost of doing business. Know your margin before you agree to a fixed AI price, and model the downside scenario where your provider raises rates mid-term.

Billing Terms and the Order Form

As covered in the main series post on billing terms, the principle is that deal-specific pricing goes in the Order Form and the billing framework goes in the Terms of Service.

For AI features, this means the Order Form specifies the customer’s pricing tier, any included AI usage allocation, overage rates, and any negotiated cost pass-through terms. The Terms of Service define the billing mechanics: how usage is metered, how invoices are generated, how disputes are handled, and the general framework for rate limiting and fair use.

This separation is particularly important for AI pricing because it allows you to offer different AI pricing structures to different customers without maintaining multiple versions of your Terms of Service. An enterprise customer might negotiate a custom allocation with a negotiated overage rate. A self-serve customer might be on a standard tier from your pricing page. Both operate under the same billing framework in the Terms. The variables are in the Order Form.


This is the seventh post in the AI-Enabled SaaS series. Previous: AI-Specific Acceptable Use: Drawing the Line on What Users Can Do With Your AI Features. Next: AI and Insurance: What Changes in Your Cyber and Tech E&O Coverage.

No Boiler provides self-service legal document generation and educational content. This material and our service are not a substitute for legal advice. Please have a qualified attorney review any documents before relying on them.
