InferLane is free at the point of use for most workloads. This page tells you exactly how that works, so you can verify that our incentives line up with yours.
We have five independent revenue legs. None of them requires charging you per request.
When you route traffic through our hosted endpoint using your own provider API key, we add a small percentage markup on the provider's cost (typically 5–10%). The markup funds the router, moderation gate, and fuel gauge you're using.
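Here is a minimal sketch of the markup arithmetic. It assumes amounts are tracked as integer micro-dollars and the markup as basis points (500 = 5%). Both units and the choice to round the markup down are illustrative assumptions, not a description of InferLane's billing code.

```typescript
// Minimal markup sketch. Assumptions (not InferLane's billing code): amounts are
// integer micro-dollars and the markup is given in basis points (500 = 5%).

/** Amount billed for a request: provider cost plus the markup, rounded down. */
function billedCostMicros(providerCostMicros: bigint, markupBps: bigint): bigint {
  if (providerCostMicros < 0n || markupBps < 0n) {
    throw new RangeError("cost and markup must be non-negative");
  }
  // BigInt division truncates, so the markup never exceeds the stated percentage.
  const markupMicros = (providerCostMicros * markupBps) / 10_000n;
  return providerCostMicros + markupMicros;
}
```

For example, a 10% markup (`1_000n` basis points) on a $0.02 provider cost (`20_000n` micro-dollars) bills as `22_000n`, or $0.022.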
We route enough volume to qualify for privately-negotiated rates with many model providers. We quote you near rack rate and are invoiced at the partnership rate. The delta is our revenue. See the rebate table below for the providers we have disclosed arrangements with.
Enterprise customers can pre-purchase a block of inference capacity for a specified period at a fixed per-token rate. This is a commercial volume commitment — not a financial instrument, not tradeable, not transferable to other parties. The difference between their committed rate and the spot rate at fulfilment is our margin (positive or negative).
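The sign of the commitment margin depends on which way the spot rate moves before fulfilment. The sketch below shows this. The `Commitment` shape and the unit (integer micro-dollars per million tokens) are hypothetical.

```typescript
// Sketch of fulfilment margin on a capacity commitment. Hypothetical shape and
// units: rates are integer micro-dollars per million tokens.
interface Commitment {
  tokensFulfilled: bigint;
  committedRateMicros: bigint; // fixed per-million-token rate agreed up front
}

/**
 * Margin at fulfilment: (committed rate - spot rate) x tokens.
 * Positive if the spot rate fell below the committed rate, negative if it rose.
 * BigInt division truncates toward zero.
 */
function fulfilmentMarginMicros(c: Commitment, spotRateMicros: bigint): bigint {
  return ((c.committedRateMicros - spotRateMicros) * c.tokensFulfilled) / 1_000_000n;
}
```

Suppose 10 million tokens are committed at $3.00 per million. Fulfilling them when spot is $2.50 yields a $5.00 margin. If spot has risen to $3.20, the margin is -$2.00.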
Advanced tooling (team budgets, Slack alerts, SSO, audit logs, dedicated capacity) is sold as a subscription. Routing itself is never behind a paywall.
When consumers pay for inference served by peer operators on our network, operators are credited 90% of the service value in kT credits and we retain 10% as the platform share. kT credits are redeemable for inference on the network — they do not convert to cash. The Service operates in a credits-only mode. See the "What InferLane is not" box below.
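A 90/10 split applied to whole credit units has to decide where the leftover fraction goes. This sketch assumes kT credits are tracked as integer base units. It rounds the operator share down and assigns the remainder to the platform, so the two parts always sum to the service value. That rounding direction is an assumption.

```typescript
// Sketch of the operator/platform split. Assumption: kT credits are integer
// base units, and the operator's share is rounded down (hypothetical choice).
const OPERATOR_SHARE_BPS = 9_000n; // 90%, per the published split

interface Split {
  operatorCredits: bigint;
  platformCredits: bigint;
}

function splitServiceValue(serviceValue: bigint): Split {
  if (serviceValue < 0n) throw new RangeError("serviceValue must be non-negative");
  // Operator gets floor(90%); the platform keeps the remainder, so no units
  // are created or lost by rounding.
  const operatorCredits = (serviceValue * OPERATOR_SHARE_BPS) / 10_000n;
  return { operatorCredits, platformCredits: serviceValue - operatorCredits };
}
```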
kT credits are service units redeemable for inference on the network. They are not a financial product. They have no investment character and no claim on InferLane revenue or assets. Credits do not convert to cash. The Service operates in a credits-only mode. If a cash pathway is introduced in the future, operators will need to separately opt in under new terms; existing credit balances will not be converted.
We explicitly cap the influence of rebate arrangements on routing decisions at 5% of the composite score. A provider with a bigger rebate can never outrank a provider whose composite score (quality, cost, and latency) is more than 0.5% higher. The rebate acts only as a tiebreaker when two candidates are within 0.5% of each other on the composite score.
This is enforced in code at src/lib/proxy/router-commercial.ts. You can audit it.
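The tiebreaker rule can be sketched as follows. This is an illustrative reconstruction, not the code in src/lib/proxy/router-commercial.ts. The `Candidate` shape and the normalised rebate in [0, 1] are assumptions. So is reading "5% of the composite score" as a multiplicative boost of at most 5%.

```typescript
// Illustrative sketch of rebate-as-tiebreaker selection. Hypothetical: the
// Candidate shape, score normalisation, and how the 5% cap is applied.
interface Candidate {
  provider: string;
  compositeScore: number; // quality/cost/latency composite, higher is better, > 0
  rebate: number;         // normalised rebate in [0, 1]
}

const TIE_BAND = 0.005;     // candidates within 0.5% of the best score
const REBATE_WEIGHT = 0.05; // rebate adds at most 5% of a candidate's score

function selectProvider(candidates: Candidate[]): Candidate {
  if (candidates.length === 0) throw new Error("no candidates");
  if (candidates.some((c) => c.compositeScore <= 0)) {
    throw new RangeError("composite scores must be positive");
  }
  const best = Math.max(...candidates.map((c) => c.compositeScore));
  // Only near-ties are eligible; anything outside the band cannot win on rebate.
  const tied = candidates.filter((c) => c.compositeScore >= best * (1 - TIE_BAND));
  // Inside the band, the (clamped) rebate boosts the score by at most REBATE_WEIGHT.
  const adjusted = (c: Candidate) =>
    c.compositeScore * (1 + REBATE_WEIGHT * Math.min(Math.max(c.rebate, 0), 1));
  return tied.reduce((a, b) => (adjusted(b) > adjusted(a) ? b : a));
}
```

With scores 0.900 (no rebate), 0.898 (full rebate), and 0.850 (full rebate), the 0.898 candidate wins the tiebreak. The 0.850 candidate is never eligible, however large its rebate.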
Prepaid balances are held by our licensed payment processor (Stripe) under their regulated payment-services arrangements. InferLane does not hold customer funds directly. We do not operate as a bank, money transmitter, money services business, securities broker, exchange, or qualified custodian. We record a service-credit balance that corresponds to the processor-held prepayment; we do not take custody of your money.
The double-entry ledger that tracks your balance is auditable and reconciled nightly. Any discrepancy freezes the money layer until it's resolved.
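Below is a minimal sketch of what nightly reconciliation of a double-entry ledger checks. The posting shape, account names, and freeze flag are hypothetical, and amounts are integer cents. It also assumes that the recorded credit balance should equal what the payment processor reports holding.

```typescript
// Sketch of double-entry reconciliation. Hypothetical: entry shape, account
// names, and the freeze flag. Amounts are integer cents.
interface Posting {
  account: string;
  debit: bigint;
  credit: bigint;
}

interface Transaction {
  id: string;
  postings: Posting[];
}

/** IDs of transactions whose total debits differ from total credits. */
function unbalancedTransactions(txns: Transaction[]): string[] {
  return txns
    .filter((t) => {
      const debits = t.postings.reduce((s, p) => s + p.debit, 0n);
      const credits = t.postings.reduce((s, p) => s + p.credit, 0n);
      return debits !== credits;
    })
    .map((t) => t.id);
}

/** Net balance of one account (credits minus debits). */
function accountBalance(txns: Transaction[], account: string): bigint {
  let balance = 0n;
  for (const t of txns) {
    for (const p of t.postings) {
      if (p.account === account) balance += p.credit - p.debit;
    }
  }
  return balance;
}

interface ReconcileResult {
  ok: boolean;
  frozen: boolean; // any discrepancy freezes money movement
  reasons: string[];
}

function reconcile(
  txns: Transaction[],
  creditAccount: string,
  processorHeldCents: bigint,
): ReconcileResult {
  const reasons: string[] = [];
  const bad = unbalancedTransactions(txns);
  if (bad.length > 0) reasons.push(`unbalanced: ${bad.join(", ")}`);
  const ledger = accountBalance(txns, creditAccount);
  if (ledger !== processorHeldCents) {
    reasons.push(`ledger ${ledger} != processor ${processorHeldCents}`);
  }
  const ok = reasons.length === 0;
  return { ok, frozen: !ok, reasons };
}
```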
We publish the providers we have disclosed rebate arrangements with. Specific percentages are negotiated privately and fall into ranges disclosed here.
No disclosed rebate arrangements are active yet — we're pre-launch. This list will populate as we sign partnership rate agreements and as customers reach volumes that qualify for their own disclosed discounts.
Different workloads have different privacy requirements. We route accordingly and are upfront about what each tier actually provides.
Workloads route to providers with hardware-backed Trusted Execution Environments (Azure Confidential Computing, AWS Nitro Enclaves). Attestation is cryptographically verified. Use this for PII, financial data, and compliance-sensitive workloads (HIPAA, SOC 2).
Workloads route to major hosted model providers (Anthropic, OpenAI, Google). Privacy is backed by their terms of service and data processing agreements, not hardware attestation. Suitable for unregulated business data.
Workloads may route to community or decentralized nodes. Privacy relies on OS-level protections (SIP, hardened runtime) — not hardware enclaves. There is no way to cryptographically verify that a consumer Mac is running untampered code today. This tier is appropriate for public data, non-sensitive classification, and image generation — not for PII or confidential business data.
Our routing engine selects the appropriate privacy tier automatically based on your configured policy. You can override per-request via the privacyTier parameter in the dispatch API, or set a default policy in your dashboard settings.
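The sketch below illustrates how policy-based tier selection with a per-request override might work. The tier identifiers, the data classifications, and the policy shape are all hypothetical; this page does not specify the values `privacyTier` accepts. The default mapping simply mirrors the tier guidance above. The sketch lets an override take precedence unconditionally. Whether the real dispatch API restricts overrides is not stated here.

```typescript
// Hypothetical tier names for the three tiers described above.
type PrivacyTier = "tee" | "cloud" | "community";
// Hypothetical data classifications a policy might key on.
type DataClass = "pii" | "regulated" | "business" | "public";

type Policy = Record<DataClass, PrivacyTier>;

// Default policy mirroring this page's guidance (illustrative only).
const defaultPolicy: Policy = {
  pii: "tee",
  regulated: "tee",
  business: "cloud",
  public: "community",
};

/** Per-request privacyTier, if supplied, takes precedence over the policy. */
function resolveTier(policy: Policy, dataClass: DataClass, privacyTier?: PrivacyTier): PrivacyTier {
  return privacyTier ?? policy[dataClass];
}
```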
The full list of third-party services InferLane uses to process customer data is at inferlane.dev/legal/subprocessors. We give 14-day notice of changes to customers on enterprise contracts.
Every request through InferLane adds routing overhead (auth, model selection, provider lookup, cost logging). Here are real measurements from April 2026 — a minimal Haiku request, from Sydney to us-east-1 (Vercel + Anthropic):
Direct to Anthropic: ~750ms
Through InferLane proxy: ~1.5s
Overhead is ~500–800ms, mostly from Vercel serverless cold starts and the routing DB lookup. For a typical 5–30 second inference call, that adds roughly 2–16% latency: 500ms on a 30-second call is under 2%, and 800ms on a 5-second call is 16%. The MCP tools (pick_model, session_cost) run locally with zero network overhead. We plan to move the routing decision to Vercel Edge Functions to bring overhead under 50ms.
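The added-latency percentage is the fixed proxy overhead divided by the inference time. This small helper reproduces the bounds from the quoted figures. It is illustrative only and not part of InferLane's tooling.

```typescript
// Proxy overhead as a percentage of the inference call it wraps.
function overheadPercent(overheadMs: number, inferenceMs: number): number {
  if (inferenceMs <= 0) throw new RangeError("inferenceMs must be positive");
  // Multiply before dividing so round-number inputs give exact results.
  return (overheadMs * 100) / inferenceMs;
}

// Bounds of the quoted ranges: smallest overhead on the longest call,
// largest overhead on the shortest call.
const lowerBound = overheadPercent(500, 30_000); // about 1.7
const upperBound = overheadPercent(800, 5_000); // 16
```

Because the overhead is roughly constant per request, its relative cost shrinks as calls get longer. This is why the planned move to Edge Functions matters most for short requests.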