Operating steps
- Estimate eager catalog tokens per session.
- Multiply by monthly sessions, retries, and model price assumptions.
- Compare latency between full schema loading and progressive loading.
- Set savings targets that CI can enforce.
Common risks
- Cost dashboards can undercount retries caused by ambiguous routing.
- Latency savings depend on client transport and cache behavior.
- Context savings should not hide security or permission metadata.