I’m a bit confused about how OpenAI’s API rate limits work, specifically the TPM (tokens per minute) limit.

If I have, for example, a limit of 2 million TPM, is usage against that limit counted from:

  • only the input tokens I send in my request, or
  • both input and output tokens combined?

I’ve seen different explanations online, so I’d love to hear from people who have tested this or know for sure. Thanks!
