Context Close to Limit

Many bugs in long sessions with an AI agent appear once the context window is more than about 90% full: the model starts to “lose” system prompt instructions and tool definitions. The signal to watch is usage.input_tokens relative to the model's limit:

import os

# Context-window limits in tokens, per model.
LIMITS = {
    "claude-sonnet-4-5": 200_000,
    "claude-opus-4-5": 200_000,
    "gpt-4.1": 1_000_000,
    "gpt-4o": 128_000,
}

def observe(model, used_tokens):
    limit = LIMITS.get(model, 200_000)  # unknown models fall back to 200k
    ratio = used_tokens / limit
    flag = f"/tmp/ctx-{model}.flag"  # flag file: alert once per fill-up
    if ratio >= 0.85 and not os.path.exists(flag):
        # push(title, body, priority=...) is assumed to be defined elsewhere
        push(f"🧠 Context at {int(ratio*100)}% for {model}",
             f"Used {used_tokens:,} of {limit:,} tokens.\n"
             "Write a summary or start a new thread.",
             priority=7)
        open(flag, "w").close()
    elif ratio < 0.5 and os.path.exists(flag):
        os.unlink(flag)  # reset flag after summary

It’s convenient to call observe() from a shared wrapper around the SDK; the check then runs automatically in any long dialogue.
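
One way such a wrapper might look is sketched below. It is an illustration under assumptions, not a specific SDK's API: with_context_check, fake_create and fake_observe are hypothetical names, and the response is assumed to expose usage.input_tokens as described above. The stand-ins exist only so the sketch runs without network access.

```python
from types import SimpleNamespace


def with_context_check(create, observe):
    """Wrap an SDK call so each response reports its input tokens to observe().

    `create` is any callable that takes `model=` and returns an object with
    `usage.input_tokens` (an assumption about the SDK's response shape).
    """
    def wrapped(*, model, **kwargs):
        response = create(model=model, **kwargs)
        usage = getattr(response, "usage", None)
        used = getattr(usage, "input_tokens", None)
        if used is not None:  # skip silently if the SDK reports no usage
            observe(model, used)
        return response
    return wrapped


# Hypothetical stand-ins for the real SDK call and the real observe().
calls = []


def fake_observe(model, used_tokens):
    calls.append((model, used_tokens))


def fake_create(*, model, **kwargs):
    return SimpleNamespace(usage=SimpleNamespace(input_tokens=175_000))


create = with_context_check(fake_create, fake_observe)
create(model="claude-sonnet-4-5", messages=[])
```

In real code, the bound SDK method and the actual observe() would be passed in place of the stand-ins. Passing observe as an argument keeps the wrapper easy to test, and SDKs that name the field differently would need a small adapter.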