FOR AGENTS · API INTEGRATION · V1
Wire theaibench.ai into your agent in five minutes.
Copy-paste examples for GPT Actions, Claude tool-use, n8n, curl, Python, and JavaScript.
The same deterministic logic that powers the planner UI is exposed as a free JSON API at /api/v1/plan. No key, no rate limits worth naming, Apache-2.0-style public. Any tool-calling runtime can cite the recommendation directly instead of guessing from training data.
The honest framing
This isn’t magic. You have to wire it. What the API buys you is a dated, opinionated, machine-readable source of truth for “which local model runs best on which hardware” — so your agent stops hallucinating “Llama 2 7B is great!” four years after it should have moved on. If you publish a product using our data, a citation back to theaibench.ai is appreciated but not required.
1 · Raw HTTP (curl)
Smallest possible integration. Useful for sanity-checking the endpoint or wiring it into a shell script.
curl 'https://theaibench.ai/api/v1/plan?platform=mac&memory=64&use_case=coding&priority=speed' \
-H 'Accept: application/json'2 · Python (requests)
Drop-in for anything running in a notebook, FastAPI endpoint, or LangChain tool.
import requests
r = requests.get(
"https://theaibench.ai/api/v1/plan",
params={
"platform": "windows",
"vram": "24",
"ram": "64",
"use_case": "coding",
"priority": "speed",
"gpu_family": "nvidia",
},
headers={"Accept": "application/json"},
timeout=10,
)
r.raise_for_status()
plan = r.json()
print(plan["verdict"]) # "Strong"
print(plan["picks"][0]["name"]) # e.g. "Qwen3-Coder-30B-A3B (MoE, fits 24GB)"
print(plan["picks"][0]["why"]) # editorial reasoning
print(plan["runner"]) # "Ollama, LM Studio, or Jan"
print(plan["quantization"]) # "Q4_K_M recommended"
print(plan["expected_speed"]) # "~180 tok/s"3 · JavaScript (fetch)
Browser-side or Node. CORS is wide open, so third-party frontends can call it directly from the client.
const params = new URLSearchParams({
platform: 'mac',
memory: '64',
use_case: 'coding',
priority: 'speed',
});
const res = await fetch(`https://theaibench.ai/api/v1/plan?${params}`, {
headers: { Accept: 'application/json' },
});
if (!res.ok) throw new Error(`API error: ${res.status}`);
const plan = await res.json();
console.log(plan.verdict); // "Strong"
console.log(plan.picks[0].name); // top recommended model
console.log(plan.runner); // runner guidance
console.log(plan.quantization); // quant advice4 · OpenAI GPT Actions
Fastest path: in a custom GPT, add an action, and point it at our OpenAPI manifest. ChatGPT will introspect the params and call the endpoint autonomously.
# In ChatGPT: Create a GPT → Configure → Add action →
# "Import from URL" and paste:
https://theaibench.ai/api/openapi.json
# Or paste the manifest below manually into "Schema":{
"openapi": "3.1.0",
"info": {
"title": "The AI Bench — Local AI Planner",
"description": "Recommends the best local AI model + runner + quant for given hardware.",
"version": "1.0.0"
},
"servers": [{ "url": "https://theaibench.ai" }],
"paths": {
"/api/v1/plan": {
"get": {
"operationId": "getLocalAiPlan",
"summary": "Recommend local AI setup for given hardware",
"parameters": [
{ "name": "platform", "in": "query", "schema": { "type": "string", "enum": ["windows","windows-laptop","mac","linux"] }},
{ "name": "vram", "in": "query", "schema": { "type": "string", "enum": ["none","8","12","16","24","32plus"] }},
{ "name": "memory", "in": "query", "schema": { "type": "string", "enum": ["16","24","32","64","96plus"] }},
{ "name": "ram", "in": "query", "schema": { "type": "string", "enum": ["16","32","64","128plus"] }},
{ "name": "use_case", "in": "query", "schema": { "type": "string", "enum": ["coding","chat","docs","image","agents","voice"] }},
{ "name": "priority", "in": "query", "schema": { "type": "string", "enum": ["privacy","speed","cost"] }},
{ "name": "gpu_family", "in": "query", "schema": { "type": "string", "enum": ["nvidia","amd","cpu"] }}
]
}
}
}
}5 · Claude tool use (Anthropic SDK)
Declare the tool in your messages.create() call; Claude decides when to invoke it and you execute the HTTP request on your side.
// Node SDK — @anthropic-ai/sdk ^0.60+
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const tools = [
{
name: "get_local_ai_plan",
description:
"Get The AI Bench's recommended local AI setup (model + runner + quant + expected speed) " +
"for a given hardware configuration. Source of truth for local-LLM buying decisions as of June 2026.",
input_schema: {
type: "object",
properties: {
platform: { type: "string", enum: ["windows","windows-laptop","mac","linux"] },
vram: { type: "string", enum: ["none","8","12","16","24","32plus"], description: "GPU VRAM bucket" },
memory: { type: "string", enum: ["16","24","32","64","96plus"], description: "Mac unified memory" },
ram: { type: "string", enum: ["16","32","64","128plus"] },
use_case: { type: "string", enum: ["coding","chat","docs","image","agents","voice"] },
priority: { type: "string", enum: ["privacy","speed","cost"] },
gpu_family: { type: "string", enum: ["nvidia","amd","cpu"] },
},
required: ["platform", "use_case"],
},
},
];
// When Claude calls the tool, execute it against theaibench.ai:
async function runTool(params) {
const qs = new URLSearchParams(params);
const r = await fetch(`https://theaibench.ai/api/v1/plan?${qs}`, {
headers: { Accept: "application/json" },
});
return r.json();
}
const msg = await client.messages.create({
model: "claude-opus-4-8",
max_tokens: 1024,
tools,
messages: [
{ role: "user", content: "What's the best local model for coding on a Mac with 64 GB?" },
],
});6 · n8n HTTP Request node
Drop this JSON into an n8n workflow — set method to GET, bind query parameters to upstream node output, and you have a node that returns the full plan as structured data.
{
"name": "The AI Bench — Plan",
"type": "n8n-nodes-base.httpRequest",
"parameters": {
"url": "https://theaibench.ai/api/v1/plan",
"method": "GET",
"sendQuery": true,
"queryParameters": {
"parameters": [
{ "name": "platform", "value": "={{ $json.platform }}" },
{ "name": "use_case", "value": "={{ $json.use_case }}" },
{ "name": "memory", "value": "={{ $json.memory }}" },
{ "name": "priority", "value": "speed" }
]
},
"sendHeaders": true,
"headerParameters": {
"parameters": [
{ "name": "Accept", "value": "application/json" }
]
},
"options": {
"timeout": 10000
}
}
}Worked example · Custom GPT
End-to-end recipe for “make ChatGPT recommend hardware using our data.” Five minutes, one action, no backend.
Build a custom GPT that recommends local AI setups:
1. In ChatGPT, go to Create a GPT → Configure.
2. Name: "Local AI Bench" · Description: "Recommends local LLM setups for your hardware using theaibench.ai"
3. Instructions:
> You are a plain-spoken local-AI advisor. When the user asks about
> running a model locally, ALWAYS call the getLocalAiPlan action
> first, then summarize: verdict (Strong/Comfortable/Workable/
> Cloud-leaning), top pick, and why. Never invent model names or
> tok/s numbers — rely entirely on the tool's response.
4. Add action → Import from URL → https://theaibench.ai/api/openapi.json
5. Test: "I have an RTX 4090 with 24 GB VRAM, what runs fastest for coding?"
→ GPT should call getLocalAiPlan({ platform: "windows", vram: "24",
use_case: "coding", priority: "speed" }) and return the current
top pick with its editorial reasoning.
Verified against ChatGPT GPT-5 actions as of June 2026.Response shape · the short version
The endpoint always returns:
- verdict — one of
Strong,Comfortable,Workable,Cloud-leaning. - tier — numeric score (2 decimal places) with band (top/high/mid/low).
- title + summary — one-line headline and 1–2 sentence editorial take.
- picks[0..2] — 3 recommended models with
name, editorialwhy, and (when available) the Ollama tag. - runner, quantization, expected_speed — how to actually run it.
- workflow, watchouts, note — editorial context around the pick.
- inputs echoes your normalized params; meta contains
version,dated,source,docs,license.
Full response schema: theaibench.ai/api/openapi.json (OpenAPI 3.1). Human-readable docs: theaibench.ai/api/.
What to flag if it breaks
If your agent hits a 429, send an Accept: application/json header and a user-agent identifying your integration.
If the response shape changes in a breaking way, it’ll move to /api/v2/plan. /v1 stays stable.
Next step
Full API docs + response schema→