the AI bench
VERIFIED JUNE 2026

FOR AGENTS · API INTEGRATION · V1

Wire theaibench.ai into your agent in five minutes.

Copy-paste examples for GPT Actions, Claude tool-use, n8n, curl, Python, and JavaScript.

The same deterministic logic that powers the planner UI is exposed as a free JSON API at /api/v1/plan. It needs no API key, has no rate limits worth naming, and is public under Apache-2.0-style terms. Any tool-calling runtime can cite the recommendation directly instead of guessing from training data.


The honest framing

This isn’t magic. You have to wire it. What the API buys you is a dated, opinionated, machine-readable source of truth for “which local model runs best on which hardware” — so your agent stops hallucinating “Llama 2 7B is great!” four years after it should have moved on. If you publish a product using our data, a citation back to theaibench.ai is appreciated but not required.

1 · Raw HTTP (curl)

Smallest possible integration. Useful for sanity-checking the endpoint or wiring it into a shell script.

bash
curl 'https://theaibench.ai/api/v1/plan?platform=mac&memory=64&use_case=coding&priority=speed' \
  -H 'Accept: application/json'

2 · Python (requests)

Drop-in for anything running in a notebook, FastAPI endpoint, or LangChain tool.

python
import requests

r = requests.get(
    "https://theaibench.ai/api/v1/plan",
    params={
        "platform": "windows",
        "vram": "24",
        "ram": "64",
        "use_case": "coding",
        "priority": "speed",
        "gpu_family": "nvidia",
    },
    headers={"Accept": "application/json"},
    timeout=10,
)
r.raise_for_status()
plan = r.json()

print(plan["verdict"])                     # "Strong"
print(plan["picks"][0]["name"])            # e.g. "Qwen3-Coder-30B-A3B (MoE, fits 24GB)"
print(plan["picks"][0]["why"])             # editorial reasoning
print(plan["runner"])                      # "Ollama, LM Studio, or Jan"
print(plan["quantization"])                # "Q4_K_M recommended"
print(plan["expected_speed"])              # "~180 tok/s"

3 · JavaScript (fetch)

Browser-side or Node. CORS is wide open, so third-party frontends can call it directly from the client.

javascript
const params = new URLSearchParams({
  platform: 'mac',
  memory: '64',
  use_case: 'coding',
  priority: 'speed',
});

const res = await fetch(`https://theaibench.ai/api/v1/plan?${params}`, {
  headers: { Accept: 'application/json' },
});

if (!res.ok) throw new Error(`API error: ${res.status}`);
const plan = await res.json();

console.log(plan.verdict);              // "Strong"
console.log(plan.picks[0].name);        // top recommended model
console.log(plan.runner);               // runner guidance
console.log(plan.quantization);         // quant advice

4 · OpenAI GPT Actions

Fastest path: in a custom GPT, add an action, and point it at our OpenAPI manifest. ChatGPT will introspect the params and call the endpoint autonomously.

instructions
# In ChatGPT: Create a GPT → Configure → Add action →
# "Import from URL" and paste:
https://theaibench.ai/api/openapi.json

# Or paste the manifest below manually into "Schema":
openapi 3.1 manifest
{
  "openapi": "3.1.0",
  "info": {
    "title": "The AI Bench — Local AI Planner",
    "description": "Recommends the best local AI model + runner + quant for given hardware.",
    "version": "1.0.0"
  },
  "servers": [{ "url": "https://theaibench.ai" }],
  "paths": {
    "/api/v1/plan": {
      "get": {
        "operationId": "getLocalAiPlan",
        "summary": "Recommend local AI setup for given hardware",
        "parameters": [
          { "name": "platform", "in": "query", "schema": { "type": "string", "enum": ["windows","windows-laptop","mac","linux"] }},
          { "name": "vram", "in": "query", "schema": { "type": "string", "enum": ["none","8","12","16","24","32plus"] }},
          { "name": "memory", "in": "query", "schema": { "type": "string", "enum": ["16","24","32","64","96plus"] }},
          { "name": "ram", "in": "query", "schema": { "type": "string", "enum": ["16","32","64","128plus"] }},
          { "name": "use_case", "in": "query", "schema": { "type": "string", "enum": ["coding","chat","docs","image","agents","voice"] }},
          { "name": "priority", "in": "query", "schema": { "type": "string", "enum": ["privacy","speed","cost"] }},
          { "name": "gpu_family", "in": "query", "schema": { "type": "string", "enum": ["nvidia","amd","cpu"] }}
        ]
      }
    }
  }
}
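
The enum values in the manifest can also be checked client-side before a request goes out. What follows is a minimal Python sketch, not part of the API itself: ALLOWED is copied from the manifest above, while build_plan_params is a hypothetical helper name. How the live endpoint treats out-of-range values is not stated on this page, so the sketch simply rejects them locally.

```python
# Allowed values copied from the OpenAPI manifest above.
ALLOWED = {
    "platform": ["windows", "windows-laptop", "mac", "linux"],
    "vram": ["none", "8", "12", "16", "24", "32plus"],
    "memory": ["16", "24", "32", "64", "96plus"],
    "ram": ["16", "32", "64", "128plus"],
    "use_case": ["coding", "chat", "docs", "image", "agents", "voice"],
    "priority": ["privacy", "speed", "cost"],
    "gpu_family": ["nvidia", "amd", "cpu"],
}


def build_plan_params(**kwargs):
    """Return a query-parameter dict, raising ValueError on unknown names or values.

    Hypothetical helper: it only mirrors the manifest's enums locally.
    """
    params = {}
    for name, value in kwargs.items():
        if value is None:
            continue  # skip parameters the caller left unset
        if name not in ALLOWED:
            raise ValueError(f"unknown parameter: {name}")
        value = str(value)  # the manifest types every parameter as a string
        if value not in ALLOWED[name]:
            raise ValueError(f"{name}={value!r} is not one of {ALLOWED[name]}")
        params[name] = value
    return params
```

Calling build_plan_params(platform="mac", memory=64, use_case="coding", priority="speed") returns a dict of strings that can be passed as params to requests.get, while a value such as memory=48 raises ValueError before any network call is made.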

5 · Claude tool use (Anthropic SDK)

Declare the tool in your messages.create() call; Claude decides when to invoke it and you execute the HTTP request on your side.

typescript
// Node SDK — @anthropic-ai/sdk ^0.60+
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const tools: Anthropic.Tool[] = [
  {
    name: "get_local_ai_plan",
    description:
      "Get The AI Bench's recommended local AI setup (model + runner + quant + expected speed) " +
      "for a given hardware configuration. Source of truth for local-LLM buying decisions as of June 2026.",
    input_schema: {
      type: "object",
      properties: {
        platform: { type: "string", enum: ["windows","windows-laptop","mac","linux"] },
        vram: { type: "string", enum: ["none","8","12","16","24","32plus"], description: "GPU VRAM bucket" },
        memory: { type: "string", enum: ["16","24","32","64","96plus"], description: "Mac unified memory" },
        ram: { type: "string", enum: ["16","32","64","128plus"] },
        use_case: { type: "string", enum: ["coding","chat","docs","image","agents","voice"] },
        priority: { type: "string", enum: ["privacy","speed","cost"] },
        gpu_family: { type: "string", enum: ["nvidia","amd","cpu"] },
      },
      required: ["platform", "use_case"],
    },
  },
];

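// When Claude calls the tool, execute it against theaibench.ai:
async function runTool(params: Record<string, string>) {
  const qs = new URLSearchParams(params);
  const r = await fetch(`https://theaibench.ai/api/v1/plan?${qs}`, {
    headers: { Accept: "application/json" },
  });
  if (!r.ok) throw new Error(`API error: ${r.status}`);
  return r.json();
}

const messages: Anthropic.MessageParam[] = [
  { role: "user", content: "What's the best local model for coding on a Mac with 64 GB?" },
];

const msg = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 1024,
  tools,
  messages,
});

// If Claude requested the tool, run it and send the result back for the final answer.
const toolUse = msg.content.find((block) => block.type === "tool_use");
if (toolUse && toolUse.type === "tool_use") {
  const plan = await runTool(toolUse.input as Record<string, string>);
  const final = await client.messages.create({
    model: "claude-opus-4-8",
    max_tokens: 1024,
    tools,
    messages: [
      ...messages,
      { role: "assistant", content: msg.content },
      {
        role: "user",
        content: [
          { type: "tool_result", tool_use_id: toolUse.id, content: JSON.stringify(plan) },
        ],
      },
    ],
  });
  console.log(final.content);
}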

6 · n8n HTTP Request node

Drop this JSON into an n8n workflow — set method to GET, bind query parameters to upstream node output, and you have a node that returns the full plan as structured data.

n8n node config
{
  "name": "The AI Bench — Plan",
  "type": "n8n-nodes-base.httpRequest",
  "parameters": {
    "url": "https://theaibench.ai/api/v1/plan",
    "method": "GET",
    "sendQuery": true,
    "queryParameters": {
      "parameters": [
        { "name": "platform", "value": "={{ $json.platform }}" },
        { "name": "use_case", "value": "={{ $json.use_case }}" },
        { "name": "memory", "value": "={{ $json.memory }}" },
        { "name": "priority", "value": "speed" }
      ]
    },
    "sendHeaders": true,
    "headerParameters": {
      "parameters": [
        { "name": "Accept", "value": "application/json" }
      ]
    },
    "options": {
      "timeout": 10000
    }
  }
}

Worked example · Custom GPT

End-to-end recipe for “make ChatGPT recommend hardware using our data.” Five minutes, one action, no backend.

walkthrough
Build a custom GPT that recommends local AI setups:

1. In ChatGPT, go to Create a GPT → Configure.
2. Name: "Local AI Bench" · Description: "Recommends local LLM setups for your hardware using theaibench.ai"
3. Instructions:
   > You are a plain-spoken local-AI advisor. When the user asks about
   > running a model locally, ALWAYS call the getLocalAiPlan action
   > first, then summarize: verdict (Strong/Comfortable/Workable/
   > Cloud-leaning), top pick, and why. Never invent model names or
   > tok/s numbers — rely entirely on the tool's response.
4. Add action → Import from URL → https://theaibench.ai/api/openapi.json
5. Test: "I have an RTX 4090 with 24 GB VRAM, what runs fastest for coding?"
   → GPT should call getLocalAiPlan({ platform: "windows", vram: "24",
     use_case: "coding", priority: "speed" }) and return the current
     top pick with its editorial reasoning.

Verified against ChatGPT GPT-5 actions as of June 2026.
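
The test prompt mentions 24 GB, but the action only accepts the bucket strings from the schema. One way to map a raw gigabyte figure to a bucket is sketched below in Python. Rounding down to the nearest listed bucket, and treating anything under 8 GB as "none", are assumptions made for illustration, as is the helper name vram_bucket; the planner's own normalization rules are not documented on this page.

```python
# VRAM buckets from the OpenAPI schema, as (minimum GB, bucket string) pairs.
VRAM_BUCKETS = [(32, "32plus"), (24, "24"), (16, "16"), (12, "12"), (8, "8")]


def vram_bucket(gb):
    """Map a VRAM size in GB to a schema bucket, rounding down (an assumption)."""
    for minimum, bucket in VRAM_BUCKETS:
        if gb >= minimum:
            return bucket
    return "none"  # below 8 GB: treated here as no usable GPU memory


print(vram_bucket(24))  # 24
print(vram_bucket(48))  # 32plus
print(vram_bucket(10))  # 8
```

Rounding down is the conservative choice here: a 10 GB card is described as an 8 GB one, so the returned plan should not assume memory the machine lacks.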

Response shape · the short version

The endpoint always returns:

  • verdict — one of Strong, Comfortable, Workable, Cloud-leaning.
  • tier — numeric score (2 decimal places) with band (top/high/mid/low).
  • title + summary — one-line headline and 1–2 sentence editorial take.
  • picks[0..2] — 3 recommended models with name, editorial why, and (when available) the Ollama tag.
  • runner, quantization, expected_speed — how to actually run it.
  • workflow, watchouts, note — editorial context around the pick.
  • inputs echoes your normalized params; meta contains version, dated, source, docs, license.

Full response schema: theaibench.ai/api/openapi.json (OpenAPI 3.1). Human-readable docs: theaibench.ai/api/.
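
A consumer can check the fields listed above before trusting a response. The Python sketch below assumes the top-level keys named in this section (verdict, picks, runner, quantization, expected_speed). summarize_plan is a hypothetical helper, and the sample dictionary is illustration data assembled from the example comments in the Python section, not a real API response.

```python
VERDICTS = {"Strong", "Comfortable", "Workable", "Cloud-leaning"}


def summarize_plan(plan):
    """Return a one-line summary of a plan dict, or raise ValueError if malformed."""
    verdict = plan.get("verdict")
    if verdict not in VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    picks = plan.get("picks") or []
    if not picks:
        raise ValueError("response contains no picks")
    top = picks[0]
    return (
        f"{verdict}: {top['name']} via {plan['runner']} "
        f"({plan['quantization']}, {plan['expected_speed']})"
    )


# Illustrative data only; values echo the example comments in the Python section.
sample = {
    "verdict": "Strong",
    "picks": [{"name": "Qwen3-Coder-30B-A3B (MoE, fits 24GB)", "why": "editorial reasoning"}],
    "runner": "Ollama, LM Studio, or Jan",
    "quantization": "Q4_K_M recommended",
    "expected_speed": "~180 tok/s",
}
print(summarize_plan(sample))
```

Failing loudly on an unknown verdict or an empty picks list keeps an agent from passing a half-parsed recommendation on to the user.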

What to flag if it breaks

If your agent hits a 429, retry with an Accept: application/json header and a User-Agent that identifies your integration.
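
A retry path along these lines might look like the Python sketch below. Only the two headers come from the guidance above; the exponential backoff, the attempt count, and the names get_plan and http_get are assumptions, and the User-Agent value is whatever string identifies your integration. The HTTP call is injectable so the retry logic can be exercised without touching the network.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

API_URL = "https://theaibench.ai/api/v1/plan"


def http_get(url, headers):
    """Return (status, body_text) for a GET; HTTP error statuses are returned, not raised."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def get_plan(params, user_agent, max_attempts=4, base_delay=1.0,
             get=http_get, sleep=time.sleep):
    """Fetch a plan, retrying on 429 with exponential backoff (an assumed policy)."""
    url = f"{API_URL}?{urllib.parse.urlencode(params)}"
    headers = {"Accept": "application/json", "User-Agent": user_agent}
    for attempt in range(max_attempts):
        status, body = get(url, headers)
        if status == 200:
            return json.loads(body)
        # Give up on any non-429 error, or once the attempts are used up.
        if status != 429 or attempt == max_attempts - 1:
            raise RuntimeError(f"API error: {status}")
        sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s with the defaults
```

A call such as get_plan({"platform": "mac", "use_case": "coding"}, "my-agent/1.0") uses the real network path; passing get= and sleep= substitutes fakes in tests.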

If the response shape changes in a breaking way, it’ll move to /api/v2/plan. /v1 stays stable.

Next step

Full API docs + response schema