
How to Use the Grok 4.3 API: 1M Token Tutorial

Step-by-step guide to xAI's Grok 4.3 API with 1M token context, tool calling, and long-context patterns for developers building agentic apps.

The AI Dude · May 7, 2026 · 8 min read

Grok 4.3 Just Became the API to Beat on Agentic Tasks

On May 5, 2026, xAI released Grok 4.3 to its API, and it arrived with two headline numbers: a 1-million-token context window and the #1 ranking on Artificial Analysis's agentic coding leaderboard (per the xAI announcement post, which pulled over 50 million views on X). It also topped ValsAI's domain-specific benchmarks for law and finance.

If you're building apps that need long-context reasoning, multi-step tool calling, or autonomous code generation, Grok 4.3 is now a serious contender. This guide walks through getting API access, making your first call, using tool calling, and designing for that 1M context window.

Get Your API Key

xAI's API uses a straightforward key-based auth system. Here's the setup:

  • Go to console.x.ai and sign up or log in
  • Navigate to API Keys in the dashboard
  • Click Create API Key, give it a name, and copy it immediately; xAI won't show it again
  • Store the key in an environment variable: export XAI_API_KEY="xai-your-key-here"

xAI's API is OpenAI-compatible. The base URL is https://api.x.ai/v1, and it accepts the same request format as OpenAI's Chat Completions endpoint. If you're already using the OpenAI SDK, you can point it at xAI's base URL and swap in your key; no code rewrite needed.

Your First Grok 4.3 API Call

Here's a minimal Python example using the OpenAI SDK (which works with any OpenAI-compatible API):

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],  # set via: export XAI_API_KEY="xai-your-key-here"
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain how a B-tree index works in PostgreSQL."}
    ]
)

print(response.choices[0].message.content)

And the equivalent using curl:

curl https://api.x.ai/v1/chat/completions \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Explain how a B-tree index works in PostgreSQL."}
    ]
  }'

The response format mirrors OpenAI's: you get back a choices array with message.content containing the model's output. If you're using TypeScript, Python, Go, or any language with an OpenAI client library, it works out of the box by changing the base URL and API key.
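
If you're calling the endpoint with curl or a bare HTTP client rather than an SDK, you parse that same JSON shape yourself. A minimal sketch, assuming the standard OpenAI-compatible payload (the field values below are invented for illustration):

```python
import json

# Hypothetical response payload following the OpenAI-compatible
# Chat Completions shape; the field names are the standard ones,
# the values are made up for illustration.
raw = json.loads("""
{
  "id": "chatcmpl-123",
  "model": "grok-4.3",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "A B-tree index keeps keys sorted..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 28, "completion_tokens": 64, "total_tokens": 92}
}
""")

content = raw["choices"][0]["message"]["content"]   # the model's answer
finish = raw["choices"][0]["finish_reason"]         # "stop", "tool_calls", "length", ...
total = raw["usage"]["total_tokens"]                # billing-relevant token count
print(finish, total)
```

Checking `finish_reason` and `usage` on every response is worth the habit: the first drives tool-calling loops later in this guide, and the second is what you'll be billed on.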

Tool Calling: Where Grok 4.3 Shines

Grok 4.3's #1 placement on Artificial Analysis's agentic leaderboard (per the May 5, 2026 announcement) is specifically about tool-calling performance: the model's ability to decide which tools to invoke, chain them correctly, and recover from errors across multi-step workflows.

Here's how tool calling works with the xAI API. You define functions in the tools parameter, and the model decides when and how to call them:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price for a ticker symbol",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "Stock ticker symbol, e.g. AAPL"
                    }
                },
                "required": ["ticker"]
            }
        }
    }
]

messages = [
    {"role": "user", "content": "What's the current price of AAPL?"}
]

response = client.chat.completions.create(
    model="grok-4.3",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

When the model decides to call a tool, the response comes back with finish_reason: "tool_calls" and a tool_calls array. You execute the function locally, then send the result back:

import json

message = response.choices[0].message

if message.tool_calls:
    for tool_call in message.tool_calls:
        if tool_call.function.name == "get_stock_price":
            args = json.loads(tool_call.function.arguments)
            result = get_stock_price(args["ticker"])

            messages.append(message)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })

    final = client.chat.completions.create(
        model="grok-4.3",
        messages=messages,
        tools=tools
    )
    print(final.choices[0].message.content)

The pattern that matters for agentic apps: loop this. Keep sending responses back until finish_reason is "stop" rather than "tool_calls". Grok 4.3 can chain multiple tool calls across turns without you writing explicit orchestration logic.

Server-Side Tools: Web Search and X Search

xAI also offers server-side tools that the API executes for you; no local function implementation required:

response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {"role": "user", "content": "What AI models launched this week?"}
    ],
    tools=[
        {"type": "web_search"},
        {"type": "x_search"}
    ]
)

The x_search tool is Grok's unique edge: it searches X/Twitter posts in real time, surfacing breaking news and product launches hours before traditional web indexes pick them up. Both tools are available through the realtime API endpoint.

One caveat: as of early May 2026, xAI's Batch API does not auto-execute server-side tools like web_search and x_search. The model emits the tool calls but the batch infrastructure doesn't run them. If you need server-side tools, use the realtime endpoint, not batch.

Working With 1 Million Tokens of Context

A 1M token context window is roughly 750,000 words: enough to fit an entire mid-size codebase, a full legal contract library, or months of conversation history in a single API call. But "can fit" and "should fit" are different questions.
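
A quick way to sanity-check whether a payload fits is the common rule of thumb of roughly 4 characters per token for English text. This heuristic is not xAI's actual tokenizer, so treat the numbers as ballpark figures:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4-characters-per-token
    heuristic for English prose. This is NOT xAI's real tokenizer;
    actual counts differ, especially for code or non-English text."""
    return len(text) // 4

doc = "word " * 750_000          # ~750K words, 3.75M characters
print(estimate_tokens(doc))      # 937500 -- comfortably near the 1M ceiling
```

If a heuristic estimate lands anywhere near the limit, leave headroom: the model's output tokens and your system prompt count against the same window.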

Codebase Analysis

The most immediate use case: feed an entire repository into a single prompt. With 1M tokens, you can include hundreds of source files without chunking or retrieval augmentation.

import os

def collect_source_files(repo_path, extensions=('.py', '.js', '.ts')):
    files = []
    for root, dirs, filenames in os.walk(repo_path):
        dirs[:] = [d for d in dirs if d not in ('node_modules', '.git', '__pycache__')]
        for f in filenames:
            if any(f.endswith(ext) for ext in extensions):
                filepath = os.path.join(root, f)
                with open(filepath, 'r', encoding='utf-8', errors='ignore') as fh:
                    content = fh.read()
                files.append(f"--- {os.path.relpath(filepath, repo_path)} ---\n{content}")
    return "\n\n".join(files)

codebase = collect_source_files("./my-project")

response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {"role": "system", "content": "You are a senior software architect analyzing a full codebase."},
        {"role": "user", "content": f"{codebase}\n\nFind all SQL injection vulnerabilities and suggest fixes."}
    ]
)

Practical Considerations for Long Context

  • Cost scales with input tokens. Sending 500K tokens per request adds up fast. Check xAI's current pricing at docs.x.ai; rates for Grok 4.3 may differ from earlier Grok models.
  • Latency increases with context length. A 1M token prompt takes meaningfully longer than a 10K token prompt. For interactive use cases, consider whether a focused subset would suffice.
  • Attention quality at scale is unverified. No third-party needle-in-a-haystack benchmarks have been published for Grok 4.3 yet. Validate with your own data before committing to a full-context architecture.
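
One defensive pattern worth sketching for the cost point above: cap the prompt to an explicit token budget before sending, so a growing repo can't silently blow up a request. The fit_to_budget helper below is illustrative, not part of any SDK, and reuses the rough 4-characters-per-token heuristic rather than a real tokenizer:

```python
def fit_to_budget(file_blobs, max_tokens=900_000, chars_per_token=4):
    """Greedily keep whole files until an approximate token budget is hit.
    Uses a chars-per-token heuristic as a stand-in for a real tokenizer;
    swap in an exact count if you need precision."""
    budget_chars = max_tokens * chars_per_token
    kept, used = [], 0
    for blob in file_blobs:
        if used + len(blob) > budget_chars:
            break  # stop before the file that would overflow the budget
        kept.append(blob)
        used += len(blob)
    return "\n\n".join(kept)

# Three 100-char "files" against a 60-token (~240-char) budget:
prompt = fit_to_budget(["a" * 100, "b" * 100, "c" * 100], max_tokens=60)
# the third file would overflow, so only the first two are kept
```

Greedy truncation is crude; for production you'd likely rank files by relevance first, but the budget check itself is the part that saves money.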

Context Window Comparison (May 2026)

Model                      Context Window   Provider
Grok 4.3                   1M tokens        xAI
Gemini 2.5 Pro             1M tokens        Google DeepMind
Claude Opus 4 / Sonnet 4   200K tokens      Anthropic
GPT-5.5                    128K tokens      OpenAI

Grok 4.3 joins Gemini 2.5 Pro in the 1M-token tier, a significant gap over Claude and GPT-5.5 for use cases that genuinely need massive context. Whether that extra capacity translates to better answers depends on the model's attention quality at scale, which is harder to benchmark than raw window size.

Building an Agentic Loop

The real payoff of Grok 4.3's tool-calling strength is building autonomous agents. Here's a minimal agent pattern that loops until the model finishes:

import json

def run_agent(user_query, tools, available_functions, max_turns=10):
    messages = [
        {"role": "system", "content": "You are a helpful agent. Use tools to answer accurately."},
        {"role": "user", "content": user_query}
    ]

    for turn in range(max_turns):
        response = client.chat.completions.create(
            model="grok-4.3",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )

        msg = response.choices[0].message
        messages.append(msg)

        if response.choices[0].finish_reason == "stop":
            return msg.content

        if msg.tool_calls:
            for tc in msg.tool_calls:
                fn = available_functions[tc.function.name]
                result = fn(**json.loads(tc.function.arguments))
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": json.dumps(result)
                })

    return messages[-1].content

Production agents need error handling around tool execution, token budget tracking (runaway context growth gets expensive with 1M tokens available), and guardrails against infinite loops. But the core pattern is straightforward: call the model, execute requested tools, feed results back, repeat.
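
As a starting point on that error handling, the tool-execution step inside the loop can be hardened with a small wrapper. A sketch (the safe_tool_result helper and the registry below are illustrative, not part of the xAI SDK):

```python
import json

def safe_tool_result(available_functions, name, raw_args):
    """Run one tool call defensively. Unknown tools, malformed JSON
    arguments, and exceptions raised inside the tool all come back as a
    structured error string the model can read and recover from,
    instead of crashing the agent loop."""
    fn = available_functions.get(name)
    if fn is None:
        return json.dumps({"ok": False, "error": f"unknown tool: {name}"})
    try:
        args = json.loads(raw_args)
        return json.dumps({"ok": True, "result": fn(**args)})
    except Exception as e:
        return json.dumps({"ok": False, "error": f"{type(e).__name__}: {e}"})

# Hypothetical tool registry for illustration
funcs = {"add": lambda a, b: a + b}
print(safe_tool_result(funcs, "add", '{"a": 2, "b": 3}'))  # {"ok": true, "result": 5}
print(safe_tool_result(funcs, "nope", "{}"))               # {"ok": false, "error": "unknown tool: nope"}
```

Returning the error as tool output, rather than raising, matters: it lets the model see what went wrong and try a corrected call on the next turn.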

Pricing and Access

Grok 4.3 is available through both consumer products and the developer API:

  • SuperGrok ($30/month): consumer chat access to Grok 4.3
  • SuperGrok Heavy ($300/month): higher rate limits and multi-agent parallel processing
  • API access: pay-per-token through console.x.ai

Per-token API pricing for Grok 4.3 hasn't been confirmed on xAI's public pricing page as of this writing; xAI typically updates it within days of a model launch. Previous Grok models were priced below OpenAI equivalents, but a 1M-context model may carry different economics. The model is also available through OpenRouter for developers who prefer unified billing across providers.

What's Still Unknown

  • Independent long-context benchmarks. No third-party needle-in-a-haystack results for Grok 4.3 exist yet.
  • SWE-bench verified score. The Artificial Analysis placement is real, but a SWE-bench Verified score, the gold standard for coding evaluation, hasn't been published.
  • Rate limits for 1M-token requests. Whether different limits apply to maximum-context calls isn't documented.
  • Batch API compatibility. Server-side tools don't work in batch mode. Whether standard (non-tool) Grok 4.3 calls work through the batch endpoint hasn't been widely confirmed.

My Read: When to Reach for Grok 4.3

I think Grok 4.3 fits three scenarios well:

Agentic coding pipelines. If you're building autonomous agents that write, test, and modify code across multi-step workflows, Grok 4.3 currently has the strongest public benchmark results in this category (per Artificial Analysis, May 2026).

Full-codebase or full-document analysis. The 1M window means you can skip RAG for many document analysis tasks. If your corpus fits in ~750K words, a single API call replaces a retrieval pipeline.

Real-time information needs. The server-side x_search and web_search tools give Grok native access to current information including live X/Twitter posts, a structural advantage for apps reasoning about breaking news or social sentiment.

For general chat, creative writing, or latency-sensitive customer-facing apps, compare Grok 4.3's pricing and response times against Gemini 2.5 Pro (also 1M tokens), Claude, and GPT-5.5 before committing. The best model is the one that fits your latency and budget requirements, not just the one with the highest benchmark number.

Grok 4.3 API · xAI API tutorial · 1M context window · tool calling · agentic coding
