The Google-native protocol. It speaks the Gemini generateContent REST contract verbatim, so the google-generativeai Python SDK and the @google/generative-ai JS SDK work unchanged: swap in the ByteSpike base URL and your API key. Cross-vendor models (Claude, GPT, DeepSeek, Doubao) are translated to and from the Gemini shape under the hood.

When to use

Pick this endpoint when you want:
  • Drop-in for google-generativeai / Gemini SDK clients
  • Gemini-only features like responseSchema with the Gemini-flavoured JSON schema, or safetySettings
  • Code paths already structured around contents[].parts[] (different from OpenAI / Anthropic messages)
For the same model via the Anthropic shape, use /v1/messages; for OpenAI shape, /chat/completions.

Request

curl https://llm.bytespike.ai/v1beta/models/gemini-3-1-pro:generateContent \
  -H "x-goog-api-key: $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "contents": [
      {"role": "user", "parts": [{"text": "Summarize the first chapter of Moby Dick."}]}
    ],
    "generationConfig": {"maxOutputTokens": 1024}
  }'
The {model} in the path is the model slug (e.g. gemini-3-1-pro). For streaming, use the sibling :streamGenerateContent path — see Streaming below.
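The same request can be assembled from Python with only the standard library. This is a minimal sketch, not an official client: the helper name build_generate_request is hypothetical, and the host is copied from the curl example above.

```python
import json
import urllib.request

# Host and path prefix copied from the curl example; treat as an assumption.
BASE_URL = "https://llm.bytespike.ai/v1beta"


def build_generate_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent POST for one user turn."""
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": 1024},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-goog-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; building and sending are kept separate here so the request can be inspected first.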

Headers

| Header | Required | Notes |
| --- | --- | --- |
| x-goog-api-key | yes | Your ByteSpike key. (Authorization: Bearer … also accepted.) |
| content-type | yes | application/json. |

Body

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| contents | array | yes | Conversation turns. Each item has role (user / model) + parts[]. |
| systemInstruction | object | no | Gemini’s system prompt slot, shaped {"parts":[{"text": "…"}]}. |
| generationConfig.maxOutputTokens | integer | no | Hard cap on response length. |
| generationConfig.temperature | number | no | 0.0–2.0. |
| generationConfig.topP | number | no | Nucleus sampling. |
| generationConfig.topK | integer | no | Top-K sampling. |
| generationConfig.stopSequences | string[] | no | Custom stop tokens. |
| generationConfig.responseMimeType | string | no | application/json to force JSON. |
| generationConfig.responseSchema | object | no | Gemini-shape JSON schema constraint. |
| tools | array | no | Function declarations; see Tool calling. |
| toolConfig | object | no | {"functionCallingConfig": {"mode": "AUTO"}}; mode is one of "ANY", "AUTO", "NONE". |
| safetySettings | array | no | Per-category thresholds (HARM_CATEGORY_HARASSMENT, etc.). Forwarded to the Gemini model. |
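To show how the optional fields nest, here is a rough sketch of a body builder. The function name and its parameters are hypothetical; only the JSON field names come from the table above. Setting responseMimeType alongside responseSchema is an assumption based on how the Gemini API pairs the two.

```python
from typing import Any, Optional


def build_body(
    prompt: str,
    system: Optional[str] = None,
    temperature: Optional[float] = None,
    max_output_tokens: Optional[int] = None,
    response_schema: Optional[dict] = None,
) -> dict[str, Any]:
    """Assemble a generateContent body, omitting fields that were not set."""
    body: dict[str, Any] = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}]
    }
    if system is not None:
        body["systemInstruction"] = {"parts": [{"text": system}]}

    config: dict[str, Any] = {}
    if temperature is not None:
        # The table documents 0.0 to 2.0 as the accepted range.
        if not 0.0 <= temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        config["temperature"] = temperature
    if max_output_tokens is not None:
        config["maxOutputTokens"] = max_output_tokens
    if response_schema is not None:
        # Assumption: a schema constraint is sent together with the JSON MIME type.
        config["responseMimeType"] = "application/json"
        config["responseSchema"] = response_schema
    if config:
        body["generationConfig"] = config
    return body
```

Leaving unset fields out entirely, rather than sending nulls, keeps the payload close to what the SDKs emit.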

Response

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {"text": "Ishmael, the narrator, signs onto a whaling ship..."}
        ]
      },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": [/* … */]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 23,
    "candidatesTokenCount": 87,
    "totalTokenCount": 110
  }
}

Response fields

| Field | Type | Notes |
| --- | --- | --- |
| candidates[].content.parts[] | array | Text and/or functionCall parts. |
| candidates[].finishReason | string | STOP, MAX_TOKENS, SAFETY, RECITATION. |
| candidates[].safetyRatings | array | Gemini’s content-safety scores. |
| usageMetadata.promptTokenCount | integer | Tokens billed for input. |
| usageMetadata.candidatesTokenCount | integer | Tokens billed for output. |
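Reading these fields from a decoded response might look like the sketch below. The helper names are hypothetical, and the code assumes the response has already been parsed into a dict; it tolerates a candidate with no content (for example, a blocked response) by returning an empty string.

```python
def extract_text(response: dict) -> str:
    """Concatenate the text parts of the first candidate."""
    candidates = response.get("candidates", [])
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    # functionCall parts carry no "text" key and are skipped here.
    return "".join(part["text"] for part in parts if "text" in part)


def billed_tokens(response: dict) -> tuple[int, int]:
    """Return (input tokens, output tokens) from usageMetadata."""
    usage = response.get("usageMetadata", {})
    return usage.get("promptTokenCount", 0), usage.get("candidatesTokenCount", 0)
```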

Accounting headers

Same envelope as every other endpoint:
X-RateLimit-Limit: 50.00
X-RateLimit-Remaining: 42.18
X-RateLimit-Reset: 1716705600
X-Quota-Remaining-Credits: 192.40
Full breakdown in the API Reference overview.
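A small sketch of turning these headers into numbers, for logging or client-side throttling. The function name and the returned key names are hypothetical. The reset value is kept as a raw integer because its unit is not stated here (the sample looks like a Unix timestamp, but that is an assumption).

```python
from typing import Mapping


def parse_accounting_headers(headers: Mapping[str, str]) -> dict:
    """Convert the accounting headers to numeric values.

    Header names are matched case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "rate_limit": float(lowered["x-ratelimit-limit"]),
        "rate_remaining": float(lowered["x-ratelimit-remaining"]),
        "rate_reset": int(lowered["x-ratelimit-reset"]),
        "credits_remaining": float(lowered["x-quota-remaining-credits"]),
    }
```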

Streaming

Use the sibling path :streamGenerateContent:
curl 'https://llm.bytespike.ai/v1beta/models/gemini-3-1-pro:streamGenerateContent?alt=sse' \
  -H "x-goog-api-key: $BYTESPIKE_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "contents": [{"role": "user", "parts": [{"text": "Tell me a joke."}]}]
  }'
Response is SSE (alt=sse) with each frame carrying a partial candidates payload:
data: {"candidates":[{"content":{"role":"model","parts":[{"text":"Why"}]}}]}

data: {"candidates":[{"content":{"role":"model","parts":[{"text":" did"}]}}]}



data: {"candidates":[{"finishReason":"STOP"}],"usageMetadata":{"promptTokenCount":7,"candidatesTokenCount":21}}
(Without alt=sse, the response is a JSON array of frames — the SDK default.)
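Consuming the SSE form can be sketched as a generator over the response lines. This assumes each frame arrives as a single "data:" line, as in the sample above; the function name is hypothetical, and a production client would also need to handle multi-line data fields and connection errors.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from 'data:' frames, skipping blank separators."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        frame = json.loads(line[len("data:"):])
        for candidate in frame.get("candidates", []):
            # The final frame carries finishReason but no content.
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]
```

Joining the yielded fragments reconstructs the full reply; printing them as they arrive gives the usual typewriter effect.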

Tool calling

Tool calling uses Gemini’s tools[].functionDeclarations[] shape. A full round trip takes two requests:

Round 1 — tool offered

{
  "contents": [
    {"role": "user", "parts": [{"text": "What's the weather in Tokyo?"}]}
  ],
  "tools": [
    {
      "functionDeclarations": [
        {
          "name": "get_weather",
          "description": "Get current weather for a city.",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string"}
            },
            "required": ["city"]
          }
        }
      ]
    }
  ]
}
Response includes a functionCall part:
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "functionCall": {
              "name": "get_weather",
              "args": {"city": "Tokyo"}
            }
          }
        ]
      },
      "finishReason": "STOP"
    }
  ]
}
Note that args is a structured object here, not a JSON string (unlike OpenAI’s tool_calls.arguments).

Round 2 — tool result returned

{
  "contents": [
    {"role": "user", "parts": [{"text": "What's the weather in Tokyo?"}]},
    {
      "role": "model",
      "parts": [{"functionCall": {"name": "get_weather", "args": {"city": "Tokyo"}}}]
    },
    {
      "role": "user",
      "parts": [
        {
          "functionResponse": {
            "name": "get_weather",
            "response": {"temperature": "18°C", "conditions": "partly cloudy"}
          }
        }
      ]
    }
  ],
  "tools": [ /* same schema */ ]
}
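The step between the two rounds can be sketched in Python as follows: pull the functionCall out of the round 1 response, run the tool locally, then append both the model turn and the functionResponse turn to contents. The helper names are hypothetical; the JSON shapes follow the two payloads above.

```python
def find_function_calls(response: dict) -> list[dict]:
    """Return the functionCall objects in the first candidate, if any."""
    parts = response["candidates"][0].get("content", {}).get("parts", [])
    return [part["functionCall"] for part in parts if "functionCall" in part]


def append_tool_result(contents: list[dict], call: dict, result: dict) -> list[dict]:
    """Return a new contents list with the model's call and the tool's result added."""
    return contents + [
        {"role": "model", "parts": [{"functionCall": call}]},
        {
            "role": "user",
            "parts": [{"functionResponse": {"name": call["name"], "response": result}}],
        },
    ]
```

Because args is already a dict, it can be passed to the local tool with `**call["args"]` without a json.loads step.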

Multimodal (image / audio parts)

Inline a base64-encoded image as a part on the user message:
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "/9j/4AAQSkZJRg..."
          }
        },
        {"text": "What's in this image?"}
      ]
    }
  ]
}
For files stored externally, use fileData.fileUri (the file must be publicly reachable over HTTPS). Audio (audio/wav, audio/mp3) and video (video/mp4) parts work the same way on multimodal-capable models.
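Producing an inlineData part from raw bytes is mostly a base64 step. A minimal sketch, with hypothetical helper names:

```python
import base64


def inline_part(data: bytes, mime_type: str) -> dict:
    """Wrap raw bytes as an inlineData part with standard base64 encoding."""
    return {
        "inlineData": {
            "mimeType": mime_type,
            "data": base64.b64encode(data).decode("ascii"),
        }
    }


def image_question(image_bytes: bytes, mime_type: str, question: str) -> dict:
    """Build a single user turn holding an image part followed by a text part."""
    return {
        "contents": [
            {"role": "user", "parts": [inline_part(image_bytes, mime_type), {"text": question}]}
        ]
    }
```

In practice the bytes would come from something like `open("photo.jpg", "rb").read()`; the file name is only an illustration.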

Cross-model routing

The {model} path segment accepts any ByteSpike catalog model — the gateway translates the Gemini shape to each model’s native protocol:
POST /v1beta/models/gemini-3-1-pro:generateContent
POST /v1beta/models/claude-sonnet-4-6:generateContent
POST /v1beta/models/gpt-5-4:generateContent
POST /v1beta/models/deepseek-v4-pro:generateContent
Caveats:
  • A responseSchema constraint requires a model that supports structured outputs; otherwise the request returns 400 unsupported_feature.
  • safetySettings are only honored on Gemini models — silently ignored on Claude / GPT / DeepSeek (those have their own safety stacks).
  • usageMetadata field names are Gemini-shape regardless of the model.
Full catalog: GET /v1/models. Pricing per model: bytespike.ai/pricing.
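Since only the path segment changes between models, a client can keep one code path and flag fields that will have no effect. The sketch below is illustrative: both helper names are hypothetical, and detecting Gemini models by a "gemini-" slug prefix is an assumption drawn from the example slugs, not a documented rule.

```python
def endpoint_path(model: str, stream: bool = False) -> str:
    """Return the request path for a catalog model slug."""
    method = "streamGenerateContent" if stream else "generateContent"
    return f"/v1beta/models/{model}:{method}"


def silently_ignored_fields(model: str, body: dict) -> list[str]:
    """List body fields that the caveats above say are ignored for this model."""
    # Assumption: Gemini slugs start with "gemini-".
    if model.startswith("gemini-"):
        return []
    return [field for field in ("safetySettings",) if field in body]
```

Logging the result of silently_ignored_fields before sending makes the "silently ignored" caveat visible during development.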

Errors

Non-2xx responses are not billed. The body shape matches Gemini’s error envelope:
{
  "error": {
    "code": 429,
    "message": "You exceeded your current requests-per-minute budget.",
    "status": "RESOURCE_EXHAUSTED"
  }
}
| Status | error.status | Trigger |
| --- | --- | --- |
| 400 | INVALID_ARGUMENT | Body validation failed. Message identifies the field. |
| 400 | INVALID_ARGUMENT (msg unsupported_model) | Model slug not in scope. |
| 400 | INVALID_ARGUMENT (msg unsupported_feature) | E.g. responseSchema on a non-structured-output model. |
| 401 | UNAUTHENTICATED | Missing / revoked key. |
| 402 | FAILED_PRECONDITION (msg insufficient_credits) | Wallet exhausted. |
| 403 | PERMISSION_DENIED | Scope denied, IP allowlist, model gated. |
| 404 | NOT_FOUND | Path typo or unknown model id. |
| 429 | RESOURCE_EXHAUSTED | Tier rate-limit. Backoff per x-ratelimit-reset-*. |
| 5xx | INTERNAL / UNAVAILABLE | Upstream provider issue. Free + automatic retry envelope. |
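One way a client might act on this table is sketched below. The helper names are hypothetical. Treating RESOURCE_EXHAUSTED, INTERNAL, and UNAVAILABLE as retryable is a client-side policy assumption, and reading the reset header as a Unix timestamp in seconds is also an assumption.

```python
import json

# Assumption: these statuses are transient; everything else needs a code or account fix.
RETRYABLE_STATUSES = {"RESOURCE_EXHAUSTED", "INTERNAL", "UNAVAILABLE"}


def parse_error(body: str) -> tuple[int, str, str]:
    """Extract (code, status, message) from a Gemini-style error envelope."""
    error = json.loads(body)["error"]
    return error["code"], error["status"], error["message"]


def should_retry(status: str) -> bool:
    """Return True when a later retry could plausibly succeed."""
    return status in RETRYABLE_STATUSES


def seconds_until_reset(reset_epoch: int, now: float) -> float:
    """Seconds to wait before retrying, assuming reset_epoch is Unix time."""
    return max(0.0, reset_epoch - now)
```

A caller would typically pass time.time() as now and sleep for the returned duration before retrying a 429.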

SDK example

google-generativeai Python SDK, base URL pointed at ByteSpike:
import os  # needed for os.environ below

import google.generativeai as genai

genai.configure(
    api_key=os.environ["BYTESPIKE_API_KEY"],
    client_options={"api_endpoint": "api.bytespike.ai"},
    transport="rest",
)

model = genai.GenerativeModel("gemini-3-1-pro")
r = model.generate_content("Hello!")
print(r.text)