OpenAI-Compatible · MIT License · Go 1.21+

Local LLM. Zero Friction.

An ultra-lightweight HTTP server that brings PicoLM inference to your stack with full OpenAI API compatibility — no cloud required.

⚡ Get Started ⭐ View on GitHub
~8MB
Binary Size
~45MB
RAM Usage
<1s
Startup Time
100%
OpenAI Compat
MIT
License
Features

Everything you need,
nothing you don't.

PicoLM Server gives you a production-ready inference API without the bloat. Replace OpenAI with a single config line.

🔌
OpenAI-Compatible API
Works with any existing OpenAI SDK. Switch from cloud to local inference by changing one URL. Zero code changes required in your apps.
⚡
Real-Time Streaming
Token-by-token streaming via Server-Sent Events (SSE). Get instant feedback with the same streaming UX users expect from premium services.
🛠️
Tool Calling
Full function calling support for building AI agents. Define tools in your requests and let the model decide when and how to use them.
🪶
Ultra-Lightweight
An ~8MB binary with a ~45MB RAM footprint. Designed to run anywhere — from Raspberry Pi to production servers — without breaking a sweat.
🌍
Cross-Platform
Runs natively on Linux, macOS, Windows, and ARM devices. Single binary deployment with Docker support for containerized workloads.
🔒
Local & Private
Your data never leaves your machine. Run fully offline with any GGUF model. Optional API key auth when you need access control.
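Tool calling uses the standard OpenAI `tools` schema, so agent frameworks built against the OpenAI API work unchanged. A minimal sketch of a request payload — the `get_weather` tool here is a hypothetical example, not part of PicoLM Server:

```python
import json

# Hypothetical tool definition in the standard OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"],
        },
    },
}]

# The same payload you would pass to client.chat.completions.create(...)
request = {
    "model": "picolm-local",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(request, indent=2))
```

When the model decides to call a tool, the response carries a `tool_calls` entry with the function name and JSON arguments, exactly as in the OpenAI API.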

One line
to switch.

Change your base URL. That's it. PicoLM Server speaks fluent OpenAI so your existing code just works.

🐍 Python
🟨 Node.js
🐹 Go
🐚 cURL
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-key"
)

response = client.chat.completions.create(
    model="picolm-local",
    messages=[{
        "role": "user",
        "content": "Hello!"
    }]
)

print(response.choices[0].message.content)
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'your-key'
});

const response = await client.chat.completions.create({
  model: 'picolm-local',
  messages: [{
    role: 'user',
    content: 'Hello!'
  }]
});

console.log(response.choices[0].message.content);
package main

import (
    "context"
    "fmt"
    "log"

    openai "github.com/sashabaranov/go-openai"
)

func main() {
    cfg := openai.DefaultConfig("your-key")
    cfg.BaseURL = "http://localhost:8080/v1"
    client := openai.NewClientWithConfig(cfg)

    resp, err := client.CreateChatCompletion(
        context.Background(),
        openai.ChatCompletionRequest{
            Model: "picolm-local",
            Messages: []openai.ChatCompletionMessage{
                {Role: openai.ChatMessageRoleUser,
                 Content: "Hello!"},
            },
        },
    )
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(resp.Choices[0].Message.Content)
}
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-key" \
  -d '{
    "model": "picolm-local",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true,
    "temperature": 0.7
  }'
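With `"stream": true`, the server replies with OpenAI-style Server-Sent Events: each event is a `data: {json chunk}` line, terminated by `data: [DONE]`. A minimal sketch of pulling the token text out of one such line, assuming the standard OpenAI chunk shape:

```python
import json

def parse_sse_line(line: str):
    """Return the token text from one SSE line, or None for non-content lines."""
    if not line.startswith("data: "):
        return None  # comments, blank keep-alives, etc.
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":  # end-of-stream sentinel used by the OpenAI API
        return None
    chunk = json.loads(payload)
    # Streamed chunks carry incremental text under choices[0].delta.content
    return chunk["choices"][0]["delta"].get("content")

# Example line, as the OpenAI streaming format defines it:
line = 'data: {"choices": [{"delta": {"content": "Hel"}}]}'
print(parse_sse_line(line))  # Hel
```

In practice you rarely need this by hand — the official SDKs expose the same stream as an iterator when you pass `stream=True`.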

Runs any ChatML model.

Load any GGUF model that uses the ChatML prompt format. From tiny to large — if PicoLM can run it, PicoLM Server can serve it.

🦙
TinyLlama
φ
Phi-2
Qwen
📦
GGUF Models
💬
ChatML Format
More Soon
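Under the hood, ChatML wraps each message in `<|im_start|>role … <|im_end|>` markers — the server handles this for you, but a sketch of how a chat history maps onto the standard template:

```python
def to_chatml(messages):
    """Render OpenAI-style messages as a ChatML prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Trailing assistant header cues the model to generate its reply
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```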

Up in under
a minute.

Three steps. One config file. You're running local inference.

01

Clone & Build

Grab the source and compile the server binary with Go.

git clone https://github.com/wmik/picolm-server.git && cd picolm-server && go build -o picolm-server ./cmd/server/
02

Configure

Copy the example config and point it at your PicoLM binary and GGUF model file.

cp config.example.yaml config.yaml # then edit paths
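The authoritative key names live in `config.example.yaml`; a sketch of the kind of settings it covers (the keys below are illustrative assumptions, not the real schema — check the example file):

```yaml
# Illustrative only — see config.example.yaml for the actual key names
picolm_path: /usr/local/bin/picolm   # path to your PicoLM binary
model_path: ./models/tinyllama.gguf  # any ChatML-capable GGUF model
listen: ":8080"                      # where the OpenAI-compatible API serves
api_key: ""                          # optional; empty disables auth
```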
03

Run

Launch the server. Your OpenAI-compatible endpoint is live at localhost:8080.

./picolm-server -config config.yaml
🐳

Or with Docker

Prefer containers? One command and you're done.

docker-compose up -d