OpenAI-Compatible · MIT License · Go 1.21+

Local LLM. Zero Friction.

An ultra-lightweight HTTP server that brings PicoLM inference to your stack with full OpenAI API compatibility — no cloud required.

⚡ Get Started ⭐ View on GitHub
~8MB
Binary Size
~45MB
RAM Usage
<1s
Startup Time
100%
OpenAI Compat
MIT
License
Features

Everything you need,
nothing you don't.

PicoLM Server gives you a production-ready inference API without the bloat. Replace OpenAI with a single config line.

🔌
OpenAI-Compatible API
Works with any existing OpenAI SDK. Switch from cloud to local inference by changing one URL. Zero code changes required in your apps.
⚡
Real-Time Streaming
Token-by-token streaming via Server-Sent Events (SSE). Get instant feedback with the same streaming UX users expect from premium services.
🛠️
Tool Calling
Full function calling support for building AI agents. Define tools in your requests and let the model decide when and how to use them.
🪶
Ultra-Lightweight
An ~8MB binary with a ~45MB RAM footprint. Designed to run anywhere — from Raspberry Pi to production servers — without breaking a sweat.
🌍
Cross-Platform
Runs natively on Linux, macOS, Windows, and ARM devices. Single binary deployment with Docker support for containerized workloads.
🔒
Local & Private
Your data never leaves your machine. Run fully offline with any GGUF model. Optional API key auth when you need access control.
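Tool calling uses the standard OpenAI `tools` schema, so agent frameworks built against the OpenAI API work unchanged. A minimal sketch of a request payload — the `get_weather` tool here is a hypothetical example, not part of PicoLM Server:

```python
import json

# Hypothetical tool definition in the standard OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"],
        },
    },
}]

# The same payload you would pass to client.chat.completions.create(...)
request = {
    "model": "picolm-local",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(request, indent=2))
```

When the model decides to call a tool, the response carries a `tool_calls` entry with the function name and JSON arguments, exactly as in the OpenAI API.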

One line
to switch.

Change your base URL. That's it. PicoLM Server speaks fluent OpenAI so your existing code just works.

🐍 Python
🟨 Node.js
🐹 Go
🐚 cURL
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-key"
)

response = client.chat.completions.create(
    model="picolm-local",
    messages=[{
        "role": "user",
        "content": "Hello!"
    }]
)

print(response.choices[0].message.content)
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'your-key'
});

const response = await client.chat.completions.create({
  model: 'picolm-local',
  messages: [{
    role: 'user',
    content: 'Hello!'
  }]
});

console.log(response.choices[0].message.content);
package main

import (
    "context"
    "fmt"
    "log"

    openai "github.com/sashabaranov/go-openai"
)

func main() {
    cfg := openai.DefaultConfig("your-key")
    cfg.BaseURL = "http://localhost:8080/v1"
    client := openai.NewClientWithConfig(cfg)

    resp, err := client.CreateChatCompletion(
        context.Background(),
        openai.ChatCompletionRequest{
            Model: "picolm-local",
            Messages: []openai.ChatCompletionMessage{
                {Role: openai.ChatMessageRoleUser,
                 Content: "Hello!"},
            },
        },
    )
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(resp.Choices[0].Message.Content)
}
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-key" \
  -d '{
    "model": "picolm-local",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true,
    "temperature": 0.7
  }'
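With `"stream": true`, the server replies with OpenAI-style Server-Sent Events: each event is a `data: {json chunk}` line, terminated by `data: [DONE]`. A minimal sketch of pulling the token text out of one such line, assuming the standard OpenAI chunk shape:

```python
import json

def parse_sse_line(line: str):
    """Return the token text from one SSE line, or None for non-content lines."""
    if not line.startswith("data: "):
        return None  # comments, blank keep-alives, etc.
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":  # end-of-stream sentinel used by the OpenAI API
        return None
    chunk = json.loads(payload)
    # Streamed chunks carry incremental text under choices[0].delta.content
    return chunk["choices"][0]["delta"].get("content")

# Example line, as the OpenAI streaming format defines it:
line = 'data: {"choices": [{"delta": {"content": "Hel"}}]}'
print(parse_sse_line(line))  # Hel
```

In practice you rarely need this by hand — the official SDKs expose the same stream as an iterator when you pass `stream=True`.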

Runs any ChatML model.

Load any GGUF model that uses the ChatML prompt format. From tiny to large — if PicoLM can run it, PicoLM Server can serve it.

🦙
TinyLlama
φ
Phi-2
Qwen
📦
GGUF Models
💬
ChatML Format
More Soon
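Under the hood, ChatML wraps each message in `<|im_start|>role … <|im_end|>` markers — the server handles this for you, but a sketch of how a chat history maps onto the standard template:

```python
def to_chatml(messages):
    """Render OpenAI-style messages as a ChatML prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Trailing assistant header cues the model to generate its reply
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```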

Up in under
a minute.

Three steps. One config file. You're running local inference.

01

Clone & Build

Grab the source and compile the server binary with Go.

git clone https://github.com/wmik/picolm-server.git && cd picolm-server && go build -o picolm-server ./cmd/server/
02

Configure

Copy the example config and point it at your PicoLM binary and GGUF model file.

cp config.example.yaml config.yaml # then edit paths
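The authoritative key names live in `config.example.yaml`; a sketch of the kind of settings it covers (the keys below are illustrative assumptions, not the real schema — check the example file):

```yaml
# Illustrative only — see config.example.yaml for the actual key names
picolm_path: /usr/local/bin/picolm   # path to your PicoLM binary
model_path: ./models/tinyllama.gguf  # any ChatML-capable GGUF model
listen: ":8080"                      # where the OpenAI-compatible API serves
api_key: ""                          # optional; empty disables auth
```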
03

Run

Launch the server. Your OpenAI-compatible endpoint is live at localhost:8080.

./picolm-server -config config.yaml
🐳

Or with Docker

Prefer containers? One command and you're done.

docker-compose up -d