OpenCode

How to OpenCode to a Remote AI LLM Ollama Server

I, Stan Switaj, wrote another article for you; here it is.

Today I'd like to show you how to connect your development computer to a remote server that runs all your AI LLMs under Ollama.

It may not be quite as blazing fast as having the AI LLMs on the local development computer itself, since we are accessing them remotely, but if we use a fast AI LLM to begin with (even though we are connecting to it over the Internet), we should be OK.

Why connect to a remote computer that hosts the AI LLMs? The models are easier to manage in one place, and the server has a lot of memory and an adequate CPU/GPU to run them.

Let's say your development computer is [IP1].
And the remote server of Ollama AI LLMs is [IP2].

First let me say that you simply don't want an Ollama server sitting open on the Internet. What I do is restrict it with ufw to only allow [IP1]; the same can be done with iptables. Safety first.

On the Remote Server of AI LLMs

It's safest to only allow your development computer:

ufw
sudo ufw allow from [IP1] to any port 11434 proto tcp

or

iptables
sudo iptables -I INPUT -s [IP1] -p tcp --dport 11434 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 11434 -j DROP

And while we're at it, it's also wise to lock down ssh so it only allows connections from that development IP of [IP1].

ssh configure

Put this at the bottom of /etc/ssh/sshd_config (sudo nano /etc/ssh/sshd_config), and restart the ssh service afterwards:

AllowUsers [username_of_your_connecting_computer]@[IP1]

Next, on the local development computer.
I rather like the CLI statement I am going to show you right now: it forwards the Ollama port from the remote server and exposes it locally on the development computer, so the remote service behaves as if it were running locally. Very nice.

Forward the Remote Ollama AI LLM Server's Port to the Local Development Computer

Presuming that my local development computer is running on 192.168.1.10 (note the -L option):
ssh -p 22 -L 192.168.1.10:11434:localhost:11434 -i /my/[private_key] username@[IP2]

And as a nice touch, the -i option supplies your private key; the matching public key is already authorized on the remote server, giving another authentication factor.

This will establish the connection. Keep this window open for as long as you use OpenCode.
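Before moving on, it can help to confirm the forwarded port is actually live on the development computer. A minimal sketch, assuming the forward targets localhost:11434 as above (`tunnel_ok` is just an illustrative helper name, not part of any tool here):

```python
#!/usr/bin/python3
# Quick check that the SSH tunnel is up: the forwarded Ollama port should
# accept a TCP connection on the development computer.
import socket


def tunnel_ok(host="localhost", port=11434, timeout=2.0):
    """Return True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if tunnel_ok():
        print("tunnel up")
    else:
        print("tunnel down - check your ssh -L session")
```

If it reports the tunnel is down, check that the ssh window from the step above is still open.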

OpenCode (and running the gemma4:e4b model with it)

  1. Install it.

curl -fsSL https://opencode.ai/install | bash

Now open a new terminal window on your local linux development computer.

Configure the OpenCode JSON file (to inform it of the AI LLMs and their URL location; note that with the above ssh statement, the AI LLMs now behave as if they are on the local development computer):

gedit ~/.config/opencode/opencode.json

Put the following into the .json file. (Notice it uses port 11435, not 11434, because it goes through the proxy server; read about that below.)

Plain Text
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "options": {
        "baseURL": "http://localhost:11435/v1"
      },
      "models": {
        "gemma4_e2b": {
          "name": "Gemma 4 E2B",
          "id": "gemma4_e2b"
        },
        "gemma4_e4b": {
          "name": "Gemma 4 E4B",
          "id": "gemma4_e4b"
        },
        "qwen2_5_coder_7b": {
          "name": "Qwen 2.5 Coder",
          "id": "qwen2.5-coder_7b"
        },
        "deepseek_coder_latest": {
          "name": "DeepSeek Coder",
          "id": "deepseek-coder_latest"
        }
      },
      "defaultModel": "gemma4_e4b"
    }
  },
  "model": "gemma4_e4b"
}
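As an optional sanity check (not part of OpenCode itself), you can parse that file and confirm the default model is actually defined. `check_config` is a hypothetical helper of mine, and the path is the config location used above:

```python
#!/usr/bin/python3
# Sanity-check an opencode.json: the default model must exist under
# provider.ollama.models, otherwise OpenCode has nothing to fall back on.
import json
import os


def check_config(path):
    """Return the sorted model keys, asserting the default model exists."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    ollama = cfg["provider"]["ollama"]
    models = ollama["models"]
    default = ollama.get("defaultModel") or cfg.get("model")
    assert default in models, f"default model {default!r} not in models"
    return sorted(models)


if __name__ == "__main__":
    print(check_config(os.path.expanduser("~/.config/opencode/opencode.json")))
```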

Now if I run
ollama list
on my server, it says the following.

Plain Text
gemma4:e4b    c6eb396dbd59    9.6 GB    3 weeks ago

Here's the thing. The problem is that OpenCode wants its AI LLMs named in a two-name convention with a forward slash:

NameOfProvider/name_of_llm

So you could copy (ollama cp) all your LLM models to new names that satisfy the two-name convention with the forward slash. (I'm not doing that.)

I don't want to rename any AI LLMs from the default names under which I download them from the Ollama models library.

However, OpenCode needs its two name convention with the forward slash.

Solution: Use a Simple Proxy Server

The proxy server listens on the local computer on port 11435 and forwards requests to Ollama on port 11434.

It maps the model names as Ollama reports them to the names OpenCode likes.

So we simply have to make the proxy server mimic that OpenCode communication protocol. Easy, right?
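The core of that mimicry is the name mapping itself. A minimal sketch of the translation (mirroring the `clean_model` function in the proxy source: OpenCode sends `ollama/gemma4_e4b`, while Ollama expects `gemma4:e4b`):

```python
#!/usr/bin/python3
# Translate an OpenCode model name into an Ollama model name:
# strip the "ollama/" provider prefix and restore Ollama's colon tag.
def opencode_to_ollama(model: str) -> str:
    """'ollama/gemma4_e4b' -> 'gemma4:e4b'"""
    if model.startswith("ollama/"):
        model = model[len("ollama/"):]
    return model.replace("_", ":")


if __name__ == "__main__":
    print(opencode_to_ollama("ollama/gemma4_e4b"))      # gemma4:e4b
    print(opencode_to_ollama("deepseek-coder_latest"))  # deepseek-coder:latest
```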

I created this the other day; here you are, you're welcome.

proxy_server.py in Python3, of course.

Python 3
#!/usr/bin/python3
import json
import socket
import requests
from http.server import BaseHTTPRequestHandler, HTTPServer
from datetime import datetime

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"
PORT = 11435


def log(msg):
    print(f"[{datetime.now().isoformat()}] {msg}")


def clean_model(model: str) -> str:
    if model.startswith("ollama/"):
        model = model.replace("ollama/", "")
    return model.replace("_", ":")


class Proxy(BaseHTTPRequestHandler):

    def do_GET(self):
        if self.path == "/v1/models":
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps({
                "object": "list",
                "data": [{"id": "gemma4_e2b", "object": "model"}]
            }).encode())
        else:
            self.send_response(404)
            self.end_headers()

    def do_HEAD(self):
        self.send_response(200)
        self.end_headers()

    # ----------------------------
    # STREAMING RESPONSE (IMPORTANT)
    # ----------------------------
    def do_POST(self):
        # Flush streamed chunks immediately (disable Nagle's algorithm).
        try:
            self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        except Exception:
            pass

        try:
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            req = json.loads(body.decode())

            model = clean_model(req.get("model", ""))
            messages = req.get("messages", [])

            log(f"MODEL: {model}")

            payload = {
                "model": model,
                "messages": messages,
                "stream": True   # 🔥 CRITICAL FOR OPENCODE
            }

            ollama_resp = requests.post(
                OLLAMA_URL,
                json=payload,
                stream=True
            )

            self.send_response(200)
            self.send_header("Content-Type", "text/event-stream")
            self.send_header("Cache-Control", "no-cache")
            self.end_headers()

            full_text = ""

            for line in ollama_resp.iter_lines():
                if not line:
                    continue

                decoded = line.decode("utf-8")

                # Ollama SSE format: "data: {...}"
                if decoded.startswith("data:"):
                    data = decoded.replace("data: ", "")

                    if data.strip() == "[DONE]":
                        break

                    try:
                        obj = json.loads(data)
                        delta = obj["choices"][0].get("delta", {})
                        content = delta.get("content", "")

                        if content:
                            full_text += content

                            # 🔥 SEND STREAM CHUNK TO OPENCODE
                            chunk = {
                                "id": "chatcmpl-stream",
                                "object": "chat.completion.chunk",
                                "choices": [
                                    {
                                        "index": 0,
                                        "delta": {
                                            "content": content
                                        },
                                        "finish_reason": None
                                    }
                                ]
                            }

                            self.wfile.write(
                                f"data: {json.dumps(chunk)}\n\n".encode()
                            )
                            self.wfile.flush()

                    except Exception:
                        continue

            # FINAL STOP SIGNAL
            self.wfile.write(b"data: [DONE]\n\n")
            self.wfile.flush()

        except Exception as e:
            log(f"ERROR: {e}")
            self.send_response(500)
            self.end_headers()


if __name__ == "__main__":
    server = HTTPServer(("0.0.0.0", PORT), Proxy)
    log(f"Streaming proxy running on http://0.0.0.0:{PORT}")
    server.serve_forever()
    

To use this proxy server, set OLLAMA_HOST first, then start the proxy:

export OLLAMA_HOST=http://localhost:11434
python3 proxy_server.py

Description:
The proxy server provides the AI LLMs from the remote server to the local development computer via port 11435, and does some mapping of the LLM names, etc. Read the source if you want to.
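For the curious, the wire format the proxy writes back is OpenAI-style server-sent events. A minimal sketch of how one streamed chunk is framed (the same shape do_POST emits; `sse_chunk` is just an illustrative helper name):

```python
#!/usr/bin/python3
# Frame one piece of model output as a text/event-stream "data:" line,
# matching the chat.completion.chunk objects the proxy streams to OpenCode.
import json


def sse_chunk(content: str, done: bool = False) -> bytes:
    """Encode one streamed delta; done=True yields the final stop signal."""
    if done:
        return b"data: [DONE]\n\n"
    chunk = {
        "id": "chatcmpl-stream",
        "object": "chat.completion.chunk",
        "choices": [
            {"index": 0, "delta": {"content": content}, "finish_reason": None}
        ],
    }
    return f"data: {json.dumps(chunk)}\n\n".encode()
```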

Run OpenCode

After compensating for what OpenCode wants, its two-name format (which is exactly what proxy_server.py does), we can launch it.

In a folder of a source code project of your liking, run the following.

opencode --model=ollama/gemma4_e4b

Now OpenCode should run, and where it says Build, it should say Gemma 4 E4B ollama.

Enjoy.

Comment if you like articles like this. I'd like to hear from the fellow coders out there. 🙂

Might as well put this here, for my archiving purposes and for anyone who wants to try it or improve it a bit:
an early version of proxy_server.py that lets the model use bash tools (run from within the proxy server).

Python 3
#!/usr/bin/python3

# Ollama Proxy Server

import os
import json
import subprocess
import requests
import traceback
from http.server import BaseHTTPRequestHandler, HTTPServer
from datetime import datetime

OLLAMA_URL = "http://localhost:11434/api/chat"
PORT = 11435
LOG_FILE = "ollama_proxy.log"

LAST_COMMAND = ""
LAST_OUTPUT = ""
LAST_WORKING_DIR = ""
COMMAND_HISTORY = []

with open(LOG_FILE, "w", encoding="utf-8") as f:
    f.write("")


def log(msg):
    timestamp = datetime.now().isoformat()

    with open(LOG_FILE, "a", encoding="utf-8") as f:
        f.write(f"[{timestamp}] {msg}\n")


def get_opencode_dir():

    try:
        pid_bytes = subprocess.check_output([
            "pgrep",
            "-f",
            "opencode"
        ])

        pid = pid_bytes.decode().strip().split('\n')[0]

        if pid:
            return os.readlink(f"/proc/{pid}/cwd")

    except Exception as e:
        log(f"[BOOTSTRAP ERROR] {e}")

    return os.getcwd()


def normalize_tool_name(name: str) -> str:

    if not name:
        return ""

    normalized = name.strip().lower()

    aliases = {
        "execute": "bash",
        "run": "bash",
        "shell": "bash",
        "terminal": "bash",
        "command": "bash",
        "cmd": "bash",
        "sh": "bash",
        "exec": "bash"
    }

    return aliases.get(normalized, normalized)


def summarize_output(output, max_chars=2500):

    if len(output) <= max_chars:
        return output

    return output[:max_chars] + "\n\n[OUTPUT TRUNCATED]"


def extract_json_block(text):

    start_markers = [
        "```json\n",
        "```bash\n",
        "```json",
        "```bash"
    ]

    end_marker = "```"

    start_idx = -1
    marker_used = ""

    for marker in start_markers:

        start_idx = text.find(marker)

        if start_idx != -1:
            marker_used = marker
            break

    if start_idx == -1:
        return text

    json_start = start_idx + len(marker_used)

    end_idx = text.find(end_marker, json_start)

    if end_idx == -1:
        return text

    return text[json_start:end_idx].strip()


TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "bash",
            "description": "Execute bash commands",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string"
                    },
                    "working_dir": {
                        "type": "string"
                    }
                },
                "required": ["command"]
            }
        }
    }
]


SYSTEM_PROMPT = """
You are a precise NinjaBee automation assistant.

TOOLS AVAILABLE:
1. bash

Always use the available tools (e.g., bash) to determine answers when possible, and explicitly state the commands used for verification.

You maintain awareness of:
- previous commands
- previous outputs
- current working directory

If asked what command was executed,
answer conversationally and clearly.
"""


def check_tool_exists(command):

    try:

        executable = command.split()[0]

        result = subprocess.run(
            ["which", executable],
            capture_output=True,
            text=True
        )

        return result.returncode == 0

    except Exception:
        return False


def run_bash(command, working_dir=None):

    global LAST_COMMAND
    global LAST_OUTPUT
    global LAST_WORKING_DIR
    global COMMAND_HISTORY

    if not command.strip():
        return "Error: Empty command"

    if not check_tool_exists(command):

        executable = command.split()[0]

        return (
            f"Error: Tool '{executable}' "
            "not found."
        )

    env = os.environ.copy()
    env["LC_ALL"] = "C"

    if not working_dir:
        working_dir = get_opencode_dir()

    log(f"[BASH EXEC] {command}")

    try:

        p = subprocess.run(
            command,
            shell=True,
            capture_output=True,
            text=True,
            timeout=30,
            cwd=working_dir,
            env=env
        )

        output = p.stdout + p.stderr

        summarized = summarize_output(output)

        LAST_COMMAND = command
        LAST_OUTPUT = summarized
        LAST_WORKING_DIR = working_dir

        COMMAND_HISTORY.append({
            "command": command,
            "working_dir": working_dir,
            "output": summarized,
            "timestamp": datetime.now().isoformat()
        })

        COMMAND_HISTORY = COMMAND_HISTORY[-25:]

        return summarized

    except Exception as e:

        log(f"[BASH ERROR] {e}")

        return str(e)


def clean_model(model):
    return model.replace("ollama/", "").replace("_", ":")


class Proxy(BaseHTTPRequestHandler):

    def do_GET(self):

        if self.path == "/v1/models":

            self.send_response(200)
            self.send_header(
                "Content-Type",
                "application/json"
            )

            self.end_headers()

            self.wfile.write(json.dumps({
                "object": "list",
                "data": [
                    {
                        "id": "qwen2.5-coder:14b",
                        "object": "model"
                    }
                ]
            }).encode())
        else:
            self.send_response(404)
            self.end_headers()

    def do_POST(self):

        try:

            length = int(
                self.headers.get(
                    "Content-Length",
                    0
                )
            )

            body = self.rfile.read(length)

            req = json.loads(body.decode())

            model = clean_model(
                req.get(
                    "model",
                    "qwen2.5-coder:14b"
                )
            )

            messages = req.get("messages", [])

            messages.insert(0, {
                "role": "system",
                "content": SYSTEM_PROMPT
            })

            final_content = ""

            while True:

                payload = {
                    "model": model,
                    "messages": messages,
                    "tools": TOOLS,
                    "stream": False
                }

                res = requests.post(
                    OLLAMA_URL,
                    json=payload
                )

                data = res.json()

                msg = data.get("message", {})

                tool_calls = msg.get(
                    "tool_calls",
                    []
                )

                content = msg.get(
                    "content",
                    ""
                ).strip()


                log('CONTENT:' + content)

                extracted = extract_json_block(content)

                if (
                    not tool_calls and
                    extracted.startswith("{")
                ):

                    try:

                        parsed = json.loads(extracted)

                        parsed["name"] = (
                            normalize_tool_name(
                                parsed.get("name", "")
                            )
                        )

                        tool_calls = [{
                            "function": parsed
                        }]

                    except Exception:
                        pass

                messages.append(msg)

                if not tool_calls:

                    final_content = content
                    break

                for call in tool_calls:

                    raw_name = call["function"]["name"]

                    name = normalize_tool_name(
                        raw_name
                    )

                    args = call["function"].get(
                        "arguments",
                        {}
                    )

                    if isinstance(args, str):

                        try:
                            args = json.loads(args)

                        except Exception:
                            args = {
                                "command": args
                            }

                    if name == "bash":

                        cmd = args.get(
                            "command",
                            ""
                        )

                        wd = args.get(
                            "working_dir",
                            get_opencode_dir()
                        )

                        result = run_bash(
                            cmd,
                            wd
                        )

                        messages.append({
                            "role": "tool",
                            "content": result,
                            "tool_call_id": call.get(
                                "id",
                                "temp_id"
                            )
                        })

                        # Key improvement:
                        # conversational memory reinjection
                        messages.append({
                            "role": "system",
                            "content": (
                                "Previous command executed:\n\n"
                                f"Command: {cmd}\n"
                                f"Directory: {wd}\n\n"
                                "Output:\n"
                                f"{result}\n\n"
                                "If the user asks what command "
                                "was used, answer naturally."
                            )
                        })

                    else:

                        messages.append({
                            "role": "tool",
                            "content": (
                                f"Unknown tool: {raw_name}"
                            ),
                            "tool_call_id": call.get(
                                "id",
                                "temp_id"
                            )
                        })

            self.send_response(200)

            self.send_header(
                "Content-Type",
                "text/event-stream"
            )

            self.end_headers()

            if not final_content:
                final_content = "Done."

            words = final_content.split(" ")

            for i, word in enumerate(words):

                chunk_text = word + (
                    " " if i < len(words) - 1 else ""
                )

                chunk = {
                    "choices": [
                        {
                            "delta": {
                                "content": chunk_text
                            },
                            "index": 0,
                            "finish_reason": None
                        }
                    ]
                }

                self.wfile.write(
                    f"data: {json.dumps(chunk)}\n\n".encode()
                )

                self.wfile.flush()

            self.wfile.write(
                b"data: [DONE]\n\n"
            )

            self.wfile.flush()

        except Exception:

            error_message = traceback.format_exc()

            log(
                f"CRITICAL ERROR:\n{error_message}"
            )

            try:
                self.send_response(500)
                self.end_headers()

            except Exception:
                pass

        finally:
            log("Transaction closed.")


if __name__ == "__main__":

    print(
        f"Server started on "
        f"http://localhost:{PORT}"
    )

    server = HTTPServer(
        ("0.0.0.0", PORT),
        Proxy
    )

    server.serve_forever()

For this version, add the model to the models section of the OpenCode config:

gedit ~/.config/opencode/opencode.json

Plain Text

...
        "qwen2.5-coder:14b": {
          "name": "qwen2.5-coder:14b",
          "id": "qwen2.5-coder:14b"
        },
...

And I run it like so, in the folder of a project:

opencode --agent --model=ollama/qwen2.5-coder:14b
