How to Connect OpenCode to a Remote Ollama AI LLM Server
I, Stan Switaj, wrote another article for you; here it is.
Today I'd like to show you how to connect your development computer to a remote server that runs all your AI LLMs with Ollama.
It may not be as blazing fast as it would be if the AI LLMs were actually on the local development computer, because we are accessing the LLMs remotely. But if we use a fast AI LLM to begin with (even though we are connecting to it over the Internet), we should be OK.
Why connect to a remote computer with the AI LLMs on it? It's easier to manage the AI LLMs, and the server has a lot of memory to run them, an adequate CPU/GPU, etc.
Let's say your development computer is [IP 1].
And the remote server with the Ollama AI LLMs is [IP 2].
First, let me say that you simply don't want an Ollama server exposed to the Internet arbitrarily. What I do is use ufw to only allow [IP 1]; the same can be done with iptables. Safety first.
On the Remote Server of AI LLMs
It's safest to allow only your development computer.
ufw:
sudo ufw allow from [IP 1] to any port 11434 proto tcp
or iptables:
sudo iptables -I INPUT -s [IP 1] -p tcp --dport 11434 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 11434 -j DROP
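Either way, you can check that the rules took effect. Also note that plain iptables rules don't survive a reboot unless you save them; the save command here assumes a Debian/Ubuntu box with the iptables-persistent package installed, so adjust for your distro.
sudo ufw status numbered
sudo iptables -L INPUT -n --line-numbers
sudo netfilter-persistent save  # persist the iptables rules across reboots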
And while we're at it, as another layer of protection, it's also wise to configure ssh to only allow that development IP of [IP 1].
ssh
Put this at the bottom of /etc/ssh/sshd_config (edit it with sudo nano /etc/ssh/sshd_config):
AllowUsers [username_of_your_connecting_computer]@[IP 1]
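After saving, it's worth validating the config and restarting the SSH daemon. Keep your current session open until you've confirmed a new login still works, so you don't lock yourself out. (The service is named ssh on Debian/Ubuntu and sshd on some other distros.)
sudo sshd -t  # syntax-check sshd_config before restarting
sudo systemctl restart ssh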
Next, on the local development computer.
I rather like the CLI statement I'm about to show you: it takes the Ollama port from the remote server and provides it locally on the development computer, so the remote port behaves as if it were running locally. Very nice.
Get the Remote Ollama AI LLM Server's Port To Operate on the Local Development Computer
Presuming that my local development computer is at 192.168.1.10 (note the -L option):
ssh -p 22 -L 192.168.1.10:11434:localhost:11434 -i /my/[private_key] username@[IP 2]
And as a nice touch, -i points to your private key; the ssh server on the remote machine already holds the matching public key, so that's another authentication factor.
This will establish the connection. Keep this window open for as long as you use OpenCode.
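To confirm the tunnel is up, you can ask the Ollama API for its model list through it. This assumes the 192.168.1.10 bind address from the command above; note that the proxy server later in this article talks to http://localhost:11434, so if you want that traffic to go through the tunnel too, bind the tunnel to loopback instead with -L 11434:localhost:11434.
curl http://192.168.1.10:11434/api/tags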
OpenCode (and operating gemma4:e4b with it)
- Install it:
curl -fsSL https://opencode.ai/install | bash
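You can sanity-check the install with a version query (assuming the installer put opencode on your PATH):
opencode --version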
Now open a new terminal window on your local Linux development computer.
Configure the OpenCode JSON file to inform it of the AI LLMs and their URL location. (Note that with the above ssh statement, the AI LLMs now behave as if they are on the local development computer.)
gedit ~/.config/opencode/opencode.json
Put the following into the .json file. (Notice it uses port 11435, not 11434, because it goes through the proxy server; read about that below.)
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "options": {
        "baseURL": "http://localhost:11435/v1"
      },
      "models": {
        "gemma4_e2b": {
          "name": "Gemma 4 E2B",
          "id": "gemma4_e2b"
        },
        "gemma4_e4b": {
          "name": "Gemma 4 E4B",
          "id": "gemma4_e4b"
        },
        "qwen2_5_coder_7b": {
          "name": "Qwen 2.5 Coder",
          "id": "qwen2.5-coder_7b"
        },
        "deepseek_coder_latest": {
          "name": "DeepSeek Coder",
          "id": "deepseek-coder_latest"
        }
      },
      "defaultModel": "gemma4_e4b"
    }
  },
  "model": "gemma4_e4b"
}

Now, if I do a
ollama list
on my server, it says the following.
gemma4:e4b    c6eb396dbd59    9.6 GB    3 weeks ago

Here's the thing. The problem is that OpenCode wants its AI LLMs to be named in a two-name convention with a forward slash:
NameOfAILLM/name_of_llm
So you could copy all your LLM models that use only a single name (no forward slash) over to names that follow the two-name convention. (I'm not doing that.)
I don't want to rename any AI LLMs from the default names they come with when I download them from the Ollama models library.
However, OpenCode needs its two-name convention with the forward slash.
Solution: Use a Simple Proxy Server
The proxy server will serve the AI LLMs on the local computer on port 11435.
The proxy server maps what Ollama says the name of the AI LLM is to what OpenCode likes.
So we simply have to make the proxy server mimic that OpenCode communication protocol. Easy, right?
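To make the mapping concrete, here is a minimal sketch of what the proxy does to each model name (it mirrors the clean_model function in the code below); the example names are just the ones from my config:
# strip OpenCode's provider prefix, then turn underscores back into Ollama's colons
def clean_model(model: str) -> str:
    if model.startswith("ollama/"):
        model = model.replace("ollama/", "")
    return model.replace("_", ":")

print(clean_model("ollama/gemma4_e4b"))             # -> gemma4:e4b
print(clean_model("ollama/deepseek-coder_latest"))  # -> deepseek-coder:latest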
I created this the other day; here you are, you're welcome.
proxy_server.py, in Python 3 of course.
#!/usr/bin/python3
import json
import requests
from http.server import BaseHTTPRequestHandler, HTTPServer
from datetime import datetime
import socket

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"
PORT = 11435

def log(msg):
    print(f"[{datetime.now().isoformat()}] {msg}")

def clean_model(model: str) -> str:
    # Map OpenCode's "ollama/gemma4_e4b" to Ollama's "gemma4:e4b"
    if model.startswith("ollama/"):
        model = model.replace("ollama/", "")
    return model.replace("_", ":")

class Proxy(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/v1/models":
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps({
                "object": "list",
                "data": [{"id": "gemma4_e2b", "object": "model"}]
            }).encode())
        else:
            self.send_response(404)
            self.end_headers()

    def do_HEAD(self):
        self.send_response(200)
        self.end_headers()

    # ----------------------------
    # STREAMING RESPONSE (IMPORTANT)
    # ----------------------------
    def do_POST(self):
        try:
            # Disable Nagle's algorithm so chunks reach OpenCode immediately
            self.connection.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        except Exception:
            pass
        try:
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            req = json.loads(body.decode())
            model = clean_model(req.get("model", ""))
            messages = req.get("messages", [])
            log(f"MODEL: {model}")
            payload = {
                "model": model,
                "messages": messages,
                "stream": True  # 🔥 CRITICAL FOR OPENCODE
            }
            ollama_resp = requests.post(OLLAMA_URL, json=payload, stream=True)
            self.send_response(200)
            self.send_header("Content-Type", "text/event-stream")
            self.send_header("Cache-Control", "no-cache")
            self.end_headers()
            full_text = ""
            for line in ollama_resp.iter_lines():
                if not line:
                    continue
                decoded = line.decode("utf-8")
                # Ollama SSE format: "data: {...}"
                if decoded.startswith("data:"):
                    data = decoded.replace("data: ", "")
                    if data.strip() == "[DONE]":
                        break
                    try:
                        obj = json.loads(data)
                        delta = obj["choices"][0].get("delta", {})
                        content = delta.get("content", "")
                        if content:
                            full_text += content
                            # 🔥 SEND STREAM CHUNK TO OPENCODE
                            chunk = {
                                "id": "chatcmpl-stream",
                                "object": "chat.completion.chunk",
                                "choices": [
                                    {
                                        "index": 0,
                                        "delta": {"content": content},
                                        "finish_reason": None
                                    }
                                ]
                            }
                            self.wfile.write(f"data: {json.dumps(chunk)}\n\n".encode())
                            self.wfile.flush()
                    except Exception:
                        continue
            # FINAL STOP SIGNAL
            self.wfile.write(b"data: [DONE]\n\n")
            self.wfile.flush()
        except Exception as e:
            log(f"ERROR: {e}")
            self.send_response(500)
            self.end_headers()

if __name__ == "__main__":
    server = HTTPServer(("0.0.0.0", PORT), Proxy)
    log(f"Streaming proxy running on http://0.0.0.0:{PORT}")
    server.serve_forever()
To use this proxy server, run this first:
export OLLAMA_HOST=http://localhost:11434
then run the proxy server after that:
python3 proxy_server.py
Description:
The proxy server provides the AI LLMs from the remote server to the local development computer via port 11435.
It also does some mapping of the LLM names, etc.; read the source if you want to.
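As a quick sanity check that the proxy is answering (and, assuming gemma4:e4b is pulled on the server, that a chat round-trip streams back), you can poke it with curl; the -N keeps curl from buffering the event stream:
curl http://localhost:11435/v1/models
curl -N -H "Content-Type: application/json" \
  -d '{"model": "ollama/gemma4_e4b", "messages": [{"role": "user", "content": "hello"}]}' \
  http://localhost:11435/v1/chat/completions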
Run OpenCode
After compensating for what OpenCode wants, its two-name format (which is exactly what proxy_server.py does):
In a folder of a source code project of your liking, run the following.
opencode --model=ollama/gemma4_e4b
Now OpenCode should run, and where it says Build, it should say Gemma 4 E4B ollama.
Enjoy.
Comment if you like articles like this. I'd like to hear from the fellow coders out there. 🙂
Might as well put this here, for my archiving purposes, and for anyone who wants to try it or improve it a bit.
An early version of proxy_server.py that makes it use bash tools (which are run from the proxy server itself):
#!/usr/bin/python3
# Ollama Proxy Server
import os
import json
import subprocess
import requests
import traceback
from http.server import BaseHTTPRequestHandler, HTTPServer
from datetime import datetime

OLLAMA_URL = "http://localhost:11434/api/chat"
PORT = 11435
LOG_FILE = "ollama_proxy.log"

LAST_COMMAND = ""
LAST_OUTPUT = ""
LAST_WORKING_DIR = ""
COMMAND_HISTORY = []

# Truncate the log file on startup
with open(LOG_FILE, "w", encoding="utf-8") as f:
    f.write("")

def log(msg):
    timestamp = datetime.now().isoformat()
    with open(LOG_FILE, "a", encoding="utf-8") as f:
        f.write(f"[{timestamp}] {msg}\n")

def get_opencode_dir():
    # Find the working directory of the running opencode process
    try:
        pid_bytes = subprocess.check_output(["pgrep", "-f", "opencode"])
        pid = pid_bytes.decode().strip().split('\n')[0]
        if pid:
            return os.readlink(f"/proc/{pid}/cwd")
    except Exception as e:
        log(f"[BOOTSTRAP ERROR] {e}")
    return os.getcwd()

def normalize_tool_name(name: str) -> str:
    if not name:
        return ""
    normalized = name.strip().lower()
    aliases = {
        "execute": "bash",
        "run": "bash",
        "shell": "bash",
        "terminal": "bash",
        "command": "bash",
        "cmd": "bash",
        "sh": "bash",
        "exec": "bash"
    }
    return aliases.get(normalized, normalized)

def summarize_output(output, max_chars=2500):
    if len(output) <= max_chars:
        return output
    return output[:max_chars] + "\n\n[OUTPUT TRUNCATED]"

def extract_json_block(text):
    # Pull a JSON payload out of a fenced code block in the model's reply
    start_markers = ["```json\n", "```bash\n", "```json", "```bash"]
    end_marker = "```"
    start_idx = -1
    marker_used = ""
    for marker in start_markers:
        start_idx = text.find(marker)
        if start_idx != -1:
            marker_used = marker
            break
    if start_idx == -1:
        return text
    json_start = start_idx + len(marker_used)
    end_idx = text.find(end_marker, json_start)
    if end_idx == -1:
        return text
    return text[json_start:end_idx].strip()

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "bash",
            "description": "Execute bash commands",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string"},
                    "working_dir": {"type": "string"}
                },
                "required": ["command"]
            }
        }
    }
]

SYSTEM_PROMPT = """
You are a precise NinjaBee automation assistant.
TOOLS AVAILABLE:
1. bash
Always use the available tools (e.g., bash) to determine answers when possible, and explicitly state the commands used for verification.
You maintain awareness of:
- previous commands
- previous outputs
- current working directory
If asked what command was executed,
answer conversationally and clearly.
"""

def check_tool_exists(command):
    try:
        executable = command.split()[0]
        result = subprocess.run(["which", executable], capture_output=True, text=True)
        return result.returncode == 0
    except Exception:
        return False

def run_bash(command, working_dir=None):
    global LAST_COMMAND, LAST_OUTPUT, LAST_WORKING_DIR, COMMAND_HISTORY
    if not command.strip():
        return "Error: Empty command"
    if not check_tool_exists(command):
        executable = command.split()[0]
        return f"Error: Tool '{executable}' not found."
    env = os.environ.copy()
    env["LC_ALL"] = "C"
    if not working_dir:
        working_dir = get_opencode_dir()
    log(f"[BASH EXEC] {command}")
    try:
        p = subprocess.run(
            command,
            shell=True,
            capture_output=True,
            text=True,
            timeout=30,
            cwd=working_dir,
            env=env
        )
        output = p.stdout + p.stderr
        summarized = summarize_output(output)
        LAST_COMMAND = command
        LAST_OUTPUT = summarized
        LAST_WORKING_DIR = working_dir
        COMMAND_HISTORY.append({
            "command": command,
            "working_dir": working_dir,
            "output": summarized,
            "timestamp": datetime.now().isoformat()
        })
        COMMAND_HISTORY = COMMAND_HISTORY[-25:]
        return summarized
    except Exception as e:
        log(f"[BASH ERROR] {e}")
        return str(e)

def clean_model(model):
    return model.replace("ollama/", "").replace("_", ":")

class Proxy(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/v1/models":
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps({
                "object": "list",
                "data": [{"id": "qwen2.5-coder:14b", "object": "model"}]
            }).encode())

    def do_POST(self):
        try:
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            req = json.loads(body.decode())
            model = clean_model(req.get("model", "qwen2.5-coder:14b"))
            messages = req.get("messages", [])
            messages.insert(0, {"role": "system", "content": SYSTEM_PROMPT})
            final_content = ""
            # Tool-call loop: keep asking Ollama until it stops requesting tools
            while True:
                payload = {
                    "model": model,
                    "messages": messages,
                    "tools": TOOLS,
                    "stream": False
                }
                res = requests.post(OLLAMA_URL, json=payload)
                data = res.json()
                msg = data.get("message", {})
                tool_calls = msg.get("tool_calls", [])
                content = msg.get("content", "").strip()
                log('CONTENT:' + content)
                extracted = extract_json_block(content)
                # Some models emit the tool call as a JSON code block instead
                if not tool_calls and extracted.startswith("{"):
                    try:
                        parsed = json.loads(extracted)
                        parsed["name"] = normalize_tool_name(parsed.get("name", ""))
                        tool_calls = [{"function": parsed}]
                    except Exception:
                        pass
                messages.append(msg)
                if not tool_calls:
                    final_content = content
                    break
                for call in tool_calls:
                    raw_name = call["function"]["name"]
                    name = normalize_tool_name(raw_name)
                    args = call["function"].get("arguments", {})
                    if isinstance(args, str):
                        try:
                            args = json.loads(args)
                        except Exception:
                            args = {"command": args}
                    if name == "bash":
                        cmd = args.get("command", "")
                        wd = args.get("working_dir", get_opencode_dir())
                        result = run_bash(cmd, wd)
                        messages.append({
                            "role": "tool",
                            "content": result,
                            "tool_call_id": call.get("id", "temp_id")
                        })
                        # Key improvement: conversational memory reinjection
                        messages.append({
                            "role": "system",
                            "content": (
                                "Previous command executed:\n\n"
                                f"Command: {cmd}\n"
                                f"Directory: {wd}\n\n"
                                "Output:\n"
                                f"{result}\n\n"
                                "If the user asks what command "
                                "was used, answer naturally."
                            )
                        })
                    else:
                        messages.append({
                            "role": "tool",
                            "content": f"Unknown tool: {raw_name}",
                            "tool_call_id": call.get("id", "temp_id")
                        })
            self.send_response(200)
            self.send_header("Content-Type", "text/event-stream")
            self.end_headers()
            if not final_content:
                final_content = "Done."
            # Stream the final answer back to OpenCode word by word
            words = final_content.split(" ")
            for i, word in enumerate(words):
                chunk_text = word + (" " if i < len(words) - 1 else "")
                chunk = {
                    "choices": [
                        {
                            "delta": {"content": chunk_text},
                            "index": 0,
                            "finish_reason": None
                        }
                    ]
                }
                self.wfile.write(f"data: {json.dumps(chunk)}\n\n".encode())
                self.wfile.flush()
            self.wfile.write(b"data: [DONE]\n\n")
            self.wfile.flush()
        except Exception:
            error_message = traceback.format_exc()
            log(f"CRITICAL ERROR:\n{error_message}")
            try:
                self.send_response(500)
                self.end_headers()
            except Exception:
                pass
        finally:
            log("Transaction closed.")

if __name__ == "__main__":
    print(f"Server started on http://localhost:{PORT}")
    server = HTTPServer(("0.0.0.0", PORT), Proxy)
    server.serve_forever()

And in the config:
gedit ~/.config/opencode/opencode.json
...
"qwen2.5-coder:14b": {
"name": "qwen2.5-coder:14b",
"id": "qwen2.5-coder:14b"
},
...
And I run it like so, in the folder of a project:
opencode --agent --model=ollama/qwen2.5-coder:14b