TensorPanel is a multi-tenant SaaS platform for teams that want to run open-source AI models on their own GPU infrastructure rather than paying per-token to a cloud API provider. The platform connects to GPU servers — Hetzner dedicated servers, AWS instances, RunPod deployments, or bare-metal hardware — and provides a management layer for model deployment, fine-tuning, API access, and team permissions.
The architecture is split across three components: a Laravel control plane that handles the SaaS logic, a Go agent that runs on each GPU server, and a Flutter mobile application that gives users a private ChatGPT-style interface to their own models.
TensorAgent: The Go Binary on the GPU Server
The design decision that defines TensorPanel's architecture is where the GPU server management logic runs. It would have been simpler to SSH into servers from the Laravel backend — send commands, parse output, manage state remotely. Instead, TensorPanel installs a lightweight Go binary (TensorAgent) on each GPU server that acts as a local gateway.
TensorAgent runs an HTTPS API on port 8080. All communication between the Laravel control plane and the GPU server goes through this API. The agent handles model deployment (downloading from HuggingFace, spawning inference containers), fine-tuning job execution (running Docker containers with unsloth or axolotl), hardware monitoring (parsing nvidia-smi output for GPU metrics, gopsutil for CPU/RAM/disk), and rate limiting enforcement.
Using Go for the agent was deliberate. Python agents are common in ML infrastructure tooling, but Python's startup time and memory footprint make it a poor choice for a lightweight daemon that needs to be always-on and responsive. A compiled Go binary starts in milliseconds, uses ~10MB of RAM at idle, and handles concurrent HTTP requests efficiently without the GIL considerations that would affect a Python service doing the same work.
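A minimal sketch of what the monitoring side of that API could look like in Go. The /v1/metrics route and the response shape are assumptions (the real paths aren't documented here); the gopsutil calls and nvidia-smi query flags are the standard ones, but everything else is illustrative.

```go
// Sketch of a TensorAgent-style hardware-monitoring endpoint.
// Route and response shape are assumptions; the approach (gopsutil for
// CPU/RAM, nvidia-smi for GPU metrics) follows the description above.
package main

import (
	"encoding/json"
	"net/http"
	"os/exec"
	"strings"

	"github.com/shirou/gopsutil/v3/cpu"
	"github.com/shirou/gopsutil/v3/mem"
)

type metrics struct {
	CPUPercent float64  `json:"cpu_percent"`
	RAMPercent float64  `json:"ram_percent"`
	GPUs       []string `json:"gpus"` // raw nvidia-smi CSV rows: util %, mem used MiB, mem total MiB
}

func metricsHandler(w http.ResponseWriter, r *http.Request) {
	var m metrics

	if pct, err := cpu.Percent(0, false); err == nil && len(pct) > 0 {
		m.CPUPercent = pct[0] // aggregate CPU utilization since the last sample
	}
	if vm, err := mem.VirtualMemory(); err == nil {
		m.RAMPercent = vm.UsedPercent
	}
	// One CSV row per GPU: utilization %, memory used (MiB), memory total (MiB).
	out, err := exec.Command("nvidia-smi",
		"--query-gpu=utilization.gpu,memory.used,memory.total",
		"--format=csv,noheader,nounits").Output()
	if err == nil {
		for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
			if line != "" {
				m.GPUs = append(m.GPUs, line)
			}
		}
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(m)
}

func main() {
	http.HandleFunc("/v1/metrics", metricsHandler)
	// The real agent serves HTTPS on 8080; plain HTTP keeps the sketch short.
	http.ListenAndServe(":8080", nil)
}
```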
One-Click Model Deployment
The model marketplace in TensorPanel lists curated open-source models: Llama 3, Mistral, DeepSeek, Qwen, and others. Each model entry includes its VRAM requirements. When a user selects a model to deploy on a specific server, TensorPanel checks the server's available VRAM (total VRAM minus VRAM currently in use by running models) before allowing the deployment.
Multiple models can run simultaneously on a single GPU server if VRAM permits. TensorPanel tracks running models per server and their VRAM consumption, automatically assigns available ports to new deployments, and updates the available-VRAM calculation in real time. A server with 80GB VRAM might run several smaller models simultaneously rather than a single large one.
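A simplified sketch of that admission check, with hypothetical type and field names standing in for TensorPanel's actual schema:

```go
// Sketch of the pre-deployment VRAM check and port assignment described above.
// Types, fields, and numbers are illustrative only.
package main

import (
	"errors"
	"fmt"
)

type RunningModel struct {
	Name   string
	VRAMGB int
	Port   int
}

type Server struct {
	TotalVRAMGB int
	Running     []RunningModel
	nextPort    int
}

// AvailableVRAM is total VRAM minus what running models already claim.
func (s *Server) AvailableVRAM() int {
	used := 0
	for _, m := range s.Running {
		used += m.VRAMGB
	}
	return s.TotalVRAMGB - used
}

// Deploy admits a model only if its VRAM requirement fits, then assigns the next free port.
func (s *Server) Deploy(name string, requiredGB int) (RunningModel, error) {
	if requiredGB > s.AvailableVRAM() {
		return RunningModel{}, errors.New("insufficient VRAM on server")
	}
	m := RunningModel{Name: name, VRAMGB: requiredGB, Port: s.nextPort}
	s.nextPort++
	s.Running = append(s.Running, m)
	return m, nil
}

func main() {
	srv := &Server{TotalVRAMGB: 80, nextPort: 8001}
	a, _ := srv.Deploy("mistral-7b", 16)     // fits: 80 GB free
	b, _ := srv.Deploy("qwen-14b", 30)       // fits: 64 GB free
	_, err := srv.Deploy("llama-3-70b", 140) // rejected: only 34 GB free
	fmt.Println(a.Port, b.Port, err)
}
```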
The deployment itself is triggered through TensorAgent: the agent downloads the model from HuggingFace using an encrypted HuggingFace token stored in the tenant settings, then spawns a Docker container running vLLM (for production deployments) or Ollama (for prototyping). The container exposes an inference endpoint that TensorPanel's API proxy can route to.
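As a rough illustration of the container-spawning step, the sketch below shells out to Docker to start vLLM's OpenAI-compatible server image. The image name and flags follow vLLM's published Docker usage; the helper name and the port/token plumbing are assumptions.

```go
// Sketch of how an agent might spawn a vLLM inference container.
// Helper name, port handling, and token handling are illustrative.
package main

import (
	"fmt"
	"os/exec"
)

// startVLLM launches vLLM's OpenAI-compatible server for the given model,
// mapping the chosen host port and passing the tenant's HuggingFace token.
func startVLLM(modelID, hfToken string, hostPort int) error {
	cmd := exec.Command("docker", "run", "-d",
		"--gpus", "all",
		"-p", fmt.Sprintf("%d:8000", hostPort),
		"-e", "HUGGING_FACE_HUB_TOKEN="+hfToken,
		"-v", "/root/.cache/huggingface:/root/.cache/huggingface", // reuse downloaded weights
		"vllm/vllm-openai:latest",
		"--model", modelID,
	)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("docker run failed: %v: %s", err, out)
	}
	fmt.Printf("started vLLM container for %s on host port %d\n", modelID, hostPort)
	return nil
}

func main() {
	_ = startVLLM("meta-llama/Meta-Llama-3-8B-Instruct", "hf_xxx", 8001)
}
```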
OpenAI-Compatible API Proxy
TensorPanel exposes an API at /api/v1/chat/completions that is compatible with the OpenAI API specification. A team already using the OpenAI Python SDK or any tool that targets the OpenAI API can switch to TensorPanel by changing the base URL and API key — no other code changes required.
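For example, a plain Go client can target the proxy directly. The base URL below is a placeholder, and the request body is the standard OpenAI chat-completions shape:

```go
// Sketch of calling the OpenAI-compatible endpoint from Go.
// Base URL, API key, and model name are placeholders.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model": "llama-3-8b-instruct", // a model deployed through TensorPanel
		"messages": []map[string]string{
			{"role": "user", "content": "Summarize our deployment options."},
		},
	})

	req, _ := http.NewRequest("POST",
		"https://tensorpanel.example.com/api/v1/chat/completions",
		bytes.NewReader(body))
	req.Header.Set("Authorization", "Bearer YOUR_TENSORPANEL_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	reply, _ := io.ReadAll(resp.Body)
	fmt.Println(string(reply)) // same response shape as the OpenAI API
}
```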
The proxy layer handles routing, token usage tracking, and quota enforcement. Each API key is associated with a role, and roles have configurable RPM (requests per minute) and monthly token quotas. Rate limits are enforced at the control plane and also synchronized to TensorAgent for local enforcement on the GPU server itself — a double check that prevents quota evasion by calling the agent directly.
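A stripped-down illustration of the per-key RPM check, using a fixed one-minute window for brevity. This is not TensorPanel's actual quota logic, and monthly token quotas are omitted:

```go
// Simplified per-API-key RPM enforcement as it might run locally in the agent.
// Fixed-window counting keeps the sketch short; the real logic is not shown here.
package main

import (
	"fmt"
	"sync"
	"time"
)

type rpmLimiter struct {
	mu     sync.Mutex
	limit  int       // requests allowed per minute for this key's role
	count  int       // requests seen in the current window
	window time.Time // start of the current one-minute window
}

func (l *rpmLimiter) Allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	if now.Sub(l.window) >= time.Minute {
		l.window, l.count = now, 0 // roll over to a new window
	}
	if l.count >= l.limit {
		return false // over quota: reject before the request reaches the inference engine
	}
	l.count++
	return true
}

func main() {
	lim := &rpmLimiter{limit: 3, window: time.Now()}
	for i := 0; i < 5; i++ {
		fmt.Println("request", i, "allowed:", lim.Allow())
	}
}
```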
Fine-Tuning as a First-Class Feature
Fine-tuning is not an add-on in TensorPanel — it's built into the core interface. Users upload a training dataset in JSON format, configure hyperparameters (LoRA vs QLoRA vs full fine-tuning, learning rate, batch size, epoch count), and submit the job. TensorAgent executes the fine-tuning run in a Docker container, streaming loss and epoch metrics back to the control plane in real time.
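A rough sketch of what a job submission might carry, mirroring the hyperparameters above. The field names and JSON layout are assumptions rather than TensorPanel's actual API contract:

```go
// Illustrative shape of a fine-tuning job submission.
// Field names and JSON layout are assumptions.
package main

import (
	"encoding/json"
	"fmt"
)

type FineTuneJob struct {
	BaseModel    string  `json:"base_model"`
	DatasetPath  string  `json:"dataset_path"` // uploaded JSON training set
	Method       string  `json:"method"`       // "lora", "qlora", or "full"
	LearningRate float64 `json:"learning_rate"`
	BatchSize    int     `json:"batch_size"`
	Epochs       int     `json:"epochs"`
}

func main() {
	job := FineTuneJob{
		BaseModel:    "mistral-7b",
		DatasetPath:  "datasets/support-tickets.json",
		Method:       "qlora",
		LearningRate: 2e-4,
		BatchSize:    4,
		Epochs:       3,
	}
	payload, _ := json.MarshalIndent(job, "", "  ")
	fmt.Println(string(payload)) // the kind of body the agent would receive for the job
}
```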
The fine-tuning interface shows a live loss curve as the job runs, not just a "job running" indicator. When the job completes, the resulting model adapter is available for deployment alongside the base model. This workflow — from dataset upload to running inference on a fine-tuned model — is entirely self-contained within TensorPanel without requiring any ML engineering expertise from the user.
Global Guardrails
TensorPanel includes a content guardrail system: blocklists and allowlists applied to system prompts and completion requests. These are configured at the tenant level and enforced by TensorAgent at request time — the enforcement happens locally on the GPU server before requests reach the inference engine, not just at the control plane level.
This local enforcement matters for compliance use cases. If a team needs to guarantee that certain content never enters or exits their AI models, having the enforcement happen at the GPU server (rather than at a remote control plane that could theoretically be bypassed) provides a stronger guarantee.
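The sketch below illustrates the general idea of a request-time blocklist/allowlist check running locally on the server before a prompt reaches the inference engine. The matching here is plain substring matching and the types are hypothetical:

```go
// Minimal sketch of request-time guardrail enforcement on the GPU server.
// Types and matching strategy are illustrative only.
package main

import (
	"fmt"
	"strings"
)

type Guardrails struct {
	Blocklist []string // terms that must never appear in prompts or completions
	Allowlist []string // if non-empty, at least one term must appear
}

func (g Guardrails) Check(text string) error {
	lower := strings.ToLower(text)
	for _, term := range g.Blocklist {
		if strings.Contains(lower, strings.ToLower(term)) {
			return fmt.Errorf("blocked term %q found", term)
		}
	}
	if len(g.Allowlist) > 0 {
		for _, term := range g.Allowlist {
			if strings.Contains(lower, strings.ToLower(term)) {
				return nil
			}
		}
		return fmt.Errorf("no allowlisted term present")
	}
	return nil
}

func main() {
	g := Guardrails{Blocklist: []string{"internal-codename"}}
	fmt.Println(g.Check("Summarize the internal-codename roadmap")) // rejected locally
	fmt.Println(g.Check("Summarize the public roadmap"))            // passes
}
```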
TensorScripts: One-Command Server Bootstrap
Connecting a new GPU server to TensorPanel takes one command: curl -sL https://tensorpanel.talivio.com/agent/install.sh?token=YOUR_TOKEN | sudo bash. The TensorScripts handle NVIDIA driver installation, CUDA toolkit setup, Docker and NVIDIA Container Toolkit installation, and TensorAgent deployment and registration with the control plane. A fresh GPU server goes from bare OS to ready-to-deploy in a single terminal session.