Build a Local AI Pentesting Assistant on Kali Linux with Ollama and MCP

Topics: Ollama, MCP, Python, Kali Linux, Responsible Scope

The tool does not determine whether you are a professional. Scope does.

Before any script runs, before any model generates a command, you need written authorization for every target you plan to touch. That is not a disclaimer to skip past. Every piece of tooling in this article enforces that principle because I have watched what happens when it gets ignored.

A few years ago a student ran a scan against a host that was not in the lab scope. I did not give a zero and move on. That student wrote the apology email. Not me, the student wrote it, disclosed exactly what ran and what the scan returned, and waited to hear what the victim decided to do about it. Outside a classroom, unauthorized access carries consequences the victim controls, not the teacher. That framing changes how seriously students take scope documents.

Build with that in mind.

What You Are Building

A local AI assistant that runs on Kali, takes natural-language prompts, and executes reconnaissance commands against an explicit allow-list of targets. No cloud. No API keys leaving your machine. The model runs on your hardware via Ollama, and the MCP server enforces what the model is allowed to touch.

Step 1: Install Ollama on Kali

curl -fsSL https://ollama.com/install.sh | sh

Once installed, create a systemd override so Ollama binds to your machine’s IP rather than only localhost. This matters when you want to query it from other VMs in your lab.

sudo systemctl edit ollama

Add this to the override file:

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

Reload and restart:

sudo systemctl daemon-reload && sudo systemctl restart ollama

Confirm it is running:

curl http://localhost:11434

You should get back: Ollama is running

Step 2: Pull a Model

The tutorial default is llama3.1, which works. My choice for a setup like this is different.

ollama pull llama3.1

What I actually load for security work is Qwen3.5-9B-Claude-4.6-Opus-Uncensored-Distilled. It is a distilled model, specifically a distillation of Claude 4.6 Opus reasoning. Distillation here means the smaller model was trained to reproduce the reasoning patterns of a much larger one. You get near-Opus-quality chain-of-thought on a 9B parameter footprint that runs locally. For an AI assistant that needs to reason about what a scan result actually means, that quality gap matters. The base llama3.1 at 8B will follow instructions, but it reasons shallowly through security-specific interpretation tasks.

Test the API directly before building anything on top of it:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "List three reasons to enumerate before exploiting.",
  "stream": false
}'

A coherent response means the model is up.

Step 3: Set Up the Python Environment

python3 -m venv red-team-ai
source red-team-ai/bin/activate
pip install mcp ollama

Create the project directory:

mkdir ai-red-team && cd ai-red-team

Step 4: Build the MCP Server

Create mcp_server.py. Three things gate every command the model can run: which networks are in scope, which tools are allowed, and the handler that checks both before execution.

from mcp.server import Server
from mcp.types import Tool, TextContent
import subprocess
import ollama

ALLOWED_NETWORKS = ["192.168.1.0/24", "10.10.10.0/24"]
ALLOWED_COMMANDS = ["nmap", "ping", "dig", "whois"]
MODEL = "llama3.1"

app = Server("red-team-assistant")

@app.list_tools()
async def list_tools():
    return [
        Tool(
            name="run_recon",
            description="Run a whitelisted recon command against an allowed target.",
            inputSchema={
                "type": "object",
                "properties": {
                    "command": {"type": "string"},
                    "target": {"type": "string"}
                },
                "required": ["command", "target"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "run_recon":
        command = arguments["command"]
        target = arguments["target"]

        if command not in ALLOWED_COMMANDS:
            return [TextContent(type="text", text=f"Command '{command}' is not allowed.")]

        in_scope = any(target.startswith(net.split("/")[0][:8]) for net in ALLOWED_NETWORKS)
        if not in_scope:
            return [TextContent(type="text", text=f"Target '{target}' is out of scope.")]

        result = subprocess.run([command, target], capture_output=True, text=True, timeout=30)
        return [TextContent(type="text", text=result.stdout)]

Make it executable and run it:

chmod +x mcp_server.py
python mcp_server.py

Pop a second terminal, type something like “run a basic nmap scan on 192.168.1.1” and watch the server deny or execute based on the list.

The Limits of a Hardcoded Allow-List

The allow-list above works, but it has a structural problem: someone edits a Python file by hand every time an engagement changes scope. That is how scope boundaries get blurry.

The architecture I would actually build toward uses two agents. The first agent reads the engagement’s scope document, extracts all authorized IP ranges and hosts, and generates the allow-list programmatically. The second agent runs alongside the active MCP server and monitors every tool call against that generated scope. Any tool call targeting a host outside the scope document gets flagged before the command runs.

One agent ingests the scope document and generates the allow-list. A second monitors the active MCP and flags violations in real time. That is a materially more mature architecture than a Python list someone edits the morning of an engagement.

The Scan Flag Beginners Get Wrong

Once the server is running, the AI will generate nmap commands based on your prompts. Watch what flags it reaches for.

Beginners grab -A, -T5, -sT, -sV, or --script http-enum first. Any of those and the IDS has you logged before you finish the scan. -sS -T2 -p 22,80,443 first — map what is there before the target knows you are looking. Aggressive flags come after you understand what you are working against.

The model follows the rules you give it. Build them tight. Quiet defaults and scope constraints extend your precision — aggressive defaults on an open prompt will show you what the target’s detection stack looks like.

If you are career-transitioning into security and want a realistic map from certs to job-ready skills, the Cybersecurity Career Roadmap covers the actual sequence for $47. ku5e.com

Written by Mario Martinez Jr. (ku5e / Gary7) | TryHackMe Profile | ku5e.com/blog

What You Are Building#

Step 1: Install Ollama on Kali#

Step 2: Pull a Model#

Step 3: Set Up the Python Environment#

Step 4: Build the MCP Server#

The Limits of a Hardcoded Allow-List#

The Scan Flag Beginners Get Wrong#