Article

A Jetson Orin Nano single-board computer on a home lab desk with a monitor in the background showing an AI model benchmark comparison chart, cool ambient LED lighting, no people.

Frontier Models Are Overkill for Most Production Workloads

Topics: AI Models, Open Source, Ollama, Production AI, Infrastructure

The trading bot running on my Jetson Orin Nano uses llama3.2:3b for its daily summary task. Not because it was the first model I tried. deepseek-r1:14b at 9GB does not fit the 7.4GB unified memory pool. llama3.1:8b mostly fits and crashes at the edge. llama3.2:3b stays stable at roughly 2GB and writes the summary well. The model writes one paragraph per day: what position the bot holds, what the P&L is, what the trailing stop did. It does that task well. The fact that it is several capability tiers below GPT-5.5 does not show up anywhere in the output. ...
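The memory reasoning in this teaser can be sketched as a simple fit check. This is a rough illustration, not the bot's actual code: the footprints and the 7.4GB pool come from the teaser, while the 1.0GB headroom and the function names are assumptions made for the example. llama3.1:8b is left out because the teaser does not give its size.

```python
# Footprints and pool size are the figures quoted in the teaser.
UNIFIED_MEMORY_GB = 7.4
MODEL_FOOTPRINT_GB = {
    "deepseek-r1:14b": 9.0,
    "llama3.2:3b": 2.0,  # "roughly 2GB" in the teaser
}


def fits(model: str, headroom_gb: float = 1.0) -> bool:
    """Return True if the model plus headroom fits in unified memory.

    headroom_gb is an assumed safety margin for the OS and runtime,
    not a measured value.
    """
    return MODEL_FOOTPRINT_GB[model] + headroom_gb <= UNIFIED_MEMORY_GB


def usable_models(headroom_gb: float = 1.0) -> list[str]:
    """List the models that pass the fit check, in declaration order."""
    return [name for name in MODEL_FOOTPRINT_GB if fits(name, headroom_gb)]


if __name__ == "__main__":
    print(usable_models())
```

A check like this only rules models out. A model that passes can still crash under load, which is what the teaser reports for llama3.1:8b.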

The Ethical AI Company Billed You for Using Competitor Tools

Topics: Anthropic, Claude Code, AI Ethics, Billing, Vendor Trust

Anthropic’s detection logic found “hermes.md” in a user’s git commit history. The user was on the $200/month Claude Max plan with 86% of their usage allocation untouched and no active session running. Anthropic billed $200.98 in extra charges. When the user reported it, support acknowledged the billing error three times and refused the refund. The post reached 1.4 million views. Anthropic then issued the refund plus one month of credit. ...

A home developer workstation at night with dual monitors showing API routing configuration code and a cost comparison spreadsheet, dim desk lamp, personal lab setup, no people.

Claude Code on DeepSeek: 17x Cheaper

Topics: Claude Code, DeepSeek, AI Costs, Developer Tools, Open Source

Claude Code’s tool ecosystem and the model it runs on are two separate things. A project called DeepClaude treats them that way. DeepClaude intercepts API calls from Claude Code and routes them to DeepSeek V4 instead of Anthropic’s models. The tool layer (file editing, bash execution, session context, autonomous loops) stays intact. The inference backend changes. The cost difference is approximately 17x. ...

An empty hospital emergency triage station at night, medical monitors showing vital signs, a diagnostic computer terminal on the desk, dim clinical lighting, no people present.

AI Outperformed ER Doctors in a Harvard Trial

Topics: AI, Healthcare, Clinical Trials, Emergency Medicine

Harvard ran a controlled trial of AI performance in emergency triage and published the results this week. The AI outperformed emergency physicians on diagnostic accuracy. Most of the conversations that follow a result like this focus immediately on liability. That conversation is worth having. It is not the most important one.

What Emergency Triage Actually Tests

Emergency triage is decision-making under a specific set of conditions: incomplete information, time pressure, high consequence, and compounding cognitive load from case volume. A physician who has seen 40 patients in a shift is making probabilistic judgments under fatigue in a way that a physician at the start of a shift is not. ...

The 47 Percent Debugging Skill Drop

Topics: AI Coding Agents, Developer Skills, Claude Code, Software Engineering

Anthropic published research this year showing that developers who leaned heavily on AI coding agents experienced a 47% drop in debugging skills. The finding that made it uncomfortable is in the same document: supervising an AI coding agent effectively requires the exact debugging skills that atrophy from using one. You need the skill to catch what the agent gets wrong. Using the agent is what costs you the skill. ...

A terminal monitor in a dark server room displaying API pricing comparison data in green text, server rack hardware with blinking LEDs in the background, dim ambient lighting.

DeepSeek V4 Broke the Pricing Argument

Topics: AI Models, Open Source, Enterprise Costs, API Pricing

Claude Opus 4.7 costs $5 per million input tokens and $25 per million output tokens. GPT-5.5 is $5 input and $30 output. DeepSeek V4, released as open weights on Friday, costs $1.74 input and $3.48 output, runs a 1 million token context window, and scores within a few benchmark points of both on math and Q&A. The pricing argument for closed frontier models just got harder to make. ...
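To make the price gap concrete, here is a small cost calculator using only the per-million-token prices quoted above. The workload of one million input and one million output tokens is a hypothetical chosen for round numbers, and the function name is illustrative.

```python
# (input, output) price in USD per million tokens, as quoted in the teaser.
PRICES_PER_MILLION = {
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "DeepSeek V4": (1.74, 3.48),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in USD for one workload on the given model."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens and 1M output tokens.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${cost_usd(name, 1_000_000, 1_000_000):.2f}")
```

On that workload the totals are $30.00, $35.00, and $5.22. The ratio shifts with the input/output mix because the output price gap is wider than the input price gap.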

I Built a Trading Bot That Runs Its LLM on a Jetson in My Closet

Topics: Python, Alpaca, Ollama, Jetson Orin Nano, Trading Automation

The trading bot watches XNDU every five minutes during market hours. XNDU is a photonic quantum computing company. Photonics means room temperature operation. The cooling infrastructure that makes quantum computing prohibitively expensive at scale is not part of the design. XNDU had solid financials this week and got upgraded to a strong buy. I queued 100 paper shares for the 9:31 AM open on April 30, 10% trailing stop, $5,000 position cap. ...
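The order parameters in this teaser (a 10% trailing stop and a $5,000 position cap) can be sketched in plain Python. This is an illustration of the general technique, not the bot's code and not Alpaca's API. The class and function names are hypothetical, and the prices in the usage lines are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class TrailingStop:
    """Tracks the highest price seen and the stop level trailing below it."""

    trail_pct: float         # 0.10 for a 10% trailing stop
    high_water: float = 0.0  # highest price observed since entry

    @property
    def stop_price(self) -> float:
        """Current stop level: trail_pct below the high-water mark."""
        return self.high_water * (1.0 - self.trail_pct)

    def update(self, price: float) -> bool:
        """Record a new price; return True if the stop is triggered."""
        self.high_water = max(self.high_water, price)
        return price <= self.stop_price


def capped_shares(requested: int, price: float, cap_usd: float) -> int:
    """Largest share count <= requested whose cost stays within cap_usd."""
    return min(requested, int(cap_usd // price))


if __name__ == "__main__":
    stop = TrailingStop(trail_pct=0.10)
    for price in (20.0, 25.0, 23.0, 22.0):  # made-up price path
        print(price, stop.update(price))
    print(capped_shares(100, 60.0, 5_000.0))
```

The stop only ratchets upward: a new high raises the stop level, and a falling price never lowers it.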

The Karpathy Loop

March 8, 2026: Andrej Karpathy dropped a 630-line Python script, aimed an AI agent at his own training code with a single metric to chase, and went to bed. Two days later the agent had run 700 experiments, found 20 genuine improvements, and cut training time by 11%. It also found a bug in Karpathy’s attention implementation that he had missed — not because the agent is smarter, but because it tried more things faster without getting bored after the 15th failed attempt. ...

Zero-Click Prompt Injection in Claude's Chrome Extension: One Iframe, No Warning, Everything Gone

The attack required no action from the victim. Visit a page. Leave. By the time the browser tab closed, the extension had already talked to Claude, exported chat history, read Gmail, and potentially sent an email under your name. Patched in Claude Chrome extension v1.0.41. Here is how the chain worked.

The Attack Chain

The Claude Chrome extension trusted any page on *.claude.ai to send it messages. That wildcard, every subdomain under claude.ai, is where the attack found its entry point. ...
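The difference between a wildcard trust rule and an exact allowlist can be sketched as follows. This is not the extension's code. The function names, the allowlist contents, and the subdomain used in the usage lines are all hypothetical, chosen only to show why a `*.claude.ai` rule trusts more than a single origin.

```python
from urllib.parse import urlsplit

# Assumed allowlist for illustration; a real one would list every trusted origin.
ALLOWED_ORIGINS = {"https://claude.ai"}


def wildcard_trusts(origin: str) -> bool:
    """Over-broad check: accepts any subdomain of claude.ai."""
    host = urlsplit(origin).hostname or ""
    return host.endswith(".claude.ai")


def exact_trusts(origin: str) -> bool:
    """Stricter check: scheme and host must match an allowlisted origin."""
    parts = urlsplit(origin)
    return f"{parts.scheme}://{parts.netloc}" in ALLOWED_ORIGINS


if __name__ == "__main__":
    candidate = "https://some-widget.claude.ai"  # hypothetical subdomain
    print(wildcard_trusts(candidate))  # accepted by the wildcard rule
    print(exact_trusts(candidate))     # rejected by the exact allowlist
```

Under a wildcard rule, content served from any one subdomain inherits the trust given to all of them, so the rule is only as safe as the weakest subdomain.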

Run a Private AI That Reads Your Documents, Locally, With No Internet Required

The way RAG works is easier to understand if you stop thinking about AI memory. Think about a dictionary instead. You do not memorize every definition before you need one. Look up the word when you need it. RAG does the same thing with your files — chunks them, embeds them into a vector database, and pulls back only what matches your question. The model never sees the whole library. ...
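The chunk, embed, and retrieve steps described above can be sketched with the standard library alone. This is a toy: word counts stand in for a real embedding model, and a Python list stands in for a vector database. The names `TinyIndex`, `chunk`, `embed`, and `cosine` are all hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, text: str, size: int = 8) -> None:
        """Chunk a document and store each chunk with its embedding."""
        for piece in chunk(text, size):
            self.entries.append((piece, embed(piece)))

    def search(self, question: str, k: int = 1) -> list[str]:
        """Return the k stored chunks most similar to the question."""
        query = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    index = TinyIndex()
    index.add("The trailing stop is set at ten percent.")
    index.add("The model writes one paragraph per day.")
    print(index.search("what is the trailing stop"))
```

Only the chunks returned by `search` would be passed to the model along with the question, which is the sense in which the model never sees the whole library.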