Skip to main content

Designing Houdini Agent: Why I Bet on a Thin Bridge

I'm building an open-source toolkit to bring AI coding agents into Houdini's workflow: houdini-agent. This post is about the ideas behind it — why I'm making the design choices I'm making, and how I'm thinking about AI-assisted DCC work more broadly.

I've been using Houdini for over eight years. In that time, the tool itself has changed a lot — COPs got rebuilt, KineFX replaced the old rigging system, USD/LOPs became a first-class citizen — but the way I work with it hasn't changed nearly as much. I still spend a significant chunk of my day on tasks that require Houdini expertise but not necessarily creative judgment: debugging VEX, tracing attribute flows through ten-deep node chains, cross-checking parameter values, building boilerplate networks I've built a hundred times before.

That's the work I wanted to hand off.

The Starting Observation

When coding agents reached the point where they could hold context across a real multi-step task — read a scene, reason about what they see, write code, verify the result, iterate — something clicked for me. Houdini is, at its core, a visual programming environment built on Python. Every node, every parameter, every attribute is accessible through the hou module. If I give an agent access to that Python layer, along with enough context about how Houdini works, it can execute a huge range of tasks. Not by replacing my thinking, but by handling the execution at a speed I can't match.

So I started building houdini-agent.

Why Not Specialized Endpoints?

The first instinct — mine, and probably anyone else's — is to wrap common operations into nice API calls. /create_terrain, /scatter_points, /build_rig. It feels clean. It looks good in a demo.

But I went down that path mentally and it doesn't hold up. Houdini's entire value proposition is composability. The reason people choose Houdini over other DCCs is that you can wire anything to anything. A pre-built "create terrain" endpoint works until you need a different noise function, or a specific attribute layout, or a feedback loop into a solver. Then you're either hacking around your own abstraction or back to doing it by hand.

I realized I was thinking about the problem backwards. The question isn't "what operations should I expose?" — it's "what's the thinnest possible interface that gives the agent full access?"

The answer turned out to be surprisingly small: a way to execute arbitrary hou Python code on Houdini's main thread, with undo support. That's the core. Everything else is optimization.

Read Structured, Write Free

The design split I landed on is: structured endpoints for reading, general execution for writing.

For reading, the agent needs standardized return formats to reason about. So there are dedicated endpoints for scene snapshots (nodes, connections, flags, errors), geometry inspection (attributes, statistics, sampled values), and node info (cook time, memory, bbox). These are layered progressively — the agent can get a high-level overview first, then drill into specific nodes, then look at actual attribute data. This avoids dumping an entire scene's worth of data when you only need to check if a node has errors.

For writing, it's exec() and batch(). The agent writes hou code directly, the same way I would in Houdini's Python Shell. batch() groups multiple operations into a single HTTP round-trip and a single undo block — so the user can Ctrl+Z the entire thing as one unit. There's also a verify parameter on exec: after running code, the server automatically checks specified nodes for errors, geometry counts, and cook time. This closes the loop without a separate call.

It's not the prettiest API. But it's honest about what Houdini actually is — a programmable environment where the "right" approach changes with every project.

Context as Architecture

Here's something I didn't expect to matter as much as it does: the knowledge layer.

Instead of hard-coding Houdini patterns into the bridge code, all domain knowledge lives in plain Markdown files. VEX gotchas, HDA conventions, KineFX workflows, USD/LOPs recipes — each in its own document. The agent reads what it needs before acting.

This turned out to be one of the highest-leverage decisions in the project. When Houdini ships a new version and something changes, I update a text file. When I learn a better pattern for rigging or discover an attribute naming issue that keeps tripping people up, I add a line. Over time, these docs have become a compressed version of years of Houdini experience in a form that the agent can actually consume and apply.

It's also the part that scales best with my own expertise. The more I know about Houdini, the better these docs get, and the better the agent performs. The bridge code barely changes.

Built to Last (and Evolve)

I'm not naive about how fast this space moves. The coding agents that exist today will look primitive in a year. So I made two bets:

Bet one: keep the foundation boring. HTTP, JSON, Python, undo blocks. No framework dependencies, no custom protocols, no abstractions that might not age well. The bridge is a few hundred lines of Python doing one thing well. There's nothing in it that will break when the next Houdini version ships.

Bet two: build in a self-evolution mechanism. After significant work sessions, the agent writes structured retrospectives — what was done, what was wasteful, what the toolkit could do better. These reflections accumulate over time. Periodically, I review them in batch to spot recurring friction points: if the same problem shows up across multiple sessions, it becomes a concrete change to the codebase.

The idea is that the toolkit improves from being used, not just from me sitting down to improve it.

The Bigger Picture

I think we're at the beginning of a fundamental shift in how technical artists work with DCC tools. The pattern I'm exploring with Houdini isn't Houdini-specific — it's about recognizing that any scriptable application is, in principle, accessible to a coding agent. The question is how to build the interface layer so that it's useful today and gets more useful as agents improve.

My answer, for now, is: keep it thin, keep it general, invest in context, and let the agent's growing capabilities do the heavy lifting. The bridge doesn't need to be smart. It needs to be a reliable conduit between a smart agent and a powerful tool.

That's the bet. So far, it's paying off.