Steering a Permissionless AI Safety Organization

Futarchy as an Engine


I previously argued that the AI safety community's structure is partially misaligned with its own sense of urgency. We operate on short timelines but use slow, permission-based systems that "bycatch" 95% of aspiring contributors. The proposed solution was a permissionless, volunteer-powered organization, Theomachia Labs, that absorbs this talent and lets people contribute now.

But a permissionless structure creates a new, critical problem: coordination. If anyone can join and start a project, how do you avoid descending into chaos? How do you ensure that this massive influx of volunteer effort is directed toward the most promising research directions and not wasted on dead ends or duplicative work?

A top-down management structure is hard to implement for several reasons: a lack of senior coordinators, the immaturity of the field itself, and the need for quick iteration. What we need is a decentralized mechanism for signaling, prioritization, and strategic adjustment. We need an engine for steering.

Geodesic has already started using futarchy to select projects for the MARS program, and that points toward the solution. It’s an excellent application of prediction markets to making better initial decisions. But we can, and must, go further.


From Static Selection to Dynamic Steering


Using a prediction market to pick which projects to fund is a powerful idea. You define success metrics (e.g., "Will this project result in a paper accepted to NeurIPS?"), and the market's collective wisdom selects the proposal with the highest probability of success.

This is a massive improvement over traditional committees, but it’s a static, one-time snapshot. Real research is a dynamic process of exploration, dead ends, and unexpected pivots. A truly effective coordination system must operate continuously.

At Theomachia Labs, we can use prediction markets not just to select projects, but to manage them throughout their lifecycle.


First of all, instead of one big bet on a final outcome nine months away, each project would be broken down into a series of verifiable milestones, each with its own market.

  • "Will Project Deceptive-Alignment-Detector have a working code prototype by the end of Q1?" (Resolves YES/NO)

  • "Will the initial results from Project Mechanistic-Interpretability-Tool be independently replicated by July?" (Resolves YES/NO)

These markets provide a real-time dashboard of the organization's collective confidence in each project's progress. A lagging market price is a powerful, impersonal signal that a project is in trouble, prompting review and intervention far more efficiently than waiting for a quarterly report.
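As a rough illustration of the dashboard idea above, the sketch below flags milestone markets whose YES price has dropped under a review threshold. The `MilestoneMarket` class, the `flag_lagging` helper, the 0.4 threshold, and the example prices are all hypothetical; they are assumptions made for illustration, not part of any existing Theomachia Labs tooling.

```python
from dataclasses import dataclass


@dataclass
class MilestoneMarket:
    project: str
    question: str
    price: float  # current YES price, read as a probability in [0, 1]


def flag_lagging(markets, threshold=0.4):
    """Return markets priced below the (hypothetical) review threshold, lowest first."""
    lagging = [m for m in markets if m.price < threshold]
    return sorted(lagging, key=lambda m: m.price)


# Illustrative prices only; these are not real market data.
markets = [
    MilestoneMarket("Deceptive-Alignment-Detector",
                    "Working code prototype by the end of Q1?", 0.72),
    MilestoneMarket("Mechanistic-Interpretability-Tool",
                    "Initial results independently replicated by July?", 0.31),
]

for m in flag_lagging(markets):
    print(f"REVIEW {m.project}: {m.price:.0%} on '{m.question}'")
```

A fixed threshold is the simplest possible trigger. A real dashboard would probably also want to watch how fast a price is falling, not just where it currently sits.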


Even more power comes from conditional markets, which let us ask "what if?" questions.

  • "Conditional on the current approach of Project A failing its Q2 milestone, what is the probability it will publish a paper by year's end if it pivots to Strategy B?"

This is a tool for making hard decisions. It allows teams to objectively assess whether they should persevere or pivot. It depersonalizes the decision to abandon a failing approach, reframing it as a rational update based on the collective's best forecast. For a volunteer organization, where motivation is key, this is a crucial mechanism for avoiding demoralizing death marches on doomed projects.
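To make the persevere-or-pivot comparison concrete, here is a minimal sketch. It assumes there is one conditional market per candidate action (for example, one for staying the course and one for pivoting to Strategy B), each pricing the same outcome. The function name, the action labels, the prices, and the 0.05 switching margin are hypothetical.

```python
def recommend_action(conditional_prices, status_quo, margin=0.05):
    """Pick the action with the highest conditional success probability.

    The recommendation moves away from the status quo only when the best
    alternative beats it by at least `margin`, so that small price noise
    does not trigger a pivot.
    """
    best = max(conditional_prices, key=conditional_prices.get)
    gain = conditional_prices[best] - conditional_prices[status_quo]
    if best != status_quo and gain >= margin:
        return best
    return status_quo


# Illustrative prices for "publishes a paper by year's end, given this action".
prices = {"persevere": 0.15, "pivot_to_strategy_b": 0.40}
print(recommend_action(prices, status_quo="persevere"))
```

The margin is a design choice: it trades responsiveness for stability, which matters when thin markets make prices jumpy.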


However, the hardest and most important step is to move beyond "output" metrics (papers, code) to "impact" metrics. This is notoriously difficult, but not impossible.

  • "What is the probability that this research direction will be cited in a major AI lab's safety policy within 2 years?"

  • "Conditional on this interpretability tool being released, will it be used in at least three independent research papers (not from our lab) within 18 months?"

  • "Will this project's findings shift the consensus on Topic X by >10% on the Alignment Forum, as measured by a pre- and post-project survey?"

These markets force a constant orientation toward the ultimate goal: producing research that actually makes a difference. They create a powerful incentive to work on things that the community believes will be influential, not just easy to publish.
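Impact markets only work if their resolution rule is written down in advance. As a sketch of what that could look like for the survey question above, the code below assumes that "shift by >10%" means the share of respondents who agree with a statement moves by more than 10 percentage points, in either direction, between the pre- and post-project surveys. That reading, and the function names, are my assumptions; other definitions are equally defensible.

```python
def agreement_share(responses):
    """Fraction of survey respondents who agree (responses are booleans)."""
    if not responses:
        raise ValueError("survey has no responses")
    return sum(responses) / len(responses)


def resolves_yes(pre, post, threshold=0.10):
    """True if agreement moved by more than `threshold` in either direction."""
    return abs(agreement_share(post) - agreement_share(pre)) > threshold


# Illustrative survey data: 40% agreement before, 55% after.
pre_survey = [True] * 40 + [False] * 60
post_survey = [True] * 55 + [False] * 45
print(resolves_yes(pre_survey, post_survey))
```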


Futarchy as the Operating System for Theomachia Labs


With hundreds of volunteers across the globe, no central committee can effectively track everything. The market becomes the coordination layer. Promising projects will see high market confidence, attracting more volunteer talent and attention. Lagging projects will see their confidence drop, signaling that resources should flow elsewhere.


In a traditional structure, your contribution is measured by your output. In this model, good judgment is a first-class contribution. By trading on the markets, you are actively helping to steer the entire organization. A skilled forecaster who never writes a line of code or a sentence of prose can become one of the most valuable members of the community, their expertise in research taste being directly translated into better organizational decisions. Success in the markets can be rewarded with reputation, influence (e.g., more voting power on new market creation), or even a share of future funding, creating a powerful incentive to participate.
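How forecasting skill gets turned into reputation is left open above. Trading profit is one option; another common one, sketched here, is the Brier score, which is the mean squared error between a forecaster's stated probabilities and the 0/1 outcomes. Using it for Theomachia Labs is my assumption, and the example forecasts are made up.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    total = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes))
    return total / len(forecasts)


outcomes = [1, 0, 1]  # how three markets actually resolved

calibrated = brier_score([0.9, 0.1, 0.8], outcomes)  # confident and right
hedger = brier_score([0.5, 0.5, 0.5], outcomes)      # always says 50%

print(calibrated < hedger)  # the confident, correct forecaster scores better
```

Always answering 50% yields a score of 0.25, so anything consistently below that is evidence of real signal.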


Want to propose a new research direction? You don't need to write a long proposal and convince a committee. You simply define a project with clear, resolvable milestones and create the initial set of markets. If the idea has merit, the community will signal its belief by trading, and the project will attract volunteers. If it's a bad idea, it will wither on the vine due to market indifference.
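A proposal in this model is mostly data: a name plus a list of milestones that can each be resolved YES or NO. The sketch below shows one possible shape for that data and a basic check that every milestone is actually resolvable. The class names, fields, validation rules, and the example date are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Milestone:
    question: str           # should be answerable YES/NO
    deadline: date          # when the market resolves
    resolution_source: str  # who or what decides the outcome


@dataclass
class ProjectProposal:
    name: str
    milestones: list = field(default_factory=list)

    def problems(self):
        """List reasons the proposal is not yet ready to have markets opened."""
        issues = []
        if not self.milestones:
            issues.append("no milestones defined")
        for m in self.milestones:
            if not m.question.strip().endswith("?"):
                issues.append(f"not phrased as a question: {m.question!r}")
            if not m.resolution_source.strip():
                issues.append(f"no resolution source: {m.question!r}")
        return issues


proposal = ProjectProposal(
    name="Deceptive-Alignment-Detector",
    milestones=[
        Milestone(
            question="Will there be a working code prototype by the end of Q1?",
            deadline=date(2026, 3, 31),  # hypothetical year
            resolution_source="public repository passes its own test suite",
        )
    ],
)
print(proposal.problems())  # an empty list means markets can be opened
```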


The Anti-Goodharting Squadron


The obvious counter-argument is Goodhart's Law. Won't researchers just start optimizing for easily achievable milestones instead of doing ambitious research?

Yes, if the system is stupid. The solution is not to abandon measurement but to build a more sophisticated, adversarial system of measurement. At the very least, I propose to:

  • Rely on a diverse and constantly changing set of metrics, combining output, replication, and impact markets.

  • Create markets specifically designed to counteract Goodharting. For example: "Conditional on Project X achieving all its stated goals, what is the probability that a post-mortem review will conclude it was a case of 'gaming the metrics'?" This creates a direct incentive to predict such behavior, and so to expose and penalize it.

  • Resolve some markets by a poll of trusted experts, providing a check against purely quantitative metrics.

  • Apply expert opinions and reviews wherever possible as well.
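One way to combine a milestone market with an anti-Goodharting market of the kind described above is to discount the success probability by the conditional probability of gaming: P(success and not gamed) = P(success) × (1 − P(gamed | success)). The sketch below applies that identity. The function name and the example numbers are hypothetical, and ranking projects by this quantity is my suggestion rather than something specified above.

```python
def genuine_success_probability(p_success, p_gamed_given_success):
    """P(goals achieved AND not gamed) = P(success) * (1 - P(gamed | success))."""
    for p in (p_success, p_gamed_given_success):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_success * (1.0 - p_gamed_given_success)


# Illustrative prices: (P(success), P(gamed | success)) per project.
projects = {
    "easy_but_gameable": (0.90, 0.60),
    "ambitious": (0.50, 0.05),
}

ranked = sorted(
    projects,
    key=lambda name: genuine_success_probability(*projects[name]),
    reverse=True,
)
print(ranked)
```

With these made-up numbers the ambitious project ranks first, even though its raw success price is much lower.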


An Engine, Not Just a Key


If we are serious about short timelines, we need to do more than just open the doors. We need to build a machine that can effectively harness the energy of everyone who walks through them.


Futarchy provides a mechanism for coordination, motivation, and strategic agility that is native to a permissionless, decentralized structure. It turns the entire organization into a constantly updating forecasting engine, aimed squarely at the problem of AI safety.


© 2025 by Theomachia Labs
