Project Deal

Recently, economists have begun theorizing about a world in which AI models handle many or most transactions on humans’ behalf. We thought we’d run a new experiment—Project Deal—to learn more about this in practice.

For one week, we created a classified marketplace for employees in our San Francisco office—like Craigslist, but with a twist: all of the deals were conducted by AI models acting on our employees’ behalf. In December 2025, Claude interviewed people about which of their personal belongings they might want to sell and what sorts of things they might be willing to buy. We incentivized participation by giving everyone’s agent $100 to spend. Then, our employees’ Claude agents made postings vying for each other’s attention. Negotiations commenced. Deals were made, closets decluttered. At the end of it all, people brought in and exchanged the actual, physical goods that were haggled over by their AI avatars—covering everything from a snowboard to a plastic bag full of ping-pong balls.

We were struck by how well Project Deal worked. Our AI agents made 186 deals at a total transaction value of just over $4,000. To our surprise, participants were very enthusiastic about the experience: they even stated a willingness to pay for a similar service in the future.

But we also ran a parallel experiment (this one in secret). We tested how our participants would fare if we varied which Claude model represented them. We compared our then-frontier model, Claude Opus 4.5, to our smallest model, Claude Haiku 4.5. We found that agent quality does make a difference: people represented by “smarter” models got objectively better outcomes. Yet our post-experiment survey found that those with weaker models didn’t notice their disadvantage.

To be sure, this was a pilot experiment with a self-selected participant pool. But we suspect we’re not far from more agent-to-agent commerce bubbling up in the real world, with real consequences.

The first thing to say is that our experiment worked. It is possible for AI agents to represent humans in a marketplace. In our “real” run, our 69 agents struck 186 deals across more than 500 listed items, for a total transaction value of just over $4,000. And these were far from trivial, one-click deals. Agents had to identify potential matches, propose prices, field counteroffers, and reach agreement, all in natural language and without a prebaked negotiation protocol. When our surveyed participants rated the fairness of the individual deals, the scores were unremarkable, in the best possible sense: on a scale from 1 (unfair to one party) to 7 (unfair to the other), they hovered around 4, right in the middle. On this and other measures, people reported they were broadly satisfied with how their agents represented them.

But not every agent did equally well. When we looked at the two runs with a mix of Opus and Haiku agents, we found that Opus outperformed Haiku on most objective measures.

There was clearly a quantitative disadvantage to being represented by Haiku: these users got worse deals. But they didn’t seem to notice it. This has an uncomfortable implication: if “agent quality” gaps were to arise in real-world markets—and there is no reason to think they won’t—then people on the losing end might not realize they’re worse off. That said, our experiment wasn’t designed to dive deep into the dynamics at play here—we’ll need more research to know whether a fully agentic economy might see inequality taking root quietly.

Another finding surprised us, too. At least in this pilot experiment, it didn’t really matter how people instructed their agents to approach bargaining: users who instructed their agents to act aggressively didn’t have a better chance of selling items, didn’t sell their items for more, and didn’t pay less for what they bought.

We’re still unsure how an economy with AI agents in the mix might develop. But we’ve now seen the outlines of at least a few possibilities.

On the optimistic side, many of our volunteer participants genuinely enjoyed this experiment, and felt they got value from the service provided by their agents—whether in the form of getting rid of unwanted stuff, setting themselves up for an afternoon out with an extremely fluffy dog, or collecting a few books they’d been meaning to read. Most of our volunteers reported that they’d do this again. In fact, when we asked them if they’d be willing to pay for an agent like this, 46% said yes. So there’s at least the potential for the automated collection of preferences and execution of deals to provide some value, possibly by reducing friction in the market and therefore increasing the gains from trade.

But it is not clear that things will go so smoothly. Even in our small experiment, we saw evidence that access to higher-quality agents confers a quantifiable market advantage. Will those dynamics reinforce, or even compound, existing economic inequalities?

In this experiment, we didn’t make our marketplace especially competitive or adversarial. But as agents transact in a world of corporations—rather than volunteers we’ve encouraged with $100—they might be placed under very different incentives. Optimizing directly for AI agents’ attention could become a powerful tool. This might not translate into welfare improvements for humans, much as optimizing electronic commerce for human attention has come with substantial downsides. It might also introduce a new category of information and security concerns in digital exchange, in the form of jailbreaking (getting agents to reveal information they shouldn’t) and prompt injection (surreptitiously causing agents to take unwanted action).

The policy and legal frameworks around AI models that transact on our behalf simply don’t exist yet. But this experiment shows that such a world is plausible. More than that, it shows that such a world isn’t far away. Society will need to move quickly to reckon with these changes.