Nonrival

Last forecast of Season 2! (OpenAI)

Published 5 months ago • 2 min read

Welcome to Nonrival, the newsletter where readers make predictions about business, tech, and politics.

This is the last forecast of Season 2 and of the year! I’ll send a recap of this question on Wednesday and scores when the last few remaining questions resolve—and then in February I should have final scores for the entire season. But no new questions between now and then, and more to come soon on the next phase of Nonrival.

Thanks for forecasting. Send feedback to newsletter@nonrival.pub.

In this issue

  • Forecast: On Dec. 31, will the top-ranked model on the Chatbot Arena Leaderboard, based on Arena Elo rating, be from OpenAI?
  • (Make a forecast with one click at the bottom of this email.)

Is OpenAI still winning?

What, really, did the kerfuffle at OpenAI amount to? Sam Altman is back in charge, a couple of board members are gone, and Microsoft has a board seat but still no voting rights. Is the company back on track?

This week’s question is about the competition in chatbots. Right now OpenAI is the undisputed leader. Will that change anytime soon?

OpenAI is not expected to release a new class of model (i.e., GPT-5) in the near future, but it can still make smaller releases to improve GPT-4. The question is whether last month’s boardroom drama makes it more or less likely to release updates and products. On the one hand, some of the seemingly more "safety"-minded board members are out, perhaps clearing the way for more progress. On the other hand, Altman might feel pressure to demonstrate that he’s not moving too quickly, and it may take a while for the company to reset after all the reshuffling.

In the meantime, competitors have GPT-4 in their sights. Last week, Inflection AI—a company started by DeepMind co-founder Mustafa Suleyman and LinkedIn co-founder Reid Hoffman—launched the second version of its large language model. Inflection claims that it is “the best model in the world for its compute class and the second most capable LLM in the world today” and that the model will soon power its chatbot Pi.

A group of Berkeley researchers maintains a ranking of chatbot performance across three metrics; right now OpenAI’s GPT-4 tops all three, followed by Anthropic’s Claude. Inflection and Pi are not yet included on that leaderboard, though the researchers have asked for API access in order to vet Inflection’s claims.

Anthropic also released a model update in late November, and new entrants in the chatbot space are constantly emerging—including open source models.

The Berkeley researchers’ ranking is based on users’ feedback. Users chat with two anonymous models simultaneously, then vote for the one they think is better. Those votes get compiled into a score called an “Elo rating”. The higher a bot's rating, the more likely it is to outperform a given competitor in a head-to-head comparison.
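For readers curious how a rating turns into a likelihood of winning, here is a minimal sketch of the textbook Elo formulas. It is an illustration only: the leaderboard's actual computation may differ, and the ratings and the update factor `k=32` below are made-up assumptions, not leaderboard values.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one head-to-head vote.

    score_a is 1.0 if A won the vote, 0.0 if B won, 0.5 for a tie.
    k is an assumed update factor, not a value taken from the leaderboard.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Hypothetical ratings: a bot rated 100 points higher than its rival
# is expected to win about 64% of head-to-head votes.
print(round(expected_score(1300, 1200), 2))  # prints 0.64
```

Under this model, only the gap between two ratings matters: a 100-point lead means the same expected win rate whether the bots are rated 1300 and 1200 or 1100 and 1000.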

These rankings depend on researchers getting API access to a given chatbot, so they aren’t a definitive measure of performance. Open source models are easier to evaluate than proprietary ones. Some models aren’t accessible and so aren’t ranked, even if they’re quite good.

Nonetheless, the leaderboard speaks to a critical question: In the wake of a board coup, is OpenAI at risk of losing its lead?

Forecast

On Dec. 31, will the top-ranked model on the Chatbot Arena Leaderboard, based on Arena Elo rating, be from OpenAI?

Very likely (~90% chance)

Likely (~70%)

Uncertain (~50%)

Unlikely (~30%)

Very unlikely (~10%)

Bonus trivia: Which person was NOT involved in founding OpenAI: Bill Gates, Elon Musk, or Peter Thiel?

(Make a forecast by clicking a link above and you'll get to answer this trivia question.)

Just want to make a quick forecast? Click a link above and you're done! Your forecast will be recorded.

Or, click a link and then complete the survey. You can provide your reasoning and end with a bit of trivia.

Deadline: Make a forecast by 9am ET Wed. 12/5.

Resolution criteria: Based on the Chatbot Arena Leaderboard Elo Rating, when I go to check on 12/31.
