Humans Still Beat AI in the Long Horizon: Revisiting Test-Time Scaling in the Agent Era

joyemang33.github.io

cm0002@infosec.pub to AI - Artificial intelligence@programming.dev · English · 5 days ago

Agents can spend test-time compute by trying, observing, and revising. We derive an Elo reference for repeated sampling, then show that in a 2022 two-week coding marathon, current agents plateau within 24 hours while top humans keep improving.
