AI model testing is being gamed and AI leaderboard rankings can be tricked. An Oxford review found issues in nearly half of ...
AI chatbots have been linked to serious mental health harms in heavy users, but there have been few standards for measuring whether they safeguard human well-being or just maximize for engagement. A ...
On Friday, research organization Epoch AI released FrontierMath, a new mathematics benchmark that has been turning heads in the AI world because it contains hundreds of expert-level problems that ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results