OpenAI is requesting that third-party contractors upload real assignments and tasks from their current or previous employment to evaluate the performance of its next-generation AI models. Documents obtained by WIRED from OpenAI and the training data company Handshake AI reveal the project's aim to establish a human performance baseline for various tasks, which will then be used to assess AI model capabilities.
This initiative is part of OpenAI's broader effort, launched in September, to measure its AI models against human professionals across a range of industries. The company views this comparison as a key metric for gauging progress toward artificial general intelligence (AGI), which OpenAI defines as an AI system that outperforms humans at most economically valuable work.
According to a confidential OpenAI document, "We’ve hired folks across occupations to help collect real-world tasks modeled off those you’ve done in your full-time jobs, so we can measure how well AI models perform on those tasks." The document instructs contractors to "Take existing pieces of long-term or complex work (hours or days) that you’ve done in your occupation and turn each into a task."
The data collection strategy highlights a key challenge in AI development: accurately assessing an AI's ability to perform real-world tasks. By comparing AI performance against a human baseline derived from actual work, OpenAI aims to gain a more nuanced understanding of its models' strengths and weaknesses. This approach is particularly relevant as AI systems become increasingly integrated into professional settings.
The implications of achieving AGI are far-reaching, potentially transforming industries and reshaping the nature of work. While OpenAI emphasizes the potential benefits of AGI, such as increased productivity and innovation, the development also raises concerns about job displacement and the ethical considerations of increasingly autonomous AI systems.
OpenAI's evaluation process reflects an ongoing debate within the AI community about how best to measure and oversee increasingly powerful AI systems. As models grow more capable, reliable benchmarks and safety protocols become essential to their responsible development and deployment. The company has not released specifics about the types of tasks being collected or the criteria used to judge AI performance, but it has said the data will be used to improve the accuracy and reliability of its future models. The project is ongoing, and its results are expected to inform future development efforts at OpenAI.