OpenAI is requesting that third-party contractors upload real assignments and tasks from their current or previous employment to evaluate the performance of its next-generation AI models. Documents obtained by WIRED from OpenAI and the training data company Handshake AI reveal the project's aim to establish a human performance baseline for various tasks, which will then be used to assess AI model capabilities.
This initiative is part of OpenAI's broader effort, launched in September, to measure its AI models against human professionals across a range of industries. The company views this comparison as a key metric for gauging progress toward artificial general intelligence (AGI), which OpenAI defines as an AI system that outperforms humans at most economically valuable work.
According to a confidential OpenAI document, "We’ve hired folks across occupations to help collect real-world tasks modeled off those you’ve done in your full-time jobs, so we can measure how well AI models perform on those tasks." The document instructs contractors to "Take existing pieces of long-term or complex work (hours or days) that you’ve done in your occupation and turn each into a task."
The data collection strategy highlights a key challenge in AI development: accurately assessing an AI's ability to perform real-world tasks. By comparing AI performance against a human baseline derived from actual work, OpenAI aims to gain a more nuanced understanding of its models' strengths and weaknesses. This approach is particularly relevant as AI systems become increasingly integrated into professional settings.
The implications of achieving AGI are far-reaching, potentially transforming industries and reshaping the nature of work. While OpenAI emphasizes the potential benefits of AGI, such as increased productivity and innovation, the development also raises concerns about job displacement and the ethical considerations of increasingly autonomous AI systems.
OpenAI's evaluation process reflects an ongoing debate within the AI community about how best to measure and oversee increasingly powerful AI systems. As models grow more capable, reliable benchmarks and safety protocols become essential to their responsible development and deployment. The company has not released specifics about the types of tasks being collected or the criteria used to judge AI performance, but it has said the data will be used to improve the accuracy and reliability of its future models. The project is ongoing, and its results are expected to inform future development efforts at OpenAI.