AI Is Requesting More Data but Delivering Less Intelligence
Artificial intelligence (AI) has become an integral part of modern technology, powering everything from virtual assistants to complex decision-making systems. Yet, despite AI’s insatiable appetite for vast amounts of data, a paradox has emerged: AI is asking for more data but delivering less intelligence. This paradox challenges the common assumption that more data automatically leads to smarter AI. Instead, it reveals critical issues around data quality, model training, and the true capabilities of AI systems.
In this blog post, we delve into this paradox, exploring the importance of human data, the dangers of AI systems learning from their own output, the implications of excessive data use, and the future of AI development. We also discuss the ethical responsibilities and sustainable solutions necessary to navigate this complex landscape.
Key Takeaways
- Quality over quantity: High-quality, diverse, and representative datasets are essential for AI models to realize their full potential, rather than simply relying on vast amounts of data.
- Risks of self-generated data: Training AI systems on their own output, or on synthetic data alone, can lead to "model collapse," diminishing diversity, creativity, and accuracy in AI-generated results.
- Sustainable AI development: Efficient data usage, transparency, and ethical considerations are critical to creating AI systems that are both powerful and responsible.
The Importance of Human Data in AI Development
Artificial intelligence systems fundamentally rely on data to learn and make decisions. However, the nature of the data—its quality, diversity, and authenticity—plays a far more significant role than sheer volume. Human data, encompassing real-world human interactions, behaviors, and feedback, provides the nuanced context that AI tools need to operate effectively.
AI developers and researchers are increasingly aware that feeding AI the right data, rather than just more data, leads to better performance. Human intelligence is complex, involving subtle cues and contextual understanding that cannot be easily replicated by synthetic or low-quality datasets. As a result, AI models trained on high-quality human data exhibit enhanced capabilities in tasks such as natural language processing, image recognition, and decision-making.
The Role of Human Data in AI Training
| Data Type | Characteristics | Impact on AI Performance |
|---|---|---|
| Human Data | Complex, nuanced, context-rich | Enables AI to learn subtle patterns and context |
| Synthetic Data | Generated by AI models, lacks full realism | Useful for augmentation but insufficient alone |
| Training Data | Large datasets used for model learning | Quantity important but quality paramount |
| Text Data | Includes natural language, documents, messages | Critical for language models and conversational AI |
The table above highlights different data types and their roles in AI training. While synthetic data and massive training data volumes are common, human data remains the cornerstone for building AI systems that reflect real-world complexities and deliver meaningful intelligence.
The Dangers of Training AI on Its Own Output
One of the most significant challenges in AI development is the risk of “model collapse,” where AI systems become overly dependent on their own generated outputs as training data. This feedback loop causes the AI to lose diversity in its responses, leading to a degradation in quality and creativity.
When AI models are trained repeatedly on their own output, they tend to amplify existing biases and errors, resulting in less accurate and less reliable systems. This phenomenon undermines the ability of AI to generalize beyond the narrow scope of its training data, reducing its effectiveness in various AI applications.
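To make the feedback loop concrete, here is a minimal toy sketch (an illustration with made-up numbers, not the published experiments): each "generation" fits a simple Gaussian "model" to the previous generation's output and then samples a fresh training set purely from that model. With nothing but self-generated data, the spread of the distribution tends to shrink over generations, a simple analogue of the diversity loss described above.

```python
import random
import statistics

random.seed(0)

def next_generation(samples):
    """Fit a Gaussian 'model' to the samples, then generate a new
    training set entirely from that model's own output."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return [random.gauss(mu, sigma) for _ in samples]

# Stand-in for real human data: a wide, diverse distribution.
data = [random.gauss(0, 10) for _ in range(30)]
initial_spread = statistics.stdev(data)

for _ in range(500):  # 500 rounds of training on self-generated output
    data = next_generation(data)

final_spread = statistics.stdev(data)
print(f"spread: {initial_spread:.2f} -> {final_spread:.2f}")
# The spread typically collapses toward zero: diversity is lost.
```

The exact numbers vary with the random seed, but the direction is consistent: without fresh external data, each round of fitting and resampling discards a little of the tails, and the model converges on an ever-narrower "projection of reality."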
On Model Collapse
“The model becomes poisoned with its own projection of reality.”
— Researchers writing in Nature on AI model collapse (2024)
This quote underscores the critical issue AI developers face when relying too heavily on synthetic or self-generated data. It highlights the necessity of incorporating diverse, high-quality human data to maintain the robustness and accuracy of AI models.
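The remedy can be illustrated with the same kind of toy sketch (all parameters are hypothetical, chosen only for illustration): run a fit-and-resample loop, but blend a fraction of fresh "human" data into every generation. Even a modest share of real data anchors the distribution and keeps its spread from collapsing.

```python
import random
import statistics

random.seed(2)

REAL_SPREAD = 10.0  # spread of the stand-in "human" distribution

def fresh_human_data(n):
    return [random.gauss(0, REAL_SPREAD) for _ in range(n)]

def next_generation(samples, human_fraction):
    """Fit a Gaussian to the samples, then build the next training set
    from model output blended with fresh human data."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    n_human = int(len(samples) * human_fraction)
    synthetic = [random.gauss(mu, sigma) for _ in range(len(samples) - n_human)]
    return synthetic + fresh_human_data(n_human)

results = {}
for frac in (0.0, 0.5):  # pure self-training vs. a 50% human-data blend
    data = fresh_human_data(30)
    for _ in range(500):
        data = next_generation(data, frac)
    results[frac] = statistics.stdev(data)
    print(f"human fraction {frac}: spread after 500 generations = {results[frac]:.2f}")
```

With a 0% human fraction the spread collapses as before; with a 50% blend the fresh data continually re-injects diversity and the spread stays close to that of the real distribution.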
The Consequences of Excessive Data Collection and Processing
Collecting and processing vast amounts of data is resource-intensive and comes with several unintended consequences. AI systems trained on excessive data can become overly complex, leading to difficulties in interpretation and troubleshooting. This complexity often results in cognitive overload for humans trying to understand AI decisions and outputs.
Moreover, the environmental impact of massive data centers required to store and process this data is significant. The power consumption, water usage, and infrastructure demands strain resources globally. Financial costs also escalate with the need for more storage, computing power, and maintenance.
Environmental and Financial Costs of AI Data
| Resource | Impact | Example |
|---|---|---|
| Power | High electricity consumption in data centers | Data centers projected to use 8% of US power by 2030 |
| Water | Large volumes for cooling AI hardware | Microsoft data centers consume over a billion liters daily |
| Money | High costs for infrastructure and maintenance | AI hardware and GPU shortages driving up prices |
| Network | Increased bandwidth demand | AI-driven internet traffic growth of 30-35% annually |
This table illustrates the real-world impact of AI’s data demands on power, water, money, and network infrastructure, emphasizing the need for more efficient and sustainable AI solutions.
The Benefits of Using Less Data: Efficiency and Effectiveness
Contrary to the “more is better” mindset, using less data—when carefully curated and high quality—can lead to more efficient and effective AI systems. Smaller datasets force AI developers to focus on relevant and meaningful information, improving model interpretability and reducing biases.
Data-efficient learning approaches, such as active learning and transfer learning, enable AI models to perform well with limited data by selectively acquiring new data or leveraging pre-trained models. These methods reduce cognitive and computational costs while maintaining or even enhancing AI performance.
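As an illustration, here is a minimal active-learning sketch (the 1-D task, the true threshold of 5.0, and the point counts are all hypothetical): the learner starts with just two labeled points and uses uncertainty sampling, repeatedly querying the label of the unlabeled point closest to its current decision boundary. It recovers the true threshold with a handful of labels instead of labeling the whole pool.

```python
import random

random.seed(1)

# Hypothetical 1-D task: the label is 1 when x > 5.0; the learner
# must find that threshold without seeing it directly.
def true_label(x):
    return 1 if x > 5.0 else 0

pool = [random.uniform(0, 10) for _ in range(200)]  # unlabeled pool
labeled = [(0.0, 0), (10.0, 1)]                     # tiny seed set

def fit_threshold(labeled):
    """Place the decision boundary midway between the highest
    known 0 and the lowest known 1."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (max(zeros) + min(ones)) / 2

for _ in range(10):  # query only 10 labels instead of all 200
    t = fit_threshold(labeled)
    # Uncertainty sampling: ask for the label of the point
    # the current model is least sure about.
    query = min(pool, key=lambda x: abs(x - t))
    pool.remove(query)
    labeled.append((query, true_label(query)))

t = fit_threshold(labeled)
print(f"estimated threshold after 12 labels: {t:.3f}")  # close to the true 5.0
```

Because each query lands near the current boundary, the uncertain region roughly halves with every informative label, so a dozen labels suffice where naive labeling would need hundreds. The same principle is what lets data-efficient methods match larger models at a fraction of the labeling and compute cost.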
Example of Data-Efficient AI Model – LIMO
The LIMO model, designed for mathematical reasoning, achieved impressive results by training on only 817 high-quality examples rather than billions of data points. It demonstrated 57.1% accuracy on the American Invitational Mathematics Examination (AIME) and 94.8% on the MATH dataset, showcasing the power of quality over quantity.
Source: Mallick, S. (2025). Less is More in AI Reasoning: A New Hypothesis for Data-Efficient Learning. Medium.
This example highlights how AI systems with smaller, well-curated datasets can outperform larger models trained on vast but noisy data, reinforcing the value of data quality and efficiency.
Overcoming the Paradox: Towards Sustainable and Responsible AI
To resolve the paradox of AI demanding more data but delivering less intelligence, AI developers and companies must prioritize the quality, diversity, and representativeness of their datasets. This shift entails a focus on human data, ethical AI development, and sustainable computing practices.
Emerging technologies such as AI-first architectures and private AI systems can help manage data more securely and efficiently. Additionally, AI products that integrate transparency and accountability will foster trust and reduce unintended consequences.
About Ethical AI Development
“Greatness doesn’t come from machines—it comes from people. The future of work depends on how we use AI to amplify human intelligence.”
— Eric Mosley, Forbes (2024)
This quote encapsulates the fundamental reason AI must be developed responsibly, with human intelligence and ethical considerations at the core.
Conclusion
Artificial intelligence is at a crossroads. While its hunger for data grows, delivering meaningful intelligence requires a rethinking of how data is collected, curated, and used. By embracing high-quality human data, avoiding the pitfalls of training on AI’s own output, and focusing on data-efficient learning, AI developers can unlock the true capabilities of AI systems.
Sustainable AI solutions that balance power consumption, costs, and ethical responsibilities will shape the future of AI. Ultimately, the goal is to create AI that amplifies human intelligence and delivers real value across industries, from healthcare and education to energy and finance.