Google Unveils Gemini Model that Scrolls Internet Like Humans
Google has released a new Gemini AI model that can interact with website user interfaces (UIs) in real time, operating a page much as a person would. The release marks another step toward artificial intelligence that can operate across web environments with minimal human oversight.
According to Javier Zayas, Contributing Writer for ZDNET, the Gemini model is built atop Google's Gemini 2.5 Pro and can execute tasks such as clicking, typing, and scrolling directly within a web page. Users simply need to provide a prompt in natural language, and the model will automatically complete the task.
For instance, if a user prompts the model to "Open Wikipedia, search for 'Atlantis,' and summarize the history of the myth in Western thought," Gemini will navigate the website, perform the requested actions, and provide a summary. This capability is similar to tools developed by OpenAI and Anthropic, which also aim to create AI models that can interact with web environments.
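The prompt-to-action flow described above can be pictured as a simple loop: the model proposes one UI action at a time, the browser carries it out, and the cycle repeats until the model reports that it is finished. The sketch below is illustrative only and does not use Google's API. `Action`, `FakePage`, `scripted_model`, and `run_agent` are hypothetical names invented for this example, and the "model" is a hard-coded plan standing in for a real one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """One UI step proposed by the model (hypothetical structure)."""
    kind: str          # "click", "type", "scroll", or "done"
    target: str = ""   # hypothetical element name, e.g. "search_box"
    text: str = ""     # text to type, or the final answer when kind == "done"


@dataclass
class FakePage:
    """Stand-in for a browser page; it only records what the agent did."""
    log: List[str] = field(default_factory=list)
    fields: Dict[str, str] = field(default_factory=dict)
    scroll_y: int = 0

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click:{action.target}")
        elif action.kind == "type":
            self.fields[action.target] = action.text
            self.log.append(f"type:{action.target}")
        elif action.kind == "scroll":
            self.scroll_y += 500  # arbitrary scroll distance for the sketch
            self.log.append("scroll")
        else:
            raise ValueError(f"unknown action: {action.kind}")


def scripted_model(prompt: str, history: List[str]) -> Action:
    """Placeholder for the model: replays a fixed plan and ignores the prompt."""
    plan = [
        Action("click", target="search_box"),
        Action("type", target="search_box", text="Atlantis"),
        Action("click", target="search_button"),
        Action("scroll"),
        Action("done", text="(summary would go here)"),
    ]
    # The number of actions already taken selects the next planned step.
    return plan[min(len(history), len(plan) - 1)]


def run_agent(
    prompt: str,
    page: FakePage,
    model: Callable[[str, List[str]], Action],
    max_steps: int = 10,
) -> str:
    """Ask the model for an action, apply it, and repeat until it says 'done'."""
    for _ in range(max_steps):
        action = model(prompt, page.log)
        if action.kind == "done":
            return action.text
        page.apply(action)
    raise RuntimeError("step limit reached before the task finished")


if __name__ == "__main__":
    page = FakePage()
    answer = run_agent("Open Wikipedia, search for 'Atlantis'", page, scripted_model)
    print(page.log)
    print(answer)
```

The step limit is a design choice of this sketch: capping the number of actions gives a human a natural checkpoint if the agent fails to finish on its own.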
"We're excited about the potential of Gemini to revolutionize how we interact with the internet," said a Google spokesperson. "This model has the ability to learn from user interactions and adapt to new tasks, making it an invaluable tool for various industries."
The development of Gemini is part of Google's broader effort to create AI that can operate independently in complex environments. The company acknowledges that its models have limitations, including hallucinations, in which a model presents fabricated information as fact.
"AI has the potential to greatly benefit society, but it also requires careful consideration and development," said Dr. Fei-Fei Li, Director of the Stanford Artificial Intelligence Lab (SAIL). "The release of Gemini is an important step towards creating more advanced AI models, but we must continue to address the challenges associated with their development."
Gemini is currently available in public preview, allowing developers and researchers to test its capabilities. As the model continues to evolve, it is expected to have significant implications for various industries, including customer service, marketing, and education.
The release of Gemini marks a significant milestone in the development of AI that can interact with web environments. While there are still challenges associated with its development, the potential benefits of this technology are vast and far-reaching.
Background:
Google's Gemini model is built atop the company's Gemini 2.5 Pro model. Its release follows similar developments by OpenAI and Anthropic, which have also created AI models that can interact with web environments.
Additional Perspectives:
Experts in the field of artificial intelligence are hailing the development of Gemini as a significant breakthrough. "The ability to create AI models that can interact with web environments is a major step forward for the industry," said Dr. Andrew Ng, Co-Founder of Coursera and former Chief Scientist at Baidu.
However, some experts have raised concerns about the potential risks associated with developing advanced AI models. "While Gemini has the potential to greatly benefit society, we must also consider the challenges associated with its development," said Dr. Kate Crawford, Principal Researcher at Microsoft Research.
Next Developments:
Google has announced plans to continue developing Gemini, with a focus on addressing the challenges associated with its development. The company will also be working closely with industry partners to explore potential applications of the technology.
In conclusion, the release of Google's Gemini model marks an important step toward AI that can operate across web environments with minimal human oversight.
*Reporting by ZDNET.*