Microsoft Copilot Vision is here, letting AI see what you do online
Microsoft Copilot is getting smarter every day. The Satya Nadella-led company just announced that its artificial intelligence assistant now has “visual” capabilities, allowing it to browse the web alongside users.
Although the feature was first announced in October this year, the company is only now previewing it to a select group of Pro subscribers. According to Microsoft, these users will be able to trigger Copilot Vision on web pages opened in the Edge browser and interact with the assistant based on the content visible on the screen.
The feature is still in the early stages of development and is fairly limited, but once fully developed it could be a game-changer for Microsoft’s enterprise customers – helping them with analysis and decision-making when interacting with the company’s products (OneDrive, Excel, SharePoint, etc.).
In the long term, it will be interesting to see how Copilot Vision stacks up against more open and capable agentic offerings, such as those from Anthropic and other emerging AI players, which allow developers to integrate agents that view, reason about, and take action across applications from different vendors.
What can we expect from Copilot Vision?
When a user opens a website, they may or may not have a specific goal in mind. But when they do – say, researching an academic paper – the process revolves around browsing the website, reading its content and then making a call about it (e.g., whether the site’s content should be used as a reference for the paper). The same applies to other everyday online tasks, such as shopping.
With the new Copilot Vision experience, Microsoft aims to make this entire process easier. Essentially, users now have an assistant sitting at the bottom of the browser that can be called upon at any time to read the website’s content, covering all text and images, and help make decisions.
It scans, analyzes and delivers all the required information instantly, taking into account the user’s intended goals – like a second pair of eyes.
This feature has far-reaching benefits – it can instantly speed up your workflow – but also significant privacy implications, since the agent reads and evaluates whatever you’re browsing. However, Microsoft assures that once a Vision session is closed, all context and information shared by the user is deleted. It also states that website data is not captured or stored to train the underlying model.
“Simply put, we prioritize copyright, creators, and user privacy and security and put them first,” the Copilot team wrote in a blog post announcing the preview of the feature.
Expanding based on feedback
Currently, a select group of Copilot Pro subscribers in the United States who have signed up for the early-access Copilot Labs program will be able to use the visual features in their Edge browser. The feature is opt-in, meaning users won’t have to worry about the AI reading their screen all the time.
Additionally, at this stage it only works on select websites. Microsoft said it will listen to feedback from early users and gradually improve the feature, while expanding support to more Pro users and additional websites.
In the long term, the company may even extend these capabilities to other products in its ecosystem, such as OneDrive and Excel, making it easier for business users to work and make decisions. However, there is no official confirmation yet, and given the caution hinted at here, it might take some time before that becomes a reality.
Microsoft’s preview of Copilot Vision comes as competitors push ahead in the field of agentic artificial intelligence. Salesforce has introduced Agentforce across its Customer 360 products, automating workflows in areas such as sales, marketing and service.
Meanwhile, Anthropic launched “computer use,” which allows developers to integrate Claude to interact with a computer desktop environment and perform tasks previously handled only by humans, such as opening applications, interacting with interfaces and filling out forms.
2024-12-06 16:38:15