Sci-Tech

Microsoft drops Florence-2, a unified model to handle a variety of vision tasks

Published

June 19, 2024

Today, Microsoft’s Azure AI team dropped a new vision foundation model called Florence-2 on Hugging Face.

Available under a permissive MIT license, the model can handle a variety of vision and vision-language tasks using a unified, prompt-based representation. It comes in two sizes (232M and 771M parameters) and already excels at tasks such as captioning, object detection, visual grounding and segmentation, performing on par with or better than many large vision models out there.

While the model’s real-world performance has yet to be tested, the work is expected to give enterprises a single, unified approach to handling different types of vision applications. This will save enterprises from investing in separate task-specific vision models that fail to perform beyond their primary function without extensive fine-tuning.

What makes Florence-2 unique?

Today, large language models (LLMs) sit at the heart of enterprise operations. A single model can provide summaries, write marketing copy and even handle customer service in many cases. The level of adaptability across domains and tasks has been amazing. But this success has also left researchers wondering: Can vision models, which have been largely task-specific, do the same?

At the core, vision tasks are more complex than text-based natural language processing (NLP). They demand comprehensive perceptual ability. Essentially, to achieve a universal representation of diverse vision tasks, a model must be capable of understanding spatial data across different scales, from broad image-level concepts like object location to fine-grained pixel details, as well as semantic granularity ranging from high-level captions to detailed descriptions.

When Microsoft tried to solve this, it found two key roadblocks: the scarcity of comprehensively annotated visual datasets, and the absence of a unified pretraining framework with a single network architecture that integrated the ability to understand spatial hierarchy and semantic granularity.

To address this, the company first used specialized models to generate a visual dataset called FLD-5B. It contains a total of 5.4 billion annotations for 126 million images, covering details from high-level descriptions to specific regions and objects. Then, using this data, it trained Florence-2, which uses a sequence-to-sequence architecture (a type of neural network designed for tasks involving sequential data) integrating an image encoder and a multi-modality encoder-decoder. This enables the model to handle various vision tasks without requiring task-specific architectural modifications.

“All annotations in the dataset, FLD-5B, are uniformly standardized into textual outputs, facilitating a unified multi-task learning approach with consistent optimization with the same loss function as the objective,” the researchers wrote in the paper detailing the model. “The outcome is a versatile vision foundation model capable of performing a variety of tasks… all within a single model governed by a uniform set of parameters. Task activation is achieved through textual prompts, reflecting the approach used by large language models.”
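The quote describes a design in which every annotation, including boxes and regions, becomes plain text, so a single loss function covers every task. The Python below is a minimal sketch of that idea for object detection, under a stated assumption: box coordinates are quantized into 1,000 bins per axis and written as `<loc_k>` tokens, roughly as the Florence-2 paper describes. The helper names (`quantize`, `encode_detections`, `decode_detections`) are hypothetical illustrations, not part of the released model or its Hugging Face processor.

```python
"""Sketch: writing object-detection annotations as plain text.

Assumption: coordinates are quantized into NUM_BINS bins per axis and emitted
as <loc_k> tokens, roughly as the Florence-2 paper describes. All helper names
are hypothetical and are not the model's released API.
"""
import re
from typing import List, Tuple

NUM_BINS = 1000  # assumed number of quantization bins per axis

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def quantize(value: float, size: int, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to a bin index in [0, num_bins - 1]."""
    k = int(value * num_bins / size)
    return min(max(k, 0), num_bins - 1)


def dequantize(k: int, size: int, num_bins: int = NUM_BINS) -> float:
    """Map a bin index back to the pixel coordinate at the centre of that bin."""
    return (k + 0.5) * size / num_bins


def encode_detections(objects: List[Tuple[str, Box]], width: int, height: int) -> str:
    """Serialize (label, box) pairs into one text target such as 'car<loc_50>...'."""
    parts = []
    for label, (x0, y0, x1, y1) in objects:
        bins = (quantize(x0, width), quantize(y0, height),
                quantize(x1, width), quantize(y1, height))
        parts.append(label + "".join(f"<loc_{k}>" for k in bins))
    return "".join(parts)


# A label is any run of characters other than '<', followed by exactly four location tokens.
_DETECTION = re.compile(r"([^<]+)((?:<loc_\d+>){4})")


def decode_detections(text: str, width: int, height: int) -> List[Tuple[str, Box]]:
    """Parse a text target back into (label, box) pairs in pixel coordinates."""
    results = []
    for label, locs in _DETECTION.findall(text):
        k = [int(n) for n in re.findall(r"<loc_(\d+)>", locs)]
        box = (dequantize(k[0], width), dequantize(k[1], height),
               dequantize(k[2], width), dequantize(k[3], height))
        results.append((label.strip(), box))
    return results


if __name__ == "__main__":
    objects = [("car", (50.0, 200.0, 900.0, 460.0))]
    target = encode_detections(objects, width=1000, height=600)
    print(target)  # car<loc_50><loc_333><loc_900><loc_766>
    print(decode_detections(target, width=1000, height=600))
```

Because the target is just a string, the same sequence-to-sequence decoder that writes a caption could, in principle, emit a detection result; only the prompt and the parsing of the output change. Decoding to bin centres loses at most half a bin width per coordinate, which is the price of a fixed-size token vocabulary.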

Performance better than larger models

When prompted with images and text inputs, Florence-2 handles a variety of tasks, including object detection, captioning, visual grounding and visual question answering. More importantly, it delivers this with quality on par with or better than many larger models.

For instance, in a zero-shot captioning test on the COCO dataset, both the 232M and 771M versions of Florence-2 outperformed DeepMind’s 80B-parameter Flamingo visual language model, with CIDEr scores of 133 and 135.6, respectively. They even did better than Microsoft’s own visual grounding-specific Kosmos-2 model.

When fine-tuned with public human-annotated data, Florence-2, despite its compact size, was able to compete closely with several larger specialist models across tasks like visual question answering.

“The pre-trained Florence-2 backbone enhances performance on downstream tasks, e.g. COCO object detection and instance segmentation, and ADE20K semantic segmentation, surpassing both supervised and self-supervised models,” the researchers noted. “Compared to pre-trained models on ImageNet, ours improves training efficiency by 4X and achieves substantial improvements of 6.9, 5.5, and 5.9 points on COCO and ADE20K datasets.”

As of now, both pre-trained and fine-tuned versions of Florence-2 232M and 771M are available on Hugging Face under a permissive MIT license that allows unrestricted distribution and modification for commercial or private use.

It will be interesting to see how developers put it to use and whether it removes the need for separate vision models for different tasks. Small, task-agnostic models can not only spare developers from working with different models but also cut compute costs by a significant margin.

Cristiano Ronaldo first to hit 1bn social media followers

Published

September 13, 2024

Tom Gerken

Cristiano Ronaldo has hit 1bn total followers across his various social media accounts – making him the first person to reach that mind-boggling figure.

The number is calculated by combining his total number of followers across Instagram, Facebook, Twitter, YouTube, and Chinese social media sites Weibo and Kuaishou.

It does not equate to one billion individual followers, as many people will follow him across multiple platforms, and some will be fake accounts, known as bots.

Nonetheless, social media expert Paolo Pescatore, from PP Foresight, described it as a “staggering number” that media and brands would pay close attention to.

“What an achievement, and it further underlines the fundamental shift taking place in media.”

It showed “the power to reach new, younger audiences thanks to technology”, he told the BBC.

On the pitch, Ronaldo was famed for his rivalry with Argentinian star Lionel Messi.

But off it, there is no competition for who is winning the social media contest – Messi has a mere 623 million followers.

Some of the other celebrities with the biggest presence on social media are:

690m: Selena Gomez, actor/singer
607m: Justin Bieber, singer
574m: Taylor Swift, singer

Other notable names the BBC looked into include The Rock (557m), Kylie Jenner (551m) and Ariana Grande (508m).

MrBeast, the top YouTuber in the world, has 543m total followers, while WWE, often considered to have an enormous social media presence, can claim only about a quarter of Cristiano Ronaldo’s audience, with 268m combined followers.

The footballer will have reached this milestone thanks to his decision to join YouTube last month, where his channel rocketed to 50 million subscribers within a single week.

So far, the channel consists mainly of conversations between Ronaldo and his wife Georgina Rodríguez, as well as his former Manchester United colleague Rio Ferdinand.

He announced the news in a post shared across his various social media platforms.

Cristiano Ronaldo has made a career out of breaking records.

His successes include being top scorer in Uefa Champions League history, having the most goals in the European Championship, and making more international appearances than anyone else.

Last week he became the first footballer to score 900 top-level career goals.

As with his playing career, he still has scope to improve his social media numbers: unlike some of his rivals, he is not on TikTok or Threads.

All of which is likely to add to another figure he dominates: earnings.

According to Forbes, his total earnings now stand at $260 million – the highest of any athlete.

Musk and Zuckerberg have ‘polluted culture’

Published

September 13, 2024

Zoe Kleinman

Meta boss Mark Zuckerberg and X owner Elon Musk are “the worst polluters in human history”, Stephen Fry has said.

The actor and comedian made the claim during a lecture at King’s College London.

“You and your children cannot breathe the air or swim in the waters of our culture without breathing in the toxic particulates and stinking effluvia that belch and pour unchecked from their companies into the currents of our world,” he said of the pair.

The BBC has approached the two men’s companies for comment.

Mr Fry has a track record of being an early adopter of technology – and was once a regular poster on X, when it was known as Twitter.

He stopped posting in 2022, a few months after the platform was purchased by Mr Musk, but has retained his account. He is no longer active on any social networks.

“I’m the chump who thought social media could change the world,” he told his audience at the Digital Futures Institute.

He said he was at first enthusiastic about the potential of social media to unite people around the world and bring about positive change in society, citing the Arab Spring protests which were coordinated online as an example – but added that he had been proved wrong.

He described what he considered to be a fatal flaw in attempts by early Facebook algorithms to “maximise engagement”, saying nobody had predicted that engagement would be “most maximised by… the worst passions” such as anger, shock and horror.

“We are decidedly hopeless at knowing where technology will take us or what it will do to us,” he said.

He returned to the theme several times throughout his one-hour speech, in which he also considered the future of artificial intelligence.

Mr Fry argued that AI was “poised to disrupt every space we have”.

He said he hoped corporate greed would not corrupt the development of AI tech at the expense of safety.

“The best I can do is this – Einstein and Russell said in their manifesto on nuclear weapons – we appeal as human beings to human beings, remember your humanity and forget the rest,” he said.

Mr Fry’s broadside was not the only attack on Mr Musk.

Earlier on Thursday, senior Meta executive Sir Nick Clegg, talking at Chatham House, in London, had been similarly scathing of Mr Musk’s platform X.

The former deputy prime minister called it “a tiny, elite, news-obsessed, politics-obsessed app” and added that in his view the social network had become “a one-man hyper-partisan hobby horse.”

In March 2024, X claimed to have 550 million monthly visitors. Facebook has just over 3bn monthly users.

Additional reporting by Liv McMahon

Vodafone clashes with UK’s competition watchdog over Three merger

Published

September 13, 2024

Tom Gerken and Nick Edser

Vodafone and Three have rejected claims by the UK’s competition watchdog that their proposed merger would lead to higher prices for millions of mobile users.

The Competition and Markets Authority (CMA) has “provisionally concluded” the deal would weaken competition between mobile networks.

It has particular concerns that customers who are least able to afford mobile services would be most affected.

The findings are the latest from the CMA’s ongoing probe into the merger, which it launched in January.

The regulator will now consult on its findings and potential solutions to its worries over competition.

These solutions could include legally binding investment commitments, and measures to protect both retail and wholesale customers.

Vodafone’s CEO for European Markets, Ahmed Essam, told the Today programme on BBC Radio 4 that he still believed the merger would create a better network for customers and increase competition in the market.

“We’ve made a significant commitment to an £11bn investment,” he said.

“We’re willing to make sure that this is legally binding, and we undertake a commitment to deploy this.”

He also said the firm had already traded part of its radio spectrum with a competitor.

But the CMA said it was “not convinced” that the merger would be good for consumers.

“The main knockback to the merging parties is that the CMA considers claims of superior network quality post-integration to be ‘overstated’,” said Kester Mann from analysis firm CCS Insight.

But he said the regulator was not shutting the door on the deal.

“Vodafone and Three should be encouraged by the tone of the CMA’s report, which appears more open to the merger than I was expecting.”

But Rocio Concha, director of policy and advocacy at consumer group Which?, took a different view.

“The regulator’s finding has set a high bar for the merger to proceed,” she said.

“It is clear from those findings that the planned merger between Vodafone and Three could have a negative impact on millions of consumers.”

But she warned it would be “challenging” for the regulator to find remedies for its concerns.

Vodafone and Three revealed plans to merge their UK-based operations in June last year, in a deal that would create the biggest mobile network in the UK, with around 27 million customers.

But the CMA provisionally concluded on Wednesday that such a deal would lead to a “substantial lessening in competition”.

In addition to worries over price and service levels, the regulator is also concerned that the deal may make it more difficult for smaller players such as Lyca Mobile, Sky Mobile and Lebara – who rent space from the bigger operators – to get a good deal.

Vodafone and Three have said the tie-up would lead to an additional investment of £11bn in the UK.

The CMA found that a merger of the two could improve the quality of mobile networks and accelerate next-generation 5G networks and services, as claimed by the companies.

But it considered these claims were “overstated”, and that the merged firm would not necessarily have the incentive to carry out planned investment after the merger.

In a statement, Vodafone and Three said they disagreed with the CMA’s findings.

“By all measures, the merger is pro-growth, pro-customer and pro-competition. It can, and should, be approved by the CMA,” they said.

The CMA will issue a final report into the deal in December.

The firms added they would be working with the regulator to secure approval for the tie-up.
