Facebook owner Meta introduces AI model that can identify objects in images

It can improve creative applications such as extracting image parts for collages or video editing


Facebook’s parent company Meta on Wednesday launched a new artificial intelligence model that can identify objects within images.

Called “Segment Anything”, it aims to accelerate research into segmentation and offer more general image and video understanding.

Segmentation refers to the ability to identify which pixels in an image belong to a specific object.
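In code terms, a segmentation mask is simply a per-pixel label over an image. A minimal illustration (the values here are made up) could look like this:

```python
import numpy as np

# A tiny 4x4 greyscale "image" and a binary mask of the same shape.
# mask[i, j] is True exactly where pixel (i, j) belongs to the object.
image = np.arange(16, dtype=np.uint8).reshape(4, 4)
mask = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
], dtype=bool)

object_pixels = image[mask]  # only the pixels inside the object
print(f"{mask.sum()} of {mask.size} pixels belong to the object")
```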

Meta said its initiative aims to “democratise” segmentation, which is used in various applications, from analysing scientific imagery to editing photos.

Usually, creating an accurate segmentation model for a specific task requires highly specialised work by technical experts with access to AI training infrastructure and large volumes of carefully annotated data.

Under the new project, Meta has released its general Segment Anything Model (SAM) and its Segment Anything 1-Billion mask dataset (SA-1B).

Both aim to enable a broad set of applications and foster further research into foundation models for computer vision.

SAM covers a broad set of uses and can even be applied to underwater photos.

It can improve creative applications such as extracting image parts for collages or video editing. It could also be used to boost scientific studies of natural occurrences on Earth or even in space.

“We believe the possibilities are broad and we are excited by the many potential use cases we haven’t even imagined yet,” Meta said.

The SA-1B dataset is available for research purposes, while SAM is released under a permissive open licence.
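For developers, the released model can be driven from a few lines of Python. The sketch below assumes Meta's open-sourced segment-anything package and a downloaded model checkpoint; the image path and checkpoint filename are placeholders:

```python
import numpy as np
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry

# Checkpoint and image paths are placeholders; SAM weights must be
# downloaded separately from Meta's release page.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("photo.jpg").convert("RGB"))
predictor.set_image(image)

# Prompt the model with a single foreground click at pixel (x=500, y=375).
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),  # 1 marks a foreground point
    multimask_output=True,       # return several candidate masks
)
```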

“Our goal was to build a foundation model for image segmentation … a promptable model that is trained on diverse data and that can adapt to specific tasks, analogous to how prompting is used in natural language processing models,” Meta said.

“However, the segmentation data needed to train such a model is not readily available online or elsewhere, unlike images, videos, and text.

"Thus, with Segment Anything, we set out to simultaneously develop a general, promptable segmentation model and use it to create a segmentation dataset of unprecedented scale."

Previously, there were two classes of approaches for solving segmentation problems: interactive segmentation, which required a person to guide the method, and automatic segmentation, which needed substantial amounts of manually annotated objects to train on.

“Neither approach provided a general, fully automatic approach to segmentation,” Meta said.

The SA-1B dataset that Meta released on Wednesday is the largest segmentation dataset to date, the company claimed.

Meta collected this data using SAM itself.

“Annotators used SAM to interactively annotate images, and then the newly annotated data was used to update SAM in turn," Meta said.

"We repeated this cycle many times to iteratively improve both the model and dataset."

The final dataset includes more than 1.1 billion segmentation masks collected on about 11 million licensed and privacy-preserving images.
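Assuming the published per-image JSON layout, the masks, which are distributed as COCO run-length encodings, can be decoded with the pycocotools library; the file name below is a placeholder:

```python
import json
from pycocotools import mask as mask_utils  # pip install pycocotools

# "sa_1.json" is a placeholder for one of SA-1B's per-image annotation files.
with open("sa_1.json") as f:
    record = json.load(f)

# Each annotation stores its mask in COCO run-length encoding (RLE).
for ann in record["annotations"]:
    binary_mask = mask_utils.decode(ann["segmentation"])  # HxW array of 0/1
    print(binary_mask.shape, int(binary_mask.sum()), "object pixels")
```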

Updated: April 05, 2023, 6:23 PM