
Sci-Tech

Meta’s Ray-Ban Smart Glasses Use AI to See, Hear and Speak. What Are They Like?


In a sign that the tech industry keeps getting weirder, Meta soon plans to release a big update that transforms the Ray-Ban Meta, its camera glasses that shoot videos, into a gadget seen only in sci-fi movies.

Next month, the glasses will be able to use new artificial intelligence software to see the real world and describe what you’re looking at, similar to the A.I. assistant in the movie “Her.”

The glasses, which come in various frames starting at $300 and lenses starting at $17, have mostly been used for shooting photos and videos and listening to music. But with the new A.I. software, they can be used to scan famous landmarks, translate languages and identify animal breeds and exotic fruits, among other tasks.

To use the A.I. software, wearers just say, “Hey, Meta,” followed by a prompt, such as “Look and tell me what kind of dog this is.” The A.I. then responds in a computer-generated voice that plays through the glasses’ tiny speakers.
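Meta has not said exactly how the glasses are wired internally, but the loop described above is simple to picture: a wake word, a camera frame, a multimodal model and a spoken reply. Below is a minimal, hypothetical Python sketch of that loop. Every function in it is an invented stand-in for illustration, not Meta's actual software or API.

```python
# Hypothetical sketch of a "Hey, Meta"-style see-and-speak loop.
# None of these functions are Meta's real APIs; they are stubs that show
# how a wake word, a camera frame and a vision-language model fit together.

from dataclasses import dataclass


@dataclass
class Frame:
    """A single image captured by the glasses' camera (stubbed here)."""
    pixels: bytes


def capture_frame() -> Frame:
    # A real device would read from the camera; this is a placeholder.
    return Frame(pixels=b"...")


def vision_language_model(prompt: str, frame: Frame) -> str:
    # Stand-in for a multimodal model that answers questions about an image.
    return "That looks like a corgi sitting on the ground with its tongue out."


def speak(text: str) -> None:
    # Stand-in for text-to-speech played through the glasses' speakers.
    print(f"[speaker] {text}")


def handle_utterance(utterance: str) -> None:
    """Ignore ordinary speech; the wake word triggers a look-and-answer."""
    wake_word = "hey, meta"
    if not utterance.lower().startswith(wake_word):
        return
    prompt = utterance[len(wake_word):].strip(" ,.")
    answer = vision_language_model(prompt, capture_frame())
    speak(answer)


if __name__ == "__main__":
    handle_utterance("Hey, Meta, look and tell me what kind of dog this is.")
```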

The concept of the A.I. software is so novel and quirky that when we — Brian X. Chen, a tech columnist who reviewed the Ray-Bans last year, and Mike Isaac, who covers Meta and wears the smart glasses to produce a cooking show — heard about it, we were dying to try it. Meta gave us early access to the update, and we took the technology for a spin over the last few weeks.

We wore the glasses to the zoo, grocery stores and a museum while grilling the A.I. with questions and requests.

The upshot: We were simultaneously entertained by the virtual assistant’s goof-ups — for example, mistaking a monkey for a giraffe — and impressed when it carried out useful tasks like determining whether a pack of cookies was gluten-free.

A Meta spokesman said that because the technology was still new, the artificial intelligence wouldn’t always get things right, and that feedback would improve the glasses over time.

Meta’s software also created transcripts of our questions and the A.I.’s responses, which we captured in screenshots. Here are the highlights from our month of coexisting with Meta’s assistant.

BRIAN: Naturally, the very first thing I had to try Meta’s A.I. on was my corgi, Max. I looked at the plump pooch and asked, “Hey, Meta, what am I looking at?”

“A cute Corgi dog sitting on the ground with its tongue out,” the assistant said. Correct, especially the part about being cute.

MIKE: Meta’s A.I. correctly recognized my dog, Bruna, as a “black and brown Bernese Mountain dog.” I half expected the A.I. software to think she was a bear, the animal that she is most consistently mistaken for by neighbors.

BRIAN: After the A.I. correctly identified my dog, the logical next step was to try it on zoo animals. So I recently paid a visit to the Oakland Zoo in Oakland, Calif., where, for two hours, I gazed at about a dozen animals, including parrots, tortoises, monkeys and zebras. I said: “Hey, Meta, look and tell me what kind of animal that is.”

The A.I. was wrong the vast majority of the time, in part because many animals were caged off and far away. It mistook a primate for a giraffe, a duck for a turtle and a meerkat for a giant panda, among other mix-ups. On the other hand, I was impressed when the A.I. correctly identified a species of parrot known as the blue-and-gold macaw, as well as zebras.

The strangest part of this experiment was speaking to an A.I. assistant around children and their parents. They pretended not to notice the only solo adult at the zoo as I seemingly muttered to myself.

MIKE: I also had a peculiar time grocery shopping. Being inside a Safeway and talking to myself was a bit embarrassing, so I tried to keep my voice low. I still got a few sideways looks.

When Meta’s A.I. worked, it was charming. I picked up a pack of strange-looking Oreos and asked it to look at the packaging and tell me if they were gluten-free. (They were not.) It answered questions like these correctly about half the time, though I can’t say it saved time compared with reading the label.

But the entire reason I got into these glasses in the first place was to start my own Instagram cooking show — a flattering way of saying I record myself making food for the week while talking to myself. These glasses made doing so much easier than using a phone and one hand.

The A.I. assistant can also offer some kitchen help. If I need to know how many teaspoons are in a tablespoon and my hands are covered in olive oil, for example, I can ask it to tell me. (There are three teaspoons in a tablespoon, just FYI.)

But when I asked the A.I. to look at a handful of ingredients I had and come up with a recipe, it spat out rapid-fire instructions for an egg custard — not exactly helpful for following directions at my own pace.

A handful of examples to choose from could have been more useful, but that might require tweaks to the user interface and maybe even a screen inside my lenses.

A Meta spokesman said users could ask follow-up questions to get tighter, more useful responses from its assistant.

BRIAN: I went to the grocery store and bought the most exotic fruit I could find — a cherimoya, a scaly green fruit that looks like a dinosaur egg. When I gave Meta’s A.I. multiple chances to identify it, it made a different guess each time: a chocolate-covered pecan, a stone fruit, an apple and, finally, a durian, which was close, but no banana.

MIKE: The new software’s ability to recognize landmarks and monuments seemed to be clicking. Looking down a block in downtown San Francisco at a towering dome, Meta’s A.I. correctly responded, “City Hall.” That’s a neat trick and perhaps helpful if you’re a tourist.

Other attempts were hit or miss. As I drove home from the city to my house in Oakland, I asked Meta what bridge I was on while looking out the window in front of me (both hands on the wheel, of course). The first response was the Golden Gate Bridge, which was wrong. On the second try, it figured out I was on the Bay Bridge, which made me wonder if it just needed a clearer shot of the newer portion’s tall, white suspension poles to be right.

BRIAN: I visited San Francisco’s Museum of Modern Art to check if Meta’s A.I. could do the job of a tour guide. After snapping photos of about two dozen paintings and asking the assistant to tell me about the piece of art I was looking at, the A.I. could describe the imagery and what media was used to compose the art — which would be nice for an art history student — but it couldn’t identify the artist or title. (A Meta spokesman said another software update it released after my museum visit improved this ability.)

After the update, I tried looking at images on my computer screen of more famous works of art, including the Mona Lisa, and the A.I. correctly identified those.

BRIAN: At a Chinese restaurant, I pointed at a menu item written in Chinese and asked Meta to translate it into English, but the A.I. said it currently only supported English, Spanish, Italian, French and German. (I was surprised, because Mark Zuckerberg learned Mandarin.)

MIKE: It did a pretty good job translating a book title into German from English.

Meta’s A.I.-powered glasses offer an intriguing glimpse into a future that feels distant. The flaws underscore the limitations and challenges in designing this type of product. The glasses could probably do better at identifying zoo animals and fruit, for instance, if the camera had a higher resolution — but a nicer lens would add bulk. And no matter where we were, it was awkward to speak to a virtual assistant in public. It’s unclear if that ever will feel normal.

But when it worked, it worked well and we had fun — and the fact that Meta’s A.I. can do things like translate languages and identify landmarks through a pair of hip-looking glasses shows how far the tech has come.






Sci-Tech

New version of ChatGPT can teach maths and flirt



OpenAI has unveiled a new, faster version of its generative AI tool, ChatGPT.




Sci-Tech

OpenAI Unveils New ChatGPT That Listens, Looks and Talks



As Apple and Google transform their voice assistants into chatbots, OpenAI is transforming its chatbot into a voice assistant.

On Monday, the San Francisco artificial intelligence start-up unveiled a new version of its ChatGPT chatbot that can receive and respond to voice commands, images and videos.

The company said the new app — based on an A.I. system called GPT-4o — juggles audio, images and video significantly faster than previous versions of the technology. The app will be available starting on Monday, free of charge, for both smartphones and desktop computers.

“We are looking at the future of the interaction between ourselves and machines,” said Mira Murati, the company’s chief technology officer.

The new app is part of a wider effort to combine conversational chatbots like ChatGPT with voice assistants like the Google Assistant and Apple’s Siri. As Google merges its Gemini chatbot with the Google Assistant, Apple is preparing a new version of Siri that is more conversational.

OpenAI said it would gradually share the technology with users “over the coming weeks.” This is the first time it has offered ChatGPT as a desktop application.

The company previously offered similar technologies from inside various free and paid products. Now, it has rolled them into a single system that is available across all its products.

During an event streamed on the internet, Ms. Murati and her colleagues showed off the new app as it responded to conversational voice commands, used a live video feed to analyze math problems written on a sheet of paper and read aloud playful stories that it had written on the fly.

The new app cannot generate video. But it can generate still images that represent frames of a video.

With the debut of ChatGPT in late 2022, OpenAI showed that machines can handle requests more like people. In response to conversational text prompts, it could answer questions, write term papers and even generate computer code.

ChatGPT was not driven by a set of rules. It learned its skills by analyzing enormous amounts of text culled from across the internet, including Wikipedia articles, books and chat logs. Experts hailed the technology as a possible alternative to search engines like Google and voice assistants like Siri.

Newer versions of the technology have also learned from sounds, images and video. Researchers call this “multimodal A.I.” Essentially, companies like OpenAI began to combine chatbots with A.I. image, audio and video generators.

(The New York Times sued OpenAI and its partner, Microsoft, in December, claiming copyright infringement of news content related to A.I. systems.)

As companies combine chatbots with voice assistants, many hurdles remain. Because chatbots learn their skills from internet data, they are prone to mistakes. Sometimes, they make up information entirely — a phenomenon that A.I. researchers call “hallucination.” Those flaws are migrating into voice assistants.

While chatbots can generate convincing language, they are less adept at taking actions like scheduling a meeting or booking a plane flight. But companies like OpenAI are working to transform them into “A.I. agents” that can reliably handle such tasks.

OpenAI previously offered a version of ChatGPT that could accept voice commands and respond with voice. But it was a patchwork of three different A.I. technologies: one that converted voice to text, one that generated a text response and one that converted this text into a synthetic voice.

The new app is based on a single A.I. technology — GPT-4o — that can accept and generate text, sounds and images. This means that the technology is more efficient, and the company can afford to offer it to users for free, Ms. Murati said.

“Before, you had all this latency that was the result of three models working together,” Ms. Murati said in an interview with The Times. “You want to have the experience we’re having — where we can have this very natural dialogue.”
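Ms. Murati’s latency point is easier to see as a sketch. The hypothetical Python comparison below contrasts the two designs; the function names and sleep times are invented placeholders, not OpenAI’s code or measured numbers. It only illustrates why collapsing three hand-offs into one model shortens the round trip.

```python
# Hypothetical contrast between the two architectures described above:
# the older voice mode chained three separate models, while a GPT-4o-style
# system handles audio end to end. All calls and delays below are stand-ins.

import time


def speech_to_text(audio: bytes) -> str:
    time.sleep(0.3)  # each hop in the chain adds its own latency (illustrative)
    return "What bridge am I looking at?"


def text_model(prompt: str) -> str:
    time.sleep(0.5)
    return "That appears to be the Bay Bridge."


def text_to_speech(text: str) -> bytes:
    time.sleep(0.3)
    return b"<synthesized audio>"


def old_pipeline(audio: bytes) -> bytes:
    """Three models glued together: transcribe, reason, then synthesize."""
    return text_to_speech(text_model(speech_to_text(audio)))


def multimodal_model(audio: bytes) -> bytes:
    """One model that accepts audio and returns audio, removing the hand-offs."""
    time.sleep(0.5)
    return b"<synthesized audio>"


if __name__ == "__main__":
    start = time.time()
    old_pipeline(b"<mic capture>")
    print(f"pipeline of three models: {time.time() - start:.1f}s")

    start = time.time()
    multimodal_model(b"<mic capture>")
    print(f"single multimodal model:  {time.time() - start:.1f}s")
```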




Sci-Tech

Marvel Rivals apologises after banning negative reviews




The game’s developers said the controversial terms that gamers were told to agree to were a miscommunication.


