Multimodal AI needs active human interaction

Consider that you are comfortably seated on your sofa, reading this article in Nature Human Behaviour. Your AI assistant says: “I see you are reading about multimodal AI. But you asked me to remind you that we need to finalize the visuals for your presentation tomorrow morning. I’ve gone through the audience feedback on your last presentation and have mocked up new illustrations to introduce your idea.”

Current multimodal AI models have the ingredients for this sort of interaction. Many real-life tasks, such as driving and medical diagnosis1, are difficult to solve through verbal communication alone and require multimodal information. Recent commercial general-purpose AI systems are equipped with vision and audition (for example, GPT-4o, Gemini 1.5 and Claude 3). Techniques such as retrieval-augmented generation are being developed to enable large language models (LLMs) to use multimodal databases2. Portable multimodal AI devices (for example, the handheld rabbit r1, the wearable Ai Pin and Ray-Ban Meta smart glasses) are being developed to provide assistance in the physical world. This multimodal trend greatly increases the range of problems that AI tools can solve or assist with. It also opens up real-time human–AI communication channels through voice and facial expression.
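
To make the retrieval-augmented generation step above concrete, the sketch below is a minimal, self-contained illustration rather than the method of any particular system: a toy multimodal store holds text, image and audio entries with placeholder embedding vectors, a query is matched by cosine similarity, and the retrieved items are folded into the prompt that would be passed to an LLM. The `embed` function, the store contents and the vector values are all hypothetical stand-ins; a real system would use a multimodal encoder and a vector database.

```python
from math import sqrt

# Toy multimodal "database": each entry has a modality tag, a short
# description, and a placeholder embedding vector. In practice these vectors
# would come from a multimodal encoder (e.g., a vision-language model).
STORE = [
    {"modality": "image", "content": "slide_3_diagram.png: pipeline overview figure",
     "vec": [0.9, 0.1, 0.0]},
    {"modality": "text", "content": "Audience feedback: diagrams were too dense",
     "vec": [0.7, 0.3, 0.1]},
    {"modality": "audio", "content": "Voice memo: emphasise the real-time demo",
     "vec": [0.1, 0.8, 0.2]},
]

def embed(query: str) -> list[float]:
    """Placeholder embedding; a real system would call a multimodal encoder."""
    return [0.8, 0.2, 0.1]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Return the k store entries most similar to the query embedding."""
    qvec = embed(query)
    return sorted(STORE, key=lambda e: cosine(qvec, e["vec"]), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Fold retrieved multimodal context into the prompt given to the LLM."""
    context = "\n".join(f"[{e['modality']}] {e['content']}" for e in retrieve(query))
    return f"Context:\n{context}\n\nUser request: {query}"

if __name__ == "__main__":
    print(build_prompt("Redesign the visuals for tomorrow's presentation"))
```

In a deployed assistant, the augmented prompt would be sent to a multimodal model; the retrieval step is what lets the model ground its response in the user's own slides, feedback and recordings rather than in its training data alone.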
