Multimodal tactile sensing fused with vision for dexterous robotic housekeeping

As robots increasingly enter our daily lives to assist elderly and disabled people, the pursuit of mimicking human perceptual and cognitive abilities has driven the development of robots with multimodal sensing capabilities, including touch and vision. In recent years, tactile sensing and its fusion with vision (referred to as tactile-visual fusion sensing) have emerged as groundbreaking approaches in this endeavor. These technologies hold the potential to revolutionize the way robots perceive and interact with their environment. However, current perceptual technologies still cannot meet the demands of most robotic tasks in complex home environments. For example, vision alone struggles to distinguish between similarly shaped objects, such as crumpled paper, plastic bags, and napkins, and it falls short in supporting dexterous manipulation of slippery or fragile objects without a sense of touch. Meanwhile, current tactile sensors still face great challenges in multisensory integration and fusion, rapid response, and highly sensitive perception.

Fig. 1: A tactile-visual fusion robot enables complex household tasks.
 
In our recent work published in Nature Communications (doi:10.1038/s41467-024-51261-5), we propose a tactile-visual fusion robot system that handles daily necessities to accomplish complex housekeeping tasks (Figure 1). First, we develop a flexible tactile sensor based on multilayer platinum (Pt) thin-film thermistors that provides multimodal perception of contact pressure, temperature, material thermal properties, and object texture, as well as slip detection (Figure 2). Notably, the tactile sensor exhibits ultrasensitive (0.05 mm/s) and ultrafast (4 ms) slip sensing, which is indispensable for dexterous and reliable grasping control, mitigating the risk of dropping or crushing objects and preventing potential harm (Figure 3a-c). Micro features in the response signals of the top sensing layer can be used to detect object textures, enabling the sensor to finely recognize daily necessities such as fabrics (Figure 3d-f). These perceptual capabilities are essential for robots working in complex home environments.
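The millisecond-scale slip response comes from transient features in the thermistor signals. Purely as an illustration of how such a transient could be flagged in software, the sketch below thresholds the sample-to-sample change of a simulated sensor trace; the sampling rate, threshold, and signal model are assumptions made for the example, not values or code from the paper.

```python
import numpy as np

# Illustrative assumptions (not values from the paper):
FS = 2000              # sampling rate in Hz (0.5 ms per sample)
SLIP_THRESHOLD = 0.02  # sample-to-sample change treated as a slip transient

def detect_slip(signal, threshold=SLIP_THRESHOLD):
    """Return the index of the first sample whose first difference exceeds
    `threshold`, or None if no slip-like transient is found."""
    diffs = np.abs(np.diff(signal))
    idx = np.argmax(diffs > threshold)
    return int(idx) if diffs[idx] > threshold else None

# Simulated trace: steady contact, then an oscillatory transient from 50 ms on.
t = np.arange(0, 0.1, 1 / FS)
trace = 0.5 + 0.001 * np.random.randn(t.size)
trace[t >= 0.05] += 0.2 * np.sin(2 * np.pi * 80 * (t[t >= 0.05] - 0.05))

onset = detect_slip(trace)
if onset is not None:
    print(f"Slip-like transient flagged at t = {1000 * onset / FS:.1f} ms")
```

In practice the detection rule would of course be tuned to the real sensor's noise floor and bandwidth rather than to these toy numbers.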
 

Fig. 2: The structure, working principle, and functions of the multimodal tactile sensor. (a) The structure of the tactile sensor, composed of a top sensing layer, a bottom sensing layer, PDMS, and a porous material in the middle. (b) Working principle of the bottom sensing layer. (c) The pressure response of the tactile sensor. (d) Working principle of the top sensing layer. (e) The output signals of the tactile sensor responding to ambient temperature and object temperature, respectively. (f) The responses of the tactile sensor when touching different materials. (g) The responses of the tactile sensor when contacting and slipping on different objects, respectively. (h) Slip and texture are detected from the macro and micro features of the tactile sensor signals.
 

Fig. 3: (a) Macro features for different materials at different slipping velocities. (b) The tactile sensor has a low detection limit of 0.05 mm/s for slip detection. (c) The tactile sensor has a fast response time of 4 ms for slip detection. (d) Micro features at different slipping velocities. (e, f) FFT analyses of the micro features at different slipping velocities.
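As a schematic companion to the FFT analyses in Fig. 3: when a periodic texture slides across the sensor, the micro-feature oscillation frequency scales roughly as slip velocity divided by the texture period, so its FFT peak shifts with velocity. The snippet below illustrates that relationship on a simulated trace; the texture period, velocities, and noise level are made-up values for illustration only.

```python
import numpy as np

FS = 2000                # sampling rate in Hz (assumed)
TEXTURE_PERIOD = 0.5e-3  # texture spatial period in m (assumed)

def dominant_frequency(segment, fs=FS):
    """Return the dominant non-DC frequency of a signal segment via FFT."""
    spectrum = np.abs(np.fft.rfft(segment - segment.mean()))
    freqs = np.fft.rfftfreq(segment.size, d=1 / fs)
    return freqs[np.argmax(spectrum)]

for v in (1e-3, 5e-3, 20e-3):        # slip velocities in m/s (assumed)
    f_expected = v / TEXTURE_PERIOD  # micro-feature frequency ~ velocity / texture period
    t = np.arange(0, 2.0, 1 / FS)
    micro = 0.01 * np.sin(2 * np.pi * f_expected * t) + 0.002 * np.random.randn(t.size)
    print(f"v = {v * 1000:.1f} mm/s -> dominant frequency {dominant_frequency(micro):.1f} Hz "
          f"(expected {f_expected:.1f} Hz)")
```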
 
Although the proposed multimodal tactile sensor endows robots with rapid and intricate tactile perception, tactile sensing alone falls short of meeting the demands of robots in complex scenarios. In the paper, we therefore propose a tactile-visual fusion architecture that combines multimodal tactile sensing with robot vision, seamlessly spanning from multimodal sensation at the bottom level to robotic decision-making at the top level (Figure 4). On this basis, we devise tactile-visual fusion strategies for dexterous grasping and accurate recognition of daily objects. The grasping strategy uses ultrafast and ultrasensitive slip feedback control to achieve fine grasping with minimum grip strength, preventing fragile objects from being crushed; vision additionally provides the object position and pose needed for fine grasping. The tactile-visual fusion recognition strategy employs a hybrid cascade approach to accurately recognize various daily necessities. By leveraging the proposed multimodal tactile sensors and the tactile-visual fusion system, the robot autonomously accomplishes challenging tasks such as multi-item sorting and desktop cleaning. For example, when dealing with a cup containing liquid, the robot detects the liquid through tactile-based grasping, pours the liquid into a water tank, and finally deposits the empty cup into the recyclable box. For items that are difficult to grasp, such as a pen, a piece of paper, or a book, the robot with tactile-visual fusion handles them intelligently by moving the object to the edge of the table and then dexterously grasping it, just as humans do. This elaborate sorting and collection ensures reliable housekeeping service.
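A minimal sketch of the slip-feedback grasping idea described above, written against a hypothetical gripper interface (set_grip_force(), read_slip()) with made-up force limits, could look like the following: the controller holds the lowest grip force and tightens in small steps whenever slip is reported, never exceeding a safety ceiling.

```python
import time

# Hypothetical interface and tuning values -- illustrative only.
F_MIN = 0.5            # N, initial (minimum) grip force
F_MAX = 10.0           # N, safety ceiling to avoid crushing fragile objects
F_STEP = 0.2           # N, increment applied whenever slip is detected
CONTROL_PERIOD = 0.004 # s, matching a millisecond-scale slip response

def grasp_with_slip_feedback(gripper, duration=5.0):
    """Hold an object with the least force that suppresses slip.
    `gripper` is assumed to expose read_slip() -> bool and
    set_grip_force(force_newton); both are hypothetical stand-ins."""
    force = F_MIN
    gripper.set_grip_force(force)
    t_end = time.monotonic() + duration
    while time.monotonic() < t_end:
        if gripper.read_slip():
            # Slip detected: tighten slightly, but never beyond the ceiling.
            force = min(force + F_STEP, F_MAX)
            gripper.set_grip_force(force)
        time.sleep(CONTROL_PERIOD)
    return force

class FakeGripper:
    """Trivial stand-in so the sketch runs without hardware."""
    def __init__(self):
        self._force = 0.0
    def set_grip_force(self, force):
        self._force = force
    def read_slip(self):
        # Pretend slip persists until the force reaches 1.5 N.
        return self._force < 1.5

print(grasp_with_slip_feedback(FakeGripper(), duration=0.1))
```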
 

Fig. 4: A tactile-visual fusion robot architecture, including signal level, perception level, decision level, and system level.
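At the decision level of this architecture, the hybrid cascade recognition strategy can be read as a vision-first pipeline that falls back on tactile cues when vision alone is ambiguous (for instance, for visually similar crumpled items). The sketch below captures only that control flow; the confidence threshold, tactile features, and decision rules are placeholders, not the classifiers used in the paper.

```python
from dataclasses import dataclass

# Illustrative confidence threshold -- not a value from the paper.
VISION_CONFIDENCE_THRESHOLD = 0.8

@dataclass
class VisionResult:
    label: str         # e.g. "cup", "paper", "plastic bag"
    confidence: float

@dataclass
class TactileFeatures:
    stiffness: float             # derived from the pressure response
    thermal_conductivity: float  # derived from the top-layer thermal response
    texture_frequency: float     # dominant FFT frequency of micro features

def cascade_recognize(vision: VisionResult, touch: TactileFeatures) -> str:
    """Vision-first cascade: accept a confident visual label, otherwise
    disambiguate with tactile features. The rules below are placeholders
    standing in for trained classifiers."""
    if vision.confidence >= VISION_CONFIDENCE_THRESHOLD:
        return vision.label
    if touch.thermal_conductivity > 0.3:      # placeholder rule
        return "plastic bag"
    return "paper" if touch.texture_frequency > 20.0 else "napkin"

print(cascade_recognize(VisionResult("crumpled item", 0.55),
                        TactileFeatures(stiffness=0.1,
                                        thermal_conductivity=0.1,
                                        texture_frequency=35.0)))
```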
 
The developed multimodal tactile sensors and the proposed tactile-visual fusion robot architecture endow the robot with excellent perceptual and executive capabilities, facilitating flexible and reliable interaction with humans and assistance in daily life.
