Deep Learning-Based Flexible Piezoelectric Acoustic Sensors with Noise-Robust Voice Coverage for Speech Processing

When and Where

May 11, 2022
5:00pm - 7:00pm

Hawai'i Convention Center, Level 1, Kamehameha Exhibit Hall 2 & 3

Presenter

Young Hoon Jung

Co-Author(s)

Young Hoon Jung¹

Korea Advanced Institute of Science and Technology¹

Abstract

In the era of the artificial intelligence of things (AIoT), flexible piezoelectric acoustic sensors (f-PAS) have been spotlighted as promising candidates for voice user interfaces (VUI) because they mimic the human cochlea (a trapezoidal membrane and ~10,000 hair-cell channels). However, biomimetic f-PAS can introduce signal distortion in real-life applications because of their fundamental differences from, and higher sensitivity than, conventional microphones.

Herein, we demonstrate a deep learning-based noise-robust flexible piezoelectric acoustic sensor (NPAS) for speech processing. The noise-robust response was achieved through three methods: (i) the frequency coverage of the multi-channel NPAS, (ii) optimized processing of seven signals by a convolutional neural network (CNN), and (iii) a newly designed deep U-Net model for speech enhancement. The NPAS achieved a noise-robust response by placing its multi-resonant bands outside the noise-dominant spectrum, and 0.1–8 kHz coverage by using a Nb-doped PZT (PNZT) membrane. Compared with condenser microphones, the highly sensitive NPAS detected sound clearly, with 35 times higher sensitivity and SNR. Using the newly optimized CNN model with a channel attention method, the NPAS exhibited a 62% reduction in error rate compared with the MEMS microphone. Deep U-Net-based speech enhancement of the NPAS was also achieved through selective processing of the multi-channel signals. Finally, the AI-based NPAS separated the voices of multiple speakers from a crowd.