Young Hoon Jung¹
¹Korea Advanced Institute of Science and Technology
In the era of the artificial intelligence of things (AIoT), flexible piezoelectric acoustic sensors (f-PAS) have attracted attention as promising candidates for voice user interfaces (VUIs) because they mimic the human cochlea, with its trapezoidal membrane and roughly 10,000 hair-cell channels. However, because biomimetic f-PAS differ fundamentally from conventional microphones and are highly sensitive, they can introduce signal distortion in real-life applications.

Herein, we demonstrate a deep learning-based, noise-robust flexible piezoelectric acoustic sensor (NPAS) for speech processing. The noise-robust response was achieved by three methods: (i) the frequency coverage of the multi-channel NPAS, (ii) optimized processing of seven signals with a convolutional neural network (CNN), and (iii) a newly designed deep U-Net model for speech enhancement. The NPAS achieved a noise-robust response by placing its multiple resonant bands outside the noise-dominant spectrum, and 0.1–8 kHz frequency coverage by using an Nb-doped PZT (PNZT) membrane. Compared with condenser microphones, the highly sensitive NPAS detected sound clearly, with 35 times higher sensitivity and signal-to-noise ratio (SNR). Using the newly optimized CNN model with channel attention, the NPAS achieved a 62% reduction in error rate compared with a MEMS microphone. Deep U-Net-based speech enhancement of the NPAS was also achieved by selectively processing the multi-channel signals. Finally, the AI-based NPAS separated the voices of multiple speakers in a crowd.
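The abstract does not specify how the channel attention inside the CNN is built. Purely as an illustration, the sketch below assumes a squeeze-and-excitation (SE) style channel attention applied to seven-channel feature maps: each channel is summarized by its mean, passed through a small bottleneck (ReLU, then sigmoid), and the resulting gate in (0, 1) rescales that channel. The function name `channel_attention`, the weight shapes, the reduction ratio, and the weight values are hypothetical and are not the authors' model.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def channel_attention(features: Matrix, w1: Matrix, w2: Matrix) -> Tuple[Matrix, List[float]]:
    """Squeeze-and-excitation style channel attention (illustrative sketch).

    features: C channels x T time steps (e.g. non-negative CNN feature maps).
    w1: (C // r) x C bottleneck weights; w2: C x (C // r) expansion weights.
    Returns the reweighted features and the per-channel gates.
    """
    # Squeeze: global average pooling over time gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck layer with ReLU.
    h = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    # Excitation, step 2: expand back to C values and squash each to (0, 1).
    s = [1.0 / (1.0 + math.exp(-sum(w * hj for w, hj in zip(row, h)))) for row in w2]
    # Scale: multiply every time step of channel c by its gate s[c].
    out = [[sc * x for x in ch] for sc, ch in zip(s, features)]
    return out, s


if __name__ == "__main__":
    # Hypothetical input: 7 channels x 4 time steps, reduction ratio r = 3 (2 hidden units).
    feats = [[float(c + 1)] * 4 for c in range(7)]
    w1 = [[0.1] * 7, [-0.1] * 7]                   # hypothetical, untrained weights
    w2 = [[0.05 * (c - 3), 0.0] for c in range(7)]  # hypothetical, untrained weights
    _, gates = channel_attention(feats, w1, w2)
    for c, g in enumerate(gates):
        print(f"channel {c}: gate = {g:.3f}")
```

In a trained network, `w1` and `w2` would be learned so that channels carrying speech-relevant resonant bands receive larger gates than noise-dominated ones; here they are fixed only to show the data flow.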