NTT Study: Artificial Neural Networks for Recognizing Natural Sounds Exhibit Human-Like Responses

2023-08-01 20:24

TOKYO--(BUSINESS WIRE)--Aug 1, 2023--

Nippon Telegraph and Telephone Corporation (headquartered in Chiyoda-ku, Tokyo; Akira Shimada, President & CEO; hereinafter "NTT") has discovered that artificial neural networks (NNs, *1) trained to recognize natural sounds (*2) respond to changes in sound amplitude in a human-like way. The study offers a unified account of two bodies of work: human perception of amplitude modulation (AM, *3), investigated in psychoacoustics, and AM processing in the brain, investigated in neuroscience. The research is expected to find applications in fields such as medicine and welfare, for instance by contributing to the development of devices whose mechanisms resemble those of human hearing. It was published in the Journal of Neuroscience, an American scientific journal, on May 24, 2023 (U.S. Eastern Time).

This press release features multimedia. View the full release here: https://www.businesswire.com/news/home/20230801173570/en/

Figure 1: Framework of this study. The responses of the NN trained on natural sounds were compared with human perception and brain activity, which advanced our understanding of perceptual functions and their mechanisms. (Photo: Business Wire)

1. Background

Humans recognize sounds using various cues. One important cue is the pattern of slow temporal changes in amplitude (amplitude modulation, AM, *3, Figure 2). NTT Laboratories has been using artificial neural networks (NNs, *1) to study auditory AM processing. In that earlier work, AM sounds were fed to NNs trained to recognize natural sounds (*4), and the NNs' responses to them were similar to those observed in animal brains. These results suggested that the responses to AM sounds in animal brains might result from adaptation for recognizing natural sounds.

Until now, however, we had examined only the relationship between sound recognition and the response properties of single neurons in the brain. The relationship between sound recognition and perception, which emerges from the activity of many neurons, was not yet understood. Moreover, we had compared our NNs only with the brains of non-human animals. It was unclear whether the same framework could explain human perception, partly because single-neuron activity cannot easily be measured in humans. We therefore conducted a new study comparing NNs with human perception and demonstrated their similarities.

As the target perceptual property, we focused on the smallest AM depth that a person can detect (the AM detection threshold, *5). This threshold has been investigated in many auditory studies, but little is known about its relationship to sound recognition, an essential auditory function in daily life.
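To make "AM depth" concrete, the conventional sinusoidal AM stimulus used in psychoacoustics can be written as below. This is a sketch under assumptions: the release does not give the stimulus equation, and the symbols c(t) (carrier), f_m (modulation rate, the "speed" in *3) and m (modulation depth) are our notation. Setting m = 0 gives the non-AM reference sound, and the AM detection threshold is the smallest m at which s(t) can be told apart from that reference. Depth is commonly reported in decibels.

```latex
\begin{aligned}
s(t) &= \bigl[\,1 + m \sin(2\pi f_m t)\,\bigr]\, c(t), \qquad 0 \le m \le 1,\\
L_m &= 20 \log_{10} m \ \text{dB}, \qquad \text{e.g. } m = 0.1 \;\Rightarrow\; L_m = -20\ \text{dB}.
\end{aligned}
```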

2. Findings

Using artificial NNs trained for natural sound recognition, we simulated both perceptual experiments and neuronal recording experiments. The NNs exhibited human-like patterns of AM detection thresholds, even though no properties of the human or animal auditory system were built into the NNs when they were constructed (Figure 3).

This suggests that the human AM detection threshold might likewise arise from the auditory system's adaptation to sound recognition over the course of evolution and/or development. Furthermore, we found that the natural AM patterns of the training sounds are important for the NN to acquire this property. We also found that the NN layers exhibiting human-like AM detection threshold patterns corresponded to the inferior colliculus, the medial geniculate body, and the auditory cortex in the brain. This result provides insight into which brain regions are involved in AM detection in humans (Figure 4).

These results provide a unified explanation of previous findings in perceptual psychology and neuroscience from the perspective of adaptation to natural sounds.

3. Key features

Simulation of perceptual experiments.

A multilayer (deep) artificial NN was used. To reduce possible researcher bias in the NN's construction, the network was trained to recognize sounds directly from sound waveforms, without manually designed features. AM detection was simulated on a computer using the same sound stimuli as in the human perception experiments.

This made it possible to compare the NN's AM detection thresholds directly with those of humans. When a stimulus sound is fed to the model, each NN unit produces a time series of activity values. To calculate the NN's AM detection threshold, we time-averaged the unit activities in each layer and, from these time-averaged activities, estimated whether the stimulus was an AM or a non-AM sound (Figure 5). By repeating this procedure for AM stimuli of various depths, we determined the minimum AM depth required to discriminate whether or not a stimulus is an AM sound (i.e., the AM detection threshold).
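The threshold procedure described above can be illustrated with a minimal sketch. It assumes a great deal that the release does not specify: a hypothetical two-unit `toy_layer` stands in for one layer of the trained NN, a 1 kHz tone modulated at 8 Hz is the stimulus, Gaussian internal noise is added to the time-averaged activities, a nearest-centroid rule performs the AM/non-AM discrimination, and 75% correct is the detection criterion. Only the overall flow (time-average the unit activities, classify AM versus non-AM, sweep the depth, keep the smallest detectable depth) follows the text.

```python
import numpy as np

# All stimulus parameters below are illustrative assumptions.
FS = 16_000          # sampling rate (Hz)
DUR = 0.5            # stimulus duration (s)
F_CARRIER = 1_000.0  # carrier tone frequency (Hz)
F_MOD = 8.0          # modulation rate (Hz)
FRAME = 160          # 10 ms analysis frames


def am_tone(depth: float, phase: float) -> np.ndarray:
    """Sinusoidally amplitude-modulated tone; depth=0 gives the non-AM sound."""
    t = np.arange(int(FS * DUR)) / FS
    envelope = 1.0 + depth * np.sin(2 * np.pi * F_MOD * t + phase)
    return envelope * np.sin(2 * np.pi * F_CARRIER * t)


def toy_layer(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for an NN layer: (n_units, n_frames) activity."""
    frames = x[: len(x) // FRAME * FRAME].reshape(-1, FRAME)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))          # level-following unit
    fluctuation = np.abs(np.diff(rms, prepend=rms[0]))   # envelope-change unit
    return np.stack([rms, fluctuation])


def time_averaged(depth: float, rng: np.random.Generator, noise_sd: float) -> np.ndarray:
    """Time-average each unit's activity, then add hypothetical internal noise."""
    activity = toy_layer(am_tone(depth, rng.uniform(0, 2 * np.pi)))
    return activity.mean(axis=1) + rng.normal(0.0, noise_sd, activity.shape[0])


def detection_accuracy(depth: float, rng: np.random.Generator,
                       noise_sd: float = 0.01, n_train: int = 50,
                       n_test: int = 100) -> float:
    """Proportion correct of a nearest-centroid AM vs non-AM classifier."""
    def trials(d: float, n: int) -> np.ndarray:
        return np.array([time_averaged(d, rng, noise_sd) for _ in range(n)])

    centroid_am = trials(depth, n_train).mean(axis=0)
    centroid_flat = trials(0.0, n_train).mean(axis=0)

    def says_am(x: np.ndarray) -> np.ndarray:
        return (np.linalg.norm(x - centroid_am, axis=1)
                < np.linalg.norm(x - centroid_flat, axis=1))

    hits = says_am(trials(depth, n_test)).sum()
    correct_rejections = (~says_am(trials(0.0, n_test))).sum()
    return (hits + correct_rejections) / (2 * n_test)


def am_detection_threshold(depths_db, criterion: float = 0.75, seed: int = 0):
    """Smallest depth, in dB (20*log10(m)), whose accuracy reaches the criterion."""
    rng = np.random.default_rng(seed)
    for depth_db in sorted(depths_db):
        if detection_accuracy(10 ** (depth_db / 20), rng) >= criterion:
            return depth_db
    return None  # not detectable within the tested range


threshold_db = am_detection_threshold(np.arange(-40, 1, 2))
print(threshold_db)
```

The depth is swept on a dB grid because AM thresholds are conventionally reported as 20·log10(m). Replacing `toy_layer` with the activations of a real network layer would leave the rest of the procedure unchanged; the number printed here describes only the toy model, not the published NN or human listeners.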

Sound features necessary for a human-like AM detection threshold.

We also confirmed that the AM patterns of the natural sounds used for training are important for NNs to acquire human-like AM detection thresholds. We trained NNs to recognize two kinds of sounds: sounds that retained their natural AM structure and sounds whose AM structure had been destroyed (*6). The NNs trained on sounds with natural AM structure exhibited AM detection thresholds similar to those of humans (Figure 3).

4. Future directions

Auditory studies often try to explain perceptual properties such as detection thresholds by simulating sensory information processing with multi-stage models. Going forward, we will clarify how the processing stages in such existing models correspond to those in our NN, and examine in detail which stages of auditory information processing can or cannot be explained by adaptation to sound recognition.

The present study suggests that AM patterns in natural sounds are important for NNs to acquire a human-like detection threshold. This finding may lead to a better understanding of brain development and plasticity and of the mechanisms behind hearing difficulties. For example, the signals reaching the brain can change as a result of damage to the auditory periphery. If such a condition can be modeled, it will become possible to analyze the effects of hearing loss, or how information processing in the brain compensates for it. This may lead to the development of devices for medical and welfare applications that more closely resemble the mechanisms of human hearing.

The framework of this research can be extended to auditory functions other than AM processing and to sensory functions more generally. For example, the process by which sound information from both ears is integrated has been studied as extensively as AM processing, but there is currently little unified understanding linking the psychophysical and neurophysiological findings regarding human binaural sound processing. The same paradigm adopted for this research can be used to explore these functions.

Support for this research

This research was supported by JSPS Grant-in-Aid for Scientific Research 20H05957 (Research on Area of Scientific Transformation (A) Deep Texture).

Paper information

Human-like Modulation Sensitivity Emerging through Optimization to Natural Sound Recognition. Takuya Koumura, Hiroki Terashima, and Shigeto Furukawa. Journal of Neuroscience 24 May 2023, 43 (21) 3876-3894; https://doi.org/10.1523/JNEUROSCI.2002-22.2023

Glossary

*1 Artificial neural network (NN)
A type of machine learning model that often performs complicated classification tasks with high accuracy. It processes data using a structure consisting of many consecutive layers, each layer consisting of many units. A unit in a layer receives input from the units in the layer below, and after simple processing, its output is transmitted to the units in the next layer.

*2 Natural sound
Sounds that humans hear in daily life, such as animal vocalizations, rain, sneezing, a creaking door, and a car engine.

*3 Amplitude modulation (AM)
A pattern of slow changes in the amplitude of a signal (amplitude envelope). Important parameters describing amplitude modulation are its speed and depth (Figure 2).

*4 Training a machine learning model for sound recognition
Adjusting parameters of the model to increase the accuracy of sound recognition. In the case of an NN, parameters such as the number of units in a layer and the connection pattern and weights between units are adjusted.

*5 AM detection threshold
The minimum AM depth required to distinguish whether a sound stimulus is amplitude modulated or not. Experimentally, it is measured by testing whether AM and non-AM sounds (sounds without slow changes in amplitude) can be discriminated. In general, the deeper the AM, the easier it is to discriminate between them.

*6 Sounds whose AM structure is preserved or destroyed
A sound was decomposed into its amplitude envelope, which reflects the AM structure, and its temporal fine structure (TFS), which is the faster variation. Combining the amplitude envelope of the original sound with the TFS of a noise sound produced a sound whose AM structure was preserved. Combining a constant amplitude envelope with the TFS of the original sound produced a sound whose AM structure was destroyed. The Hilbert transform was used to decompose a sound into its amplitude envelope and its TFS.
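The decomposition in *6 can be sketched with NumPy as follows. The analytic signal is computed via the FFT (the same construction `scipy.signal.hilbert` uses); its magnitude gives the amplitude envelope and the cosine of its phase gives the TFS. The function names, the Gaussian noise used as the noise sound, and the constant envelope level (the RMS of the original envelope, chosen so the overall level is roughly kept) are assumptions for illustration, not details from the paper.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal via the FFT: drop negative frequencies, double positive ones."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0                      # DC bin kept once
    if n % 2 == 0:
        h[n // 2] = 1.0             # Nyquist bin kept once
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def envelope_and_tfs(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a signal into its amplitude envelope |z| and TFS cos(angle(z))."""
    z = analytic_signal(x)
    return np.abs(z), np.cos(np.angle(z))


def preserve_am(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Original envelope on the TFS of Gaussian noise (noise type is an assumption)."""
    envelope, _ = envelope_and_tfs(x)
    _, noise_tfs = envelope_and_tfs(rng.standard_normal(len(x)))
    return envelope * noise_tfs


def destroy_am(x: np.ndarray) -> np.ndarray:
    """Constant envelope (RMS of the original envelope, an assumption) on the original TFS."""
    envelope, tfs = envelope_and_tfs(x)
    return np.sqrt(np.mean(envelope ** 2)) * tfs
```

Applied to a sinusoidally modulated tone, `envelope_and_tfs` recovers the modulator, and `destroy_am` returns a tone with a flat envelope.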

About NTT

NTT believes in resolving social issues through our business operations by applying technology for good. An innovative spirit has been part of our culture for over 150 years, making breakthroughs that enable a more naturally connected and sustainable world. NTT Research and Development shares insights, innovations and knowledge with NTT operating companies and partners to support new ideas and solutions. Around the world, our research laboratories focus on artificial intelligence, photonic networks, theoretical quantum physics, cryptography, health and medical informatics, smart data platforms and digital twin computing. As a top-five global technology and business solutions provider, our diverse teams deliver services to over 190 countries and regions. We serve over 75% of Fortune Global 100 companies and thousands of other clients and communities worldwide. For more information on NTT, visit https://www.rd.ntt/e/.

NTT and the NTT logo are registered trademarks or trademarks of NIPPON TELEGRAPH AND TELEPHONE CORPORATION and/or its affiliates. All other referenced product names are trademarks of their respective owners. © 2023 NIPPON TELEGRAPH AND TELEPHONE CORPORATION

View source version on businesswire.com: https://www.businesswire.com/news/home/20230801173570/en/

CONTACT: Nick Gibiser

Wireside Communications®

For NTT

+1-804-500-6660

ngibiser@wireside.com

KEYWORD: JAPAN ASIA PACIFIC

INDUSTRY KEYWORD: RESEARCH TECHNOLOGY MEDICAL DEVICES NEUROLOGY OTHER TECHNOLOGY AUDIO/VIDEO OTHER HEALTH HEALTH SCIENCE OTHER SCIENCE

SOURCE: NTT

PUB: 08/01/2023 08:07 AM/DISC: 08/01/2023 08:05 AM
