(Nanowerk Spotlight) The human brain effortlessly converts spoken words into meaning, requiring minimal energy while achieving remarkable accuracy. Replicating this feat in artificial systems has proved challenging. Although speech recognition technology has advanced significantly, the underlying computing architectures remain fundamentally inefficient, demanding considerable power and computational resources.
Traditional systems rely on the von Neumann architecture, where memory and processing units are physically separated. This creates performance bottlenecks, especially in tasks that involve complex temporal patterns like speech. Even artificial neural networks, which have improved recognition performance, require extensive training across all layers—an energy-intensive process that differs starkly from how the brain learns.
Reservoir computing (RC) offers a more efficient alternative. Instead of training every layer, RC uses a fixed, randomly connected system—the “reservoir”—to transform input signals into higher-dimensional representations. Only the output layer is trained, significantly reducing overhead. This setup mirrors some aspects of how biological neural circuits process temporal information, particularly through memory and nonlinear responses.
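The division of labor described above, a fixed random reservoir plus a trained linear readout, can be illustrated with a small software reservoir (an echo state network). This is only a sketch under stated assumptions: the tanh nodes, the spectral-radius scaling, the ridge term, and the delay-recall toy task are all illustrative choices, not details from the paper, and all function names are hypothetical.

```python
import numpy as np

def make_reservoir(n_inputs, n_nodes, spectral_radius=0.9, seed=0):
    """Fixed random weights; these are never trained."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_nodes, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_nodes, n_nodes))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius,
    # a common heuristic for giving the reservoir fading memory.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(w_in, w, inputs):
    """Drive the reservoir with a sequence and record every state."""
    state = np.zeros(w.shape[0])
    states = []
    for u in inputs:
        state = np.tanh(w_in @ u + w @ state)
        states.append(state.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Only this step is trained: ridge-regularised linear regression."""
    gram = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ targets)

def delay_recall_error(n_nodes=50, n_steps=600, n_train=400, seed=1):
    """Toy task (an assumption): recall the previous input from the state."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-0.5, 0.5, size=(n_steps, 1))
    target = np.roll(u, 1, axis=0)   # target[t] = u[t-1]
    target[0] = 0.0
    w_in, w = make_reservoir(1, n_nodes, seed=seed)
    states = run_reservoir(w_in, w, u)
    w_out = train_readout(states[:n_train], target[:n_train])
    pred = states[n_train:] @ w_out
    mse = np.mean((pred - target[n_train:]) ** 2)
    return mse / np.mean(target[n_train:] ** 2)   # normalised error

if __name__ == "__main__":
    print("normalised held-out error:", delay_recall_error())
```

Because only `train_readout` involves any fitting, training reduces to solving one linear system, which is where the reduced overhead comes from.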
Various physical systems have been explored for RC implementations, including photonic devices, spintronic oscillators, and memristors. Yet these often suffer from fixed nonlinear properties, which limit adaptability. Different tasks demand different degrees of nonlinearity. Speech recognition, for instance, requires more complex transformations than binary classification.
Researchers at Shandong University in China have now developed a new RC system that uses cavity magnon polaritons (CMPs) to recognize spoken digits with 99.2% accuracy while consuming just 2.5 picojoules per input pulse. Their work, published in Advanced Functional Materials (“Speech Recognized by Cavity Magnon Polaritons”), combines magnetic and microwave physics to build a tunable, energy-efficient computing platform.
a) Schematic diagram of the speech recognition RC system based on the CMPs. The audio signal is converted into time-domain signals in 64 frequency bands after passing through the cochlea model. After mask processing, the input vector is obtained and then fed into the reservoir. The output vector, which is composed of the response signals of the virtual nodes in the reservoir, is processed by the Wout (weight matrix, trained through linear regression), and finally, the recognition result is obtained. b) Schematic diagram of the experimental setup. The central component is the coupling of a planar resonant cavity (orange) and a YIG sphere (black). The input signal is generated by an arbitrary waveform generator and then propagates through the signal line (blue). This signal will cause the magnetic field near the YIG sphere to change rapidly. A microwave source with a designated frequency is used to drive CMPs. The amplifier and diode, independent of CMP, are separately integrated onto the PCB for signal processing. The microwaves radiated by CMPs are first amplified by an amplifier, and then rectified by a diode into voltage signals, and finally, the output signals are collected by an oscilloscope. c) Description of the magnetization relaxation process after the impulse signal is applied. When a pulsed magnetic field (green part) is applied to the CMPs, the dynamic motion of the system transitions between two different frequencies 𝜔1 and 𝜔2. (Image: Reprinted with permission by Wiley-VCH Verlag)
CMPs are hybrid states formed by strong coupling between microwave photons in a resonant cavity and magnons—collective spin excitations—in a magnetic material. This interaction produces a system with several features ideal for RC: fast response, tunable nonlinearity, and short-term memory effects.
The researchers constructed their device from a planar microwave cavity coupled to a small yttrium iron garnet (YIG) sphere. By applying an external magnetic field, they could precisely tune the resonance properties of the CMP system. A continuous microwave signal excited the CMP mode, while audio signals—converted into pulse trains—temporarily modulated the local magnetic field near the YIG sphere.
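How a magnetic field tunes such a hybrid system can be sketched with the standard two-coupled-oscillator picture, in which the cavity mode and the magnon mode repel each other near resonance. The sketch below is not the authors' model. It assumes a magnon frequency proportional to the applied field with a gyromagnetic ratio of roughly 28 GHz per tesla and ignores anisotropy; the cavity frequency and coupling strength passed in the example are hypothetical placeholders, since the article does not state them.

```python
import math

# Approximate electron gyromagnetic ratio divided by 2*pi (assumption:
# the YIG sphere behaves like an ideal g = 2 spin system).
GAMMA_OVER_2PI_GHZ_PER_T = 28.0

def magnon_frequency_ghz(field_tesla):
    """Uniform-mode magnon frequency, taken as proportional to the field."""
    return GAMMA_OVER_2PI_GHZ_PER_T * field_tesla

def cmp_frequencies_ghz(field_tesla, cavity_ghz, coupling_ghz):
    """Lower and upper hybrid-mode frequencies of two coupled oscillators."""
    f_m = magnon_frequency_ghz(field_tesla)
    mean = 0.5 * (cavity_ghz + f_m)
    half_gap = math.sqrt((0.5 * (cavity_ghz - f_m)) ** 2 + coupling_ghz ** 2)
    return mean - half_gap, mean + half_gap

if __name__ == "__main__":
    # Hypothetical cavity frequency and coupling strength, for illustration.
    for field_mt in (160.0, 166.5, 173.0):
        lower, upper = cmp_frequencies_ghz(field_mt / 1000.0, 4.66, 0.05)
        print(f"{field_mt:6.1f} mT -> {lower:.4f} GHz, {upper:.4f} GHz")
```

In this picture, a short field pulse near the sphere briefly shifts the magnon frequency and therefore the hybrid-mode frequencies, which is one way to read the transition between two frequencies shown in panel (c) of the figure.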
To simulate the human auditory process, they passed each audio sample through a cochlear model, which split the signal into 64 frequency bands. These were then processed with randomly generated binary masks to produce input pulses. When injected into the CMP device, each pulse induced a brief shift in the system’s resonance, producing a transient nonlinear response. The device’s output was sampled at fixed time steps, effectively generating a set of virtual nodes: dynamical memory states that together played the role of the interconnected neurons in a recurrent neural network.
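The masking and virtual-node steps can be sketched in a few lines. This is a rough illustration rather than the authors' pipeline: the mask values of +1 and -1, the number of virtual nodes, and especially the leaky tanh node standing in for the physical CMP response are assumptions, and the function names are hypothetical. Only the 64 frequency bands come from the article.

```python
import numpy as np

def make_binary_mask(n_virtual, n_channels, seed=0):
    """Random binary mask; the +1/-1 values are an assumption."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(n_virtual, n_channels))

def mask_to_pulses(cochleagram, mask):
    """Mix the frequency bands of each frame into one pulse per virtual node.

    cochleagram: (n_frames, n_channels) -> pulses: (n_frames, n_virtual)
    """
    return cochleagram @ mask.T

def virtual_node_states(pulses, leak=0.5, gain=0.1):
    """Feed the pulses one by one into a single nonlinear node with memory.

    The leaky tanh update is a toy stand-in for the device response.
    Sampling the node after every pulse yields the virtual-node states.
    """
    n_frames, n_virtual = pulses.shape
    states = np.zeros((n_frames, n_virtual))
    x = 0.0
    for t in range(n_frames):
        for k in range(n_virtual):
            # The new state mixes the previous state (short-term memory)
            # with a nonlinear function of the current pulse.
            x = (1.0 - leak) * x + leak * np.tanh(gain * pulses[t, k])
            states[t, k] = x
    return states

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cochleagram = np.abs(rng.normal(size=(7, 64)))   # fake 64-band input
    mask = make_binary_mask(n_virtual=10, n_channels=64)
    states = virtual_node_states(mask_to_pulses(cochleagram, mask))
    print(states.shape)
```

Because each state depends on the one before it, a single physical node sampled over time can stand in for many coupled nodes.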
The system’s key advantage lies in its tunability. By adjusting the magnetic field, the researchers controlled the nonlinearity of the CMP response. They identified an optimal operating point at 166.5 millitesla, where the nonlinear coefficient peaked. At this setting, the CMP device achieved its highest recognition accuracy.
To enhance the richness of the reservoir, the researchers employed a parallel computing strategy: each audio sample was processed 40 times using different mask matrices. This increased the diversity of responses without changing the underlying hardware. They trained the output layer on 450 spoken digits from the TI-46 database using linear regression and then tested it on 50 new samples. The result: a recognition error rate of less than 0.8%.
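The idea of reusing one reservoir with several masks and then fitting a linear readout can be sketched as follows. Everything concrete here is an assumption made for illustration: the data are synthetic templates rather than TI-46 recordings, the sketch uses 4 masks instead of 40 to stay small, the leaky tanh node is a toy replacement for the CMP device, and the ridge term and function names are hypothetical.

```python
import numpy as np

def toy_reservoir(pulses, leak=0.5):
    """Stand-in for the physical reservoir: one leaky tanh node."""
    states = np.zeros_like(pulses)
    x = 0.0
    for i, p in enumerate(pulses):
        x = (1.0 - leak) * x + leak * np.tanh(p)
        states[i] = x
    return states

def multi_mask_features(sample, masks, gain=0.3):
    """Run the same sample through the reservoir once per mask.

    sample: (n_frames, n_channels); each mask: (n_virtual, n_channels).
    The responses are concatenated into one long feature vector.
    """
    feats = []
    for mask in masks:
        pulses = gain * (sample @ mask.T).ravel()   # serialise frame by frame
        feats.append(toy_reservoir(pulses))
    return np.concatenate(feats)

def train_readout(features, labels, n_classes, ridge=1e-2):
    """Linear regression onto one-hot class targets (ridge-regularised)."""
    targets = np.eye(n_classes)[labels]
    gram = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)

def predict(w_out, features):
    """Pick the class whose readout value is largest."""
    return np.argmax(features @ w_out, axis=1)

def demo(n_masks=4, n_classes=3, seed=0):
    rng = np.random.default_rng(seed)
    n_frames, n_channels, n_virtual = 5, 8, 10
    masks = rng.choice([-1.0, 1.0], size=(n_masks, n_virtual, n_channels))
    templates = rng.normal(size=(n_classes, n_frames, n_channels))

    def make_set(n_per_class):
        x, y = [], []
        for c in range(n_classes):
            for _ in range(n_per_class):
                noise = 0.05 * rng.normal(size=templates[c].shape)
                x.append(multi_mask_features(templates[c] + noise, masks))
                y.append(c)
        return np.array(x), np.array(y)

    x_train, y_train = make_set(30)
    x_test, y_test = make_set(10)
    w_out = train_readout(x_train, y_train, n_classes)
    return float(np.mean(predict(w_out, x_test) == y_test))

if __name__ == "__main__":
    print("held-out accuracy on synthetic data:", demo())
```

Adding masks lengthens the feature vector without touching the reservoir itself, which mirrors how the richer representation is obtained without changing the hardware.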
This represents a marked improvement over earlier versions of the system. Prior to tuning, the recognition rate stood at 86.28%; careful optimization of the nonlinearity raised it to the 99.2% reported above.
In addition to high accuracy, the system demonstrated exceptional efficiency. With 20-nanosecond minimum pulse widths and a driving current of 25 mA, each pulse required just 2.5 picojoules. This low energy footprint makes CMP-based RC especially appealing for edge computing applications where power constraints are critical.
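The figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes an idealized rectangular pulse in which the full 2.5 picojoules is delivered at a constant 25 mA over 20 nanoseconds. The article does not say how the energy figure was derived, so the implied voltage and resistance are back-of-the-envelope values, not reported measurements, and the function name is hypothetical.

```python
def pulse_budget(energy_joules, width_seconds, current_amps):
    """Quantities implied by E = P * t, P = V * I and V = I * R.

    Assumes a rectangular pulse with constant current (an idealisation).
    """
    power_w = energy_joules / width_seconds
    voltage_v = power_w / current_amps
    resistance_ohm = voltage_v / current_amps
    max_pulse_rate_hz = 1.0 / width_seconds   # back-to-back pulses
    return {
        "power_w": power_w,
        "voltage_v": voltage_v,
        "resistance_ohm": resistance_ohm,
        "max_pulse_rate_hz": max_pulse_rate_hz,
    }

if __name__ == "__main__":
    budget = pulse_budget(energy_joules=2.5e-12,
                          width_seconds=20e-9,
                          current_amps=25e-3)
    for name, value in budget.items():
        print(f"{name}: {value:.4g}")
```

Under those assumptions the numbers correspond to about 125 microwatts during a pulse, a drive voltage of about 5 millivolts, an effective load of about 0.2 ohms, and at most 50 million pulses per second.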
Compared to other RC platforms, CMP-based systems offer several advantages. Their nonlinearity can be tuned dynamically without modifying the hardware. Microwave photons allow for ultrafast information transmission, while magnons maintain short-term memory—together offering a balanced architecture for temporal pattern recognition.
This research introduces a promising new direction for brain-inspired computing. While the current implementation focuses on speech recognition, the principles demonstrated here could apply to a wide range of tasks, including time series prediction and pattern classification. The combination of speed, tunability, and efficiency positions CMP-based reservoir computing as a compelling platform for future neuromorphic technologies.