Main

Nanoscale spintronic oscillators, also known as spin-torque nano-oscillators, are pillars composed of two ferromagnetic layers separated by a non-magnetic spacer (Fig. 1a). Charge currents become spin-polarized when they flow through these junctions and generate torques on the magnetizations10,11 that lead to sustained magnetization precession at frequencies of hundreds of megahertz to several tens of gigahertz. Magnetization oscillations are converted into voltage oscillations through magneto-resistance. The resulting radio-frequency oscillations, of up to tens of millivolts (ref. 12), can be detected by measuring the voltage across the junction (Fig. 1b). Spin-torque nano-oscillators are therefore simple and ultra-compact: their lateral size can be scaled down to 10 nm and their power consumption reduced to 1 μW (ref. 13). Because they have the same structure as present-day magnetic memory cells, they are compatible with complementary metal–oxide–semiconductor (CMOS) technology, have high endurance, operate at room temperature and can be fabricated in large numbers (currently up to hundreds of millions) on a single chip14. Just as the frequency of a neuron is modified by the spikes received from other neurons, the frequencies of spin-torque nano-oscillators are highly sensitive to the magnetization dynamics of neighbouring oscillators to which they are coupled15,16. Together, these features of spin-torque nano-oscillators make them promising candidates for use in neuromorphic computing with large arrays of coupled oscillators17,18,19,20,21. However, they have yet to be used to perform an actual computing task.

Figure 1: Spin-torque nano-oscillator for neuromorphic computing.

a, Schematic of a spin-torque nano-oscillator, consisting of a non-magnetic spacer (gold) between two ferromagnetic layers, with magnetization m for the free layer (blue) and M for the fixed layer (silver). A current injected into the oscillator induces magnetization precession of m. For our experiments we used a nano-oscillator with a diameter of 375 nm; however, diameters of 10–500 nm are possible. b, Measured a.c. voltage emitted by the oscillator as a function of time, for a steady current injection of 7 mA and an external magnetic field of μ0H = 430 mT. The dotted blue lines highlight the oscillation amplitude. c, Voltage amplitude as a function of d.c. current IDC at μ0H = 430 mT (blue squares). The purple shaded area highlights the typical excursion in the voltage amplitude that results when an input signal of Vin = ±250 mV is injected (here for IDC = 6.5 mA (vertical dotted line) and μ0H = 430 mT). d, Schematic of the experimental set-up. A d.c. current IDC and a rapidly varying waveform that encodes the input Vin are injected into the spin-torque nano-oscillator. The microwave voltage Vosc emitted by the oscillator in response to the excitation is measured with an oscilloscope. For computing, the amplitude of the oscillator is used; it is measured directly with a microwave diode. e, Input Vin (top; magenta) and measured microwave voltage Vosc (bottom; grey) emitted by the oscillator as a function of time. Here IDC = 6 mA and μ0H = 430 mT. The envelope of the oscillator signal is highlighted in blue. For computing it is sampled periodically, as shown by the blue circles labelled V1–7.


Our idea is to exploit the amplitude dynamics of spin-torque nano-oscillators for neuromorphic computing. Their oscillation amplitude (dotted blue line in Fig. 1b) is robust to noise, owing to the confinement that is provided by the counteracting torques exerted by the injected current and magnetic damping22. In addition, the amplitude is highly nonlinear as a function of the injected current and depends intrinsically on past inputs15. Exploiting the amplitude dynamics of spin-torque nano-oscillators thus combines in one single nanodevice the two most crucial properties of neurons—nonlinearity and memory—the realization of which would otherwise require several electronic components and a much larger on-chip area using conventional CMOS23. To compute, we encode neural inputs in the time-dependent current I(t) that is injected into the oscillator and use the amplitude response as the neural output.

Our nano-oscillators consist of circular magnetic tunnel junctions, with a 6-nm-thick FeB free layer of 375-nm diameter, which have magnetic vortex ground states (see Methods). We measure the dynamics of the signal amplitude directly using a microwave diode. In Fig. 1c we show the nonlinear response of the amplitude to a d.c. current IDC, which rises steeply above the current threshold Ith for steady oscillations to occur15. Using an arbitrary waveform generator, we inject a varying current through the junctions in addition to the d.c. current, using the set-up schematized in Fig. 1d. The resulting voltage oscillations, recorded with an oscilloscope, are shown in Fig. 1e. The amplitude of the oscillator varies in response to the injected current, with a relaxation time that provides a memory of past inputs lasting a few hundred nanoseconds22.
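
To make this behaviour concrete, the sketch below simulates a first-order relaxation of the oscillation amplitude toward a current-dependent steady state. The threshold current, relaxation time and square-root steady-state law are illustrative assumptions used only to reproduce the qualitative behaviour (nonlinearity plus a few-hundred-nanosecond memory); they are not the measured characteristics of our devices.

```python
import numpy as np

# Illustrative parameters (assumptions, not measured device values)
I_TH = 5.0e-3        # threshold current for steady oscillations (A)
TAU_RELAX = 400e-9   # amplitude relaxation time (s), "a few hundred ns"
GAIN = 30.0          # converts normalized amplitude to millivolts

def steady_amplitude(i_dc):
    """Steady-state amplitude: zero below threshold, nonlinear rise above it."""
    return GAIN * np.sqrt(np.maximum(i_dc / I_TH - 1.0, 0.0))

def simulate_amplitude(current, dt):
    """First-order relaxation of the oscillation amplitude toward its
    current-dependent steady state; this memory of past inputs is what
    couples the time-multiplexed 'neurons'."""
    v = np.zeros_like(current)
    for k in range(1, len(current)):
        target = steady_amplitude(current[k])
        v[k] = v[k - 1] + (target - v[k - 1]) * dt / TAU_RELAX
    return v

# Example: step the current from 6 mA to 7 mA and watch the transient
dt = 5e-9
current = np.concatenate([np.full(400, 6e-3), np.full(400, 7e-3)])
amplitude = simulate_amplitude(current, dt)
print(f"amplitude settles near {amplitude[-1]:.1f} mV")
```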

Recent studies have revealed that time-multiplexing can enable a single oscillator to emulate a full neural network24,25,26. Here we use this approach—a form of “reservoir computing”4,5 (see Methods)—to demonstrate the ability of spin-torque nano-oscillators to realize neuromorphic tasks. We perform a benchmark task of spoken-digit recognition. The input data, taken from the TI-46 database27, are audio waveforms of isolated spoken digits (0 to 9) pronounced by five different female speakers (Fig. 2a). The goal is to recognize the digits, independent of the speaker.

Figure 2: Spoken-digit recognition.

a–d, Principle of the experiment. a, Audio waveform corresponding to the digit 1 pronounced by speaker 1. b, Filtering to frequency channels for acoustic feature extraction. The audio waveform is divided into intervals of duration τ. The cochlear model filters each interval into 78 frequency channels (65 for the spectrogram model), which are then concatenated as 78 (65) values for each interval, to form the filtered input. c, Pre-processed input (transformed from the purple shaded region in b). The filtered input is multiplied by a randomly filled binary matrix (masking process), resulting in 400 points separated by a time step θ of 100 ns in each interval of duration τ (τ = 400θ). d, Oscillator output. The envelope of the emitted voltage amplitude of the experimental oscillator is shown (μ0H = 430 mT, IDC = 6 mA). The 400 amplitude values per interval τ (sampled with a time step θ) emulate 400 neurons. The reconstructed output ‘1’, corresponding to this digit, is obtained by linearly combining the 400 amplitude values sampled from each interval τ. e, f, Spoken-digit recognition rates in the testing set as a function of the number of utterances N used for training for the spectrogram filtering (e; μ0H = 430 mT, IDC = 6 mA) and for the cochlear filtering (f; μ0H = 448 mT, IDC = 7 mA). Because there are many ways to pick the N utterances, the recognition rate is an average over all 10!/[(10 − N)!N!] combinations of N utterances out of the 10 in the dataset. The red curves are the experimental results using the magnetic oscillator. The black curves are control trials, in which the pre-processed inputs are used for reconstructing the output on a computer directly, as described in Methods, without going through the experimental set-up. The error bars correspond to the standard deviation of the recognition rate, based on training with all possible combinations.


Neural networks classify information through chain reactions: neuron after neuron, each input undergoes a series of nonlinear transformations28. In a trained network, the same digit always triggers a similar chain reaction even if it is pronounced by different speakers, whereas different digits generate different chain reactions, thus allowing pattern recognition. An input can trigger a chain reaction in space by using ensembles of neurons, wherein the state of downstream neurons depends on the state of upstream neurons. But an input can also trigger a chain reaction in time by constantly exciting a single nonlinear oscillator with memory: in this case, the state of the oscillator in the future depends on the state of the oscillator in the past. We use the latter approach, which simplifies the hardware because only one oscillator is needed, but requires preprocessing of the input: each point of the audio waveform is converted into a fast-paced binary sequence that is designed to generate a chain reaction of amplitude variations in the oscillator24.

The procedure is illustrated in Fig. 2a–d and detailed in Methods. Because acoustic features are mainly encoded in frequencies29, we filter each audio file into Nf different frequency channels (a standard procedure in speech recognition), which are then concatenated in intervals of duration τ (Fig. 2b). For preprocessing, each of these segments is multiplied by a randomly filled binary matrix (of dimension Nf × Nθ). In this way, each point of the input audio waveform is converted into a binary sequence of duration τ that is composed of Nθ points separated by a time step θ (τ = Nθθ). When this preprocessed input (Fig. 2c) is applied as a current to our spin-torque nano-oscillator, the resulting amplitude variations (Fig. 2d) function as a set of Nθ neurons coupled in time (we take Nθ samples per interval τ). For spoken-digit recognition, we emulate Nθ = 400 neurons and use θ = 100 ns (about one-fifth of the relaxation time of the oscillators) to set the oscillator in a transient state.
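
As an illustration of the masking step just described, the sketch below multiplies each filtered interval (Nf frequency channels) by a fixed random binary matrix of dimension Nf × Nθ and flattens the result into the fast sequence that drives the oscillator. The array shapes and the ±1 mask values are assumptions for illustration; they are not the exact mask used in the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

N_F = 65        # frequency channels per interval (spectrogram model)
N_THETA = 400   # virtual neurons emulated per interval tau
THETA = 100e-9  # time step between mask points (s), tau = N_THETA * theta

# Fixed random binary mask, shared by all intervals and all words
mask = rng.choice([-1.0, 1.0], size=(N_F, N_THETA))

def preprocess(filtered_word):
    """Map a filtered word of shape (N_tau, N_F) to the fast sequence that
    is injected into the oscillator, of shape (N_tau * N_THETA,).
    Each interval is multiplied by the mask, so every virtual neuron sees
    a different random mixture of the frequency channels."""
    n_tau = filtered_word.shape[0]
    masked = filtered_word @ mask            # (N_tau, N_THETA)
    return masked.reshape(n_tau * N_THETA)

# Example: a word with 30 intervals of 65 channel amplitudes
word = rng.random((30, N_F))
drive = preprocess(word)
print(drive.shape, "points, one every", THETA, "s")
```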

The responses of the voltage amplitude of the oscillator are recorded for each utterance of each spoken digit. The goal of the subsequent training process, performed on a computer, is to choose a linear combination of these responses (the sets of amplitude values sampled in each interval τ) for each digit such that the sum is one for that digit and zero for the rest (see Methods). Because each digit has been pronounced ten times by each of the five speakers, we can use some of the data to determine the coefficients (training), and the rest to evaluate the recognition performance (testing); see Methods. To assess the effect of our oscillator on the quality of recognition, we always perform a control trial without the oscillator. In that case, the preprocessed input traces are used to reconstruct the outputs on the computer directly, without going through the experimental set-up.

The improvement shown in the experimental results over the control results (see Fig. 2e, f) indicates that the spin-torque nano-oscillator greatly improves the quality of spoken-digit recognition, despite the added noise that accompanies its nanometre-scale size. In Fig. 2e (linear spectrogram filtering), we present an example in which the extraction of acoustic features, achieved by Fourier transforming the audio waveform over finite time windows, plays a minimal part in classification. Without the oscillator (black line), the recognition rates are consistent with random choices; with the oscillator (red line), the recognition rate is improved by 70%, reaching values of up to 80%. This example highlights the crucial role of the oscillator in the recognition process. Using nonlinear cochlear filtering30 (Fig. 2f), which is the standard in reservoir computing24,25,26 and has been optimized on the basis of the behaviour of biological ears, we achieve recognition rates of up to 99.6%, as high as the state of the art. Compared with the control trial, the oscillator reduces the error rate by a factor of up to 15. Our results with a spin-torque nano-oscillator are therefore comparable to the recognition rates obtained with more complicated electronic or optical systems (between 95.7% and 99.8% for the same task with cochlear filtering)23,24,25,26,29.

The optimal operating conditions for pattern recognition with our spin-torque nano-oscillator are determined by the oscillation amplitude and noise. We use a simpler task, classification of sine and square waveforms with the same period25, to investigate the ability of the oscillator to classify waveforms in a wide range of injected d.c. currents IDC and applied magnetic fields μ0H (see Methods). As can be seen in Fig. 3a, the quality of pattern recognition, characterized by the root-mean-square of deviations between the reconstructed output and the target, varies from 10% to more than 30% depending on the bias conditions. The oscillator performs well when it responds strongly to the time-varying preprocessed input, with large amplitude variations in both the positive and negative directions, Vup and Vdw, respectively (Fig. 3b, top). On the other hand, it performs poorly when the noise in the oscillator ΔV (the standard deviation of the noise in the voltage amplitude) is high (Fig. 3b, bottom). As shown in Fig. 3b, we extract these parameters from the time traces of the voltage emitted from the oscillator at each bias point, and plot Vup − Vdw (Fig. 3c) and 1/ΔV (Fig. 3d) as a function of the d.c. current IDC and field μ0H. The red regions of large oscillation amplitudes in Fig. 3c correspond to low magnetic fields, in which the magnetization is weakly confined, and to high currents, for which the spin torque on magnetization is maximal. The blue regions of high noise in Fig. 3d correspond to areas just above the threshold current Ith for oscillation, in which the oscillation amplitude is growing rapidly as a function of current and is becoming sensitive to external fluctuations15. As can be seen by comparing Fig. 3c and d, the range of bias conditions highlighted by the dotted white boxes (currents of 6–7 mA and magnetic fields of 350–450 mT) features wide variations in oscillation amplitudes and low noise. In this region, root-mean-square deviations below 15% are achieved, and there are no classification errors between sine and square waveforms. The similarity between the map of (Vup − Vdw)/ΔV (Fig. 3e) and that of the classification performance (Fig. 3a) confirms that the best conditions for classification correspond to regions of optimal compromise between low noise and large amplitude variations. The necessity of a high signal-to-noise ratio for efficient neuromorphic computing, highlighted here for magnetic oscillators, is a general guideline that applies to any type of nanoscale oscillator.
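
The figure of merit discussed above can be estimated from recorded amplitude traces as sketched below, assuming one trace taken while the varying input is applied and one taken at steady state under the d.c. bias alone. This is an illustrative reconstruction of the parameter extraction of Fig. 3b, not the analysis code used for the figure.

```python
import numpy as np

def response_figure_of_merit(v_driven, v_steady):
    """v_driven: amplitude trace (mV) while the varying input is applied.
    v_steady: amplitude trace (mV) at steady state under the d.c. bias only.
    Returns (v_up, v_dw, dv, fom) with fom = (v_up - v_dw) / dv."""
    baseline = np.mean(v_steady)
    v_up = np.max(v_driven) - baseline   # largest positive excursion
    v_dw = np.min(v_driven) - baseline   # largest negative excursion (<= 0)
    dv = np.std(v_steady)                # noise of the amplitude at steady state
    fom = (v_up - v_dw) / dv
    return v_up, v_dw, dv, fom

# Example with synthetic traces standing in for measured ones
rng = np.random.default_rng(1)
steady = 20.0 + 0.5 * rng.standard_normal(10_000)
driven = steady + 8.0 * np.sin(np.linspace(0, 40 * np.pi, 10_000))
print(response_figure_of_merit(driven, steady))
```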

Figure 3: Conditions for optimal waveform classification and identification of important oscillator properties.

The task consists of recognizing sine waveforms from square ones with the same period. The target for the output that is reconstructed from the oscillator’s response is one for square, zero for sine. We emulate 24 neurons, with τ = 24θ. a, Root-mean-square (r.m.s.) output-to-target deviation: map as a function of d.c. current IDC and magnetic field μ0H. b, Extraction of parameters from the time traces of the oscillator’s response. Top, maximum positive (Vup) and negative (Vdw) variations in the oscillator’s amplitude in response to the varying pre-processed input. Bottom, noise ΔV of the voltage amplitude at steady state under IDC. c, Maximal response (Vup − Vdw) of the oscillator to the input: map in the IDC–μ0H plane. d, Inverse of the noise amplitude 1/ΔV: map in the IDC–μ0H plane. The threshold current Ith is indicated by a white solid line. In c and d, the optimal range of bias conditions for waveform classification is marked by a white dashed rectangle (currents of 6–7 mA and magnetic fields of 350–450 mT). e, Map of the ratio of maximal amplitudes to noise, (Vup − Vdw)/ΔV, showing that these parameters largely determine the performance of the oscillator (compare with a).


In conclusion, our pattern-recognition results show that simple, ultra-compact spintronic oscillators have all of the properties that are needed to emulate collections of neurons: nonlinearity, memory and stability. The ability of groups of these oscillators to mimic neural connections by influencing the behaviour of one another through current and magnetic-field coupling opens up a route to realizing large-scale neural networks in hardware, which exploit magnetization dynamics for computing15,16,17,18,19,20,21.

Methods

Samples

Magnetic tunnel junction (MTJ) films with a stacking structure of buffer/PtMn(15)/Co71Fe29(2.5)/Ru(0.9)/Co60Fe20B20(1.6)/Co70Fe30(0.8)/MgO(1)/Fe80B20(6)/MgO(1)/Ta(8)/Ru(7) (with thicknesses given in parentheses in nanometres) were prepared by ultrahigh-vacuum (UHV) magnetron sputtering. After annealing at 360 °C for 1 h, the resistance–area products (RA) were approximately 3.6 Ω μm2. Circular MTJs with a diameter of approximately 375 nm were patterned using Ar ion etching and e-beam lithography. The resistance of the samples is close to 40 Ω and the magneto-resistance ratio is about 135% at room temperature. The FeB layer presents a vortex structure as the ground state for the dimensions used here. In a small region called the core of the vortex, the magnetization spirals out of plane. Under d.c. current injection, the core of the vortex steadily gyrates around the centre of the dot with a frequency in the range 250–400 MHz for the oscillators we consider here. Vortex dynamics driven by spin torque are well understood, well controlled and have been shown to be particularly stable22.

Measurement set-up

The experimental implementation for the spoken-digit recognition and sine/square classification tasks is illustrated in Fig. 1d. The pre-processed input signal Vin is generated by a high-frequency arbitrary-waveform generator and injected as a current through the magnetic nano-oscillator. The sampling rate of the source is set to 200 MHz (20 points per interval of time θ) for the spoken-digit recognition task and 500 MHz (50 points per interval of time θ) for the classification of sines and squares. The peak-to-peak variation in the input signal is 500 mV, which corresponds to peak-to-peak current variations of 6 mA, as illustrated in Fig. 1c (part of the incoming signal is reflected owing to impedance mismatch). The bias conditions of the oscillator are set by a d.c. current source and an electromagnet that applies a field perpendicular to the plane of the magnetic layers. The oscillating voltage emitted by the nano-oscillator is rectified by a planar tunnel microwave diode, with a bandwidth of 0.1–12.4 GHz and a response time of 5 ns. The input dynamic range of the diode is between 1 μW and 3.15 mW, corresponding to a d.c. output level of 0–400 mV. We use an amplifier to adjust the emitted power of the nano-oscillator to the working range of the diode. The output signal is then recorded by a real-time oscilloscope. In Figs 1b, c, e, 2d and 3b–e, the amplitude of the signal emitted by the oscillator is shown without amplification (the signal measured after the diode has been divided by the total amplification of the circuit, about +21 dB). If, owing to sampling errors, the measured envelope of the oscillator signal is shifted with respect to the input, classification accuracy can be degraded. We use alignment marks to align our measurements with the input when we reconstruct the output. The alignment precision is ±1 ns.

General concepts of reservoir computing

In machine learning, a reservoir is a network of recurrently and randomly connected nonlinear nodes4,5. When an input signal is injected in the reservoir, it is mapped to a higher-dimensional space in which it can become linearly separable. The key insight behind reservoir computing is that the network does not need any tuning: all connections inside the reservoir are kept fixed. Only external connections (between the reservoir and an output layer) are trained to achieve the desired task.

In other words, reservoir computing requires the generation of complex nonlinear dynamics but, as a trade-off, learning is greatly simplified. For efficient reservoir computing, several requirements related to the dynamical properties of the network should be satisfied. First, different inputs should trigger different dynamics (separation property) and similar inputs should generate similar dynamics (approximation property), enabling efficient classification. Second, the reservoir state should depend not only on present inputs but also on recent past inputs. This short-term memory, called fading memory, is essential for processing temporal sequences for which the history of the signal is important.

A single nonlinear oscillator can emulate a reservoir when it is set in transient dynamics by a rapidly varying input24. The loss of parallelism is compensated for by an additional pre-processing step: the input is multiplied by a rapidly varying mask, which defines virtual nodes that are interconnected in time through the resulting oscillator dynamics. This approach provides a marked simplification of the reservoir scheme for hardware implementations, and has been realized in hardware with optical or electronic oscillators assembled from several components23,24,25,26.

Spoken-digit recognition

For this task, the inputs are taken from the NIST TI-46 data corpus27. The input consists of isolated spoken digits pronounced by five different female speakers. Each speaker pronounces each digit ten times. The 500 audio waveforms are sampled at a rate of 12.5 kHz and have variable time lengths.

We used two different filtering methods: spectrogram and cochlear models. Both filters break the word into Nτ time intervals of duration τ and analyse the frequency content in each interval through either a Fourier transform (spectrogram model; 65 channels, Nτ ∈ {24, …, 67}; Fig. 2b) or a more complicated nonlinear approach (cochlear model; 78 channels, Nτ ∈ {14, …, 41}). The input for each word is composed of an amplitude for each of the Nf = 65 or Nf = 78 frequency channels times Nτ time intervals. This input is pre-processed by multiplying the frequency content for each time interval by a mask matrix containing Nf × Nθ random binary values, giving a total of Nτ × Nθ values as input to the oscillator (Fig. 2c). Here, we are modelling Nθ = 400 input neurons, each of which is connected to all of the frequency channels for each time interval.

Each preprocessed input value is consecutively applied to the oscillator as a constant current for a time interval of θ ≈ 100 ns, which is about five times shorter than the relaxation time of the oscillator, as recommended in ref. 24. This time is short enough to guarantee that the oscillator is maintained in its transient regime so the emulated neurons are connected to each other, but is long enough to let the oscillator respond to the input excitation. The amplitude of the a.c. voltage across the oscillator is recorded for offline post-processing (Fig. 2d).

The post-processing of the output consists of two distinct steps. The first is called the training (or learning) process and the second is called the classification (or recognition) process. The goal of training is to determine a set of weights wi,θ, where i indexes the desired digit. These weights are used to multiply the output voltages to give 10Nτ output values, which are then averaged over the Nτ time intervals to give 10 output values yi, which should ideally be equal to 1.0 for the appropriate digit and 0.0 for the rest. In the training process, a fraction of the utterances are used to train these weights; the rest of the utterances are used in the classification process to test the results.

The optimum weights are found by minimizing the difference between the reconstructed outputs and the targets for all of the words used in the training. In practice, optimal values are determined by using techniques for extracting meaningful eigenvalues from singular matrices, such as the linear Moore–Penrose pseudo-inverse operator (denoted by a dagger symbol †). If we consider the target matrix Ỹ, which contains the targets for all of the time steps τ used for the training, and the response matrix S, which contains all neuron responses for all of the time steps τ used for the training, then the matrix W, which contains the optimal weights, is given by W = ỸS†. This step is performed on a computer and takes several seconds. In the future, real-time processing on a nanosecond timescale could be realized using fully parallel networks of interacting nano-oscillators.
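
A minimal sketch of this training step is given below, assuming the neuron responses have been arranged in a matrix S (one row per neuron, one column per recorded time step) and the targets in a one-hot matrix Ỹ (one row per digit); the shapes and the placeholder data are illustrative.

```python
import numpy as np

def train_readout(S, Y_target):
    """S: neuron responses, shape (n_neurons, n_steps).
    Y_target: one-hot targets, shape (n_digits, n_steps).
    Returns W such that W @ S approximates Y_target (W = Y_target @ pinv(S))."""
    return Y_target @ np.linalg.pinv(S)

def reconstruct(W, S):
    """Reconstructed outputs for each digit at every time step."""
    return W @ S

# Example with random placeholder data (400 neurons, 10 digits, 2000 steps)
rng = np.random.default_rng(2)
S = rng.random((400, 2000))
labels = rng.integers(0, 10, size=2000)
Y_target = np.eye(10)[labels].T          # one-hot targets, shape (10, 2000)
W = train_readout(S, Y_target)
print(reconstruct(W, S).shape)           # (10, 2000)
```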

During the classification phase, the ten reconstructed outputs obtained for a spoken digit are averaged over all of the time steps τ of the signal, and the digit is identified by taking the maximum value of the ten averaged reconstructed outputs. The averaged reconstructed output that corresponds to the digit in question should be close to 1 and the others should be close to 0. The efficiency of the recognition is evaluated by the word success rate, which is the rate of digits that are correctly identified. The training can be done using more or fewer data (here ‘utterances’). We always trained the system using the ten digits spoken by the five speakers. The only parameter that we changed is the number of utterances used for the training. If we use N utterances for training, then we use the remaining 10 − N utterances for testing. However, some utterances are very well pronounced whereas others are hardly distinguishable. As a consequence, the resulting recognition rate depends on which N utterances are picked for training in the set of ten (for example, if N = 2, then the utterances picked for training could be the first and second, but also the second and third, or the sixth and tenth, or any other of the 10!/(8!2!) combinations of 2 picked out of 10). To avoid this bias, the recognition rates that we present here are the average of the results over all possible combinations. The error bars correspond to the standard deviation of the word recognition rate. The raw spectrogram is not complex enough to allow a correct reconstruction of the target during the training; adding the oscillator provides the missing complexity and removes this limitation.
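
The classification step and the averaging over utterance combinations can be sketched as follows; train_and_test below is a hypothetical placeholder for the full train-then-score procedure described above, not an existing function.

```python
import numpy as np
from itertools import combinations

def classify_word(reconstructed):
    """reconstructed: shape (10, n_tau), the ten outputs for one word.
    Average over the word's time steps and pick the most active digit."""
    return int(np.argmax(reconstructed.mean(axis=1)))

def mean_recognition_rate(utterance_ids, n_train, train_and_test):
    """Average the word success rate over every way of choosing n_train of
    the ten utterances for training; train_and_test(train_ids, test_ids)
    is a placeholder returning the success rate for one split."""
    rates = [train_and_test(list(train),
                            [u for u in utterance_ids if u not in train])
             for train in combinations(utterance_ids, n_train)]
    return float(np.mean(rates)), float(np.std(rates))

# Example: 10!/(8!2!) = 45 splits when training on N = 2 utterances
# mean, std = mean_recognition_rate(range(10), 2, train_and_test)
```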

Sine- and square-wave classification

For this classification task, the input is a random sequence of 160 sines and squares with the same period—the first half of the sequence for training and the second half for classification. Each period is discretized into eight points separated by a time step τ. The pre-processing consists of multiplying the value of each point by the same binary sequence that is generated by a random distribution of +1 and −1 values. In contrast to spoken-digit recognition, the mask is a binary vector (instead of a binary matrix). The fast binary sequence contains 24 values, so 24 neurons are emulated during each time step τ.
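
The sine/square input and its masking could be generated as in the sketch below, assuming eight samples per period and a fixed 24-value ±1 mask; the random seed and waveform details are illustrative assumptions rather than the exact experimental sequence.

```python
import numpy as np

rng = np.random.default_rng(3)

N_PERIODS = 160          # random sequence of sine and square periods
POINTS_PER_PERIOD = 8    # one point per time step tau
N_NEURONS = 24           # length of the binary mask, 24 virtual neurons per tau

labels = rng.integers(0, 2, size=N_PERIODS)            # 0 = sine, 1 = square
phase = np.linspace(0, 2 * np.pi, POINTS_PER_PERIOD, endpoint=False)
sine = np.sin(phase)
square = np.where(phase < np.pi, 1.0, -1.0)
waveforms = np.where(labels[:, None] == 1, square[None, :], sine[None, :])
points = waveforms.reshape(-1)                          # (160 * 8,) input points

mask = rng.choice([-1.0, 1.0], size=N_NEURONS)          # same mask for every point
drive = np.outer(points, mask).reshape(-1)              # 24 masked values per point
print(drive.shape)                                      # (30720,)
```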

The target for the network output y is 0 for all of the trajectories in response to a sine and 1 for all of the trajectories in response to a square. The best weights are found by linear regression, as explained above for the spoken-digit recognition task. For sine/square recognition, we record five points instead of one for each neuron when we measure the output of the oscillator. During post-processing, we use these additional states between consecutive neurons to increase the number of coefficients available for solving the problem, and thus increase classification accuracy. In addition, the best performance does not necessarily correspond to a target in exact phase with the oscillator’s output. The standard deviation of the root-mean-square value of Voutput − Vtarget, obtained with ten repetitions, is around 1%.

Data availability

The datasets generated and analysed during this study are available from the corresponding author on reasonable request.