Energy-efficient superparamagnetic Ising machine and its application to traveling salesman problems

Si, Jia; Yang, Shuhan; Cen, Yunuo; Chen, Jiaer; Huang, Yingna; Yao, Zhaoyang; Kim, Dong-Jun; Cai, Kaiming; Yoo, Jerald; Fong, Xuanyao; Yang, Hyunsoo

doi:10.1038/s41467-024-47818-z

Download PDF

Article
Open access
Published: 24 April 2024

Energy-efficient superparamagnetic Ising machine and its application to traveling salesman problems

Nature Communications volume 15, Article number: 3457 (2024) Cite this article

Subjects

Abstract

The growth of artificial intelligence leads to a computational burden in solving non-deterministic polynomial-time (NP)-hard problems. The Ising computer, which aims to solve NP-hard problems faces challenges such as high power consumption and limited scalability. Here, we experimentally present an Ising annealing computer based on 80 superparamagnetic tunnel junctions (SMTJs) with all-to-all connections, which solves a 70-city traveling salesman problem (TSP, 4761-node Ising problem). By taking advantage of the intrinsic randomness of SMTJs, implementing global annealing scheme, and using efficient algorithm, our SMTJ-based Ising annealer outperforms other Ising schemes in terms of power consumption and energy efficiency. Additionally, our approach provides a promising way to solve complex problems with limited hardware resources. Moreover, we propose a cross-bar array architecture for scalable integration using conventional magnetic random-access memories. Our results demonstrate that the SMTJ-based Ising computer with high energy efficiency, speed, and scalability is a strong candidate for future unconventional computing schemes.

Efficient combinatorial optimization by quantum-inspired parallel annealing in analogue memristor crossbar

Article Open access 22 September 2023

Combinatorial optimization by weight annealing in memristive hopfield networks

Article Open access 12 August 2021

Ferroelectric compute-in-memory annealer for combinatorial optimization problems

Article Open access 18 March 2024

Introduction

The demands for future data-intensive and energy-efficient computing tasks overwhelm the computational power of conventional von Neumann architectures¹. For example, NP-hard problems are often encountered in combinatorial optimizations², resource allocation³, cryptography⁴, finance⁵, image processing⁶, tour planning⁷, and job sequencing⁸, and their computational time and hardware resources increase exponentially with the problem size, which makes them very difficult or impossible to be solved by conventional computers in a finite time. These problems can be mapped to the Ising model, a mathematical model to characterize interactions between magnetic spins⁹. The dynamics of the model is algorithm- based, i.e. by constructing a proper coupling matrix and allowing the system to evolve utilizing an intrinsic convergence property of the Ising model, the ground state could be obtained as a solution to the corresponding problems. However, as the system might be trapped in many local minima, the annealing process has usually been adopted in Ising computers to address such limitations. It is commonly agreed that adding fluctuations prevents the Ising computer from being stuck at the local minima.

Efficient algorithms and hardware systems for finding an optimal or near-optimal solution of an Ising model at a fast speed and low power have been sought. Adiabatic quantum computing (AQC)^10,11 and quantum computing^12,13,14,15 based on superconducting qubits are capable of converging the Ising model by tunneling out of local minima to the global minima. A 100-node Maxcut problem was solved using a quantum computer of 2048 spins with huge power consumption¹⁶. Besides the high cost and complexity of cryogenic temperature, this proof-of-concept system was limited by the sparse connections only between the nearest neighbors, which leads to sub-optimal outcomes¹⁷. Simulated annealing based on CMOS implementations was exploited for parallel Ising computing, including central processing units (CPU)^18,19, graphics processing units (GPU)²⁰, and field-programmable gate array (FPGA)^21,22. These hardware have reported as large as 16,384 spins, however, it requires huge hardware resources for generating random numbers to introduce stochasticity to escape from the local minima^4,18,23,24. Coherent Ising machine (CIM) is an optical scheme with competitive energy efficiency. However, it requires a long fiber ring cavity and relies on external FPGA for implementing coupling^25,26. The temporal multiplexing process is also time-consuming and hard to expand to large systems. Recently, experiments and simulation works have investigated various devices to emulate the behavior of Ising spins by taking advantage of their intrinsic physics. An 8-spin asynchronous probabilistic computer based on superparamagnetic tunnel junctions for solving integer factorization tasks of values up to 945 was demonstrated⁴. SPICE simulations of 16-city TSP using simulated annealing method were presented²⁷. Other works such as 8-spin phase-transition nano-oscillators²⁸, multiferroic oxide devices with a high thermal stability²⁹, and magnetoresistive random access memory (MRAM)^30,31 have also conceptually proved that spin-based devices are suitable for representing Ising units. However, these works have encountered challenges in either partially-connected Ising spins or small scalability which limit the Ising computer from solving practical problems.

TSP discussed in this paper is a well-known problem which is much beyond the limitation of locally connected Ising models. Other combinatorial optimization problems, such as knapsack problems, coloring problems, and number partitioning, need all-to-all connection to satisfy specific constraints⁹. In practice, an additional graph embedding process is often required when mapping to 2-dimensional CMOS circuitry which only considered the coupling between adjacent spins^32,33,34. Since the embedding increases the required number of auxiliary spins and causes spin connections to change, the annealing accuracy is degraded significantly, especially when the problem size is large. This means that supporting a fully connected Ising model is highly recommended for dealing with a wide range of problems. Another problem is the rapidly increasing connectivity when considering large-scale systems, which usually results in huge energy consumption and latency. Since the number of spins that a particular annealing processor can handle limit the scale of the problem that can be solved, how to solve complex problems with limited hardware in an energy-efficient way has also drawn significant attention.

In this work, we experimentally report a scalable Ising computer based on 80 SMTJs with all-to-all connections and successfully solve the 4761-node TSP problem. The intrinsic stochasticity in SMTJ enables ultra-fast and low-power Ising annealing without using extra resources for random number generation and Metropolis determining process⁷. By combining global annealing with intrinsic annealing in SMTJ, the convergence of the Ising problem is guaranteed especially in large-scale Ising problems. The method to determine parameters of global annealing is discussed. With an all-to-all connection among Ising spins, the combinatorial optimization of 9-city TSP is solved with the optimal solution. We further develop the algorithm for constrained TSP (CTSP) with no extra auxiliary Ising bits both in algorithm and hardware, indicating the superiority and flexibility of this Ising computer. Furthermore, we propose an optimization strategy based on graph partitioning (GP) and CTSP and experimentally solved a 70-city TSP, which typically needs 4761 nodes, on our 80-node Ising computer with a near-optimal solution. The system can obtain the lowest power consumption of 0.64 mW as well as high energy efficiency of 39 solutions per second per watt among state-of-art Ising annealers. We have experimentally demonstrated that large-scale Ising problems can be solved by small-scale hardware in an energy-efficient way.

Results

SMTJ-based artificial Ising spin

Various NP-hard problems can be solved by constructing corresponding Ising models and observing the ground states during evolution processes. Figure 1a shows an all-to-all connected Ising model, whose Ising Hamiltonian can be written as

$$H=-{\sum }_{i,j}^{N}{J}_{i,j}{s}_{i}{s}_{j}-{\sum }_{i}^{N}{h}_{i}{s}_{i},$$

(1)

where $H$ is the total energy of the system, $N$ is the total number of spins, ${s}_{i}$ is the$\,i$-th spin with one of two states; “+1” (Ising spin up) or “−1” (Ising spin down), ${J}_{i,j}$ is the coefficient of coupling between the $i$-th and the $j$-th spins, and ${h}_{i}$ is the external field of the$\,i$-th spin. For a fixed configuration of other spins than ${s}_{k}$, the probability of ${s}_{k}$ staying in the down-state is given by

$${p}{\_}\!\!{{{{{{\rm{\_}}}}}}\downarrow }=\frac{1}{1+{e}^{-2\Lambda }}.$$

(2)

where $\Lambda=\frac{\partial H}{\partial {s}_{k}}$ (see Supplementary Note 1).

**Fig. 1: Illustration of SMTJ-based Ising computer.**

One natural implementation of this Ising spin is based on a stochastic nanomagnet. The inset of Fig. 1b shows the sketch of an SMTJ, consisting of a tunneling barrier sandwiched by a reference layer and a free layer (see Methods section). Because of the small device diameter (~50 nm), the energy barrier of the free layer between the anti-parallel (AP) and parallel (P) states is low that the retention time of either state is in the range of μs to ms, similar to previous studies^4,35. The SMTJ resistance, measured as a function of time in Fig. 1c, shows preferred AP states at high currents and P states at low currents. When the current (I) is ~4 μA, SMTJ shows an equal chance of AP and P states. The probability of the AP state under different input currents over 0.1 s is fitted in Fig. 1b by a sigmoid function:

$${p}{\_}{{{{{{\rm{\_}}}}}}{{{{{\rm{AP}}}}}}}=\frac{1}{1+{e}^{-{{{{{\rm{a}}}}}}\times (I-{{{{{\rm{b}}}}}})}}$$

(3)

where ${{{{{\rm{a}}}}}}=4.67 \, {{{{{\rm{and\; b}}}}}}=3.9 \, {{{{{\rm{\mu }}}}}}{{{{{\rm{A}}}}}}.$ In order to emulate Ising spin ${s}_{k}$ with our SMTJ device, we only need to make the probability of the down-state of ${s}_{k}$ to be equal to that for the AP state of SMTJ, namely ${p}{\_}{{{{{{\rm{\_}}}}}}{{{{{\rm{AP}}}}}}}={p}{\_}{{{{{{\rm{\_}}}}}}\downarrow }$, with two calibration coefficients. Thus, we can derive the form of the current ${{I\_}}_{k}$ injected to SMTJ as (see Supplementary Note 1):

$${{I\_}}_{k}=\frac{2\Lambda }{a}+b=\frac{c}{a}\left({\sum}_{j}2{J}_{{kj}}{s}_{j}+{h}_{k}\right)+b$$

(4)

where $c=1/{kT}$ is the effective inverse temperature which can be conducted for global annealing.

Intrinsic annealing in SMTJs-based Ising computer

By integrating 80 SMTJs with a peripheral circuit and a microcontroller unit (MCU), we build an 80-node Ising computer (see Supplementary Note 2). Each Ising spin in Eq. (1) is emulated by an SMTJ with intrinsic randomness, where P (AP) state represents spin-up (down). Figure 1d shows the photograph of the printed circuit board (PCB) and the diagram of the system (see Methods section). The system contains 8 processing elements (PEs); each PE has 10 SMTJ computing units. Each SMTJ computing unit includes a transistor and a resistor to adjust the state of SMTJ into stochastic. During the computing process, an MCU examines the states of all SMTJs by reading the output of comparator arrays through multiplexers and generates new input voltages for digital-to-analog converters (DACs) according to the updating rule in Eq. (4) (see Supplementary Note 3 for calibration of 80 SMTJ computing units).

During the evolution process, an Ising solver could be easily trapped in a local minimum state. To avoid this non-optimal solution, annealing algorithms such as simulated annealing (SA) or quantum annealing (QA) were developed. The general idea of SA is to make the system evolve from a high temperature to a low temperature gradually⁷. The convergence and relaxation of SA can be mathematically provable³⁶. During each iteration, a random number is generated for stochasticity and introduced to determine whether the result in this iteration should be accepted or not. In QA, quantum fluctuations cause quantum tunneling between states¹⁷. In both SA and QA, stochasticity needs to be introduced into the annealing process. In contrast, our Ising system utilizes the intrinsic stochastic behaviors of SMTJ to perform the Metropolis process of standard SA in hardware, which greatly saves the solution time and hardware resources for generating randomness (see Supplementary Note 4). Besides, our Ising computer has an all-to-all connection which has wider application scenarios, as well as a better capability of escaping from local minima.

Ising mapping of N-city TSP and CTSP

We have applied our Ising computer to the TSP problem, one of the combinatorial optimization problems, which applies to various sectors, such as vehicle routing, logistics, planning, and scheduling. The goal is to find the shortest route that visits all listed cities once and only once given distances between the cities in the list. In order to solve this problem, we first map N-city-TSP to an ${N}^{2}$-spin Ising model, or ${(N-1)}^{2}$-spin model assuming a fixed starting city. Figure 2a shows the coordinates of 9 cities and Fig. 2b shows the 81-spin Ising model, whose rows indicate the cities and columns indicate the visiting order. We define the binary spin, s, as ${s}_{i,j}$ = 1 if city i is visited as j-th city or ${s}_{i,j}$ = −1 otherwise. The total Hamiltonian of TSP is expressed by⁹

$${H}_{{TSP}}= {\sum }_{i}{\left({\sum }_{j}{s}_{{ij}}+(N-2)\right)}^{2}+{\sum }_{j}{\left({\sum }_{i}{s}_{{ij}}+(N-2)\right)}^{2} \\ +w{\sum }_{j}{\sum }_{i,i{\prime} }{d}_{i,i{\prime} }\left(\frac{{s}_{i,j}+1}{2}\right)\left(\frac{{s}_{i{\prime},j+1}+1}{2}\right)$$

(5)

where the first term is a constraint that represents only one city is visited at the j-th visit, and the second term represents one city is visited only one time. $w$ is a constant small enough ($0 \, < \, w \, < \, 1$) not to violate the two constraints of the TSP cycle. ${d}_{i,{i{{\hbox{'}}}}}$ is the distance between city $i$ and city ${i{{\hbox{'}}}}$. According to Eqs. (1) and (5), coupling matrix $J$ of 81 spins could be obtained, as shown in Fig. 2c (see Supplementary Note 5). It shows that spins in the same row or column have strong coupling, as indicated by the first two terms in Eq. (5).

**Fig. 2: Construct Ising model for a 9-city TSP.**

We define CTSP as the visiting orders of some cities are enforced during the traveling. This is quite useful in real-life scenarios. For example, a delivery man collects food and drinks at shop A and must deliver hot drinks to B first even though the total cost is higher than optimal. We propose an algorithm for solving CTSP by adding negative “distance” to the Hamiltonian. For example, suppose that city A and city B are required to be connected in the CTSP as city 2 and city 7 shown in Fig. 2d, and then we add the term

$${-\theta H}_{{{{{{\rm{c}}}}}}}=-\theta \left({\sum }_{j}{s}_{A,j}{s}_{B,j+1}+{s}_{B,j}{s}_{A,j+1}\right)$$

(6)

such that the energy of a path, where city A and city B are connected, is always lowered by $\theta .$ When $\theta$ is sufficiently large, the optimal path must have city 2 and city 7 connected. Thus, the total Hamiltonian of the CTSP is expressed by

$${H}_{{CTSP}}={H}_{{TSP}}{-\theta H}_{{{{{{\rm{c}}}}}}}.$$

(7)

Constructing an Ising model for CTSP is exactly the same as TSP except for extra allowed visiting sequences, as shown in Fig. 2e. This would lead to a modification of the coupling matrix of $J$ according to Eq. (7) (see the deduction of ${J}_{{CTSP}}$ in Supplementary Note 6). From Fig. 2f we can clearly see the differences between ${J}_{{CTSP}}$ and ${J}_{{TSP}}$. This algorithm of CTSP fits for arbitrary constraints of visiting sequences as well as their combinations.

Experimental demonstration of 9-city TSP

We first run a 9-city TSP in the 80 SMTJ-based Ising computer at a relatively low but non-zero effective temperature to examine the intrinsic annealing in SMTJ. The iteration time is set comparable to the longest retention time of SMTJs to avoid reading previous spin states. In our experiments, we set the iteration time as 0.1 ms. As shown in Fig. 3a, as the effective inverse temperature (c) is increased quickly to 0.5, the system converges rapidly to a low energy state within 50 iterations and reaches the ground state after 4000 iterations. It should be noted that the intrinsic stochasticity in SMTJs helps the system escape from local minima without an extra annealing process, as shown in the right inset of Fig. 3a. Figure 3b illustrates the evolution of 9 spins out of 81 spins. The evolution of all 81 spins can be found in Supplementary Note 7.

We choose four states in Fig. 3a to inspect the traveling path in Fig. 3c and their Ising spins, namely ${s}_{i,j}$, as shown in Fig. 3d. The yellow square in Fig. 3d represents ${s}_{i,j}=1$(visited) and the blue square represents ${s}_{i,j}=-1$(not visited). In an initial state A, the spin states are randomly set and then converge to a relatively low energy at state B. State C is an intermediate solution during the annealing process. State D is the optimal solution satisfying two constraints of the TSP. Because we anneal the system to a relatively low but non-zero temperature so that the convergence to a sub-optimal state could be guaranteed, and at the same time, the intrinsic randomness in SMTJ helps the system to escape from local minima and find a ground state quickly. We test 10 different random initial states each with 5000 iterations and find that in all cases the system can obtain a relatively small energy, as shown in Supplementary Note 8. However, there is a probability that the system jumps out of the ground state because of the non-zero temperature. If we continue to observe the evolution in a large timescale, the system would move back to the global minimum state. In some cases, where the speed and near-optimal solution matter but the accurate optimal solution is not, the number of iterations can be chosen to be small.

Further global annealing of the system to a lower effective temperature may guarantee the convergence of the computation. Here we use linear annealing as an example to examine the convergence of this algorithm in a very large-iteration limit. The initial temperature should be chosen sufficiently high to ensure that the thermal energy exceeds any energy barrier ($\Delta H{=H}_{\max }-{H}_{\min }$) within the system, while still adhering to the fundamental constraints of the specific Ising model. For a given N-city TSP, ${H}_{\max }$ in Eq. (5) can be estimated as $w\times N\times \bar{d}$, assuming that the distance between any two cities is the same as the average distance $\bar{d}$. Similarly, ${H}_{\min }$ can be estimated as $w\times N\times {d}_{\min }$. Therefore, the initial $c$ of 9-city TSP in our experiment can be estimated as ${c}_{{{{{{\rm{initial}}}}}}}\, \sim 1/\Delta H=0.07$, where $w=0.5$, $N=9$ for a total of 9 cities, $\bar{d}=4$ and ${d}_{\min }=0.8$ for the average and shortest distance of each two cities, respectively in Fig. 3c. We then choose ${c}_{{{{{{\rm{initial}}}}}}}$ = 0.2 which is sufficiently safe for annealing. As the temperature linearly decreases, the dynamical system gradually stabilizes. The final temperature should be low enough i.e., ${c}_{{{{{{\rm{final}}}}}}} \, \gg \, 1/\Delta H$, to freeze all possible fluctuations. Here we set ${c}_{{{{{{\rm{final}}}}}}}=1.8$ which is at least one order larger than $1/\Delta H$. This can also be verified by observing randomly generated states under ${c}_{{{{{{\rm{final}}}}}}}$ for long iterations. Regarding the annealing speed, if several changes in the spin configuration are observed under each value of c, then this annealing speed is valid. Plenty trials are required to find the proper annealing speed (details in Supplementary Note 8).

In Fig. 3e we can find the first global minimum energy appears after 16,500 iterations, and converge to the ground state after 40,000 iterations. Temperature schedules can be optimized to reduce iteration numbers, e.g. increase the effective temperature in the first few time steps, and then decrease gradually, or learned by the reinforcement learning method³⁷. In practice, we use one memory to store the minimum energy state during the computation, and another memory to record the final energy state. We take the minimum value of these two results as the solution. Figure 3f shows the success probability (defined as finding the optimal path) of TSP with various node sizes. The success probability of 9-city TSP reaches 95% after 10⁴ iterations. The success probability with the parameter $w$ in Eq. (5) which determines the relative strength of the constrain term and distance term is also discussed. If the $w$ is too large, then the probabilities of violations, namely the invalid path, would increase, as shown in Supplementary Note 8. If $w$ is too small, then the effect of the distance term is small, which results in a slower convergence to the ground state.

The advantages of this annealer are threefold: (1) Selective working modes by using different temperature schemes. One is the probabilistic sampling mode working at a constant temperature, which is similar to an asynchronous probabilistic computer⁴; the other is the annealing mode conducted by reducing the effective temperature. (2) Fast speed and low power consumption to find the ground state because of the intrinsic annealing properties in SMTJ. (3) Global annealing outperforms probabilistic sampling in achieving efficient convergence, especially for large-scale problems.

We have implemented a synchronous design with a lower requirement on the speed of peripheral circuits. This design also effectively mitigates issues such as leakage, sneak currents, and parasitic resistances which might encountered in asynchronous hardware with a memristive (or resistive) crossbar array.

Compressing 70-city TSP to 80-node Ising computer

Generally, the number of spins required for an N-city TSP is (N-1)², which limits the scalability of TSP on state-of-the-art computing systems. Here, we propose a graph Ising compressing algorithm based on CTSP that can significantly reduce the number of spins and interactions for solving a TSP. Figure 4a is an example of how we apply this algorithm to our 80-node SMTJ Ising computer for solving a 70-city TSP (4761 nodes, st70 data set from TSPLIB³⁸). The major steps of this algorithm can be described as follows: (a) divide the cities into several smaller groups until the number of cities in each group is less than 10 by GP method; (b) solve TSP within each group separately; (c) integrate neighboring groups to obtain an initial path of the whole group; and (d) optimize the path in (c) by a CTSP window sliding over the whole map.

**Fig. 4: Experimental solution of 70-city TSP.**

It is worth mentioning that GP is also an Ising problem. When converting a global TSP into local TSPs, using GP would be more hardware-friendly for our Ising computer compared to other clustering algorithms. It is based on the idea that the original graph can be separated into multiple sub-graphs depending on the Euclidean distance. The number of spins required for solving GP is ~N and thus, GP is quite efficient for local TSPs since the problem size can be reduced to ~ ${\left(N-1\right)}^{2}/a$, where $a$ is the number of groups, and each TSP can be optimized independently (see GP mapping in Supplementary Note 9).

The final step (d) is based on CTSP, where a rectangular window slides over the path and cuts it into several disconnected lines, among which the two longest lines are chosen and the edge cities are connected as a circular path (Supplementary Note 10). The CTSP is solved within each window for sub-area optimization without changing the visiting order of edge cities. After this, the two lines at the edge cities are opened and CTSP is carried out again after sliding to the next window. GP-CTSP-based optimization algorithm provides an efficient way of finding near-optimal solutions for large-scale TSP on limited hardware resources.

Figure 4b shows the comparison of numbers of spins for different TSPs by a conventional Ising method⁹, cluster Ising method³⁹, and our method. The required number of spins in our method is relatively unchanged for various TSPs, while that of other methods increases substantially with the scale of the problem. Figure 4c shows the total path of 70-city TSP as a function of iteration number using different SA-based algorithms, including symbiotic organisms search⁴⁰, ant colony optimization⁴¹, multi-offspring genetic algorithm⁴², and gene-expression programming⁷. Finally, we obtain the near-optimal path with a total energy of 700.71, which is slightly higher than the optimal solution of 675. However, the iteration number for an optimized solution is 4.9$\times$10⁶ by our method, which is two to three orders lower than that of SA-based algorithms running on Intel Core-i7 CPU⁷ with the main frequency of 3 GHz, as shown in Fig. 4c.

Ising computer scaling and cross-bar architecture

The above experimental demonstration shows our Ising computer with 80 SMTJs is capable of finding a near-optimal solution to a medium-scale NP-hard problem. We then explore the performance with increasing from 70 to 200 cities. The simulation of complete TSP task is carried out using MATLAB, incorporating a stochastic model of the SMTJ employed in our experiment (details in Supplementary Note 11). The solution quality is defined as

$${Solution}\,{quality}=1-\frac{{total}\,{path}-{best}\,{demonstrated}\,{solution}}{{best}\,{demonstrated}\,{solution}}.$$

(8)

Figure 5a illustrates the solution quality of the best results obtained for each TSP task (Supplementary Note 12 for the best solutions). Notably, as the number of SMTJ (M) increases, higher quality solutions can be attained. It is worth emphasizing that the shortest path obtained for the 101-city TSP is 640.9755 in our study, surpassing the optimal path of 642.3095 provided by TSPLIB (Eil101.opt.tour). This outcome serves as evidence of the superiority of our method. The utilization of more SMTJs solving TSP per sliding window leads to improved optimization of CTSP annealing, resulting in an enhanced solution quality, as depicted in Fig. 5b. Consequently, the time to convergence s would also increase with the use of more SMTJS. When dealing with a fixed hardware capacity, an appropriate number of SMTJs for CTSP optimization can be assigned, taking into account both the solution quality and convergence speed. Figure 5c showcases the success rate (defined as achieving 95% solution quality) as the problem size increases. The success probability of 200-city TSP, whose complexity is ~40,000 nodes, can reach as high as 90%, demonstrating the scalability of our method compared to typical TSP (without GP and CTSP)⁹.

**Fig. 5: Scaling with problem size and cross-bar implementation towards large-scale Ising computing.**

We also propose a cross-bar architecture for large-scale Ising computer implementation, which can be integrated by using modern MRAM and CMOS technologies. The core part of this architecture consists of SMTJ bit cells organized as a cross-bar array, integrated with row decoders and read sense amplifiers (RSA), as shown in Fig. 5d. Each SMTJ bit cell contains one select transistor and one SMTJ (1T1SMTJ), whereas the gate of the select transistor is driven by word lines (WL), and the source of all bit cells are connected to the ground. Each bit line is assigned with an RSA. The current flows through SMTJ can be continuously adjusted by Vin of RSA, and the state of SMTJ can be read by RSA at the same time. Figure 5e illustrated the circuit of RSA, in which two clamp transistors control the current flow through the bit cell path and reference path by the gate voltage (Vin), and a current mirror is used to guarantee the same current of the above two paths. Then different voltages would show in the Q and QB point when the resistance of SMTJ is higher or lower than the reference resistor (Rref). By utilizing an enabled voltage sense amplifier (VSA), the voltages at the Q and QB points are sensed, allowing the SMTJ state to be determined as either Vdd (P state) or 0 V (AP state). Particularly, a voltage equalization circuit (VEC) is designed for initializing VSA to avoid incorrect readout. Electrical coupling through a resistance change⁴³ is evaluated to have neglectable effects (details in Supplementary Note 11). Figure 5f shows the signals to control and read bit cells. In phase 0 (PH0), one row of SMTJs is selected by WL, and Vin prepared by peripheral circuit is applied to the corresponding RSA. EQ is set high to initialize Q, QB and Vout as Vdd/2. In phase 1 (PH1), the SMTJ fluctuates from the falling edge to next rising edge of EQ. Finally, in phase 2 (PH2), RSAs read the data of one row in parallel at the falling edge of SEN. After the first row has been retrieved, the partial sum starts to be computed. Meanwhile, the same process for the second row can be started, so and so forth. To avoid reading the previous state, the duration of PH1 is preferred to be comparable with the retention time of SMTJ, which limits the main frequency of the system (see details in Supplementary Note 11).

Discussion

We compare our system with other state-of-art Ising solvers, including CMOS annealer (Intel Core i7 processor)⁷, quantum annealer (D-Wave 2000Q)^16,17, CIM with FPGA²⁶, memristor Hopfield neural networks (mem-HNN)⁴⁴, and phase-transition nano-oscillators (PTNO)²⁸ in solving 4761-node TSP70, as shown in Table 1. We use the experimental data for benchmarking from literature, and two kinds of SMTJs for comparison. One is our perpendicular anisotropy SMTJ device and the other is assuming recently reported in-plane anisotropy SMTJ with a retention time of 8 ns^45,46. The major attributes are the main frequency (defined as 1/iteration time), power, time-to-solution as well as energy efficiency (defined as solutions per second per watt). As quantum computers, CIM, mem-HNN, and PTNO only demonstrated ~100-node max-cut problems, we estimate the time-to-solution for solving TSP70 by assuming that the algorithm and the total number of spins to find a near-optimal solution is the same as our work (details in Supplementary Note 13). Here, we set 80-spin Ising computer as a standard and fix the number of iterations of 400,000 for a good solution to TSP70. Only Ising computing parts are calculated for power consumption.

Table 1 Comparison of state-of-art Ising computers

Full size table

In Table 1, although the main frequency of CPU is the highest among all candidates, the energy efficiency is lower than our SMTJ-based approach. This is due to the redundant logic and data transfer delay between the memory and PEs in a conventional von-Neumann architecture. The SMTJ-based approach currently outperforms the quantum annealer both in the power consumption as well as time to solution. The power of quantum annealer is huge which needs to be optimized further for real applications. CIM is another promising architecture with a fast speed and acceptable power consumption. Current CIM systems are proof-of-concept systems which are not at present optimized for energy efficiency. Mem-HNN has a relatively fast speed assuming the 180-nm CMOS technology. However, the required number of devices is large, which limits the integrated density. The PTNO approach uses capacitors or resistors to mimic spin coupling, whose main frequency would be limited by the system scale and parasitic effects. It is reported that the ideal main frequency would decrease from 500 to 87 MHz when the system scale increases from 8-node to 100-node²⁸. Our SMTJ-based Ising computer outperforms other approaches with low power consumption with 0.64 mW (details in Supplementary Note 13).

We experimentally demonstrate perpendicular MTJs with a retention time of ~0.1 ms and solve TSP70 Ising problems at an energy efficiency of 39 solutions per second per watt. Furthermore, we simulate an Ising computer with 4 Kb SMTJs using 40 nm commercial CMOS technology. The simulated energy efficiency for solving TSP70 by using the same SMTJ can reach 68 solutions per second per watt. By using reported in-plane SMTJ⁴⁵ and advanced CMOS, the system could obtain the highest energy efficiency of $5.4\times {10}^{3}$, which shows several orders of magnitude improvement over other approaches. This result suggests that an SMTJ-based Ising computer can be a good candidate for solving dense Ising problems in a highly energy-efficient and fast way.

In summary, we have experimentally demonstrated an intrinsic all-to-all Ising computer based on 80 SMTJs, and solved 9-city TSP with the optimal solution. Furthermore, a compressing strategy based on CTSP and GP is proposed to experimentally solve 4761-node 70-city TSP on an 80-node system with a near-optimum solution as well as ultra-low energy consumption. A cross-bar architecture is then proposed for large-scale Ising computers and the 200 city TSP task is simulated. Our system provides a feasible solution to fast, energy-efficient, and scalable Ising computing schemes to solve NP-hard problems.

Methods

Sample growth and device fabrication

Thin film samples of substrate/[W (3)/Ru (10)]₂/W (3)/Pt (3)/Co (0.25)/Pt (0.2)/[Co (0.25)/Pt (0.5)]₅/Co (0.6)/Ru (0.85)/Co (0.6)/Pt (0.2)/Co (0.3)/Pt (0.2)/Co (0.5)/W (0.3)/CoFeB (0.9)/MgO (1.1)/CoFeB (1.5)/Ta (3)/Ru (7)/Ta (5) were deposited via DC (metallic layers) and RF magnetron (MgO layer) sputtering on the Si substrates with thermal oxide of 300 nm with a base pressure of less than $2\times {10}^{-8}$ Torr at room temperature. The numbers in parentheses are thicknesses in nanometers. To fabricate the superparamagnetic tunnel junctions, bottom electrode structures with a width of 10 µm were firstly patterned via photolithography and Ar ion milling. MTJ pillar structures with a diameter of ~50 nm for the superparamagnetic behavior were patterned by using e-beam lithography. The encapsulation layer of Si₃N₄ was in-situ deposited after ion milling without breaking vacuum by using RF magnetron sputtering, and top electrode structures with a width of 10 µm were patterned via photolithography and top electrodes of Ta (5 nm)/Cu (40 nm) were deposited by using DC magnetron sputtering.

MTJ characterization by probe station

The setup includes a source meter (Keithley 2400) for supplying DC bias currents and a data acquisition card (NI-DAQmx USB-6363) for the read operation. A single SMTJ operation cycle comprises two steps (i.e. bias and read). A small DC input current with an amplitude of 1–20 μA is applied to SMTJ. Simultaneously, the DAQ card reads the voltage signal across the SMTJ at a maximum sampling rate of 2 MHz. The MTJ switching probability varies in accordance with the amplitude of applied currents. The retention time of MTJ is determined from random telegraph noise measurements over 250 ms. The expectation values of event time τ is determined by fitting an exponential function to the experimental results.

Ising PCB

80 SMTJ arrays and peripheral circuits are integrated on a 12 cm × 15 cm PCB, controlled by an MCU (Arduino Mega 2560 Rev3). Four 12-bit rail-to-rail DACs (AD5381) with 160 output channels in total are used to generate analog DC inputs for PE and comparator arrays. Half of the DAC output channels are used to provide stimulation to the gate terminal of NMOSs (2N7002DW-G), and others are used to provide reference voltages to comparators (AD8694). The drain voltages of NMOS are compared with reference voltages and generate outputs in parallel. Outputs of comparator arrays are read by MCU through four multiplexers (FST16233) and then are calculated to obtain new inputs for DACs. The supply voltage of the PCB board and SMTJs is 5 V and 0.8 V, respectively. The value of resistors in each computing unit can be designed to adjust the center of sigmoidal curves.

Data availability

The data generated during this study are available within the article and the Supplementary Information file. Source data are provided with this paper.

Code availability

The codes that support this study can be available from the corresponding author upon request.

References

Theis, T. N. & Wong, H. S. P. The End of Moore’s Law: A New Beginning for Information Technology. Comput. Sci. Eng. 19, 41–50 (2017).
Article Google Scholar
Shim, Y., Jaiswal, A. & Roy, K. Ising computation based combinatorial optimization using spin-Hall effect (SHE) induced stochastic magnetization reversal. J. Appl. Phys. 121, 193902 (2017).
Article ADS Google Scholar
Tindell, K. W., Burns, A. & Wellings, A. J. Allocating hard real-time tasks: An NP-Hard problem made easy. J. Real.-Time Syst. 4, 145–165 (1992).
Article Google Scholar
Borders, W. A. et al. Integer factorization using stochastic magnetic tunnel junctions. Nature 573, 390–393 (2019).
Article ADS CAS PubMed Google Scholar
Tatsumura, K., Hidaka, R., Yamasaki, M., Sakai, Y. & Goto, H. A Currency Arbitrage Machine Based on the Simulated Bifurcation Algorithm for Ultrafast Detection of Optimal Opportunity. in 2020 IEEE International Symposium on Circuits and Systems (ISCAS) 1–5 (IEEE, 2020). https://doi.org/10.1109/ISCAS45731.2020.9181114.
Cohen, E., Carmi, M., Heiman, R., Hadar, O. & Cohen, A. Image restoration via ising theory and automatic noise estimation. In 2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB) 1–5 (IEEE, 2013). https://doi.org/10.1109/BMSB.2013.6621708.
Zhou, A.-H. et al. Traveling-Salesman-Problem Algorithm Based on Simulated Annealing and Gene-Expression Programming. Information 10, 7 (2018).
Article Google Scholar
Garza-Santisteban, F. et al. A Simulated Annealing Hyper-heuristic for Job Shop Scheduling Problems. In 2019 IEEE Congress on Evolutionary Computation (CEC) 57–64 (IEEE, 2019). https://doi.org/10.1109/CEC.2019.8790296.
Lucas, A. Ising formulations of many NP problems. Front. Physics 2, 5 (2014).
Albash, T. & Lidar, D. A. Adiabatic quantum computation. Rev. Mod. Phys. 90, 015002 (2018).
Article ADS MathSciNet Google Scholar
Dickson, N. G. & Amin, M. H. S. Does Adiabatic Quantum Optimization Fail for NP-Complete Problems? Phys. Rev. Lett. 106, 050502 (2011).
Article ADS PubMed Google Scholar
Heim, B., Ronnow, T. F., Isakov, S. V. & Troyer, M. Quantum versus classical annealing of Ising spin glasses. Science 348, 215–217 (2015).
Article ADS MathSciNet CAS PubMed Google Scholar
Kadowaki, T. & Nishimori, H. Quantum annealing in the transverse Ising model. Phys. Rev. E 58, 5355–5363 (1998).
Article ADS CAS Google Scholar
Martoňák, R., Santoro, G. E. & Tosatti, E. Quantum annealing of the traveling-salesman problem. Phys. Rev. E 70, 057701 (2004).
Article ADS Google Scholar
Okuyama, T., Hayashi, M. & Yamaoka, M. An Ising Computer Based on Simulated Quantum Annealing by Path Integral Monte Carlo Method. In 2017 IEEE International Conference on Rebooting Computing (ICRC) 1–6 (IEEE, 2017). https://doi.org/10.1109/ICRC.2017.8123652.
Hamerly, R. et al. Experimental investigation of performance differences between Coherent Ising Machines and a quantum annealer. Sci. Adv. 5, eaau0823 (2019).
Article ADS PubMed PubMed Central Google Scholar
Johnson, M. W. et al. Quantum annealing with manufactured spins. Nature 473, 194–198 (2011).
Article ADS CAS PubMed Google Scholar
Yamaoka, M. et al. A 20k-Spin Ising Chip to Solve Combinatorial Optimization Problems With CMOS Annealing. IEEE J. Solid State Circuits 51, 303–309 (2016).
Article Google Scholar
Yamaoka, M. et al. 24.3 20k-spin Ising chip for combinational optimization problem with CMOS annealing. In 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers 1–3 (IEEE, 2015). https://doi.org/10.1109/ISSCC.2015.7063111.
Davendra, D., Metlicka, M. & Bialic-Davendra, M. CUDA Accelerated 2-OPT Local Search for the Traveling Salesman Problem. In Novel Trends in the Traveling Salesman Problem (eds. Davendra, D. & Bialic-Davendra, M.) (IntechOpen, 2020). https://doi.org/10.5772/intechopen.93125.
Tatsumura, K., Dixon, A. R. & Goto, H. FPGA-Based Simulated Bifurcation Machine. In 2019 29th International Conference on Field Programmable Logic and Applications (FPL) 59–66 (IEEE, 2019). https://doi.org/10.1109/FPL.2019.00019.
Tatsumura, K., Yamasaki, M. & Goto, H. Scaling out Ising machines using a multi-chip architecture for simulated bifurcation. Nat. Electron 4, 208–217 (2021).
Article Google Scholar
Mathew, S. K. et al. μ RNG: A 300–950 mV, 323 Gbps/W All-Digital Full-Entropy True Random Number Generator in 14 nm FinFET CMOS. IEEE J. Solid-State Circuits 51, 1695–1704 (2016).
Article ADS Google Scholar
Pervaiz, A. Z., Sutton, B. M., Ghantasala, L. A. & Camsari, K. Y. Weighted p-Bits for FPGA Implementation of Probabilistic Circuits. IEEE Trans. Neural Netw. Learn. Syst. 30, 1920–1926 (2019).
Article PubMed Google Scholar
McMahon, P. L. et al. A fully programmable 100-spin coherent Ising machine with all-to-all connections. Science 354, 614–617 (2016).
Article ADS MathSciNet CAS PubMed Google Scholar
Inagaki, T. et al. A coherent Ising machine for 2000-node optimization problems. Science 354, 603–606 (2016).
Article ADS CAS PubMed Google Scholar
Sutton, B., Camsari, K. Y., Behin-Aein, B. & Datta, S. Intrinsic optimization using stochastic nanomagnets. Sci. Rep. 7, 44370 (2017).
Article ADS PubMed PubMed Central Google Scholar
Dutta, S. et al. An Ising Hamiltonian solver based on coupled stochastic phase-transition nano-oscillators. Nat. Electron 4, 502–512 (2021).
Article Google Scholar
Sharmin, S., Shim, Y. & Roy, K. Magnetoelectric oxide based stochastic spin device towards solving combinatorial optimization problems. Sci. Rep. 7, 11276 (2017).
Article ADS PubMed PubMed Central Google Scholar
Faria, R., Camsari, K. Y. & Datta, S. Implementing Bayesian networks with embedded stochastic MRAM. AIP Adv. 8, 045101 (2018).
Article ADS Google Scholar
Zand, R., Camsari, K. Y., Datta, S. & DeMara, R. F. Composable Probabilistic Inference Networks Using MRAM-based Stochastic Neurons. J. Emerg. Technol. Comput. Syst. 15, 1–22 (2019).
Article Google Scholar
Choi, V. Minor-embedding in adiabatic quantum computation: I. The parameter setting problem. Quantum Inf. Process 7, 193–209 (2008).
Article MathSciNet Google Scholar
Sugie, Y. et al. Minor-embedding heuristics for large-scale annealing processors with sparse hardware graphs of up to 102,400 nodes. Soft Comput 25, 1731–1749 (2021).
Article Google Scholar
Cai, J., Macready, W. G. & Roy, A. A practical heuristic for finding graph minors. Preprint at http://arxiv.org/abs/1406.2741 (2014).
Chaves-O’Flynn, G. D., Wolf, G., Sun, J. Z. & Kent, A. D. Thermal Stability of Magnetic States in Circular Thin-Film Nanomagnets with Large Perpendicular Magnetic Anisotropy. Phys. Rev. Appl. 4, 024010 (2015).
Article ADS Google Scholar
Mitra, D., Romeo, F. & Sangiovanni-Vincentelli, A. Convergence and finite-time behavior of simulated annealing. Adv. Appl. Probab. 18, 747–771 (1986).
Article MathSciNet Google Scholar
Mills, K., Ronagh, P. & Tamblyn, I. Finding the ground state of spin Hamiltonians with reinforcement learning. Nat. Mach. Intell. 2, 509–517 (2020).
Article Google Scholar
MP-TESTDATA - The TSPLIB Symmetric Traveling Salesman Problem Instances. http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/tsp/ (2013).
Dan, A., Shimizu, R., Nishikawa, T., Bian, S. & Sato, T. Clustering Approach for Solving Traveling Salesman Problems via Ising Model Based Solver. In 2020 57th ACM/IEEE Design Automation Conference (DAC) 1–6 (IEEE, 2020). https://doi.org/10.1109/DAC18072.2020.9218695.
Ezugwu, A. E.-S., Adewumi, A. O. & Frîncu, M. E. Simulated annealing based symbiotic organisms search optimization algorithm for traveling salesman problem. Expert Syst. Appl. 77, 189–210 (2017).
Article Google Scholar
Mohsen, A. M. Annealing Ant Colony Optimization with Mutation Operator for Solving TSP. Comput. Intell. Neurosci. 2016, 1–13 (2016).
Article Google Scholar
Wang, J., Ersoy, O. K., He, M. & Wang, F. Multi-offspring genetic algorithm and its application to the traveling salesman problem. Appl. Soft Comput. 43, 415–423 (2016).
Article Google Scholar
Talatchian, P. et al. Mutual control of stochastic switching for two electrically coupled superparamagnetic tunnel junctions. Phys. Rev. B 104, 054427 (2021).
Article ADS CAS Google Scholar
Cai, F. et al. Power-efficient combinatorial optimization using intrinsic noise in memristor Hopfield neural networks. Nat. Electron 3, 409–418 (2020).
Article Google Scholar
Hayakawa, K. et al. Nanosecond Random Telegraph Noise in In-Plane Magnetic Tunnel Junctions. Phys. Rev. Lett. 126, 117202 (2021).
Article ADS CAS PubMed Google Scholar
Safranski, C. et al. Demonstration of Nanosecond Operation in Stochastic Magnetic Tunnel Junctions. Nano Lett. 21, 2040–2045 (2021).
Article ADS CAS PubMed Google Scholar

Download references

Acknowledgements

This work was supported by National Research Foundation (NRF), Prime Minister’s Office, Singapore, under its Competitive Research Programme (NRF-000214-00 to H.Y.), Advanced Research and Technology Innovation Center (ARTIC to H.Y.), the National University of Singapore under Grant (project number: A-0005947-19-00 to H.Y.), and Ministry of Education, Singapore, under Tier 2 (T2EP50123-0025 to H.Y.). We thank Yuqi Su, and Chne-Wuen Tsai from National University of Singapore and Zhi-Da Song from Peking University for useful discussions.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
Jia Si, Shuhan Yang, Yunuo Cen, Jiaer Chen, Yingna Huang, Zhaoyang Yao, Dong-Jun Kim, Kaiming Cai, Jerald Yoo, Xuanyao Fong & Hyunsoo Yang
Key Laboratory for the Physics and Chemistry of Nanodevices and Center for Carbon-based Electronics, School of Electronics, Peking University, Beijing, China
Jia Si

Authors

Jia Si
View author publications
You can also search for this author in PubMed Google Scholar
Shuhan Yang
View author publications
You can also search for this author in PubMed Google Scholar
Yunuo Cen
View author publications
You can also search for this author in PubMed Google Scholar
Jiaer Chen
View author publications
You can also search for this author in PubMed Google Scholar
Yingna Huang
View author publications
You can also search for this author in PubMed Google Scholar
Zhaoyang Yao
View author publications
You can also search for this author in PubMed Google Scholar
Dong-Jun Kim
View author publications
You can also search for this author in PubMed Google Scholar
Kaiming Cai
View author publications
You can also search for this author in PubMed Google Scholar
Jerald Yoo
View author publications
You can also search for this author in PubMed Google Scholar
Xuanyao Fong
View author publications
You can also search for this author in PubMed Google Scholar
Hyunsoo Yang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

J.S. and H.Y. conceived and designed the experiments. J.S. designed, fabricated, and coded the hardware system. D.K., and S.Y. fabricated the devices. J.S., S.Y., and K.C. performed device measurements. Z.Y. bonded the components on PCB. J.S. designed SMTJ-based Ising system. J.S., J.C., Y.C., Y.H. and X.F. developed the optimization algorithm and performed simulations. J.S., S.Y., Y.C., J.Y., X.F. and H.Y. analyzed the data. J.S. and H.Y. wrote the manuscript. H.Y. proposed and supervised this work. All authors discussed the results and revised the manuscript.

Corresponding author

Correspondence to Hyunsoo Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Si, J., Yang, S., Cen, Y. et al. Energy-efficient superparamagnetic Ising machine and its application to traveling salesman problems. Nat Commun 15, 3457 (2024). https://doi.org/10.1038/s41467-024-47818-z

Download citation

Received: 16 June 2022
Accepted: 11 April 2024
Published: 24 April 2024
DOI: https://doi.org/10.1038/s41467-024-47818-z

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.