# Reliable and High Performance STT-MRAM Architectures based on Controllable-Polarity Devices

Kaveh Shamsi<sup>\*</sup>, Yu Bi<sup>\*</sup>, Yier Jin<sup>\*</sup>, Pierre-Emmanuel Gaillardon<sup>†</sup>, Michael Niemier<sup>‡</sup>, and X. Sharon Hu<sup>‡</sup>

\*Department of Electrical Engineering and Computer Science, University of Central Florida

<sup>†</sup>École Polytechnique Fédérale de Lausanne (EPFL) - Switzerland

<sup>‡</sup>Department of Computer Science and Engineering, University of Notre Dame

Abstract—Source degeneration of access devices during parallel ($P$) $\rightarrow$ anti-parallel ($AP$) switching in Spin Transfer Torque Magnetic Random Access Memories (STT-MRAM) has been a limiting factor in the operational speed of these memories. In this work, new architectures for single memory cells and arrays of cells are presented that utilize Schottky-Barrier Silicon Nanowire Field Effect Transistors with polarity control capabilities (e.g., SiNW-FETs) to substantially increase the performance of STT-MRAM, specifically Multi-Level Cell (MLC) STT-MRAM. The proposed design offers a built-in reliability improvement, as it omits one of the four available states in the MLC STT-MRAM memory, facilitating resistance level detection for the peripheral circuitry. Our simulation results for the developed memory cell show a 49.7% reduction in $P \rightarrow AP$ switching time, as well as a 51.3% increase in available drive current under a 1.4V supply voltage when compared to FinFET 22nm technology. With respect to memory arrays, the proposed architecture demonstrates an average write latency reduction of 37% in comparison with the FinFET 22nm technology node.

# I. INTRODUCTION

The limitations of conventional memory cells, such as the limited density and high leakage power of SRAM, as well as fundamental scaling issues associated with off-chip DRAM, have motivated the development of novel memory cells based on emerging devices. A variety of nonvolatile memories are being studied as substitutes for conventional storage/memory units in different layers of the memory hierarchy [1]. Spin-transfer-torque magnetic RAM (STT-MRAM) has shown great promise as a nonvolatile, near-zero leakage, high-density and fully CMOS-compatible memory [2]. Drawbacks of STT-MRAM include high write power, long write latency, and reliability issues such as read-disturbance and read/write errors [3]. Still, despite these limitations, STT-MRAM could be a formidable competitor to conventional SRAM, especially in the last layers of on-chip cache where high density and minimal leakage power are often more desirable than low access time [4].

To further increase the overall storage capacity of STT-MRAM, multi-level cell STT-MRAM (MLC STT-MRAM) has been proposed to store multiple bits in a single cell [5]. However, when compared to single-level cell STT-MRAM (SLC STT-MRAM), MLC STT-MRAM operates with a double-cycle writing scheme and, due to the higher resistance of MLCs compared to that of SLCs, larger access devices with more drive current may be required. The more severe source-degeneration effect during weak write further tightens the design constraints on the access device. As STT-MRAM cell area is determined by the size of the access transistor and interconnects rather than the MTJ itself [6], MLC may lose its bit-density advantage over SLC, particularly when the increased read/write bit-error rate is considered as well. System and circuit level techniques have been proposed to improve the robustness and performance of MLC STT-MRAM such that the benefits of increased capacity can be utilized by system applications [6], [7]. However, the access device's limited drive current, especially during weak write, remains an issue.

Meanwhile, research on emerging technologies is not limited to memory devices. Numerous academic and industrial research groups are also studying new transistor technologies that could continue the performance benefits of dimensional scaling under Moore's Law [1]. Among the transistor technologies being studied are polarity-controllable Schottky-Barrier FETs (SBFETs), which have been proposed as devices that can potentially increase the functionality per transistor in logic circuits [8]. Operation of Silicon Nano-Wire FETs (SiNW-FET) and Carbon Nano-Tube FETs (CNT-FET) as SBFETs has been experimentally verified [9], [10].

In this work, we propose to use SiNW-FETs as the access transistor in the MLC STT-MRAM memory cell and array design. We aim to overcome the conventional FET source degeneration effect that results in degraded performance in STT-MRAM memory arrays during the weak write operation. Experimental results will be provided based on available device models both on the single cell and the memory array to uphold our claim that the combinations of multiple emerging devices may outperform traditional CMOS-emerging device based designs. Our contributions are summarized below:

- We introduce a new MLC STT-MRAM cell design that leverages ambipolar behavior existent in polarity controllable FETs to address the source degeneration issue in STT-MRAM. To the best of our knowledge, this is the first work exploiting ambipolar behavior for such purpose. The proposed approach yields significant performance improvements.
- We propose a self-contained memory array implementing a state-restricted MLC scheme that improves the reliability of MLC MRAM and we utilize memory array simulation tools to demonstrate the performance improvement.



Fig. 1: IMAMTJ and PMAMTJ

- We design layouts for a 22nm technology node SiNW-FET MRAM cell and show that the area overhead of the proposed architecture compared to FinFET based cells is not significant.

The rest of the paper is organized as follows: Section II briefly reviews the basic concepts of MTJs, SiNW-FETs, and write asymmetry, and discusses related work. Section III provides simulation and analysis of an ambipolar transistor-based cell. The structure and performance of the individual SiNW-FET based MLC STT-MRAM cell are elaborated in Section IV. Memory array design, operation mechanisms, and layout, as well as simulation results verifying the array performance, are presented in Section V. Section VI concludes with future work and the security motivations for developing symmetric write STT-MRAMs.

# II. BACKGROUND

In this section, we briefly review the device technologies (and representative terminology), the problem that we aim to solve, and the related work on the topic.

# A. Magnetic Tunnel Junctions

A magnetic tunnel junction (MTJ) consists of two ferromagnetic layers separated by a thin insulator. One ferromagnetic layer is magnetically pinned to a certain direction, and is thus referred to as the pinned-layer (PL). The other layer is referred to as the free-layer (FL). The FL state can be altered by means of an external field or a spin-polarized current. When the PL and FL are aligned in a parallel (P) state, the structure has a low resistance and is assumed to represent logic '0'. When the FL and PL are anti-parallel (AP), the resistance of the structure is higher, and can thus represent logic '1'. When a current passes through the device, the magnetic polarization of the MTJ's free layer can change via STT if the current density in the free-layer is higher than a certain value $(J_c)$ [2]. MTJ devices can be comprised of FLs and PLs that have either in-plane magnetic anisotropy (IMA) or perpendicular magnetic anisotropy (PMA) (see Figure 1). While both structures have been experimentally demonstrated [11], we use PMA device models, as PMA devices typically offer lower switching currents/delays when compared to devices with in-plane magnetic anisotropy and are thus better candidates for future MRAMs [11].

# B. Silicon Nano-Wire FETs

Ambipolar behavior in field-effect transistors can be achieved using Schottky drain and source contacts. By adjusting the barrier height by means of an electrostatic field, the type of carriers in the channel can be altered. Polarity-controllable Schottky-Barrier FETs (SBFETs) have been fabricated successfully using silicon [9] and carbon nano-tubes [12] based on this principle. SiNW-FETs, as shown in Figure 2a, consist of a number of vertically-stacked nano-wires as the channel, covered by Gate-All-Around (GAA) electrodes [9]. The GAAs provide solid control of charge transport through the nanowires, rendering an ultra-low leakage FET. The device's 3D structure is an evolution of a FinFET and is therefore fully compatible with FinFET fabrication processes. Multiple stacks of nanowires (or "fins" in the context of FinFETs) can be added in parallel to increase drive current. As shown in Figure 2b, the polarity gate contact (PG) decides whether the FET behaves as p-type or n-type (NMOS/n-type and PMOS/p-type mode in this context). The control gate (CG) controls the current through the channel, as is the case with a conventional MOSFET gate contact.

Fig. 2: 3D sketch of the SiNW-FET featuring 2 independent gates and its associated symbol (3D sketch is not drawn to scale).

Fig. 3: (a) SLC STT-MRAM cell. (b) Write asymmetry.

# C. STT-MRAM Cell

A 1-transistor, 1-MTJ (1T-1MTJ) cell is depicted in Figure 3a. Current can be driven through the MTJ in both directions by applying complementary voltages to the bit-line (BL) and source-line (SL), and asserting a *write pulse* on the write-line (WL), which is connected to the NMOS gate pin [13]. The write pulse must be sufficiently long such that the "slowest cell" will still switch with high probability. During a *read*, the resistance of the STT-MRAM cell is compared to a reference value using a sense-amplifier to determine the state of the device and produce a logic level voltage output [14]. Considerable work has been done to optimize 1T-1MTJ cells at both the circuit and device levels [15].

# D. Write Asymmetry and Related Work

As shown in Figure 3b, a long-standing challenge with 1T-1MTJ cells is asymmetric drive current, since the NMOS device has to drive current through the MTJ in both directions. When writing a logic '0' (a "weak write"), the NMOS device conducts with a smaller $V_{gs}$ and thereby delivers a smaller current. Furthermore, the charge-spin interaction is fundamentally less efficient during $P \rightarrow AP$ switching than during an $AP \rightarrow P$ transition [15]. This can aggravate or mitigate the NMOS drive asymmetry depending on whether the FL is connected to the BL or to the NMOS drain pin (the latter case is referred to as a reversely connected MTJ; see [13]). Still, as [16] indicates, the asymmetric drive power of the access transistor contributes a large delay and power overhead. Moreover, the write pulse has to be widened for the weak write to complete, causing excess power loss in cells that have already switched to the desired state.

The authors of [16] tried to overcome the limitation by exerting a negative voltage on the BL for the weak write to maintain the  $V_{gs}$  and driving power of the NMOS device during the slow write operation. Other proposed solutions include fabricating reversely connected MTJs [13].

However, although the existing methods can improve the weak write operation, they degrade the strong write operation. It is more desirable to maintain strong write performance and *improve* weak write performance.

# III. AMBIPOLAR FET BASED MRAM CELL

In this work, we study how an SBFET-based cell could impact the performance of weak/strong writes. We leverage the ambipolarity provided by SBFETs to reduce the asymmetry during the slow write while keeping the transistor count at one. If the SBFET can switch to PMOS mode during the slow write, the source degeneration issue can be eased. However, p-type devices generally exhibit inferior performance compared to n-type devices. To study this effect in depth, in this section we analyze the phenomenon and introduce the simulation framework.

# A. Simulation Framework

In our simulations, we use a SPICE model of an MTJ from [17], [18]. This model is based on the underlying physical equations of spin-charge interaction and has been used in related work on write asymmetry [16]. The current through the device, which exerts the spin-transfer torque on the free layer, is modeled as an effective field in the Landau-Lifshitz-Gilbert equation. Non-Equilibrium Green Functions (NEGF) are used to calculate the resistance characteristics of the device. The model is described in Verilog-A format and the parameters listed in Table I have been used for the MTJ. As for the access transistor, 45nm Bulk-CMOS and 22nm FinFET PTM models have been used as fixed-polarity FETs [19], and a SiNW-FET model is extracted from TCAD simulations verified by experimental device data and calibrated to a 22nm process node (22nm gate areas). The 45nm Bulk-CMOS is used because most reported experimental MRAM prototypes utilize this technology, and FinFETs are used since they are the leading technology for sub-45nm integration, offering high drive current and low leakage in deeply scaled geometries. Fins of 15nm width and 30nm height have been used for FinFET access devices. The SiNW-FET can be fabricated with stacks of 4 or 6 nanowires, and we use 6-nanowire stacks for their higher drive current. The performance of FinFET 22nm low-standby-power (LSTP) technology is comparable to that of our 6N stack SiNW-FET model and is thus used as the main reference in the simulations. 45nm Bulk-CMOS technology is used in some of the analysis as a reference to current MRAM technology.

TABLE I: MTJ Model Parameters Used for Simulations

Fig. 4: Weak write operation. (a) NMOS device. (b) PMOS device.

# B. NMOS/PMOS Mode Weak Write Analysis

P-type and n-type access devices (or modes) performing a weak write are depicted in Figure 4. Current passing through the resistive memory element raises the source pin voltage of the n-type access transistor, as seen in Figure 4a, and reduces the $V_{gs}$ of the access device, thereby reducing its drive current. If a p-type device is used, as depicted in Figure 4b, it experiences a full swing $V_{gs}$. However, the drive current produced by a p-type device is lower than that of an n-type device due to the difference in electron and hole mobility. Thus, the conditions under which a p-type device can deliver more drive current (which is the main metric for a stronger write) than an n-type device can be formalized as follows:

$$
\begin{aligned}
V_{gsn} &= V_g - V_s = V_g - R \times I_c \\
V_{gsp} &= V_{DD} \\
I_{cp}(V_{gsp}) &\ge I_{cn}(V_{gsn})
\end{aligned} \tag{1}
$$

Although the drive current maintains a near-square relation with $V_{gs}$ throughout the "ON" region of device operation, the drive current in deep sub-micron technologies is a complicated function of various parameters including $V_{gs}$. To find the range for which the p-type device prevails, HSPICE simulation results of the drive current delivered by minimum-size p-type and n-type devices in the FinFET and SiNW-FET technologies are plotted against the resistance of the memory element in Figure 5. As soon as the resistance ($R$) rises above $1k\Omega$, the drive current of the source-degenerated n-type device falls below that of the p-type device. Since typical MTJ resistance levels are above this value, an SBFET in p-type mode will perform better in weak write operations.

Fig. 5: Drive current vs. source resistance in a 1T-1R cell during weak write for n-type and p-type minimum width devices at 1V supply voltage.
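The trend behind condition (1) can be illustrated with a toy square-law model. This is only a sketch: the threshold voltage `VT` and the factors `K_N` and `K_P` are assumed values chosen for illustration, not parameters of the PTM or SiNW-FET models, and the square law itself is a simplification for deep sub-micron devices. The n-type current is found by solving the self-consistent equation $I = K_N (V_{DD} - R I - V_T)^2$ by bisection.

```python
# Toy square-law comparison of weak-write drive current (illustrative only).
# All parameter values are assumptions, not taken from the paper's device models.

VDD = 1.0      # supply voltage (V), matching the 1 V condition of Figure 5
VT = 0.3       # assumed threshold-voltage magnitude (V)
K_N = 200e-6   # assumed n-type current factor (A/V^2)
K_P = 100e-6   # assumed p-type current factor, lower due to hole mobility


def n_type_current(r_ohm, iters=60):
    """Solve I = K_N * (VDD - I*R - VT)^2 by bisection (source-degenerated n-type)."""
    lo, hi = 0.0, K_N * (VDD - VT) ** 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        overdrive = max(VDD - mid * r_ohm - VT, 0.0)
        if K_N * overdrive ** 2 > mid:
            lo = mid   # device could supply more than mid: root is higher
        else:
            hi = mid   # device cannot sustain mid: root is lower
    return 0.5 * (lo + hi)


def p_type_current(r_ohm):
    """Full-swing V_gs; the current is also capped by the series resistance."""
    i_sat = K_P * (VDD - VT) ** 2
    return min(i_sat, VDD / r_ohm) if r_ohm > 0 else i_sat


for r in (0.0, 1e3, 5e3, 10e3):
    print(f"R = {r:7.0f} ohm  I_n = {n_type_current(r) * 1e6:6.1f} uA  "
          f"I_p = {p_type_current(r) * 1e6:6.1f} uA")
```

With these assumed values the n-type device wins at zero resistance and loses once the resistance reaches several kΩ. The exact crossover depends entirely on the assumed parameters; the $1k\Omega$ figure in the text comes from the HSPICE device models, not from this sketch.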

Despite the advantage of p-type transistors (or p-type mode) for the weak write, it is difficult to implement both types of devices in a memory array row. As the WL signal is shared among the access transistor gate pins, it is impossible to turn both n-type and p-type devices "ON" in the same cycle. A high voltage on the WL will have the n-type devices conducting while the p-type devices are cut off, and vice versa. In the next section we describe how the MLC double-cycle access mechanism can take advantage of the ambipolar devices.

# IV. MLC SINW-FET STT-MRAM CELL DESIGN

# A. MLC STT-MRAM

MLC STT-MRAM has recently been proposed as a technique to increase STT-MRAM density [5]. The cell is constructed by adding one more MTJ in either series or parallel with the original MTJ. The series MLC depicted in Figure 6a has been pursued due to its higher reliability and lower sensitivity to process variation [20] and is thus used in this paper. Two MTJ devices of different cross-sectional areas are physically fabricated on top of each other, resulting in an element that can have up to four separate resistance states. As STT switching depends on the current density in the free-layer, the MTJ with the larger free-layer volume requires a higher current to switch and is therefore referred to as the hard-bit [6]. The smaller MTJ has a higher resistance but can switch at lower currents and is referred to as the soft-bit. It is worth noting that drive currents below the critical switching current $(I_c)$ of the larger MTJ can still eventually switch it due to the thermal activation process [15]; however, this requires a much longer time. Therefore, terminating the WL pulse before the large MTJ switches should allow the soft-bit to switch state without disturbing the hard-bit.

As shown in Figure 6a, for the write operation, the soft bit can be changed by injecting small currents in different directions. However, changes to the hard-bit will alter the state of the soft-bit, demanding a two-step writing mechanism to enable transitioning arbitrarily between any two of the available four states. For the read operation, the resistance of the cell is compared to three resistance references to determine the soft-bit and hard-bit [5]. Although the MLC STT-RAM structure improves overall storage density, higher read/write time/energy as well as increased read/write error rates degrade system level performance [6]. As the MLC structure adds another MTJ in series to the cell, it suffers, like the SLC, from degraded write performance in the weak write. In fact, the added resistance of the second MTJ aggravates the source degeneration effect and asymmetry, further increasing the minimum allowed size for the access device. Therefore, MLCs can greatly benefit from improvements in weak write performance.

Fig. 6: (a) Serial MLC STT-MRAM cell. (b) Proposed SiNW-FET MRAM state-diagram (note that state '01' cannot be reached).

Fig. 7: P $\rightarrow$ AP switching energy and maximum MTJ voltage vs. supply voltage for BL/SL
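The three-reference read described above can be sketched as follows. The resistance values are hypothetical placeholders (the large MTJ is simply given half the resistance of the small one, and AP is taken as twice P), and the state label is assumed to be ordered as (soft bit, hard bit); neither assumption comes from the device model used in this paper.

```python
# Sketch of a three-reference MLC read (hypothetical resistance values, in ohms).
SOFT = {"0": 2000.0, "1": 4000.0}   # small MTJ: P / AP (assumed values)
HARD = {"0": 1000.0, "1": 2000.0}   # large MTJ: P / AP (assumed values)

# Series cell resistance per state; label order (soft bit, hard bit) is assumed.
STATES = {s + h: SOFT[s] + HARD[h] for s in "01" for h in "01"}


def references(states):
    """Place one reference midway between each pair of adjacent resistance levels."""
    levels = sorted(states.values())
    return [(a + b) / 2 for a, b in zip(levels, levels[1:])]


def sense(r_cell, states=STATES):
    """Return the state whose resistance band contains r_cell."""
    refs = references(states)
    index = sum(r_cell > ref for ref in refs)   # number of references exceeded
    ordered = sorted(states, key=states.get)    # states by increasing resistance
    return ordered[index]


print(references(STATES))
print(sense(4900.0))
```

In this toy assignment the '01' and '10' levels sit next to each other in the middle of the resistance range, which is where the sensing margin is tightest.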

# B. SiNW-FET based MLC STT-MRAM

In this section, we present an MLC STT-MRAM cell that utilizes SBFETs and we compare the performance of a single cell against counterpart FET technologies. As the MLC double-MTJ structure has a higher resistance, higher voltages may be required on the BL and SL to allow both MTJs to switch with satisfactory switching times. Because the voltage across an MLC is divided between the two MTJs and the access device, BL/SL voltages can be boosted while keeping the oxide voltage of the individual MTJs in a safe range (large voltages across the MTJ can cause oxide breakdown). Figure 7 shows a plot of switching time/energy and maximum voltage across the smaller MTJ versus BL/SL voltage, demonstrating the trade-off between switching energy (delay follows the same trend as energy) and oxide reliability. A 1.4V BL/SL voltage is chosen so that all three technologies are able to complete the P $\rightarrow$ AP switching for the purpose of our comparisons while $V_{MTJ}$ stays below 0.8V. The MLC structure is simulated by connecting two MTJs in series. The cross-sectional dimension parameters (cylinder radius) of the large MTJ are set to 1.41 times those of the small MTJ. Other parameters are set to the values listed in Table I for both MTJs. The 1.41 ratio in dimensions leads to an area ratio of 2X, giving the large MTJ an $I_c$ twice that of the smaller MTJ.

Fig. 8:  $P \rightarrow AP$  switching performance. (a) Switching time. (b) Drive current.
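The factor of two in $I_c$ follows directly from the cylinder geometry, assuming both MTJs share the same critical current density $J_c$ (which is consistent with all other parameters being set equal). The subscripts "large" and "small" are introduced here only for illustration:

$$
\frac{I_{c,\text{large}}}{I_{c,\text{small}}} = \frac{J_c \, \pi r_{\text{large}}^2}{J_c \, \pi r_{\text{small}}^2} = \left(\frac{r_{\text{large}}}{r_{\text{small}}}\right)^2 = 1.41^2 \approx 1.99 \approx 2
$$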

Transient simulations are carried out by connecting the BL/SL to high/low voltage (1.4V/0V) and asserting a WL pulse. The PG is biased according to whether p-type or n-type mode is required. Switching time, which is also the energy/current measurement window, is defined as the time period between the start of the WL pulse and the point at which the magnetization of the larger (slower) MTJ reaches 90% of the intended value. Switching time and average drive current delivered during the switching of MLCs for the $P \rightarrow AP$ and $AP \rightarrow P$ transitions are presented in Figures 8 and 9. For the weak write operation ($P \rightarrow AP$, see Figure 8), an average decrease in delay of 49.7% and 82.7% is observed for the SiNW-FET in p-type mode compared to 22nm FinFET and 45nm Bulk-CMOS n-type devices, respectively. There is also an average increase in weak write drive current of 51.3% when compared to 22nm FinFET technology, all while strong write performance is maintained, as seen in Figure 9. To demonstrate the advantage of letting the p-type mode handle weak writes, the SiNW-FET itself has also been used in n-type mode, as seen in Figure 8a, and a switching time degradation penalty of 309.3% is visible. A summary of cell switching performance for devices with 2 fins/stacks/minimum-width can be viewed in Table II.

# V. SINW-FET MLC-STT-MRAM ARRAY DESIGN

As the performance of the individual cell of the SiNW-FET based MLC STT-MRAM was illustrated in the previous section, this section will cover the design of the memory array.



Fig. 9: AP $\rightarrow$ P switching performance. (a) Switching time. (b) Drive current.



Fig. 10: Two-step write procedure. (a) Step 1. (b) Step 2.

# A. State Restricted Array Operation

A two-step write operation of the proposed memory array is depicted in Figure 10, where the polarity-gates of the SiNW-FETs are connected to their respective BLs. As we will explain shortly, connecting the PG to the BL provides the appropriate biasing that the PG requires for our proposed writing mechanism. Figure 11 depicts an approximate SiNW-FET based MRAM cell layout that shows insignificant area overhead compared to the FinFET MRAM cell, mainly due to the self-aligned formation of the polarity and control gate areas (see [9] for photos of the fabricated SiNW-FETs and fabrication procedures demonstrating the self-aligned gates).

Fig. 11: Layout of the proposed cells with 3 fingers. (a) FinFET-Based MRAM cell. (b) Proposed SiNW-FET-based MRAM cell.

| Metric | CMOS 45nm, AP→P | CMOS 45nm, P→AP | FinFET 22nm, AP→P | FinFET 22nm, P→AP | SiNW-FET (polarity switching), AP→P | SiNW-FET (polarity switching), P→AP | SiNW-FET (n-mode only), AP→P | SiNW-FET (n-mode only), P→AP |
|-----------------|--------|--------|--------|--------|--------|--------|--------|--------|
| Transistor Area | 90×45nm<sup>2</sup> | 90×45nm<sup>2</sup> | 2 fins | 2 fins | 2 fins | 2 fins | 2 fins | 2 fins |
| Switching Time  | 2.83ns | 37.4ns | 2.5ns  | 14.4ns | 2.24ns | 6.47ns | 2.24ns | 31.7ns |
| WL Pulse Width  | 37.4ns | 37.4ns | 14.4ns | 14.4ns | 6.47ns | 6.47ns | 31.7ns | 31.7ns |
| Write Energy    | 0.46pJ | 2.03pJ | 0.45pJ | 1.41pJ | 0.44pJ | 0.88pJ | 0.44pJ | 2.6pJ  |
| Average Current | 119uA  | 57.3uA | 133uA  | 69.8uA | 146uA  | 98uA   | 146uA  | 57.3uA |

TABLE II: MLC STT-RAM single cell write comparison using different technologies and strategies

As for the writing procedure with this cell layout, the write operation completes in two steps. In the first step, as seen in Figure 10a, all BLs are charged to a high voltage, placing the access FETs in NMOS mode, while the SLs are grounded. The WL of the active row is asserted high while the WLs of the inactive rows are held low. This switches all MLCs in the active row to the parallel-parallel state '00' (PP) through a strong write operation by n-type mode devices, while inactive rows are not disturbed. In the second step, as depicted in Figure 10b, the SL/BL configuration depends on the state intended for each cell: 1) The BL and SL of cells intended to keep the already achieved '00' state are disconnected using tri-state BL and SL drivers. 2) The SLs of cells that aim for the '11' state are charged to the full supply voltage while their BLs are grounded. 3) The SLs of cells intended for the '10' state are driven with a lower voltage (0.5V) while their BLs are grounded. As all BLs are grounded, all devices operate in p-type mode. Therefore, a low WL pulse on the active row, with the WLs of the inactive rows held high, switches the cells with full-swing SL voltages to the '11' state while only switching the soft bit of the '10'-intended cells, all through weak write operations performed by p-type mode devices.
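The two-step procedure can be summarized as a small lookup of line settings per target state. This is a sketch only: the function name `plan_write`, the dictionary layout, and the use of 1.4V as the full write voltage (borrowed from the single-cell simulations) are assumptions for illustration, not part of the proposed hardware.

```python
# Sketch of the two-step, state-restricted write for one cell in the active row.
VDD = 1.4        # assumed full BL/SL write voltage (V), as in the cell simulations
V_SOFT = 0.5     # reduced SL voltage that switches only the soft bit (V)
FLOAT = None     # line left disconnected by a tri-state driver


def plan_write(target):
    """Return the [step 1, step 2] line settings needed to reach `target`."""
    if target not in ("00", "10", "11"):
        raise ValueError(f"state {target!r} is not reachable in this scheme")

    # Step 1: BL high puts the FET in n-type mode; a strong write resets to '00'.
    step1 = {"BL": VDD, "SL": 0.0, "WL": "high", "mode": "n-type"}

    # Step 2: the active-row WL is pulsed low; BL/SL depend on the target state.
    if target == "00":
        # Both lines floated, so the cell is not driven and keeps '00'.
        step2 = {"BL": FLOAT, "SL": FLOAT, "WL": "low", "mode": None}
    elif target == "11":
        step2 = {"BL": 0.0, "SL": VDD, "WL": "low", "mode": "p-type"}
    else:  # target == "10": lower SL voltage switches only the soft bit
        step2 = {"BL": 0.0, "SL": V_SOFT, "WL": "low", "mode": "p-type"}
    return [step1, step2]


for state in ("00", "10", "11"):
    print(state, plan_write(state)[1])
```

Requesting '01' raises an error in this sketch, mirroring the fact that no combination of the two steps reaches that state.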

The state diagram of the proposed architecture is depicted in Figure 6b. It can be seen that the proposed writing mechanism leaves the '01' state unreachable. However, as presented by Wen et al. in [7], the overall resistance distributions of the '01' and '10' states are close to one another, and thus having both states results in high bit-error rates requiring large Error-Correction Code (ECC) units. The authors of [7] proposed the State-Restricted MLC (SR MLC), which omits one of the '01' and '10' states and utilizes ternary coding. They showed that system level performance and reliability improve even though 25% of the available capacity is discarded. Our array structure implements the SR MLC STT-MRAM automatically and with high performance.
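To make the ternary idea concrete, the sketch below stores an integer in base 3 across several three-state cells. The trit-to-state assignment and the function names are hypothetical; the actual coding used in [7] may differ. With three usable states, each cell carries $\log_2 3 \approx 1.585$ bits.

```python
import math

# Hypothetical mapping of ternary digits onto the three reachable states.
TRIT_TO_STATE = {0: "00", 1: "10", 2: "11"}
STATE_TO_TRIT = {state: trit for trit, state in TRIT_TO_STATE.items()}


def encode(value, n_cells):
    """Encode an integer as n_cells states, least-significant trit first."""
    if not 0 <= value < 3 ** n_cells:
        raise ValueError("value does not fit in the given number of cells")
    states = []
    for _ in range(n_cells):
        value, trit = divmod(value, 3)
        states.append(TRIT_TO_STATE[trit])
    return states


def decode(states):
    """Inverse of encode: rebuild the integer from the stored states."""
    value = 0
    for state in reversed(states):
        value = value * 3 + STATE_TO_TRIT[state]
    return value


cells = encode(200, 5)            # 5 cells hold values 0..242
print(cells, decode(cells))
print(f"bits per cell: {math.log2(3):.3f}")
```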

# B. Proposed Array Evaluation

To verify the performance of the proposed array when the PG is connected to the BL, the NVSIM tool is used [21]. Terminal capacitance values are extracted from HSPICE simulations for CMOS and FinFET technologies based on the PTM models. The capacitances of the SiNW-FET CG and PG terminals are derived from TCAD simulations. Extracted capacitance values are integrated into the BL, SL and WL derivations in NVSIM, taking into account the capacitance that the PG adds to the BL for the SiNW-FET MRAM. The tool has also been modified to calculate timing and power information for the State-Restricted MLC STT-MRAM based on the proposed double-step writing mechanism. Power is calculated by averaging the power consumption of the different write scenarios explained previously. The leakage power calculation of inactive rows has also been added to the software using data collected from HSPICE leakage simulations. The peripheral circuitry, which includes row/column decoders and sense-amplifier banks, is implemented in the CMOS 22nm process node already available in NVSIM for all of the different cell types, to provide a fair comparison of the effect of the cell structures on array performance. The data for different array sizes and transistor widths is presented in Table III and Figures 12, 13, 14, and 15.

As for the data presented in Table III, on average, the overall write time (a complete write cycle) for different array sizes across a range of access device fins/stacks/widths is decreased by 37% compared to FinFET 22nm and by 55.4% compared to 45nm Bulk-CMOS. The increased BL capacitance only affects the read operation, by a small 1.62% increase in delay compared to FinFET 22nm and 1.43% compared to CMOS 45nm at widths of 2x and 3x, which are the optimal widths for performance/area. The read delay increases for larger array sizes, yet the average read delay penalty is 14.2% and 12% compared to FinFET 22nm and 45nm Bulk-CMOS, respectively. At the same driving power, SiNW-FET leakage is lower than FinFET leakage due to the GAA's control over the channel, and an average reduction of 8.9% in leakage across inactive rows is achieved using 6N SiNW-FETs compared to FinFET 22nm LSTP transistors with 15nm fin height. The Bulk-CMOS leakage power is orders of magnitude higher than that of FinFET and SiNW-FET and dominates the leakage power of the memory as the size of the array increases, while for FinFET and SiNW-FET technologies the leakage is dominated by the leakage power of the peripheral circuitry rather than that of the inactive rows.
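The average write-latency reductions quoted above can be recomputed from the write-latency columns of Table III. The sketch assumes that the average is taken as the mean of the per-configuration relative reductions over all 20 size/width combinations; the helper name `mean_reduction` is introduced here for illustration.

```python
# Write latency (ns) from Table III, in row order (64x64 ... 512x512; 2,3,4,6,8 fins).
sinw = [8.864, 7.130, 6.288, 5.403, 4.954,
        8.901, 7.182, 6.355, 5.497, 5.079,
        9.050, 7.369, 6.580, 5.796, 5.448,
        9.460, 7.893, 7.215, 6.647, 6.569]
finfet = [17.034, 11.885, 10.044, 8.487, 7.758,
          17.062, 11.917, 10.080, 8.538, 7.824,
          17.125, 11.992, 10.176, 8.675, 8.003,
          17.338, 12.223, 10.474, 9.110, 8.576]
cmos = [37.635, 17.646, 13.556, 10.689, 9.551,
        37.664, 17.679, 13.593, 10.743, 9.620,
        37.726, 17.774, 13.700, 10.888, 9.811,
        37.979, 18.071, 14.044, 11.350, 10.420]


def mean_reduction(new, ref):
    """Mean of per-configuration relative latency reductions, in percent."""
    return 100 * sum(1 - n / r for n, r in zip(new, ref)) / len(new)


print(f"vs FinFET 22nm:    {mean_reduction(sinw, finfet):.1f}%")
print(f"vs Bulk-CMOS 45nm: {mean_reduction(sinw, cmos):.1f}%")
```

Under this averaging assumption the results land close to the 37% and 55.4% figures reported in the text.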

TABLE III: NVSIM array simulation results

| Rows | Columns | Fins | Write latency (ns), SiNW | Write latency (ns), FinFET | Write latency (ns), CMOS | Read latency (ns), SiNW | Read latency (ns), FinFET | Read latency (ns), CMOS | Write energy (pJ), SiNW | Write energy (pJ), FinFET | Write energy (pJ), CMOS | Read energy (pJ), SiNW | Read energy (pJ), FinFET | Read energy (pJ), CMOS | Leakage (uW), SiNW | Leakage (uW), FinFET | Leakage (uW), CMOS |
|------|---------|------|-------------|--------|--------|-------|--------|------------|---------|---------|---------|--------|-------------|--------|--------|--------|---------|
| 64   | 64      | 2    | 8.864       | 17.034 | 37.635 | 1.616 | 1.593  | 1.595      | 35.015  | 55.435  | 96.914  | 7.715  | 7.701       | 7.698  | 4.862  | 4.908  | 38.690  |
| 64   | 64      | 3    | 7.130       | 11.885 | 17.646 | 1.630 | 1.604  | 1.607      | 35.867  | 44.403  | 57.212  | 7.726  | 7.710       | 7.710  | 4.883  | 4.957  | 55.630  |
| 64   | 64      | 4    | 6.288       | 10.044 | 13.556 | 1.643 | 1.613  | 1.616      | 35.024  | 40.837  | 48.516  | 7.736  | 7.718       | 7.718  | 4.899  | 5.005  | 72.569  |
| 64   | 64      | 6    | 5.403       | 8.487  | 10.689 | 1.669 | 1.627  | 1.630      | 34.320  | 37.967  | 43.023  | 7.757  | 7.729       | 7.730  | 4.941  | 5.097  | 106.441 |
| 64   | 64      | 8    | 4.954       | 7.758  | 9.551  | 1.694 | 1.638  | 1.643      | 34.412  | 36.418  | 41.090  | 7.774  | 7.738       | 7.741  | 4.973  | 5.184  | 140.310 |
| 128  | 128     | 2    | 8.901       | 17.062 | 37.664 | 1.660 | 1.622  | 1.627      | 71.917  | 111.135 | 194.208 | 15.415 | 15.393      | 15.388 | 9.798  | 10.014 | 146.215 |
| 128  | 128     | 3    | 7.182       | 11.917 | 17.679 | 1.694 | 1.638  | 1.644      | 74.535  | 89.171  | 115.347 | 15.438 | 15.408      | 15.408 | 9.883  | 10.197 | 214.494 |
| 128  | 128     | 4    | 6.355       | 10.080 | 13.593 | 1.725 | 1.652  | 1.659      | 73.763  | 82.138  | 98.239  | 15.457 | 15.420      | 15.421 | 9.947  | 10.373 | 282.769 |
| 128  | 128     | 6    | 5.497       | 8.538  | 10.743 | 1.787 | 1.681  | 1.692      | 74.180  | 76.604  | 87.829  | 15.494 | 15.442      | 15.445 | 10.077 | 10.746 | 419.340 |
| 128  | 128     | 8    | 5.079       | 7.824  | 9.620  | 1.850 | 1.708  | 1.723      | 76.194  | 73.710  | 84.540  | 15.530 | 15.460      | 15.465 | 10.206 | 11.098 | 555.891 |
| 256  | 256     | 2    | 9.050       | 17.125 | 37.726 | 1.790 | 1.681  | 1.694      | 151.410 | 223.345 | 389.937 | 30.809 | 30.764      | 30.747 | 19.895 | 20.751 | 567.687 |
| 256  | 256     | 3    | 7.369       | 11.992 | 17.774 | 1.880 | 1.719  | 1.739      | 160.302 | 179.829 | 234.409 | 30.850 | 30.794      | 30.793 | 20.155 | 21.499 | 841.905 |
| 256  | 256     | 4    | 6.580       | 10.176 | 13.700 | 1.970 | 1.756  | 1.782      | 162.419 | 166.173 | 201.346 | 30.890 | 30.817      | 30.820 | 20.414 | 22.207 | 1.116mW |
| 256  | 256     | 6    | 5.796       | 8.675  | 10.888 | 2.150 | 1.829  | 1.868      | 170.575 | 155.920 | 182.828 | 30.974 | 30.857      | 30.867 | 20.932 | 23.621 | 1.664mW |
| 256  | 256     | 8    | 5.448       | 8.003  | 9.811  | 2.330 | 1.902  | 1.954      | 181.928 | 150.955 | 178.555 | 31.061 | 30.895      | 30.912 | 21.451 | 25.036 | 2.213mW |
| 512  | 512     | 2    | 9.460       | 17.338 | 37.979 | 2.220 | 1.873  | 1.909      | 333.164 | 451.035 | 786.049 | 61.578 | 61.487      | 61.471 | 40.829 | 44.421 | 2.236mW |
| 512  | 512     | 3    | 7.893       | 12.223 | 18.071 | 2.510 | 1.974  | 2.044      | 365.594 | 365.640 | 483.713 | 61.674 | 61.543      | 61.545 | 41.867 | 47.256 | 3.335mW |
| 512  | 512     | 4    | 7.215       | 10.474 | 14.044 | 2.799 | 2.086  | 2.179      | 384.473 | 339.971 | 422.201 | 61.778 | 61.593      | 61.607 | 42.906 | 50.091 | 4.434mW |
| 512  | 512     | 6    | 6.647       | 9.110  | 11.350 | 3.377 | 2.310  | 2.450      | 430.075 | 322.750 | 394.391 | 62.018 | 61.681      | 61.725 | 44.984 | 55.761 | 6.632mW |
| 512  | 512     | 8    | 6.569       | 8.576  | 10.420 | 3.953 | 2.534  | 2.720      | 482.070 | 316.105 | 395.074 | 62.301 | 61.769      | 61.849 | 47.061 | 61.431 | 8.830mW |
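
The relative gaps between the grouped values in a table row can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that columns 4 to 6 of a row report the same metric for three different technologies (the column headers fall outside this excerpt, so that reading is an assumption), and the helper name `relative_reduction` is hypothetical. The input numbers are copied from the last row of the table.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the fractional reduction of `improved` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Columns 4-6 of the 512 x 512 row whose third column is 8.
# ASSUMPTION: these three values are the same metric for three technologies.
first, second, third = 6.569, 8.576, 10.420

# Reduction of the first value relative to each of the other two.
red_vs_second = relative_reduction(second, first)
red_vs_third = relative_reduction(third, first)

print(f"first vs. second: {red_vs_second:.1%}")
print(f"first vs. third:  {red_vs_third:.1%}")
```

Applying the same helper to every row and averaging the results would give a per-table summary figure, provided the column interpretation above matches the table header.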



Fig. 12: Write latency for arrays of different sizes in different technologies.

# VI. CONCLUSION AND SECURITY DISCUSSION

A memory array structure based on controllable-polarity access transistors was presented that provides built-in reliability for MLC STT-MRAM and reduces the average write latency by 37% relative to 22nm FinFET technology. The asymmetry between the different write operations of STT-MRAM is a potential source of power or timing side-channel information leakage, and our future work includes investigating this vulnerability in STT-MRAM and other asymmetric nonvolatile memories. The remaining asymmetry in the write scenarios can be mitigated by reverse-connecting the MTJ or by fabricating intentionally asymmetric ambipolar FETs. From the implementation perspective, the recent development of fin-based SBFETs alleviates some of the difficulties of fabricating defect-free nanowires and GAA structures, and the fin structure also increases drive current [22]. Under the proposed circuit architecture, SR MLC STT-MRAM based on PMA MTJs, with high density, high reliability, and speeds comparable to those of SLCs, does not seem far from reality.

#### REFERENCES

- [1] "International technology roadmap for semiconductors (itrs)," http://www.itrs.net.
- [2] M. Hosomi, H. Yamagishi, T. Yamamoto, K. Bessho, Y. Higo, K. Yamane, H. Yamada, M. Shoji, H. Hachino, C. Fukumoto *et al.*, "A novel nonvolatile memory with spin torque transfer magnetization switching: Spin-ram," in *Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE International*. IEEE, 2005, pp. 459–462.
- [3] W. Zhao, Y. Zhang, T. Devolder, J.-O. Klein, D. Ravelosona, C. Chappert, and P. Mazoyer, "Failure and reliability analysis of stt-mram," *Microelectronics Reliability*, vol. 52, no. 9, pp. 1848–1852, 2012.
- [4] W. Xu, H. Sun, X. Wang, Y. Chen, and T. Zhang, "Design of last-level on-chip cache using spin-torque transfer ram (stt ram)," *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, vol. 19, no. 3, pp. 483–493, 2011.
- [5] T. Ishigaki, T. Kawahara, R. Takemura, K. Ono, K. Ito, H. Matsuoka, and H. Ohno, "A multi-level-cell spin-transfer torque memory with series-stacked magnetotunnel junctions," in *VLSI Technology (VLSIT), 2010 Symposium on*. IEEE, 2010, pp. 47–48.



Fig. 13: Write energy for arrays of different sizes in different technologies.



Fig. 14: Read latency for arrays of different sizes in different technologies.



Fig. 15: Read energy for arrays of different sizes in different technologies.

- [6] X. Bi, M. Mao, D. Wang, and H. Li, "Unleashing the potential of mlc stt-ram caches," in *Proceedings of the International Conference on Computer-Aided Design*. IEEE Press, 2013, pp. 429–436.
- [7] W. Wen, Y. Zhang, M. Mao, and Y. Chen, "State-restrict mlc stt-ram designs for high-reliable high-performance memory system," in *Proceedings of the 51st Annual Design Automation Conference*, ser. DAC '14, 2014.
- [8] Y. Bi, P.-E. Gaillardon, X. Hu, M. Niemier, J.-S. Yuan, and Y. Jin, "Leveraging emerging technology for hardware security - case study on silicon nanowire fets and graphene symfets," in *Asia Test Symposium (ATS)*, 2014, pp. 342–347.
- [9] M. De Marchi, D. Sacchetto, S. Frache, J. Zhang, P. Gaillardon, Y. Leblebici, and G. De Micheli, "Polarity control in double-gate, gate-all-around vertically stacked silicon nanowire fets," in *Electron Devices Meeting (IEDM), 2012 IEEE International*, Dec 2012, pp. 8.4.1–8.4.4.
- [10] R. Martel, V. Derycke, C. Lavoie, J. Appenzeller, K. K. Chan, J. Tersoff, and P. Avouris, "Ambipolar electrical transport in semiconducting single-wall carbon nanotubes," *Phys. Rev. Lett.*, vol. 87, 2001.
- [11] S. Ikeda, K. Miura, H. Yamamoto, K. Mizunuma, H. Gan, M. Endo, S. Kanai, J. Hayakawa, F. Matsukura, and H. Ohno, "A perpendicular-anisotropy cofeb-mgo magnetic tunnel junction," *Nature materials*, vol. 9, no. 9, pp. 721–724, 2010.
- [12] Y.-M. Lin, J. Appenzeller, J. Knoch, and P. Avouris, "High-performance carbon nanotube field-effect transistor with tunable polarities," *Nanotechnology, IEEE Transactions on*, vol. 4, no. 5, pp. 481–489, 2005.
- [13] C. Lin, S. Kang, Y. Wang, K. Lee, X. Zhu, W. Chen, X. Li, W. Hsu, Y. Kao, M. Liu *et al.*, "45nm low power cmos logic compatible embedded stt mram utilizing a reverse-connection 1t/1mtj cell," in *Electron Devices Meeting (IEDM), 2009 IEEE International*. IEEE, 2009, pp. 1–4.
- [14] J. P. Kim, T. Kim, W. Hao, H. M. Rao, K. Lee, X. Zhu, X. Li, W. Hsu, S. H. Kang, N. Matt *et al.*, "A 45nm 1mb embedded stt-mram with design techniques to minimize read-disturbance," in *VLSI Circuits (VLSIC), 2011 Symposium on*. IEEE, 2011, pp. 296–297.

- [15] X. Fong, S. H. Choday, and K. Roy, "Design and optimization of spin-transfer torque mrams," in *More than Moore Technologies for Next Generation Computer Design*. Springer, 2015, pp. 49–72.
- [16] D. Lee, S. K. Gupta, and K. Roy, "High-performance low-energy stt mram based on balanced write scheme," in *Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design*. ACM, 2012, pp. 9–14.
- [17] X. Fong, S. K. Gupta, N. N. Mojumder, S. H. Choday, C. Augustine, and K. Roy, "Knack: A hybrid spin-charge mixed-mode simulator for evaluating different genres of spin-transfer torque mram bit-cells," in *Simulation of Semiconductor Processes and Devices (SISPAD), 2011 International Conference on.* IEEE, 2011, pp. 51–54.
- [18] G. Panagopoulos, C. Augustine, and K. Roy, "A framework for simulating hybrid mtj/cmos circuits: Atoms to system approach," in *Proceedings of the Conference on Design, Automation and Test in Europe*. EDA Consortium, 2012, pp. 1443–1446.
- [19] "http://ptm.asu.edu/."
- [20] Y. Zhang, L. Zhang, W. Wen, G. Sun, and Y. Chen, "Multi-level cell stt-ram: Is it realistic or just a dream?" in *Computer-Aided Design (ICCAD), 2012 IEEE/ACM International Conference on*, Nov 2012, pp. 526–532.
- [21] X. Dong, C. Xu, Y. Xie, and N. P. Jouppi, "Nvsim: A circuit-level performance, energy, and area model for emerging nonvolatile memory," *Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on*, vol. 31, no. 7, pp. 994–1007, 2012.
- [22] J. Zhang, M. De Marchi, P.-E. Gaillardon, and G. De Micheli, "A schottky-barrier silicon finfet with 6.0 mv/dec subthreshold slope over 5 decades of current," in *Proceedings of the International Electron Devices Meeting (IEDM'14)*, no. EPFL-CONF-201905, 2014.