#### SPEED-INDEPENDENT FUSED MULTIPLY ADD AND SUBTRACT UNIT

Yuri Stepchenkov, Victor Zakharov, Yuri Rogdestvenski, Yuri Diachenko, Nickolaj Morozov and Dmitri Stepchenkov



Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, IPI RAS, Moscow, Russian Federation

#### OUTLINE

- What circuits do we design?
- Block diagram of the Speed-Independent Fused Multiply-Add & Subtract (SI-FMAS) Unit
- Simplified indication
- SI-FMAS implementation
- Testing SI-FMAS
- Conclusions

**EWDTS-2016** 



**IPI FRC CSC RAS** 

#### **ADVANTAGES OF SI CIRCUITS**

- Their workability does not depend on the delays of their cells
- They are free of hazards
- They have an extremely wide workability range in supply voltage and ambient temperature
- They detect stuck-at (constant) faults by stopping operation

#### **SPEED-INDEPENDENT PRINCIPLES**

- Two-phase operation: work phase and spacer (pause) phase
- Each cell can switch at most once during the circuit's transition from spacer to the next work state
- Full indication of all cells in the circuit in each phase
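
Two-phase operation with full indication is commonly built from dual-rail signals and Muller C-elements (the implementation basis of this unit also uses dual-rail signals). The behavioral sketch below assumes an all-zero spacer, a (true rail, false rail) ordering, and hypothetical names; it is not the SI-FMAS circuitry. A per-bit indicator ORs the two rails, and a C-element over all bit indicators switches to 1 only when every bit holds a work code and back to 0 only when every bit has returned to spacer, holding its value during a transient.

```python
from typing import List, Tuple

DualRail = Tuple[int, int]   # (true rail, false rail): assumed rail order
SPACER: DualRail = (0, 0)    # assumed all-zero spacer


def encode(bit: int) -> DualRail:
    """Work-phase code of one bit: exactly one rail is high."""
    return (1, 0) if bit else (0, 1)


def bit_indicator(sig: DualRail) -> int:
    """Per-bit indicator: OR of both rails (1 = work code, 0 = spacer)."""
    return sig[0] | sig[1]


class CElement:
    """Muller C-element: output changes only when all inputs agree, else holds."""

    def __init__(self, out: int = 0) -> None:
        self.out = out

    def update(self, inputs: List[int]) -> int:
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out


def indicate(word: List[DualRail], c: CElement) -> int:
    """Full indication of a dual-rail word: 1 once every bit holds a work code,
    0 once every bit is back in spacer, previous value in between."""
    return c.update([bit_indicator(s) for s in word])


c = CElement()
data = [encode(b) for b in (1, 0, 1)]
print(indicate([SPACER] * 3, c))                # 0: spacer phase indicated
print(indicate([data[0], SPACER, data[2]], c))  # 0: transient not finished, output holds
print(indicate(data, c))                        # 1: work phase indicated
print(indicate([SPACER, data[1], SPACER], c))   # 1: return to spacer not finished, output holds
print(indicate([SPACER] * 3, c))                # 0: spacer indicated again
```

The C-element's hold behavior is what lets the next stage wait for completion without relying on any delay assumption.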

#### **BLOCK DIAGRAM OF SI-FMAS**



#### **BLOCK DIAGRAM OF SI-FMAS CORE**



#### **SIMPLIFIED INDICATION (1)**

#### Why is it possible?

- The first work state that appears at the circuit's outputs during the transition from spacer to work phase is already a stationary state
- A CMOS cell stops switching to the opposite state if the input combination that caused the switching disappears

#### **SIMPLIFIED INDICATION (2)**

How can we optimize indication?

- Full indication in the spacer phase and simplified indication in the work phase
- Bitwise indication in combinational circuits
- Accounting for the bitwise indicators in the input register of the next pipeline stage

#### **SIMPLIFIED INDICATION (3)**



#### **SIMPLIFIED INDICATION (4)**



#### **OPTIMIZED PIPELINE INDICATION**



#### **IMPLEMENTATION BASIS**

- Traditional CMOS circuitry with dual-rail signals everywhere except the multiplier, which uses ternary coding
- Goldstein 65-nm CMOS process with 6 metal layers
- Standard cell library (Dolphin)
- Self-timed cell library IPI65D (108 cells) designed at IPI FRC CSC RAS

#### FEATURES OF SI-FMAS

| Parameter                                    | Synchronous<br>counterpart | SI-FMAS                  |
|----------------------------------------------|----------------------------|--------------------------|
| Die size, mm<sup>2</sup>                     | 0.312                      | 1.12                     |
| Performance, Gflops                          | 2.06                       | 3.15                     |
| Latency, ns                                  | 10.8                       | 1.84                     |
| Die size efficiency, mm<sup>2</sup>/Gflops   | 0.151                      | 0.321                    |
| Workability range in $V_{DD}$                | $V_{DD} \pm 10\%$          | $V_{th}$ to $V_{BD}$     |

#### LAYOUT OF SI-FMAS

- Input FIFO
- Fraction Multiplier
- Exponent calculator
- Adder-Subtractor
- Normalizer
- Round & Postnormalization
- Output FIFO
- Multiplexer
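
The layout blocks suggest the usual decomposition of a fused multiply-add: an exact significand product with an exponent sum, an exact aligned addition or subtraction, and a single normalize-and-round step at the end. The following is a purely software sketch of that arithmetic on binary64 values, not the authors' SI datapath; the function names are hypothetical, the FIFOs and multiplexer are not modeled, and it assumes finite inputs and a normal, in-range result. It implements only round-to-nearest-even and ignores the sign of an exact zero, overflow, underflow, and NaN/infinity handling that IEEE 754 requires.

```python
import math
from typing import Tuple

SIG_BITS = 53                 # binary64 significand width, hidden bit included
Parts = Tuple[int, int, int]  # (sign, exponent of LSB, integer significand)


def unpack(x: float) -> Parts:
    """Split finite x so that |x| == sig * 2**exp, with sig < 2**53."""
    m, e = math.frexp(abs(x))                 # |x| = m * 2**e, 0.5 <= m < 1 (or m == 0)
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    return sign, e - SIG_BITS, int(m * 2**SIG_BITS)  # scaling by 2**53 is exact


def multiply(a: Parts, b: Parts) -> Parts:
    """Fraction multiplier and exponent calculator: exact product, no rounding."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    return sa ^ sb, ea + eb, ma * mb


def add_sub(p: Parts, c: Parts) -> Parts:
    """Adder-subtractor: align both operands to the smaller exponent, add exactly."""
    (sp, ep, mp), (sc, ec, mc) = p, c
    e = min(ep, ec)
    v = (-1) ** sp * (mp << (ep - e)) + (-1) ** sc * (mc << (ec - e))
    return (1 if v < 0 else 0), e, abs(v)


def normalize_round(s: int, e: int, m: int) -> float:
    """Normalizer, rounding to nearest (ties to even) and postnormalization."""
    if m == 0:
        return 0.0                            # simplification: sign of an exact zero not tracked
    shift = m.bit_length() - SIG_BITS         # low-order bits that do not fit
    if shift > 0:
        q, r = divmod(m, 1 << shift)
        half = 1 << (shift - 1)
        if r > half or (r == half and q & 1):
            q += 1
        if q.bit_length() > SIG_BITS:         # rounding carried into a new leading bit
            q >>= 1
            shift += 1
        m, e = q, e + shift
    return (-1.0 if s else 1.0) * math.ldexp(m, e)  # exact for normal, in-range results


def fmas(a: float, b: float, c: float, subtract: bool = False) -> float:
    """a*b + c, or a*b - c when subtract is set, rounded once at the end."""
    sc, ec, mc = unpack(c)
    return normalize_round(*add_sub(multiply(unpack(a), unpack(b)), (sc ^ int(subtract), ec, mc)))


x = 1 + 2**-30
print(fmas(x, x, 1 + 2**-29, subtract=True) == 2**-60)  # True: fused result keeps the 2**-60 term
print(x * x - (1 + 2**-29))                             # 0.0: the separate product was rounded first
```

Because the product is never rounded on its own, only the final normalize-and-round step loses information; a separate multiply followed by a subtract rounds twice and can cancel to zero.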



#### **GOALS OF TESTING: SYNCHRONOUS**

- Logical Level:
  - Functional verification
- Electrical Level:
  - Eliminating hazards and races over the full range of supply voltage and temperature

#### **GOALS OF TESTING: SPEED-INDEPENDENT**

- Logical Level:
  - Functional verification
  - Self-timed analysis (ASPECT, FAZAN, FIESTA)
- Electrical Level:
  - None required: correct operation does not depend on cell delays
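
One property such analysis must confirm is the principle that each cell switches at most once per transient. The sketch below checks that property on a simulated event trace; it is a hypothetical illustration only, not how ASPECT, FAZAN, or FIESTA work, and the trace format (time, cell name, new value) and the function name are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str, int]  # (time, cell name, new output value): assumed trace format


def cells_switching_more_than_once(initial: Dict[str, int], trace: Iterable[Event]) -> List[str]:
    """Return cells whose output changes more than once during one transient.

    `initial` holds every cell's output at the start of the transient.
    Events that repeat a cell's current value are not counted as switches.
    """
    value = dict(initial)
    switches: Counter = Counter()
    for _, cell, new in sorted(trace, key=lambda ev: ev[0]):  # stable sort: ties keep trace order
        if value[cell] != new:
            value[cell] = new
            switches[cell] += 1
    return sorted(cell for cell, n in switches.items() if n > 1)


start = {"n1": 0, "n2": 0, "out": 0}
clean = [(0.10, "n1", 1), (0.20, "n2", 1), (0.30, "out", 1)]
glitch = [(0.10, "n1", 1), (0.15, "out", 1), (0.20, "out", 0), (0.30, "out", 1)]
print(cells_switching_more_than_once(start, clean))   # []
print(cells_switching_more_than_once(start, glitch))  # ['out']
```

A simulation-based check like this only covers the delays that were simulated; the point of the SI principles is that a circuit satisfying them needs no such per-delay check.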

#### **HARDWARE TEST ENVIRONMENT**



#### **TEST ORDER**

- Apply the nominal supply voltage
- Set fixed operands at the SI-FMAS inputs and set Mode = 0
- Start the clock generator and set Start = 1
- Switch the Mode input to Mode = 1
- Observe periodic pulses at the OK output
- Vary the supply voltage and/or temperature until the OK pulses disappear
- Repeat the experiment for other operands
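
The voltage part of this procedure amounts to a downward sweep. A minimal sketch, assuming a hypothetical callback `ok_pulses_present(v)` that sets the supply to `v` and reports whether OK pulses are still observed; no real instrument or board API is implied, the numbers in the usage line are made up rather than measured, and a temperature sweep would follow the same pattern.

```python
from typing import Callable, Optional


def find_lower_vdd_limit(
    ok_pulses_present: Callable[[float], bool],
    v_nominal: float,
    v_step: float,
    v_floor: float = 0.0,
) -> Optional[float]:
    """Step V_DD down from nominal until the OK pulses disappear.

    Returns the lowest tested voltage at which OK pulses were still observed,
    or None if they are already absent at the nominal voltage.
    """
    if v_step <= 0:
        raise ValueError("v_step must be positive")
    last_working: Optional[float] = None
    n = 0
    v = v_nominal
    while v >= v_floor:
        if not ok_pulses_present(v):
            break
        last_working = v
        n += 1
        v = v_nominal - n * v_step  # recomputed from nominal so float error does not accumulate
    return last_working


# Hypothetical stand-in for the board: pretend the unit works down to 0.4 V.
print(find_lower_vdd_limit(lambda v: v >= 0.4, v_nominal=1.0, v_step=0.25))  # 0.5
```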

#### WORKABILITY RANGE



#### SUMMARY

- The designed speed-independent (SI) pipelined 64-bit FMAS unit, conforming to IEEE 754 and implemented in a standard 65-nm CMOS process, demonstrates high average performance (up to 3.15 Gflops), low latency (under 2 ns), and a wide workability range
- The developed test environment confirms that the proposed unit is a true SI unit whose functionality does not depend on the actual parameters of its components
- Future research will address decomposing the multiplier to achieve the same SI-FMAS performance with one computing channel instead of two identical channels

# **Thank You!**

#### **CONTACTS**

- Director: academician Sokolov I.A.
- Address: Institute of Informatics Problems of the Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences (IPI RAS), Moscow, Russian Federation, 119333, Vavilova str., 44, b.2
- Tel: +7 (495) 137 34 94
- Fax: +7 (495) 930 45 05
- E-mail: ISokolov@ipiran.ru
- Stepchenkov Y.A., tel. +7 (495) 671 15 20, Ystepchenkov@ipiran.ru
