Energy Efficient Computing Power Management System on the Nehalem Family of Microprocessors

Intel's® 4th generation Core™ microprocessors are powered by Fully Integrated Voltage Regulators (FIVR). These 140 MHz multi-phase buck regulators are integrated into the 22nm processor die, and feature up to 80 MHz unity gain bandwidth, non-magnetic package trace inductors and on-die MIM capacitors. FIVRs are highly configurable, allowing them to power a wide range of products from 3W fanless tablets to 300W servers. FIVR helps enable 50% or more battery life improvements for mobile products and more than doubles the peak power available for burst workloads.


FIVR – Fully Integrated Voltage Regulators on 4th Generation Intel® Core™ SoCs

Edward A. Burton, Gerhard Schrom, Fabrice Paillet,

Jonathan Douglas

CCDO

Intel Corporation

Hillsboro, OR, USA

William J. Lambert, Kaladhar Radhakrishnan,

Michael J. Hill

ATTD

Intel Corporation

Chandler, AZ, USA

Abstract—Intel's® 4th generation Core™ microprocessors are powered by Fully Integrated Voltage Regulators (FIVR). These 140 MHz multi-phase buck regulators are integrated into the 22nm processor die, and feature up to 80 MHz unity gain bandwidth, non-magnetic package trace inductors and on-die MIM capacitors. FIVRs are highly configurable, allowing them to power a wide range of products from 3W fanless tablets to 300W servers. FIVR helps enable 50% or more battery life improvements for mobile products and more than doubles the peak power available for burst workloads.

I. INTRODUCTION

Intel's® 4th generation Core™ microprocessors (code name Haswell) are powered by Fully Integrated Voltage Regulators (FIVR), the industry's first large scale deployment of high current switching regulators integrated into a VLSI die and package. An overview of the scheme is given in Fig. 1(a). A first stage VR, which is on the motherboard, converts from the PSU or battery voltage (12-20V) to approximately 1.8V, which is distributed across the microprocessor die. The second conversion stage consists of between 8 and 31 (depending on the product) FIVRs, which are 140MHz synchronous multiphase buck converters with up to 16 phases. A simplified schematic for a two phase FIVR domain is shown in Fig. 1(b). The power FETs, control circuitry and high frequency decoupling are on the die, while the inductors and mid-frequency input decoupling capacitors are placed on the package. Each FIVR is independently programmable to achieve optimal operation given the requirements of the domain it is powering. The settings are optimized by the Power Control Unit (PCU), which specifies the input voltage, output voltage, number of operating phases, and a variety of other settings to minimize the total power consumption of the die.

FIVR is the enabling technology behind key improvements for Intel's® 4th generation Core™ microprocessors including a 50% or more increase in battery life for mobile products, and a 2-3x increase in peak available power (which converts into burst performance). The motherboard voltage regulators eliminated by FIVR free up space that can be used to add platform features or reduce platform dimensions. Details are discussed in Section V.

A. Background

Intel's® 2008 microprocessor microarchitecture introduced the Power Control Unit (PCU) [1], a microcontroller that monitored conditions across the die in real time, and dynamically adjusted a variety of settings to optimally manage power consumption and performance. Among the most important features controlled by the PCU were the newly added high current power gates, which provided a significant improvement in CPU energy efficiency by eliminating large leakage losses on idle compute domains. The power gates on high current domains were introduced by adding gate transistors into thin 'cracks' between major functional blocks and represented a very small percentage of the total die area. The die bumps required to support the high current domains required a much larger area than the gate transistors themselves. The large "bump area" posed a hard barrier to productization until a scheme was devised to "borrow" bumps from surrounding circuitry using a thick, low loss routing layer. An extension of this scheme makes FIVR affordable.

A limitation of power gates is that all active domains still operate at the highest voltage required by any individual domain. To create separate voltage domains an entirely new regulator must be added to the motherboard, which adds cost, increases area, and requires extra package pins. An improvement suggested by recent research is the integration of high frequency buck regulators directly on the microprocessor package, or in the die itself [2] [3] [4] [5]. This allows a much larger number of independent power domains, each managed dynamically to match the local computational demand. For example, this would allow one CPU core to run at an elevated voltage and frequency to satisfy a heavy computational load, while other cores execute lower priority code at a much lower voltage and frequency to save power.

Each of the previous works cited had at least one issue making it poorly suited for broad, high volume deployment. The multi-chip approach taken by [2] resulted in an effective current density of 1.3A/mm² that would make it expensive to implement because of the silicon area required to support a full power microprocessor product. In [3] another example of the multi-chip approach is demonstrated. This work, which used a 90nm process and inductors integrated onto the die, achieved a higher current density (8A/mm²) but reported a relatively low efficiency of 76% (compared to 85% for 3.3V to 1.0V conversion in [2]). Instead of the multi-chip approach, the authors of [4] integrated the regulator directly into the die on a 45nm process, but still suffered from relatively low current density (1.7A/mm²). The authors also report an efficiency of 83% for 1.5V to 1.0V conversion due in part to the quality of the discrete inductors that were used.

FIVR builds on the VR designs presented in [2], while the implementation strategy that makes FIVR affordable is an extension of the bump "borrowing" scheme developed for the high current power gates [1].

B. Motivation

This paper will show that FIVR addresses the issues in prior work that prevented broad deployment of integrated switching regulators in high volume products. Extending the earlier bump borrowing scheme yields the same current density increase, and corresponding cost decrease, that first made power gates affordable. Improvements in the inductors and transistors yield efficiency in the 90% range for typical high power workloads. The high unity gain frequency (up to 80MHz) allows FIVR to work with only on-die MIM for output capacitance.

While these advancements are necessary, they're insufficient to make a reasonable business case for FIVR. To pay the costs of developing and fabricating FIVR, it's important to quantify the actual customer-visible benefits provided in a real implementation. At the start of the design FIVR's expected benefits fell into half a dozen categories. FIVR delivered material benefits in every category, and some benefits were far larger than expected. The benefit categories were: battery life increase, increased available power (for increased burst performance), decreased power required for a given level of performance (or almost equivalently, increased performance for a given power consumed), decreased platform cost and size, and improved product flexibility and scalability. See Section V for the detailed FIVR impact.

II. IMPLEMENTATION, DESIGN, AND SIMULATION

A. Circuitry

A block diagram representing the circuitry for a single FIVR domain is shown in Fig. 2. The buck regulator bridges are formed by replacing the power gates in previous products with NMOS and PMOS cascode power switches. The cascode configuration allows the power switches to be implemented with standard 22nm logic devices while still handling an input voltage of up to 1.8VDC [2]. This avoids the cost of extra processing steps for high voltage devices, while achieving excellent switching characteristics. The bridge drivers are controlled through high-voltage level-shifters and support ZVS (zero-voltage switching) and ZCS (zero-current switching) soft-switching operation. The gates of the cascode devices are connected to the "half-rail", Vccdrvn, regulated to Vin/2. This is also the negative supply of the PMOS bridge driver as well as the positive supply of the NMOS bridge driver.

The area occupied by the power switches and drivers is small, so they are distributed across the die, immediately above the connection to their associated package inductor, which minimizes routing losses. This is illustrated in Fig. 3(a), which shows the location of the package inductors under the die for a four core LGA part. The driver circuitry is interleaved with the power switches in an array which minimizes parasitics to allow for very high switching frequencies. This also allows the size of the bridge to be easily scaled based on the current requirement and optimization points for each power domain.

Each FIVR domain is controlled by a FIVR Control Module (FCM). The FCM contains the circuitry for generating the PWM signals using double-edge modulation, as indicated in Fig. 2 by the dashed box. Separate circuitry not shown in Fig. 2 manages phase current balancing, and the resulting digital PWM signals are distributed from the FCM to individual bridges. The PWM frequency, PWM gain, phase activation, and the angle of each phase are all programmable in fine increments to enable optimal efficiency and minimum voltage ripple across a range of different operating points. Spread-spectrum is used for EMI and RFI (Radio Frequency Interference) control.

Figure 1. (a) Representative partitioning of the separate high current power domains on a 4th generation Core™ Microprocessor. (b) Simplified schematic of a single FIVR domain, showing the partitioning of the components between the die and the package.

The FCM also contains the feedback control circuitry (compensator). A high-precision 9-bit DAC generates a reference voltage for a programmable, high bandwidth analog fully differential type-3 compensator. Sense lines feed the output voltage back to the compensator. The endpoint of these sense lines is strategically placed to achieve minimum DC error and optimal transient response at an important circuit location in the domain. The compensator is programmed individually for each voltage domain based on its output filter, and can be reprogrammed while the domain is active to maintain optimal transient response as phase shedding occurs.

The key to making FIVR affordable was integrating the power devices directly into the microprocessor die. As with the power gate circuitry discussed in the introduction, the power switching circuitry for FIVR can be placed in small areas between major circuit blocks. The lower current handling of the die bumps makes the die bump area requirements for FIVR much larger than the actual die area required. Since FIVR is integrated into the microprocessor die, routing on the thick metal die layer permits extra bumps to be 'borrowed' from areas over other circuits, which avoids wasting any excess die area due to bump current limits. This makes the effective current density of FIVR 31A/mm², a 24x increase over the bump-limited 1.3A/mm² reported in [2].
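The 24x figure follows directly from the two current densities quoted above. A minimal check in Python is given here; the variable names are illustrative, and the 30 A load used for the area comparison is a hypothetical example rather than a figure tied to either design.

```python
# Effective current densities quoted in the text, in A/mm^2.
fivr_density = 31.0   # FIVR with bump borrowing
mcm_density = 1.3     # bump-limited multi-chip design in [2]

ratio = fivr_density / mcm_density
print(f"FIVR / [2] current density ratio: {ratio:.1f}x")

# Area each approach would need for a hypothetical 30 A domain.
load_current = 30.0   # A, illustrative only
fivr_area = load_current / fivr_density
mcm_area = load_current / mcm_density
print(f"Area for {load_current:.0f} A: {fivr_area:.2f} mm^2 vs {mcm_area:.1f} mm^2")
```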

B. Passives

In order to keep the buck output filter small enough to fit on the die and package it is necessary for FIVR to switch at a high frequency, 140 MHz in most cases. This allows the buck output filter inductors to be implemented using only the bottom metal layers of a standard flip-chip package. Power routing is constrained to the top layers of the package as a result, but the proximity of the inductors to the load ensures that minimal power is dissipated on these layers. The inductors are non-magnetic, i.e. Air Core Inductors (ACI). A representative ACI from an 8-phase domain of a product with an LGA package is shown in Fig. 3(b), including the connection points to the power switches, the DC current path through the inductor, and the connection of the inductor to the output plane. Package design rules allow the ACIs to be placed in close proximity with one another. On a representative LGA package with four CPU cores, this allowed 59 inductors on 10 different voltage rails to be implemented in a 20mm x 8mm area. The package implementation also allows inductor designs to be customized on a per rail basis to meet efficiency, ripple, and transient response requirements.

Decoupling for the input rail is provided by a combination of ceramic package capacitors and on-die MIM capacitors [6]. The on-package ceramic capacitors keep the output impedance of the input rail low from approximately 1 MHz to their self-resonance around 20 MHz. The MIM capacitors are on the die along with the power circuitry and provide high frequency decoupling, including at the switching frequency and its harmonics. Decoupling for the output rails is provided primarily by the MIM capacitors, which are sufficient to provide good transient response if wide bandwidth feedback control is used (see the results section). In some cases the MIM capacitors are supplemented with extra package ceramic capacitors. The comparatively low self-resonant frequency of the ceramic capacitors complicates the control loop design and does little to attenuate voltage ripple, but the ceramic capacitors can provide a net transient response benefit if they are robustly connected to the output power plane.

Figure 2. Simplified block diagram of the circuitry for a single representative FIVR domain

C. System Control

In order to minimize losses from FIVR, a modified version of the PCU [1] dynamically configures each FCM based on the current activity level of the domain. The PCU turns each rail on or off based on activity, and specifies an output voltage target to support the desired frequency. It also optimizes the settings discussed in Section II.A for the anticipated operating conditions. These settings include the number of active phases (i.e. phase shedding to improve light load performance), the compensator settings (to maintain optimal transient response as the number of phases changes), and the timing of switch drivers (to ensure zero voltage switching at light loads). This allows each FIVR domain to operate at near peak efficiency across a wide range of load conditions from retention to Turbo. An example of the benefit this provides is shown in Section IV.A.

III. CHARACTERIZATION AND PERFORMANCE TESTING

Validating and optimizing a voltage regulator requires the measurement of key parameters such as voltage ripple, efficiency, power supply rejection ratio, transient response, and control loop stability margins. Due to FIVR's high level of integration and fast switching frequency many of these standard measurements are difficult or impossible to perform using off the shelf test equipment. For example, an IA Core voltage domain powered by a FIVR capable of supplying over 30A occupies less than 15mm² on the package, which is completely covered by the microprocessor die. This renders the attachment of an external load for a full current efficiency measurement impossible. This section describes some key Design For Test (DFT) features that are included in FIVR to allow accurate characterization.

A. Control Loop Transfer Function

To enable characterization of the control loop, a high frequency programmable signal generator is placed in the feedback network on every FIVR domain. The signal generator is activated in a test mode to inject a known, synchronized signal into the on-die feedback loop. By controlling the test feature and monitoring the output voltage on the package with an oscilloscope, the response of the control loop is directly measured. Repeated measurements are used to tune the compensator to achieve fast response and good stability margins.

B. Load Transient Response and Rejection Ratio

Microprocessors require a nearly constant DC voltage in the presence of large load transients. For characterization purposes, however, it is difficult to create a well behaved step load using only the execution of code in normal operation. The authors in [7] instead use a scheme called Integrated Frequency Domain Impedance Meter (IFDIM), which gates the clock network on and off at a fixed frequency, creating a large load transient. The frequency is programmable, the magnitude of the transient can be precisely calibrated using DC measurements, and the load step is known to occur within 1 clock cycle, so the microprocessor itself is effectively turned into an AC load. This feature is included on every FIVR domain, allowing the transient response to a known load to be measured. FIVR domains are characterized across a wide frequency range at multiple operating points for both output impedance and output load coupling across domains.

C. Efficiency

As was previously mentioned, the level of integration makes it impossible to connect a high current load directly to the output of a FIVR rail. An additional complication is that the circuitry on the die cannot be disconnected from the FIVR output, so whenever FIVR is powered some extra current draw due to leakage results. This required the development of a new technique for accurately measuring efficiency. A brief summary of the method is given here. First, a procedure using an external low current load and FIVR operating in a test mode is used to calibrate the leakage. The completely ungated clock tree is then operated at varying frequencies to create a large effective adjustable DC output current. An iterative series of measurements is then used to precisely calibrate the total current drawn by the clocks and the leakage, which allows the efficiency to be measured, when combined with conventional measurements of the voltage and the current at the output of the first stage regulator.

Figure 3. (a) The bottom of an Intel® 4th generation Core™ microprocessor LGA package is shown along with a picture of the corresponding die. A group of 8 FIVR inductors is pulled off to the side. (b) An enlarged 3D view of two FIVR inductors is shown with current flow arrows.
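The bookkeeping behind this method can be sketched as follows. This is only an illustration under stated assumptions: the function name, argument names, and the readings in the usage lines are hypothetical, the clock and leakage currents are taken as already calibrated, and the domain under test is assumed to be the only load on the input rail during the measurement.

```python
def fivr_efficiency(v_out, i_clock, i_leak, v_in, i_in):
    """Estimate efficiency as output power over input power.

    v_out   -- FIVR output voltage (V)
    i_clock -- calibrated current drawn by the ungated clock tree (A)
    i_leak  -- calibrated leakage current of the domain (A)
    v_in    -- voltage at the output of the first stage regulator (V)
    i_in    -- current at the output of the first stage regulator (A)
    """
    p_out = v_out * (i_clock + i_leak)   # the die itself acts as the load
    p_in = v_in * i_in
    if p_in <= 0:
        raise ValueError("input power must be positive")
    return p_out / p_in


# Hypothetical readings, chosen only to exercise the function.
eta = fivr_efficiency(v_out=1.05, i_clock=5.5, i_leak=0.5, v_in=1.70, i_in=4.1)
print(f"efficiency = {eta:.1%}")
```

Sweeping the clock frequency changes `i_clock`, which is how the method traces out efficiency against load current without an external load.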

IV. RESULTS

Due to the high switching frequency used, the performance

of FIVR is sensitive to the layout of the die and the package

which includes the inductors. Each combination of die and

package is individually optimized and validated. The following

section contains some key validation results from an Intel® 4th

generation Core™ microprocessor with four microprocessor

cores on an LGA package.

A. Measurements

Fig. 4(a) shows the voltage ripple for a low noise domain measured under the die near the connection of the ACI to the power plane. The measurement was averaged 128 times against the PWM clock (with spread spectrum clocking turned off). To achieve an accurate measurement, a controlled impedance differential sense line was routed on the package from the measurement location to a probe connection point with a matched termination. An active differential oscilloscope probe was then connected to the probe point. This ensures an accurate, wide bandwidth measurement is achieved, as opposed to probing the package power planes directly or using a single ended sense line, which can substantially attenuate the measurement over a distance as short as a few millimeters. In two phase operation less than 4mV (less than 1% of the voltage set point) of ripple is achieved with a rail driven by air core inductors well under 2mm² in area.

Fig. 4(b) shows the efficiency as measured using the process in Section III.C for 1.70V to 1.05V conversion with the bridges configured for hard switching. The efficiency measurement has been repeated for varying numbers of phases, in each case showing a peak efficiency of approximately 90% at 0.75A/phase. By employing a phase shedding scheme it is possible to keep the efficiency of the domain within a few percent of the peak efficiency of the domain from 1A to 15A. This is managed by the PCU, which can phase shed when the efficiency can be improved, but also has the intelligence to avoid phase shedding when it could be problematic, for example when a large load transient is possible.
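One way to picture phase shedding is as choosing the phase count that keeps the per-phase current closest to the efficiency peak. The sketch below is an illustration under assumptions, not the PCU's actual policy: the function name, the default of 16 phases (the maximum mentioned in the introduction, which may not apply to this domain), and the transient guard are hypothetical. Only the 0.75A/phase peak comes from the measurement above.

```python
def choose_active_phases(load_current, max_phases=16,
                         peak_current_per_phase=0.75,
                         transient_expected=False):
    """Pick how many phases to run for a given load current (A)."""
    if load_current < 0:
        raise ValueError("load current must be non-negative")
    if transient_expected:
        # Keep every phase on when a large load step may be imminent.
        return max_phases
    # Phase count that puts each phase nearest its efficiency peak.
    ideal = round(load_current / peak_current_per_phase)
    return max(1, min(max_phases, ideal))


for amps in (1.0, 3.0, 6.0, 15.0):
    print(amps, "A ->", choose_active_phases(amps), "phases")
```

The clamp to at least one phase and at most `max_phases` reflects that efficiency necessarily falls away from the peak at the extremes of the load range.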

The measured output voltage (averaged 128 times) during an 8.5A load step generated by the IFDIM feature on the graphics voltage rail is shown in Fig. 5(a). The measurement was performed with a similar probing configuration to that used for the voltage ripple measurement. The combination of a high bandwidth feedback loop and on-die decoupling keeps the voltage droop under 50mV despite a rise time for the current step of under 1ns (orders of magnitude faster than normal graphics circuit behavior). The main droop event lasts under 30ns, and the DC voltage is restored within 100ns. A nonlinear control feature saturates the duty cycle when a large transient is detected (not active in Fig. 5(a)). The feature was found to provide up to a 25% reduction in voltage droop for step loads, but the benefit is significantly reduced for certain aperiodic load patterns. The effective output impedance profile for the same rail is shown in Fig. 5(b). The peak impedance demonstrates the fast bandwidth of the compensator. Because the inductors are located immediately beneath the actual area of the die that consumes current, the DC and low frequency load line is nearly zero. The figure also shows the impedance profile for an Intel® 3rd generation Core™ microprocessor graphics rail, which is powered by a platform VR. For this rail, a DC load line is required due to the distance between the VR and the die. Several resonant peaks occur from the various stages of decoupling capacitors on the motherboard and package, and the parasitic inductance between them and the actual point of current consumption on the die.

Figure 4. (a) Measured voltage ripple for a low noise domain for single phase and 2 phase operation (128 averages) (b) Measured efficiency for a voltage domain as a function of the number of active phases

Figure 5. (a) Measured voltage droop on a graphics domain in response to an 8.5A step load (b) Comparison of the effective impedance profile for the graphics voltage domain on a 3rd generation Core™ microprocessor versus a 4th generation Core™ microprocessor
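As a rough sanity check on the transient numbers, the droop and step size bound the effective impedance seen by the load during the step. The calculation below uses only the 50mV and 8.5A figures from the text; treating their ratio as a single effective impedance is a simplifying assumption, since the real profile in Fig. 5(b) varies with frequency.

```python
droop_limit = 0.050   # V, upper bound on the measured droop
load_step = 8.5       # A, IFDIM-generated step on the graphics rail

# Upper bound on the effective transient impedance, Z = dV / dI.
z_eff = droop_limit / load_step
print(f"effective impedance < {z_eff * 1e3:.1f} mOhm")
```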

Fig. 6 shows the open loop transfer function for a FIVR domain, measured using the signal generator DFT. This rail demonstrates a unity gain bandwidth of 78 MHz while still maintaining 40° phase margin. Robust compensator circuit design and very small propagation delays were necessary to attain this bandwidth, which, in turn, was required to maintain good transient response on rails with limited output capacitance. The high bandwidth also enables fast voltage transitions. A FIVR rail turning on and turning off is shown in Fig. 7. Both transitions are programmed to about half a microsecond for a full range transition, two orders of magnitude faster than a typical platform-based solution. The fast ramp rate translates into power savings for the system, as the voltage rail can be turned on, used, and turned off again almost instantly.

A large number of additional measurements are taken for validation purposes that are not shown here due to space constraints. These include the output impedance of the Vin rail, audio susceptibility measurements, and the coupling noise due to load transients from one rail to another (particularly from very high current domains such as core and graphics to low current system agent domains). EMI/RFI characterization is also performed.

B. Comparison to Previous Work

Table I contains a comparison to previous works discussed in the introduction. FIVR operates at a higher switching frequency than previous works, which is enabled in part by very good gate charge characteristics for the MOSFETs. This allows up to 90% efficiency at a common conversion ratio.

Figure 6. The measured open loop gain and phase of a FIVR showing 78MHz bandwidth with 40° phase margin

Figure 7. A FIVR rail ramping to its voltage set point from fully off, and then turning off again. Voltage transition times are programmable, but typically set for half a microsecond for a 1V transition.

V. FIVR IMPACT TO PRODUCTS

Battery life improvement: Sufficient battery life for a complete work day has long been desired from mobile products. FIVR, combined with power management architecture improvements, has enabled this for Intel® 4th generation Core™ products. Increases of well over 50% have been widely reported (for example, [8] and [9]). FIVR's battery life benefit comes by several means:

• Standby current historically consumes a large fraction of the battery's stored energy. FIVR's fast bandwidth allows low frequency supply noise to be rejected, resulting in up to a 90% reduction in decoupling requirements. This allows both the first and second stages of regulation to be power cycled much faster than on previous products, enabling new deep sleep states with up to 20x lower standby power. With the lowered capacitance, the power expended and the time wasted entering and exiting the states are similarly reduced. Reduced sleep-state entry/exit time also saves power by increasing sleep-state usage.

• FIVR's fast control loop and integration into the package result in one tenth the peak impedance of prior solutions (see Fig. 5(b)) in the sub-MHz stimulus range most relevant to the graphics architecture. The resulting low frequency supply noise reduction improves power at a given performance by up to 30%.

• FIVR increases the number of voltage rails, allowing each domain to be set at the minimum possible voltage that supports error-free operation, reducing both leakage and dynamic power.

• Replacing multiple high current voltage regulators on the motherboard with a single first stage regulator reduces the PCB footprint of the power delivery solution. This extra space can be used for a larger battery, with some examples demonstrating up to 10% growth.

• Trimming FIVR together with the microprocessor removes manufacturing guard-bands normally required to ensure that every VR will work with every CPU.

Increased available peak power: An illustrative example shows how FIVR can increase the peak power available to the microprocessor. A typical mobile processor platform using the prior generation power delivery scheme had two 30A, 1.1V VRs providing 33W for cores and 33W for graphics. Using the same power FETs and inductors for the FIVR's 1.8V input VR, the part has a 108W power rail (30A/phase * 2 phases * 1.8V), which can be dynamically allocated to a combination of FIVRs by the PCU. For core-only workloads, nearly the entire 108W can be allocated for the cores, increasing the available power ceiling by 3x. For graphics workloads, 36W can be partitioned to the cores with the remaining 72W going to graphics, more than double the power available from the 33W platform VR. Because power consumption scales as CV²F and frequency scales with voltage, the increase in available power could be used to operate the graphics at up to 26% higher frequency than possible with the platform VR. A similar calculation yields a 44% higher core frequency in the core-only scenario. The duration of these scenarios is limited by the thermal capabilities of the platform, but translates into improved speed in many real scenarios.
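The arithmetic in this example can be reproduced in a few lines. The sketch assumes, as the text does, that power scales as CV²F with voltage proportional to frequency, so frequency scales with the cube root of power. The quoted 26% and 44% figures match the rounded 2x and 3x power ratios; that reading is an inference from the numbers, and the unrounded ratios give somewhat higher values. Function and variable names are illustrative.

```python
phases, amps_per_phase, v_in = 2, 30.0, 1.8
rail_power = phases * amps_per_phase * v_in   # shared FIVR input rail, W
old_vr_power = 30.0 * 1.1                     # one prior-generation platform VR, W


def freq_gain(power_ratio):
    """Fractional frequency gain if P ~ C*V^2*F and V ~ F, so F ~ P^(1/3)."""
    return power_ratio ** (1.0 / 3.0) - 1.0


graphics_power = rail_power - 36.0            # 36 W reserved for the cores

print(f"rail power: {rail_power:.0f} W, platform VR: {old_vr_power:.0f} W")
print(f"graphics, rounded 2x ratio: +{freq_gain(2.0):.0%}")
print(f"cores, rounded 3x ratio:    +{freq_gain(3.0):.0%}")
print(f"graphics, exact ratio: +{freq_gain(graphics_power / old_vr_power):.0%}")
print(f"cores, exact ratio:    +{freq_gain(rail_power / old_vr_power):.0%}")
```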

Decreased power at a given performance: Intel's® Iris™ Pro graphics uses FIVR's higher available power to deliver high end graphics. FIVR's high unity gain bandwidth presents less than a tenth the peak output impedance provided by the prior generation's platform VR in the sub-MHz range important to the graphics load (see Fig. 5(b)). Because FIVR has doubled the graphics power ceiling, few (if any) of our shipping parts would fit within the bounds of the older generation's platform. The higher currents typically imply hundreds of millivolts of droop on the older platforms. The combination of high currents with high impedance peaks yields a hypothetical power tax in the 20-30% range (assuming one could, and actually would, ship these high current levels into the old platforms). FIVR avoids that tax.

Improved product flexibility and scalability: FIVR's ability to add voltage rails onto a common shared input rail without package growth or even platform changes brings significant flexibility and modularity into the design space that was not available before. New voltage rails can be added as needed, without any platform change. This ability allowed us to introduce the Iris™ Pro graphics into standard platforms even though new rails were needed for the EDRAM and its high speed OPIO link.

TABLE I. COMPARISON OF FIVR TO PREVIOUSLY REPORTED INTEGRATED VOLTAGE REGULATORS

G. Schrom et al., 2010 [2]

T. DiBene et al., 2010 [3]

N. Sturcken et al., 2012 [4]

Total Output Imax capability

Limited by first stage and thermals (Up to 400 A)

Limited by first stage and thermals (Up to 700 A)

Integrated into network die

Package trace, & magnetic discrete

Magnetic thin-film on VR die

Discrete wire-wound air core

2D array of package trace

a. MCM – Multi Chip Module – the active circuitry is on a separate die assembled on the same package

Platform size and cost reduction: Since four platform VR controllers are eliminated, along with the associated decoupling caps, power FETs and inductors, there's a clear platform size and cost advantage. FIVR's total platform BOM cost reduction is expected to be several billion dollars over the product lifetime. The power inductors feeding power to the CPU often show up in the critical thickness cross-section of small form-factor laptops and tablets, and trading FIVR's total component count reduction for a thickness reduction is straightforward. A platform phase count increase results in lower current per phase, and the lower phase current can be satisfied with a lower profile set of inductors.

In the prior generation platforms, some dual-sided PCBs have tall components like ICs and inductors on the primary side and lower profile discrete components on the secondary side. Often the secondary-side components are mostly high frequency decoupling, located immediately underneath front-side ICs, with sparsely populated areas between. In such cases, FIVR eliminates most of the secondary-side components, and frees up space on the primary side to accommodate the rest. The resulting populated PCB thickness is reduced by the height of the tallest removed backside components.
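The thickness argument above can be sketched as simple arithmetic. This is a simplified model, not the authors' method: populated thickness is taken as the bare board plus the tallest component on each side, and all heights below are hypothetical values in millimetres.

```python
def populated_thickness_mm(board_mm, primary_heights_mm, secondary_heights_mm):
    """Board thickness plus the tallest part on each side (0 if a side is empty)."""
    return (
        board_mm
        + max(primary_heights_mm, default=0.0)
        + max(secondary_heights_mm, default=0.0)
    )


# Hypothetical heights in mm, for illustration only.
board = 1.0
primary = [3.0, 2.5]    # tall ICs and inductors
secondary = [0.5, 1.0]  # low profile decoupling parts

before = populated_thickness_mm(board, primary, secondary)

# Assume one 0.5 mm part survives and is relocated to the primary side,
# leaving the secondary side empty.
after = populated_thickness_mm(board, primary + [0.5], [])

saving = before - after
print(f"before={before} mm, after={after} mm, saving={saving} mm")
```

In this model the saving equals the height of the tallest removed secondary-side part, provided any relocated parts are shorter than the tallest primary-side component.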

In small systems, the platform size tends to limit the feature set, leading to fewer connectivity options, smaller storage space, etc. FIVR's platform size reductions can provide more space to implement these features.

VI. CONCLUSION

Consumers expect every generation of mobile computer products to have more compute power, thinner and lighter form factors, and longer battery life than the last. The 4th generation Intel® Core™ power architecture using FIVR provides improvements in all three of these areas. To the authors' knowledge, this is the first consumer product to make use of integrated switching regulators on this scale. Furthermore, FIVR's performance is improved versus previously reported prototypes. The authors therefore feel that FIVR is an important advancement in the field of power electronics.

ACKNOWLEDGMENT

We would like to acknowledge the following FIVR team members who were not already listed as authors: the FIVR silicon design team including George Geannopoulos, Keith Hodgson, Narayanan Raghuraman, Alex Lyakhov, Michael W Rogers, Ravi Vunnam, Lan D Vu, Mark S Milshtein, Chiu Keung Tang, Hong Yun Tan, Seh Leon Goh, Samie Samaan, Narayanan Natarajan, Rajan Vijayaraghavan, Ashish Khanna and Munish Chauhan; top-level integration by Pankaj Aswal; modeling and implementation development by Doug Huard and Alex Waizman; layout studies, inductor development and package designs by John Smith, Brad Larson, and Huong Do; mask design by Neal Tanksley and Galyna Burenkova; test and manufacturing support from RJ Hayes and Arun Krishnamoorthy; modeling support by Alex Levin and Anne Augustine.

REFERENCES

[1] S. Gunther, A. Deval, T. Burton and R. Kumar, "Energy-Efficient Computing: Power Management System on the Nehalem Family of Microprocessors," Intel Technology Journal, vol. 14, no. 3, pp. 50-65, 2010.

[2] F. Paillet, G. Schrom and J. Hahn, "A 60MHz 50W Fine-Grain Package-Integrated VR Powering a CPU from 3.3V," in Advanced Power Electronics Conference, Palm Springs, CA, 2010.

[3] J. T. Dibene, et al., "A 400 Amp fully integrated silicon voltage regulator with in-die magnetically coupled embedded inductors," in Advanced Power Electronics Conference, Palm Springs, CA, 2010.

[4] N. Sturcken, et al., "A switched-inductor integrated voltage regulator with nonlinear feedback and network-on-chip load in 45nm SOI," IEEE Journal of Solid-State Circuits, vol. 47, no. 8, August 2012.

[5] G. Schrom, et al., "A 100 MHz Eight-Phase Buck Converter Delivering 12 A in 25 mm^2 Using Air-Core Inductors," in Proc. 22nd Annu. IEEE Applied Power Electronics Conf., 2007.

[6] C. Auth, et al., "A 22nm High Performance and Low-Power CMOS Technology Featuring Fully-Depleted Tri-Gate Transistors, Self-Aligned Contacts, and High Density MIM Capacitors," in 2012 Symposium on VLSI Technology, Honolulu, HI, 2012.

[7] A. Waizman, M. Livshitz and M. Sotman, "Integrated Power Supply Frequency Domain Impedance Meter (IFDIM)," in 13th Conference on Electrical Performance of Electronic Packaging, Portland, OR, 2004.

[8] A. L. Shimpi, "Is Haswell Ready for Tablet Duty? Battery Life of Haswell ULT vs Modern ARM Tablets," 22 July 2013. [Online]. Available: http://www.anandtech.com/show/7117/haswell-ult-investigation. [Accessed 15 November 2013].

[9] R. Baldwin, "How the Haswell Chip Makes the New MacBook Air Last 12 Hours," 10 June 2013. [Online]. Available: http://www.wired.com/gadgetlab/2013/06/haswell-mba/. [Accessed 15 November 2013].

Source: https://www.researchgate.net/publication/271416878_FIVR_-_Fully_integrated_voltage_regulators_on_4th_generation_IntelR_Core_SoCs
