UPS Problem at Datacentre

Published by Unipower AB, Case Study: UPS Problem at Datacentre, Date: January 29th, 2017. Email: info@unipower.se


Summary

A software development company builds a new data centre and experiences problems with its UPS units, followed by a hard drive failure.

A monitoring system, PQ Secure from Unipower AB, was installed, and it could be determined that the UPS units, especially UPS 2, do not behave normally. It cannot be ruled out that the UPS caused the hard drive to fail.

It is recommended that UPS 2 be sent to the manufacturer for warranty service/repair. The electrical environment should be discussed with the building owner since it fails to meet local regulations.

Background

The software development company builds a new data centre and installs new servers with UPS backup. After running for some time, the UPS monitoring software starts sending warning emails about low voltage. The UPS goes to battery power and then immediately goes back to utility supply. The system does not seem to be affected, so things are left under observation.

After a couple of months there is a suspicious hard drive failure in one server that could be connected to the UPS behaviour. The IT manager decides this must be investigated and contracts a consultant.

The data centre contains 4 HP Proliant ML 350 servers, a couple of NUC servers, a new NAS unit, 2 x 1.5 kVA UPS units, and some routers and switches.

Monitoring

The consultant recommends installing a permanent Power Quality Management System. A meter, UP-2210, and a system, PQ Secure, both from Unipower AB, were installed. The PQ meter monitors the incoming AC power (voltage and current) to the UPS units as well as the output voltage from UPS 1 and UPS 2.

The first data looked strange whenever a UPS switched to battery, and after some analysis the reason was found and understood. When measuring on the inside (output) of a UPS, you cannot use the ground/neutral reference from the primary (input) side, because a UPS on battery is galvanically separated from the utility network. The meter has isolated inputs, so the measuring problem was solved by re-installing the meter with a neutral connection taken from each side respectively.

This is the voltage to the data centre on some typical days during the summer of 2016:

[Figure: voltage to the data centre on typical days, summer 2016]

On the output of both UPS units, the voltage dips below -10%, which is outside most equipment specifications. On UPS 2 this happens many times every day.

The dips were found to be caused by the air conditioning (AC) units together with a weak network (long cables). However, the dips are only around -5% as seen on the incoming voltage. The building owner and the electrician said this was within specifications.

UPS 1 seems to handle these dips better but UPS 2 worsens the situation by amplifying the dips.
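To make the percentages above concrete, here is a minimal Python sketch that converts an RMS voltage into a deviation from nominal and compares the dip depth before and after a UPS. The 230 V nominal value is the figure this case study quotes for the PCC; the function names and the example voltages are hypothetical and are not measured values from the site.

```python
NOMINAL_V = 230.0  # assumed nominal voltage (the value quoted for the PCC)


def deviation_percent(rms_voltage, nominal=NOMINAL_V):
    """Return the deviation from nominal in percent (negative for a dip)."""
    return (rms_voltage - nominal) / nominal * 100.0


def amplification(input_rms, output_rms, nominal=NOMINAL_V):
    """Ratio of output dip depth to input dip depth.

    A value above 1 means the UPS output dips deeper than the incoming
    voltage, i.e. the UPS amplifies the dip instead of riding through it.
    """
    dip_in = -deviation_percent(input_rms, nominal)
    dip_out = -deviation_percent(output_rms, nominal)
    if dip_in <= 0:
        raise ValueError("input voltage is not below nominal")
    return dip_out / dip_in
```

With these illustrative numbers, an incoming voltage of 218.5 V corresponds to a dip of about -5%, and a UPS output of 207 V during the same event corresponds to about -10%, giving an amplification of roughly 2.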

Details of the dip recordings:

Events type A
[Figure: type A event recording, voltages U1, U2, U3 and current I1]

I1 (red curve, bottom graph) is the current feeding the two UPS units.

UPS 1 (green curve U2) does not react to the small incoming voltage dip; its output has the same shape as U1 (red curve). This is the expected behaviour.

UPS 2 behaves differently (blue curve U3). The incoming dip causes UPS 2 to switch to battery power, which causes the total current I1 to decrease. The initial dip is shorter but deeper, and the battery-supplied voltage is slightly lower.

After about 1 second, UPS 2 determines that the utility voltage is good and switches back to utility operation.

It does not seem normal that UPS 2 switches to battery on an incoming voltage dip of just -3%. There are many events of this type.

Events type B
[Figure: type B event recording]

The UPS 2 output dips below -10% due to a small incoming voltage swell. It does not seem to switch to battery, since the current does not change much.

Events type C
[Figure: type C event recording]

Here UPS 1 unexpectedly switches to battery due to a small incoming voltage swell. UPS 2 does not react (which is expected).

Events type D

One event was a severe dip from the utility network that caused both UPS units to go to battery power.

[Figure: type D event recording]

UPS 1 goes back to utility voltage after about 0.5 seconds, UPS 2 after about 1 second. The dip seen by the equipment (on both UPS 1 and UPS 2) is slightly less deep than on the primary side.

Number of events
[Figure: number of events]

There are 3 actual dips on incoming voltage.

The nervous behaviour of the UPS units caused 21 events on UPS 1 and 565 events (!) on UPS 2, all during the month of August 2016.

Evaluation

Severity

All the events on U2 and U3 reach the servers and other equipment, so are they dangerous? The exact dip immunity of the equipment is not known at present. Generally speaking, the ITIC curve can be used for reference. A dip’s severity is determined by its depth versus its duration. The longer and/or deeper the dip, the more severe it is for the attached equipment. Sooner or later enough energy is missing that the equipment fails.

[Figure: ITIC graph of all events, maximum depth against duration]

In the ITIC graph above, created by PQ Secure, all the events are plotted with the maximum depth (y-axis) against duration (x-axis).

Most events are within the limit curves, meaning that most IT equipment should be able to endure them. The 10 events with long duration are below the 90% limit. They were recorded during longer operation on battery and were all from UPS 2, indicating that its output voltage is generally a bit low when on battery operation.

One event is below the limit. It was an actual dip from the utility network (type D above).
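The depth-versus-duration evaluation described in this section can be sketched in a few lines of Python. This is only an illustration, not how PQ Secure implements it: the breakpoints below are the commonly quoted lower limits of the ITIC (CBEMA) curve, which was originally drawn up for 120 V single-phase equipment, and they are an assumption here that should be checked against the published curve. The function names are hypothetical.

```python
# Approximate lower boundary of the ITIC curve, as
# (maximum duration in seconds, minimum residual voltage in % of nominal).
# These breakpoints are assumed from commonly quoted values, not from the article.
ITIC_LOWER = [
    (0.02, 0.0),   # up to 20 ms: a complete dropout is tolerated
    (0.5, 70.0),   # up to 0.5 s: at least 70% of nominal
    (10.0, 80.0),  # up to 10 s: at least 80% of nominal
]
STEADY_STATE_MIN = 90.0  # longer than 10 s: at least 90% of nominal


def itic_lower_limit(duration_s):
    """Return the minimum tolerated residual voltage (%) for a given duration."""
    for max_duration, min_voltage in ITIC_LOWER:
        if duration_s <= max_duration:
            return min_voltage
    return STEADY_STATE_MIN


def within_itic(residual_percent, duration_s):
    """True if a dip with this residual voltage and duration stays above the lower limit."""
    return residual_percent >= itic_lower_limit(duration_s)
```

Under these assumptions, a one-second dip to 85% of nominal stays inside the curve, whereas 30 seconds at 88% falls below the 90% steady-state limit, which is the pattern seen for the long battery-operation events on UPS 2.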

Regulations and legal requirements

Inside a building there are no legal requirements that can be referred to. It’s between the building owner and the users.

At the PCC (point of common coupling), the delivery point from the local electric utility, there are legal requirements. In Europe, IEC-EN 61000-2-2 and EN 50160 apply. The latter says the voltage must be 230 V +/-10% during 95% of a week and never outside +10%/-15%. These norms also regulate other power quality parameters such as harmonics, flicker and unbalance.

In Sweden, the local regulation EIFS is legally binding. This regulation requires 230 V +/-10% at all times at the PCC. Inside a building, it is reasonable to require a similar voltage as at the PCC.
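The two voltage rules quoted above can be expressed as a short check over one week of readings. The sketch below is a simplified illustration only: it assumes the input is a list of averaged RMS voltages covering one week (EN 50160 evaluates 10-minute means, which is assumed here), it ignores every other requirement of the standards, and the function and key names are hypothetical.

```python
def check_supply_voltage(rms_values, nominal=230.0):
    """Check one week of averaged RMS voltages against the limits quoted above.

    EN 50160 style: within +/-10% for at least 95% of the week and never
    outside +10%/-15%. EIFS style: within +/-10% at all times.
    """
    if not rms_values:
        raise ValueError("no voltage readings supplied")

    lower_10 = nominal * 0.90   # -10% limit
    upper_10 = nominal * 1.10   # +10% limit
    lower_15 = nominal * 0.85   # -15% absolute lower limit

    in_band = sum(1 for v in rms_values if lower_10 <= v <= upper_10)
    share = in_band / len(rms_values)

    never_outside_abs = all(lower_15 <= v <= upper_10 for v in rms_values)
    always_in_band = in_band == len(rms_values)

    return {
        "share_within_10_percent": share,
        "en50160_style": share >= 0.95 and never_outside_abs,
        "eifs_style": always_in_band,
    }
```

A week in which 3% of the readings sit at 200 V would pass the EN 50160 style check but fail the stricter EIFS style check, which illustrates why the Swedish rule is the harder one to meet.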

The EIFS report (below) can be created in PQ Secure and shows two failing sections. The first, unbalance, is expected to fail since it is a three-phase parameter and the measurements were made on a single phase only. This can be ignored.

[Figure: EIFS report from PQ Secure]

The RVC (rapid voltage changes) section also fails, which indicates there are too many small voltage dips.

[Figure: RVC section of the EIFS report]

RVC requirements are legally binding at the PCC only. Inside a building they are not mandatory, but they can be discussed with the building owner.

Below are the allowed sags according to the Swedish EIFS regulation. For events in the yellow area, it is to be discussed with the network owner whether actions should be taken. The red area is forbidden, and events there must be prevented by the network owner.

The longest sags (>10 seconds) occur because the battery-generated voltage on UPS 2 is a bit too low.

[Figure: allowed sags according to the Swedish EIFS regulation]
Frequency of events

Looking at the frequency of the events, it was found to peak in the period July to September. This is consistent with when the AC units do most of their work. The timeline below shows the events as blue vertical lines.

[Figure: timeline of events]

Another tool in PQ Secure is the profile view and it confirms the same analysis:

[Figure: profile view of events]

You can also see that the distribution over the selected week is relatively even. There is no particular day or time of day where the events cluster.
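The same kind of frequency analysis can be reproduced on any exported event list with a simple tally. The sketch below assumes the events are available as Python `datetime` objects; how they would be exported from PQ Secure is not described in this case study, so the input format and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def events_per_month(timestamps):
    """Count events per calendar month, keyed as 'YYYY-MM'."""
    return Counter(ts.strftime("%Y-%m") for ts in timestamps)


def events_per_hour(timestamps):
    """Count events per hour of day (0-23) to reveal any daily pattern."""
    return Counter(ts.hour for ts in timestamps)


# Illustrative timestamps only, not data from the site.
example_events = [
    datetime(2016, 8, 1, 10, 0),
    datetime(2016, 8, 15, 14, 30),
    datetime(2016, 9, 2, 10, 5),
]
monthly = events_per_month(example_events)
hourly = events_per_hour(example_events)
```

A seasonal peak shows up as a few months with much higher counts, while an even spread across the hourly tally supports the observation that no particular time of day stands out.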

Direction of events

All events are judged by the system to be upstream, meaning they originate in the network above that feeds the UPS units. The UPS units do not cause any sags or swells that affect other equipment outside the UPS system. The only equipment affected by the sags is the equipment connected to the UPS units.

[Figure: direction of events]
Harmonics

UPS systems can create high levels of harmonics if broken or badly designed. Voltage and current THD were studied and compared with measurement data from the building’s connection point to the local utility (PCC).

In the graph below, voltage THD is compared between the UPS (thicker curve) and the building PCC. Levels are slightly higher at the UPS but not alarming.

[Figure: voltage THD at the UPS compared with the building PCC]
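For readers unfamiliar with the THD figure compared here, it is the RMS sum of the harmonic components expressed relative to the fundamental. The sketch below uses that standard definition; it assumes the harmonic RMS magnitudes have already been measured, and the function name and example values are hypothetical rather than taken from the site.

```python
import math


def thd_percent(harmonic_rms):
    """Total harmonic distortion in percent of the fundamental.

    harmonic_rms[0] is the RMS value of the fundamental (50 Hz in Europe);
    the remaining entries are the RMS values of harmonics 2, 3, 4, ...
    """
    if not harmonic_rms or harmonic_rms[0] <= 0:
        raise ValueError("a positive fundamental component is required")
    fundamental = harmonic_rms[0]
    distortion = math.sqrt(sum(u * u for u in harmonic_rms[1:]))
    return distortion / fundamental * 100.0
```

For example, a 100 V fundamental with 3 V and 4 V harmonic components gives a THD of 5%.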
Results

• A Power Quality Management System was acquired and installed. The power to and from the UPS units was monitored continuously.

• The UPS units, especially UPS 2, behave nervously, switching to battery power when not needed. A small dip or swell on the incoming voltage should not lead to UPS switching. On UPS 2 the switching leads to a short, deep sag, amplifying the original sag.

• The dips stressing the UPS units are caused by nearby AC units. Due to long cables, the AC units cause the voltage to dip when switching on. This should be discussed with the building owner. An EIFS report (local regulation in Sweden) fails on RVC (small dips).

• The resulting events (dips) that affect the IT equipment are not too severe (ITIC evaluation) and should not cause equipment failures. The failing hard disk could not be clearly tied to any one of the dips, but it cannot be ruled out that the dips were involved.

• The frequent switching to battery power in UPS 2 may eventually lead to premature wear-out. 500-600 events per month cannot be normal behaviour. UPS 1 is calmer and is expected to live longer.

• UPS 2 has too low an output voltage when on battery. The unit is new, so the battery should not be bad. This, together with the nervous switching behaviour, is reason to send UPS 2 for warranty service/repair.

• UPS systems can create large amounts of harmonics if broken or badly designed. Harmonics were studied, but no alarming levels were found.

Monitoring should continue in order to collect more data and help determine the data centre’s immunity levels. UPS 2 should be observed after repair/service. RVC levels should be followed up.


About Unipower

Unipower AB offers a wide range of products for Power Quality measurements and Smart Grid systems.

Originating from a Swedish ABB company in the mid-80s, Unipower has developed a competitive edge within the field of Power Quality and Smart Grid solutions. We focus on norm-compliant equipment, with a special focus on the requirements for power generation, transmission and distribution.

Our product lines reach from traditional portable PQ analysers to fully integrated and automated Power Quality Management systems for continuous supervision of the energy supply.

Website: unipower.se


Unipower PQ Secure 

The quality of your power is extremely important as disturbances or short outages can cause major problems for your equipment and systems. We have created a Power Quality Management System that will assist you in monitoring your system performance.

The Unipower PQ Secure system is a state-of-the-art solution for Power Quality Management and disturbance evaluation. With a user-friendly interface and intelligible functions, PQ Secure provides you with continuous remote access to all the Power Quality parameters that you need. PQ Secure is a market-leading system specifically designed for the power and energy industry, distribution, transmission and troubleshooting companies. The software is comprehensible and easy to use, being built around high-performance and automation as basic principles.

PQ Secure is fully compliant, supporting the following international standards: PQDIF (IEEE 1159.3), COMTRADE (IEC 60255-24) and IEC 61000-4-30.

The PQ Secure system is a complete Power Quality Management System. The system is modular and allows features to be added both on the server side and on the meters. PQ Secure is designed to excel at data compression and storage, making it very fast and scalable. The system revolves around the following components:

• Powerful evaluation tools
• Accurate statistical components
• Sortable and customisable event lists
• Data-efficient database storage solution
• Control room compatible real-time functions and much more


Published by PQBlog

Electrical Engineer
