The transmission grid that serves the eastern US and Ontario is congested by large power transfers from neighboring regions (to allow for open competition) and is vulnerable to disturbances. Although there is a tendency to point at a "single" event triggering cascading outages, major blackouts are typically caused by multiple contingencies with complex interactions. Major blackouts are rare because they require a sequence of low-probability events to occur. The exact sequence of events is difficult to predict, as there is a practically infinite number of operating contingencies. As the system changes (e.g. independent power producers selling power to remote regions, load growth, new equipment installations), these contingencies may differ significantly from the expectations of the system designers. When a chain of events unfolds at various locations in the connected grid, operators cannot act quickly enough to address fast-developing disturbances.
The likelihood of low-probability events escalating into a cascading outage increases when the grid is already under stress due to preconditions such as weather (high temperatures, thunderstorms, fog, geo-magnetic disturbances, etc.). There are, however, a number of controllable preconditions and factors for blackouts. A congested grid is a major precondition. Public pressure and the "Not in My Back Yard" sentiment make it difficult to site new lines, especially in the more densely populated heavy-load areas. Another important factor in recent blackouts is a lack of reactive support close to the load, which is needed to sustain adequate voltage levels. It is very important to ensure that sufficient reactive power is available (e.g. by strengthening the system with reactive power sources) and to exhaust all generator reactive power capabilities when required. Other factors include aging equipment that is prone to failure, shortfalls in maintenance practices such as tree trimming, and insufficiently coordinated equipment maintenance and generation scheduling during stressed conditions.
In general, the low level of investment in the grid in recent years is also a major contributing factor; the challenge here is identifying who is to invest and how to recoup the costs of such investment. As blackouts rarely happen, it is not viable to require a 2-3 year return on investment. The system now runs under tight operating margins in order to sustain profitability without increasing rates. Regulatory uncertainty at both the state and Federal levels has impeded transmission investment and prevented necessary system coordination on a regional scale.
The combination of these factors makes power systems more susceptible to disturbances. It is the cascading events that cause disturbances to propagate and turn into blackouts. When a system is stressed and system and equipment faults occur, the chain of events starts. For example, some generators and/or lines are out for maintenance, or a line trips due to a fault. Other lines become overloaded, and another line comes into contact with a tree and trips. A hidden failure in the protection system (e.g. outdated settings or hardware failures) then causes another line or generator to trip. At that stage, the power system is faced with overloaded equipment, voltage instability, transient instability, and/or small-signal instability. If fast actions (e.g. load shedding, system separation) are not taken, the system cascades into a blackout.
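The chain of events described above can be illustrated with a toy model. The sketch below is not a power-flow calculation: it assumes, purely for illustration, that a fixed transfer divides equally among the parallel lines still in service (real flows divide according to network impedances) and that any line loaded above its rating trips. The function name, line ratings, and transfer levels are all hypothetical.

```python
def simulate_cascade(transfer_mw, ratings_mw, initial_outages):
    """Toy overload cascade over parallel lines.

    Assumption (illustrative only): the transfer is shared equally by all
    lines in service, and every line loaded above its rating trips.
    Returns (trip_rounds, survived), where trip_rounds lists the line
    indices tripped in each round.
    """
    in_service = set(range(len(ratings_mw))) - set(initial_outages)
    trip_rounds = []
    while in_service:
        flow_per_line = transfer_mw / len(in_service)
        overloaded = sorted(i for i in in_service
                            if flow_per_line > ratings_mw[i])
        if not overloaded:
            return trip_rounds, True  # remaining lines carry the transfer
        in_service -= set(overloaded)
        trip_rounds.append(overloaded)
    return trip_rounds, False  # no line left in service


# Hypothetical ratings (MW) for five parallel lines.
RATINGS = [1500, 1500, 1400, 1100, 900]

if __name__ == "__main__":
    # Stressed system: losing line 0 starts a cascade.
    print(simulate_cascade(4000, RATINGS, {0}))
    # Same outage at a lower transfer: the system rides through.
    print(simulate_cascade(3400, RATINGS, {0}))
```

In this sketch the same single outage is harmless at the lower transfer level and leads to a complete loss of the corridor at the higher one, which mirrors the point that a stressed grid is the precondition that lets an ordinary contingency cascade.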
Evaluation of disturbances shows that protection systems have been involved in 70% of the blackout events. Examples include zone 3 distance relays tripping on overload and/or low voltage, and sensitive ground over-current relays tripping on high unbalance during high load. Inadequate or faulty alarm and monitoring equipment, communications, and real-time information processing can further exacerbate disturbances in the system. Either information is not available or operators are flooded with alarms, so they cannot make proper decisions quickly.
Human error and slow operator response are major contributing factors for cascading outages. As a disturbance develops, operators in various regions are faced with questions such as 'Is the best course of action to sacrifice our own load, cut interties, or get support from neighbors?' and 'Should we help or should we separate?'. An important principle in designing connected power systems is that individual systems should not allow cascading outages to spread throughout the system. For example, when one part of the network is heading towards, or is in, a blackout state, neighboring power systems should not go totally black as well.
There are a number of other contributing factors that allow a blackout to spread, including a lack of coordinated response among control areas. Each region focuses primarily on its own transmission system. Each of the individual parts can be very reliable, yet the total connected system may not be as reliable. While accounting systems have boundaries, electric power and critical communications do not obey these boundaries. Intertie separations are not pre-planned for severe emergencies, leaving the decision to the operators, which is very difficult during a fast-developing disturbance. In any case, it is desirable to take automated actions before the system separates, or to separate it in a controlled manner. Special protection schemes are wide-area schemes designed to detect abnormal system conditions and initiate pre-planned, automatic corrective actions based on system studies. Missing or inadequate special protection schemes make it more difficult to prevent a disturbance from spreading.
Phenomena that manifest during a power-system disturbance are divided into the following categories: angular instability, voltage instability, overload and power system cascading.
In recent years, voltage instability has been one of the major reasons for blackouts. It occurs when the power system can no longer maintain voltage such that both power and voltage remain controllable. A typical scenario is high system loading due to heavy transfers across the grid, followed by events that initiate relaying actions (a fault, a line overload, or a generator hitting an excitation limit). As the grid becomes even more overloaded, more reactive power is consumed, causing voltages to drop. As it is difficult to transfer reactive power over long distances, it is desirable to provide enough reactive power close to the load.
However, regardless of provisions for reactive power support, heavy loading and tripping actions can drive the power system to a "point of no return" where voltage can no longer be maintained. Voltage instability can cause the whole grid to black out unless actions are taken to maintain the voltage. Those actions include switching on shunt capacitors and SVCs, blocking tap changers, exhausting reactive generation resources, and, as a last line of defense, shedding load (e.g. on under-voltage).
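The "point of no return" can be made concrete with the textbook two-bus model, which is an assumption introduced here and not a model of any real grid: a source of voltage E feeds a load P + jQ through a lossless line of reactance X. The load voltage V then satisfies V^4 + (2QX - E^2)V^2 + X^2(P^2 + Q^2) = 0, and for a unity power factor load the largest deliverable power is E^2/(2X), reached at V = E/sqrt(2). The sketch below works in per unit; the function names and the values E = 1.0 and X = 0.5 are hypothetical.

```python
import math


def receiving_voltage(p, e=1.0, x=0.5, tan_phi=0.0):
    """Load-bus voltage solutions (high, low) for the two-bus model.

    p        load active power (per unit)
    e, x     source voltage and line reactance (per unit, illustrative)
    tan_phi  Q/P of the load (0.0 means unity power factor)
    Returns None when no solution exists, i.e. past the nose of the
    P-V curve.
    """
    q = p * tan_phi
    b = e**2 - 2.0 * q * x
    disc = b**2 - 4.0 * x**2 * (p**2 + q**2)
    if disc < 0.0 or b <= 0.0:
        return None
    root = math.sqrt(disc)
    v_high = math.sqrt((b + root) / 2.0)
    v_low = math.sqrt(max(0.0, (b - root) / 2.0))
    return v_high, v_low


def max_power_unity_pf(e=1.0, x=0.5):
    """Maximum deliverable power for a unity power factor load."""
    return e**2 / (2.0 * x)


if __name__ == "__main__":
    for p in (0.5, 0.8, 1.0, 1.01):
        print(p, receiving_voltage(p))
```

Below the limit there are two voltage solutions, of which the higher is the normal operating point. The two solutions merge at the limit and disappear beyond it. A lagging load (tan_phi greater than zero) lowers the limit, which is one way to see why reactive support close to the load matters.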
During major blackouts, some areas very often separate from the rest of the system, causing a power imbalance. Where there is surplus generation in an area, coordinated generator tripping should be pursued to avoid a sudden loss of power leading to a complete blackout. Where there is surplus load in an area, a well-coordinated under-frequency load shedding scheme should be employed and coordinated with the generator under-frequency schemes.
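A staged under-frequency load shedding scheme of the kind described above can be sketched as a small lookup. The stage pickup frequencies, shed fractions, and generator trip setting below are hypothetical values for a 60 Hz system, chosen only for illustration; actual settings come from system studies and the applicable regional requirements. The coordination check encodes one simple assumption: every load shedding stage should pick up before generator under-frequency protection would trip units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SheddingStage:
    pickup_hz: float      # stage operates at or below this frequency
    shed_fraction: float  # fraction of area load dropped by this stage


# Hypothetical settings, for illustration only.
STAGES = (
    SheddingStage(59.3, 0.10),
    SheddingStage(58.9, 0.10),
    SheddingStage(58.5, 0.10),
)
GENERATOR_TRIP_HZ = 57.5  # hypothetical generator under-frequency setting


def load_to_shed(frequency_hz, stages=STAGES):
    """Total fraction of load shed once frequency has fallen to frequency_hz."""
    return sum(s.shed_fraction for s in stages if frequency_hz <= s.pickup_hz)


def is_coordinated(stages=STAGES, generator_trip_hz=GENERATOR_TRIP_HZ):
    """True if every shedding stage picks up above the generator trip setting."""
    return all(s.pickup_hz > generator_trip_hz for s in stages)


if __name__ == "__main__":
    for f in (60.0, 59.0, 58.5):
        print(f, load_to_shed(f))
    print("coordinated:", is_coordinated())
```

If the generator setting were raised above the last shedding stage, is_coordinated would return False, flagging the situation in which generators trip while load could still have been shed.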
Reviewing an example of a recent system blackout can give further insight into the causes of and cures for such events. The August 10, 1996 outage in California alone cost $1 billion. An hour before the disturbance, three 500 kV lines tripped. This resulted in a heavy power flow (4700 MW) from north to south. A fourth line tripped due to a fault and a fifth line tripped due to a design flaw. As a result, 230 kV and 115 kV lines experienced heavy loads. The 115 kV line tripped due to a hidden relay failure and the 230 kV line sagged and flashed over to a tree. Voltage declined, and power units went to full excitation and tripped. Power oscillations caused the tie line to trip and led to out-of-step conditions, separating the system and causing further cascading separations. In this case, 30,500 MW of load was lost and 7.5 million customers were affected. Although each blackout is different, the August 14, 2003 blackout showed some very similar patterns.
How to Prevent Blackouts
The events of August 14 underscored the need for increased investment in the transmission system, but any investment should be preceded by a prudent analysis of which investments are most necessary. There is no silver bullet for preventing blackouts, but there are general measures that can and should be taken to minimize the impact of disturbances. Since the recent outage was caused by a complex sequence of cascading events, electric utilities, industry regulators, and state and Federal legislators must undertake the following steps to determine what happened, understand why it happened, and prevent it from happening again.
Step One: Analysis & Audits
Multiple regulatory and government agencies have already begun an intense analysis of the blackout data to identify what actually happened and to put to rest the rumors and conjecture offered thus far in the media. An immediate focus must be placed on the possible failures of the control and protection systems intended to restrict and contain power system disturbances within smaller areas. System protection design changes are also needed to limit the impact of future blackouts. Critical alarm monitoring systems must be maintained in top operating condition, and newer alarm analysis technologies should be deployed to detect and to prevent the spread of major disturbances. Other equipment failures may be involved, requiring detailed failure and root cause analyses. Most important, however, inadequate flows of information between neighboring control centers may have resulted in an inexcusable time delay in reacting to an escalating problem.
Other reviews should also be conducted in the immediate term including audits of planning, operating, and maintenance practices to identify the factors that contributed to the recent blackout. Transmission system capabilities for handling today's higher flow levels and the huge volume of transactions must be investigated more thoroughly. Maintenance procedures should be revised to reduce the rate of equipment failures in critical transmission equipment, and vegetation clearance around transmission lines must be reexamined and corrected as necessary.
Step Two: Preventive and Corrective Actions
The analysis and audit process will identify the next set of actions required to minimize the possibility that a similar outage will happen again. Short-term upgrades will likely be required, such as improved monitoring and diagnostics as well as remedial action schemes and training for system operators. Monitoring systems with faster detection and wider communication capabilities could play a key role in the near term, as could improved transient stability analysis capabilities and improved control algorithms that take quicker and more appropriate corrective actions. The development of special protection schemes (SPS) can help manage system disturbances and prevent blackouts. These wide-area protection schemes detect abnormal system conditions and initiate pre-planned, automatic corrective actions derived from system studies, with the goal of restoring acceptable system performance. NERC has defined standards of acceptable SPS performance. It is expected that planning and budgets for 2004 will be significantly influenced by the August 14, 2003 blackout.
In summary, the following measures should be taken to prevent blackouts:
- Improve monitoring, diagnostics, and control center performance (e.g. availability of critical functions needs to increase to 99.99%);
- Secure real-time operating limits on daily basis (e.g. dynamic line ratings);
- Implement Special Protection Schemes and Adaptive Protection;
- Perform protection coordination studies on a regular basis as system conditions change;
- Test not only individual relays but system protection applications;
- Perform dynamic voltage and transient stability studies on a regular basis as system conditions change;
- Assess the condition of aging infrastructure and improve maintenance;
- Train operators, including a coordinated approach among control areas.
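To put the 99.99% availability target from the first measure above in perspective, the short calculation below converts an availability figure into allowed downtime per year. It assumes a 365-day year and treats availability as a simple fraction of time; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def annual_downtime_minutes(availability):
    """Allowed downtime per year, in minutes, for a given availability (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    for a in (0.999, 0.9999):
        print(f"{a:.2%} availability: {annual_downtime_minutes(a):.1f} min/year")
```

Under these assumptions, 99.99% availability allows roughly 53 minutes of downtime per year for a critical control center function, compared with almost 9 hours at 99.9%.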
Step Three: Public Policy, Transmission and Future Investments
Procedures at utility control centers and at independent system operators (ISOs) are expected to tighten, with long-term effects on the industry. Regulatory actions can be expected as well.
Preventing a blackout of this magnitude in the future will require a combination of long-term investments in the transmission system and much needed improvements in public policy. Significant investment in transmission hardware will be made in the next decade. The retirement and replacement of transmission equipment at the end of its useful life will be another important remedy for skyrocketing failure rates and potential outages in the future. But beyond the aging infrastructure issue, the transmission grid must also be upgraded and expanded to handle the increased energy flows. High-voltage power electronics devices would allow more precise and rapid switching to improve system control and to help increase the level of power transfer that can be accommodated by the existing grid. Distributed energy technologies could also play a role in relieving certain power flow demands on the transmission and distribution networks as well as in improving reliability. While the new investment will certainly include some new transmission lines, it will also encompass new power delivery technologies, including thyristor switches, superconducting materials, and VAR controls.
Today's communication and computer technologies have produced a new revolution in the power industry, especially in the field of power system control, where vertical integration is much improved. Computer relays communicate not only with a control center, but also with each other. This in turn will facilitate an overall system-wide protection and control philosophy. Microprocessor-based coordinated protection, monitoring and control systems are the key to innovations in preventing cascading disturbances. The implementation of an advanced wide-area protection system first requires a significant improvement of the existing decentralized systems. Decentralized subsystems will have to use advanced algorithms to make local decisions based on local measurements and/or selected remote information. With an information infrastructure, it is possible to tie all the monitoring, control and protection devices together through an information network. The key to a successful solution is fast detection, fast and powerful control devices, communication systems, and smart algorithms; in other words, a "True Wide Area Protection and Control System".
In summary, the following longer-term measures should be taken to prevent blackouts:
- Take regulatory actions to ensure coordination and enable efficient system planning, permitting, and market operations;
- Strengthen the transmission network by building lines and cables and adding distributed generation;
- Increase transmission power flow control capability through the use of HVDC links and FACTS;
- Use new technologies to enable coordinated wide-area protection, monitoring and control systems as cost-effective solutions (a true Wide Area Protection & Control System);
- Deploy energy storage.
1. "Wide Area Protection and Emergency Control", Working Group C-6, System Protection Subcommittee, IEEE PES Power System Relaying Committee, January 2003.