SALEM — A week after the state’s core computer network crashed — interrupting online and phone services at dozens of state agencies — officials still don’t know what caused the more than seven-hour outage.
The state data center, which manages the network, is conducting a post-incident review to try to determine the cause and come up with recommendations for preventing another failure.
That review is expected to be completed in about two weeks, said Sandy Wheeler, administrator for the state data center.
“We don’t know what caused it,” Wheeler said. “We have pulled system logs. We are asking Cisco — the provider of hardware — to evaluate all of the logs and determine what happened.”
At about 7:30 a.m. Saturday, June 9, the state data command center received an alert that the state computer systems were unreachable. The data center immediately deployed on-call information technology employees with expertise in network and security issues and notified the information technology offices of the state's roughly 80 agencies of the problem.
State data center employees had the system back up and running by 3 p.m. June 9, but some agencies, such as the Oregon Department of Transportation, experienced disruptions through Monday, June 11.
For example, motorists were unable to access the transportation department's TripCheck website to check traffic conditions, truckers couldn't use the website to obtain permits, and anglers couldn't get on the Fish & Wildlife site to buy a fishing license.
Portland General Electric had a scheduled power shutdown at the Capitol Mall earlier June 9, but the loss of power isn't believed to have caused the state network outage.
“We don’t believe it was related to power, because the routers flipped to battery and generator backups,” Wheeler said.
The state network is built to handle an outage without any interruption in online access. The system has four core routers spread across three different buildings in the state government complex in Salem. Each router has battery and generator backups and two motherboards, with the second designed to take over if the first fails.
“Each of those routers has multiple links to each other so if for some reason a link were to get broken, it should be able to go around another route to get to another router,” Wheeler said.
One way to prevent another failure might be to test the routers more regularly, she said. The last full-scale test of the system was 18 months ago.