Business Computer Failures revealed

Postby glnc222 » January 31st, 2017, 2:55 am

Delta Air Lines had its flight operations computer system go down for hours at the end of January 2017, one of several such events across industries over the years. An interesting tidbit showed up in the news: "An inspection later revealed that 300 of its 7,000 servers weren’t wired to backup power. When the servers on dual-power sources came back on, the 300 didn’t, causing the entire system to crash." Seems robots are not the only things needing better quality control. I must be naive to think such a large organization could do a better job on such critical systems (with more resources than go to mere management information systems, accounting, etc.). Blame bureaucracy? One hopes the military at least does it better. There used to be a thing called "fault tolerant systems". Something's gotten lost over the years. Too many servers... Whatever happened to the supposed reliability advantages of distributed systems? Phone companies seem to be able to do it.
glnc222
Robot Master
 
Posts: 4261
Joined: January 23rd, 2012, 9:19 pm
Location: North Carolina, U.S.

Re: Business Computer Failures revealed

Postby CleanMe » January 31st, 2017, 11:39 am

I agree. Delta seems to have done a very poor job at maintaining their systems and protecting their business.

"Delta says the company's failure to back up power to all of its servers went undetected until Monday's power failure, which caused the airline to cancel more than 2000 flights by Wednesday afternoon. Delta says it is addressing what it calls "dated" infrastructure issues, and has spent $1 billion per year to update its technology."

I seriously doubt the fact that 300 servers had no backup power went undetected. It had to have been a known issue that was simply not addressed - for whatever reason. (Adding backup power to those 300 servers would have been a modest expense.) Their IT Dept. must all have gray hair, as they surely knew that IF the system ever went down due to a power disruption, it would not come back up on backup power alone - because key components had no backup power.

The most interesting aspect of this information is that it shows Delta apparently does not conduct failover tests. That is, they do not conduct failover tests on a system on which their whole business relies. I didn't see any data, but this incident must have cost them hundreds of millions in current and future revenue.
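To show how little it takes to catch this sort of gap on paper before a real outage does, here is a minimal Python sketch of a "tabletop" failover drill. Everything in it is an assumption made for illustration: the `Server` record, the feed names "utility" and "backup", and the three-machine inventory are hypothetical and have nothing to do with Delta's actual setup. It only checks which machines would go dark if one power feed dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical inventory record: a server and the power feeds wired to it."""
    name: str
    feeds: frozenset


def failover_drill(servers, failed_feed="utility"):
    """Return the sorted names of servers left with no live feed
    once `failed_feed` is cut."""
    return sorted(s.name for s in servers if not (s.feeds - {failed_feed}))


# Made-up inventory: app-02 was never wired to backup power.
inventory = [
    Server("db-01", frozenset({"utility", "backup"})),
    Server("app-01", frozenset({"utility", "backup"})),
    Server("app-02", frozenset({"utility"})),
]

dark = failover_drill(inventory)
print("Servers that go dark on utility failure:", dark)
```

A paper check like this is no substitute for actually pulling the plug in a scheduled test, but it would at least flag single-feed machines from the inventory alone.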

What if a fire, power surge, etc. rendered their server farm at the current location useless? I would have thought a data intensive business like an airline would have fully redundant rollover systems at an alternate location.
CleanMe
Robot Addict
 
Posts: 162
Joined: August 7th, 2010, 12:30 am

Re: Business Computer Failures revealed

Postby glnc222 » March 3rd, 2017, 12:15 am

An Amazon Web Services (a big part of the internet) outage:

"team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended." (Seems they never heard of selection menus.)
"In theory, a series of failsafes should keep the fallout from such errors localized, but Amazon says that some of the key systems involved hadn’t been fully restarted in many years and “took longer than expected” to come back online." (Sounds like some vacuum robots reported here.)
"The company now claims it’s “making several changes as a result of this operational event.”"

Not surprising in a way, given how the internet was built totally lacking in security features to begin with, so who knows what else is lacking. One hopes strategic nuclear missile systems are better designed. (Well, they are supposed to require two keys at once, etc., which may take too long to launch when needed: hard to read that fine print in the instructions. In haste, coffee is spilled on the launch panel...)
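On the mistyped command in the Amazon report quoted above: a guard rail in front of the removal step is one common way to keep a typo from taking out too much at once. The Python sketch below is only an illustration under my own assumptions. The function `plan_removal`, the limits `min_capacity` and `max_batch`, and the `node-N` names are all hypothetical, and this is not a description of Amazon's tooling.

```python
class UnsafeRemoval(Exception):
    """Raised when a removal request fails a safety check."""


def plan_removal(fleet, requested, min_capacity, max_batch):
    """Validate a request to take servers out of service.

    Returns the de-duplicated list of servers to remove, or raises
    UnsafeRemoval if the request names unknown servers, exceeds the
    per-command batch limit, or would drop the fleet below min_capacity.
    """
    requested = list(dict.fromkeys(requested))  # de-duplicate, keep order
    unknown = [name for name in requested if name not in fleet]
    if unknown:
        raise UnsafeRemoval(f"unknown servers: {unknown}")
    if len(requested) > max_batch:
        raise UnsafeRemoval(
            f"{len(requested)} servers requested, batch limit is {max_batch}"
        )
    remaining = len(fleet) - len(requested)
    if remaining < min_capacity:
        raise UnsafeRemoval(
            f"only {remaining} servers would remain, minimum is {min_capacity}"
        )
    return requested


# Made-up fleet of ten nodes; at least six must stay up, three per command.
fleet = {f"node-{i}" for i in range(10)}

print(plan_removal(fleet, ["node-1", "node-2"], min_capacity=6, max_batch=3))

try:
    # A fat-fingered request for eight nodes is refused instead of executed.
    plan_removal(fleet, [f"node-{i}" for i in range(8)], min_capacity=6, max_batch=3)
except UnsafeRemoval as err:
    print("refused:", err)
```

The point of the design is that the operator's input is checked against limits set in advance, so one bad argument gets a refusal rather than a fleet-wide removal.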
glnc222
Robot Master
 
Posts: 4261
Joined: January 23rd, 2012, 9:19 pm
Location: North Carolina, U.S.

