
Google Cloud Accidentally Deletes UniSuper's Online Account Due To 'Unprecedented Misconfiguration' (theguardian.com) 52

A "one-of-a-kind" Google Cloud "misconfiguration" resulted in the deletion of UniSuper's account last week, disrupting the financial services provider's more than half a million members. "Services began being restored for UniSuper customers on Thursday, more than a week after the system went offline," reports The Guardian. "Investment account balances would reflect last week's figures and UniSuper said those would be updated as quickly as possible." From the report: The UniSuper CEO, Peter Chun, wrote to the fund's 620,000 members on Wednesday night, explaining the outage was not the result of a cyber-attack, and no personal data had been exposed as a result of the outage. Chun pinpointed Google's cloud service as the issue. In an extraordinary joint statement from Chun and the global CEO for Google Cloud, Thomas Kurian, the pair apologized to members for the outage, and said it had been "extremely frustrating and disappointing." They said the outage was caused by a misconfiguration that resulted in UniSuper's cloud account being deleted, something that had never happened to Google Cloud before.

While UniSuper normally has duplication in place across two geographies, so that if one service goes down or is lost it can be easily restored, the deletion of the fund's cloud subscription caused deletion across both geographies. UniSuper was eventually able to restore services because the fund had backups in place with another provider.
"Google Cloud CEO, Thomas Kurian has confirmed that the disruption arose from an unprecedented sequence of events whereby an inadvertent misconfiguration during provisioning of UniSuper's Private Cloud services ultimately resulted in the deletion of UniSuper's Private Cloud subscription," the pair said. "This is an isolated, 'one-of-a-kind occurrence' that has never before occurred with any of Google Cloud's clients globally. This should not have happened. Google Cloud has identified the events that led to this disruption and taken measures to ensure this does not happen again."


Comments Filter:
  • Not good (Score:3, Interesting)

    by dpalley ( 670276 ) on Friday May 10, 2024 @07:42PM (#64463855)

    In Azure, you get a dozen reminders over several weeks when deleting a subscription before it's actually deleted.

    Dan
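
    A minimal sketch of the kind of grace-period-with-reminders workflow described in the parent comment, using hypothetical names rather than Azure's actual API:

# Hypothetical sketch: a subscription deletion only becomes final after a
# multi-week grace period, with periodic reminders to the owner in the meantime.
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class PendingDeletion:
    subscription_id: str
    requested_at: datetime
    grace_period: timedelta = timedelta(days=30)
    reminders_sent: List[datetime] = field(default_factory=list)

    def purge_due(self, now: datetime) -> bool:
        # Hard deletion is only allowed once the grace period has fully elapsed.
        return now >= self.requested_at + self.grace_period

    def reminder_due(self, now: datetime, every: timedelta = timedelta(days=3)) -> bool:
        # Keep nagging the owner every few days until the window closes.
        last = self.reminders_sent[-1] if self.reminders_sent else self.requested_at
        return not self.purge_due(now) and now >= last + every


def process(pending: PendingDeletion, now: datetime) -> str:
    if pending.purge_due(now):
        return f"hard-delete {pending.subscription_id}"
    if pending.reminder_due(now):
        pending.reminders_sent.append(now)
        return f"remind owner of {pending.subscription_id}"
    return "no action"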

  • by DMJC ( 682799 ) on Friday May 10, 2024 @07:49PM (#64463871)
    Got an outage? Blame Google, blame Microsoft, blame anyone but the idiot that pushed the services into The Cloud. That's why management loves Cloud. Idiots.
    • by vux984 ( 928602 ) on Friday May 10, 2024 @09:27PM (#64464005)

      " blame anyone but the idiot that pushed the services into The Cloud"

      Remember THAT idiot works for your company.

      Would you REALLY want them running your internal infrastructure? Do you think that's going to end any better?

    • Got an outage? Blame Google, blame Microsoft, blame anyone but the idiot that pushed the services into The Cloud. That's why management loves Cloud. Idiots.

      You think your in-house sys-admin can't screw up? Cloud providers have more servers, more data centres, more admins, etc, etc. By going to the cloud you reduce your probability of downtime. And if you're smart you back up locally and to another cloud (just like they did); a sketch of that follows below.

      Now, with all that concentration, if one of those big Cloud providers ever has a system-wide outage? That's a big systemic risk to the Internet, possibly the cause of another financial meltdown.

      But for an individual organization? Cloud all the way.
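
      A minimal sketch of the "back up locally and to another cloud" idea mentioned above, with destinations modeled as interchangeable callables; the names are hypothetical, and real destinations would wrap each provider's own SDK upload call:

# Hypothetical sketch: fan one backup artifact out to several independent
# destinations so that losing any single provider does not lose the data.
import shutil
from pathlib import Path
from typing import Callable, Dict, List


def copy_to_local(src: Path, name: str, root: Path = Path("/backups")) -> str:
    # One concrete destination: a plain filesystem copy.
    root.mkdir(parents=True, exist_ok=True)
    dest = root / name
    shutil.copy2(src, dest)
    return str(dest)


def fan_out_backup(src: Path, name: str,
                   destinations: List[Callable[[Path, str], str]]) -> Dict[str, str]:
    # Failures are isolated per destination so one bad provider does not block
    # the others; the caller can alert on any "FAILED" entries.
    results: Dict[str, str] = {}
    for dest in destinations:
        try:
            results[dest.__name__] = dest(src, name)
        except Exception as exc:
            results[dest.__name__] = f"FAILED: {exc}"
    return results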

    • The buck stops with whoever is getting paid to do something. If a paid Google service fails, then it's absolutely appropriate to put most of the blame on Google.

      However, there's some person in the CIO or similar decision-making position who should bear some responsibility for having separate backup facilities and a disaster plan.

  • by jddj ( 1085169 ) on Friday May 10, 2024 @08:18PM (#64463907) Journal

    "UniSuper was able to eventually restore services because the fund had backups in place with another provider."

    "Fortunately, I keep my feathers numbered for just such an emergency."

    https://www.youtube.com/watch?... [youtube.com]

  • by khchung ( 462899 ) on Friday May 10, 2024 @09:20PM (#64463991) Journal

    Many recent Boeing incidents were one-of-a-kind too; has any other plane had its windows ripped out in mid-air before or after?

    The fact that a business account could be deleted so simply (and that restoring it required the customer's own backup *elsewhere*!) already points to how little thought went into the system when Google designed it.

    Did no one in Google ask the simple question, "Gee, this delete-account function seems quite powerful; what happens if it is triggered accidentally?" Apparently not, otherwise it would have taken Google just one click to restore the account, because the data would only have been marked as deleted and kept for 90 days before actually being wiped.

    Or we could ask how many levels of human approval this action required (one? none?), or why this function did not verify against the billing system to safeguard paying accounts from deletion, etc. That this could happen at all indicates a systemic problem with how Google views its customers when designing systems.
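
    A minimal sketch of the safeguards argued for above - soft delete with a 90-day retention window, a one-click restore, and a billing check before deletion is even accepted - using hypothetical names, not Google's actual API:

# Hypothetical sketch: mark-deleted accounts are retained for 90 days, can be
# restored by clearing the marker, and accounts with active billing are never
# accepted for deletion in the first place.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RETENTION = timedelta(days=90)


@dataclass
class Account:
    account_id: str
    has_active_billing: bool
    deleted_at: Optional[datetime] = None


def request_delete(acct: Account, now: datetime) -> None:
    if acct.has_active_billing:
        raise PermissionError("refusing to delete an account with active billing")
    acct.deleted_at = now  # soft delete: the underlying data is left in place


def restore(acct: Account) -> None:
    acct.deleted_at = None  # "one click": just clear the deletion marker


def purge_allowed(acct: Account, now: datetime) -> bool:
    # Physical wipe is only permitted once the retention window has passed.
    return acct.deleted_at is not None and now - acct.deleted_at >= RETENTION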

    • has any other plane had its windows ripped out in mid-air before or after?

      Yes. Maintenance issues affect all sorts of aircraft. Window blowouts are very rare, but they happen.

      British Airways pilot was sucked out of an airplane mid-flight — and lived [businessinsider.com]

      Oregon man nearly got sucked out of a busted passenger-plane window. Now he’s a pilot. [oregonlive.com]

      Sichuan Airlines co-pilot was pulled back inside by crew after right windshield blew out at 32,000 feet [theguardian.com]

      • In this case it was a manufacturing issue and a door plug, not a maintenance issue and a window. And it was indeed a first.

    • by az-saguaro ( 1231754 ) on Saturday May 11, 2024 @02:16AM (#64464315)

      Your points are excellent, and they bring to mind a situation a bit different but analogous. EMRs, electronic medical records, were mandated in hospitals during the Obama years, accelerating adoption that was already going on. The few big companies who make these crappy systems also make a lot of money selling the service to hospitals, many or most of which by now have been co-opted by corporations and private equity. The medical record, once a bastion of proper and quality care, has now become just a billing and money-capture tool for hospital administration. The EMR has its benefits, but as implemented and used, it has had a profound effect in degrading quality of care in hospitals.

      The companies making EMRs, such as Cerner-pos and Epic-pos, are now big companies with lots of money, lots of employees, and nominally good in-house expertise in computers and networking systems - that is, after all, what an EMR is, just a big network capturing data. So, you might think that EMRs would work well from a basic technical point of view, regardless of how the hospital admins abuse them.

      But, no.

      We are subject, once every month or two, to an announcement that the EMR will have scheduled "downtime" for maintenance and upgrades. Paper charts were never taken offline, never a problem, but the eight-hour EMR lapses are disruptive to care. I am pretty sure that Google, Amazon, eBay, the airlines, and a zillion other big companies know how to do maintenance and upgrades without taking their service offline. One would think that the EMR vendors would know how to manage that, but they don't, or maybe they do but they don't care.

      Their customer base is "small" by Amazon or Google standards, just a few hundred or thousand clients, the services have no social media presence for public complaints, and the nurses, doctors, and patients don't count because the EMR is mainly a money-capture tool for hospitals that don't care if clinical operations are bumpy as long as billing operations are smooth. So, no one complains except the clinical staff, and that no longer counts. Maintaining uptime during a system upgrade should be basic and easy. The problem is simply disregard for the end user, for the social responsibility of delivering a quality product, and for the moral-ethical issues at the center of medical care products and services.
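
      A minimal sketch of the rolling-maintenance idea alluded to above - upgrade one node at a time while the rest keep serving - with a hypothetical pool interface, not any particular vendor's tooling:

# Hypothetical sketch: take one node out of rotation at a time, upgrade it,
# verify health, then return it, so the service never goes fully offline.
from typing import Callable, Iterable


def rolling_upgrade(nodes: Iterable[str],
                    drain: Callable[[str], None],
                    upgrade: Callable[[str], None],
                    healthy: Callable[[str], bool],
                    restore: Callable[[str], None]) -> None:
    for node in nodes:
        drain(node)      # stop routing new traffic to this node
        upgrade(node)    # apply maintenance while it is idle
        if not healthy(node):
            # Never put a broken node back; stop the rollout for investigation.
            raise RuntimeError(f"{node} failed its health check; halting rollout")
        restore(node)    # back into rotation before touching the next one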

      Furthermore, there have been reports lately of hospitals hit by ransomware, with entire patient databases going down. How can hospital admins and the crappy EMR companies operate without proper data backups - a situation akin to the story reported here? The hospitals, at least the ones I work at, do not seem to have any "customer's own backup" or third-party data preservation as a contingency.

      One thing I have gleaned from following Slashdot is a feeling among IT pros that corporate admins often have little or no sense about computer and network technology, so they are apt to make foolish or short-sighted choices. Hiring a vendor for a service like an EMR and not expecting robust 100% uptime or secure backups seems like a typical hospital-admin bone-headed thing to do, but this would be moot if the companies delivered a properly designed and secure service, which they do not.

      The crapification of such services and businesses is a cancer that has rotted our society over the past 30-40 years. These problems - in hospital EMRs and with the parties in the posted story - are just two examples among presumably countless others. Your comments hit the nail on the head with questions that seem so obvious, and the issues are presumably not unknown to the gurus-pundits-"experts"-assholes in the corporate echelons. Honest mistakes can happen, but for companies with the self-aggrandizing arrogance of Google, arrogance born from the fact that they actually do have top-notch in-house expertise, the lapses you enumerate should never have happened. The situation reported here got resolved with a happy ending, but it doesn't inspire confidence that it won't happen again.

      • by Bongo ( 13261 )

        Thanks for your post, that's a really great read. I think that alone is the crux of the problem, and we should all be spending our time thinking about it.

        It reminds me of Systemantics, which was written by a doctor.

        "the fundamental problem does not lie in any particular System but rather in Systems As Such (Das System an und fuer sich)"

        -- SYSTEMANTICS. THE SYSTEMS BIBLE by John Gall

        Personally I think that we as humans have built systems which are too complex for us to understand at the moment.

        The way t

      • The fundamental problem is people love obeying orders from someone else who's then responsible for the bad decisions.

        Software Engineers, Programmers, Computing Scientists, etc. form the most powerful middle-class profession ever in the History of mankind, bar none. They have access to the most powerful devices and machinery in existence, access to the most complex tools ever devised by human ingenuity, access to the vastest collection of knowledge ever assembled, the knowledge and the means to put it all to

      • So where does OpenEMR [open-emr.org] or OpenMRS [openmrs.org] fit into this if EMR in general is a failure?

        • Good point - but -

          (I don't have experience with OpenMRS).

          OpenEMR is practice management software rather than hospital style EMR.
          It came on the scene as I recall about 15 years ago, when doctors, clinics, and hospitals still mostly used paper charts but were transitioning to electronic records. Computerized practice management software has been around much longer, and OpenEMR is on that cusp in time when computerized offices and electronic records were starting to blend. It was only around 2014 when emr's

      • EPIC was a big company before Obama. I applied for a job there in the early 2000s; it was all IIS and SQL Server, and they were writing their platform targeting .NET 1 and writing their own custom XML parsers. In short, it was a hellscape for developers and I was glad to turn down their offer.

        They bragged about co-writing the EU regulation at that point, and I am fairly sure they co-wrote the Obama regulation to the point that every provider that was on smaller, homegrown, or different software simply had to conform

    • You just have to pay a little bit more attention when parsing the statement.

      the outage was caused by a misconfiguration that resulted in UniSuper's cloud account being deleted, something that had never happened to Google Cloud before.

      The "something" that never happened before is deleting of this specific account (therefore is one-of-the-kind). They never said no other accounts have ever been deleted the same way, only that this specific UniSuper's account has never been deleted before.

    • I've run on-premise clouds (OpenStack) and IIRC they wouldn't let me remove an account before I removed all of its dependencies. There was no "nuke tenant" option.
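
      A minimal sketch of that kind of dependency guard - refusing to delete a tenant while any of its resources still exist - with hypothetical names rather than OpenStack's actual API:

# Hypothetical sketch: refuse to delete a tenant while any of its resources
# (instances, volumes, networks, ...) still exist.
from typing import Dict, List


def delete_tenant(tenant_id: str, resources: Dict[str, List[str]]) -> None:
    # `resources` maps resource type -> IDs of live resources owned by the tenant.
    leftovers = {kind: ids for kind, ids in resources.items() if ids}
    if leftovers:
        raise RuntimeError(
            f"cannot delete tenant {tenant_id}: live dependencies remain: {leftovers}"
        )
    print(f"tenant {tenant_id} deleted")  # reached only once everything else is gone
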
  • oh shit

  • by gweihir ( 88907 ) on Saturday May 11, 2024 @02:07AM (#64464299)

    For something like that to happen, you usually need three or more mistakes by different people. Either Google has inadequate safeguards in place (very, very bad), or they have people incompetent enough that they all made a mistake here (very, very bad), or both (worse).

    This looks like tech-rot to me, where "managers" try to make things cheaper until they are done cheaper than possible and then crap like this happens.

  • GCP and Google's business have been spiraling into shit; I can see that as part of my work (managing multiple enterprise Workspace and GCP accounts). The service level over the last 5 years has become unbearable, and I have reluctantly started recommending my clients get off it and move to the competition. Even when you do have support chats, 90% is just lies; their actual support staff has no clue about basic operations or policies.
  • Good on their IT staff for not completely trusting Google and having alternate backups. Bad on their IT staff for not hosting their own data. The cloud is not your computer and the data stored on it IS NOT YOURS. Give your IT staff the resources to host the data locally and avoid this kind of crap in the future. Companies are not going to learn this simple lesson until these cloud providers really screw something up. Google really screwed up in this case only to be saved by the company's IT staff foreshadow

  • Sure, Gmail and YouTube work, but their Search is almost obsolete and any other product of theirs cannot be counted on.
  • We all know having your business on the public cloud is silly, right? By the time you do all the constant cost-saving auditing, multi-AZ, multi-cloud DR, and such, does it really make sense at any real scale? I mean, once you've hired even one dedicated AWS/GCP/Azure person, you've gotta be in too deep.

    Within five years, "cloud repatriation" will be as hot a resume keyword as "cloud migration" was five years ago, and everyone who wants to keep their job will pretend nothing hilarious happened.
