Short Answer:
Yesterday’s Secondary Flow outage was a UI issue, not a DSSF issue. Some of the AngularJS and CSS files for SF are loaded from external CDNs, and those sources became unavailable during yesterday’s Google load balancing incident.
Long Answer:
I have a Splunk alert that fires whenever there have been zero Secondary Flow purchases within a 15-minute period. It normally fires a few false alarms at night, but yesterday the alarms started firing in the morning. I sent the first email saying something might be wrong at 8:46am and reached out to Rohith on HipChat.
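For context, the alert is essentially a scheduled search that counts purchase events over a trailing 15-minute window and triggers when the count is zero. Below is a minimal sketch of that check using the Splunk Python SDK (splunklib); the host, credentials, index, and search terms are placeholders, not the actual alert definition.

    import splunklib.client as client
    import splunklib.results as results

    # Connect to the Splunk REST API (host and credentials are placeholders).
    service = client.connect(host="splunk.example.com", port=8089,
                             username="monitor", password="changeme")

    # Hypothetical search: count Secondary Flow purchase events from the last 15 minutes.
    query = 'search index=web "secondary flow purchase" earliest=-15m | stats count'

    count = 0
    for item in results.ResultsReader(service.jobs.oneshot(query)):
        if isinstance(item, dict):
            count = int(item["count"])

    if count == 0:
        print("ALERT: zero Secondary Flow purchases in the last 15 minutes")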
Since the 500 alerts were not firing, the first guess was that the logs had stopped rolling. That was not the case, so I went to WWW and clicked the link to Secondary Flow, and the logon portal did not appear. I opened an incident so that anything Rohith was doing could be logged.
This looked exactly like the symptom of a 500 error, but it was not. The Secondary Flow team took over the investigation, and Lenny and others found that the UI was not loading properly.
Load Balancing Issue with Google:
https://status.cloud.google.com/incident/cloud-networking/17002
Example of the incident update posted by Google:
Aug 30, 2017 09:30
We are experiencing an issue with a subset of Network Load Balancers. The configuration change to mitigate this issue has been rolled out and we are working on further measures to completely resolve the issue. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by 10:30 US/Pacific with current details.
Examples of CDNs Used:
https://ajax.googleapis.com/ajax/libs/angularjs/1.5.0/angular.min.js
https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css
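As a quick sanity check during this kind of outage, you can verify whether those external assets are reachable at all. Here is a minimal sketch in Python using the requests library; the HEAD method and the 5-second timeout are arbitrary choices, and in the browser the equivalent is simply watching these requests fail in the network tab.

    import requests

    # External assets the Secondary Flow UI pulls in (from the list above).
    CDN_ASSETS = [
        "https://ajax.googleapis.com/ajax/libs/angularjs/1.5.0/angular.min.js",
        "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css",
    ]

    for url in CDN_ASSETS:
        try:
            # HEAD is enough to confirm the CDN is serving the file.
            resp = requests.head(url, timeout=5, allow_redirects=True)
            print(resp.status_code, url)
        except requests.RequestException as exc:
            print("FAILED", url, exc)

If either request fails or times out, the SF page renders without Angular and Bootstrap, which matches what we saw: the logon portal simply did not appear.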
