[RESOLVED on Aug 2nd 4:00 am PDT] Microsoft O365 intermittent issues

If you use Microsoft Office 365 services as a backend data source, you might experience 503 errors due to an issue on the Microsoft O365 side. The issue affects the API access point to the service; regular access to O365 through its UI seems to be fine.

Update July 30th 10:00 am PDT: We are following up with the Microsoft support team and will post back once we have any updates to share.

Update July 30th 12:00 noon PDT: We are following up with the Microsoft team on why some calls to microsoft.graph.usedRange returned intermittent errors with these response headers:

Status Code: ServiceUnavailable

Retry-After: 5
request-id: XXXXXXXX
client-request-id: XXXXXXXXX
x-ms-ags-diagnostic: {"ServerInfo":{"DataCenter":"West US","Slice":"E","Ring":"4","ScaleUnit":"001","RoleInstance":"XXXXXXXXX"}}
Date: Fri, 30 Jul 2021 17:50:07 GMT

The full error and call stack from the graph library was:
Microsoft.Graph.ServiceException: Code: InternalError
Message: Sorry, something went wrong.

Inner error
Code: transientFailure
Message: The request failed due to a transient error. Please try your request again.

Inner error
Code: InternalError
Message: Sorry, something went wrong.

   at Microsoft.Graph.HttpProvider.<SendAsync>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Graph.BaseRequest.<SendRequestAsync>d__36.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Graph.BaseRequest.<SendAsync>d__32`1.MoveNext()

Update August 2nd 9:00 am PDT: Response from Microsoft: Our partner team enabled a change on July 28 which caused usedRange API to fail under some condition. We roll back this change at 4AM Aug 2 (US/Pacific time).

The link below has more data:

https://portal.office.com/Adminportal/Home#/servicehealth/history/:/alerts/MO273940
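For anyone calling the API from their own scripts, the Retry-After: 5 header and the transientFailure inner error shown above suggest that a failed call can be retried after a short wait. Below is a minimal sketch of that idea in Python. It is not how AppSheet or the Microsoft Graph library handles retries; the names `call_with_retry`, `retry_delay`, and the `(status, headers, body)` response shape are hypothetical, chosen only for illustration, and the sketch assumes Retry-After carries a number of seconds.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def retry_delay(headers: Mapping[str, str], fallback: float) -> float:
    """Return the wait in seconds from a numeric Retry-After header, else the fallback."""
    value = headers.get("Retry-After", "")
    try:
        return max(0.0, float(value))
    except ValueError:
        # Retry-After can also be an HTTP date; this sketch does not parse that form.
        return fallback


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    fallback_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-503 status or the attempts run out."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 503:
            break
        # Wait as long as the server asked before trying again.
        sleep(retry_delay(headers, fallback_delay))
        response = send()
    return response


if __name__ == "__main__":
    # Simulated backend: two 503 replies with Retry-After: 5, then a success.
    replies = iter([
        (503, {"Retry-After": "5"}, "transientFailure"),
        (503, {"Retry-After": "5"}, "transientFailure"),
        (200, {}, "ok"),
    ])
    waits = []  # record the requested waits instead of actually sleeping
    result = call_with_retry(lambda: next(replies), sleep=waits.append)
    print(result[0], waits)
```

Passing `sleep` as a parameter keeps the example testable without real delays. A retry loop like this only smooths over short transient failures; it would not have helped during an outage that lasted several days.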


Mobax
New Member

When will this be fixed? My app has stopped working and it is currently affecting my teams in the field.

The issue doesn't seem to be resolved. We are still getting errors even though MS has stated the issue is resolved. 8am UK time.


All my Apps are now down with the same error message. No one can do any work!!

cc @Sarah_Keown
@Harsh_Ch

@praveen

My teams are experiencing the exact same error and no one can give a timeline on when it will be operational again.

I can get into some apps now, seems to be very random.

@Harsh_Ch, it's almost 11am here in the UK and users are still having a lot of issues. Has anyone been in touch with Microsoft to report this? It seems like another issue if they are saying the previous issue is resolved.

I have had no resolution. Out of desperation I duplicated my files into GDrive and pointed the system to the new location, and I managed to get back into the system. GDrive seems more stable.

Not possible for me to do this, way too much data, we have a lot of Apps. Where are you located?

I'm based in China but my app works for my company in South Africa. I use 15 datasets, but I moved the 2 main ones to GDrive to see if that helps a bit. If it does, I'll be moving the rest over piece by piece. My system runs roughly 8000 lines of data through it monthly.

Same problem since yesterday. My business is paralyzed. I hope that AppSheet engineers are really in touch with Microsoft engineers, because the survival of my business TODAY depends on it...

@Alexis1 @Martina we apologize for the inconvenience. It's a side effect of the flexibility our platform provides, i.e. letting you manage data in your own store and power apps/automation from there using APIs.

The Microsoft service seems to have random access issues, although the site claims it's back to normal. We are trying to follow up with a few known contacts. We will post back if we have any success with it.

Hi community:
Is there any news about this problem?

Just posted the link with new updates.

How can it be explained that only some Microsoft accounts have this problem, and that within the same Microsoft account one app works (EvaluacionEmpresas-980586) and another does not (TrazabilidadEmbarques-980586)? I have just reconnected the data source and the error message persists.

Hi @SmartD, @Mobax, are you still having this issue or is it only us?

Same here. I still have some items in my OneDrive, and it is up and down and affects my users big time.

Hello @Martina, we are not sure that the problem has been solved. For now we have migrated the problem apps to Google. At this time we notice that the original apps are working fine, but since the problem is intermittent it is better to wait for AppSheet to confirm that the problem has been solved to migrate the original apps back to users. We are waiting.

We were hoping this would be resolved over the weekend but we are still having issues. 9am UK time.
@Harsh_Ch
@praveen

Our complete procurement and invoicing system relies on Apps now, along with many other processes. This is really not good, with all systems not working since Thursday.

Day 5 of problem now !!!
2:30pm UK time and our business of over 200 people is disabled because our Apps are not working.
Is AppSheet aware of how critical this is?
I know this is a Microsoft issue but is communication with Microsoft being prioritized?

@Harsh_Ch @praveen

@Martina, @Alexis1 and @SmartD, as you are seeing, the issue is completely random. We have posted the detailed exceptions we are seeing when accessing the MS Graph API for some apps. We are still following up with Microsoft engineering and unfortunately, we don't have any positive news to share yet.

As a few users suggested, migrating to Google Drive as a backup is one solution you can employ on your end.

We will update this thread as soon as we have more information to share.

@Martina, @Alexis1 and @SmartD, Microsoft has identified the root cause and appears to have fixed it at 4:00 am PDT. Can you please confirm whether you are able to render Apps now?

@Harsh_Ch This morning I noticed that the applications were working. I was just waiting for confirmation from AppSheet to be sure the problem was fixed and that this was not just an intermittent recovery. Thank you very much for the update.

Microsoft assures us the issue is fixed now. We are also no longer seeing errors in our logs.

Per Microsoft's messages to us ...


Hi Praveen/Brian, thanks for reporting this issue to us. Sorry for the impact the issue caused.

Thanks to the session info, we think we have found the root cause and made the fix. Could you please check on your side?
Quality is our first priority, you can send email to ecoxlgraph@microsoft.com for Excel Graph related issue in the future.

and

Hi Harsh, our partner team enabled a change on July 28 which caused usedRange API to fail under some condition. We roll back this change at 4AM Aug 2 (US/Pacific time). If you are still hit issue after this time, please let us know.


When issues involve other services outside Google, it is difficult for us to be predictive or transparent, and there is only so much we can do, frustrating though it is. In this case, here's a timeline of what we did to pursue the issue:

  1. Issue was reported July 28th and we investigated to see if it was a change we had made. Having determined it was not (because we were clearly receiving intermittent errors from the Microsoft API), we escalated to Microsoft support and reported that here.
  2. Microsoft was having a different issue at the time, assumed this was related, and then that issue was resolved. Unfortunately, that did not fix our problem but added delay.
  3. Fri Jul 30th at 1:30AM, I escalated to Charles Lamanna, Corporate VP at Microsoft.
  4. Charles responded Fri Jul 30th at 9:20AM, including others on this team.
  5. Fri Jul 30th at 11:45AM, Brian from our team responded on the thread giving Microsoft very detailed information that can help them debug the issue. It took hours of effort on our side ahead of that to capture this intermittent error with the right traces to send them.
  6. Fri Jul 30th at 4:45PM, someone from Microsoft responded that they do not see elevated errors across their product, so this needs more specific investigation.
  7. Sat Jul 31st at 4:12AM, a manager on the Microsoft Excel API team delegated to a couple of others on their team for investigation.
  8. Mon Aug 2nd at 8:05AM, Harsh from our team asked again if they had made progress.
  9. Mon Aug 2nd at 8:10AM, Microsoft responded that they had indeed identified the problem from the traces we sent and fixed the issue at 4:00AM.

Overall, I have found Microsoft a good and reliable partner in fixing issues with services when we escalate. However, we can only escalate occasionally, and when we do, we need to have fully ensured that the problem is not on our side and that there is no other way to resolve it. Once we escalate, it is difficult to effectively convey the same sense of urgency that we feel when our own customer is blocked. We don't know if they are working on it or not, and we can ask them about progress but we cannot badger them. All this adds friction and delay. In this case, the intervening weekend probably did not help with time to resolution.

@Martina and others, I know this doesn't change the fact that you have been badly disrupted in your business, but at least you know that the most senior members of the AppSheet team were doing what we could to unblock you.

@praveen thank you for your detailed response and all your work.
It's the end of the working day here. I will get users to test in the morning and let you know.
It's not feasible for us to transfer data over to Google when there are issues. We have too much data and too many Apps and users to keep it all in sync and then move it back.

Indeed. Moving data between data sources is a very significant undertaking. You cannot do it easily; it has to be an intentional, thoughtful thing.