Managed Workstation H: and I:\groups troubles

June 24, 2016

On 6/15/2016, we experienced a significant incident in which the server infrastructure behind H:\ and I:\groups was unable to accept new connections. The cause of that incident is now understood. As a workaround, we moved the 90% of file directories affected by the incident to new server infrastructure that was already in use for the other 10% of our file directories. As mentioned at the time, we were hesitant to make that change because we weren’t fully confident that the new server infrastructure was ready for a bigger load, but it appeared to be the quickest and best way to restore service.

Since then, we’ve had a series of incidents that were smaller in scope. They stem from one of the 4 new servers, which continues to have problems that seem to be tied to something that happens in the very early morning. The other 3 new servers don’t experience these problems, and the incidents since 6/15 have affected only the customers who use the one quarter of file directories hosted on that one server. Unlike the 6/15 incident, these recurring incidents do not prevent new connections; instead, performance degrades slowly until the slowdown is very noticeable. The eOutage messages sent about those incidents have been fairly generic, so we thought it important to provide more detail than what has been communicated via eOutage.

We recognize that these continuing problems are disruptive, and we continue to work hard to identify their cause and to find solutions.

As we explore solutions, we are also trying to limit any undesirable side effects of changes, but we recognize that moving too slowly isn’t acceptable. To move quickly, we are making some changes that we’d normally roll out more slowly and with more customer notification.

We really appreciate your patience with us, and please know that we continue to work hard to provide a reliable service that meets your needs.