Server Status

This section provides real-time news, announcements and updates regarding service offerings, hosting server statuses, scheduled reboots, network statuses, outages, upgrades and improvements, and planned or unplanned maintenance.  Come here to check the situation if your site or application is not responding or performing at expected speeds.



Service interruption, April 7, 2010

Some service outages occurred today preventing HTTP access to the server.  We believe the problem has been resolved and everything is operating normally again.

The IIS server was failing to service requests and was returning ECONNRESET.  The problem was noticed around 3 PM PST and again around 5 PM PST.  The first occurrence was temporarily resolved by restarting the web services while the investigation began.  The cause was not discovered until the 5 PM PST outage forced a server reset.

Around this time, the investigation revealed some configuration issues with a mandatory update of the server firewall/antivirus/security system.  Some time ago it was discovered that an optional email scanner feature was creating an excessive number of TCP endpoints that were not being terminated.  During the update of the security software, this feature was inadvertently re-enabled, and it was very likely (though not 100% verified to be) the cause of today's failures.  The feature has been disabled again.

It was also noticed that the update did not correctly carry over the previously stable firewall configuration from the earlier version, so the firewall had to be reconfigured.  Because a proper firewall configuration was urgent, this required additional maintenance outside the standard maintenance window of 7 PM - 9 PM PST.

We apologize for the inconvenience and service interruption.  We've done everything in our power tonight to ensure that everything is configured properly.  The situation will be monitored diligently so that swift action can be taken if the cause of the initial ECONNRESET problem turns out not to be fixed.
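For anyone who wants to watch their own site for the same symptom, here is a minimal sketch of an external check. It is not the monitoring used on this server. The function name `probe_http` and its return labels are made up for illustration. It sends a single HEAD request and reports whether the connection was reset, timed out, or answered normally.

```python
import errno
import socket


def probe_http(host, port=80, timeout=5.0):
    """Send one HTTP HEAD request and classify the outcome.

    Returns 'ok', 'reset', 'timeout', or 'error:<reason>'.
    The labels are illustrative, not part of any standard.
    """
    request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            first_bytes = sock.recv(64)
    except ConnectionResetError:
        # The peer reset the connection (ECONNRESET).
        return "reset"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Any other socket-level failure, e.g. ECONNREFUSED.
        return f"error:{errno.errorcode.get(exc.errno, exc.errno)}"
    if first_bytes.startswith(b"HTTP/"):
        return "ok"
    return "error:unexpected-response"
```

Calling `probe_http("www.example.com")` from a scheduled job and alerting on anything other than `'ok'` would be one simple way to use it. The host name here is a placeholder.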

Last Updated on Wednesday, 07 April 2010 23:22
 
Longstanding, devious, server performance vampire vanquished today

A very devious, tricky, and longstanding bug affecting server performance has been eliminated today. I feel like the coyote having finally caught the roadrunner on this particular issue. It has been my enemy for a long time and it is finally vanquished. I expect notable performance increases across the board, as this was a very low-level issue affecting raw filesystem and kernel performance, thus affecting every area of the system.

Last Updated on Wednesday, 07 April 2010 23:22
 
Server issue resolution info

The server issue that occurred (discussed in the previous blog post) has been resolved.  The issue was linked to an anti-virus email scanner.  While the issue is not fully understood, as it has not yet been duplicated on another system, it was unmistakably linked to this scanner: disabling it resulted in a total recovery of the system without having to restart services (note that previously, even restarting the services completely failed to resolve it).

The TCPView utility from SysInternals reported this scanner as holding a large number of TCP endpoints (and possibly half-open or fully open connections) which were not shutting down.  It may also have overloaded the Named Pipes, causing other havoc, but primarily the connections and endpoints appear to have caused the system to hit a limit that prevented new ones from being created, essentially blocking the web server from communicating with the database.

The scanner has been shut down until the situation is better understood, which has returned the server to a fully functioning state.  This scanner had been enabled and tested many weeks ago and was thought to be completely stable, which made it difficult to zero in on the problem.  Why it suddenly started behaving in this manner is unknown, as there was no seriously increased influx of spam or other messages that would clearly have led to this, and even so, it should have been designed to handle it better.
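For the technical among you, the kind of per-process endpoint count that TCPView shows can be approximated from the text output of the Windows `netstat -ano` command. The following is a rough sketch only, not the procedure used during the investigation. The function names and the sample output (including the PIDs and addresses) are hypothetical.

```python
from collections import Counter


def count_tcp_endpoints(netstat_output):
    """Count TCP endpoints per (PID, state) from `netstat -ano` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # A TCP row has five fields: Proto, Local, Foreign, State, PID.
        # UDP rows have no state column, so they are skipped here.
        if len(fields) == 5 and fields[0].upper() == "TCP" and fields[4].isdigit():
            counts[(int(fields[4]), fields[3])] += 1
    return counts


def endpoints_per_pid(counts):
    """Collapse (PID, state) counts into a total per PID."""
    totals = Counter()
    for (pid, _state), n in counts.items():
        totals[pid] += n
    return totals


# Hypothetical sample; real input would come from running `netstat -ano`.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING       4
  TCP    10.0.0.5:1301          10.0.0.9:25            ESTABLISHED     1234
  TCP    10.0.0.5:1302          10.0.0.9:25            CLOSE_WAIT      1234
  TCP    10.0.0.5:1303          10.0.0.9:25            CLOSE_WAIT      1234
  UDP    0.0.0.0:500            *:*                                    888
"""

counts = count_tcp_endpoints(SAMPLE)
totals = endpoints_per_pid(counts)
# Processes holding the most endpoints come first.
for pid, n in totals.most_common():
    print(pid, n)
```

A process whose total keeps climbing between samples, especially in states such as CLOSE_WAIT, would be a candidate for the kind of leak described above.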

Thanks to our customers for their patience and prompt reporting of issues, which was helpful in resolving the situation.  We apologize for the interruptions that were caused.

Last Updated on Wednesday, 07 April 2010 23:12
 
Server issue, Dec 23, 2009

A server issue occurred this morning. The result of this issue was that some site functions and AyaNova systems were unavailable.  The basic cause was essentially that the databases stopped responding to internal requests from the web server.

Due to the time of day and the nature of the problem as it could best be understood at the time, I made some adjustments to various configurations and then did a full shutdown / reboot in order to get everything up and running as quickly as possible.  The system is back up, and hopefully the changes will keep it stable, with no further interruption, throughout the day until further investigation can be done.

The problem is not yet fully understood; there is extensive log data that I will need to examine in order to investigate properly, which I will be doing throughout today.  Any further adjustments to the server will be made during the standard maintenance window of 7-9 PM tonight if necessary.

For the technical among you and in the interest of full disclosure:

The problem spanned Microsoft SQL and extended to MySQL, causing both to fail to respond to requests simultaneously.  Restarting various services revealed that the IIS FTP service refused to restart, with an error regarding “insufficient storage” (of course it is not a disk-related error), and while the databases restarted without incident, they still failed to respond to requests.  The Event Viewer reported nothing of interest; however, the Microsoft SQL logs contain some information that will need further examination.

My sense of the matter, based on the FTP service failure message, is that it lies somewhere in the arena of the Microsoft MSDTC system and how it internally maps ports in the networking layers for requests.  Some significant Windows Updates were applied on Sunday.  They were tested ahead of time and worked fine Monday and Tuesday, but as always unforeseen issues can appear later, as may be the case here.  The adjustments made to hopefully reduce the problem for today were reducing the port usage range for MySQL and reducing the total number of active connections in the AyaNova IIS Application Pool.  There was a known MSDTC problem noted in earlier versions of IIS, but it was resolved in the version we are running, which makes the workarounds for the previous versions invalid for this one.  That is another reason I suspect the recent Windows Updates may be at work here.
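If port usage is the suspect, one quick sanity check is to measure how much of a given port range is already occupied. The sketch below is an illustration under assumptions, not something that was run on this server. The function names are made up, the addresses are sample data, and the range bounds are parameters you would set to whatever range your own system is configured to use.

```python
def local_port(address):
    """Extract the port from 'ip:port' or '[ipv6]:port'."""
    return int(address.rsplit(":", 1)[1])


def range_utilisation(local_addresses, first_port, last_port):
    """Fraction of the inclusive range first_port..last_port that is in use."""
    used = {local_port(a) for a in local_addresses}
    in_range = {p for p in used if first_port <= p <= last_port}
    return len(in_range) / (last_port - first_port + 1)


# Hypothetical local addresses, e.g. taken from netstat's second column.
addresses = ["10.0.0.5:1025", "10.0.0.5:1026", "[::1]:1027", "10.0.0.5:80"]

# A deliberately tiny, made-up range of ten ports for the example.
usage = range_utilisation(addresses, 1025, 1034)
print(f"{usage:.0%} of the range is in use")
```

A value approaching 100% for the range that outbound connections draw from would be consistent with services being unable to open new connections even though nothing else looks wrong.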

Last Updated on Wednesday, 07 April 2010 23:22
 


