A couple of days back, Foursquare, one of the popular social networking websites, went down. Mashable, the social media blog, covered the event and kept updating its post to let users know what was going on. Foursquare, for its part, quickly took to social media and reassured its users that it was working on a solution and would get things back online.

The Vice President of Foursquare told tech blogs that their engineers were working on a solution, and even tweeted a picture of the team burning the midnight oil at the office. After roughly 11 hours, they were able to get the service back.

This whole exercise was captured live via social media. Not only did the company act in a responsible way by getting the service back online, they also kept users updated and kept an online public status page where users could check the availability of their web service. 

Foursquare probably earned a good amount of karma from its users because of its immediate reaction to this unexpected downtime. I believe they employed a two-pronged strategy that worked in their favor.

Their first and foremost strategy was to provide a public status page where users could check whether the service was up and running. Sometimes a service may be unavailable to some users because of external factors. With a public status page, users can check whether the service itself is up and the problem lies only on their side.



Foursquare Public Status Page. Source: Foursquare

Second, Foursquare used social media very effectively. They spoke to leading tech blogs and updated their status once the service was back online. There are two important lessons to be learnt from the way Foursquare handled its unexpected downtime.

First and foremost, make sure you have a strong strategy in place in case something like this happens. If users face sudden downtime on a massive scale, tell them that you are working on a solution and that they can expect an update soon. Update them via Twitter, Facebook or other social networks.

Second, establish a public status page when you launch your web service, and make sure your users know about it. Create an atmosphere of transparency, which in turn builds confidence and trust among your users. It also lets users know that you value the time and effort they put into using your service, and that you place high importance on performance and availability.

Another good example is ZOHO, an online business productivity suite, which uses our website monitoring service to uphold the faith and trust users have placed in its service. Zoho’s public status page lets users view the various services offered and how (if at all) each service is being affected.



ZOHO Online Services Public Status Page. Source: ZOHO

Similarly, Salesforce.com and Google App Engine display their performance and availability to their end users.

If anything, the whole Foursquare event makes an even stronger case for companies and businesses to constantly monitor their online services and display their performance and uptime to the public. Some might argue that by displaying your availability and performance, you open yourself to criticism when your website or service goes down. However, if you are not transparent about your website's performance, you could end up hurting your business in the long run.

Using Site24x7, you can start monitoring your online service and embed the performance and availability stats of your website or online service for everyone to see.

Can you think of any other measure that can increase the transparency of an online service? Feel free to leave your comments.

There are occasions when you may want to take your website, web application or servers offline for an upgrade or for maintenance. To prevent your sites or applications from being monitored during the maintenance period, you should configure a maintenance schedule for your monitors and thus avoid unnecessary notifications from Site24x7.

Given below are step-by-step instructions on how to configure a maintenance schedule in Site24x7.

  1. Log in to your Site24x7 account and navigate to the Alerts tab.
  2. Click the Schedule Maintenance -> Add Schedule link. The 'Add Schedule' screen will be displayed.
  3. Provide details such as schedule name, description, recurrence details (i.e. daily, weekly or once), start time and end time in their respective fields.
  4. The ‘Available Monitors’ box will display all the monitors present in your account. Select the required monitors and move them to the ‘Selected Monitors’ box.
  5. Click the ‘Add’ button to complete the configuration.
For example, we have configured a maintenance schedule that runs from 10:00 AM to 11:00 AM on Sundays.


Add Schedule Screen
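
To make the effect of such a schedule concrete, here is a minimal Python sketch of the idea behind the Sunday 10:00 AM to 11:00 AM example. It only illustrates the logic of suppressing alerts inside a weekly window; the function names and the way the schedule is represented are our own assumptions for this post, not Site24x7's internal implementation or API.

```python
from datetime import datetime, time

# datetime.weekday() numbers the days Monday=0 ... Sunday=6.
SUNDAY = 6


def in_weekly_window(moment, weekday, start, end):
    """Return True if `moment` falls inside the weekly window [start, end)."""
    return moment.weekday() == weekday and start <= moment.time() < end


def should_alert(moment, is_down):
    """Hypothetical helper: alert on a failure unless it occurs
    during the Sunday 10:00-11:00 maintenance window."""
    in_maintenance = in_weekly_window(moment, SUNDAY, time(10, 0), time(11, 0))
    return is_down and not in_maintenance


if __name__ == "__main__":
    # 7 January 2024 was a Sunday.
    print(should_alert(datetime(2024, 1, 7, 10, 30), is_down=True))
    print(should_alert(datetime(2024, 1, 8, 10, 30), is_down=True))
```

The window is treated as half-open, so a failure detected at exactly 11:00 AM would raise an alert again.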

The schedule thus created can be viewed from the Alerts tab under the Schedule Maintenance -> Schedule Details section. You may edit or delete the schedule settings from this screen, or re-use the settings of the schedule for a different set of websites.

You may leverage the utility of maintenance schedules and avoid unwanted notifications during a scheduled maintenance. This helps to ensure that you receive alerts only when there is a problem and not otherwise.

Along with our recent iPhone client release, we have also included a couple of enhancements to our downtime reporting feature. We thought we would discuss these in greater detail for the benefit of our readers.

Mark Downtime as Maintenance - Exclude maintenance from Downtime Calculation

Let's say you have taken your websites offline for maintenance. You most probably don't want to receive alerts for this scheduled downtime, and you don't want it reflected in your reports either. In such a scenario, you can use the "Mark Downtime as Maintenance" option and mark this downtime as maintenance in Site24x7. It will then not be considered for downtime calculation and will be displayed separately in the availability chart.

You can specify a maintenance period in two different ways:
  1. Create a Maintenance schedule for the monitor: If you know the maintenance time beforehand, or if the downtime is a recurring event, you can create a maintenance schedule for the time period. Once you create a schedule, the site will be automatically marked as 'under Maintenance' for the timeframe of the schedule.
  2. Mark downtime as Maintenance: Let's assume you forgot to configure a maintenance schedule and Site24x7 marked your site as down. Since you know this was scheduled maintenance and need not be counted as downtime, you can use the "Mark Downtime as Maintenance" option to mark that specific downtime as maintenance. Click the Mark as Maintenance icon in the downtime table on the monitor details page to convert a downtime into a maintenance period.
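
To see why this matters for your reports, here is a small Python sketch of an availability calculation that leaves maintenance out. The formula is an assumption on our part for illustration (maintenance time is removed from the monitored period altogether); the function name is hypothetical and this is not necessarily the exact formula Site24x7 uses.

```python
def availability_percent(total_minutes, downtime_minutes, maintenance_minutes=0):
    """Availability over the monitored period, with maintenance excluded.

    Assumption: maintenance minutes count neither as uptime nor as
    downtime, so they are subtracted from the period being measured.
    """
    monitored = total_minutes - maintenance_minutes
    if monitored <= 0:
        raise ValueError("no monitored time left after excluding maintenance")
    return 100.0 * (monitored - downtime_minutes) / monitored


if __name__ == "__main__":
    week = 7 * 24 * 60  # minutes in one week
    # A one-hour outage counted as downtime ...
    print(round(availability_percent(week, downtime_minutes=60), 2))
    # ... versus the same hour marked as maintenance.
    print(availability_percent(week, downtime_minutes=0, maintenance_minutes=60))
```

With these made-up numbers, the same hour offline lowers the weekly availability figure when it is counted as downtime, and leaves it at 100% when it is marked as maintenance.
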
Ability to add your own comments to the downtime

A second enhancement that has been included in our service is the option to specify your own comments for the downtimes. These comments can be anything that reflects the nature of the downtime or the reason for the downtime, etc.

These comments can be made public as well, so your visitors can also know the reason why your site went down.

What is your take on our latest enhancements? Feel free to comment or contact us directly for any questions.

Salesforce CRM experiences sudden downtime

Jan 07 2009 07:23:32 PM Posted By : Arun
Salesforce.com (CRM) was down for around 30-40 minutes yesterday, between 12:40 PM and 1:20 PM US Pacific time. Customers complained that they were unable to access their accounts or, in some cases, unable to reach the website at all. Salesforce's status page had a brief explanation of the outage:
Service Disruption Time: 1/6/09 12:40 pm PST
Detail: Service Disruption All Instances
Root cause: Starting at 01/06/2009 20:39 UTC, a core network device failed due to memory allocation errors. The failure caused it to stop passing data but did not properly trigger a graceful fail over to the redundant system as the memory allocation errors where present on the failover system as well. This resulted in a full service failure for all instances. Salesforce.com had to initiate manual recovery steps to bring the service back up. The manual recovery steps was completed at 01/06/2009 21:17 UTC restoring most services except for AP0 and NA3 search indexing. Search of existing data would work but new data would not be indexed for searching. Emergency maintenance was performed at 01/06/2009 23:24 UTC to restore search indexing for AP0 and NA3 and the implementation of a work-around for the memory allocation error. While we are confident the root cause has been addressed by the work-around the Salesforce.com technology team will continue to work with hardware vendors to fully detail the root cause and identify if further patching or fixes will be needed. Further updates will be available as the work progresses.
The event has attracted a lot of coverage on the net and has also triggered discussions on the downside of relying on remote services. It just goes to reinforce the fact that 100% uptime is practically impossible, even for the top-level SaaS players!
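
As a rough back-of-the-envelope check, the two UTC timestamps in the status message (20:39 and 21:17) can be turned into an outage length and a monthly availability figure. The Python sketch below does that arithmetic; treating January as the measurement period, and assuming no other downtime in the month, are our own assumptions for illustration and not figures published by Salesforce.

```python
from datetime import datetime


def outage_minutes(start, end):
    """Length of an outage in minutes."""
    return (end - start).total_seconds() / 60


def monthly_availability(downtime_minutes, days_in_month=31):
    """Availability percentage for a month with the given total downtime."""
    month_minutes = days_in_month * 24 * 60
    return 100.0 * (month_minutes - downtime_minutes) / month_minutes


if __name__ == "__main__":
    # Timestamps taken from the status page text quoted above.
    start = datetime(2009, 1, 6, 20, 39)
    end = datetime(2009, 1, 6, 21, 17)
    minutes = outage_minutes(start, end)
    print(minutes)
    print(round(monthly_availability(minutes), 3))
```

Under these assumptions, an outage of well under an hour is already enough to pull the month below 100%.
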
Site24x7's website monitoring service has been recommended in the book 'Beginning Microsoft Office Live' by Rahul Pitre. Rahul Pitre runs Acxede, a company that builds web-based applications for small and medium-sized businesses.

The book covers the basic functions of Microsoft Office Live and building websites with Office Live. Site24x7 has been featured in the 'Maintaining your Website' section of the book. The author recommends Site24x7 as the best automated service for monitoring website downtime. He mentions our free account, which can monitor 2 websites at no cost, as well as the paid versions for more savvy users.

You can read a preview of this book on Google Books and also order it from Amazon. Thanks for recommending Site24x7, Rahul!
Firefox users might be interested in this. Here is a simple trick that lets you view your Site24x7 downtime and trouble notifications within Firefox itself, in the Firefox sidebar.

Site24x7 alert in Firefox sidebar

Prerequisite: You have to subscribe to the Site24x7 RSS feed in Google Reader.

Below are the steps to view your alerts in the Firefox sidebar:
  1. Subscribe to an RSS alert from Site24x7. Log in to your Site24x7 account, navigate to the 'Alerts' tab and select the RSS link.
  2. Log in to your Google Reader account and subscribe to the Site24x7 feed.
  3. Open the Bookmarks manager in Firefox (Bookmarks/Organize bookmarks).
  4. If you are using Firefox 2, select 'New Bookmark' in the 'File' menu. If using Firefox 3, select 'Organize' in the toolbar and select 'New Bookmark'.
  5. Enter a name, set the location to http://www.google.com/reader/i/, and make sure to check 'Load this bookmark in the sidebar'.
  6. Press Save Changes. Now you can select the bookmark to load Google Reader in the Firefox sidebar and view your Site24x7 alerts.
This trick was partly inspired by this post in MozillaLinks. Stay tuned for similar tricks in our future posts!
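
If you would rather read the same RSS alerts from a script than from a browser sidebar, the Python sketch below shows the general idea using only the standard library. The sample feed is made up and follows the plain RSS 2.0 layout (channel, item, title); the actual Site24x7 feed may carry different fields, so treat the element names and the `alert_titles` helper as assumptions.

```python
import xml.etree.ElementTree as ET

# Made-up sample in plain RSS 2.0 layout; a real alert feed may differ.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example alerts</title>
    <item>
      <title>example.com is DOWN</title>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>example.com is UP</title>
      <pubDate>Mon, 01 Jan 2024 10:12:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def alert_titles(feed_xml):
    """Return the title of every <item> in an RSS 2.0 document, in feed order."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


if __name__ == "__main__":
    for title in alert_titles(SAMPLE_FEED):
        print(title)
```

To use it with a real feed, you would download the XML from your own RSS link first and pass the text to `alert_titles`.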