In these days where everything is moving to "the cloud", new challenges arise that most of us were spared from before: temporary conditions such as network connectivity issues or a service simply being unavailable. The technical term for dealing with these is Transient Fault Handling, and it is one you must be conversant with.

On MSDN there is a very informative background article on Transient Fault Handling, which I strongly encourage you to read. While you are there, consider whether the comprehensive Transient Fault Handling Application Block, available in the Enterprise Library from version 5.0, suits your needs.

If that seems a bit over the top for your solution, you could consider the static TransientFaultUtility class found in the Cuemon namespace; it is fully compatible with cloud providers such as Windows Azure. It has several overloads for invoking a fault-sensitive method, and it will keep retrying until the operation succeeds, the number of retry attempts has been reached, or a failed operation is not considered related to a transient fault condition.

The minimum required parameters for invoking a transient-fault-protected method are an integer specifying retryAttempts, a function delegate that determines isTransientFault, and, last but not least, a function delegate or action delegate pointing to the faultSensitiveMethod.
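To make the three stop conditions concrete, here is a minimal sketch of such a retry loop. It is not the Cuemon implementation: RetrySketch, its Execute method and the recoveryWaitTime parameter are hypothetical names used only for illustration, and InvalidOperationException stands in for the TransientFaultException you will see later.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only; not part of Cuemon.
public static class RetrySketch
{
    public static void Execute(
        int retryAttempts,
        Func<Exception, bool> isTransientFault,
        Action faultSensitiveMethod,
        Func<int, TimeSpan> recoveryWaitTime)
    {
        Exception lastFault = null;
        for (int attempt = 0; attempt < retryAttempts; attempt++)
        {
            try
            {
                faultSensitiveMethod();
                return; // stop condition 1: the operation succeeded
            }
            catch (Exception ex)
            {
                // stop condition 3: not a transient fault, so let it bubble up unchanged
                if (!isTransientFault(ex)) { throw; }
                lastFault = ex;
                // give the fault time to recover, unless this was the last attempt
                if (attempt < retryAttempts - 1) { Thread.Sleep(recoveryWaitTime(attempt)); }
            }
        }
        // stop condition 2: the number of retry attempts has been reached
        throw new InvalidOperationException("The amount of retry attempts has been reached.", lastFault);
    }
}
```

A call such as RetrySketch.Execute(5, ex => ex is TimeoutException, DoWork, attempt => TimeSpan.FromSeconds(1)) would then run a (hypothetical) DoWork method up to five times, retrying only on timeouts.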

To see this in action, have a look at Figure 1. What we do here is simply throw an HttpException should we encounter an HTTP 502. This can easily be extended to the other status codes listed in the IsTransientFault callback method. Otherwise we just write some debug information. Figure 2 shows how to consume the TransientFaultExample class; the first call is intentionally set up to fail. Figure 3 shows the debug trace.


public class TransientFaultExample
{
    public TransientFaultExample()
    {
        NetHttpUtility.DefaultHttpTimeout = TimeSpan.FromSeconds(15);
    }

    public void OpenWebsite(Uri location)
    {
        using (HttpWebResponse response = NetHttpUtility.HttpGet(location))
        {
            if (response.StatusCode == HttpStatusCode.BadGateway) { throw new HttpException(502, response.StatusDescription); }
            Debug.WriteLine("Status code in response was {0} - {1}.", (int)response.StatusCode, response.StatusDescription);
            Debug.WriteLine("The headers of the response was {0}.", ConvertUtility.ToDelimitedString(response.Headers.AllKeys, ", ", HeaderConverter, response.Headers) as object);
        }
    }

    private string HeaderConverter(string header, WebHeaderCollection headers)
    {
        return string.Format("{0}: {1}", header, headers[header] ?? "null");
    }

    public bool IsTransientFault(Exception exception)
    {
        HttpException httpException = exception as HttpException;
        if (httpException != null)
        {
            switch (httpException.GetHttpCode())
            {
                case 404:
                case 408:
                case 410:
                case 500:
                case 502:
                case 503:
                case 504:
                    return true;
                default:
                    return false;
            }
        }
        return (exception.Message.IndexOf("timed out", StringComparison.OrdinalIgnoreCase) >= 0);
    }
}

Figure 1: A simple TransientFaultExample class that can easily be adapted to more real-life scenarios
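One possible way to extend OpenWebsite beyond HTTP 502 is to keep the status codes from IsTransientFault in one place and test the response against them. The sketch below assumes nothing about Cuemon; TransientStatusCodes is a hypothetical helper that merely mirrors the switch statement in Figure 1.

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical helper for illustration only; not part of Cuemon.
public static class TransientStatusCodes
{
    // The same status codes that IsTransientFault in Figure 1 treats as transient.
    private static readonly HashSet<HttpStatusCode> Codes = new HashSet<HttpStatusCode>
    {
        HttpStatusCode.NotFound,            // 404
        HttpStatusCode.RequestTimeout,      // 408
        HttpStatusCode.Gone,                // 410
        HttpStatusCode.InternalServerError, // 500
        HttpStatusCode.BadGateway,          // 502
        HttpStatusCode.ServiceUnavailable,  // 503
        HttpStatusCode.GatewayTimeout       // 504
    };

    public static bool IsTransient(HttpStatusCode statusCode)
    {
        return Codes.Contains(statusCode);
    }
}
```

With this in place, the check in OpenWebsite could become if (TransientStatusCodes.IsTransient(response.StatusCode)) { throw new HttpException((int)response.StatusCode, response.StatusDescription); }.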


[TestClass]
public class TransientFaultExampleTest
{
    [TestMethod]
    public void TransientFault()
    {
        TransientFaultExample transient = new TransientFaultExample();

        try
        {
            TransientFaultUtility.ExecuteAction(5, transient.IsTransientFault, transient.OpenWebsite, new Uri("http://www.google.com:88/"));
        }
        catch (TransientFaultException ex)
        {
            Debug.WriteLine("TransientFaultException was thrown (which is good): {0}", ConvertUtility.ToString(ex, Encoding.Default, true) as object);
        }

        try
        {
            TransientFaultUtility.ExecuteAction(5, transient.IsTransientFault, transient.OpenWebsite, new Uri("http://www.google.com/"));
        }
        catch (TransientFaultException ex)
        {
            Debug.WriteLine("TransientFaultException was thrown (which is not so good - for Google at least): {0}", ConvertUtility.ToString(ex, Encoding.Default, true) as object);
            Assert.Fail();
        }
    }
}

Figure 2: Consumes the class defined in Figure 1



Debug Trace:

TransientFaultException was thrown (which is good): TransientFaultException (Cuemon)
Source:
    Cuemon
Message:
    The amount of retry attempts has been reached.
Data:
    Key: Attempts
    Value: 5
    Key: RecoveryWaitTimeInSeconds
    Value: 21
    Key: TotalRecoveryWaitTimeInSeconds
    Value: 56
InnerException [of TransientFaultException]:
    TimeoutException (System)
Source:
    Cuemon
Message:
    The operation has timed out.

Status code in response was 200 - OK.
The headers of the response was Cache-Control: private, max-age=0, Content-Type: text/html; charset=ISO-8859-1, Date: Thu, 25 Apr 2013 01:09:52 GMT, Expires: -1, Set-Cookie: PREF=ID=fd0ca0865d752f5b:FF=0:TM=1366852192:LM=1366852192:S=KNM_xhwUAUmLa1-f; expires=Sat, 25-Apr-2015 01:09:52 GMT; path=/; domain=.google.dk,NID=67=vYXlkUfWQ_paZ7fdrkXaq2gmgUati-Y3FfiPzpRLQTTWn7lQgWowKZgE53z4_D1G04SmEk0N_4YdaUKC2RZkajhrCZ69QrCRHKBumdejJg4Z2MKak7fUF0QUbL7nKf3F; expires=Fri, 25-Oct-2013 01:09:52 GMT; path=/; domain=.google.dk; HttpOnly, P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info.", Server: gws, X-XSS-Protection: 1; mode=block, X-Frame-Options: SAMEORIGIN, Transfer-Encoding: chunked.
Figure 3: The output of the test in Figure 2

If you don't like the default recovery wait time, you can easily supply your own using one of the overloads on the TransientFaultUtility class. For your convenience, I have included the default implementation in Figure 4.


/// <summary>
/// Specifies the amount of time to wait for a transient fault to recover gracefully before trying a new attempt.
/// </summary>
/// <param name="currentAttempt">The current attempt.</param>
/// <returns>A <see cref="TimeSpan"/> that defines the amount of time to wait for a transient fault to recover gracefully.</returns>
/// <remarks>Default implementation is <see cref="DefaultRecoveryWaitTime"/> + 2^<paramref name="currentAttempt"/> seconds, with the exponent capped at 5; a maximum of 5 (default) + 32 = 37 seconds.</remarks>
public static TimeSpan RecoveryWaitTime(int currentAttempt)
{
    TimeSpan sleep = DefaultRecoveryWaitTime;
    sleep = sleep.Add(TimeSpan.FromSeconds(Math.Pow(2, currentAttempt > 5 ? 5 : currentAttempt)));
    return sleep;
}

Figure 4: The default implementation of the function delegate recoveryWaitTime
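The numbers in the debug trace of Figure 3 can be reproduced from this formula. The sketch below repeats the method from Figure 4 and rests on two assumptions: that DefaultRecoveryWaitTime is 5 seconds (as the remarks state) and that currentAttempt is counted from zero (inferred from the trace, not confirmed by the source). RecoveryWaitSchedule and TotalWaitTime are hypothetical names.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Cuemon.
public static class RecoveryWaitSchedule
{
    // Assumption: the default wait is 5 seconds, as stated in the remarks of Figure 4.
    private static readonly TimeSpan DefaultRecoveryWaitTime = TimeSpan.FromSeconds(5);

    // Same formula as Figure 4: default + 2^currentAttempt, with the exponent capped at 5.
    public static TimeSpan RecoveryWaitTime(int currentAttempt)
    {
        TimeSpan sleep = DefaultRecoveryWaitTime;
        sleep = sleep.Add(TimeSpan.FromSeconds(Math.Pow(2, currentAttempt > 5 ? 5 : currentAttempt)));
        return sleep;
    }

    // Sums the wait times for attempts 0 .. attempts - 1 (assumption: zero-based counting).
    public static TimeSpan TotalWaitTime(int attempts)
    {
        TimeSpan total = TimeSpan.Zero;
        for (int attempt = 0; attempt < attempts; attempt++)
        {
            total = total.Add(RecoveryWaitTime(attempt));
        }
        return total;
    }
}
```

Under these assumptions the five attempts wait 6, 7, 9, 13 and 21 seconds, which adds up to 56 seconds; this matches the RecoveryWaitTimeInSeconds (21) and TotalRecoveryWaitTimeInSeconds (56) values in Figure 3.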



I hope this little introduction has inspired you to master Transient Fault Handling one way or another. Happy coding.