Wednesday 20 July 2016

Singleton HttpClient? Beware of this serious behaviour and how to fix it

If you are consuming a Web API in your server-side code (or .NET client-side app), you are very likely to be using an HttpClient.

HttpClient is a very nice and clean implementation that came as part of Web API and replaced its clunky predecessor WebClient (although only in its HTTP functionality, WebClient can do more than just HTTP).

HttpClient is usually meant to be used with more than just a single request. It conveniently allows for default headers to be set and applied to all requests. Also you can plug in a CookieContainer to allow for all sessions.

Now, ironically it also implements IDisposable suggesting a short-lived lifetime and disposing it as soon as you are done with. This lead to several discussions in the community (here from Microsoft Patterns and Practices, Darrel Miller in here and a few references in StackOverflow here) to discuss whether it can be used with longer lifetime and more importantly whether it needs disposal.

Singleton HttpClient matters, especially when it comes to the performance [Dragan Brankovich - Flickr]

HttpClient implements IDisposable only indirectly through HttpMessageHandler and only as a result of in-case not an immediate need - I am not aware of an implementation of HttpMessageHandler that holds unmanaged resources (the mere reason for implementing IDisposable).

In short, the community agreed that it was 100% safe, not only not disposing the HttpClient, but also to use it as Singleton. The main concern was thread safety when making concurrent HTTP calls - and even official documentations said there is no risk doing that.

But it turns out there is a serious issue: DNS changes are NOT honoured and HttpClient (through HttpClientHandler) hogs the connections until socket is closed. Indefinitely. So when does DNS change occur? Everytime you do blue-green deployment (in Azure cloud services when you deploy to staging slot and then swap production/staging slots). Everytime you change settings in your Azure Traffic Manager. Failover scenarios. Internally in a myriad of PaaS offerings.

And this has been going on for more than 2 years without being reported... makes me wonder what kind of applications we build with .NET?

Now if the reason for DNS change is failover, your connection would have been faulted anyway so this time connection would open against the new server. But if this were the blue-black deployment, you swap the staging and production and your calls would still go to the staging environment - a behaviour we had seen but had fixed it by bouncing the dependent servers thinking possibly this was an Azure oddity. What a fool was I - it was there in the code! Whose code? Well debateable...

Analysis

All of this goes back to the implementation in HttpClientHandler that uses HttpWebRequest to make connections none of which code is open sourced. But obviously using Jetbrain’s dotPeek we can look into the decompiled code and see that HttpClientHandler creates a connection group (named with its hashcode) and does not close the connections in the group until getting disposed. This basically means the DNS check never happens as long as a connection is open. This is really terrifying...
protected override void Dispose(bool disposing)
{
    if (disposing && !this.disposed)
    {
        this.disposed = true;
        ServicePointManager.CloseConnectionGroups(this.connectionGroupName);
    }
    base.Dispose(disposing);
}
As you can see, ServicePoint class plays an important role here: controlling number of concurrent connects to a ‘service point/endpoint’ as well as keep-alive behaviours.

Solution

A naive solution would be to dispose the HttpClient (hence the HttpClientHandler) every time you use it. As explained this is not how HttpClient is intended to be used.

Another solution is to set ConnectionClose property of DefaultRequestHeaders on your HttpClient:
var client = new HttpClient();
client.DefaultRequestHeaders.ConnectionClose = true;
This will set the HTTP’s keep-alive header to false so the socket will be closed after a single request. It turns out this can add roughly extra 35ms (with long tails, i.e amplifying outliers) to each of your HTTP calls preventing you to take advantage of benefits of re-using a socket. So what is the solution then?

Well, courtesy of my good friend Andy Jutton of Amido, the solution lies in an obscure feature of the ServicePoint class. Basically, as we said, ServicePoint controls many aspects of TCP connections and one of the properties is ConnectionLeaseTimeout which controls how many milliseconds a TCP socket should be kept open. Its default value is -1 which means connections will be stay open indefinitely… well in real terms, until the server closes the connection or there is a network disruption - or the HttpClientHandler gets disposed as discussed.

So the root cause is basically that with the default value of -1, which is IMHO, wrong and potentially dangerous setting.

Now to fix it, all we need to do is to get hold of the ServicePoint object for the endpoint by passing the URL to it and set the ConnectionLeaseTimeout:
var sp = ServicePointManager.FindServicePoint(new Uri("http://foo.bar/baz/123?a=ab"));
sp.ConnectionLeaseTimeout = 60*1000; // 1 minute
So this is something that you would want to do only at the startup of your application, once and for all endpoints your application is going to hit (if endpoints decided at runtime, you would be setting that at the time of discovery). Bear in mind, path and query strings are ignored and only the host, port and schema are important. Depending on your scenario, values of 1-5 minutes probably make sense.

Conclusion

Using Singleton HttpClient results in your instance not to honour DNS changes which can have serious implications. The solution is to set the ConnectionLeaseTimeout of the ServicePoint object for the endpoint.

34 comments:

  1. Disturbing, especially since ServicePointManager isn't available on all the platforms HttpClient is (UWP/WPSL etc).

    ReplyDelete
  2. If the ServicePointManager is not available then it cannot be used. I downloaded the Microsoft.Net.Http package from NuGet and decompiled both, the net40 and portable versions and the implementations there, although similar, are not the same. The portable version's HttpClientHandler.Dispose() method is essentially empty and does not use the ServicePointManager.

    I don't know what it means though. Their behavior might be different perhaps.

    ReplyDelete
  3. Great post Ali. I was halfway reading this and thought "ooh I better show AJ this" and then the man himself had already given you the solution!

    ReplyDelete
    Replies
    1. Oh yeah, the man fixed it and no one else had found it, I did discuss this with some key figures in the community.

      Delete
  4. Hey Ali,
    Just wanted to add a few more comments to this:
    a) I don't believe that HttpClient should not be disposed. It does allocate Cancellation Tokens and request and response bodies can be unmanaged streams. I just believe that it should only be disposed when your application is finished making HTTP requests.
    b) ServicePointManager doesn't hold connections open indefinitely, despite the default value of ConnectionLeaseTimeout. Inactive connections will be closed after the MaxIdleTime (100 secs default).
    c) ConnectionLeaseTimeout is a strange beast, that I wasn't aware of until Oren mentioned it today. It will cause outgoing requests to include Connection:Close headers after the lease expires, causing the connection to be terminated after the server sends its response. It makes sense why this is set to infinite to default because it is a strange way to force connections to be closed when a client is actively making requests.
    d)If your server side has done a production/staging switch then the ideal way stop any more requests going to the outdated app server is to tell the app server to return connection:close headers in all future requests it receives. That means no-one has to wait for any lease timeout. At most one request will hit the outdated server and that can only occur within the MaxIdleTime. Assuming DNS caching doesn't delay the switch any longer.
    e) I guess this could be implemented in cloud infrastructure. As soon as something become switched to staging, then all returned requests get a connection:close header added. I'll see if I can find someone in Azure AppService who knows this stuff.

    ReplyDelete
    Replies
    1. Hi Darrel, thanks for your points. First of all, an application does not just stop sending HTTP (can you even think of an example of this exceptional case?) - an active application will keep sending HttpClient when it has an active user hence the point that HttpClient lifetime is the same as the application. In a server scenario (our case), our APIs are almost never idle so the client connection would be open for hours and hours if not days and weeks. Production/Staging is only one scenario, there are traffic manager scenarios and many many more outside Microsoft world. Also your suggestion about load balancer sending connection-close will only cover cloud server scenarios while there are many other server scenarios where the server, rightly trusts that the DNS TTL is going to be observed.

      I will look at the Java implementation to try to understand how they have handled it but for now, the only reliable solution is the one recommended in the post.

      Delete
  5. Could you use ServicePointManager.SetTcpKeepAlive(...) to achieve the same goal?

    ReplyDelete
  6. I think there's some confusion here... connections are by default set to last forever. Open HTTP (and underlying TCP connections) will not do a DNS lookup again once opened. You can send multiple HTTP requests over the same connection, but will need to send them over a new connection to get a new DNS lookup. This is standard HTTP behavior.

    If the server becomes unavailable, then the connection will close automatically, resulting in a new connection and DNS lookup. If you just want to cause a DNS lookup yourself then you'll have to force a new connection, and your fix of setting a max-lifetime for the underlying TCP connection is just one way to do that. You can also dispose the entire HttpClient, or find the ServicePoint and dispose/recreate that as well.

    ReplyDelete
    Replies
    1. Good for you if you have known it.

      Yes, it makes sense once you know it. But have you been protecting yourself from this?

      All in all, my view is that this should have been communicated clearly in the docs with ways around it.

      Delete
    2. There's nothing to "protect" from - this is accurate and intended behavior for HTTP connections. DNS is just resolving IP addressing information and won't be checked again if the connection is active - why would it?

      If the DNS changes because of some server changes, then the connection will be broken anyway. If not, and the client continues to be connected, then that's normal and it's up to the client to reset if it wants to.

      Delete
  7. I just see the post i am so happy to the communication science post of information's.So I have really enjoyed and reading your blogs for these posts.Any way I’ll be replay for your great thinks and I hope you post again soon....

    SEO Company in Chennai

    ReplyDelete
  8. How would one resolve this issue in .NET Core? Do we have to wait for .NET Standard 2.0? Thanks.

    ReplyDelete
    Replies
    1. Frankly do not know about .NET core. It is still in flux and have not had the time to look at.

      Delete
    2. https://github.com/dotnet/corefx/issues/11224

      Delete
  9. There is a default 2' DNS resolution validity period imposed by the ServicePointManager, so there should be no reason to fiddle around with timeouts and just making the HttpClient static should suffice.
    #simpleprogrammingmodelsmatter

    ReplyDelete
    Replies
    1. Not sure I get what you say... are you suggesting that it works just out of the box and no reason to use any settings?

      Delete
    2. Indeed, just making it static should suffice, since the only real drawback I learned from this blog, the DNS issue, is covered by this auto 2' DNS refresh.

      Delete
    3. Hi Ali, I think Michel is talking about this: https://msdn.microsoft.com/en-us/library/system.net.servicepointmanager.dnsrefreshtimeout(v=vs.110).aspx

      Delete
    4. Is this the solution to the problem? The Service Point Manager refreshes its DNS lookup every 2 minutes by default?

      Delete
    5. Sadly, I've just checked it, it does not seem to help. Static HttpClient sticks to IP if connection is not forcible closed by either client or server. DNS changes are not picked up.

      Delete
    6. Can you produce a repro Eugene? We solved our problem with doing that. One issue remained and that was when service point being disposed due to inactivity.

      Delete
    7. Hi aliostad, I was actually answering to Michel Liesmons stating that his theory is not correct. The solution you suggest in your article DOES work.

      That's how I reproduced that. I've created local DNS record in hosts file with name "dns.test" that points to site A IP address. Test program repeatedly sends request to "dns.test". After application start I change DNS entry for "dns.test" to another location, say site B. Test program prints response and I can see which site is being contacted (actually status code was enough to distinguish).

      The results: Without trick with configuring ServicePoint after DNS change for "dns.test" I was contacting wrong endpoint for indefinitely long -- until service forcibly closes connection.

      Here is simplest LINQpad snippet for my test program: http://share.linqpad.net/3bxmi4.linq.

      Delete
  10. Hi, Good post, I have a peculiar scenario, even though we are disposing HttpClient still REST Server hangs every couple of hours and we have to reset IIS. Also we see lot of TCP Connections in TIME_WAIT mode made by server to it-self (local and foregin address are same when verified using netstat). Should I set ConnectionLease timeout since we donot need to lookup DNS everytime. Also modified our code to use static HttpClient, still no luck. Server hangs every couple of hours and we see lot of queued up requests in the worker process. Your help is much appriciated!!

    ReplyDelete
    Replies
    1. Please post your question to stackoverflow with a snippet of the code and post the URL here. I will be happy to look into it.

      Delete
    2. Thank you so much !!

      http://stackoverflow.com/questions/43105569/rest-api-server-hangs-on-large-volume-of-requests-using-c-sharp-httpclient

      Delete
    3. By the way also modified our code to use static HttpClient but that also does not seem to work. Also noticed a behaviour. When we make a single REST Api request to the server it is opening 120 TCP Connections all TIME_WAIT state with source ip as REST Server source port between 54000 and 58000 and destination ip as same REST Server and destination port 8523. Is this normal ?

      Delete
    4. Hi Sharma. Please Take a look at this: https://aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong/.

      Delete
  11. I was spending a good amount of time looking for best School entrance essay Aid when I came across this informative blog. It is true that the Http server side should always be checked and tested for security. In addition to that, the client-request conformity tests should be carried out often so as to boost the response rate.

    ReplyDelete
  12. I think you should clarify what setting ConnectionLeaseTimeout actually does. In short, it sets a Connection:close header on next outgoing request at that interval. That's it. Since you mention doing this manually (and how it adds an extra 35ms per request), it seems relevant to mention that this is exactly what ConnectionLeaseTimeout does, except on a timer interval rather than with every request. People generally don't seem to understand this and appear to be lost on what to do if they need to support a platform that doesn't support this (most notably .NET Core). Explaining what it does (and that it's not hard to replicate) gives people a path for supporting this across platforms.

    ReplyDelete
  13. nice Singleton HttpClient? Beware of this serious behaviour and how to fix it blog.
    keep blogging more, thank you for sharing
    Devops Training in Bangalore

    Informatica interview questions

    Artificial Intelligence Training in Bangalore

    ReplyDelete
  14. Thank you for spreading this awareness.
    It was a pleasure reading your article.
    Hope to read more from you.
    For quality courses on anything IT related give myTectra a try.
    myTectra is the Marketing Leader In Banglore Which won Awards on 2015, 2016, 2017 for best training in Bangalore:
    python interview questions

    python online training

    ReplyDelete
  15. Singleton HttpClient?
    THANK YOU FOR THE AWARENESS
    very helpfull blog it was a pleasure reading your blog
    would love to read it more
    knowldege is not found but earned through hardwork and good teaching
    that being said click here to join us the next best thing in bangalore
    devops online training
    Devops Training in Bangalore

    ReplyDelete

Note: only a member of this blog may post a comment.