Detect httpd high CPU usage and restart it (CollabNet SVN fix) – PowerShell

CollabNet has a bug involving LDAP and httpd. After a few days of use, the httpd process takes up 100% of the CPU on the system and causes SVN to lock up. Restarting it manually is a PITA, and blindly restarting it every N minutes is not acceptable. After a bit of thinking I came up with this band-aid for the problem:

In short, the script monitors a process name ($PIDname). If a matching process goes over $PIDmaxUsage (CPU time in seconds, as reported by Get-Process), the script restarts the service that the process belongs to. You will notice that I check whether the returned value is an array. This is because more than one httpd process can exist. You can remove the check if you don't need it, but with it the script walks through each element of the array and restarts the corresponding service if that process is over $PIDmaxUsage. If the result isn't an array, it just handles the single process.
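For readers who want something concrete to start from, here is a minimal sketch of the logic described above. It is a reconstruction under assumptions, not necessarily the script as originally posted: only $PIDname and $PIDmaxUsage come from the post, the function names are hypothetical, and the lookup through Win32_Service (with a fallback to the parent PID) is just one way to map a process to its service. Wrapping results in @() sidesteps the explicit array check.

```powershell
# Hypothetical sketch of the watchdog described in the post.
$PIDname     = 'httpd'   # process name to watch
$PIDmaxUsage = 90        # threshold: accumulated CPU time in seconds

function Select-OverThreshold {
    param([object[]]$Process, [double]$MaxCpuSeconds)
    # .CPU on a Get-Process result is total processor time in seconds.
    @($Process | Where-Object { $_.CPU -gt $MaxCpuSeconds })
}

function Get-ServiceForProcess {
    param([int]$ProcessId)
    $service = Get-CimInstance -ClassName Win32_Service -Filter "ProcessId = $ProcessId"
    if (-not $service) {
        # Assumption: the busy httpd may be a child of the service process,
        # so fall back to the parent PID.
        $parent = (Get-CimInstance -ClassName Win32_Process -Filter "ProcessId = $ProcessId").ParentProcessId
        if ($parent) {
            $service = Get-CimInstance -ClassName Win32_Service -Filter "ProcessId = $parent"
        }
    }
    $service
}

function Invoke-HttpdWatchdog {
    # @() guarantees an array whether zero, one, or many processes match.
    $running = @(Get-Process -Name $PIDname -ErrorAction SilentlyContinue)
    $names = foreach ($proc in @(Select-OverThreshold -Process $running -MaxCpuSeconds $PIDmaxUsage)) {
        (Get-ServiceForProcess -ProcessId $proc.Id).Name
    }
    # Restart each affected service once, even if several processes map to it.
    $names | Where-Object { $_ } | Select-Object -Unique | ForEach-Object {
        Restart-Service -Name $_ -Force
    }
}
```

Calling Invoke-HttpdWatchdog from an elevated session performs one check. Nothing runs when the functions are merely loaded, which makes the threshold logic easy to try against fake objects first.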

Obviously I would rather have an actual fix from CollabNet, as this seems to be an issue others are also having, but at least for now it's a band-aid that is working well. Hope that helps!

8 comments
  • Anton

    Where is this script hooked into for it to run?

    • As in what drives it? It's kicked off and controlled via Windows Task Scheduler. We run it every 10 minutes.
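One possible way to set up such a task, sketched with the built-in schtasks.exe. The task name, the script path, and the choice to run as SYSTEM are assumptions for illustration only; the path deliberately contains no spaces so the /TR value needs no embedded quotes.

```powershell
# Hypothetical helper: builds the schtasks.exe arguments for a repeating task.
function Get-WatchdogTaskArguments {
    param(
        [string]$ScriptPath = 'C:\Scripts\Restart-Httpd.ps1',  # assumed path, no spaces
        [int]$IntervalMinutes = 10
    )
    $action = "powershell.exe -NoProfile -ExecutionPolicy Bypass -File $ScriptPath"
    @(
        '/Create', '/F',
        '/TN', 'HttpdCpuWatchdog',           # assumed task name
        '/SC', 'MINUTE', '/MO', "$IntervalMinutes",
        '/TR', $action,
        '/RU', 'SYSTEM'
    )
}

# To register the task, run from an elevated prompt:
#   $taskArgs = Get-WatchdogTaskArguments
#   & schtasks.exe @taskArgs
```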

  • Anton

    Thanks. We face exactly the same issue. Just sharing: our band-aid solution is to run the following scheduled task every hour (wrapped in a .bat file):

    net stop "CollabNet Subversion Server" &
    net start "CollabNet Subversion Server" &
    net stop "CollabNet Subversion Edge" &
    net start "CollabNet Subversion Edge"

    But yeah – they need to fix it 🙂

    • The reason we do not use that method (we used it previously) is that in our case the server will lock up within that hour. We can't have that downtime, so this script runs more often but only acts when it's needed. They do need to fix it, as it appears to be affecting many, many installs.

  • Chris

    FYI – It's worth noting that the CPU value returned by Get-Process is a cumulative *counter* of CPU time used, not an *average*. As your script is written, it will always restart httpd once the process has accumulated a certain amount of CPU time, regardless of how quickly it reaches that amount.

    I did some poking around. It should be possible to use something like:
    Get-Counter '\Process(httpd*)\% Processor Time' | Select-Object -ExpandProperty CounterSamples
    to get an actual *load* value.

    • I'm not sure I follow. My post does not reference percentages or averages; Get-Process returns CPU time in seconds, and $PIDmaxUsage is set to 90 seconds. Obviously this needs to be tuned to your environment. I found that 90s was the sweet spot for us. If you wanted to use percentages, you could do something like this: http://powershell.com/cs/blogs/tips/archive/2013/04/16/documenting-cpu-load-for-running-processes.aspx

      • Chris

        It seems like, if you’re using a counter, you’d be restarting HTTPD even when it’s not wedged. If you went with a CPU load check (your link is a good example), then you’d only be restarting the service when it’s in the failure state (100% CPU load).
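A rough sketch of the load-based check discussed in this thread, built on the Get-Counter call Chris mentions. The function names, the 5-second sample interval, and the 90% threshold are assumptions, not values from the post. One detail to be aware of: the per-process "% Processor Time" counter can reach 100 times the number of logical processors, so the sketch divides by the processor count to get a share of the whole machine.

```powershell
# Hypothetical helper: keeps only counter samples at or above the threshold.
function Select-BusyInstance {
    param(
        [object[]]$Sample,           # objects with InstanceName and CookedValue
        [int]$LogicalProcessors,
        [double]$MaxPercent = 90     # assumed threshold, tune for your server
    )
    foreach ($s in $Sample) {
        # Normalise the raw value to a percentage of total machine capacity.
        $percent = $s.CookedValue / $LogicalProcessors
        if ($percent -ge $MaxPercent) {
            [pscustomobject]@{
                InstanceName = $s.InstanceName
                Percent      = [math]::Round($percent, 1)
            }
        }
    }
}

function Get-BusyHttpd {
    $cores = [Environment]::ProcessorCount
    # One sample averaged over 5 seconds; yields nothing if no httpd is running.
    $result = Get-Counter -Counter '\Process(httpd*)\% Processor Time' `
                          -SampleInterval 5 -ErrorAction SilentlyContinue
    Select-BusyInstance -Sample $result.CounterSamples -LogicalProcessors $cores
}
```

A single high reading can be a legitimate burst of work, so it may be safer to restart only after several consecutive checks come back busy.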

  • irfan

    Some people have had success by adding this directive to their Apache
    configuration:

    LDAPSharedCacheSize 0

    Just edit the C:\csvn\data\conf\httpd.conf file and add that to the
    end. And stop/start the server.
