Debugging / Profiler Code in Production

So, debugging and profiling code in production? It’s going to slow everything down! Before accepting that, let’s take a moment to think about all the other things in the world that ship with built-in debuggers and profilers. We’ll cover the pros and cons afterwards:

1. Your car hopefully has a spare tyre in the boot, along with a jack to lift the car and a wrench to remove the wheel nuts. What about the first aid kit, or the warning triangle for traffic if you break down?

2. iPhones, Samsung Galaxy phones and other mobile phones all have debugging modes and recovery modes for flashing firmware. These make it easy for field engineers or stores to restore firmware or fix issues without you having to send the phone back to the manufacturer, who might otherwise have to take it apart and replace chips…

3. Manufacturing robots in blue-chip companies typically have pluggable interfaces so an engineer can come along with a laptop and listen to all the sensors installed around the machine. These sensors allow the machine to monitor itself, predict issues, and notify its users when parts need attention.

4. Another one for cars – the ECU (engine control unit) in your engine makes it very easy for your local garage to plug in a computer and diagnose issues around the car, again using sensors installed at critical points around the vehicle.

5. Windows – Safe Mode! There is also the feature where, once you overload the system’s resources, it starts advising you to turn off theme support and unnecessary graphical enhancements.

6. Your computer – POST (power-on self-test). Yep, it really does test itself. Better yet, you get a diagnostic report if your RAM is faulty or your drives are not connected correctly.

The benefits of allowing debugging, performance monitoring or health checks in live systems are enormous. Would you rather spend time on the phone talking a user through their experience to understand an issue, or would you rather have your application contact you with a full report on exactly what was going on in its environment before things started going wrong? Better yet, what if your application could monitor itself and give the user hints on improving things or avoiding an upcoming problem?

Consider an event journal from your application. You could replay this journal through an automated test harness and arrive at exactly the point where your user began experiencing issues or hit a bug (assuming your application responds to the same sequence of events in the same way). The same replay mechanism could be used to performance-test changes: if a journal of events took 30 seconds to execute before a new feature was added, just re-run it afterwards and see how much longer it takes.
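
To make the journal idea a bit more concrete, here’s a minimal Python sketch. The `Event`, `Journal` and `replay` names, the JSON-lines format and the little shopping-cart handlers are all hypothetical, made up for illustration; your own application’s events will look different. It also assumes the handlers behave the same way every time they see the same events, which is what makes a replay land in the same place.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class Event:
    # Hypothetical event shape: a name plus a JSON-serialisable payload.
    name: str
    payload: dict


class Journal:
    """Append-only record of the events an application handled."""

    def __init__(self) -> None:
        self.events: list[Event] = []

    def record(self, name: str, payload: dict) -> None:
        self.events.append(Event(name, payload))

    def dumps(self) -> str:
        # One JSON object per line, easy to attach to a bug report.
        return "\n".join(json.dumps(asdict(e)) for e in self.events)

    @staticmethod
    def loads(text: str) -> "Journal":
        journal = Journal()
        for line in text.splitlines():
            if line.strip():
                data = json.loads(line)
                journal.record(data["name"], data["payload"])
        return journal


def replay(journal: Journal, handlers: dict[str, Callable[[dict], None]]) -> float:
    """Feed every journaled event back through the handlers; return elapsed seconds."""
    start = time.perf_counter()
    for event in journal.events:
        handlers[event.name](event.payload)
    return time.perf_counter() - start


# Usage: hypothetical application state and handlers.
cart: list[str] = []
handlers = {
    "add_item": lambda p: cart.append(p["sku"]),
    "remove_item": lambda p: cart.remove(p["sku"]),
}

# In a real app these would be recorded as the user's events were handled.
journal = Journal()
journal.record("add_item", {"sku": "A1"})
journal.record("add_item", {"sku": "B2"})
journal.record("remove_item", {"sku": "A1"})

restored = Journal.loads(journal.dumps())
elapsed = replay(restored, handlers)
print(cart)  # ['B2']
```

Because `replay` returns the elapsed time, the same journal doubles as the before/after performance check: run it before and after a change and compare the two numbers.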

The visibility you get from simply recording checkpoints in your code for performance monitoring is also very valuable. Imagine a single line of code that “checks in” with a monitor. This monitor could time all your DB calls or count the hits to a service, and warn you or adjust itself if things become too chatty or too slow.
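
Here’s one way such a monitor might look, sketched in Python under some assumptions: `Monitor`, `check_in` and `timed` are hypothetical names, the thresholds are arbitrary, and “too chatty” is simply a lifetime call count rather than a rate over a time window, which a real monitor would more likely use. The decorator is the “single line of code” that checks in.

```python
import functools
import time
from collections import defaultdict
from typing import Callable


class Monitor:
    """Hypothetical check-in monitor: counts and times named calls, warns on thresholds."""

    def __init__(self, max_calls: int, max_seconds: float,
                 warn: Callable[[str], None] = print) -> None:
        self.max_calls = max_calls      # assumed "too chatty" limit (lifetime count)
        self.max_seconds = max_seconds  # assumed "too slow" limit for a single call
        self.warn = warn                # could log, page someone or send an email
        self.calls: dict[str, int] = defaultdict(int)
        self.total_seconds: dict[str, float] = defaultdict(float)

    def check_in(self, name: str, seconds: float) -> None:
        self.calls[name] += 1
        self.total_seconds[name] += seconds
        # Warn once, the first time the count goes past the limit.
        if self.calls[name] == self.max_calls + 1:
            self.warn(f"{name}: more than {self.max_calls} calls")
        if seconds > self.max_seconds:
            self.warn(f"{name}: one call took {seconds:.3f}s")

    def timed(self, name: str):
        """Decorator: times the wrapped function and checks in, even if it raises."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                finally:
                    self.check_in(name, time.perf_counter() - start)
            return wrapper
        return decorator


# Usage: collect alerts in a list instead of printing them.
alerts: list[str] = []
monitor = Monitor(max_calls=3, max_seconds=0.5, warn=alerts.append)


@monitor.timed("db.load_report")
def load_report(report_id: int) -> dict:
    # Stand-in for a real database call.
    return {"id": report_id}


for i in range(5):
    load_report(i)

print(monitor.calls["db.load_report"])  # 5
print(alerts)  # ['db.load_report: more than 3 calls']
```

Since `warn` is just a callback, swapping `print` for something that emails you is a one-line change.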

Imagine receiving an email from your application the moment it begins performing badly, perhaps because of a data access problem. You could be working on the issue and have it resolved well before your users notice and start raising issues with support/help desk – “Hi, we’re experiencing slowness loading this report, can you help?” – “Absolutely, we’re aware of the problem and the fix is already being applied. This will be resolved in the next few minutes.” – “Great, thanks!” (experienced reactions may vary ;)).

Now what does this cost? Usually, just a tiny bit of code noise. You may spot a line of code at the top of a method that says “Checking In”, or a block wrapped in a pair of “BeginProfile” and “EndProfile” calls. Usually these calls just record a timestamp or a snapshot of state.
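
A minimal Python sketch of that begin/end pair might look like this. `begin_profile`, `end_profile` and `profiled` are hypothetical names (loosely mirroring the “BeginProfile”/“EndProfile” calls above), and the records simply live in an in-memory list. The context manager version is worth having because it still records the end time if the block throws.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory store of (label, start, end) timestamps.
profile_records: list[tuple[str, float, float]] = []


def begin_profile(label: str) -> tuple[str, float]:
    """Time-stamp the start of a block; returns a token for end_profile."""
    return (label, time.perf_counter())


def end_profile(token: tuple[str, float]) -> None:
    """Time-stamp the end of the block and record the pair."""
    label, start = token
    profile_records.append((label, start, time.perf_counter()))


@contextmanager
def profiled(label: str):
    """Same pair of calls, but guaranteed to close even if the block raises."""
    token = begin_profile(label)
    try:
        yield
    finally:
        end_profile(token)


def build_report() -> int:
    token = begin_profile("build_report")
    total = sum(range(10_000))  # stand-in for real work
    end_profile(token)
    return total


build_report()
with profiled("render"):
    sorted(range(1000), reverse=True)  # stand-in for real work

for label, start, end in profile_records:
    print(f"{label}: {(end - start) * 1000:.3f} ms")
```

All each call does is read a clock and append a tuple, which is the kind of cost the “tiny bit of code noise” should stay close to.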

It’s important to realise that this kind of helper code has to be fast and must not do anything risky or heavy. By risky, I mean something like throwing data at a database that may go down, at which point the helper simply starts throwing exceptions and kills your application. By heavy, I mean helper calls that cause delays by waiting for a server to respond, or that block your thread while it sits writing blocks of data to the local disk. Keep it simple – don’t get fancy.
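
One way to honour “keep it simple” in Python is to put a bounded queue between your code and whatever slow destination the data ends up in. This is a sketch, not a recipe: `NonBlockingRecorder` and the sink are hypothetical, and the choice to silently drop records when the buffer is full (rather than wait) is an assumption that trades completeness for never slowing the caller down.

```python
import queue
import threading
from typing import Callable


class NonBlockingRecorder:
    """Hands records to a background thread; never blocks or raises in the caller."""

    def __init__(self, sink: Callable[[dict], None], capacity: int = 1000) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=capacity)
        self._sink = sink     # hypothetical slow destination: file, DB, HTTP...
        self.dropped = 0      # records discarded because the buffer was full
                              # (approximate if many threads record at once)
        self.sink_errors = 0  # sink failures swallowed so the app keeps running
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, item: dict) -> None:
        # put_nowait never waits: if the buffer is full, drop the record.
        try:
            self._queue.put_nowait(item)
        except queue.Full:
            self.dropped += 1

    def _drain(self) -> None:
        while True:
            item = self._queue.get()
            try:
                self._sink(item)
            except Exception:
                # A dead database must not take the application down with it.
                self.sink_errors += 1
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Wait until everything queued so far has been handled (tests/shutdown)."""
        self._queue.join()


# Usage with a sink that fails for some writes.
written: list[dict] = []


def flaky_sink(item: dict) -> None:
    # Stand-in for a database that is unavailable for some writes.
    if item.get("fail"):
        raise ConnectionError("database unavailable")
    written.append(item)


recorder = NonBlockingRecorder(flaky_sink, capacity=100)
recorder.record({"event": "report.loaded", "ms": 42})
recorder.record({"event": "report.saved", "fail": True})
recorder.record({"event": "report.closed", "ms": 3})
recorder.flush()
print(len(written), recorder.sink_errors, recorder.dropped)  # 2 1 0
```

The trade-off is deliberate: losing the odd diagnostic record is usually far cheaper than a stalled or crashed application.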

This entry was posted in Debugging.