Tools for the management of log data

Why include logs?

Yes, it is a lot of data. Yes, it is coming from a lot of sources. Most of what you will be working with will be web and app server log data, but there is no reason why these same tools cannot be used for database logs, telnet logs, and so on.

Microsoft LogParser

Microsoft LogParser is a Swiss Army knife for a piece of wood, where the piece of wood is your log. It uses a quite expressive SQL grammar to query files. I use this tool when I am working with a single log, or with multiple logs of the same format, and need the same information out of each one. A typical example is examining the logs of all the web servers behind a load balancer for summary information, which tells me about the configuration of the load balancer, the configuration of the web servers and how the content distribution network (CDN) is being managed. It works quite well.
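LogParser's own query syntax is not reproduced here. As a rough illustration of the idea (SQL over log files, summarized across every server behind a load balancer), the following Python sketch loads simplified W3C-format logs into an in-memory SQLite table and groups the results by server. The sample data, table name and column names are hypothetical; this is an analogue of the approach, not LogParser itself.

```python
import sqlite3

# Hypothetical sample data: two web servers behind one load balancer,
# in a simplified W3C extended format with IIS-style field names.
SAMPLE_LOGS = {
    "web01": """#Software: Microsoft Internet Information Services
#Fields: date time cs-uri-stem sc-status time-taken
2013-05-01 10:00:00 /cart 200 120
2013-05-01 10:00:01 /home 200 80
2013-05-01 10:00:02 /cart 500 900""",
    "web02": """#Fields: date time cs-uri-stem sc-status time-taken
2013-05-01 10:00:00 /home 200 60
2013-05-01 10:00:03 /cart 200 150""",
}


def load_w3c(conn, server, text):
    """Insert the data lines of one W3C-format log into the requests table."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns of the lines that follow.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            row = dict(zip(fields, line.split()))
            conn.execute(
                "INSERT INTO requests VALUES (?, ?, ?, ?)",
                (server, row["cs-uri-stem"], int(row["sc-status"]),
                 int(row["time-taken"])),
            )


def summarize(logs):
    """Per-server hit count, mean time-taken (ms) and count of 5xx responses."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE requests "
        "(server TEXT, uri TEXT, status INTEGER, time_taken INTEGER)"
    )
    for server, text in logs.items():
        load_w3c(conn, server, text)
    return conn.execute(
        """SELECT server, COUNT(*), AVG(time_taken), SUM(status >= 500)
           FROM requests GROUP BY server ORDER BY server"""
    ).fetchall()


if __name__ == "__main__":
    for server, hits, avg_ms, errors in summarize(SAMPLE_LOGS):
        print(f"{server}: {hits} hits, avg {avg_ms:.0f} ms, {errors} errors")
```

Roughly equal hit counts per server suggest the load balancer is spreading traffic evenly; a server with a much higher average or error count is the one to look at first.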

Support for a wide variety of web log file formats is included by default, and some graphical visualization is available. It is also one of the highest-value offerings from Microsoft, with a price tag of $0.00 USD. You simply cannot beat that. This is my go-to tool for looking at data from individual server logs in a performance test.

Now come the cons of LogParser.

You can run the tool on a schedule with batch files, but it is really designed to be a manual tool. Unless you augment it with a database into which you import the results, you will be out of luck for long-term trend information. It also cannot relate two or more different files to each other for a complex view of the data. Its extensibility for new or custom log formats is low, requiring all sorts of custom coding to handle each distinct log file format.
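One way to get the long-term trend view mentioned above is to have each scheduled batch run store its summary numbers in a small database. This is a minimal sketch, assuming a hypothetical `daily_summary` table that holds one average response time per day (for example, the result of a nightly LogParser query); SQLite through Python's standard library is used purely for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical trend database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_summary "
        "(day TEXT PRIMARY KEY, avg_ms REAL NOT NULL)"
    )
    return conn


def record(conn, day, avg_ms):
    """Store one day's summary. INSERT OR REPLACE makes re-running the
    scheduled job for the same day overwrite rather than duplicate."""
    conn.execute("INSERT OR REPLACE INTO daily_summary VALUES (?, ?)",
                 (day, avg_ms))
    conn.commit()


def trend(conn):
    """Return (day, avg_ms, change vs. previous day) in date order."""
    rows = conn.execute(
        "SELECT day, avg_ms FROM daily_summary ORDER BY day"
    ).fetchall()
    result, prev = [], None
    for day, avg in rows:
        result.append((day, avg, None if prev is None else avg - prev))
        prev = avg
    return result


if __name__ == "__main__":
    store = open_store()
    for day, ms in [("2013-05-01", 110.0), ("2013-05-02", 105.0),
                    ("2013-05-03", 140.0)]:
        record(store, day, ms)
    for row in trend(store):
        print(row)
```

Pointing `open_store` at a file path instead of `:memory:` lets successive batch runs accumulate history.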

I have been known to combine the output of LogParser with LaTeX for all sorts of custom reporting and graphs, even going so far as to augment the default graphs of the LoadRunner web report. If you need some reporting magic sauce like this, drop me a note here and I will send you a whole set of links for the reporting tool chain.
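The actual reporting tool chain is not described here, so the following is only a hypothetical sketch of one glue step: turning a query result saved as CSV into a LaTeX `tabular` that a report document can include. The function names are illustrative, and the escaping covers only the common LaTeX special characters.

```python
import csv
import io

# Common LaTeX special characters. Backslash, ~ and ^ are NOT handled here.
LATEX_SPECIAL = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
                 "_": r"\_", "{": r"\{", "}": r"\}"}


def escape(text):
    """Escape the characters in LATEX_SPECIAL for use in a table cell."""
    return "".join(LATEX_SPECIAL.get(ch, ch) for ch in text)


def csv_to_tabular(csv_text):
    """Convert CSV text (first row = header) into a LaTeX tabular."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header, body = rows[0], rows[1:]
    cols = "l" * len(header)  # left-align every column
    lines = [f"\\begin{{tabular}}{{{cols}}}", "\\hline",
             " & ".join(escape(h) for h in header) + r" \\", "\\hline"]
    for row in body:
        lines.append(" & ".join(escape(c) for c in row) + r" \\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_tabular("cs-uri-stem,Hits\n/cart,120\n/home_page,80\n"))
```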

You can go here to download a copy of LogParser for your use.

Splunk and Splunk Storm

As you can probably tell, I am really fond of Microsoft LogParser, but there are some things that it just does not do very well. Trend information is clearly one of those, along with data visualization over time. Another thing LogParser does not allow at all is relating additional files to your log for more detailed analysis. There are also the extensibility issues and the difficulty of tracking common items across multiple log formats. For these I turn to Splunk.

For those of you who are unfamiliar with Splunk, think of it as a server-based solution for parsing logs and displaying the information in chart and graph form, one that addresses all of the weaknesses of LogParser. If LogParser was the cool, nerdy kid with the calculator on the beach, Splunk would be the same kid after graduating from MIT and spending a couple of years working with Charles Atlas. Sure, they share some common elements, but they are a world apart in capability.

Here is why Splunk is my go-to solution for "larger" needs:

  • Extensible log formats. I can teach Splunk about my log format once, so I do not need to build cumbersome parsing logic into my queries to pull information from the logs. This makes my queries easier to support and debug for both web and non-web custom logging.
  • Automated log collection and data presentation. This is particularly useful when incorporating performance engineering practices across the lifecycle. I can look at the unit testing graphs to observe build-to-build changes in performance, or I can look at production data to observe how behavior is changing across my user population. I can also collect information over time to visualize trends in improving and degrading performance.
  • I can relate the information in my logs to other information for a richer display of data. For instance, what if I want to display the performance of shopping cart initialization across the United States and the world? I need to relate my application server log data to a set of information on the geographic location of IP addresses. Similarly, what if I want to look at performance information for truly mobile users, those whose IP addresses match up to mobile carriers around the world, and not just devices connected via WiFi? With Splunk I can do that, excluding from my performance analysis the mobile devices that are not actually connected over a mobile network.
  • Built-in reporting and notification in different formats.
  • Easy tracking of common data, such as a user ID, across logs from multiple types of servers. This way I can tell where the business process failed and, with timestamp data, find where the bottleneck is, even without the overhead of deep diagnostics in place.
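Splunk handles this kind of enrichment with its own lookup features, whose configuration is not reproduced here. To show the underlying idea of relating log records to an external table, here is a Python sketch that maps client IPs to a hypothetical table of network ranges (region plus mobile or WiFi) and averages response time for carrier-connected users only. The ranges are reserved documentation addresses, not real carrier or geographic data.

```python
import ipaddress

# Hypothetical lookup table: network range -> (region, network type).
# A real setup would load this from a GeoIP or carrier-range data source.
IP_RANGES = [
    (ipaddress.ip_network("203.0.113.0/24"), ("US-East", "mobile")),
    (ipaddress.ip_network("198.51.100.0/24"), ("US-West", "wifi")),
    (ipaddress.ip_network("192.0.2.0/24"), ("EU", "mobile")),
]


def classify(ip):
    """Return (region, network type) for an IP, or ('unknown', 'unknown')."""
    addr = ipaddress.ip_address(ip)
    for network, info in IP_RANGES:
        if addr in network:
            return info
    return ("unknown", "unknown")


def avg_by_region(entries, network_type="mobile"):
    """Average response time (ms) per region, keeping only one network type.

    entries: iterable of (client_ip, response_ms) pairs from a log.
    """
    totals = {}
    for ip, ms in entries:
        region, kind = classify(ip)
        if kind != network_type:
            continue  # e.g. drop devices that came in over WiFi
        total, count = totals.get(region, (0, 0))
        totals[region] = (total + ms, count + 1)
    return {region: total / count for region, (total, count) in totals.items()}


if __name__ == "__main__":
    log_entries = [("203.0.113.7", 300), ("203.0.113.9", 500),
                   ("198.51.100.4", 90), ("192.0.2.20", 450),
                   ("10.0.0.1", 50)]
    print(avg_by_region(log_entries))
```

A linear scan over the ranges is fine for a handful of entries; a large lookup table would call for a sorted or prefix-based structure instead.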
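Splunk's own field-extraction configuration is not shown here either. This plain-Python sketch illustrates two ideas from the list above. First, each log format is described once as a regular expression with named groups, so later code works with field names. Second, a shared field (a user ID) stitches events from different server tiers into one timeline, and the largest gap in that timeline points at a likely bottleneck. Both log formats, the field names and the sample lines are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical log formats for two tiers, each declared once as a regex
# with named groups; everything downstream works with field names.
FORMATS = {
    "web": re.compile(r"(?P<ts>\S+ \S+) user=(?P<user>\w+) step=(?P<step>\w+)"),
    "app": re.compile(r"\[(?P<ts>[^\]]+)\] (?P<step>\w+) uid:(?P<user>\w+)"),
}
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

SAMPLE_LOGS = {
    "web": [
        "2013-05-01 10:00:00.000 user=u42 step=cart_request",
        "2013-05-01 10:00:01.000 user=u7 step=home_request",
        "2013-05-01 10:00:03.000 user=u42 step=cart_response",
    ],
    "app": [
        "[2013-05-01 10:00:00.120] cart_init uid:u42",
        "[2013-05-01 10:00:02.900] cart_ready uid:u42",
    ],
}


def parse(tier, lines):
    """Turn raw lines into event dicts using the tier's declared format."""
    events = []
    for line in lines:
        m = FORMATS[tier].match(line)
        if m:  # lines that do not fit the format are skipped
            events.append({
                "tier": tier,
                "user": m["user"],
                "step": m["step"],
                "ts": datetime.strptime(m["ts"], TS_FORMAT),
            })
    return events


def slowest_hop(user, logs):
    """Largest time gap (from, to, seconds) in one user's timeline, or None."""
    events = sorted(
        (e for tier, lines in logs.items()
         for e in parse(tier, lines) if e["user"] == user),
        key=lambda e: e["ts"],
    )
    hops = [
        (f"{a['tier']}:{a['step']}", f"{b['tier']}:{b['step']}",
         (b["ts"] - a["ts"]).total_seconds())
        for a, b in zip(events, events[1:])
    ]
    return max(hops, key=lambda h: h[2]) if hops else None


if __name__ == "__main__":
    print(slowest_hop("u42", SAMPLE_LOGS))
```

For user `u42` the largest gap falls between the two application server events, which is where this sketch would suggest looking first. This only works if the servers' clocks are reasonably synchronized.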

Splunk comes in two forms, local and hosted (Splunk Storm!). I would encourage you to take a look at one or the other if you have the same analytical needs as I do.

Download Splunk here

Sign up for Splunk Storm here
