[Watt-dev] A cross-language spec for WATT

watt-dev
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Watt-dev] A cross-language spec for WATT

From:	Daniel P. Berrange
Subject:	[Watt-dev] A cross-language spec for WATT
Date:	Sun, 4 Sep 2005 15:36:52 +0100
User-agent:	Mutt/1.3.28i
There now follows modest proposal expanding the scope of WATT to
provide a general specification for application tracing toolkits, 
across any language. There are some attached files giving examples
of the file formats, and a couple of diagrams. Finally I've also
attached a proof-of-concept implementation for Perl.

Regards,
Dan.

   A cross-language developer support model
   ========================================

Introduction
============

Experiance working on a variety of different systems, from web
servers, to application servers, to RPC services has underlined
a common need for a 'developer support' capability to aid in 
understanding, tuning & debugging of the system. Such a capability 
would instrument a system recording information about 'interesting' 
operations being performed. If database statements were the operation
being instrumented, one use for the information would be to identify
transactions which run for an abnormally long time, or particular
statements which are slow to execute. The concept of an operation
can be easily extended to cover any interaction with external systems,
such as remote procedure calls, or interactions with messaging services
like IBM MQ.

Collection of data from the system is only one part of the service.
Equally, if not more important, is the presentation and analysis of
the data. For developers of a web application, live pages may be 
desired to provide just in time view of the data collected. For a
production support team, low detail, but long term aggregation of
operation statistics may be desired to identify potential trouble
spots, or abnormal runtime behaviour. For testers operating an
integration test harness, reports on the data may be desired to
provide a qualitative view on the system wrt to a previous baseline.
While, by neccessity, the process for extracting & persisting the
data must be different for every programming language, it should
be clear that there is potential for common presentation and
analysis tools. 

Thus it is the suggestion of this document, that a specification be 
written to define the structure & operation of a developer support 
system capable of being implemented in any programming language. A 
specification would encompass

 - Object model  - define the structure & properties of data to be 
                   captured & represented in the system
 - Storage mechanism - define common format for representing and
                       persisting the data on disk & in a DB.
 - API - define a set of APIs provided to applications for supplying
         stats data, and for later retrieving it.
 - Configuration - a set of parameters available to configure the
                   system, in particular the storage backends.

The remainder of this document, contains an working draft specification

Objects model
=============

Stage
-----

A logical section of code being instrumented. Stages can
be nested to arbitrary depth, and have operations associated
with them.

 - name. string (20 chars)
 - start time. milliseconds (8 bytes)
 - end time. milliseconds (8 bytes)
 - n * stage
 - n * operation

Process
-------

A specialization of a stage representing the top level
section of code being instrumented. It is further specialized
for different types of process, eg HTTP requests, scripts,
workflows. A process has an overall pass/fail result which
is independant of that of nested transactions. If the result
was a failure, an exception trace may also be associated with
the process. A set of context attributes allow for arbitrary 
(key, value) pairs to be associated with the process. These are 
intended to contain application domain specific data for later 
use in reporting & display aggregation. The host & pid attributes
identify the machine on which the process executed, and the OS
process/thread id.

 - host. string (20 chars)
 - pid. integer (4 bytes)
 - result. 'pass' or 'fail' (1 byte)
 - 0/1 * exception
 - n * context attribute

HTTPProcess
-----------

A specialization of a process representing a method which
services an HTTP request. This adds in a record of the URL
path requested, the URL parameters, the remote client's IP 
address and port, request method (POST/GET, etc) and any
cookies set.

 - URLpath. string (250 chars)
 - method. string (8 chars)
 - sourceIP. integer (4 bytes)
 - sourcePort. integer (2 bytes)
 - n * cookies
 - n * URL parameters

Script
------

A specialization of a process representing a command line
script running a job. This adds in a record of the file
path of the script, any command line parameters it
was supplied, and the user & group ids of the party invoking
the script.

 - filePath. string (250 chars)
 - n * arguments
 - uid. integer (2 bytes)
 - gid. integer (2 bytes)

Transaction
-----------

A specialization of a stage representing additional data
related to a transactional service, be it a regular database
transaction, or an 'XA' distributed transaction. The 
completion time records the amount of time required to perform
the commit operation (particularly relevant for XA transactions
which require co-ordination between multiple external systems).
A transaction also has a status designating whether it was
succesfully commited, or aborted. If the transaction aborted,
then an exception may be associated with it. This exception may
or may not be the same as an exception associated with the last
operation to executed.

 - completionTime. milliseconds (8 bytes)
 - status. 'comit' or 'abort' (1 byte)
 - 0/1 * exception

Context attribute
-----------------

A context attribute records a piece of metadata about a
process being executed. Perhaps the application or script
name, or the HTTP request URL, or the workflow name. This
data is application domain specific and not particular
intepretation is placed on the values. They may be used for
data aggregration when viewing recorded stats, or running
reports.

 - key. string (50 chars)
 - value. string (50 chars)

Operation
---------

An action being instrumented, perhaps a statement issued
against a database, an interaction with a message queue,
or a remote procedure call. The start and end time record
the duration of execution of the operation. If the operation
failed, then an exception may be associated with it. An 
optional stack trace may be associated with all operations
independant of the exception to enable identification of
the application code which triggered the operation.

 - start time. milliseconds (8 bytes)
 - end time. milliseconds (8 bytes)
 - 0/1 * exception
 - 0/1 * stack trace

Database Operation
------------------

A specialization of an operation representing data related
to a database statement, be it a SQL SELECT, DELETE, UPDATE,
INSERT or any other type. cf Java JDBC, Perl DBI. The statment
text contains the RAW SQL statement. If any bound variables 
are used, they location will be represented by a '?' placeholder
in the statement text, and the values associated with the
operation. The row count designates either the number of rows
affected by the DELETE/UPDATE/INSERT statemnt, or the number of
rows returned by the SELECT statement. If the row count is not
available, then it is recorded as -1. If the statement resulted
in an error, then the SQL error code is also recorded.

 - statement text. (~50-500 characters)
 - row count. (8 bytes)
 - error code (8 bytes)
 - n * bind variables (~1 kb ?)

Messaging Operation
-------------------

A specialization of an operation representing data related
to a messaging service operation, such as IBM MQ, or an
equivalent. The raw message text is recorded, along with
the name of the queue against which the operation was performed.
Finally, an action attribute records whether the operation was
a message get, or put.

 - message text (1-5 kb)
 - queue name (200 chars)
 - action. 'get' or 'put' (1 byte)

RPC Operation
-------------

A specialization of an operation representing data related
to a remote procedure call. cf DBus, XML-RPC, CORBA. The 
object name is a unique identifier of the object associated
with the operation. For method calls, the member name is the
name of the method called, an action is set to 'call'. For 
signal handling callbacks, the member refers to the signal
name, and action is set to 'signal'. If the method or signal
had any input parameters, these are also associated with the
operation.

 - object name (200 chars)
 - member name (20 chars)
 - action. 'call' or 'signal'
 - n * parameters (~1 kb ?)


Stack Trace
-----------

A stack trace is associated with an operation, identifying
the code generating the operation. A stack trace is simply
a list of stack frames, with the first element being the frame
at which the operation was sent WATT, and successive elements
being further up the stack, until the last element recording
the top level entry point of the process.

 - n * stack frame

Exception
---------

An exception is a specialization of a stack trace, represented
a failure condition. Exceptions can optionally reference a
cause exception, allowing recursive wrapping of exceptions. An
exception will also have a data type, along with a text based
message describing the problem.

 - type (100 chars)
 - message (200 chars)
 - 0/1 * exception

Stack Frame
-----------

A stack frame represents a location in the process being
instrumented. In OO languages, the object name records the 
object on which the method was invoked. The method name is
compulsory for all languages. If debug info is available, 
then the filename and line number of the specify the exact
location of execution within the method.

 - object name (100 chars)
 - method name (20 chars)
 - file name (200 chars)
 - line number (4 bytes)


Configuration
=============

Depending on the circumstances of the deployment, it can be
desirable to record different levels of detail. In production
systems, to avoid unneccessary overhead, only aggregated stats 
on the number of operations and average execution time may be
recorded. During testing, or investigation of abnormal system
behaviour, the logging level may be increased to also record
the details of individual operations. Finally during development
or serious debugging sessions, the all possible data right down
to a stack trace for every operation may be required. The 
suggested set of attributes which may be enabled or disabled
are:

  - summary - aggregated counts of operations & execution time
  - operations - details of individual operations
  - stages - break down of stages within a process
  - exceptions - stack traces & errors associated with operations
  - stack traces - complete stack traces against all operations

A basic set of logging levels would thus comprise

  - off  - no information collected
  - summary - only summary information
  - detail - summary, operations, stages & exceptions
  - complete - all information

Storage mechansim
=================

Potential scenarios in which the system might be deployed, vary
from low volume batch job scripts, to high volume messaging
processing systems. There could conceivably be anywhere between
100, and 100,000, or more proceses executed in a single day. The 
compound size of a process record could easily vary from a few 
kilobytes, to as much as a megabyte. It is thus not outside the 
realm of possiblity that 100 GB of raw data be generated in a
single day. The cummulative data volume at any point in time will
be determined by three key factors, the rate of data production,
the level of recording detail, and the minimum data retention 
time.

The data volume is not, however, the only consideration in choosing 
a storage mechanism. Equally important can be the ease of access to 
the data for bulk analysis & reporting, the form of data persistence 
(if any), and the overhead of persisting the data. 

There are three immediately obvious choices for data storage

 - Memory - accumulate records in a static list in system memory.
            This is fast & convenient to access, but limited in
            capacity and has no long term persistence

 - File   - persist records out to disk. This is reasonably fast,
            with appropriate file structure, efficient to access
            records, but is not easy to perform bulk analysis

 - Database - persist records to a fully normalized database. 
              This is high overhead on insert, and fairly space
              inefficient, but lends itself very well to bulk
              data analysis

Memory Storage
--------------

This is simply a first-in, first-out bounded list, with one
entry per process. The data is not persisted, nor is it
aggregated across multiple processes. The storage mechanism
is specific to the implementation and no attempt is made to
have shared storage.


File Storage
------------

This is the primary persistent storage mechanism, providing
shared access to multiple processes, with low persistence
overhead. The data is split across multiple files, with the
structure designed to avoid contention for files between multiple
threads. The first level of the directory structure is named
according to the hostname, thus separating data from different
web servers into independant trees. The second level is split
into 'n' buckets. When storing stats for a process, the bucket
is chosen pseudo-randomly, for example by taking modulus of
the UNIX PID wrt to 'n'.  ie, getpid % n. The choice of value
for 'n' is thus a factor of the ratio between the time for a
single process, and the time to store the process's stats. If
a web server has a max clients setting of 500, and a single
request typically takes x10 as long storing the stats, then
a minimum bucket size of (500 / 10) is a good starting point
(assuming requests are staggered - ie not all 500 completing
at once). At this point, the structure looks like:

 ROOT
  |
  +- web1.example.copm
  |   |
  |   +- 0
  |   +- 1
  |   +- 2
  |   ...
  |   ...
  |   +- n
  |
  +- web2.example.copm
      |
      +- 0
      +- 1
      +- 2
      ...
      ...
      +- n

Within each bucket, a ring buffer storage mechanism is used. 
Each bucket is intended to store 't' sets of stats, so, when 
the 't'th entry is created, it returns an overwrites the 1st 
entry again. Each entry is stored as an XML file, and two avoid
placing too many files in a single directory, a two level dir
structure is used, with 'm' subuckets. So, the name of the first
level directory is 't' / 'm', while the name of the file within
is 't' % 'm'. Each sub-bucket also contains an 'index.txt' file,
which contains the aggregrated stats for all transactions within
the bucket. 

In the top level of a bucket, there are two further files, the 'lock'
file is simply an empty file used with fcntl() call to aquire
an exclusive lock on the bucket while writing stats. The 'key'
file contains a single line, specifiying the number of the next
entry in the ring buffer to create. This file is updated each
time a stats entry is written, and when this it hits 't', then
it is reset to 0. 

So layout of stats files within a bucket looks like

  BUCKET
   |
   +- lock
   +- key
   |
   +- 0
   |  |
   |  +- index.txt
   |  +- detail-0.xml    - entry 0
   |  +- detail-0.xml    - entry 1
   |  ...
   |  +- detail-m.xml    - entry m - 1
   |
   +- 1
   |  |
   |  +- index.txt
   |  +- detail-0.xml    - entry m
   |  +- detail-1.xml    - entry m + 1
   |  +- ...
   |  +- detail-m.xml    - entry m + m - 1
   +- ...
   ...
   ...
   +- t/m

Index files
-----------

The index.txt file within each sub-bucket contains one line
per detail file. Each line contains a number of fields giving
basic numeric data about the process, each field separated
by a ':'. The fields recorded are:

  - id of the corresponding 'detail-n.xml' file
  - name of the host executing the process
  - process/thread id of the generating process
  - start time of the process
  - end time of the process
  - total time for all transaction commits
  - total time for all transaction aborts
  - total number of transaction commits
  - total number of transaction aborts
  - total time spent on db SELECT statements
  - total time spent on db UPDATE statements
  - total time spent on db INSERT statements
  - total time spent on db DELETE statements
  - total time spent on db OTHER statements
  - total number of db SELECT statements
  - total number of db UPDATE statements
  - total number of db INSERT statements
  - total number of db DELETE statements
  - total number of db OTHER statements
  - total time spent on messaging GET actions
  - total time spent on messaging PUT actions
  - total number of messaging GET actions
  - total number of messaging PUT actions

This is the core set of required data fields. An particular
implementation of the stats system, may choose to designate
a number of context attributes from the process object to also
be stored in the index file. This is to allow easy presentation 
of aggregrated stats. For example, if a web site was split into 
5 distinct functional areas (chat, forums, CMS, portals & admin), 
the 'application name' context attribute may be defined to be 
indexed. This enables the system to efficient provided a per
application aggregated view. The set of context attributes saved
in the index file are referred to as the 'indexed context attributes',
and their values will be appended to each line. If the value of a
context attribute includes a ':', this stuffed as '::'.

Detail files
------------

The detail files are stored in XML format, with all elements
under the 'watt:' prefix which has an associated XML namespace
of 'http://www.nongnu.org/watt/xmlns/detail/1.0'. Each object
described in the earlier object model section translates directly
into an XML tag with matching name. So a process object is 
stored as a 'watt:process' tag. All scalar data properties for 
an object are to be stored as simple attributes on the tag, while
list properties are stored as sub-elements.


-- 
|=-            GPG key: http://www.berrange.com/~dan/gpgkey.txt       -=|
|=-       Perl modules: http://search.cpan.org/~danberr/              -=|
|=-           Projects: http://freshmeat.net/~danielpb/               -=|
|=-   address@hidden  -  Daniel Berrange  -  address@hidden    -=|
inheritance.png
Description: PNG image
objects.png
Description: PNG image
index.txt
Description: Text document
detail.xml
Description: application/xml
Devel-WATT-0.0.1.tar.gz
Description: Binary data
pgpxWiZN1MGRo.pgp
Description: PGP signature
[Prev in Thread]
Current Thread
[Next in Thread]
[Watt-dev] A cross-language spec for WATT, Daniel P. Berrange <=
Index(es):
- Date
- Thread