|
From: | Daniel P. Berrange |
Subject: | [Watt-dev] A cross-language spec for WATT |
Date: | Sun, 4 Sep 2005 15:36:52 +0100 |
User-agent: | Mutt/1.3.28i |
There now follows modest proposal expanding the scope of WATT to provide a general specification for application tracing toolkits, across any language. There are some attached files giving examples of the file formats, and a couple of diagrams. Finally I've also attached a proof-of-concept implementation for Perl. Regards, Dan. A cross-language developer support model ======================================== Introduction ============ Experiance working on a variety of different systems, from web servers, to application servers, to RPC services has underlined a common need for a 'developer support' capability to aid in understanding, tuning & debugging of the system. Such a capability would instrument a system recording information about 'interesting' operations being performed. If database statements were the operation being instrumented, one use for the information would be to identify transactions which run for an abnormally long time, or particular statements which are slow to execute. The concept of an operation can be easily extended to cover any interaction with external systems, such as remote procedure calls, or interactions with messaging services like IBM MQ. Collection of data from the system is only one part of the service. Equally, if not more important, is the presentation and analysis of the data. For developers of a web application, live pages may be desired to provide just in time view of the data collected. For a production support team, low detail, but long term aggregation of operation statistics may be desired to identify potential trouble spots, or abnormal runtime behaviour. For testers operating an integration test harness, reports on the data may be desired to provide a qualitative view on the system wrt to a previous baseline. While, by neccessity, the process for extracting & persisting the data must be different for every programming language, it should be clear that there is potential for common presentation and analysis tools. Thus it is the suggestion of this document, that a specification be written to define the structure & operation of a developer support system capable of being implemented in any programming language. A specification would encompass - Object model - define the structure & properties of data to be captured & represented in the system - Storage mechanism - define common format for representing and persisting the data on disk & in a DB. - API - define a set of APIs provided to applications for supplying stats data, and for later retrieving it. - Configuration - a set of parameters available to configure the system, in particular the storage backends. The remainder of this document, contains an working draft specification Objects model ============= Stage ----- A logical section of code being instrumented. Stages can be nested to arbitrary depth, and have operations associated with them. - name. string (20 chars) - start time. milliseconds (8 bytes) - end time. milliseconds (8 bytes) - n * stage - n * operation Process ------- A specialization of a stage representing the top level section of code being instrumented. It is further specialized for different types of process, eg HTTP requests, scripts, workflows. A process has an overall pass/fail result which is independant of that of nested transactions. If the result was a failure, an exception trace may also be associated with the process. A set of context attributes allow for arbitrary (key, value) pairs to be associated with the process. These are intended to contain application domain specific data for later use in reporting & display aggregation. The host & pid attributes identify the machine on which the process executed, and the OS process/thread id. - host. string (20 chars) - pid. integer (4 bytes) - result. 'pass' or 'fail' (1 byte) - 0/1 * exception - n * context attribute HTTPProcess ----------- A specialization of a process representing a method which services an HTTP request. This adds in a record of the URL path requested, the URL parameters, the remote client's IP address and port, request method (POST/GET, etc) and any cookies set. - URLpath. string (250 chars) - method. string (8 chars) - sourceIP. integer (4 bytes) - sourcePort. integer (2 bytes) - n * cookies - n * URL parameters Script ------ A specialization of a process representing a command line script running a job. This adds in a record of the file path of the script, any command line parameters it was supplied, and the user & group ids of the party invoking the script. - filePath. string (250 chars) - n * arguments - uid. integer (2 bytes) - gid. integer (2 bytes) Transaction ----------- A specialization of a stage representing additional data related to a transactional service, be it a regular database transaction, or an 'XA' distributed transaction. The completion time records the amount of time required to perform the commit operation (particularly relevant for XA transactions which require co-ordination between multiple external systems). A transaction also has a status designating whether it was succesfully commited, or aborted. If the transaction aborted, then an exception may be associated with it. This exception may or may not be the same as an exception associated with the last operation to executed. - completionTime. milliseconds (8 bytes) - status. 'comit' or 'abort' (1 byte) - 0/1 * exception Context attribute ----------------- A context attribute records a piece of metadata about a process being executed. Perhaps the application or script name, or the HTTP request URL, or the workflow name. This data is application domain specific and not particular intepretation is placed on the values. They may be used for data aggregration when viewing recorded stats, or running reports. - key. string (50 chars) - value. string (50 chars) Operation --------- An action being instrumented, perhaps a statement issued against a database, an interaction with a message queue, or a remote procedure call. The start and end time record the duration of execution of the operation. If the operation failed, then an exception may be associated with it. An optional stack trace may be associated with all operations independant of the exception to enable identification of the application code which triggered the operation. - start time. milliseconds (8 bytes) - end time. milliseconds (8 bytes) - 0/1 * exception - 0/1 * stack trace Database Operation ------------------ A specialization of an operation representing data related to a database statement, be it a SQL SELECT, DELETE, UPDATE, INSERT or any other type. cf Java JDBC, Perl DBI. The statment text contains the RAW SQL statement. If any bound variables are used, they location will be represented by a '?' placeholder in the statement text, and the values associated with the operation. The row count designates either the number of rows affected by the DELETE/UPDATE/INSERT statemnt, or the number of rows returned by the SELECT statement. If the row count is not available, then it is recorded as -1. If the statement resulted in an error, then the SQL error code is also recorded. - statement text. (~50-500 characters) - row count. (8 bytes) - error code (8 bytes) - n * bind variables (~1 kb ?) Messaging Operation ------------------- A specialization of an operation representing data related to a messaging service operation, such as IBM MQ, or an equivalent. The raw message text is recorded, along with the name of the queue against which the operation was performed. Finally, an action attribute records whether the operation was a message get, or put. - message text (1-5 kb) - queue name (200 chars) - action. 'get' or 'put' (1 byte) RPC Operation ------------- A specialization of an operation representing data related to a remote procedure call. cf DBus, XML-RPC, CORBA. The object name is a unique identifier of the object associated with the operation. For method calls, the member name is the name of the method called, an action is set to 'call'. For signal handling callbacks, the member refers to the signal name, and action is set to 'signal'. If the method or signal had any input parameters, these are also associated with the operation. - object name (200 chars) - member name (20 chars) - action. 'call' or 'signal' - n * parameters (~1 kb ?) Stack Trace ----------- A stack trace is associated with an operation, identifying the code generating the operation. A stack trace is simply a list of stack frames, with the first element being the frame at which the operation was sent WATT, and successive elements being further up the stack, until the last element recording the top level entry point of the process. - n * stack frame Exception --------- An exception is a specialization of a stack trace, represented a failure condition. Exceptions can optionally reference a cause exception, allowing recursive wrapping of exceptions. An exception will also have a data type, along with a text based message describing the problem. - type (100 chars) - message (200 chars) - 0/1 * exception Stack Frame ----------- A stack frame represents a location in the process being instrumented. In OO languages, the object name records the object on which the method was invoked. The method name is compulsory for all languages. If debug info is available, then the filename and line number of the specify the exact location of execution within the method. - object name (100 chars) - method name (20 chars) - file name (200 chars) - line number (4 bytes) Configuration ============= Depending on the circumstances of the deployment, it can be desirable to record different levels of detail. In production systems, to avoid unneccessary overhead, only aggregated stats on the number of operations and average execution time may be recorded. During testing, or investigation of abnormal system behaviour, the logging level may be increased to also record the details of individual operations. Finally during development or serious debugging sessions, the all possible data right down to a stack trace for every operation may be required. The suggested set of attributes which may be enabled or disabled are: - summary - aggregated counts of operations & execution time - operations - details of individual operations - stages - break down of stages within a process - exceptions - stack traces & errors associated with operations - stack traces - complete stack traces against all operations A basic set of logging levels would thus comprise - off - no information collected - summary - only summary information - detail - summary, operations, stages & exceptions - complete - all information Storage mechansim ================= Potential scenarios in which the system might be deployed, vary from low volume batch job scripts, to high volume messaging processing systems. There could conceivably be anywhere between 100, and 100,000, or more proceses executed in a single day. The compound size of a process record could easily vary from a few kilobytes, to as much as a megabyte. It is thus not outside the realm of possiblity that 100 GB of raw data be generated in a single day. The cummulative data volume at any point in time will be determined by three key factors, the rate of data production, the level of recording detail, and the minimum data retention time. The data volume is not, however, the only consideration in choosing a storage mechanism. Equally important can be the ease of access to the data for bulk analysis & reporting, the form of data persistence (if any), and the overhead of persisting the data. There are three immediately obvious choices for data storage - Memory - accumulate records in a static list in system memory. This is fast & convenient to access, but limited in capacity and has no long term persistence - File - persist records out to disk. This is reasonably fast, with appropriate file structure, efficient to access records, but is not easy to perform bulk analysis - Database - persist records to a fully normalized database. This is high overhead on insert, and fairly space inefficient, but lends itself very well to bulk data analysis Memory Storage -------------- This is simply a first-in, first-out bounded list, with one entry per process. The data is not persisted, nor is it aggregated across multiple processes. The storage mechanism is specific to the implementation and no attempt is made to have shared storage. File Storage ------------ This is the primary persistent storage mechanism, providing shared access to multiple processes, with low persistence overhead. The data is split across multiple files, with the structure designed to avoid contention for files between multiple threads. The first level of the directory structure is named according to the hostname, thus separating data from different web servers into independant trees. The second level is split into 'n' buckets. When storing stats for a process, the bucket is chosen pseudo-randomly, for example by taking modulus of the UNIX PID wrt to 'n'. ie, getpid % n. The choice of value for 'n' is thus a factor of the ratio between the time for a single process, and the time to store the process's stats. If a web server has a max clients setting of 500, and a single request typically takes x10 as long storing the stats, then a minimum bucket size of (500 / 10) is a good starting point (assuming requests are staggered - ie not all 500 completing at once). At this point, the structure looks like: ROOT | +- web1.example.copm | | | +- 0 | +- 1 | +- 2 | ... | ... | +- n | +- web2.example.copm | +- 0 +- 1 +- 2 ... ... +- n Within each bucket, a ring buffer storage mechanism is used. Each bucket is intended to store 't' sets of stats, so, when the 't'th entry is created, it returns an overwrites the 1st entry again. Each entry is stored as an XML file, and two avoid placing too many files in a single directory, a two level dir structure is used, with 'm' subuckets. So, the name of the first level directory is 't' / 'm', while the name of the file within is 't' % 'm'. Each sub-bucket also contains an 'index.txt' file, which contains the aggregrated stats for all transactions within the bucket. In the top level of a bucket, there are two further files, the 'lock' file is simply an empty file used with fcntl() call to aquire an exclusive lock on the bucket while writing stats. The 'key' file contains a single line, specifiying the number of the next entry in the ring buffer to create. This file is updated each time a stats entry is written, and when this it hits 't', then it is reset to 0. So layout of stats files within a bucket looks like BUCKET | +- lock +- key | +- 0 | | | +- index.txt | +- detail-0.xml - entry 0 | +- detail-0.xml - entry 1 | ... | +- detail-m.xml - entry m - 1 | +- 1 | | | +- index.txt | +- detail-0.xml - entry m | +- detail-1.xml - entry m + 1 | +- ... | +- detail-m.xml - entry m + m - 1 +- ... ... ... +- t/m Index files ----------- The index.txt file within each sub-bucket contains one line per detail file. Each line contains a number of fields giving basic numeric data about the process, each field separated by a ':'. The fields recorded are: - id of the corresponding 'detail-n.xml' file - name of the host executing the process - process/thread id of the generating process - start time of the process - end time of the process - total time for all transaction commits - total time for all transaction aborts - total number of transaction commits - total number of transaction aborts - total time spent on db SELECT statements - total time spent on db UPDATE statements - total time spent on db INSERT statements - total time spent on db DELETE statements - total time spent on db OTHER statements - total number of db SELECT statements - total number of db UPDATE statements - total number of db INSERT statements - total number of db DELETE statements - total number of db OTHER statements - total time spent on messaging GET actions - total time spent on messaging PUT actions - total number of messaging GET actions - total number of messaging PUT actions This is the core set of required data fields. An particular implementation of the stats system, may choose to designate a number of context attributes from the process object to also be stored in the index file. This is to allow easy presentation of aggregrated stats. For example, if a web site was split into 5 distinct functional areas (chat, forums, CMS, portals & admin), the 'application name' context attribute may be defined to be indexed. This enables the system to efficient provided a per application aggregated view. The set of context attributes saved in the index file are referred to as the 'indexed context attributes', and their values will be appended to each line. If the value of a context attribute includes a ':', this stuffed as '::'. Detail files ------------ The detail files are stored in XML format, with all elements under the 'watt:' prefix which has an associated XML namespace of 'http://www.nongnu.org/watt/xmlns/detail/1.0'. Each object described in the earlier object model section translates directly into an XML tag with matching name. So a process object is stored as a 'watt:process' tag. All scalar data properties for an object are to be stored as simple attributes on the tag, while list properties are stored as sub-elements. -- |=- GPG key: http://www.berrange.com/~dan/gpgkey.txt -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- address@hidden - Daniel Berrange - address@hidden -=|
inheritance.png
Description: PNG image
objects.png
Description: PNG image
index.txt
Description: Text document
detail.xml
Description: application/xml
Devel-WATT-0.0.1.tar.gz
Description: Binary data
pgpxWiZN1MGRo.pgp
Description: PGP signature
[Prev in Thread] | Current Thread | [Next in Thread] |