DiTrace Documentation¶
Overview¶
DiTrace is a distributed tracing system.
Like other distributed tracing systems such as Zipkin or Dapper, DiTrace is a tool for finding problem areas in distributed systems,
but with a different and simpler architecture.
How it works¶
Every time one microservice in a distributed system calls another microservice, a data object called a “span” should be created.
A span has an arbitrary number of annotations, such as request and response parameters (timestamps, URL, response code, etc.).
Spans belong to a single trace and are arranged hierarchically under one root span.
Microservices are responsible for:
- Creating traces and spans, and passing the trace ID and span ID in the requests they send to each other.
- Collecting spans and sending them to the DiTrace gate.
See DiTrace gate API for more details.
The DiTrace gate is responsible for:
- Collecting and grouping spans from multiple microservices
- Saving traces to Elasticsearch
Elasticsearch provides various statistical aggregations for data analysis. The UI is a visualization tool for these aggregations and the underlying data.
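The span hierarchy described above can be illustrated with a small Go sketch. This is not the gate's own code: the `Span` type and the `buildTrace` function are hypothetical names, and the sketch assumes that the root span is the one without a parent span ID.

```go
package main

import (
	"errors"
	"fmt"
)

// Span is a hypothetical, minimal span: just enough to show the hierarchy.
type Span struct {
	TraceID      string
	SpanID       string
	ParentSpanID string // assumed to be empty for the root span
}

// buildTrace indexes the spans of one trace by parent and returns the root
// span ID together with a parent -> children map.
func buildTrace(spans []Span) (string, map[string][]string, error) {
	root := ""
	children := make(map[string][]string)
	for _, s := range spans {
		if s.ParentSpanID == "" {
			if root != "" {
				return "", nil, errors.New("trace has more than one root span")
			}
			root = s.SpanID
			continue
		}
		children[s.ParentSpanID] = append(children[s.ParentSpanID], s.SpanID)
	}
	if root == "" {
		return "", nil, errors.New("trace has no root span")
	}
	return root, children, nil
}

func main() {
	spans := []Span{
		{TraceID: "c38efe4edb2d4a008af2805ee4e061c1", SpanID: "8256"},
		{TraceID: "c38efe4edb2d4a008af2805ee4e061c1", SpanID: "904a", ParentSpanID: "8256"},
	}
	root, children, err := buildTrace(spans)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("root:", root, "children of root:", children[root])
}
```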
Installation¶
Manual Installation¶
The following components must be installed before running the DiTrace gate and UI:
- golang version 1.5 or higher
- elasticsearch version 2.2
- web server, e.g. nginx
Install DiTrace gate¶
export GOPATH=<your gopath>
go get github.com/ditrace/ditrace
Download Web UI Application¶
Configure¶
- Place the configuration file in the default location,
/etc/ditrace/config.yml
See Configuration for the configuration syntax.
- Place the nginx configuration file in
/etc/nginx/conf.d/ditrace.conf
# elasticsearch cluster for traces
upstream elastic {
server vm-ditrace1:9200;
server vm-ditrace2:9200;
server vm-ditrace3:9200;
}
server {
listen 0.0.0.0:80;
server_name vm-ditrace1;
location / {
root /var/local/www/ditrace/web/static/;
index index.html;
}
location /elasticsearch/ {
rewrite /elasticsearch/(.*) /$1 break;
proxy_pass http://elastic;
}
}
- Place the UI config.json file in
/var/local/www/ditrace/config.json
Run¶
- Run nginx
- Run elasticsearch
- Set up the index template
curl -XPUT http://elasticsearch:9200/_template/traces --data-binary @template.json
- Run the DiTrace gate
$GOPATH/bin/ditrace --config=/etc/ditrace/config.yml
Configuration¶
By default, the DiTrace gate looks for ./config.yml, but you can change this with the --config command-line parameter.
log_dir: stdout
log_level: debug

# Send statistics to graphite
stats:
  enable: true
  graphite_host: "vm-graphite"
  graphite_port: 2003
  graphite_prefix: DevOps

http:
  enable: true
  address: ":8080"

# List of Elasticsearch cluster replicas
elasticsearch:
  - "http://vm-elastic:9200"

# Sampling limits the share of incoming traces that is stored.
# A value of N means that only one in N traces is written to Elasticsearch.
sampling: 1

replicas:
  - "http://localhost:8080"

# Number of seconds to wait after a trace is completed before writing it to Elasticsearch.
min_ttl: 10
# Number of seconds to wait for an incomplete trace before it is discarded.
max_ttl: 120

profiling:
  # https://golang.org/pkg/net/http/pprof/
  enable: true
DiTrace gate API¶
The gate accepts only the following HTTP request.
POST /spans?system=(string)¶
The system parameter is used if the span JSON does not specify a system.
Example request:
POST /spans?system=mysystem HTTP/1.1
Content-Type: application/x-ldjson

{
  "traceId": "c38efe4edb2d4a008af2805ee4e061c1",
  "spanId": "8256",
  "timeline": {
    "sr": "2015-04-24T09:53:49.5595869Z",
    "ss": "2015-04-24T09:53:50.5595869Z"
  },
  "annotations": {
    "url": "/url?arg1=arg1&arg2=arg2",
    "host": "hostname",
    "rqbl": "42",
    "rsbl": "4200",
    "targetId": "service-0"
  }
}\r\n
{
  "traceId": "c38efe4edb2d4a008af2805ee4e061c1",
  "parentSpanId": "8256",
  "spanId": "904a",
  "timeline": {
    "sr": "2015-04-24T09:53:49.5595869Z",
    "ss": "2015-04-24T09:53:50.5595869Z"
  },
  "annotations": {
    "url": "/url",
    "host": "hostname",
    "rqbl": "42",
    "rsbl": "4200",
    "targetId": "service-1"
  }
}
Example response:
HTTP/1.1 200 OK
Query Parameters:
- system – name of the system the spans come from
Status Codes:
- 200 OK – no error
- 400 Bad Request – the server can’t parse the request content, or a required field is missing (system, traceid, spanid)
Important
For readability, the JSON objects in the example are formatted with extra line breaks. In real requests, only the line breaks that separate JSON objects should be present.
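A client for the request described above could look roughly like the following Go sketch. It is an illustration under assumptions, not an official client: the `Span` struct, `encodeSpans`, `spansURL` and `sendSpans` are hypothetical names, the gate address `http://localhost:8080` is a placeholder, and the `\r\n` separator mirrors the example request.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// Span mirrors the fields of the example request; the type itself is hypothetical.
type Span struct {
	TraceID      string                 `json:"traceId"`
	SpanID       string                 `json:"spanId"`
	ParentSpanID string                 `json:"parentSpanId,omitempty"`
	Timeline     map[string]string      `json:"timeline"`
	Annotations  map[string]interface{} `json:"annotations"`
}

// encodeSpans writes one compact JSON object per span, separated by "\r\n",
// so the only line breaks in the body are the ones between objects.
func encodeSpans(spans []Span) ([]byte, error) {
	var buf bytes.Buffer
	for i, s := range spans {
		if i > 0 {
			buf.WriteString("\r\n")
		}
		b, err := json.Marshal(s)
		if err != nil {
			return nil, err
		}
		buf.Write(b)
	}
	return buf.Bytes(), nil
}

// spansURL builds the request URL for a gate base address and a system name.
func spansURL(gateURL, system string) string {
	return gateURL + "/spans?system=" + url.QueryEscape(system)
}

// sendSpans posts the encoded spans and treats any status other than 200 as an error.
func sendSpans(client *http.Client, gateURL, system string, spans []Span) error {
	body, err := encodeSpans(spans)
	if err != nil {
		return err
	}
	resp, err := client.Post(spansURL(gateURL, system), "application/x-ldjson", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("gate returned %s", resp.Status)
	}
	return nil
}

func main() {
	spans := []Span{{
		TraceID:     "c38efe4edb2d4a008af2805ee4e061c1",
		SpanID:      "8256",
		Timeline:    map[string]string{"sr": "2015-04-24T09:53:49.5595869Z", "ss": "2015-04-24T09:53:50.5595869Z"},
		Annotations: map[string]interface{}{"url": "/url", "host": "hostname", "targetId": "service-0"},
	}}
	body, _ := encodeSpans(spans)
	// Print the request instead of sending it, so the sketch runs without a gate.
	fmt.Printf("POST %s\n%s\n", spansURL("http://localhost:8080", "mysystem"), body)
}
```

In a real service, `sendSpans` would be called from a background worker rather than from the request path.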
Span format¶
Field, preferred format, description
- (required) TraceId, UUID, unique identifier to group multiple spans into one trace
- (required) SpanId, part of UUID, unique identifier of span within one trace
- (required) Annotations, object of arbitrary annotations
- (required) Timeline, object of timestamp annotations
- (optional) ParentSpanId, identifier of parent span
- (optional) ProfileId, UUID, unique identifier to group multiple traces
- (optional) System, string, unique identifier of distributed system
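The field list above might map onto a Go type like the one below, with a client-side check for the required fields before a span is sent. This is only a sketch: the `Span` type and `validate` function are hypothetical, the JSON names `traceId`, `spanId`, `parentSpanId`, `timeline` and `annotations` follow the example request, and the names `profileId` and `system` are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Span is a hypothetical Go representation of the span format.
type Span struct {
	TraceID      string                 `json:"traceId"`
	SpanID       string                 `json:"spanId"`
	ParentSpanID string                 `json:"parentSpanId,omitempty"`
	ProfileID    string                 `json:"profileId,omitempty"` // assumed JSON name
	System       string                 `json:"system,omitempty"`    // assumed JSON name
	Timeline     map[string]string      `json:"timeline"`
	Annotations  map[string]interface{} `json:"annotations"`
}

// validate reports the first required field that is missing from the span.
func validate(s Span) error {
	switch {
	case s.TraceID == "":
		return errors.New("traceId is required")
	case s.SpanID == "":
		return errors.New("spanId is required")
	case s.Annotations == nil:
		return errors.New("annotations is required")
	case s.Timeline == nil:
		return errors.New("timeline is required")
	}
	return nil
}

func main() {
	s := Span{
		TraceID:     "c38efe4edb2d4a008af2805ee4e061c1",
		SpanID:      "8256",
		Timeline:    map[string]string{"sr": "2015-04-24T09:53:49.5595869Z"},
		Annotations: map[string]interface{}{"url": "/url", "host": "hostname", "targetId": "service-0"},
	}
	fmt.Println("validation error:", validate(s))
	fmt.Println("validation error:", validate(Span{SpanID: "8256"}))
}
```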
Known annotations¶
The UI relies on the following annotations:
- (required) url string (/path)
- (optional) url_method string (POST, GET, DELETE, etc)
- (required) host string (hostname of target host, taken from URL)
- (required) targetId string (target service unique name)
- (optional) targetHost string (human-readable name of target host, overrides host annotation)
- (optional) srcId string (source service unique name)
- (optional) srcHost string (human-readable name of source host)
- (optional) rc int (response code)
- (optional) rqbl long (request body length)
- (optional) rsbl long (response body length)
- (optional) wrapper empty string (marks span as a wrapper of child spans to override targetid)
- (optional) root empty string (marks span as a root span)
- (optional) revision int (revision number used to overwrite old annotation values)
Timeline annotations¶
The DiTrace gate and UI use these timeline annotations to calculate trace and span durations. All timeline annotations should be in RFC 3339 date-time format.
- (optional) cs client has sent request
- (optional) cr client has received response
- (optional) sr server has received request
- (optional) ss server has sent response
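As an illustration of how a duration can be derived from two timeline annotations, the Go sketch below parses a pair of RFC 3339 timestamps and subtracts them. The `durationMillis` function is a hypothetical helper, and pairing `sr` with `ss` to get the server-side time is an assumption; how the gate and UI combine the four annotations is not specified here.

```go
package main

import (
	"fmt"
	"time"
)

// durationMillis parses two RFC 3339 timestamps (fractional seconds allowed)
// and returns the time from start to end in whole milliseconds.
func durationMillis(start, end string) (int64, error) {
	s, err := time.Parse(time.RFC3339Nano, start)
	if err != nil {
		return 0, err
	}
	e, err := time.Parse(time.RFC3339Nano, end)
	if err != nil {
		return 0, err
	}
	return int64(e.Sub(s) / time.Millisecond), nil
}

func main() {
	// Values taken from the example request: sr and ss are one second apart.
	ms, err := durationMillis("2015-04-24T09:53:49.5595869Z", "2015-04-24T09:53:50.5595869Z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("server-side duration, ms:", ms)
}
```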
Integration¶
Services in the distributed system need an implementation of a DiTrace gate API client.
The main requirement for clients is low load impact.
The following techniques help achieve low impact:
- Sampling
- Ring buffering
- Async sending
Sampling¶
There is no need to trace 100% of requests to get correct statistical results.
You can set sampling to 10% or even lower.
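One simple way to implement client-side sampling is a counter that keeps one trace out of every N. The Go sketch below shows that idea; `Sampler`, `NewSampler` and `Sample` are hypothetical names, and the sketch assumes the decision is made once, when a new trace is started.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Sampler keeps one trace out of every n. With n = 10 it keeps 10% of traces.
type Sampler struct {
	counter uint64 // kept first so it stays 64-bit aligned for atomic access
	n       uint64
}

// NewSampler returns a sampler that keeps one trace in n; n = 0 is treated as 1.
func NewSampler(n uint64) *Sampler {
	if n == 0 {
		n = 1
	}
	return &Sampler{n: n}
}

// Sample reports whether the next new trace should be recorded.
// It is safe to call from multiple goroutines.
func (s *Sampler) Sample() bool {
	return (atomic.AddUint64(&s.counter, 1)-1)%s.n == 0
}

func main() {
	sampler := NewSampler(10)
	kept := 0
	for i := 0; i < 1000; i++ {
		if sampler.Sample() {
			kept++
		}
	}
	fmt.Println("traces kept out of 1000:", kept)
}
```

A counter gives an exact ratio; a random draw per trace would work as well if an approximate ratio is acceptable.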
Ring buffering¶
The client should handle unavailability of the DiTrace gate. On the other hand, collecting tracing data should not consume too much memory. Using a ring buffer with a fixed size limit is a good way to achieve both.
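A minimal ring buffer along these lines is sketched below in Go. `RingBuffer`, `Push` and `Drain` are hypothetical names; spans are represented as already-encoded strings, and the sketch is not safe for concurrent use without an external lock.

```go
package main

import "fmt"

// RingBuffer holds at most len(items) entries. When it is full, the oldest
// entry is overwritten, so memory use stays bounded while the gate is down.
type RingBuffer struct {
	items []string
	start int // index of the oldest entry
	size  int // number of entries currently stored
}

// NewRingBuffer creates a buffer that keeps the most recent capacity entries.
func NewRingBuffer(capacity int) *RingBuffer {
	return &RingBuffer{items: make([]string, capacity)}
}

// Push adds an entry, dropping the oldest one if the buffer is full.
func (r *RingBuffer) Push(item string) {
	if len(r.items) == 0 {
		return
	}
	end := (r.start + r.size) % len(r.items)
	r.items[end] = item
	if r.size < len(r.items) {
		r.size++
	} else {
		r.start = (r.start + 1) % len(r.items)
	}
}

// Drain returns the buffered entries, oldest first, and empties the buffer.
func (r *RingBuffer) Drain() []string {
	out := make([]string, 0, r.size)
	for i := 0; i < r.size; i++ {
		out = append(out, r.items[(r.start+i)%len(r.items)])
	}
	r.start, r.size = 0, 0
	return out
}

func main() {
	buf := NewRingBuffer(3)
	for _, span := range []string{"span-1", "span-2", "span-3", "span-4"} {
		buf.Push(span)
	}
	// span-1 was overwritten because the buffer holds only three entries.
	fmt.Println(buf.Drain())
}
```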
Async sending¶
Tracing data should be sent to the gate asynchronously, e.g. in a separate thread.
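In Go, asynchronous sending can be sketched with a background goroutine fed by a buffered channel. `AsyncSender` and its methods are hypothetical names, and the `send` callback stands in for whatever actually posts data to the gate. Dropping data when the queue is full is a design choice made here to keep the caller from blocking.

```go
package main

import (
	"fmt"
	"sync"
)

// AsyncSender hands items to a background goroutine so that the code being
// traced never waits for the network.
type AsyncSender struct {
	queue chan string
	wg    sync.WaitGroup
}

// NewAsyncSender starts one background goroutine that passes queued items to send.
func NewAsyncSender(queueSize int, send func(string)) *AsyncSender {
	a := &AsyncSender{queue: make(chan string, queueSize)}
	a.wg.Add(1)
	go func() {
		defer a.wg.Done()
		for item := range a.queue {
			send(item)
		}
	}()
	return a
}

// Enqueue never blocks the caller. It returns false if the queue is full,
// in which case the item is dropped.
func (a *AsyncSender) Enqueue(item string) bool {
	select {
	case a.queue <- item:
		return true
	default:
		return false
	}
}

// Close stops accepting items and waits until the queued ones have been sent.
func (a *AsyncSender) Close() {
	close(a.queue)
	a.wg.Wait()
}

func main() {
	var sent []string
	sender := NewAsyncSender(16, func(item string) { sent = append(sent, item) })
	sender.Enqueue("span-1")
	sender.Enqueue("span-2")
	sender.Close() // after Close returns, the background goroutine has finished
	fmt.Println(sent)
}
```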
C# client¶
The C# client uses the “Logical Call Context” to flow tracing data.
How to use¶
Implement configuration provider
using Kontur.Tracing.Core.Config;

public interface IConfigurationProvider
{
    [NotNull]
    ITracingConfig GetConfig();
}
Init tracing with your configuration provider
using Kontur.Tracing.Core.Config;

Trace.Initialize(configProvider);
Create traces
using Kontur.Tracing.Core;

using (var rootContext = Trace.CreateRootContext("Processing client request"))
{
    rootContext.RecordTimepoint(Timepoint.Start);
    rootContext.RecordAnnotation(Annotation.RequestUrl, requestUrl);

    // ... somewhere deep in your code
    using (var childContext = Trace.CreateChildContext("Fetching data from database"))
    {
        childContext.RecordTimepoint(Timepoint.Start);
        data = db.Fetch();
        childContext.RecordTimepoint(Timepoint.Finish);
    }

    // ...
    rootContext.RecordTimepoint(Timepoint.Finish);
}
Continue traces
Assume that service A calls service B, so service B should continue the trace.
using Kontur.Tracing.Core;

HttpListenerContext context;
RequestExtensions.ExtractFromHttpHeaders(context.Request.Headers, out traceId, out contextId, out isActive);

using (var serverContext = Trace.ContinueContext(traceId, contextId, isActive ?? false, isRoot: false))
{
    serverContext.RecordTimepoint(Timepoint.ServerReceive);

    // ...
    // Handle service A request
    // ...

    serverContext.RecordTimepoint(Timepoint.ServerSend);
}