Linking Charts Demo – Connecting TeeChart for NET to data binding sources

This demo shows how TeeChart’s Charts can be made to work with each other on a Form in a Winform application.

The demo uses the basis of the company data of a small technical products supplier; the company sells electronic based products, small computers, tablets and mobile phones and has markets in some countries around the globe.

Demo code project, written with Visual Studio 2015, can be downloaded here: https://www.steema.com/files/public/teechart/net/LinkingCharts_TChartNET.zip. Version on Github here: https://github.com/Steema/TeeChart-for-.NET-CSharp-WinForms-samples/tree/master/LinkingCharts

Create the project

We create a new Winforms project, and rename the main form to ReportForm.
1. Add objects to Form. We add three charts, a dataGridView and a textbox to the form and position them to take up the full space of the form (see fig. 1).


Fig.1 Designtime in Visual Studio

2. Using the Chart Editor, we have added a dataseries to the charts plus a smoothing function that you can see populated with random data at this stage. Data series can be added at designtime or at runtime.
3. We’ve placed the company sales data in an MS Access file called TechProducts Ltd and placed the file in the LinkingCharts solution folder.

Add the data to the project

4. We’ve added the database to the project, publicising tables and queries (views).
5. Using BindingSource components we connect to the Access database. The database already has prepared queries for some of the data we wish to visualise. For other cases, during runtime execution, we will write some sql queries in the code.

Three BindingSources are added, connecting to the $_€_SalesByYear, $_SalesByCountry_2015 and $_SalesByContinent views.

Connect the Charts to the data

6. To connect to a bindingSource, open up the Chart Editor that you wish to modify and select the Series that has previously been added and go to its Datasource tab. If you click on the Dataset tab you can select the bindingSource to which to connect that Series. In the following selection boxes, you can relate the different source fields to the X and Y series valuelists. See Fig. 2.


Fig.2 Series datasource editor

7. Repeat the connection technique for the three charts. We have connected the left Bar Series Chart to the SalesByContinent data, the right, Pie chart, to the SalesByCountry_2015 data and the lower line chart to the SaleByYear data.

Linking the Charts

8. We now have three charts, data by continent, region (countries with sales, within a continent) and country. Our aim in this demo is to link the three chart so that, by actioning a click on a bar of the continent chart, we can fill the region chart and that by actioning a click on the region chart we can show the sales evolution, over the last few years, of the selected country. The dataGridView will update to show the contents of the region chart. Note that the demo will open showing all countries in the region chart. The textbox updates to show ‘simulation’ commentary by a “robot journalist”.

9. To link the different elements of the project page we’ll set up some chart events.

ClickSeries event
We are using the ClickSeries event to take the value from the clicked chart as argument in a query to populate the next, related chart.
We can add the event definition at designtime by selecting the chart from which we wish to action and selecting the events tab of Visual Studio’s Properties window and, in this case, selecting the click event from the list for the TChart. That creates the event method in the Form’s code page. Here we’ll need to add some content so relate the Series click with where we wish to go.
We have chosen to partially manually code the query for clarity. The method follows these steps:
Take the original query for the country/continent chart as the basis, modify it to collect data for all available years.

Modify region query:
@contiparam is the continent that you wish to view, sourced from the label of the Continents Chart value. One could use the index value or some other variable if that were to make a better index for the query.

string sqlStr = "SELECT Sum(Fact_Invoices.Invoice_Value) AS SumOfInvoice_Value, Lookup_Country.Country_name, Lookup_Continent.continent_name";
 sqlStr += " FROM((Fact_Invoices INNER JOIN Lookup_Customers ON Fact_Invoices.Cod_Customer = Lookup_Customers.Cod_Customer)";
 sqlStr += " INNER JOIN Lookup_Country ON Lookup_Customers.Country_code_A2 = Lookup_Country.Country_code_A2)";
 sqlStr += " INNER JOIN Lookup_Continent ON Lookup_Country.Continent = Lookup_Continent.continent";
 sqlStr += " where Lookup_Continent.continent_name = @contiParam GROUP BY Lookup_Country.Country_name, Lookup_Continent.continent_name";

DataSet customers = new DataSet();

System.Data.OleDb.OleDbDataAdapter myAdapter = new System.Data.OleDb.OleDbDataAdapter(sqlStr, this.___SalesByCountry_2015TableAdapter.Connection);

Modify country query:
We take the label of the country that we wish to review, as the argument for the country-over-time query.

string sqlStr = "SELECT Sum(Fact_Invoices.Invoice_Value)AS SumOfInvoice_Value, Fact_Invoices.Invoice_year AS AYear, Lookup_Country.Country_name";
sqlStr += " FROM(Lookup_Customers INNER JOIN Fact_Invoices ON Lookup_Customers.Cod_Customer = Fact_Invoices.Cod_Customer)";
sqlStr += " INNER JOIN Lookup_Country ON Lookup_Customers.Country_code_A2 = Lookup_Country.Country_code_A2";
sqlStr += " WHERE Lookup_Country.Country_name = @countryParam";
sqlStr += " GROUP BY Fact_Invoices.Invoice_year, Lookup_Country.Country_name";

Those queries are enough to link the charts. We have taken a click in the same way from the GridView. Taking just one of them as an example, for the region chart, you can see how a bindingSource is created for the recipient chart to ‘plug-into’.

Creating and using a new bindingSource with the sql query as source for Grid and Chart.

DataSet customers = new DataSet();

System.Data.OleDb.OleDbDataAdapter myAdapter = new System.Data.OleDb.OleDbDataAdapter(sqlStr, this.___SalesByCountry_2015TableAdapter.Connection);

//apply region code
myAdapter.SelectCommand.Parameters.Clear();
myAdapter.SelectCommand.Parameters.Insert(0, new System.Data.OleDb.OleDbParameter("contiParam", region));
myAdapter.Fill(customers, "Customers");

bindEurope = new BindingSource();
bindEurope.DataSource = customers;
bindEurope.DataMember = customers.Tables[0].ToString();

dataGridView1.DataSource = bindEurope;
dataGridView1.Refresh();

dataGridView1.Columns[0].HeaderText = "Sales Income";
dataGridView1.Columns[1].HeaderText = "Country";
dataGridView1.Columns[2].HeaderText = "Continent";

tChart3[0].DataSource = bindEurope;
tChart3[0].YValues.DataMember = "SumOfInvoice_Value";
tChart3[0].LabelMember = "Country_name";

tChart3[0].CheckDataSource();

The Chart Series’ CheckDatasource() method is used by TeeChart to refresh the Chart data.

Application output

The screenshots in Fig 3 & 4 show the application at runtime.


Fig. 3. Application startup


Fig. 4. Europe & Switzerland selected.

Big files: XML or JSON ? TeeBI !!

TeeBI Dashboards

 

Introduction

XML and JSON are very typical text formats used to store data, designed to be more comfortable than plain old “csv” text and allowing hierarchical (parent -> child) relationships.

However, even if there are many wonderful standard libraries to process them, there is still a speed problem when loading big quantities of data (say, hundreds or thousands of megabytes).

Content has to be parsed and stored into memory structures representing the “tree” nature of data nodes and attributes, which can be very fast (milliseconds) for small files, but terribly slow (minutes !) for big ones.

TeeBI core base class (TDataItem) is an “agnostic” memory structure providing parent -> child connections, using simple arrays to store data (one array per field or column).

Pseudo-code:


TDataItem = class
Name : String;
Items : Array of TDataItem;   // <--- Children
Kind : TDataKind;  // <-- Integer, String, Float, DateTime, Boolean, or "List of TDataItem"
Data : Array of ...    //  <--  one array for each Kind: "Int32Data : Array of Int32"
end

 

With a TDataItem, loading and saving big quantities of data is insanely fast (0.2 seconds to load or save 1 million rows with 4 columns on a normal PC).

The arrays are saved / loaded to / from Streams directly in one Write / Read operation.

That means we can import data from XML or JSON (or any other format like database datasets, server connections, Excel, etc, etc) into a TDataItem and then save it to a binary TeeBI file for later reuse.


Data := TDataItemPersistence.Load( 'my data.bi ')

 

Once a TDataItem is created or loaded, we can use it in many ways:

  • Search and modify data, re-structure columns
  • Sort data by multiple fields, and by custom expressions
  • Run ultra-fast SQL-like queries and summaries against TDataItems
  • Set master -> detail relationships between different TDataItems
  • Filter rows by code or using expressions (as strings or as TExpression classes)
  • Create calculated columns (using code or expressions)
  • Merge several TDataItems
  • Compare the structure and / or full data of TDataItems to obtain difference sets
  • Present TDataItems using Grids, Charts, Dashboards and PDF output
  • Connect TDataItems to a super-fast TBIDataset (a normal memory TDataset class)
  • Export to any other format (for example XML to JSON and vice-versa)
  • Access remote TDataItems from web servers transparently
  • Apply machine-learning algorithms using R or Python Scikit-learn
  • Access basic statistics of any TDataItem or child item

 

Note to TeeChart developers:

TeeBI includes a new TBIChart control (derived from TChart) that is capable of automatically creating new chart series and fill them from any TDataItem.

BIChart1.Data := MyDataItem;

A planned new feature is to integrate the Data property editor dialog inside the normal TeeChart editor, for design-time support (zero code charting !)

 

TeeBI library is available for download at the following link:

https://github.com/Steema/BI

Supported development environments:

  • Embarcadero Studio (Delphi, C++) from XE4 version and up
  • Lazarus FreePascal
  • …and soon for Microsoft Visual Studio .NET

Several 3rd party products can be optionally used with TeeBI:

https://github.com/Steema/BI/wiki/3rd-party-supported-products

 

For more information:

Please visit the TeeBI community at Google+ and the TeeBI home website for more information and technical details.

 

 

TeeBI Web Server returning a summary chart

TeeBI teaser

Small teaser of TeeBI.

This is the custom web server calculating a query and returning a chart. The only dependencies are TeeChart and Indy http web server component.

TeeBI Web Server returning a summary chart

 

Data is in column-oriented, memory-based custom format for speed reasons, no database or FireDac, etc are necessary. Data is imported from many sources (excel, text files, firedac or any other datasets, etc) to the custom format, which is then transparent to the user apps.

The web server is optional if data is not remote, and it can also return binary streams to normal desktop (vcl or fmx), or mobile apps (fmx).

A table of 2000×5 cells takes 180msec via web. For visual display, an fmx or vcl dbgrid easily handles 1 or more million rows with a custom “TBIDataset” linked to an in-memory data, allowing grouping, filtering and sorting using complex expressions involving columns from the same table or foreign-key(s) fields.

This speed is the basis for a next-coming set of machine-learning and data-mining algorithms, currently under development.

 

Clustering visualization

TeeChart Pro includes classes and components to perform “clustering” on your data, and optionally visualize the results using a chart “Tool” component.

Clustering is the process of grouping data automatically, according to how well related are the individual items.

As an unsupervised algorithm, its widely used in data-mining / machine-learning / B.I. (Business Intelligence) applications.

 

teechart_clustering

For more information on clustering visit the following Wikipedia link: http://en.wikipedia.org/wiki/Cluster_analysis

An executable example:

http://www.steema.us/files/public/teechart/vcl/demos/clustering/TeeChart_Clustering.zip

The clustering algorithm can be processed on custom data, not necessarily on TeeChart “Series” data.

 

Classes and Units

The TeeClustering.pas unit, for both VCL and Firemonkey, contains abstract “engine” classes that perform the clustering algorithms.

Three different clustering methods are provided:

• TKMeansClustering

• THierarchicalClustering

• TQTClustering (Quality Threshold)

These classes derive from a common abstract class: TBaseClustering.

Each clustering method has its own properties that determine how will the clusters be calculated. After calculating, you can access the Clusters property, which is a TList of TCluster objects.

A TCluster contains child clusters (Items[ ]), so you can check which input data items belong to which cluster, or in the case of the Hierarchical type, access the tree structure (clusters and sub-clusters).

The input data (your data) is not contained by the above classes.

Data is passed to the clustering engine through a “provider” class. There is currently one kind of data provider (TSeriesProvider) to cluster XY or XYZ Series points.

This class is implemented in the TeeClusteringTool.pas unit, together with a charting Tool class (TClusteringTool) to make things easier and more automatic.

 

Basic Example

Example runtime code (it can be done at design-time too, without coding) :

uses TeeClusteringTool;

var tool : TClusteringTool;

tool:=TClusteringTool.Create(Self);

tool.ParentChart:=Chart1;

tool.Series:=Series1; // your series

tool.Method:=cmKMeans;

tool.KMeans.NumClusters:=5;

tool.Execute;

 

After execution, you can loop on the resulting output clusters, for example:

var t : Integer;

for t:=0 to tool.Clusters.Count-1 do
Memo1.Lines.Add( ‘Cluster: ‘+IntToStr(t)+’ contains:  ‘+
IntToStr(tool.Clusters[t].Count)+’ points’ );

 

 

TClusteringTool

This tool automatically performs clustering using the choosen method and parameters, and optionally paints each source series point with a different color indicating which cluster they belong to, and/or draws polygons around each group of cluster’s items, among other things.

Properties:

ClusteringTool1.Method := cmHierarchical;

ClusteringTool1.ColorEach := True; // paint Series with one color per cluster

ClusteringTool1.ShowBounds := True; // draws convex polygons bounding each cluster points

ClusteringTool1.Centers.Visible := True; // shows cluster centers

ClusteringTool1.Centroids.Visible := True; // shows cluster centroids

Other properties include Brush, Pen and Transparency, used when drawing cluster polygon boundaries.

Methods:

Several helper methods are provided:

// Obtain cluster’s center and centroid XY points in Series scales:

var P : TPointFloat;

P:=ClusteringTool1.GetClusterCenter( ClusteringTool1.Clusters[3] );

P:=ClusteringTool1.GetClusterCentroid( ClusteringTool1.Clusters[2] );

// Obtain an array of XY points (in screen pixel coordinates), that belong to cluster:

var PP : TPointArray;

ClusteringTool1.GetClusterPoints( ClusteringTool1.Clusters[4], PP);

PP:=nil;

// Get cluster statistics:

var S : TClusterStats;

S:=ClusteringTool1.GetStats( ClusteringTool1.Clusters[0] );

 

Calculation parameters

Each clustering algorithm needs different parameters:

K-Means:

ClusteringTool1.KMeans.NumClusters := 10; // Number of minimum clusters (“K”)

ClusteringTool1.KMeans.MaxIterations := 1000; // Maximum number of iterations before stopping

Hierarchical:

ClusteringTool1.Hierarchical.NumClusters := 8; // Number of tree root clusters

QT:

ClusteringTool1.QTClustering.MinCount := 30; // Minimum number of points to form a cluster

ClusteringTool1.QTClustering.MaxDiameter := 100; // Maximum “diameter” a cluster can grow

 

Common parameters:

Distance

Cluster calculation is based on the “distance” between a data item and the other data items. There are several ways to calculate the “distance” between items.

The algorithms are agnostic, they call the Provider (ie: Series provider) to obtain the distances.

For example, on a XY scatter plot, the distance between points can be for example the hypotenuse (Pythagoras’ theorem), that is, the simple Euclidean distance between a point XY and another XY.

Distance calculations implemented:

dmEuclidean
dmSquaredEuclidean
dmManhattan
dmMinkowski
dmSorensen
dmChebyshev

 

Example:

ClusteringTool1.Distance := dmMinkowski;

ClusteringTool1.MinkowskiLambda := 4;

 

Linkage

There are several ways to calculate the “distance” between clusters when one or the two clusters have more than one item.

This is called “linkage”.

The most simple way is using each cluster “center” (this means no linkage occurs).

Other linkage styles implemented:

lmSingle

Also called “minimum”.

The distance between cluster A and B is the minimum distance between all items in cluster A and all items in cluster B.

lmComplete

Also called “maximum”.

The distance between cluster A and B is the maximum distance between all items in cluster A and all items in cluster B.

lmAverage

The distance between cluster A and B is the average distance between all items in cluster A and all items in cluster B.

lmWard

The result is the increase on “error sum of squares” when adding cluster B items to cluster A.

 

Calculation speed

Clustering is a slow process by nature. Each clustering method has different performance bottlenecks, proportional to the number of input data items.

The TeeClustering.pas unit has been greatly fine-tuned to optimize the speed of each algorithm, although much work is needed to find more advanced techniques that require less CPU cycles.

The QT Threshold algorithm benefits of parallelism, when multiple CPUs can be used together.

Speed examples (revisited):

(Time in milliseconds, Windows 8.1 x64 on Intel i7 4770 @ 3.4Ghz)

IDE XE8 Delphi, Win32, 5000 data points

Algorithm      Single CPU   Multiple CPU

K-Means              47                     31
Hierarchical    4328                4156
QT                     2859                  703

Notes:

x64 bit executables are a little bit faster than 32bit.

Speed is also very dependant on the “distance” calculation method that is used to compare data.  The default Euclidean calculation has a quite big CPU cost as it calculates the Hypotenuse between two data XY value pairs.