XML and JSON are widely used text formats for storing data, designed to be more comfortable than plain old “csv” text and to allow hierarchical (parent -> child) relationships.
However, even though there are many wonderful standard libraries to process them, there is still a speed problem when loading large amounts of data (say, hundreds or thousands of megabytes).
Content has to be parsed and stored in memory structures that represent the “tree” nature of data nodes and attributes. This can be very fast (milliseconds) for small files, but terribly slow (minutes!) for big ones.
The core TeeBI base class (TDataItem) is an “agnostic” memory structure that provides parent -> child connections and uses simple arrays to store data (one array per field or column).
TDataItem = class
  Name  : String;
  Items : Array of TDataItem;  // <--- Children
  Kind  : TDataKind;           // <--- Integer, String, Float, DateTime, Boolean, or "List of TDataItem"
  Data  : Array of ...         // <--- one array for each Kind, e.g. "Int32Data : Array of Int32"
end;
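To make the "one array per column" idea concrete, here is a minimal self-contained sketch. It is not the real TDataItem: TSketchItem, TSketchKind and every field name below are hypothetical stand-ins, reduced to two data kinds, and it only assumes a Delphi or Free Pascal compiler.

```pascal
program ColumnTableSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical, simplified stand-in for TDataKind
  TSketchKind = (skInteger, skText, skList);

  // Hypothetical, simplified stand-in for TDataItem
  TSketchItem = class
  public
    Name: String;
    Kind: TSketchKind;
    Items: array of TSketchItem;  // children, e.g. the columns of a table
    IntData: array of Integer;    // used when Kind = skInteger
    TextData: array of String;    // used when Kind = skText
    constructor Create(const AName: String; AKind: TSketchKind);
    destructor Destroy; override;
    function Add(Child: TSketchItem): TSketchItem;
  end;

constructor TSketchItem.Create(const AName: String; AKind: TSketchKind);
begin
  inherited Create;
  Name := AName;
  Kind := AKind;
end;

destructor TSketchItem.Destroy;
var
  i: Integer;
begin
  // A parent owns its children and frees them
  for i := 0 to High(Items) do
    Items[i].Free;
  inherited;
end;

function TSketchItem.Add(Child: TSketchItem): TSketchItem;
begin
  SetLength(Items, Length(Items) + 1);
  Items[High(Items)] := Child;
  Result := Child;
end;

var
  Table, Names, Ages: TSketchItem;
  Row: Integer;
begin
  Table := TSketchItem.Create('Customers', skList);
  try
    Names := Table.Add(TSketchItem.Create('Name', skText));
    Ages := Table.Add(TSketchItem.Create('Age', skInteger));

    // Each column keeps all of its values in one contiguous array
    SetLength(Names.TextData, 2);
    SetLength(Ages.IntData, 2);
    Names.TextData[0] := 'Ann';  Ages.IntData[0] := 34;
    Names.TextData[1] := 'Bob';  Ages.IntData[1] := 27;

    // A "row" is simply the same index taken from every column
    for Row := 0 to High(Ages.IntData) do
      WriteLn(Names.TextData[Row], ': ', Ages.IntData[Row]);
  finally
    Table.Free;
  end;
end.
```

Keeping each column in its own array means a scan over one field touches only that field's memory, which is one common reason column-oriented layouts are fast for sorting and summaries.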
With a TDataItem, loading and saving big quantities of data is insanely fast (0.2 seconds to load or save 1 million rows with 4 columns on a normal PC).
The arrays are saved / loaded to / from Streams directly in one Write / Read operation.
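As an illustration of why this is fast, the sketch below writes a whole integer column to a stream with a single WriteBuffer call and reads it back with a single ReadBuffer call. It is an assumption-laden example, not TeeBI's actual file format: TIntegerColumn, SaveColumn and LoadColumn are hypothetical names, and the "length prefix followed by raw values" layout is invented here for demonstration.

```pascal
program ColumnStreamSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  {$IFDEF FPC}Classes{$ELSE}System.Classes{$ENDIF};

type
  TIntegerColumn = array of Integer;  // hypothetical column type

// Writes the element count, then all values in one block write
procedure SaveColumn(const Values: TIntegerColumn; Stream: TStream);
var
  Count: Integer;
begin
  Count := Length(Values);
  Stream.WriteBuffer(Count, SizeOf(Count));
  if Count > 0 then
    Stream.WriteBuffer(Values[0], Count * SizeOf(Integer));
end;

// Reads the element count, then all values in one block read
function LoadColumn(Stream: TStream): TIntegerColumn;
var
  Count: Integer;
begin
  Stream.ReadBuffer(Count, SizeOf(Count));
  SetLength(Result, Count);
  if Count > 0 then
    Stream.ReadBuffer(Result[0], Count * SizeOf(Integer));
end;

var
  Source, Loaded: TIntegerColumn;
  Stream: TMemoryStream;
  i: Integer;
begin
  SetLength(Source, 1000000);
  for i := 0 to High(Source) do
    Source[i] := i * 2;

  Stream := TMemoryStream.Create;
  try
    SaveColumn(Source, Stream);
    Stream.Position := 0;  // rewind before reading back
    Loaded := LoadColumn(Stream);
  finally
    Stream.Free;
  end;

  WriteLn('Rows loaded: ', Length(Loaded));       // Rows loaded: 1000000
  WriteLn('Last value: ', Loaded[High(Loaded)]);  // Last value: 1999998
end.
```

This block copy only works for fixed-size value types such as integers or floats. String columns hold references to managed memory, so they would need a per-element encoding (for example length plus characters) rather than one raw write.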
That means we can import data from XML or JSON (or any other format such as database datasets, server connections, Excel, etc.) into a TDataItem and then save it to a binary TeeBI file for later reuse.
Data := TDataItemPersistence.Load('my data.bi');
Once a TDataItem is created or loaded, we can use it in many ways:
- Search and modify data, re-structure columns
- Sort data by multiple fields, and by custom expressions
- Run ultra-fast SQL-like queries and summaries against TDataItems
- Set master -> detail relationships between different TDataItems
- Filter rows by code or using expressions (as strings or as TExpression classes)
- Create calculated columns (using code or expressions)
- Merge several TDataItems
- Compare the structure and / or full data of TDataItems to obtain difference sets
- Present TDataItems using Grids, Charts, Dashboards and PDF output
- Connect TDataItems to a super-fast TBIDataset (a normal memory TDataset class)
- Export to any other format (for example XML to JSON and vice-versa)
- Access remote TDataItems from web servers transparently
- Apply machine-learning algorithms using R or Python Scikit-learn
- Access basic statistics of any TDataItem or child item
Note to TeeChart developers:
TeeBI includes a new TBIChart control (derived from TChart) that can automatically create new chart series and fill them from any TDataItem.
BIChart1.Data := MyDataItem;
A planned new feature is to integrate the Data property editor dialog into the normal TeeChart editor for design-time support (zero-code charting!).
TeeBI library is available for download at the following link:
Supported development environments:
- Embarcadero RAD Studio (Delphi, C++), XE4 and later
- Lazarus FreePascal
- …and soon for Microsoft Visual Studio .NET
Several 3rd party products can be optionally used with TeeBI:
For more information:
Please visit the TeeBI community at Google+ and the TeeBI home website for technical details.
4 thoughts to “Big files: XML or JSON ? TeeBI !!”
There are well-known binary formats like EBML (fast enough for FullHD video streaming) and XML/Binary.
While I am confident you made it “insanely fast”, isn’t it just yet another ad hoc binary format that cannot be compared to XML/JSON, precisely because it is “yet another ad hoc” one? It is not even YAML yet.
You could achieve amazing speed even with JSON.
Of course, you need to step away from the Delphi RTL JSON Library, which is slow and not able to handle huge JSON content.
For instance, the JsonDataObjects or mORMot libraries achieve high throughput.
Using mORMot unserialization, reading 185 MB of JSON into a dynamic array of records takes 1.44 s, i.e. 143,250 rows/s, and consumes only 113.5 MB of RAM once loaded.
The DOM is not mandatory: once you use a SAX approach, you get amazing results.
See also http://blog.synopse.info/post/2011/06/02/Fast-JSON-parsing
Yep, the Delphi RTL is not especially fast.
The TeeBI JSON class works by using an “engine” class as a plugin that specifies which library to use.
uses BI.Data.JSON, BI.Data.JSON.mORMot;
var tmp : TBIJSON;
I’ve done the plugins for the standard RTL JSON, SuperObject and FPC.
I’ve just started another plugin for mORMot, but I’m stuck on some simple things. Maybe you can point me to some demo code to learn how to do it.
I’ve uploaded the small units here:
The one pending is BI.Data.JSON.mORMot.pas
There already is an open binary format for storing R and Python data frames, Apache Feather:
Why reinvent the wheel with a proprietary format rather than using an open one that facilitates data interchange?