The archiver module. Provides functions for archiving messages and values to the file system.
License:
GPL
Introduction
The module is designed for archiving messages and values of OpenSCADA on the file system.
Any SCADA system provides the ability to archive the collected data, i.e. to build a history of the changes (dynamics) of processes. Archives can be roughly divided into two types: message archives and value archives.
A feature of message archives is that they archive so-called events. The characteristic feature of an event is its time of occurrence. Message archives are usually used for archiving messages in the system, i.e. for keeping logs and reports. Depending on the source, messages can be classified according to different criteria: for example, reports of emergency situations, reports of operator actions, reports of connection failures and others.
A feature of value archives is their periodicity, measured as the time interval between two adjacent values. Value archives are used for archiving the history of continuous processes. Since a process is continuous, it can only be archived by introducing time quantization of polling; otherwise, given the continuous nature of the process, the archives would be of infinite size. In addition, in practice the rate at which values can be obtained is limited by the data sources. For example, fairly high-quality industrial data sources rarely allow data to be received at a frequency above 1 kHz, and that is without taking into account the sensors themselves, which have even lower quality characteristics.
For keeping archives, OpenSCADA provides the «Archives» subsystem. According to the types of archives, this subsystem consists of two parts: message archives and value archives. The subsystem as a whole is modular, which allows archives to be created based on different kinds and methods of data storage. This module provides a file system archiving mechanism for both the message flow and the value flow.
1. Message Archiver
Message archives are formed by archivers. There can be many archivers with individual settings, which allows archiving of different classes of messages to be separated.
The message archiver of this module can store data in XML files or in flat-text format. XML is a standard markup language that is easily understood by many external applications. However, opening and reviewing files in this format requires considerable resources. The flat-text format, on the other hand, requires far fewer resources, although it is not standardized and requires knowledge of its structure to work with.
In any case, both formats are supported, and the user can select either of them according to their requirements.
Archive files are named by archivers based on the date of the first message in the archive, for example: "2006-06-21 17:11:04.msg".
Archive files can be limited in size and in time span. When a limit is exceeded, a new file is created. The maximum number of files in the archiver's directory can also be restricted; once this limit is exceeded, the oldest files are deleted!
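The rotation rules above can be sketched as follows. This is a minimal illustration under assumptions, not the module's code: the `ArchiveFile` record and both helper functions are hypothetical, times are in seconds, and a limit of zero disables the corresponding check, as with the settings described below.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ArchiveFile:
    # Hypothetical record describing one archive file.
    name: str
    begin: int    # time of the first message, seconds
    end: int      # time of the last message, seconds
    size_kb: int  # current size, kilobytes

def need_new_file(cur: ArchiveFile, max_size_kb: int, max_days: int) -> bool:
    """True when the current file reaches the size or time-span limit (0 = no limit)."""
    if max_size_kb and cur.size_kb >= max_size_kb:
        return True
    if max_days and cur.end - cur.begin >= max_days * 86400:
        return True
    return False

def files_to_delete(files: List[ArchiveFile], max_files: int) -> List[ArchiveFile]:
    """Oldest files beyond the file-count limit (0 = no limit)."""
    if not max_files or len(files) <= max_files:
        return []
    oldest_first = sorted(files, key=lambda f: f.begin)
    return oldest_first[:len(files) - max_files]
```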
To save disk space, archivers support packing old archives with gzip. Packing is performed after the archive has not been used for a long time.
When archives are kept in XML, the corresponding files are loaded into memory entirely! To unload archives that have not been used for a long time, an access timeout is applied: once it expires, the archive is unloaded from memory and then packed.
The module provides additional settings for the archiving process (Fig. 1).
Fig.1. Additional settings of the message archiving process.
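A minimal sketch of the packing step, assuming the caller tracks the last access time of a file; `pack_if_idle` is a hypothetical helper and the `.gz` suffix is an assumption. Unloading an XML archive from memory is not shown.

```python
import gzip
import os
import shutil
import time
from typing import Optional

def pack_if_idle(path: str, last_access: float, timeout_min: int,
                 now: Optional[float] = None) -> Optional[str]:
    """Pack `path` into `path + '.gz'` once it has been idle for at least
    `timeout_min` minutes; a timeout of 0 disables packing.
    Returns the packed file name, or None if nothing was done."""
    if timeout_min <= 0:
        return None
    now = time.time() if now is None else now
    if now - last_access < timeout_min * 60:
        return None                      # still in use recently
    packed = path + ".gz"
    with open(path, "rb") as src, gzip.open(packed, "wb") as dst:
        shutil.copyfileobj(src, dst)     # stream the file through gzip
    os.remove(path)                      # keep only the packed copy
    return packed
```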
Those parameters include:
Files of the archive in XML — Enables archiving of messages into XML files rather than plain text. Archiving in XML format requires more RAM, because the file has to be loaded entirely, parsed as XML and kept in memory while in use.
Maximum size of archive's file, by kilobytes — Sets the size limit of a single archive file. Set to zero to disable the restriction.
Maximum number of the files — Limits the maximum number of archive files; together with the size of a single file, it determines the size of the archive on disk. Set to zero to remove this restriction completely.
Time size of archive's file, by days — Sets the time span covered by a single archive file.
Timeout to pack files of the archive, by minutes — Sets the time after which, in the absence of requests, the archive file is packed with gzip. Set to zero to disable gzip packing.
Period of the archives checking, by minutes — Sets how often the archives are checked for files appearing in or disappearing from the archive directory, as well as for exceeded limits and removal of old files.
Use an info file for the packed archives — Specifies whether to create a file with information about archive files packed with gzip. When archive files are copied to another station, this info file speeds up the first start on the target station, eliminating the need to decompress the files with gzip to obtain this information.
Prevent for duplicates — Enables checking for duplicate messages when a message is put into the archive. If a duplicate is found, the message is not written to the archive. This somewhat increases the time of writing to the archive, but when messages from external sources are placed into the archive with past times, it eliminates duplication.
Mean as duplicates and prevent its for equal time, category, level — Enables checking for duplicate messages when a message is put into the archive. Messages with equal time, category and level are treated as duplicates. If a duplicate is found, the new message replaces the old one in the archive. This is mostly useful for messages whose text changes over time, for example the state of an alarm.
Check now for the directory of the archivator — A command that immediately starts checking the archives, for example after manual changes in the archiver's directory.
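The two duplicate options can be illustrated with an in-memory list of messages. This is a sketch under assumptions: the `Message` record and `put_message` are hypothetical, "equal time" is taken to include microseconds, and a full duplicate is assumed to mean equal time, level, category and text. A real archiver would not scan the whole archive linearly.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Message:
    tm: int    # seconds, UTC
    tmu: int   # microseconds
    lev: int
    cat: str
    text: str

def put_message(archive: List[Message], msg: Message,
                prevent_duplicates: bool, replace_same_key: bool) -> None:
    """Put msg into the archive applying the two duplicate options.

    prevent_duplicates -- "Prevent for duplicates": skip a fully identical message.
    replace_same_key   -- "Mean as duplicates ... equal time, category, level":
                          the new message replaces the old one with the same key.
    """
    key = (msg.tm, msg.tmu, msg.cat, msg.lev)
    for i, old in enumerate(archive):
        if (old.tm, old.tmu, old.cat, old.lev) != key:
            continue
        if replace_same_key:
            archive[i] = msg     # e.g. an alarm whose text changed
            return
        if prevent_duplicates and old.text == msg.text:
            return               # exact duplicate: not written
    archive.append(msg)
```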
To control the archiver's files, use the "Files" tab (Fig. 2).
Fig.2. The "Files" tab of the message archiver object.
1.1. Message archive file format
The syntax of the XML-based archive file is as follows (tag — description; attributes; contents):
FSArch — The root element; identifies the file as belonging to the module. Attributes: Version — version of the archive file; Begin — start time of the archive (hex, UTC seconds since 01.01.1970); End — end time of the archive (hex, UTC seconds since 01.01.1970). Contains: m elements.
m — A single message. Attributes: tm — creation time of the message (hex, UTC seconds since 01.01.1970); tmu — microseconds of the message time; lv — message level; cat — message category. Contains: the message text.
A flat-text archive file consists of:
header in the format: "FSArch {vers} {charset} {beg_tm} {end_tm}"
Where:
vers — version of the archiving module;
charset — code page of the file (usually UTF8);
beg_tm — start time of the archive, UTC seconds since 01.01.1970, in hexadecimal form;
end_tm — end time of the archive, UTC seconds since 01.01.1970, in hexadecimal form.
records of the messages in the format: "{tm} {lev} {cat} {mess}"
Where:
tm — message time in format "{utc_sec}:{usec}", where:
utc_sec — UTC time from 01.01.1970, in hexadecimal form;
usec — microseconds of time, in decimal form.
lev — the level of importance of the message;
cat — category of the message;
mess — text of the message.
The message text and its category are encoded so that they contain no separator symbols (space characters).
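The header and record formats above can be sketched as follows. The text does not specify how the category and the message text are encoded, so percent-encoding via Python's `urllib.parse.quote` is used here purely as an assumption; the level is assumed to be written in decimal. All function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote, unquote

def format_header(vers: str, charset: str, beg: int, end: int) -> str:
    # "FSArch {vers} {charset} {beg_tm} {end_tm}", times as hex UTC seconds
    return f"FSArch {vers} {charset} {beg:x} {end:x}"

def format_record(sec: int, usec: int, lev: int, cat: str, text: str) -> str:
    # "{tm} {lev} {cat} {mess}" with tm = "{utc_sec hex}:{usec decimal}";
    # quote() replaces spaces, so the fields never contain the separator
    return f"{sec:x}:{usec} {lev} {quote(cat)} {quote(text)}"

def parse_record(line: str) -> Tuple[int, int, int, str, str]:
    tm, lev, cat, mess = line.rstrip("\n").split(" ", 3)
    sec_hex, usec = tm.split(":")
    return int(sec_hex, 16), int(usec), int(lev), unquote(cat), unquote(mess)
```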
1.2. Example of a message archive file
Example of the contents of an archive file in XML format:
<?xml version='1.0' encoding='UTF-8' ?>
<FSArch Version="1.3.0" Begin="4a27dfbc" End="4a28c990">
<m tm="4a28cd01" tmu="942937" lv="4" cat="/DemoStation/sub_DAQ/mod_DiamondBoards/">dscInit error.</m>
<m tm="4a28cd12" tmu="466631" lv="4" cat="/DemoStation/sub_Transport/mod_Sockets/out_HDDTemp/">Connect to Internet socket error: Operation now in progress!</m>
</FSArch>
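A short sketch of reading such a file with Python's standard `xml.etree.ElementTree`, decoding the hexadecimal times described above. `read_xml_archive` is a hypothetical helper, not part of the module.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Msg = Tuple[int, int, int, str, str]   # (tm seconds, tmu, lv, cat, text)

def read_xml_archive(data: str) -> Tuple[int, int, List[Msg]]:
    """Return (Begin, End, messages) of an XML message archive."""
    root = ET.fromstring(data.encode("utf-8"))   # bytes: honours the encoding declaration
    if root.tag != "FSArch":
        raise ValueError("not an FSArch archive")
    begin = int(root.get("Begin"), 16)           # hex UTC seconds
    end = int(root.get("End"), 16)
    msgs: List[Msg] = []
    for m in root.iter("m"):
        msgs.append((int(m.get("tm"), 16), int(m.get("tmu", "0")),
                     int(m.get("lv")), m.get("cat", ""), m.text or ""))
    return begin, end, msgs
```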
Example of the contents of the archive file in the format of flat text:
2. Value Archiver
Value archives are formed by value archivers separately for each registered archive. There can be many archivers with individual settings, which allows archives to be divided by various parameters, such as accuracy and depth.
A value archive is an independent component that includes a buffer processed by archivers. The main parameter of a value archive is its data source. Data sources can be attributes of parameters of the "Data acquisition" subsystem, as well as other external data sources (passive mode), such as network archivers of remote OpenSCADA systems, the OpenSCADA programming environment, etc. No less important are the parameters of the archive buffer, since the proper operation of archivers depends on them. Thus, the period of values in the buffer should be no greater than the period of the fastest archiver, and the buffer size should be no less than double the volume required by the slowest archiver. Otherwise, data may be lost!
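The two buffer rules can be checked with a small sketch. The interpretation is an assumption: each archiver is described by a hypothetical (value period, archiving period) pair in seconds, and the "volume" of the slowest archiver is taken as the number of buffer values that accumulate during its archiving period.

```python
from typing import Dict, List, Tuple

def check_buffer(buf_period: float, buf_size: int,
                 archivers: Dict[str, Tuple[float, float]]) -> List[str]:
    """Check the two buffer rules; archivers maps a (hypothetical) archiver name
    to (value_period, archiving_period), both in seconds."""
    problems: List[str] = []
    fastest = min(period for period, _ in archivers.values())
    if buf_period > fastest:
        problems.append(f"buffer period {buf_period}s is longer than the "
                        f"fastest archiver period {fastest}s")
    # Values accumulating in the buffer during the longest archiving period
    volume = max(arch for _, arch in archivers.values()) / buf_period
    if buf_size < 2 * volume:
        problems.append(f"buffer size {buf_size} is below 2 x {volume:.0f} values")
    return problems
```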
The overall scheme of value archiving is depicted in Fig. 3.
Fig.3. The overall scheme of the value archiving process of the FSArch module.
Archive files are named by archivers based on the date of the first value in the archive and the archive identifier, for example: "MemInfo_use 2006-06-17 17:32:56.val".
Archive files can be limited in time span. When the limit is exceeded, a new file is created. The maximum number of files in the archiver's directory can also be limited; once this limit is exceeded, the oldest files are deleted!
To save disk space, archivers support packing old archives with gzip. Packing is performed after the archive has not been used for a long time. To let other systems connect the archives quickly, you can enable info files for the packed files, which spares the other system from unpacking all files in advance.
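A sketch of the naming rule, assuming the time of the first entry is already available as a `datetime` (whether local time or UTC is used is not stated here). The `.msg` extension for message archives is an assumption based on the example in the message archiver section; `archive_file_name` is hypothetical.

```python
from datetime import datetime
from typing import Optional

def archive_file_name(first: datetime, archive_id: Optional[str] = None) -> str:
    """Build a file name from the time of the first entry: a message archive
    gets "<date time>.msg", a value archive "<archive id> <date time>.val"."""
    stamp = first.strftime("%Y-%m-%d %H:%M:%S")
    if archive_id is None:
        return f"{stamp}.msg"
    return f"{archive_id} {stamp}.val"
```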
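A sketch of how an info file can spare a station from unpacking: the metadata is read from a side file when present, and the packed file is decompressed only as a fallback. The `.info` suffix, the JSON content and both helpers are hypothetical; the actual info-file format of the module is not described here.

```python
import gzip
import json
import os
from typing import Callable, Tuple

Span = Tuple[int, int]

def write_info(gz_path: str, begin: int, end: int) -> str:
    """Store begin/end of a packed archive file in a side file (hypothetical format)."""
    info_path = gz_path + ".info"
    with open(info_path, "w", encoding="utf-8") as fh:
        json.dump({"begin": begin, "end": end}, fh)
    return info_path

def read_span(gz_path: str, header_reader: Callable[[bytes], Span]) -> Span:
    """Return (begin, end) of a packed file, unpacking it only if no info file exists."""
    info_path = gz_path + ".info"
    if os.path.exists(info_path):
        with open(info_path, encoding="utf-8") as fh:
            info = json.load(fh)
        return info["begin"], info["end"]
    with gzip.open(gz_path, "rb") as fh:   # fallback: decompress the start of the file
        return header_reader(fh.read(256))
```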
The module provides additional settings for the archiving process (Fig. 4).
Fig.4. Additional settings of the value archiving process.
Those parameters include:
Time size of archive's file, by hours — The parameter is set automatically when the archiver's value period is changed and is generally proportional to that period. Large archive files take a long time to process, because accessing parts deep in the archive history requires lengthy unpacking of gzip files and primary indexing.
Maximum number of the files to one archive — Limits the maximum number of files of one archive; together with the size of a single file, it determines the size of the archive on disk. Set to zero to remove this restriction completely.
Maximum capacity for all archives, by megabytes — Limits the total disk space occupied by all archive files of the archiver. The limit is checked during the periodic archive checking; if it is exceeded, the oldest files are removed from all archives. To remove this restriction completely, set it to a value less than 1.
Rounding for numeric values (%) — Sets the boundary, as a percentage, of the difference between values of integer and real parameters within which the values are considered identical and are archived as a single value by sequential packing. This allows good packing of slightly changing parameters whose changes lie beyond their accuracy. Set to zero to disable this feature.
Timeout to pack files of the archive, by minutes — Sets the time after which, in the absence of requests, the archive file is packed with gzip. Set to zero to disable gzip packing.
Period of the archives checking, by minutes — Sets how often the archives are checked for files appearing in or disappearing from the archive directory, as well as for exceeded limits and removal of old files.
Use an info file for the packed archives — Specifies whether to create a file with information about archive files packed with gzip. When archive files are copied to another station, this info file speeds up the first start on the target station, eliminating the need to decompress the files with gzip to obtain this information.
Check now for the directory of the archivator — A command that immediately starts checking the archives, for example after manual changes in the archiver's directory.
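Two of the settings above can be sketched as follows. Both functions are hypothetical illustrations: `enforce_capacity` removes the oldest files across all archives until the total size fits the limit, and `pack_with_rounding` marks values as repeats when they stay within the given percentage of the last stored value. The exact comparison rule used by the module is not stated, so the relative rule here is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# "Maximum capacity for all archives, by megabytes"
def enforce_capacity(archives: Dict[str, List[Tuple[str, int, float]]],
                     max_mb: float) -> List[str]:
    """archives maps an archive id to its files as (name, begin time, size MB).
    Removes the oldest files across all archives until the total fits into
    max_mb; a limit below 1 disables the check. Returns removed file names."""
    if max_mb < 1:
        return []
    files = sorted((beg, name, size, aid)
                   for aid, lst in archives.items() for name, beg, size in lst)
    total = sum(size for _, _, size, _ in files)
    removed: List[str] = []
    for _beg, name, size, aid in files:
        if total <= max_mb:
            break
        archives[aid] = [f for f in archives[aid] if f[0] != name]
        total -= size
        removed.append(name)
    return removed

# "Rounding for numeric values (%)"
def pack_with_rounding(values: List[float], round_pct: float) -> List[Optional[float]]:
    """Mark values within round_pct % of the last stored value as packed (None).
    Assumed rule: |v - last| <= |last| * round_pct / 100; with round_pct == 0
    only exact repeats are packed."""
    out: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if last is not None and abs(v - last) <= abs(last) * round_pct / 100.0:
            out.append(None)      # packed: a reader takes the last stored value
        else:
            out.append(v)
            last = v
    return out
```

Comparing against the last stored value, rather than the previous raw sample, keeps a slow drift from being packed indefinitely.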
To control the archiver's files, use the "Files" tab (Fig. 5).
Fig.5. The "Files" tab of the value archiver object.
2.1. Value archive file format
The implementation of archiving to the file system has to meet the following requirements:
quick (easy) access for adding to and reading from the archive;
the possibility of changing values in an existing archive (to fill gaps in redundant systems);
cyclic operation (size restrictions);
compression by packing runs of identical values while preserving quick access (sequential packing);
the possibility of packing obsolete data with standard archivers (gzip, bzip2 ...) and extracting it on access.
In accordance with the above requirements, archiving is organized as a set of files for each source. Cyclic operation of the archive is implemented at the file level, i.e. a new file is created and the oldest one is removed. For fast compression, runs of equal values are collapsed into a single stored value. For this purpose a packing bit table is provided whose size corresponds one-to-one to the number of stored values, i.e. each bit corresponds to a single value in the archive. A set bit indicates that the value is stored; for the repeated values of a run the bits are cleared. In a string archive the table is a byte table rather than a bit table and contains the length of the corresponding value. For a run of equal values the length is zero, and the first value of the run is read. Since the table is a byte table, the archive cannot store strings longer than 255 characters. Thus, the storage methods can be divided into methods for fixed and variable data size. The overall structure of the archive is shown in Fig. 6.
Fig.6. The overall structure of the value archive.
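The bit-table method for fixed-size values can be sketched as follows, assuming 8-byte little-endian integer values and least-significant-bit-first order inside each table byte (both assumptions; the function names are hypothetical). A set bit stores a value and a cleared bit repeats the previous one, so reading value i means walking back to the first value of its run and counting the set bits before it. Counting from the start on every read is the cost that the offset caching described below addresses.

```python
import struct
from typing import List, Tuple

VAL = struct.Struct("<q")   # assumed: 8-byte little-endian integers

def pack_fixed(values: List[int]) -> Tuple[bytes, bytes]:
    """Return (bit table, value area) for a run-packed fixed-size archive."""
    table = bytearray((len(values) + 7) // 8)
    data = bytearray()
    for i, v in enumerate(values):
        if i == 0 or v != values[i - 1]:
            table[i // 8] |= 1 << (i % 8)    # bit set: value is stored
            data += VAL.pack(v)
        # bit left at 0: value repeats the previous one
    return bytes(table), bytes(data)

def read_fixed(table: bytes, data: bytes, i: int) -> int:
    """Read value i by walking back to the first value of its run and
    counting the stored values before it."""
    def bit(k: int) -> int:
        return (table[k // 8] >> (k % 8)) & 1
    j = i
    while j > 0 and not bit(j):
        j -= 1
    stored_before = sum(bit(k) for k in range(j))
    return VAL.unpack_from(data, stored_before * VAL.size)[0]
```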
When a new archive file is created, the following are formed: the header (its structure is given in Table 1), a zeroed packing bit table of the archive and the first invalid value. Thus, the archive is initialized with invalid values. Later, new values are inserted into the value area with corresponding adjustment of the packing index table. It follows that archives of passive sources reduce to files consisting essentially of the header and the bit table.
Table 1. Structure of the archive file header (field — description, size)
f_tp — system name of the archive ("OpenSCADA Val Arch."), 20 bytes
archive — name of the archive to which the file belongs, 20 bytes
beg — start time of the archive data (microseconds), 8 bytes
end — end time of the archive data (microseconds), 8 bytes
period — periodicity of the archive (microseconds), 8 bytes
vtp — type of value in the archive (Boolean, Integer, Real, String), 3 bits
hgrid — flag of using a hard grid in the archive buffer, 1 bit
hres — flag of using high-resolution time (microseconds) in the archive buffer, 1 bit
reserve — reserved, 14 bytes
term — end-of-header symbol (0x55), 1 byte
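A sketch of packing and parsing the header from Table 1 with Python's `struct` module. The layout is an assumption: little-endian, no padding, and vtp, hgrid and hres packed into one byte with vtp in the low three bits; the numeric codes of value types are not given here, so `vtp` is passed as a plain integer. With these assumptions the header occupies 80 bytes. `make_header` and `parse_header` are hypothetical.

```python
import struct

# Assumed layout: f_tp[20], archive[20], beg/end/period as int64,
# one flag byte (vtp: bits 0-2, hgrid: bit 3, hres: bit 4), reserve[14], term.
HEADER = struct.Struct("<20s20sqqqB14sB")
F_TP = b"OpenSCADA Val Arch."

def make_header(archive: str, beg_us: int, end_us: int, period_us: int,
                vtp: int, hgrid: bool, hres: bool) -> bytes:
    flags = (vtp & 0x7) | (int(hgrid) << 3) | (int(hres) << 4)
    # "20s" pads shorter names with NUL bytes and truncates longer ones
    return HEADER.pack(F_TP, archive.encode("utf-8"), beg_us, end_us,
                       period_us, flags, bytes(14), 0x55)

def parse_header(raw: bytes) -> dict:
    f_tp, arch, beg, end, period, flags, _reserve, term = HEADER.unpack(raw[:HEADER.size])
    if term != 0x55 or not f_tp.startswith(F_TP):
        raise ValueError("not a value archive header")
    return {"archive": arch.rstrip(b"\0").decode("utf-8"), "beg": beg,
            "end": end, "period": period, "vtp": flags & 0x7,
            "hgrid": bool((flags >> 3) & 1), "hres": bool((flags >> 4) & 1)}
```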
The mechanism of sequential packing is explained in Fig. 7. As the figure shows, the packing field of each individual value contains either its length (variable-size types) or a packing flag (fixed-size types). This means that to obtain the offset of the desired value, the lengths of all preceding valid values must be summed. Performing this operation for every access to every value is very expensive. Therefore, a mechanism for caching value offsets is provided: it caches the offsets at a predefined interval (every given number of values) and also caches the last accessed value (separately for reading and writing).
Fig. 7. The mechanism of sequential packing of values.
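The offset caching can be sketched for variable-size (string) values, where the table holds each value's length and 0 marks a repeat. `OffsetIndex`, its `step` parameter and the single last-access cache (the module keeps separate ones for reading and writing) are simplifications and assumptions of this sketch.

```python
from typing import Dict, List

class OffsetIndex:
    """Offsets of variable-size values from a byte length table.

    lengths[i] is the stored length of value i, 0 meaning "same as the
    previous value". A checkpoint offset is cached every `step` values,
    plus the (index, offset) of the last access.
    """
    def __init__(self, lengths: List[int], step: int = 64):
        self.lengths, self.step = lengths, step
        self.cache: Dict[int, int] = {0: 0}
        self.last = (0, 0)

    def offset(self, i: int) -> int:
        """Byte offset of value i: the sum of the lengths before it."""
        start, off = max((k, v) for k, v in self.cache.items() if k <= i)
        if start < self.last[0] <= i:        # the last access is closer
            start, off = self.last
        for k in range(start, i):
            off += self.lengths[k]
            if (k + 1) % self.step == 0:     # remember a checkpoint
                self.cache[k + 1] = off
        self.last = (i, off)
        return off

    def read(self, data: bytes, i: int) -> bytes:
        while i > 0 and self.lengths[i] == 0:   # repeat: take the run's first value
            i -= 1
        o = self.offset(i)
        return data[o:o + self.lengths[i]]
```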
Changing values in an existing archive is also supported. However, since this requires shifting the tail of the archive, it is recommended to perform such changes as rarely as possible and in blocks as large as possible.
3. Efficiency
Mechanisms that improve the archiving process were built into the design and implementation of the module.
The first is block (frame or transactional) placement of data in value archive files. This arrangement achieves maximum archiving speed and thus allows more data streams to be archived simultaneously. Practical experience has shown that a K8-3000 system with a regular IDE hard drive can archive up to 300000 data streams with a period of 1 second, and a K5-400 system with a 2.5" IDE drive can archive up to 100 parameters at 1 millisecond intervals.
The second is packing of current values and of outdated archive files to optimize disk space usage. There are two packing mechanisms: sequential packing (value archives) and final packing of archives with a standard packer (gzip). This approach achieves high performance when archiving current data, thanks to the effective sequential compression mechanism, while final packing of obsolete archives with a standard packer completes the picture of compact storage of large volumes of data. Statistics from practical use on a real noisy signal (the worst case) showed a degree of sequential packing of 10% and a degree of full packing of 71%.