Commit 3c8c7385 authored 6 years ago by Pavel Kácha
More documentation
parent 0d9b683e
Showing 2 changed files with 165 additions and 3 deletions:

README.rst: 1 addition, 1 deletion
manual.rst: 164 additions, 2 deletions
README.rst +1 −1

@@ -16,7 +16,7 @@ Deadbeat - README

 With Deadbeat you can create well behaving *nix application/daemon/cron script which ingests,
 processes, transforms and outputs realtime data. It allows you to create the geartrain of cogs,
-where each cog does one specific action. The cigs range from fetching information, parsing,
+where each cog does one specific action. The cogs range from fetching information, parsing,
 modification, typecasting, transformations, to enrichment, filtering, anonymisation,
 deduplication, aggregation, formatting, marshalling and output.
manual.rst +164 −2

@@ -7,7 +7,7 @@

 Use of this source is governed by an ISC license, see LICENSE file.

-Welcome to Deadbeat documentation!
+Deadbeat documentation
 ================================================================================

 .. note::

@@ -29,8 +29,170 @@ Welcome to Deadbeat documentation!

    README
    doc/_doclib/modules
*Deadbeat* is a framework for creating event driven data transformation tools.
With Deadbeat you can create well behaving \*nix application/daemon/cron script which ingests,
processes, transforms and outputs realtime data. It allows you to create the geartrain of cogs,
where each cog does one specific action. The cogs range from fetching information, parsing,
modification, typecasting, transformations, to enrichment, filtering, anonymisation,
deduplication, aggregation, formatting, marshalling and output.
Deadbeat is event driven; it is able to watch and act upon common sources of events: timer,
I/O poll, inotify and unix signals. Usually Deadbeat just slacks around doing completely
nothing until some data arrives (be it primary data feed or answers to network queries).
The code supports both Python 2.7+ and 3.0+.
The library is part of SABU_ project, loosely related to Warden_, and often used to process
IDEA_ events.
.. _SABU: https://sabu.cesnet.cz
.. _Warden: https://warden.cesnet.cz
.. _IDEA: https://idea.cesnet.cz
Example
--------------------------------------------------------------------------------
A short example to show the basic concepts. The resulting script watches a log file, expecting lines
with IP addresses, translates them into hostnames and outputs the result into another text file.
Tailing the file is event driven (we spin the CPU only when a new line arrives), and so is DNS
resolution (other cogs keep reading/processing the data while waiting for DNS replies).
::

    # Basic imports
    import sys
    from deadbeat import movement, log, conf, dns, text, fs
    from deadbeat.movement import itemsetter
    from operator import itemgetter

    # Define configuration, which will work both in JSON config file and as command line switches
    conf_def = conf.cfg_root((
        conf.cfg_item(
            "input_files", conf.cast_path_list, "Input log files", default="test-input.txt"),
        conf.cfg_item(
            "output_file", str, "Output file", default="test-output.txt"),
        conf.cfg_section("log", log.log_base_config, "Logging"),
        conf.cfg_section("config", conf.cfg_base_config, "Configuration")
    ))

    def main():
        # Read and process config files and command line
        cfg = conf.gather(typedef=conf_def, name="IP_Resolve", description="Proof-of-concept bulk IP resolver")

        # Set up logging
        log.configure(**cfg.log)

        # Instantiate Train object, the holder for all the working cogs
        train = movement.Train()

        # Cog, which watches (tails) input files and gets new lines
        file_watcher = fs.FileWatcherSupply(train.esc, filenames=cfg.input_files, tail=True)

        # CSV parser cog, which for the sake of the example expects only one column
        csv_parse = text.CSVParse(fieldsetters=(itemsetter("ip"),))

        # Cog, which resolves IP addresses to hostnames
        resolve = dns.IPtoPTR(train.esc)

        # CSV writer cog, which creates line with two columns
        serialise = text.CSVMarshall(fieldgetters=(itemgetter("ip"), itemgetter("hostnames")))

        # Text file writer cog
        output = fs.LineFileDrain(train=train, path=cfg.output_file, timestamp=False, flush=False)

        # Now put the geartrain together into the Train object
        train.update(movement.train_line(file_watcher, csv_parse, resolve, serialise, output))

        # And run it
        train()

    if __name__ == '__main__':
        sys.exit(main())
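Purely as an illustration of what flows through this particular geartrain (the exact CSV quoting of
the ``hostnames`` column is up to ``text.CSVMarshall`` and ``dns.IPtoPTR``, so treat the output as a
sketch, not a guaranteed format), the default ``test-input.txt`` would hold bare IP addresses, one
per line, and resolved pairs would be appended to ``test-output.txt``::

    # test-input.txt
    192.0.2.10
    198.51.100.7

    # test-output.txt (assumed formatting)
    192.0.2.10,resolver.example.org
    198.51.100.7,mail.example.net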
Concepts
--------------------------------------------------------------------------------
Events
~~~~~~
Various system actions can trigger events: signals, inotify, and events related to file descriptors.
Events can also be scheduled, or triggered by the cogs themselves.
Cogs
~~~~
Cogs are simple (ok, sometimes not so simple) objects, which act as callables, and once instantiated, wait
to be run with some unspecified data as an argument, while returning this data somehow altered. This alteration
can take the form of changing/adding/removing contents, splitting/merging the data, or generating or deleting
it altogether.
Cogs can also withdraw the data, returning it into the pipeline in some way later.
Cogs can (by means of Train and Escapement) plan themselves to be run later, or connect their methods with
various events.
Special types of cogs are *supplies*, which generate new data, usually from sources outside of the
application, and *drains*, which output the data outwards from the scope of the application. But
they are special only conceptually; from the view of the Train and the application, any cog can
generate new data or swallow it.
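As a very rough sketch of the contract described above (not the real base class or calling
convention; actual Deadbeat cogs are configured with getters/setters and may interact with the
Train and Escapement), a minimal transforming cog can be pictured as a plain callable which
receives the flowing data and returns it altered::

    # Hypothetical, illustrative cog: assumes the flowing data is a dict.
    # Real cogs from the deadbeat modules take getters/setters instead of
    # hardcoding the "hostnames" key.
    class LowercaseHostnames:
        def __call__(self, data):
            hostnames = data.get("hostnames")
            if hostnames:
                data["hostnames"] = [h.lower() for h in hostnames]
            return data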
Escapement
~~~~~~~~~~
This object (a singleton within each application) takes care of various types of events in the
system. It provides an API for cogs to register system and time based events.
Train
~~~~~
Another singleton, which contains all the cogs and maps the oriented acyclic graph of their data
flow. It also takes care of broadcast event delivery and of a correct (block free) shutdown
sequence of the cogs.
Getters/setters
~~~~~~~~~~~~~~~
Cogs can operate upon various types of data. The initial configuration of a cog usually contains
one or more getters or setters.

A *getter* is meant to be a small piece of code (usually a lambda function or
:py:func:`operator.itemgetter`), whose function is to return a piece of the data for the cog to
operate on (for example a specific key from a dictionary).

A *setter* is a complementary small piece of code, whose function is to set/incorporate a piece of
data, generated or computed by the cog, into the main data (for example setting a specific key of a
dictionary). To support both mutable and immutable objects, the setter must return the main data
object as its return value – so a new immutable object can be created and replace the old one.

The ID getter and setter are special. If provided, the ID setter is used by supply cogs to assign a
new ID to arriving data, while the ID getter can be used for logging, and for referencing the data,
for example in broadcasts.
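For dictionary based data, the getter/setter contract boils down to something like the following
sketch (the signature of the setter here is an assumption made for illustration; in the example
above the real helpers are :py:func:`operator.itemgetter` and ``deadbeat.movement.itemsetter``)::

    from operator import itemgetter

    # Getter: returns a piece of the data (here the "ip" key).
    get_ip = itemgetter("ip")

    # Setter: incorporates a computed piece into the main data and
    # returns the (possibly new) main data object.
    def set_hostnames(data, value):
        data["hostnames"] = value
        return data

    event = {"ip": "192.0.2.10"}
    ip = get_ip(event)                                      # "192.0.2.10"
    event = set_hostnames(event, ["resolver.example.org"])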
Data
~~~~
Anything which flows within the geartrain/pipeline from one cog to the other.
Application
~~~~~~~~~~~
Deadbeat is meant for creating self-supporting applications, which sit somewhere, continuously
watching some sources of data, ingesting, processing and transforming the data, acting upon it and
forwarding it on.

An application is mainly a processing pipeline of *cogs*, managed by the *Train* (which contains
the *Escapement*). Supporting submodules provide configuration, logging and daemonization.
The config submodule takes an application wide hierarchical configuration definition, and uses it
to read and merge a set of user provided configuration files together with command line arguments.
Various cogs can also be accompanied by a snippet of configuration definition, which can be used to
simplify and reuse the global definition.
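For the example above, a user provided JSON configuration file might then look roughly like this
(only the two top level items from ``conf_def`` are shown; the keys inside the ``log`` and
``config`` sections are defined by ``log.log_base_config`` and ``conf.cfg_base_config`` and are not
spelled out here)::

    {
        "input_files": ["/var/log/myapp/addresses.log"],
        "output_file": "/var/tmp/resolved.txt"
    }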
The log submodule provides simple logging initialization, configurable consistently with other
parts of Deadbeat.

The daemon submodule provides well behaving \*nix daemonization, together with pidfile management.
 Indices and tables
-================================================================================
+--------------------------------------------------------------------------------

 * :ref:`genindex`
 * :ref:`modindex`