+---------------------------------------+
| Warden Filer 3.0-beta2 for Warden 3.X |
+---------------------------------------+

Content

 A. Introduction
 B. Dependencies
 C. Usage
 D. Configuration
 E. Directories and locking issues

------------------------------------------------------------------------------
A. Introduction

   Warden Filer (executable warden_filer.py) is a daemon for easy transfer of
Idea events between plain local files and a Warden server. The tool can be
instructed to run as one of two daemons - receiver and sender.

   In receiver mode, Filer polls the Warden server and saves incoming events
as plain files in a directory.

   In sender mode, Filer polls a directory and sends all new files out to the
Warden server.

------------------------------------------------------------------------------
B. Dependencies

 1. Platform

    Python 2.7+

 2. Python packages

    python-daemon 1.5+, warden_client 3.0+

------------------------------------------------------------------------------
C. Usage

   warden_filer.py [-h] [-c CONFIG] [--oneshot] [-d] [-p PID_FILE]
                   {sender,receiver}

   Save Warden events as files or send files to Warden

   positional arguments:
     {sender,receiver}     choose direction: sender picks up files and
                           submits them to Warden, receiver pulls events
                           from Warden and saves them as files

   optional arguments:
     -h, --help            show this help message and exit
     -c CONFIG, --config CONFIG
                           configuration file path
     --oneshot             don't daemonise, run just once
     -d, --daemon          daemonize
     -p PID_FILE, --pid_file PID_FILE
                           create PID file with this name

   CONFIG denotes the path to the configuration file, the default is
warden_filer.cfg in the current directory.

   --oneshot instructs Filer to do its work just once (fetch available events
or send event files present in the directory), while still obeying all other
applicable options from the configuration file (concerning logging, filtering,
directories, etc.)

   --daemon instructs Filer to go into full unix daemon mode. Without it,
Filer stays in the foreground.

   --pid_file makes Filer create the usual PID file.
Without it, no PID file gets created.

------------------------------------------------------------------------------
D. Configuration

   Configuration is a JSON object in a file - however, lines starting with "#"
or "//" are allowed and will be ignored as comments. The file must contain a
valid JSON object holding the configuration. See also warden_filer.cfg as an
example.

   warden - can contain Warden 3 configuration (see Warden doc), or path
            to Warden configuration file

   sender - configuration section for sender mode

      dir - directory, whose "incoming" subdir will be checked for Idea
            events to send out
      done_dir - directory, into which the messages will be moved after
            successful sending. If not set, processed messages will get
            deleted, which is the default, and usually what you want. Note
            that this is just a regular directory, no special locking
            precautions are taken and no subdirectories are created here,
            however if "done_dir" is on the same filesystem as "dir"
      filter - filter fields (same as in Warden query, see Warden and Idea
            doc, possible keys: cat, nocat, group, nogroup, tag, notag),
            unmatched events get discarded and deleted
      node - information about detector to be prepended into event Node
            array (see Idea doc). Note that Warden server may require it to
            correspond with client registration
      poll_time - how often to check incoming directory (in seconds,
            defaults to 5)
      owait_timeout - how long to opportunistically wait for possible new
            incoming files when number of files to process is less than
            send_events_limit (in seconds, defaults to poll_time)
      owait_poll_time - how often to check incoming directory during
            opportunistic timeout (in seconds, defaults to 1)

   receiver - configuration section for receiver mode

      dir - directory, whose "incoming" subdir will serve as target for
            events
      filter - filter fields for Warden query (see Warden and Idea doc,
            possible keys: cat, nocat, group, nogroup, tag, notag)
      node - information about detector to be prepended into event Node
            array (see Idea doc).
            Be careful here, you may ruin Idea messages with wrongly
            formatted data, which is not checked here in any way
      poll_time - how often to check Warden server for new events (in
            seconds, defaults to 5)
      file_limit - limit on the number of files in "incoming" directory.
            When the limit is reached, polling is paused for
            "limit_wait_time" seconds
      limit_wait_time - wait this number of seconds if limit on number of
            files is reached (defaults to 5)

   Both the "sender" and "receiver" sections can also bear daemon
configuration.

      work_dir - directory the daemon should chdir into
      chroot_dir - confine daemon into chroot directory
      umask - explicitly set umask for created files
      uid, gid - uid/gid, under which daemon will run

------------------------------------------------------------------------------
E. Directories and locking issues

   Working directories are not just simple paths, but contain a structure,
loosely modelled on Maildir, with slightly changed names to avoid confusion at
first glance.

   A simple path suffers from a locking issue: when one process saves a file
there, another process has no way to know whether the file is already complete
or not, and starting to read prematurely can lead to reading corrupted data.
Also, two concurrent processes may decide to work on the same file, stepping
on each other's toes.

   So, your scripts and tools inserting data into or taking data from working
directories must obey simple protocols, which use atomic "rename" to avoid
locking issues.

   Also, your directory (and its structure) _must_ reside on a single
filesystem to keep "rename" atomic. _Never_ try to mount some of the
subdirectories ("tmp", "incoming", "errors") from another filesystem.

1. Inserting file

   * The file you want to create _must_ be created in the "tmp" subdirectory
     first, _not_ "incoming". Filename is arbitrary, but must be unique among
     all subdirectories.

   * When done writing, rename the file into "incoming" subdir. Rename is an
     atomic operation, so for readers, the file will appear either
     nonexistent or complete.
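
   The two insertion steps above can be sketched in Python roughly as
   follows. This is only an illustration, not code taken from Filer: the
   function name insert_file, its arguments and the uuid based file name are
   assumptions made for the example, and it expects the "tmp" and "incoming"
   subdirectories to exist already on the same filesystem.

```python
import os
import uuid


def insert_file(base_dir, data):
    """Store data (bytes) as a new file in base_dir/incoming.

    Hypothetical helper: the file is written into "tmp" first and renamed
    afterwards, so anyone scanning "incoming" sees it complete or not at all.
    """
    # Assumption: a random UUID is unique enough among all subdirectories.
    name = uuid.uuid4().hex
    tmp_path = os.path.join(base_dir, "tmp", name)
    final_path = os.path.join(base_dir, "incoming", name)

    # Step 1: create and fully write the file in "tmp".
    with open(tmp_path, "wb") as f:
        f.write(data)

    # Step 2: rename into "incoming"; atomic only within one filesystem.
    os.rename(tmp_path, final_path)
    return final_path
```

   Calling insert_file("/some/working/dir", b"...") returns the path of the
   new file under "incoming"; the directory path here is just a placeholder.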

   For simple usage (bash scripts, etc.), just creating a sufficiently random
   filename in "tmp" and then moving it into "incoming" may be enough.
   Concatenating $RANDOM a couple of times will do. :)

   For advanced or potentially concurrent usage, inserting enough unique
   information into the name is recommended - Filer itself uses hostname,
   pid, unixtime, milliseconds, device number and file inode number to avoid
   locking issues both on local and network based filesystems and to be
   prepared for high traffic.

2. Picking up file

   * Rename the file you want to work with into the "tmp" directory.

   * Do whatever you want with the contents, and when finished, rename the
     file back into "incoming", or remove it, or move it somewhere else, or
     move it into the "errors" directory - whatever suits your needs, after
     all, it's your file.

   Note that in a concurrent environment a file can disappear between
   directory enumeration and the attempt to rename it - then just pick
   another one (and possibly repeat), someone was swifter.

------------------------------------------------------------------------------
Copyright (C) 2011-2015 Cesnet z.s.p.o