+---------------------------------------+
| Warden Filer 3.0-beta2 for Warden 3.X |
+---------------------------------------+

Content

  A. Introduction
  B. Dependencies
  C. Usage
  D. Configuration
  E. Directories and locking issues

------------------------------------------------------------------------------
A. Introduction

   Warden Filer (executable warden_filer.py) is a daemon for easy transfer of
Idea events between plain local files and a Warden server. The tool can be
instructed to run as one of two daemons - sender or receiver.
   In receiver mode, Filer polls the Warden server and saves incoming events
as plain files in a directory.
   In sender mode, Filer polls a directory and sends all new files to the
Warden server.

------------------------------------------------------------------------------
B. Dependencies

 1. Platform

    Python 2.7+

 2. Python packages

    python-daemon 1.5+, warden_client 3.0+

------------------------------------------------------------------------------
C. Usage

   warden_filer.py [-h] [-c CONFIG] [--oneshot] [-d] [-p PID_FILE]
                   {sender,receiver}

   Save Warden events as files or send files to Warden

   positional arguments:
     {sender,receiver}     choose direction: sender picks up files and submits
                           them to Warden, receiver pulls events from Warden
                           and saves them as files

   optional arguments:
     -h, --help            show this help message and exit
     -c CONFIG, --config CONFIG
                           configuration file path
     --oneshot             don't daemonise, run just once
     -d, --daemon          daemonize
     -p PID_FILE, --pid_file PID_FILE
                           create PID file with this name


   CONFIG denotes the path to the configuration file; the default is
warden_filer.cfg in the current directory.
   --oneshot instructs Filer to do its work just once (fetch available events
or send event files present in the directory), while still obeying all other
applicable options from the configuration file (logging, filtering,
directories, etc.).
   --daemon instructs Filer to go into full Unix daemon mode. Without it,
Filer stays in the foreground.
   --pid_file makes Filer create the usual PID file. Without it, no PID file
gets created.
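
   When Filer is launched from another Python script (a cron wrapper, for
instance), the command line can be assembled from the options listed above.
The following is only a sketch: the helper name filer_argv and the example
config path are hypothetical, and it assumes warden_filer.py can be found
under the name passed as "executable".

```python
def filer_argv(mode, config=None, oneshot=False, daemon=False,
               pid_file=None, executable="warden_filer.py"):
    """Build an argument list using only the options documented above."""
    if mode not in ("sender", "receiver"):
        raise ValueError("mode must be 'sender' or 'receiver'")
    argv = [executable]
    if config is not None:
        argv += ["-c", config]
    if oneshot:
        argv.append("--oneshot")
    if daemon:
        argv.append("--daemon")
    if pid_file is not None:
        argv += ["--pid_file", pid_file]
    # The positional mode goes last, as in the usage line.
    argv.append(mode)
    return argv


# Example: a single pass of the sender with an explicit (made up) config path.
argv = filer_argv("sender", config="/etc/warden/warden_filer.cfg",
                  oneshot=True)
```

   The resulting list can be handed to subprocess.call(argv) as is.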

------------------------------------------------------------------------------
D. Configuration

   Configuration is a JSON object in a file; however, lines starting with "#"
or "//" are allowed and are ignored as comments. Apart from those comment
lines, the file must contain a valid JSON object. See also warden_filer.cfg
as an example.
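
   To read such a file from your own tools, the comment lines have to be
dropped before the text is handed to a JSON parser. The sketch below shows
one way to do that; the function name load_commented_json is hypothetical,
and it assumes that a comment marker may be preceded by whitespace, which the
text above does not state explicitly.

```python
import json


def load_commented_json(text):
    """Parse JSON text after dropping '#' and '//' comment lines."""
    kept = []
    for line in text.splitlines():
        stripped = line.lstrip()
        # Assumption: leading whitespace before the marker is allowed.
        if stripped.startswith("#") or stripped.startswith("//"):
            continue
        kept.append(line)
    return json.loads("\n".join(kept))


sample = """
# Illustrative configuration fragment
{
    // sender section only
    "sender": {"dir": "warden_sender", "poll_time": 5}
}
"""
config = load_commented_json(sample)
```

   Only whole-line comments are handled; a "#" or "//" after a value on the
same line is left in place and will make the JSON parser fail.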

   warden - can contain Warden 3 configuration (see Warden doc), or path
            to Warden configuration file
   sender - configuration section for sender mode
      dir - directory, whose "incoming" subdir will be checked for Idea
            events to send out
      done_dir - directory into which the messages will be moved after
            successful sending. If not set, processed messages are deleted,
            which is the default and usually what you want. Note that this
            is just a regular directory; no special locking precautions are
            taken and no subdirectories are created here. However, if
            "done_dir" is on the same filesystem as "dir", the move is an
            atomic rename
      filter - filter fields (same as in Warden query, see Warden and Idea
            doc, possible keys: cat, nocat, group, nogroup, tag, notag),
            unmatched events get discarded and deleted
      node - information about the detector to be prepended to the event Node
            array (see Idea doc). Note that Warden server may require it to
            correspond with client registration
      poll_time - how often to check incoming directory (in seconds, defaults
            to 5)
      owait_timeout - how long to opportunistically wait for possible new
            incoming files when number of files to process is less than
            send_events_limit (in seconds, defaults to poll_time)
      owait_poll_time - how often to check incoming directory during
            opportunistic timeout (in seconds, defaults to 1)
   receiver - configuration section for receiver mode
      dir - directory, whose "incoming" subdir will serve as target for events
      filter - filter fields for Warden query (see Warden and Idea doc,
               possible keys: cat, nocat, group, nogroup, tag, notag)
      node - information about the detector to be prepended to the event Node
             array (see Idea doc). Be careful here: wrongly formatted data
             can ruin Idea messages, and it is not checked here in any way
      poll_time - how often to check Warden server for new events (in seconds,
             defaults to 5)
      file_limit - limit number of files in "incoming" directory. When the limit
             is reached, polling is paused for "limit_wait_time" seconds
      limit_wait_time - wait this number of seconds if limit on number of files
             is reached (defaults to 5)
      

    Both the "sender" and "receiver" sections can also carry daemon
configuration.

      work_dir - directory the daemon should chdir into
      chroot_dir - confine daemon into chroot directory
      umask - explicitly set umask for created files
      uid, gid - uid/gid, under which daemon will run
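
   As an illustration of how the keys above fit together, the sketch below
builds a configuration as a Python dictionary and serializes it to JSON. All
values are made up: the paths, the node name and the limits are hypothetical,
and the shapes of the "filter" values (lists of strings) and of "node" (an
object with Name and Type) are assumptions based on the Warden and Idea docs,
so check them there and against the shipped warden_filer.cfg before use.

```python
import json

# Every value here is illustrative, not a recommended setting.
config = {
    # Path to a Warden client configuration file (hypothetical name).
    "warden": "warden_client.cfg",
    "sender": {
        "dir": "/var/spool/warden_sender",
        "filter": {"cat": ["Test"]},      # assumed to be a list of strings
        "node": {                         # assumed Idea Node shape
            "Name": "org.example.warden.filer_sender",
            "Type": ["Relay"],
        },
        "poll_time": 5,
        "owait_timeout": 5,
        "owait_poll_time": 1,
    },
    "receiver": {
        "dir": "/var/spool/warden_receiver",
        "filter": {"cat": ["Test"]},
        "poll_time": 5,
        "file_limit": 10000,
        "limit_wait_time": 5,
    },
}

# Text suitable for saving as a configuration file.
config_text = json.dumps(config, indent=4, sort_keys=True)
```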

------------------------------------------------------------------------------
E. Directories and locking issues

   Working directories are not just simple paths, but contain a structure
loosely modelled on Maildir, with slightly changed names to avoid confusion
at first glance. A simple path suffers from a locking issue: when one process
saves a file there, another process has no way to know whether the file is
already complete, and starting to read prematurely can yield corrupted data.
Also, two concurrent processes may decide to work on the same file, stepping
on each other's toes.
   So, your scripts and tools inserting data into or taking data from working
directories must obey simple protocols, which use atomic "rename" to avoid
locking issues.
   Also, your directory (and its structure) _must_ reside on a single
filesystem to keep "rename" atomic. _Never_ try to mount any of the
subdirectories ("tmp", "incoming", "errors") from another filesystem.

 1. Inserting file

   * The file you want to create _must_ be created in the "tmp" subdirectory
     first, _not_ "incoming". Filename is arbitrary, but must be unique among
     all subdirectories.

   * When done writing, rename the file into the "incoming" subdir. Rename is
     an atomic operation, so to readers the file appears either nonexistent
     or complete.

   For simple usage (bash scripts, etc.), just creating a sufficiently random
   filename in "tmp" and then moving the file into "incoming" may be enough.
   Concatenating $RANDOM a couple of times will do. :)

   For advanced or potentially concurrent usage, inserting enough unique
   information into the name is recommended - Filer itself uses hostname, pid,
   unixtime, milliseconds, device number and file inode number to avoid
   locking issues on both local and network based filesystems and to be
   prepared for high traffic.
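
   The insertion protocol can be sketched in Python as follows. This is a
   minimal illustration, not Filer's own code: the function name insert_file
   and the ".idea" suffix are hypothetical, the "tmp" and "incoming"
   subdirectories are assumed to exist already, and the name is built from
   hostname, pid, time and a per-process counter only, which is simpler than
   the scheme Filer itself uses.

```python
import itertools
import os
import socket
import time

_counter = itertools.count()  # distinguishes files created by one process


def insert_file(base_dir, data):
    """Write data (bytes) into base_dir/tmp, then rename into incoming."""
    now = time.time()
    name = "%s.%d.%d.%06d.%d.idea" % (
        socket.gethostname(), os.getpid(),
        int(now), int((now % 1) * 1000000), next(_counter))
    tmp_path = os.path.join(base_dir, "tmp", name)
    final_path = os.path.join(base_dir, "incoming", name)
    # O_EXCL makes creation fail rather than overwrite an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before publishing
    # Atomic as long as "tmp" and "incoming" share one filesystem.
    os.rename(tmp_path, final_path)
    return final_path
```

   Readers that list "incoming" never see the file while it is still being
   written, because it only appears there through the final rename.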

 2. Picking up file

   * Rename the file you want to work with into the "tmp" directory.

   * Do whatever you want with the contents, and when finished, rename the
     file back into "incoming", remove it, move it somewhere else, or move it
     into the "errors" directory - whatever suits your needs; after all, it's
     your file.

   Note that in a concurrent environment a file can disappear between
   directory enumeration and the attempt to rename it - then just pick
   another one (and possibly repeat); someone was swifter.
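
   The pick-up protocol, including the "someone was swifter" case, might look
   like this in Python. Again a sketch under assumptions: the function name
   pick_up_file is hypothetical, and both "incoming" and "tmp" are assumed to
   exist on the same filesystem.

```python
import errno
import os


def pick_up_file(base_dir):
    """Claim one file from incoming by renaming it into tmp.

    Returns the path of the claimed file, or None if nothing was available.
    """
    incoming_dir = os.path.join(base_dir, "incoming")
    tmp_dir = os.path.join(base_dir, "tmp")
    for name in sorted(os.listdir(incoming_dir)):
        src = os.path.join(incoming_dir, name)
        dst = os.path.join(tmp_dir, name)
        try:
            # Only one of several competing processes can win this rename.
            os.rename(src, dst)
        except OSError as e:
            if e.errno == errno.ENOENT:
                continue  # file vanished since listing; try the next one
            raise
        return dst
    return None
```

   After processing, the caller decides what happens to the claimed file:
   remove it, rename it back into "incoming", or move it into "errors".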

------------------------------------------------------------------------------
Copyright (C) 2011-2015 Cesnet z.s.p.o