Commit 99cc642d authored by Jan Mach

Greatly improved database sanity check scripts.

The database sanity check scripts now produce better output, log the output aside to log files for later use and are easier to configure. The usage documentation was revised as well. (Redmine issue: #5101)
parent 13bb5028
@@ -180,8 +180,21 @@ directory, which can be used to help with keeping the data quality on the sane
 levels. These scripts are currently really simple, they just perform hardcoded
 database query and send the query results via email to list of configured recipients.
 Target email addressess can be configured in ``/etc/default/mentat`` configuration
-file with configuration key ``MENTAT_CHECKS_MAIL_TO``. This script can be set to
-launch periodically via ``cron``:
+file or passed directly to the script as command line parameters.
+To correctly correctly configure these scripts please pay attention to following
+configurations ``/etc/default/mentat``:
+``MENTAT_IS_ENABLED``
+Master switch. Unless value is set to ``yes`` no checks will be performed.
+``MENTAT_CHECKS_MAIL_TO``
+List of recipients of check reports (must be array).
+``MENTAT_HAWAT_URL``
+Base URL to the Mentat`s web interface. It will be used to generate URLs to
+example events.
+To enable these scripts please configure them to be launched periodically via
+``cron``.
 ``/etc/mentat/scripts/mentat-check-alive.sh``
 Query the IDEA event database and find a list of event detectors, that stopped
@@ -189,28 +202,44 @@ launch periodically via ``cron``:
 going suddenly offline.
 ``/etc/mentat/scripts/mentat-check-inspectionerrors.sh``
 Query the IDEA event database and detect list of all inspection errors along
-with example messages. One of the :ref:`section-bin-mentat-inspector` modules
-is by default configured to perform message sanity inspection and logs errors
-it finds directly into the message. This script can provide summary of all
+with example events. One of the :ref:`section-bin-mentat-inspector` modules
+is by default configured to perform event sanity inspection and logs errors
+it finds directly into the event. This script can provide summary of all
 current inspection errors, so you can go and fix malfunctioning detectors.
-``/etc/mentat/scripts/mentat-check-no-eventclass.sh``
+``/etc/mentat/scripts/mentat-check-noeventclass.sh``
 Query the IDEA event database and detect list of events without assigned
 internal classification. The event classification is an internal mechanism
-for aggregating messages possibly from different detectors and representing
-similar events.
+for aggregating events possibly from different detectors and representing
+similar event classess (e.g. SSH bruteforce attacks detected by different
+detectors may by described by slightly different IDEA events. In a best case
+scenario any IDEA event should be assigned exactly one event class and there
+should not be any events without an event class.
+``/etc/mentat/scripts/mentat-check-volatiledescription.sh``
+Query the IDEA event database and detect list of detectors that are putting
+variable data into ``Description`` key within the event. The description
+should contain only constant data, things like IP addressess, timestamps and
+so on should be placed into the ``Note`` key.
 ``/etc/mentat/scripts/mentat-check-test.sh``
 Query the IDEA event database and detect list of detectors that are sending
-messages with ``Test`` category for "longer than normal" time. Ussually when
+events with ``Test`` category for "longer than normal" time. Ussually when
 new detector is added to the system, it is smart to assess the quality of the
 data provided before letting the messages be handled in full. However detectors
 should not use this feature permanently, instead the data source should eiher
 move to production level by starting to omit the ``Test`` category, or stop
-sending those messages.
-``/etc/mentat/scripts/mentat-check-volatile-description.sh``
-Query the IDEA event database and detect list of detectors that are putting
-variable data into ``Description`` key within the message. The description
-should contain only constant data, things like IP addressess, timestamps and
-so on should be placed into the ``Note`` key.
+sending those messages altogether.
+Following is an example ``cron`` configuration to enable all these checks.
+.. code-block:: shell
+# root@host$ crontab -e
+10 0 * * mon /etc/mentat/scripts/mentat-check-alive.sh 7
+11 0 * * mon /etc/mentat/scripts/mentat-check-inspectionerrors.sh 7
+12 0 * * mon /etc/mentat/scripts/mentat-check-noeventclass.sh 7
+# As an example use 14 days as check interval here instead of 7 days
+13 0 * * mon /etc/mentat/scripts/mentat-check-volatiledescription.sh 14
+# As an example send these reports to some different people
+14 0 * * mon /etc/mentat/scripts/mentat-check-test.sh 7 admin@domain.org another-admin@domain.org
 Monitoring message queues
......
@@ -6,8 +6,20 @@
 # Use of this source is governed by the MIT license, see LICENSE file.
 #-------------------------------------------------------------------------------
-MENTAT_VENV=/var/mentat/venv
+# Master switch for Debian system scripts.
 MENTAT_IS_ENABLED=yes
+#MENTAT_IS_ENABLED=no
+# Location of custom Python virtual environment for Mentat system.
+MENTAT_VENV=/var/mentat/venv
+# Name of the system user and group to use to run the Mentat system.
 MENTAT_USER=mentat
 MENTAT_GROUP=mentat
+# Recipients of the additional Mentat system check script reports. Please be
+# aware, that this variable should be an array.
 MENTAT_CHECKS_MAIL_TO=(root)
+# Base URL for Mentat`s web interface Hawat (with trailing slash).
+MENTAT_HAWAT_URL=https://mentat.domain.org/mentat/
@@ -20,4 +20,4 @@ open-source project.
 __author__ = "Jan Mach <jan.mach@cesnet.cz>"
 __credits__ = "Pavel Kácha <pavel.kacha@cesnet.cz>, Andrea Kropáčová <andrea.kropacova@cesnet.cz>"
-__version__ = "2.3.31"
+__version__ = "2.3.32"
@@ -4,6 +4,7 @@
 # Detectors that are dead over 2 days (but seen last week)
 #
 # Author: Pavel Kácha <ph@cesnet.cz>
+# Contributions: Jan Mach <mek@cesnet.cz>
 # Copyright (C) since 2011 CESNET, z.s.p.o
 # Use of this source is governed by the MIT license, see LICENSE file.
 #-------------------------------------------------------------------------------
@@ -11,18 +12,54 @@
 . /etc/default/mentat
 cd /
-#sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF
-sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF | mail -s 'Mentat: Detectors dead over 2 days (but seen last week)' ${MENTAT_CHECKS_MAIL_TO[@]}
+# Check, that Mentat system is enabled.
+if test "x$MENTAT_IS_ENABLED" != "xyes"; then
+exit 0
+fi
+DAYS_SEEN=${1:?You must provide check time interval in days}
+DAYS_DEAD=2
+shift
+ADDRS=${@}
+CURRDATE=`date`
+# In case list of report recipients is not given as command line argument, use
+# the default list from /etc/default/mentat configuration file.
+if [ -z "$ADDRS" ]
+then
+ADDRS=${MENTAT_CHECKS_MAIL_TO[@]}
+fi
+#-------------------------------------------------------------------------------
+sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF | tee --append /var/tmp/mentat-check-alive.sh.log | mail -s 'Mentat check: Detectors that appear to be dead' ${ADDRS[@]}
+\set QUIET 1
 SET timezone TO 'utc';
+\timing on
+\unset QUIET
+\echo Dear administrator,
+\echo
+\echo here is a list of detectors that have been seen in last $DAYS_SEEN day(s) but now appear to be dead for over $DAYS_DEAD day(s):
+\echo
 SELECT
-node_name as "Node Name",
-max(cesnet_storagetime) as "Storage Time"
+node_name AS "Detector",
+MAX(cesnet_storagetime) AS "Last event"
 FROM
 events
 WHERE
-cesnet_storagetime > LOCALTIMESTAMP - INTERVAL '7 day'
+cesnet_storagetime > LOCALTIMESTAMP - INTERVAL '$DAYS_SEEN day'
 GROUP BY
 node_name
 HAVING
-MAX(cesnet_storagetime) < LOCALTIMESTAMP - INTERVAL '2 day';
+MAX(cesnet_storagetime) < LOCALTIMESTAMP - INTERVAL '$DAYS_DEAD day';
+\set QUIET 1
+\timing off
+\unset QUIET
+\echo ---
+\echo With regards
+\echo
+\echo Concerned watch script $0
+\echo Generated at: $CURRDATE
+\echo Mailed to: ${ADDRS[@]}
+\echo
 EOF
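The argument handling added to each check script stores `${@}` in a plain string, so the later unquoted `${ADDRS[@]}` expansion depends on word splitting to separate the addresses. A minimal sketch of the same fallback logic using a real bash array is shown below. `resolve_recipients` is a hypothetical helper, not part of this commit, and the `MENTAT_CHECKS_MAIL_TO` assignment only stands in for the value sourced from `/etc/default/mentat`.

```shell
#!/usr/bin/env bash
# Stand-in for the array normally sourced from /etc/default/mentat.
MENTAT_CHECKS_MAIL_TO=(root)

# Hypothetical helper: print the check interval followed by one recipient
# per line, falling back to the configured defaults when none are given.
resolve_recipients() {
    # First argument: check interval in days (mandatory, as in the scripts).
    local days=${1:?You must provide check time interval in days}
    shift
    # Remaining arguments, if any, override the configured recipients.
    local -a addrs=("$@")
    if [ "${#addrs[@]}" -eq 0 ]; then
        addrs=("${MENTAT_CHECKS_MAIL_TO[@]}")
    fi
    printf '%s\n' "$days" "${addrs[@]}"
}

# Mirrors the cron example: 7-day interval with two explicit recipients.
resolve_recipients 7 admin@domain.org another-admin@domain.org
```

With an array, the recipients could be passed to `mail` as `"${addrs[@]}"`, which keeps each address a separate word regardless of its content.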
@@ -4,6 +4,7 @@
 # IDEA message inspection errors
 #
 # Author: Pavel Kácha <ph@cesnet.cz>
+# Contributions: Jan Mach <mek@cesnet.cz>
 # Copyright (C) since 2011 CESNET, z.s.p.o
 # Use of this source is governed by the MIT license, see LICENSE file.
 #-------------------------------------------------------------------------------
@@ -11,20 +12,56 @@
 . /etc/default/mentat
 cd /
-#sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF
-sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF | mail -s 'Mentat: IDEA inspection errors' ${MENTAT_CHECKS_MAIL_TO[@]}
+# Check, that Mentat system is enabled.
+if test "x$MENTAT_IS_ENABLED" != "xyes"; then
+exit 0
+fi
+DAYS=${1:?You must provide check time interval in days}
+shift
+ADDRS=${@}
+CURRDATE=`date`
+# In case list of report recipients is not given as command line argument, use
+# the default list from /etc/default/mentat configuration file.
+if [ -z "$ADDRS" ]
+then
+ADDRS=${MENTAT_CHECKS_MAIL_TO[@]}
+fi
+#-------------------------------------------------------------------------------
+sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF | tee --append /var/tmp/mentat-check-inspectionerrors.sh.log | mail -s 'Mentat check: Detectors sending events with inspection errors' ${ADDRS[@]}
+\set QUIET 1
 SET timezone TO 'utc';
+\timing on
+\unset QUIET
+\echo Dear administrator,
+\echo
+\echo here is a list of detectors producing events with inspection errors in last $DAYS day(s):
+\echo
 SELECT
-node_name AS "Node Name",
-cesnet_inspectionerrors AS "Inspection Errors",
-'https://mentat-hub.cesnet.cz/mentat/events/show/' || max(id) AS "Example event"
+node_name AS "Detector",
+cesnet_inspectionerrors AS "Inspection errors",
+'${MENTAT_HAWAT_URL}events/' || MAX(id) || '/show' AS "Example event",
+COUNT(*) AS "Count"
 FROM
 events
 WHERE
-cesnet_inspectionerrors!='{}'
-AND cesnet_storagetime > localtimestamp - INTERVAL '1 day'
+cesnet_inspectionerrors != '{}'
+AND cesnet_storagetime > localtimestamp - INTERVAL '$DAYS day'
 GROUP BY
 node_name, cesnet_inspectionerrors
 ORDER BY
 node_name;
+\set QUIET 1
+\timing off
+\unset QUIET
+\echo ---
+\echo With regards
+\echo
+\echo Concerned watch script $0
+\echo Generated at: $CURRDATE
+\echo Mailed to: ${ADDRS[@]}
+\echo
 EOF
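Each script interpolates its first argument directly into the SQL text (`INTERVAL '$DAYS day'`) through the unquoted heredoc, so any non-numeric value ends up inside the query. One way to guard that value before it reaches `psql` is sketched below. `is_positive_int` is a hypothetical helper and not part of this commit.

```shell
# Hypothetical guard: succeed only when the argument is a positive integer
# written with decimal digits, so it is safe to place inside INTERVAL '...'.
is_positive_int() {
    case $1 in
        # Reject the empty string and anything containing a non-digit.
        ''|*[!0-9]*) return 1 ;;
        # Digits only: additionally require a value greater than zero.
        *) [ "$1" -gt 0 ] ;;
    esac
}

# Example usage with the interval taken from the cron configuration.
DAYS=7
if is_positive_int "$DAYS"; then
    echo "interval accepted: $DAYS day(s)"
else
    echo "interval must be a positive number of days" >&2
fi
```

A script could call such a check right after reading `${1}` and exit with an error message when it fails.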
@@ -4,6 +4,7 @@
 # Events that did not fit any of our classes
 #
 # Author: Pavel Kácha <ph@cesnet.cz>
+# Contributions: Jan Mach <mek@cesnet.cz>
 # Copyright (C) since 2011 CESNET, z.s.p.o
 # Use of this source is governed by the MIT license, see LICENSE file.
 #-------------------------------------------------------------------------------
@@ -11,20 +12,55 @@
 . /etc/default/mentat
 cd /
-#sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF
-sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF | mail -s 'Mentat: Events do not fit any of our classes' ${MENTAT_CHECKS_MAIL_TO[@]}
+# Check, that Mentat system is enabled.
+if test "x$MENTAT_IS_ENABLED" != "xyes"; then
+exit 0
+fi
+DAYS=${1:?You must provide check time interval in days}
+shift
+ADDRS=${@}
+CURRDATE=`date`
+# In case list of report recipients is not given as command line argument, use
+# the default list from /etc/default/mentat configuration file.
+if [ -z "$ADDRS" ]
+then
+ADDRS=${MENTAT_CHECKS_MAIL_TO[@]}
+fi
+#-------------------------------------------------------------------------------
+sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF | tee --append /var/tmp/mentat-check-noeventclass.sh.log | mail -s 'Mentat check: Detectors sending unclassified events' ${ADDRS[@]}
+\set QUIET 1
 SET timezone TO 'utc';
+\timing on
+\unset QUIET
+\echo Dear administrator,
+\echo
+\echo here is a list of detectors producing events not fitting any of the predefined classes in last $DAYS day(s):
+\echo
 SELECT
-node_name AS "Node Name",
-'https://mentat-hub.cesnet.cz/mentat/events/show/' || max(id) AS "Example event",
-COUNT(*) as Count
+node_name AS "Detector",
+'${MENTAT_HAWAT_URL}events/' || MAX(id) || '/show' AS "Example event",
+COUNT(*) AS "Count"
 FROM
 events
 WHERE
-(cesnet_eventclass IS NULL OR cesnet_eventclass='')
-AND cesnet_storagetime > localtimestamp - INTERVAL '1 day'
+(cesnet_eventclass IS NULL OR cesnet_eventclass = '')
+AND cesnet_storagetime > localtimestamp - INTERVAL '$DAYS day'
 GROUP BY
 node_name
 ORDER BY
 node_name;
+\set QUIET 1
+\timing off
+\unset QUIET
+\echo ---
+\echo With regards
+\echo
+\echo Concerned watch script $0
+\echo Generated at: $CURRDATE
+\echo Mailed to: ${ADDRS[@]}
+\echo
 EOF
@@ -4,6 +4,7 @@
 # Clients still sending messages with Test category
 #
 # Author: Pavel Kácha <ph@cesnet.cz>
+# Contributions: Jan Mach <mek@cesnet.cz>
 # Copyright (C) since 2011 CESNET, z.s.p.o
 # Use of this source is governed by the MIT license, see LICENSE file.
 #-------------------------------------------------------------------------------
@@ -11,19 +12,57 @@
 . /etc/default/mentat
 cd /
-#sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF
-sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF | mail -s 'Mentat: Detectors still sending Test' ${MENTAT_CHECKS_MAIL_TO[@]}
+# Check, that Mentat system is enabled.
+if test "x$MENTAT_IS_ENABLED" != "xyes"; then
+exit 0
+fi
+DAYS=${1:?You must provide check time interval in days}
+shift
+ADDRS=${@}
+CURRDATE=`date`
+# In case list of report recipients is not given as command line argument, use
+# the default list from /etc/default/mentat configuration file.
+if [ -z "$ADDRS" ]
+then
+ADDRS=${MENTAT_CHECKS_MAIL_TO[@]}
+fi
+#-------------------------------------------------------------------------------
+sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF | tee --append /var/tmp/mentat-check-test.sh.log | mail -s 'Mentat check: Detectors sending test events' ${ADDRS[@]}
+\set QUIET 1
 SET timezone TO 'utc';
+\timing on
+\unset QUIET
+\echo Dear administrator,
+\echo
+\echo here is a list of detectors producing events with Test category in last $DAYS day(s):
+\echo
 SELECT
-node_name AS "Node Name",
-max(category) AS "Category",
-'https://mentat-hub.cesnet.cz/mentat/events/show/' || max(id) AS "Example event"
+node_name AS "Detector",
+MAX(category) AS "Example categories",
+MAX(description) as "Example description",
+'${MENTAT_HAWAT_URL}events/' || MAX(id) || '/show' AS "Example event",
+COUNT(*) AS "Count"
 FROM
 events
 WHERE
 'Test' = ANY(category)
+AND cesnet_storagetime > localtimestamp - INTERVAL '$DAYS day'
 GROUP BY
 node_name
 ORDER BY
 node_name;
+\set QUIET 1
+\timing off
+\unset QUIET
+\echo ---
+\echo With regards
+\echo
+\echo Concerned watch script $0
+\echo Generated at: $CURRDATE
+\echo Mailed to: ${ADDRS[@]}
+\echo
 EOF
@@ -4,6 +4,7 @@
 # Clients sending non static Descriptions (dynamic text like IPs should go to Note)
 #
 # Author: Pavel Kácha <ph@cesnet.cz>
+# Contributions: Jan Mach <mek@cesnet.cz>
 # Copyright (C) since 2011 CESNET, z.s.p.o
 # Use of this source is governed by the MIT license, see LICENSE file.
 #-------------------------------------------------------------------------------
@@ -11,29 +12,64 @@
 . /etc/default/mentat
 cd /
-#sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF
-sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF | mail -s 'Mentat: Detectors sending non static Descriptions (dynamic text like IPs should go to Note)' ${MENTAT_CHECKS_MAIL_TO[@]}
+# Check, that Mentat system is enabled.
+if test "x$MENTAT_IS_ENABLED" != "xyes"; then
+exit 0
+fi
+DAYS=${1:?You must provide check time interval in days}
+shift
+ADDRS=${@}
+CURRDATE=`date`
+# In case list of report recipients is not given as command line argument, use
+# the default list from /etc/default/mentat configuration file.
+if [ -z "$ADDRS" ]
+then
+ADDRS=${MENTAT_CHECKS_MAIL_TO[@]}
+fi
+#-------------------------------------------------------------------------------
+sudo --user=postgres psql --dbname=mentat_events --expanded <<EOF | tee --append /var/tmp/mentat-check-volatiledescription.sh.log | mail -s 'Mentat check: Detectors sending events with volatile descriptions' ${ADDRS[@]}
+\set QUIET 1
 SET timezone TO 'utc';
+\timing on
+\unset QUIET
+\echo Dear administrator,
+\echo
+\echo here is a list of detectors producing events with volatile description (dynamic text like IPs should go to Note) in last $DAYS day(s):
+\echo
 SELECT
-node_name AS "Node Name",
-category AS "Category",
-COUNT(*) AS "Count",
-MAX(description) as "Example Description",
-'https://mentat-hub.cesnet.cz/mentat/events/show/' || MAX(id) AS "Example event"
+node_name AS "Detector",
+category AS "Categories",
+MAX(description) as "Example description",
+'${MENTAT_HAWAT_URL}events/' || MAX(id) || '/show' AS "Example event",
+COUNT(*) AS "Count"
 FROM (
 SELECT
 node_name, category, description, MAX(id) as id
 FROM
 events
 WHERE
-cesnet_storagetime > localtimestamp - INTERVAL '1 day'
+cesnet_storagetime > localtimestamp - INTERVAL '$DAYS day'
 GROUP BY
 node_name, category, description
 ) AS subquery
 GROUP BY
 node_name, category
 HAVING
-COUNT(*) > 5
+COUNT(*) > 10
 ORDER BY
-node_name, category
+node_name, category;
+\set QUIET 1
+\timing off
+\unset QUIET
+\echo ---
+\echo With regards
+\echo
+\echo Concerned watch script $0
+\echo Generated at: $CURRDATE
+\echo Mailed to: ${ADDRS[@]}
+\echo
 EOF
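All five scripts now share the same delivery pipeline: the `psql` report is piped through `tee --append` into a log file under `/var/tmp` and then on to `mail`. The pattern can be sketched without a database or mail transport as follows. `generate_report`, `deliver`, and `run_check` are hypothetical stand-ins that are not part of this commit, and the log path is a temporary file rather than the one used by the scripts.

```shell
# Hypothetical stand-in for the psql heredoc: emit a short report body.
generate_report() {
    printf 'Dear administrator,\n\nexample report body\n'
}

# Hypothetical stand-in for mail(1): consume the report from standard input.
deliver() {
    cat
}

# Keep a copy of every report in the log file while still delivering it.
# tee -a is the portable spelling of GNU tee --append.
run_check() {
    log_file=$1
    generate_report | tee -a "$log_file" | deliver
}

# Example run against a temporary log file.
demo_log=$(mktemp)
run_check "$demo_log"
rm -f "$demo_log"
```

Because the log is opened in append mode, repeated runs accumulate reports in the same file, so some form of log rotation would presumably be needed for long-running deployments.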