Implement mentat-storage.py module

By Jan Mach on 2024-12-26T14:05:47

added C-Dev-Core Done 0% P4 S-New T-Feature labels

changed milestone to %2.0

Relation Task, parent id: 3374

By Jan Mach on 2017-03-22T10:02:15

added P3 label and removed P4 label

added S-In Progress label and removed S-New label

Implemented base libraries for representing IDEA messages in Mentat system and converting them from that internal representation to appropriate representation in MongoDB and back.

Because the implementation is based on Pavel's typedcol and idea.lite libraries, I would like to ask him for a quick review and feedback on whether the implementation makes sense to the author of the original libraries.

Most relevant files for convenience:

source:lib/mentat/idea/internal.py
source:lib/mentat/idea/test_internal.py
source:lib/mentat/idea/mongodb.py
source:lib/mentat/idea/test_mongodb.py

By Jan Mach on 2024-12-28T17:45:54

assigned to @ph and unassigned @Jan_Mach

By Jan Mach on 2024-12-26T14:12:59

added S-Feedback label and removed S-In Progress label

I have encountered a showstopper that currently prevents us from using the mentat.idea.mongodb library to store messages in the database. The issue seems to be with the native BSON encoder, which is unable to encode objects of type typedcol.TypedList into BSON:

Traceback (most recent call last):
  File "test_mongodb.py", line 354, in test_04_basic
    result_b = self.collection.insert_one(idea_mongo_in_l)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 630, in insert_one
    bypass_doc_val=bypass_document_validation),
  File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 535, in _insert
    check_keys, manipulate, write_concern, op_id, bypass_doc_val)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 516, in _insert_one
    check_keys=check_keys)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/pool.py", line 244, in command
    self._raise_connection_failure(error)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/pool.py", line 372, in _raise_connection_failure
    raise error
  File "/usr/local/lib/python3.5/dist-packages/pymongo/pool.py", line 239, in command
    read_concern)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/network.py", line 82, in command
    None, codec_options, check_keys)
bson.errors.InvalidDocument: Cannot encode object: ['abuse@cesnet.cz']

----------------------------------------------------------------------

I have implemented a horrible hack function, primitivize(), in module mentat.idea.mongodb to verify this. The appropriate code can be found in the separate branch dev_mek in the repository, in the unit test file mentat.idea.test_mongodb. The primitivize() function itself is directly in module mentat.idea.mongodb. I was unable to primitivize only TypedList to list; due to implementation constraints I also had to primitivize TypedDict to dict.

The bson.BSON.encode documentation sadly confirms that the encode method can cope only with MutableMapping objects:

https://api.mongodb.com/python/3.4.0/api/bson/index.html
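
For illustration, the idea behind primitivize() can be sketched roughly as follows. This is not the code from the branch, only a minimal stand-alone approximation. FakeTypedList is a hypothetical stand-in for typedcol.TypedList, assuming (as the tl.data hint elsewhere in this thread suggests) that TypedList wraps its items in a data attribute rather than subclassing list. The "emails" key is made up for the example.

```python
from collections.abc import MutableMapping, MutableSequence


class FakeTypedList(MutableSequence):
    """Hypothetical stand-in for typedcol.TypedList (not a list subclass)."""

    def __init__(self, iterable=()):
        self.data = list(iterable)

    def __getitem__(self, index):
        return self.data[index]

    def __setitem__(self, index, value):
        self.data[index] = value

    def __delitem__(self, index):
        del self.data[index]

    def __len__(self):
        return len(self.data)

    def insert(self, index, value):
        self.data.insert(index, value)


def primitivize(obj):
    """Recursively copy mappings to plain dicts and sequences to plain lists."""
    if isinstance(obj, MutableMapping):
        return {key: primitivize(value) for key, value in obj.items()}
    # bytearray is also a MutableSequence, but it should stay binary data.
    if isinstance(obj, MutableSequence) and not isinstance(obj, bytearray):
        return [primitivize(item) for item in obj]
    return obj


# "emails" is an illustrative key, not taken from the IDEA definition.
message = {"emails": FakeTypedList(["abuse@cesnet.cz"]),
           "Node": FakeTypedList([{"Name": "org.example.node"}])}
plain = primitivize(message)
```

The result contains only built-in dict and list containers, which is the kind of structure the encoder accepts. The cost is a full copy of the message on every store.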

By Jan Mach on 2017-04-07T13:14:04

Quick brainblender:

wrap TypedList derivatives in simple

return tl.data

try adding

bson._ENCODERS[TypedList] = _encode_list

By Pavel Kácha on 2017-04-07T13:58:48

@ph wrote:

Quick brainblender:

try adding bson._ENCODERS[TypedList] = _encode_list

Self answer: won't work, bson has also fast C version, which is used by default, and is not monkey-patchable.

By Pavel Kácha on 2017-04-10T14:45:21

The latest commit 6830a9e4 in the development branch mek_dev fixed the problem with bson.BSON.encode being unable to encode typedcol.TypedList objects. In commit 9865c900 the source:lib/mentat/idea/mongodb.py library was unable to store IDEA messages in MongoDB. The issue was with the bson.BSON encoder, which is hardcoded to handle any unknown object as a dict. We were not able to convince the encoder to treat TypedList objects as lists, so we had to take a different approach and supply an appropriate data structure. The mentat.idea.mongodb.IdeaIn converter now produces a data structure composed of plain dicts and lists instead of TypedDicts and TypedLists.

Current implementation should however be considered as prototype and proof of concept, because it probably will be possible to write it in more elegant way. The current problem is, that the idea.base.idea_typedef contains hardcoded calls for typedcol.typed_list(), which are not customizable from outside of the module via flavour mechanism. The addon feature was used to monkeypatch these definitions. This is of course not optimal solution, because any changes in underlying library must be propagated manually into source:lib/mentat/idea/mongodb.py library.

Additionally, IDEA messages stored in the database contain some additional attributes that are database specific and internal and should be stripped upon retrieval from the database. Currently this must be done manually using a truncate() function call. A better solution would be to incorporate this into the typedcol library and strip these attributes during the object instantiation/conversion process.

By Jan Mach on 2024-12-28T17:53:12

Finished prototype of mentat-storage.py module.

Commit 25b51380 introduces a finished working prototype of the mentat-storage.py real-time message processing module, including appropriate unit tests and basic documentation. Key features are customization of the target database and collection, and use of the core database configuration file, which can be overridden with a local config file or with command line options. Messages are currently stored in the database one by one; batch processing may be implemented in the future.
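
The configuration layering described above could look roughly like the sketch below. It is only an illustration, not the module's code: the function name resolve_config, the option names "db" and "collection", and the precedence of command line options over the local file are all assumptions.

```python
from collections import ChainMap


def resolve_config(core, local=None, cli=None):
    """Merge config layers; assumed precedence: core < local < command line."""
    # Options not given on the command line are None and must not mask
    # values coming from the configuration files.
    cli = {key: value for key, value in (cli or {}).items() if value is not None}
    return dict(ChainMap(cli, local or {}, core))


# Hypothetical option names, for illustration only.
core = {"db": "mentat", "collection": "alerts"}
local = {"collection": "alerts_test"}
cli = {"db": None, "collection": None}

config = resolve_config(core, local, cli)
overridden = resolve_config(core, local, {"db": "mentat_dev", "collection": None})
```

With no command line options given, the local file wins for "collection" and the core file supplies "db". Passing a database name on the command line replaces only that one value.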

Next work:

test deployment on development server with continuous processing of randomly generated messages
test deployment on production server with continuous processing of real messages and storing them to different database and collection
production deployment

By Jan Mach on 2017-04-13T11:53:12

@Jan_Mach wrote:

Current implementation should however be considered as prototype and proof of concept, because it probably will be possible to write it in more elegant way. The current problem is, that the idea.base.idea_typedef contains hardcoded calls for typedcol.typed_list(), which are not customizable from outside of the module via flavour mechanism. The addon feature was used to monkeypatch these definitions. This is of course not optimal solution, because any changes in underlying library must be propagated manually into source:lib/mentat/idea/mongodb.py library.

Ahha, hardcoded typedcol in idea.base.idea_typedef. Good point. Ok, how about changing:

def idea_typedef(flavour, list_flavour, defaults_flavour, source_target_dict, attach_dict, node_dict, addon=None)

to explicit

def idea_typedef(flavour, list_flavour, defaults_flavour, source_list, target_list, attach_list, node_list, addon=None)

Usage then would be something akin to:

typedef = base.idea_typedef(
    idea_types,
    idea_lists,
    idea_defaults,
    typedcol.typed_list("SourceList", SourceTargetDict),
    typedcol.typed_list("TargetList", SourceTargetDict),
    typedcol.typed_list("AttachList", AttachDict),
    typedcol.typed_list("NodeList", NodeDict))

or, for type-stripped version:

class SourceTargetList(typedcol.TypedList):
    item_type = simplify(SourceTargetDict)

class AttachList(typedcol.TypedList):
    item_type = simplify(AttachDict)

class NodeList(typedcol.TypedList):
    item_type = simplify(NodeDict)

typedef = base.idea_typedef(
    idea_types,
    idea_lists,
    idea_defaults,
    simplify(SourceTargetList),
    simplify(SourceTargetList),
    simplify(AttachList),
    simplify(NodeList))

Would that be ok?

By Pavel Kácha on 2017-04-18T09:58:49

@ph wrote:

Would that be ok?

Yes, that was also my initial idea. That would definitely solve our issue and all custom libraries would be more robust and more customizable.

By Jan Mach on 2017-04-18T10:14:26

@Jan_Mach wrote:

Additionally, IDEA messages stored in the database contain some additional attributes that are database specific and internal and should be stripped upon retrieval from the database. Currently this must be done manually using a truncate() function call. A better solution would be to incorporate this into the typedcol library and strip these attributes during the object instantiation/conversion process.

Are you positively sure that you want them stripped completely? You can't get to them later.

Is it necessary only statically and only in TypedDict (not TypedList)?

Possible solutions:

typedef = {
    "unwanted_one": {
        "drop": True
    }
}

Or a slightly more flexible possibility (which, however, seems out of scope for typedcol to me), usable also in TypedList:

# More pythonic
def UnwantedType(s):
    raise typedcol.Drop

or

# probably faster
def UnwantedType(s):
    return typedcol.Drop

and

typedef = {
    "unwanted_one": {
        "type": UnwantedType
    }
}
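
To make the sentinel variant concrete, here is a minimal stand-alone sketch of how a container could honour it. It does not use typedcol and does not show how typedcol implements this. The names Drop, unwanted_type and build are illustrative only, and the "_id" key is just an example of a database-internal attribute.

```python
class Drop:
    """Sentinel: a type callable returns this class to discard the value."""


def unwanted_type(value):
    # The "probably faster" variant: return the sentinel instead of raising.
    return Drop


def build(typedef, raw):
    """Convert raw values by their typedef entry, skipping dropped keys."""
    result = {}
    for key, value in raw.items():
        # Keys without a typedef entry are passed through unchanged.
        convert = typedef.get(key, {}).get("type", lambda v: v)
        converted = convert(value)
        if converted is Drop:
            continue
        result[key] = converted
    return result


typedef = {
    "_id": {"type": unwanted_type},
    "ID": {"type": str},
}
stored = {"_id": 12345, "ID": "abc", "Note": "kept as is"}
cleaned = build(typedef, stored)
```

The identity check against the sentinel costs one comparison per item, whereas the exception variant pays only when something is actually dropped but needs a try/except around every conversion.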

By Pavel Kácha on 2017-04-18T10:20:02

assigned to @Jan_Mach and unassigned @ph

By Pavel Kácha on 2024-12-26T14:13:02

Both in.

idea:102034e87b794fcb2f5c5ca2c225e167ebe4fcda

explicit list args in idea_typedef
list_factory callable in list_types

idea:3a43637b9b6c24cdee66a564b91bd68a8f0d924e

Discarding of elements in TypedDict (most common use: "type": Discard)

By Pavel Kácha on 2017-04-20T08:55:32

Please check the correctness of the generated structure for IPs - min, max, NO ip for networks, min == max == ip for single ips.
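
The rule can be expressed as a small check, sketched here with the standard ipaddress module. This is not the Mentat converter: the function name ip_structure and the choice to store addresses as integers are assumptions, and explicit address ranges are not handled.

```python
import ipaddress


def ip_structure(value):
    """Build the min/max(/ip) structure for a single address or a network."""
    if "/" in value:
        network = ipaddress.ip_network(value, strict=False)
        # Networks carry only the boundaries, no "ip" key.
        return {"min": int(network.network_address),
                "max": int(network.broadcast_address)}
    address = int(ipaddress.ip_address(value))
    # Single addresses: min == max == ip.
    return {"min": address, "max": address, "ip": address}


single = ip_structure("192.0.2.1")
network = ip_structure("192.0.2.0/24")
```

A unit test for the real converter could assert the same three properties: no "ip" key for networks, min <= max, and all three values equal for single addresses.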

By Pavel Kácha on 2017-05-19T14:37:44

added Done 75% S-In Progress labels and removed Done 0% S-Feedback labels

The current state of this module is sufficient for the production environment. We are finally releasing version 2.0 of the Mentat system, so the period of frantic coding and implementation chaos is over. Any further improvements of this module will be handled, as they should be, in separate Redmine issues.

By Jan Mach on 2018-07-27T09:47:31

added Done 100% S-Closed labels and removed Done 75% S-In Progress labels

closed
