Skip to content
Snippets Groups Projects
  1. Oct 26, 2022
  2. Aug 10, 2022
    • Yorhel's avatar
      Improve exclude pattern matching performance (and behavior, a bit) · d523a77f
      Yorhel authored
      Behavioral changes:
      - A single wildcard ('*') does not cross directory boundary anymore.
        Previously 'a*b' would also match 'a/b', but no other tool that I am
        aware of matches paths that way. This change breaks compatibility with
        old exclude patterns but improves consistency with other tools.
      - Patterns with a trailing '/' now prevent recursing into the directory.
        Previously any directory excluded with such a pattern would show up as
        a regular directory with all its contents excluded, but now the
        directory entry itself shows up as excluded.
      - If the path given to ncdu matches one of the exclude patterns, the old
        implementation would exclude every file/dir being read, this new
        implementation would instead ignore the rule. Not quite sure how to
        best handle this case, perhaps just exit with an error message?
      
      Performance wise, I haven't yet found a scenario where this
      implementation is slower than the old one and it's *significantly*
      faster in some cases - in particular when using a large amount of
      patterns, especially with literal paths and file names.
      
      That's not to say this implementation is anywhere near optimal:
      - A list of relevant patterns is constructed for each directory being
        scanned. It may be possible to merge pattern lists that share
        the same prefix, which could both reduce memory use and the number of
        patterns that need to be matched upon entering a directory.
      - A hash table with dynamic arrays as values is just garbage from a
        memory allocation point of view.
      - This still uses libc fnmatch(), but there's an opportunity to
        precompile patterns for faster matching.
      d523a77f
Loading