- Aug 10, 2022
Yorhel authored
Behavioral changes:
- A single wildcard ('*') does not cross a directory boundary anymore. Previously 'a*b' would also match 'a/b', but no other tool that I am aware of matches paths that way. This change breaks compatibility with old exclude patterns but improves consistency with other tools.
- Patterns with a trailing '/' now prevent recursing into the directory. Previously any directory excluded with such a pattern would show up as a regular directory with all its contents excluded; now the directory entry itself shows up as excluded.
- If the path given to ncdu matches one of the exclude patterns, the old implementation would exclude every file/dir being read; this new implementation instead ignores the rule. Not quite sure how to best handle this case, perhaps just exit with an error message?

Performance-wise, I haven't yet found a scenario where this implementation is slower than the old one, and it's *significantly* faster in some cases - in particular when using a large number of patterns, especially with literal paths and file names. That's not to say this implementation is anywhere near optimal:
- A list of relevant patterns is constructed for each directory being scanned. It may be possible to merge pattern lists that share the same prefix, which could reduce both memory use and the number of patterns that need to be matched upon entering a directory.
- A hash table with dynamic arrays as values is just garbage from a memory allocation point of view.
- This still uses libc fnmatch(), but there's an opportunity to precompile patterns for faster matching.
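To illustrate the new '*' semantics, here's a tiny C example using libc fnmatch()'s FNM_PATHNAME flag, which is the standard way to make a wildcard stop at '/' (whether the new implementation maps to this flag internally is an assumption; the patterns below are just examples):

```c
#include <fnmatch.h>
#include <stdio.h>

int main(void) {
    /* Without FNM_PATHNAME, '*' happily crosses the '/' boundary. */
    printf("%d\n", fnmatch("a*b", "a/b", 0) == 0);            /* 1: match    */
    /* With FNM_PATHNAME, '*' stops at '/' - the new behavior. */
    printf("%d\n", fnmatch("a*b", "a/b", FNM_PATHNAME) == 0); /* 0: no match */
    printf("%d\n", fnmatch("a*b", "axb", FNM_PATHNAME) == 0); /* 1: match    */
    return 0;
}
```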
- Mar 24, 2022
Yorhel authored
While it's true that the root item can't be a special, the first item to be added is not necessarily the root item. In particular, it isn't when refreshing. Probably fixes #194
- Feb 05, 2022
Yorhel authored
That *usually* doesn't take longer than a few milliseconds, but it can take a few seconds for some extremely large dirs, on very slow computers, or with optimizations disabled. Better to display a message than to make it seem as if ncdu has stopped doing anything.
- Jan 01, 2022
Yorhel authored
- Dec 21, 2021
Yorhel authored
- Nov 02, 2021
Yorhel authored
...by making sure that Context.parents is properly initialized to null when not scanning to RAM. Fixes #179.
- Oct 06, 2021
- Jul 28, 2021
Yorhel authored
Yorhel authored
As alluded to in the previous commit. This approach keeps track of hard link information much the same way as ncdu 1.16, with the main difference being that the actual /counting/ of hard link sizes is deferred until the scan is complete, thus allowing the use of a more efficient algorithm and amortizing the counting costs. As an additional benefit, the links listing in the information window doesn't need a full scan through the in-memory tree anymore.

A few memory usage benchmarks:

                 1.16   2.0-beta1   this commit
  root:           429         162           164
  backup:        3969        1686          1601
  many links:     155         194           106
  many links2*:   155         602           106

(I'm surprised my backup dir had enough hard links for this to be an improvement)

(* this is the same as the "many links" benchmark, but with a few parent directories added to increase the tree depth. 2.0-beta1 doesn't like that at all)

Performance-wise, refresh and delete operations can still be improved a bit.
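As a rough sketch of the deferred-counting idea (the record type and function below are illustrative assumptions, not the actual ncdu code): hard-linked files are merely recorded during the scan, and a single sort afterwards makes every (dev, inode) group contiguous, so all unique inodes can be counted in one pass:

```c
#include <stdint.h>
#include <stdlib.h>

/* One record per hard-linked file seen during the scan (hypothetical). */
struct link_rec {
    uint64_t dev, ino, size;
};

static int cmp_rec(const void *a, const void *b) {
    const struct link_rec *x = a, *y = b;
    if (x->dev != y->dev) return x->dev < y->dev ? -1 : 1;
    if (x->ino != y->ino) return x->ino < y->ino ? -1 : 1;
    return 0;
}

/* Called once after the scan: O(n log n) for the sort, then each
 * unique (dev,ino) pair contributes its size exactly once, instead of
 * paying for a lookup on every file during the scan itself. */
uint64_t count_hardlink_sizes(struct link_rec *recs, size_t n) {
    uint64_t total = 0;
    qsort(recs, n, sizeof *recs, cmp_rec);
    for (size_t i = 0; i < n; i++)
        if (i == 0 || recs[i].dev != recs[i-1].dev || recs[i].ino != recs[i-1].ino)
            total += recs[i].size;
    return total;
}
```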
- Jul 26, 2021
Yorhel authored
While this simplifies the code a bit, it's a regression in the sense that it increases memory use. This commit is yak shaving for another hard link counting approach I'd like to try out, which should be a *LOT* less memory hungry than the current approach, even though it does, indeed, add the extra cost of these parent node pointers.
- Jul 19, 2021
Yorhel authored
- Jul 18, 2021
Yorhel authored
I had planned to check out async functions here so I could avoid recursing onto the stack altogether, but it's still unclear to me how to safely call into libc from async functions, so let's wait for all that to get fleshed out a bit more.
Yorhel authored
Yorhel authored
+ a failed initial attempt at producing static binaries.
- Jul 16, 2021
Yorhel authored
- Jul 13, 2021
Yorhel authored
This complicated the scan code more than I had anticipated and has a few inherent bugs with respect to calculating shared hard link sizes. Still, the merge approach avoids creating a full copy of the subtree, so that's another memory usage win compared to the C version. On the other hand, it does leak memory if nodes can't be reused. Not tested quite as well as I should have, so I'm sure there are bugs.
- Jul 06, 2021
Yorhel authored
Two differences compared to the C version:
- You can now select individual paths in the listing; pressing enter will open the selected path in the browser window.
- Creating this listing is much slower and requires, in the worst case, a full traversal through the in-memory tree.

I've tested this without the same-dev and shared-parent optimizations (i.e. worst case) on an import with 30M files and performance was still quite acceptable - the listing completed in a second - so I didn't bother adding a loading indicator. On slower systems and even larger trees this may be a little annoying, though.

(also, calling nonl() apparently breaks detection of the return key; neither \n nor KEY_ENTER is emitted for some reason)
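A common workaround for that last issue (an assumption on my part, not necessarily the fix adopted here): with nonl() in effect the return key typically arrives as '\r', and KEY_ENTER is only delivered for the keypad enter on some terminals, so an input loop has to accept all three codes:

```c
#include <curses.h>

/* With nonl(), the main return key usually reads as '\r' (13) rather
 * than '\n', while keypad(win, TRUE) may still deliver KEY_ENTER for
 * the numeric keypad. Checking all three is the defensive option. */
static int is_enter(int ch) {
    return ch == '\r' || ch == '\n' || ch == KEY_ENTER;
}
```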
- Jun 01, 2021
Yorhel authored
It still feels kind of sluggish, but I'm not entirely sure how to improve it.
Yorhel authored
Under the assumption that there are no external references to files mentioned in the dump, i.e. a file's nlink count matches the number of times the file occurs in the dump. This machinery could also be used for regular scans, when you want to scan an individual directory without caring about external hard links. Maybe that should be the default, even? Not sure...
- May 29, 2021
Yorhel authored
Yorhel authored
In a similar way to the C version of ncdu: by wrapping malloc(). It's simpler to handle allocation failures at the source to allow for easy retries; pushing the retries up the stack would complicate the code somewhat more. Likewise, this is a best-effort approach to handling OOM: allocation failures in ncurses aren't handled, and display glitches may occur when we get an OOM inside a drawing function. This is a somewhat un-Zig-like way of handling errors and adds scary-looking 'catch unreachable's all over the code, but that's okay.
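A minimal C sketch of the wrapping idea, with a placeholder handler standing in for whatever retry dialog the real implementation uses (the names here are made up):

```c
#include <stdio.h>
#include <stdlib.h>

/* Placeholder OOM handler: a real one might free caches or ask the
 * user to abort/retry. Returns nonzero if the allocation should be
 * retried. */
static int oom_handler(void) {
    fputs("out of memory\n", stderr);
    return 0;
}

/* All allocations go through this wrapper, so failures are handled at
 * the source and callers can use the result unconditionally. */
void *xmalloc(size_t size) {
    void *ptr;
    while ((ptr = malloc(size)) == NULL)
        if (!oom_handler())
            abort(); /* nothing sensible left to do */
    return ptr;
}
```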
Yorhel authored
Performance is looking great, but the code is rather ugly and potentially buggy. It also doesn't handle hard links without an "nlink" field yet. Error handling of the import code is different from what I've been doing until now. That's intentional: I'll change the error handling of other pieces to call ui.die() directly rather than propagating error enums. That approach is less testable but conceptually simpler, and it's perfectly fine for a tiny application like ncdu.
- May 23, 2021
Yorhel authored
I plan to add more display options, but ran out of keys to bind. Probably going for a quick-select menu thingy so that we can keep the old key bindings for people accustomed to them. The graph width algorithm is slightly different, but I think this one's a minor improvement.
- May 12, 2021
- May 11, 2021
Yorhel authored
(+ 2 minor crash fixes due to out-of-bounds cursor_idx)
- May 09, 2021
Yorhel authored
- May 06, 2021
Yorhel authored
I initially wanted to keep a directory's block count and size as separate fields so that exporting an in-memory tree to a JSON dump would be easier to do, but that doesn't seem like a common operation to optimize for. We'll probably need the algorithms to subtract sub-items from directory counts anyway, so such an export can still be implemented, albeit more slowly.
- May 03, 2021
Yorhel authored
Easier to implement now that we're linking against libc. But exclude pattern matching is extremely slow, so it should really be rewritten with a custom fnmatch implementation. It's exactly as slow as in ncdu 1.x as well; I'm surprised nobody's complained about it yet. And while I'm at it, supporting .gitignore-style patterns would be pretty neat, too.
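For reference, the naive strategy that makes this slow looks roughly like the sketch below (function and parameter names are made up) - every pattern is handed to fnmatch(), which reparses it from scratch on every call, for every path visited:

```c
#include <fnmatch.h>
#include <stddef.h>

/* O(paths x patterns), and fnmatch() gets no chance to reuse any
 * parsing work across calls - hence the idea of precompiling the
 * patterns into a form that's cheaper to match. */
static int is_excluded(const char *path, const char **patterns, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (fnmatch(patterns[i], path, 0) == 0)
            return 1;
    return 0;
}
```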
- May 01, 2021
Yorhel authored
- Apr 30, 2021
Yorhel authored
Supporting kernfs checking is going to be a bit more annoying. And so are exclude patterns. Ugh.
- Apr 29, 2021
Yorhel authored
Yorhel authored
The new data model is supposed to solve a few problems with ncdu 1.x's 'struct dir':
- Reduce memory overhead,
- Fix extremely slow counting of hard links in some scenarios (issue #121),
- Add support for counting 'shared' data with other directories (issue #36).

Quick memory usage comparison of my root directory with ~3.5 million files (normal / extended mode):

  ncdu 1.15.1:      379M / 451M
  new (unaligned):  145M / 178M
  new (aligned):    155M / 200M

There's still a /lot/ of to-do's left before this is usable, however, and there's a bunch of issues I haven't really decided on yet, such as which TUI library to use. Backporting this data model to the C version of ncdu is also possible, but somewhat painful. Let's first see how far I get with Zig.
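To illustrate the aligned/unaligned trade-off behind those numbers, here's a made-up C example (this is not ncdu's actual node layout, and __attribute__((packed)) is GCC/Clang-specific): packing removes padding and shrinks each node, at the cost of potentially slower unaligned field access:

```c
#include <stdint.h>
#include <stdio.h>

/* Naturally aligned: the compiler pads after 'type' so that 'size'
 * starts on an 8-byte boundary. */
struct node_aligned {
    uint8_t  type;
    uint64_t size;
    uint32_t blocks;
};

/* Packed: no padding, so each node is smaller, but 'size' may sit at
 * an unaligned address, which is slower to access on some CPUs. */
struct __attribute__((packed)) node_packed {
    uint8_t  type;
    uint64_t size;
    uint32_t blocks;
};

int main(void) {
    printf("aligned: %zu bytes, packed: %zu bytes\n",
           sizeof(struct node_aligned), sizeof(struct node_packed));
    return 0;
}
```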