
Console View


Don Brady
Avoid fault diagnosis if multiple vdevs have errors

When multiple drives are throwing errors, it is likely not
a drive failing but rather a failure above the drives, like
a controller.  The active cases of the drive's peers are now
considered as context when making a diagnosis.

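A hypothetical sketch of the peer check (illustrative only;
vdev_case_is_active() is not an existing function, while the
vdev_parent/vdev_child fields are standard vdev_t members):

    /* Skip a single-drive fault diagnosis when peers under the same
     * parent vdev also have open cases: multiple erroring drives
     * point at the controller, not at any one drive. */
    static boolean_t
    peers_have_active_cases(vdev_t *vd)
    {
        vdev_t *pvd = vd->vdev_parent;
        int active = 0;

        for (uint64_t c = 0; pvd != NULL && c < pvd->vdev_children; c++) {
            vdev_t *peer = pvd->vdev_child[c];

            if (peer != vd && vdev_case_is_active(peer))
                active++;
        }
        return (active > 0);
    }
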
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Don Brady <don.brady@klarasystems.com>

Pull-request: #16531 part 1/1
Rich Ercolani
Add simd metadata in /proc on Linux

Too many times, people's performance problems have amounted to
"somehow your SIMD support isn't working", and determining whether
that is the case at runtime is difficult to walk people through.

This adds a /proc/spl/kstat/zfs/simd node, which exposes
metadata about which instructions ZFS thinks it can use,
on AArch64 and x86_64 Linux, to make investigating things
like this much easier.

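A sketch of how such a node is plausibly wired up with the Linux SPL
kstat API (kstat_create(), kstat_set_raw_ops(), and kstat_install()
are the existing interfaces; the simd_kstat_* callbacks are
illustrative names):

    /* Register a raw kstat that renders the SIMD capability table. */
    kstat_t *ksp = kstat_create("zfs", 0, "simd", "misc",
        KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL);

    if (ksp != NULL) {
        kstat_set_raw_ops(ksp, simd_kstat_headers, simd_kstat_data,
            simd_kstat_addr);
        kstat_install(ksp);
    }

Once the module is loaded, the metadata can be read with
cat /proc/spl/kstat/zfs/simd.
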
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>

Pull-request: #16530 part 1/1
Rick Macklem
Fix handling of DNS names with '-' in them for sharenfs

An old FreeBSD Bugzilla report (PR#168158) notes that DNS
names with '-'s in them cannot be used for the sharenfs
property.  This patch fixes the parsing of these DNS names.
The only negative effect this patch might have is that,
if a user has incorrectly separated options with a '-',
the sharenfs setting will no longer work once this patch
is applied.

Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>

Pull-request: #16529 part 1/1
sabi-tamra
Change version to 2.1.5.12

Pull-request: #16528 part 733/733
Don Brady
Multiple pool support for ztest

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Don Brady <don.brady@klarasystems.com>

Pull-request: #16526 part 1/1
Umer Saleem
Merge branch 'master' into NAS-130821-2

Signed-off-by: Umer Saleem <usaleem@ixsystems.com>

Pull-request: #16523 part 53/53
Tino Reichardt
Remove set but not used variable in ddt.c

module/zfs/ddt.c:2612:6: error: variable 'total' set but not used

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>

Pull-request: #16522 part 1/1
Alan Somers
Fix an uninitialized data access

zfs_acl_node_alloc allocates an uninitialized data buffer, but upstack
zfs_acl_chmod only partially initializes it.  KMSAN reported that this
memory remained uninitialized at the point when it was read by
lzjb_compress, which suggests a possible kernel memory disclosure bug.

The full KMSAN warning may be found in the PR.
https://github.com/openzfs/zfs/pull/16511

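One plausible remediation, shown only as a hedged sketch (the actual
patch may instead initialize the remaining region at the call site):

    /* In zfs_acl_node_alloc(): zero the ACL data buffer at
     * allocation so bytes the caller leaves untouched can never
     * leak stale kernel memory to compression or disk. */
    aclnode->z_acldata = kmem_zalloc(bytes, KM_SLEEP); /* was kmem_alloc() */
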
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored by: Axcient

Pull-request: #16511 part 1/1
Theera K.
arcstat: add structural, types, states breakdown

add ARC structural breakdown, ARC types breakdown, and ARC states breakdown, similar to arc_summary

Signed-off-by: Theera K. <tkittich@hotmail.com>

rename column names to fit 5 chars

Signed-off-by: Theera K. <tkittich@hotmail.com>

add data target, metadata target

Signed-off-by: Theera K. <tkittich@hotmail.com>

fix abd typo

Signed-off-by: Theera K. <tkittich@hotmail.com>

make column name a bit shorter

Signed-off-by: Theera K. <tkittich@hotmail.com>

fix structural typo

Signed-off-by: Theera K. <tkittich@hotmail.com>

Delete cmd/arc_summary.py

incorrect file extension

Signed-off-by: Theera K. <tkittich@hotmail.com>

Update arcstat: add Anonymous, MFU, MRU, Uncached

Signed-off-by: Theera K. <tkittich@hotmail.com>

Update arcstat.1: add mfusz, mrusz, l2wbytes

Signed-off-by: Theera K. <tkittich@hotmail.com>

Update arcstat: add mfusz, mrusz, l2wbytes

mfusz: MFU size
mrusz: MRU size
l2wbytes: bytes written per second to the L2ARC

Signed-off-by: Theera K. <tkittich@hotmail.com>

remove extra spaces

Signed-off-by: Theera K. <tkittich@hotmail.com>

arcstat: add target size of data, meta, MFU, MRU

arcstat: add target size of ARC data, ARC metadata, MFU, MRU

Signed-off-by: Theera K. <tkittich@hotmail.com>

shorten new column names; show decimal when < 10

Signed-off-by: Theera K. <tkittich@hotmail.com>

Pull-request: #16509 part 1/1
Tony Hutter
Remove spa_namespace_lock from zpool status

This commit removes spa_namespace_lock from the zpool status codepath.
This means that zpool status will not hang if a pool fails while holding
the spa_namespace_lock.

Background:

The spa_namespace_lock was originally meant to protect the
spa_namespace_avl AVL tree.  The spa_namespace_avl tree holds the
mappings from pool names to spa_t's.  So if you wanted to look up the
spa_t for the "tank" pool, you would do an AVL search for "tank" while
holding spa_namespace_lock.

Over time, though, the spa_namespace_lock was repurposed to protect
other critical codepaths in the spa subsystem as well.  In many cases
we don't know what the original authors meant to protect with it, or
whether they needed it for read-only or read-write protection.  It is
simply "too big and risky to fix properly".

The workaround is to add a new lightweight version of the
spa_namespace_lock called spa_namespace_lite_lock.
spa_namespace_lite_lock only protects the AVL tree, and nothing else.
It can be used for read-only access to the AVL tree without requiring
the spa_namespace_lock.  Calls to spa_lookup_lite() and spa_next_lite()
only need to acquire a reader lock on spa_namespace_lite_lock; they do
not need to also acquire the old spa_namespace_lock.  This allows us to
still run zpool status even if the zfs module has spa_namespace_lock
held.  Note that these AVL tree locks only protect the tree, not the
actual spa_t contents.

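A minimal sketch of what a lite lookup can look like under these rules
(assumed shape; the actual spa_lookup_lite() may differ):

    /* Read-only AVL lookup guarded by the lite rwlock only.  The
     * caller holds spa_namespace_lite_lock as READER, mirroring how
     * spa_lookup() asserts the classic spa_namespace_lock today. */
    spa_t *
    spa_lookup_lite(const char *name)
    {
        spa_t search, *spa;

        ASSERT(RW_READ_HELD(&spa_namespace_lite_lock));
        (void) strlcpy(search.spa_name, name, sizeof (search.spa_name));
        spa = avl_find(&spa_namespace_avl, &search, NULL);
        /* Protects only tree membership, not the spa_t contents. */
        return (spa);
    }
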
Signed-off-by: Tony Hutter <hutter2@llnl.gov>

Pull-request: #16507 part 1/1
Mateusz Piotrowski
Implement parallel dbuf eviction

In the previous code, dbuf_evict_thread() would call dbuf_evict_one()
in a loop while dbuf_cache_above_lowater() returned true.

dbuf_evict_one() would select a random sublist from the dbuf cache,
then walk it from the tail forward, attempting to acquire the lock on
each object until it succeeded, then evict that object and return.

As the name suggests, it would evict only a single object from the
cache. However, evicting one object is not likely to bring us below the
desired low water mark, so dbuf_evict_one() would be called again, where
it would loop over all of the same busy objects again, until it found
one it could evict.

This has been replaced with dbuf_evict_many() which takes a specific
sublist as a parameter, as well as a desired amount of data to evict.
It then walks the sublist from the tail forward, evicting what it can
until the number of bytes evicted satisfies the input parameter or
the head of the sublist is reached.

The dbuf_evict_thread now runs in parallel as well, allowing it to
keep up with demand more easily. For the dbuf cache, if the single
thread was not able to keep up, ZFS would shift the work of evicting
some items to each incoming I/O thread. While that is still the case,
it should happen much less often now that dbuf eviction is more
efficient and no longer bottlenecked to a single thread.

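A sketch of the shape described above (assumed signature;
multilist_sublist_lock_idx() follows recent OpenZFS naming, and the
locking around dbuf_destroy() is simplified for brevity):

    static uint64_t
    dbuf_evict_many(multilist_t *ml, unsigned int idx, uint64_t bytes)
    {
        uint64_t evicted = 0;
        multilist_sublist_t *mls = multilist_sublist_lock_idx(ml, idx);
        dmu_buf_impl_t *db;

        while (evicted < bytes &&
            (db = multilist_sublist_tail(mls)) != NULL) {
            /* Skip busy dbufs, as dbuf_evict_one() does. */
            while (db != NULL && mutex_tryenter(&db->db_mtx) == 0)
                db = multilist_sublist_prev(mls, db);
            if (db == NULL)
                break;
            multilist_sublist_remove(mls, db);
            evicted += db->db.db_size;
            /* Real code must drop the sublist lock around
             * dbuf_destroy(); elided here. */
            dbuf_destroy(db);
        }
        multilist_sublist_unlock(mls);
        return (evicted);
    }
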
Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@gmail.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>

Pull-request: #16487 part 1/1
Mateusz Piotrowski
Implement parallel ARC eviction

Read and write performance can become limited by the arc_evict
process being single threaded. Additional data cannot be added
to the ARC until sufficient existing data is evicted.

On many-core systems with TBs of RAM, a single thread becomes
a significant bottleneck.

With this change, we see a 25% increase in read and write throughput.

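A sketch of the fan-out (assumed shape: evict_arg_t, arc_evict_task(),
arc_evict_taskq, and the simplified arc_evict_state_impl() signature
are illustrative; taskq_dispatch_ent() and taskq_wait() are the
standard SPL taskq calls):

    typedef struct evict_arg {
        taskq_ent_t ea_tqe;
        multilist_t *ea_ml;
        int         ea_idx;     /* sublist to work on */
        uint64_t    ea_bytes;   /* per-thread eviction target */
        uint64_t    ea_evicted; /* result */
    } evict_arg_t;

    static void
    arc_evict_task(void *arg)
    {
        evict_arg_t *ea = arg;

        ea->ea_evicted = arc_evict_state_impl(ea->ea_ml, ea->ea_idx,
            ea->ea_bytes);      /* simplified signature */
    }

    /* Dispatch one job per sublist, wait, then sum the results. */
    static uint64_t
    arc_evict_parallel(evict_arg_t *args, int num_sublists)
    {
        uint64_t total = 0;

        for (int i = 0; i < num_sublists; i++) {
            taskq_dispatch_ent(arc_evict_taskq, arc_evict_task,
                &args[i], 0, &args[i].ea_tqe);
        }
        taskq_wait(arc_evict_taskq);

        for (int i = 0; i < num_sublists; i++)
            total += args[i].ea_evicted;
        return (total);
    }
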
Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>

Pull-request: #16486 part 1/1
Shengqi Chen
cityhash: replace invocations with specialized versions when possible

So that we can get actual benefit from the previous commit.

See more discussion at https://github.com/openzfs/zfs/pull/16483.

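As a call-site illustration (assuming the specialized helpers are
named by argument count, e.g. cityhash1() for the single-word case):

    /* Before: generic call padded with constant zeros, which still
     * cost multiplies on 32-bit targets (see the companion commit). */
    hash = cityhash4(key, 0, 0, 0);

    /* After: specialized single-argument version. */
    hash = cityhash1(key);
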
Acked-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>

Pull-request: #16483 part 3/3
Shengqi Chen
zcommon: add specialized versions of cityhash4

Specializing cityhash4 on 32-bit architectures can reduce the size
of stack frames as well as instruction count. This is a tiny but
useful optimization, since some callers invoke it frequently.

When specializing into 1/2/3/4-arg versions, the stack usage
(in bytes) on some 32-bit arches is as follows:

- x86: 32, 32, 32, 40
- arm-v7a: 20, 20, 28, 36
- riscv: 0, 0, 0, 16
- power: 16, 16, 16, 32
- mipsel: 8, 8, 8, 24

And each actual argument (even if passing 0) contributes evenly
to the number of multiplication instructions generated:

- x86: 9, 12, 15, 18
- arm-v7a: 6, 8, 10, 12
- riscv / power: 12, 18, 20, 24
- mipsel: 9, 12, 15, 19

On 64-bit architectures, the tendencies are similar, but both stack
sizes and instruction counts are significantly smaller and thus
negligible.

See more discussion at https://github.com/openzfs/zfs/pull/16483.

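Plausible declarations for the specialized entry points (names assumed
from the 1/2/3/4-arg description above; implementations omitted):

    /* Each variant takes only the words it actually mixes, so callers
     * no longer materialize constant-zero 64-bit arguments. */
    uint64_t cityhash1(uint64_t w1);
    uint64_t cityhash2(uint64_t w1, uint64_t w2);
    uint64_t cityhash3(uint64_t w1, uint64_t w2, uint64_t w3);
    uint64_t cityhash4(uint64_t w1, uint64_t w2, uint64_t w3, uint64_t w4);
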
Acked-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>

Pull-request: #16483 part 2/3
Shengqi Chen
dmu_objset: replace dnode_hash impl with cityhash4

As mentioned in PR #16131, replacing the CRC-based hash with
cityhash4 could slightly improve performance by eliminating memory
accesses.  Replacing the algorithm is safe since the hash result is
not persisted.

See: openzfs/zfs#16131

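A sketch of the replacement (dnode_hash() is the existing helper in
dmu_objset.c; the exact argument layout below is an assumption):

    /* Hash the objset pointer and object number with cityhash4
     * instead of the table-driven CRC; safe because the value is
     * never persisted to disk. */
    static uint64_t
    dnode_hash(const objset_t *os, uint64_t obj)
    {
        return (cityhash4((uintptr_t)os, obj, 0, 0));
    }
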
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>

Pull-request: #16483 part 1/3
Pawel Jakub Dawidek
Hierarchical bandwidth and operations rate limits.

Introduce six new properties: limit_{bw,op}_{read,write,total}.

The limit_bw_* properties limit the read, write, or combined bandwidth,
respectively, that a dataset and its descendants can consume.
Limits are applied to both file systems and ZFS volumes.

The configured limits are hierarchical, just like quotas; i.e., even if
a higher limit is configured on the child dataset, the parent's lower
limit will be enforced.

The limits are applied at the VFS level, not at the disk level.
The dataset is charged for each operation even if no disk access is
required (e.g., due to caching, compression, deduplication,
or NOP writes) or if the operation will cause more traffic (due to
the copies property, mirroring, or RAIDZ).

Read bandwidth consumption is based on:

- read-like syscalls, e.g., aio_read(2), pread(2), preadv(2), read(2),
  readv(2), sendfile(2)

- syscalls like getdents(2) and getdirentries(2)

- reading via mmapped files

- zfs send

Write bandwidth consumption is based on:

- write-like syscalls, e.g., aio_write(2), pwrite(2), pwritev(2),
  write(2), writev(2)

- writing via mmapped files

- zfs receive

The limit_op_* properties limit the read, write, or combined metadata
operations, respectively, that a dataset and its descendants can
generate.

Read operation consumption is based on:

- read-like syscalls where the number of operations is equal to the
  number of blocks being read (never less than 1)

- reading via mmapped files, where the number of operations is equal
  to the number of pages being read (never less than 1)

- syscalls accessing metadata: readlink(2), stat(2)

Write operation consumption is based on:

- write-like syscalls where the number of operations is equal to the
  number of blocks being written (never less than 1)

- writing via mmapped files, where the number of operations is equal
  to the number of pages being written (never less than 1)

- syscalls modifying a directory's content: bind(2) (UNIX-domain
  sockets), link(2), mkdir(2), mkfifo(2), mknod(2), open(2) (file
  creation), rename(2), rmdir(2), symlink(2), unlink(2)

- syscalls modifying metadata: chflags(2), chmod(2), chown(2),
  utimes(2)

- updating the access time of a file when reading it

Just like limit_bw_* limits, the limit_op_* limits are also
hierarchical and applied at the VFS level.

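A minimal sketch of the hierarchical charging described above (every
name below is illustrative; the patch's real data structures are not
shown here):

    /* Charge an operation against a dataset and all of its ancestors.
     * The most restrictive limit on the path gates the operation, and
     * charging every level lets a parent's limit cover the aggregate
     * traffic of all of its children. */
    static int
    ratelimit_charge(ds_limits_t *ds, int kind, uint64_t units)
    {
        ds_limits_t *d;

        /* Pass 1: fail (or sleep) if any ancestor lacks capacity. */
        for (d = ds; d != NULL; d = d->parent) {
            if (d->limit[kind] != 0 &&
                !tokens_available(d, kind, units))
                return (SET_ERROR(EAGAIN));
        }
        /* Pass 2: charge the whole path. */
        for (d = ds; d != NULL; d = d->parent)
            charge_tokens(d, kind, units);
        return (0);
    }
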
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>

Pull-request: #16205 part 2/2
Rob Norris
compress: add "slack" compression option

The "slack" option simply searches from the end of the block backwards
to the last non-zero byte, and sets that position as the "compressed"
size.

This patch is highly experimental; please see the associated PR for
discussion.

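A minimal sketch of the search (assuming a simplified version of the
usual ZFS compressor signature):

    /* "Compress" by trimming trailing zero bytes: scan backwards for
     * the last non-zero byte and report everything up to it as the
     * compressed size. */
    static size_t
    slack_compress(const uint8_t *src, uint8_t *dst, size_t s_len)
    {
        size_t end = s_len;

        while (end > 0 && src[end - 1] == 0)
            end--;
        if (end == s_len)
            return (s_len); /* no slack; store uncompressed */
        memcpy(dst, src, end);
        return (end);
    }
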
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Pull-request: #15215 part 1/1
Jason Lee
ZFS Interface for Accelerators (Z.I.A.)

The ZIO pipeline has been modified to allow external, alternative
implementations of existing operations to be used. The original ZFS
functions remain in the code as a fallback in case the external
implementation fails.

Definitions:
    Accelerator - an entity (usually hardware) that is
                  intended to accelerate operations
    Offloader  - synonym of accelerator; used interchangeably
    Data Processing Unit Services Module (DPUSM)
                - https://github.com/hpc/dpusm
                - defines a "provider API" for accelerator
                  vendors to set up
                - defines a "user API" for accelerator consumers
                  to call
                - maintains list of providers and coordinates
                  interactions between providers and consumers.
    Provider    - a DPUSM wrapper for an accelerator's API
    Offload    - moving data from ZFS/memory to the accelerator
    Onload      - the opposite of offload

In order for Z.I.A. to be extensible, it does not directly
communicate with a fixed accelerator. Rather, Z.I.A. acquires
a handle to a DPUSM, which is then used to acquire handles
to providers.

Using ZFS with Z.I.A.:
    1. Build and start the DPUSM
    2. Implement, build, and register a provider with the DPUSM
    3. Reconfigure ZFS with '--with-zia=<DPUSM root>'
    4. Rebuild and start ZFS
    5. Create a zpool
    6. Select the provider
          zpool set zia_provider=<provider name> <zpool>
    7. Select operations to offload
          zpool set zia_<property>=on <zpool>

The operations that have been modified are:
    - compression
        - non-raw-writes only
    - decompression
    - checksum
        - not handling embedded checksums
        - checksum compute and checksum error call the same function
    - raidz
        - generation
        - reconstruction
    - vdev_file
        - open
        - write
        - close
    - vdev_disk
        - open
        - invalidate
        - write
        - flush
        - close

Successful operations do not bring data back into memory after
they complete, allowing subsequent offloader operations to reuse
the data. This results in only one data movement per ZIO, at the
beginning of the pipeline, which is necessary to get data from ZFS
to the accelerator.

When errors occur and the offloaded data is still accessible, the
offloaded data will be onloaded (or dropped, if it still matches
the in-memory copy) for that ZIO pipeline stage and processed with
ZFS. This will cause thrashing if a later operation offloads data
again. This should not happen often, as constant errors (resulting
in data movement) are not expected to be the norm.

Unrecoverable errors such as hardware failures will trigger
pipeline restarts (if necessary) in order to complete the
original ZIO using the software path.

The modifications to ZFS can be thought of as two sets of changes:
    - The ZIO write pipeline
        - compression, checksum, RAIDZ generation, and write
        - Each stage starts by offloading data that was not
          previously offloaded
            - This allows for ZIOs to be offloaded at any point
              in the pipeline
    - Resilver
        - vdev_raidz_io_done (RAIDZ reconstruction, checksum, and
          RAIDZ generation), and write
        - Because the core of resilver is vdev_raidz_io_done, data
          is only offloaded once at the beginning of
          vdev_raidz_io_done
            - Errors cause data to be onloaded, but will not
              re-offload in subsequent steps within resilver
            - Write is a separate ZIO pipeline stage, so it will
              attempt to offload data

The zio_decompress function has been modified to allow for
offloading but the ZIO read pipeline as a whole has not, so it
is not part of the above list.

An example provider implementation can be found in
module/zia-software-provider
    - The provider's "hardware" is actually software - data is
      "offloaded" to memory not owned by ZFS
    - Calls ZFS functions in order to not reimplement operations
    - Has kernel module parameters that can be used to trigger
      ZIA_ACCELERATOR_DOWN states for testing pipeline restarts.

abd_t, raidz_row_t, and vdev_t have each been given an additional
"void *<prefix>_zia_handle" member. These opaque handles point to
data that is located on an offloader. abds are still allocated,
but their payloads are expected to diverge from the offloaded copy
as operations are run.

Encryption and deduplication are disabled for zpools with Z.I.A.
operations enabled

Aggregation is disabled for offloaded abds

RPMs will build with Z.I.A.

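A sketch of the offload-with-fallback pattern the pipeline changes
describe (abd_zia_handle is the new opaque member mentioned above;
the dpusm_checksum(), zia_onload(), and software_checksum() names
are illustrative):

    /* Try the provider first; on failure, onload (or drop) the
     * offloaded copy and redo the stage with the original ZFS code. */
    static int
    zia_checksum(abd_t *abd, uint64_t size)
    {
        if (abd->abd_zia_handle != NULL &&
            dpusm_checksum(abd->abd_zia_handle, size) == 0)
            return (0);     /* result stays offloaded for reuse */

        zia_onload(abd);    /* recoverable error: bring data back */
        return (software_checksum(abd, size));
    }
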
Signed-off-by: Jason Lee <jasonlee@lanl.gov>

Pull-request: #13628 part 1/1
Brian Atkinson
Updating based on PR Feedback(6)

1. Updated typo in man page zfs.4.
2. Fixed fat-fingered typing errors in zpl_aio_write().
3. Fixed O_DIRECT spelling typo in zpl_direct_IO_impl().
4. Updated dmu_write_uio_dnode() to issue multiple dn->dn_datablksz
  chunks at once, based on write_size.
5. Removed empty lines in zfs_write().
6. Returned code back to same indentation in zfs_get_data().
7. Removed duplicate ASSERT statements in dmu_buf_will_clone_or_dio().
8. Fixed spelling typo of cause in comment in
  dmu_buf_will_clone_or_dio().
9. Return 0 in FreeBSD zfs_uio_get_pages() when count != nr_pages.
10. Updated FreeBSD zfs_uio_get_dio_pages_alloc() to unhold pages in
    the event of an error.
11. Changed Linux zfs_uio_iov_step() to use SET_ERROR() so it matches
    the FreeBSD implementation.
12. Updated zfs_read() to add back dio_remaining_resid to n in the event
    of an error.
13. Added an ASSERT in zio_ddt_write() making sure no Direct I/O writes
    are issued with deduplication. Also, added a comment with ASSERT to
    state why Direct I/O writes can not use deduplication.
14. Removed _KERNEL include guard around zfs_dio_page_aligned(). The
    proper uio_impl.h or uio.h is included through zfs_context.h.

Signed-off-by: Brian Atkinson <batkinson@lanl.gov>

Pull-request: #10018 part 7/7