Console View
---

Don Brady <don.brady@klarasystems.com>

Avoid fault diagnosis if multiple vdevs have errors

When multiple drives are throwing errors, it is likely not a drive failing but rather a failure above the drives, such as a controller. The active cases of the drive's peers are now considered when making a diagnosis.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Pull-request: #16531 part 1/1
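
A minimal userspace sketch of the peer-check idea, assuming a simple fixed threshold of active peer cases; the names (peers_suggest_common_fault, MAX_PEER_CASES) and data layout are illustrative, not the PR's code:

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_PEER_CASES  2   /* assumed threshold, for illustration */

    struct vdev_case {
        bool    vc_active;      /* open error case on this vdev */
    };

    /* Many failing peers point at a controller, not this drive. */
    static bool
    peers_suggest_common_fault(const struct vdev_case *peers, int npeers)
    {
        int active = 0;

        for (int i = 0; i < npeers; i++)
            if (peers[i].vc_active)
                active++;
        return (active >= MAX_PEER_CASES);
    }

    int
    main(void)
    {
        struct vdev_case peers[3] = { { true }, { true }, { false } };

        if (peers_suggest_common_fault(peers, 3))
            printf("skip drive fault; suspect shared controller\n");
        return (0);
    }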
---

Rich Ercolani <rincebrain@gmail.com>

Add simd metadata in /proc on Linux

Too many times, people's performance problems have amounted to "somehow your SIMD support isn't working", and determining that at runtime is difficult to describe to people. This adds a /proc/spl/kstat/zfs/simd node, which exposes metadata about which instructions ZFS thinks it can use, on AArch64 and x86_64 Linux, to make investigating things like this much easier.

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Pull-request: #16530 part 1/1
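
Such a node is essentially a dump of feature flags. A hedged userspace approximation of the shape of that dump (the probe helpers and the exact fields are assumptions, not the module's real detection code):

    #include <stdbool.h>
    #include <stdio.h>

    static bool have_sse2(void)   { return (true); }   /* stand-in probes */
    static bool have_avx2(void)   { return (false); }
    static bool have_avx512(void) { return (false); }

    static void
    simd_show(FILE *out)
    {
        fprintf(out, "sse2 %d\n", have_sse2());
        fprintf(out, "avx2 %d\n", have_avx2());
        fprintf(out, "avx512f %d\n", have_avx512());
    }

    int
    main(void)
    {
        simd_show(stdout);  /* in-kernel, this would feed the kstat node */
        return (0);
    }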
---

Rick Macklem <rmacklem@uoguelph.ca>

Fix handling of DNS names with '-' in them for sharenfs

An old FreeBSD bugzilla report, PR#168158, notes that DNS names with '-'s in them cannot be used for the sharenfs property. This patch fixes the parsing of these DNS names. The only negative effect this patch might have is that, if a user has incorrectly separated options with a '-', the sharenfs setting will no longer work once this patch is applied.

Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Pull-request: #16529 part 1/1
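
A sketch of the parsing distinction, assuming an exports-style syntax in which an option token begins with '-': a hyphen inside a hostname token must not start a new option. This illustrates the rule only; it is not the actual libshare parser:

    #include <stdio.h>
    #include <string.h>

    /* A token is an option only if it begins with '-'. */
    static int
    is_option(const char *tok)
    {
        return (tok[0] == '-');
    }

    int
    main(void)
    {
        char line[] = "-ro -maproot=root my-nas.example.com";

        for (char *tok = strtok(line, " "); tok != NULL;
            tok = strtok(NULL, " "))
            printf("%-20s %s\n", tok,
                is_option(tok) ? "option" : "host");
        return (0);
    }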
---

sabi-tamra <stamragouri@wasabi.com>

Change version to 2.1.5.12

Pull-request: #16528 part 733/733
---

Don Brady <don.brady@klarasystems.com>

Multiple pool support for ztest

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Pull-request: #16526 part 1/1
---

Umer Saleem <usaleem@ixsystems.com>

Merge branch 'master' into NAS-130821-2

Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Pull-request: #16523 part 53/53
---

Tino Reichardt <milky-zfs@mcmilk.de>

Remove set but not used variable in ddt.c

    module/zfs/ddt.c:2612:6: error: variable 'total' set but not used

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Pull-request: #16522 part 1/1
---

Alan Somers <asomers@gmail.com>

Fix an uninitialized data access

zfs_acl_node_alloc allocates an uninitialized data buffer, but upstack zfs_acl_chmod only partially initializes it. KMSAN reported that this memory remained uninitialized at the point when it was read by lzjb_compress, which suggests a possible kernel memory disclosure bug. The full KMSAN warning may be found in the PR: https://github.com/openzfs/zfs/pull/16511

Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored by: Axcient
Pull-request: #16511 part 1/1
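
A userspace sketch of this bug class and the usual fix: zero-allocate (calloc() standing in for kmem_zalloc()) so a partially filled buffer cannot leak stale memory. The structure and names are illustrative, not the zfs_acl code:

    #include <stdlib.h>
    #include <string.h>

    struct acl_node {
        size_t  an_size;
        char    an_data[64];
    };

    static struct acl_node *
    acl_node_alloc(void)
    {
        /* zeroed allocation: nothing stale can be disclosed later */
        return (calloc(1, sizeof (struct acl_node)));
    }

    int
    main(void)
    {
        struct acl_node *an = acl_node_alloc();

        if (an == NULL)
            return (1);
        an->an_size = 16;
        /* partial fill is now safe; the tail is already zero */
        memset(an->an_data, 0xab, an->an_size);
        free(an);
        return (0);
    }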
---

Theera K. <tkittich@hotmail.com>

arcstat: add structural, types, states breakdown

Add an ARC structural breakdown, ARC types breakdown, and ARC states breakdown, similar to arc_summary. Squashed history of this push:

- rename column names to fit 5 chars
- add data target, metadata target
- fix abd typo
- make column name a bit shorter
- fix structural typo
- Delete cmd/arc_summary.py (incorrect file extension)
- add Anonymous, MFU, MRU, Uncached
- arcstat.1: add mfusz (MFU size), mrusz (MRU size), l2wbytes (bytes written per second to the L2ARC)
- remove extra spaces
- add target size of ARC data, ARC metadata, MFU, MRU
- shorten new column names; show decimal when < 10

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 1/1
---

GitHub <noreply@github.com>

shorten new column names; show decimal when < 10

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 3/3
---

GitHub <noreply@github.com>

shorten new column names; show decimal when < 10

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 2/3
---

Theera K. <tkittich@hotmail.com>

arcstat: add structural, types, states breakdown

Add an ARC structural breakdown, ARC types breakdown, and ARC states breakdown, similar to arc_summary. Squashed history of this push:

- rename column names to fit 5 chars
- add data target, metadata target
- fix abd typo
- make column name a bit shorter
- fix structural typo
- Delete cmd/arc_summary.py (incorrect file extension)
- add Anonymous, MFU, MRU, Uncached
- arcstat.1: add mfusz (MFU size), mrusz (MRU size), l2wbytes (bytes written per second to the L2ARC)
- remove extra spaces
- add target size of ARC data, ARC metadata, MFU, MRU

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 1/3
---

GitHub <noreply@github.com>

shorten new column names; show decimal when < 10

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 2/2
---

Theera K. <tkittich@hotmail.com>

arcstat: add structural, types, states breakdown

Add an ARC structural breakdown, ARC types breakdown, and ARC states breakdown, similar to arc_summary. Squashed history of this push:

- rename column names to fit 5 chars
- add data target, metadata target
- fix abd typo
- make column name a bit shorter
- fix structural typo
- Delete cmd/arc_summary.py (incorrect file extension)
- add Anonymous, MFU, MRU, Uncached
- arcstat.1: add mfusz (MFU size), mrusz (MRU size), l2wbytes (bytes written per second to the L2ARC)
- remove extra spaces
- add target size of ARC data, ARC metadata, MFU, MRU

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 1/1
---

GitHub <noreply@github.com>

arcstat: add target size of data, meta, MFU, MRU

arcstat: add target size of ARC data, ARC metadata, MFU, MRU

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 3/3
---

GitHub <noreply@github.com>

arcstat: add target size of data, meta, MFU, MRU

arcstat: add target size of ARC data, ARC metadata, MFU, MRU

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 2/3
---

Theera K. <tkittich@hotmail.com>

arcstat: add structural, types, states breakdown

Add an ARC structural breakdown, ARC types breakdown, and ARC states breakdown, similar to arc_summary. Squashed history of this push:

- rename column names to fit 5 chars
- add data target, metadata target
- fix abd typo
- make column name a bit shorter
- fix structural typo
- Delete cmd/arc_summary.py (incorrect file extension)
- add Anonymous, MFU, MRU, Uncached
- arcstat.1: add mfusz (MFU size), mrusz (MRU size), l2wbytes (bytes written per second to the L2ARC)
- remove extra spaces

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 1/3
---

GitHub <noreply@github.com>

arcstat: add target size of data, meta, MFU, MRU

arcstat: add target size of ARC data, ARC metadata, MFU, MRU

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 2/2
---

Theera K. <tkittich@hotmail.com>

arcstat: add structural, types, states breakdown

Add an ARC structural breakdown, ARC types breakdown, and ARC states breakdown, similar to arc_summary. Squashed history of this push:

- rename column names to fit 5 chars
- add data target, metadata target
- fix abd typo
- make column name a bit shorter
- fix structural typo
- Delete cmd/arc_summary.py (incorrect file extension)
- add Anonymous, MFU, MRU, Uncached
- arcstat.1: add mfusz (MFU size), mrusz (MRU size), l2wbytes (bytes written per second to the L2ARC)
- remove extra spaces

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 1/1
---

GitHub <noreply@github.com>

remove extra spaces

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 2/2
---

Theera K. <tkittich@hotmail.com>

arcstat: add structural, types, states breakdown

Add an ARC structural breakdown, ARC types breakdown, and ARC states breakdown, similar to arc_summary. Squashed history of this push:

- rename column names to fit 5 chars
- add data target, metadata target
- fix abd typo
- make column name a bit shorter
- fix structural typo
- Delete cmd/arc_summary.py (incorrect file extension)
- add Anonymous, MFU, MRU, Uncached
- arcstat.1: add mfusz (MFU size), mrusz (MRU size), l2wbytes (bytes written per second to the L2ARC)

Signed-off-by: Theera K. <tkittich@hotmail.com>
Pull-request: #16509 part 1/2
---

Tony Hutter <hutter2@llnl.gov>

Remove spa_namespace_lock from zpool status

This commit removes spa_namespace_lock from the zpool status codepath. This means that zpool status will not hang if a pool fails while holding the spa_namespace_lock.

Background: The spa_namespace_lock was originally meant to protect the spa_namespace_avl AVL tree, which holds the mappings from pool names to spa_t's. So if you wanted to look up the spa_t for the "tank" pool, you would do an AVL search for "tank" while holding spa_namespace_lock. Over time, though, the spa_namespace_lock was repurposed to protect other critical codepaths in the spa subsystem as well. In many cases we don't know what the original authors meant to protect with it, or whether they needed it for read-only or read-write protection. It is simply "too big and risky to fix properly".

The workaround is to add a new lightweight version of the spa_namespace_lock called spa_namespace_lite_lock. spa_namespace_lite_lock only protects the AVL tree, and nothing else. It can be used for read-only access to the AVL tree without requiring the spa_namespace_lock. Calls to spa_lookup_lite() and spa_next_lite() only need to acquire a reader lock on spa_namespace_lite_lock; they do not need to also acquire the old spa_namespace_lock. This allows us to still run zpool status even if the zfs module has spa_namespace_lock held. Note that these AVL tree locks only protect the tree, not the actual spa_t contents.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Pull-request: #16507 part 1/1
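
A sketch of the lite-lock idea, with pthreads standing in for the kernel rwlock and a linked list standing in for the AVL tree; spa_lookup_lite() below is a simplified stand-in, not the PR's implementation:

    #include <pthread.h>
    #include <string.h>

    static pthread_rwlock_t spa_namespace_lite_lock =
        PTHREAD_RWLOCK_INITIALIZER;

    struct spa {
        char        spa_name[256];
        struct spa  *spa_next;      /* stand-in for the AVL tree */
    };

    static struct spa *spa_list;

    static struct spa *
    spa_lookup_lite(const char *name)
    {
        struct spa *spa;

        /* readers take only the lite lock, never the big lock */
        pthread_rwlock_rdlock(&spa_namespace_lite_lock);
        for (spa = spa_list; spa != NULL; spa = spa->spa_next)
            if (strcmp(spa->spa_name, name) == 0)
                break;
        pthread_rwlock_unlock(&spa_namespace_lite_lock);
        return (spa);   /* tree-only protection; spa_t contents are not */
    }

    int
    main(void)
    {
        static struct spa tank = { "tank", NULL };

        spa_list = &tank;
        return (spa_lookup_lite("tank") == &tank ? 0 : 1);
    }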
---

Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>

Implement parallel dbuf eviction

In the previous code, dbuf_evict_thread() would call dbuf_evict_one() in a loop while dbuf_cache_above_lowater(). dbuf_evict_one() would select a random sublist from the dbuf cache, then walk it from the tail forward, attempting to acquire the lock on each object until it succeeded, then evict that object and return. As the name suggests, it would evict only a single object from the cache. However, evicting one object is not likely to bring us below the desired low water mark, so dbuf_evict_one() will be called again, where it will loop over all of the same busy objects again, until it finds one it can evict.

This has been replaced with dbuf_evict_many(), which takes a specific sublist as a parameter, as well as a desired amount of data to evict. It then walks the sublist from the tail forward, evicting what it can until the number of bytes evicted satisfies the input parameter or the head of the sublist is reached. The dbuf_evict_thread now runs in parallel as well, allowing it to keep up with demand more easily.

For the dbuf cache, if the single thread was not able to keep up, ZFS would shift the work of evicting some items to each incoming I/O thread. While that is still the case, it should be seen much less often now that dbuf_evict is more efficient and no longer bottlenecked to a single thread.

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@gmail.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Pull-request: #16487 part 1/1
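
A sketch of the evict-many walk, with a plain doubly linked list standing in for the dbuf cache multilist; the structures and byte accounting are simplified assumptions, not the PR's code:

    #include <stddef.h>
    #include <stdio.h>

    struct dbuf {
        struct dbuf *db_prev, *db_next;
        size_t      db_size;
        int         db_busy;    /* stand-in for a held mutex */
    };

    struct sublist {
        struct dbuf *sl_tail;
    };

    /* One tail-to-head walk evicts until the byte target is met. */
    static size_t
    dbuf_evict_many(struct sublist *sl, size_t bytes_wanted)
    {
        size_t evicted = 0;
        struct dbuf *db = sl->sl_tail;

        while (db != NULL && evicted < bytes_wanted) {
            struct dbuf *prev = db->db_prev;

            if (!db->db_busy) { /* skip, don't spin on, busy dbufs */
                if (db->db_prev != NULL)
                    db->db_prev->db_next = db->db_next;
                if (db->db_next != NULL)
                    db->db_next->db_prev = db->db_prev;
                if (sl->sl_tail == db)
                    sl->sl_tail = db->db_prev;
                evicted += db->db_size;
            }
            db = prev;
        }
        return (evicted);
    }

    int
    main(void)
    {
        struct dbuf a = { NULL, NULL, 4096, 0 };
        struct dbuf b = { &a, NULL, 8192, 1 };
        struct dbuf c = { &b, NULL, 4096, 0 };
        struct sublist sl = { &c };

        a.db_next = &b;
        b.db_next = &c;
        printf("evicted %zu bytes\n", dbuf_evict_many(&sl, 8192));
        return (0);
    }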
---

Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>

Implement parallel ARC eviction

Read and write performance can become limited by the arc_evict process being single threaded: additional data cannot be added to the ARC until sufficient existing data is evicted. On many-core systems with TBs of RAM, a single thread becomes a significant bottleneck. With this change we see a 25% increase in read and write throughput.

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Pull-request: #16486 part 1/1
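
A sketch of splitting a single eviction target across worker threads, assuming pthreads; the thread count and per-thread share are illustrative, not the PR's tuning:

    #include <pthread.h>
    #include <stddef.h>
    #include <stdio.h>

    #define NTHREADS 4

    static void *
    evict_worker(void *arg)
    {
        size_t share = *(size_t *)arg;

        /* each worker evicts its share from its own sublists */
        printf("evicting %zu bytes\n", share);
        return (NULL);
    }

    int
    main(void)
    {
        pthread_t tid[NTHREADS];
        size_t target = 1 << 20;
        size_t share = target / NTHREADS;

        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, evict_worker, &share);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        return (0);
    }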
---

Shengqi Chen <harry-chen@outlook.com>

cityhash: replace invocations with specialized versions when possible

So that we can get actual benefit from the last commit. See more discussion at https://github.com/openzfs/zfs/pull/16483.

Acked-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Pull-request: #16483 part 3/3
---

Shengqi Chen <harry-chen@outlook.com>

zcommon: add specialized versions of cityhash4

Specializing cityhash4 on 32-bit architectures can reduce the size of stack frames as well as the instruction count. This is a tiny but useful optimization, since some callers invoke it frequently.

When specializing into 1/2/3/4-arg versions, the stack usage (in bytes) on some 32-bit arches is as follows:

- x86: 32, 32, 32, 40
- arm-v7a: 20, 20, 28, 36
- riscv: 0, 0, 0, 16
- power: 16, 16, 16, 32
- mipsel: 8, 8, 8, 24

And each actual argument (even if passing 0) contributes evenly to the number of multiplication instructions generated:

- x86: 9, 12, 15, 18
- arm-v7a: 6, 8, 10, 12
- riscv / power: 12, 18, 20, 24
- mipsel: 9, 12, 15, 19

On 64-bit architectures, the tendencies are similar, but both stack sizes and instruction counts are significantly smaller and thus negligible. See more discussion at https://github.com/openzfs/zfs/pull/16483.

Acked-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Pull-request: #16483 part 2/3
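
A sketch of the specialization trick: thin wrappers that pin the unused arguments to compile-time zeros, letting the compiler drop the corresponding multiply chains. The mixer below is a toy, not the real zcommon cityhash4:

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t
    cityhash4(uint64_t a, uint64_t b, uint64_t c, uint64_t d)
    {
        uint64_t h = a * 0x9ddfea08eb382d69ULL;

        h ^= (b + 0x9e3779b97f4a7c15ULL) * (h >> 29);
        h ^= (c ^ d) * 0xc2b2ae3d27d4eb4fULL;
        return (h ^ (h >> 32));
    }

    /* 1-arg specialization: b, c, d become compile-time zeros */
    static inline uint64_t
    cityhash1(uint64_t a)
    {
        return (cityhash4(a, 0, 0, 0));
    }

    int
    main(void)
    {
        printf("%llx\n", (unsigned long long)cityhash1(42));
        return (0);
    }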
---

Shengqi Chen <harry-chen@outlook.com>

dmu_objset: replace dnode_hash impl with cityhash4

As mentioned in PR #16131, replacing the CRC-based hash with cityhash4 could slightly improve performance by eliminating memory access. Replacing the algorithm is safe since the hash result is not persisted.

See: openzfs/zfs#16131

Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Pull-request: #16483 part 1/3
---

Pawel Jakub Dawidek <pawel@dawidek.net>

Hierarchical bandwidth and operations rate limits

Introduce six new properties: limit_{bw,op}_{read,write,total}.

The limit_bw_* properties limit the read, write, or combined bandwidth, respectively, that a dataset and its descendants can consume. Limits are applied to both file systems and ZFS volumes.

The configured limits are hierarchical, just like quotas; i.e., even if a higher limit is configured on the child dataset, the parent's lower limit will be enforced.

The limits are applied at the VFS level, not at the disk level. The dataset is charged for each operation even if no disk access is required (e.g., due to caching, compression, deduplication, or NOP writes) or if the operation will cause more traffic (due to the copies property, mirroring, or RAIDZ).

Read bandwidth consumption is based on:

- read-like syscalls, e.g., aio_read(2), pread(2), preadv(2), read(2), readv(2), sendfile(2)
- syscalls like getdents(2) and getdirentries(2)
- reading via mmaped files
- zfs send

Write bandwidth consumption is based on:

- write-like syscalls, e.g., aio_write(2), pwrite(2), pwritev(2), write(2), writev(2)
- writing via mmaped files
- zfs receive

The limit_op_* properties limit the read, write, or both metadata operations, respectively, that a dataset and its descendants can generate.

Read operations consumption is based on:

- read-like syscalls, where the number of operations is equal to the number of blocks being read (never less than 1)
- reading via mmaped files, where the number of operations is equal to the number of pages being read (never less than 1)
- syscalls accessing metadata: readlink(2), stat(2)

Write operations consumption is based on:

- write-like syscalls, where the number of operations is equal to the number of blocks being written (never less than 1)
- writing via mmaped files, where the number of operations is equal to the number of pages being written (never less than 1)
- syscalls modifying a directory's content: bind(2) (UNIX-domain sockets), link(2), mkdir(2), mkfifo(2), mknod(2), open(2) (file creation), rename(2), rmdir(2), symlink(2), unlink(2)
- syscalls modifying metadata: chflags(2), chmod(2), chown(2), utimes(2)
- updating the access time of a file when reading it

Just like the limit_bw_* limits, the limit_op_* limits are also hierarchical and applied at the VFS level.

Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Pull-request: #16205 part 2/2
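
A sketch of hierarchical enforcement: charge each operation against every ancestor so the lowest limit on the path wins, exactly as with quotas. The structures and windowing below are illustrative assumptions, not the patch's code:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct dataset {
        struct dataset  *ds_parent;
        uint64_t        ds_bw_limit;    /* bytes/s, 0 = unlimited */
        uint64_t        ds_bw_used;     /* bytes charged this window */
    };

    static bool
    limit_check_and_charge(struct dataset *ds, uint64_t bytes)
    {
        struct dataset *d;

        /* a lower limit on any ancestor wins */
        for (d = ds; d != NULL; d = d->ds_parent)
            if (d->ds_bw_limit != 0 &&
                d->ds_bw_used + bytes > d->ds_bw_limit)
                return (false); /* caller throttles and retries */

        for (d = ds; d != NULL; d = d->ds_parent)
            d->ds_bw_used += bytes;
        return (true);
    }

    int
    main(void)
    {
        struct dataset parent = { NULL, 1000, 0 };
        struct dataset child = { &parent, 0, 0 }; /* no limit of its own */

        printf("%s\n", limit_check_and_charge(&child, 2000) ?
            "allowed" : "throttled by parent");
        return (0);
    }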
---

Pawel Jakub Dawidek <pawel@dawidek.net>

Hierarchical bandwidth and operations rate limits

(Commit message identical to part 2/2 above.)

Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Pull-request: #16205 part 1/2
---

Rob Norris <robn@despairlabs.com>

compress: add "slack" compression option

The "slack" option simply searches from the end of the block backwards to the last non-zero byte, and sets that position as the "compressed" size. This patch is highly experimental; please see the associated PR for discussion.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Pull-request: #15215 part 1/1
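
The whole technique fits in a few lines; a toy stand-in for the experimental option:

    #include <stddef.h>
    #include <stdio.h>

    static size_t
    slack_compress_size(const unsigned char *buf, size_t len)
    {
        size_t i = len;

        /* scan backwards past the zero tail */
        while (i > 0 && buf[i - 1] == 0)
            i--;
        return (i); /* "compressed" size = last non-zero byte + 1 */
    }

    int
    main(void)
    {
        unsigned char blk[8] = { 'a', 'b', 0, 0, 0, 0, 0, 0 };

        printf("%zu of %zu bytes\n",
            slack_compress_size(blk, sizeof (blk)), sizeof (blk));
        return (0);
    }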
---

Jason Lee <jasonlee@lanl.gov>

ZFS Interface for Accelerators (Z.I.A.)

The ZIO pipeline has been modified to allow for external, alternative implementations of existing operations to be used. The original ZFS functions remain in the code as a fallback in case the external implementation fails.

Definitions:

- Accelerator: an entity (usually hardware) that is intended to accelerate operations
- Offloader: synonym of accelerator; used interchangeably
- Data Processing Unit Services Module (DPUSM): https://github.com/hpc/dpusm - defines a "provider API" for accelerator vendors to set up, defines a "user API" for accelerator consumers to call, and maintains a list of providers and coordinates interactions between providers and consumers
- Provider: a DPUSM wrapper for an accelerator's API
- Offload: moving data from ZFS/memory to the accelerator
- Onload: the opposite of offload

In order for Z.I.A. to be extensible, it does not directly communicate with a fixed accelerator. Rather, Z.I.A. acquires a handle to a DPUSM, which is then used to acquire handles to providers.

Using ZFS with Z.I.A.:

1. Build and start the DPUSM
2. Implement, build, and register a provider with the DPUSM
3. Reconfigure ZFS with '--with-zia=<DPUSM root>'
4. Rebuild and start ZFS
5. Create a zpool
6. Select the provider: zpool set zia_provider=<provider name> <zpool>
7. Select operations to offload: zpool set zia_<property>=on <zpool>

The operations that have been modified are:

- compression (non-raw writes only)
- decompression
- checksum (not handling embedded checksums; checksum compute and checksum error call the same function)
- raidz (generation and reconstruction)
- vdev_file (open, write, close)
- vdev_disk (open, invalidate, write, flush, close)

Successful operations do not bring data back into memory after they complete, allowing subsequent offloader operations to reuse the data. This results in only one data movement per ZIO, at the beginning of the pipeline, which is necessary for getting data from ZFS to the accelerator. When errors occur and the offloaded data is still accessible, the offloaded data will be onloaded (or dropped if it still matches the in-memory copy) for that ZIO pipeline stage and processed with ZFS. This will cause thrashing if a later operation offloads data. This should not happen often, as constant errors (resulting in data movement) are not expected to be the norm. Unrecoverable errors such as hardware failures will trigger pipeline restarts (if necessary) in order to complete the original ZIO using the software path.

The modifications to ZFS can be thought of as two sets of changes:

- The ZIO write pipeline: compression, checksum, RAIDZ generation, and write. Each stage starts by offloading data that was not previously offloaded, which allows ZIOs to be offloaded at any point in the pipeline.
- Resilver: vdev_raidz_io_done (RAIDZ reconstruction, checksum, and RAIDZ generation), and write. Because the core of resilver is vdev_raidz_io_done, data is only offloaded once, at the beginning of vdev_raidz_io_done. Errors cause data to be onloaded, but it will not be re-offloaded in subsequent steps within resilver. Write is a separate ZIO pipeline stage, so it will attempt to offload data.

The zio_decompress function has been modified to allow for offloading, but the ZIO read pipeline as a whole has not, so it is not part of the above list.

An example provider implementation can be found in module/zia-software-provider. The provider's "hardware" is actually software: data is "offloaded" to memory not owned by ZFS. It calls ZFS functions in order to not reimplement operations, and it has kernel module parameters that can be used to trigger ZIA_ACCELERATOR_DOWN states for testing pipeline restarts.

abd_t, raidz_row_t, and vdev_t have each been given an additional "void *<prefix>_zia_handle" member. These opaque handles point to data that is located on an offloader. abds are still allocated, but their payloads are expected to diverge from the offloaded copy as operations are run.

Encryption and deduplication are disabled for zpools with Z.I.A. operations enabled. Aggregation is disabled for offloaded abds. RPMs will build with Z.I.A.

Signed-off-by: Jason Lee <jasonlee@lanl.gov>
Pull-request: #13628 part 1/1
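
A sketch of the offload-with-fallback pattern described above: try the provider, and on failure fall back to the original software path. The provider struct and checksum here are invented for illustration and are not the DPUSM's real interface:

    #include <stddef.h>
    #include <stdio.h>

    struct provider {
        int (*checksum)(const void *buf, size_t len, unsigned long *out);
    };

    static int
    sw_checksum(const void *buf, size_t len, unsigned long *out)
    {
        const unsigned char *p = buf;
        unsigned long sum = 0;

        while (len-- > 0)
            sum = sum * 31 + *p++;
        *out = sum;
        return (0);
    }

    static int
    zia_checksum(struct provider *prov, const void *buf, size_t len,
        unsigned long *out)
    {
        /* offloaded path first, original ZFS path as fallback */
        if (prov != NULL && prov->checksum != NULL &&
            prov->checksum(buf, len, out) == 0)
            return (0);
        return (sw_checksum(buf, len, out));
    }

    int
    main(void)
    {
        unsigned long sum;
        char buf[] = "zio payload";

        /* no provider registered: falls back to the software path */
        zia_checksum(NULL, buf, sizeof (buf), &sum);
        printf("%lx\n", sum);
        return (0);
    }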
---

Brian Atkinson <batkinson@lanl.gov>

Updating based on PR Feedback(6)

1. Updated typo in man page zfs.4.
2. Fixed fat-fingered typing errors in zpl_aio_write().
3. Fixed O_DIRECT spelling typo in zpl_direct_IO_impl().
4. Updated dmu_write_uio_dnode() to issue a write_size based on multiple dn->dn_datablksz chunks at once.
5. Removed empty lines in zfs_write().
6. Returned code back to the same indentation in zfs_get_data().
7. Removed duplicate ASSERT statements in dmu_buf_will_clone_or_dio().
8. Fixed spelling typo of "cause" in a comment in dmu_buf_will_clone_or_dio().
9. Return 0 in FreeBSD zfs_uio_get_pages() when count != nr_pages.
10. Updated FreeBSD zfs_uio_get_dio_pages_alloc() to unhold pages in the event of an error.
11. On Linux, changed zfs_uio_iov_step() to use SET_ERROR() so it matches the FreeBSD implementation.
12. Updated zfs_read() to add back dio_remaining_resid to n in the event of an error.
13. Added an ASSERT in zio_ddt_write() making sure no Direct I/O writes are issued with deduplication, with a comment stating why Direct I/O writes cannot use deduplication.
14. Removed the _KERNEL include guard around zfs_dio_page_aligned(). The proper uio_impl.h or uio.h is included through zfs_context.h.

Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Pull-request: #10018 part 7/7