[schilytools] star hangs randomly during copy from ZFS to ZFS
Lasse Kliemann
lasse at lassekliemann.de
Wed Nov 6 22:44:59 CET 2024
Greg A. Woods on Tue 2024-11(Nov)-05 at 10:43 wrote:
> A hang sounds far more like an OS (i.e. kernel) issue than a userland
> issue. You need to do "ps lx" to see the WCHAN column, and you might
> want to run "fstat -p" against the process as well.
WCHAN: piperd
fstat -p on the one process:
USER CMD    PID FD   MOUNT    INUM MODE            SZ|DV R/W
root star 83577 text /       65859 -rwxr-xr-x     347232 r
root star 83577 ctty /dev      202 crw--w----      pts/3 rw
root star 83577 wd   /test      34 drwxr-xr-x          3 r
root star 83577 root /          34 drwxr-xr-x         32 r
root star 83577 0    /dev      202 crw--w----      pts/3 rw
root star 83577 1    /      117583 -rw-r--r--      82282 w
root star 83577 2    /      117583 -rw-r--r--      82282 w
root star 83577 3*   pipe fffffe00c6bbe888 <-> fffffe00c6bbe9e0 0 rw
root star 83577 4    /test  126762 -rw-r--r-- 1174138880 w
root star 83577 6*   pipe fffffe00dd248158 <-> fffffe00dd248000 0 rw
fstat -p on the other process:
USER CMD    PID FD   MOUNT    INUM MODE            SZ|DV R/W
root star 83659 text /       65859 -rwxr-xr-x     347232 r
root star 83659 ctty /dev      202 crw--w----      pts/3 rw
root star 83659 wd   /data1     34 drwxr-xr-x          5 r
root star 83659 root /          34 drwxr-xr-x         32 r
root star 83659 0    /dev      202 crw--w----      pts/3 rw
root star 83659 1    /      117583 -rw-r--r--      82282 w
root star 83659 2    /      117583 -rw-r--r--      82282 w
root star 83659 3    /data1     34 drwxr-xr-x          5 r
root star 83659 4*   pipe fffffe00c6bbe9e0 <-> fffffe00c6bbe888 0 rw
root star 83659 5*   pipe fffffe00dd248000 <-> fffffe00dd248158 0 rw
root star 83659 6    /data1      2 drwxr-x---          4 r
root star 83659 7    /data1  17364 drwxr-x---         10 r
root star 83659 8    /data1  15908 drwxr-x---        832 r
root star 83659 9    /data1  17548 drwxr-x---          6 r
root star 83659 10   /data1  40958 drwxr-x---       6266 r
root star 83659 11   /data1 185773 -rw-r--r-- 1437608050 r
The fstat outputs seem stable: they do not change over time once the processes are at 0% CPU.
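For anyone who wants to cross-check which descriptors in the two listings are the two ends of the same pipe, here is a minimal Python sketch. It assumes that each pipe line has the form "pipe <own address> <-> <peer address>", as in the output above; the names pipe_ends and match_pipes are my own helpers and not part of fstat or star.

```python
import re

# Matches the pipe lines of the fstat output quoted above, e.g.
#   root star 83577 3* pipe fffffe00c6bbe888 <-> fffffe00c6bbe9e0 0 rw
PIPE_RE = re.compile(
    r"^\S+\s+\S+\s+(\d+)\s+(\d+)\*\s+pipe\s+([0-9a-f]+)\s+<->\s+([0-9a-f]+)"
)

def pipe_ends(fstat_text):
    """Return {own_addr: (pid, fd, peer_addr)} for every pipe line."""
    ends = {}
    for line in fstat_text.splitlines():
        m = PIPE_RE.match(line.strip())
        if m:
            pid, fd, own, peer = m.groups()
            ends[own] = (int(pid), int(fd), peer)
    return ends

def match_pipes(a_text, b_text):
    """Pair descriptors whose own/peer addresses mirror each other."""
    a, b = pipe_ends(a_text), pipe_ends(b_text)
    pairs = []
    for own, (pid, fd, peer) in a.items():
        if peer in b and b[peer][2] == own:
            pairs.append(((pid, fd), (b[peer][0], b[peer][1])))
    return sorted(pairs)

# Pipe lines copied from the two listings above.
FSTAT_A = """\
root star 83577 3* pipe fffffe00c6bbe888 <-> fffffe00c6bbe9e0 0 rw
root star 83577 6* pipe fffffe00dd248158 <-> fffffe00dd248000 0 rw
"""
FSTAT_B = """\
root star 83659 4* pipe fffffe00c6bbe9e0 <-> fffffe00c6bbe888 0 rw
root star 83659 5* pipe fffffe00dd248000 <-> fffffe00dd248158 0 rw
"""

PAIRS = match_pipes(FSTAT_A, FSTAT_B)
for (pid_a, fd_a), (pid_b, fd_b) in PAIRS:
    print(f"pid {pid_a} fd {fd_a} <-> pid {pid_b} fd {fd_b}")
```

With the lines above, this pairs fd 3 of 83577 with fd 4 of 83659, and fd 6 of 83577 with fd 5 of 83659, so both pipes run between the two star processes.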
> One of the things ZFS stresses the most in a system is memory. If it
> gets anywhere near running out of memory, perceived or otherwise, it
> will get stuck. How much memory does your machine have?
8 GB
> I don't know enough about ZFS to know for sure how much memory a
> system might need to have in order to effectively deal with shuffling
> around such large amounts of data, but I'll bet it wants a lot! One
> rule of thumb that's often repeated, but perhaps not so oftenly
> justified, is 1GB RAM per TB of disk, and that's just for the ZFS
> requirements alone -- everything else you run, including the rest of
> the kernel, will add up to more RAM.
There are two large zpools, 25 TB and 32 TB in size, connected via USB. Clearly, my RAM is light-years away from what the rule you cited recommends; this is a budget NAS project. However, with bsdtar I have already transferred about 25 TB from one dataset to the other, and the 'star -c -diff ..' process that checks whether bsdtar got it right has found no issues so far.
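To put a number on that gap, here is the rule of thumb applied to my pools as a quick Python calculation. It is only a sketch: it takes the 1 GB per TB figure at face value and ignores the TB versus TiB distinction; the variable names are my own.

```python
# Rule of thumb quoted above: 1 GB of RAM per TB of pool capacity.
GB_PER_TB = 1
POOLS_TB = [25, 32]   # sizes of the two zpools
INSTALLED_GB = 8      # RAM in this machine

recommended_gb = sum(POOLS_TB) * GB_PER_TB
shortfall_factor = recommended_gb / INSTALLED_GB

print(f"rule of thumb: {recommended_gb} GB, installed: {INSTALLED_GB} GB")
print(f"the rule asks for about {shortfall_factor:.1f}x the installed RAM")
```

So the rule would ask for roughly 57 GB for ZFS alone, about seven times what this machine has.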
According to my latest experiments, the larger the FIFO is for the star copy, the longer it takes on average until the issue occurs. With fs=16m, it copied over 3 TB. I am now starting a run with fs=32m.
Thanks, Lasse