[schilytools] star hangs randomly during copy from ZFS to ZFS

Lasse Kliemann lasse at lassekliemann.de
Wed Nov 6 22:44:59 CET 2024


Greg A. Woods on Tue 2024-11(Nov)-05 at 10:43 wrote:

> A hang sounds far more like an OS (i.e. kernel) issue than a userland
> issue.  You need to do "ps lx" to see the WCHAN column, and you might
> want to run "fstat -p" against the process as well.

WCHAN: piperd

fstat -p on the one process:

USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
root     star       83577 text /         65859 -rwxr-xr-x  347232  r
root     star       83577 ctty /dev        202 crw--w----   pts/3 rw
root     star       83577   wd /test        34 drwxr-xr-x       3  r
root     star       83577 root /            34 drwxr-xr-x      32  r
root     star       83577    0 /dev        202 crw--w----   pts/3 rw
root     star       83577    1 /        117583 -rw-r--r--   82282  w
root     star       83577    2 /        117583 -rw-r--r--   82282  w
root     star       83577    3* pipe fffffe00c6bbe888 <-> fffffe00c6bbe9e0      0 rw
root     star       83577    4 /test    126762 -rw-r--r--  1174138880  w
root     star       83577    6* pipe fffffe00dd248158 <-> fffffe00dd248000      0 rw

fstat -p on the other process:

USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
root     star       83659 text /         65859 -rwxr-xr-x  347232  r
root     star       83659 ctty /dev        202 crw--w----   pts/3 rw
root     star       83659   wd /data1       34 drwxr-xr-x       5  r
root     star       83659 root /            34 drwxr-xr-x      32  r
root     star       83659    0 /dev        202 crw--w----   pts/3 rw
root     star       83659    1 /        117583 -rw-r--r--   82282  w
root     star       83659    2 /        117583 -rw-r--r--   82282  w
root     star       83659    3 /data1       34 drwxr-xr-x       5  r
root     star       83659    4* pipe fffffe00c6bbe9e0 <-> fffffe00c6bbe888      0 rw
root     star       83659    5* pipe fffffe00dd248000 <-> fffffe00dd248158      0 rw
root     star       83659    6 /data1        2 drwxr-x---       4  r
root     star       83659    7 /data1    17364 drwxr-x---      10  r
root     star       83659    8 /data1    15908 drwxr-x---     832  r
root     star       83659    9 /data1    17548 drwxr-x---       6  r
root     star       83659   10 /data1    40958 drwxr-x---    6266  r
root     star       83659   11 /data1   185773 -rw-r--r--  1437608050  r

The fstat outputs seem stable: they do not change over time once the processes are at 0% CPU.
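The two listings above can be cross-referenced by the kernel addresses printed for each pipe end, which shows which descriptors of the two star processes are connected to each other. The following is a small sketch that does this pairing. It assumes fstat's column layout exactly as shown above (USER, CMD, PID, FD followed by "*", then "pipe <this end> <-> <peer end>"). It only identifies the connections. It does not show which side each process is blocked on.

```python
import re

# Pipe lines copied from the two fstat -p listings above (all other lines omitted).
FSTAT_LINES = """\
root     star       83577    3* pipe fffffe00c6bbe888 <-> fffffe00c6bbe9e0      0 rw
root     star       83577    6* pipe fffffe00dd248158 <-> fffffe00dd248000      0 rw
root     star       83659    4* pipe fffffe00c6bbe9e0 <-> fffffe00c6bbe888      0 rw
root     star       83659    5* pipe fffffe00dd248000 <-> fffffe00dd248158      0 rw
"""

# Assumed layout: USER CMD PID FD* pipe <this-end> <-> <peer-end> ...
PIPE_RE = re.compile(
    r"^\S+\s+\S+\s+(\d+)\s+(\d+)\*\s+pipe\s+([0-9a-f]+)\s+<->\s+([0-9a-f]+)"
)

def pair_pipes(text):
    """Return sorted ((pid, fd), (pid, fd)) tuples that share one pipe."""
    ends = {}   # kernel address of a pipe end -> (pid, fd) holding it
    peers = {}  # kernel address of a pipe end -> address of its peer end
    for line in text.splitlines():
        m = PIPE_RE.match(line)
        if m:
            pid, fd, here, there = int(m[1]), int(m[2]), m[3], m[4]
            ends[here] = (pid, fd)
            peers[here] = there
    pairs = set()
    for here, there in peers.items():
        if there in ends:  # both ends visible in the listings
            pairs.add(tuple(sorted((ends[here], ends[there]))))
    return sorted(pairs)

for a, b in pair_pipes(FSTAT_LINES):
    print(f"pid {a[0]} fd {a[1]} <-> pid {b[0]} fd {b[1]}")
```

For the listings above this links fd 3 of 83577 with fd 4 of 83659, and fd 6 of 83577 with fd 5 of 83659. In other words, the two processes are joined by two pipes and by nothing else visible in fstat.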

> One of the things ZFS stresses the most in a system is memory.  If it
> gets anywhere near running out of memory, perceived or otherwise, it
> will get stuck.  How much memory does your machine have?

8 GB

> I don't know enough about ZFS to know for sure how much memory a
> system might need to have in order to effectively deal with shuffling
> around such large amounts of data, but I'll bet it wants a lot!  One
> rule of thumb that's often repeated, but perhaps not so oftenly
> justified, is 1GB RAM per TB of disk, and that's just for the ZFS
> requirements alone -- everything else you run, including the rest of
> the kernel, will add up to more RAM.

There are 2 big zpools, of 25 TB and 32 TB, connected via USB. Clearly, my RAM is light-years away from what the rule you cited recommends; this is a budget NAS project. However, with bsdtar I have already transferred about 25 TB from one dataset to the other, and the 'star -c -diff ..' process that checks whether bsdtar got it right has found no issues so far.
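Applying the quoted rule of thumb (1 GB of RAM per TB of disk, for ZFS alone) to the two pools is simple arithmetic: 25 TB + 32 TB = 57 TB, which works out to roughly 57 GB of RAM by that rule, against the 8 GB installed. The sketch below only restates that calculation. The pool labels are hypothetical, and the rule itself is, as noted above, not necessarily justified.

```python
# Rule of thumb quoted above: about 1 GB RAM per TB of pool, for ZFS alone.
POOLS_TB = {"pool A": 25, "pool B": 32}  # hypothetical labels; sizes from the message
INSTALLED_RAM_GB = 8
GB_PER_TB = 1.0

def rule_of_thumb_ram_gb(pools_tb, gb_per_tb=GB_PER_TB):
    """RAM in GB the 1 GB/TB rule suggests, excluding the rest of the system."""
    return sum(pools_tb.values()) * gb_per_tb

suggested = rule_of_thumb_ram_gb(POOLS_TB)
print(f"rule of thumb: {suggested:.0f} GB, installed: {INSTALLED_RAM_GB} GB "
      f"({INSTALLED_RAM_GB / suggested:.0%} of the suggestion)")
```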

According to my latest experiments, it seems that the larger star's fifo is during the copy, the longer it takes on average until the issue occurs. With fs=16m, it copied over 3 TB. I am now starting a run with fs=32m.

Thanks, Lasse