[schilytools] star hangs randomly during copy from ZFS to ZFS
Nico Sonack
nsonack at herrhotzenplotz.de
Thu Nov 7 17:42:50 CET 2024
On Mon, Nov 04, 2024 at 09:56:12PM UTC, Lasse Kliemann wrote:
Ahoj!
> star -vv -debug -copy -p -xdot -acl -C /data1 . /test
>
> This command hangs, that is, it seemingly stops doing anything after a couple hundred GB have been transferred. Attempted several times. The number of GB copied before this happens is different each time. Output:
>
> star: Block size 20 blocks (10240 bytes).
> star: WARNING: fsync() disabled from '/usr/local/etc/default/star'.
> star: shared memory segment attached at: 82555E000 size 8413184
This is most definitely a fifo hang.
The source code in star/fifo.c says:
* If you ever see a hang in the fifo code, you need to report a stack
* trace for both processes (including the line number of the hanging call
* to swait()) and the relevant state of struct m_head.
You can do this by sending SIGABRT to both processes with
kill -ABRT <pid> and then moving the resulting core files aside.
Then you can throw a debugger at them,
e.g. lldb -c <corefile> $(which star)
and look at the stack traces. Beware that you might have to rebuild
star from source with debug information if you installed it using
pkg(8).
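
The steps above could be scripted roughly as follows. This is only a
sketch: abort_cmds and backtrace_cmd are hypothetical helper names, and
the functions merely print the commands instead of running them, so you
can review them first. The lldb options (--batch, -o 'bt all') are an
assumption about a convenient non-interactive invocation, not something
taken from the star documentation.

```shell
#!/bin/sh
# Sketch: print the commands needed to capture stack traces from hung
# star processes. Nothing is executed here; review the output, then run
# the printed commands by hand.

# Hypothetical helper: emit one "kill -ABRT" line per PID given.
abort_cmds() {
    for ac_pid in "$@"; do
        echo "kill -ABRT $ac_pid"
    done
}

# Hypothetical helper: emit an lldb invocation that prints the
# backtraces of all threads found in a core file.
#   $1 = path to the core file, $2 = path to the star binary
backtrace_cmd() {
    bc_core=$1
    bc_binary=$2
    echo "lldb --batch -o 'bt all' -c $bc_core $bc_binary"
}
```

For example, abort_cmds $(pgrep -x star) prints one kill line per running
star process, and backtrace_cmd star.core "$(which star)" prints the
matching lldb call. The core file name star.core is an assumption about
how your system names core files.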
There's also an option `-no-fifo'. I have no idea whether it actually
works, but it could be a workaround for your problem.
Nico
P.S.: Could we please stop top-posting? It just makes everything difficult
to read.
--
Sent from hades / FreeBSD 14.1-RELEASE
Please remember: https://useplaintext.email/#etiquette
HTML-formatted mail will likely end up in the Spam folder.
PGP Key: https://herrhotzenplotz.de/pgp.txt
PGP Fingerprint: 1A0E E08D 9D3B CFB9 8C85 A76E F373 F525 45A2 8D14
Nico Sonack <nsonack at herrhotzenplotz.de>