[linux-l] Bash bug??? Correction

Sun May 30 13:33:27 CEST 2004

Quoting Lutz Meinert <lutz.meinert at madvedge.de>:
> Hello,
> 
[ ... ]
> 
> But now, with bash-2.05b-305.1 under SuSE 9.1, I am seeing the
> following phenomenon:
> 
> lutz at p10:~/Tmp> touch aa ll zz AAA LLL ZZZ
> lutz at p10:~/Tmp> ls
> aa AAA ll LLL zz ZZZ
> lutz at p10:~/Tmp> rm [a-z]*
> lutz at p10:~/Tmp> ls
> ZZZ
> 
> This deletes not only all files with lowercase letters but also,
> except for "ZZZ", the files with uppercase letters!
> 
> 

[when a UTF character set is configured]

Perhaps the following passages from the bash manual are helpful:

from the section "Shell Variables"

LC_COLLATE
  This variable determines the collation order used
  when sorting the results of pathname expansion, and
  determines the behavior of range expressions, equivalence
  classes, and collating sequences within pathname
  expansion and pattern matching.
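The sorting part of this description can be observed directly. The following is only a small sketch: the scratch directory and the variable names (demo_dir, current_order, c_order) are made up for illustration, and LC_ALL is set instead of LC_COLLATE because LC_ALL, if present in the environment, takes precedence over LC_COLLATE.

```shell
# Scratch directory with the file names from the report above.
orig_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch aa ll zz AAA LLL ZZZ

# Expansion order under whatever locale is currently active
# (the result depends on the system, so no output is shown here).
current_order=$(echo *)

# Expansion order under the C locale: plain byte order, so all
# uppercase names sort before all lowercase names.
c_order=$(LC_ALL=C; echo *)

echo "current locale: $current_order"
echo "C locale:       $c_order"   # AAA LLL ZZZ aa ll zz

# Clean up the scratch directory.
cd "$orig_dir" && rm -rf -- "$demo_dir"
```

If the two lines differ on your system, the active locale collates differently from the C locale, and range expressions may be affected in the same way.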

and from the section "Pattern Matching"

[...]  Matches any one of the enclosed characters.  A pair of characters
       separated by a hyphen denotes a range expression; any character that
       sorts between those two characters, inclusive, using the current
       locale's collating sequence and character set, is matched.  If the
       first character following the [ is a ! or a ^ then any character not
       enclosed is matched.  The sorting order of characters in range
       expressions is determined by the current locale and the value of the
       LC_COLLATE shell variable, if set.  A - may be matched by including it
       as the first or last character in the set.  A ] may be matched by
       including it as the first character in the set.

       Within [ and ], character classes can  be  specified  using  the  syntax
       [:class:],  where  class  is one of the following classes defined in the
       POSIX.2 standard:
       alnum alpha ascii blank cntrl digit graph lower print punct space  upper
       word xdigit
       A  character  class  matches any character belonging to that class.  The
       word character class matches letters, digits, and the character _.

       Within [ and ], an equivalence class can be specified using  the  syntax
       [=c=],  which  matches all characters with the same collation weight (as
       defined by the current locale) as the character c.

       Within [ and ], the syntax [.symbol.] matches the collating symbol
       symbol.
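The character classes from this passage work anywhere the shell does pattern matching, not only in pathname expansion. A minimal sketch using a case statement; the function name classify is hypothetical and only serves the illustration.

```shell
# Report which character class the first character of a word belongs to.
classify() {
  case $1 in
    [[:upper:]]*) echo upper ;;
    [[:lower:]]*) echo lower ;;
    [[:digit:]]*) echo digit ;;
    *)            echo other ;;
  esac
}

classify AAA
classify aa
classify 9lives
classify _tmp
```

Unlike a range such as [a-z], the class asks whether a character is a lowercase letter, so the result for plain letters does not depend on where the locale places them in its collating sequence.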

My point is that [a-z] and [A-Z] do not automatically cover the lowercase
or uppercase letters; they denote a range in the collating sequence of the
underlying character set. In the listing above the files sort as
aa AAA ll LLL zz ZZZ, so only ZZZ lies beyond z and survives. That range
can change with the configured character set, especially when moving from
an 8-bit character set to UTF. This behavior can be controlled by setting
LC_COLLATE (don't ask me how).
In scripts I think it is better to use the character classes upper and
lower.
That would be

ls -l [[:upper:]]* 

or

ls -l [[:lower:]]*

Maybe give that a try.
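For the rm case from the original report, a rough sketch of two ways to delete only the lowercase files. The scratch directory and the variable names (demo_dir, left_after_class, left_after_range) are made up for illustration, and the second variant assumes that switching to the C locale is acceptable for the script in question.

```shell
# Scratch directory with the file names from the report.
orig_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch aa ll zz AAA LLL ZZZ

# Variant 1: character class instead of a range.
rm -- [[:lower:]]*
left_after_class=$(LC_ALL=C; echo *)

# Variant 2: keep the range, but expand it under the C locale.
# The assignment is its own command inside the subshell, so it is
# in effect before the shell expands [a-z]* for rm.
touch aa ll zz
( LC_ALL=C; rm -- [a-z]* )
left_after_range=$(LC_ALL=C; echo *)

echo "after [[:lower:]]*:       $left_after_class"   # AAA LLL ZZZ
echo "after [a-z]* in C locale: $left_after_range"   # AAA LLL ZZZ

# Clean up the scratch directory.
cd "$orig_dir" && rm -rf -- "$demo_dir"
```

As far as I understand it, the prefix form "LC_ALL=C rm [a-z]*" is not a reliable substitute for variant 2, because the shell expands the pattern before the assignment is applied to the command.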

Regards
Thomas