Bulk file renaming

Table of contents

  1. Statement of the renaming problem

  2. Solving the problem with Perl

  3. Solving the problem with R

  4. Turning the script into a command

1. Statement of the renaming problem

We regularly have to rename a whole series of files, but the Unix mv command cannot do this with wildcard patterns: the shell expands both patterns before mv runs, so mv only receives a list of existing file names and treats the last one as the target. For example, suppose we have files whose names end in _001.gz and we want them to end in .gz only, that is, to remove the _001 part from the file names. A naive attempt shows the problem:
     $gh > ls *gz | sort
     
     fichierA_001.gz
     fichierB_001.gz
     fichierC_001.gz
     
     $gh > mv *_001.gz *.gz
     mv: target 'fichierC_001.gz' is not a directory
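
To see why this fails, it can help to display what the shell actually hands to mv. The sketch below is only an illustration: show_args is a hypothetical helper, not part of the original scripts, which prints its arguments as any command would receive them after wildcard expansion.

```shell
#!/bin/sh
# Illustration only: show_args is a hypothetical helper that lists
# the arguments a command receives once the shell has expanded wildcards.
show_args() {
  echo "$# arguments:"
  for arg in "$@"; do
    echo "  $arg"
  done
}

# In the directory holding the three example files, one would type:
#   show_args *_001.gz *.gz
```

With the three files above, both patterns match the same three names, so the command receives six existing file names and no new name at all; mv then expects the last one to be a directory.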
     
     
The idea is therefore to have a renomme command that does this job. Here is what we want to obtain:
     $gh > ls *gz | sort
     fichierA_001.gz
     fichierB_001.gz
     fichierC_001.gz
     
     $gh > renomme _001.gz .gz
     
     $gh > ls *gz | sort
     fichierA.gz
     fichierB.gz
     fichierC.gz
     
There is no major difficulty in performing this renaming: a simple loop over the files concerned is enough. However, to avoid going through a Unix shell script, and to provide code that is easy to port to Windows, we propose a Perl solution and an R solution, which also gives a template for processing a list of files in these languages.
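
For comparison, the simple loop mentioned above might look as follows in a POSIX shell. This is a sketch under assumptions, not the solution retained on this page: the function name rename_suffix is hypothetical, and it only handles the case where BEFORE sits at the end of the file name.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original page): replace the
# suffix BEFORE by AFTER for every matching file in the current
# directory, without overwriting an existing target.
rename_suffix() {
  before=$1   # e.g. _001.gz
  after=$2    # e.g. .gz
  for file in *"$before"; do
    [ -e "$file" ] || continue        # the pattern matched nothing
    stem=${file%"$before"}            # strip the literal suffix
    new=$stem$after
    if [ -e "$new" ]; then
      echo "File $new already exists, file $file will not be renamed."
    else
      mv -- "$file" "$new" && echo "File $file has been renamed $new"
    fi
  done
}
```

It would be called as rename_suffix _001.gz .gz from the directory that holds the files.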

2. Solving the problem with Perl

The script takes only a few dozen lines because, roughly speaking, once the renaming parameters (say before and after) have been read, the processing loop should look like:


     foreach my $file (sort @files) {
       [...]
       rename($file,$newFile) ;
     } ; # end of foreach

In practice it is slightly more complicated if we want to do things properly. Perl's rename function overwrites an existing target file. Since accepting this behaviour does not seem prudent to us, we prefer to test whether the target file exists. If it does, the renaming is skipped for that file. Here is the Perl code, which can be downloaded as the file renomme.pl:


     # # (gH)   -_-  renomme.pl  ;  TimeStamp (unix) : 22 April 2016 around 17:13
     
     #################################################################################
     #
     # this script renames all files corresponding to $before
     # by replacing $before with $after.
     #
     # for instance with $before = "_001.fastq" and $after = ".fastq"
     # the files Aba1_S1_L001_R1_001.fastq, Xa1_S2_L001_R2_001.fastq...
     # will be renamed Aba1_S1_L001_R1.fastq, Xa1_S2_L001_R2.fastq
     #
     #################################################################################
     
     use strict ;
     use warnings ;
     
     if ($#ARGV==-1) {
       print "\n" ;
       print " the script renomme.pl renames all files corresponding to the substring BEFORE \n" ;
       print " by replacing BEFORE with AFTER\n\n" ;
       print "syntax   : perl renomme.pl BEFORE AFTER      \n" ;
       print "examples : perl renomme.pl _001.fastq .fastq \n" ;
       print "           perl renomme.pl _XX1.fastq R1.fastq \n" ;
       print "\n\n" ;
       exit(-1) ;
     } ; # end if
     
     if ($#ARGV==0) {
       print "Argument 2 is missing.\n" ;
       print "Syntax is  : perl renomme.pl BEFORE AFTER \n" ;
       exit(-1) ;
     } ; # end if
     
     my $before = $ARGV[0] ;
     my $after  = $ARGV[1] ;
     my @files  = glob("*".$before."*")  ;
     
     # warn if no file corresponds to the specification
     
     my $nbf = @files ;
     if ($nbf==0) {
       print("\nAlas, no files corresponds to the specification \"$before\". Nothing to do !\n\n") ;
     } else {
       my $msg = "Renaming $nbf file(s) corresponding to the specification \"$before\"." ;
       my $sou = "=" x length($msg) ;
       print "\n".$msg."\n".$sou."\n\n" ;
     } ; # end if
     
     foreach my $file (sort @files) {
       my $newFile =  $file ;
       # \Q...\E makes $before match literally (a "." is then not a regex wildcard)
       $newFile    =~ s/\Q$before\E/$after/ ;
       if ($file ne $newFile) {
          if (-e $newFile) {
             print("File $newFile already exists, file $file will not be renamed.\n") ;
          } else {
             if (rename($file,$newFile)) {
               print("File $file has been renamed $newFile\n") ;
             } else {
               print("Cannot rename file $file as $newFile\n") ;
             } ; # end if
          } ; # end if
       } ; # end if
     } ; # end of foreach
     
     # end of script

Let us check how the script behaves:


     $gh > ls *gz | sort
     fichierA_001.gz
     fichierB_001.gz
     fichierC_001.gz
     
     $gh > perl renomme.pl _001.gz .gz
     
     Renaming 3 file(s) corresponding to the specification "_001.gz".
     ================================================================
     
     File fichierA_001.gz has been renamed fichierA.gz
     File fichierB_001.gz has been renamed fichierB.gz
     File fichierC_001.gz has been renamed fichierC.gz
     
     $gh > ls *gz | sort
     fichierA.gz
     fichierB.gz
     fichierC.gz
     
     $gh > perl renomme.pl .gz _001.gz # back to the original file names
     [...]
     
     $gh > echo > fichierA.gz
     
     $gh > perl renomme.pl _001.gz .gz # one target file now already exists
     
     Renaming 3 file(s) corresponding to the specification "_001.gz".
     ================================================================
     
     File fichierA.gz already exists, file fichierA_001.gz will not be renamed.
     File fichierB_001.gz has been renamed fichierB.gz
     File fichierC_001.gz has been renamed fichierC.gz
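
A side note on patterns: when a file-name fragment such as _001.gz is handed to a regular-expression engine (Perl's s/// or R's gsub without fixed=TRUE, for instance), the dot matches any character, not only a literal dot. The sketch below illustrates the difference with grep rather than with the scripts themselves; matches_regex and matches_fixed are hypothetical helper names used only for this demonstration.

```shell
#!/bin/sh
# Illustration only: the same fragment used as a regular expression
# and as a fixed string, on a name that contains no literal ".gz".
matches_regex() { printf '%s\n' "$1" | grep -q  -- "$2"; }   # $2 is a regular expression
matches_fixed() { printf '%s\n' "$1" | grep -qF -- "$2"; }   # $2 is taken literally

matches_regex fichierA_001xgz _001.gz && echo "regex: match"
matches_fixed fichierA_001xgz _001.gz || echo "fixed: no match"
```

For ordinary names the two readings give the same result; quoting the pattern (\Q...\E in Perl, fixed=TRUE in R) simply removes the ambiguity.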

3. Solving the problem with R

The R script is as simple as the Perl code because the functions used in Perl have direct equivalents in R. Here is the R code, which can be downloaded as the file renomme.r:


     # # (gH)   -_-  renomme.r  ;  TimeStamp (unix) : 22 April 2016 around 18:12
     
     #################################################################################
     #
     # this script renames all files corresponding to the variable before
     # by replacing before with the contents of the variable after.
     #
     # for instance with before = "_001.fastq" and after = ".fastq"
     # the files Aba1_S1_L001_R1_001.fastq, Xa1_S2_L001_R1_001.fastq...
     # will be renamed Aba1_S1_L001_R1.fastq, Xa1_S2_L001_R1.fastq
     #
     #################################################################################
     
     args   <- commandArgs(trailingOnly=TRUE)
     nbArgs <- length(args)
     
     if (nbArgs==0) {
       cat( "\n" ) ;
       cat( " the script renomme.r renames all files corresponding to the substring BEFORE \n") ;
       cat( " by replacing BEFORE with AFTER\n\n") ;
       cat( "syntax   : Rscript --vanilla renomme.r BEFORE AFTER      \n") ;
       cat( "examples : Rscript --vanilla renomme.r _001.fastq .fastq \n") ;
       cat( "           Rscript --vanilla renomme.r _XX1.fastq R1.fastq \n") ;
       cat( "\n\n" ) ;
       quit(save="no",status=1) ; # exit with an error status (stop() expects a message, not a code)
     } ; # end if
     
     if (nbArgs==1) {
       cat("Argument 2 is missing.\n") ;
       cat("Syntax is  : Rscript --vanilla renomme.r BEFORE AFTER \n") ;
       quit(save="no",status=2) ; # exit with an error status (stop() expects a message, not a code)
     } ; # end if
     
     before <- args[1]
     after  <- args[2]
     # dir() takes a regular expression, not a shell wildcard, so we filter the names literally
     files  <- grep(pattern=before,x=dir(path="."),value=TRUE,fixed=TRUE)  ;
     nbf    <- length(files)
     
     # warn if no file corresponds to the specification
     
     if (nbf==0) {
       cat("\nAlas, no files corresponds to the specification \"",before,"\". Nothing to do !\n\n",sep="") ;
     } else {
       msg  <- paste("Renaming ",nbf," file(s) corresponding to the specification \"",before,"\".",sep="") ;
       sou  <- paste(rep(x="=",times=nchar(msg)),collapse="") ;
       cat("\n",msg,"\n",sou,"\n\n",sep="") ;
     } ; # end if
     
     for (file in sort(files)) {
       # fixed=TRUE: before is matched literally, not as a regular expression
       newFile  <-  gsub(x=file,pattern=before,replacement=after,fixed=TRUE)
       if (newFile!=file) {
          if (file.exists(newFile)) {
            cat("File ",newFile," already exists, file ",file," will not be renamed.\n",sep="") ;
          } else {
             if (file.rename(file,newFile)) {
               cat("File ",file," has been renamed ",newFile,"\n",sep="") ;
             } else {
               cat("Cannot rename file ",file," as ",newFile,"\n",sep="") ;
             } # end if
          } # end if
       } # end if
     } ; # end of foreach
     
     # end of script

Let us check how the script behaves:


     $gh > ls *gz | sort
     fichierA_001.gz
     fichierA.gz
     fichierB_001.gz
     fichierC_001.gz
     
     $gh > Rscript --vanilla renomme.r .tgz _001.tgz
     Alas, no files corresponds to the specification ".tgz". Nothing to do !
     
     $gh > Rscript --vanilla renomme.r _001.gz .gz
     
     Renaming 3 file(s) corresponding to the specification "_001.gz".
     ================================================================
     
     File fichierA.gz already exists, file fichierA_001.gz will not be renamed.
     File fichierB_001.gz has been renamed fichierB.gz
     File fichierC_001.gz has been renamed fichierC.gz
     
     $gh > ls *gz | sort
     fichierA_001.gz
     fichierA.gz
     fichierB.gz
     fichierC.gz

Note: we used Rscript to run the script. It would also be possible to go through the R executable, at the cost of a slight change to the parameters. Thus, instead of the call


     Rscript --vanilla renomme.r _001.gz .gz

we would write


     R --vanilla --slave --no-save --file=renomme.r --args _001.gz .gz

4. Turning the script into a command

To avoid copying the script renomme.pl or renomme.r into the current directory, the simplest approach is to put it in a directory reachable through the PATH and to pair it with a shell file, say renomme.sh, which contains only the call to the script. The renomme command is then a plain text file whose content is sh renomme.sh $*, made executable with chmod +x renomme.

For example, in our environment, the directory ~/Bin holds all our personal scripts and executables. Here is how it all works:
     $gh > ls ~/Bin/renomme* | sort
     /home/gh/Bin/renomme
     /home/gh/Bin/renomme.pl
     /home/gh/Bin/renomme.r
     /home/gh/Bin/renomme.sh
     
     $gh > cat ~/Bin/renomme
     sh ~/Bin/renomme.sh $*
     
     $gh > cat ~/Bin/renomme.sh
     perl ~/Bin/renomme.pl $*
     
     $gh > renomme
     
     the script renomme renames all files corresponding to the substring BEFORE
     by replacing BEFORE with AFTER
     
     syntax   : renomme BEFORE AFTER
     examples : renomme _OO1.fastq .fastq
                renomme _XX1.fastq R1.fastq
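
One detail of these wrappers deserves a note: an unquoted $* splits again any argument that contains a space. For fragments such as _001.gz this makes no difference, but a variant using "$@" would forward each argument unchanged. The sketch below illustrates the difference; count_args, forward_star and forward_at are hypothetical names used only for this demonstration.

```shell
#!/bin/sh
# Illustration only: count_args stands in for the real script.
count_args() { echo $#; }

forward_star() { count_args $*; }     # as in the wrappers: arguments are re-split on spaces
forward_at()   { count_args "$@"; }   # each argument is passed on intact

forward_star "_001 copy.gz" ".gz"     # prints 3
forward_at   "_001 copy.gz" ".gz"     # prints 2
```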
     
     
Why such a complicated setup instead of simply making renomme.pl executable? The answer lies in the redirection of inputs and outputs. If we want to keep a record of what is renamed, without touching the script, it is easy to modify the shell file and the command. Thus, with the new files renomme2 and renomme2.sh defined by
     $gh > cat ~/Bin/renomme2
     sh ~/Bin/renomme2.sh $* | tee renommage.lst
     echo the renaming log is in the file renommage.lst
     
     $gh > cat ~/Bin/renomme2.sh
     date
     perl ~/Bin/renomme.pl $*
     date
     
the renomme2 command works as before, and we also get a log file named renommage.lst in the current directory that records what happened. Going through tee gives output both on screen and in a file, which is handy when there are hundreds of files to rename at once, as happens to us from time to time.
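
The same logging idea can be sketched as a single shell function. This is only an assumption-laden variant, not the setup used above: run_logged is a hypothetical name, and the log file is passed as the first argument instead of being fixed.

```shell
#!/bin/sh
# Hypothetical variant of renomme2 (not part of the original page):
# run_logged LOGFILE COMMAND [ARGS...] runs the command between two
# timestamps and sends its output both to the screen and to LOGFILE.
run_logged() {
  log=$1
  shift                      # the remaining arguments are the command to run
  {
    date
    "$@"
    date
  } | tee "$log"
  echo "the renaming log is in the file $log"
}
```

It could be called, for instance, as run_logged renommage.lst perl ~/Bin/renomme.pl _001.gz .gz. As with the files above, only standard output goes through tee.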
