Thursday, 16 February 2012

Rename a batch of files from Pattern

I recently needed to rename a set of files matching a particular pattern, which led me to a wonderful Unix command: rename. On my system it is actually a tool written in Perl (rename is just a symbolic link to prename), which you can easily verify:

$ which rename
$ ls -Alt /usr/bin/rename
lrwxrwxrwx 1 root root 24 2011-08-19 14:35 /usr/bin/rename -> /etc/alternatives/rename
$ ls -Alt /etc/alternatives/rename
lrwxrwxrwx 1 root root 16 2011-08-19 14:35 /etc/alternatives/rename -> /usr/bin/prename

In my particular case, I had a bunch of files all prefixed with the same text. Say I have a set of text files from which I want to remove the prefix:

$ touch abc-1.txt abc-2.txt abc-3.txt abc-4.txt abc-5.txt abc-6.txt abc-7.txt abc-8.txt abc-9.txt

The command takes two main arguments: 1. a Perl expression, usually a substitution, describing what to match and what to replace it with, and 2. the files to rename, typically given as a shell glob on the extension, e.g. *.jpg. The example above doesn't really use the full power of regular expressions; nonetheless, the prefix can easily be removed like so:

$ rename 's/abc-//' *.txt
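
Not every system ships the Perl version of rename; some distributions install a different tool under the same name, with a different argument syntax. As a rough fallback for this simple prefix case, a plain shell loop can do the same job. This is only a sketch: it runs in a scratch directory created with mktemp (my addition, so it cannot touch real files) and handles a literal prefix only, not a general regular expression.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch cannot touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch abc-1.txt abc-2.txt abc-3.txt

# Strip the literal prefix "abc-" from every matching file name.
for f in abc-*.txt; do
    # ${f#abc-} expands to $f with a leading "abc-" removed.
    mv -- "$f" "${f#abc-}"
done

ls
```

The loop leaves 1.txt, 2.txt and 3.txt behind, the same names the rename command above would produce for those files.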

So what does "s/abc-//" actually mean? Since it is a regular expression substitution, the best bet is to look at the Perl regular expression reference:
s/// The substitution operator. See Regexp Quote-Like Operators in perlop.
So basically, the first part is the pattern to search for, and the second part is what to replace the match with; here the replacement is empty, so the prefix is simply deleted. Another obvious use is renaming photographs. Typically, mine come out as DSCS0001.jpg through DSCS0999.jpg. Whilst a plain substitution won't renumber the files to close gaps, i.e. 1, 2, 3, 4 instead of 1, 2, 5, 6 (in case some files were removed), you can swap DSCS for a more meaningful prefix like Holiday2012 and remove the leading zeros that you don't need. Parentheses in the search pattern create capturing groups, and the replacement can refer to them as $1, $2 and so on, numbered by their position in the pattern; see the section on capturing groups in the regular expression reference. So, a test case, for 10 digital photographs:

$ touch DSCS0001.jpg DSCS0002.jpg DSCS0003.jpg DSCS0004.jpg DSCS0005.jpg DSCS0006.jpg DSCS0007.jpg DSCS0008.jpg DSCS0009.jpg DSCS0010.jpg

# At this point, we can see there are 2 leading 0's we need to remove

$ rename -n 's/DSCS\d{2}(\d{2})/Holiday2012_$1/' *.jpg
DSCS0001.jpg renamed as Holiday2012_01.jpg
DSCS0002.jpg renamed as Holiday2012_02.jpg
DSCS0003.jpg renamed as Holiday2012_03.jpg
DSCS0004.jpg renamed as Holiday2012_04.jpg
DSCS0005.jpg renamed as Holiday2012_05.jpg
DSCS0006.jpg renamed as Holiday2012_06.jpg
DSCS0007.jpg renamed as Holiday2012_07.jpg
DSCS0008.jpg renamed as Holiday2012_08.jpg
DSCS0009.jpg renamed as Holiday2012_09.jpg
DSCS0010.jpg renamed as Holiday2012_10.jpg

# Or, if you don't want to keep any leading 0's, an alternative would be:

$ rename -n 's/DSCS[0]+(\d+)/Holiday2012_$1/' *.jpg
DSCS0001.jpg renamed as Holiday2012_1.jpg
DSCS0002.jpg renamed as Holiday2012_2.jpg
DSCS0003.jpg renamed as Holiday2012_3.jpg
DSCS0004.jpg renamed as Holiday2012_4.jpg
DSCS0005.jpg renamed as Holiday2012_5.jpg
DSCS0006.jpg renamed as Holiday2012_6.jpg
DSCS0007.jpg renamed as Holiday2012_7.jpg
DSCS0008.jpg renamed as Holiday2012_8.jpg
DSCS0009.jpg renamed as Holiday2012_9.jpg
DSCS0010.jpg renamed as Holiday2012_10.jpg

The -n flag used above is just for testing: it shows what would happen without actually renaming anything, and you drop it to perform the rename for real. In the end, renaming like this probably isn't strictly necessary, as the files would normally reside in their own appropriately named folders, but it serves as an example. I suppose it is also useful if you are sharing individual files with someone, to ensure file names are unique.
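
The substitutions shown here keep whatever numbers the camera assigned, so gaps left by deleted photos remain. One possible way to close them is a small shell loop with a counter. This is a sketch under my own assumptions: it works in a scratch directory, uses four sample files with a gap, relies on the glob's sorted order, and pads the new numbers to two digits.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch cannot touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample set with a gap: photos 3 and 4 were deleted.
touch DSCS0001.jpg DSCS0002.jpg DSCS0005.jpg DSCS0006.jpg

n=1
# The glob expands in sorted order, so the original sequence is kept.
for f in DSCS*.jpg; do
    new=$(printf 'Holiday2012_%02d.jpg' "$n")
    mv -- "$f" "$new"
    n=$((n + 1))
done

ls
```

Afterwards the directory holds Holiday2012_01.jpg through Holiday2012_04.jpg with no gaps.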