Thursday 16 February 2012

Rename a batch of files from Pattern

The other day I needed to rename a set of files matching a particular pattern, which led me to this wonderful Unix command: rename. It is actually a tool written in Perl (on Ubuntu, rename is just a symbolic link that resolves to prename), which you can easily confirm:

$ which rename
/usr/bin/rename
$ ls -Alt /usr/bin/rename
lrwxrwxrwx 1 root root 24 2011-08-19 14:35 /usr/bin/rename -> /etc/alternatives/rename
$ ls -Alt /etc/alternatives/rename
lrwxrwxrwx 1 root root 16 2011-08-19 14:35 /etc/alternatives/rename -> /usr/bin/prename

In my particular case, I had a bunch of files all prefixed with the same text. So just say I have a bunch of text files that I want to remove the prefix from:

$ touch abc-1.txt abc-2.txt abc-3.txt abc-4.txt abc-5.txt abc-6.txt abc-7.txt abc-8.txt abc-9.txt

The main arguments of the command are: 1. a Perl expression (usually a substitution) describing what to match and what to replace it with, and 2. the files to rename, typically given as a shell glob on the extension, e.g. *.jpg. The example above doesn't really use the full power of regular expressions; nonetheless, the prefix can easily be removed like so:

rename 's/abc-//' *.txt
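For comparison, on a machine where the Perl rename is not installed, the same prefix removal can be sketched with a plain shell loop. This is only an illustration: strip_prefix is a made-up helper name, and the demo runs in a scratch directory created with mktemp so that no real files are touched.

```shell
# strip_prefix is a hypothetical helper, not part of rename.
# $1: prefix to remove, $2: directory holding the files
strip_prefix() {
    prefix=$1
    dir=$2
    for f in "$dir"/"$prefix"*; do
        [ -e "$f" ] || continue      # the glob matched nothing
        name=${f##*/}                # file name without the directory part
        new=${name#"$prefix"}        # same name with the prefix dropped
        mv -- "$f" "$dir/$new"
    done
}

# Demo in a scratch directory.
demo=$(mktemp -d)
touch "$demo/abc-1.txt" "$demo/abc-2.txt" "$demo/abc-3.txt"
strip_prefix abc- "$demo"
ls "$demo"    # lists 1.txt, 2.txt and 3.txt
```

Unlike rename, the loop only handles a fixed prefix; anything needing a real regular expression is better left to rename itself.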

So what does "s/abc-//" actually mean? Since it is a Perl substitution, the best bet is to look at the Perl reference: http://perldoc.perl.org/functions/s.html
s/// The substitution operator. See Regexp Quote-Like Operators in perlop.
So basically, the first part is the pattern to search for, and the second part is what to replace the matched text with.

The other obvious example I've seen is renaming photographs. Typically, mine come out as DSCS0001.jpg through DSCS0999.jpg. Whilst you can't exactly do arithmetic to make sure the file names are consecutive, i.e. 1, 2, 3, 4, 5, 6 instead of 1, 2, 5, 6 (in case some files were removed), you could use a more meaningful prefix than DSCS, such as Holiday2012, and remove the leading 0's that you don't need. In the replacement, $1, $2 and so on refer to the text captured by the first, second, etc. parenthesised group in the pattern, so you can build the new file name from pieces of the old one; see the section on capturing groups in the regular expression reference. So, a test case, for 10 digital photographs:

$ touch DSCS0001.jpg DSCS0002.jpg DSCS0003.jpg DSCS0004.jpg DSCS0005.jpg DSCS0006.jpg DSCS0007.jpg DSCS0008.jpg DSCS0009.jpg DSCS0010.jpg

# At this point, we can see there are 2 leading 0's we need to remove

$ rename -n 's/DSCS\d{2}(\d{2})/Holiday2012_$1/' *.jpg
DSCS0001.jpg renamed as Holiday2012_01.jpg
DSCS0002.jpg renamed as Holiday2012_02.jpg
DSCS0003.jpg renamed as Holiday2012_03.jpg
DSCS0004.jpg renamed as Holiday2012_04.jpg
DSCS0005.jpg renamed as Holiday2012_05.jpg
DSCS0006.jpg renamed as Holiday2012_06.jpg
DSCS0007.jpg renamed as Holiday2012_07.jpg
DSCS0008.jpg renamed as Holiday2012_08.jpg
DSCS0009.jpg renamed as Holiday2012_09.jpg
DSCS0010.jpg renamed as Holiday2012_10.jpg

# Or, if you don't want to keep any leading 0's, an alternative would be:

$ rename -n 's/DSCS[0]+(\d+)/Holiday2012_$1/' *.jpg
DSCS0001.jpg renamed as Holiday2012_1.jpg
DSCS0002.jpg renamed as Holiday2012_2.jpg
DSCS0003.jpg renamed as Holiday2012_3.jpg
DSCS0004.jpg renamed as Holiday2012_4.jpg
DSCS0005.jpg renamed as Holiday2012_5.jpg
DSCS0006.jpg renamed as Holiday2012_6.jpg
DSCS0007.jpg renamed as Holiday2012_7.jpg
DSCS0008.jpg renamed as Holiday2012_8.jpg
DSCS0009.jpg renamed as Holiday2012_9.jpg
DSCS0010.jpg renamed as Holiday2012_10.jpg

We use the -n flag just for testing: it shows what would be renamed without actually touching the files. Drop it to perform the rename. In the end, this renaming probably isn't necessary, as the files would normally reside in their appropriate folders, but it serves as an example. I suppose it is also handy if you are sharing individual files with someone, to ensure file name uniqueness.
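If you want to experiment with a capture group without any files on disk, the same substitution can be tried on plain strings with sed. This is a rough sketch rather than a replacement for rename -n: preview_names is a made-up helper name, and because sed -E does not understand Perl's \d, [0-9] is used in its place and the group is referenced as \1 instead of $1.

```shell
# preview_names is a hypothetical helper: it reads file names on stdin
# and prints what the substitution would turn them into. Nothing is renamed.
preview_names() {
    sed -E 's/DSCS[0-9]{2}([0-9]{2})/Holiday2012_\1/'
}

printf '%s\n' DSCS0001.jpg DSCS0010.jpg | preview_names
# prints:
# Holiday2012_01.jpg
# Holiday2012_10.jpg
```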

Wednesday 11 January 2012

Ubuntu LabKey Upgrade Guide

LabKey has pretty thorough installation steps on their website, but I just want to document the process of following the manual upgrade guide on Ubuntu 10.04. This guide assumes you are logged in as a local user and have downloaded the setup files to your home directory. It also assumes Tomcat has been installed through the repositories, which places the Tomcat binaries in /usr/share/tomcat6, configuration in /etc/tomcat6, and LabKey logs in /usr/share/tomcat6/logs.

First off, to avoid any issues, stop tomcat.

sudo service tomcat6 stop

Step 1. Untar the tarball

tar zxvf LabKey11.3-*-bin.tar.gz && cd LabKey11.3-*-bin

Step 2. Create a backup directory to move the existing LabKey files into

cd /usr/local/labkey
sudo mkdir backup2
sudo mv labkeywebapp/ backup2/
sudo mv modules/ backup2/
sudo cp /etc/tomcat6/Catalina/localhost/labkey.xml backup2/
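If you upgrade regularly, the backup step can be wrapped in a small function so that each backup lands in its own dated directory. This is a minimal sketch under a few assumptions: backup_labkey is a made-up name, the backup-YYYYMMDD directory naming is my own choice rather than anything LabKey prescribes, and on a real install it would need to run as root.

```shell
# backup_labkey is a hypothetical helper.
# $1: LabKey install directory, $2: path to the Tomcat context file (labkey.xml)
backup_labkey() {
    labkey_home=$1
    context_xml=$2
    backup_dir="$labkey_home/backup-$(date +%Y%m%d)"   # assumed naming scheme
    mkdir -p "$backup_dir" || return 1
    # Move the old webapp and modules out of the way.
    mv "$labkey_home/labkeywebapp" "$labkey_home/modules" "$backup_dir"/ || return 1
    # Keep a copy of the current context file alongside them.
    cp "$context_xml" "$backup_dir"/ || return 1
    printf '%s\n' "$backup_dir"                         # report where the backup went
}
```

With the paths used in this post, the call would be backup_labkey /usr/local/labkey /etc/tomcat6/Catalina/localhost/labkey.xml.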

Step 3. Move the new files into the current directory

sudo cp -rd ~/LabKey11.3-*-bin/labkeywebapp/ ./
sudo cp -rd ~/LabKey11.3-*-bin/modules/ ./

NB: The installation guide also suggests some MS1 and MS2 third-party binaries, but I haven't bothered with those. To avoid any permission issues, change the owner of the directories just copied across back to the tomcat user (tomcat6):

sudo chown -R tomcat6:tomcat6 labkeywebapp/ modules/

If not already done, install graphviz

sudo apt-get install graphviz

Step 4. Copy the library files into the lib folder in tomcat, replacing any that already exist.

cd ~/LabKey11.3-*-bin/
sudo cp -i common-lib/* server-lib/* /usr/share/tomcat6/lib/

Step 5. Copy the labkey.xml file to the tomcat configuration directory

cd ~/LabKey11.3-*-bin
sudo cp -i labkey.xml /etc/tomcat6/Catalina/localhost/labkey.xml

Then it's just a matter of updating the labkey.xml file. Things to update:

Point the docBase to the labkeywebapp folder (line 1): <Context docBase="/usr/local/labkey/labkeywebapp" debug="0" reloadable="true" crossContext="true">
Update the username and password in the Resource for the labkey database server
Update the mail server configuration
Re-add any other resources that may be needed, e.g. external data sources

Finally, start tomcat

sudo service tomcat6 start
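As a sanity check on the edited labkey.xml, it can be worth confirming that the docBase really points at a directory that exists. The following is a rough sketch: docbase_of and check_docbase are made-up helper names, and the sed expression assumes the docBase attribute sits on the same line as the opening Context tag, as in the example above.

```shell
# docbase_of is a hypothetical helper: prints the docBase value of a context file.
docbase_of() {
    sed -n 's/.*<Context[^>]*docBase="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# check_docbase is a hypothetical helper: succeeds only if docBase is an existing directory.
check_docbase() {
    dir=$(docbase_of "$1")
    [ -n "$dir" ] && [ -d "$dir" ]
}
```

For example, check_docbase /etc/tomcat6/Catalina/localhost/labkey.xml || echo "docBase is missing or wrong" would flag a mistyped path.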

Tuesday 3 January 2012

Analysing TCP Traffic

There are often times when you want to analyze TCP network traffic to see what is actually being sent over the network at a lower level. There are a few nifty tools around that can do this. Graphically, Wireshark is one. However, I prefer to just use a command-line tool; thankfully, one of the default apps available on a fresh Ubuntu install can also dump traffic: tcpdump.

By running this command with a few options, you can analyze the packets being transmitted over the network. The packets can be displayed in real time or written to a dump file that can be analyzed later. I have found it more useful to capture to a dump file first.

For example, I have a web based application running on port 8080, that I want to inspect to see what is going on, so I issue the following command:

sudo tcpdump -i eth0 -s 0 -nw output.dmp dst port 8080

The arguments basically say, in order:

interface (-i): eth0
snaplen (-s): 0, which captures each packet in full rather than truncating it to the default of 68 bytes (newer tcpdump versions default to a much larger value); I am only really interested in the request headers, but those need more than 68 bytes
no address/name resolution (-n): addresses are kept as their numeric IP values
write (-w): to output.dmp
dst port: only capture packets going to port 8080. More complex filters can be written; the filter syntax is documented in the pcap-filter man page
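Note that -nw in the command above is simply -n and -w written together. To make the pieces easier to see, here is a small sketch that prints the same command with each flag spelled out separately. capture_cmd is a made-up helper; it only prints the command line and does not run tcpdump.

```shell
# capture_cmd is a hypothetical helper that prints, but does not run, the capture command.
capture_cmd() {
    iface=$1    # -i: interface to listen on
    port=$2     # filter expression: destination port to match
    out=$3      # -w: file the raw packets are written to
    # -s 0 asks for full packets, -n turns off name resolution.
    printf 'tcpdump -i %s -s 0 -n -w %s dst port %s\n' "$iface" "$out" "$port"
}

capture_cmd eth0 8080 output.dmp
# prints: tcpdump -i eth0 -s 0 -n -w output.dmp dst port 8080
```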

The capture is written as a binary file, so it is no good trying to read it in a simple text editor; however, you can print the contents by passing the -r flag (read):

sudo tcpdump -r output.dmp -A

I prefer the -A flag (ASCII), but you could also use the -X flag. I think -A produces slightly more readable request headers. That said, -X is useful for viewing the data in both hex and ASCII.
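Since the -A output mixes the payload with unreadable header bytes, it can help to filter it down to just the HTTP request lines. A minimal sketch, assuming the traffic is plain HTTP/1.x: request_lines is a made-up helper name, and the sample input below is invented text that merely resembles a line of -A output.

```shell
# request_lines is a hypothetical helper: reads `tcpdump -A` text on stdin
# and prints anything that looks like an HTTP/1.x request line.
request_lines() {
    grep -aoE '(GET|POST|PUT|DELETE|HEAD) [^ ]+ HTTP/1\.[01]'
}

# Invented sample resembling -A output: header junk followed by the request.
printf '%s\n' 'E..4..@.@.....P.GET /index.html HTTP/1.1' 'Host: localhost:8080' | request_lines
# prints: GET /index.html HTTP/1.1
```

On a real capture this would be used as: sudo tcpdump -r output.dmp -A | request_lines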

Another useful tool (which you need to install) is tcpick, which can also parse the data dumps captured by tcpdump.

sudo tcpick -C -yP -r output.dmp

This basically says: print in color (-C), show the data contained in the packets as printable characters (-yP), and read from the dump file (-r). No doubt this outputs the data nicely formatted, but other than that, I see no real reason not to just use tcpdump with the -A flag for viewing the captured packets.