A site devoted to discussing techniques that promote quality and ethical practices in software development.

Wednesday, May 23, 2007

Grep the tab - driving me mad

Most people will have heard of the great Unix tool called grep. This is the authoritative site on grep, and a Windows version can be downloaded from here. The download includes the whole set of GNU tools, which are very good.

There appear to be a number of slightly different variants of grep around, much as there are of Unix/Linux itself, and GNU grep does not recognize the \t escape for the TAB (0x9) character. For example, suppose you have a line like this:
<tab>Hello World
in a file called file.txt, and you use this command:

grep -E "\t+Hello" file.txt

You will get nothing. The reason is that GNU grep does not treat \t as a special character. POSIX regular expressions, which grep implements, define no \t escape (it comes from C and Perl), and GNU grep reads \t as a plain t. The pattern therefore looks for one or more t characters followed by Hello. So grep does use regular expressions, just not the Perl dialect that \t belongs to.
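A minimal way to reproduce this, assuming a POSIX shell and GNU grep. The second sample line, ttHello World, is a hypothetical addition that is there only to show what the pattern really matches:

```shell
#!/bin/sh
# Sample file: one line starting with a real TAB, one starting with "tt"
# (the "tt" line is a hypothetical example, not from the original post).
printf '\tHello World\nttHello World\n' > file.txt

# GNU grep reads "\t" in an ERE as a plain "t" (newer releases also print a
# "stray \ before t" warning on stderr), so only the "tt" line is printed
# and the TAB line is not matched.
grep -E '\t+Hello' file.txt
```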

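To search for whitespace, GNU grep provides the character class [:blank:], which must be written inside a bracket expression as [[:blank:]] and matches a space or a tab. However, it matches either one, so it cannot be used to look for a TAB alone.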
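The difference can be seen in a short sketch, assuming a POSIX shell. It also shows one common workaround that is not from the original post: passing grep a literal TAB character instead of the \t escape, which plain grep then matches exactly. The TAB variable name is only illustrative.

```shell
#!/bin/sh
# Sample file: one line indented with a TAB, one indented with two spaces.
printf '\tHello World\n  Hello World\n' > file.txt

# [[:blank:]] only works inside a bracket expression; it matches a space OR
# a tab, so both lines are printed.
grep -E '[[:blank:]]+Hello' file.txt

# To match only a TAB with plain grep, hand it a literal TAB character.
# printf expands \t, and command substitution keeps the TAB (it strips only
# trailing newlines). In bash, $'\t' does the same job.
TAB=$(printf '\t')
grep -E "${TAB}+Hello" file.txt
```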

All is not lost! There is a variant of grep that supports Perl regular expressions, selected with the -P (--perl-regexp) switch. You can download a Windows version from here.

With this version, the following command prints the line above:
grep -P "\t+Hello" file.txt
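Since not every grep build includes -P, a script may want to check for it first. The sketch below assumes a POSIX shell and relies on grep reporting an unsupported option as a usage error with exit status 2. The function name find_tab_hello is hypothetical, and the literal-TAB fallback is a workaround, not something from the original post.

```shell
#!/bin/sh
# Print lines of a file that contain one or more TABs followed by "Hello".
# Uses -P (Perl regex) when this grep build supports it, else a literal TAB.
find_tab_hello() {
  status=0
  # Probe for -P: an empty file yields "no match" (exit 1) if -P works,
  # while an unsupported option is a usage error (exit 2).
  grep -P '' /dev/null 2>/dev/null || status=$?
  if [ "$status" -ne 2 ]; then
    grep -P '\t+Hello' "$1"
  else
    TAB=$(printf '\t')
    grep -E "${TAB}+Hello" "$1"
  fi
}

printf '\tHello World\nttHello World\n' > file.txt
find_tab_hello file.txt
```

Either branch prints only the line that starts with a real TAB.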

The only disadvantage of this version of grep is that it is not a single-file program. It requires the following DLLs: libiconv2.dll, libintl3.dll and pcre3.dll, as well as MSVCP60.DLL.

It is nice to see that this version of grep.exe also carries PE version information, so you can tell which version of the software you have. It never ceases to amaze me that Unix/Linux has nothing like this. It is almost impossible to tell the vintage of the several grep.exe copies on my machine, and every one of them has a different MD5 hash.
