Thursday, May 3, 2012

How can I recursively grep through sub-directories?

This snippet looks especially interesting. I hadn't tried this before.
"November, 2008: Excuse the interruption, but there is something new to talk about and I didn't want you to have to go all the way to the comments to find it. It's called "ack", it's written in Perl, and it addresses the things this page talks about. Find it at http://betterthangrep.com/."

You must mean that your ancient Unix doesn't have GNU grep, right? If it does, just go do a "man grep"; you don't need to read this (though you may want to, just so you really appreciate GNU grep). Just add "-r" (perhaps with --include) and grep will search through subdirectories.
Say you wanted to search for "perl" in only *.html files in the current directory and every subdirectory. You could run:

grep -r --include="*.html" perl .
 
("." is current directory)
You don't even need the --include; "grep -r perl ." will search all files. If you had a directory structure like this:
./alpha
./beta
./beta/dog
./beta/dog/perlinhere.html
 
either invocation would search "perlinhere.html" looking for "perl" inside. So would:
grep -r --include="*.html" perl b*
 
But this of course would not (because the file with the pattern is not under "a*"):
grep -r --include="*.html" perl a*
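To see the difference between the two starting points for yourself, here is a small sketch that rebuilds the tree above in a scratch directory and runs both commands. The mktemp location, the file contents and the variable names are my own assumptions; -l is added so that only filenames are printed.

```shell
# Rebuild the example tree in a throwaway directory (location is arbitrary).
demo=$(mktemp -d)
mkdir -p "$demo/alpha" "$demo/beta/dog"
echo "perl lives here" > "$demo/beta/dog/perlinhere.html"

# Starting from b* reaches beta/dog/perlinhere.html.
from_b=$(cd "$demo" && grep -rl --include="*.html" perl b*)

# Starting from a* finds nothing; grep then exits with status 1,
# so "|| true" keeps a strict (set -e) shell from stopping here.
from_a=$(cd "$demo" && grep -rl --include="*.html" perl a* || true)

echo "b*: $from_b"
echo "a*: ${from_a:-<no matches>}"
```

Note that the shell, not grep, expands b* and a* before grep ever runs, so grep only sees the directory names "beta" or "alpha".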
 
You can also use --exclude= to search every file except the ones that match your pattern.
(BSD grep has "-d recurse". That also works in GNU grep and is equivalent to "-r")
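A quick sketch of --exclude and of the "-d recurse" spelling, assuming a grep that supports both (GNU grep does). The scratch directory and the file names are made up for the illustration.

```shell
# Throwaway tree with one HTML file and one text file (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/beta/dog"
echo "perl is mentioned here" > "$demo/beta/dog/perlinhere.html"
echo "perl is mentioned here too" > "$demo/beta/dog/notes.txt"

# Search every file except *.html; -l prints only the matching filenames.
excluded=$(cd "$demo" && grep -rl --exclude="*.html" perl .)

# "-d recurse" asks for the same recursion as -r, so both files turn up.
recursed=$(cd "$demo" && grep -l -d recurse perl . | sort)

echo "$excluded"
echo "$recursed"
```

Quoting the pattern given to --exclude matters: unquoted, the shell may expand *.html itself before grep sees it.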
Easy enough, isn't it?
But if you are on some old Unix without recursive capabilities in its grep, it gets very hard. The problem with all the responses that invariably pop up for this type of question is that none of them are ever truly fast and most of them aren't truly robust.
Typically, the answer is to use find, xargs, and grep. That's horribly slow for a full filesystem search, and it's painfully difficult to construct a pipeline that skips binaries when you want it to, doesn't get stuck on named pipes, and doesn't blow up on funky filenames (a leading "-", embedded spaces, punctuation, etc.). There are ways around all these things, but they are all ugly.
BTW, something that almost never gets mentioned but that I will frequently use under conditions where it is appropriate is a simple
grep pattern * */* */*/* 2>/dev/null
 
It's not much use beyond three levels, and may not even be good at that except from certain starting points, but it's faster than any find/xargs pipeline can ever be if the set of files is small enough.
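The depth limit of the glob trick is easy to demonstrate. This sketch plants the same word one, two, three and four levels down in a scratch tree (all names are hypothetical) and shows that only the first three levels are searched.

```shell
# Scratch tree with a match at every depth from 1 to 4.
demo=$(mktemp -d)
mkdir -p "$demo/a/b/c"
echo needle > "$demo/top.txt"
echo needle > "$demo/a/one.txt"
echo needle > "$demo/a/b/two.txt"
echo needle > "$demo/a/b/c/three.txt"

# The globs also match the directories a, a/b and a/b/c. grep complains
# about those on stderr (discarded here) and exits non-zero because of it,
# hence the "|| true".
shallow=$(cd "$demo" && grep -l needle * */* */*/* 2>/dev/null || true)

# Lists top.txt, a/one.txt and a/b/two.txt; a/b/c/three.txt is never reached.
echo "$shallow"
```

Adding a fourth pattern, */*/*/*, would pick up the last file, at the cost of an ever longer argument list.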






The simplistic approach using find is
find /whereveryouwantostart -exec grep whatever {} /dev/null \;
 
That's not necessarily very efficient. Using xargs can help
find . | xargs grep whatever
 
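But it also has bugs: xargs splits names on spaces and quotes, and a filename beginning with "-" (possible if find starts from something like * rather than .) will be taken by grep as an option. Fixing that can be a little nasty.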
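For what it's worth, here are two of the less ugly ways around the filename problem, sketched under the assumption that your find understands "-exec ... {} +" (it is in POSIX, but truly old systems may lack it) or that both find and xargs support NUL-separated names (-print0 and -0 are extensions found in the GNU and BSD tools). The scratch tree and its file names are invented for the test.

```shell
# Scratch tree with awkward names (all hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
echo whatever > "$demo/sub dir/odd name.txt"
echo whatever > "$demo/-leading-dash.txt"

# Let find batch the filenames itself: no xargs, so no word splitting.
# Starting from "." means every path begins with "./", never with "-".
# /dev/null makes grep print filenames even if a batch holds a single file.
with_exec=$(cd "$demo" && find . -type f -exec grep -e whatever /dev/null {} + | sort)

# Same idea with NUL-separated names, where find and xargs support it.
with_print0=$(cd "$demo" && find . -type f -print0 | xargs -0 grep -e whatever /dev/null | sort)

echo "$with_exec"
echo "$with_print0"
```

The -e in front of the pattern is there for the same reason as the "./" prefix: it stops a pattern that happens to start with "-" from being read as an option.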
You may not want to grep binary files:
find . -type f -print | xargs file | grep -i text | cut -f1 -d: | xargs grep whatever
 
That's pretty awful, but it's what you have to get into if you have special cases. Special cases are what makes this question more difficult. If you have a small number of files and subdirs to search, the simple approach may work fine for you. If not, you have to get more creative.
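If the grep you do have understands -I (GNU and BSD grep both do; that is an assumption about your system, and it won't help on the truly ancient ones), the binary-skipping can be left to grep instead of the file/cut stages. A minimal sketch, with a made-up scratch directory holding one text file and one file containing a NUL byte:

```shell
# One plain text file and one "binary" (it contains a NUL byte).
demo=$(mktemp -d)
echo "whatever, in plain text" > "$demo/notes.txt"
printf 'whatever\000in a binary\n' > "$demo/blob.bin"

# -I makes grep treat a binary file as if it contained no match,
# so only the text file is listed.
text_only=$(cd "$demo" && find . -type f -exec grep -I -l -e whatever {} +)

echo "$text_only"
```

This also sidesteps a weakness of the cut stage in the longer pipeline, which misbehaves on filenames that contain a colon.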
November, 2008: Excuse the interruption, but there is something new to talk about and I didn't want you to have to go all the way to the comments to find it. It's called "ack", it's written in Perl, and it addresses the things this page talks about. Find it at http://betterthangrep.com/.
Bill Campbell offers this Perl script:
I have a perlscript I call ``textfiles'' that I use for many
 things like this:
        textfiles dirname [dirname... ] | xargs ...
 
 Essentially it runs ``gfind @ARGV -type f'', then uses perl's -T
 option on each file to determine whether it's a text file.
 
 My textfiles script also has options to add options to the gnu
 find command like -xdev, -mindepth, and -maxdepth.
 
 Hell, it's short so I'm attaching it for anybody who wants to use
 it.  It does assume that the gnu version of find is in your PATH
 named gfind (I make a symlink to /usr/bin/find on Linux systems
 so that it works there as well).
 
 
 #!/usr/local/bin/perl
 eval ' exec /usr/local/bin/perl -S $0 "$@" '
        if $running_under_some_shell;
 
 # $Header: /u/usr/cvs/lbin/textfiles,v 1.7 2000/06/22 18:29:08 bill Exp $
 # $Date: 2000/06/22 18:29:08 $
 # @(#) $Id: textfiles,v 1.7 2000/06/22 18:29:08 bill Exp $
 # 
 #      find text files
 
 ( $progname = $0 ) =~ s!.*/!!; # save this very early
 
 $USAGE = "
 # Find text files
 #
 #   Usage: $progname [-v] [file [file...]]
 #
 # Options   Argument    Description
 #   -f                  Follow symlinks
 #   -M      maxdepth    maxdepth argument to gfind
 #   -m      mindepth    mindepth argument to gfind
 #   -x                  Don't cross device boundaries
 #   -v                  Verbose
 #
 ";
 
 sub usage {
        die join("\n",@_) .
        "\n$USAGE\n";
 }
 
 use Getopt::Std;       # replaces the Perl 4 getopts.pl, which modern Perl no longer ships
 
 &usage("Invalid Option") unless getopts("fM:m:xvV");
 
 $verbose = '-v' if $opt_v;
 $suffix = $$ unless $opt_v;
 
 $\ = "\n";     # use newlines as separators.
 
 # use current directory if there aren't any arguments
 push(@ARGV, '.') unless defined($ARGV[0]);
 
 $args = join(" ", @ARGV);
 $xdev = '-xdev' if $opt_x;
 $opt_f = '-follow' if $opt_f;
 $opt_m = "-mindepth $opt_m" if $opt_m;
 $opt_M = "-maxdepth $opt_M" if $opt_M;
 $cmd = "gfind @ARGV -type f $xdev $opt_f $opt_m $opt_M |";
 print STDERR "cmd = >$cmd<" if $verbose;
 
 open(INPUT, $cmd);
 while (<INPUT>) {
        chop($name = $_);
        print STDERR "testing $name..." if $verbose;
        print $name if -T $name;
 }
 
 
John Dubois also comments on Glimpse:
Glimpse indexes files by the words contained in the file. Then when you want to search all of the files, it only runs its equivalent of grep (agrep) on the files that contain the words you're looking for. You can search for partial words too, though it takes longer. I have the man pages, include files, RFCs, source trees, my home directory, web pages, etc. all separately glimpse-indexed.
Binaries & man pages for OpenServer are at ftp://deepthought.armory.com/pub/scobins/glimpse.tar.Z
A front end that allows you to easily search any of multiple glimpse databases is at: ftp://ftp.armory.com/pub/scripts/search


Read more: http://aplawrence.com/Unixart/recursivegrep.html#ixzz1tqIJErxh
