[SLL] grep and wc help needed

Scott Blachowicz scott.ssc at sabami.seaslug.org
Sat Feb 21 13:55:23 PST 2009


On Sat, Feb 21, 2009 at 01:25:57PM -0800, Ralph Sims wrote:
> (or perl, or ...)
> 
> I have a text file that contains alpha characters as well as punctuation 
> marks.  I tried grep [:alpha:] filename |wc but still get the punctuation 
> marks counted.  I've also used [aZ-zZ] and still get the same result.  
> What I'm looking for is a way to count the letters and words in a file 
> without punctuation, spaces, etc.

Grep just prints out the lines that match the pattern you specify -- it
doesn't do any sort of filtering on those lines. (i.e. if the line has
both alpha and non-alpha, you get both types of chars in your output)
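
A small sketch of that behavior, in case it helps. The sample file and
the variable names (sample, matched) are made up for the demonstration.
Note also that the character class has to sit inside a bracket
expression, i.e. [[:alpha:]] rather than [:alpha:].

```shell
# Hypothetical sample input, created only for this demonstration.
sample=$(mktemp)
printf 'Hello, world!\n12345\n' > "$sample"

# grep selects whole lines that contain at least one letter;
# the punctuation on those lines comes along unchanged.
matched=$(grep '[[:alpha:]]' "$sample")
printf '%s\n' "$matched"    # prints: Hello, world!

rm -f "$sample"
```

The digits-only line is dropped, but the comma and exclamation mark on
the matching line are still there, which is why wc keeps counting them.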

So, you need to filter out all the chars you don't want, then wc the
result...(or do it all with a perl script)...

Maybe something like this:

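cat INPUT | sed -e 's/[^a-zA-Z][^a-zA-Z]*/\
/g' | sed -e '/^$/d' | wc

The first sed command is intended to take runs of one or more non-alpha
characters and replace them with single newline chars. (Plain sed regexes
don't treat + as "one or more", hence the [^a-zA-Z][^a-zA-Z]* spelling.)
The second sed deletes blank lines; it has to be a separate process
because the newlines inserted by the first one don't become separate
lines until they are written out.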

wc prints three numbers: lines, words and characters, in that order.
After that filtering each word sits on its own line, so the first number
(the total number of newlines) should be equivalent to the number of
words. The third number is the number of characters in that stream, which
should be the total number of letters plus the total number of newlines,
so subtract the first number from the third to get the letter count.
Roughly...approximately...maybe...or something like that?
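
For what it's worth, the same counting can also be sketched with tr,
which avoids the subtraction. This is only one possible approach, not
necessarily the best one; the sample file and the variable names
(sample, words, letters) are invented for the example, and it assumes
plain ASCII letters.

```shell
# Hypothetical sample input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Hello, world! It's 42.

-- the end --
EOF

# Words: turn every run of non-letters into one newline (-c complements
# the set, -s squeezes repeats), drop any empty line, count the lines.
words=$(tr -cs 'a-zA-Z' '\n' < "$sample" | sed -e '/^$/d' | wc -l | tr -d ' ')

# Letters: delete everything that is not a letter, count what remains.
letters=$(tr -cd 'a-zA-Z' < "$sample" | wc -c | tr -d ' ')

echo "words=$words letters=$letters"    # prints: words=6 letters=19

rm -f "$sample"
```

Note that "It's" comes out as two words ("It" and "s") because the
apostrophe counts as a separator, so the word count is still only
approximate.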

Scott

