[SLL] grep and wc help needed

Paul Franz paul at eucleides.com
Sun Feb 22 21:29:34 PST 2009


On Sun, February 22, 2009 6:45 am, Adam Monsen wrote:

> I think the "tr -d ..." might not be necessary in the case where
> you're counting words. At least, on my system, wc appears to deal with
> punctuation properly.
>
> $ cat test.txt
> Hi there. How the-heck are you?
>
> $ wc -w test.txt
> 6 test.txt

But what Ralph Sims said was:

> What I'm looking for is a way to count the letters and words in a file
> without punctuation, spaces, etc.

If "letters" literally means letters and no numbers, you have to strip the digits out
before counting characters. If he really means alphanumeric characters, you wouldn't
need to. But he also said "without punctuation, spaces, etc.", which I assume means
without newlines too. So to get both a correct word count and a correct character
count, you have to run wc twice, each time on output piped from tr.
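
A minimal sketch of that two-pass approach, assuming a POSIX shell, a file name
passed as the first argument, and "letters" taken to mean no digits. The function
names count_words and count_letters are made up for illustration and are not from
the thread:

```shell
#!/bin/sh
# Hypothetical helpers illustrating "run wc twice, each time behind tr".

# Pass 1, words: delete punctuation only, so whitespace still separates
# words and wc -w can count them. The final tr strips the padding that
# some wc implementations put in front of the number.
count_words() {
    tr -d '[:punct:]' < "$1" | wc -w | tr -d ' '
}

# Pass 2, letters: delete punctuation, all whitespace (newlines included)
# and digits, then count the characters that are left.
count_letters() {
    tr -d '[:punct:][:space:]0-9' < "$1" | wc -c | tr -d ' '
}
```

The character classes are quoted so the shell cannot expand them as filename
patterns before tr sees them.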

$ cat << END | wc
> "123456789 123456789 123456789 "
> END
      1       4      33

$ cat << END | tr -d '[:punct:]' | wc
> "123456789 123456789 123456789 "
> END
      1       3      31

The word count is one lower because the trailing " is gone and no longer counts as a word.
The character count still includes whitespace and the newline.

$ cat << END | tr -d '[:punct:][:space:]' | wc
> "123456789 123456789 123456789 "
> END
      0       1      27

The character count is now as intended, but the word count is wrong, because deleting
the spaces fuses everything into a single word. And if he really meant no numbers, then:

$ cat << END | tr -d '[:punct:][:space:]0-9' | wc
> "123456789 123456789 123456789 "
> END
      0       0       0

Putting it all together using multiple lines,

$ cat << END | tr -d '[:punct:][:space:]0-9' | wc
> "123456789 123456789 123456789 "
>  2 lines of mixed-content text
> END
      0       1      23

To see what was actually counted,

[paul@Beaker paul]$ cat << END | tr -d '[:punct:][:space:]0-9'
> "123456789 123456789 123456789 "
>  2 lines of mixed-content text
> END
linesofmixedcontenttext[paul@Beaker paul]$
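
If only letters should survive, an alternative sketch (not what was run above) is to
invert the set with tr's -c option and delete everything that is not in [:alpha:].
The function name letters_only is hypothetical, and the trailing echo is there only
so that the shell prompt lands on its own line:

```shell
#!/bin/sh
# Hypothetical helper: print only the alphabetic characters of a file.
# -c complements the set, so -d deletes every character that is NOT a letter.
letters_only() {
    tr -cd '[:alpha:]' < "$1"
    echo    # tr deleted the final newline as well; add one back for display
}
```

When counting rather than looking, drop the echo (or subtract one), since wc -c
would otherwise count the added newline too.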








-- 
Paul Franz
425.440.9505 (O)
425.241.1618 (C)

