[SLL] grep and wc help needed
Ana
christiana at hipointcoffee.com
Mon Feb 23 00:05:50 PST 2009
On Sun, February 22, 2009 6:45 am, Adam Monsen wrote:
>
> I think the "tr -d ..." might not be necessary in the case where
> you're counting words. At least, on my system, wc appears to deal with
> punctuation properly.
>
> $ cat test.txt
> Hi there. How the-heck are you?
>
> $ wc -w test.txt
> 6 test.txt
this is too much fun to leave alone...
forgive the perl. the following little thing disassembles input into
individual "characters", counts them, and reports what it found when
it's done processing input.
perl -wne 'BEGIN{ %cnt=(); } @vs=split(//, $_); foreach $c (@vs){ $cnt{$c}++; } END{ foreach $c (sort keys %cnt){ printf ("%03o %s: %d\n", ord($c), ($c=~/[[:print:]]/ ? $c : "."), $cnt{$c}); } } '
run it like this:
$ cat ~/tmp/debpartial-mirror_files.tgz | perl -wne 'BEGIN{ %cnt=(); } @vs=split(//, $_); foreach $c (@vs){ $cnt{$c}++; } END{ foreach $c (sort keys %cnt){ printf ("%03o %s: %d\n", ord($c), ($c=~/[[:print:]]/ ? $c : "."), $cnt{$c}); } } ' | less
sample output:
$ cat ~/tmp/debpartial-mirror_files.tgz | perl -wne 'BEGIN{ %cnt=(); } @vs=split(//, $_); foreach $c (@vs){ $cnt{$c}++; } END{ foreach $c (sort keys %cnt){ printf ("%03o %s: %d\n", ord($c), ($c=~/[[:print:]]/ ? $c : "."), $cnt{$c}); } } '
...
031 .: 85
032 .: 74
033 .: 78
034 .: 97
035 .: 95
036 .: 86
037 .: 86
040 : 94
041 !: 83
042 ": 85
043 #: 91
044 $: 87
045 %: 75
046 &: 75
047 ': 100
050 (: 79
051 ): 84
052 *: 84
053 +: 83
054 ,: 81
055 -: 94
...
the first column is the character's code in octal. the second is the
character itself if it's "printable", or a "." if it isn't (040 is a
space, so that column looks empty). the third column is the number
of times the character was found while scanning.
I think this kind of thing will help you find out what's in the file and
help get a handle on what you actually want to count.
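if perl isn't your thing, roughly the same tally can be had from the standard tools. this is only a sketch, not a drop-in replacement for the one-liner above: byte_histogram is a made-up helper name, the sample text is borrowed from Adam's test.txt, and the output is just "octal count" per line, with no printable-character column.

```shell
# byte_histogram: hypothetical helper name, not from the original post.
# Reads stdin and prints "octal count" for each distinct byte value.
byte_histogram() {
    od -An -v -to1 |      # dump every byte as a three-digit octal number
        tr -s ' ' '\n' |  # put one number on each line
        grep . |          # drop the empty lines left by leading spaces
        sort | uniq -c |  # count how often each value occurs
        awk '{ printf "%s %d\n", $2, $1 }'  # reorder to "octal count"
}

# Sample input taken from the quoted test.txt example.
printf 'Hi there. How the-heck are you?\n' | byte_histogram
```

to look at a real file, feed it in on stdin and page through the result, e.g. byte_histogram < somefile | less (somefile being whatever you want to inspect).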
then, to do something like extracting only the alphabetic characters:
$ cat ~/tmp/debpartial-mirror_files.tgz | perl -wpe 's/[^a-z]//gi'
QHxUHCsWNsBwWRrDcDNeUxZbQDQEybJjLWrEQKkcQQVHEdxEvePXBNpaPlJHpEyNRPWajPxVyMUDXpNPHrVfrHffDiUfxQEjIUPxdCqspsUKXBvsUoeDUrtRsQlxeCUTPeEQNLsnKAnYeXnvBUdCxIIDpVTrvurETvhluCPWAtubvfPTAIRVCsfdjBegBOKIFxpjfqsWDAXpUqfiRUmbAKaGWwGfXKPWQHBkTAWeJjlPXdmDxxzIKaUPOtdkAPIwUKZExhdxmFcGSKhKlERuAhFnDEjjHQgYtYBZlPRVuaeRpIaIxBEAhNYlSMaWdhDCTVNIbuDJOORUcxCnteHZRMPaRdbkUIuKHnvoZDmRUTONFJOrLjXNUYjGUmBTjDMZYSddJNFIPTFZlEGzsdxCAqDFdcGTdfqWCBScKHEoYzlUTjFWpWpdXfxHBarADhbMPbAMYCjlOIkVPdnAiwgcPBfHMFsBVPWTIIRALLzIQllAqQvAQNCvefquYUJjRxLDFCQrSQAiQalOgPXFkpyIQKevKIRbRGONOfUBEBGiKIrDeyfvSIGuCOlHSNHfyiPxdxIuEfZsWcNBOKPdSRtsIACJzNeMztIWKDClpZKRUMZRRlTuZjHcGTNSkIeoJHHQLCWDgFgDJxBbQZJqDHFzCtDyffGtXUXTJbFSOyBMLKESZTXZIXAqroMWETqupGFgTQvyHMiLjXYRCSjMeZAPsFxDlMxVNgQPRlBbCArEHRerPHZWjxxIhdWEDKxApzUMFrhIbijbUqtmyZtZsrzOh (much more data...)
and count them:
$ cat ~/tmp/debpartial-mirror_files.tgz | perl -wpe 's/[^a-z]//gi' | wc
0 1 4470
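wc with no options prints lines, words and bytes, in that order. the substitution strips the newlines along with everything else, which is why it reports 0 lines and 1 word; the 4470 is the number of letters. a rough equivalent without perl might look like the sketch below, assuming plain ASCII letters are what you want to count. count_letters is a made-up name for illustration.

```shell
# count_letters: hypothetical helper name, not from the original post.
# Reads stdin and prints how many ASCII letters (A-Z, a-z) it contains.
count_letters() {
    LC_ALL=C tr -cd 'A-Za-z' |  # delete every byte that is not a letter
        wc -c |                 # count the bytes that are left
        tr -d ' '               # strip the padding some wc versions add
}

# Sample input taken from the quoted test.txt example; prints 23.
printf 'Hi there. How the-heck are you?\n' | count_letters
```

forcing the C locale keeps the A-Z and a-z ranges down to the 52 ASCII letters, which matches what the perl pattern [^a-z] with the i flag keeps.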
I expect I am one more voice saying the same damned thing as everyone
else. let's hear it for too much information! :) cheers,
- Ana