[SLL] multi-threading
Paul A. Franz, P.E.
paul at eucleides.com
Fri Oct 3 02:02:14 PDT 2008
On Mon, September 29, 2008 8:33 pm, Robert Woodcock wrote:
> Below is the script I ended up with. I made a lot of stylistic changes so I
> could read it better, err, I mean, so Paul couldn't, err, I mean...
There are more than a few cool things in this script.
This one, for example:
egrep -v '(NXDOMAIN|alias|SERVFAIL|^for$)'
I was using:
grep -v NXDOMAIN | grep -v alias | grep -v SERVFAIL | grep -v ^for$
Why did you use "egrep"? I see others doing that too. I assumed it was from
According to the grep manual page this is equivalent to "grep -E" and frankly I have
not seen any different capability with egrep, at least as configured in my Redhat
system.
Comment?
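
A small sketch of where the two forms agree and where plain grep differs. The sample lines are made up for illustration and are not output from the script. In GNU grep's basic syntax an unescaped "|" is a literal character, so the alternation only works with -E (which is what egrep runs).

```shell
# Made-up sample values, one per line; not real output from the script.
sample='mail.example.com.
NXDOMAIN
alias
SERVFAIL
for
forward.example.net.'

# One process: extended-regex alternation.
one_pass=$(printf '%s\n' "$sample" | grep -E -v '(NXDOMAIN|alias|SERVFAIL|^for$)')

# Four processes: one basic-regex pattern each.
chained=$(printf '%s\n' "$sample" | grep -v NXDOMAIN | grep -v alias | grep -v SERVFAIL | grep -v '^for$')

# Plain grep treats "|" literally, so this pattern matches nothing
# in the sample and -v lets every line through.
basic=$(printf '%s\n' "$sample" | grep -v 'NXDOMAIN|alias')

echo "$one_pass"
```

The single grep -E call and the four-stage pipeline filter the same lines; the single call just starts one process instead of four.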
>
> (It would have been a rewrite one way or the other. :)
With a little modification, and a lot of breaking it down to understand fully what
happens, I ran some test cases with clock timing added to the routines.
In a test case with 570 IPs, my single-task method took 7 minutes and 34 seconds.
The same list run with your clever pseudo multi-threading method took:
2 minutes and 9 seconds using MAXLOOKUPS=6,
55 seconds using MAXLOOKUPS=30,
43 seconds using MAXLOOKUPS=60.
Successive tests run at other times of the day with more or less server load showed a
variation of no more than 1 second in all the tests.
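
For a rough sense of scale, the reported times can be turned into speedup factors. This is only arithmetic on the figures above; the function name speedup is an illustrative choice, not part of the script.

```shell
# Wall-clock times from the 570-IP test, converted to seconds.
baseline=$((7 * 60 + 34))   # single-task method: 454 s

speedup() {
  # $1 = run time of the parallel version in seconds
  awk -v base="$baseline" -v t="$1" 'BEGIN { printf "%.1f\n", base / t }'
}

speedup $((2 * 60 + 9))   # MAXLOOKUPS=6  -> 3.5
speedup 55                # MAXLOOKUPS=30 -> 8.3
speedup 43                # MAXLOOKUPS=60 -> 10.6
```

Going from 30 to 60 simultaneous lookups doubles the concurrency but saves only 12 seconds, so the returns diminish well before MAXLOOKUPS=60.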
I ran top and watched during the tests and noticed that the script never got beyond 5%
of CPU, which was lower than the peak for named. I ran a test case with over 10,000
IPs and MAXLOOKUPS=30, which completed in 7 minutes. I didn't delete the temporary file
structure for that run and noticed that a couple of the files were a few hundred
characters long and contained a very odd error message, something like:
Result returned from 206.124.128.3 while expecting DNS from 206.124.134.60.
The first IP is a DNS server belonging to my provider (blarg) and the second is my
primary DNS running on the same machine as the test. I am not sure how that occurred
because I don't recall ever using blarg's DNS in any of the machine's configuration.
My router uses blarg as a secondary though.
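
One way to spot result files like these, assuming the temporary directory was kept, is to list anything above a size threshold. The function name and the 200-byte default below are guesses for illustration ("a few hundred characters"), not values taken from the run.

```shell
# List regular files of at least min_bytes bytes under a results directory.
# Both the name list_large_results and the 200-byte default are hypothetical.
list_large_results() {
  dir=$1
  min_bytes=${2:-200}
  # "-size +Nc" matches files strictly larger than N bytes.
  find "$dir" -type f -size +"$((min_bytes - 1))"c | sort
}
```

Called as list_large_results /path/to/kept/outdir, it prints one path per line; each of those files can then be read by hand to see what host actually wrote.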
I believe this error caused a small anomaly in the printed output from the script.
If anyone wants to play with it, I'll post the script I used (test.sh) and a list of
spammers' IPs. It is a really nice model for doing simultaneous work on
other machines.
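
For comparison, here is a different way to get the same "at most N at once" behavior: the -P option of xargs (available in GNU and BSD xargs, not required by POSIX). This is not Robert's started/finished counter technique, just a short sketch of the same idea. The item names and the echo job are placeholders for real work such as a host lookup.

```shell
# Scratch directory for per-job output files.
workdir=$(mktemp -d) || exit 1

# Feed one item per line; xargs keeps up to 3 jobs running at a time.
# Inside sh -c, $1 is the output directory and $2 is the current item.
printf '%s\n' alpha beta gamma delta epsilon |
  xargs -n 1 -P 3 sh -c 'echo "processed $2" > "$1/$2.out"' _ "$workdir"

# xargs returns only after every job has exited, so no polling loop is needed.
completed=$(ls "$workdir" | wc -l)
echo "completed jobs: $completed"

rm -rf "$workdir"
```

The counter-file approach in the quoted script needs nothing beyond the shell and standard utilities, which is part of its appeal; xargs -P trades that for less bookkeeping.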
My sincere thanks for the contribution to my learning are hereby expressed to:
"Robert Woodcock" <rcw(@NO-SPAM@)blarg.net>
>
> #!/bin/bash
> IPLIST=/tmp/ips
> RBLS="bl.spamcop.net dnsbl.sorbs.net no-more-funn.moensted.dk zen.spamhaus.org
> dnsbl.njabl.org dnsbl-3.uceprotect.net"
> MAXLOOKUPS=6
> OUTDIR=$(mktemp -td rblcheck.XXXXXX) || exit 1
> STARTED=$OUTDIR/started
> FINISHED=$OUTDIR/finished
> touch $STARTED $FINISHED
> cd $OUTDIR
> (
> # Generate list of queries
> for IP in $(cat $IPLIST)
> do
> echo $IP
> REVIP=$(echo $IP | sed 's/\([^.]*\)\.\([^.]*\)\.\([^.]*\)\.\([^.]*\)/\4.\3.\2.\1/')
> for RBL in $RBLS
> do
> echo $REVIP.$RBL
> done
> done
> ) | (
> # Run queries in parallel
> while read QUERY
> do
> while true
> do
> RUNNING=$(expr $(wc -l < $STARTED) - $(wc -l < $FINISHED))
> if [ $RUNNING -lt $MAXLOOKUPS ]; then
> break
> fi
> sleep 1
> done
> echo >> $STARTED
> (
> host $QUERY > $QUERY 2>&1
> echo >> $FINISHED
> ) &
> done
> )
> # Wait for all lookups to finish
> while [ $(wc -l < $FINISHED) -lt $(wc -l < $STARTED) ]
> do
> sleep 1
> done
> # Display output
> cat << ENDHEADER
> __ not listed __ bl.spamcop.net
> / __ listed / __ dnsbl.sorbs.net
> | / | / __ no-more-funn.moensted.dk
> | | | | / __ zen.spamhaus.org
> 0 1 | | | / __ dnsbl.njabl.org
> | | | | / __ dnsbl-3.uceprotect.net
> | | | | | /
> Check IP | | | | | | Total Reverse Lookup
> ENDHEADER
> for IP in $(cat $IPLIST)
> do
> printf "%-15s" $IP
> REVIP=$(echo $IP | sed 's/\([^.]*\)\.\([^.]*\)\.\([^.]*\)\.\([^.]*\)/\4.\3.\2.\1/')
> ONBLS=0
> BLNUM=0
> for RBL in $RBLS
> do
> RESULTFILE=$REVIP.$RBL
> if grep found $RESULTFILE >/dev/null; then
> # not found in DNS - not in blacklist
> echo -n " 0"
> else
> # no "not found" error - blacklisted
> echo -n " 1" # blacklisted
> ONBLS=$(expr $ONBLS + 1)
> COUNT[$BLNUM]=$(expr ${COUNT[$BLNUM]} + 1)
> fi
> BLNUM=$(expr $BLNUM + 1)
> done
> CLEANPTR=$(cut -d' ' -f5 $IP | egrep -v '(NXDOMAIN|alias|SERVFAIL|^for$)')
> echo -e " -- $ONBLS -- $CLEANPTR"
> done
> # generate column totals
> echo -e "\n Column totals for each RBL, in order tested."
> BLNUM=0
> for RBL in $RBLS
> do
> echo -e "${COUNT[$BLNUM]}\t${RBL}"
> BLNUM=$(expr $BLNUM + 1)
> done
> cd ..
> rm -rf $OUTDIR
>
> --
> Robert Woodcock - rcw at blarg.net
> "It's not based on any particular data point. We just wanted to choose a
> really large number."
> -- A US Treasury spokeswoman on how the $700 billion bailout figure
> was arrived at
>
--
Paul A. Franz, P.E.
PAF Consulting Engineers
Office 425.440.9505
Cell 425.241.1618