[SLL] multi-threading

Paul A. Franz, P.E. paul at eucleides.com
Fri Oct 3 02:02:14 PDT 2008


On Mon, September 29, 2008 8:33 pm, Robert Woodcock wrote:
> Below is the script I ended up with. I made a lot of stylistic changes so I
> could read it better, err, I mean, so Paul couldn't, err, I mean...

There are more than a few cool things in this script.

This one, for example:

egrep -v '(NXDOMAIN|alias|SERVFAIL|^for$)'

I was using:

grep -v NXDOMAIN | grep -v alias | grep -v SERVFAIL | grep -v ^for$

Why did you use "egrep"? I see others doing that too. I assumed it was from

According to the grep manual page this is equivalent to "grep -E", and frankly I have
not seen any different capability with egrep, at least as configured on my Red Hat
system.

Comment?
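
For anyone comparing the two forms, here is a minimal sketch on made-up sample lines (the host names are placeholders, not real output). egrep behaves like "grep -E", which uses extended regular expressions: there a bare | means "or". In plain grep's basic syntax a bare | is a literal character, which is why the single-pattern form needs egrep or -E while the chained form does not.

```shell
# Made-up lines standing in for field 5 of `host` output.
sample='NXDOMAIN
alias
mail.example.com.
for
SERVFAIL
forward.example.net.'

# One pass with an extended regex, as in the quoted script (egrep = grep -E).
one_pass=$(printf '%s\n' "$sample" | grep -E -v '(NXDOMAIN|alias|SERVFAIL|^for$)')

# Four chained passes with basic regexes, as in the original pipeline.
chained=$(printf '%s\n' "$sample" | grep -v NXDOMAIN | grep -v alias | grep -v SERVFAIL | grep -v '^for$')

echo "$one_pass"
[ "$one_pass" = "$chained" ] && echo "same result"
```

Both forms keep only the two host names; the single-pattern form just starts one grep process instead of four.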

>
> (It would have been a rewrite one way or the other. :)

With a little modification, and a lot of breaking it down to understand fully what
happens, I ran some test cases with clock timing added to the routines.
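
One way such clock timing could be added is sketched below. This is an assumption about the approach, not the code from test.sh; time_step is a hypothetical helper name, and it relies on "date +%s" (seconds since the epoch), which GNU and BSD date both provide.

```shell
# time_step is a hypothetical helper, not taken from test.sh.
# It runs a command and reports the elapsed wall-clock time on stderr.
time_step() {
  label=$1; shift
  start=$(date +%s)
  "$@"
  status=$?
  elapsed=$(( $(date +%s) - start ))
  printf '%s took %d min %d sec\n' "$label" $((elapsed / 60)) $((elapsed % 60)) >&2
  return "$status"
}

# Stand-in workload; in practice this would wrap the lookup loop.
time_step "demo" sleep 1
```

Reporting on stderr keeps the timing lines out of the script's normal output, and the helper passes through the exit status of the command it wraps.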

In a test case with 570 IPs, my single-task method took 7 minutes and 34 seconds.
The same test ran with your clever pseudo multi-threading method:

using MAXLOOKUPS=6 in 2 minutes and 9 seconds,
using MAXLOOKUPS=30 in 55 seconds,
using MAXLOOKUPS=60 in 43 seconds.
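
To put those times side by side, the sketch below converts them to seconds and divides the single-task time by each parallel time. The speedup function name is only illustrative; the numbers are the ones listed above.

```shell
# Single-task run: 7 min 34 sec = 454 seconds.
baseline=$((7 * 60 + 34))

# speedup SECONDS: how many times faster than the single-task run.
speedup() {
  awk -v base="$baseline" -v t="$1" 'BEGIN { printf "%.1f", base / t }'
}

# MAXLOOKUPS value and elapsed seconds for each parallel run.
while read -r lookups seconds; do
  echo "MAXLOOKUPS=$lookups: $(speedup "$seconds")x faster"
done <<'EOF'
6 129
30 55
60 43
EOF
# prints:
# MAXLOOKUPS=6: 3.5x faster
# MAXLOOKUPS=30: 8.3x faster
# MAXLOOKUPS=60: 10.6x faster
```

Doubling MAXLOOKUPS from 30 to 60 only takes the run from about 8.3x to 10.6x, so the returns are already diminishing at that point.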

Successive tests run at other times of the day with more or less server load showed a
variation of no more than 1 second in all the tests.

I ran top and watched during the tests and noticed that the script never got beyond 5%
of CPU, which was lower than the peak for named. I ran a test case with over 10,000
IPs and MAXLOOKUPS=30, which completed in 7 minutes. I didn't delete the temporary file
structure for that run, and noticed that a couple of the files were a few hundred characters
long and contained a very odd error message, something like:

Result returned from 206.124.128.3 while expecting DNS from 206.124.134.60.

The first IP is a DNS server belonging to my provider (blarg) and the second is my primary DNS
running on the same machine as the test. I am not sure how that occurred, because I don't
recall ever using blarg's DNS in any of the machine's configuration. My router uses
blarg as a secondary, though.

I believe this error caused a small anomaly in the printed output from the script.
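
If the temporary directory is kept, unusually large result files like these can be picked out by size. The sketch below is only one way to do it: find_odd_results is a hypothetical helper, and the 200-byte threshold is a guess that would need tuning. It skips the script's own "started" and "finished" bookkeeping files, which grow by one byte per lookup.

```shell
# find_odd_results DIR BYTES: list result files larger than BYTES bytes.
# Hypothetical helper; the threshold passed in is an assumption.
find_odd_results() {
  find "$1" -type f -size +"$2"c ! -name started ! -name finished
}

# Demo on throwaway data: one short file and one 301-byte file.
demo=$(mktemp -d) || exit 1
printf 'short result\n' > "$demo/normal"
printf '%300s\n' x > "$demo/odd"
find_odd_results "$demo" 200
rm -rf "$demo"
```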

If anyone wants to play with it, I'll post the script I used (test.sh) and a list of
spammers' IPs. It is a really nice model for doing simultaneous work on
other machines.
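
For reuse elsewhere, the same "no more than N jobs at once" idea can also be sketched with xargs. This is a different technique from the started/finished counter files in the quoted script; it relies on the -P option, which GNU and BSD xargs support but POSIX does not require. run_limited is a hypothetical helper name.

```shell
# run_limited MAX CMD...: read one item per line from stdin and run
# CMD once per item, with at most MAX copies running at the same time.
# Hypothetical helper built on xargs -P (GNU/BSD extension).
run_limited() {
  max=$1; shift
  xargs -P "$max" -n 1 "$@"
}

# Stand-in workload: six one-second jobs, three at a time.
printf '%s\n' 1 2 3 4 5 6 | run_limited 3 sh -c 'sleep 1; echo "done $1"' _
```

The order of the "done" lines varies from run to run, since the jobs finish independently. For lookups like the ones in the script, each job would write to its own file, as the original does.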

My sincere thanks for the contribution to my learning are hereby expressed to:
	"Robert Woodcock" <rcw(@NO-SPAM@)blarg.net>

>
> #!/bin/bash
> IPLIST=/tmp/ips
> RBLS="bl.spamcop.net dnsbl.sorbs.net no-more-funn.moensted.dk zen.spamhaus.org
> dnsbl.njabl.org dnsbl-3.uceprotect.net"
> MAXLOOKUPS=6
> OUTDIR=$(mktemp -td rblcheck.XXXXXX) || exit 1
> STARTED=$OUTDIR/started
> FINISHED=$OUTDIR/finished
> touch $STARTED $FINISHED
> cd $OUTDIR
> (
> # Generate list of queries
> for IP in $(cat $IPLIST)
> do
>   echo $IP
>   REVIP=$(echo $IP | sed 's/\([^.]*\)\.\([^.]*\)\.\([^.]*\)\.\([^.]*\)/\4.\3.\2.\1/')
>   for RBL in $RBLS
>   do
>     echo $REVIP.$RBL
>   done
> done
> ) | (
> # Run queries in parallel
> while read QUERY
> do
>   while true
>   do
>     RUNNING=$(expr $(wc -l < $STARTED) - $(wc -l < $FINISHED))
>     if [ $RUNNING -lt $MAXLOOKUPS ]; then
>       break
>     fi
>     sleep 1
>   done
>   echo >> $STARTED
>   (
>     host $QUERY > $QUERY 2>&1
>     echo >> $FINISHED
>   ) &
>   done
> )
> # Wait for all lookups to finish
> while [ $(wc -l < $FINISHED) -lt $(wc -l < $STARTED) ]
> do
>   sleep 1
> done
> # Display output
> cat << ENDHEADER
>  __ not listed    __ bl.spamcop.net
> /   __ listed    /  __ dnsbl.sorbs.net
> |  /            |  /   __ no-more-funn.moensted.dk
> | |             | |  /   __ zen.spamhaus.org
> 0 1             | | |  /   __ dnsbl.njabl.org
>                 | | | |  /   __  dnsbl-3.uceprotect.net
>                 | | | | |  /
>   Check IP      | | | | | |  Total  Reverse Lookup
> ENDHEADER
> for IP in $(cat $IPLIST)
> do
>   printf "%-15s" $IP
>   REVIP=$(echo $IP | sed 's/\([^.]*\)\.\([^.]*\)\.\([^.]*\)\.\([^.]*\)/\4.\3.\2.\1/')
>   ONBLS=0
>   BLNUM=0
>   for RBL in $RBLS
>   do
>     RESULTFILE=$REVIP.$RBL
>     if grep found $RESULTFILE >/dev/null; then
>       # not found in DNS - not in blacklist
>       echo -n " 0"
>     else
>       # no "not found" error - blacklisted
>       echo -n " 1" # blacklisted
>       ONBLS=$(expr $ONBLS + 1)
>       COUNT[$BLNUM]=$(expr ${COUNT[$BLNUM]} + 1)
>     fi
>     BLNUM=$(expr $BLNUM + 1)
>   done
>   CLEANPTR=$(cut -d' ' -f5 $IP | egrep -v '(NXDOMAIN|alias|SERVFAIL|^for$)')
>   echo -e " -- $ONBLS -- $CLEANPTR"
> done
> # generate column totals
> echo -e "\n Column totals for each RBL, in order tested."
> BLNUM=0
> for RBL in $RBLS
> do
>   echo -e "${COUNT[$BLNUM]}\t${RBL}"
>   BLNUM=$(expr $BLNUM + 1)
> done
> cd ..
> rm -rf $OUTDIR
>
> --
> Robert Woodcock - rcw at blarg.net
> "It's not based on any particular data point. We just wanted to choose a
> really large number."
> 	-- A US Treasury spokeswoman on how the $700 billion bailout figure
>            was arrived at
>


-- 
Paul A. Franz, P.E.
PAF Consulting Engineers
Office 425.440.9505
Cell 425.241.1618

