[SLL] multi-threading
Robert Woodcock
rcw at blarg.net
Sun Sep 21 17:04:21 PDT 2008
On Sun, Sep 21, 2008 at 04:01:21PM -0700, Paul A. Franz, P.E. wrote:
> I have a script that I'd like to speed up. The problem is that it queries
> several hundred different hosts sequentially waiting for the response. I
> would like to launch these requests in batches of say 6, or multiples of 6
> all at once. I don't need the results of one query in order to do the next
> one.
>
> Is there some simple way to launch multiple processes from within a bash
> script?
Any shell solution is going to revolve around the "&" shell operator which
runs a task (or subshell) in the background instead of the foreground. I
suppose another way would be to use a Makefile with make's -j option.
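For anyone who hasn't used "&" before, here is a minimal sketch of the mechanism. The sleep commands are stand-ins for real queries, and the variable names (PID1, STATUS1, and so on) are only for illustration:

```shell
#!/bin/sh
# Sketch: start two placeholder tasks with "&", then collect them with "wait".
sleep 1 &                # stand-in for a slow query
PID1=$!                  # $! holds the PID of the most recent background job
( sleep 1; exit 3 ) &    # a subshell can be backgrounded the same way
PID2=$!

# Both tasks are now running concurrently; wait returns each one's exit status.
wait "$PID1"; STATUS1=$?
wait "$PID2"; STATUS2=$?
echo "first exited $STATUS1, second exited $STATUS2"
```

Both sleeps overlap, so the whole thing takes about one second rather than two.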
A batch of 6, then another batch of 6, would be suboptimal because you'd
often end up with a single straggler process that has to finish before the
next batch gets kicked off.
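To make the straggler problem concrete, here is a rough sketch of the fixed-batch approach. The workload in DURATIONS is made up (one 2-second task and two 1-second tasks), and the batch size is 2 rather than 6 to keep the demo short:

```shell
#!/bin/sh
# Sketch of the fixed-batch approach, to show where the time goes.
# DURATIONS is a made-up workload; sleep stands in for a real query.
BATCH=2
DURATIONS="2 1 1"
INFLIGHT=0
BATCHES=0
START=$(date +%s)

for D in $DURATIONS
do
    sleep "$D" &                 # stand-in for one query
    INFLIGHT=$((INFLIGHT + 1))
    if [ "$INFLIGHT" -ge "$BATCH" ]; then
        wait                     # the whole batch waits for its slowest member
        INFLIGHT=0
        BATCHES=$((BATCHES + 1))
    fi
done
if [ "$INFLIGHT" -gt 0 ]; then
    wait                         # final, partial batch
    BATCHES=$((BATCHES + 1))
fi

ELAPSED=$(( $(date +%s) - START ))
echo "batches: $BATCHES, elapsed: ${ELAPSED}s"
```

The first batch holds the 2-second task, so the third task cannot start until it finishes, and the run takes at least 3 seconds. A rolling pool of two could start the third task as soon as the first 1-second task finished and be done in about 2.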
My solution would be to keep launching processes in the background and have
each one append a line to a temp file when it's done (which I think gets us
atomicity, since we don't care about ordering). Before each one is started,
check how many tasks have finished, and if (Started - Finished) >= MaxProcs,
sleep for a second and re-check:
#!/bin/sh

MAXPROCS=6
STARTED=0
FINISHED=0
COUNTER=$(mktemp /tmp/counter.XXXXXX)

# Read list of queries from stdin
while read -r QUERY
do
    # Throttle: block until fewer than MAXPROCS tasks are running
    while true
    do
        FINISHED=$(wc -l < "$COUNTER")
        RUNNING=$(expr $STARTED - $FINISHED)
        if [ $RUNNING -lt $MAXPROCS ]; then
            break
        fi
        sleep 1
    done

    STARTED=$(expr $STARTED + 1)
    (
        $QUERY
        # Append one line to mark this task as finished
        echo >> "$COUNTER"
    ) &
done

# Wait for all backgrounded tasks to finish before exiting
while [ $FINISHED -lt $STARTED ]
do
    sleep 1
    FINISHED=$(wc -l < "$COUNTER")
done

# Clean up
rm -f "$COUNTER"
The script takes a list of commands to run on stdin, runs a maximum of 6 at
a time, and doesn't return until they've all finished.
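As a sketch of how the command list might be built for the several-hundred-hosts case: the host names, the ping command, and the script name throttle.sh below are all placeholders, not anything from the original question.

```shell
#!/bin/sh
# Sketch: build one command line per host to feed to the script on stdin.
# HOSTS and the ping command are placeholders for illustration only.
HOSTS="alpha.example.com beta.example.com gamma.example.com"

COMMANDS=$(for H in $HOSTS; do echo "ping -c 1 $H"; done)
echo "$COMMANDS"

# Assuming the script above were saved as ./throttle.sh (a made-up name):
#   echo "$COMMANDS" | ./throttle.sh
```

Because the script expands $QUERY unquoted, each input line is split on whitespace into a command and its arguments. Quotes, pipes, and redirections inside a line are not interpreted, so this works best with simple commands.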
If, for example, all of your queries are HTTP queries using wget, you could
replace the "$QUERY" line with wget "$QUERY" and pass the script a list of
URLs on stdin. You could also have the script read from a file by putting
the main while/do/done loop (along with the final wait loop, since variables
such as STARTED set inside a subshell are not visible once it exits) in a
subshell and piping a file to it:
[...]
cat queryfile | (
    while read QUERY
    do
        [...]
    done
)
[...]
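Another option, shown here only as a sketch, is to redirect the file into the loop instead of piping it. The loop then runs in the current shell, so a counter like STARTED is still set afterwards. QUERYFILE and its contents are placeholders, and the loop body just prints instead of launching anything:

```shell
#!/bin/sh
# Sketch: feed the loop from a file with a redirection instead of a pipe.
# QUERYFILE and its contents are placeholders for illustration.
QUERYFILE=$(mktemp /tmp/queries.XXXXXX)
printf '%s\n' "query one" "query two" "query three" > "$QUERYFILE"

STARTED=0
while read -r QUERY
do
    STARTED=$((STARTED + 1))
    echo "would launch: $QUERY"
done < "$QUERYFILE"

# Because the loop ran in the current shell, the count survives the loop.
echo "started: $STARTED"
rm -f "$QUERYFILE"
```

With the pipe-into-subshell form, the same echo after the loop would report the value STARTED had before the loop began.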
--
Robert Woodcock - rcw at blarg.net
"Duct tape: The last refuge of the incompetent... because the competent
don't leave it for last."
-- seen on slashdot