[SLL] limit on number of files in a directory and hashed dir vs. flat dir file access time

Ana christiana at hipointcoffee.com
Thu Nov 1 00:01:55 PDT 2007


On Wed, Oct 31, 2007 at 08:20:45PM -0800, Chuck Wolber wrote:
> On Wed, 31 Oct 2007, Ana wrote:
> 
> > Here's an interesting thread on the topic.
> > 
> > http://linux.derkeiler.com/Mailing-Lists/Fedora/2005-07/3279.html
> 
> I disagree with the premise of his test. You cannot use "ls" and "rm" in a 
> test like that because they index the entire contents of a directory 
> before they actually do anything useful. This subverts the tree nature of 
> ext3 directory parsing. 
> 
> A better test is to simply create "n" files and then calculate the average 
> and standard deviation of the time to open and then close those files. 
> From there, increase "n" by a factor of 10 and re-calculate the values.
> 
> It would be an interesting data analysis problem to undertake. I've 
> informally undertaken a test and I do not see that the value creeps up as 
> the directory gets "fuller". My data set was too low to produce any useful 
> results, but it does show that if there is an effect, it is subtle.

It is a very interesting problem.

The test you describe will measure "seek-and-open" time as it relates to
the size of a single directory, right?  That's probably the most telling
test you could make.
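A rough sketch of the test Chuck describes: create n empty files in one flat directory, time an open() and close() of each, report the mean and standard deviation, then repeat with n ten times larger. The function names and file-naming scheme are hypothetical. `base_dir` is assumed to point at the file system under test (e.g. an ext3 mount), since the default temp dir may be tmpfs. Files that were just created are still in the kernel's caches, so this measures warm-cache lookups. For cold-cache numbers you would need to flush caches between creating and timing (on Linux, `echo 3 > /proc/sys/vm/drop_caches` as root).

```python
import os
import random
import statistics
import tempfile
import time


def open_close_stats(directory, names):
    """Time os.open() + os.close() for each named file; return (mean, stdev) in seconds."""
    samples = []
    for name in names:
        path = os.path.join(directory, name)
        start = time.perf_counter()
        fd = os.open(path, os.O_RDONLY)
        os.close(fd)
        samples.append(time.perf_counter() - start)
    # statistics.stdev needs at least two samples, so n must be >= 2.
    return statistics.mean(samples), statistics.stdev(samples)


def run_flat_benchmark(sizes=(100, 1_000, 10_000), base_dir=None):
    """For each n in sizes, fill a fresh flat directory with n empty files and time opens.

    base_dir: directory on the file system under test (assumption: e.g. an ext3
    mount). None uses the platform's default temp dir, which may not be ext3.
    """
    results = {}
    for n in sizes:
        with tempfile.TemporaryDirectory(dir=base_dir) as d:
            names = [f"{i:08d}.txt" for i in range(n)]
            for name in names:
                # Create an empty file; only the directory lookup matters here.
                with open(os.path.join(d, name), "w"):
                    pass
            # Open in random order so we are not just replaying creation order.
            order = names[:]
            random.shuffle(order)
            results[n] = open_close_stats(d, order)
    return results


if __name__ == "__main__":
    for n, (mean, sd) in run_flat_benchmark().items():
        print(f"n={n:>6}: mean={mean * 1e6:.2f} us  stdev={sd * 1e6:.2f} us")
```

If the directory index works as hoped, the mean should stay roughly flat as n grows by factors of 10. A steady climb would be the "creep" Chuck did not see in his informal test.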

I'm interested in finding out how "seek-and-open" time is affected by
the 0/3/2/4/5/03245.txt type directory structure.  If ext3 directories
are well indexed, then I have to wonder whether a tiered directory
structure would actually slow things down, since every open has to
resolve several extra path components.
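The 0/3/2/4/5/03245.txt layout can be sketched as follows. This assumes, hypothetically, that each leading character of the file name becomes one directory level, with the full name kept at the bottom. `tiered_path` and `create_tiered` are illustrative names, not part of any existing tool.

```python
import os


def tiered_path(filename, depth=5):
    """Map e.g. '03245.txt' to '0/3/2/4/5/03245.txt' (one directory per leading character)."""
    stem, _ext = os.path.splitext(filename)
    if len(stem) < depth:
        raise ValueError(f"name {filename!r} is shorter than depth {depth}")
    # Each of the first `depth` characters of the stem becomes one directory level.
    return "/".join([*stem[:depth], filename])


def create_tiered(root, filename, depth=5):
    """Create an empty file at its tiered location under root; return the full path."""
    path = os.path.join(root, tiered_path(filename, depth))
    # Make any missing intermediate directories (0/, 0/3/, ...).
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w"):
        pass
    return path
```

For example, `tiered_path("03245.txt")` gives `0/3/2/4/5/03245.txt` and `tiered_path("03245.txt", depth=2)` gives `0/3/03245.txt`. To compare the two layouts, you would time opens through these paths against the same number of files in one flat directory. With depth 5, each open resolves five extra directory components, and that extra cost is what an indexed ext3 directory might make unnecessary.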

Of course, one thing you always have to take into account is the
software you're going to be using.  For instance, given what you've told
us about rm, if rm is the only tool we have, then a tiered directory
structure might be best no matter how good ext3 is.

- Ana

