[SLL] limit on number of files in a directory and hashed dir vs. flat dir file access time
Ana
christiana at hipointcoffee.com
Thu Nov 1 15:13:22 PDT 2007
On Wed, Oct 31, 2007 at 10:13:55PM -0800, Chuck Wolber wrote:
> On Thu, 1 Nov 2007, Ana wrote:
...
> > single directory size. right? that's probably the most telling test
> > you could make.
>
> It's a self consistency issue. Stuff like drive caching, etc cancels out.
> All we care about is the slope of the graph, not the actual values on the
> graph. My hypothesis is that the slope is near zero as size increases.
Forgive me for being a little slow. I want to make sure I understand.
By "slope is near zero as size increases", I think you mean the file
"seek-and-open" time (or whatever operating time) will not increase or
will increase very little as the number of files in a directory
increases. Is that right? ... "O(1)", or nearly so.
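If I have that right, the test could be sketched roughly as below. This is only my reading of it, not anything Chuck posted: the helper names make_files and time_opens, the directory sizes, and the 100-file sample are all made up for illustration, and date +%s%N relies on GNU date. The files are freshly created, so the caches are warm, which fits the idea that only the slope matters.

```shell
#!/bin/sh
# Hypothetical helper: create $2 empty files named f0, f1, ... in directory $1.
make_files() {
    dir=$1; count=$2
    mkdir -p "$dir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        : > "$dir/f$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: open and close the first $2 files in directory $1,
# then print the elapsed time in nanoseconds.
time_opens() {
    dir=$1; sample=$2
    start=$(date +%s%N)      # %N (nanoseconds) is a GNU date extension
    i=0
    while [ "$i" -lt "$sample" ]; do
        : < "$dir/f$i"       # open for reading, do nothing, close
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $((end - start))
}

# Same number of opens each time; only the directory size changes.
base=$(mktemp -d)
for n in 100 1000 10000; do
    make_files "$base/d$n" "$n"
    echo "$n files: $(time_opens "$base/d$n" 100) ns for 100 opens"
done
rm -rf "$base"
```

If the numbers stay roughly flat as n grows, that would be the "slope near zero" case; a steady climb would be the opposite.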
> Interestingly, if the slope is greater than zero, it may not necessarily
> point to an ext3 inefficiency. It could very well be the drive itself.
> Thus the test should be done over a variety of drives.
I think that the slope would have to be non-zero, no matter what
indexing method is used, because, in my experience/understanding, none
are perfect. hehe. actually, wouldn't perfect efficiency, even when
talking about computer science, violate thermodynamics? ;)
I seem to remember reading, a long time ago, that ext2 places the data on
the actual disk at random positions, in order to avoid fragmentation.
I'm not sure, but it seems to me this randomness would indeed cancel out
drive caching, etc.
> > Of course, one thing you always have to take into account is the
> > software you're going to be using. For instance, given what you've told
> > us about rm, if we know that rm is the only tool we have then a tiered
> > directory structure might be best no matter how good ext3 is.
>
> Software is a red herring. The test simply calls for opening and closing
> the files to prove that they can in fact be accessed and then assessing
> how long that access took. You could expand the test by writing something
> to the files while they're open. Something as simple as the following
> should do:
>
> echo "Hello World" > "$file"
Software would be a red herring when it comes to testing ext3
efficiency. What I'm talking about though is real-world applications
where actual programs are not perfect and programmer time is not cheap.
As you know, in "real world" settings, you usually have to make do with
the software at hand. Even if you have a perfectly efficient file system,
if what you're forced to work with is something like "rm" that, for
whatever reason, insists on looking at every single entry in the target
directory before unlink()ing the named file, it might be best to keep
your directories as small as possible.
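For what it's worth, a tiered layout of that kind might look like the sketch below. The hashed_path helper and the two-level scheme with 256 buckets per level are hypothetical choices for illustration. cksum is used only because it is a standard POSIX tool, not because it is a particularly good hash.

```shell
#!/bin/sh
# Hypothetical helper: map a file name to a two-level bucket path such as
# "3f/a2/name", using the CRC that cksum computes over the name.
hashed_path() {
    name=$1
    crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    # Two hex digits per level gives 256 * 256 possible leaf directories.
    printf '%02x/%02x/%s\n' $((crc % 256)) $((crc / 256 % 256)) "$name"
}

# The same name always maps to the same path, so no index file is needed.
hashed_path "report.txt"
```

To store a file you would mkdir -p the directory part of that path first. Each leaf directory then holds only a small fraction of the files, so even a tool that scans the whole directory has little to scan.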
Testing and theory are fun for me, but I don't make a living in R&D (I
wish). I make a living creating "we need it now" products that leverage
existing software. I cannot forget my pragmatism. :)
- Ana