[SLL] limit on number of files in a directory and hashed dir vs. flat dir file access time
Xeno Campanoli
xcampanoli at gmail.com
Wed Oct 31 18:34:02 PDT 2007
Chuck Wolber wrote:
> On Wed, 31 Oct 2007, Adam Monsen wrote:
>
>> Is there a limit on the number of files in a directory other than just
>> the inode limit of a particular partition? I'm using the ext3
>> filesystem. Ubuntu 7.10.
Sorry man, but we tested that limit recently and it's exactly as I
documented in my post, which, oddly, hasn't come through. Hey,
moderator! Fix yourself!
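You can query the inode limit the question mentions with the POSIX statvfs call, which reports the same numbers `df -i` prints. Here is a minimal sketch in Python, assuming a Unix-like host. The helper name `inode_usage` is hypothetical. Note that these counts are per filesystem, not per directory.

```python
import os
from typing import Tuple


def inode_usage(path: str) -> Tuple[int, int]:
    """Return (total_inodes, free_inodes) for the filesystem holding `path`.

    f_files is the total number of file nodes (inodes) on the filesystem,
    f_ffree the number still unused. Filesystems that allocate inodes
    dynamically may report 0 for both.
    """
    st = os.statvfs(path)
    return st.f_files, st.f_ffree


if __name__ == "__main__":
    total, free = inode_usage("/")
    print(f"inodes: {total} total, {free} free, {total - free} used")
```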
>
> In old 2.4 kernels, ext3 performance was inversely proportional to the number
> of files in a given directory, so in that sense there was a limit. They
> fixed that in 2.6 with a tree-like hashed directory index ("htree", the
> dir_index feature). "ls" will still be dog slow because it reads (and sorts)
> the entire directory before printing anything, but accesses to individual
> files should be fine.
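A short Python sketch shows the difference between running `ls` on a huge directory and opening one known file. A sorted listing must read every entry before it can return anything. `os.scandir` yields names as they are read. A lookup of a single known name needs no listing at all. The function names are hypothetical. Whether the single-name lookup goes through a hashed index depends on the filesystem (on ext3, on dir_index being enabled).

```python
import os
import tempfile
from typing import Iterator, List


def sorted_listing(path: str) -> List[str]:
    # Like plain `ls`: every entry has to be read before the sort can finish,
    # so nothing is available until the whole directory has been scanned.
    return sorted(os.listdir(path))


def streamed_listing(path: str) -> Iterator[str]:
    # Roughly like `ls -f` (unsorted): names are yielded in whatever order
    # readdir returns them, so the first names are available immediately.
    with os.scandir(path) as it:
        for entry in it:
            yield entry.name


def exists_directly(path: str, name: str) -> bool:
    # Looks up one known name via stat(); no directory listing is involved.
    return os.path.exists(os.path.join(path, name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("b.jpg", "a.jpg", "c.jpg"):
            open(os.path.join(d, name), "w").close()
        print(sorted_listing(d))  # ['a.jpg', 'b.jpg', 'c.jpg']
        print(exists_directly(d, "b.jpg"))  # True
```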
>
> As for a physical limit on the number of files in a directory, if there
> is one, it would probably have something to do with how many bits are used
> to represent the tree structure. That number is probably so astronomically
> large that you'll exceed some other filesystem limit before you
> hit it, but that's just a guess.
>
> Have you tried asking that question on the ext3 developers list? I don't
> know if this is the *OFFICIAL* list, but Theodore Ts'o posts there all the
> time:
>
> https://www.redhat.com/mailman/listinfo/ext3-users
>
> Also, have you considered other types of filesystems for what you're
> trying to do? IIRC ReiserFS actually has some advantages over ext3 for
> lots of small files, or maybe that was xfs...
>
>
>> Follow up question: is it more efficient (as far as reads are concerned) to
>> "hash" files into subdirectories rather than just throw them all in a single
>> directory? For instance, say I have 1 million 100-kilobyte JPEG images named
>> as follows:
>> 000000.jpg
>> 000001.jpg
>> 000002.jpg
>> ...
>> 999999.jpg
>>
>> Would it speed up read time for a particular image if images were placed in
>> directories like:
>>
>> 0/0/0/0/0/1/000001.jpg and
>> 0/2/3/6/1/2/023612.jpg
>>
>> and so on?
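The layout in the question uses one digit of the file name per directory level. A small path-mapping function can express it. Here is a minimal sketch in Python. `hashed_path` and its `depth` parameter are hypothetical names.

```python
from pathlib import PurePosixPath


def hashed_path(filename: str, depth: int = 6) -> PurePosixPath:
    """Map e.g. '023612.jpg' to '0/2/3/6/1/2/023612.jpg'.

    Each of the first `depth` characters of the file's stem becomes one
    directory level; the file itself keeps its full name in the leaf.
    """
    stem = PurePosixPath(filename).stem
    if len(stem) < depth:
        raise ValueError(f"name {filename!r} is shorter than depth {depth}")
    return PurePosixPath(*stem[:depth], filename)


if __name__ == "__main__":
    print(hashed_path("000001.jpg"))  # 0/0/0/0/0/1/000001.jpg
    print(hashed_path("023612.jpg"))  # 0/2/3/6/1/2/023612.jpg
```

With all six digits used as directory levels, every leaf directory holds exactly one image. A shallower `depth=3` instead gives 1,000 leaf directories of 1,000 images each.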
>
> Not according to anything I've ever seen, but it's probably worth a test.
> Traversing the logical tree in a directory might add *some*
> overhead as the tree gets larger, but the 2.6 kernel really did fix a lot
> of the annoying performance problems in ext3. Chopping the images into a
> directory hierarchy may make sense for other reasons, though.
>
>
> ..Chuck..
>
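To test this, you could start from the sketch below. It creates the same set of empty files in a flat directory and in a digit-per-level hierarchy, then times random `os.stat` lookups in each. This is an assumption-laden micro-benchmark, not a result. All function names are hypothetical, the files are empty rather than 100 KB JPEGs, and the default `n` is far below a million.

```python
import os
import random
import tempfile
import time
from typing import Dict


def make_name(i: int) -> str:
    # Same naming scheme as in the question: 000000.jpg ... 999999.jpg
    return f"{i:06d}.jpg"


def hashed_rel(name: str, depth: int) -> str:
    # One leading digit per directory level; depth=0 means a flat directory.
    # With n < 1,000,000 the leading digits are mostly zeros, so fewer
    # directories are used than the full data set would need.
    return os.path.join(*name[:depth], name)


def populate(root: str, n: int, depth: int) -> None:
    made = set()
    for i in range(n):
        full = os.path.join(root, hashed_rel(make_name(i), depth))
        parent = os.path.dirname(full)
        if parent not in made:
            os.makedirs(parent, exist_ok=True)
            made.add(parent)
        open(full, "wb").close()  # empty placeholder file


def time_lookups(root: str, n: int, depth: int, lookups: int, seed: int = 0) -> float:
    # Mean seconds per os.stat() on randomly chosen existing files.
    rng = random.Random(seed)
    paths = [os.path.join(root, hashed_rel(make_name(rng.randrange(n)), depth))
             for _ in range(lookups)]
    start = time.perf_counter()
    for p in paths:
        os.stat(p)
    return (time.perf_counter() - start) / lookups


def compare(n: int = 10_000, depth: int = 3, lookups: int = 5_000) -> Dict[str, float]:
    results = {}
    for label, d in (("flat", 0), ("hashed", depth)):
        with tempfile.TemporaryDirectory() as root:
            populate(root, n, d)
            results[label] = time_lookups(root, n, d, lookups)
    return results


if __name__ == "__main__":
    for label, secs in compare().items():
        print(f"{label:>6}: {secs * 1e6:.2f} us per stat()")
```

The files were just created, so they sit in the kernel's dentry and inode caches. This sketch therefore measures warm-cache lookups only. Cold-cache numbers would need the caches dropped between phases, for example as root on Linux with `sync; echo 3 > /proc/sys/vm/drop_caches`.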
--
The only sustainable organizing methods focus not on scale,
but on good design of the functional unit,
not on winning battles, but on preservation.
More information about the linux-list mailing list