From eric@w3.org Fri Jan 22 17:40:08 1999
Message-ID: <19990122174008.A20549@w3.org>
Date: Fri, 22 Jan 1999 17:40:08 -0500
From: Eric Prud'hommeaux <eric@w3.org>
To: new-httpd@w3.org
Cc: Renaud Bruyeron <renaudb@w3.org>
Subject: mod_speling mem usage on large directories
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.93.2
X-UID: 4403
X-Keywords:                                                                                                  
Status: RO
Content-Length: 2231
Lines: 54

I have an apache server serving a directory of javadoc files. Because
javadoc files are named according to the class and the bulk of the
documented classes are under the w3c class, the files all have the
same basename. mod_speling looks at the basename (the portion of the
filename up to the first '.') when selecting the options for a
303. The resulting list of options ends up around 60K.

The problem is that mod_speling works by ap_pstrcat'ing each entry onto
the big file list that becomes the HTTP body of the 303. The call looks
something like this:

  t = ap_pstrcat(pool, t, current_entry, NULL);

This means that the original pointer is lost and never freed (until
the pool is cleared).
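
To make the growth pattern concrete, here is a minimal sketch of the
same loop against a toy pool. The toy_* names and bytes_for_loop are
hypothetical stand-ins written for this illustration, not Apache's
API; the only property they share with a real pool is that nothing is
given back until the whole pool is destroyed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a pool (hypothetical, not Apache's implementation):
 * memory is only handed out, never returned, until toy_destroy(). */
struct toy_block { struct toy_block *next; };
struct toy_pool {
    struct toy_block *blocks;  /* every allocation, kept until destroy */
    size_t bytes_used;         /* running total of bytes handed out */
};

static void *toy_palloc(struct toy_pool *p, size_t n)
{
    struct toy_block *b = malloc(sizeof *b + n);
    assert(b != NULL);
    b->next = p->blocks;
    p->blocks = b;
    p->bytes_used += n;
    return b + 1;              /* payload starts right after the header */
}

/* Two-argument analogue of a pool strcat: always makes a fresh copy. */
static char *toy_pstrcat(struct toy_pool *p, const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *out = toy_palloc(p, la + lb + 1);
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);   /* copies the trailing NUL as well */
    return out;
}

static void toy_destroy(struct toy_pool *p)
{
    while (p->blocks) {
        struct toy_block *next = p->blocks->next;
        free(p->blocks);
        p->blocks = next;
    }
    p->bytes_used = 0;
}

/* Append `count` copies of `entry` one at a time and report how many
 * bytes the pool handed out in total. */
static size_t bytes_for_loop(int count, const char *entry)
{
    struct toy_pool pool = { NULL, 0 };
    char *t = toy_pstrcat(&pool, "", "");
    size_t total;
    int i;

    for (i = 0; i < count; i++)
        t = toy_pstrcat(&pool, t, entry);  /* old t stays in the pool */
    assert(strlen(t) == (size_t)count * strlen(entry));
    total = pool.bytes_used;
    toy_destroy(&pool);
    return total;
}
```

In this sketch the final string is only count * strlen(entry) bytes
long, but the pool hands out roughly count^2/2 times the entry length,
because every intermediate copy is still held.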

In this example, mod_speling was looking for alternatives to a URI that didn't exist. The URI was w3c.tools.forms.BooleanXXXXX. mod_speling provided a list of 476 alternates, all matching w3c.*. Assume:

m matching directory entries = 476
n characters per file name = 40
o extra characters in the markup ~ 40

As mod_speling iterated over the ith of the m entries in the list of files, it strcat'd (n plus some markup) characters into the alternates table and (2n (href and link text) plus some markup) characters into the body. After i entries, the sizes of the alternates table and the body sum to:

i * (3n+o)

These allocations were lost on the next iteration, so after iteration i+1 the pool held both the old copy and the new one:

i * (3n+o) + (i+1) * (3n+o)

Summing the per-iteration size i * (3n+o) for i from 0 to m, you get:

(m+1) * m/2 * (3n+o) = 18,164,160

The actual memory image was closer to 28M, but at least this model gets us within an order of magnitude without poking around in mod_speling more.
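
The estimate can be double-checked with a few lines of C. This is
only the arithmetic of the model above, using the same m, n and o; the
function names are made up for the illustration.

```c
#include <assert.h>

/* Combined size of the alternates table and the body after i entries:
 * i * (3n + o), as in the model above. */
static unsigned long size_after(unsigned long i, unsigned long n,
                                unsigned long o)
{
    return i * (3 * n + o);
}

/* Closed form of the sum for i = 0..m: (m+1) * m/2 * (3n+o). */
static unsigned long total_closed_form(unsigned long m, unsigned long n,
                                       unsigned long o)
{
    return (m + 1) * m / 2 * (3 * n + o);
}

/* The same total, accumulated one iteration at a time. */
static unsigned long total_by_sum(unsigned long m, unsigned long n,
                                  unsigned long o)
{
    unsigned long total = 0;
    unsigned long i;

    for (i = 0; i <= m; i++)
        total += size_after(i, n, o);
    assert(total == total_closed_form(m, n, o));
    return total;
}
```

With m = 476, n = 40 and o = 40, total_by_sum(476, 40, 40) comes to
18,164,160, the figure quoted above.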

Once the pool was cleared or destroyed, this memory was "freed", but
only to the thread. There is no provision for freeing excessive memory
back to the OS. A robot hitting some bad links in our big homogeneous
directory produced several 28M threads. Even though the memory in
these threads got stale quickly, it was a heavy system burden.

My plan was to take care of the bulk of the problem (assuming
fortuitous memory management) by using ap_realloc or ap_free. Since
these functions don't exist, I'm writing to you with a bug report,
rather than a patch. Sorry.
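
For comparison, here is a rough sketch of what a pool would hand out
if the list were built in a buffer that grows geometrically. This is
not the ap_realloc/ap_free idea above (those calls don't exist) and it
is not a patch to mod_speling; it is plain byte accounting under
assumed numbers: a starting capacity of 64, doubling on overflow, and
the old buffer abandoned in the pool each time. bytes_with_doubling is
a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes a never-freeing pool hands out if the buffer is grown
 * geometrically: when the next entry does not fit, a new buffer of
 * at least twice the capacity is allocated and the old one is
 * abandoned in the pool. */
static size_t bytes_with_doubling(size_t count, size_t entry_len)
{
    size_t cap = 64;          /* assumed starting capacity */
    size_t len = 0;           /* bytes of text currently in the buffer */
    size_t handed_out = cap;  /* total taken from the pool so far */
    size_t i;

    for (i = 0; i < count; i++) {
        size_t need = len + entry_len + 1;  /* +1 for the trailing NUL */
        if (need > cap) {
            while (cap < need)
                cap *= 2;
            handed_out += cap;              /* old buffer is abandoned */
        }
        len += entry_len;
    }
    assert(len < cap);        /* text plus NUL always fits at the end */
    return handed_out;
}
```

For 476 entries of 160 bytes each, this accounting comes to about
256K in total, against the roughly 18M of the one-strcat-per-entry
model, since the abandoned buffers add up to less than the final one.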
-- 
-eric

(eric@w3.org)
and
-renaud

(renaudb@w3.org)

