From eric@w3.org Fri Jan 22 17:40:08 1999
Message-ID: <19990122174008.A20549@w3.org>
Date: Fri, 22 Jan 1999 17:40:08 -0500
From: Eric Prud'hommeaux
To: new-httpd@w3.org
Cc: Renaud Bruyeron
Subject: mod_speling mem usage on large directories
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.93.2
X-UID: 4403
X-Keywords:
Status: RO
Content-Length: 2231
Lines: 54

I have an Apache server serving a directory of javadoc files. Because
javadoc files are named according to the class, and the bulk of the
documented classes are under the w3c class, the files all have the same
basename. mod_speling looks at the basename (the portion of the filename
up to the first '.') when selecting the options for a 303. The resulting
list of options ends up around 60K.

The problem is that mod_speling works by ap_strcat'ing each entry onto
the big file list that is the HTTP body of the 303. The strcat looks
something like this:

    t = ap_strcat(pool, t, current entry);

Each call allocates a new, longer string from the pool and copies the
old one into it. This means that the original pointer is lost and never
freed (until the pool is cleared).

In this example, mod_speling was looking for alternatives to a URI that
didn't exist. The URI was w3c.tools.forms.BooleanXXXXX. mod_speling
provided a list of 476 alternates consisting of w3c.*.

Assuming

    m  matching directory entries      = 476
    n  characters per file name        = 40
    o  extra characters in the markup  ~ 40

As mod_speling iterated on the ith of m entries in the list of files, it
strcat'd (n plus some markup) characters into the alternates table and
(2n (href and link text) plus some markup) characters into the body. The
sizes of the alternates table and the body sum to:

    i * (3n+o)

These allocations were lost on the next iteration, leading to a total
memory usage (old copy plus new copy) of:

    i * (3n+o) + (i+1) * (3n+o)

Summing the lost allocations, i * (3n+o), for i from 0 to m, you get:

    (m+1) * m/2 * (3n+o) = 18,164,160

The actual memory image was closer to 28M, but at least this model gets
us within an order of magnitude without poking around in mod_speling
more.

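As a rough check on the arithmetic above, here is a small Python sketch of the same model. It is not mod_speling code: the function names are made up for illustration, and the only assumption is the one the message already makes, namely that every intermediate copy of size i * (3n+o) stays allocated until the pool is cleared.

```python
def lost_bytes_simulated(m, n, o):
    """Add up every intermediate buffer that a never-freeing pool keeps."""
    per_entry = 3 * n + o          # bytes added per directory entry
    total = 0
    for i in range(m + 1):         # i = 0 .. m, as in the message
        total += i * per_entry     # the copy made at step i is never released
    return total


def lost_bytes_closed_form(m, n, o):
    """Closed form of the same sum: (m+1) * m/2 * (3n+o)."""
    return (m + 1) * m // 2 * (3 * n + o)


def final_bytes(m, n, o):
    """Size of the last copy alone, i.e. what the response actually needs."""
    return m * (3 * n + o)


if __name__ == "__main__":
    m, n, o = 476, 40, 40          # figures quoted in the message
    print(lost_bytes_simulated(m, n, o))    # 18164160
    print(lost_bytes_closed_form(m, n, o))  # 18164160
    print(final_bytes(m, n, o))             # 76160
```

Under this model the final strings need only about 76K, so nearly all of the roughly 18M is made up of stale intermediate copies.
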
Once the pool was cleared or destroyed, this memory was "freed", but
only to the thread. There is no provision for freeing excessive memory
back to the OS. A robot hitting some bad links in our big homogeneous
directory produced several 28M threads. Even though the memory in these
threads got stale quickly, it was a heavy system burden.

My plan was to take care of the bulk of the problem (assuming fortuitous
memory management) by using ap_realloc or ap_free. Since these functions
don't exist, I'm writing to you with a bug report rather than a patch.
Sorry.

-- 
-eric (eric@w3.org) and -renaud (renaudb@w3.org)