Storage.remainingSpace (was: Re: DOM Storage feedback)

Ian Hickson wrote:
> On Fri, 21 Mar 2008, Sunava Dutta wrote:
>>
>> Storage.remainingSpace
>>
>> A straightforward and popular request, this API provides a script to 
>> check the remaining persistent storage space available to it, in bytes. 
>> It's a very useful feature to allow pages to manage their store better.
>>
>> * <Open Issue> We currently return bytes but perhaps returning the 
>> number of characters is more useful? We'd love to hear thoughts here...
> 
> The problem with this feature is that there are a number of ways to store 
> data, and thus no way to know exactly how much data can be stored.
> 
> For example, if the UA stores data in UTF-8 characters, the number of 
> characters left to store will vary based on what characters are to be 
> stored. Similarly, if the UA stores data in a compressed fashion, the 
> number of bytes will vary based on how compressible the data is. 
> [...]
> Thus this API really can't easily work in an interoperable fashion.
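
To make the encoding point concrete, here is a rough sketch that counts how 
many bytes a string would need in UTF-8. The helper name utf8ByteLength is 
made up for illustration, and it assumes the UA stores plain uncompressed 
UTF-8, which is only one of the possibilities mentioned above.

```typescript
// Count the bytes a string would occupy if encoded as UTF-8.
// Hypothetical helper for illustration; a lone surrogate is counted as
// 3 bytes, as if it had been replaced by U+FFFD.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1; // ASCII
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair, i.e. one non-BMP character
        i++;
      } else {
        bytes += 3;
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Three strings, each four UTF-16 code units long.
const asciiText = "abcd";
const greekText = "\u03b1\u03b2\u03b3\u03b4";
const cjkText = "\u6f22\u5b57\u6f22\u5b57";
const byteCounts = [asciiText, greekText, cjkText].map(utf8ByteLength);
```

The three strings have the same length as far as script is concerned, but 
they need 4, 8 and 12 bytes respectively, so a byte count cannot be turned 
into a character count without knowing the data.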

This seems like it could be a useful feature if it could be made to 
work, so I'd like to propose a remainingSpacePercentage as an alternative.

> [...] we don't want to preclude user agents from [...]

One additional thing we don't want to preclude is unlimited storage 
space, e.g. a user might say a photo-manipulation web app should be 
given as much space as it wants (until disk space runs out and the 
browser dies and it corrupts all the configuration files it tries to 
save while exiting, or whatever). That can't be handled nicely with 
remainingSpace; it could be changed from int to float so it can be 
Infinity, but that's a bit yucky.

> Furthermore, we don't want to preclude user agents from dynamically 
> increasing the amount of available storage based on user actions, for 
> example the UA could automatically increase the storage every time the 
> user interacts with the page, or could prompt the user to increase the 
> storage when it gets to 80%.

If, at any instant, storing a new value of some particular length will 
cause an over-quota exception, then clearly there is a space limit at 
that instant, so it's no different to a non-dynamically-sized storage 
area. If it won't cause an exception (ignoring rare cases like running 
out of physical disk space) regardless of its length, then the storage 
can just be considered unlimited. So dynamic sizing doesn't seem like a 
new problem, if the static and unlimited cases can already be handled.


I can think of two main use cases:

(1) Indicating to the user how much space is available, like in Gmail's 
"You are currently using 153 MB (2%) of your 7204 MB", so they know 
whether they need to delete some of their old data.

There are four pieces of information that might be relevant: the 
bytes(/characters/etc) used, the bytes(/etc) remaining, the total 
bytes(/etc) available, and the percentage used. The most useful for 
humans is the percentage - I have no idea how many bytes a typical email 
is, so I wouldn't be helped much by "You have 618KB remaining", but if I 
see I'm only using 38% of the space after a few months then I know I 
don't need to worry yet.
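
As a small sketch of that kind of status line, assuming the page somehow 
knows both the used and total amounts (the function name formatUsage and 
its parameters are made up here, and total is assumed to be greater than 
zero):

```typescript
// Build a Gmail-style usage message from used and total amounts in MB.
// Hypothetical helper; assumes totalMB > 0.
function formatUsage(usedMB: number, totalMB: number): string {
  const percent = Math.round((usedMB / totalMB) * 100);
  return (
    "You are currently using " + usedMB + " MB (" + percent +
    "%) of your " + totalMB + " MB"
  );
}

const usageMessage = formatUsage(153, 7204);
```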

(2) Automatically cleaning up old/temporary data (e.g. caches) when 
running out of space, to recover space for new data.

That cleanup could happen as late as possible (i.e. just as you're about 
to store new data which doesn't fit), in which case the current setItem 
out-of-space exception seems adequate - you can wrap setItem in a 
function that tries to set, catches the exception, cleans up the cache 
and then tries again.
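
A minimal sketch of such a wrapper follows. The KeyValueStore interface 
and the names setItemWithCleanup and cleanUpCache are assumptions made 
for the example. It also treats any exception from setItem as "out of 
space", which is a simplification, since the exact exception reported 
may differ between browsers.

```typescript
// The only part of the storage area this sketch needs.
interface KeyValueStore {
  setItem(key: string, value: string): void;
}

// Try to store the value; if that throws, run the page's own cleanup
// routine once and try again. Returns false if it still does not fit.
// Assumption: every exception from setItem means the quota was hit.
function setItemWithCleanup(
  store: KeyValueStore,
  key: string,
  value: string,
  cleanUpCache: () => void
): boolean {
  try {
    store.setItem(key, value);
    return true;
  } catch (e) {
    cleanUpCache(); // application-specific: drop caches, old data, etc.
  }
  try {
    store.setItem(key, value);
    return true;
  } catch (e) {
    return false; // still no room after cleanup
  }
}
```

A page would pass its real storage area as the first argument and a 
function that deletes whatever it considers expendable as the last one.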

Or it could happen at some earlier time, e.g. when the user is idle and 
won't mind a bit of a pause while you clean up old data. That behaviour 
could be very application-dependent: it's determined by how big the 
caches are, how much data will be saved, how much space needs to be made 
available, how often the cleanup process will run, etc. Or it could be 
quite simple: if free space drops below 5%, expire old data until 
there's none left or free space reaches 15%. I don't know what people 
would want in practice, so I'll hope the latter is adequate.
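
The simple policy could be sketched roughly like this. The CleanupHooks 
interface is hypothetical: freePercent stands in for whatever 
free-space figure the page can get, and expireOldest for the page's own 
expiry logic.

```typescript
// Hypothetical hooks supplied by the application.
interface CleanupHooks {
  freePercent(): number;   // approximate free space, 0 to 100
  expireOldest(): boolean; // removes one old entry; false if none left
}

// If free space is below lowWater percent, expire old data until
// nothing expirable is left or free space reaches highWater percent.
// Returns the number of entries expired.
function cleanUpIfLow(
  hooks: CleanupHooks,
  lowWater: number = 5,
  highWater: number = 15
): number {
  let expired = 0;
  if (hooks.freePercent() >= lowWater) {
    return expired; // enough room, nothing to do
  }
  while (hooks.freePercent() < highWater && hooks.expireOldest()) {
    expired++;
  }
  return expired;
}
```

This would be called from an idle-time handler, so that any pause 
happens when the user is not waiting on the page.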


I'm sure there must be other cases, but I don't know what they are. 
(What were the specific use cases that prompted IE to add remainingSpace?)


Then, some possible solutions:

~ ~ ~ ~

No API; the UA can just provide UI to view the available/used storage 
space for the current domain.

Pros:

  * Maximally simplifies API.

  * Prevents authors abusing the API and causing non-interoperability 
problems.

Cons:

  * Most users won't have any idea how to access that UI. Sites that 
want users to know how much space they're using shouldn't be forced to 
give instructions like "If you are using Firefox, open the Tools menu 
and click the whatever button etc. If you are using IE, open the ..." 
because that's horrid.

  * Doesn't help the cache-cleanup use case.

~ ~ ~ ~

As before, but with a <bb type=managestoragequota>.

Pros:

  * Same API considerations as before.

  * Lets pages make the storage UI discoverable to users (e.g. popping 
up a dialog box saying "you only have space for 10 more emails, _click_ 
_here_ to allocate more storage space").

Cons:

  * Requires the user to click a link before being able to see their 
quota, which is not a good user experience.

  * Doesn't help the cache-cleanup use case.

~ ~ ~ ~

remainingSpace API, exactly like what IE8b2 does:

Calculates space from the sum of key lengths plus value lengths, 
measured in UTF-16 code units (i.e. non-BMP characters count as 2).
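
In script terms that accounting is just a sum of string lengths, since 
String length already counts UTF-16 code units. A rough sketch, with 
made-up function names and with the quota passed in as a parameter 
rather than taken from any real browser:

```typescript
// Sum of key lengths plus value lengths, in UTF-16 code units.
// A non-BMP character is a surrogate pair and so contributes 2.
function usedCodeUnits(pairs: Array<[string, string]>): number {
  let total = 0;
  pairs.forEach(function (pair) {
    total += pair[0].length + pair[1].length;
  });
  return total;
}

// Remaining space under this accounting, for a hypothetical quota.
function remainingCodeUnits(
  quota: number,
  pairs: Array<[string, string]>
): number {
  return quota - usedCodeUnits(pairs);
}

const samplePairs: Array<[string, string]> = [
  ["a", "\ud83d\ude00"], // 1 + 2 code units (one non-BMP character)
  ["key", "value"]       // 3 + 5 code units
];
const sampleUsed = usedCodeUnits(samplePairs);
```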

Pros:

  * Allows pages to present some kind of storage space status to the user.

  * Allows pages to accurately predict whether enough space will be left 
after storing some more stuff in the future, so they can tidy up caches 
and expire old data until happiness ensues.

Cons:

  * Doesn't make it easy for pages to present particularly useful 
storage space status - you can't tell how much space is available in 
total (except by manually summing the lengths of all your stored data), 
so you can't give a usage percentage.

  * Doesn't correspond to physical storage space used (e.g. IE8b2's 
nominal ~5M character limit lets me store a million two-character keys 
and take up 60MB of file space because of the overhead in the storage 
format), which is particularly bad for resource-constrained devices 
where there's a hard limit on physical storage space. So I expect many 
browsers would be unwilling to implement it exactly like this.

  * Doesn't handle unlimited storage gracefully.

~ ~ ~ ~

remainingSpace API, but with no strict definition so browsers can 
measure whatever they want:

Pros:

  * Allows browsers to report and limit the physical storage space used 
(regardless of character encoding, compression, etc), instead of only 
being allowed to limit key/value characters.

Cons:

  * Will be very different between implementations, so pages are quite 
likely to rely on non-interoperable details (e.g. assuming that if 
remainingSpace >= 100 then they can safely store anything where 
key.length+value.length <= 39 and don't need to check for out-of-space 
exceptions; or assuming that if remainingSpace == 1e6 then they can 
tell the user there's space for about a thousand 1KB images; or various 
other plausible situations).

~ ~ ~ ~

remainingSpacePercentage API:

Returns the approximate percentage of space remaining.
If there is an unlimited quota, it must return 100.
Otherwise, it must return an integer value between 1 and 99 (inclusive).
(It intentionally avoids 0, to prevent people mistakenly thinking they 
can tell when the storage area is completely full.)
It should return a value that decreases linearly in the amount of data 
stored, but isn't required to (e.g. it could go up when you add new 
data, because maybe it suddenly compresses much better than before).
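
One way a UA might compute such a value is sketched below, purely as an 
illustration. It assumes the UA tracks a used amount and a quota in 
some internal unit of its own choosing, and that an unlimited quota is 
represented as Infinity. None of that is required by the proposal. The 
describeRemaining helper shows how a page might consume the value.

```typescript
// UA side (sketch): map internal used/quota figures to the proposed
// value. 100 means unlimited; otherwise an integer from 1 to 99.
function remainingSpacePercentage(used: number, quota: number): number {
  if (quota === Infinity) {
    return 100;
  }
  if (quota <= 0) {
    return 1; // degenerate quota: report the lowest allowed value
  }
  const raw = Math.round(((quota - used) / quota) * 100);
  return Math.min(99, Math.max(1, raw)); // never 0, never 100
}

// Page side (sketch): turn the value into text for the user.
function describeRemaining(percentRemaining: number): string {
  if (percentRemaining === 100) {
    return "Storage is unlimited";
  }
  return (
    "You're using " + (100 - percentRemaining) +
    "% of your local storage space"
  );
}
```

Note that an empty but limited storage area reports 99 rather than 100 
under this sketch, so that 100 stays reserved for the unlimited case.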

(The name is too long so it should probably be renamed, and maybe it 
should be switched from 'remaining' to 'used'.)

Pros:

  * Allows pages to present storage space status to the user in whatever 
way they feel is appropriate (e.g. text ("You're using 38% of your local 
storage space"), graphical progress bar, flashing warnings when 98% 
full, etc).

  * Allows pages to trigger automatic cleanups when the usage exceeds 
some threshold.

  * Discourages the abuse cases where pages might depend on non-portable 
implementation details, e.g. it doesn't tell them how many 
bytes/characters/etc are available and it's too imprecise for them to 
try to calculate the total.

  * Handles unlimited storage in a way that lets sophisticated pages 
detect it and present it nicely to users, and lets dumb pages ignore it 
entirely and treat it just like limited storage and it'll still act 
sensibly.

Cons:

  * Doesn't allow pages to know how many bytes are available, so:
   * They can't tell a technically-knowledgeable user how many bytes are 
available (but the user can still use their browser UI if they really 
want to check).
   * They can't clean up caches in advance of out-of-space errors with 
much accuracy, since they don't know how much space is really available.

  * Still provides ways for pages to be non-interoperable, e.g. they 
could save a megabyte of data and see how the percentage changes to work 
out the approximate total amount of space available and extrapolate from 
that. But that's a bit obscure, and I can't think of any 
obvious-but-wrong cases that people are likely to write accidentally.
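
For what it's worth, the extrapolation trick is also quite imprecise. 
The sketch below (function name made up) assumes the percentage really 
is linear in the amount stored, which the proposal does not require, 
and that each reading is rounded to the nearest integer, so the 
observed drop may be off by up to one point in either direction.

```typescript
// Given that writing `written` units moved the remaining percentage
// from `before` to `after`, return the range of total sizes that is
// consistent with integer rounding of both readings.
// Assumes a linear percentage and before > after.
function estimateTotalRange(
  written: number,
  before: number,
  after: number
): { low: number; high: number } {
  const drop = before - after;
  const low = (written * 100) / (drop + 1);
  const high = drop > 1 ? (written * 100) / (drop - 1) : Infinity;
  return { low: low, high: high };
}

// Writing 1e6 units and seeing 80 -> 70 gives roughly 9.1e6 to 11.1e6.
const tenPointDrop = estimateTotalRange(1e6, 80, 70);
// Seeing only 80 -> 79 gives anything from 5e7 upwards.
const onePointDrop = estimateTotalRange(1e6, 80, 79);
```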

~ ~ ~ ~

Have I missed many significant details and issues here?

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Friday, 17 October 2008 16:53:51 UTC