A Standardized Packaging System for Decentralized Unix Distributions
Preface
    It is important to keep in mind while reading this that
      the ideas expressed are those of a single individual over the
      course of a couple of months, not of a standards body over the
      course of years. It is a version 0.1 draft and nothing herein
      should be taken as final. The document is open to critique,
      recommendations, additions, changes, and modifications. It is
      ultimately envisioned as a community process, so feel free to
      contact the author with contributions at your leisure.
 
  Introduction
    The Unix operating system has been receiving considerable
      public attention over the past few years, due in large part to the
      maturation of open-source Unix implementations such as Linux and
      FreeBSD. As a result of this growing interest, the number of users,
      developers, and available software packages has also increased, and
      the environment has grown considerably richer. However, because
      of the somewhat anarchic nature of open-source development, the same
      fertile ground that has given rise to Unix's popularity has also been
      the source of some of its most significant troubles. Notably,
      the frequent lack of coordination between independent development
      groups has resulted in software releases that sometimes need
      considerable fine-tuning before they will work together. And because
      open-source software tends to be far more modular than closed
      software, it is at times rather difficult to find a small
      but essential component when building or repairing a system.
      In fact, the task of finding the pieces and making them
      coexist generally requires the effort of an entire group
      to maintain the "distributions" that represent a complete,
      working environment.
 
    The distribution maintainers do a reasonably adequate job
      under the circumstances, but the distributions are far from
      perfect. Broken packages and hardware incompatibilities abound,
      and it often takes a fair amount of tweaking on the part of 
      the users to get a working system. Unfortunately, it is possible
      that progress in this area is reaching its limits under
      the current Unix file hierarchy. Historically, Unix wasn't
      designed as a decentralized, open-source
      development environment; it was the product of a single vendor with a
      proprietary code base. And for decades the Unix clones that arose 
      were also monolithic in this way. Individuals and groups might have
      written software packages to be installed atop the core system, but 
      rarely would a group duplicate core functionality that was 
      provided by the vendor. But now the landscape has changed, 
      and virtually every component of the open-source Unices is an
      independent development. And the foundation simply hasn't scaled
      to meet the new demands.
  The Need for Change
    Before getting into the proposed solution, package 
      modularization, it might first be useful to convince the
      skeptics that something really needs to be done.
    Traditionally, Unix files were
      stored according to usage: all system binaries would be kept
      in one location, all configuration files in another, all libraries
      in yet another, and so on. While this model worked fine when
      there were only a handful of independent groups producing software and
      significant changes took place over years rather than weeks,
      it is hopelessly broken in an era of rapid, modular
      open-source development, where the numerous components of an
      operating system are developed at different paces,
      most with little attention paid to the pace of any others.
      If the community development model is to operate at maximum
      efficiency, it needs an infrastructure that better
      accommodates its needs.
    Critics who would cry heresy at the notion of breaking
      tradition should take a moment to consider the current
      state of affairs on a typical Unix system:
      
- Configuration files are thrown together in a mindless heap under /etc,
          making it a painstaking chore to determine which packages rely on
          which configuration files. It may sound nice on the surface to have
          all of your configuration files in one place, but when they are
          given arcane names and contain little documentation, their purpose
          is often difficult to determine for anyone not intimately
          familiar with the system in the first place. Furthermore,
          developers have to take care not to choose a name that is already
          used, and there is no effective way to avoid naming conflicts.
          When a single vendor was in charge of everything under /etc this
          was not a problem, but when hundreds of independent developers
          are contributing to the system, the odds of a naming conflict
          increase dramatically. To circumvent this problem, some developers
          have taken to putting their configuration files under subdirectories
          in /etc, but unfortunately there is little or no standardization to
          this procedure and the chance of naming conflicts remains.
          The only reliable way to avoid this problem is to store
          program configuration files with the packages, which (adding to
          the chaos) is exactly what many packages have chosen to
          do.
 - System executables are scattered all over the place with
          no apparent consistency. On a typical system you might
          find a particular executable that you're looking for in
          any one of /bin, /sbin, /opt/bin, /usr/bin, /usr/sbin, /usr/X11/bin,
          /usr/bin/X11, /usr/ccs/bin, /usr/ucb/bin, /usr/local/bin,
          /usr/local/sbin, as well as in subdirectories of a particular 
          package or in some random location according to the whims of 
          the system administrator(s). The reason for so many "official" 
          directories is really only a matter of
          historical interest; the important point is that on most
          systems the executables are organized with little regard
          for logic and consequently can be a burden to 
          maintain.
 - System libraries are generally tossed under /lib or
          /usr/lib, which is convenient but has caused significant
          problems when multiple library versions need to be supported.
          Major and minor numbers are a reasonable attempt to solve
          this problem, but the resultant structure is quite clumsy,
          inelegant, and error-prone. A hierarchical organization would 
          achieve the same goal (along with adding other benefits), and
          also unify the filesystem when implemented in conjunction
          with the modularized hierarchy presented in this 
          document.
 - Installation of multiple versions of system executables
          is essentially impossible without severe limitations.
          How does one install two versions of a program and
          have both simultaneously available? If version x of a program
          is desired as the default because it offers enhanced 
          functionality, but a certain 
          script relies on version y, what is the solution? Currently
          there is no standardized solution, and not even a good solution 
          that is in widespread use.
 - Rollbacks can be a nightmare. If the installation of a new
          package breaks something on the system, it can bring even
          an experienced system administrator to his knees.
          With the current file
          organization, new packages are forced to either overwrite
          previous files or come up with clumsy, unstandardized renaming 
          conventions (like the deplorably useless "bash2"). If 
          existing files get overwritten, what is one's recourse when
          the new package is found to be broken? What if the new package
          was a broken version of the package retrieval tools, or a
          widely used essential system library?
 
 
     The list could continue with more examples of the problems 
       of the current system, but hopefully this is enough to convince 
       those resistant to change that the hurdles faced are practically
       insurmountable given the current foundation. In the early days 
       of a few corporate vendors providing most of the software,
       none of the above would have been issues. But for
       a decentralized system relying on the work of thousands
       of uncoordinated independent developers, the current Unix file
       hierarchy is wholly inadequate.
  A Modular Solution
    The modular solution proposed is to create a package
      hierarchy in which every software package has a unique
      location (kind of like a namespace for you XML folks :).
      The proposed schema for this is:
      
        /pkg/group/name/version/distributor/distid/.pkg
      
      The following is a concrete example of a translation for an 
      existing application and distribution:
      
        /pkg/gnu/gnome/gnumeric/0.64/debian/i686-source/.pkg
       
    The abstractions are defined as follows:
    
- pkg : The top-level directory under which all software
        packages are stored.
 - group : Zero or more directories that correspond 
        to the groups (if any) to which the package belongs.
 - name : The name of the software package being 
        distributed.
 - version : One or more entries that correspond to
        the particular version of the application. Multiple
        version directories are allowed for fine-grained version
        management (multiple version levels will serve a role similar
        to major and minor numbers on system libraries).
 - distributor : The person or group responsible for packaging
        and distributing the software. The distributor may consist
        of one or more entries.
 - distid : The distributor ID for the package. The ID
        may consist of one or more entries.
 - .pkg : The directory under which all files are actually
        stored.
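    Because group, version, distributor, and distid may each span
      several directories, a path such as /pkg/a/b/c/d/e/.pkg cannot be
      split back into its fields without knowing how many entries each
      one has. The following Python sketch therefore goes the other way
      and assembles a path from explicit lists of entries. The function
      name pkg_path and its root parameter are hypothetical; they are not
      part of the proposal.

```python
from pathlib import PurePosixPath
from typing import Sequence


def pkg_path(name: str,
             version: Sequence[str],
             distributor: Sequence[str],
             distid: Sequence[str],
             group: Sequence[str] = (),
             root: str = "/pkg") -> PurePosixPath:
    """Build <root>/group.../name/version.../distributor.../distid.../.pkg (hypothetical helper)."""
    # group may be empty; version, distributor, and distid need at least one entry each
    fields = (("group", group, 0), ("version", version, 1),
              ("distributor", distributor, 1), ("distid", distid, 1))
    for label, entries, minimum in fields:
        if isinstance(entries, str):
            # A bare string would be split into single characters by the unpacking below
            raise TypeError(f"{label} must be a sequence of entries, not a string")
        if len(entries) < minimum:
            raise ValueError(f"{label} needs at least {minimum} entry")
    parts = [*group, name, *version, *distributor, *distid]
    for part in parts:
        # Each entry becomes exactly one directory, so it must be a plain name
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path entry: {part!r}")
    return PurePosixPath(root, *parts, ".pkg")
```

    For instance, pkg_path("gnumeric", ["0.64"], ["debian"],
      ["i686-source"], group=["gnu", "gnome"]) reproduces the gnumeric
      path given earlier.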
 
 
    [The peculiar-looking .pkg directory was chosen for a
      good reason which I haven't had time to explain yet. 
      Explanation will be added shortly.]
  Peer Distribution
    Because the package modification and creation process 
      is significantly simplified through modularity, we should
      see people increasingly motivated to contribute personalized
      distributions of various software packages.
      If this is the case, it would follow that a wider variety
      of specially-configured software would generally be available,
      and that obscure systems would have an easier time finding
      compatible software. Individual competition is also
      fostered in this model, as one is not limited to searching
      for either a "debian" package or a "redhat" package, but
      instead will have a sea of options that allow the best
      individual modules to be selected, rather than the set
      of modules distributed under the best brand name.
      By reducing the distributor to merely
      a field in the hierarchy, the model (1) encourages the interoperability
      of software packages from multiple distributors, (2) levels
      the playing field for individuals or small groups to make
      significant distribution contributions, and (3) discourages
      OS fragmentation by relieving end users from having to choose
      between one distributor and another.
 
 
    It will be interesting to see, however, how popular
      such a model becomes amongst the major distributions.
      Shareholder confidence may falter if a vendor can't 
      make some guarantees of generating revenue through
      vendor lock-in, and conceit may keep the non-profits
      from reducing themselves to a mere field in the hierarchy.
      In the long run, however, as long as there
      are motivated individuals who see the value in an open,
      standardized package distribution schema, the cause will
      not be lost. Distributors who resist adapting to the 
      streamlined model should eventually fall out of favor
      as the community effort ultimately produces a better
      product.
  Standardized Directories
    As previously mentioned, the .pkg directory is the
      repository for the package's files. An example .pkg
      directory might contain the following:
      
      .../.pkg/bin
      .../.pkg/include
      .../.pkg/info
      .../.pkg/lib
      .../.pkg/man
     
    Though there is no way to mandate the directory names in
      a decentralized system, it is important that developers make
      an attempt to follow unenforced standards for their directory
      choices. The following guidelines are recommended:
      
- bin : Executables (scripts or binaries) that 
           are intended to be called directly by a user
 - class : Java class (.class) files
 - conf : Program configuration files
 - doc : Documentation files that are not in 
          man or info format
 - include : C/C++ header files
 - info : GNU info files
 - java : Java source (.java) files
 - javadoc : Javadoc (Java API documentation) files
 - lib : Libraries necessary for program 
          operation
 - man : Unix man pages
 - share : Files provided by the distribution 
          that should be accessible to users or programs, but do 
          not fit into one of the other categories
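    Since the names cannot be mandated, a tool can at most warn about
      departures from them. As a minimal sketch (the function name
      nonstandard_dirs is hypothetical), the following Python reports any
      subdirectory of a .pkg directory that is not on the recommended list:

```python
import os

# The recommended .pkg subdirectory names from the list above
STANDARD_DIRS = frozenset({
    "bin", "class", "conf", "doc", "include", "info",
    "java", "javadoc", "lib", "man", "share",
})


def nonstandard_dirs(pkg_dir: str) -> list[str]:
    """Return, sorted, the subdirectories of pkg_dir that are not recommended names."""
    found = []
    with os.scandir(pkg_dir) as entries:
        for entry in entries:
            # Only real directories are checked; files and symlinks are skipped
            if entry.is_dir(follow_symlinks=False) and entry.name not in STANDARD_DIRS:
                found.append(entry.name)
    return sorted(found)
```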
 
      Obviously not all of the directories will be used by every
      program, and unfortunately there are numerous programs already
      in existence that use directories other than these for the 
      same purposes. However, in the long run it is likely that 
      distributions that follow standards will come to be more 
      popular and natural selection will weed out those that 
      don't.
 
  Linking it all Together
    If you're wondering what kind of chaos this is going to
      introduce to one's PATH variable, the answer will be a pleasant
      surprise: You need only one entry in the path. By standardizing
      the name for the executable directory (bin), one can create
      a single bin directory on the system, and maintain references
      to the different versions through symbolic links. Changing
      the system-wide default version of a program would not mean
      changing a hairy PATH variable; it would simply mean redirecting
      the links in the system bin directory through a simple scripted
      interface. Furthermore, since you'd be operating on entire
      packages, the effect is also extended to any man pages, libraries,
      class files, etc., that are included with the package. Switching
      between two versions of the same package becomes a one-line
      operation, and there's even the option of easily managing the
      use of intermingled files from two different versions.
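    The link-switching step can be sketched in a few lines of Python.
      This is an illustration under assumptions, not a defined interface:
      link_package_bin is a hypothetical name, pkg_dir is taken to be a
      package's .pkg directory, and system_bin the single system-wide bin
      directory. Each new link is created under a temporary name and
      renamed over the old one, so the program never disappears from the
      PATH in the middle of a switch.

```python
import os


def link_package_bin(pkg_dir: str, system_bin: str) -> list[str]:
    """Point system_bin/<prog> at pkg_dir/bin/<prog> for each program in the package."""
    pkg_bin = os.path.join(os.path.abspath(pkg_dir), "bin")
    os.makedirs(system_bin, exist_ok=True)
    linked = []
    for prog in sorted(os.listdir(pkg_bin)):
        link = os.path.join(system_bin, prog)
        # Never clobber a real file or directory, only earlier links
        if os.path.lexists(link) and not os.path.islink(link):
            raise FileExistsError(link)
        tmp = link + ".new"
        if os.path.lexists(tmp):
            os.remove(tmp)
        os.symlink(os.path.join(pkg_bin, prog), tmp)
        # rename() swaps the link atomically, so prog never vanishes from PATH
        os.replace(tmp, link)
        linked.append(prog)
    return linked
```

    Switching the default version is then a matter of calling it again
      with a different package, and the same loop could be repeated for
      man, lib, and the other standard directories.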
  
  Checksums
    Because a peer distribution model comes with inherent risks,
      it should be standard practice to compute an MD5 checksum of all
      files in the package before archiving, and to always verify the
      checksums on extraction. In addition, a checksum should be
      run on the overall archived file. In this way one is able
      to guarantee both integrity and searchability.
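    A minimal sketch of the per-file step, using Python's hashlib. The
      manifest name MD5SUMS is an assumption, and its lines use the same
      "checksum, two spaces, relative path" layout that GNU md5sum prints,
      so md5sum -c could also check it from inside the package directory.
      MD5 is used here only because the text names it; it still catches
      accidental corruption, but it is no longer considered
      collision-resistant.

```python
import hashlib
import os

MANIFEST = "MD5SUMS"  # hypothetical manifest file name


def md5_file(path: str) -> str:
    """Return the hex MD5 digest of one file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(pkg_dir: str) -> None:
    """Store the MD5 of every file under pkg_dir in pkg_dir/MD5SUMS."""
    entries = []
    for dirpath, _dirs, files in os.walk(pkg_dir):
        for fname in files:
            full = os.path.join(dirpath, fname)
            rel = os.path.relpath(full, pkg_dir)
            if rel == MANIFEST:
                continue  # the manifest cannot list its own checksum
            entries.append((rel, md5_file(full)))
    with open(os.path.join(pkg_dir, MANIFEST), "w") as f:
        for rel, digest in sorted(entries):
            f.write(f"{digest}  {rel}\n")


def verify_manifest(pkg_dir: str) -> list[str]:
    """Return relative paths that are missing or whose MD5 no longer matches."""
    bad = []
    with open(os.path.join(pkg_dir, MANIFEST)) as f:
        for line in f:
            digest, rel = line.rstrip("\n").split("  ", 1)
            full = os.path.join(pkg_dir, rel)
            if not os.path.isfile(full) or md5_file(full) != digest:
                bad.append(rel)
    return bad
```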
    
    For example, consider the distributor Alice who takes the 
      "foo" source code, compiles it for machine X, and archives
      it for distribution. On archival the checksum is computed
      for every file in the package and stored 
      along with the rest of the files in the package. The tarball
      is then created, gets its own checksum computed, and is
      renamed to its checksum. Now Alice can put her archive
      into any public distribution pool (e.g. a peer-to-peer client
      such as Freenet or an anonymous FTP site), and as long as she can
      reliably distribute her search key (which contains the tarball
      checksum), users will be able to locate her unique archive amongst
      the others in the pool, as well as verify the integrity of the
      package once it has been retrieved.
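    Alice's archiving step might look like the following Python sketch.
      It assumes the per-file checksums have already been written into the
      package directory; archive_package and verify_download are
      hypothetical names, gzip compression is an arbitrary choice, and the
      tarball is named by its bare MD5 hex digest, which also serves as the
      search key.

```python
import hashlib
import os
import tarfile


def md5_of(path: str) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_package(pkg_dir: str, out_dir: str) -> str:
    """Tar and gzip pkg_dir, rename the tarball to its own MD5, return that digest."""
    tmp_path = os.path.join(out_dir, "archive.tmp")
    with tarfile.open(tmp_path, "w:gz") as tar:
        # Store the tree under ".pkg" rather than under its absolute path
        tar.add(pkg_dir, arcname=".pkg")
    digest = md5_of(tmp_path)
    os.replace(tmp_path, os.path.join(out_dir, digest))
    return digest


def verify_download(path: str, search_key: str) -> bool:
    """True if a retrieved archive still hashes to the published search key."""
    return md5_of(path) == search_key.lower()
```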
 
  Recursive Linking
 
    In order to maximize the ability to select a version
      of a package to work with, the system administrator may choose
      to implement "recursive linking".
      Recursive linking is the process of setting up symbolic links
      recursively up the hierarchy that allow users to
      select packages by a symbolic name (e.g.
      "stable" or "recent") rather than a version or build number.
      For example, consider the following paths:
      
        /pkg/gnu/bash/1.14/debian/i386-01
        /pkg/gnu/bash/2.04/debian/i386-01
        /pkg/gnu/bash/2.04/redhat/i386-01
        /pkg/gnu/bash/2.04/redhat/i686-01
      
      The administrator has the option here to create symbolic links
      at points along the hierarchy that
      mean something to the casual user. For example, the admin may
      create the link .stable -> 2.04/debian/i386-01
      at the version level so that a user only wishing to have a stable
      version of bash in her path need only send a request for
      "stable", without having to know anything about which version
      is being requested.
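    As a sketch, the .stable link from this example could be created and
      followed in Python as below. The helper names are hypothetical, as is
      the convention that a request for "stable" maps to the link named
      ".stable". The link target is stored as a relative path so that it
      stays valid if the hierarchy is moved, and an existing alias is
      replaced by renaming a freshly created link over it.

```python
import os


def make_alias(name_dir: str, alias: str, target: str) -> None:
    """Create name_dir/alias -> target (relative), replacing any existing alias."""
    # Refuse to create an alias that would dangle
    if not os.path.isdir(os.path.join(name_dir, target)):
        raise FileNotFoundError(os.path.join(name_dir, target))
    link = os.path.join(name_dir, alias)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    # A relative target is resolved from name_dir, so the tree can be relocated
    os.symlink(target, tmp)
    # rename() replaces the old link itself, never the directory it points to
    os.replace(tmp, link)


def resolve_alias(name_dir: str, request: str) -> str:
    """Map a request such as "stable" to the real directory behind ".stable"."""
    path = os.path.join(name_dir, "." + request)
    if not os.path.islink(path):
        raise KeyError(request)
    return os.path.realpath(path)
```

    With the four bash builds above in place, make_alias(bash_dir,
      ".stable", "2.04/debian/i386-01") sets up the example link, and
      repointing it later is the same call with a different target.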
 
    
  Self-Maintenance
    With the kind of flexible, dynamic model that supports
      and even encourages multiple versions of the same package, 
      a self-cleansing method must be implemented so that the user
      is not burdened with manually tracking down obsolete packages.
      Unix natively supports a number of ways to do this, including:
      
- Checking the last access time of files in the 
          package.
 - Scanning the system for symbolic links to files
          in the package.
 - Checking the dynamic dependencies of system 
          executables to see if they rely on any shared libraries 
          in the package.
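    The first two checks need nothing beyond the standard library; the
      third depends on platform tools such as ldd and is left out here.
      In this hypothetical Python sketch, a package is flagged as obsolete
      only when none of its files has been accessed within an arbitrary
      90-day window and no symlink under the given directories points into
      it. Access times are not updated at all on filesystems mounted with
      noatime, which is one reason to combine the checks.

```python
import os
import time


def last_access(pkg_dir: str) -> float:
    """Latest st_atime of any file in the package, or 0.0 if it holds no files."""
    newest = 0.0
    for dirpath, _dirs, files in os.walk(pkg_dir):
        for fname in files:
            st = os.stat(os.path.join(dirpath, fname), follow_symlinks=False)
            newest = max(newest, st.st_atime)
    return newest


def links_into(pkg_dir: str, link_dirs: list[str]) -> list[str]:
    """Symlinks found under link_dirs whose resolved target lies inside pkg_dir."""
    root = os.path.realpath(pkg_dir)
    hits = []
    for top in link_dirs:
        # os.walk does not descend into symlinked directories by default,
        # but they still appear in the dirs list, so check both lists
        for dirpath, dirs, files in os.walk(top):
            for entry in dirs + files:
                path = os.path.join(dirpath, entry)
                if os.path.islink(path):
                    target = os.path.realpath(path)
                    if os.path.commonpath([root, target]) == root:
                        hits.append(path)
    return sorted(hits)


def looks_obsolete(pkg_dir: str, link_dirs: list[str], max_idle_days: float = 90.0) -> bool:
    """Flag a package that is both idle and unreferenced by any symlink."""
    idle_seconds = time.time() - last_access(pkg_dir)
    return idle_seconds > max_idle_days * 86400 and not links_into(pkg_dir, link_dirs)
```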
 
 
    Any or all of these methods could be implemented, depending
      on how thorough the user wishes to be. Also, the level of
      interaction is arbitrary: for a novice user the entire process
      could be wrapped up in a cron job with no prompting at all,
      while an expert might want constant prompting for confirmation
      of what is being deleted.
  Conclusion
    
    Not ready for that yet ;)
  
Authors: