See also: IRC log
poll shows a few in favor of world-visible records; none in favor of confidential records
RESOLUTION: world-visible records
TimBL: Creative Commons Attribution Non-Commercial :)
Presenter: Matt Mackall, Selenic
<DanC> The Mercurial SCM
TimBL: slides don't seem to work with safari. remote/slidey issue?
<timbl> http://www.w3.org/Talks/Tools/Slidy/slidy.js Line 146: Can't handle non-ascii in string: "flèches"
<RalphS_> The Mercurial SCM (title slide)
<RalphS_> Why a New SCM? (slide 2)
Matt: "your tea is still warm"
came from someone at Sun, noticing merges were faster
... given two lines of history; stable branch and unstable
branch
... bug fixes often committed to 'stable' but you want them
also in 'unstable'
... each time you pull patches across to 'unstable' you want to
remember what happened the last time you merged
<Zakim> ericP, you wanted to ask for a use case for a repeated merge
EricP: like cvs -j ?
Matt: right. cvs and svn do not
remember the previous merge point
... so you have to revisit all the same conflicts again
<RalphS_> Why a decentralized system? (slide 3)
<karl> offline work++
Matt: centralized systems are
burdensome for people who work without write access as they
have larger merges to deal with, thus slow
... consider folks like freebsd developers; a few people have
write access to the repository, a large number of people have
to go through those few for commits
... so benefits of version control are not available to
many
... better to have ability to do local commits
... permit off-line commits
... sync whenever you feel you need to do so
<RalphS_> The Mercurial basics (slide 4)
Matt: every hg repository is a
working directory ala cvs plus a private store
... explicit checkout not required
... store has complete private copy of entire project
history
... no need to communicate with original server to do work on
the project
... the copy you have is equivalent to what you got from the
original server; it has full revision information
<RalphS_> Making a commit (slide 5)
<DanC> (the first time I used hg on a plane, it felt really revolutionary. http://dig.csail.mit.edu/breadcrumbs/node/96 )
<Zakim> timbl, you wanted to ask Is the store append-only? Optimised for getting latest version?
Tim: does the store keep the
history as append-only?
... for what is it optimized? latest version or fetching full
history?
Matt: append-only. small delta
tacked on to end of file for revision n+1
... uses 'revlog' format that I invented
... roughly like format of MPEG file
... small deltas and periodic full images
... you can seek directly in and read a small chunk
<DanC> revlog design note
Matt: the amount of data you have
to read is equivalent to the uncompressed size of the
content
... so both append-only and very fast
... for files with very long history the index is separate and
also append-only
... for files with histories < 128k, the index is combined in
the same file and read with one i/o
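A toy illustration of the storage idea described above: small deltas appended to the end, with periodic full images so a read never has to replay a long chain. This is a sketch in Python, not Mercurial's actual on-disk format. The `ToyRevlog` class, its prefix/suffix delta encoding, and the rule for when to store a full image (accumulated delta bytes exceed the size of the new text) are all assumptions made for illustration.

```python
class ToyRevlog:
    """Append-only store: each entry is a full image or a delta
    against the previous revision (hypothetical, for illustration)."""

    def __init__(self):
        self.entries = []      # only ever appended to
        self.chain_bytes = 0   # delta bytes stored since the last full image

    @staticmethod
    def _make_delta(old, new):
        # Trim the common prefix and suffix; keep only the changed middle.
        start = 0
        limit = min(len(old), len(new))
        while start < limit and old[start] == new[start]:
            start += 1
        end_old, end_new = len(old), len(new)
        while (end_old > start and end_new > start
               and old[end_old - 1] == new[end_new - 1]):
            end_old -= 1
            end_new -= 1
        return (start, end_old, new[start:end_new])

    @staticmethod
    def _apply_delta(old, delta):
        start, end, replacement = delta
        return old[:start] + replacement + old[end:]

    def add(self, text):
        """Append revision n+1 and return its revision number."""
        if self.entries:
            previous = self.read(len(self.entries) - 1)
            delta = self._make_delta(previous, text)
            cost = len(delta[2])
            if self.chain_bytes + cost <= len(text):
                self.entries.append(("delta", delta))
                self.chain_bytes += cost
                return len(self.entries) - 1
        # First revision, or the delta chain has grown too long:
        # store a full image and start a new chain.
        self.entries.append(("full", text))
        self.chain_bytes = 0
        return len(self.entries) - 1

    def read(self, rev):
        """Walk back to the nearest full image, then apply deltas forward."""
        base = rev
        while self.entries[base][0] != "full":
            base -= 1
        text = self.entries[base][1]
        for i in range(base + 1, rev + 1):
            text = self._apply_delta(text, self.entries[i][1])
        return text


log = ToyRevlog()
texts = ["hello world", "hello brave world",
         "hello brave new world", "completely different"]
revs = [log.add(t) for t in texts]
kinds = [kind for kind, _ in log.entries]
```

Because a full image is stored whenever the chain would outgrow the text, the bytes touched by `read` stay proportional to the size of the content rather than to the length of the history.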
<Zakim> ht, you wanted to check about the nature of revs. . .
Henry: in cvs, every file has its
own revision history
... if you've chosen to branch you name the files in that
branch
... whereas in svn the world has a revision history and every
change is considered a change to the world
<DanC> (this is agendum 4, btw)
Henry: have I framed this correctly? which is hg?
Matt: hg makes atomic
project-wide snapshots
... revlog for each file plus a "manifest"
... manifest is a list of all files in a project for a given
changeset
... recursive pointers into manifest
... so hg is in the latter camp; a project-wide sense of the
state of each file
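A rough sketch of how a per-file store plus a manifest can give project-wide snapshots. It assumes each identifier is a hash over the content and its parents' identifiers; the function names (`node_id`, `commit`), the text layout of the manifest, and the omission of per-file parent links are illustrative choices, not a description of Mercurial's real data structures.

```python
import hashlib

NULL = "0" * 40  # stand-in id for "no parent"


def node_id(text, p1=NULL, p2=NULL):
    """Hash the content together with its parents' ids, so an id
    commits to everything that came before it."""
    a, b = sorted([p1, p2])
    return hashlib.sha1((a + b + text).encode("utf-8")).hexdigest()


def commit(files, parent=NULL, message=""):
    """Build one project-wide snapshot.

    files maps file name -> content. Returns (changeset id, manifest),
    where the manifest maps every file name to that file's id.
    """
    # Per-file ids (file-level parent links are left out for brevity).
    manifest = {name: node_id(content)
                for name, content in sorted(files.items())}
    manifest_text = "\n".join(f"{name} {node}"
                              for name, node in manifest.items())
    manifest_node = node_id(manifest_text)
    # The changeset points at the manifest, which points at the files.
    changeset_node = node_id(manifest_node + "\n" + message, parent)
    return changeset_node, manifest


first_id, first_manifest = commit({"a.txt": "one", "b.txt": "two"},
                                  message="initial")
second_id, second_manifest = commit({"a.txt": "one!", "b.txt": "two"},
                                    parent=first_id, message="edit a")
```

Changing a single file changes the manifest and therefore the changeset id, while the entry for the untouched file stays the same.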
<RalphS_> Revisions, Changesets, Heads, and Tip (slide 6)
<timbl> I wonder whether projects can be nested.... concerned about 'project-wide' scalability... I guess that is agendum #4
<ericP> recursive hash, like rsync?
<RalphS> [I suppose, then, that the act of starting a new project is "collecting tips" :) ]
<RalphS_> Cloning, Merging, and Pulling (slide 7)
[note, slide 7 is animated]
<tlr> so both Alice and Bob need to do a merge, even though Bob had done one already?
<tlr> +
Matt: hg also has 'pushing',
opposite of 'pulling'
... pushing requires more privs
Thomas: am I understanding that both Alice and Bob do merges of e,f,g that lead to the same h?
Matt: no, Alice pulls Bob's
revision h
... so if she hasn't worked in the meantime she is guaranteed
to get Bob's h
... it's not uncommon to regularly merge back and forth
... in general it's very little work to do one of these
merges
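The exchange above can be modelled with a small sketch. The slide itself is not reproduced in these minutes, so the layout is an assumption: e is the common ancestor, f is Alice's commit, g is Bob's commit, and h is Bob's merge of f and g. A repository is reduced to a dictionary from changeset name to parent names, and `heads` and `pull` are hypothetical helpers rather than Mercurial's API.

```python
def heads(repo):
    """Changesets that no other changeset names as a parent."""
    parents = {p for ps in repo.values() for p in ps}
    return {node for node in repo if node not in parents}


def pull(local, remote):
    """Copy every changeset the local repo lacks; nothing is rewritten."""
    for node, ps in remote.items():
        local.setdefault(node, ps)


alice = {"e": (), "f": ("e",)}   # Alice committed f on top of e
bob = {"e": (), "g": ("e",)}     # Bob committed g on top of e

pull(bob, alice)                 # Bob now has two heads, f and g
bob_heads_before_merge = heads(bob)
bob["h"] = ("f", "g")            # Bob's merge changeset

pull(alice, bob)                 # Alice has done no work in the meantime
alice_heads = heads(alice)
```

Alice ends up with h as her only head without performing a merge of her own, which is the point Matt makes in reply to Thomas.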
<Zakim> timbl, you wanted to ask whether pulling needs a hg server running on the pulled site, or just read access (http, file:, ftp:, etc.)
Tim: push requires write access. does pull work by reading arbitrary files/ftp/http or do I need an hg server?
Matt: see next slide ...
<RalphS_> Multiple ways to share repositories (slide 8)
<DanC> (an example of the CGI interface: http://homer.w3.org/~connolly/projects/ )
Matt: can put a bunch of revlogs
in an http directory but this is slower
... in practice, most people use the hgserve approach
... "bundles" are highly compressed version of the
project
... bundles use the hgserve wire format
... tarballs can be pulled from hgserve
... so browsable source tree has a 'pull' button
... can pull raw versions of patch log via Web interface
... any network filesys will work
<RalphS_> Using cheap branches to manage changes (slide 9)
Matt: the idea is that branches are very light-weight; create and destroy them at will
<RalphS_> New features can be added with extensions (slide 10)
Matt: open source projects have
found mq to be useful in managing streams of updates as
patches
... forest extension manages 'subprojects'; whereas cvs has a
tree of repositories, the notion of subproject is more
difficult in hg
<RalphS_> For more information (slide 11)
<mpm> http://hgbook.red-bean.com/
<RalphS_> current manual (replaces URI in slides)
<Zakim> timbl, you wanted to ask whether you can also break out a subproject?
Tim: for a large project that grows and grows, will forest let me split off a subproject?
Matt: yes, if I'm understanding
you
... a project is a boundary where you sensibly want to do
atomic commits
... given two components in which you want to make simultaneous
changes in an atomic commit, these should be in the same
'project'
Tim: organization of projects
might not be so tied to atomic commits
... issue of clashes does not arise so often in some styles of
work
... so management issue is to permit a single checkout to get
everything
... nice thing about svn is that a subdirectory can be declared
to be a soft link to a project somewhere else
... svn subproject can use its own access protocol
Matt: that's effectively what
'forest' does
... a forest is a set of projects and their relevant
changesets
... I'm not as familiar with forest, as it was written by
someone else but that's what people are using it for
<Zakim> DanC, you wanted to ask about splitting out, e.g. a utility tool from a big project
DanC: suppose I started writing a module as part of a project but then decide it should be split out. Is there a straightforward way to do this?
Matt: no, no easy way to delete
old changesets or remove files from a tree
... hg histories are more immutable than in other systems
... the obvious way to do this would be to clone a project and
delete all the unwanted stuff
... in future we may allow trimming history; e.g. 'commits
before rev X are no longer relevant', or 'commits outside this
tree are no longer relevant'
EricP: is it easy to write code that hacks the revlogs, like sed on cvs histories?
Matt: yep, people have been successful in hacking pretty low-level things
DanC: there's an API too, don't assume you can just hack files
<ted> not to worry eric, i bet there's a mode or will be shortly
Matt: several other projects have
cloned the revlog approach now
... e.g. monotone, bzr
... svn may also be looking at adopting revlog
Tim: describe some typical
topologies
... large numbers of developers, with subgroups who sync with
each other but not the 'main' repository
... others hacking with only email access
... what sort of workflows are established?
... I worry about patches that are shared only between a subset
of developers who heard about them
... so accidentally you have a patch that lots of people have
but never gets into the 'main' repository
... hash across the project but no single place to go for the
'tip'
Matt: I think the optimal model
is the linux model
... a central person, Linus, who is the only one with push
access to a central repository
... most efficient if everyone only does pull
... with push you wind up with two heads
... second 'pusher' gets a message instructing them to do
pull-merge-push
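A minimal sketch of the pull-merge-push cycle mentioned above, assuming the central repository refuses any push that would leave it with more heads than before. The `push` helper and the dictionary model of a repository (changeset name to parent names) are hypothetical and only illustrate the workflow.

```python
def heads(repo):
    """Changesets that no other changeset names as a parent."""
    parents = {p for ps in repo.values() for p in ps}
    return {node for node in repo if node not in parents}


def push(local, central):
    """Send local changesets to central unless that would add a head."""
    combined = {**central, **local}
    if len(heads(combined)) > len(heads(central)):
        raise RuntimeError("push would create a new head: "
                           "pull, merge, then push again")
    central.update(combined)


central = {"a": ()}
first = {"a": (), "b": ("a",)}    # first pusher's work
second = {"a": (), "c": ("a",)}   # second pusher's work, also based on a

push(first, central)              # accepted: still a single head

try:
    push(second, central)         # would leave heads b and c
    refused = False
except RuntimeError:
    refused = True

second.update(central)            # pull
second["m"] = ("b", "c")          # merge
push(second, central)             # push again: single head m
```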
<tlr> yikes, that means it won't work well with datespace
Matt: if a single person does
pulls from 'lieutenants' then a push to the central repository
it's more efficient
... scales well; can assign lieutenants for subprojects and
they can do pull-merge for their subprojects
Tim: ironic that the technology is Web-like and decentralized but works best with a centralized social process
Matt: in the end you want to wind
up with one version
... in fact there are many long-lived independent branches,
even in the linux world
Eric: in the W3C case we have
'dev' that is largely synchronous and a 'cvs.w3.org' space that
is asynchronous
... we care that the code in dev.w3.org is synchronized
... but the Web space cvs.w3.org largely has pages that do not
depend on each other
... in the svn model that requires a single revision number for
100k's of documents ...
... how would you arrange this in hg?
Matt: we currently do not
implement 'partial repositories' where you'd ignore a subset of
the repository
... there are existing projects with order 100k's of docs
... but you'd probably not want to use hg to do wikipedia
<DanC> (this is what I meant by "Fractal project organization" again, fyi)
Matt: I've had a back-burner idea
of doing something rcs-like, managing a single file but keeping
O(n)
... basically breaking out revlog but with a single file for
management
... for a 100k document case you'd want to break this into
smaller projects
EricP: perhaps year-month
<ted> 1.75M resources in www.w3.org webspace at present
Matt: you don't want to divide by time, you want to divide across the tree
DanC: the W3C naming scheme lets you use year-month as a default naming scheme, files can still change
<Zakim> timbl2, you wanted to ask whether one could turn off the project-wideness
Tim: I can imagine someone
wanting to establish a new project at a given point in the
www.w3.org tree
... 'this is a new software system' or 'this is a
tutorial'
... making a self-consistent subsystem
... but such a subsystem is unusual
... e.g. these meeting minutes do not need to be synchronized
with anything else
... initial description of hg sounds similar to cvs
... how big a change would it be to remove the project-wide
idea?
Matt: pretty large change. easier
to work with something like forest
... or start with project with many subprojects and mark
subprojects as 'ignore for commit'
... this part of the problem space is very different from cvs
and a little tricky for people to adapt to if they've
structured their projects for the way cvs works
... these hurdles can be overcome but you do have to think
about things a little differently
DanC: thinking more rcs-like?
Matt: yes, if everything is
really independent then you probably want something like
rcs
... wikipedia only thinks about changing single files
Eric: on a human level we
frequently manage file dependencies
... if you could say '{...} are independent by default, {...}
are mutually dependent, ...' that might be useful
Matt: that's blue-sky from where
we are today
... if we wanted to support wikipedia with what we currently
have we understand the changes but we haven't worked on it
<ht> I really don't see how the structuring into projects makes any sense for our datespace
<Zakim> timbl1, you wanted to ask about comparisons with Darcs
Matt: I understand that Darcs is
(a) orders of magnitude slower and (b) magical
... has nice properties for cherry-picking in a way people seem
to like but that I believe are problematic
<tlr> http://darcs.net/DarcsWiki
Matt: cherry-picking is ...
... given 2 branches where you want to bring in single changes
without regard to surrounding history, Darcs lets you do
this
... what I understand from bzr folk is that the way Darcs
handles patches internally it's possible to reorder patches
such that older versions of the project cannot be
reconstructed
... if what I've heard is correct and we're understanding this
correctly, this is a serious failing in a version control
system
... so (c) Darcs may have some serious theoretical and
practical issues
Tim: I'm surprised to hear that Darcs may have this issue; I thought it was append-only
DanC: no, what was revolutionary about Darcs was that patches commute
Ted: when groups do [lots of
branching], it has tended to become a big headache
... but the lieutenant model who are responsible for merging
upward ...
... would be nice to get all revision history along with [the
lieutenant's merge]
... capture all the history in the main repository
DanC: yeah, but consider 'why hg'
initially
... using cvs as the 'main truth' causes you to lose
interesting properties of the whole system
Matt: complicated merges in cvs
require lots of wizardry
... cvs admin-type person has to understand nitty gritty
details of cvs
... much less the case with hg; once you know the basics of
push, pull, merge there's not a lot more
... you avoid big cvs merge problem
... people tend to do their own merges without a lot of
help
<ted> branching and merging in cvs with our wgs has often been a headache when introduced, the central repo and sticking with a main branch has simplified life for most
<ted> merging from numerous branches seems like it could get very tedious
<ted> what might work with our environment is if a group wishes to go off with a decentralized repo with a handful of 'lieutenants' to handle collecting and committing to main [cvs] tree that could work
<ted> it would be nice to get the revision history as part of that though
<gerald> (+1 to ted)
<ted> my biggest concern in changing our base revision system is migrating 300+ users to another platform, the user support for these users has cost us considerably
DanC: one of the motivations for
this Project Review ...
... main www.w3.org cvs repository (cvs.w3.org) seems not a
good match for hg
... a long time ago we split out dev.w3.org so world could read
the history
... socially dev.w3.org works like a bunch of independent
projects
... in this style of work, hosting 200 hg projects on a server,
does this use more disk i/o, etc [than cvs]?
Matt: hg is designed to minimize i/o
<gerald> (I'm more worried about user support and migration costs than CPU/IO which is relatively cheap)
Matt: if you're using a fast cgi
approach, like @@ framework
... you can plug it into any of the Python cgi interfaces;
zope sorts of things
<ted> we would have to hack quite a few tools for hg if it results in mirroring
Matt: if you're doing something
that generates a lot of load you should use fast cgi
... so indices are kept in memory
... there's an import of freebsd repository in to hg and they
haven't yet gotten to the point where they need fast cgi
... if you get to the point where a single system is too slow
you simply clone it and do round-robin load balancing
Tim: and somehow sharing patches?
Matt: you'd probably keep a
single push point
... commits have to be single-threaded
... hg backend uses lock-less pull
... multiple readers, one writer
... so no lock contention
... e.g. dev{1,2,3}.w3.org could all pull from a backend
server, push.w3.org
... pushes would all go to push.w3.org
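The topology described above (several read-only clones behind round-robin load balancing, one push point) can be sketched as a small router. The host names are the examples given in the discussion; the `RepoRouter` class is hypothetical and says nothing about how hg or a real load balancer is configured.

```python
import itertools


class RepoRouter:
    """Send pushes to the single writer; rotate reads across clones."""

    def __init__(self, push_host, mirrors):
        self.push_host = push_host
        self._mirrors = itertools.cycle(mirrors)

    def host_for(self, operation):
        if operation == "push":
            return self.push_host      # commits stay single-threaded
        return next(self._mirrors)     # pulls are spread round-robin


router = RepoRouter("push.w3.org",
                    ["dev1.w3.org", "dev2.w3.org", "dev3.w3.org"])
pull_hosts = [router.host_for("pull") for _ in range(4)]
push_host = router.host_for("push")
```

Each clone would itself pull from the push point, so readers never contend with the one writer.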
DanC: are there any statistics [for dev.w3.org]?
<Zakim> s-mon, you wanted to ask about Matt's perspective on how folks have dealt with migrations.
Simon: there exist tools for importing cvs repositories into hg. how well have these worked in practice?
Matt: small projects get along pretty well.
<s-mon> tools for cvs<>hg conversion - http://www.selenic.com/mercurial/wiki/index.cgi/RepositoryConversion?action=show&redirect=ConvertingRepositories#head-8f6fdc4a130232720c51de0b4417e213898f28ad
Matt: projects that do repository hacking are problematic
<DanC> (my experience is that the migration tools are pretty immature. )
Matt: firefox developers have had
more trouble than others
... freebsd folks seem to get along ok, though they've not yet
committed to using hg
... the biggest trick with cvs is converting histories to
changesets
... requires going through the history and identifying
co-occurring changes as changesets
<Zakim> ht, you wanted to raise agendum 9
Matt: the tools to do this can get confused
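One common heuristic for this step is to group per-file cvs commits that share an author and log message and fall close together in time. The sketch below assumes that heuristic; the `FileCommit` record, the 300-second window, and the rule that a file appears at most once per changeset are illustrative assumptions, not the behaviour of any particular conversion tool.

```python
from dataclasses import dataclass


@dataclass
class FileCommit:
    path: str
    author: str
    message: str
    timestamp: int  # seconds since some epoch


def group_into_changesets(commits, window=300):
    """Group per-file commits into changesets by (author, message),
    as long as consecutive commits are within `window` seconds."""
    changesets = []
    latest = {}  # (author, message) -> most recent changeset for that key
    for c in sorted(commits, key=lambda c: c.timestamp):
        key = (c.author, c.message)
        current = latest.get(key)
        if (current is not None
                and c.timestamp - current[-1].timestamp <= window
                and all(c.path != other.path for other in current)):
            current.append(c)
        else:
            current = [c]
            changesets.append(current)
            latest[key] = current
    return changesets


history = [
    FileCommit("a.c", "alice", "fix bug", 100),
    FileCommit("b.c", "alice", "fix bug", 110),
    FileCommit("a.c", "bob", "tweak", 120),
    FileCommit("a.c", "alice", "fix bug", 5000),
]
changesets = group_into_changesets(history)
paths = [[c.path for c in cs] for cs in changesets]
```

A heuristic like this is easily misled, for example by reused log messages or by a slow commit that spans more than the window, which is one way such tools can get confused.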
Henry: seems to me that there are
3 parts of W3C usage we'd need to consider:
... (1) datespace, for which hg doesn't seem the right model;
single-document-orientation more appropriate
... (2) /TR space, for which we now tend to have a subdirectory
for a document containing a set of files
<sandro> ohhhhh. Have the pubs from a given WG be a Project.
<DanC> sandro, that raises all the fractal questions: what about WDs shared by the Query and XSL WGs.
Henry: having each TR document be
a "project" may be appropriate
... but design goal of making branching and merging fast is
borderline irrelevant for /TR
... though sometimes we do have both editor and webmaster
simultaneously changing doc at the last minute
... (3) an Area; e.g. within XML, SVG, editors tend to have
workspaces
... some editorial teams may find branching and merging
relevant
... might have been helpful in cases where MSM and I worked
together
... (4) dev.w3.org clearly has projects and branching/merging
might well support that community better
<Zakim> plh, you wanted to ask for eclipse extensions
Philippe: I've been very
successful getting people to use eclipse for dev.w3.org
... because they're pretty autonomous and eclipse integrates
ssh support
... is there any support for eclipse within hg?
<DanC> (googling yields http://www.vectrace.com/mercurialeclipse/ )
Matt: yes, there is an eclipse
plugin but I've not used it myself
... there may even be competing eclipse plugins
... but I can't speak from personal experience how useful they
are
... see wiki
... 'OtherTools'
<mpm> http://www.selenic.com/mercurial/wiki/index.cgi/OtherTools
<Zakim> ted, you wanted to follow ? of plh's re clients
Ted: biggest job in getting new
W3C users has been cvs startup
... what experience can you relay on hg learning curve?
Matt: first big project adopter
was xen, a linux hypervisor
... hg started in April 2005
... hypervisor started using hg in June/July 2005
<DanC> "Xen - a free hypervisor for virtualising kernels" among http://www.selenic.com/mercurial/wiki/index.cgi/ProjectsUsingMercurial
Matt: we got 3 or 4 patches from
them then they went quiet because they were happy
... I assume they're still happy
<ted> +1 to ht, our user base's tech skill sets varies widely
<DanC> yes, good point, ht (that our user base is very different from most opensource software dev projects - they would all know about SCM systems before, whereas many of ours have never seen one before )
Matt: Sun sends some questions;
they have long-lived software processes that were closely
adapted to teamware system and were slight mismatch for
hg
... for the most part, Sun's questions are not about usage but
about obscure bugs
... so I think people adapt to the hg model quickly once they
understand push and pull
Ted: hypervisor is a group of
geeks
... W3C [document editors] have very varied skill sets
... editor-type people in the mix
... how well do these people adapt?
Matt: not much experience
there
... comments are that hg is easier to understand than cvs
... so people adopting version control for the first time have
little difficulty
... I've tried to make hg similar to cvs where that was
sensible
... but I've been annoyed by the usability of cvs so I've fixed
some of that
... hopefully hg is easier to use than cvs
Thomas: in W3C datespace we have
a lot of concurrent editing of independent files in a tree that
is only Team-editable
... and an occasional subdirectory in which we give Member
write access
... these are confined and @@ projects
... it's often painful to manage access rights in these
subdirectories, give access to changelogs, etc
... could we adapt hg or something like it into a part of W3C
webspace?
... use this as a way to grant Member write access?
DanC: I've run that experiment
with the GRDDL test repository
... it sort-of worked
... it's straightforward to export cvs history into hg a little
bit at a time
<ht> Test suites are another interesting example -- project concept really doesn't make a lot of sense
DanC: but hg is more expressive
than cvs so importing hg history back into cvs loses data
... there are things in the hg history that have no analog in
cvs
Thomas: I'd accept loss of history data as a way to experiment with hg
<gerald> +1 to Thomas
Tim: save the revlogs in cvs?
Matt: it's theoretically possible to build a cvs gateway to hg; something that looks like a cvs server but is backed by hg
<tlr> I'm looking at CVS purely as a way to bridge an hg repository into web space
DanC: such a gateway would have to round off various corners
Matt: yes, the gateway would be responsible for that
Henry: would work fine from cvs point-of-view, only problem would be an hg user also making changes
Matt: a cvs client could still checkout and would get an approximation of the log
<ericP> i need to go. thank you very much, Matt and DanC for putting this together
Matt: have an acl extension for
managing push permissions for parts of a tree
... even without direct commit access it's still useful to be
able to make local commits and use this to publish patches,
produce changesets, etc.
... send one of these changesets to someone with push
permission
DanC: Thanks, Matt!
[adjourn]
<Zakim> ted, you wanted to ask about access control in distributed environment where people can pull from others
Ted: is there a way to specify
'upstream' what can be pulled from the next generation?
... e.g. when someone without write access still makes changes
available to others for pull
Matt: no, nothing like this. something like gpg'd files but that would be inefficient
<timbl> RSS?
<ted> so one could circumvent inadvertently patent policy and member confidentiality
Ted: we have policies about
Member confidentiality
... sometimes people don't realize what category a given
document falls into
... so inadvertently share a private document
Matt: no provisions in hg for any sort of rights management