comparison ja/concepts.tex @ 781:93d19b27859c

started concepts.tex
author Yoshiki Yazawa <yaz@honeyplanet.jp>
date Tue, 12 May 2009 17:49:11 +0900
parents b2d447356c42
children 43556decb81f
comparison
equal deleted inserted replaced
780:2a93b1d35990 781:93d19b27859c
1 \chapter{Behind the scenes} 1 %\chapter{Behind the scenes}
2 \chapter{$BIqBfN"(B}
2 \label{chap:concepts} 3 \label{chap:concepts}
3 4
4 Unlike many revision control systems, the concepts upon which 5 %Unlike many revision control systems, the concepts upon which
5 Mercurial is built are simple enough that it's easy to understand how 6 %Mercurial is built are simple enough that it's easy to understand how
6 the software really works. Knowing this certainly isn't necessary, 7 %the software really works. Knowing this certainly isn't necessary,
7 but I find it useful to have a ``mental model'' of what's going on. 8 %but I find it useful to have a ``mental model'' of what's going on.
8 9
9 This understanding gives me confidence that Mercurial has been 10 $BB?$/$N%j%S%8%g%s%3%s%H%m!<%k%7%9%F%`$H0c$C$F!$(B Mercurial$B$NF0:n$N4pK\$H$J$C(B
10 carefully designed to be both \emph{safe} and \emph{efficient}. And 11 $B$F$$$k%3%s%;%W%H$rM}2r$9$k$3$H$OMF0W$$!%(B
11 just as importantly, if it's easy for me to retain a good idea of what 12 $B$3$l$rM}2r$9$k$3$H$OI,$:$7$bI,MW$G$O$J$$$,!$I.<T$O!$2?$,5/$-$F$$$k$N$+$K(B
12 the software is doing when I perform a revision control task, I'm less 13 $B$D$$$F%b%G%k$r0U<1$7$F$$$k$3$H$OM-MQ$G$"$k$H9M$($F$$$k!%(B
13 likely to be surprised by its behaviour. 14
14 15 %This understanding gives me confidence that Mercurial has been
15 In this chapter, we'll initially cover the core concepts behind 16 %carefully designed to be both \emph{safe} and \emph{efficient}. And
16 Mercurial's design, then continue to discuss some of the interesting 17 %just as importantly, if it's easy for me to retain a good idea of what
17 details of its implementation. 18 %the software is doing when I perform a revision control task, I'm less
18 19 %likely to be surprised by its behaviour.
19 \section{Mercurial's historical record} 20
20 21 $B$3$N$3$H$rM}2r$7$?8e$G!$I.<T$O(BMercurial$B$,(B\emph{$B0BA4(B}$B$H(B\emph{$B8zN((B}$B$r<B8=$9(B
21 \subsection{Tracking the history of a single file} 22 $B$k$h$&$KCm0U?<$/@_7W$5$l$F$$$k$H3N?.$9$k$K;j$C$?!%(B
22 23 $B$^$?!$=EMW$JE@$H$7$F!$%j%S%8%g%s%3%s%H%m!<%k$NA`:n$r9T$&:]$K%=%U%H%&%'%"(B
23 When Mercurial tracks modifications to a file, it stores the history 24 $B$,2?$r$9$k$N$+$r5-21$KN1$a$F$*$/$3$H$K$h$C$F!$IT0U$N5sF0$G6C$/$3$H$,>/$J(B
24 of that file in a metadata object called a \emph{filelog}. Each entry 25 $B$/$J$C$?!%(B
25 in the filelog contains enough information to reconstruct one revision 26
26 of the file that is being tracked. Filelogs are stored as files in 27 %In this chapter, we'll initially cover the core concepts behind
27 the \sdirname{.hg/store/data} directory. A filelog contains two kinds 28 %Mercurial's design, then continue to discuss some of the interesting
28 of information: revision data, and an index to help Mercurial to find 29 %details of its implementation.
29 a revision efficiently. 30
30 31 $B$3$N>O$G$O$^$:(BMercurial$B$N@_7W$N%3%"%3%s%;%W%H$r%+%P!<$9$k!%$=$7$F<BAu>e$N(B
31 A file that is large, or has a lot of history, has its filelog stored 32 $B$$$/$D$+$N6=L#?<$$E@$N>\:Y$K$D$$$F5DO@$9$k!%(B
32 in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}'' 33
33 suffix) files. For small files without much history, the revision 34 %\section{Mercurial's historical record}
34 data and index are combined in a single ``\texttt{.i}'' file. The 35 \section{Mercurial$B$NMzNr5-O?(B}
35 correspondence between a file in the working directory and the filelog 36
36 that tracks its history in the repository is illustrated in 37 %\subsection{Tracking the history of a single file}
37 figure~\ref{fig:concepts:filelog}. 38 \subsection{$B%U%!%$%kMzNr$NDI@W(B}
39
40 %When Mercurial tracks modifications to a file, it stores the history
41 %of that file in a metadata object called a \emph{filelog}. Each entry
42 %in the filelog contains enough information to reconstruct one revision
43 %of the file that is being tracked. Filelogs are stored as files in
44 %the \sdirname{.hg/store/data} directory. A filelog contains two kinds
45 %of information: revision data, and an index to help Mercurial to find
46 %a revision efficiently.
47
48 Mercurial$B$O%U%!%$%k$X$NJQ99$rDI@W$9$k;~!$%U%!%$%k$NMzNr$r(B\emph{filelog}$B$H(B
49 $B8F$P$l$k%a%?%G!<%?%*%V%8%'%/%H$K3JG<$9$k!%%U%!%$%k%m%0Fb$N3F!9$N%(%s%H%j(B
50 $B$O!$DI@WBP>]$N%U%!%$%k$N%j%S%8%g%s$r:F7z$9$k$N$K==J,$J>pJs$r;}$D!%%U%!%$(B
51 $B%k%m%0$O(B\sdirname{.hg/store/data}$B%G%#%l%/%H%j$K%U%!%$%k$H$7$FJ]B8$5$l$k!%(B
52 $B%U%!%$%k%m%0$O%j%S%8%g%s%G!<%?$H(BMercurial$B$,%j%S%8%g%s$r8zN(E*$K8+$D$1$i$l(B
53 $B$k$h$&$K$9$k$?$a$N%$%s%G%C%/%9$N(B2$B<oN`$N>pJs$r;}$D!%(B
54
55 %A file that is large, or has a lot of history, has its filelog stored
56 %in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}''
57 %suffix) files. For small files without much history, the revision
58 %data and index are combined in a single ``\texttt{.i}'' file. The
59 %correspondence between a file in the working directory and the filelog
60 %that tracks its history in the repository is illustrated in
61 %figure~\ref{fig:concepts:filelog}.
62
63 $B%5%$%:$NBg$-$J%U%!%$%k$d!$KDBg$JMzNr$r;}$D%U%!%$%k$O!$%G!<%?$,(B
64 (``\texttt{.d}'' suffix) $B$*$h$S%$%s%G%C%/%9(B (``\texttt{.i}'' suffix)$B$N%U%!(B
65 $B%$%k$KJ,3d$5$l$?(Bfilelog$B$r;}$D!%%5%$%:$,>.$5$/!$MzNr$NBg$-$/$J$$%U%!%$%k$O(B
66 $B%j%S%8%g%s%G!<%?$H%$%s%G%C%/%9$,(B1$B$D$N(B``\texttt{.i}''$B%U%!%$%k$K7k9g$5$l$F(B
67 $B$$$k!%%o!<%-%s%0%G%#%l%/%H%jFb$N%U%!%$%k$H%j%]%8%H%jFb$NMzNr$rDI@W$9$k(B
68 filelog$B$H$NBP1~$r?^(B~\ref{fig:concepts:filelog}$B$K<($9!%(B
38 69
39 \begin{figure}[ht] 70 \begin{figure}[ht]
40 \centering 71 \centering
41 % \grafix{filelog} 72 % \grafix{filelog}
42 \includegraphics{filelog} 73 \includegraphics{filelog}
43 \caption{Relationships between files in working directory and 74 % \caption{Relationships between files in working directory and
44 filelogs in repository} 75 % filelogs in repository}
76 \caption{$B%o!<%-%s%0%G%#%l%/%H%jFb$N%U%!%$%k$H%j%]%8%H%j$N%U%!%$%k%m%0(B
77 $B$N4X78(B}
45 \label{fig:concepts:filelog} 78 \label{fig:concepts:filelog}
46 \end{figure} 79 \end{figure}
47 80
48 \subsection{Managing tracked files} 81 %\subsection{Managing tracked files}
49 82 \subsection{$BDI@W$5$l$F$$$k%U%!%$%k$N4IM}(B}
50 Mercurial uses a structure called a \emph{manifest} to collect 83
51 together information about the files that it tracks. Each entry in 84 %Mercurial uses a structure called a \emph{manifest} to collect
52 the manifest contains information about the files present in a single 85 %together information about the files that it tracks. Each entry in
53 changeset. An entry records which files are present in the changeset, 86 %the manifest contains information about the files present in a single
54 the revision of each file, and a few other pieces of file metadata. 87 %changeset. An entry records which files are present in the changeset,
55 88 %the revision of each file, and a few other pieces of file metadata.
56 \subsection{Recording changeset information} 89
57 90 Mercurial$B$O(B\emph{$B%^%K%U%'%9%H(B}$B$H8F$P$l$k9=B$$rMQ$$$F!$DI@W$9$Y$-%U%!%$%k(B
58 The \emph{changelog} contains information about each changeset. Each 91 $B$N>pJs$r=8$a$F$$$k!%%^%K%U%'%9%HFb$N3F!9$N%(%s%H%j$O!$C10l$N%A%'%s%8%;%C(B
59 revision records who committed a change, the changeset comment, other 92 $B%HFb$KB8:_$9$k%U%!%$%k$N>pJs$r;}$C$F$$$k!%%(%s%H%j$O$I$N%U%!%$%k$,%A%'%s(B
60 pieces of changeset-related information, and the revision of the 93 $B%8%;%C%H$KB8:_$7$F$$$k$+!$$=$l$i$N%j%S%8%g%s$,2?$G$"$k$N$+$H$$$&>pJs$H!$(B
61 manifest to use. 94 $B$$$/$D$+$NB>$N%U%!%$%k%a%?%G!<%?$r5-O?$7$F$$$k!%(B
62 95
63 \subsection{Relationships between revisions} 96 %\subsection{Recording changeset information}
64 97 \subsection{$B%A%'%s%8%;%C%H>pJs$N5-O?(B}
65 Within a changelog, a manifest, or a filelog, each revision stores a 98
66 pointer to its immediate parent (or to its two parents, if it's a 99 %The \emph{changelog} contains information about each changeset. Each
67 merge revision). As I mentioned above, there are also relationships 100 %revision records who committed a change, the changeset comment, other
68 between revisions \emph{across} these structures, and they are 101 %pieces of changeset-related information, and the revision of the
69 hierarchical in nature. 102 %manifest to use.
70 103
71 For every changeset in a repository, there is exactly one revision 104 \emph{changelog}$B$O3F!9$N%A%'%s%8%;%C%H$N>pJs$r;}$D!%3F!9$N%j%S%8%g%s5-O?(B
72 stored in the changelog. Each revision of the changelog contains a 105 $B$O!$C/$,JQ99$r%3%_%C%H$7$?$N$+!$%A%'%s%8%;%C%H$N%3%a%s%H!$JQ99$K4XO"$7$?(B
73 pointer to a single revision of the manifest. A revision of the 106 $BB>$N>pJs!$;HMQ$5$l$k%^%K%U%'%9%H$N%j%S%8%g%s$r;}$D!%(B
74 manifest stores a pointer to a single revision of each filelog tracked 107
75 when that changeset was created. These relationships are illustrated 108 %\subsection{Relationships between revisions}
76 in figure~\ref{fig:concepts:metadata}. 109 \subsection{$B%j%S%8%g%s4V$N4X78(B}
110
111 %Within a changelog, a manifest, or a filelog, each revision stores a
112 %pointer to its immediate parent (or to its two parents, if it's a
113 %merge revision). As I mentioned above, there are also relationships
114 %between revisions \emph{across} these structures, and they are
115 %hierarchical in nature.
116
117 $B%A%'%s%8%m%0!$%^%K%U%'%9%H!$%U%!%$%k%m%0Fb$G!$3F!9$N%j%S%8%g%s$OD>@\$N?F(B
118 $B!J$"$k$$$O%^!<%8$N>l9g$O$=$NN>?F!K$X$N%]%$%s%?$r;}$D!%(B
119 $B$9$G$K=R$Y$?$h$&$K!$$3$N9=B$$K8=$l$k%j%S%8%g%s$N4V$K$O4X78$,$"$j!$K\<A(B
120 $BE*$K3,AXE*$G$"$k!%(B
121
122 %For every changeset in a repository, there is exactly one revision
123 %stored in the changelog. Each revision of the changelog contains a
124 %pointer to a single revision of the manifest. A revision of the
125 %manifest stores a pointer to a single revision of each filelog tracked
126 %when that changeset was created. These relationships are illustrated
127 %in figure~\ref{fig:concepts:metadata}.
128
129 $B%j%]%8%H%j$K$"$k$9$Y$F$N%A%'%s%8%;%C%H$O!$%A%'%s%8%m%0Fb$G@53N$K(B1$B$D$N%j%S(B
130 $B%8%g%s$r;}$D!%%A%'%s%8%m%0$N3F!9$N%j%S%8%g%s$O!$%^%K%U%'%9%H$NC10l$N%P!<(B
131 $B%8%g%s$X$N%]%$%s%?$r;}$D!%%^%K%U%'%9%H$N3F!9$N%j%S%8%g%s$O!$%A%'%s%8%;%C(B
132 $B%H$,@8@.$5$l$?;~$N%U%!%$%k%m%0$N3F!9$NC10l%j%S%8%g%s$X$N%]%$%s%?$r;}$D!%(B
133 $B$3$N4X78$r?^(B~\ref{fig:concepts:metadata}$B$K<($9!%(B
77 134
78 \begin{figure}[ht] 135 \begin{figure}[ht]
79 \centering 136 \centering
80 \includegraphics{metadata} 137 \includegraphics{metadata}
81 \caption{Metadata relationships} 138 % \caption{Metadata relationships}
139 \caption{$B%a%?%G!<%?$N4X78(B}
82 \label{fig:concepts:metadata} 140 \label{fig:concepts:metadata}
83 \end{figure} 141 \end{figure}
84 142
85 As the illustration shows, there is \emph{not} a ``one to one'' 143 %As the illustration shows, there is \emph{not} a ``one to one''
86 relationship between revisions in the changelog, manifest, or filelog. 144 %relationship between revisions in the changelog, manifest, or filelog.
87 If the manifest hasn't changed between two changesets, the changelog 145 %If the manifest hasn't changed between two changesets, the changelog
88 entries for those changesets will point to the same revision of the 146 %entries for those changesets will point to the same revision of the
89 manifest. If a file that Mercurial tracks hasn't changed between two 147 %manifest. If a file that Mercurial tracks hasn't changed between two
90 changesets, the entry for that file in the two revisions of the 148 %changesets, the entry for that file in the two revisions of the
91 manifest will point to the same revision of its filelog. 149 %manifest will point to the same revision of its filelog.
92 150
93 \section{Safe, efficient storage} 151 $B%$%i%9%H$G<($5$l$F$$$k$h$&$K!$%j%S%8%g%s$H%A%'%s%8%m%0!$%^%K%U%'%9%H$^$?(B
94 152 $B$O%U%!%$%k%m%0$N4V$K$O(B1$BBP#1$N4X78$O$J$$!%%^%K%U%'%9%H$,(B2$B$D$N%A%'%s%8%;%C(B
95 The underpinnings of changelogs, manifests, and filelogs are provided 153 $B%H$N4V$GJQ2=$7$J$+$C$?>l9g$O!$%A%'%s%8%m%0Fb$G$3$l$i$N%A%'%s%8%;%C%H$rI=(B
96 by a single structure called the \emph{revlog}. 154 $B$9%(%s%H%j$OF1$8%P!<%8%g%s$N%^%K%U%'%9%H$r<($9!%(BMercurial$B$,DI@W$9$k%U%!%$(B
97 155 $B%k$,!$(B2$B$D$N%A%'%s%8%;%C%H4V$GJQ2=$7$J$+$C$?>l9g$O!$(B 2$B$D$N%^%K%U%'%9%H$N%j(B
98 \subsection{Efficient storage} 156 $B%S%8%g%s$G!$$=$N%U%!%$%k$r<($9%(%s%H%j$O%U%!%$%k%m%0$NF1$8%j%S%8%g%s$r<((B
99 157 $B$9!%(B
100 The revlog provides efficient storage of revisions using a 158
101 \emph{delta} mechanism. Instead of storing a complete copy of a file 159 %\section{Safe, efficient storage}
102 for each revision, it stores the changes needed to transform an older 160 \section{$B0BA4$+$D8zN(E*$J%9%H%l!<%8(B}
103 revision into the new revision. For many kinds of file data, these 161
104 deltas are typically a fraction of a percent of the size of a full 162 %The underpinnings of changelogs, manifests, and filelogs are provided
105 copy of a file. 163 %by a single structure called the \emph{revlog}.
106 164
107 Some obsolete revision control systems can only work with deltas of 165 $B%A%'%s%8%m%0!$%^%K%U%'%9%H!$%U%!%$%k%m%0$NEZBf$K;H$o$l$F$$$k$O6&DL$N(B
108 text files. They must either store binary files as complete snapshots 166 \emph{revlog}$B$H$$$&9=B$BN$G$"$k!%(B
109 or encoded into a text representation, both of which are wasteful 167
110 approaches. Mercurial can efficiently handle deltas of files with 168 %\subsection{Efficient storage}
111 arbitrary binary contents; it doesn't need to treat text as special. 169 \subsection{$B8zN(E*$J%9%H%l!<%8(B}
112 170
113 \subsection{Safe operation} 171 %The revlog provides efficient storage of revisions using a
172 %\emph{delta} mechanism. Instead of storing a complete copy of a file
173 %for each revision, it stores the changes needed to transform an older
174 %revision into the new revision. For many kinds of file data, these
175 %deltas are typically a fraction of a percent of the size of a full
176 %copy of a file.
177
178 revlog$B$O(B\emph{delta}$B5!9=$r;H$C$F%j%S%8%g%s$N8zN(E*$J5-21$rDs6!$9$k!%%U%!(B
179 $B%$%k$N3F!9$N%P!<%8%g%s$N40A4$J%3%T!<$rJ]B8$9$k$N$G$O$J$/!$8E$$%j%S%8%g%s(B
180 $B$r?7$7$$%P!<%8%g%s$XJQ49$9$k$N$KI,MW$JJQ99$rJ]B8$9$k!%B?$/$N%U%!%$%k%G!<(B
181 $B%?$KBP$7$F!$(B delta$B$OE57?E*$K$O%U%!%$%k$N%U%k%3%T!<$N(B1$B%Q!<%;%s%HL$K~$G$"$"(B
182 $B$k!%(B
183
184 %Some obsolete revision control systems can only work with deltas of
185 %text files. They must either store binary files as complete snapshots
186 %or encoded into a text representation, both of which are wasteful
187 %approaches. Mercurial can efficiently handle deltas of files with
188 %arbitrary binary contents; it doesn't need to treat text as special.
189
190 $B8E$$%j%S%8%g%s%3%s%H%m!<%k%7%9%F%`$N$$$/$D$+$O%F%-%9%H%U%!%$%k$N(Bdelta$B$KBP(B
191 $B$7$F$7$+5!G=$7$J$$!%$=$l$i$N%7%9%F%`$G$O%P%$%J%j%U%!%$%k$O40A4$J%9%J%C%W(B
192 $B%7%g%C%H$+!$%F%-%9%HI=8=$K%(%s%3!<%I$5$l$?7A<0$G$"$kI,MW$,$"$k!%$3$l$i$O(B
193 $B6&$KL5BL$NB?$$%"%W%m!<%A$G$"$k!%(B Mercurial$B$OG$0U$N%P%$%J%j%U%!%$%k$K$D$$(B
194 $B$F!$(Bdelta$B$r8zN(E*$K07$&$3$H$,$G$-!$%F%-%9%H$rFCJL07$$$9$kI,MW$,$J$$!%(B
195
196 %\subsection{Safe operation}
197 \subsection{$B0BA4$JF0:n(B}
114 \label{sec:concepts:txn} 198 \label{sec:concepts:txn}
115 199
116 Mercurial only ever \emph{appends} data to the end of a revlog file. 200 %Mercurial only ever \emph{appends} data to the end of a revlog file.
117 It never modifies a section of a file after it has written it. This 201 %It never modifies a section of a file after it has written it. This
118 is both more robust and efficient than schemes that need to modify or 202 %is both more robust and efficient than schemes that need to modify or
119 rewrite data. 203 %rewrite data.
120 204
121 In addition, Mercurial treats every write as part of a 205 Mercurial$B$O(Brevlog$B%U%!%$%k$NKvHx$K%G!<%?$N(B\emph{$BDI2C(B}$B$N$_$r9T$&!%0lEY=q$-(B
122 \emph{transaction} that can span a number of files. A transaction is 206 $B9~$^$l$?ItJ,$O8e$K$J$C$FJQ99$5$l$k$3$H$O$J$$!%$3$l$O%G!<%?$NJQ99$d:F=q$-(B
123 \emph{atomic}: either the entire transaction succeeds and its effects 207 $B9~$_$r9T$&J}K!$h$j$b4h6/$+$D8zN(E*$G$"$k!%(B
124 are all visible to readers in one go, or the whole thing is undone. 208
125 This guarantee of atomicity means that if you're running two copies of 209 %In addition, Mercurial treats every write as part of a
126 Mercurial, where one is reading data and one is writing it, the reader 210 %\emph{transaction} that can span a number of files. A transaction is
127 will never see a partially written result that might confuse it. 211 %\emph{atomic}: either the entire transaction succeeds and its effects
128 212 %are all visible to readers in one go, or the whole thing is undone.
129 The fact that Mercurial only appends to files makes it easier to 213 %This guarantee of atomicity means that if you're running two copies of
130 provide this transactional guarantee. The easier it is to do stuff 214 %Mercurial, where one is reading data and one is writing it, the reader
131 like this, the more confident you should be that it's done correctly. 215 %will never see a partially written result that might confuse it.
132 216
133 \subsection{Fast retrieval} 217 $B2C$($F!$(BMercurial$B$O$"$i$f$k=q$-9~$_$rJ#?t$N%U%!%$%k$X$N(B\emph{$B%H%i%s%6%/%7%g(B
134 218 $B%s(B}$B$N0lIt$H8+$J$9!%%H%i%s%6%/%7%g%sA4BN$,@.8y$7!$FI$_=P$7B&$K0lEY$K7k2L$,(B
135 Mercurial cleverly avoids a pitfall common to all earlier 219 $B8+$($k>l9g$b!$$=$&$G$J$$>l9g$b%H%i%s%6%/%7%g%s$O(B\emph{$B%"%H%_%C%/(B}$B$G$"$k!%(B
136 revision control systems: the problem of \emph{inefficient retrieval}. 220 $B$3$N%"%H%_%C%/@-$NJ]>Z$O!$(B 2$B$D$N(BMercurial$B$r!$JRJ}$O%G!<%?$NFI$_=P$7!$$b$&(B
137 Most revision control systems store the contents of a revision as an 221 $B0lJ}$O=q$-9~$_$G<B9T$7$F$$$k>l9g!$FI$_=P$7B&$K$O:.Mp$N860x$H$J$kItJ,E*$J(B
138 incremental series of modifications against a ``snapshot''. To 222 $B=q$-9~$_7k2L$O8+$($J$$$3$H$r0UL#$9$k!%(B
139 reconstruct a specific revision, you must first read the snapshot, and 223
140 then every one of the revisions between the snapshot and your target 224 %The fact that Mercurial only appends to files makes it easier to
141 revision. The more history that a file accumulates, the more 225 %provide this transactional guarantee. The easier it is to do stuff
142 revisions you must read, hence the longer it takes to reconstruct a 226 %like this, the more confident you should be that it's done correctly.
143 particular revision. 227
228 Mercurial$B$O%U%!%$%k$KDI5-$N$_$r$9$k$3$H$G!$%H%i%s%6%/%7%g%s$NJ]>Z$rMF0W$K(B
229 $B$7$F$$$k!%J*;v$rC1=c2=$9$k$3$H$K$h$C$F!$=hM}$N@5$7$5$r3N<B$K$9$k$h$&$K$7(B
230 $B$F$$$k!%(B
231
232 %\subsection{Fast retrieval}
233 \subsection{$B9bB.$J<hF@(B}
234
235 %Mercurial cleverly avoids a pitfall common to all earlier
236 %revision control systems: the problem of \emph{inefficient retrieval}.
237 %Most revision control systems store the contents of a revision as an
238 %incremental series of modifications against a ``snapshot''. To
239 %reconstruct a specific revision, you must first read the snapshot, and
240 %then every one of the revisions between the snapshot and your target
241 %revision. The more history that a file accumulates, the more
242 %revisions you must read, hence the longer it takes to reconstruct a
243 %particular revision.
244
245 Mercurial$B$O$3$l$^$G$N%j%S%8%g%s%3%s%H%m!<%k%7%9%F%`$K6&DL$7$?Mn$H$77j$r(B
246 $B8-L@$KHr$1$F$$$k!%LdBj$@$C$?$N$O(B\emph{$BHs8zN(E*$J<hF@(B}$B$G$"$C$?!%(B
247 $BBgDq$N%j%S%8%g%s%3%s%H%m!<%k%7%9%F%`$O%j%S%8%g%s$NFbMF$r%9%J%C%W%7%g%C%H(B
248 $B$X$N0lO"$NJQ99$NA}J,$H$7$FJ]B8$7$F$$$k!%FCDj$N%j%S%8%g%s$r:F8=$9$k$?$a$K(B
249 $B$O$^$:%9%J%C%W%7%g%C%H$rFI$_9~$_!$$7$+$k8e$KL\E*$N%j%S%8%g%s$rFI$`I,MW$,(B
250 $B$"$C$?!%%U%!%$%k$X$NMzNr$,A}$($l$PA}$($k$[$IFI$_9~$^$J$1$l$P$J$i$J$$%j%S(B
251 $B%8%g%s$,B?$/$J$j!$FCDj$N%j%S%8%g%s$r:F8=$9$k$N$K;~4V$,$+$+$k$h$&$K$J$k!%(B
144 252
145 \begin{figure}[ht] 253 \begin{figure}[ht]
146 \centering 254 \centering
147 \includegraphics{snapshot} 255 \includegraphics{snapshot}
148 \caption{Snapshot of a revlog, with incremental deltas} 256 % \caption{Snapshot of a revlog, with incremental deltas}
257 \caption{$B:9J,$rMQ$$$?(Brevlog$B$N%9%J%C%W%7%g%C%H(B}
149 \label{fig:concepts:snapshot} 258 \label{fig:concepts:snapshot}
150 \end{figure} 259 \end{figure}
151 260
152 The innovation that Mercurial applies to this problem is simple but 261 %The innovation that Mercurial applies to this problem is simple but
153 effective. Once the cumulative amount of delta information stored 262 %effective. Once the cumulative amount of delta information stored
154 since the last snapshot exceeds a fixed threshold, it stores a new 263 %since the last snapshot exceeds a fixed threshold, it stores a new
155 snapshot (compressed, of course), instead of another delta. This 264 %snapshot (compressed, of course), instead of another delta. This
156 makes it possible to reconstruct \emph{any} revision of a file 265 %makes it possible to reconstruct \emph{any} revision of a file
157 quickly. This approach works so well that it has since been copied by 266 %quickly. This approach works so well that it has since been copied by
158 several other revision control systems. 267 %several other revision control systems.
159 268
160 Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry 269 Mercurial$B$G$O$3$NLdBj$rC1=c$@$,8z2LE*$JJ}K!$G2r7h$7$F$$$k!%A02s$K%9%J%C%W(B
161 in a revlog's index file, Mercurial stores the range of entries from 270 $B%7%g%C%H$r:n@.$7$?;~E@$+$i$NN_@QE*$JA}J,$,0lDj$NogCM$r1[$($k$H!$?7$?$JA}(B
162 the data file that it must read to reconstruct a particular revision. 271 $BJ,$G$O$J$/!$?7$?$J%9%J%C%W%7%g%C%H$,J]B8$5$l$k!J$3$N%9%J%C%W%7%g%C%H$OL^(B
163 272 $BO@05=L$5$l$F$$$k!K!%$3$l$K$h$C$F(B\emph{$B$$$+$J$k(B}$B%j%S%8%g%s$b?WB.$K:F8=$5$l(B
164 \subsubsection{Aside: the influence of video compression} 273 $B$k!%$3$N%"%W%m!<%A$O0J8eB>$N%j%S%8%g%s%3%s%H%m!<%k%7%9%F%`$K%3%T!<$5$l$k(B
165 274 $B$[$I$&$^$/5!G=$7$F$$$k!%(B
166 If you're familiar with video compression or have ever watched a TV 275
167 feed through a digital cable or satellite service, you may know that 276 %Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry
168 most video compression schemes store each frame of video as a delta 277 %in a revlog's index file, Mercurial stores the range of entries from
169 against its predecessor frame. In addition, these schemes use 278 %the data file that it must read to reconstruct a particular revision.
170 ``lossy'' compression techniques to increase the compression ratio, so 279
171 visual errors accumulate over the course of a number of inter-frame 280 $B?^(B~\ref{fig:concepts:snapshot}$B$K35G0$r<($9!%(BMercurial$B$O(Brevlog$B$N%$%s%G%C%/(B
172 deltas. 281 $B%9%U%!%$%k$N%(%s%H%j$KFCDj$N%j%S%8%g%s$r:F8=$9$k$N$KI,MW$J%G!<%?%U%!%$%k(B
173 282 $BFb$N$"$kHO0O$N%(%s%H%j$rJ]B8$9$k!%(B
174 Because it's possible for a video stream to ``drop out'' occasionally 283
175 due to signal glitches, and to limit the accumulation of artefacts 284
176 introduced by the lossy compression process, video encoders 285
177 periodically insert a complete frame (called a ``key frame'') into the 286
178 video stream; the next delta is generated against that frame. This 287
179 means that if the video signal gets interrupted, it will resume once 288
180 the next key frame is received. Also, the accumulation of encoding 289
181 errors restarts anew with each key frame. 290
182 291
183 \subsection{Identification and strong integrity} 292
184 293
185 Along with delta or snapshot information, a revlog entry contains a 294 %\subsubsection{Aside: the influence of video compression}
186 cryptographic hash of the data that it represents. This makes it 295 \subsubsection{$B$=$NB>(B: $B%S%G%*05=L$N1F6A(B}
187 difficult to forge the contents of a revision, and easy to detect 296
188 accidental corruption. 297 %If you're familiar with video compression or have ever watched a TV
189 298 %feed through a digital cable or satellite service, you may know that
190 Hashes provide more than a mere check against corruption; they are 299 %most video compression schemes store each frame of video as a delta
191 used as the identifiers for revisions. The changeset identification 300 %against its predecessor frame. In addition, these schemes use
192 hashes that you see as an end user are from revisions of the 301 %``lossy'' compression techniques to increase the compression ratio, so
193 changelog. Although filelogs and the manifest also use hashes, 302 %visual errors accumulate over the course of a number of inter-frame
194 Mercurial only uses these behind the scenes. 303 %deltas.
195 304
196 Mercurial verifies that hashes are correct when it retrieves file 305 %Because it's possible for a video stream to ``drop out'' occasionally
197 revisions and when it pulls changes from another repository. If it 306 %due to signal glitches, and to limit the accumulation of artefacts
198 encounters an integrity problem, it will complain and stop whatever 307 %introduced by the lossy compression process, video encoders
199 it's doing. 308 %periodically insert a complete frame (called a ``key frame'') into the
200 309 %video stream; the next delta is generated against that frame. This
201 In addition to the effect it has on retrieval efficiency, Mercurial's 310 %means that if the video signal gets interrupted, it will resume once
202 use of periodic snapshots makes it more robust against partial data 311 %the next key frame is received. Also, the accumulation of encoding
203 corruption. If a revlog becomes partly corrupted due to a hardware 312 %errors restarts anew with each key frame.
204 error or system bug, it's often possible to reconstruct some or most 313
205 revisions from the uncorrupted sections of the revlog, both before and 314 %\subsection{Identification and strong integrity}
206 after the corrupted section. This would not be possible with a 315 \subsection{$B<1JL$H6/$$0l4S@-(B}
207 delta-only storage model. 316
208 317 %Along with delta or snapshot information, a revlog entry contains a
209 \section{Revision history, branching, 318 %cryptographic hash of the data that it represents. This makes it
210 and merging} 319 %difficult to forge the contents of a revision, and easy to detect
211 320 %accidental corruption.
212 Every entry in a Mercurial revlog knows the identity of its immediate 321
213 ancestor revision, usually referred to as its \emph{parent}. In fact, 322 %Hashes provide more than a mere check against corruption; they are
214 a revision contains room for not one parent, but two. Mercurial uses 323 %used as the identifiers for revisions. The changeset identification
215 a special hash, called the ``null ID'', to represent the idea ``there 324 %hashes that you see as an end user are from revisions of the
216 is no parent here''. This hash is simply a string of zeroes. 325 %changelog. Although filelogs and the manifest also use hashes,
217 326 %Mercurial only uses these behind the scenes.
218 In figure~\ref{fig:concepts:revlog}, you can see an example of the 327
219 conceptual structure of a revlog. Filelogs, manifests, and changelogs 328 %Mercurial verifies that hashes are correct when it retrieves file
220 all have this same structure; they differ only in the kind of data 329 %revisions and when it pulls changes from another repository. If it
221 stored in each delta or snapshot. 330 %encounters an integrity problem, it will complain and stop whatever
222 331 %it's doing.
223 The first revision in a revlog (at the bottom of the image) has the 332
224 null ID in both of its parent slots. For a ``normal'' revision, its 333 %In addition to the effect it has on retrieval efficiency, Mercurial's
225 first parent slot contains the ID of its parent revision, and its 334 %use of periodic snapshots makes it more robust against partial data
226 second contains the null ID, indicating that the revision has only one 335 %corruption. If a revlog becomes partly corrupted due to a hardware
227 real parent. Any two revisions that have the same parent ID are 336 %error or system bug, it's often possible to reconstruct some or most
228 branches. A revision that represents a merge between branches has two 337 %revisions from the uncorrupted sections of the revlog, both before and
229 normal revision IDs in its parent slots. 338 %after the corrupted section. This would not be possible with a
339 %delta-only storage model.
340
341 %\section{Revision history, branching, and merging}
342 \section{$B%j%S%8%g%sMzNr!$%V%i%s%A!$%^!<%8(B}
343
344 %Every entry in a Mercurial revlog knows the identity of its immediate
345 %ancestor revision, usually referred to as its \emph{parent}. In fact,
346 %a revision contains room for not one parent, but two. Mercurial uses
347 %a special hash, called the ``null ID'', to represent the idea ``there
348 %is no parent here''. This hash is simply a string of zeroes.
349
350 %In figure~\ref{fig:concepts:revlog}, you can see an example of the
351 %conceptual structure of a revlog. Filelogs, manifests, and changelogs
352 %all have this same structure; they differ only in the kind of data
353 %stored in each delta or snapshot.
354
355 %The first revision in a revlog (at the bottom of the image) has the
356 %null ID in both of its parent slots. For a ``normal'' revision, its
357 %first parent slot contains the ID of its parent revision, and its
358 %second contains the null ID, indicating that the revision has only one
359 %real parent. Any two revisions that have the same parent ID are
360 %branches. A revision that represents a merge between branches has two
361 %normal revision IDs in its parent slots.
230 362
231 \begin{figure}[ht] 363 \begin{figure}[ht]
232 \centering 364 \centering
233 \includegraphics{revlog} 365 \includegraphics{revlog}
234 \caption{} 366 \caption{}
235 \label{fig:concepts:revlog} 367 \label{fig:concepts:revlog}
236 \end{figure} 368 \end{figure}
237 369
238 \section{The working directory} 370 %\section{The working directory}
239 371 \section{$B%o!<%-%s%0%G%#%l%/%H%j(B}
240 In the working directory, Mercurial stores a snapshot of the files 372
241 from the repository as of a particular changeset. 373 %In the working directory, Mercurial stores a snapshot of the files
242 374 %from the repository as of a particular changeset.
243 The working directory ``knows'' which changeset it contains. When you 375
244 update the working directory to contain a particular changeset, 376 %The working directory ``knows'' which changeset it contains. When you
245 Mercurial looks up the appropriate revision of the manifest to find 377 %update the working directory to contain a particular changeset,
246 out which files it was tracking at the time that changeset was 378 %Mercurial looks up the appropriate revision of the manifest to find
247 committed, and which revision of each file was then current. It then 379 %out which files it was tracking at the time that changeset was
248 recreates a copy of each of those files, with the same contents it had 380 %committed, and which revision of each file was then current. It then
249 when the changeset was committed. 381 %recreates a copy of each of those files, with the same contents it had
250 382 %when the changeset was committed.
251 The \emph{dirstate} contains Mercurial's knowledge of the working 383
252 directory. This details which changeset the working directory is 384 %The \emph{dirstate} contains Mercurial's knowledge of the working
253 updated to, and all of the files that Mercurial is tracking in the 385 %directory. This details which changeset the working directory is
254 working directory. 386 %updated to, and all of the files that Mercurial is tracking in the
255 387 %working directory.
256 Just as a revision of a revlog has room for two parents, so that it 388
257 can represent either a normal revision (with one parent) or a merge of 389 %Just as a revision of a revlog has room for two parents, so that it
258 two earlier revisions, the dirstate has slots for two parents. When 390 %can represent either a normal revision (with one parent) or a merge of
259 you use the \hgcmd{update} command, the changeset that you update to 391 %two earlier revisions, the dirstate has slots for two parents. When
260 is stored in the ``first parent'' slot, and the null ID in the second. 392 %you use the \hgcmd{update} command, the changeset that you update to
261 When you \hgcmd{merge} with another changeset, the first parent 393 %is stored in the ``first parent'' slot, and the null ID in the second.
262 remains unchanged, and the second parent is filled in with the 394 %When you \hgcmd{merge} with another changeset, the first parent
263 changeset you're merging with. The \hgcmd{parents} command tells you 395 %remains unchanged, and the second parent is filled in with the
264 what the parents of the dirstate are. 396 %changeset you're merging with. The \hgcmd{parents} command tells you
265 397 %what the parents of the dirstate are.
266 \subsection{What happens when you commit} 398
267 399 %\subsection{What happens when you commit}
268 The dirstate stores parent information for more than just book-keeping 400 \subsection{$B%3%_%C%H;~$K2?$,5/$-$k$N$+(B}
269 purposes. Mercurial uses the parents of the dirstate as \emph{the 401
270 parents of a new changeset} when you perform a commit. 402 %The dirstate stores parent information for more than just book-keeping
403 %purposes. Mercurial uses the parents of the dirstate as \emph{the
404 % parents of a new changeset} when you perform a commit.
271 405
272 \begin{figure}[ht] 406 \begin{figure}[ht]
273 \centering 407 \centering
274 \includegraphics{wdir} 408 \includegraphics{wdir}
275 \caption{The working directory can have two parents} 409 % \caption{The working directory can have two parents}
410 \caption{$B%o!<%-%s%0%G%#%l%/%H%j$O(B2$B$D$N?F$r;}$AF@$k(B}
276 \label{fig:concepts:wdir} 411 \label{fig:concepts:wdir}
277 \end{figure} 412 \end{figure}
278 413
279 Figure~\ref{fig:concepts:wdir} shows the normal state of the working 414 %Figure~\ref{fig:concepts:wdir} shows the normal state of the working
280 directory, where it has a single changeset as parent. That changeset 415 %directory, where it has a single changeset as parent. That changeset
281 is the \emph{tip}, the newest changeset in the repository that has no 416 %is the \emph{tip}, the newest changeset in the repository that has no
282 children. 417 %children.
283 418
284 \begin{figure}[ht] 419 \begin{figure}[ht]
285 \centering 420 \centering
286 \includegraphics{wdir-after-commit} 421 \includegraphics{wdir-after-commit}
287 \caption{The working directory gains new parents after a commit} 422 % \caption{The working directory gains new parents after a commit}
423 \caption{$B%3%_%C%H8e!$%o!<%-%s%0%G%#%l%/%H%j$O?7$?$JN>?F$r;}$D(B}
288 \label{fig:concepts:wdir-after-commit} 424 \label{fig:concepts:wdir-after-commit}
289 \end{figure} 425 \end{figure}
290 426
291 It's useful to think of the working directory as ``the changeset I'm 427 %It's useful to think of the working directory as ``the changeset I'm
292 about to commit''. Any files that you tell Mercurial that you've 428 %about to commit''. Any files that you tell Mercurial that you've
293 added, removed, renamed, or copied will be reflected in that 429 %added, removed, renamed, or copied will be reflected in that
294 changeset, as will modifications to any files that Mercurial is 430 %changeset, as will modifications to any files that Mercurial is
295 already tracking; the new changeset will have the parents of the 431 %already tracking; the new changeset will have the parents of the
296 working directory as its parents. 432 %working directory as its parents.
297 433
298 After a commit, Mercurial will update the parents of the working 434 %After a commit, Mercurial will update the parents of the working
299 directory, so that the first parent is the ID of the new changeset, 435 %directory, so that the first parent is the ID of the new changeset,
300 and the second is the null ID. This is shown in 436 %and the second is the null ID. This is shown in
301 figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch 437 %figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch
302 any of the files in the working directory when you commit; it just 438 %any of the files in the working directory when you commit; it just
303 modifies the dirstate to note its new parents. 439 %modifies the dirstate to note its new parents.
304 440
305 \subsection{Creating a new head} 441 %\subsection{Creating a new head}
306 442 \subsection{$B?7$?$J%X%C%I$r:n$k(B}
307 It's perfectly normal to update the working directory to a changeset 443
308 other than the current tip. For example, you might want to know what 444 %It's perfectly normal to update the working directory to a changeset
309 your project looked like last Tuesday, or you could be looking through 445 %other than the current tip. For example, you might want to know what
310 changesets to see which one introduced a bug. In cases like this, the 446 %your project looked like last Tuesday, or you could be looking through
311 natural thing to do is update the working directory to the changeset 447 %changesets to see which one introduced a bug. In cases like this, the
312 you're interested in, and then examine the files in the working 448 %natural thing to do is update the working directory to the changeset
313 directory directly to see their contents as they werea when you 449 %you're interested in, and then examine the files in the working
314 committed that changeset. The effect of this is shown in 450 %directory directly to see their contents as they werea when you
315 figure~\ref{fig:concepts:wdir-pre-branch}. 451 %committed that changeset. The effect of this is shown in
452 %figure~\ref{fig:concepts:wdir-pre-branch}.
316 453
317 \begin{figure}[ht] 454 \begin{figure}[ht]
318 \centering 455 \centering
319 \includegraphics{wdir-pre-branch} 456 \includegraphics{wdir-pre-branch}
320 \caption{The working directory, updated to an older changeset} 457 % \caption{The working directory, updated to an older changeset}
458 \caption{$B8E$$%A%'%s%8%;%C%H$X$H99?7$5$l$?%o!<%-%s%0%G%#%l%/%H%j(B}
321 \label{fig:concepts:wdir-pre-branch} 459 \label{fig:concepts:wdir-pre-branch}
322 \end{figure} 460 \end{figure}
323 461
324 Having updated the working directory to an older changeset, what 462 %Having updated the working directory to an older changeset, what
325 happens if you make some changes, and then commit? Mercurial behaves 463 %happens if you make some changes, and then commit? Mercurial behaves
326 in the same way as I outlined above. The parents of the working 464 %in the same way as I outlined above. The parents of the working
327 directory become the parents of the new changeset. This new changeset 465 %directory become the parents of the new changeset. This new changeset
328 has no children, so it becomes the new tip. And the repository now 466 %has no children, so it becomes the new tip. And the repository now
329 contains two changesets that have no children; we call these 467 %contains two changesets that have no children; we call these
330 \emph{heads}. You can see the structure that this creates in 468 %\emph{heads}. You can see the structure that this creates in
331 figure~\ref{fig:concepts:wdir-branch}. 469 %figure~\ref{fig:concepts:wdir-branch}.
332 470
333 \begin{figure}[ht] 471 \begin{figure}[ht]
334 \centering 472 \centering
335 \includegraphics{wdir-branch} 473 \includegraphics{wdir-branch}
336 \caption{After a commit made while synced to an older changeset} 474 % \caption{After a commit made while synced to an older changeset}
475 \caption{$B8E$$%A%'%s%8%;%C%H$KF14|Cf$K%3%_%C%H$,9T$o$l$?>l9g(B}
337 \label{fig:concepts:wdir-branch} 476 \label{fig:concepts:wdir-branch}
338 \end{figure} 477 \end{figure}
339 478
340 \begin{note} 479 %\begin{note}
341 If you're new to Mercurial, you should keep in mind a common 480 % If you're new to Mercurial, you should keep in mind a common
342 ``error'', which is to use the \hgcmd{pull} command without any 481 % ``error'', which is to use the \hgcmd{pull} command without any
343 options. By default, the \hgcmd{pull} command \emph{does not} 482 % options. By default, the \hgcmd{pull} command \emph{does not}
344 update the working directory, so you'll bring new changesets into 483 % update the working directory, so you'll bring new changesets into
345 your repository, but the working directory will stay synced at the 484 % your repository, but the working directory will stay synced at the
346 same changeset as before the pull. If you make some changes and 485 % same changeset as before the pull. If you make some changes and
347 commit afterwards, you'll thus create a new head, because your 486 % commit afterwards, you'll thus create a new head, because your
348 working directory isn't synced to whatever the current tip is. 487 % working directory isn't synced to whatever the current tip is.
349 488 %
350 I put the word ``error'' in quotes because all that you need to do 489 % I put the word ``error'' in quotes because all that you need to do
351 to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In 490 % to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In
352 other words, this almost never has negative consequences; it just 491 % other words, this almost never has negative consequences; it just
353 surprises people. I'll discuss other ways to avoid this behaviour, 492 % surprises people. I'll discuss other ways to avoid this behaviour,
354 and why Mercurial behaves in this initially surprising way, later 493 % and why Mercurial behaves in this initially surprising way, later
355 on. 494 % on.
356 \end{note} 495 %\end{note}
357 496
358 \subsection{Merging heads} 497 %\subsection{Merging heads}
359 498 \subsection{$B%X%C%I4V$N%^!<%8(B}
360 When you run the \hgcmd{merge} command, Mercurial leaves the first 499
361 parent of the working directory unchanged, and sets the second parent 500 %When you run the \hgcmd{merge} command, Mercurial leaves the first
362 to the changeset you're merging with, as shown in 501 %parent of the working directory unchanged, and sets the second parent
363 figure~\ref{fig:concepts:wdir-merge}. 502 %to the changeset you're merging with, as shown in
503 %figure~\ref{fig:concepts:wdir-merge}.
364 504
365 \begin{figure}[ht] 505 \begin{figure}[ht]
366 \centering 506 \centering
367 \includegraphics{wdir-merge} 507 \includegraphics{wdir-merge}
368 \caption{Merging two heads} 508 % \caption{Merging two heads}
509 \caption{2$B$D$N%X%C%I$N%^!<%8(B}
369 \label{fig:concepts:wdir-merge} 510 \label{fig:concepts:wdir-merge}
370 \end{figure} 511 \end{figure}
371 512
372 Mercurial also has to modify the working directory, to merge the files 513 Mercurial also has to modify the working directory, to merge the files
373 managed in the two changesets. Simplified a little, the merging 514 managed in the two changesets. Simplified a little, the merging
407 Mercurial only tracks two parents for both revisions and the working 548 Mercurial only tracks two parents for both revisions and the working
408 directory. While it would be technically possible to merge multiple 549 directory. While it would be technically possible to merge multiple
409 changesets at once, the prospect of user confusion and making a 550 changesets at once, the prospect of user confusion and making a
410 terrible mess of a merge immediately becomes overwhelming. 551 terrible mess of a merge immediately becomes overwhelming.
411 552
412 \section{Other interesting design features} 553 %\section{Other interesting design features}
554 \section{$B@_7W$NB>$N6=L#?<$$E@(B}
413 555
414 In the sections above, I've tried to highlight some of the most 556 In the sections above, I've tried to highlight some of the most
415 important aspects of Mercurial's design, to illustrate that it pays 557 important aspects of Mercurial's design, to illustrate that it pays
416 careful attention to reliability and performance. However, the 558 careful attention to reliability and performance. However, the
417 attention to detail doesn't stop there. There are a number of other 559 attention to detail doesn't stop there. There are a number of other
419 interesting. I'll detail a few of them here, separate from the ``big 561 interesting. I'll detail a few of them here, separate from the ``big
420 ticket'' items above, so that if you're interested, you can gain a 562 ticket'' items above, so that if you're interested, you can gain a
421 better idea of the amount of thinking that goes into a well-designed 563 better idea of the amount of thinking that goes into a well-designed
422 system. 564 system.
423 565
424 \subsection{Clever compression} 566 %\subsection{Clever compression}
567 \subsection{$B8-$$05=L(B}
425 568
426 When appropriate, Mercurial will store both snapshots and deltas in 569 When appropriate, Mercurial will store both snapshots and deltas in
427 compressed form. It does this by always \emph{trying to} compress a 570 compressed form. It does this by always \emph{trying to} compress a
428 snapshot or delta, but only storing the compressed version if it's 571 snapshot or delta, but only storing the compressed version if it's
429 smaller than the uncompressed version. 572 smaller than the uncompressed version.
439 these cases. It finds that such a delta exceeds the threshold at 582 these cases. It finds that such a delta exceeds the threshold at
440 which it should store a complete snapshot of the file, so it stores 583 which it should store a complete snapshot of the file, so it stores
441 the snapshot, again saving space compared to a naive delta-only 584 the snapshot, again saving space compared to a naive delta-only
442 approach. 585 approach.
443 586
444 \subsubsection{Network recompression} 587 %\subsubsection{Network recompression}
445 588 \subsubsection{$B%M%C%H%o!<%/:F05=L(B}
446 When storing revisions on disk, Mercurial uses the ``deflate'' 589
447 compression algorithm (the same one used by the popular \texttt{zip} 590 %When storing revisions on disk, Mercurial uses the ``deflate''
448 archive format), which balances good speed with a respectable 591 %compression algorithm (the same one used by the popular \texttt{zip}
449 compression ratio. However, when transmitting revision data over a 592 %archive format), which balances good speed with a respectable
450 network connection, Mercurial uncompresses the compressed revision 593 %compression ratio. However, when transmitting revision data over a
451 data. 594 %network connection, Mercurial uncompresses the compressed revision
452 595 %data.
453 If the connection is over HTTP, Mercurial recompresses the entire 596
454 stream of data using a compression algorithm that gives a better 597 %If the connection is over HTTP, Mercurial recompresses the entire
455 compression ratio (the Burrows-Wheeler algorithm from the widely used 598 %stream of data using a compression algorithm that gives a better
456 \texttt{bzip2} compression package). This combination of algorithm 599 %compression ratio (the Burrows-Wheeler algorithm from the widely used
457 and compression of the entire stream (instead of a revision at a time) 600 %\texttt{bzip2} compression package). This combination of algorithm
458 substantially reduces the number of bytes to be transferred, yielding 601 %and compression of the entire stream (instead of a revision at a time)
459 better network performance over almost all kinds of network. 602 %substantially reduces the number of bytes to be transferred, yielding
460 603 %better network performance over almost all kinds of network.
461 (If the connection is over \command{ssh}, Mercurial \emph{doesn't} 604
462 recompress the stream, because \command{ssh} can already do this 605 %(If the connection is over \command{ssh}, Mercurial \emph{doesn't}
463 itself.) 606 %recompress the stream, because \command{ssh} can already do this
464 607 %itself.)
465 \subsection{Read/write ordering and atomicity} 608
466 609 %\subsection{Read/write ordering and atomicity}
467 Appending to files isn't the whole story when it comes to guaranteeing 610 \subsection{$BFI$_=q$-$N=g=x$H%"%H%_%C%/@-(B}
468 that a reader won't see a partial write. If you recall 611
469 figure~\ref{fig:concepts:metadata}, revisions in the changelog point to 612 %Appending to files isn't the whole story when it comes to guaranteeing
470 revisions in the manifest, and revisions in the manifest point to 613 %that a reader won't see a partial write. If you recall
471 revisions in filelogs. This hierarchy is deliberate. 614 %figure~\ref{fig:concepts:metadata}, revisions in the changelog point to
472 615 %revisions in the manifest, and revisions in the manifest point to
473 A writer starts a transaction by writing filelog and manifest data, 616 %revisions in filelogs. This hierarchy is deliberate.
474 and doesn't write any changelog data until those are finished. A 617
475 reader starts by reading changelog data, then manifest data, followed 618 %A writer starts a transaction by writing filelog and manifest data,
476 by filelog data. 619 %and doesn't write any changelog data until those are finished. A
477 620 %reader starts by reading changelog data, then manifest data, followed
478 Since the writer has always finished writing filelog and manifest data 621 %by filelog data.
479 before it writes to the changelog, a reader will never read a pointer 622
480 to a partially written manifest revision from the changelog, and it will 623 %Since the writer has always finished writing filelog and manifest data
481 never read a pointer to a partially written filelog revision from the 624 %before it writes to the changelog, a reader will never read a pointer
482 manifest. 625 %to a partially written manifest revision from the changelog, and it will
483 626 %never read a pointer to a partially written filelog revision from the
484 \subsection{Concurrent access} 627 %manifest.
485 628
486 The read/write ordering and atomicity guarantees mean that Mercurial 629 %\subsection{Concurrent access}
487 never needs to \emph{lock} a repository when it's reading data, even 630 \subsection{$BF1;~%"%/%;%9(B}
488 if the repository is being written to while the read is occurring. 631
489 This has a big effect on scalability; you can have an arbitrary number 632 %The read/write ordering and atomicity guarantees mean that Mercurial
490 of Mercurial processes safely reading data from a repository safely 633 %never needs to \emph{lock} a repository when it's reading data, even
491 all at once, no matter whether it's being written to or not. 634 %if the repository is being written to while the read is occurring.
492 635 %This has a big effect on scalability; you can have an arbitrary number
493 The lockless nature of reading means that if you're sharing a 636 %of Mercurial processes safely reading data from a repository safely
494 repository on a multi-user system, you don't need to grant other local 637 %all at once, no matter whether it's being written to or not.
495 users permission to \emph{write} to your repository in order for them 638
496 to be able to clone it or pull changes from it; they only need 639 %The lockless nature of reading means that if you're sharing a
497 \emph{read} permission. (This is \emph{not} a common feature among 640 %repository on a multi-user system, you don't need to grant other local
498 revision control systems, so don't take it for granted! Most require 641 %users permission to \emph{write} to your repository in order for them
499 readers to be able to lock a repository to access it safely, and this 642 %to be able to clone it or pull changes from it; they only need
500 requires write permission on at least one directory, which of course 643 %\emph{read} permission. (This is \emph{not} a common feature among
501 makes for all kinds of nasty and annoying security and administrative 644 %revision control systems, so don't take it for granted! Most require
502 problems.) 645 %readers to be able to lock a repository to access it safely, and this
503 646 %requires write permission on at least one directory, which of course
504 Mercurial uses locks to ensure that only one process can write to a 647 %makes for all kinds of nasty and annoying security and administrative
505 repository at a time (the locking mechanism is safe even over 648 %problems.)
506 filesystems that are notoriously hostile to locking, such as NFS). If 649
507 a repository is locked, a writer will wait for a while to retry if the 650 %Mercurial uses locks to ensure that only one process can write to a
508 repository becomes unlocked, but if the repository remains locked for 651 %repository at a time (the locking mechanism is safe even over
509 too long, the process attempting to write will time out after a while. 652 %filesystems that are notoriously hostile to locking, such as NFS). If
510 This means that your daily automated scripts won't get stuck forever 653 %a repository is locked, a writer will wait for a while to retry if the
511 and pile up if a system crashes unnoticed, for example. (Yes, the 654 %repository becomes unlocked, but if the repository remains locked for
512 timeout is configurable, from zero to infinity.) 655 %too long, the process attempting to write will time out after a while.
513 656 %This means that your daily automated scripts won't get stuck forever
514 \subsubsection{Safe dirstate access} 657 %and pile up if a system crashes unnoticed, for example. (Yes, the
515 658 %timeout is configurable, from zero to infinity.)
516 As with revision data, Mercurial doesn't take a lock to read the 659
517 dirstate file; it does acquire a lock to write it. To avoid the 660 %\subsubsection{Safe dirstate access}
518 possibility of reading a partially written copy of the dirstate file, 661 \subsubsection{$B0BA4$J(Bdirstate$B%"%/%;%9(B}
519 Mercurial writes to a file with a unique name in the same directory as 662
520 the dirstate file, then renames the temporary file atomically to 663 %As with revision data, Mercurial doesn't take a lock to read the
521 \filename{dirstate}. The file named \filename{dirstate} is thus 664 %dirstate file; it does acquire a lock to write it. To avoid the
522 guaranteed to be complete, not partially written. 665 %possibility of reading a partially written copy of the dirstate file,
523 666 %Mercurial writes to a file with a unique name in the same directory as
524 \subsection{Avoiding seeks} 667 %the dirstate file, then renames the temporary file atomically to
525 668 %\filename{dirstate}. The file named \filename{dirstate} is thus
526 Critical to Mercurial's performance is the avoidance of seeks of the 669 %guaranteed to be complete, not partially written.
527 disk head, since any seek is far more expensive than even a 670
528 comparatively large read operation. 671 %\subsection{Avoiding seeks}
529 672 \subsection{$B%7!<%/$N2sHr(B}
530 This is why, for example, the dirstate is stored in a single file. If 673
531 there were a dirstate file per directory that Mercurial tracked, the 674 %Critical to Mercurial's performance is the avoidance of seeks of the
532 disk would seek once per directory. Instead, Mercurial reads the 675 %disk head, since any seek is far more expensive than even a
533 entire single dirstate file in one step. 676 %comparatively large read operation.
534 677
535 Mercurial also uses a ``copy on write'' scheme when cloning a 678 Mercurial$B$N@-G=$K$O!$%G%#%9%/%X%C%I$N%7!<%/$rHr$1$k$3$H$,IT2D7g$G$"$k!%(B
536 repository on local storage. Instead of copying every revlog file 679 $B%7!<%/$OBg5,LO$JFI$_=P$7A`:n$HHf3S$7$F$bHs>o$K9b$/$D$/!%(B
537 from the old repository into the new repository, it makes a ``hard 680
538 link'', which is a shorthand way to say ``these two names point to the 681 %This is why, for example, the dirstate is stored in a single file. If
539 same file''. When Mercurial is about to write to one of a revlog's 682 %there were a dirstate file per directory that Mercurial tracked, the
540 files, it checks to see if the number of names pointing at the file is 683 %disk would seek once per directory. Instead, Mercurial reads the
541 greater than one. If it is, more than one repository is using the 684 %entire single dirstate file in one step.
542 file, so Mercurial makes a new copy of the file that is private to 685
543 this repository. 686 $B$=$NM}M3$O!$Nc$($P!$(Bdirstate$B$O(B1$B$D$N%U%!%$%k$KJ]B8$5$l$F$$$k$?$a$G!$(B
544 687 Mercurial$B$,DI@W$7$F$$$k(Bdirstate$B%U%!%$%k$,%G%#%l%/%H%jKh$K$"$k(B
545 A few revision control developers have pointed out that this idea of 688 $B$H!$(BMercurial$B$O%G%#%l%/%H%jKh$K%7!<%/$r9T$&$3$H$K$J$k!%<B:]$K$O(BMercurial
546 making a complete private copy of a file is not very efficient in its 689 $B$OA4BN$G0l$D$N(Bdirstate$B%U%!%$%k$r%o%s%9%F%C%W$GFI$`!%(B
547 use of storage. While this is true, storage is cheap, and this method 690
548 gives the highest performance while deferring most book-keeping to the 691 %Mercurial also uses a ``copy on write'' scheme when cloning a
549 operating system. An alternative scheme would most likely reduce 692 %repository on local storage. Instead of copying every revlog file
550 performance and increase the complexity of the software, each of which 693
551 is much more important to the ``feel'' of day-to-day use. 694 %from the old repository into the new repository, it makes a ``hard
552 695 %link'', which is a shorthand way to say ``these two names point to the
553 \subsection{Other contents of the dirstate} 696 %same file''. When Mercurial is about to write to one of a revlog's
554 697 %files, it checks to see if the number of names pointing at the file is
555 Because Mercurial doesn't force you to tell it when you're modifying a 698 %greater than one. If it is, more than one repository is using the
556 file, it uses the dirstate to store some extra information so it can 699 %file, so Mercurial makes a new copy of the file that is private to
557 determine efficiently whether you have modified a file. For each file 700 %this repository.
558 in the working directory, it stores the time that it last modified the 701
559 file itself, and the size of the file at that time. 702 Mercurial$B$O%j%]%8%H%j$r%m!<%+%k%9%H%l!<%8$K%/%m!<%s$9$k:]$K(B``$B%3%T!<%*%s%i(B
560 703 $B%$%H(B''$B<jK!$r;H$C$F$$$k!%(B
561 When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or 704
562 \hgcmd{copy} files, Mercurial updates the dirstate so that it knows 705
563 what to do with those files when you commit. 706 %A few revision control developers have pointed out that this idea of
564 707 %making a complete private copy of a file is not very efficient in its
565 When Mercurial is checking the states of files in the working 708 %use of storage. While this is true, storage is cheap, and this method
566 directory, it first checks a file's modification time. If that has 709 %gives the highest performance while deferring most book-keeping to the
567 not changed, the file must not have been modified. If the file's size 710 %operating system. An alternative scheme would most likely reduce
568 has changed, the file must have been modified. If the modification 711 %performance and increase the complexity of the software, each of which
569 time has changed, but the size has not, only then does Mercurial need 712 %is much more important to the ``feel'' of day-to-day use.
570 to read the actual contents of the file to see if they've changed. 713
571 Storing these few extra pieces of information dramatically reduces the 714 %\subsection{Other contents of the dirstate}
572 amount of data that Mercurial needs to read, which yields large 715 \subsection{dirstate$B$NB>$NFbMF(B}
573 performance improvements compared to other revision control systems. 716
574 717 %Because Mercurial doesn't force you to tell it when you're modifying a
575 %%% Local Variables: 718 %file, it uses the dirstate to store some extra information so it can
719 %determine efficiently whether you have modified a file. For each file
720 %in the working directory, it stores the time that it last modified the
721 %file itself, and the size of the file at that time.
722
723 %When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or
724 %\hgcmd{copy} files, Mercurial updates the dirstate so that it knows
725 %what to do with those files when you commit.
726
727 %When Mercurial is checking the states of files in the working
728 %directory, it first checks a file's modification time. If that has
729 %not changed, the file must not have been modified. If the file's size
730 %has changed, the file must have been modified. If the modification
731 %time has changed, but the size has not, only then does Mercurial need
732 %to read the actual contents of the file to see if they've changed.
733 %Storing these few extra pieces of information dramatically reduces the
734 %amount of data that Mercurial needs to read, which yields large
735 %performance improvements compared to other revision control systems.
736
737 %%% Local Variables:
576 %%% mode: yatex 738 %%% mode: yatex
577 %%% TeX-master: "00book" 739 %%% TeX-master: "00book"
578 %%% End: 740 %%% End: