Mercurial > hgbook
comparison ja/concepts.tex @ 781:93d19b27859c
started concepts.tex
| author | Yoshiki Yazawa <yaz@honeyplanet.jp> |
|---|---|
| date | Tue, 12 May 2009 17:49:11 +0900 |
| parents | b2d447356c42 |
| children | 43556decb81f |
comparison
equal
deleted
inserted
replaced
| 780:2a93b1d35990 | 781:93d19b27859c |
|---|---|
| 1 \chapter{Behind the scenes} | 1 %\chapter{Behind the scenes} |
| 2 \chapter{$BIqBfN"(B} | |
| 2 \label{chap:concepts} | 3 \label{chap:concepts} |
| 3 | 4 |
| 4 Unlike many revision control systems, the concepts upon which | 5 %Unlike many revision control systems, the concepts upon which |
| 5 Mercurial is built are simple enough that it's easy to understand how | 6 %Mercurial is built are simple enough that it's easy to understand how |
| 6 the software really works. Knowing this certainly isn't necessary, | 7 %the software really works. Knowing this certainly isn't necessary, |
| 7 but I find it useful to have a ``mental model'' of what's going on. | 8 %but I find it useful to have a ``mental model'' of what's going on. |
| 8 | 9 |
| 9 This understanding gives me confidence that Mercurial has been | 10 $BB?$/$N%j%S%8%g%s%3%s%H%m!<%k%7%9%F%`$H0c$C$F!$(B Mercurial$B$NF0:n$N4pK\$H$J$C(B |
| 10 carefully designed to be both \emph{safe} and \emph{efficient}. And | 11 $B$F$$$k%3%s%;%W%H$rM}2r$9$k$3$H$OMF0W$$!%(B |
| 11 just as importantly, if it's easy for me to retain a good idea of what | 12 $B$3$l$rM}2r$9$k$3$H$OI,$:$7$bI,MW$G$O$J$$$,!$I.<T$O!$2?$,5/$-$F$$$k$N$+$K(B |
| 12 the software is doing when I perform a revision control task, I'm less | 13 $B$D$$$F%b%G%k$r0U<1$7$F$$$k$3$H$OM-MQ$G$"$k$H9M$($F$$$k!%(B |
| 13 likely to be surprised by its behaviour. | 14 |
| 14 | 15 %This understanding gives me confidence that Mercurial has been |
| 15 In this chapter, we'll initially cover the core concepts behind | 16 %carefully designed to be both \emph{safe} and \emph{efficient}. And |
| 16 Mercurial's design, then continue to discuss some of the interesting | 17 %just as importantly, if it's easy for me to retain a good idea of what |
| 17 details of its implementation. | 18 %the software is doing when I perform a revision control task, I'm less |
| 18 | 19 %likely to be surprised by its behaviour. |
| 19 \section{Mercurial's historical record} | 20 |
| 20 | 21 $B$3$N$3$H$rM}2r$7$?8e$G!$I.<T$O(BMercurial$B$,(B\emph{$B0BA4(B}$B$H(B\emph{$B8zN((B}$B$r<B8=$9(B |
| 21 \subsection{Tracking the history of a single file} | 22 $B$k$h$&$KCm0U?<$/@_7W$5$l$F$$$k$H3N?.$9$k$K;j$C$?!%(B |
| 22 | 23 $B$^$?!$=EMW$JE@$H$7$F!$%j%S%8%g%s%3%s%H%m!<%k$NA`:n$r9T$&:]$K%=%U%H%&%'%"(B |
| 23 When Mercurial tracks modifications to a file, it stores the history | 24 $B$,2?$r$9$k$N$+$r5-21$KN1$a$F$*$/$3$H$K$h$C$F!$IT0U$N5sF0$G6C$/$3$H$,>/$J(B |
| 24 of that file in a metadata object called a \emph{filelog}. Each entry | 25 $B$/$J$C$?!%(B |
| 25 in the filelog contains enough information to reconstruct one revision | 26 |
| 26 of the file that is being tracked. Filelogs are stored as files in | 27 %In this chapter, we'll initially cover the core concepts behind |
| 27 the \sdirname{.hg/store/data} directory. A filelog contains two kinds | 28 %Mercurial's design, then continue to discuss some of the interesting |
| 28 of information: revision data, and an index to help Mercurial to find | 29 %details of its implementation. |
| 29 a revision efficiently. | 30 |
| 30 | 31 $B$3$N>O$G$O$^$:(BMercurial$B$N@_7W$N%3%"%3%s%;%W%H$r%+%P!<$9$k!%$=$7$F<BAu>e$N(B |
| 31 A file that is large, or has a lot of history, has its filelog stored | 32 $B$$$/$D$+$N6=L#?<$$E@$N>\:Y$K$D$$$F5DO@$9$k!%(B |
| 32 in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}'' | 33 |
| 33 suffix) files. For small files without much history, the revision | 34 %\section{Mercurial's historical record} |
| 34 data and index are combined in a single ``\texttt{.i}'' file. The | 35 \section{Mercurial$B$NMzNr5-O?(B} |
| 35 correspondence between a file in the working directory and the filelog | 36 |
| 36 that tracks its history in the repository is illustrated in | 37 %\subsection{Tracking the history of a single file} |
| 37 figure~\ref{fig:concepts:filelog}. | 38 \subsection{$B%U%!%$%kMzNr$NDI@W(B} |
| 39 | |
| 40 %When Mercurial tracks modifications to a file, it stores the history | |
| 41 %of that file in a metadata object called a \emph{filelog}. Each entry | |
| 42 %in the filelog contains enough information to reconstruct one revision | |
| 43 %of the file that is being tracked. Filelogs are stored as files in | |
| 44 %the \sdirname{.hg/store/data} directory. A filelog contains two kinds | |
| 45 %of information: revision data, and an index to help Mercurial to find | |
| 46 %a revision efficiently. | |
| 47 | |
| 48 Mercurial$B$O%U%!%$%k$X$NJQ99$rDI@W$9$k;~!$%U%!%$%k$NMzNr$r(B\emph{filelog}$B$H(B | |
| 49 $B8F$P$l$k%a%?%G!<%?%*%V%8%'%/%H$K3JG<$9$k!%%U%!%$%k%m%0Fb$N3F!9$N%(%s%H%j(B | |
| 50 $B$O!$DI@WBP>]$N%U%!%$%k$N%j%S%8%g%s$r:F7z$9$k$N$K==J,$J>pJs$r;}$D!%%U%!%$(B | |
| 51 $B%k%m%0$O(B\sdirname{.hg/store/data}$B%G%#%l%/%H%j$K%U%!%$%k$H$7$FJ]B8$5$l$k!%(B | |
| 52 $B%U%!%$%k%m%0$O%j%S%8%g%s%G!<%?$H(BMercurial$B$,%j%S%8%g%s$r8zN(E*$K8+$D$1$i$l(B | |
| 53 $B$k$h$&$K$9$k$?$a$N%$%s%G%C%/%9$N(B2$B<oN`$N>pJs$r;}$D!%(B | |
| 54 | |
| 55 %A file that is large, or has a lot of history, has its filelog stored | |
| 56 %in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}'' | |
| 57 %suffix) files. For small files without much history, the revision | |
| 58 %data and index are combined in a single ``\texttt{.i}'' file. The | |
| 59 %correspondence between a file in the working directory and the filelog | |
| 60 %that tracks its history in the repository is illustrated in | |
| 61 %figure~\ref{fig:concepts:filelog}. | |
| 62 | |
| 63 $B%5%$%:$NBg$-$J%U%!%$%k$d!$KDBg$JMzNr$r;}$D%U%!%$%k$O!$%G!<%?$,(B | |
| 64 (``\texttt{.d}'' suffix) $B$*$h$S%$%s%G%C%/%9(B (``\texttt{.i}'' suffix)$B$N%U%!(B | |
| 65 $B%$%k$KJ,3d$5$l$?(Bfilelog$B$r;}$D!%%5%$%:$,>.$5$/!$MzNr$NBg$-$/$J$$%U%!%$%k$O(B | |
| 66 $B%j%S%8%g%s%G!<%?$H%$%s%G%C%/%9$,(B1$B$D$N(B``\texttt{.i}''$B%U%!%$%k$K7k9g$5$l$F(B | |
| 67 $B$$$k!%%o!<%-%s%0%G%#%l%/%H%jFb$N%U%!%$%k$H%j%]%8%H%jFb$NMzNr$rDI@W$9$k(B | |
| 68 filelog$B$H$NBP1~$r?^(B~\ref{fig:concepts:filelog}$B$K<($9!%(B | |
| 38 | 69 |
| 39 \begin{figure}[ht] | 70 \begin{figure}[ht] |
| 40 \centering | 71 \centering |
| 41 % \grafix{filelog} | 72 % \grafix{filelog} |
| 42 \includegraphics{filelog} | 73 \includegraphics{filelog} |
| 43 \caption{Relationships between files in working directory and | 74 % \caption{Relationships between files in working directory and |
| 44 filelogs in repository} | 75 % filelogs in repository} |
| 76 \caption{$B%o!<%-%s%0%G%#%l%/%H%jFb$N%U%!%$%k$H%j%]%8%H%j$N%U%!%$%k%m%0(B | |
| 77 $B$N4X78(B} | |
| 45 \label{fig:concepts:filelog} | 78 \label{fig:concepts:filelog} |
| 46 \end{figure} | 79 \end{figure} |
| 47 | 80 |
| 48 \subsection{Managing tracked files} | 81 %\subsection{Managing tracked files} |
| 49 | 82 \subsection{$BDI@W$5$l$F$$$k%U%!%$%k$N4IM}(B} |
| 50 Mercurial uses a structure called a \emph{manifest} to collect | 83 |
| 51 together information about the files that it tracks. Each entry in | 84 %Mercurial uses a structure called a \emph{manifest} to collect |
| 52 the manifest contains information about the files present in a single | 85 %together information about the files that it tracks. Each entry in |
| 53 changeset. An entry records which files are present in the changeset, | 86 %the manifest contains information about the files present in a single |
| 54 the revision of each file, and a few other pieces of file metadata. | 87 %changeset. An entry records which files are present in the changeset, |
| 55 | 88 %the revision of each file, and a few other pieces of file metadata. |
| 56 \subsection{Recording changeset information} | 89 |
| 57 | 90 Mercurial$B$O(B\emph{$B%^%K%U%'%9%H(B}$B$H8F$P$l$k9=B$$rMQ$$$F!$DI@W$9$Y$-%U%!%$%k(B |
| 58 The \emph{changelog} contains information about each changeset. Each | 91 $B$N>pJs$r=8$a$F$$$k!%%^%K%U%'%9%HFb$N3F!9$N%(%s%H%j$O!$C10l$N%A%'%s%8%;%C(B |
| 59 revision records who committed a change, the changeset comment, other | 92 $B%HFb$KB8:_$9$k%U%!%$%k$N>pJs$r;}$C$F$$$k!%%(%s%H%j$O$I$N%U%!%$%k$,%A%'%s(B |
| 60 pieces of changeset-related information, and the revision of the | 93 $B%8%;%C%H$KB8:_$7$F$$$k$+!$$=$l$i$N%j%S%8%g%s$,2?$G$"$k$N$+$H$$$&>pJs$H!$(B |
| 61 manifest to use. | 94 $B$$$/$D$+$NB>$N%U%!%$%k%a%?%G!<%?$r5-O?$7$F$$$k!%(B |
| 62 | 95 |
| 63 \subsection{Relationships between revisions} | 96 %\subsection{Recording changeset information} |
| 64 | 97 \subsection{$B%A%'%s%8%;%C%H>pJs$N5-O?(B} |
| 65 Within a changelog, a manifest, or a filelog, each revision stores a | 98 |
| 66 pointer to its immediate parent (or to its two parents, if it's a | 99 %The \emph{changelog} contains information about each changeset. Each |
| 67 merge revision). As I mentioned above, there are also relationships | 100 %revision records who committed a change, the changeset comment, other |
| 68 between revisions \emph{across} these structures, and they are | 101 %pieces of changeset-related information, and the revision of the |
| 69 hierarchical in nature. | 102 %manifest to use. |
| 70 | 103 |
| 71 For every changeset in a repository, there is exactly one revision | 104 \emph{changelog}$B$O3F!9$N%A%'%s%8%;%C%H$N>pJs$r;}$D!%3F!9$N%j%S%8%g%s5-O?(B |
| 72 stored in the changelog. Each revision of the changelog contains a | 105 $B$O!$C/$,JQ99$r%3%_%C%H$7$?$N$+!$%A%'%s%8%;%C%H$N%3%a%s%H!$JQ99$K4XO"$7$?(B |
| 73 pointer to a single revision of the manifest. A revision of the | 106 $BB>$N>pJs!$;HMQ$5$l$k%^%K%U%'%9%H$N%j%S%8%g%s$r;}$D!%(B |
| 74 manifest stores a pointer to a single revision of each filelog tracked | 107 |
| 75 when that changeset was created. These relationships are illustrated | 108 %\subsection{Relationships between revisions} |
| 76 in figure~\ref{fig:concepts:metadata}. | 109 \subsection{$B%j%S%8%g%s4V$N4X78(B} |
| 110 | |
| 111 %Within a changelog, a manifest, or a filelog, each revision stores a | |
| 112 %pointer to its immediate parent (or to its two parents, if it's a | |
| 113 %merge revision). As I mentioned above, there are also relationships | |
| 114 %between revisions \emph{across} these structures, and they are | |
| 115 %hierarchical in nature. | |
| 116 | |
| 117 $B%A%'%s%8%m%0!$%^%K%U%'%9%H!$%U%!%$%k%m%0Fb$G!$3F!9$N%j%S%8%g%s$OD>@\$N?F(B | |
| 118 $B!J$"$k$$$O%^!<%8$N>l9g$O$=$NN>?F!K$X$N%]%$%s%?$r;}$D!%(B | |
| 119 $B$9$G$K=R$Y$?$h$&$K!$$3$N9=B$$K8=$l$k%j%S%8%g%s$N4V$K$O4X78$,$"$j!$K\<A(B | |
| 120 $BE*$K3,AXE*$G$"$k!%(B | |
| 121 | |
| 122 %For every changeset in a repository, there is exactly one revision | |
| 123 %stored in the changelog. Each revision of the changelog contains a | |
| 124 %pointer to a single revision of the manifest. A revision of the | |
| 125 %manifest stores a pointer to a single revision of each filelog tracked | |
| 126 %when that changeset was created. These relationships are illustrated | |
| 127 %in figure~\ref{fig:concepts:metadata}. | |
| 128 | |
| 129 $B%j%]%8%H%j$K$"$k$9$Y$F$N%A%'%s%8%;%C%H$O!$%A%'%s%8%m%0Fb$G@53N$K(B1$B$D$N%j%S(B | |
| 130 $B%8%g%s$r;}$D!%%A%'%s%8%m%0$N3F!9$N%j%S%8%g%s$O!$%^%K%U%'%9%H$NC10l$N%P!<(B | |
| 131 $B%8%g%s$X$N%]%$%s%?$r;}$D!%%^%K%U%'%9%H$N3F!9$N%j%S%8%g%s$O!$%A%'%s%8%;%C(B | |
| 132 $B%H$,@8@.$5$l$?;~$N%U%!%$%k%m%0$N3F!9$NC10l%j%S%8%g%s$X$N%]%$%s%?$r;}$D!%(B | |
| 133 $B$3$N4X78$r?^(B~\ref{fig:concepts:metadata}$B$K<($9!%(B | |
| 77 | 134 |
| 78 \begin{figure}[ht] | 135 \begin{figure}[ht] |
| 79 \centering | 136 \centering |
| 80 \includegraphics{metadata} | 137 \includegraphics{metadata} |
| 81 \caption{Metadata relationships} | 138 % \caption{Metadata relationships} |
| 139 \caption{$B%a%?%G!<%?$N4X78(B} | |
| 82 \label{fig:concepts:metadata} | 140 \label{fig:concepts:metadata} |
| 83 \end{figure} | 141 \end{figure} |
| 84 | 142 |
| 85 As the illustration shows, there is \emph{not} a ``one to one'' | 143 %As the illustration shows, there is \emph{not} a ``one to one'' |
| 86 relationship between revisions in the changelog, manifest, or filelog. | 144 %relationship between revisions in the changelog, manifest, or filelog. |
| 87 If the manifest hasn't changed between two changesets, the changelog | 145 %If the manifest hasn't changed between two changesets, the changelog |
| 88 entries for those changesets will point to the same revision of the | 146 %entries for those changesets will point to the same revision of the |
| 89 manifest. If a file that Mercurial tracks hasn't changed between two | 147 %manifest. If a file that Mercurial tracks hasn't changed between two |
| 90 changesets, the entry for that file in the two revisions of the | 148 %changesets, the entry for that file in the two revisions of the |
| 91 manifest will point to the same revision of its filelog. | 149 %manifest will point to the same revision of its filelog. |
| 92 | 150 |
| 93 \section{Safe, efficient storage} | 151 $B%$%i%9%H$G<($5$l$F$$$k$h$&$K!$%j%S%8%g%s$H%A%'%s%8%m%0!$%^%K%U%'%9%H$^$?(B |
| 94 | 152 $B$O%U%!%$%k%m%0$N4V$K$O(B1$BBP#1$N4X78$O$J$$!%%^%K%U%'%9%H$,(B2$B$D$N%A%'%s%8%;%C(B |
| 95 The underpinnings of changelogs, manifests, and filelogs are provided | 153 $B%H$N4V$GJQ2=$7$J$+$C$?>l9g$O!$%A%'%s%8%m%0Fb$G$3$l$i$N%A%'%s%8%;%C%H$rI=(B |
| 96 by a single structure called the \emph{revlog}. | 154 $B$9%(%s%H%j$OF1$8%P!<%8%g%s$N%^%K%U%'%9%H$r<($9!%(BMercurial$B$,DI@W$9$k%U%!%$(B |
| 97 | 155 $B%k$,!$(B2$B$D$N%A%'%s%8%;%C%H4V$GJQ2=$7$J$+$C$?>l9g$O!$(B 2$B$D$N%^%K%U%'%9%H$N%j(B |
| 98 \subsection{Efficient storage} | 156 $B%S%8%g%s$G!$$=$N%U%!%$%k$r<($9%(%s%H%j$O%U%!%$%k%m%0$NF1$8%j%S%8%g%s$r<((B |
| 99 | 157 $B$9!%(B |
| 100 The revlog provides efficient storage of revisions using a | 158 |
| 101 \emph{delta} mechanism. Instead of storing a complete copy of a file | 159 %\section{Safe, efficient storage} |
| 102 for each revision, it stores the changes needed to transform an older | 160 \section{$B0BA4$+$D8zN(E*$J%9%H%l!<%8(B} |
| 103 revision into the new revision. For many kinds of file data, these | 161 |
| 104 deltas are typically a fraction of a percent of the size of a full | 162 %The underpinnings of changelogs, manifests, and filelogs are provided |
| 105 copy of a file. | 163 %by a single structure called the \emph{revlog}. |
| 106 | 164 |
| 107 Some obsolete revision control systems can only work with deltas of | 165 $B%A%'%s%8%m%0!$%^%K%U%'%9%H!$%U%!%$%k%m%0$NEZBf$K;H$o$l$F$$$k$O6&DL$N(B |
| 108 text files. They must either store binary files as complete snapshots | 166 \emph{revlog}$B$H$$$&9=B$BN$G$"$k!%(B |
| 109 or encoded into a text representation, both of which are wasteful | 167 |
| 110 approaches. Mercurial can efficiently handle deltas of files with | 168 %\subsection{Efficient storage} |
| 111 arbitrary binary contents; it doesn't need to treat text as special. | 169 \subsection{$B8zN(E*$J%9%H%l!<%8(B} |
| 112 | 170 |
| 113 \subsection{Safe operation} | 171 %The revlog provides efficient storage of revisions using a |
| 172 %\emph{delta} mechanism. Instead of storing a complete copy of a file | |
| 173 %for each revision, it stores the changes needed to transform an older | |
| 174 %revision into the new revision. For many kinds of file data, these | |
| 175 %deltas are typically a fraction of a percent of the size of a full | |
| 176 %copy of a file. | |
| 177 | |
| 178 revlog$B$O(B\emph{delta}$B5!9=$r;H$C$F%j%S%8%g%s$N8zN(E*$J5-21$rDs6!$9$k!%%U%!(B | |
| 179 $B%$%k$N3F!9$N%P!<%8%g%s$N40A4$J%3%T!<$rJ]B8$9$k$N$G$O$J$/!$8E$$%j%S%8%g%s(B | |
| 180 $B$r?7$7$$%P!<%8%g%s$XJQ49$9$k$N$KI,MW$JJQ99$rJ]B8$9$k!%B?$/$N%U%!%$%k%G!<(B | |
| 181 $B%?$KBP$7$F!$(B delta$B$OE57?E*$K$O%U%!%$%k$N%U%k%3%T!<$N(B1$B%Q!<%;%s%HL$K~$G$"$"(B | |
| 182 $B$k!%(B | |
| 183 | |
| 184 %Some obsolete revision control systems can only work with deltas of | |
| 185 %text files. They must either store binary files as complete snapshots | |
| 186 %or encoded into a text representation, both of which are wasteful | |
| 187 %approaches. Mercurial can efficiently handle deltas of files with | |
| 188 %arbitrary binary contents; it doesn't need to treat text as special. | |
| 189 | |
| 190 $B8E$$%j%S%8%g%s%3%s%H%m!<%k%7%9%F%`$N$$$/$D$+$O%F%-%9%H%U%!%$%k$N(Bdelta$B$KBP(B | |
| 191 $B$7$F$7$+5!G=$7$J$$!%$=$l$i$N%7%9%F%`$G$O%P%$%J%j%U%!%$%k$O40A4$J%9%J%C%W(B | |
| 192 $B%7%g%C%H$+!$%F%-%9%HI=8=$K%(%s%3!<%I$5$l$?7A<0$G$"$kI,MW$,$"$k!%$3$l$i$O(B | |
| 193 $B6&$KL5BL$NB?$$%"%W%m!<%A$G$"$k!%(B Mercurial$B$OG$0U$N%P%$%J%j%U%!%$%k$K$D$$(B | |
| 194 $B$F!$(Bdelta$B$r8zN(E*$K07$&$3$H$,$G$-!$%F%-%9%H$rFCJL07$$$9$kI,MW$,$J$$!%(B | |
| 195 | |
| 196 %\subsection{Safe operation} | |
| 197 \subsection{$B0BA4$JF0:n(B} | |
| 114 \label{sec:concepts:txn} | 198 \label{sec:concepts:txn} |
| 115 | 199 |
| 116 Mercurial only ever \emph{appends} data to the end of a revlog file. | 200 %Mercurial only ever \emph{appends} data to the end of a revlog file. |
| 117 It never modifies a section of a file after it has written it. This | 201 %It never modifies a section of a file after it has written it. This |
| 118 is both more robust and efficient than schemes that need to modify or | 202 %is both more robust and efficient than schemes that need to modify or |
| 119 rewrite data. | 203 %rewrite data. |
| 120 | 204 |
| 121 In addition, Mercurial treats every write as part of a | 205 Mercurial$B$O(Brevlog$B%U%!%$%k$NKvHx$K%G!<%?$N(B\emph{$BDI2C(B}$B$N$_$r9T$&!%0lEY=q$-(B |
| 122 \emph{transaction} that can span a number of files. A transaction is | 206 $B9~$^$l$?ItJ,$O8e$K$J$C$FJQ99$5$l$k$3$H$O$J$$!%$3$l$O%G!<%?$NJQ99$d:F=q$-(B |
| 123 \emph{atomic}: either the entire transaction succeeds and its effects | 207 $B9~$_$r9T$&J}K!$h$j$b4h6/$+$D8zN(E*$G$"$k!%(B |
| 124 are all visible to readers in one go, or the whole thing is undone. | 208 |
| 125 This guarantee of atomicity means that if you're running two copies of | 209 %In addition, Mercurial treats every write as part of a |
| 126 Mercurial, where one is reading data and one is writing it, the reader | 210 %\emph{transaction} that can span a number of files. A transaction is |
| 127 will never see a partially written result that might confuse it. | 211 %\emph{atomic}: either the entire transaction succeeds and its effects |
| 128 | 212 %are all visible to readers in one go, or the whole thing is undone. |
| 129 The fact that Mercurial only appends to files makes it easier to | 213 %This guarantee of atomicity means that if you're running two copies of |
| 130 provide this transactional guarantee. The easier it is to do stuff | 214 %Mercurial, where one is reading data and one is writing it, the reader |
| 131 like this, the more confident you should be that it's done correctly. | 215 %will never see a partially written result that might confuse it. |
| 132 | 216 |
| 133 \subsection{Fast retrieval} | 217 $B2C$($F!$(BMercurial$B$O$"$i$f$k=q$-9~$_$rJ#?t$N%U%!%$%k$X$N(B\emph{$B%H%i%s%6%/%7%g(B |
| 134 | 218 $B%s(B}$B$N0lIt$H8+$J$9!%%H%i%s%6%/%7%g%sA4BN$,@.8y$7!$FI$_=P$7B&$K0lEY$K7k2L$,(B |
| 135 Mercurial cleverly avoids a pitfall common to all earlier | 219 $B8+$($k>l9g$b!$$=$&$G$J$$>l9g$b%H%i%s%6%/%7%g%s$O(B\emph{$B%"%H%_%C%/(B}$B$G$"$k!%(B |
| 136 revision control systems: the problem of \emph{inefficient retrieval}. | 220 $B$3$N%"%H%_%C%/@-$NJ]>Z$O!$(B 2$B$D$N(BMercurial$B$r!$JRJ}$O%G!<%?$NFI$_=P$7!$$b$&(B |
| 137 Most revision control systems store the contents of a revision as an | 221 $B0lJ}$O=q$-9~$_$G<B9T$7$F$$$k>l9g!$FI$_=P$7B&$K$O:.Mp$N860x$H$J$kItJ,E*$J(B |
| 138 incremental series of modifications against a ``snapshot''. To | 222 $B=q$-9~$_7k2L$O8+$($J$$$3$H$r0UL#$9$k!%(B |
| 139 reconstruct a specific revision, you must first read the snapshot, and | 223 |
| 140 then every one of the revisions between the snapshot and your target | 224 %The fact that Mercurial only appends to files makes it easier to |
| 141 revision. The more history that a file accumulates, the more | 225 %provide this transactional guarantee. The easier it is to do stuff |
| 142 revisions you must read, hence the longer it takes to reconstruct a | 226 %like this, the more confident you should be that it's done correctly. |
| 143 particular revision. | 227 |
| 228 Mercurial$B$O%U%!%$%k$KDI5-$N$_$r$9$k$3$H$G!$%H%i%s%6%/%7%g%s$NJ]>Z$rMF0W$K(B | |
| 229 $B$7$F$$$k!%J*;v$rC1=c2=$9$k$3$H$K$h$C$F!$=hM}$N@5$7$5$r3N<B$K$9$k$h$&$K$7(B | |
| 230 $B$F$$$k!%(B | |
| 231 | |
| 232 %\subsection{Fast retrieval} | |
| 233 \subsection{$B9bB.$J<hF@(B} | |
| 234 | |
| 235 %Mercurial cleverly avoids a pitfall common to all earlier | |
| 236 %revision control systems: the problem of \emph{inefficient retrieval}. | |
| 237 %Most revision control systems store the contents of a revision as an | |
| 238 %incremental series of modifications against a ``snapshot''. To | |
| 239 %reconstruct a specific revision, you must first read the snapshot, and | |
| 240 %then every one of the revisions between the snapshot and your target | |
| 241 %revision. The more history that a file accumulates, the more | |
| 242 %revisions you must read, hence the longer it takes to reconstruct a | |
| 243 %particular revision. | |
| 244 | |
| 245 Mercurial$B$O$3$l$^$G$N%j%S%8%g%s%3%s%H%m!<%k%7%9%F%`$K6&DL$7$?Mn$H$77j$r(B | |
| 246 $B8-L@$KHr$1$F$$$k!%LdBj$@$C$?$N$O(B\emph{$BHs8zN(E*$J<hF@(B}$B$G$"$C$?!%(B | |
| 247 $BBgDq$N%j%S%8%g%s%3%s%H%m!<%k%7%9%F%`$O%j%S%8%g%s$NFbMF$r%9%J%C%W%7%g%C%H(B | |
| 248 $B$X$N0lO"$NJQ99$NA}J,$H$7$FJ]B8$7$F$$$k!%FCDj$N%j%S%8%g%s$r:F8=$9$k$?$a$K(B | |
| 249 $B$O$^$:%9%J%C%W%7%g%C%H$rFI$_9~$_!$$7$+$k8e$KL\E*$N%j%S%8%g%s$rFI$`I,MW$,(B | |
| 250 $B$"$C$?!%%U%!%$%k$X$NMzNr$,A}$($l$PA}$($k$[$IFI$_9~$^$J$1$l$P$J$i$J$$%j%S(B | |
| 251 $B%8%g%s$,B?$/$J$j!$FCDj$N%j%S%8%g%s$r:F8=$9$k$N$K;~4V$,$+$+$k$h$&$K$J$k!%(B | |
| 144 | 252 |
| 145 \begin{figure}[ht] | 253 \begin{figure}[ht] |
| 146 \centering | 254 \centering |
| 147 \includegraphics{snapshot} | 255 \includegraphics{snapshot} |
| 148 \caption{Snapshot of a revlog, with incremental deltas} | 256 % \caption{Snapshot of a revlog, with incremental deltas} |
| 257 \caption{$B:9J,$rMQ$$$?(Brevlog$B$N%9%J%C%W%7%g%C%H(B} | |
| 149 \label{fig:concepts:snapshot} | 258 \label{fig:concepts:snapshot} |
| 150 \end{figure} | 259 \end{figure} |
| 151 | 260 |
| 152 The innovation that Mercurial applies to this problem is simple but | 261 %The innovation that Mercurial applies to this problem is simple but |
| 153 effective. Once the cumulative amount of delta information stored | 262 %effective. Once the cumulative amount of delta information stored |
| 154 since the last snapshot exceeds a fixed threshold, it stores a new | 263 %since the last snapshot exceeds a fixed threshold, it stores a new |
| 155 snapshot (compressed, of course), instead of another delta. This | 264 %snapshot (compressed, of course), instead of another delta. This |
| 156 makes it possible to reconstruct \emph{any} revision of a file | 265 %makes it possible to reconstruct \emph{any} revision of a file |
| 157 quickly. This approach works so well that it has since been copied by | 266 %quickly. This approach works so well that it has since been copied by |
| 158 several other revision control systems. | 267 %several other revision control systems. |
| 159 | 268 |
| 160 Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry | 269 Mercurial$B$G$O$3$NLdBj$rC1=c$@$,8z2LE*$JJ}K!$G2r7h$7$F$$$k!%A02s$K%9%J%C%W(B |
| 161 in a revlog's index file, Mercurial stores the range of entries from | 270 $B%7%g%C%H$r:n@.$7$?;~E@$+$i$NN_@QE*$JA}J,$,0lDj$NogCM$r1[$($k$H!$?7$?$JA}(B |
| 162 the data file that it must read to reconstruct a particular revision. | 271 $BJ,$G$O$J$/!$?7$?$J%9%J%C%W%7%g%C%H$,J]B8$5$l$k!J$3$N%9%J%C%W%7%g%C%H$OL^(B |
| 163 | 272 $BO@05=L$5$l$F$$$k!K!%$3$l$K$h$C$F(B\emph{$B$$$+$J$k(B}$B%j%S%8%g%s$b?WB.$K:F8=$5$l(B |
| 164 \subsubsection{Aside: the influence of video compression} | 273 $B$k!%$3$N%"%W%m!<%A$O0J8eB>$N%j%S%8%g%s%3%s%H%m!<%k%7%9%F%`$K%3%T!<$5$l$k(B |
| 165 | 274 $B$[$I$&$^$/5!G=$7$F$$$k!%(B |
| 166 If you're familiar with video compression or have ever watched a TV | 275 |
| 167 feed through a digital cable or satellite service, you may know that | 276 %Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry |
| 168 most video compression schemes store each frame of video as a delta | 277 %in a revlog's index file, Mercurial stores the range of entries from |
| 169 against its predecessor frame. In addition, these schemes use | 278 %the data file that it must read to reconstruct a particular revision. |
| 170 ``lossy'' compression techniques to increase the compression ratio, so | 279 |
| 171 visual errors accumulate over the course of a number of inter-frame | 280 $B?^(B~\ref{fig:concepts:snapshot}$B$K35G0$r<($9!%(BMercurial$B$O(Brevlog$B$N%$%s%G%C%/(B |
| 172 deltas. | 281 $B%9%U%!%$%k$N%(%s%H%j$KFCDj$N%j%S%8%g%s$r:F8=$9$k$N$KI,MW$J%G!<%?%U%!%$%k(B |
| 173 | 282 $BFb$N$"$kHO0O$N%(%s%H%j$rJ]B8$9$k!%(B |
| 174 Because it's possible for a video stream to ``drop out'' occasionally | 283 |
| 175 due to signal glitches, and to limit the accumulation of artefacts | 284 |
| 176 introduced by the lossy compression process, video encoders | 285 |
| 177 periodically insert a complete frame (called a ``key frame'') into the | 286 |
| 178 video stream; the next delta is generated against that frame. This | 287 |
| 179 means that if the video signal gets interrupted, it will resume once | 288 |
| 180 the next key frame is received. Also, the accumulation of encoding | 289 |
| 181 errors restarts anew with each key frame. | 290 |
| 182 | 291 |
| 183 \subsection{Identification and strong integrity} | 292 |
| 184 | 293 |
| 185 Along with delta or snapshot information, a revlog entry contains a | 294 %\subsubsection{Aside: the influence of video compression} |
| 186 cryptographic hash of the data that it represents. This makes it | 295 \subsubsection{$B$=$NB>(B: $B%S%G%*05=L$N1F6A(B} |
| 187 difficult to forge the contents of a revision, and easy to detect | 296 |
| 188 accidental corruption. | 297 %If you're familiar with video compression or have ever watched a TV |
| 189 | 298 %feed through a digital cable or satellite service, you may know that |
| 190 Hashes provide more than a mere check against corruption; they are | 299 %most video compression schemes store each frame of video as a delta |
| 191 used as the identifiers for revisions. The changeset identification | 300 %against its predecessor frame. In addition, these schemes use |
| 192 hashes that you see as an end user are from revisions of the | 301 %``lossy'' compression techniques to increase the compression ratio, so |
| 193 changelog. Although filelogs and the manifest also use hashes, | 302 %visual errors accumulate over the course of a number of inter-frame |
| 194 Mercurial only uses these behind the scenes. | 303 %deltas. |
| 195 | 304 |
| 196 Mercurial verifies that hashes are correct when it retrieves file | 305 %Because it's possible for a video stream to ``drop out'' occasionally |
| 197 revisions and when it pulls changes from another repository. If it | 306 %due to signal glitches, and to limit the accumulation of artefacts |
| 198 encounters an integrity problem, it will complain and stop whatever | 307 %introduced by the lossy compression process, video encoders |
| 199 it's doing. | 308 %periodically insert a complete frame (called a ``key frame'') into the |
| 200 | 309 %video stream; the next delta is generated against that frame. This |
| 201 In addition to the effect it has on retrieval efficiency, Mercurial's | 310 %means that if the video signal gets interrupted, it will resume once |
| 202 use of periodic snapshots makes it more robust against partial data | 311 %the next key frame is received. Also, the accumulation of encoding |
| 203 corruption. If a revlog becomes partly corrupted due to a hardware | 312 %errors restarts anew with each key frame. |
| 204 error or system bug, it's often possible to reconstruct some or most | 313 |
| 205 revisions from the uncorrupted sections of the revlog, both before and | 314 %\subsection{Identification and strong integrity} |
| 206 after the corrupted section. This would not be possible with a | 315 \subsection{$B<1JL$H6/$$0l4S@-(B} |
| 207 delta-only storage model. | 316 |
| 208 | 317 %Along with delta or snapshot information, a revlog entry contains a |
| 209 \section{Revision history, branching, | 318 %cryptographic hash of the data that it represents. This makes it |
| 210 and merging} | 319 %difficult to forge the contents of a revision, and easy to detect |
| 211 | 320 %accidental corruption. |
| 212 Every entry in a Mercurial revlog knows the identity of its immediate | 321 |
| 213 ancestor revision, usually referred to as its \emph{parent}. In fact, | 322 %Hashes provide more than a mere check against corruption; they are |
| 214 a revision contains room for not one parent, but two. Mercurial uses | 323 %used as the identifiers for revisions. The changeset identification |
| 215 a special hash, called the ``null ID'', to represent the idea ``there | 324 %hashes that you see as an end user are from revisions of the |
| 216 is no parent here''. This hash is simply a string of zeroes. | 325 %changelog. Although filelogs and the manifest also use hashes, |
| 217 | 326 %Mercurial only uses these behind the scenes. |
| 218 In figure~\ref{fig:concepts:revlog}, you can see an example of the | 327 |
| 219 conceptual structure of a revlog. Filelogs, manifests, and changelogs | 328 %Mercurial verifies that hashes are correct when it retrieves file |
| 220 all have this same structure; they differ only in the kind of data | 329 %revisions and when it pulls changes from another repository. If it |
| 221 stored in each delta or snapshot. | 330 %encounters an integrity problem, it will complain and stop whatever |
| 222 | 331 %it's doing. |
| 223 The first revision in a revlog (at the bottom of the image) has the | 332 |
| 224 null ID in both of its parent slots. For a ``normal'' revision, its | 333 %In addition to the effect it has on retrieval efficiency, Mercurial's |
| 225 first parent slot contains the ID of its parent revision, and its | 334 %use of periodic snapshots makes it more robust against partial data |
| 226 second contains the null ID, indicating that the revision has only one | 335 %corruption. If a revlog becomes partly corrupted due to a hardware |
| 227 real parent. Any two revisions that have the same parent ID are | 336 %error or system bug, it's often possible to reconstruct some or most |
| 228 branches. A revision that represents a merge between branches has two | 337 %revisions from the uncorrupted sections of the revlog, both before and |
| 229 normal revision IDs in its parent slots. | 338 %after the corrupted section. This would not be possible with a |
| 339 %delta-only storage model. | |
| 340 | |
| 341 %\section{Revision history, branching, and merging} | |
| 342 \section{$B%j%S%8%g%sMzNr!$%V%i%s%A!$%^!<%8(B} | |
| 343 | |
| 344 %Every entry in a Mercurial revlog knows the identity of its immediate | |
| 345 %ancestor revision, usually referred to as its \emph{parent}. In fact, | |
| 346 %a revision contains room for not one parent, but two. Mercurial uses | |
| 347 %a special hash, called the ``null ID'', to represent the idea ``there | |
| 348 %is no parent here''. This hash is simply a string of zeroes. | |
| 349 | |
| 350 %In figure~\ref{fig:concepts:revlog}, you can see an example of the | |
| 351 %conceptual structure of a revlog. Filelogs, manifests, and changelogs | |
| 352 %all have this same structure; they differ only in the kind of data | |
| 353 %stored in each delta or snapshot. | |
| 354 | |
| 355 %The first revision in a revlog (at the bottom of the image) has the | |
| 356 %null ID in both of its parent slots. For a ``normal'' revision, its | |
| 357 %first parent slot contains the ID of its parent revision, and its | |
| 358 %second contains the null ID, indicating that the revision has only one | |
| 359 %real parent. Any two revisions that have the same parent ID are | |
| 360 %branches. A revision that represents a merge between branches has two | |
| 361 %normal revision IDs in its parent slots. | |
| 230 | 362 |
| 231 \begin{figure}[ht] | 363 \begin{figure}[ht] |
| 232 \centering | 364 \centering |
| 233 \includegraphics{revlog} | 365 \includegraphics{revlog} |
| 234 \caption{} | 366 \caption{} |
| 235 \label{fig:concepts:revlog} | 367 \label{fig:concepts:revlog} |
| 236 \end{figure} | 368 \end{figure} |
| 237 | 369 |
| 238 \section{The working directory} | 370 %\section{The working directory} |
| 239 | 371 \section{$B%o!<%-%s%0%G%#%l%/%H%j(B} |
| 240 In the working directory, Mercurial stores a snapshot of the files | 372 |
| 241 from the repository as of a particular changeset. | 373 %In the working directory, Mercurial stores a snapshot of the files |
| 242 | 374 %from the repository as of a particular changeset. |
| 243 The working directory ``knows'' which changeset it contains. When you | 375 |
| 244 update the working directory to contain a particular changeset, | 376 %The working directory ``knows'' which changeset it contains. When you |
| 245 Mercurial looks up the appropriate revision of the manifest to find | 377 %update the working directory to contain a particular changeset, |
| 246 out which files it was tracking at the time that changeset was | 378 %Mercurial looks up the appropriate revision of the manifest to find |
| 247 committed, and which revision of each file was then current. It then | 379 %out which files it was tracking at the time that changeset was |
| 248 recreates a copy of each of those files, with the same contents it had | 380 %committed, and which revision of each file was then current. It then |
| 249 when the changeset was committed. | 381 %recreates a copy of each of those files, with the same contents it had |
| 250 | 382 %when the changeset was committed. |
| 251 The \emph{dirstate} contains Mercurial's knowledge of the working | 383 |
| 252 directory. This details which changeset the working directory is | 384 %The \emph{dirstate} contains Mercurial's knowledge of the working |
| 253 updated to, and all of the files that Mercurial is tracking in the | 385 %directory. This details which changeset the working directory is |
| 254 working directory. | 386 %updated to, and all of the files that Mercurial is tracking in the |
| 255 | 387 %working directory. |
| 256 Just as a revision of a revlog has room for two parents, so that it | 388 |
| 257 can represent either a normal revision (with one parent) or a merge of | 389 %Just as a revision of a revlog has room for two parents, so that it |
| 258 two earlier revisions, the dirstate has slots for two parents. When | 390 %can represent either a normal revision (with one parent) or a merge of |
| 259 you use the \hgcmd{update} command, the changeset that you update to | 391 %two earlier revisions, the dirstate has slots for two parents. When |
| 260 is stored in the ``first parent'' slot, and the null ID in the second. | 392 %you use the \hgcmd{update} command, the changeset that you update to |
| 261 When you \hgcmd{merge} with another changeset, the first parent | 393 %is stored in the ``first parent'' slot, and the null ID in the second. |
| 262 remains unchanged, and the second parent is filled in with the | 394 %When you \hgcmd{merge} with another changeset, the first parent |
| 263 changeset you're merging with. The \hgcmd{parents} command tells you | 395 %remains unchanged, and the second parent is filled in with the |
| 264 what the parents of the dirstate are. | 396 %changeset you're merging with. The \hgcmd{parents} command tells you |
| 265 | 397 %what the parents of the dirstate are. |
| 266 \subsection{What happens when you commit} | 398 |
| 267 | 399 %\subsection{What happens when you commit} |
| 268 The dirstate stores parent information for more than just book-keeping | 400 \subsection{$B%3%_%C%H;~$K2?$,5/$-$k$N$+(B} |
| 269 purposes. Mercurial uses the parents of the dirstate as \emph{the | 401 |
| 270 parents of a new changeset} when you perform a commit. | 402 %The dirstate stores parent information for more than just book-keeping |
| 403 %purposes. Mercurial uses the parents of the dirstate as \emph{the | |
| 404 % parents of a new changeset} when you perform a commit. | |
| 271 | 405 |
| 272 \begin{figure}[ht] | 406 \begin{figure}[ht] |
| 273 \centering | 407 \centering |
| 274 \includegraphics{wdir} | 408 \includegraphics{wdir} |
| 275 \caption{The working directory can have two parents} | 409 % \caption{The working directory can have two parents} |
| 410 \caption{$B%o!<%-%s%0%G%#%l%/%H%j$O(B2$B$D$N?F$r;}$AF@$k(B} | |
| 276 \label{fig:concepts:wdir} | 411 \label{fig:concepts:wdir} |
| 277 \end{figure} | 412 \end{figure} |
| 278 | 413 |
| 279 Figure~\ref{fig:concepts:wdir} shows the normal state of the working | 414 %Figure~\ref{fig:concepts:wdir} shows the normal state of the working |
| 280 directory, where it has a single changeset as parent. That changeset | 415 %directory, where it has a single changeset as parent. That changeset |
| 281 is the \emph{tip}, the newest changeset in the repository that has no | 416 %is the \emph{tip}, the newest changeset in the repository that has no |
| 282 children. | 417 %children. |
| 283 | 418 |
| 284 \begin{figure}[ht] | 419 \begin{figure}[ht] |
| 285 \centering | 420 \centering |
| 286 \includegraphics{wdir-after-commit} | 421 \includegraphics{wdir-after-commit} |
| 287 \caption{The working directory gains new parents after a commit} | 422 % \caption{The working directory gains new parents after a commit} |
| 423 \caption{$B%3%_%C%H8e!$%o!<%-%s%0%G%#%l%/%H%j$O?7$?$JN>?F$r;}$D(B} | |
| 288 \label{fig:concepts:wdir-after-commit} | 424 \label{fig:concepts:wdir-after-commit} |
| 289 \end{figure} | 425 \end{figure} |
| 290 | 426 |
| 291 It's useful to think of the working directory as ``the changeset I'm | 427 %It's useful to think of the working directory as ``the changeset I'm |
| 292 about to commit''. Any files that you tell Mercurial that you've | 428 %about to commit''. Any files that you tell Mercurial that you've |
| 293 added, removed, renamed, or copied will be reflected in that | 429 %added, removed, renamed, or copied will be reflected in that |
| 294 changeset, as will modifications to any files that Mercurial is | 430 %changeset, as will modifications to any files that Mercurial is |
| 295 already tracking; the new changeset will have the parents of the | 431 %already tracking; the new changeset will have the parents of the |
| 296 working directory as its parents. | 432 %working directory as its parents. |
| 297 | 433 |
| 298 After a commit, Mercurial will update the parents of the working | 434 %After a commit, Mercurial will update the parents of the working |
| 299 directory, so that the first parent is the ID of the new changeset, | 435 %directory, so that the first parent is the ID of the new changeset, |
| 300 and the second is the null ID. This is shown in | 436 %and the second is the null ID. This is shown in |
| 301 figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch | 437 %figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch |
| 302 any of the files in the working directory when you commit; it just | 438 %any of the files in the working directory when you commit; it just |
| 303 modifies the dirstate to note its new parents. | 439 %modifies the dirstate to note its new parents. |
| 304 | 440 |
| 305 \subsection{Creating a new head} | 441 %\subsection{Creating a new head} |
| 306 | 442 \subsection{$B?7$?$J%X%C%I$r:n$k(B} |
| 307 It's perfectly normal to update the working directory to a changeset | 443 |
| 308 other than the current tip. For example, you might want to know what | 444 %It's perfectly normal to update the working directory to a changeset |
| 309 your project looked like last Tuesday, or you could be looking through | 445 %other than the current tip. For example, you might want to know what |
| 310 changesets to see which one introduced a bug. In cases like this, the | 446 %your project looked like last Tuesday, or you could be looking through |
| 311 natural thing to do is update the working directory to the changeset | 447 %changesets to see which one introduced a bug. In cases like this, the |
| 312 you're interested in, and then examine the files in the working | 448 %natural thing to do is update the working directory to the changeset |
| 313 directory directly to see their contents as they werea when you | 449 %you're interested in, and then examine the files in the working |
| 314 committed that changeset. The effect of this is shown in | 450 %directory directly to see their contents as they werea when you |
| 315 figure~\ref{fig:concepts:wdir-pre-branch}. | 451 %committed that changeset. The effect of this is shown in |
| 452 %figure~\ref{fig:concepts:wdir-pre-branch}. | |
| 316 | 453 |
| 317 \begin{figure}[ht] | 454 \begin{figure}[ht] |
| 318 \centering | 455 \centering |
| 319 \includegraphics{wdir-pre-branch} | 456 \includegraphics{wdir-pre-branch} |
| 320 \caption{The working directory, updated to an older changeset} | 457 % \caption{The working directory, updated to an older changeset} |
| 458 \caption{$B8E$$%A%'%s%8%;%C%H$X$H99?7$5$l$?%o!<%-%s%0%G%#%l%/%H%j(B} | |
| 321 \label{fig:concepts:wdir-pre-branch} | 459 \label{fig:concepts:wdir-pre-branch} |
| 322 \end{figure} | 460 \end{figure} |
| 323 | 461 |
| 324 Having updated the working directory to an older changeset, what | 462 %Having updated the working directory to an older changeset, what |
| 325 happens if you make some changes, and then commit? Mercurial behaves | 463 %happens if you make some changes, and then commit? Mercurial behaves |
| 326 in the same way as I outlined above. The parents of the working | 464 %in the same way as I outlined above. The parents of the working |
| 327 directory become the parents of the new changeset. This new changeset | 465 %directory become the parents of the new changeset. This new changeset |
| 328 has no children, so it becomes the new tip. And the repository now | 466 %has no children, so it becomes the new tip. And the repository now |
| 329 contains two changesets that have no children; we call these | 467 %contains two changesets that have no children; we call these |
| 330 \emph{heads}. You can see the structure that this creates in | 468 %\emph{heads}. You can see the structure that this creates in |
| 331 figure~\ref{fig:concepts:wdir-branch}. | 469 %figure~\ref{fig:concepts:wdir-branch}. |
| 332 | 470 |
| 333 \begin{figure}[ht] | 471 \begin{figure}[ht] |
| 334 \centering | 472 \centering |
| 335 \includegraphics{wdir-branch} | 473 \includegraphics{wdir-branch} |
| 336 \caption{After a commit made while synced to an older changeset} | 474 % \caption{After a commit made while synced to an older changeset} |
| 475 \caption{$B8E$$%A%'%s%8%;%C%H$KF14|Cf$K%3%_%C%H$,9T$o$l$?>l9g(B} | |
| 337 \label{fig:concepts:wdir-branch} | 476 \label{fig:concepts:wdir-branch} |
| 338 \end{figure} | 477 \end{figure} |
| 339 | 478 |
| 340 \begin{note} | 479 %\begin{note} |
| 341 If you're new to Mercurial, you should keep in mind a common | 480 % If you're new to Mercurial, you should keep in mind a common |
| 342 ``error'', which is to use the \hgcmd{pull} command without any | 481 % ``error'', which is to use the \hgcmd{pull} command without any |
| 343 options. By default, the \hgcmd{pull} command \emph{does not} | 482 % options. By default, the \hgcmd{pull} command \emph{does not} |
| 344 update the working directory, so you'll bring new changesets into | 483 % update the working directory, so you'll bring new changesets into |
| 345 your repository, but the working directory will stay synced at the | 484 % your repository, but the working directory will stay synced at the |
| 346 same changeset as before the pull. If you make some changes and | 485 % same changeset as before the pull. If you make some changes and |
| 347 commit afterwards, you'll thus create a new head, because your | 486 % commit afterwards, you'll thus create a new head, because your |
| 348 working directory isn't synced to whatever the current tip is. | 487 % working directory isn't synced to whatever the current tip is. |
| 349 | 488 % |
| 350 I put the word ``error'' in quotes because all that you need to do | 489 % I put the word ``error'' in quotes because all that you need to do |
| 351 to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In | 490 % to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In |
| 352 other words, this almost never has negative consequences; it just | 491 % other words, this almost never has negative consequences; it just |
| 353 surprises people. I'll discuss other ways to avoid this behaviour, | 492 % surprises people. I'll discuss other ways to avoid this behaviour, |
| 354 and why Mercurial behaves in this initially surprising way, later | 493 % and why Mercurial behaves in this initially surprising way, later |
| 355 on. | 494 % on. |
| 356 \end{note} | 495 %\end{note} |
| 357 | 496 |
| 358 \subsection{Merging heads} | 497 %\subsection{Merging heads} |
| 359 | 498 \subsection{$B%X%C%I4V$N%^!<%8(B} |
| 360 When you run the \hgcmd{merge} command, Mercurial leaves the first | 499 |
| 361 parent of the working directory unchanged, and sets the second parent | 500 %When you run the \hgcmd{merge} command, Mercurial leaves the first |
| 362 to the changeset you're merging with, as shown in | 501 %parent of the working directory unchanged, and sets the second parent |
| 363 figure~\ref{fig:concepts:wdir-merge}. | 502 %to the changeset you're merging with, as shown in |
| 503 %figure~\ref{fig:concepts:wdir-merge}. | |
| 364 | 504 |
| 365 \begin{figure}[ht] | 505 \begin{figure}[ht] |
| 366 \centering | 506 \centering |
| 367 \includegraphics{wdir-merge} | 507 \includegraphics{wdir-merge} |
| 368 \caption{Merging two heads} | 508 % \caption{Merging two heads} |
| 509 \caption{2$B$D$N%X%C%I$N%^!<%8(B} | |
| 369 \label{fig:concepts:wdir-merge} | 510 \label{fig:concepts:wdir-merge} |
| 370 \end{figure} | 511 \end{figure} |
| 371 | 512 |
| 372 Mercurial also has to modify the working directory, to merge the files | 513 Mercurial also has to modify the working directory, to merge the files |
| 373 managed in the two changesets. Simplified a little, the merging | 514 managed in the two changesets. Simplified a little, the merging |
| 407 Mercurial only tracks two parents for both revisions and the working | 548 Mercurial only tracks two parents for both revisions and the working |
| 408 directory. While it would be technically possible to merge multiple | 549 directory. While it would be technically possible to merge multiple |
| 409 changesets at once, the prospect of user confusion and making a | 550 changesets at once, the prospect of user confusion and making a |
| 410 terrible mess of a merge immediately becomes overwhelming. | 551 terrible mess of a merge immediately becomes overwhelming. |
| 411 | 552 |
| 412 \section{Other interesting design features} | 553 %\section{Other interesting design features} |
| 554 \section{$B@_7W$NB>$N6=L#?<$$E@(B} | |
| 413 | 555 |
| 414 In the sections above, I've tried to highlight some of the most | 556 In the sections above, I've tried to highlight some of the most |
| 415 important aspects of Mercurial's design, to illustrate that it pays | 557 important aspects of Mercurial's design, to illustrate that it pays |
| 416 careful attention to reliability and performance. However, the | 558 careful attention to reliability and performance. However, the |
| 417 attention to detail doesn't stop there. There are a number of other | 559 attention to detail doesn't stop there. There are a number of other |
| 419 interesting. I'll detail a few of them here, separate from the ``big | 561 interesting. I'll detail a few of them here, separate from the ``big |
| 420 ticket'' items above, so that if you're interested, you can gain a | 562 ticket'' items above, so that if you're interested, you can gain a |
| 421 better idea of the amount of thinking that goes into a well-designed | 563 better idea of the amount of thinking that goes into a well-designed |
| 422 system. | 564 system. |
| 423 | 565 |
| 424 \subsection{Clever compression} | 566 %\subsection{Clever compression} |
| 567 \subsection{$B8-$$05=L(B} | |
| 425 | 568 |
| 426 When appropriate, Mercurial will store both snapshots and deltas in | 569 When appropriate, Mercurial will store both snapshots and deltas in |
| 427 compressed form. It does this by always \emph{trying to} compress a | 570 compressed form. It does this by always \emph{trying to} compress a |
| 428 snapshot or delta, but only storing the compressed version if it's | 571 snapshot or delta, but only storing the compressed version if it's |
| 429 smaller than the uncompressed version. | 572 smaller than the uncompressed version. |
| 439 these cases. It finds that such a delta exceeds the threshold at | 582 these cases. It finds that such a delta exceeds the threshold at |
| 440 which it should store a complete snapshot of the file, so it stores | 583 which it should store a complete snapshot of the file, so it stores |
| 441 the snapshot, again saving space compared to a naive delta-only | 584 the snapshot, again saving space compared to a naive delta-only |
| 442 approach. | 585 approach. |
| 443 | 586 |
| 444 \subsubsection{Network recompression} | 587 %\subsubsection{Network recompression} |
| 445 | 588 \subsubsection{$B%M%C%H%o!<%/:F05=L(B} |
| 446 When storing revisions on disk, Mercurial uses the ``deflate'' | 589 |
| 447 compression algorithm (the same one used by the popular \texttt{zip} | 590 %When storing revisions on disk, Mercurial uses the ``deflate'' |
| 448 archive format), which balances good speed with a respectable | 591 %compression algorithm (the same one used by the popular \texttt{zip} |
| 449 compression ratio. However, when transmitting revision data over a | 592 %archive format), which balances good speed with a respectable |
| 450 network connection, Mercurial uncompresses the compressed revision | 593 %compression ratio. However, when transmitting revision data over a |
| 451 data. | 594 %network connection, Mercurial uncompresses the compressed revision |
| 452 | 595 %data. |
| 453 If the connection is over HTTP, Mercurial recompresses the entire | 596 |
| 454 stream of data using a compression algorithm that gives a better | 597 %If the connection is over HTTP, Mercurial recompresses the entire |
| 455 compression ratio (the Burrows-Wheeler algorithm from the widely used | 598 %stream of data using a compression algorithm that gives a better |
| 456 \texttt{bzip2} compression package). This combination of algorithm | 599 %compression ratio (the Burrows-Wheeler algorithm from the widely used |
| 457 and compression of the entire stream (instead of a revision at a time) | 600 %\texttt{bzip2} compression package). This combination of algorithm |
| 458 substantially reduces the number of bytes to be transferred, yielding | 601 %and compression of the entire stream (instead of a revision at a time) |
| 459 better network performance over almost all kinds of network. | 602 %substantially reduces the number of bytes to be transferred, yielding |
| 460 | 603 %better network performance over almost all kinds of network. |
| 461 (If the connection is over \command{ssh}, Mercurial \emph{doesn't} | 604 |
| 462 recompress the stream, because \command{ssh} can already do this | 605 %(If the connection is over \command{ssh}, Mercurial \emph{doesn't} |
| 463 itself.) | 606 %recompress the stream, because \command{ssh} can already do this |
| 464 | 607 %itself.) |
| 465 \subsection{Read/write ordering and atomicity} | 608 |
| 466 | 609 %\subsection{Read/write ordering and atomicity} |
| 467 Appending to files isn't the whole story when it comes to guaranteeing | 610 \subsection{$BFI$_=q$-$N=g=x$H%"%H%_%C%/@-(B} |
| 468 that a reader won't see a partial write. If you recall | 611 |
| 469 figure~\ref{fig:concepts:metadata}, revisions in the changelog point to | 612 %Appending to files isn't the whole story when it comes to guaranteeing |
| 470 revisions in the manifest, and revisions in the manifest point to | 613 %that a reader won't see a partial write. If you recall |
| 471 revisions in filelogs. This hierarchy is deliberate. | 614 %figure~\ref{fig:concepts:metadata}, revisions in the changelog point to |
| 472 | 615 %revisions in the manifest, and revisions in the manifest point to |
| 473 A writer starts a transaction by writing filelog and manifest data, | 616 %revisions in filelogs. This hierarchy is deliberate. |
| 474 and doesn't write any changelog data until those are finished. A | 617 |
| 475 reader starts by reading changelog data, then manifest data, followed | 618 %A writer starts a transaction by writing filelog and manifest data, |
| 476 by filelog data. | 619 %and doesn't write any changelog data until those are finished. A |
| 477 | 620 %reader starts by reading changelog data, then manifest data, followed |
| 478 Since the writer has always finished writing filelog and manifest data | 621 %by filelog data. |
| 479 before it writes to the changelog, a reader will never read a pointer | 622 |
| 480 to a partially written manifest revision from the changelog, and it will | 623 %Since the writer has always finished writing filelog and manifest data |
| 481 never read a pointer to a partially written filelog revision from the | 624 %before it writes to the changelog, a reader will never read a pointer |
| 482 manifest. | 625 %to a partially written manifest revision from the changelog, and it will |
| 483 | 626 %never read a pointer to a partially written filelog revision from the |
| 484 \subsection{Concurrent access} | 627 %manifest. |
| 485 | 628 |
| 486 The read/write ordering and atomicity guarantees mean that Mercurial | 629 %\subsection{Concurrent access} |
| 487 never needs to \emph{lock} a repository when it's reading data, even | 630 \subsection{$BF1;~%"%/%;%9(B} |
| 488 if the repository is being written to while the read is occurring. | 631 |
| 489 This has a big effect on scalability; you can have an arbitrary number | 632 %The read/write ordering and atomicity guarantees mean that Mercurial |
| 490 of Mercurial processes safely reading data from a repository safely | 633 %never needs to \emph{lock} a repository when it's reading data, even |
| 491 all at once, no matter whether it's being written to or not. | 634 %if the repository is being written to while the read is occurring. |
| 492 | 635 %This has a big effect on scalability; you can have an arbitrary number |
| 493 The lockless nature of reading means that if you're sharing a | 636 %of Mercurial processes safely reading data from a repository safely |
| 494 repository on a multi-user system, you don't need to grant other local | 637 %all at once, no matter whether it's being written to or not. |
| 495 users permission to \emph{write} to your repository in order for them | 638 |
| 496 to be able to clone it or pull changes from it; they only need | 639 %The lockless nature of reading means that if you're sharing a |
| 497 \emph{read} permission. (This is \emph{not} a common feature among | 640 %repository on a multi-user system, you don't need to grant other local |
| 498 revision control systems, so don't take it for granted! Most require | 641 %users permission to \emph{write} to your repository in order for them |
| 499 readers to be able to lock a repository to access it safely, and this | 642 %to be able to clone it or pull changes from it; they only need |
| 500 requires write permission on at least one directory, which of course | 643 %\emph{read} permission. (This is \emph{not} a common feature among |
| 501 makes for all kinds of nasty and annoying security and administrative | 644 %revision control systems, so don't take it for granted! Most require |
| 502 problems.) | 645 %readers to be able to lock a repository to access it safely, and this |
| 503 | 646 %requires write permission on at least one directory, which of course |
| 504 Mercurial uses locks to ensure that only one process can write to a | 647 %makes for all kinds of nasty and annoying security and administrative |
| 505 repository at a time (the locking mechanism is safe even over | 648 %problems.) |
| 506 filesystems that are notoriously hostile to locking, such as NFS). If | 649 |
| 507 a repository is locked, a writer will wait for a while to retry if the | 650 %Mercurial uses locks to ensure that only one process can write to a |
| 508 repository becomes unlocked, but if the repository remains locked for | 651 %repository at a time (the locking mechanism is safe even over |
| 509 too long, the process attempting to write will time out after a while. | 652 %filesystems that are notoriously hostile to locking, such as NFS). If |
| 510 This means that your daily automated scripts won't get stuck forever | 653 %a repository is locked, a writer will wait for a while to retry if the |
| 511 and pile up if a system crashes unnoticed, for example. (Yes, the | 654 %repository becomes unlocked, but if the repository remains locked for |
| 512 timeout is configurable, from zero to infinity.) | 655 %too long, the process attempting to write will time out after a while. |
| 513 | 656 %This means that your daily automated scripts won't get stuck forever |
| 514 \subsubsection{Safe dirstate access} | 657 %and pile up if a system crashes unnoticed, for example. (Yes, the |
| 515 | 658 %timeout is configurable, from zero to infinity.) |
| 516 As with revision data, Mercurial doesn't take a lock to read the | 659 |
| 517 dirstate file; it does acquire a lock to write it. To avoid the | 660 %\subsubsection{Safe dirstate access} |
| 518 possibility of reading a partially written copy of the dirstate file, | 661 \subsubsection{$B0BA4$J(Bdirstate$B%"%/%;%9(B} |
| 519 Mercurial writes to a file with a unique name in the same directory as | 662 |
| 520 the dirstate file, then renames the temporary file atomically to | 663 %As with revision data, Mercurial doesn't take a lock to read the |
| 521 \filename{dirstate}. The file named \filename{dirstate} is thus | 664 %dirstate file; it does acquire a lock to write it. To avoid the |
| 522 guaranteed to be complete, not partially written. | 665 %possibility of reading a partially written copy of the dirstate file, |
| 523 | 666 %Mercurial writes to a file with a unique name in the same directory as |
| 524 \subsection{Avoiding seeks} | 667 %the dirstate file, then renames the temporary file atomically to |
| 525 | 668 %\filename{dirstate}. The file named \filename{dirstate} is thus |
| 526 Critical to Mercurial's performance is the avoidance of seeks of the | 669 %guaranteed to be complete, not partially written. |
| 527 disk head, since any seek is far more expensive than even a | 670 |
| 528 comparatively large read operation. | 671 %\subsection{Avoiding seeks} |
| 529 | 672 \subsection{$B%7!<%/$N2sHr(B} |
| 530 This is why, for example, the dirstate is stored in a single file. If | 673 |
| 531 there were a dirstate file per directory that Mercurial tracked, the | 674 %Critical to Mercurial's performance is the avoidance of seeks of the |
| 532 disk would seek once per directory. Instead, Mercurial reads the | 675 %disk head, since any seek is far more expensive than even a |
| 533 entire single dirstate file in one step. | 676 %comparatively large read operation. |
| 534 | 677 |
| 535 Mercurial also uses a ``copy on write'' scheme when cloning a | 678 Mercurial$B$N@-G=$K$O!$%G%#%9%/%X%C%I$N%7!<%/$rHr$1$k$3$H$,IT2D7g$G$"$k!%(B |
| 536 repository on local storage. Instead of copying every revlog file | 679 $B%7!<%/$OBg5,LO$JFI$_=P$7A`:n$HHf3S$7$F$bHs>o$K9b$/$D$/!%(B |
| 537 from the old repository into the new repository, it makes a ``hard | 680 |
| 538 link'', which is a shorthand way to say ``these two names point to the | 681 %This is why, for example, the dirstate is stored in a single file. If |
| 539 same file''. When Mercurial is about to write to one of a revlog's | 682 %there were a dirstate file per directory that Mercurial tracked, the |
| 540 files, it checks to see if the number of names pointing at the file is | 683 %disk would seek once per directory. Instead, Mercurial reads the |
| 541 greater than one. If it is, more than one repository is using the | 684 %entire single dirstate file in one step. |
| 542 file, so Mercurial makes a new copy of the file that is private to | 685 |
| 543 this repository. | 686 $B$=$NM}M3$O!$Nc$($P!$(Bdirstate$B$O(B1$B$D$N%U%!%$%k$KJ]B8$5$l$F$$$k$?$a$G!$(B |
| 544 | 687 Mercurial$B$,DI@W$7$F$$$k(Bdirstate$B%U%!%$%k$,%G%#%l%/%H%jKh$K$"$k(B |
| 545 A few revision control developers have pointed out that this idea of | 688 $B$H!$(BMercurial$B$O%G%#%l%/%H%jKh$K%7!<%/$r9T$&$3$H$K$J$k!%<B:]$K$O(BMercurial |
| 546 making a complete private copy of a file is not very efficient in its | 689 $B$OA4BN$G0l$D$N(Bdirstate$B%U%!%$%k$r%o%s%9%F%C%W$GFI$`!%(B |
| 547 use of storage. While this is true, storage is cheap, and this method | 690 |
| 548 gives the highest performance while deferring most book-keeping to the | 691 %Mercurial also uses a ``copy on write'' scheme when cloning a |
| 549 operating system. An alternative scheme would most likely reduce | 692 %repository on local storage. Instead of copying every revlog file |
| 550 performance and increase the complexity of the software, each of which | 693 |
| 551 is much more important to the ``feel'' of day-to-day use. | 694 %from the old repository into the new repository, it makes a ``hard |
| 552 | 695 %link'', which is a shorthand way to say ``these two names point to the |
| 553 \subsection{Other contents of the dirstate} | 696 %same file''. When Mercurial is about to write to one of a revlog's |
| 554 | 697 %files, it checks to see if the number of names pointing at the file is |
| 555 Because Mercurial doesn't force you to tell it when you're modifying a | 698 %greater than one. If it is, more than one repository is using the |
| 556 file, it uses the dirstate to store some extra information so it can | 699 %file, so Mercurial makes a new copy of the file that is private to |
| 557 determine efficiently whether you have modified a file. For each file | 700 %this repository. |
| 558 in the working directory, it stores the time that it last modified the | 701 |
| 559 file itself, and the size of the file at that time. | 702 Mercurial$B$O%j%]%8%H%j$r%m!<%+%k%9%H%l!<%8$K%/%m!<%s$9$k:]$K(B``$B%3%T!<%*%s%i(B |
| 560 | 703 $B%$%H(B''$B<jK!$r;H$C$F$$$k!%(B |
| 561 When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or | 704 |
| 562 \hgcmd{copy} files, Mercurial updates the dirstate so that it knows | 705 |
| 563 what to do with those files when you commit. | 706 %A few revision control developers have pointed out that this idea of |
| 564 | 707 %making a complete private copy of a file is not very efficient in its |
| 565 When Mercurial is checking the states of files in the working | 708 %use of storage. While this is true, storage is cheap, and this method |
| 566 directory, it first checks a file's modification time. If that has | 709 %gives the highest performance while deferring most book-keeping to the |
| 567 not changed, the file must not have been modified. If the file's size | 710 %operating system. An alternative scheme would most likely reduce |
| 568 has changed, the file must have been modified. If the modification | 711 %performance and increase the complexity of the software, each of which |
| 569 time has changed, but the size has not, only then does Mercurial need | 712 %is much more important to the ``feel'' of day-to-day use. |
| 570 to read the actual contents of the file to see if they've changed. | 713 |
| 571 Storing these few extra pieces of information dramatically reduces the | 714 %\subsection{Other contents of the dirstate} |
| 572 amount of data that Mercurial needs to read, which yields large | 715 \subsection{dirstate$B$NB>$NFbMF(B} |
| 573 performance improvements compared to other revision control systems. | 716 |
| 574 | 717 %Because Mercurial doesn't force you to tell it when you're modifying a |
| 575 %%% Local Variables: | 718 %file, it uses the dirstate to store some extra information so it can |
| 719 %determine efficiently whether you have modified a file. For each file | |
| 720 %in the working directory, it stores the time that it last modified the | |
| 721 %file itself, and the size of the file at that time. | |
| 722 | |
| 723 %When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or | |
| 724 %\hgcmd{copy} files, Mercurial updates the dirstate so that it knows | |
| 725 %what to do with those files when you commit. | |
| 726 | |
| 727 %When Mercurial is checking the states of files in the working | |
| 728 %directory, it first checks a file's modification time. If that has | |
| 729 %not changed, the file must not have been modified. If the file's size | |
| 730 %has changed, the file must have been modified. If the modification | |
| 731 %time has changed, but the size has not, only then does Mercurial need | |
| 732 %to read the actual contents of the file to see if they've changed. | |
| 733 %Storing these few extra pieces of information dramatically reduces the | |
| 734 %amount of data that Mercurial needs to read, which yields large | |
| 735 %performance improvements compared to other revision control systems. | |
| 736 | |
| 737 %%% Local Variables: | |
| 576 %%% mode: yatex | 738 %%% mode: yatex |
| 577 %%% TeX-master: "00book" | 739 %%% TeX-master: "00book" |
| 578 %%% End: | 740 %%% End: |
