Work: Practical CVS, part 1

Jeffreys Copeland & Haemer

(SunExpert, August 1997)


Last time, we sketched you an RCS overview. In the process, we stumbled over, and pointed out, RCS's single-file myopia. In closing, we promised you a discussion of CVS, a widely used, freely available extension to RCS, built to handle the file hierarchies that we all use to build products.

Here it comes.


Getting started

The nice thing about RCS is that it's easy to use. For the most part, all you need are two commands, ci and co, which check files in and out. As you'd expect, dealing with trees unavoidably requires more work. Still, CVS tries to mirror RCS's simplicity, and doesn't do badly. We'll illustrate this by beginning the same way we did last month.

$ echo "Use the right tool for the job." > jeff
$ cvs ci jeff
cvs commit: No CVSROOT specified!
  Please use the `-d' option
cvs [commit aborted]:
  or set the CVSROOT environment variable.

We want to draw your attention to two noteworthy things about our example:

First, the command we used, cvs ci, is almost the same one we would have used for RCS. There is also a cvs co. While these and other features of CVS use easy-to-remember analogues to RCS, ci is an argument to the command cvs, not a command on its own.

(We could tell you that for RCS you need to learn two commands, but for CVS, you only need to learn one; however, if we could say something like this with a straight face, we'd be in marketing, making a lot more money.)

Second, it didn't work. We did something that seemed like it might make sense, but it didn't. Normally, we'd shrug, say, ``First time for everything,'' try the exact same thing once or twice (we're not in marketing; we're in software), and then, when all else failed, we'd read the manual.

In this case, though, CVS told us what to do.

This is an important principle of software design: don't just say what's wrong, say how to fix it. Contrast the SunOS error message

sed: Unknown flag: X

with the Linux usage message

Usage: sed [-nV] [--quiet] [--silent]
  [--version] [-e script]
  [-f script-file] [--expression=script]
  [--file=script-file] [file...]

This principle has a long history in UNIX, and has been clearly set out on many occasions. Although we normally try to use standards-conforming library functions, we still steer clear of the POSIX getopt() because it doesn't force a usage message.

(In contrast, our perl programs routinely have lines like

getopts('LSMFT') or die $usage;

The module Getopt::Std doesn't automatically emit ``Hey! Don't do that'' messages, so we feel better about using it.)
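The same principle carries over to other languages. Here is a minimal Python sketch, using the standard getopt module, of a parser that answers a bad flag with both the specific complaint and a usage line. The usage string and the -LSMFT flag set (borrowed from the Perl line above) are illustrative assumptions, not anything CVS or sed actually does.

```python
import getopt
import sys

# Hypothetical usage line for a program that takes the boolean
# switches -L, -S, -M, -F, and -T.
USAGE = "usage: prog [-LSMFT] [file ...]"

def parse_args(argv):
    """Parse argv; on a bad flag, say how to fix it, not just what's wrong."""
    try:
        opts, args = getopt.getopt(argv, "LSMFT")
    except getopt.GetoptError as err:
        # First the specific complaint, then the usage line, then exit.
        print(f"{err}\n{USAGE}", file=sys.stderr)
        sys.exit(2)
    # Strip the leading '-' so callers get {'L', 'T'} rather than {'-L', '-T'}.
    flags = {flag.lstrip("-") for flag, _ in opts}
    return flags, args
```

Called as parse_args(["-X"]), it prints "option -X not recognized" followed by the usage line, and exits with status 2.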

So let's try taking the advice cvs offers.

$ CVSROOT=/cvs; export CVSROOT
$ cvs ci jeff
cvs commit: cannot open CVS/Entries for reading:
  No such file or directory
cvs commit: nothing known about `jeff'
cvs [commit aborted]:
  correct above errors first!

Progress. Now we're making new mistakes.

We like learning by making mistakes for several reasons. First, we make a lot of mistakes, so it's important to know right from the outset what software will let us shoot ourselves in the foot before we do it. By this assay, CVS turns out to be relatively safe. Second, we like to see and decode as many error messages as possible. We know from experience that we'll see them again; if we don't generate them ourselves by accident, someone else will invariably appear at our door demanding to know what they mean. (We're tempted to call all this ``Learning by not doing''; however, if we yielded easily to temptation, we'd be in politics, making a lot more money.)

So what's really going on here? CVS is designed to let you work with collections of files. To do so, you need to keep those collections in a repository. The environment variable $CVSROOT points at the root of this repository. You can have more than one repository, and each repository can hold many unrelated collections.

The first time we ran cvs, we hadn't designated a repository. Now, we've designated a repository, but the commands cvs ci and cvs co only work on source code collections that have already been put into the repository. To get a collection of files in, you need the command cvs import.

We could try importing the file jeff, but that wouldn't let us show off CVS's ability to deal with file collections, so let's put something bigger in. First, we'll grab a copy of the entire /etc directory:

$ cp -r /etc/ .

Next we'll import it:

$ cvs import etc
cvs [import aborted]: /cvs/CVSROOT:
  No such file or directory

Again, a new error message. This one is telling us that CVS uses a suite of administrative files, all of which it expects to find in the directory $CVSROOT/CVSROOT.

(We think this choice of names is genuinely horrible. We didn't write CVS. We just use it.)

We could show you how to create these by hand, but the command cvs init does the job for you.

$ cvs init
$ ls -RFC $CVSROOT
CVSROOT/

/cvs/CVSROOT:
checkoutlist    cvswrappers,v   loginfo,v       rcsinfo         verifymsg,v
checkoutlist,v  editinfo        modules         rcsinfo,v
commitinfo      editinfo,v      modules,v       taginfo
commitinfo,v    history         notify          taginfo,v
cvswrappers     loginfo         notify,v        verifymsg

We'll postpone explaining what these are, but note for now that almost all of them are under RCS control. Here, ``under RCS control'' turns out also to mean ``under CVS control.'' With a nice self-referential twist, CVS lets you work with its administrative files as a legitimate CVS collection.

But ``almost all''? CVSROOT/history has no associated RCS file because it's a file containing the entire history of everything done to any collection in the repository under $CVSROOT.

It starts out empty.

-rw-rw-r--   1 jsh      rd              0 Jun  2 12:17 cvs/CVSROOT/history

We'll look at this file again after we import etc.

Speaking of which, let's try, try again.

$ cvs import etc
Usage: cvs import [-d] [-k subst]
 [-I ign] [-m msg] [-b branch] [-W spec]
 repository vendor-tag release-tags...
  -d Use the file's modification time
    as the time of import.
  -k sub  Set default RCS keyword
    substitution mode.
  -I ign  More files to ignore (! to reset).
  -b bra  Vendor branch id.
  -m msg  Log message.
  -W spec Wrappers specification line.

We're getting close now. The usage message from cvs import tells us that we're just calling it with the wrong arguments.

What are the arguments? repository is where to put it under $CVSROOT. But where to put what? cvs is built to understand trees. By default, operations are performed on whatever directory you're in and all its subdirectories. If we say cvs import, we'll actually import everything under our current working directory. This means we have to be careful to do a

$ cd etc
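To make ``this directory and all its subdirectories'' concrete, here is a rough Python sketch that lists every file below a starting directory, which is approximately the set a recursive CVS operation considers. It is an illustration under our own assumptions, not CVS's code; skipping directories named CVS mirrors CVS's default ignore list, and the function name files_under is hypothetical.

```python
import os

def files_under(top="."):
    """Return every file below `top` as a path relative to `top`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Don't descend into CVS bookkeeping directories; sort for stable order.
        dirnames[:] = sorted(d for d in dirnames if d != "CVS")
        for name in sorted(filenames):
            found.append(os.path.relpath(os.path.join(dirpath, name), top))
    return found
```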

The arguments vendor-tag and release-tag are there to let us use CVS to keep track of large releases of software from other people. Most of the time, we're focused on keeping track of large collections of software that we're developing ourselves and want to release. Sometimes, though, we begin with a code base from somewhere else -- a vendor that we've paid to develop something, or even a different branch of our own company. The vendor-tag lets us identify the source.

What about the release-tag? In some situations, we need to be able to handle massive updates from the original vendor. If we begin with a source release from Acme Software, do custom development on the package for six months, and then get an upgrade from Acme, we want a place to store the upgrade, exactly as supplied by the vendor, before we begin merging Acme's changes into our developing version. With these tags, we can do that.
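For readers who like to see the bookkeeping, here is a simplified Python model of what repeated imports do to a single file's revision numbers. It assumes CVS's default vendor branch id, 1.1.1 (the value the -b option overrides), and assumes every file changes between vendor releases; real CVS handles cases this model skips, such as a file the vendor did not change. The release tag "upgrade" in the example is hypothetical.

```python
VENDOR_BRANCH = "1.1.1"  # assumed default vendor branch id (see -b)

def import_release(tags, revisions, vendor_tag, release_tag):
    """Record one vendor release of a single file.

    The first import creates trunk revision 1.1 plus 1.1.1.1 on the
    vendor branch; each later import adds 1.1.1.2, 1.1.1.3, and so on.
    """
    if not revisions:
        revisions.append("1.1")  # initial trunk revision
    n = sum(1 for r in revisions if r.startswith(VENDOR_BRANCH + ".")) + 1
    rev = f"{VENDOR_BRANCH}.{n}"
    revisions.append(rev)
    tags[vendor_tag] = VENDOR_BRANCH  # vendor tag names the whole branch
    tags[release_tag] = rev           # release tag names this one snapshot
    return rev
```

So after `cvs import etc Jeff initial` and a later vendor drop tagged upgrade, the tag Jeff names the branch while initial and upgrade each pin one unmodified vendor release, ready to be merged against.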

In our case, we'll call the ``vendor'' Jeff and the release ``initial,'' like this.

$ cvs import etc Jeff initial

Aha! Suddenly, we're editing a file containing these lines:


CVS: ----------------------------------------------------------------------
CVS: Enter Log.  Lines beginning with `CVS:' are removed automatically
CVS:
CVS: ----------------------------------------------------------------------
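The template's promise that lines beginning with `CVS:' are removed is easy to picture in code. This Python sketch filters them out of the edited text; trimming leading and trailing blank lines is our own assumption about a tidy log message, not documented CVS behavior, and the function name is hypothetical.

```python
def clean_log_message(text):
    """Drop template lines starting with 'CVS:'; trim blank lines at the ends."""
    kept = [line for line in text.splitlines() if not line.startswith("CVS:")]
    return "\n".join(kept).strip("\n")
```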

This is what CVS does to ask you for a log message. It's the same idea as the RCS prompt

enter log message, terminated with single '.' or end of file:
>>
with three twists:

After the comments go in, and we exit the editor, CVS fills the screen with a series of lines like this:

N etc/passwd
I etc/passwd~
N etc/rmt
I etc/rmt.old

The files marked `N' are new files, put into the repository. The files marked `I' are being ignored. CVS has customizable rules about what files it ignores, but the defaults are so reasonable that we've never had to modify them.
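The `N'/`I' decision can be sketched as a shell-style pattern match on each file name. The pattern list below is our transcription of the built-in defaults from the CVS manual's cvsignore section; treat it as an assumption and check the documentation for your CVS version, which also reads ignore patterns from places this sketch skips (CVSROOT/cvsignore, ~/.cvsignore, $CVSIGNORE, per-directory .cvsignore files, and the -I option). The function name import_status is hypothetical.

```python
from fnmatch import fnmatchcase

# Assumed copy of CVS's built-in ignore patterns (see the caveats above).
DEFAULT_IGNORE = [
    "RCS", "SCCS", "CVS", "CVS.adm", "RCSLOG", "cvslog.*", "tags", "TAGS",
    ".make.state", ".nse_depinfo", "*~", "#*", ".#*", ",*", "_$*", "*$",
    "*.old", "*.bak", "*.BAK", "*.orig", "*.rej", ".del-*",
    "*.a", "*.olb", "*.o", "*.obj", "*.so", "*.exe", "*.Z", "*.elc", "*.ln",
    "core",
]

def import_status(name, patterns=DEFAULT_IGNORE):
    """Return 'I' if the file name matches any ignore pattern, else 'N'."""
    return "I" if any(fnmatchcase(name, p) for p in patterns) else "N"
```

Run against the file names in the transcript, it reproduces the same verdicts: passwd and rmt come back N, while passwd~ and rmt.old come back I.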

After all this is done, we have a repository, with an RCS file that corresponds to each file we imported.

$ ls $CVSROOT
CVSROOT
etc
$ ls $CVSROOT/etc/pass*
/cvs/etc/passwd,v
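The mapping from an imported file to its RCS file is simple enough to state as a one-line Python function. This sketch assumes a local repository and covers only the plain case shown in the listing above; the function name rcs_path is ours.

```python
import posixpath

def rcs_path(cvsroot, repository, relpath):
    """$CVSROOT/<repository>/<path relative to the import directory>,v"""
    return posixpath.join(cvsroot, repository, relpath) + ",v"
```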


Working with files in the repository

The Source Code Motel: Your files check in, but they never check out.
-- Anonymous.

Seems like it took forever, doesn't it? Well, we suppose we could have tried reading the manual first, but that wouldn't have been as much fun.

Let's review what we did to get started:

  1. Created a place to store the repository, then set $CVSROOT to point at it.
  2. Initialized the $CVSROOT directory with cvs init.
  3. Went over to the tree of sources we wanted to check in.
  4. Said the right magic words: cvs import etc Jeff initial
  5. Put in a comment and exited the editor.

Is working with them as much of a hassle as getting them in, or is there a way to get something back out?

Try this:

$ cd ..
$ rm -rf etc   # That's 1705 files -- GONE!!!
    # A power tool is not user-friendly
$ cvs co etc   # Whew.  They're back.
cvs checkout: Updating etc
U etc/aliases
U etc/aliases.db
...
cvs checkout: Updating etc/X11
...
cvs checkout: Updating etc/rc.d
U etc/rc.0
...
cvs checkout: Updating etc/rc.d/init.d
U etc/rc.d/init.d/httpd

One command lets us check out the entire collection. As many times as we want.

$ cd /tmp
$ cvs co etc
cvs checkout: Updating etc
U etc/X11
...

Is it really all there? Sure.

$ ls -F etc
CVS/
X11/
aliases
aliases.db
...

Wait. What's that directory ``CVS''? A little investigation reveals that every directory in a checked-out hierarchy has one, and that they all contain the same files:

$ ls CVS
Entries
Repository
Root
$ ls X11/CVS
Entries
Repository
Root
$ ls rc.d/init.d/CVS
Entries
Repository
Root
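If you want to confirm that observation on your own checkout, a short Python sketch can walk the tree and report any directory whose CVS subdirectory lacks one of the three files listed above. It assumes the three-file layout seen here; other CVS versions may write additional files, which this sketch does not look for, and the function name is hypothetical.

```python
import os

ADMIN_FILES = ("Entries", "Repository", "Root")

def missing_admin_files(top):
    """List (directory, file) pairs where CVS/<file> is absent."""
    missing = []
    for dirpath, dirnames, _ in os.walk(top):
        if "CVS" in dirnames:
            dirnames.remove("CVS")  # don't descend into CVS/ itself
        for name in ADMIN_FILES:
            if not os.path.isfile(os.path.join(dirpath, "CVS", name)):
                missing.append((os.path.relpath(dirpath, top), name))
        dirnames.sort()  # visit subdirectories in a stable order
    return missing
```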

These files are not in the repository itself. They're administrative files that CVS uses to keep track of where the files came from and what versions were checked out. Here's an example, using the control files in $CVSROOT/CVSROOT:

$ cvs co CVSROOT
...
$ cat CVSROOT/CVS/Repository
/woodcock/jsh/RS/work/cvs/cvs/CVSROOT
$ cat CVSROOT/CVS/Entries
/checkoutlist/1.1/Mon Jun  2 18:17:04 1997//
/commitinfo/1.1/Mon Jun  2 18:17:04 1997//
/cvswrappers/1.1/Mon Jun  2 18:17:04 1997//
/editinfo/1.1/Mon Jun  2 18:17:04 1997//
/loginfo/1.1/Mon Jun  2 18:17:04 1997//
/modules/1.1/Mon Jun  2 18:17:04 1997//
/notify/1.1/Mon Jun  2 18:17:04 1997//
/rcsinfo/1.1/Mon Jun  2 18:17:04 1997//
/taginfo/1.1/Mon Jun  2 18:17:04 1997//
/verifymsg/1.1/Mon Jun  2 18:17:04 1997//
D
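Each line of Entries records, between slashes, a file name, the revision that was checked out, a timestamp, keyword-substitution options, and a sticky tag or date; the empty fields at the end of these lines mean no options and no sticky tag. A minimal Python parser for that layout follows. The field names and the handling of `D' lines reflect our reading of the CVS manual's description of the Entries file, so treat them as assumptions.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    name: str
    revision: str
    timestamp: str
    options: str
    tag: str

def parse_entries(text):
    """Parse CVS/Entries lines shaped like /name/revision/timestamp/options/tagdate.

    Lines starting with 'D' describe subdirectories and are skipped; a line
    holding only 'D' says the writer records subdirectories, so with no
    other 'D' lines there are none.
    """
    entries = {}
    for line in text.splitlines():
        if not line.startswith("/"):
            continue  # 'D' lines or blank lines
        _, name, revision, timestamp, options, tag = line.split("/", 5)
        entries[name] = Entry(name, revision, timestamp, options, tag)
    return entries
```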

The Repository and Root files are obvious safeguards. If, while you're working, you change your $CVSROOT to point somewhere else, deliberately or by accident, the information in these files ensures that any modifications you've made will be put back in the right repository. Do this:

$ cd CVSROOT
$ unset CVSROOT
$ echo >> editinfo; echo >> loginfo
$ cvs ci
and you find yourself in the editor facing a screen that looks like this:

CVS: ----------------------------------------------------------------------
CVS: Enter Log.  Lines beginning with `CVS:' are removed automatically
CVS:
CVS: Committing in .
CVS:
CVS: Modified Files:
CVS:    editinfo loginfo
CVS: ----------------------------------------------------------------------

At this point, you insert a comment, explaining why you've put a blank line at the end of these files, and exit the editor. When you do, you'll see something like this:

Checking in editinfo;
/woodcock/jsh/RS/work/cvs/cvs/CVSROOT/editinfo,v  <--  editinfo
new revision: 1.7; previous revision: 1.6
done
Checking in loginfo;
/woodcock/jsh/RS/work/cvs/cvs/CVSROOT/loginfo,v  <--  loginfo
new revision: 1.3; previous revision: 1.2
done
cvs commit: Rebuilding administrative file database

So the Repository and Root files let CVS remember where to check these in to, even though CVSROOT got unset after you checked them out.
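The lookup that makes this work can be sketched in a few lines of Python: everything needed comes from the CVS subdirectory, so $CVSROOT never enters into it. This is an illustration, not CVS's code, and the function names are hypothetical. It assumes a local repository (a remote Root such as a :pserver: string is not handled), and it accepts either an absolute CVS/Repository path, like the one shown above, or one relative to CVS/Root, which we believe later CVS versions write.

```python
import os

def repository_dir(workdir):
    """Repository directory for a checked-out directory, from CVS/ files only."""
    def read(name):
        with open(os.path.join(workdir, "CVS", name)) as f:
            return f.read().strip()
    root, repo = read("Root"), read("Repository")
    return repo if os.path.isabs(repo) else os.path.join(root, repo)

def rcs_file_for(workdir, filename):
    """Path of the RCS file that committing `filename` would update."""
    return os.path.join(repository_dir(workdir), filename + ",v")
```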

Understanding the goal of the Entries file requires thinking about bigger projects. Imagine, for a moment, a project so large that it has more than one file and more than one person working on it. Call those two people ``Jeff'' and ``Jeff.''

Hmm.

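Call those two people ``Zoe'' and ``Gillian.'' Consider the following scenario:

  1. Zoe and Gillian each check out the collection.
  2. Both of them edit the same file, dotsero.
  3. Zoe checks her changes in with cvs ci.
  4. Gillian then tries to check in her own changes.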

If there were no source code control at all, Gillian's update could overwrite Zoe's update, wiping out Zoe's work.

If you're used to RCS, you may already be saying, ``Only one of them could have checked the file out for editing. That must have been Zoe, so Gillian has to now re-check-out the file for editing.'' But that wouldn't be practical either, because that would mean that one person would be grabbing and releasing locks on perhaps thousands of files, and Zoe and Gillian, and every other developer, would have to keep track of exactly who had the locks at all times.

Under CVS, Gillian, like Zoe, says cvs ci. CVS sees that dotsero has changed and, before checking it in, looks in the Entries file to see whether the version Gillian checked out matches the version at the top of the tree.

Because it doesn't, Gillian gets a message that says that there's a problem, and CVS points her at another command, cvs update, that will help her update her version of all files in the collection, and will help her find and resolve conflicts between her changes and any other changes that have been done since she checked out her base version.
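The check CVS performs can be sketched as a toy Python model: the repository remembers each file's revisions, a working copy remembers the revision it started from (the job Entries does), and a commit is refused unless that base revision is still the head. This is a simplified illustration under our own assumptions; the class and method names are hypothetical, it ignores branches, and it does not model the merge that cvs update performs.

```python
class UpToDateError(Exception):
    """Raised when a working copy is based on an out-of-date revision."""

class Repository:
    """Toy single-branch repository: file name -> list of (revision, text)."""

    def __init__(self):
        self.history = {}

    def add(self, name, text):
        self.history[name] = [("1.1", text)]

    def head(self, name):
        return self.history[name][-1][0]

    def checkout(self, name):
        """Return (base revision, text): what Entries would remember."""
        return self.history[name][-1]

    def commit(self, name, base, text):
        """Check in `text` only if `base` is still the head revision."""
        if base != self.head(name):
            raise UpToDateError(f"{name}: working copy based on {base}, "
                                f"head is {self.head(name)}; update first")
        major, minor = self.head(name).split(".")
        new = f"{major}.{int(minor) + 1}"
        self.history[name].append((new, text))
        return new
```

In the scenario above, Zoe's commit of dotsero produces 1.2; Gillian's commit, still based on 1.1, raises UpToDateError until she updates and her base becomes 1.2.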


Administrative files

CVS is easy enough to use that if it only extended RCS to let us handle collections of files and track their revisions, we'd be happy.

It turns out, though, that there's a lot more. Some of the things CVS provides are commands and options. To show you how to find out what commands and options are available, we'll go back to learning by not doing.

Try typing this:

$ cvs -:

(`:' is not a legal option for any UNIX command we know of, even ls, so we often use it to get commands to give us a usage message.)

(Which reminds us of a joke that re-appeared on the net the other day: The brave knight approached the evil magician at the bridge --- you've seen this Monty Python movie, so you at least know the form of what's coming --- the knight has to answer three questions to cross the bridge, or he will be cast into the abyss. The magician asks: ``what is your name?'' The knight answers: ``Sir Brian of Bell.'' The magician asks: ``What is your quest?'' In a clear, firm voice, the knight answers, ``I seek the Holy Grail.'' The magician, demonstrating just how evil he is, asks, ``What four lowercase alphabetic characters are not legal flag arguments to the Berkeley Unix implementation of ls?'' Sir Brian, of course, hasn't the foggiest idea, which is the end of that particular knight.)

Then try this, just the way that the usage message tells you to:

$ cvs --help-commands

If you want to learn more details about any of these commands, you can experiment with them. Or you can always do this:

$ man cvs

What else? We've mentioned some of the files in $CVSROOT/CVSROOT, like history and rcsinfo, but there's more.

Actually, there's a lot more, so rather than try to tackle it right now, let's wait and talk about it next time. If you want to play with it in the next month, you can get the CVS software at http://www.loria.fr/~molli/cvs-index.html.

Until then, happy trails.

[[Note added much, much later: Thanks to alert reader Mark Hudson at Critical Path in Santa Monica, who noticed two typos in the review section of this column that had gone undetected between initial publication and July 2002. They are now corrected.]]