Category: source code control

Total 7 Posts

Video Notes – Tools for Continuous Integration at Google Scale (GTAC2010)

Tools for Continuous Integration at Google Scale (GTAC2010)
Nathan York
29 min, 55 sec
http://www.youtube.com/watch?v=b52aXZ2yi08

Slide Notes:

  • Software Engineering Gap – lots of work on platforms and compilers, lots of work on apps. Middle (build systems, etc.) is often ignored (reach good enough and then move on)
  • Common Build System Issues – Incorrect, Slow, Cumbersome, Under-maintained
  • Why Build Systems Matter – Engineer Productivity, All about feedback
  • The Challenge At Google – 6000 engineers and one code base, everything built from source, development on mainline, extensive automated testing
  • Rough Developer Workflow (flow chart)
  • Better Build System Needed – optimized and tuned build languages, dependency analysis and scheduling, leverage infrastructure. Must be correct AND fast.
  • Inputs, Outputs and Actions – Content addressable storage (by digest of content), use relative paths, eliminate global state
  • Scaling Source Code Access – FUSE based file system. Most code needed for read only, on-demand syncing and caching, all source in the cloud, content digests as metadata
  • Making Builds Fast – Distributed builds in the cloud – built in arbitrary location
  • Scalable Distributed Builds – Caching key to scalable build. If inputs (from digest) and actions are same as previous, return prior result.
  • Scaling Build Outputs – FUSE based file systems, all output in the cloud, shared across builds and users
  • System View – builds appear local but are in the cloud
  • Platform for Automated Testing – Executing a test is just another build action.
  • Results – 20+ code changes per minute, 65K builds per day, 10000 CPUs, 50 TB memory, ~1PB output every 7 days, 94% cache hit rate.
  • Estimating build tool savings 2008 to 2009: Saving ~600 person years
  • Conclusion: build system is a core component of software engineering
  • Questions
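The caching idea in the notes above (if the input digests and the action are the same as a previous build, return the prior result) can be sketched in a few lines. This is only a toy illustration, not Google's system: the `ActionCache` class, the `fake_compile` function and the file names are all hypothetical, and SHA-256 is just an assumed choice of digest.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content digest used as the identity of an input (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class ActionCache:
    """Toy cache: results are keyed by the command plus the digests of its inputs."""

    def __init__(self):
        self._results = {}
        self.hits = 0
        self.misses = 0

    def run(self, command, inputs, execute):
        # Build the key from the command and each (relative path, content digest)
        # pair, sorted so the key does not depend on dictionary order.
        parts = [command] + [
            f"{path}:{digest(content)}" for path, content in sorted(inputs.items())
        ]
        key = digest("\n".join(parts).encode("utf-8"))
        if key in self._results:
            self.hits += 1
            return self._results[key]
        self.misses += 1
        result = execute(command, inputs)
        self._results[key] = result
        return result


def fake_compile(command, inputs):
    """Hypothetical stand-in for a real build action: concatenates the inputs."""
    return b"".join(inputs[path] for path in sorted(inputs))


cache = ActionCache()
sources = {"lib/a.c": b"int a;", "lib/b.c": b"int b;"}
first = cache.run("cc", sources, fake_compile)   # miss: the action is executed
second = cache.run("cc", sources, fake_compile)  # hit: same inputs, same action
sources["lib/a.c"] = b"int a = 1;"
third = cache.run("cc", sources, fake_compile)   # miss: one input digest changed
print(cache.hits, cache.misses)
```

Because the key depends only on content digests and relative paths, the same result can be reused no matter which machine or user asked for it, which is the property the talk relies on for shared caches.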

 

Send a Subversion Change Log from a Hudson build

For some time, I wanted to send out a Subversion change log from Hudson after a successful build showing all of the changes since the last build. (This is something I think Hudson should support natively – you can see a change log via the UI.) After some research, I came across Using Groovy with Hudson to send rich text email posted by "Chetan".

This solution works like a champ and is recommended. Two points: 1) if you are using Subversion, see the comments on that post for the email template code that displays the Subversion log in the email; the main template code is not for Subversion; 2) I had to change ${fileEntry.editType}, which displayed an object reference, to ${fileEntry.editType.name} to display whether the file was edited, added or deleted.

 

Hudson – Continuous Integration Tool

In my current position, we use Perl extensively. There isn’t much of a build process: check out the code, set a couple of variables, run a suite of tests. (There is always the perl -c command line option, which will perform a syntax check, but I want something more exhaustive.) A former colleague of mine pointed me at Hudson, a tool they recently started using. They are a Java shop and Hudson is definitely aimed at Java shops. Below are the details of my initial experience with Hudson.

Proof of Concept

I downloaded the Hudson WAR file (version 1.176) to my desktop machine. From a command line prompt, I launched Hudson with the command java -jar hudson.war. I minimized the command prompt window and pointed my browser at the default http://localhost:8080, which displays the Hudson configuration pages. Here, I created a new job and filled in the following:

  • Project Name
  • Project Description
  • Subversion Repository URL
  • WebSVN URL
  • Set up Poll SCM (check for updates) on a once-a-minute schedule
  • Selected “Execute Windows batch command” with the necessary commands to run my Perl tests
  • Selected email notification

That is pretty much what it took to get up and running. I had a few mistakes in the commands under “Windows Batch Commands” but once those were fixed, I was able to get a successful build. (Here successful is defined as the last command executed returning a zero exit code.) The email did not work at first. I needed to go back to the home page and select “Manage Hudson”, then “System Configuration”, to set up my email server credentials.
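The exit code convention is easy to get backwards, so here is a minimal sketch of it in Python, assuming the usual rule that zero means success and anything else means failure. The `run_steps` helper is hypothetical and is slightly stricter than the batch step described above: it stops at the first failing command rather than looking only at the last one.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order; return 0 on success or the first non-zero exit code."""
    for argv in steps:
        completed = subprocess.run(argv)
        if completed.returncode != 0:
            # A non-zero exit code marks the build as failed.
            return completed.returncode
    return 0


# Stand-in commands: the current Python interpreter plays the role of the test suite.
passing = run_steps([[sys.executable, "-c", "print('tests passed')"]])
failing = run_steps([
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    [sys.executable, "-c", "print('never runs')"],
])
print(passing, failing)
```

Stopping at the first failure avoids the case where an early step fails quietly and a later, unrelated command returns zero and makes the build look green.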

I thought all was well until my first unattended build. The build failed with the message “ERROR: Failed to update…” from the Subversion checkout. I did some research and found some references to the same error in one of the packages that Hudson uses. There was a suggested fix but it required a code change. I filed a bug and a new build was available within a couple of days which fixed my problem.

Move into Production

After running the build on my local machine for a few days, I decided it was worth moving into our normal processes. I installed it on a Windows server. There is a good description of how to set up Hudson as a Windows service in the Hudson wiki.

We have been running Hudson now for a few weeks and I am very pleased. Highly recommended.

 

svn update threw away my changes

With a note of panic, I received the message that svn update had thrown away a developer’s changes. He had attached a log showing the commands he had issued. Sure enough, instead of seeing a merge, the file was updated and his changes were gone. The one thing I wished the log had included was the output of an svn status command.

I was concerned but not overly worried. Since we converted from CVS to Subversion, there had not been any major issues. No one else had reported these problems. Obviously, if this were a bug in Subversion, it would be all over the web. There was something here that was missing in the description. I needed more details. I arranged a meeting to investigate.

What we discovered was enlightening and led us to a better understanding of how Subversion works. Below is what we worked out had happened. For the purpose of the description, “Developer A” is the person who reported the issue. “Developer B” is another person on the team.

Developer A and Developer B’s working copies are both at revision N.

Developer A commits a change. (His working copy is now at revision N+1.)

Developer A and Developer B discuss and agree that the change needs to be rolled back.

(It is a little fuzzy here what happened. However, the essentials are correct)

Developer A “undoes” his change manually but does not commit the change. (His working copy is still at revision N+1.)

Developer B updates his working copy (now at revision N+1). Developer B issues the reverse merge command to undo the change introduced by Developer A. Developer B commits the change. (His working copy and the repository are now at N+2.)

(Time passes…)

Developer A and Developer B discuss and realize that the original change (committed at revision N+1) does need to be made.

(Here is where the confusion occurred)

Developer A (whose working copy is still at revision N+1 while the repository is at N+2) starts to work on the change. Without updating his working copy, he reintroduces the change manually and does an svn update.

At this point, svn “throws away” Developer A’s change.

If you are familiar with Subversion, you already know why. If Developer A had issued the following commands, he would have seen what happened:

(Working copy at revision N.)

Make changes.

svn commit

(Working copy at revision N+1.)

Manually undo the changes.

svn status filename
(The command indicates that filename has changed from the base revision – N+1)

Manually redo changes.

svn status filename
(Indicates no change in the file. Even though it has been changed twice since the last commit, it is now identical to the file at the time of the last commit in this working copy.)

svn update filename
(Subversion checks to see if the file is different from the base revision – N+1. It isn’t. No merging of changes needs to be done. Subversion goes ahead and updates the file to the state in which it exists at revision N+2, thus “throwing away” the changes the developer had introduced.)
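The sequence above can be replayed with a toy model of a working copy. This is only a sketch of the decision described in this post, not Subversion's actual implementation: the `update` function and the revision contents are hypothetical, and a real svn update would attempt a three-way merge in the "modified" case.

```python
def update(base, working, incoming):
    """Toy model of svn update for one file.

    Returns (new_base, new_working, action).
    """
    if working == base:
        # No local modification relative to the base revision:
        # the file is simply replaced with the incoming revision.
        return incoming, incoming, "replaced"
    # Locally modified: real Subversion would try to merge here.
    return base, working, "merge needed"


REV_N = "original line\n"
REV_N1 = "original line\nnew feature\n"  # Developer A's commit
REV_N2 = REV_N                           # Developer B's reverse merge

# Developer A's working copy after his commit: base and working are both N+1.
base = working = REV_N1

working = REV_N                          # manually undo the change
modified_after_undo = working != base    # svn status would report a modification

working = REV_N1                         # manually redo the change
modified_after_redo = working != base    # identical to base again: no modification

base, working, action = update(base, working, REV_N2)
print(action, repr(working))
```

In this model the redone change is indistinguishable from the committed N+1 text, so the update replaces it with the N+2 content, which is the behavior the developer saw.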

 

svn log – where are the other changes?

Our CVS to Subversion conversion is complete. Now, we are in the process of settling into new working habits and learning the differences between CVS and svn. One of the questions last week was: why do I not see all of the changes when I run svn log?

Luckily, I had anticipated this question. cvs log will display all of the log messages for a file in both the trunk and all of the branches. svn log will display the log messages for the current code path. You will not see changes in branches that are unrelated to the history of the file. There is a command line option, --stop-on-copy, that will stop the log messages at the point where a branch was created. This change in behavior is a welcome change and most people find it intuitive.

Unfortunately, this was not the issue that inspired the question. The developer wanted to know why he did not see a change he knew had been checked in. cvs log would show the change by default. What I discovered was that the developer had not done an update in his working copy. By default, svn log run against a working copy does not show revisions newer than the revision the working copy was last updated to. This feels non-intuitive to me and was something I did not expect. You can get a full log (including changes made after the last update) by using the -r option.

svn log -r 1:HEAD file   Displays all log messages in file history from oldest to newest.
svn log -r HEAD:1 file   Displays all log messages in file history from newest to oldest.
svn log -r BASE:HEAD file   Displays log messages since last svn update from oldest to newest.
svn log -r HEAD:BASE file   Displays log messages since last svn update from newest to oldest.
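To make the ranges above concrete, here is a small model of how a revision range selects and orders log entries. It is a sketch rather than Subversion's own logic: the `log_range` function and the revision numbers are made up for illustration, and it assumes the default range for a working copy path is BASE:1.

```python
def log_range(revisions, start, end):
    """Return the revisions between start and end (inclusive), ordered from start to end."""
    low, high = min(start, end), max(start, end)
    selected = [rev for rev in revisions if low <= rev <= high]
    return sorted(selected, reverse=start > end)


# Hypothetical history: the file changed in these revisions.
file_revisions = [3, 7, 12, 15]
BASE = 12   # revision the working copy was last updated to
HEAD = 15   # newest revision in the repository

default_log = log_range(file_revisions, BASE, 1)      # what plain "svn log file" shows
full_log = log_range(file_revisions, HEAD, 1)         # svn log -r HEAD:1 file
since_update = log_range(file_revisions, BASE, HEAD)  # svn log -r BASE:HEAD file
print(default_log, full_log, since_update)
```

Revision 15 is missing from the default listing because it is newer than BASE, which matches the developer's experience of not seeing a change he knew had been committed.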

 

Minor Problems During CVS to Subversion Conversion

This past weekend, I converted our legacy CVS repository over to Subversion, maintaining the last 5 years’ worth of history. Overall, I am very pleased with the process and the end result. Over the past few weeks, I had done portions of the conversion into a test repository and worked out most of the kinks.

Here is the process we used:

  • Set the CVS repositories to read only (using the readers / writers configuration files)
  • Export each CVS module using the cvs2svn.py script provided by Subversion
  • Create any directories in Subversion needed to receive the exported CVS data
  • Import in the files created by cvs2svn.py

Most of these steps were scripted and ran unattended with me checking on them occasionally.

As with all plans, a couple of minor issues crept up. The export using cvs2svn.py went smoothly (it started at 7:30 pm Friday and ended sometime around 2:00 am on Saturday). When I examined the output of the scripts, I discovered 3 exports failed. One because I misspelled the name of the CVS module directory. The other 2 because of an error of the form “A CVS repository cannot contain both repo/path/file.txt,v and repo/path/Attic/file.txt,v”. The cvs2svn FAQ discusses this error. After looking at the files in question, I chose option 4 – rename the attic version and was able to re-export.

I also noticed that some of the exports did not result in a file being created. Looking at these, I discovered that while the module directories I was exporting from CVS existed, they were empty. I could safely ignore these.

All of my imports failed due to the directory specified not existing in the repository. I thought I had accounted for this by including the appropriate svn mkdir command before the import. Looking through the logs I discovered that all of the mkdir commands failed due to not having the proper authorization credentials. I was running my scripts in an account where I did not have credentials cached. I added the appropriate --username and --password items and let the script run.

After that, the imports ran successfully and we have had no issues since starting to use the repository. Most of the team is pleased with the results of the conversion and the new tools. All in all, a good experience.

 

CVS to Subversion: Line Endings and Bad Binaries

We are in the process of converting our CVS repository to Subversion. The provided cvs2svn script works really well. As part of the conversion, we did a dry run and have been performing a variety of tests to verify that everything converted correctly. One of the tests we performed was to check out an arbitrary branch from both the CVS and SVN repositories and compare the contents.

The comparison did not go as smoothly as I hoped. Quite a few of the files did not match. The investigation resulted in a change we needed to make to the CVS repository and the discovery of a nice feature of Subversion.

It turned out that our CVS repository had multiple binary files that were not marked as binary. There was always a chance that these files would be corrupted when checked out of CVS but we never encountered this. (This is not entirely true: we never encountered any corruption issues when building on Unix. However, we had seen issues when building on Windows.) After correcting the keyword expansion setting on those files, the conversion process worked fine.

The second issue had to do with DOS line endings vs. Unix line endings. It appears that Subversion does a better job of handling the various line endings and converting them appropriately. While this made comparing the files difficult, the end result is very beneficial.
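A comparison that tolerates line-ending differences, but still compares binary files byte for byte, might look like the sketch below. It is not the script we used: the function names are hypothetical, and treating any file containing a NUL byte as binary is an assumed heuristic that will not catch every binary format.

```python
def normalize_eol(data: bytes) -> bytes:
    """Convert DOS (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def looks_binary(data: bytes) -> bool:
    """Heuristic (an assumption): a NUL byte suggests a binary file."""
    return b"\0" in data


def same_content(cvs_data: bytes, svn_data: bytes) -> bool:
    """Compare two checked-out copies of a file, ignoring line endings for text only."""
    if looks_binary(cvs_data) or looks_binary(svn_data):
        # Binary files must match exactly; translating line endings would corrupt them.
        return cvs_data == svn_data
    return normalize_eol(cvs_data) == normalize_eol(svn_data)


text_match = same_content(b"line one\r\nline two\r\n", b"line one\nline two\n")
binary_match = same_content(b"\x89PNG\r\n\x00data", b"\x89PNG\n\x00data")
print(text_match, binary_match)
```

The second comparison fails on purpose: a binary file whose CRLF bytes were rewritten as LF is exactly the kind of corruption that an unmarked binary in CVS can suffer.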