Xiao Peng

My personal blog has moved to xiaopeng.me; posts about design patterns will be synced here.
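Xiao Peng is a senior consultant at ThoughtWorks, currently focused on architecture patterns and agile software development, and committed to promoting and applying software development best practices.
Xiao Peng has provided consulting and training for agile organizational transformation at several large enterprises in China, and has extensive experience with continuous integration in large teams.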

So I'd like to tell another story: a bigger team with 300+ developers and 100+ testers. The stories are different, but some things are common.

When we arrived, they were using ClearCase. As you know, we hate CC (ClearCase, not CruiseControl). But we decided not to change it at first. The team had some problems, and they knew those problems were their own. But if we changed the SCM tool and the problems got any worse, they would become the consultants' problems. So instead we kept using ClearCase, cautiously.

We chose a small team, found a good code baseline (one that compiled and passed some tests; don't be surprised, it was not so easy to find a version that compiled), and created a branch for that team, so that other teams would not affect them.

Then we added the test scripts to that branch. Don't get me wrong: the test scripts already existed, just in another system. They had some mechanisms to maintain the mapping between code revisions and test script suites. We decided to replace those mechanisms by simply keeping code and test scripts together.

We helped the team build a continuous integration environment. (Normally, SCM migration is part of the CI process improvement.) It was not easy, but since the code compiled, we made it.

We trained the team on how to work in a CI process. Many teams don't know the right way to work under CI (though Martin Fowler published it several years ago). They modify the code, submit it, and pray that it passes the commit build. We taught them the sequence: update, modify code, personal build, update, second personal build, commit, commit build.
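As a rough sketch of that discipline, the scripted part of the sequence (everything except editing the code and the server-side commit build) could look like the following. The `make test` build command and the commit message are placeholders I made up; only the order of the steps comes from what we taught.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Sequence[str]]

# Hypothetical commands: "make test" stands in for whatever the personal
# build really is, and the commit message is a placeholder.
PRE_COMMIT_STEPS: List[Step] = [
    ("update", ["svn", "update"]),
    ("personal build", ["make", "test"]),
    ("update again", ["svn", "update"]),
    ("second personal build", ["make", "test"]),
    ("commit", ["svn", "commit", "-m", "describe the change here"]),
]


def run_command(command: Sequence[str]) -> bool:
    """Run one command and report whether it exited with status 0."""
    return subprocess.run(list(command)).returncode == 0


def run_steps(
    steps: Sequence[Step],
    runner: Callable[[Sequence[str]], bool] = run_command,
) -> List[str]:
    """Run the steps in order and stop at the first failure.

    Returns the names of the steps that succeeded, so a failed build
    means the commit step is never reached.
    """
    completed: List[str] = []
    for name, command in steps:
        if not runner(command):
            break
        completed.append(name)
    return completed
```

The point of the ordering is that nobody commits on top of a build they have not run locally against the latest code; the commit build on the CI server is then a confirmation, not a prayer.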

By then, two or three weeks had passed. We held a showcase every two weeks, in which we told the client what we had done in the last two weeks and what we would do in the next two. Most team leaders attended the showcase. As it was held on-site, they could see how that team worked.

And then we started the MIGRATION. In reality, it was a BANG! We told the client we would help them switch to SVN. The next morning, they told us that all teams (400+ people) had switched to SVN (they work much more overtime than TWers). We were all astonished!

But everything seemed OK. They had one general CMO, plus some developers and testers who had been trained before we came. And they had a guide document telling them how to update and commit (they used TortoiseSVN as their client). We admit that it was not our plan to switch in one night. We even had a plan that included a sandbox period. But everything just went well.

The migration was fairly simple. They took a good code base from OUR TEAM, because we had told them that code and test code (scripts) must be on the same branch. Then they created a repository, committed the code, and sent out a mail telling everyone to install TortoiseSVN and check out the code to their computers. The branching strategy was the one we had shown in an earlier showcase. The simplified branches are shown below (since there were more than 20 teams, the real branching strategy had three layers, not two as below. The publish branches are tags, not real branches. We call the strategy "stable main line with active branches"):

     __ __ __ __ __ __ __ __ __ __ __ __ __ (publish branch)
    /
   /
========================================== (integration branch)
 | \
 |  \_____________________________________ (branch of team A)
  \
   \______________________________________ (branch of team B)

Since they had started using SVN, we were forced to start the training at once. What we talked about most was merging. They used to use Beyond Compare to merge changes between branches. We told them that as long as they followed our instructions, they could let SVN do the merging with good confidence.
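To make "let SVN do the merging" concrete, here is a minimal sketch of the two merges a team branch needs under this strategy, written as small command builders. The URLs are invented for illustration, and the exact procedure is my assumption of a typical Subversion merge-tracking workflow, not a transcript of the training material.

```python
from typing import List

# Hypothetical repository URLs, for illustration only.
INTEGRATION_URL = "http://svn.example.com/repo/project/trunk"
TEAM_A_URL = "http://svn.example.com/repo/project/branches/team-a"


def sync_from_integration(integration_url: str) -> List[str]:
    """Command to run inside a team-branch working copy.

    Asks Subversion to merge every integration-branch change that has
    not yet been merged into the team branch.
    """
    return ["svn", "merge", integration_url]


def reintegrate_team_branch(team_branch_url: str) -> List[str]:
    """Command to run inside an integration-branch working copy.

    Brings the team branch's work back. --reintegrate is the form used
    by Subversion 1.5 to 1.7; later clients deprecate the flag and
    detect this case on their own.
    """
    return ["svn", "merge", "--reintegrate", team_branch_url]
```

In both cases the merge result is reviewed, built, and committed like any other change, so the CI rules still apply to merges.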

About CI, there is still much to say, but for the SCM migration, I didn't see any magic tricks or disasters.

Q&As:

Specifically, how did you get the devs trained up? 

They trained some people, at least one in each team, and then published a guide document. When we trained them, we used the same strategy. Since SVN is not as complex as ClearCase, it was not a big deal. What is more important is the process, as described above. In another consulting project, we had some internal coaches. We had them coach each developer (or tester) whenever they wanted to commit code during the initial period.

How are they finding the transition? 

TW has a high reputation. Since all TWers keep talking about the benefits of atomic commits, optimistic locking, and easy branching, some will try it and more will see that it works.

What are the teething troubles of running SVN for that many people? 

1. Log. Some (even some of us) will complain about the missing logs. I believe it would be a nightmare to try to import the history of each file into SVN. So convince them that the old logs are not that important, or just leave a way to access them.

2. Merging. People keep doing things wrong with SVN, especially merging two branches the hard way.

3. SVN is not perfect. Sometimes people break their branches. Remember that a branch is just a folder in SVN, so creating a new branch is often the easiest fix. I once spent a day trying to fix a strange problem, and then another SVN expert spent another day on it. In the end, creating a new branch took only a few minutes. Of course, since the client pays us by time, it depends.

4. Cleaning. Jeff said enough about it.

5. Some kinds of files need pessimistic locking, such as Word documents. Don't mess them up, or you will be sued.

6. Big binary files should not be put into SVN if they change a lot. As an exception, putting the whole CI environment into SVN is a very good idea. I have been told that Perforce is good at handling big binary files.

7. Authentication. I don't have much experience with it. In my projects, they used a simple access control model based on folder-level permissions.

8. Some may delete the .svn folders. Stop them.

9. Teach them to write commit comments in the right format. Add a hook to SVN so that code can't be committed without comments.
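For point 9, a pre-commit hook can be quite small. The sketch below is not the hook that team used; it assumes Python is available on the SVN server, and the minimum length is an arbitrary threshold I picked. Subversion calls the pre-commit hook with the repository path and the transaction name, and `svnlook log -t` prints the pending log message.

```python
#!/usr/bin/env python3
import subprocess
import sys
from typing import List

MIN_LENGTH = 10  # hypothetical threshold; pick whatever your team agrees on


def is_acceptable(message: str) -> bool:
    """Accept a log message only if it has enough non-blank text."""
    return len(message.strip()) >= MIN_LENGTH


def read_log_message(repos: str, txn: str) -> str:
    """Ask svnlook for the log message of the pending transaction."""
    result = subprocess.run(
        ["svnlook", "log", "-t", txn, repos],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def main(argv: List[str]) -> int:
    repos, txn = argv[1], argv[2]
    if is_acceptable(read_log_message(repos, txn)):
        return 0
    # Text written to stderr is shown to the committer when the hook fails.
    print("Commit rejected: please write a meaningful log message.", file=sys.stderr)
    return 1


# Subversion invokes the hook as: pre-commit REPOS-PATH TXN-NAME
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv))
```

Saved as `hooks/pre-commit` in the repository directory and made executable, a non-zero exit status makes Subversion refuse the commit. Enforcing a stricter format (for example a required issue number) would only mean changing `is_acceptable`.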

How have you split the repos amongst teams etc?

In fact, we didn't do much on this. But we saw that their SVN support team set up several Linux servers and ran one instance on each server. Each SVN instance hosts the code of more than one project. They create a folder for each project, and the trunk/branches/tags folders for that project are created as subfolders. I also know some teams set up their own SVN server, which I think is the most scalable way :), as long as they can make sure the data is safe.
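The layout described above, one folder per project with trunk/branches/tags underneath, can be sketched like this. The server URL and project name are made up; `svn mkdir --parents` with several URLs creates all the folders in a single commit.

```python
from typing import List

STANDARD_DIRS = ("trunk", "branches", "tags")


def project_layout_urls(repo_url: str, project: str) -> List[str]:
    """URLs of the standard folders for one project inside a shared repository."""
    base = repo_url.rstrip("/") + "/" + project
    return [f"{base}/{name}" for name in STANDARD_DIRS]


def mkdir_command(repo_url: str, project: str) -> List[str]:
    """Build one `svn mkdir` call that creates the whole layout at once."""
    return [
        "svn", "mkdir", "--parents",
        "-m", f"Create standard layout for {project}",
        *project_layout_urls(repo_url, project),
    ]


# Hypothetical server and project names, for illustration only.
if __name__ == "__main__":
    print(" ".join(mkdir_command("http://svn.example.com/repo1", "billing")))
```

Keeping the same three folders under every project means the branching and tagging habits are identical no matter which project or server a developer lands on.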

Why not Hg or Git?

For me, the learning curve is steep. Also, for some reason, having a local repository on each developer's computer seems dangerous to some managers. The third reason is that I still don't know how to host more than one project in one Hg/Git instance.

A question: does anyone know how GitHub manages projects?