Source Control - Git (COMP30023 W1 L2)
Updated:
Why Git?
- Git was created to maintain version control of the Linux operating system kernel
- It is an important part of software computer systems
- Good practice for all programming tasks
Version Control Systems
- Track and control changes over files
- Use it for:
- Collaboration
- Revision history
- Audit changes
- Not backup, since the data is lost if the server goes down
Git - Architecture
- Distributed version control: changes stored remotely by syncing repositories
- “snapshots” of the current project: directed acrylic graph (DAG)
- 3 steps
- Working directory: top level folder with single version of the files
- Staging area: file in .git that records files to be added to next commit
- .git directory: configuration and repository database
- Storage
- Git does not store deltas (changes), it takes snapshots (new copy) of the file system
- Files that are the same are referenced, not duplicated
- This is a big difference compared to other VCS
- Branching is easier and cheaper
- Git occasionally packs the snapshots into single files, and uses delta compression to save space
git gc --aggressive
- Automatic process
- Keeping recent files non-delta exploits locality (like chaching)
- Delta vs Snapshots
- Delta: record the changes made
- Snapshot: taking copy (Git does this, which leads to faster branch merging)
Using git
- Git - local file structure
git init
initialises a repo, which creates a.git
folder- Full history of the files in the repo are saved here
- Cloning a repo copies
.git
from the remote repository
- Git database
- Putting a value in the database returns a SHA1 hash
- Key-value database
- key: SHA1 hash (40 hex-character identifier)
- provides integrity, but not secure; the purpose of hash is unique association
- value: stored as blob in the objects folders
- key: SHA1 hash (40 hex-character identifier)
- Staging
- Process of marking the file as to be included in the next commit
- Staging command:
git add
- Adds a key-value for the file contents
- Adds the file path to the staging index
- Committing
- The comit itself is a git object
- Creates a git tree object of the changes, and wraps them in a commit
- The tree object contains Unix directory entries (permissions, path, blob key)
- Git log
- Displays the log of activity
- Always retrieved locally, since all operations are performed on the local repository
- Git checkout
- Checking out: switching the working tree to a particular commit/branch (generally the head)
- Sets the HEAD to the current branch or commit
- All subsequent operations are performed on this branch
git checkout main
- Returns to the most recent version
main
tells it to checkout the HEAD of the main branch- When you check out a specific commit you create a detached branch (you are no longer on the main branch)
- Git HEAD
- File
.git/HEAD
contains a reference to the currently checked out commit - If it is a commit hash it is a detached branch
- Normally it is a branch reference (
refs/heads/main
) - Used to determine which branch current changes apply to
- File
Git Branches
Branches are text files pointing to a different commit.
- Specific commits can be checked out as branches, but generally we branch from an existing branch (initially main)
- Far more efficient than having to duplicate workspace (SVN)
- Smaller storage space, quicker to create and merge
- Recommended workflow is to branch and merge
git checkout -b devbranch main
- Checks out master and creates a new branch:
devbranch
- Checks out master and creates a new branch:
Remote usage of git
- Clone sets the remote references in the git config file
- Called “origin”
- Remote references are URLs to your remote repository
- Pull performs 1. fetch and 2. merge
- Fetch: retrieves any changes from the remote
- Merge: combine those changes into local repo
- Push transfers your local commits to the remote repo
Real world use of Git
- Tags: used to add more informative names to commits
- A release is also often stored on a branch:
- Allows bug fixing to continue without impact on future development
- Git does not solve inherent conflicts; general workflow is to commit/push often (with small changes)
- Binary files can be inefficient to store in git
- File changes are packed using delta compression
- Binary files typically exhibit large changes
- Increases the size of the clone operation
- Forks complete separate copy of repository
- Link remains to parent
- Useful for large changes, keep development work out of main repo
- Remains even if original repo is removed
Continuous integration
- Uploading a test everytime you do a commit: snippets of code running after push (i.e. test cases)
- Important part of sw development
- Continuously build, test, and deploy iterative code Changes
- Reduce the chance of maintaining test-failing code
Leave a comment