Why Git?

  • Git was created by Linus Torvalds in 2005 to manage version control of the Linux kernel
  • It is now a core tool in software development
  • Using version control is good practice for all programming tasks

Version Control Systems

  • Track and control changes to files
  • Use it for:
    • Collaboration
    • Revision history
    • Audit changes
    • Not a backup: the data is lost if the server goes down

Git - Architecture

  • Distributed version control: every clone holds a full copy of the repository, and changes are shared by syncing with remote repositories
  • History is stored as “snapshots” of the project, linked in a directed acyclic graph (DAG)
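
The DAG idea can be sketched in a few lines of Python. This is a toy model, not Git's real storage: the one-letter commit names and the ancestors helper are made up for illustration. Each commit lists its parents, and a commit's history is everything reachable by following those parent links.

```python
# Toy model (not Git's real storage): each commit maps to its parent commits.
commits = {
    "a": [],          # root commit, no parents
    "b": ["a"],
    "c": ["b"],       # work continues on one branch
    "d": ["b"],       # work continues on another branch
    "e": ["c", "d"],  # merge commit with two parents
}

def ancestors(commit_id, graph):
    """Return every commit reachable from commit_id by following parent links."""
    seen = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph[current])
    return seen

history_of_e = ancestors("e", commits)  # the merge sees both lines of work
history_of_c = ancestors("c", commits)  # one branch does not see the other
```

Links only ever point from a commit to its older parents, so the graph never contains a cycle.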

  • 3 areas
    • Working directory: top-level folder holding a single checked-out version of the files
    • Staging area: a file in .git (the index) that records what will be added to the next commit
    • .git directory: configuration and repository database
  • Storage
    • Git does not store deltas (changes); it takes snapshots (new copies) of the tracked files
    • Files that are the same are referenced, not duplicated
    • This is a big difference compared to other VCS
      • Branching is easier and cheaper
    • Git occasionally packs the snapshots into single files (packfiles) and uses delta compression to save space
      • Can be triggered manually: git gc --aggressive
      • Normally an automatic process
      • Keeping recent files non-delta exploits locality (like caching)
    • Delta vs snapshots
      • Delta: record only the changes made
      • Snapshot: store a full copy of each changed file (Git does this, which leads to faster branching and merging)
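
A minimal sketch of why snapshots need not waste space, assuming a toy in-memory store rather than Git's actual object format (the store_snapshot function, the file names, and the object_store dictionary are all hypothetical). Each file is stored under the hash of its content, so a file that is unchanged between two snapshots is referenced again instead of duplicated.

```python
import hashlib

def store_snapshot(files, object_store):
    """Store each file's content under its hash; return {path: key} for this snapshot."""
    snapshot = {}
    for path, content in files.items():
        key = hashlib.sha1(content).hexdigest()  # toy key: plain SHA-1 of the bytes
        object_store[key] = content              # storing identical content again changes nothing
        snapshot[path] = key
    return snapshot

object_store = {}
v1 = store_snapshot({"README": b"hello\n", "main.py": b"print(1)\n"}, object_store)
v2 = store_snapshot({"README": b"hello\n", "main.py": b"print(2)\n"}, object_store)

# Two snapshots of two files each, but only three stored objects:
# README is shared, and only main.py has a second copy.
stored_objects = len(object_store)
```

Each snapshot is a complete description of the project, which is what makes switching between versions cheap: no chain of changes has to be replayed.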

Using git

  • Git - local file structure
    • git init initialises a repo, which creates a .git folder
      • The full history of the files in the repo is saved here
      • Cloning a repo copies .git from the remote repository
  • Git database
    • Putting a value in the database returns its key, a SHA1 hash
    • Key-value database
      • key: SHA1 hash (40 hex-character identifier)
        • provides integrity checking, but is not a security feature; the purpose of the hash is to identify content uniquely
      • value: stored as a blob in the .git/objects folder
  • Staging
    • The process of marking a file to be included in the next commit
    • Staging command: git add
      • Adds a key-value entry for the file contents
      • Adds the file path to the staging index
  • Committing
    • The commit itself is a Git object
    • Creates a Git tree object describing the staged files and wraps it in a commit
    • The tree object contains Unix-style directory entries (permissions, path, blob key)
  • Git log
    • Displays the commit history
    • Always retrieved locally, since the history is stored in the local repository
  • Git checkout
    • Checking out: switching the working tree to a particular commit or branch (generally the tip of a branch)
    • Sets HEAD to the checked-out branch or commit
      • All subsequent operations are performed on this branch
    • git checkout main
      • Returns to the most recent commit on main
      • main tells Git to check out the tip of the main branch
      • When you check out a specific commit you get a detached HEAD (you are no longer on any branch)
  • Git HEAD
    • File .git/HEAD contains a reference to the currently checked out commit
    • If it is a commit hash, HEAD is detached
    • Normally it is a branch reference (ref: refs/heads/main)
    • Used to determine which branch current changes apply to
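
The key-value idea from the Git database notes can be checked by hand. The sketch below assumes the default SHA-1 object format, where the key of a blob is the SHA-1 of a short header (the word blob, the content length, and a NUL byte) followed by the content. The blob_key function name is made up for illustration; the resulting value should match what git hash-object prints for a file with the same content.

```python
import hashlib

def blob_key(content: bytes) -> str:
    """Return the key Git's SHA-1 object format assigns to a blob with this content."""
    header = f"blob {len(content)}\0".encode("ascii")  # object type, size, NUL byte
    return hashlib.sha1(header + content).hexdigest()

key = blob_key(b"hello\n")
same_key = blob_key(b"hello\n")    # identical content always gives the same key
other_key = blob_key(b"hello!\n")  # any change in content gives a different key
```

Because the key depends only on the content, two files with the same bytes share one stored object, and any corruption of a stored object shows up as a mismatch between its key and its content.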

Git Branches

A branch is a small text file containing the hash of the commit it points to.

  • A branch can be created from any specific commit, but generally we branch from an existing branch (initially main)
  • Far more efficient than having to duplicate the workspace (as in SVN)
    • Smaller storage space, quicker to create and merge
  • Recommended workflow is to branch and merge
  • git checkout -b devbranch main
    • Creates a new branch, devbranch, starting from main, and checks it out
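
To make the "branches are text files" idea concrete, here is a sketch that builds a simplified, fake .git layout in a temporary directory and resolves HEAD by reading files. It is an illustration under assumptions: the resolve_head helper is hypothetical, the commit hash is a placeholder, and a real repository may also keep branch pointers in a packed-refs file, which this sketch ignores.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir):
    """Return (branch name or None, commit hash) for a simplified .git layout."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]                     # e.g. refs/heads/main
        commit = (git_dir / ref).read_text().strip()  # the branch file holds a commit hash
        return ref.split("/", 2)[-1], commit
    return None, head                                 # detached HEAD: a raw commit hash

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    fake_commit = "1" * 40  # placeholder hash, not a real commit

    (git_dir / "refs" / "heads" / "main").write_text(fake_commit + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    on_main = resolve_head(git_dir)

    # Creating a branch amounts to writing one more small file with the same hash.
    (git_dir / "refs" / "heads" / "devbranch").write_text(fake_commit + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/devbranch\n")
    on_devbranch = resolve_head(git_dir)

    # A detached HEAD stores the commit hash directly instead of a branch reference.
    (git_dir / "HEAD").write_text(fake_commit + "\n")
    detached = resolve_head(git_dir)
```

No project files are copied at any point, which is why creating a branch is so much cheaper than duplicating a workspace.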

Remote usage of git

  • Clone records the remote reference in the Git config file (.git/config)
    • The default remote is called “origin”
    • Remote references are URLs to your remote repository
  • Pull performs 1. fetch and 2. merge
    • Fetch: retrieves any changes from the remote
    • Merge: combines those changes into the current local branch
  • Push transfers your local commits to the remote repo
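
The two steps of a pull can be sketched with a toy model. Everything here is a simplification for illustration: the dictionaries, the function names, and the one-letter commit names are hypothetical, and only the simplest kind of merge (a fast-forward, where the local branch has no commits of its own) is handled.

```python
def is_ancestor(commits, older, newer):
    """True if `older` is reachable from `newer` by following parent links."""
    stack, seen = [newer], set()
    while stack:
        current = stack.pop()
        if current == older:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(commits[current])
    return False

def fetch(local, remote):
    """Step 1: copy the remote's commits and record where its main branch points."""
    local["commits"].update(remote["commits"])
    local["origin/main"] = remote["main"]

def merge_fast_forward(local):
    """Step 2: move local main forward if it is simply behind origin/main."""
    if is_ancestor(local["commits"], local["main"], local["origin/main"]):
        local["main"] = local["origin/main"]
        return True
    return False  # histories diverged: a real merge would be needed

def pull(local, remote):
    fetch(local, remote)
    return merge_fast_forward(local)

remote = {"commits": {"a": [], "b": ["a"], "c": ["b"]}, "main": "c"}
local = {"commits": {"a": [], "b": ["a"]}, "main": "b", "origin/main": "b"}
fast_forwarded = pull(local, remote)
```

After the fetch step alone, the local repository already contains commit c but main has not moved; the merge step is what updates the branch.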

Real world use of Git

  • Tags: used to give commits more informative names (e.g. version numbers)
  • A release is also often stored on a branch:
    • Allows bug fixing to continue without impact on future development
  • Git cannot resolve genuine conflicts for you; the general workflow is to commit and push often, in small changes
  • Binary files can be inefficient to store in git
    • File changes are packed using delta compression
    • Binary files typically exhibit large changes
    • Increases the size of the clone operation
  • Forks: a complete, separate copy of a repository
    • Link remains to parent
    • Useful for large changes, keep development work out of main repo
    • Remains even if original repo is removed

Continuous integration

  • Automatically running tests on every push: scripts (e.g. test cases) run after each push
  • An important part of software development
  • Continuously build, test, and deploy iterative code changes
  • Reduces the chance of building on code that fails its tests
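
As an illustration of the kind of test case a CI service might run after each push, here is a minimal sketch using Python's built-in unittest module. The word_count function and its tests are hypothetical stand-ins for real project code.

```python
import unittest

def word_count(text):
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())

class WordCountTest(unittest.TestCase):
    def test_counts_words(self):
        self.assertEqual(word_count("commit early, push often"), 4)

    def test_empty_string(self):
        self.assertEqual(word_count(""), 0)
```

A CI job would typically run a command such as python -m unittest on every push and mark the commit as failing if any test fails; how that command is configured depends on the CI service.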
