class: center, middle, inverse, title-slide .title[ # git and GitHub for use with R ] .subtitle[ ## Tools for versioning and sharing research ] .author[ ### Ernest Guevarra ] .date[ ### 2024-01-29 ] --- # Outline 1. What is git? Why use git? 2. What is GitHub? Why use GitHub? 3. git and GitHub 3. git integration with RStudio 4. Practial session --- # What is git? .pull-left[ * Free and open source distributed **version control system** * Built for software development for a group of developers to work collaboratively and to manage the evolution of a set of files - like *"Track Changes"* in Microsoft Word on steriods! * Has been re-purposed to manage a collection of files that make up a typical data analytical project that consists of data, figures, reports, and source code ] .pull-right[ [.center[![](images/Git-Logo-2Color.png)]](https://git-scm.com) ] ??? Git is a version control system. Its original purpose was to help groups of developers work collaboratively on big software projects. Git manages the evolution of a set of files – called a repository – in a sane, highly structured way. Think of it as the “Track Changes” features from Microsoft Word on steroids. Git has been re-purposed by the data science community. In addition to using it for source code, it has been used to manage the motley collection of files that make up typical data analytical projects - data, figures, reports, and, source code. --- # Why use git? .pull-left[ .center[![](images/phdcomics-filenames.gif)] ] .pull-right[ ### Version control * Is the only reasonable and sane way to keep track changes in source code, manuscripts, presentations, and data analysis projects * Documentation of differences between versions * Exploration of differences between versions ] ??? Version control is the only reasonable way to keep track of changes in code, manuscripts, presentations, and data analysis projects. We are all used to or familiar with making numbered filenames for a project. But exploring the differences is difficult, to say the least. git in its very essence a powerful version control system. When used properly, the documentation of each small change across all your files is facilitated and made easier through git. And this documentation makes exploration of differences in versions easier and intelligible. --- # Why use git? .pull-left[ .center[![](images/why_repro_research1.gif)] ] .pull-right[ ### Communication and collaboration * **Communicating** one's research project with other people is part of the scientific process - not just results but the whole process * **Collaborating** with others on each other's research project allows us to build on each other's past work, using them for a different context/problem, or re-purposing them to come up with a new approach/solution * Communication and collaboration on various aspects of the scientific process is faciliated by using git ] ??? Merging collaborators’ changes made easy. Have you ever had to deal with a collaborator sending you modifications distributed across many files, or had to deal with two people having made changes to the same file at the same time? Painful. git merge is the answer. --- # What is GitHub and Why use GitHub? .pull-left[ [.center[![](images/Octocat.png)]](https://github.com) ] .pull-right[ <!--- .center[![](images/GitHub_Logo.png)] ---> * Service provider of hosting for software development and version control using git * Offers distributed version control and source code management functionality of git, plus its own features such as bug tracking, feature request, task management, continuous integration and wikis for every project * Like *facebook* but for programmers * Facilitates *"openness"* of **Open Source** * Lowers the barriers to collaboration ] ??? Github is like facebook for programmers. Everyone’s on there. You can look at what they’re working on and easily peruse their code and make suggestions or changes. It’s really open source. “Open source” is not so open if you can’t easily study it. With github, all of the code is easily inspected, as is its entire history. Github lowers the barriers to collaboration. It’s easy to offer suggested changes to others’ code through github. I was able to fix a mistake in the phobos library for the D programming language, because it’s hosted on github. I fixed some problems in some very useful code developed by someone I don’t know, because it’s hosted on github. You don’t have to set up a git server. It’s surprisingly easy to get things set up. --- background-color: #FFFFFF # git and GitHub .center[![](images/git_and_github01b.png)] --- background-color: #FFFFFF background-image: url(images/pcbi.1004947.g001.jpg) background-size: 50% # git and GitHub .footnote[Taken from Perez-Riverol, Y., Gatto, L., Wang, R., Sachsenberg, T., Uszkoreit, J., Leprevost, F., Fufezan, C., Ternent, T., Eglen, S. J., Katz, D. S., Pollard, T. J., Konovalov, A., Flight, R. M., Blin, K., & Vizcaíno, J. A. (2016). Ten Simple Rules for Taking Advantage of Git and GitHub. PLoS computational biology, 12(7), e1004947. https://doi.org/10.1371/journal.pcbi.1004947] --- # git integration with RStudio .pull-left[ * [RStudio](https://www.rstudio.com/products/rstudio/) is a popular integrated development environment (IDE) for [R](https://cran.r-project.org) * [RStudio](https://www.rstudio.com/products/rstudio/) has built-in facilities for [git](https://git-scm.com/) and [GitHub](https://github.com) * Within [RStudio](https://www.rstudio.com/products/rstudio/), one can create an [RStudio](https://www.rstudio.com/products/rstudio/) project (a directory with some special files to describe specific [RStudio](https://www.rstudio.com/products/rstudio/) options) which becomes your git repository * One can easily turn a current git repository into an [RStudio](https://www.rstudio.com/products/rstudio/) project. ] .pull-left[ .center[![](images/RStudio-Logo.png)] ] --- class: inverse, center, middle # Questions? --- # Practical session * [Register a GitHub account](https://happygitwithr.com/github-acct.html) * [Install or upgrade R and RStudio](https://happygitwithr.com/install-r-rstudio.html) * [Install git](https://happygitwithr.com/install-git.html) * [Introduce yourself to git](https://happygitwithr.com/hello-git.html) * [Personal access token for HTTPS](https://happygitwithr.com/https-pat.html) * [Connect RStudio to git and GitHub](https://oxford-ihtm.io/ihtm-handbook/connect-rstudio-github.html) --- # Practical topics * [RStudio, git, and GitHub process for participating in an R-based scientific project/workflow](https://oxford-ihtm.io/ihtm-handbook/participate-projects.html) * [RStudio, git, and GitHub process for initiating your own R-based scientific project/workflow](https://oxford-ihtm.io/ihtm-handbook/initiate-projects.html) --- background-color: #FFFFFF background-image: url(images/git_process1.png) background-size: 85% ## Participating in an R-based scientific project/workflow .footnote[see details of this process in this Chapter of the IHTM handbook - https://oxford-ihtm.io/ihtm-handbook/participate-projects.html] ??? * see details of this process in this Chapter of the IHTM handbook - https://oxford-ihtm.io/ihtm-handbook/participate-projects.html --- background-color: #FFFFFF background-image: url(images/git_process2.png) background-size: 70% ## Initiating your own R-based scientific project/workflow .footnote[see details of this process in this Chapter of the IHTM handbook - https://oxford-ihtm.io/ihtm-handbook/initiate-projects.html] ??? * see details of this process in this Chapter of the IHTM handbook - https://oxford-ihtm.io/ihtm-handbook/initiate-projects.html --- class: inverse, center, middle # Thank you! Slides can be viewed at https://oxford-ihtm.io/open-reproducible-science/session7.html PDF version of slides can be downloaded at https://oxford-ihtm.io/open-reproducible-science/pdf/session7-git-and-github-with-r.pdf R scripts for slides available [here](https://github.com/OxfordIHTM/open-reproducible-science/blob/main/session7.Rmd)