class: center, middle, inverse, title-slide .title[ # Getting the right tools for the job ] .subtitle[ ## R, RStudio, git, and GitHub ] .author[ ### Ernest Guevarra ] .date[ ### 11 October 2024 ] --- # Outline 1. What is R? 2. Why use R? 3. What is RStudio? 4. Why use RStudio? 5. What is Git and GitHub? 6. Why use Git and GitHub? --- # What is R? .pull-left[ * `R` is a simple but powerful *programming language* * `R` is a system for *data manipulation*, *calculation*, and *graphics*. It provides: * Facilities/functions for data handling and storage; * A large collection of tools for data analysis; and, * Graphical facilities for data analysis and display. * `R` is a programming language that provides statistical functions as part of a broader programming tool. ] .pull-right[ .center[![](https://upload.wikimedia.org/wikipedia/commons/thumb/1/1b/R_logo.svg/724px-R_logo.svg.png)] ] ??? R is not a statistical package/software as such in the same way as STATA or SPSS. R is a general programming language that has a sophisticated, well-developed, and well-maintained library of statistical tools and functions. --- # Why use R? .pull-left[ .center[![](https://user2021.r-project.org/img/artwork/user-logo-color.png)] ] .pull-right[ * It is an ***open source system*** and is available for ***free***. Even though free, R is ***more powerful than most commercial packages***. * Considerably ***more flexible than statistical packages*** that relies on menus, buttons, and boxes. * Every stage of your data management and analysis can be recorded and edited and re-run at a later date. * huge user and developer community. * has a robust set of user- and community-developed packages that support statistical methods and techniques. ] --- # What is RStudio? .pull-left[ * An ***integrated development environment (IDE)*** for R. RStudio is not R. RStudio is a tool for interfacing with R. * Includes a ***console***, ***syntax-highlighting editor*** that supports direct code execution, as well as ***tools for plotting, history, debugging and workspace management***. ] .pull-right[ .center[![](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/RStudio_logo_flat.svg/1920px-RStudio_logo_flat.svg.png?20190314020554)] ] --- # Why use RStudio? * RStudio is designed to make it easier to work with R * RStudio facilitates creation of project-orientated workflows * RStudio makes it convenient to view and interact with the objects in your environment --- # What is git? .pull-left[ * Free and open source distributed **version control system** * Built for software development for a group of developers to work collaboratively and to manage the evolution of a set of files - like *"Track Changes"* in Microsoft Word on steroids! * Has been re-purposed to manage a collection of files that make up a typical data analytical project that consists of data, figures, reports, and source code ] .pull-right[ [.center[![](https://git-scm.com/images/logos/downloads/Git-Logo-2Color.png)]](https://git-scm.com) ] ??? Git is a version control system. Its original purpose was to help groups of developers work collaboratively on big software projects. Git manages the evolution of a set of files – called a repository – in a sane, highly structured way. Think of it as the “Track Changes” features from Microsoft Word on steroids. Git has been re-purposed by the data science community. In addition to using it for source code, it has been used to manage the motley collection of files that make up typical data analytical projects - data, figures, reports, and, source code. --- # Why use git? .pull-left[ .center[![](images/phdcomics-filenames.gif)] ] .pull-right[ ### Version control * Is the only reasonable and sane way to keep track changes in source code, manuscripts, presentations, and data analysis projects * Documentation of differences between versions * Exploration of differences between versions ] ??? Version control is the only reasonable way to keep track of changes in code, manuscripts, presentations, and data analysis projects. We are all used to or familiar with making numbered filenames for a project. But exploring the differences is difficult, to say the least. git in its very essence a powerful version control system. When used properly, the documentation of each small change across all your files is facilitated and made easier through git. And this documentation makes exploration of differences in versions easier and intelligible. --- # Why use git? .pull-left[ .center[![](images/why_repro_research1.gif)] ] .pull-right[ ### Communication and collaboration * **Communicating** one's research project with other people is part of the scientific process - not just results but the whole process * **Collaborating** with others on each other's research project allows us to build on each other's past work, using them for a different context/problem, or re-purposing them to come up with a new approach/solution * Communication and collaboration on various aspects of the scientific process is facilitated by using git ] ??? Merging collaborators’ changes made easy. Have you ever had to deal with a collaborator sending you modifications distributed across many files, or had to deal with two people having made changes to the same file at the same time? Painful. git merge is the answer. --- # What is GitHub and Why use GitHub? .pull-left[ [.center[![](https://avatars.githubusercontent.com/u/583231?v=4)]](https://github.com) ] .pull-right[ * Service provider of hosting for software development and version control using git * Offers distributed version control and source code management functionality of git, plus its own features such as bug tracking, feature request, task management, continuous integration and wikis for every project * Like *facebook* but for programmers * Facilitates *"openness"* of **Open Source** * Lowers the barriers to collaboration ] ??? Github is like facebook for programmers. Everyone’s on there. You can look at what they’re working on and easily peruse their code and make suggestions or changes. It’s really open source. “Open source” is not so open if you can’t easily study it. With github, all of the code is easily inspected, as is its entire history. Github lowers the barriers to collaboration. It’s easy to offer suggested changes to others’ code through github. You don’t have to set up a git server. It’s surprisingly easy to get things set up. --- background-color: #FFFFFF background-image: url(images/pcbi.1004947.g001.jpg) background-size: 50% # git and GitHub .footnote[Taken from Perez-Riverol, Y., Gatto, L., Wang, R., Sachsenberg, T., Uszkoreit, J., Leprevost, F., Fufezan, C., Ternent, T., Eglen, S. J., Katz, D. S., Pollard, T. J., Konovalov, A., Flight, R. M., Blin, K., & Vizcaíno, J. A. (2016). Ten Simple Rules for Taking Advantage of Git and GitHub. PLoS computational biology, 12(7), e1004947. https://doi.org/10.1371/journal.pcbi.1004947] --- class: inverse, center, middle # Questions? --- class: inverse, center, middle # Practical session ## Basic git operations with RStudio for retrieving and submitting assignments via GitHub Classroom --- class: inverse, center, middle # Thank you! Slides can be viewed at https://oxford-ihtm.io/open-reproducible-science/session1.html PDF version of slides can be downloaded at https://oxford-ihtm.io/open-reproducible-science/pdf/session1-getting-the-right-tools.pdf R scripts for slides available [here](https://github.com/OxfordIHTM/open-reproducible-science/blob/main/session1.Rmd)