Functions: Theory and Application (in R)
Athit Kao, PhD
UCI Bioinformatics Support Group - April 25, 2018
whoami
R enthusiast
PhD, Biomedical Sciences, UCI
IGB Biomedical Informatics Training program alumnus
Focus on proteomics mass spectrometry and bioinformatics
Prerequisites
Understand content from previous deck: http://learn.athitkao.com/presentation_functions1.html
Basic programming experience necessary
R and RStudio installed
Consider a simple use case for programming in your field, e.g.:
- Filter specific data from file
- Munge proprietary data format into CSV
- Calculate RPKM for sequencing data
Readability, part II
- Encapsulation via functions is an aspect of code legibility
- Others include:
  - Documentation (comments and READMEs)
  - Formatting (indentation and line wrapping)
  - Variable naming (obvious and consistent names)
- There is a balance appropriate for you and your audience
Generalized Concept of Functions
Anonymous Functions (in lapply)
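A minimal sketch of what an anonymous function in lapply() can look like. The list `samples` and the helper `range_width` are made-up names for illustration, not taken from the original slides.

```r
# A list of numeric vectors to iterate over (made-up values).
samples = list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# Named function: defined once, then passed to lapply() by name.
range_width = function(x) max(x) - min(x)
named_result = lapply(samples, range_width)

# Anonymous function: the same logic written inline and never given a name.
anon_result = lapply(samples, function(x) max(x) - min(x))

# Both approaches produce the same list.
same_result = identical(named_result, anon_result)
```

An anonymous function is convenient when the logic is short and used in only one place; a named function is usually easier to read once the body grows or is reused.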
Multiple Arguments (in lapply)
- First argument always the iterated variable
- Additional arguments go after the function; name them explicitly
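The two points above can be sketched as follows. The list `values` is made up for illustration; `mean()` and `round()` are base R functions.

```r
# Made-up input: a list of numeric vectors, one containing a missing value.
values = list(a = c(1.234, NA, 3.456), b = c(10.5, 20.27))

# lapply() passes each list element as the FIRST argument of the function.
# Extra arguments are written after the function, by name, and are forwarded
# unchanged on every iteration.
means = lapply(values, mean, na.rm = TRUE)

# The same idea with an anonymous function that takes a second argument.
rounded = lapply(values, function(x, digits) round(x, digits), digits = 1)
```

Naming the extra arguments (`na.rm = TRUE`, `digits = 1`) keeps them from being confused with the iterated element.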
The Assignment Operator
- Use “<-” or “=”?
- I use “=” for consistency (with other programming languages) with no issues
- Consistency even within R
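A short sketch of where the two operators behave the same and where they differ. The variable names are illustrative only.

```r
# At the top level, both operators perform an assignment.
a <- 1:10
b = 1:10
same_value = identical(a, b)

# Inside a function call the two differ.
# "=" names an argument: this call passes 1:10 as the argument x of median()
# and does not create a variable in the calling environment.
m1 = median(x = 1:10)

# "<-" performs an assignment and then passes the value along, so the
# variable y exists in the calling environment after this call.
m2 = median(y <- 1:10)
y_exists = exists("y")
```

For ordinary top-level assignments the choice is a matter of style; the difference only shows up when an assignment is written inside a function call.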
Environments and Variable Scoping in R
- Scoping refers to the visibility of variables in different environments
  - Global: Can be referenced from anywhere
  - Local: Accessible only within its environment
- http://adv-r.had.co.nz/Environments.html
- R uses “lexical” (a.k.a. static) scoping rules
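A minimal sketch of global versus local variables and of lexical scoping. All names (`scale_factor`, `scale_value`, `make_scaler`, `triple`) are hypothetical and chosen for illustration.

```r
scale_factor = 10            # global variable

scale_value = function(v) {
  doubled = v * 2            # local: exists only while scale_value() runs
  doubled * scale_factor     # not local, so R looks in the enclosing environment
}
result = scale_value(1)      # uses the global scale_factor

# Lexical scoping: a function looks up free variables where it was DEFINED,
# not where it is called.
make_scaler = function(scale_factor) {
  function(v) v * scale_factor   # remembers the scale_factor of make_scaler()
}
triple = make_scaler(3)

# Calling triple() from a place with a different scale_factor changes nothing.
call_elsewhere = function() {
  scale_factor = 1000
  triple(1)
}
lexical_result = call_elsewhere()
```

Here `result` is 20 and `lexical_result` is 3: `triple()` keeps using the value 3 from the environment in which it was created. The example also shows that a function can return another function.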
Microsoft Excel
Pros
- Quick editing and prototyping
- WYSIWYG plots
- Widely known and used program
- Many similar alternatives (e.g. Sheets, Calc, Numbers, etc.)
- Programming/automation with Visual Basic
Cons
- Worksheet limit of 1,048,576 rows by 16,384 columns (about 1M by 16K)
- 32-bit version limited to 2 GB of memory
- Manual editing error-prone
- Complex visualizations not possible
- Third-party packages close to non-existent (e.g. advanced statistics, machine learning, etc.)
- Closed source
R Programming Language
Pros
- Quick prototyping towards minimum viable product
- Can generate complex interactive visualizations (plots, reports, etc.)
- No fixed row or column limit; data size bounded mainly by available memory
- Easily reproduce results and adapt code to changes
- You can do almost anything purely in R
Cons
- Steep learning curve
- Third-party package developers not forced to follow a set standard
- Open source
Split-Apply-Combine Paradigm
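One way to sketch the three steps in base R, using the built-in `iris` data set. The choice of `split()`, `lapply()`, and `do.call(rbind, ...)` is an assumption for illustration; the variable names are made up.

```r
# Split: one data frame per species.
pieces = split(iris, iris$Species)

# Apply: summarise each piece independently of the others.
summaries = lapply(pieces, function(piece) {
  data.frame(
    Species = as.character(piece$Species[1]),
    n = nrow(piece),
    mean_sepal_length = mean(piece$Sepal.Length)
  )
})

# Combine: stack the per-species summaries into one data frame.
combined = do.call(rbind, summaries)
```

Because each piece is processed independently, the apply step is the part that can later be distributed across several workers.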
Multithreading Split-Apply-Combine
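A hedged sketch of running the apply step in parallel with the `parallel` package that ships with R. Note that `mclapply()` uses forked processes rather than threads and falls back to one core on Windows here; the core count of 2 and all variable names are assumptions for illustration.

```r
library(parallel)  # included with a standard R installation

# Split: one data frame per species.
pieces = split(iris, iris$Species)

# Forking is not available on Windows, so use a single core there.
n_cores = if (.Platform$OS.type == "windows") 1 else 2

# Apply, in parallel: each piece is handled by a worker process.
means_parallel = mclapply(
  pieces,
  function(piece) mean(piece$Petal.Length),
  mc.cores = n_cores
)

# The serial equivalent, for comparison.
means_serial = lapply(pieces, function(piece) mean(piece$Petal.Length))

# Combine: flatten the list into a named numeric vector.
combined = unlist(means_parallel)
```

For a data set this small the parallel version is not expected to be faster; the benefit appears when each piece takes noticeable time to process.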
Pre-loaded Data Sets in R
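A brief sketch of how the built-in data sets can be listed and inspected. `iris` and `mtcars` are picked as examples; the slides may have used others.

```r
# List the names of the data sets in the "datasets" package.
available = data(package = "datasets")$results[, "Item"]
has_iris = "iris" %in% available

# Data sets are available by name, without reading any file.
iris_dim = dim(iris)        # rows and columns
mtcars_cols = names(mtcars) # column names
first_rows = head(iris, 3)  # first three rows
```

These data sets are handy for trying out new code before applying it to real data.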
Toy Example: ..Apply and Combine
Summarize for me...
- Good and bad of spreadsheet programs?
- What is scoping?
- What is an anonymous function?
- T/F: You must comment all code.
- T/F: You can't return a function from a function.
- T/F: Split-Apply-Combine is exclusive to R.
- Answers: #4, FALSE; #5, FALSE; #6, FALSE
Questions?
- Just try new code and see what happens; this isn't wet lab
- Google: Don't just search with “R”, use “R language”
- Stack Overflow: Use tag “[r]”
Keep going, you got this!
Appendix: do.call(rbind, list)
- lapply( ) applies given function to each list element, iteratively
- do.call( ) applies given function to the list as a whole, once
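The difference between the two can be sketched as follows. The list `rows` and its contents are made up for illustration.

```r
# Made-up input: a list of three one-row data frames.
rows = list(
  data.frame(id = 1, score = 0.5),
  data.frame(id = 2, score = 0.7),
  data.frame(id = 3, score = 0.9)
)

# lapply(): nrow() is called three times, once per list element,
# and the three results come back as a list.
per_element = lapply(rows, nrow)

# do.call(): rbind() is called once, with the list elements as its arguments.
stacked = do.call(rbind, rows)

# Equivalent call written out by hand.
manual = rbind(rows[[1]], rows[[2]], rows[[3]])
same_result = identical(stacked, manual)
```

Writing the call by hand only works when the number of elements is known in advance; `do.call()` handles a list of any length.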
Appendix: NIH Proficiency Scale
- 1. Fundamental Awareness (basic knowledge): Common knowledge/understanding of basic techniques/concepts
- 2. Novice (limited experience): Expected to need help when performing this skill
- 3. Intermediate (practical application): Able to successfully complete tasks; expert required occasionally
- 4. Advanced (applied theory): Able to successfully complete tasks without assistance
- 5. Expert (recognized authority): Can provide guidance, troubleshooting, and answers related to this skill
- Source: https://hr.nih.gov/working-nih/competencies/competencies-proficiency-scale