Functions: Theory and Application (in R)
Athit Kao, PhD
UCI Bioinformatics Support Group - April 25, 2018
whoami
R enthusiast
PhD, Biomedical Sciences, UCI
IGB Biomedical Informatics Training program alumnus
Focus on proteomics mass spectrometry and bioinformatics
Prerequisites
Understand content from previous deck: http://learn.athitkao.com/presentation_functions1.html
Basic programming experience necessary
R and RStudio installed
Consider a simple use case for programming in your field, e.g.:
- Filter specific data from file
- Munge proprietary data format into CSV
- Calculate RPKM for sequencing data
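As a rough sketch of the last use case: RPKM is commonly defined as the reads mapped to a gene, scaled per kilobase of gene length and per million mapped reads. The function name `rpkm` and the counts below are made up for illustration and are not from the original deck.

```r
# Hypothetical helper: RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
rpkm = function(counts, gene_length_bp, total_mapped_reads = sum(counts)) {
  counts * 1e9 / (gene_length_bp * total_mapped_reads)
}

# Made-up example data for three genes
counts = c(geneA = 500, geneB = 1500, geneC = 8000)
gene_length_bp = c(geneA = 1000, geneB = 3000, geneC = 2000)

rpkm_values = rpkm(counts, gene_length_bp)
print(rpkm_values)
```

Wrapping the formula in a function means the same calculation can be reused on any count table without retyping it.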
Readability, part II
- Encapsulation via functions is an aspect of code legibility
- Others include:
- Documentation (comments and READMEs)
- Formatting (indentation and line wrapping)
- Variable naming (obvious and consistent names)
- There is a balance appropriate for you and your audience

Generalized Concept of Functions
Anonymous Functions (in lapply)
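A minimal sketch of an anonymous function passed to lapply; the list `measurements` is made-up example data.

```r
# Made-up input: a named list of numeric vectors
measurements = list(a = c(1, 2, 3), b = c(10, 20), c = c(5))

# The function is defined inline and never given a name
ranges = lapply(measurements, function(x) max(x) - min(x))
str(ranges)
```

This suits short, one-off operations; if the same logic is needed elsewhere, a named function is usually easier to read.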
Multiple Arguments (in lapply)
- First argument is always the iterated variable
- Name additional arguments explicitly
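One way this can look in practice, assuming made-up data in `values` and a hypothetical argument called `multiplier`: each list element becomes the first argument, and the named extras are forwarded unchanged to every call.

```r
# Made-up input data
values = list(a = c(1.234, 5.678), b = c(9.876))

# Each element is passed as the first argument of round();
# digits is supplied once, by name, and forwarded to every call
rounded = lapply(values, round, digits = 1)

# Same idea with an anonymous function and a named extra argument
scaled = lapply(values, function(x, multiplier) x * multiplier, multiplier = 10)

str(rounded)
str(scaled)
```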

The Assignment Operator
- Use “<-” or “=”?
- I use “=” for consistency with other programming languages and have had no issues
- Consistency even within R
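A small sketch of how the two operators behave; the variable names are arbitrary. At the top level both assign, while inside a function call "=" names an argument.

```r
# Both operators assign at the top level
a <- 5
b = 5
same_value = identical(a, b)

# Inside a call, "=" names an argument rather than assigning a variable:
# x here is the name of mean()'s first argument
avg = mean(x = c(1, 2, 3))

print(same_value)
print(avg)
```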
Environments and Variable Scoping in R
- Scoping refers to the visibility of variables in different environments
- Global: Can be referenced from anywhere
- Local: Accessible only within its environment
- http://adv-r.had.co.nz/Environments.html
- R uses “lexical” (a.k.a. static) scoping rules
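A minimal sketch of lexical scoping, using made-up names: a function looks up free variables in the environment where it was defined, not in the environment it was called from.

```r
label = "global"

# label is a free variable here; it is resolved where show_label was defined
show_label = function() label

caller = function() {
  label = "local to caller"   # local variable, not visible to show_label
  show_label()
}

result = caller()
print(result)
```

Here `result` holds the global value, because `show_label` was defined in the global environment and does not see the local `label` inside `caller`.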
Microsoft Excel
Pros
- Quick editing and prototyping
- WYSIWYG plots
- Widely known and used program
- Many similar alternatives (e.g. Sheets, Calc, Numbers, etc.)
- Programming/automation with Visual Basic
Cons
- Limit of 1,048,576 rows by 16,384 columns per worksheet
- 32-bit version limited to 2 GB of memory
- Manual editing error-prone
- Complex visualizations difficult or not possible
- Few third-party packages (e.g. for advanced statistics, machine learning, etc.)
- Closed source
R Programming Language
Pros
- Quick prototyping towards minimum viable product
- Can generate complex interactive visualizations (plots, reports, etc.)
- No fixed row or column limit; data size is bounded mainly by available memory
- Easily reproduce results and adapt code to changes
- You can do almost anything purely in R
Cons
- Steep learning curve
- Third-party package developers not forced to follow a set standard
- Open source
Split-Apply-Combine Paradigm
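One possible base R sketch of the paradigm, assuming a made-up data frame of protein intensities: split the data by group, apply a summary to each piece, then combine the results.

```r
# Made-up data: intensities measured for three proteins
df = data.frame(
  protein = c("P1", "P1", "P2", "P2", "P3"),
  intensity = c(10, 20, 5, 15, 8)
)

# Split: one data frame per protein
pieces = split(df, df$protein)

# Apply: summarize each piece
summaries = lapply(pieces, function(piece) {
  data.frame(protein = piece$protein[1], mean_intensity = mean(piece$intensity))
})

# Combine: stack the per-group results into one data frame
combined = do.call(rbind, summaries)
print(combined)
```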
Multithreading Split-Apply-Combine
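A hedged sketch of running the apply step in parallel with the `parallel` package that ships with R. Note that this uses separate worker processes rather than threads. The data and the cluster size of 2 are arbitrary choices for illustration.

```r
library(parallel)

# Made-up data: intensities measured for three proteins
df = data.frame(
  protein = c("P1", "P1", "P2", "P2", "P3"),
  intensity = c(10, 20, 5, 15, 8)
)

# Split: one numeric vector per protein
pieces = split(df$intensity, df$protein)

# Apply: distribute the pieces over a small cluster of worker processes
cluster = makeCluster(2)
means = parLapply(cluster, pieces, mean)
stopCluster(cluster)

# Combine: collapse the list of results into a vector
combined_means = unlist(means)
print(combined_means)
```

For data this small the overhead of starting workers outweighs any gain; the pattern pays off when each piece takes noticeable time to process.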