Functions: Theory and Practice (in R)
Athit Kao, PhD
UCI Bioinformatics Support Group - April 25, 2018
whoami
R enthusiast
PhD, Biomedical Sciences, UCI
IGB Biomedical Informatics Training program alumnus
Focus on proteomics mass spectrometry and bioinformatics
Prerequisites
Fundamental awareness of computers and programming
No programming experience necessary
R and RStudio installed (for 2nd half)
whoareyou
Who is using Windows, Linux, and/or Mac OS?
How many people have programmed before? What languages?
Who has written a function before?
What is a use case for your field that may require programming?
Objects and Variables
- In R, any data structure is considered an object:
  - Vector of length n
  - Matrix of size m x n
  - Function
- Generally in programming, a variable is a storage location identified by a name:
  - May contain data structure(s)
  - Its value can be modified by referencing the name
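A minimal sketch of the points above. The names `v`, `m`, and `f` are arbitrary choices for illustration:

```r
# A vector of length 3, stored in a variable named `v`
v <- c(1, 2, 3)

# A 2 x 3 matrix (m = 2 rows, n = 3 columns)
m <- matrix(1:6, nrow = 2, ncol = 3)

# A function is an object too, and can be stored under a name
f <- function(x) x + 1

# The value of a variable is modified by assigning to its name again
v <- v * 10

length(v)   # 3
dim(m)      # 2 3
class(f)    # "function"
```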

Purpose of Functions
- N.b. “Perfect is the enemy of progress”
- Encapsulates and organizes multiple tasks
- Generally, they make life easier by improving:
  - Readability
  - Reusability
  - Abstraction
Readability and Reusability
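One way to picture both ideas, as a sketch: the same steps typed out twice, then wrapped once in a function. The function name `standardize` and the data are hypothetical, chosen only for illustration.

```r
# Without a function: the same steps are typed out for every vector
a <- c(2, 4, 6)
a_scaled <- (a - mean(a)) / sd(a)
b <- c(10, 20, 60)
b_scaled <- (b - mean(b)) / sd(b)

# With a function: the steps are named once and reused
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}
a_scaled2 <- standardize(a)
b_scaled2 <- standardize(b)

all.equal(a_scaled, a_scaled2)   # TRUE
```

The function name also documents the intent, which the raw arithmetic does not.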
Writing Script vs. Using Console
Why are we typing out everything?...
- So you can make mistakes:
  - “That was case sensitive?”
  - “I forgot a parenthesis/bracket?”
  - “That wasn't a period/comma/semi-colon?”
- We learn better from mistakes
- We can help each other out here
Follow along with the red line numbers
Exercise #1: Our Function
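The exercise code itself is not included in this text. As a stand-in, here is a minimal sketch of what a small function can look like. The name `slow_square` and its `delay` argument are hypothetical; `Sys.sleep` is used only to imitate a slow computation.

```r
# Hypothetical example function (an assumption, not the slide's code):
# waits briefly to imitate a slow computation, then returns the square of x
slow_square <- function(x, delay = 0.1) {
  Sys.sleep(delay)   # pause for `delay` seconds
  x^2                # the last evaluated expression is the return value
}

slow_square(3)   # 9
```

The basic structure is a name, the keyword `function`, arguments in parentheses, and a body in braces.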
Exercise #2: Using lapply
- lapply() is a function that applies your function to each element of a list or vector and returns the results as a list
- A clean and concise way to run many values through your function in R (vs. writing a loop)
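A short sketch comparing a loop with lapply, assuming a hypothetical function `slow_square` (defined here so the example is self-contained):

```r
# Hypothetical function used for illustration
slow_square <- function(x, delay = 0.01) {
  Sys.sleep(delay)
  x^2
}

inputs <- 1:4

# Loop version: pre-allocate a list, then fill it one element at a time
loop_out <- vector("list", length(inputs))
for (i in seq_along(inputs)) {
  loop_out[[i]] <- slow_square(inputs[i])
}

# lapply version: one line, same result
lapply_out <- lapply(inputs, slow_square)

identical(loop_out, lapply_out)   # TRUE
```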

Exercise #3: Package "parallel"
Running code in parallel in R is very straightforward (the parallel package uses multiple R processes rather than threads)
We will swap lapply for a similar function that runs in parallel
Windows users will need a few extra steps
All operating systems need to start with the following:
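A minimal sketch of a common starting point, assuming the step meant here is loading the package and checking how many cores are available (the slide's own code is not shown in this text):

```r
library(parallel)          # ships with R, so no install.packages() is needed

# Number of CPU cores R can detect (may be NA on some systems)
n_cores <- detectCores()
n_cores
```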

Exercise #3: mclapply (for Linux/Mac OS)
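A sketch of swapping lapply for mclapply, assuming a hypothetical function `slow_square` and an arbitrary choice of 2 workers. mclapply relies on forking, which Windows does not support, so there `mc.cores` must be 1.

```r
library(parallel)

# Hypothetical function used for illustration
slow_square <- function(x, delay = 0.1) {
  Sys.sleep(delay)
  x^2
}

# 2 workers is an arbitrary choice; fall back to 1 on Windows (no forking)
n_workers <- if (.Platform$OS.type == "windows") 1 else 2

par_out <- mclapply(1:4, slow_square, mc.cores = n_workers)
ser_out <- lapply(1:4, slow_square)

identical(par_out, ser_out)   # TRUE: same results, computed in parallel
```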
Exercise #3: clusterApply (for Windows/Linux/Mac OS)

- Objects that the function uses beyond its own arguments must be sent to the workers with clusterExport (see “?clusterApply”)
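A sketch of the cluster approach, which works on all three operating systems. The function `slow_square`, the object `delay`, and the choice of 2 workers are assumptions for illustration.

```r
library(parallel)

delay <- 0.1   # an extra object that the function depends on

# Hypothetical function: `delay` is not an argument, so workers need a copy
slow_square <- function(x) {
  Sys.sleep(delay)
  x^2
}

cl <- makeCluster(2)   # start 2 worker R processes (arbitrary number)

# Copy `delay` from the current environment to every worker
clusterExport(cl, "delay", envir = environment())

out <- clusterApply(cl, 1:4, slow_square)

stopCluster(cl)        # shut the workers down when finished

unlist(out)   # 1 4 9 16
```

Each worker is a fresh R session, which is why anything beyond the function and its inputs has to be exported explicitly.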
Performance Benchmark
Exactly how much faster was that?
Time code with the function system.time()
Remember to wrap multi-line code in curly braces “{ }”
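A minimal timing sketch, assuming a hypothetical function `slow_square` that sleeps for 0.1 seconds per call:

```r
# Hypothetical function used for illustration
slow_square <- function(x, delay = 0.1) {
  Sys.sleep(delay)
  x^2
}

# The braces let system.time() time several lines as one expression
timing <- system.time({
  inputs <- 1:5
  out <- lapply(inputs, slow_square)
})

timing["elapsed"]   # wall-clock seconds; about 0.5 here (5 calls x 0.1 s)
```

The "elapsed" entry is the wall-clock time, which is the number to compare between the serial and parallel versions.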

Summarize for me...
- What is the basic structure of a function?
- How do functions improve code?
- Anything weird or counterintuitive?
- Are Sys.sleep, system.time, and lapply functions?
- Which is faster, lapply or mclapply?
- How much does R cost after the trial period ends?
- Answers: #4, they're all functions; #5, it depends; #6, R is free!!!
Questions?
- Just try new code and see what happens; this isn't wet lab
- Google: Don't just search with “R”, use “R language”
- Stack Overflow: Use tag “[r]”

Keep going, you got this!
App: Readability (cont.)
- Encapsulation via functions is an aspect of code legibility
- Others include:
  - Documentation (comments and READMEs)
  - Formatting (indentation and line wrapping)
  - Variable naming (obvious and consistent names)
- There is a balance appropriate for you/your audience:
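One way to see the trade-off, as a sketch: the same computation written tersely and written for a reader. Both function names are hypothetical.

```r
# Terse: short to type, but the intent is hard to recover later
f <- function(x) x[x > mean(x)]

# More legible: descriptive names, a comment, and consistent indentation
keep_above_average <- function(values) {
  # Keep only the values greater than the mean of all values
  average <- mean(values)
  values[values > average]
}

identical(f(c(1, 2, 9)), keep_above_average(c(1, 2, 9)))   # TRUE
```

For a quick one-off at the console the terse form may be enough; for shared or long-lived scripts the second form is usually easier to maintain.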

App: Nothing to Return = NULL
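A short sketch of the idea in the title: a function that has nothing to return gives back NULL. The function names are hypothetical.

```r
# A function with an empty body returns NULL
do_nothing <- function() {}
is.null(do_nothing())   # TRUE

# Functions called only for a side effect often return NULL as well;
# invisible(NULL) returns NULL without printing it at the console
say_hello <- function(name) {
  message("Hello, ", name)
  invisible(NULL)
}

result <- say_hello("R")
is.null(result)         # TRUE
```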
App: Function Argument Order
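A sketch of how argument order matters in R, using a hypothetical function `divide`. Arguments given without names are matched by position; named arguments can appear in any order.

```r
# Hypothetical function with one required and one default argument
divide <- function(numerator, denominator = 1) {
  numerator / denominator
}

divide(10, 2)                             # 5: matched by position
divide(denominator = 2, numerator = 10)   # 5: matched by name, order is irrelevant
divide(2, 10)                             # 0.2: swapped positions change the result
divide(10)                                # 10: denominator uses its default of 1
```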
App: "Functionals" in R
- Functional:
- lapply:
  - Function that applies another function to each element of a list
  - Returns a list
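A sketch, assuming the common definition of a functional as a function that takes another function as an argument. The name `apply_twice` and the example data are hypothetical.

```r
# A hand-written functional: takes a function `f` as an argument
apply_twice <- function(f, x) f(f(x))
apply_twice(sqrt, 16)   # 2

# lapply is a built-in functional: it applies `nchar` to each list element
words <- list("mass", "spectrometry")
word_lengths <- lapply(words, nchar)

is.list(word_lengths)   # TRUE
unlist(word_lengths)    # 4 12
```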
App: NIH Proficiency Scale
- 1. Fundamental Awareness (basic knowledge): Common knowledge/understanding of basic techniques/concepts
- 2. Novice (limited experience): Expected to need help when performing this skill
- 3. Intermediate (practical application): Able to successfully complete tasks; expert required occasionally
- 4. Advanced (applied theory): Able to successfully complete tasks without assistance
- 5. Expert (recognized authority): Can provide guidance, troubleshooting, and answers related to this skill
- Source: https://hr.nih.gov/working-nih/competencies/competencies-proficiency-scale