Setting up software for data analysis and statistical computing can be a major pain. Although setup on OSX is substantially less troublesome than on Windows (a lesson I’ve had to learn the hard way at the office), the process is nonetheless time-consuming and can be difficult to get right. I have finally arrived at a pretty clean, maintainable setup which perhaps merits sharing so that others can avoid spending as much time as I have configuring a new machine. This is intended to be a living document, so I will update this post as technologies or my processes change.
Here is a general overview of the tools I use and what will need to be installed:
XCode
Homebrew
Git
Unix Shell (get bash profile with aliases, prompt string, etc from Git)
Vim
Python (Anaconda or not to Anaconda)
R (with RStudio)
LaTeX
XCode
The first thing you need to do is install Apple’s command line tools (or XCode). This can be done by installing XCode from the App Store, going to Preferences > Downloads and then installing the command line tools.
On OSX 10.9 and later, you can also install the command line tools directly from the command line, without the full XCode download, with
xcode-select --install
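If you are not sure whether the tools are already on the machine, a small guard like the one below can save you from triggering the installer twice. This is only a sketch: the function name ensure_clt is made up for illustration, and it assumes that a successful xcode-select -p (which prints the active developer directory) means the tools are present.

```shell
# Install the command line tools only when no developer directory is configured.
# ensure_clt is a hypothetical helper name, not part of any Apple tooling.
ensure_clt() {
  if ! command -v xcode-select >/dev/null 2>&1; then
    echo "xcode-select not found; is this macOS?"
    return 0
  fi
  # xcode-select -p prints the active developer directory and fails if none is set
  if xcode-select -p >/dev/null 2>&1; then
    echo "Command line tools already installed at $(xcode-select -p)"
  else
    xcode-select --install
  fi
}

ensure_clt
```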
Homebrew
Now we can install Homebrew, a free package management system that simplifies the installation of software on the macOS operating system. To do this, open your Terminal app or whatever terminal emulator you use and enter:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Follow the command line prompts and enter your user password when instructed. By default, Homebrew will be installed such that we can use the brew command without having to type sudo and provide a password.
Run brew doctor afterwards to make sure the installation was successful and that Homebrew is working properly.
Claires-MacBook-Pro-2:Code clairesaint-donat$ brew doctor
Your system is ready to brew.
To make sure we have the latest Homebrew version and the latest formulas (which we should), we can run brew update && brew upgrade.
Now we are ready to install some software!
We can install Git with:
brew install git
Now we’re ready with Git and Homebrew!
Unix Shell
I do like to have htop to monitor system usage so I will install it with:
brew install htop-osx
It’s also helpful to have wget, which can be installed with:
brew install wget
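To confirm that everything installed so far actually landed on your PATH, a quick check along these lines may help. It is a minimal sketch: check_tools is a hypothetical helper name, and the list of tools simply mirrors the ones installed above.

```shell
# Report whether each named command can be found on PATH.
# check_tools is a hypothetical helper, not a Homebrew command.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found at $(command -v "$tool")"
    else
      echo "$tool: missing"
    fi
  done
}

check_tools brew git htop wget
```

Anything reported as missing can be installed (or reinstalled) with brew install before moving on.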
At the moment, I do not have a particular setup for my Unix shell environment, or dotfiles I am especially attached to, so I will move on for now. Watch this space, as more steps will be added here shortly.
Vim
Vim is a highly configurable text editor built to make creating and changing any kind of text very efficient. It is already included with Apple OS X but we will make sure we have the most recent version installed and configure the editor to our tastes.
To install the latest version, use Homebrew with the following command:
brew install vim && brew install macvim
brew link macvim
Now we should have the most recent version of Vim installed (if you would like to check, type vim --version). Additionally, we have also installed MacVim, a Vim port to Mac OSX that is meant to look better and integrate more seamlessly with your Mac. The Homebrew command above installs both the mvim CLI and the Mac application (which point to the same thing).
If you don’t already have a vimrc file that you like to use, I recommend the Ultimate Vim configuration. On the other hand, if you are just learning Vim and starting to incorporate it into your workflow, it might be a good idea to start with a blank vimrc file and get comfortable with the basic functionality of the editor without any plugins. In my first few weeks of using Vim, I found it useful to incorporate packages and new features into my workflow gradually.
If you would like to use the Ultimate vimrc, you can do so by cloning the repo and running the install script:
git clone --depth=1 https://github.com/amix/vimrc.git ~/.vim_runtime
sh ~/.vim_runtime/install_awesome_vimrc.sh
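One caveat: as far as I can tell, the install script writes a fresh ~/.vimrc, so if you already have one you care about it is worth copying it aside before running the script. A minimal sketch of that, where backup_if_present is a hypothetical helper name and the overwrite behaviour is my assumption rather than something documented here:

```shell
# Copy a file to "<file>.bak.<timestamp>" if it exists; do nothing otherwise.
# backup_if_present is a hypothetical helper, not part of the vimrc repo.
backup_if_present() {
  target="$1"
  if [ -f "$target" ]; then
    backup="$target.bak.$(date +%Y%m%d%H%M%S)"
    cp "$target" "$backup"
    echo "Backed up $target to $backup"
  else
    echo "No existing $target, nothing to back up"
  fi
}

# Run this before the install script
backup_if_present "$HOME/.vimrc"
```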
Python
I debated a lot on the best way to install Python on a system and ultimately landed on the Anaconda distribution. Anaconda is probably the most popular distribution for Data Science since it abstracts away many of the complexities associated with package management, helps keep dependencies updated and comes with many useful Python tools such as Jupyter Notebooks.
Install Anaconda
The easiest way to install the distribution is through the graphical installer.
Go to the Anaconda website and choose a Python 3.x graphical installer (at the time of writing this, I am installing Python 3.7). Select only one version of Python and do not install both! If you need to run a program in Python 2, virtual environments let you use a different version of Python for each project you are working on.
Locate the download and double-click it.
Click through by hitting “Continue” to install with default settings. Give your admin password when prompted.
I choose not to install Microsoft VS Code, a code editor that is offered with the distribution. You should feel free to install it if you like to use it.
Note that when you install Anaconda, the installer will automatically update your bash profile to add anaconda3 to your PATH.
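The install steps above mention virtual environments as the way to run Python 2 alongside Python 3. Once the installer has finished, creating one might look roughly like the sketch below. The helper name create_env and the environment name py27 are made up for illustration; conda create with --name and a python=<version> spec is the standard conda command for this.

```shell
# Create a named conda environment pinned to a given Python version.
# create_env is a hypothetical wrapper; the name and version are passed in
# so nothing is hard-coded.
create_env() {
  env_name="$1"
  py_version="$2"
  if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found on PATH" >&2
    return 1
  fi
  conda create --yes --name "$env_name" "python=$py_version"
}

# Example with a hypothetical environment name:
#   create_env py27 2.7
```

Depending on your conda version, you then switch into the environment with conda activate py27 or source activate py27.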
You can also install Anaconda from the command line with the following code:
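# Go to the home directory
cd ~
# You can change which Anaconda version you want at
# https://repo.continuum.io/archive/
curl https://repo.continuum.io/archive/Anaconda3-2018.12-MacOSX-x86_64.sh -o anaconda3.sh
# -b runs the installer in batch (non-interactive) mode, -p sets the install prefix
bash anaconda3.sh -b -p ~/anaconda3
rm anaconda3.sh
# Use $HOME rather than ~ here: a tilde inside double quotes is not expanded
echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bash_profile
# Reload the profile so the new PATH takes effect in this shell
source ~/.bash_profile
conda update conda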
Test your Installation
Note that you need to open a new Terminal window for the changes in your environment variables to take effect.
Run python --version in a Terminal window to make sure you have installed the correct version of Python and that your PATH variable has updated correctly. You should get output like:
Claires-MBP-2:~ clairesaint-donat$ python --version
Python 3.7.1
Type conda update conda to make sure that the conda command is working properly and that you are up to date.
Another helpful test is to confirm the installation of Jupyter. Run the command jupyter notebook to see if a notebook instance launches.
If you have multiple versions of Python installed on your computer, these tests may pick up the wrong one, and you will need to update your .bash_profile to point to the correct installation of Python.
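To see which Python installations are competing, it can help to list every python on your PATH in the order the shell searches it. The sketch below does that with a hypothetical helper, find_all_on_path; the first line it prints is the one that runs when you type python.

```shell
# Print every executable called "$1" found on PATH, in lookup order.
# find_all_on_path is a hypothetical helper name.
find_all_on_path() {
  name="$1"
  old_ifs="$IFS"
  IFS=":"
  for dir in $PATH; do
    if [ -x "$dir/$name" ]; then
      echo "$dir/$name"
    fi
  done
  IFS="$old_ifs"
}

find_all_on_path python
```

If the Anaconda path is not first in the list, move its export PATH line to the end of your .bash_profile so that it takes precedence.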
Python “Hello World”
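As a last sanity check, you can ask the interpreter to print a greeting straight from the Terminal. This is a minimal sketch: the PYTHON_BIN variable is just for illustration, and the fallback to python3 is an assumption for machines where the plain python name is not available.

```shell
# Use whichever interpreter name is on PATH; with Anaconda this should be "python".
PYTHON_BIN="$(command -v python || command -v python3 || true)"

if [ -n "$PYTHON_BIN" ]; then
  # -c runs the given string as a Python program
  "$PYTHON_BIN" -c 'print("Hello, World!")'
else
  echo "No Python interpreter found on PATH"
fi
```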
R
Most people, myself included, install RStudio alongside R. RStudio is the most widely used IDE for R and is generally considered one of the easiest ways to work with the language.
Install R and RStudio
There are several ways to do this but I choose to install using the command line and our newly-minted Homebrew.
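Type brew install r in the Terminal.
Update your R profile (~/.Rprofile, which R reads at startup) with the following command. The line being added is R code, so it belongs there rather than in your bash profile:
echo 'Sys.setlocale(category="LC_ALL", locale = "en_US.UTF-8")' >> ~/.Rprofile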
Install RStudio by entering
brew install --cask rstudio
Install R Packages & Change the RStudio environment
Open RStudio. On the left panel, you should have an R console and terminal. In the console, you can type an R command followed by enter and R will execute the command for you.
To install the tidyverse, type:
install.packages("tidyverse", repos = 'https://cran.us.r-project.org')
You can install a few more useful packages using the syntax
install.packages("<package_name>")
Some useful packages are:
XML: Read and write XML documents with R
jsonlite: Read and create JSON data tables with R
httr: A set of useful tools for working with HTTP connections
rvest: Very useful tool for web scraping
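If you would rather install all of these in one go from the Terminal, Rscript (which ships with R) can run the same install.packages call. The sketch below only builds and prints the call so you can review it first; r_install_call is a hypothetical helper name, and the CRAN mirror is the one used for the tidyverse above.

```shell
# Build an R call such as install.packages(c("a", "b"), repos = "...")
# from the package names given as arguments.
# r_install_call is a hypothetical helper name.
r_install_call() {
  pkgs=""
  for p in "$@"; do
    # Append each name in double quotes, comma-separated
    pkgs="${pkgs:+$pkgs, }\"$p\""
  done
  printf 'install.packages(c(%s), repos = "https://cran.us.r-project.org")\n' "$pkgs"
}

# Print the call for review; to execute it, run:
#   Rscript -e "$(r_install_call XML jsonlite httr rvest)"
r_install_call XML jsonlite httr rvest
```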
I also like to change the editing colors of RStudio. You can do this by going to Tools > Global Options > Appearance. I personally like the Cobalt theme with Monaco font to reduce eye strain.
R “Hello, World!”
To check that everything works, try creating a simple plot in RStudio.
In the same console panel, load the ggplot library by typing library(ggplot2). Then type in the command
ggplot(airquality, aes(x = Day, y = Ozone)) +
geom_point()
What this does is instruct R to use airquality, a pre-loaded dataset, and plot Day versus Ozone. The result should be a scatter plot with Day on the x-axis and Ozone on the y-axis.
LaTeX
Prepare to set aside a good amount of time (about an hour) to install LaTeX. Since you will have to download a large file, a high-speed internet connection is advisable.
To install LaTeX applications on your Mac:
Visit http://tug.org/mactex/ and click on the MacTeX download link. The file is about 3.2 GB, so it may take a little while (about 20 minutes) to download.
Once the file has downloaded, double-click mactex.pkg to begin the installation.
Read and accept the conditions and follow the on-screen instructions to install. The installation may take a few minutes.
After the installation is complete, you can delete the mactex.pkg file.
Since I don’t write very advanced LaTeX anymore, I have found that the editors that come with the MacTeX download are sufficient. TeXShop is my go-to editor for most LaTeX documents.
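For a LaTeX “Hello, World!” you can also check the installation from the Terminal. The sketch below writes a minimal document to a scratch directory and compiles it with pdflatex, which I am assuming is on your PATH after opening a new Terminal window; the file name hello.tex is arbitrary.

```shell
# Write a minimal LaTeX document into a scratch directory.
workdir="$(mktemp -d)"
cat > "$workdir/hello.tex" <<'EOF'
\documentclass{article}
\begin{document}
Hello, World!
\end{document}
EOF

# Compile it only if pdflatex is available.
if command -v pdflatex >/dev/null 2>&1; then
  (cd "$workdir" && pdflatex -interaction=nonstopmode hello.tex >/dev/null \
    && echo "Built $workdir/hello.pdf") || echo "pdflatex reported an error"
else
  echo "pdflatex not found on PATH"
fi
```

If the build succeeds, opening hello.pdf should show a single page with the greeting.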