Tim Hopper | machine learning engineering, photography, math jokes

Right Code, Right Place, Right Time

Tue Jan 29, 2019 by Tim Hopper in read

I gave a talk at PyData DC 2018 where I tried to articulate some reasons why companies building machine learning products under-invest in engineering and architecture. I’m very interested in feedback, pointers to other resources on this topic, and a general discussion about how to make more effective ML products.

Open in Google Docs

And the video:

Devops Empowered Data Science with Ansible

Thu Jul 12, 2018 by Tim Hopper in read

I gave a talk at SciPy 2018 loosely based on my Ansible tutorial. Here are my slides:

Open in Google Docs

And the video:

Python Plotting for Exploratory Data Analysis

Mon Jun 26, 2017 by Tim Hopper in read, technical

Plotting is an essential component of data analysis. As a data scientist, I spend a significant amount of my time making simple plots to understand complex data sets (exploratory data analysis) and help others understand them (presentations).

In particular, I make a lot of bar charts (including histograms), line plots (including time series), scatter plots, and density plots from data in Pandas data frames. I often want to facet these on various categorical variables and layer them on a common grid.

To that end, I made pythonplot.com, a brief introduction to Python plotting libraries and a “rosetta stone” comparing how to use them. I also included comparison to ggplot2, the R plotting library that I and many others consider a gold standard.

How I Quit My Ph.D. and Learned to Love Data Science

Tue Feb 14, 2017 by Tim Hopper in read, presentation

I recently gave a talk to the Duke Big Data Initiative entitled ~~Dr.~~ Hopper, or How I Quit My Ph.D. and Learned to Love Data Science. The talk was well received, and my slides seemed to resonate in the Twitter data science community.

I’ve started a long-form blog post with the same message, but it’s not done yet. In the meantime, I wanted to share the slides that went along with the talk.

Ultralight Backpacking for the Ultratall

Fri Apr 29, 2016 by Tim Hopper in read

I created a single-page website to collect notes on one of my other hobbies: ultralight backpacking. In particular, it collects notes on ultralight gear for the very tall.

Using Twitter Data to Gain Insights into E-cigarette Marketing and Locations of Use

Fri Nov 06, 2015 by Tim Hopper in read, technical

When I was at RTI International, I worked on an exploratory analysis of Twitter discussion of electronic cigarettes. A paper on our work was just published in the Journal of Medical Internet Research: Using Twitter Data to Gain Insights into E-cigarette Marketing and Locations of Use: An Infoveillance Study.¹

Marketing and use of electronic cigarettes (e-cigarettes) and other electronic nicotine delivery devices have increased exponentially in recent years fueled, in part, by marketing and word-of-mouth communications via social media platforms, such as Twitter. … We identified approximately 1.7 million tweets about e-cigarettes between 2008 and 2013, with the majority of these tweets being advertising (93.43%, 1,559,508/1,669,123). Tweets about e-cigarettes increased more than tenfold between 2009 and 2010, suggesting a rapid increase in the popularity of e-cigarettes and marketing efforts. The Twitter handles tweeting most frequently about e-cigarettes were a mixture of e-cigarette brands, affiliate marketers, and resellers of e-cigarette products. Of the 471 e-cigarette tweets mentioning a specific place, most mentioned e-cigarette use in class (39.1%, 184/471) followed by home/room/bed (12.5%, 59/471), school (12.1%, 57/471), in public (8.7%, 41/471), the bathroom (5.7%, 27/471), and at work (4.5%, 21/471).
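As a quick sanity check on the figures quoted in the abstract, here is a minimal Python sketch that recomputes each percentage from its counts. The dictionary name, the labels, and the `percent` helper are my own for illustration; they are not from the paper.

```python
# Counts as quoted in the abstract: (numerator, denominator, decimal places reported).
# The labels and this layout are illustrative assumptions, not the paper's.
quoted_counts = {
    "advertising": (1_559_508, 1_669_123, 2),
    "in class": (184, 471, 1),
    "home/room/bed": (59, 471, 1),
    "school": (57, 471, 1),
    "in public": (41, 471, 1),
    "bathroom": (27, 471, 1),
    "at work": (21, 471, 1),
}


def percent(numerator, denominator, digits):
    """Return numerator/denominator as a percentage rounded to `digits` places."""
    return round(100 * numerator / denominator, digits)


for label, (numerator, denominator, digits) in quoted_counts.items():
    print(f"{label}: {percent(numerator, denominator, digits)}%")
```

Note that the six place categories listed do not add up to all 471 place-mentioning tweets; the abstract only names the most common ones.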

¹ I have no idea what “Infoveillance” means.

Nonparametric Latent Dirichlet Allocation

Fri Oct 16, 2015 by Tim Hopper in read, technical

Today is my last day at Qadium. Next week, I am joining the data science team at Distil Networks.

I’ve been privileged to work with Eric Jonas on the data microscopes project for the past 8 months. In particular, I contributed the implementation of Nonparametric Latent Dirichlet Allocation.

I published a collection of notes on nonparametric Bayesian methods and Latent Dirichlet Allocation at dp.tdhopper.com. I hope this will be useful to other students and researchers of these methods.

Notes on Dirichlet Processes

Fri Oct 16, 2015 by Tim Hopper in read, technical

I have published some notes on the Dirichlet distribution, Dirichlet processes, Gibbs sampling for mixture models and nonparametric mixture models, and the Gibbs sampler for nonparametric Latent Dirichlet Allocation.

This is related to my work on a Python implementation of Hierarchical Dirichlet Process Latent Dirichlet Allocation.

Profile in Computational Imagination

Tue Sep 01, 2015 by Tim Hopper in read

I recently had the honor of being interviewed by Michael Swenson for his interview series called “Profiles in Computational Imagination”. I talked a bit about my current work, my wandering road to data science, and my love for remote work. You can read it here.

Introduction to PySpark

Sat Feb 28, 2015 by Tim Hopper in read, technical, presentation

I gave a talk at the Research Triangle Analysts meetup about PySpark. It wasn’t recorded, but you can see the IPython notebook I presented from.