In today's interview, I am joined by Timothy Dobbins, who is a Principal Data Scientist at Trilliant Health. Before joining Trilliant Health, Timothy worked in several companies in various data science and engineering roles. As a result, Timothy has amassed a wealth of experience in the field. This interview explores his data science journey and his advice for budding data scientists.
Thank you, Timothy, for taking the time to talk to us today. Let’s start with your journey. How did you get into data science?
I studied economics and statistics in college and wanted to work at a think tank or be a writer. I was interested in economics and how it impacts decision making. I then read Nate Silver’s The Signal and the Noise while taking an econometrics class and got hooked on statistics for the same reasons I was hooked on economics! I picked up programming the following semester from a friend who was doing machine learning. Once I started programming, I never looked back. I didn’t think computer programming was possible for me since I had never been exposed to it before then, but shortly after learning it, I realized anyone could learn it.
How did you transition from software engineering to data science?
I was interested in economics and statistics before graduating but didn’t have a writing or think-tank job lined up as graduation approached. I was fortunate to respond to a mass email from a professor asking if there was interest in working as an intern for a health tech data startup. I got the interview, then the job as an intern, and then they brought me on full-time to do data science and software engineering work on a data-intensive web application. To this day, I consider this my biggest stroke of luck yet: the job that set me up for success in a very skills-driven industry. I mostly did software engineering work there, but I started looking for full-time data science roles once I got comfortable with my programming skills. So my transition was from full-stack web development on very data-intensive apps to an engineering-focused data scientist role.
What is the role of internships and contract work in getting a full-time role?
I had two internships. During my first internship, I was literally paid to learn how to program. My second internship kicked off my entire career. I’m biased, but internships are absolutely crucial to finding a career you’ll enjoy and learning how to be successful in that career. I also did a few contract jobs, which helped me focus purely on skills. I think both play important roles in building skills.
What’s the most exciting project you have worked on at Trilliant Health?
I split my time pretty evenly between machine learning and engineering. One of my biggest projects is converting our git repos into a monorepo. That has been a big learning curve since I’ve never worked with monorepos. Other than that, I’ve been learning a lot about packaging code and writing code that someone else will use in a different environment. That’s a lot different than just writing code that runs in a local notebook.
Tell us about how you are currently using AWS as part of your machine learning stack.
I’m not currently using Amazon Web Services (AWS) at work, but I rely heavily on it for side projects and have used it exclusively in the past. I’ve really gotten into AWS serverless tech lately. I’m currently working on a project that uses S3 to store data, Athena to query the data, ECR to host the container image for a Python function, Lambda to run that function and connect to my database, and API Gateway to serve the data to a user. And the cost is super low compared to doing this on an EC2 instance! I’m just about fully on the serverless train at this point.
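For readers unfamiliar with the Lambda and API Gateway part of a stack like this, here is a minimal sketch of what a Python handler behind an API Gateway proxy integration can look like. It is an illustration, not Timothy's actual code: the `fake_query` function, the `user_id` parameter, and the returned rows are all hypothetical stand-ins for the real data-access layer, which is injected so the handler can run without AWS credentials.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical data-access signature: takes a user id, returns rows.
QueryFn = Callable[[str], List[Dict[str, Any]]]


def fake_query(user_id: str) -> List[Dict[str, Any]]:
    """Return canned rows; a real version would query the data store."""
    return [{"user_id": user_id, "value": 42}]


def make_handler(query: QueryFn) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda-style handler around an injected query function."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        # The proxy event may carry None when no query string is present.
        params = event.get("queryStringParameters") or {}
        user_id = params.get("user_id")
        headers = {"Content-Type": "application/json"}
        if not user_id:
            return {
                "statusCode": 400,
                "headers": headers,
                "body": json.dumps({"error": "user_id is required"}),
            }
        rows = query(user_id)
        # Proxy integrations expect statusCode, headers, and a string body.
        return {
            "statusCode": 200,
            "headers": headers,
            "body": json.dumps({"rows": rows}),
        }

    return handler


handler = make_handler(fake_query)
```

Injecting the query function keeps the handler testable locally; in a deployed function, the injected callable would be whatever talks to the database.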
You have used R and Python for natural language processing tasks. Do you use both languages for a single project? Which of the two do you mostly use? Should data scientists learn both languages?
I’ve mostly used Python for NLP. These days you can do almost everything you need with either, so whatever the data scientist is comfortable with and whatever the team supports should be considered.
What process do you take when tackling a data science task, for example, predicting a claim's route by converting claim notes to Word2Vec?
I just posted about this today on LinkedIn :) Here’s my old process compared to today:
This is how I did data science projects in 2017:
❌ Met with the business one time and got everything I needed.
❌ Started writing code instantly to get data.
❌ Performed my ritual EDA—histograms, cool maps, descriptive statistics (with no real goal).
❌ Trained my model on a target variable I created.
❌ Impressed myself with how well my model performed.
❌ Showed results to the business and got dumbfounded looks because my analysis and model missed the mark.
This is how I do data science projects in 2022:
✅ Meet with the business lead a few times for deep dive sessions on the business problem.
✅ Meet with the data experts to learn about the data.
✅ Meet with the business to show early findings—maybe histograms, maybe descriptive stats, maybe how many values are null for the features they said were relevant. Do this a few times.
✅ Once I have a reliable target variable, and trusted data, start designing a solution with paper and pencil.
✅ Once I know how the model will interact with the business and other systems, I start developing.
✅ Meet with the business to tell them how bad the initial results are and how we can improve.
What role does statistics play in the life of a data scientist?
I’m a huge proponent of learning statistics. Statistics is the language of uncertainty, and most things we do as data scientists involve uncertainty—mostly trying to navigate or explain it. Too often, I see statistics neglected in data science in favor of just throwing all your data into an algorithm, running it through a train/test/hold-out split, and letting the best algorithm win. But what about when you don’t have labeled data and need to curate a dataset manually? How do you know that your sample is representative of the population? What about when you’re dealing with heterogeneity, and your predictions for some records are better than others? How do you combat that?
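One simple way to start looking at the heterogeneity question is to break the error down by segment rather than relying on a single overall score. The sketch below is only an illustration under assumed inputs: the group names and numbers are made up, and mean absolute error is one of many possible metrics.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mae_by_group(records: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Mean absolute error per group for (group, actual, predicted) records."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for group, actual, predicted in records:
        totals[group] += abs(actual - predicted)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up example data: predictions are much worse for one segment.
records = [
    ("urban", 10.0, 11.0),
    ("urban", 20.0, 19.0),
    ("rural", 10.0, 16.0),
    ("rural", 20.0, 12.0),
]
errors = mae_by_group(records)
```

A large gap between groups, as in this toy data, is a hint that one aggregate metric is hiding records the model serves poorly.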
You have built a RESTful API for serving machine learning models. What role do software engineering skills play in the life of a data scientist?
I’m also a very big proponent of data scientists learning software skills.
What are the top three best practices one should keep in mind when deploying machine learning models?
- Identifying and preventing data leakage.
- Detecting and preventing training-serving skew.
- All the engineering best practices that will keep your model from breaking.
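The first two points can be sketched in a few lines. This is a simplified illustration, not a prescribed method: the scaling statistics are learned from training data only (so nothing from the test set leaks into preprocessing), and a crude check flags serving data whose mean drifts far from the training mean. The function names, the sample values, and the threshold of 3 standard deviations are all assumptions chosen for the example, and the check assumes the feature is not constant.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Scaler = Tuple[float, float]  # (training mean, training standard deviation)


def fit_scaler(train: List[float]) -> Scaler:
    """Learn scaling statistics from the training split only."""
    return mean(train), pstdev(train)


def transform(values: List[float], scaler: Scaler) -> List[float]:
    """Apply the training statistics to any split (train, test, or serving)."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


def skew_alert(serving: List[float], scaler: Scaler, threshold: float = 3.0) -> bool:
    """Flag serving data whose mean is far from the training mean."""
    mu, sigma = scaler
    return abs(mean(serving) - mu) / sigma > threshold


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [6.0, 7.0]

scaler = fit_scaler(train)             # statistics come from train only
test_scaled = transform(test, scaler)  # test data never influences the scaler
```

In practice, a skew check would usually cover many features and compare full distributions rather than a single mean, but the principle is the same: whatever was fitted at training time is reused unchanged at serving time, and differences between the two are monitored.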
You have written data science and machine learning articles. What role has this played in your data science and machine learning career?
This has been significant for my learning and staying motivated to learn more. I started a blog a few years ago, and that was fun, but writing on LinkedIn has been the most impactful. I’ve been doing that for a few months now and can already see the benefits. I don’t know what my short-term goal is, but building relationships on LinkedIn is the best way to open opportunities for the future. I may decide I want to teach an online course, write a book, or collaborate on side projects. I’m not sure exactly what my goal is for writing on LinkedIn. I just know (for me) it’s better to do it than not.
What would your learning journey look like today if you were just starting out on the path to becoming a machine learning engineer? Which resources would you use, and where would you find them?
If I started over, I would focus on programming since that was the hardest for me to learn. I learn best with projects. When I have an end goal, I know all the little tasks I need to get to the next checkpoint.
In your opinion, which are the most underrated and overrated skills in data science and machine learning?
Generally, the most overrated is deep learning. Focusing too much on it will leave you with misaligned expectations when you enter the industry.
The most underrated is statistics. Not knowing basic stats will make it tough to find valuable insights.
Your posts on LinkedIn are very insightful, and I would recommend readers go through them, including the comments. Why did you start creating content on LinkedIn? How do you come up with the ideas that you share?
Thank you! I started creating content on LinkedIn about five months ago after seeing a couple of my friends go through a content journey where they started with little engagement and worked their way to a respected voice on the platform. Once I saw people I knew doing it, I thought I could do it too. I had no idea about the other benefits I’d experience: the scaled networking, the relationships, the opportunities, and best of all, the constant learning. Honestly, I was pretty burned out with learning and working about a year ago. Creating content was a spark that I desperately needed at the time.
My ideas come from a lot of different places. I’ll be riding in my car and have an idea and write down a sentence or two, then when I’m back at the office, I’ll start to flesh it out more. Sometimes the ideas work, and sometimes, they don’t. I also get a lot of ideas from others in the data community on LinkedIn—through reading their content or engaging with them. That’s a big source of ideas—conversing with the community!
Tell us about your work at Gridsearch Consulting.
I run a consultancy (Gridsearch Consulting) where I work with startups looking to build an MVP data product.
What motivated you to start building machine learning apps for and with your sons?
I love building things! I mostly started building projects for my sons as a way to learn. I’ve built many projects:
Achoo. An inhaler usage tracker and predictor for my asthmatic son that uses weather, allergen, and climate data to predict when he’ll need his inhaler and then email the school nurse. That way, we can get in front of it, so he doesn’t have to miss too much class.
I’ve also built laser tag guns with him.
Another project was LADNER, where I used NLP to web scrape his teachers’ websites to Locate Assignments and Dates using Named Entity Recognition. All of his teachers used free form text to announce when assignments were due. This was my way of automatically grabbing his assignments, adding them to my calendar, and connecting them to his Trello board so that he never “forgot” about them.
Where can people find you online?
I’m most available on LinkedIn! Let’s connect.