Written by Tobias Zwingmann – Senior Data Scientist, Author, Mentor and Speaker
Tobias Zwingmann is an experienced data scientist with a strong business background and author of the book AI-Powered Business Intelligence. Tobias has more than 15 years of professional experience in a corporate setting, where he has been responsible for building out data science use cases and developing a company-wide data strategy. He is also a co-founder of the German AI start-up RAPYD.AI and is on a mission to help companies adopt machine learning and artificial intelligence faster while achieving meaningful business impact.
It has been quite a journey since I embarked on my career as a data scientist. I’ve read countless books, articles and papers that have helped shape me into the data scientist I am today. In this blog post, I want to share with you the 10 most important books of my data journey. Each book has taught me something valuable that I use in my work on a daily basis.
Online courses are great, but nothing beats a good book in hand. Books are still one of the fastest ways for me to gain knowledge. I love the feeling of flipping through the pages and seeing my progress as I highlight and take notes.
That said, here are the top books that have helped me in my data journey:
Statistics books
It’s important to have a solid foundation, which is why a good stats book is essential. Besides providing that foundation, stats books serve as easy look-ups.
1. Discovering Statistics Using R Discovering Statistics Using R is the most accessible statistics book I’ve ever come across. It covers all the aspects you want to know for an introduction to statistics, and provides code examples in R, which are great for practice.
2. Statistical Rethinking This book adds a Bayesian stats perspective. It provides excellent examples of how to use Bayesian methods for various types of data analysis and gives an overall view of Bayesian concepts.
3. Practical Statistics for Data Scientists Practical Statistics for Data Scientists shouldn’t necessarily be read cover to cover, but it’s a great resource for quickly looking up specific statistical tests and their applications. The code examples in Python and R help you get started quickly.
4. An Introduction to Statistical Learning This is the more accessible version of The Elements of Statistical Learning (which is also highly recommended). If you want to understand the main statistical concepts for machine learning applications in a thorough yet understandable way, this book is a good place to start.
Business books
Data science isn’t just about statistics and analysing data. It’s also about solving business problems. That’s why it’s important to know the main areas where data science plays a role in business.
5. Data Science for Business Data Science for Business provides a comprehensive overview of how data science can be used in business. It covers a wide range of topics beyond machine learning and explains specific real-world examples.
Technical/Machine learning books
After you’ve acquired solid theoretical knowledge and played with some code on your local computer, it’s time to take the next step and figure out how to develop end-to-end data science projects. These projects typically run on a cloud platform, and the biggest providers are AWS, Google Cloud Platform and Microsoft Azure. So it’s good to choose a book that focuses heavily on the actual platform you’re using.
6. Data Science on the Google Cloud Platform When I started to build end-to-end data science applications, I quickly fell in love with Google Cloud Platform. That’s why Data Science on the Google Cloud Platform was, of course, the go-to choice for me. This book has a strong end-to-end engineering focus and was excellently written by Google’s top expert Valliappa (Lak) Lakshmanan.
Alternative: Try *Data Science on AWS* by Chris Fregly and Antje Barth if you want to use, well, AWS. The book follows a similar end-to-end concept and is likewise written by top experts.
7. Approaching (Almost) Any Machine Learning Problem This book isn’t at all what you’d expect from a typical textbook. Rather, it’s a collection of thoroughly explained code templates that show how to tackle machine learning projects with Python and how to organise your code. It was written by Abhishek Thakur, the world’s first 4x Kaggle Grandmaster, and is a great technical resource.
8. Machine Learning Bookcamp This book isn’t so well known, but I like it a lot because it’s practical and takes you through various real-world use cases of machine learning, explaining everything from A to Z. As the name suggests, it’s a whole bootcamp in the form of a book. Highly recommended!
Data Analysis
Data science isn’t always about machine learning. In fact, many data science use cases require basic statistical analysis and data-wrangling skills.
This is where the following book comes into play:
9. Advancing into Analytics Advancing into Analytics revisits core analytics concepts and teaches how to go beyond Excel using Python and R. It provides a comprehensive overview of essential data analysis techniques that are applicable to a wide range of industries.
Visualisations
Strong storytelling and communication skills are as important as technical knowledge, if not more so. Having a good book that covers data visualisation best practices is therefore key.
10. Storytelling with Data This is hands down the best book you can get to learn what beautiful, effective visualisations need to look like to tell powerful data stories.
These are the top books that have helped me on my data journey. I hope you enjoyed this list and found some inspiration for your very own data journey!
PHUSE Top Tip: To expand your clinical and pharmaceutical knowledge and utilise the techniques explored in these books, take a look at PHUSE Education or contact education@phuse.global to get involved.