Data Science Tips And Life Hacks
Here are some tips and life hacks on how to survive data science. As they say, work smart, not hard. As a matter of fact, my favorite tasks are the tasks that I don’t have to do. It is possible that task, as a word, is defined in some forgotten dictionary as something to avoid. Like, has anyone in the history of humankind used the word task in a positive context? Anyway, going off on a tangent. Let’s get started.
Learn the domain where you work
It is easy to acquire a pile of numbers and compute something. Just calculate the average value or apply a statistical model. However, to be useful, connect your work with the domain. What do your results mean? What is the actual insight within your results? What kind of actions can be taken based on the results? Are the results valid? Are the results significant? Is there anything fishy about the results? Are your statistical assumptions valid? How do you define and measure success? Are type I errors more serious than type II errors? Do you need to explain the logic behind the results or is a black box sufficient? It is hard to make an impact and be relevant inside a vacuum.
Are you solving the correct problem?
Make sure that you are solving a valuable problem in the first place. Otherwise you are fighting against windmills, competing in a race that you can’t win. The “why” is at least as important as the “how”.
The value of the best solution for a wrong problem is a big number times zero
Some people will have a hard time communicating with you or even understanding what you do. Your world can be very abstract to others. However, you still need to interact with them and figure out what they need. For example, you can create mockups of deliverables, from documents to user interfaces. This will help people to latch onto something concrete and have a real, grounded basis for discussion. They can relatively easily tell whether you are moving in a good direction, and help you find one if you are not. The quality of feedback, interaction, requirements and extracted domain knowledge will skyrocket.
Understand people
Humans are irrational and implicitly anthropomorphize everything. Like, “this restaurant makes excellent pasta” or “did you hear about the artificial intelligence that detects cats in images.” However, organizations, businesses and companies are not living creatures. They consist of humans. Clients and customers are humans, even in a business-to-business context. You operate in a sandbox filled to the brim with Homo sapiens. There is no escape. Soft skills are very important and required for efficient collaboration. Before you point your finger at me, I acknowledge that many people together form a system that exhibits its own behavior patterns, like how birds fly together or how atoms form solid matter together.
Find your supporters in your organization, possibly the people who decided to hire you. Try to stay on friendly terms with them, remember their birthdays and so on. Supporters are your hedge against problems. Let’s say that your employer experiences financial challenges. If decision makers in the organization like you, and they understand your value for the business, then you might not be the first one to be let go.
Try to get along with people, try to be likeable. Smile. Don’t make your work life harder than it has to be. It sucks when you build an awesome, provably beneficial system and no one wants to use it. You effectively end up doing nothing. Zero impact and no business value. Therefore, it is of paramount importance to obtain supporters, testers, collaborators, feedback givers, users and internal buy-in for your work.
Data science is an abstract, hazy concept for many people. Just remember that they are not stupid or less intelligent than you. They are professionals in their own field. Who knows, maybe they are just not interested in data science. It is on you to help them understand your value. Never grow an unhealthy ego. No one wants to feel stupid. The universe does not owe success and fame to you because you can implement the latest tricks in machine learning. Teach the intuition of relevant concepts in data science to your work buddies. Do not mind repeating the same things over and over again. Data science is a complex area and you can provide an easy, abstracted interface to utilize the power of data science.
Empathy is not only for hippies. Empathy is seriously important and gives you tools to build a personal connection with the person in front of you. Step into her shoes. Understand her and think from her perspective. What motivates her? What makes her tick? What are her hobbies? What are her values? Now you can use familiar vocabulary, build analogies, frame impact into relevant outcomes and become interesting. You can help her to understand you, what you do and why your work is important.
Data? Evaluation metrics?
You need to define what you are doing before acquiring data. If you have to produce insight or create something that works, practice designing experiments and applying the scientific method. Put the science in data science. For example, define hypotheses and how you will measure effect sizes to reject or accept them.
Be careful about the measures of effect size, which are typically evaluation metrics in a machine learning context. For example, accuracy is a bad choice for data with class imbalance. Let’s say that you are building a system to diagnose a disease that 0.1% of the population has. You get an expected accuracy of 99.9% simply by diagnosing everyone as healthy. You provide accurate diagnoses while denying treatment to every person who actually has the disease.
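The disease example can be checked with a few lines of plain Python. This is only a sketch: the population of 10,000 people is made up to match the 0.1% figure, and recall is brought in here as one possible alternative metric, not the only one.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means 'has the disease'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all diagnoses that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Share of sick people that the system actually finds."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up population: 10,000 people, 0.1% (10 people) have the disease.
y_true = [1] * 10 + [0] * 9990
# The lazy "model": diagnose everyone as healthy.
y_pred = [0] * len(y_true)

print(f"accuracy: {accuracy(y_true, y_pred):.3f}")  # accuracy: 0.999
print(f"recall:   {recall(y_true, y_pred):.3f}")    # recall:   0.000
```

Accuracy looks great and recall shows that not a single sick person was found. Which metric is the right one still depends on the domain, for example on how costly a missed diagnosis is compared to a false alarm.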
Now you need to define the required data. You need to understand the business context and domain to figure out whether the data already exist, can be bought or have to be collected. Take the following dimensions into account: cost, time, effort, legislation and availability.
Don’t tune parameters forever
Create a baseline result using a simple model that is easy to implement, understand and deploy in production. The baseline result is a yardstick to determine progress, and sometimes you’ll end up noticing that the simple model is all you need. For example, use logistic regression. Set a time limit for experiments and be systematic with tracking your experiments. Sometimes you tilt, go into a parameter frenzy and eventually end up repeating experiments you have already run. Don’t tweak model parameters or hyperparameters for months if the incremental improvement is not justified by its business potential. Sometimes significant improvements come from paradigm changes in model structure and from more accurate statistical assumptions gained through a better understanding of the domain.
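One way to stay systematic is to keep a small log that remembers every parameter setting you have tried and refuses to start new runs after a time limit. The sketch below uses only the standard library. `ExperimentLog` and `fake_train_and_score` are hypothetical names made up for illustration, and the scoring function is a stand-in for real model training.

```python
import json
import time


class ExperimentLog:
    """Remembers which parameter settings were tried and enforces a time budget."""

    def __init__(self, time_budget_seconds):
        self.deadline = time.monotonic() + time_budget_seconds
        self.results = {}  # canonical parameter string -> score

    @staticmethod
    def _key(params):
        # Sort keys so {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same run.
        return json.dumps(params, sort_keys=True)

    def out_of_time(self):
        return time.monotonic() >= self.deadline

    def run(self, params, train_and_score):
        """Run train_and_score(params) unless it was already run or time is up."""
        key = self._key(params)
        if key in self.results:
            return self.results[key]  # repeated experiment: reuse the stored score
        if self.out_of_time():
            raise TimeoutError("experiment time budget is used up")
        score = train_and_score(params)
        self.results[key] = score
        return score

    def best(self):
        """Return (params, score) of the best run so far."""
        key = max(self.results, key=self.results.get)
        return json.loads(key), self.results[key]


def fake_train_and_score(params):
    # Stand-in for real training and evaluation; the score peaks at C == 1.0.
    return 1.0 - abs(params["C"] - 1.0) / 10.0


log = ExperimentLog(time_budget_seconds=60)
for c in [0.1, 1.0, 10.0, 1.0]:  # the last setting is an accidental repeat
    log.run({"model": "logistic_regression", "C": c}, fake_train_and_score)

best_params, best_score = log.best()
print(len(log.results), best_params, best_score)
```

The accidental repeat is served from the log instead of being trained again, so only three runs are stored. In real work the log would also be written to disk, but the idea stays the same.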
Feeling of loneliness and organizational maturity
One of the common reasons for a data science person to quit is the feeling of loneliness. If you are looking for a new job, find out if the potential work places have peers and mentors available. Businesses that are not mature in data science might not be easy on your mental health unless you have a very entrepreneurial personality. Ask questions during the recruitment process to assess the maturity, for example by probing how they approach model deployment. Remember that you can also find mentors and data friends outside your work environment.
Prepare for help requests and to turn down some of them
If people find you useful, then they are likely to want your input and effort on many things. You might have to start turning some of them down to get your own work done. A data science manager can be helpful to have around since she can gate the incoming tickets on your behalf. She is someone who guards your time. A data science manager is probably found in organizations that are more mature in data science.
Never stop learning
Learn to learn, and never stop. It will keep rewarding you. You will become much more productive if you have a wide skill set. Maybe even a tenfold increase in productivity. I personally like the idea of a T-shaped skill set where a person knows a lot about some topics and a bit about many topics. It is very empowering to be able to create a concrete prototype of any idea: app, website, machine learning algorithm and so on.
Do not overfit your machine learning skills by being an expert in using only one specific method. Your favorite algorithm is not a hammer and not every problem is a nail. Have knowledge of many different methods, understand underlying theories and form mathematical connections between machine learning topics. Then you can design a suitable solution for a problem, and not the other way around.
Learn the basics of software development, deployment and some best practices. Write unit tests and integration tests, especially for feature extraction pipelines. Implement active learning to speed up model training and data annotation. Learn the basics of containers and cloud platforms. You’ll learn to structure your software in a deployment-friendly manner, especially if someone else deploys it. This person will quickly understand how your software is used, she will save a lot of time and have a longer, happier life.
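A unit test for a feature extraction step does not have to be fancy. Here is a minimal sketch with Python’s built-in unittest module. The `extract_features` function and its fields (`text`, `price`) are invented for this example, so swap in your own pipeline step.

```python
import math
import unittest


def extract_features(record):
    """Hypothetical feature extractor: turn a raw record into numeric features."""
    text = record.get("text") or ""
    words = text.split()
    price = max(record.get("price", 0.0), 0.0)  # clamp negative prices to zero
    return {
        "n_words": len(words),
        "mean_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "log_price": math.log1p(price),
    }


class ExtractFeaturesTest(unittest.TestCase):
    def test_typical_record(self):
        features = extract_features({"text": "data science tips", "price": 0.0})
        self.assertEqual(features["n_words"], 3)
        self.assertAlmostEqual(features["mean_word_length"], 5.0)
        self.assertEqual(features["log_price"], 0.0)

    def test_missing_and_empty_fields(self):
        # Missing or None text must not crash or divide by zero.
        for record in ({}, {"text": None}, {"text": "   "}):
            features = extract_features(record)
            self.assertEqual(features["n_words"], 0)
            self.assertEqual(features["mean_word_length"], 0.0)

    def test_negative_price_is_clamped(self):
        features = extract_features({"text": "x", "price": -5.0})
        self.assertEqual(features["log_price"], 0.0)


# Run the tests in-process and keep the result object around.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExtractFeaturesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The edge cases are the valuable part. Empty text, missing fields and impossible values are exactly what real data will throw at your pipeline sooner or later.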
Math bugs are different from code bugs
A compiler or an interpreter can catch problems in source code but you are on your own with problems in mathematics and logic. Your neural network will happily keep training using random, shuffled labels, even though it’s not what you want. Code defensively and do not trust the data that come in. Depending on the domain and context, it is sometimes better to just crash the program after a failed logic assertion. Review your code carefully.
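Defensive coding can be as simple as validating the data before it reaches the model and crashing loudly when something is off. The sketch below is one possible set of checks, not a complete list. `validate_training_data` is a hypothetical helper, and it assumes features arrive as a list of numeric rows. Note that it will not catch shuffled labels, only structural problems.

```python
import math


def validate_training_data(features, labels, allowed_labels):
    """Raise ValueError on data problems that no interpreter will flag."""
    if not features:
        raise ValueError("no training rows")
    if len(features) != len(labels):
        raise ValueError(f"{len(features)} feature rows but {len(labels)} labels")
    n_columns = len(features[0])
    for i, (row, label) in enumerate(zip(features, labels)):
        if len(row) != n_columns:
            raise ValueError(f"row {i} has {len(row)} columns, expected {n_columns}")
        if any(math.isnan(v) or math.isinf(v) for v in row):
            raise ValueError(f"row {i} contains NaN or infinity")
        if label not in allowed_labels:
            raise ValueError(f"row {i} has unexpected label {label!r}")
    if len(set(labels)) < 2:
        raise ValueError("only one class present, nothing to learn")


# Passes silently on clean data.
validate_training_data([[1.0, 2.0], [3.0, 4.0]], [0, 1], allowed_labels={0, 1})

# Crashes on purpose when a NaN sneaks in.
try:
    validate_training_data([[1.0, float("nan")], [3.0, 4.0]], [0, 1], {0, 1})
except ValueError as error:
    print("rejected:", error)  # rejected: row 0 contains NaN or infinity
```

Whether to crash or to skip the bad row depends on the domain. In a batch training job a crash is usually cheaper than a model that quietly learned from garbage.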