Keeping Data Science Simple
Originally published in Towards Data Science.
- Pick the right tool for the job, which is probably the simplest.
- Focus on solving real-world problems, and make sure you can measure your solution.
- Repeat consistently.
Data Science is a field filled with fancy-sounding things. Concepts both simple and complex get cool names and let you make claims like being “Powered by AI”. While this isn’t necessarily a problem, it can mislead aspiring Data Scientists. As in any field, fancy names and complex concepts get much of the attention, which can give the impression that the cutting edge is where the party is at. But Data Science isn’t all about who’s got the most convolutional neural networks or the deepest learning. Crazy AI skills may prove valuable in certain situations, but Data Science is about picking the right tool for the job and using it effectively to solve real-world problems. That last part, solving real-world problems, should always be the ultimate goal. Consistently hitting that goal is the foundation of a Data Science career.
The right tool for the job is often the simplest one, at least at first. Complex models break more easily, their behaviour can be hard to develop an intuition for, and implementing them is time-consuming. Focus on simplicity and you’ll start more projects, which will probably have a higher success rate too. When it comes to starting a career in Data Science, having a track record of providing real-world value will give you a tremendous boost. An education in machine learning, statistics, or programming will provide you with an essential base of skills, but proving you can apply those skills to real-world problems is far more valuable.
Starting simple projects and seeing them through to the end will help you build that track record. A project can start as simple as a SQL query, so try making a list of possible projects in and out of your company and having a go at each. Generating and measuring value should take precedence over almost everything else. It’s important to remember that how you measure your solution depends on what problem you’re trying to solve. If I’m trying to create a mortgage approval model, I probably care more about screening out fraudsters than about avoiding the occasional wrongful rejection of an honest applicant. How I measure my solution should reflect those priorities. This approach isn’t just the best way to build a portfolio; it’s a vital part of any Data Science work. Talk to stakeholders, get to the root of a problem, and find the best way to measure the value your solution provides.
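To make the mortgage example concrete, here is a minimal sketch in Python of a measurement that reflects those priorities. It assumes a binary label where 1 marks a fraudulent application. The function names, the toy labels, and the 10:1 cost ratio are hypothetical illustrations, not figures from a real lending model. The point it shows is that a model with higher accuracy can still be the worse choice once a missed fraudster is weighted more heavily than a wrongly rejected applicant.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # fraudsters correctly screened out
    fp: int  # honest applicants mistakenly rejected
    fn: int  # fraudsters mistakenly approved
    tn: int  # honest applicants correctly approved


def confusion(y_true, y_pred):
    """Count outcomes, treating 1 ('fraud') as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return Confusion(tp, fp, fn, tn)


def accuracy(c):
    """Share of all decisions that were correct, ignoring which errors were made."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def fraud_recall(c):
    """Share of actual fraudsters that the model screened out."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def weighted_cost(c, cost_fn=10.0, cost_fp=1.0):
    """Total cost where approving a fraudster (FN) hurts more than rejecting
    an honest applicant (FP). The 10:1 ratio is a hypothetical placeholder;
    the real weights should come from talking to stakeholders."""
    return cost_fn * c.fn + cost_fp * c.fp


# Toy data: 1 = fraudulent application, 0 = honest application.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
cautious = [1, 1, 0, 1, 1, 0, 1, 0]  # rejects more, catches every fraudster
lenient = [0, 0, 0, 1, 0, 0, 1, 0]   # rejects less, misses one fraudster

for name, preds in [("cautious", cautious), ("lenient", lenient)]:
    c = confusion(y_true, preds)
    print(f"{name}: accuracy={accuracy(c):.3f}, "
          f"fraud recall={fraud_recall(c):.3f}, cost={weighted_cost(c):.1f}")
# cautious: accuracy=0.750, fraud recall=1.000, cost=2.0
# lenient: accuracy=0.875, fraud recall=0.667, cost=10.0
```

Judged on accuracy alone, the lenient model looks better; judged on the cost that actually matters for this problem, the cautious one wins. Plain counts and a weighted sum are enough here, which is in keeping with starting simple.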