Getting Started with Data Science
By Kim Nilsson, CEO Pivigo
I get asked for advice a lot. Advice on how to best do data science. This is not an easy topic to advise on, especially given the few precedents and limited sharing of best practice in a young industry. Hence, I thought I would share the top three dilemmas that I get approached with by tech executives, and the best practice solutions that I have come across.
- Build a warehouse and then start, or start immediately?
One of the first questions a tech executive is faced with is whether to spend umpteen millions of dollars on building out a Hadoop cluster/data warehouse/data lake (or similar expensive piece of kit) first, and then try to get value from the neat data pool, or to get their hands dirty straight away. As with all of my points here, I have seen both cases in roughly equal measure.
It is tempting to want all the technology in place before letting analysts loose on the data. After all, it makes sense to, for example, make sure you have all the ingredients for a recipe before starting to cook a dish. But it may not be the best idea in this case. Why? Well, for one, you are wasting time. Building a warehouse solution can take months, if not years, and in the meantime your competitors are optimising their businesses and launching new products and services. Furthermore, the optimal technology solution may not be clear from the start. Your first, small projects will already deliver value, and give you guidance on whether you are heading in the right direction.
Tip 1: Don’t wait! Get started straight away, even if it is just with proof-of-concepts, exploratory analysis or prototypes.
- Hire one, or hire a team?
So you want to get started, but now you need to decide: do I hire one experienced, talented, expensive superstar data scientist (a.k.a. a unicorn), or do I spend my budget on a team of junior data scientists who can work together? Well, the advantage of an experienced hire is that they have plenty of past work to judge them by, and they can possibly get up to speed a little bit faster with your business needs. The disadvantages are that they can be set in their ways, i.e. try to fit your problem into their solution; that they come with one skillset; and that they are incredibly hard to find and thus expensive.
The best solution in my book is thus to go for a small team of less experienced, but talented and motivated individuals with diverse skills and fresh mindsets. A team of three already allows for good coverage of all the necessary skillsets, e.g. a person more focused on coding and databases, a person focused on stats and analytics, and a third person with visualisation and reporting skills. Taking people earlier in their careers typically means they will work that bit harder to prove themselves and learn, and they will strive to quickly become productive team members.
Tip 2: Hire a team of three younger data scientists with diverse skillsets.
- Centralised, or integrated teams?
Congratulations! You are now the proud manager of a data science team. Next big question, where to put them? Do we keep them close together, in one “data taskforce unit”, or do we send them out on secondments to different business units?
Keeping the team together ensures that learning and best practice spread quickly, incentivises collaboration and increases motivation. It will also allow you to manage the team’s resources better, and combine skills on projects. On the other hand, it can certainly complicate communication with stakeholders outside the team, and there is a risk of resentment towards a perceived “ivory tower” of knowledge and power. Placing team members in different business units risks lowering motivation and reducing shared learning, but can facilitate the learning of domain knowledge and result in better communication. There is no one solution to this question that will work for all, but it is important to be aware of the challenges around team placements, and to put structures in place that counteract them.
Tip 3: Look for a hybrid team organisation, with a good mix of internal support and mentoring within the data science team and open communication lines with the rest of the organisation.
To summarise, if your organisation is not making good use of its data today, you will be in trouble tomorrow. Sooner or later all organisations will need to become more data-driven, and those that start earlier will have the advantage. It is not easy to get started, but there are tools and services available for those with the right level of commitment. Start now, build a team with diverse skills, get advice on how to manage teams of data rockstars, and join the data revolution.