<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Oriol Mirosa</title>
        <link>https://www.oriolmirosa.com</link>
        <description>Your blog description</description>
        <lastBuildDate>Wed, 22 Apr 2026 22:33:48 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Oriol Mirosa</title>
            <url>https://www.oriolmirosa.com/favicon.ico</url>
            <link>https://www.oriolmirosa.com</link>
        </image>
        <copyright>All rights reserved 2026</copyright>
        <item>
            <title><![CDATA[Data Science For the Rest of Us]]></title>
            <link>https://www.oriolmirosa.com/blog/data-science-for-the-rest-of-us</link>
            <guid>https://www.oriolmirosa.com/blog/data-science-for-the-rest-of-us</guid>
            <pubDate>Fri, 22 Feb 2019 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>A few days ago, I read <a href="https://towardsdatascience.com/succeeding-as-a-data-scientist-in-small-companies-startups-92f59e22bd8c">Randy Au's Medium story “Succeeding as a data scientist
in small companies/startups”</a> and, as I wrote in a <a href="https://medium.com/@oriolmirosa/this-really-resonates-with-me-as-the-chief-data-scientist-i-e-bae64b927094">comment</a> to Randy’s article, I
immediately felt that he was touching on a subject that requires much more
attention from the data science community than it has received so far. Let me
put it in all caps to impress upon you the importance of this point: MOST OF
US DO NOT WORK AT GOOGLE.</p>
<p>Now, of course, I don't mean that there's anything wrong with working at
Google (not that I'm resentful or anything), and when I say Google I really
mean “company with tons of data, massive infrastructure, and a well-established
data-driven culture.” I imagine that working at one of these companies is like
a kind of data science nirvana where, when you turn on your workstation in the
morning, you jump onto a soft blanket and are smoothly carried over a collection
of well-defined problems, servers with endless memory and GPUs, and supportive
bosses who can't wait to congratulate you on what an amazing job you did
running that BERT network that made the company another million dollars by
lunchtime. OK, I might be getting a little carried away about what life is like on
the other side of this particular digital divide, but the point that I'm trying
to make is that all the talk about “data scientist” being the sexiest job of
the 21st century and all the hype about the latest machine learning techniques
do project an image of the job that doesn’t correspond
to life in the trenches.</p>
<p>Sure, we repeat <em>ad nauseam</em> that data scientists spend most of their time
cleaning data, and we're starting to hear some voices saying that not
everything is rosy in the data science job landscape (I heard that Vicki
Boykis's recent <a href="https://veekaybee.github.io/2019/02/13/data-science-is-different/">article</a> has given minor heart attacks to more than one
about-to-graduate bootcamp student). Yet when we look at the resources
available to those who are considering a career in data science, and
particularly the marketing put forth by the many education startups that offer
job guarantees, it is hard not to think of data science as a glamorous career
path that will allow you to ride your well-honed math skills into a future of
success and riches.</p>
<p>Now, don't get me wrong: data science DOES present amazing opportunities and,
if the n=1 of my humble experience can be used as evidence, it IS a wonderful
job full of excitement and intellectual stimulation (I will take it any day
over my old gig as a university professor). Yet I believe that most people
getting into the field have little idea of what doing data science involves
in the thousands of small and new organizations that want to jump onto the
data science bandwagon and are pulling the trigger on hiring data scientists
without having a clear idea of what they want us for. Hence the importance of
Randy's article, and the reason I decided to piggyback on his efforts and share my
views on the matter: instead of writing yet another tutorial on logistic
regression, let's tell people who are considering a job in data science about
the non-glamorous parts of our days so that they don't feel cheated and know
what they're getting into.</p>
<p>I am planning on writing a whole series on the challenges of doing data
science in small companies and startups in the future, but today I want to
focus on what I see as a cornerstone of data science work: helping to build
an organizational culture that puts rigorous data analysis at the very center
of all decision-making. Now, reading this might put you in a catatonic state,
but not all companies share this view of the role of data. What's more, many
companies think that they are data-driven when in fact their use of data is
spotty and inconsistent at best. And yes: they are hiring data scientists.</p>
<p>The reason I'm mentioning this is not to lambast and shame these companies;
quite the contrary. The point I'm trying to make is that instead of dismissing
these firms and looking for work elsewhere, we data scientists should see doing
what these companies need as a central part of our job. A well-established
company with a healthy data culture will know exactly what it needs and hire
someone to do a specific job (cue the growing literature on the
<a href="https://towardsdatascience.com/ode-to-the-type-a-data-scientist-78d11456019">2</a>,
<a href="https://www.leozqin.me/managing-data-scientists-know-your-archetypes/">3</a>,
<a href="https://hbr.org/2018/11/the-kinds-of-data-scientist">5</a>,
<a href="https://blog.paralleldots.com/data-science/7-types-job-profiles-makes-data-scientist/">7</a>, or
<a href="https://www.dezyre.com/article/10-different-types-of-data-scientists/179">10</a>
different types of data scientists). But many organizations are not there yet,
and a data scientist working for them will not be able to spend all her time
building ultra-exciting billion-layer ANNs. Often, not only will the data or the
infrastructure be missing, but so will any expectation of analysis, or even the
perceived need for it. What will our brand-new data scientist do when no one seems to really
care about data in company X? Here's a (surely incomplete) list of what I think
she should do:</p>
<ol>
<li>
<p><strong>Ask for data all the time, like, constantly:</strong> there are probably plenty
of talented people in the organization who know their jobs inside and out but who,
at best, have never thought about the systematic use of data to inform their
decisions, or, at worst, are actively hostile to anyone else telling them how
to do their job. Please don't get cocky and act all high and mighty here just because
you know what a gradient boosting machine is. Collect any data you can get your
hands on, ask people to document what they do even if it's in random
spreadsheets, put it all into a form that is conducive to analysis, and try to gather
insights that will improve the way the firm works. Present them in a humble and
constructive, non-threatening way, and people will eventually start using them.
Then you'll be flooded by requests for help and you'll never have lunch away
from your desk again.</p>
</li>
<li>
<p><strong>Ask questions all the time:</strong> often, people don't know what they don't
know, and it's by asking questions that the need for new and/or systematic data
will emerge. This will also provide you with the kind of domain knowledge that a
data scientist needs to be truly effective. Your random forest will not be of
much help if you don't understand what kinds of features are relevant and what
is important to predict and why. You need to understand the business and its
needs in order to think about metrics and measurement, and the people whose work
you are going to impact need to perceive you as someone who understands the
challenges they face. When there are no clear answers to your questions, you will
have an opportunity to push for data collection, systematization, and analysis.</p>
</li>
<li>
<p><strong>Ask why all the time:</strong> many times, things are done in a certain way because
they've always been done like that. Sometimes there is a good reason
behind the current workflow. Other times, the good reason is way, way in the past
and is no longer relevant. And still other times there is no reason at
all, and the status quo is just the solidified outcome of ad hoc decisions and
serendipity. Learn to identify where there is no good rationale for how things
are done and where an in-depth exploration of the issue with data is warranted.
If you can build a good case for a change of course (i.e., one that will bring
tangible benefits to the organization), people will start to see the upside of
your approach and are more likely to embrace it.</p>
</li>
<li>
<p><strong>Ask for help all the time:</strong> the goal of the previous points is to help your
organization develop a data culture. If you do your work in isolation and only
emerge from your cave once in a while to show everyone else how smart you are, you
will not only get less done and have little actual impact, but people will regard
you with overt animosity (and might steal your yoghurt from the fridge). For one thing,
you are bound to miss crucial details, and those gaps will make it easy to dismiss your work.
But more importantly, people need to see you working with them on the issues that
they think are relevant. That's why you ask them for data and ask them questions.
If those who will be affected by the product of your work have a sense of
ownership over it, they are much more likely to engage with and adopt it.</p>
</li>
<li>
<p><strong>Ask for the chance to share what you found all the time:</strong> you've probably
heard before that communication is central to data analysis (what's the point of
your cool new algorithm if nobody understands why you used it or what it does?),
but I cannot overemphasize the importance of communicating what you learned. Notice
that I said “what you learned,” not “what you did.” Nobody gives a damn about
what a support vector machine does, and the communication I'm talking about has
little to do with the technical aspects of your work. It's great that you are so
excited about the beauty of backpropagation, but the people in your firm want to
know how to solve the problems that they deal with every day in sales,
production, marketing, customer relations, or whatever other area of the
organization you didn't even know existed. Talk to people not only to learn from
them, but also to figure out <em>the best way to talk to them</em> about what you do and
why it matters. Don't take for granted that everyone wants to hear you
pontificate because you have a Metis certificate of completion stapled to your
cubicle wall. Engage in a two-way dialogue and show how what you do is useful to
others <em>in their own terms</em>.</p>
</li>
</ol>
<p>All these points might not sound very glamorous, and to some will even sound like
mere common sense. But I have yet to see a school that teaches these concepts,
or an interviewer who tries to assess the associated skills. Let me insist once
more: I'm not saying that technical skills are not essential, and in many jobs
they may be the most important asset for a data scientist. But in many contexts
your non-technical skills will be a much better predictor of your success, both
personally and in terms of your impact on the organization. What's more, your soft
skills and your effect on the company's culture are possibly the only way that
you can get to do the fancy stuff down the road once the data, the infrastructure,
and the need for analysis are in place and well-established.</p>
<p>I hope that you also picked up on my subtle hints and can see that all the
'asking' I advocated for above needs to be accompanied by a certain attitude
that combines humility with genuine engagement. In a company, learning is a
collective process, and the main role of the data scientist is to help the company
learn and improve by using all the data that it is able to gather, process and
leverage. You cannot do this on your own, and knowing your place in the corporate
culture and how you can have a meaningful impact on it is one of the most
important soft skills a data scientist can bring to the table.</p>]]></content:encoded>
            <author>oriol@oriolmirosa.com (Oriol Mirosa)</author>
        </item>
    </channel>
</rss>