The Limits of Data December 21, 2020
Posted by Peter Varhol in Algorithms, Machine Learning, Strategy. Tags: bias, data
I’ve been teaching statistics and operations research since, well, the mid-1980s I guess, to students of more or less sophistication. In most cases, I try to add some real-world context to what most students consider a dry and irrelevant topic, even as I realize that most people are in the room because it’s required for their degree.
Except that over the last few years, statistics and analytics have shown themselves to be anything but irrelevant. As data has become easier to collect and store, and faster processing has turned data into information in real time, more and more scientific, engineering, business, and management professionals are at least trying to use data to make more justifiable decisions.
(I casually follow American professional football, and have been amazed over the last few years to see disdain for any sort of analytics turn into a slavish following and detailed definition of obscure analytical results.)
And at least some people seem to be paying attention. I still get a lot of “I’m not a math person” or “I make my decisions without considering data” but that is becoming less common as people recognize that they are expected to justify the directions they take.
In general this is a good trend. An informed decision is demonstrably better than one based on “gut feel.” As the saying goes, you are entitled to your own opinion, but not your own facts. Decisions based on analytics won’t always be the right ones, but they will be better than what many professionals are making today.
But data is not a universal panacea. First, any data set we use may not accurately represent the problem domain. There may have been data collection errors, or the data may not be strongly related to the conclusion you want to draw. For example, there may be a correlation between intelligence and income, but the true determiner may well be education, not intelligence. In these circumstances, our analytics can lead us to the wrong conclusion.
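The intelligence/income example can be made concrete with a minimal simulation. This is a sketch with entirely made-up variables and effect sizes, not real study data: income is driven only by education, education tracks intelligence, and the raw correlation between intelligence and income looks substantial until education is held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical model: income depends on education, not directly on intelligence.
intelligence = rng.normal(size=n)
education = intelligence + rng.normal(size=n)   # education tracks intelligence
income = education + rng.normal(size=n)         # income driven by education alone

raw = np.corrcoef(intelligence, income)[0, 1]

def residuals(y, x):
    """Remove the linear effect of x from y (simple regression residuals)."""
    slope = np.cov(x, y)[0, 1] / np.var(x)
    return y - slope * x

# Partial correlation: correlate what's left after regressing out education.
partial = np.corrcoef(residuals(intelligence, education),
                      residuals(income, education))[0, 1]

print(f"raw correlation:     {raw:.2f}")      # sizable
print(f"partial correlation: {partial:.2f}")  # near zero once education is controlled
```

A naive analyst looking only at the raw correlation would conclude intelligence drives income; controlling for the confounder tells a different story.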
Our data can also be biased. Machine learning systems often do a poor job of facial recognition on underrepresented races, for example, causing high levels of misidentification. This is primarily because we don’t have good training data on the facial characteristics of those races. Years ago, Amazon built an algorithm to identify promising candidates for IT jobs that was trained overwhelmingly on data from male applicants. The algorithm quite naturally came to the incorrect conclusion that only men made good IT workers.
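The mechanism is worth spelling out, because it requires no malice anywhere in the pipeline. A toy sketch (the counts below are invented for illustration, not Amazon’s actual data): a model that simply learns hiring rates from skewed historical records reproduces the skew as if it were a fact about candidates.

```python
from collections import Counter

# Hypothetical historical hires, overwhelmingly male -- the training data.
training_hires = [("male", "hired")] * 90 + [("female", "hired")] * 10

# A naive "model" that scores candidates by how often their group was hired.
counts = Counter(gender for gender, _ in training_hires)
total = sum(counts.values())
score = {gender: counts[gender] / total for gender in counts}

print(score)  # the skew of the data, not of reality, drives the gap
```

The model is faithfully learning its data; the data is what misrepresents the world. That distinction is exactly why “the algorithm decided” is never a complete explanation.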
So while data can make our decisions more accurate, that’s only the case when we apply it correctly. And that’s not as easy as it sounds.
Will We Ever Be Ready for Smart Cities? July 12, 2019
Posted by Peter Varhol in Machine Learning, Technology and Culture. Tags: data, smart cities
In theory, a smart city is a great idea. With thousands of sensors and real time data analytics, the city and its inhabitants can operate far more efficiently than they do today. We have detailed traffic, pedestrian, and shopping patterns, right down to the individual if we so choose.
We can use data on traffic flows to route traffic and coordinate traffic lights. Stores can operate at times that are convenient to people. Power plants can generate electricity based on actual real time usage. Crime patterns can be easily identified, with crime avoidance and crimefighting strategies applied accordingly. The amount of data that can be collected in a city with tens of thousands of sensors all feeding into a massive database is enormous.
This is what Google (Alphabet) wants to do in a development in Toronto through its company Sidewalk Labs, which last year won the right to take a neighborhood under development and make it a smart city. This article notes that urban planners have rushed to develop the waterfront area and build the infrastructure needed to create at least a smart neighborhood that demonstrates many of the concepts.
But now Toronto is pushing back on the whole idea. The primary issue is one of data control and use. A smart city will generate enormous amounts of data, not just on aggregates of people, but on identifiable images and people. It seems this was left as a “to be determined” item in initial selection and negotiations. Now that Sidewalk Labs is moving forward to build out the plan, the question of the data has come to the forefront. And what is occurring isn’t pretty.
The answer that seems to be popular is called a “data trust”, a storage and access entity that protects the data from both the government and the vendor supplying the smart services. Alphabet’s Sidewalk Labs claims to have produced the strongest possible data protection plan; Toronto and activist groups strongly disagree. Without seeing the plan, I can’t say, but I can say that I would be concerned about a commercial vendor (especially one connected to Google) having any access to this level of data for any purpose. It is truly the next level of privacy breach: mining ever-deeper personal data for commercial ends. And do any of us really think that Google won’t ultimately do so?
Now, I was raised in rural America, and while I am comfortable enough whenever I am in a city, it is not my preferred habitat. It seems to me that there is a tradeoff between privacy and the ability to use data on individual activities (even aggregated) to make day to day activities more efficient for the city and its occupants. Despite the abstract advantages in the smart cities approach, I don’t think we have the trust necessary to carry it out.
Who Is the Data For? March 1, 2017
Posted by Peter Varhol in Publishing, Technology and Culture. Tags: big data, data
Andreas Weigend recently published an intriguing book called Data for the People, in which he argues that we are not going to stop the proliferation of the personal data used to categorize and market to us, so we should embrace this change and find ways to use collected data to our advantage.
He cites many of the data points that I do in my blog posts, but comes to different conclusions. In particular, my own thoughts are to limit my use of personal data on a case-by-case basis. His own conclusion is that we need to accept the proliferation of personal data as inevitable, and embrace it in a way that makes it valuable to us.
He makes a lot of sense, albeit from a point of view different from mine, and I won’t dismiss it out of hand.
However, I would like to contrast that with another article, one that points out that when we choose our friends through shared data, we lose our ability to connect with our physical neighbors.
So, here is what I think. I think Andreas is correct, strategically. But I am simply not sure how we get from where we are to where he wants to be. I don’t think it will be clean and neat. And it certainly won’t be convenient, especially for those of us who are at least part way through our lives.
I’ve used this quote before, but it remains apropos. From Crosby, Stills, and Nash: “If you can’t be with the one you love, love the one you’re with.”
Weapons of Math Instruction February 15, 2017
Posted by Peter Varhol in Education, Technology and Culture. Tags: data, Math, statistics
That old (and lame) joke, of course, refers to Al-Gebra (algebra). But the fear of math is very real. For decades, many have hidden behind the mantra “I’m not a math person” without exploring the roots of that statement. This article, by Jenny Anderson on Quartz, offers hope that we may be able to move on from this false rhetoric.
I never understood math early, but I always loved it. Post-BA degree, I taught myself calculus, and obtained an MS in applied math.
I taught various math and statistics courses to college students for 15 years. I would like to think that my enthusiasm and down-to-earth explanations at the very least made it tolerable to them. I still remember one student saying to me, “In elementary school, the teacher would preface the math lesson by saying, ‘I don’t want to do this any more than you do, but we have to, so let’s get it over with.’” I think teaching is a very big part of the problem. If teachers don’t like the topic, neither will their students.
I especially came to appreciate word problems, something that few if any students liked. My original issue with word problems was that if I read one once and didn’t immediately see the solution, I was stumped. So I developed a method of dealing with them, and taught it to my students: read the problem first, simply to understand it, without seeking a solution. Then read it again, and highlight any information that seems pertinent. Then read it a third time, pulling out that information to see how it might lead to a solution. Then try a formula. If it doesn’t seem to work out, discard it and start back at step one.
It is not hard, folks, though it does require overcoming age-old biases, as well as a willingness to be open to new ways of thinking. Anderson notes that learning and applying math and quantitative methods requires a growth mindset. That is, a willingness to get something wrong, and learn from it for the future.
As we move (or already have moved) into a data-driven world that requires an intimate understanding of how data shape our lives, we can no longer plead ignorance, or lack of ability. If we plead lack of interest, we will be left behind.
Alexa, Delete My Data December 25, 2016
Posted by Peter Varhol in Software platforms, Technology and Culture. Tags: Alexa, data, privacy
As we become inundated this holiday season with Amazon ads for its Echo Dot voice system and Alexa artificially intelligent assistant, I confess I remain conflicted about the potential and reality of AI technology in our lives.
To be sure, the Alexa commercials are wonderful. For those of us who grew up under the influence of George Jetson (were they really only on TV for one season?), Alexa represents the realization of something that we could only dream about for the last 50+ years. Few of us can afford a human assistant, but the intelligent virtual assistant is a reality. The future is now!
It’s only when you think it through that it becomes more problematic. A necessary corollary of an intelligent virtual assistant is that the assistant has enough data about you to recognize what are at times ambiguous instructions. And because it holds that data, along with current information about us, we can imagine issues with instructions like these:
“Alexa, I’m just going out for a few minutes; don’t bother setting the burglar alarm.”
“Alexa, turn the temperature down to 55 until January 15; I won’t be home.”
I’m sure that Google already has a lot of information on me. I rarely log into my Google account, but it identifies me anyway, so it knows what I search for. And Google knows my travel photos, through Picasa. Amazon also identifies me without logging in, but I don’t buy a lot through Amazon, so its data is less complete. Your own mileage with these and other data aggregators may vary.
To be fair, the US government has long been in possession of an incredible amount of information on most adults. I have held jobs and am a taxpayer; I have a driver’s license (and pilot’s license, for that matter); I am a military veteran; and I’ve held government security clearances.
I’d always believed that my best privacy protection was the fact that government databases didn’t talk to one another. The IRS didn’t know, and didn’t care, whether or not my military discharge was honorable (it was). Yeah. That may have been true at one time, but it is changing. Data exchange between government agencies won’t be seamless in my lifetime, but it is heading, slowly but inexorably, in that direction.
And the commercial firms are far more efficient. Google and Facebook today know more about us than anyone might imagine. Third party data brokers can make our data show up in the strangest places.
And lest you mistake me, I’m not saying that this is necessarily a bad thing. There are tradeoffs in every action we take. Rather, it’s something that we let happen without thinking about it. We can come up with all sorts of rationalizations on why we love the convenience and efficiency, but rarely ponder the other side of the coin.
I personally try to think about the implications every time I release data to a computer, and sometimes decline to do so (take that, Facebook). And in some cases, such as my writings and conference talks, I’ve made career decisions that I am well aware make more data available on me. I haven’t yet decided on Alexa, but I am certainly not going to be an early adopter.
Update: Oh my. http://www.cnn.com/2016/12/28/tech/amazon-echo-alexa-bentonville-arkansas-murder-case-trnd/index.html



