Only empirical evidence counts

Michael Strevens, a philosopher, writing at Aeon:

Indeed, I conjecture, modern science arose in the 17th century, in the course of the so-called Scientific Revolution, precisely because it stumbled upon the extraordinary motivating power of ‘only empirical evidence counts’ – a story I tell in my book The Knowledge Machine (2020). For thousands of years, philosophers thinking deeply about nature had valued empirical evidence highly, but they had valued many other avenues of thought in addition: philosophical thought, theological thought and aesthetic thought. Consequently, they were never forced, like Kuhn’s scientists, to throw themselves wholeheartedly into experimentation and observation alone. They watched the way the world worked, but they stopped measuring and started thinking too soon. They missed out on the little details that tell us so much. Only once thinkers’ intellectual horizons were closed off by unreasonable constraints on argument was modern science born.

What “evidence-based” thinking leaves out

A few loosely related items I was reading today…

Zeynep Tufekci makes the case against election forecasts, and says this:

This is where weather and electoral forecasts start to differ. For weather, we have fundamentals — advanced science on how atmospheric dynamics work — and years of detailed, day-by-day, even hour-by-hour data from a vast number of observation stations. For elections, we simply do not have anything near that kind of knowledge or data. While we have some theories on what influences voters, we have no fine-grained understanding of why people vote the way they do, and what polling data we have is relatively sparse.

How much does that matter? That, really, has been the motivating question behind all the posts I’ve put up over the last several months on the relative roles of theory vs. evidence. (On models: here, here, here, and here. And on theory and data: here, here, here, here, and here.)

I was thinking about evidence-based medicine today for a reason I’ll share in a second, and it led me to this 2003 paper on the philosophy of evidence-based medicine which the authors see as essentially prizing randomized controlled trials over other forms of evidence. Here’s a worthwhile bit:

Even when a clinical trial returns positive results in the treatment arm that satisfy tests of statistical significance, we will have more confidence in these results when they have some antecedent biological plausibility.[28,29] Put more generally, we would suggest that the degree of confidence appropriate for a clinically tested claim is a function of both the strength of the clinical result and the claim’s antecedent biological plausibility.

This gets at my point the other day about theory and data as complements. (The whole section of the paper “EBM and Basic Science” is worth a read.)
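One way to make that complementarity concrete is Bayesian: treat antecedent biological plausibility as a prior probability and the trial result as evidence. Here's a minimal sketch (my own illustration, not from the paper; the numbers are invented):

```python
# Toy Bayesian illustration: confidence in a treatment claim depends on
# both trial strength (likelihood ratio) and antecedent plausibility (prior).

def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability with a likelihood ratio via Bayes' rule on the odds scale."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# The same trial result (data 9x more likely if the treatment works)
# under two different priors:
plausible = posterior_probability(0.50, 9)    # plausible biological mechanism
implausible = posterior_probability(0.05, 9)  # no plausible mechanism

print(round(plausible, 2), round(implausible, 2))
```

The same trial evidence yields very different confidence: roughly 0.90 starting from a 50% prior, versus roughly 0.32 from a 5% prior. That's the paper's point in miniature: confidence is a function of both the clinical result and the claim's plausibility.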

Why was I reading about evidence-based medicine? Because of the comparison between evidence-based medicine and effective altruism from this talk and this paper by philosopher Hilary Greaves. Effective altruism also tends to prize causal evidence like RCTs above other forms of evidence. Greaves' point is that many (most?) of the consequences of any given intervention aren't (can't be?) measured by these methods. For anyone who sets out to maximize the net consequences of their actions, that unmeasurability raises real problems.

Far as I can tell, a lot hinges in this discussion on whether unmeasured effects correlate with measured ones, and how. But Greaves ends this way:

If we deliberately try to beneficially influence the course of the very far future, can we find things where we more robustly have at least some clue that what we’re doing is beneficial and of how beneficial it is? I think the answer is yes.

That is a tall order, but potentially an area where attention to theoretical plausibility can help.

Judgment and wisdom

Two definitions of judgment…

First, from psychology:

The term judgment refers to the cognitive aspects of our decision-making process.

Judgment in Managerial Decision Making, Bazerman and Moore

And from economics:

All human activities can be described by five high-level components: data, prediction, judgment, action, and outcomes. For example, a visit to the doctor in response to pain leads to: 1) x-rays, blood tests, monitoring (data), 2) diagnosis of the problem, such as “if we administer treatment A, then we predict outcome X, but if we administer treatment B, then we predict outcome Y” (prediction), 3) weighing options: “given your age, lifestyle, and family status, I think you might be best with treatment A; let’s discuss how you feel about the risks and side effects” (judgment); 4) administering treatment A (action), and 5) full recovery with minor side effects (outcome).

Agrawal, Goldfarb, Gans

In the first view, judgment is broad and prediction is basically just judgment in a particular context (the future). (A similar definition is given in Michael Lewis’s The Undoing Project describing how psychologists Kahneman and Tversky thought about judgment and prediction.) In the second definition, judgment is narrower and involves weighing tradeoffs. Judgment is about understanding your own utility function.

I thought about these two definitions when reading this piece in Aeon about wisdom. Here’s how a group of philosophers and social scientists summed up that concept:

For instance, we found that scientists, like many philosophers before them, considered wisdom to be morally grounded: an aspirational quality helping you figure out the right thing to do in a complex situation and promote the common good. For the group, wisdom broadly included ideas such as the sense of shared humanity, pursuit of truth, recognition of the need to balance personal interests with other people’s interests, and a general willingness to cooperate and have compassion for others. Most scholars didn’t insist on moral behaviour as a prerequisite for wisdom. Sometimes a situation prevents one from acting on one’s moral impulses. Other times people make mistakes. Nonetheless, our task force agreed that moral grounding was the first foundational pillar of the common wisdom model in empirical sciences.

The second foundational pillar we arrived at was meta-cognition – the overarching mental processes that guide our thoughts. Human thoughts can be guided by a range of emotions, motives and visceral responses. Thoughts are governed by other thoughts, too, which is what meta-cognition is about. We engage in meta-cognition when creating reminders for an ingredient we might otherwise forget when cooking a new recipe, or when following specific instructions for assembling an Ikea chair. Meta-cognition also helps us check ourselves when we are wrong, or as we gather a broader range of perspectives on complex issues, gaining a big-picture view. You engage meta-cognition when showing signs of intellectual humility, recognising the limits of your knowledge. Or when you consider the diverse perspectives of those with whom you disagree.

(emphasis mine)

Squint and you can see both definitions of judgment here. Metacognition describes, in practice, much of what we think of as good judgment in the psychological sense. By thinking about thinking, we think a bit better. “Moral grounding” arguably speaks to judgment in the economic sense: it is primarily about weighing tradeoffs in complex situations to figure out what counts as best — which is what it actually takes to map consequences onto a utility function. (Moral grounding considers more than the individual in that calculation, but that’s not at odds with utility functions, it only sometimes seems that way because of how they’re taught in intro econ.)

So, what is wisdom? It’s possessing good judgment in multiple senses: the ability to think carefully about the world including about causes and consequences; and the ability to weigh personal and moral values in complicated situations, to arrive at good decisions.

Piketty on models

To summarize: models should be used with parsimony–that is, only when we really need them–and their role should not [be] exaggerated. Models can be useful to organize the data and clarify simple logical relations between basic concepts; but they cannot replace the historical narrative, which in my view must be the real core of the analysis (and which I consider to be the core of my book). The complexity and multidimensionality of historical, social, and political processes in real-world societies are so great that there is no way they can be adequately described by mathematical language alone: one needs to use primarily the natural language of the social sciences (and sometimes the language of literature and movies, which, as I try to show in my book, can be viewed as an additional and complementary way to grasp social and historical realities, just like mathematical language).

-Thomas Piketty, in his contribution to After Piketty: The agenda for economics and inequality, p. 554

Past posts on models: here, here, and here. And on theory and data: here, here, here, here, and here.

The empiricist shock

I’ve been posting a bit lately about data and theory, and the other week I excerpted the Stanford Encyclopedia of Philosophy’s entry on big data and science. I want to return to that topic through the lens of economics.

In short, the proliferation of data can be thought of as an economic shock and basic economic theory would then predict that it would play a greater role in science.

In an article that became the book Prediction Machines, economists Ajay Agrawal, Joshua Gans, and Avi Goldfarb talk about AI as a drop in the cost of prediction:

Technological revolutions tend to involve some important activity becoming cheap, like the cost of communication or finding information… When the cost of any input falls so precipitously, there are two other well-established economic implications. First, we will start using prediction to perform tasks where we previously didn’t. Second, the value of other things that complement prediction will rise…

As a historical example, consider semiconductors, an area of technological advance that caused a significant drop in the cost of a different input: arithmetic. With semiconductors we could calculate cheaply, so activities for which arithmetic was a key input, such as data analysis and accounting, became much cheaper. However, we also started using the newly cheap arithmetic to solve problems that were not historically arithmetic problems. An example is photography. We shifted from a film-oriented, chemistry-based approach to a digital-oriented, arithmetic-based approach. Other new applications for cheap arithmetic include communications, music, and drug discovery.

What does that mean for science and the role of data? As the cost of collecting data drops, scientists will use it more. For example, as the Stanford entry suggests, some see data-driven exploration as a substitute for traditional methods of hypothesis generation. If that's the case, economic theory would expect the former to become more common and the latter less so. But what about theory? Most people would say theory is a complement to data, not a substitute, in which case its value should rise. This suggests a sort of synthesis position in the current debate between advocates of data and advocates of theory: data-driven methods will and should become more common. But that shift makes theory more important, not less.
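The complement claim can be made concrete with a toy model (my own sketch, not from the article, and the functional form and numbers are invented): suppose scientific output is Cobb-Douglas in "data work" and "theory work." Then cheaper data, by inducing more data work, raises the marginal value of an extra unit of theory:

```python
# Toy production function: output = data^a * theory^b.
# If data and theory are complements, more data work raises the
# marginal product of theory work.

def marginal_value_of_theory(data_units, theory_units, a=0.5, b=0.5):
    # Marginal product of theory: d(output)/d(theory) = b * data^a * theory^(b-1)
    return b * (data_units ** a) * (theory_units ** (b - 1))

theory = 4.0
costly_data = marginal_value_of_theory(data_units=4.0, theory_units=theory)
cheap_data = marginal_value_of_theory(data_units=16.0, theory_units=theory)

# When cheap data leads scientists to use 4x as much of it,
# an extra unit of theory becomes more valuable, not less.
print(costly_data, cheap_data)
```

The point is purely directional: in this toy production function, expanding data work makes theory work more valuable at the margin, which is exactly what "complements" means in the Agrawal, Gans, and Goldfarb framing.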

Obviously, this is all super speculative. Just thinking through the analogy.

From fact to law

Benjamin Peirce (father of Charles Sanders Peirce) on induction, deduction, and the role of math in the scientific process:

Observation supplies fact. Induction ascends from fact to law. Deduction, by applying the pure logic of mathematics, reverses the process and descends from law to fact. The facts of observation are liable to the uncertainties and inaccuracies of the human senses; and the first inductions of law are rough approximations to the truth. The law is freed from the defects of observation and converted by the speculations of the geometer into exact form. But it has ceased to be pure induction, and has become ideal hypothesis. Deductions are made from it with syllogistic precision, and consequent facts are logically evolved without immediate reference to the actual events of Nature. If the results of computation coincide, not merely qualitatively but quantitatively, with observation, the law is established as a reality, and is restored to the domain of induction.

Via The Metaphysical Club, p. 155-156. The original text is from 1881.

Past posts on theory and data here, here, and here.

More on data and theory

Past posts here and here.

First up, “A problem in theory,” an essay from 2019 blaming the replication crisis in psychology research largely on lack of theory:

The replication crisis facing the psychological sciences is widely regarded as rooted in methodological or statistical shortcomings. We argue that a large part of the problem is the lack of a cumulative theoretical framework or frameworks. Without an overarching theoretical framework that generates hypotheses across diverse domains, empirical programs spawn and grow from personal intuitions and culturally biased folk theories. By providing ways to develop clear predictions, including through the use of formal modelling, theoretical frameworks set expectations that determine whether a new finding is confirmatory, nicely integrating with existing lines of research, or surprising, and therefore requiring further replication and scrutiny. Such frameworks also prioritize certain research foci, motivate the use of diverse empirical approaches and, often, provide a natural means to integrate across the sciences. Thus, overarching theoretical frameworks pave the way toward a more general theory of human behaviour. We illustrate one such theoretical framework: dual inheritance theory.

Second, the Stanford Encyclopedia of Philosophy’s entry on big data and scientific research (long quote coming):

6. Big Data, Knowledge and Inquiry

Let us now return to the idea of data-driven inquiry, often suggested as a counterpoint to hypothesis-driven science (e.g., Hey et al. 2009). Kevin Elliott and colleagues have offered a brief history of hypothesis-driven inquiry (Elliott et al. 2016), emphasising how scientific institutions (including funding programmes and publication venues) have pushed researchers towards a Popperian conceptualisation of inquiry as the formulation and testing of a strong hypothesis. Big data analysis clearly points to a different and arguably Baconian understanding of the role of hypothesis in science. Theoretical expectations are no longer seen as driving the process of inquiry and empirical input is recognised as primary in determining the direction of research and the phenomena—and related hypotheses—considered by researchers.

The emphasis on data as a central component of research poses a significant challenge to one of the best-established philosophical views on scientific knowledge. According to this view, which I shall label the theory-centric view of science, scientific knowledge consists of justified true beliefs about the world. These beliefs are obtained through empirical methods aiming to test the validity and reliability of statements that describe or explain aspects of reality. Hence scientific knowledge is conceptualised as inherently propositional: what counts as an output are claims published in books and journals, which are also typically presented as solutions to hypothesis-driven inquiry. This view acknowledges the significance of methods, data, models, instruments and materials within scientific investigations, but ultimately regards them as means towards one end: the achievement of true claims about the world. Reichenbach’s seminal distinction between contexts of discovery and justification exemplifies this position (Reichenbach 1938). Theory-centrism recognises research components such as data and related practical skills as essential to discovery, and more specifically to the messy, irrational part of scientific work that involves value judgements, trial-and-error, intuition and exploration and within which the very phenomena to be investigated may not have been stabilised. The justification of claims, by contrast, involves the rational reconstruction of the research that has been performed, so that it conforms to established norms of inferential reasoning. Importantly, within the context of justification, only data that support the claims of interest are explicitly reported and discussed: everything else—including the vast majority of data produced in the course of inquiry—is lost to the chaotic context of discovery.[2]

Much recent philosophy of science, and particularly modelling and experimentation, has challenged theory-centrism by highlighting the role of models, methods and modes of intervention as research outputs rather than simple tools, and stressing the importance of expanding philosophical understandings of scientific knowledge to include these elements alongside propositional claims. The rise of big data offers another opportunity to reframe understandings of scientific knowledge as not necessarily centred on theories and to include non-propositional components—thus, in Cartwright’s paraphrase of Gilbert Ryle’s famous distinction, refocusing on knowing-how over knowing-that (Cartwright 2019). One way to construe data-centric methods is indeed to embrace a conception of knowledge as ability, such as promoted by early pragmatists like John Dewey and more recently reprised by Chang, who specifically highlighted it as the broader category within which the understanding of knowledge-as-information needs to be placed (Chang 2017).

Another way to interpret the rise of big data is as a vindication of inductivism in the face of the barrage of philosophical criticism levelled against theory-free reasoning over the centuries. For instance, Jon Williamson (2004: 88) has argued that advances in automation, combined with the emergence of big data, lend plausibility to inductivist philosophy of science. Wolfgang Pietsch agrees with this view and provided a sophisticated framework to understand just what kind of inductive reasoning is instigated by big data and related machine learning methods such as decision trees (Pietsch 2015). Following John Stuart Mill, he calls this approach variational induction and presents it as common to both big data approaches and exploratory experimentation, though the former can handle a much larger number of variables (Pietsch 2015: 913). Pietsch concludes that the problem of theory-ladenness in machine learning can be addressed by determining under which theoretical assumptions variational induction works (2015: 910ff).

Others are less inclined to see theory-ladenness as a problem that can be mitigated by data-intensive methods, and rather see it as a constitutive part of the process of empirical inquiry. Arching back to the extensive literature on perspectivism and experimentation (Gooding 1990; Giere 2006; Radder 2006; Massimi 2012), Werner Callebaut has forcefully argued that the most sophisticated and standardised measurements embody a specific theoretical perspective, and this is no less true of big data (Callebaut 2012). Elliott and colleagues emphasise that conceptualising big data analysis as atheoretical risks encouraging unsophisticated attitudes to empirical investigation as a

“fishing expedition”, having a high probability of leading to nonsense results or spurious correlations, being reliant on scientists who do not have adequate expertise in data analysis, and yielding data biased by the mode of collection. (Elliott et al. 2016: 880)

To address related worries in genetic analysis, Ken Waters has provided the useful characterisation of “theory-informed” inquiry (Waters 2007), which can be invoked to stress how theory informs the methods used to extract meaningful patterns from big data, and yet does not necessarily determine either the starting point or the outcomes of data-intensive science. This does not resolve the question of what role theory actually plays. Rob Kitchin (2014) has proposed to see big data as linked to a new mode of hypothesis generation within a hypothetical-deductive framework. Leonelli is more sceptical of attempts to match big data approaches, which are many and diverse, with a specific type of inferential logic. She rather focused on the extent to which the theoretical apparatus at work within big data analysis rests on conceptual decisions about how to order and classify data—and proposed that such decisions can give rise to a particular form of theorization, which she calls classificatory theory (Leonelli 2016).

These disagreements point to big data as eliciting diverse understandings of the nature of knowledge and inquiry, and the complex iterations through which different inferential methods build on each other. Again, in the words of Elliott and colleagues,

attempting to draw a sharp distinction between hypothesis-driven and data-intensive science is misleading; these modes of research are not in fact orthogonal and often intertwine in actual scientific practice. (Elliott et al. 2016: 881, see also O’Malley et al. 2009, Elliott 2012)

Studying the replication crisis

Vox’s Future Perfect newsletter reports:

Just carefully reading a paper — even as a layperson without deep knowledge of the field — is sufficient to form a pretty accurate guess about whether the study will replicate.

Meanwhile, DARPA’s replication markets found that guessing which papers will hold up and which won’t is often just a matter of looking at whether the study makes any sense. Some important statistics to take note of: Did the researchers squeeze out a result barely below the significance threshold of p = 0.05? (A paper can often claim a “significant” result if this threshold is met, and many use various statistical tricks to push their paper across that line.) Did they find no effects in most groups but significant effects for a tiny, hyper-specific subgroup?

“Predicting replication is easy,” Menard writes. “There’s no need for a deep dive into the statistical methodology or a rigorous examination of the data, no need to scrutinize esoteric theories for subtle errors—these papers have obvious, surface-level problems.”
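The "tiny, hyper-specific subgroup" red flag is easy to see in a quick simulation (my own illustration; the subgroup count is invented). If a treatment truly does nothing and researchers test 20 subgroups at p < 0.05, each test's p-value is roughly uniform under the null, so the chance of at least one "significant" subgroup is about 1 - 0.95**20, or roughly 64%:

```python
import random

random.seed(42)

SUBGROUPS = 20   # hypothetical number of subgroups examined
ALPHA = 0.05     # conventional significance threshold
STUDIES = 10_000

# Under the null hypothesis (no real effect), each subgroup test's
# p-value is approximately Uniform(0, 1).
false_positive_studies = 0
for _ in range(STUDIES):
    pvals = [random.random() for _ in range(SUBGROUPS)]
    if min(pvals) < ALPHA:
        false_positive_studies += 1

rate = false_positive_studies / STUDIES
print(f"Share of null studies with a 'significant' subgroup: {rate:.2f}")
```

In other words, with enough subgroups, a "significant" result for some hyper-specific slice of the sample is the expected outcome even when nothing is going on, which is why it's such a reliable surface-level warning sign.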

This is important work, and I get the point, but in a way it studies things backwards. It asks whether laypeople can beat chance at predicting which studies will replicate, which, again, is important. But the real test of a study’s usefulness is whether it helps people improve their judgments, not whether people can predict the study’s fate.

The study I’d like to see would work like this: A group of people is asked to predict the result of a forthcoming study which, unbeknownst to them, is a replication of a past study. They’re asked to predict the effect that some intervention has on some outcome variable. One group, the control, makes this prediction just based on their knowledge of the world. The other group, the treatment, gets access to the original study. They can read it, see its result and methodology, and then incorporate that (if they want to) in making their prediction.

Would access to the original studies improve people’s predictions?

New name, same blog

In late 2009, a little more than a decade ago, I started this blog. It wasn’t my first time blogging, but it was my first sustained effort and I might have started it sooner if it hadn’t been for the difficulty of picking a suitable name. After far too much deliberation, someone close to me suggested “Beyond the times,” which I liked because it captured my interest in the media and in the future.

My subject, as I announced it to quite literally no one, was to cover “The Internet, Information, and the Public Sphere.”

I’ve written more than 350 posts over the intervening years, and a lot has changed since then. When I started, I was writing about the media from outside of it. But almost a year after launching the blog, I wrote a post about dating algorithms, in response to a piece on The Atlantic’s newly launched tech vertical. That led to some contributions to the section, which led to a job reporting on tech for a news startup, which led to jobs at HBR and now Quartz.

Since joining the media eight years ago, I’ve written less about it. I still have lots of opinions about media and journalism, of course. But my writing has focused on innovation and the economy, and today I’m renaming the blog to reflect that.

This blog’s name is now Nonrival, to reflect my current focus on economics and my continued interest in information and innovation.

Most economic goods are “rivalrous,” meaning if one person consumes them then another person can’t. If you and I have an apple, you can eat it or I can eat it or we can split it. We can’t both eat the whole apple. But nonrivalrous goods are different. If I share an idea with you, we both get to enjoy it. If you share it with someone else, it doesn’t take anything away from me. Digital goods are nonrival.* A Netflix episode is more like an idea than an apple. The new name captures my focus not just on the internet but its economic effects.

And to the extent that “nonrival” has any meaning colloquially, it’s one I like, too. One of the topics I blogged about most in the early days was collaboration, and “nonrival” gets at some of that spirit.

I’m hoping to add a Nonrival newsletter soon, too. You can sign up in advance here.

*OK, sure, not totally. Server space and various other physical goods that support digital ones may be rivalrous.


Notes on political and social change

Just a post to clip together some resources…

Julie Battilana at Harvard, in SSIR:

In this article, we build on research on social change, including our own research, for which we studied hundreds of social change initiatives over multiple years and interviewed social entrepreneurs, civil society leaders, and public officials around the world. We identify three distinct roles played by those who participate in movements for social change: agitator, innovator, and orchestrator. An agitator brings the grievances of specific individuals or groups to the forefront of public awareness. An innovator creates an actionable solution to address these grievances. And an orchestrator coordinates action across groups, organizations, and sectors to scale the proposed solution. Any pathway to social change requires all three. Agitation without innovation means complaints without ways forward, and innovation without orchestration means ideas without impact.

Four rules for effective protests, from Vox.

Cass Sunstein’s book, How Change Happens.