publications
Organized in reverse chronological order. Conference publications in computer science are archival and go through highly selective peer review.
2024
- [Preprint] Counterfactual Token Generation in Large Language Models. Ivi Chatzi, Nina Corvelo Benz, Eleni Straitouri, Stratis Tsirtsis, and Manuel Gomez-Rodriguez. arXiv, 2024.
"Sure, I am happy to generate a story for you: Captain Lyra stood at the helm of her trusty ship, the Maelstrom’s Fury, gazing out at the endless sea. [...] Lyra’s eyes welled up with tears as she realized the bitter truth - she had sacrificed everything for fleeting riches, and lost the love of her crew, her family, and herself." Although this story, generated by a large language model, is captivating, one may wonder – how would the story have unfolded if the model had chosen "Captain Maeve" as the protagonist instead? We cannot know. State-of-the-art large language models are stateless – they maintain no internal memory or state. Given a prompt, they generate a sequence of tokens as an output using an autoregressive process. As a consequence, they cannot reason about counterfactual alternatives to tokens they have generated in the past. In this work, our goal is to enhance them with this functionality. To this end, we develop a causal model of token generation that builds upon the Gumbel-Max structural causal model. Our model allows any large language model to perform counterfactual token generation at almost no cost in comparison with vanilla token generation, it is embarrassingly simple to implement, and it does not require any fine-tuning nor prompt engineering. We implement our model on Llama 3 8B-instruct and conduct both qualitative and quantitative analyses of counterfactually generated text. We conclude with a demonstrative application of counterfactual token generation for bias detection, unveiling interesting insights about the model of the world constructed by large language models.
- [Journal] Optimal Decision Making Under Strategic Behavior. Stratis Tsirtsis, Behzad Tabibian, Moein Khajehnejad, Adish Singla, Bernhard Schölkopf, and Manuel Gomez-Rodriguez. Management Science, 2024.
We are witnessing an increasing use of data-driven predictive models to inform decisions. As decisions have implications for individuals and society, there is increasing pressure on decision makers to be transparent about their decision policies. At the same time, individuals may use knowledge, gained by transparency, to invest effort strategically in order to maximize their chances of receiving a beneficial decision. Our goal is to find decision policies that are optimal in terms of utility in such a strategic setting. To this end, we first characterize how strategic investment of effort by individuals leads to a change in the feature distribution. Using this characterization, we show that, in general, we cannot expect to find optimal decision policies in polynomial time and there are cases in which deterministic policies are suboptimal. Then, we demonstrate that, if the cost individuals pay to change their features satisfies a natural monotonicity assumption, we can narrow down the search for the optimal policy to a particular family of decision policies with a set of desirable properties, which allow for a highly effective polynomial time heuristic search algorithm using dynamic programming. Finally, under no assumptions on the cost individuals pay to change their features, we develop an iterative search algorithm that is guaranteed to find locally optimal decision policies also in polynomial time. Experiments on synthetic and real credit card data illustrate our theoretical findings and show that the decision policies found by our algorithms achieve higher utility than those that do not account for strategic behavior.
A preliminary version appeared at the NeurIPS Workshop on Human-Centric Machine Learning, 2019.
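As a rough illustration of the strategic setting the paper studies, the sketch below computes an individual's best response to a transparent decision policy over discrete feature values, and the feature distribution this induces. The payoff structure (`benefit`, `cost`) is an assumption for illustration, not the paper's model.

```python
import numpy as np

def best_response(x, policy, cost, benefit=1.0):
    # Individuals pick the feature value with the highest net payoff:
    # probability of a positive decision times the benefit, minus the
    # cost of moving there. cost[x, x] is assumed zero, so staying put
    # is always an option.
    return int(np.argmax(benefit * policy - cost[x]))

def induced_distribution(p, policy, cost):
    # How the feature distribution p over discrete values shifts once
    # every individual best-responds to the (publicly known) policy.
    q = np.zeros(len(p))
    for x in range(len(p)):
        q[best_response(x, policy, cost)] += p[x]
    return q
```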
- [Conference] Towards a computational model of responsibility judgments in sequential human-AI collaboration. Stratis Tsirtsis, Manuel Gomez-Rodriguez, and Tobias Gerstenberg. 46th Annual Conference of the Cognitive Science Society (CogSci), 2024.
When a human and an AI agent collaborate to complete a task and something goes wrong, who is responsible? Prior work has developed theories to describe how people assign responsibility to individuals in teams. However, there has been little work studying the cognitive processes that underlie responsibility judgments in human-AI collaborations, especially for tasks comprising a sequence of interdependent actions. In this work, we take a step towards filling this gap. Using semi-autonomous driving as a paradigm, we develop an environment that simulates stylized cases of human-AI collaboration using a generative model of agent behavior. We propose a model of responsibility that considers how unexpected an agent’s action was, and what would have happened had they acted differently. We test the model’s predictions empirically and find that in addition to action expectations and counterfactual considerations, participants’ responsibility judgments are also affected by how much each agent actually contributed to the outcome.
A preliminary version appeared at the CHI Workshop on Theory of Mind in Human-AI Interaction, 2024.
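For intuition, here is a hedged toy rendering of the two ingredients the model combines: how unexpected the agent's action was under a generative behavior model, and a counterfactual comparison with alternative actions. The functional form and the weight `alpha` are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

def responsibility(action_probs, taken, outcomes, alpha=0.5):
    # action_probs[a]: probability the behavior model assigns to action a
    # taken:           index of the action the agent actually took
    # outcomes[a]:     simulated outcome value had the agent taken a
    #                  (higher is better)
    # alpha:           assumed trade-off between the two components
    surprise = -np.log(action_probs[taken])         # how unexpected was it?
    cf_effect = np.max(outcomes) - outcomes[taken]  # could it have gone better?
    return alpha * surprise + (1 - alpha) * cf_effect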
2023
- [Conference] Finding Counterfactually Optimal Action Sequences in Continuous State Spaces. Stratis Tsirtsis and Manuel Gomez-Rodriguez. 37th Conference on Neural Information Processing Systems (NeurIPS), 2023.
Whenever a clinician reflects on the efficacy of a sequence of treatment decisions for a patient, they may try to identify critical time steps where, had they made different decisions, the patient’s health would have improved. While recent methods at the intersection of causal inference and reinforcement learning promise to aid human experts, such as the clinician above, to retrospectively analyze sequential decision making processes, they have focused on environments with finitely many discrete states. However, in many practical applications, the state of the environment is inherently continuous in nature. In this paper, we aim to fill this gap. We start by formally characterizing a sequence of discrete actions and continuous states using finite horizon Markov decision processes and a broad class of bijective structural causal models. Building upon this characterization, we formalize the problem of finding counterfactually optimal action sequences and show that, in general, we cannot expect to solve it in polynomial time. Then, we develop a search method based on the A* algorithm that, under a natural form of Lipschitz continuity of the environment’s dynamics, is guaranteed to return the optimal solution to the problem. Experiments on real clinical data show that our method is very efficient in practice, and it has the potential to offer interesting insights for sequential decision making tasks.
A preliminary version appeared at the ICML Workshop on Counterfactuals in Minds and Machines, 2023.
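Since the method is built on A*, a generic skeleton may help fix ideas. The sketch below is textbook A* over a finite-horizon search tree; the paper's contribution, the Lipschitz-based admissible heuristic, enters only through the user-supplied `heuristic` function and is not reproduced here.

```python
import heapq
import itertools

def a_star(start, successors, heuristic, horizon):
    # successors(node): yields (child, step_cost) pairs
    # heuristic(node):  admissible lower bound on the remaining cost
    #                   (0 at the horizon), e.g., a Lipschitz-based bound
    # Nodes are (t, state) pairs; with an admissible heuristic, the first
    # horizon-depth node popped is an optimal solution.
    counter = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(heuristic(start), next(counter), 0.0, start, [])]
    while frontier:
        f, _, g, node, path = heapq.heappop(frontier)
        t, _state = node
        if t == horizon:
            return path, g
        for child, cost in successors(node):
            heapq.heappush(
                frontier,
                (g + cost + heuristic(child), next(counter), g + cost, child, path + [child]),
            )
    return None, float("inf")
```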
- [Conference] On the Within-Group Fairness of Screening Classifiers. Nastaran Okati, Stratis Tsirtsis, and Manuel Gomez-Rodriguez. 40th International Conference on Machine Learning (ICML), 2023.
Screening classifiers are increasingly used to identify qualified candidates in a variety of selection processes. In this context, it has been recently shown that if a classifier is calibrated, one can identify the smallest set of candidates which contains, in expectation, a desired number of qualified candidates using a threshold decision rule. This lends support to focusing on calibration as the only requirement for screening classifiers. In this paper, we argue that screening policies that use calibrated classifiers may suffer from an understudied type of within-group unfairness—they may unfairly treat qualified members within demographic groups of interest. Further, we argue that this type of unfairness can be avoided if classifiers satisfy within-group monotonicity, a natural monotonicity property within each group. Then, we introduce an efficient post-processing algorithm based on dynamic programming to minimally modify a given calibrated classifier so that its probability estimates satisfy within-group monotonicity. We validate our algorithm using US Census survey data and show that within-group monotonicity can often be achieved at a small cost in terms of prediction granularity and shortlist size.
Also appeared at the 2nd ACM SIGKDD Workshop on Ethical Artificial Intelligence: Methods and Applications, 2023.
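The paper's post-processing algorithm is a dynamic program that minimally modifies a calibrated classifier. As a loose stand-in that illustrates only the monotonicity property itself, the sketch below enforces non-decreasing estimates with weighted pool-adjacent-violators (isotonic regression), applied per group; this is a different, simpler technique than the paper's algorithm and does not preserve calibration.

```python
import numpy as np

def pool_adjacent_violators(values, weights):
    # Weighted isotonic regression: returns the non-decreasing sequence
    # closest (in weighted L2) to `values`. Adjacent blocks that violate
    # monotonicity are merged into their weighted mean.
    blocks = [[v, w, 1] for v, w in zip(values, weights)]  # value, weight, count
    merged = []
    for b in blocks:
        merged.append(b)
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            v2, w2, c2 = merged.pop()
            v1, w1, c1 = merged.pop()
            merged.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    result = []
    for v, w, c in merged:
        result.extend([v] * c)
    return np.array(result)

# Per-group usage (illustrative): for each group g, with per-score-bin
# outcome rates and bin counts ordered by score,
#   monotone_rates[g] = pool_adjacent_violators(rates[g], counts[g])
```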
2022
- [Journal] Quantifying the Effects of Contact Tracing, Testing, and Containment Measures in the Presence of Infection Hotspots. Lars Lorch, Heiner Kremer, William Trouleau, Stratis Tsirtsis, Aron Szanto, Bernhard Schölkopf, and Manuel Gomez-Rodriguez. ACM Transactions on Spatial Algorithms and Systems, 2022.
Multiple lines of evidence strongly suggest that infection hotspots, where a single individual infects many others, play a key role in the transmission dynamics of COVID-19. However, most existing epidemiological models fail to capture this aspect: they neither represent the sites visited by individuals explicitly nor characterize disease transmission as a function of individual mobility patterns. In this work, we introduce a temporal point process modeling framework that specifically represents visits to the sites where individuals get in contact and infect each other. Under our model, the number of infections caused by an infectious individual naturally emerges to be overdispersed. Using an efficient sampling algorithm, we demonstrate how to estimate the transmission rate of infectious individuals at the sites they visit and in their households using Bayesian optimization (BO) and longitudinal case data. Simulations using fine-grained and publicly available demographic data and site locations from Bern, Switzerland, showcase the flexibility of our framework. To facilitate research and analyses of other cities and regions, we release an open-source implementation of our framework.
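To give a flavor of the site-visit viewpoint, here is a toy simulation in which visit times follow homogeneous Poisson processes and transmission opportunities arise from temporally overlapping visits to the same site. All rates and the exposure window are assumptions; the paper's framework uses richer temporal point processes and estimates transmission rates from case data.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_visits(rate, horizon):
    # Homogeneous Poisson process on [0, horizon): visit times of one
    # individual to one site, a toy stand-in for the paper's temporal
    # point processes.
    n = rng.poisson(rate * horizon)
    return np.sort(rng.uniform(0.0, horizon, size=n))

def exposures(infectious_visits, susceptible_visits, window=0.1):
    # Count visit pairs at the same site closer in time than `window`;
    # each such overlap is treated as one transmission opportunity.
    count = 0
    for t in infectious_visits:
        count += np.sum(np.abs(susceptible_visits - t) < window)
    return int(count)

# With k exposures and an assumed per-exposure transmission rate beta,
# a susceptible's infection probability is 1 - exp(-beta * k).
```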
- [Journal] Pooled Testing of Traced Contacts Under Superspreading Dynamics. Stratis Tsirtsis, Abir De, Lars Lorch, and Manuel Gomez-Rodriguez. PLOS Computational Biology, 2022.
Testing is recommended for all close contacts of confirmed COVID-19 patients. However, existing pooled testing methods are oblivious to the circumstances of contagion provided by contact tracing. Here, we build upon a well-known semi-adaptive pooled testing method, Dorfman’s method with imperfect tests, and derive a simple pooled testing method based on dynamic programming that is specifically designed to use information provided by contact tracing. Experiments using a variety of reproduction numbers and dispersion levels, including those estimated in the context of the COVID-19 pandemic, show that the pools found using our method result in a significantly lower number of tests than those found using Dorfman’s method. Our method provides the greatest competitive advantage when the number of contacts of an infected individual is small, or the distribution of secondary infections is highly overdispersed. Moreover, it maintains this competitive advantage under imperfect contact tracing and significant levels of dilution.
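The key computational idea can be sketched directly. Assuming perfect tests (the paper also handles test errors and dilution) and the simplification that it suffices to consider contiguous pools after sorting contacts by infection probability, a quadratic-time dynamic program finds the pool partition minimizing the expected number of Dorfman-style tests:

```python
import numpy as np

def pool_cost(q):
    # Expected tests for one Dorfman pool with perfect tests: one pooled
    # test, plus individual retests for every member if the pool is positive.
    n = len(q)
    if n == 1:
        return 1.0
    p_positive = 1.0 - np.prod(1.0 - np.asarray(q))
    return 1.0 + n * p_positive

def optimal_pools(probs):
    # probs: per-contact infection probabilities, e.g., informed by the
    # circumstances of contagion recorded during contact tracing.
    q = sorted(probs)
    n = len(q)
    dp = [0.0] + [float("inf")] * n   # dp[i]: best cost for the first i contacts
    cut = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            c = dp[j] + pool_cost(q[j:i])
            if c < dp[i]:
                dp[i], cut[i] = c, j
    pools, i = [], n
    while i > 0:
        pools.append(q[cut[i]:i])
        i = cut[i]
    return dp[n], pools[::-1]
```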
2021
- [Conference] Counterfactual Explanations in Sequential Decision Making Under Uncertainty. Stratis Tsirtsis, Abir De, and Manuel Gomez-Rodriguez. 35th Conference on Neural Information Processing Systems (NeurIPS), 2021.
Methods to find counterfactual explanations have predominantly focused on one-step decision making processes. In this work, we initiate the development of methods to find counterfactual explanations for decision making processes in which multiple, dependent actions are taken sequentially over time. We start by formally characterizing a sequence of actions and states using finite horizon Markov decision processes and the Gumbel-Max structural causal model. Building upon this characterization, we formally state the problem of finding counterfactual explanations for sequential decision making processes. In our problem formulation, the counterfactual explanation specifies an alternative sequence of actions differing in at most k actions from the observed sequence that could have led the observed process realization to a better outcome. Then, we introduce a polynomial time algorithm based on dynamic programming to build a counterfactual policy that is guaranteed to always provide the optimal counterfactual explanation on every possible realization of the counterfactual environment dynamics. We validate our algorithm using both synthetic and real data from cognitive behavioral therapy and show that the counterfactual explanations our algorithm finds can provide valuable insights to enhance sequential decision making under uncertainty.
A preliminary version appeared at the ICML Workshop on Interpretable Machine Learning in Healthcare, 2021.
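The counterfactual machinery rests on the Gumbel-Max structural causal model. A minimal numpy sketch of a counterfactual transition follows: the posterior over the noise given an observed transition is sampled with the standard top-down (truncated-Gumbel) construction, and the same noise is replayed under alternative logits, e.g., those induced by a different action. This is a generic sketch of the Gumbel-Max counterfactual, not the paper's full dynamic-programming algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def posterior_gumbels(logits, observed):
    # Sample Gumbel noise consistent with `observed` having been the
    # argmax of logits + noise (the Gumbel-Max SCM posterior).
    logits = np.asarray(logits, dtype=float)
    log_z = np.logaddexp.reduce(logits)
    # The winner's perturbed value is a Gumbel with location logsumexp(logits).
    top = log_z - np.log(-np.log(rng.uniform()))
    perturbed = np.empty_like(logits)
    for i, phi in enumerate(logits):
        if i == observed:
            perturbed[i] = top
        else:
            # Gumbel(phi) truncated to lie below `top`.
            perturbed[i] = phi - np.log(np.exp(phi - top) - np.log(rng.uniform()))
    return perturbed - logits  # the noise itself

def counterfactual_next_state(obs_logits, obs_next, cf_logits):
    # Keep the inferred noise, swap in the logits an alternative action
    # would have induced, and take the new argmax.
    noise = posterior_gumbels(obs_logits, obs_next)
    return int(np.argmax(cf_logits + noise))
```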
- [Conference] Bridging Machine Learning and Mechanism Design towards Algorithmic Fairness. Jessie Finocchiaro, Roland Maio, Faidra Monachou, Gourab K. Patro, Manish Raghavan, Ana-Andreea Stoica, and Stratis Tsirtsis. 4th ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2021.
Decision-making systems increasingly orchestrate our world: how to intervene on the algorithmic components to build fair and equitable systems is therefore a question of utmost importance; one that is substantially complicated by the context-dependent nature of fairness and discrimination. Modern decision-making systems that involve allocating resources or information to people (e.g., school choice, advertising) incorporate machine-learned predictions in their pipelines, raising concerns about potential strategic behavior or constrained allocation, concerns usually tackled in the context of mechanism design. Although both machine learning and mechanism design have developed frameworks for addressing issues of fairness and equity, in some complex decision-making systems, neither framework is individually sufficient. In this paper, we develop the position that building fair decision-making systems requires overcoming these limitations which, we argue, are inherent to each field. Our ultimate objective is to build an encompassing framework that cohesively bridges the individual frameworks of mechanism design and machine learning. We begin to lay the groundwork towards this goal by comparing the perspective each discipline takes on fair decision-making, teasing out the lessons each field has taught and can teach the other, and highlighting application domains that require a strong collaboration between these disciplines.
A preliminary version appeared at the Harvard CRCS Workshop on AI for Social Good (AI4SG), 2020.
2020
- [Conference] Decisions, Counterfactual Explanations and Strategic Behavior. Stratis Tsirtsis and Manuel Gomez-Rodriguez. 34th Conference on Neural Information Processing Systems (NeurIPS), 2020.
As data-driven predictive models are increasingly used to inform decisions, it has been argued that decision makers should provide explanations that help individuals understand what would have to change for these decisions to be beneficial ones. However, there has been little discussion on the possibility that individuals may use the above counterfactual explanations to invest effort strategically and maximize their chances of receiving a beneficial decision. In this paper, our goal is to find policies and counterfactual explanations that are optimal in terms of utility in such a strategic setting. We first show that, given a pre-defined policy, the problem of finding the optimal set of counterfactual explanations is NP-hard. Then, we show that the corresponding objective is nondecreasing and submodular, which allows a standard greedy algorithm to enjoy approximation guarantees. In addition, we further show that the problem of jointly finding both the optimal policy and set of counterfactual explanations reduces to maximizing a non-monotone submodular function. As a result, we can use a recent randomized algorithm to solve the problem, which also offers approximation guarantees. Finally, we demonstrate that, by incorporating a matroid constraint into the problem formulation, we can increase the diversity of the optimal set of counterfactual explanations and incentivize individuals across the whole spectrum of the population to self-improve. Experiments on synthetic and real lending and credit card data illustrate our theoretical findings and show that the counterfactual explanations and decision policies found by our algorithms achieve higher utility than several competitive baselines.
A preliminary version appeared at the 4th Workshop on Mechanism Design for Social Good (MD4SG), 2020.
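For the pre-defined-policy case, the greedy algorithm mentioned in the abstract is the textbook one for nondecreasing submodular objectives. A generic sketch follows, with `f` standing in for the paper's utility objective; the joint, non-monotone problem needs the randomized algorithm the paper cites instead.

```python
def greedy_submodular(ground_set, f, k):
    # Standard greedy maximization of a nondecreasing submodular set
    # function f under the cardinality constraint |S| <= k, which enjoys
    # a (1 - 1/e) approximation guarantee. `ground_set` is a Python set
    # of candidate elements (here, candidate counterfactual explanations).
    selected = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for e in ground_set - selected:
            gain = f(selected | {e}) - f(selected)  # marginal gain of adding e
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no element with positive marginal gain remains
            break
        selected.add(best)
    return selected
```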