The digital world we live in produces an enormous amount of personal data every day. From demographic surveys and biomedical studies to user information from massive online platforms such as Facebook and Netflix, large-scale collections of human data are turned into open-source databases or other formats accessible for scientific research, because they play a crucial part in informing our understanding of collective biological and social functioning. While data transparency is key to fostering public knowledge and improving research reproducibility, it risks exposing the confidential information of respondents, patients, and clients, violating their privacy. What measures can we take as a society to ensure effective learning from data while respecting individuals’ rightful claim to privacy? In this class, we trace the historical and modern developments of privacy and examine the challenges of putting this concept into practice from a multidisciplinary perspective that integrates statistics, computer science, and economics. In a sense, privacy is the antithesis of information, so the quest for privacy poses a fundamental dilemma for the growth of public knowledge. We weigh the social and policy implications of the inherent tradeoff between privacy and utility. We also take a close look at the recent development of a mathematical framework for privacy, called differential privacy, and its adoption to protect data released from the 2020 U.S. Decennial Census. The philosophical implications of privacy for the broader subject of data ethics will also be discussed.
Ruobin Gong (Statistics)
01:090:101 section 14