Fairness, Accountability, and Transparency


  1. EIOPA - regulation for insurance companies.

  2. Ethics and regulations in Israel

    1. First report by the intelligence committee headed by Prof. Itzik Ben-Israel and Prof. Eviatar Matania

    2. Third report, by the national initiative (meizam leumi) for AI systems, on ethics and regulation in Israel (lecture)


  1. FATML website - The past few years have seen growing recognition that machine learning raises novel challenges for ensuring non-discrimination, due process, and understandability in decision-making. In particular, policymakers, regulators, and advocates have expressed fears about the potentially discriminatory impact of machine learning, with many calling for further technical research into the dangers of inadvertently encoding bias into automated decisions.

At the same time, there is increasing alarm that the complexity of machine learning may reduce the justification for consequential decisions to “the algorithm made me do it.”

  1. FAccT - A computer science conference with a cross-disciplinary focus that brings together researchers and practitioners interested in fairness, accountability, and transparency in socio-technical systems.

  2. Poisoning attacks on fairness - Research in adversarial machine learning has shown how the performance of machine learning models can be seriously compromised by injecting even a small fraction of poisoning points into the training data. We empirically show that our attack is effective not only in the white-box setting, in which the attacker has full access to the target model, but also in a more challenging black-box scenario in which the attacks are optimized against a substitute model and then transferred to the target model.

  3. A series of articles about bias and fairness by Jonathan Hui

    1. In clinical research - selection, sample, time, attrition, survivorship, reporting, funding, citation, volunteer, self-selection, non-response, pre-screening, healthy person, membership, ascertainment, performance, Berkson (admission), Neyman, measurement, observer, expectation, response, self-reporting, social desirability, recall, acquiescence (agreement), leading, courtesy, attention, verification, lead time, immortal time, misclassification, chronological, detection, spectrum, confounder, susceptibility, collider, Simpson, omitted, allocation, channeling.

    2. AI - known cases in vision, NLP - sentiment, embedding, language models, historical, COMPAS, recommender, datasets.

    3. Addressing AI bias with fairness criteria and tools - per population, predictive parity, calibration by group

    4. Caveats and limitations of AI Fairness Approaches - sample bias, label bias, miscalibration outcome test, redlining, etc.

    5. AI Fairness Approaches - statistical fairness, equalizing acceptance rate, error rate, etc.


M. Zafar et al. (2017), Fairness Constraints: Mechanisms for Fair Classification

M. Hardt, E. Price and N. Srebro (2016), Equality of Opportunity in Supervised Learning
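As a rough illustration of the statistical criteria listed above (acceptance rate, error rates, and predictive parity per group), the sketch below computes them from binary labels and predictions. The function names `group_rates` and `metric_gap` and the toy data are hypothetical, chosen only for this example; they are not taken from the articles or papers cited here.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Per-group acceptance rate, TPR, FPR and PPV for binary labels/predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        key = ("tp" if t else "fp") if p else ("fn" if t else "tn")
        counts[g][key] += 1

    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        pos = c["tp"] + c["fn"]        # actual positives in the group
        neg = c["fp"] + c["tn"]        # actual negatives in the group
        pred_pos = c["tp"] + c["fp"]   # predicted positives in the group
        rates[g] = {
            "acceptance_rate": pred_pos / n,                          # equal acceptance rate
            "tpr": c["tp"] / pos if pos else float("nan"),           # equality of opportunity
            "fpr": c["fp"] / neg if neg else float("nan"),           # error-rate balance
            "ppv": c["tp"] / pred_pos if pred_pos else float("nan"),  # predictive parity
        }
    return rates

def metric_gap(rates, metric):
    """Largest difference in one metric between any two groups (0 means parity)."""
    values = [r[metric] for r in rates.values()]
    return max(values) - min(values)

# Toy data: two groups, "a" and "b", four people each (made up for illustration).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
for metric in ("acceptance_rate", "tpr", "fpr", "ppv"):
    print(metric, metric_gap(rates, metric))
```

A gap of zero on one criterion generally does not imply a gap of zero on the others, which is one reason the caveats item above matters.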


  1. arize.ai on model bias.


  1. Adversarial removal of demographic features - “We show that demographic information of authors is encoded in -- and can be recovered from -- the intermediate representations learned by text-based neural classifiers. The implication is that decisions of classifiers trained on textual data are not agnostic to -- and likely condition on -- demographic attributes.” “We explore several techniques to improve the effectiveness of the adversarial component. Our main conclusion is a cautionary one: do not rely on the adversarial training to achieve invariant representation to sensitive features.”

  2. Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection (paper), github, presentation by Shauli et al. - removing biased information, such as gender, from an embedding space using nullspace projection. The objective: given a representation X of text, for example BERT embeddings of many resumes/CVs, we want to reach a state where a certain attribute, for example the gender of the person who wrote the resume, is not encoded in X. They use a weak definition of “not encoded”: a linear model cannot predict the attribute from the representation with better-than-random accuracy. I.e., any linear model you train to predict the person’s gender from the embedding space will reach only about 50% accuracy (for a balanced binary attribute). This is done by an iterative process with two steps: (1) train a linear model to predict the attribute from the representation; (2) project the representation onto the null space of that linear classifier. The projection is a standard linear algebra operation that zeroes the component of the representation along the separating direction the linear model has learned, making that model useless: its output is zero for every projected input. The process is repeated on the neutralized output, i.e., in the second iteration we look for an alternative way to predict gender from X, until we reach 50% accuracy (or whatever other metric you want to measure). At that point we have neutralized all the linear directions in the embedding space that were predictive of the author’s gender.

For a matrix W, the null space is the subspace of all vectors x such that Wx = 0, i.e., W maps x to the zero vector. Projecting onto the null space is a linear projection onto that subspace. For example, you can take a 3D vector and compute its projection onto the XY plane, which is the null space of W = [0 0 1].
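The two steps (train a linear classifier, then project onto its null space) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation (see the linked github for that): the classifier is a bias-free logistic regression fitted by gradient descent, the data are made up, and all function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_classifier(X, y, steps=200, lr=0.5):
    """Bias-free logistic regression by batch gradient descent (a stand-in
    for whatever linear model is used to predict the protected attribute)."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-dot(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - label) * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def project_out(x, w):
    """Project x onto the null space of w, i.e. remove its component along w."""
    scale = dot(w, x) / dot(w, w)
    return [xi - scale * wi for xi, wi in zip(x, w)]

def accuracy(X, y, w):
    return sum((dot(w, x) > 0) == (label == 1) for x, label in zip(X, y)) / len(X)

def inlp(X, y, max_iters=10, chance=0.5, tol=1e-12):
    """Repeat: fit a linear classifier; stop if it is at chance, else project it out."""
    directions = []
    for _ in range(max_iters):
        w = train_linear_classifier(X, y)
        if dot(w, w) < tol or accuracy(X, y, w) <= chance:
            break
        directions.append(w)
        X = [project_out(x, w) for x in X]
    return X, directions

# The 3D example from the text: projecting onto the XY plane.
print(project_out([1.0, 2.0, 3.0], [0.0, 0.0, 1.0]))  # [1.0, 2.0, 0.0]

# Toy "embeddings" (made up); y is the protected attribute to remove.
X = [[1.0, 0.5, 0.3], [0.8, 1.0, -0.2], [1.2, 0.2, 0.1],
     [-1.0, -0.4, 0.2], [-0.9, -1.1, -0.3], [-1.1, -0.3, 0.0]]
y = [1, 1, 1, 0, 0, 0]

cleaned, directions = inlp(X, y)
print(len(directions), "direction(s) removed")
```

In three dimensions the loop can remove most or all of the signal, including useful information; with real embeddings (hundreds of dimensions) each iteration removes only one direction out of many, which is why the stopping criterion matters.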

  1. Can we eliminate predictive samples? It’s an open question. Maybe we can use influence functions?

Understanding Black-box Predictions via Influence Functions - How can we explain the predictions of a blackbox model? In this paper, we use influence functions — a classic technique from robust statistics — to trace a model’s prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction.

We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually indistinguishable training-set attacks.
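For orientation, the central quantity the paper approximates can be written as follows. The symbols are: fitted parameters \hat\theta, loss L, training point z, test point z_test, and H the Hessian of the average training loss at \hat\theta. It estimates how the loss on z_test changes when z is upweighted by an infinitesimal amount; removing z corresponds to upweighting it by -1/n.

```latex
\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})
  = -\,\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top}
     H_{\hat\theta}^{-1}\,
     \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta)
```

Training points with a large magnitude of this score are the ones "most responsible" for the prediction in the sense used above.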

  1. Bias detector by Intuit - Based on first and last name and zip code, the package estimates the probability of the user belonging to different genders/races. Then, the model predictions per gender/race are compared using various bias metrics.



  1. Differential privacy has emerged as a major area of research in the effort to prevent the identification of individuals and private data. It is a mathematical definition of the privacy loss incurred by individuals when their private information is used to create AI products. It works by injecting noise into a dataset, into a machine learning training process, or into the output of a machine learning model, without introducing significant adverse effects on data analysis or model performance. It achieves this by calibrating the noise level to the sensitivity of the algorithm. The result is a differentially private dataset or model that cannot be reverse engineered by an attacker, while still providing useful information. Uses BOTLON & EPSILON
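To make "calibrating the noise level to the sensitivity" concrete, here is a minimal sketch of the Laplace mechanism, one standard way to add noise to a query output (the text above does not say which mechanism is meant, so this choice is an assumption). The noise scale is sensitivity / epsilon, so a smaller epsilon means more noise and stronger privacy. The function names and the toy data are hypothetical.

```python
import random

def noise_scale(sensitivity, epsilon):
    """Laplace scale b = sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Noisy count query. Adding or removing one record changes a count
    by at most 1, so the sensitivity is 1."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(noise_scale(1.0, epsilon), rng)

# Toy data (made up): how many people are 40 or older?
rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 67, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(noisy)  # the true count is 3; the released value is 3 plus Laplace noise
```

Each released answer spends part of the privacy budget, so repeated queries on the same data need a smaller epsilon per query to keep the overall guarantee.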


