Ethics and Data Science

Data science offers the ability to find patterns in data and to build new data products for the greater social good, but it is ethically neutral. It carries no built-in view of what is right or wrong, good or bad, in how it is used. Data science has no value framework, but organisations do have a value system. By asking ethical questions and seeking answers to them, an organisation can ensure that data science is used in a way that aligns with its values.

There’s no doubt about it: the future will be driven by machine learning, and data science is central to that future. A machine learning model is fuelled by the data it is trained on. Every advertisement we see, every self-driving car and every medical diagnosis provided by a machine will be based on data. Data ethics is a rapidly emerging field. Increasingly, those who collect, share and work with data are examining the ethics of their methods and, in some cases, being forced to confront those ethics in the face of public criticism. A failure to handle data ethically can severely harm people and lead to a loss of trust in projects, products or organisations.

Ethical challenges arise when opinions on what is right and wrong diverge. For example, should a data science model have the power to decide whether a defendant is released on bail? An application built on data, such as COMPAS (Correctional Offender Management Profiling for Alternative Sanctions, a software system used in US courts), requires evaluating how that data was generated in the first place, because an algorithm learns the biases contained in its training dataset. A training dataset may contain historical traces of intentional or unintentional discrimination and biased decisions, or it may be a sample from a population that does not represent everyone. There are three main ethical challenges related to data and data science:

  1. Unfair discrimination – If data reflects unfair social biases related to sensitive attributes, such as race or gender, the conclusions drawn from that data may also reflect those biases.
  2. Reinforcing human biases – This kind of problem can arise when computer models are used to make predictions in areas such as insurance, financial loans and policing. If members of a certain racial group have historically been more likely to default on their loans or to be convicted of a crime, then the model can deem these people riskier. That doesn’t necessarily mean that these people actually engage in more criminal behaviour or are worse at managing their money; historical records also reflect past decisions, such as where policing was concentrated or who was offered credit and on what terms.
  3. Lack of transparency – Transparency is required in two areas. The first is the step-by-step modelling process and the parameters by which a prediction is made. Depending on the model, this can be very difficult, because the exact workings of some models, such as neural networks, are still poorly understood. The second is which data is used in making a prediction, which often remains unclear. A statistical model cannot readily separate the predictive power of a single variable from that of a set of variables acting together.
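One simple way to check a model's decisions for the first two problems is to compare how often each group receives a favourable outcome. The sketch below is a minimal illustration, not a complete fairness audit: it assumes binary decisions (1 = favourable), a single group attribute, and uses hypothetical loan data with made-up group labels "a" and "b". The function names are illustrative only.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Share of favourable decisions (1) for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}


def disparate_impact_ratio(decisions, groups):
    """Lowest group selection rate divided by the highest (1.0 = parity)."""
    rates = selection_rates(decisions, groups)
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group received a favourable decision")
    return min(rates.values()) / highest


# Hypothetical loan decisions (1 = approved) for two illustrative groups.
decisions = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

print(selection_rates(decisions, groups))
print(disparate_impact_ratio(decisions, groups))
```

Here group "a" is approved 4 times out of 5 (0.8) and group "b" once out of 5 (0.2), giving a ratio of 0.25. Ratios below 0.8 are often treated as a warning sign (the "four-fifths rule" from US employment guidelines). A low ratio is a prompt to investigate, not proof of discrimination, and a ratio near 1.0 does not by itself show that a model is fair.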

Some ethical concerns that arise while modelling are:

A. Preserving privacy
1. Methods for handling sensitive data
2. Uses of data science that undermine privacy

B. Avoiding bias
1. Data selection and unintentional red-lining
2. Re-inscription of existing biases
3. Reducing the discrimination already present in a training dataset

C. Mitigating malicious attacks
1. Intentional subversion of machine learning systems
2. Hazards of learning from the open internet
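For B.3, one well-known pre-processing technique is reweighing (described by Kamiran and Calders): each training example gets the weight P(group) × P(label) / P(group, label), so that group membership and outcome become statistically independent in the weighted data. The sketch below is a minimal illustration under stated assumptions, not the only or definitive method: it assumes the sensitive attribute is available at training time, uses hypothetical historical labels, and the function names are illustrative only.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label).

    In the weighted data, group membership and label are independent.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (1) within one group."""
    positive = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    total = sum(w for g, w in zip(groups, weights) if g == group)
    return positive / total


# Hypothetical historical labels: group "a" was favoured 3 times in 4, group "b" once in 4.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)

for g in ("a", "b"):
    print(g, round(weighted_positive_rate(groups, labels, weights, g), 3))
```

Unweighted, group "a" has a positive rate of 0.75 and group "b" 0.25; after reweighing both are 0.5, while the total weight still equals the number of examples. The weights can then be passed to a learner, for example through the `sample_weight` argument that many scikit-learn estimators accept in `fit`. Reweighing only balances outcome rates across groups; it does not remove bias carried by other features that act as proxies for the sensitive attribute.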

Data ethics is largely in the hands of data scientists, and that won’t change any time soon. Yet data scientists, who have access to powerful tools that can analyse how people think with an eye towards influencing their behaviour, receive no ethics training in most programmes. That’s a problem we need to fix if we want to avoid a constant stream of questionable uses of data and models.

We will need to develop more principles as more powerful technology becomes available. Data scientists, data engineers, database administrators and everyone involved in managing data should have a voice in the ethical debate about how data is used. Organisations should address these difficulties openly, through both formal and informal conventions.