Seven Legal Questions for Data Scientists – O’Reilly


“[T]he threats to consumers arising from data abuse, including those posed by algorithmic harms, are mounting and urgent.”

FTC Commissioner Rebecca K. Slaughter

Variants of artificial intelligence (AI), such as predictive modeling, statistical learning, and machine learning (ML), can create new value for organizations. AI can also cause costly reputational damage, get your organization slapped with a lawsuit, and run afoul of local, federal, or international regulations. Difficult questions about compliance and legality often pour cold water on late-stage AI deployments as well, because data scientists rarely get attorneys or oversight personnel involved in the build stages of AI systems. Moreover, like many powerful commercial technologies, AI is likely to be highly regulated in the future.


This article poses seven legal questions that data scientists should address before they deploy AI. This article is not legal advice. However, these questions and answers should help you better align your organization’s technology with existing and future laws, leading to less discriminatory and invasive customer interactions, fewer regulatory or litigation headwinds, and better return on AI investments. As the questions below indicate, it’s important to think about the legal implications of your AI system as you’re building it. Although many organizations wait until there’s an incident to call in legal help, compliance by design saves resources and reputations.

Fairness: Are there outcome or accuracy differences in model decisions across protected groups? Are you documenting efforts to find and fix these differences?

Examples: Alleged discrimination in credit lines; Poor experimental design in healthcare algorithms

Federal regulations require non-discrimination in consumer finance, employment, and other practices in the U.S. Local laws often extend these protections or define separate protections. Even if your AI isn’t directly affected by existing laws today, algorithmic discrimination can lead to reputational damage and lawsuits, and the current political winds are blowing toward broader regulation of AI. To deal with the issue of algorithmic discrimination and to prepare for pending future regulations, organizations must improve cultural competencies, business processes, and tech stacks.

Technology alone cannot solve algorithmic discrimination problems. Solid technology must be paired with culture and process changes, like increased demographic and professional diversity on the teams that build AI systems and better audit processes for those systems. Some additional non-technical solutions involve ethical principles for organizational AI usage, and a general mindset change. Going fast and breaking things isn’t the best idea when what you’re breaking are people’s loans, jobs, and healthcare.

From a technical standpoint, you’ll need to start with careful experimental design and data that truly represents modeled populations. After your system is trained, all aspects of AI-based decisions should be tested for disparities across demographic groups: the system’s primary outcome, follow-on decisions, such as limits for credit cards, and manual overrides of automated decisions, along with the accuracy of all these decisions. In many cases, discrimination tests and any subsequent remediation must also be conducted using legally sanctioned techniques—not just your new favorite Python package. Measurements like adverse impact ratio, marginal effect, and standardized mean difference, along with prescribed methods for fixing discovered discrimination, are enshrined in regulatory commentary. Finally, you should document your efforts to address algorithmic discrimination. Such documentation shows your organization takes accountability for its AI systems seriously and can be invaluable if legal questions arise after deployment.
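As an illustration of the disparity testing described above, here is a minimal pure-Python sketch of the adverse impact ratio for a binary decision. The group labels, toy data, and the four-fifths (0.8) rule of thumb are illustrative only; the specific tests and cutoffs your regulator recognizes may differ, so treat this as a starting point, not a compliance procedure.

```python
# Sketch: adverse impact ratio (AIR) for a binary decision.
# Values below roughly 0.8 (the "four-fifths rule") are a common red flag.

def adverse_impact_ratio(outcomes, groups, protected, reference, favorable=1):
    """Ratio of the protected group's favorable-outcome rate to the
    reference group's favorable-outcome rate."""
    def rate(group):
        decisions = [o for o, g in zip(outcomes, groups) if g == group]
        return sum(1 for o in decisions if o == favorable) / len(decisions)
    return rate(protected) / rate(reference)

# Hypothetical loan approvals: 1 = approved, 0 = denied
outcomes = [1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
groups   = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

air = adverse_impact_ratio(outcomes, groups, protected="B", reference="A")
print(f"AIR = {air:.2f}")  # 0.50 here: well below 0.8, so flag for review
```

In practice you would run this kind of check not only on the primary outcome but also on follow-on decisions, overrides, and accuracy, as the paragraph above notes.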

Privacy: Is your model complying with relevant privacy regulations?

Examples: Training data violates new state privacy laws

Personal data is highly regulated, even in the U.S., and nothing about using data in an AI system changes this fact. If you are using personal data in your AI system, you need to be mindful of existing laws and watch evolving state regulations, like the Biometric Information Privacy Act (BIPA) in Illinois or the new California Privacy Rights Act (CPRA).

To cope with the reality of privacy regulations, teams that are engaged in AI also need to comply with organizational data privacy policies. Data scientists should familiarize themselves with these policies from the early stages of an AI project to help avoid privacy problems. At a minimum, these policies will likely address:

  • Consent for use: how consumer consent for data-use is obtained; the types of information collected; and ways for consumers to opt-out of data collection and processing.
  • Legal basis: any applicable privacy regulations to which your data or AI are adhering; why you’re collecting certain information; and associated consumer rights.
  • Anonymization requirements: how consumer data is aggregated and anonymized.
  • Retention requirements: how long you store consumer data; the security you have to protect that data; and if and how consumers can request that you delete their data.
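To make the retention requirement concrete, here is a small sketch that flags records older than a retention window before a training run. The field names and the 365-day window are hypothetical; your actual retention periods come from your organization's privacy policy and applicable law.

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention window; set this from your privacy policy.
RETENTION = timedelta(days=365)

def expired_records(records, now=None):
    """Return the ids of records held longer than the retention window."""
    now = now or datetime.now(timezone.utc)
    return [r["id"] for r in records if now - r["collected_at"] > RETENTION]

records = [
    {"id": 1, "collected_at": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "collected_at": datetime.now(timezone.utc) - timedelta(days=30)},
]
print(expired_records(records))  # [1]: only the stale record is flagged
```

A check like this is cheap to run on every training job, which helps with the ongoing-audit point below.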

Given that most AI systems will change over time, you should also regularly audit your AI to ensure that it remains in compliance with your privacy policy. Consumer requests to delete data, or the addition of new data-hungry functionality, can cause legal problems even for AI systems that were in compliance at the time of their initial deployment.

One last general tip is to have an incident response plan. This is a lesson learned from general IT security. Among many other considerations, that plan should detail systematic ways to inform regulators and consumers if data has been breached or misappropriated.

Security: Have you incorporated applicable security standards in your model? Can you detect if and when a breach occurs?

Examples: Poor physical security for AI systems; Security attacks on ML; Evasion attacks

Like other consumer software systems, AI systems likely fall under various security standards and breach-reporting laws. You’ll need to update your organization’s IT security procedures to apply to AI systems, and you’ll need to make sure that you can report if AI systems (data or algorithms) are compromised.

Luckily, the basics of IT security are well-understood. First, ensure that these are applied uniformly across your IT assets, including that super-secret new AI project and the rock-star data scientists working on it. Second, start preparing for inevitable attacks on AI. These attacks tend to involve adversarial manipulation of AI-based decisions or the exfiltration of sensitive data from AI system endpoints. While these attacks are not common today, you don’t want to be the object lesson in AI security for years to come. So update your IT security policies to consider these new attacks. Standard counter-measures such as authentication and throttling at system endpoints go a long way toward promoting AI security, but newer approaches such as robust ML, differential privacy, and federated learning can make AI hacks even more difficult for bad actors.
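Of the counter-measures named above, throttling is the simplest to sketch. Below is a minimal token-bucket rate limiter for a model-serving endpoint; the class, capacity, and refill rate are illustrative, and a production deployment would more likely use rate limiting built into an API gateway.

```python
import time

class TokenBucket:
    """Per-caller rate limiter. Throttling makes model-extraction and
    probing attacks, which need many queries, slower and easier to spot."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last request.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # reject or queue; repeated rejections are worth logging

bucket = TokenBucket(capacity=5, refill_per_sec=1.0)
results = [bucket.allow() for _ in range(10)]  # a burst of 10 requests
print(results)  # the first 5 pass, the rest are throttled
```

Rejections clustered on one caller are themselves a useful security signal, which is one reason to throttle at the AI endpoint rather than only at the network edge.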

Finally, you’ll need to report breaches if they occur in your AI systems. If your AI system is a labyrinthine black box, that could be difficult. Avoid overly complex, black-box algorithms whenever possible; monitor AI systems in real time for performance, security, and discrimination problems; and ensure system documentation is applicable for incident response and breach-reporting purposes.

Agency: Is your AI system making unauthorized decisions on behalf of your organization?

Examples: Gig economy robo-firing; AI executing equities trades

If your AI system is making material decisions, it is crucial to ensure that it cannot make unauthorized decisions. If your AI is based on ML, as most are today, your system’s outcome is probabilistic: it will make wrong decisions. Wrong AI-based decisions about material matters—lending, financial transactions, employment, healthcare, or criminal justice, among others—can cause serious legal liabilities (see Negligence below). Worse still, using AI to mislead consumers can put your organization on the wrong side of an FTC enforcement action or a class action.

Every organization approaches risk management differently, so setting necessary limits on automated predictions is a business decision that requires input from many stakeholders. Furthermore, humans should review any AI decisions that implicate such limits before a final decision is issued to the customer. And don’t forget to routinely test your AI system with edge cases and novel situations to ensure it stays within those preset limits.
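As a sketch of such preset limits, the gate below routes any prediction outside illustrative thresholds to human review instead of letting the model act alone. The field names, dollar limit, and confidence cutoff are hypothetical; the actual thresholds are exactly the stakeholder-driven business decision described above.

```python
def route_decision(prediction, confidence, amount,
                   max_auto_amount=10_000, min_confidence=0.9):
    """Return 'auto' only when the decision is within preset limits;
    otherwise escalate to a human reviewer."""
    if amount > max_auto_amount or confidence < min_confidence:
        return "human_review"
    return "auto"

print(route_decision("approve", confidence=0.97, amount=2_500))   # auto
print(route_decision("approve", confidence=0.97, amount=50_000))  # human_review
print(route_decision("deny",    confidence=0.55, amount=2_500))   # human_review
```

Keeping the gate outside the model, as plain reviewable code, also makes it easy to audit and to tighten the limits without retraining.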

Relatedly, and to quote the FTC, “[d]on’t deceive consumers about how you use automated tools.” In their Using Artificial Intelligence and Algorithms guidance, the FTC specifically called out companies for manipulating consumers with digital avatars posing as real people. To avoid this kind of violation, always inform your consumers that they are interacting with an automated system. It’s also a best practice to implement recourse interventions directly into your AI-enabled customer interactions. Depending on the context, an intervention might involve options to interact with a human instead, options to avoid similar content in the future, or a full-blown appeals process.

Negligence: How are you ensuring your AI is safe and reliable?

Examples: Releasing the wrong person from jail; autonomous vehicle kills pedestrian

AI decision-making can lead to serious safety issues, including physical injuries. To keep your organization’s AI systems in check, the practice of model risk management, based roughly on the Federal Reserve’s SR 11-7 letter, is among the most tested frameworks for safeguarding predictive models against stability and performance failures.

For more advanced AI systems, a lot can go wrong. When creating autonomous vehicle or robotic process automation (RPA) systems, be sure to incorporate practices from the nascent discipline of safe and reliable machine learning. Diverse teams, including domain experts, should think through possible incidents, compare their designs to known past incidents, document steps taken to prevent such incidents, and develop response plans to prevent inevitable glitches from spiraling out of control.

Transparency: Can you explain how your model arrives at a decision?

Examples: Proprietary algorithms hide data errors in criminal sentencing and DNA testing

Federal law already requires explanations for certain consumer finance decisions. Beyond meeting regulatory requirements, interpretability of AI system mechanisms enables human trust and understanding of these high-impact technologies, meaningful recourse interventions, and proper system documentation. Over recent years, two promising technological approaches have increased AI systems’ interpretability: interpretable ML models and post-hoc explanations. Interpretable ML models (e.g., explainable boosting machines) are algorithms that are both highly accurate and highly transparent. Post-hoc explanations (e.g., Shapley values) attempt to summarize ML model mechanisms and decisions. These two tools can be used together to increase your AI’s transparency. Given both the fundamental importance of interpretability and the technological progress made toward this goal, it’s not surprising that new regulatory initiatives, like the FTC’s AI guidance and the CPRA, prioritize both consumer-level explanations and overall transparency of AI systems.
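To show what a Shapley-value explanation actually computes, here is a brute-force sketch over all feature coalitions for a toy model. Real systems use approximations (for example, the `shap` library) because exact enumeration is exponential in the number of features; the toy model, inputs, and baseline here are illustrative only.

```python
from itertools import combinations
from math import factorial

def model(x):
    # Toy linear "credit score": income and debt matter, the third input does not.
    return 0.5 * x[0] - 0.3 * x[1] + 0.0 * x[2]

def shapley_values(model, x, baseline):
    """Exact Shapley values: each feature's weighted average marginal
    contribution over every coalition of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                # Features outside the coalition are set to the baseline.
                with_i = [x[j] if j in S or j == i else baseline[j] for j in range(n)]
                without = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without))
    return phi

phi = shapley_values(model, x=[2.0, 1.0, 5.0], baseline=[0.0, 0.0, 0.0])
print([round(p, 3) for p in phi])  # [1.0, -0.3, 0.0]
```

For this linear model, each feature's attribution reduces to its coefficient times its deviation from the baseline, which is a useful sanity check on any Shapley implementation; the irrelevant third feature correctly receives zero credit.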

Third Parties: Does your AI system depend on third-party tools, services, or personnel? Are they addressing these questions?

Examples: Natural language processing tools and training data images conceal discriminatory biases

It is rare for an AI system to be built entirely in-house without dependencies on third-party software, data, or consultants. When you use these third-party resources, third-party risk is introduced into your AI system. And, as the old saying goes, a chain is only as strong as its weakest link. Even if your organization takes the utmost precautions, any incident involving your AI system, even one stemming from a third party you relied on, can potentially be blamed on you. Therefore, it is essential to ensure that any parties involved in the design, implementation, review, or maintenance of your AI systems follow all applicable laws, policies, and regulations.

Before contracting with a third party, due diligence is required. Ask third parties for documentary proof that they take discrimination, privacy, security, and transparency seriously. And be on the lookout for signs of negligence, such as shoddy documentation, erratic software release cadences, lack of warranty, or unreasonably broad exceptions in terms of service or end-user license agreements (EULAs). You should also have contingency plans, including technical redundancies, incident response plans, and insurance covering third-party dependencies. Finally, don’t be shy about grading third-party vendors on a risk-assessment report card. Make sure these assessments happen over time, and not just at the beginning of the third-party contract. While these precautions may increase costs and delay your AI implementation in the short-term, they are the only way to mitigate third-party risks in your system consistently over time.

Looking Ahead

Several U.S. states and federal agencies have telegraphed their intentions regarding the future regulation of AI. Three of the broadest efforts to be aware of include the Algorithmic Accountability Act, the FTC’s AI guidance, and the CPRA. Numerous other industry-specific guidance documents are being drafted, such as the FDA’s proposed framework for AI in medical devices and FINRA’s Artificial Intelligence (AI) in the Securities Industry. Furthermore, other countries are setting examples for U.S. policymakers and regulators to follow. Canada, the European Union, Singapore, and the United Kingdom, among others, have all drafted or implemented detailed regulations for different aspects of AI and automated decision-making systems. In light of this government movement, and the growing public and government distrust of big tech, now is the perfect time to start minimizing AI system risk and prepare for future regulatory compliance.
