Week 3: AI Safety and Security

Explore either AI governance or AI safety careers: (45 minutes)

  1. Technical AI Safety Research:

    1. 80k career profile (35 minutes)

Background: The basic case for AI risk

  1. 80k problem profile (15 minutes)

  2. Deadly by default (30 minutes)

How does AI learn? (10-30 minutes)

Enablers of Modern AI Systems

  • Many of the current ideas were already around in 1943 and ran on computers by 1967

  • 3 Key Factors

    • Compute

      • Computing power to run complex tasks

      • Past: Didn't have enough compute

    • Data

      • Information to be used in tasks

      • Past: Didn't have that much data

    • Algorithms

      • Facilitates more complex tasks

      • Facilitates efficiency

      • Past: Already existed in sufficient form

Neural Network

  • What?

    • General-Purpose tool to fit any data

    • Inspired by the biological brain

  • How?

    • Start with a parameterized function, e.g. 'y = a*x + b'

    • Error = how far the prediction is from the truth; the gradient indicates how to change the parameters to reduce it

    • Trialling parameters a & b and updating them so the error minimizes

    • Making predictions of the output ('truth') according to the current parameters
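
The steps above can be sketched in a few lines of Python: fit 'y = a*x + b' by repeatedly predicting, measuring the error against the truth, and nudging the parameters a & b downhill. The function, data, and learning rate here are illustrative assumptions, not from the reading.

```python
# Minimal sketch (illustrative): fitting y = a*x + b by gradient descent
def fit_line(xs, ys, lr=0.01, steps=2000):
    a, b = 0.0, 0.0  # initial guesses for the parameters
    n = len(xs)
    for _ in range(steps):
        # Predictions with the current parameters
        preds = [a * x + b for x in xs]
        # Error: how far the predictions are from the truth
        errors = [p - y for p, y in zip(preds, ys)]
        # Gradient of the mean squared error w.r.t. a and b
        grad_a = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Update parameters in the direction that reduces the error
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Data generated from the "truth" y = 3x + 1
xs = [0, 1, 2, 3, 4]
ys = [3 * x + 1 for x in xs]
a, b = fit_line(xs, ys)  # a ≈ 3, b ≈ 1 after training
```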

How AI Systems learn parameters from data

  • Reducing Gradient

    • Allowing for predictions and updating based on reality

  • Positive/Negative Reinforcement

    • Allowing AI to behave creatively and reinforcing positive/negative behaviour to guide behavior

      • Similarly when reflecting upon positive/negative behaviors of one's day and perceiving the benefit and disadvantage in them

    • Discover new solutions
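
Positive/negative reinforcement can be sketched with a toy two-armed bandit (the reward rates and exploration rate are made-up assumptions, not from the reading): the agent tries actions somewhat at random, receives positive or negative feedback, and shifts its behaviour toward what was reinforced.

```python
# Minimal sketch (illustrative): reinforcement on a two-armed bandit
import random

random.seed(0)
values = [0.0, 0.0]   # agent's estimated value of each action
counts = [0, 0]       # how often each action was tried
rewards = [0.2, 0.8]  # true average reward of each action (hidden from agent)

for _ in range(1000):
    # Explore: sometimes act at random ("creatively"); otherwise exploit
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = 0 if values[0] > values[1] else 1
    # Positive/negative feedback from the environment
    reward = 1.0 if random.random() < rewards[action] else 0.0
    counts[action] += 1
    # Reinforce: move the estimate toward the observed reward
    values[action] += (reward - values[action]) / counts[action]

# After training, the agent prefers the action with the higher true reward
```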

Personal Reflections

  • Neural-network inference seems similar to human reasoning. How does a neural network differ from how a human reasons?

Large Language Models explained briefly (10-20 minutes)

Predicts the response that best 'harmonizes' with the input

LLM

  • Mathematical function that predicts which word comes next for any given input

  • Assigns a probability to every potential next word

    • Chooses the highest, or sometimes a lower one if more fitting
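
The word-prediction idea can be sketched as follows; the context and its hand-written probabilities are hypothetical, not from any real model. Greedy choice picks the highest-probability word, while temperature sampling sometimes picks a lower one.

```python
# Toy sketch (illustrative): next-word probabilities for one context
import random

# Hypothetical hand-written distribution for "the cat sat on the ..."
next_word_probs = {"mat": 0.6, "sofa": 0.25, "roof": 0.1, "moon": 0.05}

def greedy(probs):
    # Always pick the single most likely next word
    return max(probs, key=probs.get)

def sample(probs, temperature=1.0):
    # Higher temperature flattens the distribution, so lower-probability
    # ("more creative") words get chosen more often
    weights = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    total = sum(weights.values())
    r = random.random() * total
    for word, w in weights.items():
        r -= w
        if r <= 0:
            return word
    return word

print(greedy(next_word_probs))  # prints: mat
```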

How does the LLM learn?

  • Decision-making

    • Done by many (100+ billion) 'parameters'

  • Parameters

    • Mathematical values that give weight to a certain output

    • Causal Interdependence

      • Every parameter influences each other

  • Updating / Harmonizing

    • Adjust parameters based on disharmony, i.e., on contact with the truth the parameters are updated so the output comes closer to truth/harmony

Understanding AI Behavior

  • We do not understand how specifically a model reaches its conclusion

  • We do understand the general approach it uses

Personal Reflections

  • Causal Interdependence

    • Every parameter influencing each other

    • Likewise, every habit is influencing other habits

      • e.g., Societies, Values, Emotions, Attention, etc.

  • Harmonization / Updating Gradient

    • Expressing one's truth (habits) and coming in contact with the world (other habits) one experiences suffering/happiness (disharmony/harmony) and can update accordingly to increase harmony

Teaching AI to reason: this year’s most important story (10-15 minutes)

Facilitator for Reasoning: Reinforcement Learning (RL)

  • RL accelerated capabilities of AI

  • Many believed AI had peaked

  • Not constrained by data

Personal Reflection

  • Practical AGI may require autonomous reinforcement from the suffering/happiness it produces in the world

Why are people building AI systems? (5 minutes)

Automation of Tasks

  • Evaluating applications, responding to questions, summarizing meetings

Replacing Human Labour

  • Possibly capturing half of wages in developed countries

Scientific & Technological Progress

Personal Reflections

  • Facilitates Centralization of Power

The cost of caution (5 minutes)

Acceleration facilitating x-risks vs. slowing down delaying or forgoing benefits

  • Cure diseases

  • Accessibility to basic needs

  • Meat alternatives shutting down factory farms

Machines of Loving Grace (35 minutes)

Branding of Anthropic

  • Focus on Risks > Positivity

    • Belief

      • Benefits of AI unavoidable - if it goes right

      • Only if the risks come true do we have a problem

  • Avoid Grandiosity

    • Dangerous to view practical technological goals in religious terms

  • Avoid Propaganda

  • Avoid 'Sci-Fi' negative connotations

    • e.g., mind uploading, cyberpunk, space exploration

    • Neglecting significant moral aspects of world

Benefits

  • Biology & Physical Health

    • Extend healthy human lifespan

    • Control & freedom over biological processes

      • Weight, physical appearance, reproduction, fertility

    • Reliable prevention and treatment of nearly all natural infectious diseases

    • Elimination of most cancers

    • Prevention & Cure: Genetic Disease

      • Embryo screening, CRISPR

    • Improved treatment of most other ailments

    • Immortality

    • Hindrance

      • Complexity of Cause and Effect

      • 'Time' needed e.g., weeks, months

      • Bureaucracy

      • Serial Dependence (requires A for B to be possible)

  • Neuroscience & Mental Health

    • Behavioural Interventions

    • Cause and Effect of Brain & States of Mind (Depression, etc.)

  • Economic Development & Poverty

    • Having Capacity is insufficient

      • Requires actually applying it (capacity vs. intent)

    • Hindrance

      • Corruption

      • Scepticism

      • Unwillingness to share / lack of generosity

  • Peace & Governance

    • Facilitators

      • Democracy

        • Free Information

        • Entente

          • Recommends a centralisation of power quickly via powerful AI to overthrow 'adversaries'

        • Quality of Life

  • Work & Meaning

Powerful AI (AGI)

  • What? 'Country of Geniuses in a Datacenter'

    • Intelligence > Nobel Prize Winner

      • prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.

    • Interfaces

      • Has access to e.g., audio, video, mouse & keyboard control, internet access

      • Capacities such as e.g., ordering materials, directing experiments, make videos

    • Copy-Paste

      • Resources used to train the model can be repurposed to run millions of instances of it

  • When?

    • Earliest 2026

  • First 5-10 Years after AGI

    • Hindrances to a 'Singularity' (the world instantly transformed, on the scale of seconds)

      • Lack of Data

        • Intelligence, in the absence of data, cannot make informed choices

      • Harmony

        • Laws

        • Societal traditions

        • Causing Suffering

        • Humans need time to respond (if they have control)

        • Bureaucracy

      • Law of Nature

        • Impossible to travel faster than light

        • Hardware

        • Biological Experiments

      • Serial Dependence

        • Requiring A to make B possible

    • Facilitators to Fast Development

      • Intelligence creating accelerators

        • simulations/in-vitro experiments > live animals, human trials

        • Improved particle accelerators > small

    • 5-10 Years huge shift: Yes

      • 1 Year huge shift: No

Personal Reflection

  • Neglect of Non-Human Sentience?

    • to directly improve the quality of human life

  • Neglect of potential root cause of Immorality & Suffering: Ignorance & Craving

Podcast Allan Dafoe on why technology is unstoppable & how to shape AI development anyway (~3 hours)

That which influences our Future

  • Craving (Fear, Greed)

    • once a powerful new capability becomes available, societies that adopt it tend to outcompete those that don’t

    • Competition

      • Military

      • Economic

  • Technological Determinism

    • What?

      • Technology determines the future, rather than human choices

    • Technological Politics

      • What?

        • Express Political goals through technology

        • Technology that reinforces how society will be

      • Examples

        • Parisian Boulevards

          • Linear structure to suppress riots, as it facilitates cavalry access

          • Gates or urban design as an expression of a view of how people should behave

  • Momentum

    • Once a habit/system gets built and gets going, it has inertia that sustains it and hinders whatever is in disharmony with it

    • Example

      • Dependence on cars in American urban infrastructure

  • Unintended Consequence

    • Future is not determined by some structure, nor by our choices, but just by being buffeted one way or another

Hopes & Worries

  • Safety

    • 'Defence in Depth' Process

      • Carefully staged deployment with internal testing

      • External Testers

      • Limited release

      • Observe behaviour of models in real world

      • Not releasing model weights, i.e., remaining able to roll back and add additional safeguards

    • International Governance & Cooperation

    • Building Backdoors into AI-models

      • If the model is copied and it registers that someone is using it without permission, it switches to difficult-to-detect behavior

  • Interpretability

    • Inability to understand the behavior of AI or human agents, i.e., their intentions and goals

  • Cooperation

    • Seemingly net-positive to cultivate 'Cooperation' for those included and potentially net-negative for those excluded

      • e.g., humans cooperate & excluding non-human animals

  • Offence-Defence: Endless Race against 'bad' actors

    • Problem

      • Costs to create powerful AI drop every two years by a factor of 10

      • Facilitates access by everyone including bad actors

    • Solution

      • Keep building ever more powerful AI to protect against the inferior models

      • Depends on offence-defence balance i.e., how much does it cost for the 'good' actors to repair/protect against the 'bad' actors e.g., 1:1, 1:5
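
A back-of-the-envelope sketch of the cost claim above: if costs fall by a factor of 10 every two years, the cost after t years is cost0 / 10^(t/2). The starting figure is a made-up example, purely illustrative.

```python
# Illustrative arithmetic: cost drops 10x every two years
def cost_after(cost0, years):
    return cost0 / 10 ** (years / 2)

# e.g. a hypothetical $100M training run would cost ~$100k six years later
print(cost_after(100_000_000, 6))  # prints: 100000.0
```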
