Week 3: AI Safety and Security

Explore either AI governance or AI safety careers: (45 minutes)

  1. Technical AI Safety Research:

    1. 80k career profile (35 minutes)

Background: The basic case for AI risk

  1. 80k problem profile (15 minutes)

  2. Deadly by default (30 minutes)

How does AI learn? (10-30 minutes)

Enablers of Modern AI Systems

  • Many of the current ideas were already around in 1943 and ran on computers by 1967

  • 3 Key Factors

    • Compute

      • Computing power to run complex tasks

      • Past: Didn't have enough compute

    • Data

      • Information to be used in tasks

      • Past: Didn't have that much data

    • Algorithms

      • Facilitates more complex tasks

      • Facilitates efficiency

      • Past: Already existed in sufficient form

Neural Network

  • What?

    • General-Purpose tool to fit any data

    • Inspired by the biological brain

  • How?

    • Start with a parameterized function, e.g. 'y = a*x + b'

    • Error = how far the prediction is from the truth; the gradient indicates how to change the parameters to reduce it

    • Trialling parameters a & b and updating them so the error minimizes

    • Making predictions of the output ('truth') according to the current parameters
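
The steps above can be sketched in a few lines of Python: fit 'y = a*x + b' by repeatedly predicting, measuring the error against the truth, and nudging the parameters a & b downhill. The function, data, and learning rate here are illustrative assumptions, not from the reading.

```python
# Minimal sketch (illustrative): fitting y = a*x + b by gradient descent
def fit_line(xs, ys, lr=0.01, steps=2000):
    a, b = 0.0, 0.0  # initial guesses for the parameters
    n = len(xs)
    for _ in range(steps):
        # Predictions with the current parameters
        preds = [a * x + b for x in xs]
        # Error: how far the predictions are from the truth
        errors = [p - y for p, y in zip(preds, ys)]
        # Gradient of the mean squared error w.r.t. a and b
        grad_a = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Update parameters in the direction that reduces the error
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Data generated from the "truth" y = 3x + 1
xs = [0, 1, 2, 3, 4]
ys = [3 * x + 1 for x in xs]
a, b = fit_line(xs, ys)  # a ≈ 3, b ≈ 1 after training
```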

How AI Systems learn parameters from data

  • Reducing Gradient

    • Allowing for predictions and updating based on reality

  • Positive/Negative Reinforcement

    • Allowing AI to behave creatively and reinforcing positive/negative behaviour to guide behavior

      • Similarly when reflecting upon positive/negative behaviors of one's day and perceiving the benefit and disadvantage in them

    • Discover new solutions
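
Positive/negative reinforcement can be sketched with a toy two-armed bandit (the reward rates and exploration rate are made-up assumptions, not from the reading): the agent tries actions somewhat at random, receives positive or negative feedback, and shifts its behaviour toward what was reinforced.

```python
# Minimal sketch (illustrative): reinforcement on a two-armed bandit
import random

random.seed(0)
values = [0.0, 0.0]   # agent's estimated value of each action
counts = [0, 0]       # how often each action was tried
rewards = [0.2, 0.8]  # true average reward of each action (hidden from agent)

for _ in range(1000):
    # Explore: sometimes act at random ("creatively"); otherwise exploit
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = 0 if values[0] > values[1] else 1
    # Positive/negative feedback from the environment
    reward = 1.0 if random.random() < rewards[action] else 0.0
    counts[action] += 1
    # Reinforce: move the estimate toward the observed reward
    values[action] += (reward - values[action]) / counts[action]

# After training, the agent prefers the action with the higher true reward
```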

Personal Reflections

  • Neural-network inference seems similar to human reasoning. How does a neural network differ from how a human reasons?

Large Language Models explained briefly (10-20 minutes)

Predicts the response that best 'harmonizes' with the input

LLM

  • Mathematical function that predicts which word comes next for any given input

  • Assigns a probability to every potential next word

    • Chooses the highest, or sometimes a lower one if more fitting
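
The word-prediction idea can be sketched as follows; the context and its hand-written probabilities are hypothetical, not from any real model. Greedy choice picks the highest-probability word, while temperature sampling sometimes picks a lower one.

```python
# Toy sketch (illustrative): next-word probabilities for one context
import random

# Hypothetical hand-written distribution for "the cat sat on the ..."
next_word_probs = {"mat": 0.6, "sofa": 0.25, "roof": 0.1, "moon": 0.05}

def greedy(probs):
    # Always pick the single most likely next word
    return max(probs, key=probs.get)

def sample(probs, temperature=1.0):
    # Higher temperature flattens the distribution, so lower-probability
    # ("more creative") words get chosen more often
    weights = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    total = sum(weights.values())
    r = random.random() * total
    for word, w in weights.items():
        r -= w
        if r <= 0:
            return word
    return word

print(greedy(next_word_probs))  # prints: mat
```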

How does the LLM learn?

  • Decision-making

    • Done by many (100+ billion) 'parameters'

  • Parameters

    • Mathematical values that give weight to a certain output

    • Causal Interdependence

      • Every parameter influences each other

  • Updating / Harmonizing

    • Adjust parameters based on disharmony, i.e., on contact with the truth the parameters are updated so the output comes closer to truth/harmony

Understanding AI Behavior

  • We do not understand how specifically a model reaches its conclusion

  • We do understand the general approach it uses

Personal Reflections

  • Causal Interdependence

    • Every parameter influencing each other

    • Likewise, every habit is influencing other habits

      • e.g., Societies, Values, Emotions, Attention, etc.

  • Harmonization / Updating Gradient

    • Expressing one's truth (habits) and coming in contact with the world (other habits) one experiences suffering/happiness (disharmony/harmony) and can update accordingly to increase harmony

Teaching AI to reason: this year’s most important story (10-15 minutes)

Facilitator for Reasoning: Reinforcement Learning (RL)

  • RL accelerated capabilities of AI

  • Many believed AI had peaked

  • Not constrained by data

Personal Reflection

  • Practical AGI may require autonomous reinforcement from the suffering/happiness it produces in the world

Why are people building AI systems? (5 minutes)

Automation of Tasks

  • Evaluating applications, responding to questions, summarizing meetings

Replacing Human Labour

  • Possibly capturing half of wages in developed countries

Scientific & Technological Progress

Personal Reflections

  • Facilitates Centralization of Power

The cost of caution (5 minutes)

Acceleration facilitating x-risks vs. slowing down delaying or forgoing benefits

  • Cure diseases

  • Accessibility to basic needs

  • Meat alternatives shutting down factory farms

Machines of Loving Grace (35 minutes)

Branding of Anthropic

  • Focus on Risks > Positivity

    • Belief

      • Benefits of AI unavoidable - if it goes right

      • Only if the risks come true do we have a problem

  • Avoid Grandiosity

    • Dangerous to view practical technological goals in religious terms

  • Avoid Propaganda

  • Avoid 'Sci-Fi' negative connotations

    • e.g., mind uploading, cyberpunk, space exploration

    • Neglecting significant moral aspects of world

Benefits

  • Biology & Physical Health

    • Extend healthy human lifespan

    • Control & freedom over biological processes

      • Weight, physical appearance, reproduction, fertility

    • Reliable prevention and treatment of nearly all natural infectious diseases

    • Elimination of most cancers

    • Prevention & Cure: Genetic Disease

      • Embryo screening, CRISPR

    • Improved treatment of most other ailments

    • Immortality

    • Hindrance

      • Complexity of Cause and Effect

      • 'Time' needed e.g., weeks, months

      • Bureaucracy

      • Serial Dependence (requires A for B to be possible)

  • Neuroscience & Mental Health

    • Behavioural Interventions

    • Cause and Effect of Brain & States of Mind (Depression, etc.)

  • Economic Development & Poverty

    • Having Capacity is insufficient

      • Requires actually applying it (capacity vs. intent)

    • Hindrance

      • Corruption

      • Scepticism

      • Unwillingness to share / lack of generosity

  • Peace & Governance

    • Facilitators

      • Democracy

        • Free Information

        • Entente

          • Recommends a centralisation of power quickly via powerful AI to overthrow 'adversaries'

        • Quality of Life

  • Work & Meaning

Powerful AI (AGI)

  • What? 'Country of Geniuses in a Datacenter'

    • Intelligence > Nobel Prize Winner

      • prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.

    • Interfaces

      • Has access to e.g., audio, video, mouse & keyboard control, internet access

      • Capacities such as e.g., ordering materials, directing experiments, make videos

    • Copy-Paste

      • Resources used to train the model can be repurposed to run millions of instances of it

  • When?

    • Earliest 2026

  • First 5-10 Years after AGI

    • Hindrances to a 'Singularity' (the world instantly transformed, on the scale of seconds)

      • Lack of Data

        • Intelligence, in the absence of data, cannot make informed choices

      • Harmony

        • Laws

        • Societal traditions

        • Causing Suffering

        • Humans need time to respond (if they have control)

        • Bureaucracy

      • Law of Nature

        • Impossible to travel faster than light

        • Hardware

        • Biological Experiments

      • Serial Dependence

        • Requiring A to make B possible

    • Facilitators to Fast Development

      • Intelligence creating accelerators

        • simulations/in-vitro experiments > live animals, human trials

        • Improved particle accelerators > small

    • 5-10 Years huge shift: Yes

      • 1 Year huge shift: No

Personal Reflection

  • Neglect of Non-Human Sentience?

    • to directly improve the quality of human life

  • Neglect of potential root cause of Immorality & Suffering: Ignorance & Craving

Podcast Allan Dafoe on why technology is unstoppable & how to shape AI development anyway (~3 hours)

That which influences our Future

  • Craving (Fear, Greed)

    • once a powerful new capability becomes available, societies that adopt it tend to outcompete those that don’t

    • Competition

      • Military

      • Economic

  • Technological Determinism

    • What?

      • Technology determines the future, rather than human choices

    • Technological Politics

      • What?

        • Express Political goals through technology

        • Technology that reinforces how society will be

      • Examples

        • Parisian Boulevards

          • Linear structure to suppress riots, as it facilitates cavalry access

          • Gates or urban design as an expression of a view of how people should behave

  • Momentum

    • Once a habit/system gets built and gets going, it has inertia that sustains it and hinders whatever is in disharmony with it

    • Example

      • Dependence on cars in American urban infrastructure

  • Unintended Consequence

    • Future is not determined by some structure, nor by our choices, but just by being buffeted one way or another

Hopes & Worries

  • Safety

    • 'Defence in Depth' Process

      • Carefully staged deployment with internal testing

      • External Testers

      • Limited release

      • Observe behaviour of models in real world

      • Not releasing model weights, i.e., remaining able to roll back and add additional safeguards

    • International Governance & Cooperation

    • Building Backdoors into AI-models

      • If the model is copied and it registers that someone is using it without permission, it switches to difficult-to-detect behavior

  • Interpretability

    • Inability to understand the behavior of AI or human agents, i.e., their intentions and goals

  • Cooperation

    • Seemingly net-positive to cultivate 'Cooperation' for those included and potentially net-negative for those excluded

      • e.g., humans cooperate & excluding non-human animals

  • Offence-Defence: Endless Race against 'bad' actors

    • Problem

      • Costs to create powerful AI drop every two years by a factor of 10

      • Facilitates access by everyone including bad actors

    • Solution

      • Keep building ever more powerful AI to protect against the inferior models

      • Depends on offence-defence balance i.e., how much does it cost for the 'good' actors to repair/protect against the 'bad' actors e.g., 1:1, 1:5
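
A back-of-the-envelope sketch of the cost claim above: if costs fall by a factor of 10 every two years, the cost after t years is cost0 / 10^(t/2). The starting figure is a made-up example, purely illustrative.

```python
# Illustrative arithmetic: cost drops 10x every two years
def cost_after(cost0, years):
    return cost0 / 10 ** (years / 2)

# e.g. a hypothetical $100M training run would cost ~$100k six years later
print(cost_after(100_000_000, 6))  # prints: 100000.0
```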
