Week 3: AI Safety and Security
Explore either AI governance or AI safety careers: (45 minutes)
AI Governance
80k career profile (35 minutes)
9 Pieces on AI Policy that Set the Tone in 2024 (10 minutes)
Technical AI Safety Research:
80k career profile (35 minutes)
What is AI alignment? (10 minutes)
Background: The basic case for AI risk
80k problem profile (15 minutes)
Could AI wipe out humanity? (10 minutes)
Deadly by default (30 minutes)
How does AI learn? (10-30 minutes)
Enablers of Modern AI Systems
Many of the current ideas were around in 1943 and ran on computers in 1967
3 Key Factors
Compute
Power to run complex tasks
Past: Didn't have enough compute
Data
Information to be processed in tasks
Past: Didn't have that much data
Algorithms
Facilitates more complex tasks
Facilitates efficiency
Past: Already existed in sufficient form

Neural Network
What?
General-Purpose tool to fit any data
Inspired by the biological brain
How?
Having variables with parameters, e.g. y = a*x + b
Error = how far the prediction is from the truth; the gradient shows how to adjust the parameters to reduce it
Trialling values for parameters a & b and updating them so the error is minimised (see the sketch below)
Making predictions of the output ('truth') according to the current parameters
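A minimal sketch of this in Python (the toy data, learning rate, and step count are illustrative assumptions, not from the reading): fit y = a*x + b by measuring the error and nudging a and b in the direction that reduces it.

```python
# Minimal gradient-descent sketch: fit y = a*x + b to toy data.
# The data, learning rate, and iteration count are made-up assumptions.

data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]  # (input x, true output y)

a, b = 0.0, 0.0          # start from arbitrary parameters
learning_rate = 0.01

for step in range(2000):
    grad_a, grad_b = 0.0, 0.0
    for x, y_true in data:
        y_pred = a * x + b        # prediction from the current parameters
        error = y_pred - y_true   # how far the prediction is from the truth
        grad_a += 2 * error * x   # derivative of the squared error w.r.t. a
        grad_b += 2 * error       # derivative of the squared error w.r.t. b
    # Update the parameters in the direction that reduces the error
    a -= learning_rate * grad_a / len(data)
    b -= learning_rate * grad_b / len(data)

print(f"learned a={a:.2f}, b={b:.2f}")  # roughly a≈2, b≈1 for this toy data
```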

How AI Systems learn parameters from data
Reducing error (gradient descent)
Making predictions and updating the parameters based on reality
Positive/Negative Reinforcement
Letting the AI behave creatively, then reinforcing positive/negative behaviour to guide it (see the sketch below)
Similar to reflecting on the positive/negative behaviours of one's day and perceiving the benefit and disadvantage in them
Discover new solutions
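A minimal reinforcement-style sketch in Python (the two actions and their reward probabilities are invented): the agent occasionally acts 'creatively' at random, and positive/negative feedback gradually steers it toward the action that works better.

```python
import random

# Hypothetical two-armed bandit: each action pays off with a different probability.
reward_prob = {"action_A": 0.3, "action_B": 0.7}
value = {"action_A": 0.0, "action_B": 0.0}   # agent's running estimate of each action's worth
epsilon, step_size = 0.1, 0.1

for _ in range(1000):
    # Explore occasionally (behave 'creatively'), otherwise exploit the best-known action.
    if random.random() < epsilon:
        action = random.choice(list(value))
    else:
        action = max(value, key=value.get)
    reward = 1.0 if random.random() < reward_prob[action] else 0.0  # positive/negative feedback
    # Reinforce: move the estimate toward the observed reward.
    value[action] += step_size * (reward - value[action])

print(value)  # the estimate for action_B should end up clearly higher
```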
Personal Reflections
A neural network's way of drawing conclusions seems similar to human reasoning. How does a neural network differ from how a human reasons?
Large Language Models explained briefly (10-20 minutes)
Predicts the most fitting ('harmonious') response
LLM
A mathematical function that predicts which word comes next for any given input
Assigns a probability to every potential next word
Usually chooses the highest-probability word, sometimes a lower-probability one if it fits better (see the sketch below)
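A toy sketch of this in Python (the candidate words and their scores are invented, not taken from a real model): assign a probability to every candidate word, then either take the most likely one or sample, so a lower-probability word is occasionally chosen.

```python
import math, random

# Hypothetical scores ("logits") a model might assign to candidate next words
# after "The cat sat on the". Real LLMs score tens of thousands of tokens.
logits = {"mat": 4.0, "sofa": 3.2, "roof": 2.5, "moon": 0.1}

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = {w: math.exp(s) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

probs = softmax(logits)                 # a probability for every candidate word
greedy = max(probs, key=probs.get)      # always pick the most likely word
sampled = random.choices(list(probs), weights=list(probs.values()))[0]  # occasionally picks a less likely word

print(probs, greedy, sampled)
```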

How does the LLM learn?
Decision-making
Done by a vast number (100+ billion) of 'parameters'
Parameters
Numbers in the model's mathematical calculations that give weight to a certain output
Causal Interdependence
Every parameter influences the others

Updating / Harmonizing
Adjust parameters based on disharmony, i.e. when the prediction comes into contact with the truth, the parameters are updated so the output comes closer to that truth/harmony (see the sketch below)
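A hedged sketch of one such update in Python (tiny invented vocabulary and scores): the predicted probabilities are compared with the actual next word and every score is nudged so the true word becomes more likely; real training applies this across billions of parameters at once.

```python
import math

# Hypothetical scores for candidate next words; the true next word in the text is "mat".
logits = {"mat": 1.0, "sofa": 2.0, "roof": 0.5}
true_word = "mat"
learning_rate = 0.5

def softmax(scores):
    exps = {w: math.exp(s) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

for step in range(3):
    probs = softmax(logits)
    # "Disharmony": the gap between the predicted probabilities and the observed truth.
    # For cross-entropy loss, the gradient w.r.t. each score is (prob - 1) for the true word, prob otherwise.
    for w in logits:
        target = 1.0 if w == true_word else 0.0
        logits[w] -= learning_rate * (probs[w] - target)
    print(step, round(softmax(logits)[true_word], 3))  # probability of the true word rises each step
```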
Understanding AI Behavior
We do not understand how specifically these models arrive at a particular conclusion
We do understand the general approach (training procedure) they use
Personal Reflections
Causal Interdependence
Every parameter influences the others
Likewise, every habit influences other habits
e.g., Societies, Values, Emotions, Attention, etc.
Harmonization / Updating Gradient
By expressing one's truth (habits) and coming into contact with the world (other habits), one experiences suffering/happiness (disharmony/harmony) and can update accordingly to increase harmony
Teaching AI to reason: this year’s most important story (10-15 minutes)
Facilitator for Reasoning: Reinforcement Learning (RL)
RL accelerated capabilities of AI
Many believed AI capabilities had peaked
RL is not constrained by the amount of available data
Personal Reflection
Practical AGI will require autonomous reinforcement based on the suffering/happiness it produces in the world
Why are people building AI systems? (5 minutes)
Automation of Tasks
Evaluating applications, responding to questions, summarising meetings
Replacing Human Labour
Possibly capturing half of wages in developed countries
Scientific & Technological Progress
Personal Reflections
Facilitates Centralization of Power
The cost of caution (5 minutes)
Acceleration facilitates x-risks vs. slowing down delays or prevents benefits, e.g.:
Cure diseases
Accessibility to basic needs
Meat alternatives shutting down factory farms
Machines of Loving Grace (35 minutes)
Branding of Anthropic
Focus on Risks > Positivity
Belief
The benefits of AI are near-unavoidable, if things go right
Only if the risks come true do we have a problem
Avoid Grandiosity
Dangerous to view practical technological goals in religious terms
Avoid Propaganda
Avoid 'Sci-Fi' negative connotations
e.g., mind uploading, cyberpunk, space exploration
Neglecting significant moral aspects of the world
Benefits
Biology & Physical Health
Extend healthy human lifespan
Control & freedom over biological processes
Weight, physical appearance, reproduction, fertility
Reliable prevention and treatment of nearly all natural infectious diseases
Elimination of most cancers
Prevention & Cure: Genetic Disease
Embryo screening, CRISPR
Improved treatment of most other ailments
Immortality
Hindrance
Complexity of Cause and Effect
'Time' needed e.g., weeks, months
Bureaucracy
Serial Dependence (requires A for B to be possible)
Neuroscience & Mental Health
Behavioural Interventions
Cause and Effect of Brain & States of Mind (Depression, etc.)
Economic Development & Poverty
Having the capacity is insufficient
Requires actually applying it (capacity vs. intent)
Hindrance
Corruption
Scepticism
Unwillingness to share / lack of generosity
Peace & Governance
Facilitators
Democracy
Free Information
Entente
Recommends quickly centralising power via powerful AI to overthrow 'adversaries'
Quality of Life
Work & Meaning
Powerful AI (AGI)
What? 'Country of Geniuses in a Datacenter'
Intelligence > Nobel Prize Winner
prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.
Interfaces
Has access to e.g., audio, video, mouse & keyboard control, internet access
Capacities such as e.g., ordering materials, directing experiments, make videos
Copy-Paste
Resources used to train the model can be repurposed to run millions of instances of it
When?
Earliest 2026
First 5-10 Years after AGI
Hindrances to a 'Singularity' (the world being instantly transformed, on the scale of seconds)
Lack of Data
Intelligence in the absence of data cannot make choices
Harmony
Laws
Society's traditions
Causing Suffering
Humans need time to respond (if they have control)
Bureaucracy
Law of Nature
Impossible to travel faster than light
Hardware
Biological Experiments
Serial Dependence
Requiring A to make B possible
Facilitators to Fast Development
Intelligence creating accelerators
simulations/in-vitro experiments > live animals, human trials
Improved particle accelerators > small
5-10 Years huge shift: Yes
1 Year huge shift: No
Personal Reflection
Neglect of Non-Human Sentience?
The essay's stated aim is to directly improve the quality of human life
Neglect of potential root cause of Immorality & Suffering: Ignorance & Craving
Podcast Allan Dafoe on why technology is unstoppable & how to shape AI development anyway (~3 hours)
That which influences our Future
Craving (Fear, Greed)
once a powerful new capability becomes available, societies that adopt it tend to outcompete those that don’t
Competition
Military
Economic
Technological Determinism
What?
Technology determines the future, rather than human choices
Technological Politics
What?
Express Political goals through technology
Technology that reinforces how society will be
Examples
Parisian Boulevards
Linear structure to suppress riots, as the boulevards facilitate access for cavalry
Gates or urban design as an expression of a view of how people should behave
Momentum
Once a habit/system gets built and gets going, it has inertia that sustains itself and hinders whatever is in disharmony with it
Example
Dependence on cars in American urban infrastructure
Unintended Consequence
The future is not determined by some structure, nor by our choices, but by being buffeted one way or another
Hopes & Worries
Safety
'Defence in Depth' Process
Carefully staged deployment with internal testing
External Testers
Limited release
Observe behaviour of models in real world
Not releasing model weights, i.e. retaining the ability to roll back and add additional safeguards
International Governance & Cooperation
Building Backdoors into AI-models
If someone copies the model and the AI registers that it is being used without permission, it switches its behaviour in a way that is difficult to detect
Interpretability
Inability to understand behavior of AI or Human Agents i.e., intentions, goals
Cooperation
Seemingly net-positive to cultivate 'Cooperation' for those included and potentially net-negative for those excluded
e.g., humans cooperate & excluding non-human animals
Offence-Defence: Endless Race against 'bad' actors
Problem
The cost of creating a given level of AI capability drops by a factor of 10 every two years (see the sketch below)
Facilitates access for everyone, including bad actors
Solution
Keep building ever more powerful AI to protect against the inferior models
Depends on the offence-defence balance, i.e. how much it costs the 'good' actors to repair/protect against the 'bad' actors, e.g. 1:1, 1:5
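A quick arithmetic sketch of that cost trend in Python (the starting cost is a placeholder, not a real figure): a factor-of-10 drop every two years compounds to roughly 100,000x over a decade.

```python
# Hypothetical starting cost for training a given level of capability.
initial_cost = 1_000_000_000  # $1bn, placeholder figure

for years in range(0, 11, 2):
    cost = initial_cost / (10 ** (years / 2))  # factor-of-10 drop every two years
    print(f"after {years:2d} years: ${cost:,.0f}")
# After 10 years the same capability costs ~100,000x less,
# i.e. it becomes accessible to far more actors, including bad ones.
```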