I think that in the coming 15-30 years, the world could plausibly develop “transformative AI”: AI powerful enough to bring us into a new, qualitatively different future, via an explosion in science and technology R&D. This sort of AI could be sufficient to make this the most important century of all time for humanity.
The most straightforward vision for developing transformative AI that I can imagine working with very little innovation in techniques is what I’ll call **human feedback[1] on diverse tasks (HFDT):** Train a powerful neural network model to simultaneously master a wide variety of challenging tasks (e.g. software development, novel-writing, game play, forecasting, etc.) by using reinforcement learning on human feedback and other metrics of performance. HFDT is not the only approach to developing transformative AI,[2] and it may not work at all.[3] But I take it very seriously, and I’m aware of increasingly many executives and ML researchers at AI companies who believe something within this space could work soon.
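To make the shape of that training loop concrete, here is a toy sketch of the HFDT recipe in Python. Every specific in it is a stand-in assumption of mine, not part of the proposal: the “diverse tasks” are compressed into a few contextual-bandit tasks, the human rater is simulated by a hidden scoring function, the reward model is linear, and the policy is a per-task softmax updated with REINFORCE. The point is only the loop structure: gather human ratings, fit a reward model to them, then reinforce the policy toward whatever the reward model scores highly.

```python
# Toy sketch of an HFDT-style loop (illustrative assumptions throughout:
# bandit tasks, simulated rater, linear reward model, softmax policy).
import numpy as np

rng = np.random.default_rng(0)
N_TASKS, N_ACTIONS, DIM = 4, 8, 16

# Hidden "human preferences" over (task, action) pairs. The learner never
# reads these directly; it only sees noisy ratings, standing in for
# human feedback.
features = rng.normal(size=(N_TASKS, N_ACTIONS, DIM))
true_pref = rng.normal(size=DIM)

def human_feedback(task, action):
    """Noisy scalar rating from a (simulated) human rater."""
    return features[task, action] @ true_pref + rng.normal(scale=0.1)

# Reward model: linear in the same features, fit online to human ratings.
reward_w = np.zeros(DIM)
# Policy: an independent softmax over actions for each task.
logits = np.zeros((N_TASKS, N_ACTIONS))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(5000):
    task = rng.integers(N_TASKS)
    probs = softmax(logits[task])
    action = rng.choice(N_ACTIONS, p=probs)

    # 1) Query human feedback and improve the reward model (SGD on
    #    squared error between the rating and the model's prediction).
    rating = human_feedback(task, action)
    phi = features[task, action]
    reward_w += 0.01 * (rating - phi @ reward_w) * phi

    # 2) Reinforce the policy toward actions the reward model scores
    #    highly (REINFORCE: reward times grad of log pi w.r.t. logits).
    predicted_reward = phi @ reward_w
    grad = -probs
    grad[action] += 1.0
    logits[task] += 0.1 * predicted_reward * grad

# After training, the policy should favor actions human raters score highly.
for task in range(N_TASKS):
    best = softmax(logits[task]).argmax()
    print(f"task {task}: chosen action {best}, "
          f"true human score {features[task, best] @ true_pref:.2f}")
```

In a real HFDT setup, each piece of this sketch would of course be replaced by something vastly more capable: a single large neural network in place of the per-task softmax, open-ended tasks in place of bandit arms, and preference comparisons or other metrics of performance in place of scalar ratings. The structural feature that matters for the argument below is the same, though: the model is optimized to produce outputs that human evaluators (or proxies trained on them) rate highly.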
Unfortunately, **I think that if AI companies race forward training increasingly powerful models using HFDT, this is likely to eventually lead to a full-blown AI takeover** (i.e. a possibly violent uprising or coup by AI systems). I don’t think this is a certainty, but it looks like the best-guess default absent specific efforts to prevent it.