Median Group

Revisiting the Insights model

2019-07-20T00:00:00-07:00

Note: This demo is in beta, and you may experience issues such as strange numerical behavior at this time.

Last year, we released our insights-based model that generated a projected timeline using historical data and a prior distribution. We’ve revisited it to address its limitations and improve the data it draws from.

The model rests on the assumption that progress in AI depends on accumulating insights: fundamental advances in our understanding that allow for improvements in capacity without an increase in resources expended. This choice attempts to separate the effects of true technological advancement from the effects of an increase in computing power devoted to a problem, both of which can increase the capacity of machine intelligence to solve complex problems. Computational power is an expensive, finite resource, and without a paradigm-shifting improvement in computing itself, precise allocation of that power alone will not be enough to continue advancing AI’s problem-solving capabilities.

The interactive model below provides two methods of capturing a prior about how many more advances in understanding are required to achieve human-level machine intelligence. Based on that prior, and on the pace of insight discovery during a particular historical period, we compute a probability distribution over when humans will develop human-level AI. Results of this calculation are shown in the “Implied timeline” graph below.

Step 1: Specify a prior for current progress

Option A: Draw a distribution

For various “percentages of the way” done AI research could be, in terms of percentage of the necessary insights discovered, what is the probability that AI research is not yet that percentage done?

The graph below allows you to draw a distribution of how likely it is we have achieved a particular portion of the insights required for human-level machine intelligence.

Option B: Pre-set priors from Pareto distribution

Instead of drawing a cumulative distribution function, you can instead use a pre-set prior based on a Pareto distribution.

To make the choice of Pareto distribution more intuitive, we parameterize the distribution in terms of a probability q, equal to the probability that a doubling in the number of insights (starting from the minimum number of insights) would result in a sufficient set of insights. q can be set directly, or we can sample from a mixture of Pareto distributions, where the q parameters are sampled from a uniform distribution or a beta distribution.
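The link between q and the usual Pareto shape parameter can be made concrete. The following is a minimal sketch, not the demo's actual code: it assumes a standard Pareto distribution with minimum `x_min` and shape `a`, whose CDF is F(x) = 1 - (x_min / x)^a, so that q = F(2 x_min) = 1 - 2^(-a). The function names and the example values are hypothetical.

```python
import math
import random


def pareto_shape_from_q(q: float) -> float:
    """Return the shape a for which P(X <= 2 * x_min) equals q.

    Assumes a standard Pareto(x_min, a) with CDF F(x) = 1 - (x_min / x) ** a,
    which gives F(2 * x_min) = 1 - 2 ** (-a) and hence a = -log2(1 - q).
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return -math.log2(1.0 - q)


def sample_required_insights(q: float, x_min: float, rng: random.Random) -> float:
    """Draw one 'number of insights required' by inverting the Pareto CDF."""
    a = pareto_shape_from_q(q)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids raising 0 to a negative power
    return x_min * u ** (-1.0 / a)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical inputs: q = 0.5 and a minimum of 10 insights.
    print(pareto_shape_from_q(0.5))
    print(sample_required_insights(0.5, 10.0, rng))
```

For the mixture options, q would itself be drawn first (for example with `random.betavariate(alpha, beta)` from the Python standard library) before sampling from the corresponding Pareto distribution.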

Number of samples to take when running the simulation

Set q directly

Sample q uniformly over (0,1)

Sample q with Beta(α, β)

α:
β:

Note: The simulator can be very slow for larger values of q, as most of the samples need to be thrown away.

Step 2: Specify pace of progress

Which period in history is most representative of the future pace of AI insight discovery?

The graph below plots the aggregate of insights discovered over time and allows selection of a particular period of history in AI research. The curve fit to that period (linear, exponential, or sigmoidal) is used to project the future distribution of discoveries.
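As a rough illustration of how a fitted curve can be projected forward, the sketch below fits a straight line by ordinary least squares and an exponential by fitting a line to the logarithm of the counts. It is an assumption-laden sketch rather than the demo's implementation: the function names and sample data are hypothetical, and the sigmoidal case is not covered.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def year_reaching_linear(years, counts, target):
    """Year at which the fitted line reaches the target cumulative count."""
    slope, intercept = fit_line(years, counts)
    return (target - intercept) / slope


def year_reaching_exponential(years, counts, target):
    """Same projection, but fitting a line to log(count) (exponential growth)."""
    slope, intercept = fit_line(years, [math.log(c) for c in counts])
    return (math.log(target) - intercept) / slope


if __name__ == "__main__":
    # Hypothetical cumulative insight counts, not the data set used by the model.
    years = [1990, 2000, 2010, 2020]
    counts = [100, 110, 120, 130]
    print(year_reaching_linear(years, counts, 150))
```

Which regression mode is appropriate depends on the period selected; a linear fit and an exponential fit to the same window can imply very different projected dates.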

Regression mode:

Result: Implied timeline

Sources

The data used in this model is available as a JSON file. The source code for the demo can be found on the Median Group GitHub.

Feasibility of Training an AGI using Deep Reinforcement Learning: A Very Rough Estimate

2019-03-24T00:00:00-07:00

Several months ago, we were presented with a scenario for how artificial general intelligence (AGI) may be achieved in the near future. We found the approach surprising, so we attempted to produce a rough model to investigate its feasibility. This document presents the model and its conclusions.

The usual cliches about the folly of trying to predict the future go without saying and this shouldn’t be treated as a rigorous estimate, but hopefully it can give a loose, rough sense of some of the relevant quantities involved. The notebook and the data used for it can be found in the Median Group numbers GitHub repo if the reader is interested in using different quantities or changing the structure of the model. If the reader is interested in a more general approach based on the rate of theoretical progress, see our Insights model.

Download PDF

The Brain and Computation

2018-11-24T00:00:00-08:00

Measuring Computation

The computational performance of microprocessors can be quantified by measuring the number of floating-point arithmetic operations the processor can perform per second (FLOPS). This number is very useful for comparing different hardware being used for numerically intensive applications like scientific computing or mining fake internet points, but some have attempted to quantify the computation done by the human brain in these terms to reason about how difficult it would be to run a human-level intelligence on modern computing hardware.

This post will discuss a few of the issues associated with measuring the computational performance of the brain with FLOPS, and a follow-up post will consider specific estimates.

Does it make sense to think about the computational capacity of the brain in terms of FLOPS?

There is a line of thinking that goes something like:

Neurons generate action potentials. Action potentials are stereotyped signals, so the computation that happens in the brain is essentially digital, so it makes sense to compare brains to digital computers, and synaptic operations are kind of like arithmetic operations.

This may or may not be a good enough approximation, but it’s definitely a lossy approximation.

Brains probably aren’t bottlenecked on arithmetic

A common objection to measuring the performance of the brain in FLOPS is that computation in the brain isn’t bottlenecked by arithmetic capacity, but rather by information flow, so the capacity of the brain should be measured in traversed edges per second (TEPS) rather than FLOPS. Synaptic connections between neurons tend to be sparse and axons tend to be long, which seems to suggest a lot of neural tissue is dedicated to pushing signals around rather than performing arithmetic on them¹.

Brains are asynchronous

Microprocessors are clocked circuits. When a computation unfolds on a microprocessor, it proceeds in discrete, well-delineated steps with one occurring each processor cycle. This method of computation is fundamentally synchronous.

Brains don’t have a clock: neurons fire when they fire, which usually isn’t very often (one to ten times a second), but is sometimes much faster (up to around 1000 Hz)². And the phase of the neural spike trains also seems to be important³, which further complicates the comparison.

Non-spiking neurons

Many neurons don’t even spike, having graded, non-stereotyped potentials. The best-studied are the photo-receptive neurons in the retina, but they occur throughout the brain and it’s unclear how to integrate them into the larger computational picture of the brain.

Conclusion

This post was not meant to be comprehensive, and is merely meant to highlight the strangeness and limitations of thinking of the limits of neural computation in terms of FLOPS.

¹ Limitations in the ability of evolution to modify the basic vertebrate developmental plan can lead to bizarre inefficiencies, like the optic nerve needing to carry signals from the retina to the back of the head before being processed in the visual cortex, or in the case of giraffes the laryngeal nerve needing to take a >4 meter detour.
² See sparse coding.
³ See phase coding.

How rapidly are GPUs improving in price performance?

2018-11-13T12:00:00-08:00

Introduction

Many of the impressive results in deep learning in recent years have been achieved through massive investment in hardware needed for training, with projects like AlphaGo Zero using $25 million worth of computer hardware. Given this, improvements in price-performance of hardware used for deep learning will play an important role in determining the scale of projects in the coming years.

While machine learning ASICs like TPUs are likely the future, the recent deep learning boom was powered by GPUs¹. The architectures of TPUs and GPUs differ in important ways, but much of the design and fabrication process is similar and both are largely focused on efficient, parallelized arithmetic², so trends observed in GPUs can inform us about what to expect from TPUs.

Commonly mentioned figures for the price-performance generalization of Moore’s Law suggest that price-performance doubles roughly every two years, but this figure warranted further investigation.

Data

We constructed a data set containing the model name, launch date, single precision performance in GFLOPS, and release price in non-inflation adjusted US dollars for 223 Nvidia and AMD GPUs (scraped from Wikipedia)³. The data set covered almost two decades, so prices were adjusted to 2018 US dollars using the Consumer Price Index.
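The inflation adjustment described above amounts to scaling each launch price by the ratio of the target-year price index to the launch-year price index. The sketch below shows that arithmetic and the resulting price-performance figure. The function names, the index values, and the example GPU are hypothetical placeholders, not actual CPI figures or entries from the data set.

```python
def adjust_price(nominal_price: float, cpi_launch_year: float, cpi_target_year: float) -> float:
    """Convert a launch-year price into target-year dollars using a price index."""
    return nominal_price * cpi_target_year / cpi_launch_year


def price_performance(gflops: float, adjusted_price: float) -> float:
    """Single-precision GFLOPS per inflation-adjusted dollar."""
    return gflops / adjusted_price


if __name__ == "__main__":
    # Hypothetical placeholder values, chosen only to make the arithmetic easy to follow.
    launch_price = 400.0          # nominal US dollars at launch
    cpi_launch, cpi_2018 = 200.0, 250.0
    real_price = adjust_price(launch_price, cpi_launch, cpi_2018)
    print(real_price)                               # price in target-year dollars
    print(price_performance(5000.0, real_price))    # GFLOPS per adjusted dollar
```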

Analysis

Fitting an exponential to the data set yielded the curve:

\begin{equation} f(t) \approx 14.2 e^{0.2 t} \end{equation}

This implies a doubling time of $\sim 3.5$ years. It should be noted that this is somewhat misleading because the price-performance curve isn’t a clean exponential. Inspecting a log-plot suggests that price-performance has been in a distinctly slower growth regime since around 2012.

Fitting to data from 2012 or later yields the curve:

\begin{equation} f(t) \approx 15.2 e^{0.176 t}, \end{equation}

corresponding to a doubling time of $\sim 3.9$ years.
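The doubling times quoted for both fits follow from the exponents: for a curve proportional to $e^{kt}$, the value doubles every $\ln 2 / k$ units of $t$. A short check, assuming $t$ is measured in years as the quoted doubling times imply (the function name is just for illustration):

```python
import math


def doubling_time(rate: float) -> float:
    """Time for exp(rate * t) to double, in the same units as t."""
    return math.log(2) / rate


print(round(doubling_time(0.2), 2))    # 3.47, the full data set
print(round(doubling_time(0.176), 2))  # 3.94, data from 2012 or later
```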

External Discussion

Comments on LessWrong about this article

¹ GPUs are still more cost effective than TPUs, but have lower serial computation speed.
² This is not nearly as true of CPUs, which have managed to extract performance improvements from increasingly arcane changes to control circuitry.
³ AI Impacts has a similar data set.

AGNI

2018-10-03T16:54:36-07:00

Download PDF

Insight-based AI timelines model

2018-06-12T00:00:00-07:00

[Interactive graph: cumulative prior. Horizontal axis: proportion of required insights that have been discovered (0% to 100%). Vertical axis: P(no more than this much of the way done) (0% to 100%).]

Pre-set priors

Instead of drawing a cumulative distribution function, you can instead use a pre-set prior. These priors are based on the Pareto distribution. To make the choice of parameter more intuitive, we parameterize the distribution in terms of a probability q, equal to the probability that a doubling in the number of insights (starting from the minimum number of insights) would result in a sufficient set of insights.

Minimum plausible number of insights required:

α: β:

Resulting timeline

Assuming that the number of insights discovered increases linearly over time, these beliefs imply the following cumulative distribution function for the time at which all required insights are discovered.
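Under the linear assumption, a belief about the proportion of required insights already discovered translates directly into a remaining time: if a fraction f is done and n insights have been found so far, the total required is n / f, and the remainder arrives at a constant rate. The sketch below illustrates this under those assumptions; the function names and all numbers are hypothetical, and it is not the model's actual code.

```python
def years_remaining(fraction_done: float, current_insights: float, insights_per_year: float) -> float:
    """Years until all required insights are found, at a constant discovery rate."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total_required = current_insights / fraction_done
    return (total_required - current_insights) / insights_per_year


def timeline_cdf(fractions, current_insights, insights_per_year, horizon_years):
    """Share of sampled 'fraction done' scenarios that finish within the horizon."""
    finished = sum(
        1
        for f in fractions
        if years_remaining(f, current_insights, insights_per_year) <= horizon_years
    )
    return finished / len(fractions)


if __name__ == "__main__":
    # Hypothetical samples from a prior over the fraction of insights discovered.
    sampled_fractions = [0.5, 0.8, 1.0]
    print(years_remaining(0.5, 100, 2))
    print(timeline_cdf(sampled_fractions, 100, 2, horizon_years=30))
```

Evaluating `timeline_cdf` over a range of horizons traces out a cumulative distribution of the kind shown in the resulting timeline.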

Adjust the maximum year displayed:

Derivation

How was this data generated? Jessica Taylor, Jack Gallagher, and Baeo Maltinsky spent a few hours generating a list of AI insights that seemed of around the same order of significance as, or more significant than, the insight of LSTM (specifically, the insight of inventing LSTM given that RNNs were already invented). The following is a plot of the number of AI insights in our list over time since 1850.

The model assumes that insights increase linearly over time. The increase has been roughly linear since 1945, but this could change due to low hanging fruit, expanding research avenues, changes in the number and effectiveness of research institutions, and so on. The model does not distinguish between insights in our list (which we selected according to some subjective estimation of importance) and specifically required insights; however, if the percentage of insights that are actually required stays somewhat constant over time, this does not significantly affect the timeline.

The list of insights and their years can be found in this document.