
Dario Amodei

Co-founder and CEO

Co-founder and chief executive of Anthropic, focused on frontier model development, safety, and deployment.

Papers

openalex-author · arXiv (Cornell University)

The Capacity for Moral Self-Correction in Large Language Models

We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveals different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters, and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use for moral self-correction: (1) they can follow instructions and (2) they can learn complex normative concepts of harm like stereotyping, bias, and discrimination. As such, they can follow instructions to avoid certain kinds of morally harmful outputs. We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.

openalex-author · Findings of the Association for Computational Linguistics: ACL 2023

Discovering Language Model Behaviors with Model-Written Evaluations

Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Benjamin Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemi Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan. Findings of the Association for Computational Linguistics: ACL 2023. 2023.

openalex-author · arXiv (Cornell University)

Measuring Progress on Scalable Oversight for Large Language Models

Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think about this problem, with a focus on ways it can be studied empirically. We first present an experimental design centered on tasks for which human specialists succeed but unaided humans and current general AI systems fail. We then present a proof-of-concept experiment meant to demonstrate a key feature of this experimental design and show its viability with two question-answering tasks: MMLU and time-limited QuALITY. On these tasks, we find that human participants who interact with an unreliable large-language-model dialog assistant through chat -- a trivial baseline strategy for scalable oversight -- substantially outperform both the model alone and their own unaided performance. These results are an encouraging sign that scalable oversight will be tractable to study with present models and bolster recent findings that large language models can productively assist humans with difficult tasks.

openalex-author · arXiv (Cornell University)

In-context Learning and Induction Heads

"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.
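As a rough illustration, the [A][B] ... [A] -> [B] completion rule can be written out symbolically. This is a minimal sketch of the algorithm an induction head is described as implementing, not of how attention actually computes it; the function name `induction_predict` is hypothetical.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def induction_predict(tokens: Sequence[T]) -> Optional[T]:
    """Predict the next token with the [A][B] ... [A] -> [B] rule.

    Looks back for the most recent earlier occurrence of the current (last)
    token and returns whatever token followed it; None if there is none.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions, most recent first, skipping the last token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


print(induction_predict(["Mr", "D", "urs", "ley", "was", "Mr", "D"]))  # urs
```

Returning the most recent match is a simplifying choice; a real head attends softly over all earlier matches.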

openalex-author · arXiv (Cornell University)

Toy Models of Superposition

Neural networks often pack many unrelated concepts into a single neuron - a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging. This paper provides a toy model where polysemanticity can be fully understood, arising as a result of models storing additional sparse features in "superposition." We demonstrate the existence of a phase change, a surprising connection to the geometry of uniform polytopes, and evidence of a link to adversarial examples. We also discuss potential implications for mechanistic interpretability.

openalex-author · arXiv (Cornell University)

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

We describe our early efforts to red team language models in order to simultaneously discover, measure, and attempt to reduce their potentially harmful outputs. We make three main contributions. First, we investigate scaling behaviors for red teaming across 3 model sizes (2.7B, 13B, and 52B parameters) and 4 model types: a plain language model (LM); an LM prompted to be helpful, honest, and harmless; an LM with rejection sampling; and a model trained to be helpful and harmless using reinforcement learning from human feedback (RLHF). We find that the RLHF models are increasingly difficult to red team as they scale, and we find a flat trend with scale for the other model types. Second, we release our dataset of 38,961 red team attacks for others to analyze and learn from. We provide our own analysis of the data and find a variety of harmful outputs, which range from offensive language to more subtly harmful non-violent unethical outputs. Third, we exhaustively describe our instructions, processes, statistical methodologies, and uncertainty about red teaming. We hope that this transparency accelerates our ability to work together as a community in order to develop shared norms, practices, and technical standards for how to red team language models.

openalex-author · arXiv (Cornell University)

Language Models (Mostly) Know What They Know

We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the right format. Thus we can approach self-evaluation on open-ended sampling tasks by asking models to first propose answers, and then to evaluate the probability "P(True)" that their answers are correct. We find encouraging performance, calibration, and scaling for P(True) on a diverse array of tasks. Performance at self-evaluation further improves when we allow models to consider many of their own samples before predicting the validity of one specific possibility. Next, we investigate whether models can be trained to predict "P(IK)", the probability that "I know" the answer to a question, without reference to any particular proposed answer. Models perform well at predicting P(IK) and partially generalize across tasks, though they struggle with calibration of P(IK) on new tasks. The predicted P(IK) probabilities also increase appropriately in the presence of relevant source materials in the context, and in the presence of hints towards the solution of mathematical word problems. We hope these observations lay the groundwork for training more honest models, and for investigating how honesty generalizes to cases where models are trained on objectives other than the imitation of human writing.

openalex-author · 2022 ACM Conference on Fairness Accountability and Transparency

Predictability and Surprise in Large Generative Models

Large-scale pre-training has recently emerged as a technique for creating capable, general-purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many others. In this paper, we highlight a counterintuitive property of such models and discuss the policy implications of this property. Namely, these generative models have a paradoxical combination of predictable loss on a broad training distribution (as embodied in their "scaling laws"), and unpredictable specific capabilities, inputs, and outputs. We believe that the high-level predictability and appearance of useful capabilities drives rapid development of such models, while the unpredictable qualities make it difficult to anticipate the consequences of model deployment. We go through examples of how this combination can lead to socially harmful behavior with examples from the literature and real world observations, and we also perform two novel experiments to illustrate our point about harms from unpredictability. Furthermore, we analyze how these conflicting properties combine to give model developers various motivations for deploying these models, and challenges that can hinder deployment. We conclude with a list of possible interventions the AI community may take to increase the chance of these models having a beneficial impact. We intend for this paper to be useful to policymakers who want to understand and regulate AI systems, technologists who care about the potential policy impact of their work, funders who want to support work addressing these challenges, and academics who want to analyze, critique, and potentially develop large generative models.

openalex-author · arXiv (Cornell University)

Scaling Laws and Interpretability of Learning from Repeated Data

Recent large language models have been trained on vast datasets, but also often on repeated data, either intentionally for the purpose of upweighting higher quality data, or unintentionally because data deduplication is not perfect and the model is exposed to repeated data at the sentence, paragraph, or document level. Some works have reported substantial negative performance effects of this repeated data. In this paper we attempt to study repeated data systematically and to understand its effects mechanistically. To do this, we train a family of models where most of the data is unique but a small fraction of it is repeated many times. We find a strong double descent phenomenon, in which repeated data can lead test loss to increase midway through training. A predictable range of repetition frequency leads to surprisingly severe degradation in performance. For instance, performance of an 800M parameter model can be degraded to that of a 2x smaller model (400M params) by repeating 0.1% of the data 100 times, despite the other 90% of the training tokens remaining unique. We suspect there is a range in the middle where the data can be memorized and doing so consumes a large fraction of the model's capacity, and this may be where the peak of degradation occurs. Finally, we connect these observations to recent mechanistic interpretability work - attempting to reverse engineer the detailed computations performed by the model - by showing that data repetition disproportionately damages copying and internal structures associated with generalization, such as induction heads, providing a possible mechanism for the shift from generalization to memorization. Taken together, these results provide a hypothesis for why repeating a relatively small fraction of data in large language models could lead to disproportionately large harms to performance.
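The 0.1%, 100 times, and 90% figures fit together through simple arithmetic. A minimal sketch, assuming "repeating 0.1% of the data 100 times" means a subset equal to 0.1% of the total token budget is seen 100 times; `repeated_token_share` is a hypothetical helper.

```python
def repeated_token_share(subset_fraction: float, repeats: int) -> float:
    """Fraction of all training tokens spent on a subset seen `repeats` times.

    subset_fraction is the subset's size as a fraction of the total token budget.
    """
    return subset_fraction * repeats


share = repeated_token_share(subset_fraction=0.001, repeats=100)
print(f"repeated: {share:.0%}, unique: {1 - share:.0%}")  # repeated: 10%, unique: 90%
```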

openalex-author · arXiv (Cornell University)

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We find this alignment training improves performance on almost all NLP evaluations, and is fully compatible with training for specialized skills such as python coding and summarization. We explore an iterated online mode of training, where preference models and RL policies are updated on a weekly cadence with fresh human feedback data, efficiently improving our datasets and models. Finally, we investigate the robustness of RLHF training, and identify a roughly linear relation between the RL reward and the square root of the KL divergence between the policy and its initialization. Alongside our main results, we perform peripheral analyses on calibration, competing objectives, and the use of OOD detection, compare our models with human writers, and provide samples from our models using prompts appearing in recent related work.
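One way to check a "roughly linear relation" between RL reward and the square root of the KL divergence is to regress reward on sqrt(KL). The sketch below is a plain ordinary least-squares fit under that assumption; the helper name is hypothetical, and the points are synthetic, constructed to lie exactly on a line so the fit is easy to verify. They are not data from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_reward_vs_sqrt_kl(kls: Sequence[float],
                          rewards: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of reward ~= a + b * sqrt(KL); returns (a, b)."""
    xs = [math.sqrt(k) for k in kls]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(rewards) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rewards))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic points satisfying reward = 0.5 + 2 * sqrt(KL) exactly.
kls = [0.0, 1.0, 4.0, 9.0]
rewards = [0.5 + 2 * math.sqrt(k) for k in kls]
print(fit_reward_vs_sqrt_kl(kls, rewards))  # (0.5, 2.0)
```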

openalex-author · arXiv (Cornell University)

A General Language Assistant as a Laboratory for Alignment

Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful, honest, and harmless. As an initial foray in this direction we study simple baseline techniques and evaluations, such as prompting. We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of large models. Next we investigate scaling trends for several training objectives relevant to alignment, comparing imitation learning, binary discrimination, and ranked preference modeling. We find that ranked preference modeling performs much better than imitation learning, and often scales more favorably with model size. In contrast, binary discrimination typically performs and scales very similarly to imitation learning. Finally we study a "preference model pre-training" stage of training, with the goal of improving sample efficiency when finetuning on human preferences.

openalex-author · arXiv (Cornell University)

Evaluating Large Language Models Trained on Code

We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
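Repeated-sampling results such as the 70.2% figure are reported as pass@k. As we understand the paper, pass@k is estimated without bias from n >= k samples per problem, c of them correct, as 1 - C(n-c, k) / C(n, k). A minimal sketch of that estimator using Python's exact integer binomials; the function name is our own.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which pass the tests."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=10, c=3, k=1))  # about 0.3
print(pass_at_k(n=10, c=3, k=8))  # 1.0
```

Averaging this quantity over all problems gives the benchmark-level pass@k.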

openalex-author · arXiv (Cornell University)

Scaling Laws for Autoregressive Generative Modeling

We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law plus constant scaling law. The optimal model size also depends on the compute budget through a power-law, with exponents that are nearly universal across all data domains. The cross-entropy loss has an information-theoretic interpretation as $S(\text{True}) + D_{\mathrm{KL}}(\text{True} \,\|\, \text{Model})$, and the empirical scaling laws suggest a prediction for both the true data distribution's entropy and the KL divergence between the true and model distributions. With this interpretation, billion-parameter Transformers are nearly perfect models of the YFCC100M image distribution downsampled to an $8\times 8$ resolution, and we can forecast the model size needed to achieve any given reducible loss (i.e., $D_{\mathrm{KL}}$) in nats/image for other resolutions. We find a number of additional scaling laws in specific domains: (a) we identify a scaling relation for the mutual information between captions and images in multimodal models, and show how to answer the question "Is a picture worth a thousand words?"; (b) in the case of mathematical problem solving, we identify scaling laws for model performance when extrapolating beyond the training distribution; (c) we finetune generative image models for ImageNet classification and find smooth scaling of the classification loss and error rate, even as the generative loss levels off. Taken together, these results strengthen the case that scaling laws have important implications for neural network performance, including on downstream tasks.
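A power-law plus constant fit, L(x) = L_inf + (x0 / x)^alpha, maps onto the decomposition above: the constant stands in for the entropy S(True) and the power-law term for the reducible loss D_KL. Inverting the power-law term is what lets one forecast the size needed for a target reducible loss. A minimal sketch; the function names and constants are hypothetical, chosen only to keep the arithmetic readable.

```python
def power_law_plus_constant(x: float, l_inf: float, x0: float, alpha: float) -> float:
    """Loss L(x) = l_inf + (x0 / x) ** alpha for model size or compute x.

    l_inf plays the role of the irreducible loss S(True); the power-law
    term plays the role of the reducible loss D_KL(True || Model).
    """
    return l_inf + (x0 / x) ** alpha


def size_for_reducible_loss(target: float, x0: float, alpha: float) -> float:
    """Solve (x0 / x) ** alpha == target for x."""
    return x0 / target ** (1.0 / alpha)


# Hypothetical constants, not fitted values from the paper.
x_needed = size_for_reducible_loss(target=0.5, x0=1e6, alpha=0.5)
print(x_needed)  # 4000000.0
print(power_law_plus_constant(x_needed, l_inf=2.0, x0=1e6, alpha=0.5))  # 2.5
```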

openalex-author · arXiv (Cornell University)

Language Models are Few-Shot Learners

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

openalex-author · arXiv (Cornell University)

Scaling Laws for Neural Language Models

We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
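The "simple equations" for overfitting are, as we recall the paper, a joint fit of the form L(N, D) = ((N_c / N)^(alpha_N / alpha_D) + D_c / D)^alpha_D, where N is model size and D is dataset size. A minimal sketch of that form; the constants in the usage lines are arbitrary placeholders, not the paper's fitted values.

```python
import math


def loss_n_d(n: float, d: float, n_c: float, d_c: float,
             alpha_n: float, alpha_d: float) -> float:
    """Joint fit L(N, D) = ((N_c / N) ** (alpha_N / alpha_D) + D_c / D) ** alpha_D."""
    return ((n_c / n) ** (alpha_n / alpha_d) + d_c / d) ** alpha_d


# Placeholder constants for illustration only.
consts = dict(n_c=16.0, d_c=1.0, alpha_n=0.5, alpha_d=0.25)
print(loss_n_d(1.0, math.inf, **consts))   # about 4.0, i.e. (N_c / N) ** alpha_N
print(loss_n_d(1.0, 1.0, **consts) > 4.0)  # True: finite data adds an overfitting penalty
```

With unlimited data the second term vanishes and the fit reduces to the pure power law in N.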

openalex-author · arXiv (Cornell University)

Fine-Tuning Language Models from Human Preferences

Reward learning enables the application of reinforcement learning (RL) to tasks where reward is defined by human judgment, building a model of reward by asking humans questions. Most work on reward learning has used simulated environments, but complex information about values is often expressed in natural language, and we believe reward learning for language is a key to making RL practical and safe for real-world tasks. In this paper, we build on advances in generative pretraining of language models to apply reward learning to four natural language tasks: continuing text with positive sentiment or physically descriptive language, and summarization tasks on the TL;DR and CNN/Daily Mail datasets. For stylistic continuation we achieve good results with only 5,000 comparisons evaluated by humans. For summarization, models trained with 60,000 comparisons copy whole sentences from the input but skip irrelevant preamble; this leads to reasonable ROUGE scores and very good performance according to our human labelers, but may be exploiting the fact that labelers rely on simple heuristics.

openalex-author · Journal of the American Society for Mass Spectrometry

Improving Precursor Selectivity in Data-Independent Acquisition Using Overlapping Windows

A major goal of proteomics research is the accurate and sensitive identification and quantification of a broad range of proteins within a sample. Data-independent acquisition (DIA) approaches that acquire MS/MS spectra independently of precursor information have been developed to overcome the reproducibility challenges of data-dependent acquisition and the limited breadth of targeted proteomics strategies. Typical DIA implementations use wide MS/MS isolation windows to acquire comprehensive fragment ion data. However, wide isolation windows produce highly chimeric spectra, limiting the achievable sensitivity and accuracy of quantification and identification. Here, we present a DIA strategy in which spectra are collected with overlapping (rather than adjacent or random) windows and then computationally demultiplexed. This approach improves precursor selectivity by nearly a factor of 2, without incurring any loss in mass range, mass resolution, chromatographic resolution, scan speed, or other key acquisition parameters. We demonstrate a 64% improvement in sensitivity and a 17% improvement in peptides detected in a 6-protein bovine mix spiked into a yeast background. To confirm the method's applicability to a realistic biological experiment, we also analyze the regulation of the proteasome in yeast grown in rapamycin and show that DIA experiments with overlapping windows can help elucidate its adaptation toward the degradation of oxidatively damaged proteins. Our integrated computational and experimental DIA strategy is compatible with any DIA-capable instrument. The computational demultiplexing algorithm required to analyze the data has been made available as part of the open-source proteomics software tools Skyline and msconvert (Proteowizard), making it easy to apply as part of standard proteomics workflows.

openalex-author · arXiv (Cornell University)

An Empirical Model of Large-Batch Training

In an increasing number of domains it has been demonstrated that deep learning models can be trained using relatively large batch sizes without sacrificing data efficiency. However, the limits of this massive data parallelism seem to differ from domain to domain, ranging from batches of tens of thousands in ImageNet to batches of millions in RL agents that play the game Dota 2. To our knowledge, there is limited conceptual understanding of why these limits to batch size differ or how we might choose the correct batch size in a new domain. In this paper, we demonstrate that a simple and easy-to-measure statistic called the gradient noise scale predicts the largest useful batch size across many domains and applications, including a number of supervised learning datasets (MNIST, SVHN, CIFAR-10, ImageNet, Billion Word), reinforcement learning domains (Atari and Dota), and even generative model training (autoencoders on SVHN). We find that the noise scale increases as the loss decreases over a training run and depends on the model size primarily through improved model performance. Our empirically-motivated theory also describes the tradeoff between compute-efficiency and time-efficiency, and provides a rough model of the benefits of adaptive batch-size training.
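The "simple and easy-to-measure" noise scale is, in its simplest form, B_simple = tr(Sigma) / |G|^2, where G is the true gradient and Sigma the per-example gradient covariance. Because a batch-B gradient satisfies E[|G_B|^2] = |G|^2 + tr(Sigma) / B, measuring squared gradient norms at two batch sizes gives estimates of both terms. A minimal sketch of that two-batch estimator; the function name and inputs are our own.

```python
def estimate_noise_scale(b_small: int, g2_small: float,
                         b_big: int, g2_big: float) -> float:
    """Estimate B_simple = tr(Sigma) / |G|^2 from two batch sizes.

    g2_small and g2_big are squared norms of minibatch gradients measured at
    batch sizes b_small < b_big. Uses E[|G_B|^2] = |G|^2 + tr(Sigma) / B.
    """
    # Unbiased estimate of the squared norm of the true gradient.
    g2_true = (b_big * g2_big - b_small * g2_small) / (b_big - b_small)
    # Unbiased estimate of the trace of the per-example gradient covariance.
    trace_sigma = (g2_small - g2_big) / (1.0 / b_small - 1.0 / b_big)
    return trace_sigma / g2_true


# Noise-free inputs built from |G|^2 = 2 and tr(Sigma) = 100.
print(estimate_noise_scale(b_small=10, g2_small=12.0, b_big=100, g2_big=3.0))  # about 50
```

With real, noisy gradients one would average g2_true and trace_sigma over many steps before dividing, since the ratio of two single-step estimates is itself noisy and biased.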

openalex-author · arXiv (Cornell University)

Reward learning from human preferences and demonstrations in Atari

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.

openalex-author · arXiv (Cornell University)

Supervising strong learners by amplifying weak experts

Many real world learning tasks involve complex or hard-to-specify objectives, and using an easier-to-specify proxy can lead to poor performance or misaligned behavior. One solution is to have humans provide a training signal by demonstrating or judging performance, but this approach fails if the task is too complicated for a human to directly evaluate. We propose Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems. Iterated Amplification is closely related to Expert Iteration (Anthony et al., 2017; Silver et al., 2017), except that it uses no external reward function. We present results in algorithmic environments, showing that Iterated Amplification can efficiently learn complex behaviors.

openalex-author · arXiv (Cornell University)

Variational Option Discovery Algorithms

We explore methods for option discovery based on variational inference and make two algorithmic contributions. First: we highlight a tight connection between variational option discovery methods and variational autoencoders, and introduce Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from the connection. In VALOR, the policy encodes contexts from a noise distribution into trajectories, and the decoder recovers the contexts from the complete trajectories. Second: we propose a curriculum learning approach where the number of contexts seen by the agent increases whenever the agent's performance is strong enough (as measured by the decoder) on the current set of contexts. We show that this simple trick stabilizes training for VALOR and prior variational option discovery methods, allowing a single agent to learn many more modes of behavior than it could with a fixed context distribution. Finally, we investigate other topics related to variational option discovery, including fundamental limitations of the general approach and the applicability of learned options to downstream tasks.
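The curriculum rule, growing the number of contexts only once the decoder recovers the current ones reliably, can be sketched as a single update step. The threshold, growth factor, and cap below are illustrative assumptions, not values taken from the abstract, and the function name is hypothetical.

```python
def update_num_contexts(k: int, decoder_accuracy: float, k_max: int,
                        threshold: float = 0.86, growth: float = 1.5) -> int:
    """Curriculum step: enlarge the context set once the decoder recovers
    the current contexts reliably; otherwise keep K unchanged.

    threshold and growth are illustrative assumptions.
    """
    if decoder_accuracy >= threshold:
        return min(int(growth * k + 1), k_max)
    return k


k = 2
for acc in [0.5, 0.9, 0.9, 0.7, 0.95]:  # made-up decoder accuracies
    k = update_num_contexts(k, acc, k_max=64)
print(k)  # 11
```

Capping at k_max keeps the context distribution bounded even if the decoder stays accurate indefinitely.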

openalex-author · arXiv (Cornell University)

AI safety via debate

To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals and preferences. One approach to specifying complex goals asks humans to judge during training which agent behaviors are safe and useful, but this approach can fail if the task is too complicated for a human to directly judge. To help address this concern, we propose training agents via self play on a zero sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit, then a human judges which of the agents gave the most true, useful information. In an analogy to complexity theory, debate with optimal play can answer any question in PSPACE given polynomial time judges (direct judging answers only NP questions). In practice, whether debate works involves empirical questions about humans and the tasks we want AIs to perform, plus theoretical questions about the meaning of AI alignment. We report results on an initial MNIST experiment where agents compete to convince a sparse classifier, boosting the classifier's accuracy from 59.4% to 88.9% given 6 pixels and from 48.2% to 85.2% given 4 pixels. Finally, we discuss theoretical and practical aspects of the debate model, focusing on potential weaknesses as the model scales up, and we propose future human and computer experiments to test these properties.

openalex-author · arXiv (Cornell University)

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation

This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the ways in which AI may influence the threat landscape in the digital, physical, and political domains, we make four high-level recommendations for AI researchers and other stakeholders. We also suggest several promising areas for further research that could expand the portfolio of defenses, or make attacks less effective or harder to execute. Finally, we discuss, but do not conclusively resolve, the long-term equilibrium of attackers and defenders.

openalex-author · arXiv (Cornell University)

Deep reinforcement learning from human preferences

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.
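Learning from preferences between trajectory segments needs a model of how predicted rewards turn into a preference. A common choice, and the one we understand this line of work to use, is a Bradley-Terry style model: segment 1 is preferred with probability exp(sum r1) / (exp(sum r1) + exp(sum r2)), trained by cross-entropy against the human label. A minimal sketch that omits the paper's additional refinements; function names are our own.

```python
import math
from typing import Sequence


def _log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_prob(r1: Sequence[float], r2: Sequence[float]) -> float:
    """P[segment 1 preferred] = exp(sum r1) / (exp(sum r1) + exp(sum r2))."""
    return math.exp(_log_sigmoid(sum(r1) - sum(r2)))


def preference_loss(r1: Sequence[float], r2: Sequence[float], mu1: float) -> float:
    """Cross-entropy between predicted preference and the human label.

    mu1 is 1.0 if the human preferred segment 1, 0.0 for segment 2, 0.5 for a tie.
    """
    d = sum(r1) - sum(r2)
    return -(mu1 * _log_sigmoid(d) + (1.0 - mu1) * _log_sigmoid(-d))


# Predicted per-step rewards for two short segments (made-up numbers).
print(preference_prob([0.2, 0.4], [0.1, 0.1]))
print(preference_loss([0.2, 0.4], [0.1, 0.1], mu1=1.0))
```

Writing the ratio as a logistic of the reward difference avoids overflow when segment returns are large.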

openalex-author · ISTA Research Explorer

Multi-electrode array recording from salamander retinal ganglion cells

This data was collected as part of the study [1]. It consists of preprocessed multi-electrode array recording from 160 salamander retinal ganglion cells responding to 297 repeats of a 19 s natural movie. The data is available in two formats: (1) a .mat file containing an array with dimensions “number of repeats” x “number of neurons” x “time in a repeat”; (2) a zipped .txt file containing the same data represented as an array with dimensions “number of neurons” x “number of samples”, where the number of samples is equal to the product of the number of repeats and timebins within a repeat. The time dimension is divided into 20 ms time windows, and the array is binary indicating whether a given cell elicited at least one spike in a given time window during a particular repeat. See the reference below for details regarding collection and preprocessing: [1] Tkačik G, Marre O, Amodei D, Schneidman E, Bialek W, Berry MJ II. Searching for Collective Behavior in a Large Network of Sensory Neurons. PLoS Comput Biol. 2014;10(1):e1003408.
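The two file layouts differ only in how the repeat and time axes are flattened. A minimal sketch of converting the repeats x neurons x time array into neurons x samples, assuming samples are ordered repeat by repeat with each repeat's time bins in order (the description does not state the order). If the movie is exactly 19 s long, 20 ms bins give 950 bins per repeat, so 297 repeats give 282,150 samples per neuron. Function names are hypothetical.

```python
from typing import List


def to_neuron_by_sample(data: List[List[List[int]]]) -> List[List[int]]:
    """Turn a [repeat][neuron][timebin] binary array into [neuron][sample].

    Assumes samples are ordered repeat by repeat, each repeat's time bins in order.
    """
    if not data:
        return []
    n_neurons = len(data[0])
    return [
        [bit for repeat in data for bit in repeat[neuron]]
        for neuron in range(n_neurons)
    ]


def samples_per_neuron(movie_ms: int, bin_ms: int, n_repeats: int) -> int:
    """Number of samples per neuron: time bins per repeat times repeats."""
    return (movie_ms // bin_ms) * n_repeats


# Tiny made-up array: 2 repeats x 2 neurons x 3 time bins.
toy = [[[1, 0, 0], [0, 1, 1]],
       [[0, 0, 1], [1, 1, 0]]]
print(to_neuron_by_sample(toy))  # [[1, 0, 0, 0, 0, 1], [0, 1, 1, 1, 1, 0]]
print(samples_per_neuron(19_000, 20, 297))  # 282150
```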

openalex-author · arXiv (Cornell University)

Learning a Natural Language Interface with Neural Programmer

Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide the desired response when executed on the database. To our knowledge, this paper presents the first weakly supervised, end-to-end neural network model to induce such programs on a real-world dataset. We enhance the objective function of Neural Programmer, a neural network with built-in discrete operations, and apply it to WikiTableQuestions, a natural language question-answering dataset. The model is trained end-to-end with weak supervision of question-answer pairs, and does not require domain-specific grammars, rules, or annotations that are key elements in previous approaches to program induction. The main experimental result in this paper is that a single Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with weak supervision. An ensemble of 15 models, with a trivial combination technique, achieves 37.7% accuracy, which is competitive with the current state-of-the-art accuracy of 37.1% obtained by a traditional natural language semantic parser.

openalex-author · Paper

End to end speech recognition in English and Mandarin

No abstract available from the OpenAlex source record.

openalex-author · arXiv (Cornell University)

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech, including noisy environments, accents, and different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

openalex-author · Proceedings of the National Academy of Sciences

Thermodynamics and signatures of criticality in a network of neurons

The activity of a neural network is defined by patterns of spiking and silence from the individual neurons. Because spikes are (relatively) sparse, patterns of activity with increasing numbers of spikes are less probable, but, with more spikes, the number of possible patterns increases. This tradeoff between probability and numerosity is mathematically equivalent to the relationship between entropy and energy in statistical physics. We construct this relationship for populations of up to N = 160 neurons in a small patch of the vertebrate retina, using a combination of direct and model-based analyses of experiments on the response of this network to naturalistic movies. We see signs of a thermodynamic limit, where the entropy per neuron approaches a smooth function of the energy per neuron as N increases. The form of this function corresponds to the distribution of activity being poised near an unusual kind of critical point. We suggest further tests of criticality, and give a brief discussion of its functional significance.
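A toy model makes the tradeoff between probability and numerosity concrete. This is not the paper's analysis. Take N independent neurons, each spiking with probability p in a time bin. Every pattern with k spikes then has the same energy E = -ln P, and there are C(N, k) such patterns, so the entropy at that energy is ln C(N, k). For p < 0.5, each added spike raises the energy, which means lowering the probability. Up to k = N/2, each added spike also raises the count of patterns. The sketch below tabulates both quantities per neuron. The function name is hypothetical.

```python
import math


def entropy_energy_curve(n: int, p: float) -> list[tuple[float, float]]:
    """(energy per neuron, entropy per neuron) for each spike count k, for n independent neurons."""
    curve = []
    for k in range(n + 1):
        # Energy of any single pattern with k spikes: E = -ln P(pattern).
        energy = -(k * math.log(p) + (n - k) * math.log(1.0 - p))
        # Entropy at that energy: ln(number of patterns with k spikes).
        entropy = math.log(math.comb(n, k))
        curve.append((energy / n, entropy / n))
    return curve
```

In the recorded network, correlations mean that patterns with the same spike count generally differ in probability. The entropy as a function of energy therefore has to be estimated from the data rather than read off a binomial count.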

openalex-author · Molecular & Cellular Proteomics

Identification of a Set of Conserved Eukaryotic Internal Retention Time Standards for Data-independent Acquisition Mass Spectrometry

Accurate knowledge of retention time (RT) in liquid chromatography-based mass spectrometry data facilitates peptide identification, quantification, and multiplexing in targeted and discovery-based workflows. Retention time prediction is particularly important for peptide analysis in emerging data-independent acquisition (DIA) experiments such as SWATH-MS. The indexed RT approach, iRT, uses synthetic spiked-in peptide standards (SiRT) to set RT to a unit-less scale, allowing for normalization of peptide RT between different samples and chromatographic set-ups. The obligatory use of SiRTs can be costly and complicates comparisons and data integration if standards are not included in every sample. Reliance on SiRTs also prevents the inclusion of archived mass spectrometry data for generation of the peptide assay libraries central to targeted DIA-MS data analysis. We have identified a set of peptide sequences that are conserved across most eukaryotic species, termed Common internal Retention Time standards (CiRT). In a series of tests to support the appropriateness of the CiRT-based method, we show: (1) the CiRT peptides normalized RT in human, yeast, and mouse cell lysate derived peptide assay libraries and enabled merging of archived libraries for expanded DIA-MS quantitative applications; (2) CiRTs predicted RT in SWATH-MS data within a 2-min margin of error for the majority of peptides; and (3) normalization of RT using the CiRT peptides enabled the accurate SWATH-MS-based quantification of 340 synthetic isotopically labeled peptides that were spiked into either human or yeast cell lysate. To automate and facilitate the use of these CiRT peptide lists or other custom user-defined internal RT reference peptides in DIA workflows, an algorithm was designed to automatically select a high-quality subset of datapoints for robust linear alignment of RT for use. Implementations of this algorithm are available for the OpenSWATH and Skyline platforms. 
Thus, CiRT peptides can be used alone or as a complement to SiRTs for RT normalization across peptide spectral libraries and in quantitative DIA-MS studies.
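The abstract mentions an algorithm that selects a high-quality subset of data points for robust linear alignment of retention times, but it does not describe the algorithm. As a stand-in, here is a simple iterative outlier-rejection fit. It is not the OpenSWATH or Skyline implementation. It fits observed RT against the reference (CiRT or iRT) values by least squares, drops the point with the largest residual, and repeats until every remaining residual is within a tolerance. The function name, the tolerance default of 2.0 (in the same units as the observed RT, for example minutes), and the minimum point count are illustrative assumptions.

```python
import numpy as np


def robust_rt_alignment(ref_irt, observed_rt, max_residual=2.0, min_points=5):
    """Fit observed_rt ~ slope * ref_irt + intercept, discarding the worst outlier each round."""
    x = np.asarray(ref_irt, dtype=float)
    y = np.asarray(observed_rt, dtype=float)
    keep = np.ones(len(x), dtype=bool)
    while True:
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        resid = np.abs(y - (slope * x + intercept))
        resid[~keep] = -np.inf  # already-discarded points can never be the "worst" again
        worst = int(np.argmax(resid))
        if resid[worst] <= max_residual or keep.sum() <= min_points:
            return slope, intercept, keep
        keep[worst] = False
```

The returned slope and intercept map reference values onto one run's time scale. Inverting the fit, (rt - intercept) / slope, maps that run's observed retention times back onto the shared reference scale.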

openalex-author · Nature Protocols

Building high-quality assay libraries for targeted analysis of SWATH MS data

No abstract available from the OpenAlex source record.

openalex-author · arXiv (Cornell University)

Thermodynamics for a network of neurons: Signatures of criticality

The activity of a neural network is defined by patterns of spiking and silence from the individual neurons. Because spikes are (relatively) sparse, patterns of activity with increasing numbers of spikes are less probable, but with more spikes the number of possible patterns increases. This tradeoff between probability and numerosity is mathematically equivalent to the relationship between entropy and energy in statistical physics. We construct this relationship for populations of up to N=160 neurons in a small patch of the vertebrate retina, using a combination of direct and model-based analyses of experiments on the response of this network to naturalistic movies. We see signs of a thermodynamic limit, where the entropy per neuron approaches a smooth function of the energy per neuron as N increases. The form of this function corresponds to the distribution of activity being poised near an unusual kind of critical point. Networks with more or less correlation among neurons would not reach this critical state. We suggest further tests of criticality, and give a brief discussion of its functional significance.

openalex-author · PLoS Computational Biology

Searching for Collective Behavior in a Large Network of Sensory Neurons

Maximum entropy models are the least structured probability distributions that exactly reproduce a chosen set of statistics measured in an interacting network. Here we use this principle to construct probabilistic models which describe the correlated spiking activity of populations of up to 120 neurons in the salamander retina as it responds to natural movies. Already in groups as small as 10 neurons, interactions between spikes can no longer be regarded as small perturbations in an otherwise independent system; for 40 or more neurons, pairwise interactions need to be supplemented by a global interaction that controls the distribution of synchrony in the population. We show that such "K-pairwise" models, which are systematic extensions of the previously used pairwise Ising models, provide an excellent account of the data. We explore the properties of the neural vocabulary by: 1) estimating its entropy, which constrains the population's capacity to represent visual information; 2) classifying activity patterns into a small set of metastable collective modes; 3) showing that the neural codeword ensembles are extremely inhomogeneous; and 4) demonstrating that the state of individual neurons is highly predictable from the rest of the population, which allows for error correction.
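A pairwise Ising model has a field h_i for each neuron and a coupling J_ij for each pair. A K-pairwise model adds a potential V(K) on the total number of spiking cells K in a time bin, which plays the role of the global interaction that controls synchrony. The sketch below evaluates such a distribution exactly by enumerating all 2^N patterns. It assumes 0/1 spike variables and the sign convention P ∝ exp(Σ h_i σ_i + ½ Σ J_ij σ_i σ_j + V(K)). Both choices are assumptions, and the paper's own parameterization may differ. The function name is hypothetical.

```python
import itertools

import numpy as np


def k_pairwise_distribution(h, J, V):
    """Exact P(sigma) over all 2^N binary patterns of a K-pairwise maximum entropy model.

    h: (N,) fields; J: (N, N) symmetric couplings with zero diagonal;
    V: (N + 1,) potential indexed by the spike count K = sum(sigma).
    """
    h = np.asarray(h, dtype=float)
    J = np.asarray(J, dtype=float)
    V = np.asarray(V, dtype=float)
    patterns = np.array(list(itertools.product([0, 1], repeat=len(h))), dtype=float)
    k = patterns.sum(axis=1).astype(int)
    # Log-weight: fields + pairwise term (the 1/2 undoes double counting of i, j pairs) + V(K).
    log_w = patterns @ h + 0.5 * np.einsum("pi,ij,pj->p", patterns, J, patterns) + V[k]
    log_w -= log_w.max()  # stabilize before exponentiating
    w = np.exp(log_w)
    return patterns, w / w.sum()
```

Enumeration is feasible only for small N, because the number of patterns doubles with each added neuron. For populations of 100 or more cells, model expectations have to be estimated by sampling (for example, Monte Carlo). Fitting h, J, and V so that the model reproduces the measured firing rates, pairwise correlations, and distribution of K is a separate optimization, not shown here.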

openalex-author · Proceedings of the National Academy of Sciences

Characterizing deformability and surface friction of cancer cells

Metastasis requires the penetration of cancer cells through tight spaces, which is mediated by the physical properties of the cells as well as their interactions with the confined environment. Various microfluidic approaches have been devised to mimic traversal in vitro by measuring the time required for cells to pass through a constriction. Although a cell's passage time is expected to depend on its deformability, measurements from existing approaches are confounded by a cell's size and its frictional properties with the channel wall. Here, we introduce a device that enables the precise measurement of (i) the size of a single cell, given by its buoyant mass, (ii) the velocity of the cell entering a constricted microchannel (entry velocity), and (iii) the velocity of the cell as it transits through the constriction (transit velocity). Changing the deformability of the cell by perturbing its cytoskeleton primarily alters the entry velocity, whereas changing the surface friction by immobilizing positive charges on the constriction's walls primarily alters the transit velocity, indicating that these parameters can give insight into the factors affecting the passage of each cell. When accounting for cell buoyant mass, we find that cells with higher metastatic potential exhibit faster entry velocities than cells with lower metastatic potential. We additionally find that some cell types with higher metastatic potential exhibit greater than expected changes in transit velocities, suggesting that not only increased deformability but also reduced friction may help invasive cancer cells squeeze efficiently through tight spaces.

openalex-author · Frontiers in Computational Neuroscience

Physical principles for scalable neural recording

Simultaneously measuring the activities of all neurons in a mammalian brain at millisecond resolution is a challenge beyond the limits of existing techniques in neuroscience. Entirely new approaches may be required, motivating an analysis of the fundamental physical constraints on the problem. We outline the physical principles governing brain activity mapping using optical, electrical, magnetic resonance, and molecular modalities of neural recording. Focusing on the mouse brain, we analyze the scalability of each method, concentrating on the limitations imposed by spatiotemporal resolution, energy dissipation, and volume displacement. Based on this analysis, all existing approaches require orders of magnitude improvement in key parameters. Electrical recording is limited by the low multiplexing capacity of electrodes and their lack of intrinsic spatial resolution, optical methods are constrained by the scattering of visible light in brain tissue, magnetic resonance is hindered by the diffusion and relaxation timescales of water protons, and the implementation of molecular recording is complicated by the stochastic kinetics of enzymes. Understanding the physical limits of brain activity mapping may provide insight into opportunities for novel solutions. For example, unconventional methods for delivering electrodes may enable unprecedented numbers of recording sites, embedded optical devices could allow optical detectors to be placed within a few scattering lengths of the measured neurons, and new classes of molecularly engineered sensors might obviate cumbersome hardware architectures. We also study the physics of powering and communicating with microscale devices embedded in brain tissue and find that, while radio-frequency electromagnetic data transmission suffers from a severe power-bandwidth tradeoff, communication via infrared light or ultrasound may allow high data rates due to the possibility of spatial multiplexing. 
The use of embedded local recording and wireless data transmission would only be viable, however, given major improvements to the power efficiency of microelectronic devices.

openalex-author · The Journal of Neuroscience

Mapping a Complete Neural Population in the Retina

Recording simultaneously from essentially all of the relevant neurons in a local circuit is crucial to understand how they collectively represent information. Here we show that the combination of a large, dense multielectrode array and a novel, mostly automated spike-sorting algorithm allowed us to record simultaneously from a highly overlapping population of >200 ganglion cells in the salamander retina. By combining these methods with labeling and imaging, we showed that up to 95% of the ganglion cells over the area of the array were recorded. By measuring the coverage of visual space by the receptive fields of the recorded cells, we concluded that our technique captured a neural population that forms an essentially complete representation of a region of visual space. This completeness allowed us to determine the spatial layout of different cell types as well as identify a novel group of ganglion cells that responded reliably to a set of naturalistic and artificial stimuli but had no measurable receptive field. Thus, our method allows unprecedented access to the complete neural representation of visual information, a crucial step for the understanding of population coding in sensory systems.

openalex-author · Nature Biotechnology

A cross-platform toolkit for mass spectrometry and proteomics

No abstract available from the OpenAlex source record.

openalex-author · Journal of Neurophysiology

Low error discrimination using a correlated population code

We explored the manner in which spatial information is encoded by retinal ganglion cell populations. We flashed a set of 36 shape stimuli onto the tiger salamander retina and used different decoding algorithms to read out information from a population of 162 ganglion cells. We compared the discrimination performance of linear decoders, which ignore correlation induced by common stimulation, with nonlinear decoders, which can accurately model these correlations. Similar to previous studies, decoders that ignored correlation suffered only a modest drop in discrimination performance for groups of up to ∼30 cells. However, for more realistic groups of 100+ cells, we found order-of-magnitude differences in the error rate. We also compared decoders that used only the presence of a single spike from each cell with more complex decoders that included information from multiple spike counts and multiple time bins. More complex decoders substantially outperformed simpler decoders, showing the importance of spike timing information. Particularly effective was the first spike latency representation, which allowed zero discrimination errors for the majority of shape stimuli. Furthermore, the performance of nonlinear decoders showed even greater enhancement compared with linear decoders for these complex representations. Finally, decoders that approximated the correlation structure in the population by matching all pairwise correlations with a maximum entropy model fit to all 162 neurons were quite successful, especially for the spike latency representation. Together, these results suggest a picture in which linear decoders allow a coarse categorization of shape stimuli, whereas nonlinear decoders, which take advantage of both correlation and spike timing, are needed to achieve high-fidelity discrimination.
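Here is one simple example of a decoder that "ignores correlation". It is assumed for illustration and not taken from the paper. It is a maximum-likelihood decoder that treats each cell's single-bin spike/no-spike response as independent given the stimulus. Its log-likelihood score for stimulus s is Σ_i r_i ln(p_si / (1 - p_si)) + Σ_i ln(1 - p_si), which is linear in the response vector r, so this is a linear decoder. The function names and the additive smoothing parameter `alpha` are hypothetical.

```python
import numpy as np


def fit_independent_bernoulli(responses, labels, n_stimuli, alpha=1.0):
    """Estimate p[s, i] = P(cell i spikes | stimulus s), with additive smoothing."""
    responses = np.asarray(responses, dtype=float)
    labels = np.asarray(labels)
    p = np.empty((n_stimuli, responses.shape[1]))
    for s in range(n_stimuli):
        trials = responses[labels == s]
        p[s] = (trials.sum(axis=0) + alpha) / (len(trials) + 2.0 * alpha)
    return p


def decode_linear(response, p):
    """Maximum-likelihood stimulus under independence; the score is linear in the response."""
    w = np.log(p) - np.log1p(-p)  # per-cell weights, one row per stimulus
    b = np.log1p(-p).sum(axis=1)  # per-stimulus bias
    return int(np.argmax(w @ np.asarray(response, dtype=float) + b))
```

A nonlinear decoder would replace the independent likelihood with a model of the joint response, such as a maximum entropy model matched to the pairwise correlations. Its score would then no longer be a sum of per-cell terms.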

openalex-author · Paper

Network-Scale Electrophysiology: Measuring and Understanding the Collective Behavior of Neural Circuits

Some of the most intriguing, powerful, and complex features of biological systems arise from the collective properties of large networks of relatively simple elements. This is particularly true in neural systems, where the bewilderingly complex behavior of whole organisms emerges from the comparatively simple activities of billions of neurons. Unfortunately, the rigorous study of neural collective behavior is very challenging from both a theoretical and an experimental perspective. Experimentally, studying collective properties requires the ability to record from large numbers of neurons in a connected network with single cell resolution. Technologies suitable for this task have only recently begun to emerge, and are still in their early stages, with an enormous amount of work left to be done. Theoretically, the study of collective systems can be computationally intractable if approached naively, as the number of collective states of a system tends to increase exponentially with the number of elements in the system. Efforts to render this problem more tractable are also in their early stages. This work presents four novel but related advances on both the experimental and theoretical fronts. The retina is used as a model system, but most of the technologies and methods presented here are widely applicable to other neural systems and in some cases to biological systems in general. Part I presents an improvement to microelectrode array technology which enables very high quality recording of almost all the spiking activity in a small patch of retinal ganglion cells. Part II presents a new signal processing technique for identifying and differentiating the spikes recorded using the device described in part I, enabling us to record from almost all of the 200+ cells in a 0.5 x 0.5 mm patch of retina. I also discuss cell labeling experiments verifying that we have in fact recorded from almost all the cells in this patch. 
In Part III, we use the data collected in Parts I and II to empirically test a set of models of collective neural activity (data-driven MaxEnt models) at much larger scales than had been possible before. We find that the Ising model performs well, and we also develop a more advanced model that captures the neural behavior even better than the Ising model. We find strong preliminary evidence of critical behavior in both the Ising model and the newly developed model, which confirms a key theoretical prediction made several years earlier. Finally, in Part IV, we present early progress towards a completely novel recording technology, designed to improve the capabilities and the scale of intracellular recording, eventually enabling its application to network-scale studies. The new device is a customized patch clamp electrode, and opens up the eventual possibility of recording from many cells in a small region of tissue simultaneously as well as performing internal dialysis in tissue. Although not yet fully mature, this device potentially represents a new frontier in intracellular recording.

openalex-author · Physical Review E

Computation of uniform wave forms using complex rays

Complex rays and polynomial phase functions are used to numerically solve the Helmholtz equation in a realistic two-dimensional smoothly varying heterogeneous velocity model with multiple adjacent cusp caustics. Together these two methods allow the determination of global uniformly asymptotic solutions in the presence of arbitrarily many caustics. Two algorithms are introduced to this end: a two-point ray tracing algorithm for complex rays and a perturbation method for constructing polynomial phase functions. Model representation in complex space is performed via discrete cosine transform analysis. Geometrical and uniformly asymptotic solutions are computed for a linear layer test model as well as a velocity model from Yucca Mountain.

openalex-author · AGUFM

Computing Uniformly Asymptotic Seismograms Using Complex Ray Tracing

No abstract available from the OpenAlex source record.

openalex-author · Applied Optics

Mirrors with regular hexagonal segments

The point-spread function and emissivity are calculated for a mirror made from regular hexagonal segments of just a few different sizes. A mirror of this type has many similar segments, which is an advantage for manufacturing, and for an approximately f/1 mirror with ≥ 1000 segments and ≥ 4 sizes of regular hexagons, the increase in intersegment gap area is negligible. This result raises the possibility of making a mirror from very large numbers of identical small segments that are warped to the required figure.