m8ta
{1556}
ref: -0 tags: concept net NLP transformers graph representation knowledge date: 11-04-2021 17:48 gmt revision:0 [head]

Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

  • From a team at the University of Washington / Allen Institute for Artificial Intelligence.
  • Courtesy of Yannic Kilcher's YouTube channel.
  • General idea: use GPT-3 as a completion source given a set of prompts, like:
    • X starts running
      • So, X gets in shape
    • X and Y engage in an argument
      • So, X wants to avoid Y.
  • There are only 7 linkage atoms (edges, so to speak) in these queries, but of course many actions / direct objects.
    • These prompts are generated from the ATOMIC 2020 human-authored dataset.
    • The prompts are fed into the 175B-parameter DaVinci model, resulting in 165k examples across the 7 linkages after cleaning.
    • In turn, the 165k examples are fed into a smaller version of GPT-3, Curie, which generates 6.5M text examples, aka ATOMIC 10x.
  • The results are then filtered via a second, critic model (a fine-tuned RoBERTa trained with human supervision) to determine whether a generated sentence is 'good' or not; a toy sketch of this generate-then-filter loop follows the list.
  • By throwing away 62% of ATOMIC 10x, they get a student accuracy of 96.4%, much better than the human-designed knowledge graph.
    • They suggest that one way this works is by removing degenerate outputs from GPT-3.
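
A minimal sketch of the generate-then-filter loop, assuming stand-in models and an illustrative prompt template: the paper uses GPT-3 (DaVinci / Curie) as the teacher and a RoBERTa critic fine-tuned on human acceptability judgments, neither of which is loaded here; the relation names, the ACCEPT_LABEL, and the keep threshold are assumptions for illustration only.

```python
# Sketch of symbolic knowledge distillation: a "teacher" LM completes relation
# prompts, and a critic classifier keeps only acceptable tuples.
from transformers import pipeline

teacher = pipeline("text-generation", model="gpt2")              # stand-in for the GPT-3 teacher
critic = pipeline("text-classification", model="roberta-base")   # stand-in for the fine-tuned RoBERTa critic

# Seven ATOMIC-style relations ("linkage atoms"); names are illustrative members
# of the ATOMIC family, not necessarily the exact seven used in the paper.
RELATIONS = ["xEffect", "xWant", "xReact", "xAttr", "xIntent", "xNeed", "HinderedBy"]
ACCEPT_LABEL = "LABEL_1"   # assumption: label of the 'acceptable' class in a fine-tuned critic

def make_prompt(event: str, relation: str) -> str:
    # Real prompts are few-shot templates built from ATOMIC 2020 examples; this is illustrative.
    return f"Event: {event}\nRelation: {relation}\nInference:"

def distill(events, keep_threshold=0.5):
    corpus = []
    for event in events:
        for rel in RELATIONS:
            text = teacher(make_prompt(event, rel), max_new_tokens=16,
                           num_return_sequences=1)[0]["generated_text"]
            inference = text.split("Inference:")[-1].strip()
            # Critic scores whether the (event, relation, inference) tuple is acceptable.
            verdict = critic(f"{event} [{rel}] {inference}")[0]
            if verdict["label"] == ACCEPT_LABEL and verdict["score"] >= keep_threshold:
                corpus.append((event, rel, inference))
    return corpus

print(distill(["X starts running", "X and Y engage in an argument"]))
```

In the paper the filtering is what buys the quality gain: the critic discards roughly 62% of the raw generations, which is what lifts the student's accuracy above the human-authored graph.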

Human-designed knowledge graphs are described here: ConceptNet 5.5: An Open Multilingual Graph of General Knowledge

And employed for profit here: https://www.luminoso.com/

{1497}
ref: -2017 tags: human level concept learning through probabilistic program induction date: 01-20-2020 15:45 gmt revision:0 [head]

PMID-26659050 Human-level concept learning through probabilistic program induction

  • Preface:
    • How do people learn new concepts from just one or a few examples?
    • And how do people learn such abstract, rich, and flexible representations?
    • How can learning succeed from such a sparse dataset and still produce such rich representations?
    • For any theory of learning, fitting a more complicated model requires more data, not less, to achieve some measure of good generalization, usually the difference in performance between new and old examples.
  • Learning proceeds by constructing programs that best explain the observations under a Bayesian criterion, and the model 'learns to learn' by developing hierarchical priors that allow previous experience with related concepts to ease learning of new concepts.
  • These priors represent learned inductive bias that abstracts the key regularities and dimensions of variation holding across both types of concepts and across instances.
  • BPL can construct new programs by reusing pieces of existing ones, capturing the causal and compositional properties of real-world generative processes operating on multiple scales.
  • Posterior inference requires searching the large combinatorial space of programs that could have generated a raw image.
    • Our strategy uses fast bottom-up methods (31) to propose a range of candidate parses.
    • That is, they reduce the character to a set of lines (series of line segments), simplify the intersections of those lines, and run a series of parses to estimate how those lines were generated, with heuristic criteria to encourage continuity (e.g. no sharp angles, a penalty for abruptly changing direction, etc.); a toy sketch of this heuristic scoring follows the list.
    • The most promising candidates are refined by using continuous optimization and local search, forming a discrete approximation to the posterior distribution P(program, parameters | image).
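
A toy sketch of the heuristic scoring of candidate parses, assuming strokes have already been extracted as polylines of (x, y) points. The turning-angle penalty, the sharp-angle weight, and the softmax over costs are illustrative assumptions, not the paper's actual cost function; the normalized weights simply play the role of the discrete approximation to P(program, parameters | image) restricted to the proposed candidates.

```python
# Toy scoring of candidate parses for BPL-style bottom-up proposals: penalize
# abrupt direction changes within each stroke, then form a discrete (softmax)
# approximation to the posterior over the candidate parses.
import math

def turn_angle(p0, p1, p2):
    """Absolute change in heading (radians) at the middle of three consecutive points."""
    a1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    a2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    d = abs(a2 - a1)
    return min(d, 2 * math.pi - d)

def continuity_cost(stroke, sharp_weight=2.0):
    """Sum of turning angles along one stroke; sharp turns (> 90 deg) cost extra."""
    cost = 0.0
    for p0, p1, p2 in zip(stroke, stroke[1:], stroke[2:]):
        ang = turn_angle(p0, p1, p2)
        cost += ang + (sharp_weight * ang if ang > math.pi / 2 else 0.0)
    return cost

def parse_cost(parse):
    """A parse is a list of strokes, each a list of (x, y) points."""
    return sum(continuity_cost(s) for s in parse)

def approximate_posterior(candidate_parses):
    """Discrete approximation: softmax of negative heuristic cost over the candidates."""
    costs = [parse_cost(p) for p in candidate_parses]
    m = min(costs)
    weights = [math.exp(-(c - m)) for c in costs]
    z = sum(weights)
    return [w / z for w in weights]

# Example: two candidate parses of four points, one smooth and one jagged;
# the smooth parse receives the higher posterior weight.
smooth = [[(0, 0), (1, 0), (2, 0), (3, 0)]]
jagged = [[(0, 0), (1, 1), (2, -1), (3, 1)]]
print(approximate_posterior([smooth, jagged]))
```

In the full model these heuristic proposals only seed the search; the promising candidates are then refined by continuous optimization and local search, as noted above.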