ref: -0 tags: diffusion models image generation OpenAI date: 12-24-2021 05:50 gmt revision:0 [head]

Some investigations into denoising models & their intellectual lineage:

Deep Unsupervised Learning using Nonequilibrium Thermodynamics 2015

  • Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli
  • Starting derivation of using diffusion models for training.
  • Very roughly, the idea is to destroy the structure in an image by adding diagonal (per-pixel) Gaussian noise, and to train an inverse-diffusion model to remove the noise at each step. Then start with Gaussian noise and reverse-diffuse an image.
  • Diffusion can take 100s - 1000s of steps; steps are made small to preserve the assumption that the conditional probability $p(x_{t-1}|x_t) \propto N(0, I)$
    • The time variable here goes from 0 (uncorrupted data) to T (fully corrupted / Gaussian noise)
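A minimal sketch of the forward (noising) chain described above. The constant stand-in image, the linear β schedule from 1e-4 to 0.02 over 1000 steps, and the name forward_diffuse are assumptions for illustration, not taken from the paper's code.

```python
import numpy as np

def forward_diffuse(x0, betas, rng):
    """Apply the forward (noising) chain one small step at a time."""
    x = x0.copy()
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        # Shrink the signal slightly, then add per-pixel Gaussian noise.
        # The shrink keeps the total variance bounded instead of random-walking.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x

rng = np.random.default_rng(0)
x0 = np.full((64, 64), 3.0)            # stand-in "image" with obvious structure
betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear schedule, t = 1..T
xT = forward_diffuse(x0, betas, rng)

# After T steps the structure is gone: roughly zero mean, unit variance.
print(xT.mean(), xT.std())
```

Generation runs this chain backwards: start from a sample of Gaussian noise and apply the learned inverse step T times.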

Generative Modeling by Estimating Gradients of the Data Distribution July 2019

  • Yang Song, Stefano Ermon

Denoising Diffusion Probabilistic Models June 2020

  • Jonathan Ho, Ajay Jain, Pieter Abbeel
  • A diffusion model that can output 'realistic' images (low FID / low negative log-likelihood)

Improved Denoising Diffusion Probabilistic Models Feb 2021

  • Alex Nichol, Prafulla Dhariwal
  • This is directly based on Ho 2020 and Sohl-Dickstein 2015, but with tweaks
  • The objective is no longer the log-likelihood of the data given the parameters (per pixel); it's now mostly the MSE between the corrupting noise (which is known) and the estimated noise.
  • That is, the neural network model attempts, given $x_t$, to estimate the noise which corrupted it, which then can be used to produce $x_{t-1}$
    • Simplicity. Satisfying.
  • They also include a reweighted version of the log-likelihood loss, which puts more emphasis on the first few steps of noising. These steps are more important for NLL; reweighting also smooths the loss.
    • I think that, per Ho above, the simple MSE loss is sufficient to generate good images, but the reweighted LL improves the likelihood of the parameters.
  • There are some good crunchy mathematical details on exactly how the mean and variance of the estimated Gaussian distributions are handled -- at each noising step, you need to scale the mean down to prevent Brownian / random walk.
    • Taking these further, you can sample the noised image at any point t in the forward diffusion chain directly, without stepping through the chain. They use this fact to optimize the function approximator (a neural network; more later) using a (random but re-weighted/scheduled) t and the LL loss + simple loss.
  • Ho 2020 above treats the variance of the noising Gaussian as fixed -- that is, $\beta$; this paper improves the likelihood by adjusting the noise variance, mostly at the last steps, by a $\tilde{\beta}_t$, and then further allowing the function approximator to tune the variance (a multiplicative factor) per inverse-diffusion timestep.
    • TBH I'm still slightly foggy on how you go from estimating noise (this seems like samples, concrete) to then estimating variance (which is variational?). hmm.
  • Finally, they schedule the forward noising with a cosine^2, rather than a linear ramp. This makes the last phases of corruption more useful.
  • Because they have an explicit parameterization of the noise variance, they can run the inverse diffusion (e.g. image generation) faster -- rather than 4000 steps, which can take a few minutes on a GPU, they can step up the variance and run it only for 50 steps and get nearly as good images.
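The pieces in the list above (cosine schedule, jumping straight to any t, the simple noise-MSE loss, and the interpolated variance) can be sketched in a few lines of NumPy. This is a sketch, not the authors' code: the function names, the offset s = 0.008, the clip at 0.999, and the all-zeros zero_model placeholder network are assumptions for illustration.

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    """Cumulative signal fraction alpha_bar[t] for t = 0..T (cosine^2 ramp)."""
    t = np.arange(T + 1)
    f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]  # alpha_bar[0] == 1, i.e. uncorrupted data

def q_sample(x0, t, alpha_bar, noise):
    """Jump straight to step t of the forward chain (no iteration needed)."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def simple_loss(eps_model, x0, t, alpha_bar, rng):
    """MSE between the known corrupting noise and the model's estimate of it."""
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bar, noise)
    return np.mean((noise - eps_model(x_t, t)) ** 2)

def learned_variance(v, beta_t, beta_tilde_t):
    """Interpolate in log space between beta_t (v=1) and beta_tilde_t (v=0)."""
    return np.exp(v * np.log(beta_t) + (1.0 - v) * np.log(beta_tilde_t))

def zero_model(x_t, t):
    # Placeholder for the U-net: always predicts "no noise".
    return np.zeros_like(x_t)

T = 1000
alpha_bar = cosine_alpha_bar(T)
# betas[t] is the per-step noise variance beta_t; betas[0] is unused padding.
betas = np.concatenate([[0.0], np.clip(1 - alpha_bar[1:] / alpha_bar[:-1], 0, 0.999)])

rng = np.random.default_rng(0)
x0 = rng.standard_normal((32, 32))
loss = simple_loss(zero_model, x0, 500, alpha_bar, rng)

t = 500
beta_tilde = (1 - alpha_bar[t - 1]) / (1 - alpha_bar[t]) * betas[t]
sigma2 = learned_variance(0.5, betas[t], beta_tilde)
```

Here v would be an extra per-pixel output of the network; the noise estimate sets the mean of the reverse step, while v only picks a variance between the two fixed bounds.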

Diffusion Models Beat GANs on Image Synthesis May 2021

  • Prafulla Dhariwal, Alex Nichol

In all of the above, it seems that the inverse-diffusion function approximator is a minor player in the paper -- but of course, it's vitally important to making the system work. In some sense, this 'diffusion model' is as much a means of training the neural network as it is a (rather inefficient, compared to GANs) way of sampling from the data distribution. In Nichol & Dhariwal Feb 2021, they use a U-net convolutional network (e.g. start with few channels, downsample and double the channels until there are 128-256 channels, then upsample x2 and halve the channels) including multi-headed attention. Ho 2020 used single-headed attention only at the 16x16 level. Ho 2020 in turn was based on PixelCNN++

PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications Jan 2017

  • Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma

which is an improvement to (e.g. add self-attention layers)

Conditional Image Generation with PixelCNN Decoders

  • Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu

Most recently,

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

  • Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen

Added text-conditional generation + many more parameters + much more compute to yield very impressive image results + in-painting. This last effect is enabled by the fact that it's a full generative denoising probabilistic model -- you can condition on other parts of the image!

ref: -2018 tags: Michael Levin youtube talk NIPS 2018 regeneration bioelectricity organism patterning flatworm date: 04-09-2019 18:50 gmt revision:1 [0] [head]

What Bodies Think About: Bioelectric Computation Outside the Nervous System - NeurIPS 2018

  • Short notes from watching the video, mostly interesting factoids: (This is a somewhat more coordinated narrative in the video. Am resisting ending each of these statements with an exclamation point).
  • Human children up to 7-11 years old can regenerate their fingertips.
  • Human embryos, when split in half early, develop into two normal humans; mouse embryos, when squished together, make one normal mouse.
  • Butterflies retain memories from their caterpillar stage, despite their brains liquefying during metamorphosis.
  • Flatworms are immortal, and can both grow and contract, as the environment requires.
    • They can also regenerate a whole body from segments, and know to make one head, tail, gut etc.
  • Single cell organisms, e.g. Lacrymaria, can have complex (and fast!) foraging / hunting plans -- without a brain or anything like it.
  • Axolotl can regenerate many parts of their body (appendages etc), including parts of the nervous system.
  • Frog embryos can self-organize an experimenter-jumbled body plan, despite the initial organization having never been experienced in evolution.
  • Salamanders, when their tail is grafted into a foot/leg position, remodel the transplant into a leg and foot.
  • Neurotransmitters are ancient; fungi, who diverged from other forms of life about 1.5 billion years ago, still use the same set of inter-cell transmitters e.g. serotonin, which is why modulatory substances from them have high affinity & a strong effect on humans.
  • Levin, collaborators and other developmental biologists have been using voltage indicators in embryos ... this is not just for neurons.
  • Can make different species head shapes in flatworms by exposing them to ion-channel modulating drugs. This despite the fact that the respective head shapes are from species that have been evolving separately for 150 million years.
  • Indeed, you can reprogram (with light gated ion channels, drugs, etc) to body shapes not seen in nature or not explored by evolution.
    • That said, this was experimental, not by design; Levin himself remarks that the biology that generates these body plans is not known.
  • Flatworms can store memory in bioelectric networks.
  • Frogs don't normally regenerate their limbs. But, with a drug cocktail targeting bioelectric signaling, they can regenerate semi-functional legs, complete with nerves, muscle, bones, and cartilage. The legs are functional (enough).
  • Manipulations of bioelectric signaling can reverse very serious genetic problems, e.g. deletion of Notch, to the point that tadpoles regain some ability for memory creation & recall.

  • I wonder how so much information can go through the apparently scalar channel of membrane voltage. It seems you'd get symbol interference, and that many more signals would be required to pattern organs.
  • That said, calcium is used a great many places in the cell for all sorts of signaling tasks, over many different timescales as well, and it doesn't seem to be plagued by interference.
    • First question from the audience was how cells differentiate organismal patterning signals and behavioral signals, e.g. muscle contraction.

ref: -0 tags: third harmonic generation Nd:YAG pulsed laser date: 08-29-2015 06:44 gmt revision:7 [6] [5] [4] [3] [2] [1] [head]

Problem: have a Q-switched Nd:YAG laser (flashlamp pumped, passively Q-switched) from ebay (see this album). Allegedly it outputs 1J pulses of 8ns duration; in practice, it may put out several 100mJ pulses ~ 16ns long while the flashlamp is firing. It was sold as a tattoo removal machine. However, I'm employing it to drill micro-vias in fine polyimide films.

When focused through a 10x objective via the camera mount of a Leica microscope, 532nm (KTP doubled, second harmonic generation (SHG)) laser pulses ablate the material but do not leave a clean, sharp hole: it looks more like 'blasting': the hole is ragged, more like a crater. This may be from excessive 1064nm heating (partial KTP conversion), or plasma/flame heating & expansion due to absorption of the 532nm / 1064nm light. It may also be due to excessive pulse duration (should the laser not actually be Q-switched... photodiode testing suggests otherwise, but I'd like to verify that), excessive pulse power, insufficient pulse intensity, or insufficient polyimide absorption at 532nm.

The solution to excessive plasma and insufficient polyimide absorption is to shift the wavelength to 355nm (NUV) via third harmonic generation, mixing 1064nm + 532nm to get 355nm (the frequencies add, not the wavelengths: 1/1064 + 1/532 ≈ 1/355). This requires sum frequency generation (SFG), for which LBO (lithium triborate) or BBO (beta-barium borate) seem the commonly accepted nonlinear optical materials.
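The wavelength bookkeeping for SHG and SFG is easy to check numerically: photon energies add, so the reciprocals of the wavelengths add. A small sketch (the function name is just for illustration):

```python
def sum_frequency_wavelength_nm(lambda1_nm, lambda2_nm):
    """Output wavelength when two photons combine into one (SHG / SFG)."""
    # Frequencies add, so 1/lambda_out = 1/lambda1 + 1/lambda2.
    return 1.0 / (1.0 / lambda1_nm + 1.0 / lambda2_nm)

# SHG: two 1064nm photons -> one green photon.
shg_nm = sum_frequency_wavelength_nm(1064.0, 1064.0)
# THG by SFG: one 1064nm photon + one 532nm photon -> one near-UV photon.
thg_nm = sum_frequency_wavelength_nm(1064.0, shg_nm)

print(f"SHG: {shg_nm:.1f} nm, THG: {thg_nm:.1f} nm")
```

The third harmonic comes out at 1064/3 ≈ 354.7nm, which is the line usually quoted as 355nm.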

To get SHG or THG, phase and polarization matching of the incoming light is critical. The output of the Nd:YAG laser is, I assume, non-polarized (or randomly polarized), as the KTP crystal simply screws on the front, and so should be rotationally agnostic (and there are no polarizing elements in the simple laser head -- unless the (presumed) Cr:YAG passive Q-switch induces some polarization.)

Output polarization of the KTP crystal will be perpendicular to the polarization of the incoming beam; if the resulting THG / SFG crystal needs Type-I phase matching (both in phase and parallel polarization), I will need a half-wave plate for 1064nm; for Type-II phase matching, no plate is needed. For noncritical phase matching in LBO (which I just bought), an oven is required to heat the crystal to the correct temperature.

This suggests 73C for THG, while this suggests 150C (for SHG?).

Third harmonic frequency generation by type-I critically phase-matched LiB3O5 crystal by means of optically active quartz crystal suggests most lasers operate in Type-I SHG and Type-II THG, but this is less efficient than dual Type-I; the quartz crystal is employed to rotate the polarizations to alignment. Both SHG and THG crystals are heated for optimum power output.

Finally, Short pulse duration of an extracavity sum-frequency mixing with an LiB3O5 (LBO) crystal suggests that no polarization change is required, nor oven control of the LBO temperature. Tight focus and high energy density are required, of course (at the expense of reduced crystal lifetime). Likely this is the Type-I, Type-II scheme alluded to in the paper above. I'll try this first before engaging further complexity (efficiency is not very important, as the holes are very small & material removal may be slow.)

ref: work-0 tags: Cohen Singer SLIPPER machine learning hypothesis generation date: 10-25-2009 18:42 gmt revision:2 [1] [0] [head]


  • "One disadvantage of boosting is that improvements in accuracy are often obtained at the expense of comprehensibility."
  • SLIPPER = simple learner with iterative pruning to produce error reduction.
  • Inner loop: the weak learner splits the training data, grows a single rule using one subset of the data, and then prunes the rule using the other subset.
  • They use a confidence-rated prediction based boosting algorithm, which allows the algorithm to abstain from examples not covered by the rule.
    • the sign of h(x) - the weak learner's hypothesis - is interpreted as the predicted label and the magnitude |h(x)| is the confidence in the prediction.
  • SLIPPER only handles two-class problems now, but can be extended.
  • Is better, though not dramatically so, than C5rules (a commercial version of Quinlan's decision tree algorithms).
  • see also the excellent overview at http://www.cs.princeton.edu/~schapire/uncompress-papers.cgi/msri.ps
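A rough sketch of how confidence-rated rules with abstention combine, as described above. The example rules, the feature names, and the helper names are hypothetical; the smoothed log-ratio confidence is the usual form from confidence-rated boosting and is an assumption here, not a transcription of the paper's pseudocode.

```python
import math

def rule_confidence(w_plus, w_minus, n):
    """Confidence from the weight of positive / negative examples a rule covers."""
    eps = 1.0 / (2 * n)  # smoothing so a pure rule does not get infinite confidence
    return 0.5 * math.log((w_plus + eps) / (w_minus + eps))

def make_rule(condition, confidence):
    """h(x) = confidence if the rule covers x, else 0 (the rule abstains)."""
    def h(x):
        return confidence if condition(x) else 0.0
    return h

def predict(rules, x):
    """Sum the outputs of all rules; sign is the label, magnitude the confidence."""
    score = sum(h(x) for h in rules)
    label = 1 if score > 0 else -1  # ties (every rule abstains) default to -1 here
    return label, abs(score)

# Hypothetical rule set over dict-valued examples.
rules = [
    make_rule(lambda x: x["a"] > 5, 1.2),
    make_rule(lambda x: x["b"] == "y", -0.4),
]
label, confidence = predict(rules, {"a": 7, "b": "y"})
```

Because an abstaining rule contributes exactly zero, each rule can be read on its own, which is where the comprehensibility comes from.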