ref: -0 tags: NET probes SU-8 microfabrication sewing machine carbon fiber electrode insertion mice histology 2p date: 03-01-2017 23:20 gmt revision:0

Ultraflexible nanoelectronic probes form reliable, glial scar–free neural integration

  • SU-8 asymptotic H2O absorption is 3.3% in PBS -- quite a bit higher than I expected, and higher than PI.
  • Faced yield problems with contact litho at 2-3um trace/space.
  • Good recordings out to 4 months!
  • 3 minutes / probe insertion.
  • Fab:
    • Ni release layer, SU-8 2000.5. "excellent tensile strength" --
      • Tensile strength 60 MPa
      • Youngs modulus 2.0 GPa
      • Elongation at break 6.5%
      • Water absorption, per spec sheet, 0.65% (but not PBS)
    • 500nm dielectric; < 1% crosstalk; see figure S12.
    • Pt or Au rec sites, 10um x 20um or 30 x 30um.
    • FFC connector, with Si substrate remaining.
  • Used transgenic mice, YFP expressed in neurons.
  • CA glue used before metabond, followed by Kwik-sil silicone.
  • Neuron yield not so great -- they need to plate the electrodes down to acceptable impedance. (figure S5)
    • Measured impedance ~1 MΩ at 1 kHz.
  • Unclear if 50um x 1um is really that much worse than 10um x 1.5um.
  • Histology looks really great (figure S10).
  • The manuscript did not mention (though they did at the poster) problems with electrode pull-out; they deal with it in the same way, by application of ACSF.

ref: -0 tags: image registration optimization camera calibration sewing machine date: 07-15-2016 05:04 gmt revision:20

Recently I was tasked with converting from image coordinates to real world coordinates from stereoscopic cameras mounted to the end-effector of a robot. The end goal was to let the user (me!) click on points in the image, and have the robot record that position & ultimately move to it.

The overall strategy is to get a set of points in both image and RW coordinates, then fit some sort of model to the measured data. I began by printing out a grid of (hopefully evenly-spaced and perpendicular) lines via a laser printer; spacing was ~1.1 mm. This grid was manually aligned to the axes of robot motion by moving the robot along one axis & checking that the lines did not jog.

The images were modeled as a grating with quadratic phase in u,v texture coordinates:

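$$p_h(u,v) = \sin\!\left(\left(\frac{a_h u}{1000} + \frac{b_h v}{1000} + c_h\right) v + d_h u + e_h v + f_h\right) + 0.97 \tag{1}$$

$$p_v(u,v) = \sin\!\left(\left(\frac{a_v u}{1000} + \frac{b_v v}{1000} + c_v\right) u + d_v u + e_v v + f_v\right) + 0.97 \tag{2}$$

$$I(u,v) = \frac{16\, p_h p_v}{2 + 16 p_h^2 + 16 p_v^2} \tag{3}$$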

The factor of 1000 was used to make the parameter-search distribution more spherical; $c_h, c_v$ were bias terms used to seed the solver; 0.97 was a duty-cycle term fit by inspection to the image data; and (3) is a modified sigmoid.
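A minimal Python sketch of the forward model (1)-(3), assuming each grating's parameters are packed into a hypothetical `Grating` tuple. It evaluates one pixel at a time on the CPU rather than on a GPU, and follows (3) exactly as written above.

```python
import math
from collections import namedtuple

# Hypothetical container for one grating's parameters (a, b, c, d, e, f),
# matching the symbols in (1) and (2).
Grating = namedtuple("Grating", "a b c d e f")

DUTY = 0.97  # duty-cycle offset, fit by inspection in the text


def p_h(u, v, g):
    """Horizontal-line grating, eq. (1): the quadratic phase term multiplies v."""
    phase = (g.a * u / 1000 + g.b * v / 1000 + g.c) * v + g.d * u + g.e * v + g.f
    return math.sin(phase) + DUTY


def p_v(u, v, g):
    """Vertical-line grating, eq. (2): the quadratic phase term multiplies u."""
    phase = (g.a * u / 1000 + g.b * v / 1000 + g.c) * u + g.d * u + g.e * v + g.f
    return math.sin(phase) + DUTY


def intensity(u, v, gh, gv):
    """Predicted image intensity, eq. (3): a bounded, sigmoid-like product."""
    ph, pv = p_h(u, v, gh), p_v(u, v, gv)
    return 16 * ph * pv / (2 + 16 * ph ** 2 + 16 * pv ** 2)
```

Because $16|p_h p_v| \le 8(p_h^2 + p_v^2)$, the model intensity is always bounded by $|I| \le 1/2$.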

The model $I$ was then fit to each camera image $\mathrm{Img}$ by optimizing over the parameters with a GPU-accelerated (CUDA) nonlinear stochastic optimization:

$$(a_h, b_h, d_h, e_h, f_h, a_v, b_v, d_v, e_v, f_v) = \operatorname*{arg\,min} \sum_u \sum_v \left(I(u,v) - \mathrm{Img}(u,v)\right)^2 \tag{4}$$

Optimization was carried out by drawing parameters from a normal distribution with a diagonal covariance matrix, set by inspection, and mean iteratively set to the best solution; horizontal and vertical optimization steps were separable and carried out independently. Equation (4) was evaluated 18k times per frame, and equation (3) 34 billion times; hence the need for GPU acceleration.
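A minimal sketch of the random-search loop described above, in plain Python: candidates are drawn from a normal distribution with diagonal covariance centered on the best solution found so far, and a candidate replaces the mean only if it lowers the loss. The toy loss (fitting the frequency and phase of a 1-D sine), the seed, and the step sizes are illustrative assumptions, not the values used for the camera images.

```python
import math
import random


def stochastic_search(loss, mean, sigma, n_samples, seed=0):
    """Draw candidates from N(best, diag(sigma^2)); keep one only if it lowers the loss."""
    rng = random.Random(seed)  # seeded for reproducibility (an assumption)
    best = list(mean)
    best_loss = loss(best)
    for _ in range(n_samples):
        cand = [rng.gauss(m, s) for m, s in zip(best, sigma)]
        cand_loss = loss(cand)
        if cand_loss < best_loss:  # the distribution mean moves to the best solution
            best, best_loss = cand, cand_loss
    return best, best_loss


# Toy 1-D stand-in for eq. (4): recover frequency and phase of a sine wave.
xs = [0.5 * i for i in range(200)]
target = [math.sin(0.3 * x + 1.0) for x in xs]


def sse(params):
    """Sum of squared errors between the sine model and the toy target."""
    freq, phase = params
    return sum((math.sin(freq * x + phase) - t) ** 2 for x, t in zip(xs, target))


start = [0.29, 0.6]
best, best_loss = stochastic_search(sse, start, sigma=[0.002, 0.05], n_samples=2000)
```

Since each candidate is scored independently, the inner loop is what a GPU parallelizes; the serial Python version only shows the control flow.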

This yielded a set of 10 parameters (again, $c_h$ and $c_v$ were bias terms and kept constant) that modeled the data (i.e. the grid lines) for each of the two cameras. This process was repeated every 0.1 mm from 0 to 20 mm height ($z$) above the target grid, resulting in a sampled function for each of the parameters, e.g. $a_h(z)$. This required 13 trillion evaluations of equation (3).

Now, the task was to use this model to generate the forward and reverse transforms between image and world coordinates; I approached this by generating a data set of the grid intersections in both image and world coordinates. To start this process, the known image origin $(u_{origin}^{z=0}, v_{origin}^{z=0})$ was used to find the corresponding roots of the periodic auxiliary functions $p_h, p_v$:

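$$\frac{3\pi}{2} + 2\pi n_h = \frac{a_h u v}{1000} + \frac{b_h v^2}{1000} + (c_h + e_h) v + d_h u + f_h \tag{5}$$

$$\frac{3\pi}{2} + 2\pi n_v = \frac{a_v u^2}{1000} + \frac{b_v u v}{1000} + (c_v + d_v) u + e_v v + f_v \tag{6}$$

Or, solving for the fringe indices:

$$n_h = \mathrm{round}\!\left(\frac{a_h u v/1000 + b_h v^2/1000 + (c_h + e_h) v + d_h u + f_h - 3\pi/2}{2\pi}\right) \tag{7}$$

$$n_v = \mathrm{round}\!\left(\frac{a_v u^2/1000 + b_v u v/1000 + (c_v + d_v) u + e_v v + f_v - 3\pi/2}{2\pi}\right) \tag{8}$$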

From this, we get $n_{h,origin}^{z=0}$ and $n_{v,origin}^{z=0}$, which are the offsets that align the sine functions $p_h, p_v$ with the physical origin. Next, the reverse (world to image) transform was needed, for which a two-stage Newton scheme was used to solve the phase equations (5) and (6), i.e. (7) and (8) before rounding, for $u, v$. Note that this is an equation of phase, not image intensity; otherwise this direct method would not work!
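A sketch of (5)-(8) in Python, assuming a hypothetical `Grating` tuple for the parameters: `phase_h` and `phase_v` are the expanded phases on the right-hand sides of (5) and (6), and `fringe_orders` rounds them to the nearest fringe index as in (7) and (8). Evaluated at the image origin, it would give the offsets $n_{h,origin}, n_{v,origin}$.

```python
import math
from collections import namedtuple

Grating = namedtuple("Grating", "a b c d e f")  # hypothetical parameter container


def phase_h(u, v, g):
    """Expanded phase of p_h: right-hand side of (5)."""
    return g.a * u * v / 1000 + g.b * v * v / 1000 + (g.c + g.e) * v + g.d * u + g.f


def phase_v(u, v, g):
    """Expanded phase of p_v: right-hand side of (6)."""
    return g.a * u * u / 1000 + g.b * u * v / 1000 + (g.c + g.d) * u + g.e * v + g.f


def fringe_orders(u, v, gh, gv):
    """Nearest fringe indices (n_h, n_v) at pixel (u, v), eqs. (7) and (8).
    Roots are taken where sin(phase) = -1, i.e. phase = 3*pi/2 + 2*pi*n."""
    n_h = round((phase_h(u, v, gh) - 1.5 * math.pi) / (2 * math.pi))
    n_v = round((phase_v(u, v, gv) - 1.5 * math.pi) / (2 * math.pi))
    return n_h, n_v
```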

First, the equations were linearized with three steps of (9-11) to get in the right ballpark:

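$$u_0 = 640, \quad v_0 = 360$$

$$n_h = n_{h,origin}^{z} + \{-30, \dots, 30\}, \quad n_v = n_{v,origin}^{z} + \{-20, \dots, 20\} \tag{9}$$

$$B_i = \begin{bmatrix} 3\pi/2 + 2\pi n_h - a_h u_i v_i/1000 - b_h v_i^2/1000 - f_h \\ 3\pi/2 + 2\pi n_v - a_v u_i^2/1000 - b_v u_i v_i/1000 - f_v \end{bmatrix} \tag{10}$$

$$A_i = \begin{bmatrix} d_h & c_h + e_h \\ c_v + d_v & e_v \end{bmatrix}$$ and

$$\begin{bmatrix} u_{i+1} \\ v_{i+1} \end{bmatrix} = \mathrm{mldivide}(A_i, B_i) \tag{11}$$ where mldivide is the Matlab operator ($A_i \backslash B_i$).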

Then three steps with the full Jacobian were made to attain accuracy:

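$$J_i = \begin{bmatrix} a_h v_i/1000 + d_h & a_h u_i/1000 + 2 b_h v_i/1000 + c_h + e_h \\ 2 a_v u_i/1000 + b_v v_i/1000 + c_v + d_v & b_v u_i/1000 + e_v \end{bmatrix} \tag{12}$$

$$K_i = \begin{bmatrix} a_h u_i v_i/1000 + b_h v_i^2/1000 + (c_h + e_h) v_i + d_h u_i + f_h - 3\pi/2 - 2\pi n_h \\ a_v u_i^2/1000 + b_v u_i v_i/1000 + (c_v + d_v) u_i + e_v v_i + f_v - 3\pi/2 - 2\pi n_v \end{bmatrix} \tag{13}$$

$$\begin{bmatrix} u_{i+1} \\ v_{i+1} \end{bmatrix} = \begin{bmatrix} u_i \\ v_i \end{bmatrix} - J_i^{-1} K_i \tag{14}$$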
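The two-stage solve of (10)-(14), plus the consistency check used to accept a solution, can be sketched in Python as follows. The `Grating` container, the Cramer's-rule 2x2 solve standing in for Matlab's mldivide, and the example parameters in the usage are assumptions for illustration only.

```python
import math
from collections import namedtuple

Grating = namedtuple("Grating", "a b c d e f")  # hypothetical parameter container


def residual(u, v, gh, gv, n_h, n_v):
    """K in (13): the phase equations (5)/(6) moved to one side."""
    k1 = (gh.a * u * v / 1000 + gh.b * v * v / 1000 + (gh.c + gh.e) * v
          + gh.d * u + gh.f - 1.5 * math.pi - 2 * math.pi * n_h)
    k2 = (gv.a * u * u / 1000 + gv.b * u * v / 1000 + (gv.c + gv.d) * u
          + gv.e * v + gv.f - 1.5 * math.pi - 2 * math.pi * n_v)
    return k1, k2


def solve2(m, r):
    """Solve the 2x2 system m x = r by Cramer's rule (stand-in for mldivide)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return (r[0] * d - b * r[1]) / det, (a * r[1] - c * r[0]) / det


def intersection_uv(gh, gv, n_h, n_v, u0=640.0, v0=360.0, lin_steps=3, newton_steps=3):
    """Image coordinates of grid intersection (n_h, n_v): linearized steps
    (10)-(11), then full-Jacobian Newton steps (12)-(14)."""
    u, v = u0, v0
    a_mat = ((gh.d, gh.c + gh.e), (gv.c + gv.d, gv.e))  # A_i, constant in this model
    for _ in range(lin_steps):
        # Quadratic terms are frozen at the current estimate and moved to B_i.
        b_vec = (1.5 * math.pi + 2 * math.pi * n_h - gh.a * u * v / 1000 - gh.b * v * v / 1000 - gh.f,
                 1.5 * math.pi + 2 * math.pi * n_v - gv.a * u * u / 1000 - gv.b * u * v / 1000 - gv.f)
        u, v = solve2(a_mat, b_vec)
    for _ in range(newton_steps):
        jac = ((gh.a * v / 1000 + gh.d, gh.a * u / 1000 + 2 * gh.b * v / 1000 + gh.c + gh.e),
               (2 * gv.a * u / 1000 + gv.b * v / 1000 + gv.c + gv.d, gv.b * u / 1000 + gv.e))
        du, dv = solve2(jac, residual(u, v, gh, gv, n_h, n_v))
        u, v = u - du, v - dv
    return u, v


def accept(u, v, gh, gv, n_h, n_v, width=1280, height=720):
    """Keep a solution only if rounding as in (7)/(8) gives back (n_h, n_v)
    and the point lies inside the image."""
    k1, k2 = residual(u, v, gh, gv, n_h, n_v)
    same_orders = round(k1 / (2 * math.pi)) == 0 and round(k2 / (2 * math.pi)) == 0
    return same_orders and 0 <= u < width and 0 <= v < height


# Usage with made-up parameters: horizontal lines vary mainly with v, vertical with u.
gh = Grating(0.01, 0.005, 0.0, 0.001, 0.1, 0.3)
gv = Grating(0.01, 0.003, 0.0, 0.1, 0.002, 0.5)
u, v = intersection_uv(gh, gv, 5, 10)
```

The linearized steps only need to land in the basin of the full Newton iteration; the Newton steps then converge quadratically.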

Solutions $(u, v)$ were verified by plugging them back into equations (7) and (8) and checking that $n_h, n_v$ were unchanged. Inconsistent solutions were discarded, as were solutions outside the image space $[0,1280) \times [0,720)$. The process (10)-(14) was repeated over the fringe indices in (9) to tile the image space with grid intersections, and this was repeated for all $z$ in $0, 0.1, \dots, 20$, resulting in a large (74k points) dataset of $(u, v, n_h, n_v, z)$, which was converted to full real-world coordinates $(u, v, x, y, z)$ based on the measured spacing of the grid lines. Between individual $z$ steps, $n_{h,origin}$ and $n_{v,origin}$ were re-estimated to minimize (for the current $z$):

$$\left(u_{origin}^{z+0.1} - u_{origin}^{z}\right)^2 + \left(v_{origin}^{z+0.1} - v_{origin}^{z}\right)^2 \tag{15}$$

with grid search and the method of equations (9)-(14). This was required because the stochastic method used to find the original image-model parameters was agnostic to phase, so the phase (via parameters $f_h, f_v$) could jump between individual $z$ measurements; since the origin did not move much between successive measurements, (15) fixed the jumps.
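A sketch of the origin re-estimation as a grid search over integer fringe offsets that minimizes (15). `solve_uv` is a hypothetical callable (for instance the Newton solve of (9)-(14) with the parameters at the new $z$, returning `None` for rejected solutions), and the search window of ±3 fringes is an assumption.

```python
from itertools import product


def reestimate_origin(solve_uv, prev_origin, n_h_guess, n_v_guess, search=3):
    """Pick the integer fringe offsets that keep the origin closest to its
    position in the previous z slice, i.e. minimize (15) by grid search.

    solve_uv(n_h, n_v) -> (u, v) or None is a hypothetical callable."""
    u_prev, v_prev = prev_origin
    best = None
    for dh, dv in product(range(-search, search + 1), repeat=2):
        n_h, n_v = n_h_guess + dh, n_v_guess + dv
        uv = solve_uv(n_h, n_v)
        if uv is None:  # rejected solution (inconsistent or outside the image)
            continue
        cost = (uv[0] - u_prev) ** 2 + (uv[1] - v_prev) ** 2
        if best is None or cost < best[0]:
            best = (cost, n_h, n_v, uv)
    if best is None:
        raise ValueError("no valid origin candidate in the search window")
    return best[1], best[2], best[3]
```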

To this dataset, a model was fit:

$$\begin{bmatrix} u \\ v \end{bmatrix} = A \begin{bmatrix} 1 & x & y & z & x^2 & y^2 & z^2 & w^2 & xy & xz & yz & xw & yw & zw \end{bmatrix}^T \tag{16}$$

where $x \leftarrow x/10$, $y \leftarrow y/10$, $z \leftarrow z/10$, and $w = 20/(20 - z)$. $w$ was introduced as an auxiliary variable to assist in perspective mapping, à la computer graphics. Likewise, $x, y, z$ were scaled so the quadratic nonlinearity better matched the data.

The model (16) was fit using ordinary linear regression over all rows of the validated dataset. This resulted in a second set of coefficients $A$ for a model mapping world coordinates to image coordinates; again, the model was inverted using Newton's method (Jacobian omitted here!). These coefficients, one set per camera, were then integrated into the C++ program for displaying video, and the inverse mapping (using closed-form matrix inversion) was used to convert mouse clicks to real-world coordinates for robot motor control. Even with the relatively poor wide-FOV cameras employed, the method is accurate to ±50 μm and precise to ±120 μm.
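A pure-Python sketch of fitting (16) by ordinary least squares, solving the normal equations with Gaussian elimination. It assumes $w = 20/(20 - z)$ uses the unscaled $z$ in mm, so heights must stay below 20 mm; the coefficients and grid in the usage are made up, not measured data.

```python
def features(x, y, z):
    """Regressors of (16); assumes w uses the unscaled z (mm), so z < 20."""
    xs, ys, zs = x / 10, y / 10, z / 10
    w = 20 / (20 - z)
    return [1.0, xs, ys, zs, xs * xs, ys * ys, zs * zs, w * w,
            xs * ys, xs * zs, ys * zs, xs * w, ys * w, zs * w]


def solve_linear(m, rhs):
    """Gaussian elimination with partial pivoting; m is n x n."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def fit_model(samples):
    """Least-squares fit of (16). samples: list of (u, v, x, y, z).
    Returns the two rows of A: coefficients for u and for v."""
    rows = [features(x, y, z) for _, _, x, y, z in samples]
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    coeffs = []
    for t in (0, 1):  # t = 0 fits u, t = 1 fits v
        rhs = [sum(r[i] * s[t] for r, s in zip(rows, samples)) for i in range(k)]
        coeffs.append(solve_linear(gram, rhs))
    return coeffs


def predict(coeffs, x, y, z):
    """World -> image mapping (u, v) using the fitted rows of A."""
    f = features(x, y, z)
    return tuple(sum(c * fi for c, fi in zip(row, f)) for row in coeffs)
```

For large datasets a QR- or SVD-based solver is numerically safer than the normal equations; they are used here only to keep the sketch dependency-free.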

ref: Akin-1995.06 tags: Najafi neural recording technology micromachined digital TETS 1995 PNS schematics date: 01-01-2012 20:23 gmt revision:8

IEEE-717081 (pdf) An Implantable Multichannel Digital neural recording system for a micromachined sieve electrode

  • Later pub: IEEE-654942 (pdf) -- apparently putting on-chip isolated diodes is a difficult task.
  • 90 mW of power @ 5 V, 4x4 mm of area (!!)
  • targeted for regenerated peripheral neurons grown through a micromachined silicon sieve electrode.
    • PNS nerves are deliberately severed and allowed to regrow through the sieve.
  • 8-bit low-power current-mode ADC. Seems like a clever design to me, though I can't really follow the operation from the description written there.
  • Class-E transmitter amplifier.
  • 3 um BiCMOS process (you get vertical BJTs and Zener diodes).
  • has excellent schematics, including the voltage regulator, envelope detector & ADC.
  • most of the power is dissipated in the voltage regulator (!!): 80 mW of 90 mW.
  • tiny!
  • rather than using pseudoresistors, they use diode-capacitor input filter which avoids the need for chopping or off-chip hybrid components.
  • can record from any two of 32 input channels. I think the multiplexer is after the preamp - right?


Akin, T., Najafi, K., and Bradley, R.M. Solid-State Sensors and Actuators, 1995 and Eurosensors IX. Transducers '95. The 8th International Conference on, vol. 1, pp. 51-54 (1995)

ref: bookmark-0 tags: machine_learning research_blog parallel_computing bayes active_learning information_theory reinforcement_learning date: 12-31-2011 19:30 gmt revision:3

hunch.net interesting posts:

  • debugging your brain: how to discover what you don't understand. A very intelligent viewpoint, worth rereading along with the comments. Look at the data, stupid.
    • quote: how to represent the problem is perhaps even more important in research since human brains are not as adept as computers at shifting and using representations. Significant initial thought on how to represent a research problem is helpful. And when it’s not going well, changing representations can make a problem radically simpler.
  • automated labeling - great way to use a human 'oracle' to bootstrap us into good performance, esp. if the predictor can output a certainty value and hence ask the oracle all the 'tricky questions'.
  • The design of an optimal research environment
    • Quote: Machine learning is a victim of it’s common success. It’s hard to develop a learning algorithm which is substantially better than others. This means that anyone wanting to implement spam filtering can do so. Patents are useless here—you can’t patent an entire field (and even if you could it wouldn’t work).
  • More recently: http://hunch.net/?p=2016
    • Problem is that online courses only imperfectly emulate the social environment of a college, which IMHO is useful for cultivating diligence.
  • The unrealized potential of the research lab Quote: Muthu Muthukrishnan says “it’s the incentives”. In particular, people who invent something within a research lab have little personal incentive in seeing it’s potential realized so they fail to pursue it as vigorously as they might in a startup setting.
    • The motivation (money!) is just not there.

ref: Maass-2002.11 tags: Maass liquid state machine expansion LSM Markram computation cognition date: 12-06-2011 07:17 gmt revision:2

PMID-12433288[0] Real-time computing without stable states: a new framework for neural computation based on perturbations.

  • It is shown that the inherent transient dynamics of the high-dimensional dynamical system formed by a sufficiently large and heterogeneous neural circuit may serve as universal analog fading memory. Readout neurons can learn to extract in real time from the current state of such recurrent neural circuit information about current and past inputs that may be needed for diverse tasks.
    • Stable states (e.g. Turing machines and attractor-based networks) are not required!
    • How does this compare to Shenoy's result that neuronal dynamics converge to a 'stable' point just before movement?


[0] Maass W, Natschläger T, Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput 14(11):2531-60 (2002 Nov)

ref: notes-0 tags: data effectiveness Norvig google statistics machine learning date: 12-06-2011 07:15 gmt revision:1

The unreasonable effectiveness of data.

  • counterpoint to Eugene Wigner's "The Unreasonable effectiveness of mathematics in the natural sciences"
    • that is, sciences that involve people (e.g. language) have proven far more resistant to elegant mathematics than physics has.
    • we should not look for elegant theories, rather embrace complexity and make use of extensive data. (google's mantra!!)
  • in 2006 Google released a trillion-word corpus with frequency counts for all word sequences up to 5 words long.
  • document translation and voice transcription are successful mostly because people need the services - there is demand.
    • Traditional natural language processing does not have such demand as of yet. Furthermore, it has required human-annotated data, which is expensive to produce.
  • simple models and a lot of data triumph over more elaborate models based on less data.
    • for translation and any other application of ML to web data, n-gram models or linear classifiers work better than elaborate models that try to discover general rules.
  • much web data consists of individually rare but collectively frequent events.
  • because of a huge shared cognitive and cultural context, linguistic expression can be highly ambiguous and still often be understood correctly.
  • they mention Project Halo: $10,000 per page of a chemistry textbook (funded by DARPA).
  • they ultimately suggest that there is so so much to explore now: just use unlabeled data with an unsupervised learning algorithm.

ref: -0 tags: machine learning CMU slides tutorial date: 01-17-2011 05:05 gmt revision:2

http://www.autonlab.org/tutorials/ -- excellent

http://energyfirefox.blogspot.com/2010/12/data-mining-with-ubuntu.html -- apt-get!


ref: -0 tags: artificial intelligence machine learning education john toobey leda cosmides date: 12-13-2010 03:43 gmt revision:3

Notes & responses to the essay by evolutionary psychologists John Tooby and Leda Cosmides (authors of The Adapted Mind) in This Will Change Everything

  • quote: Currently the most keenly awaited technological development is an all-purpose artificial intelligence-perhaps even an intelligence that would revise itself and grow at an ever-accelerating rate until it enacts millennial transformations. [...] Yet somehow this goal, like the horizon, keeps retreating as fast as it is approached.
  • AI's wrong turn was assuming that the best methods for reasoning and thinking are those that can be applied successfully to any problem domain.
    • But of course it must be possible - we are here, and we did evolve!
    • My opinion: the limit is codifying abstract, assumed, and ambiguous information into program function - e.g. embodying the world.
  • Their idea: intelligences use a number of domain-specific, specialized "hacks", that work for limited tasks; general intelligence appears as a result of the combination of all of these.
    • "Our mental programs can be fiendishly well engineered to solve some problems because they are not limited to using only those strategies that can be applied to all problems."
    • Given the content of the wikipedia page (above), it seems that they have latched onto this particular idea for at least 18 years. Strange how these sorts of things work.
  • Having accurate models of human intelligence would achieve two things:
    • It would enable humans to communicate more effectively with machines via shared knowledge and reasoning.
    • (me:) The AI would be enhanced by the tricks and hacks that evolution took millions of years, billions of individuals, and 10e?? (non-discrete) interactions between individuals and the environment to discover. This constitutes an enormous store of information; overlooking it necessitates (probably; there may be serious shortcuts to biological evolution) re-simulating all of the steps that it took to get here. We exist as a cached output of the evolutionary algorithm; recomputing this particular function is energetically impossible.
  • "The long term ambition [of evolutionary psychology] is to develop a model of human nature as precise as if we had the engineering specifications for the control systems of a robot."
  • "Humanity will continue to be blind slaves to the programs evolution has built into our brains until we drag them into the light. Ordinarily, we inhabit only the versions of reality that they spontaneously construct for us -- the surfaces of things. Because we are unaware that we are in a theater, with our roles and our lines largely written for us by our mental programs, we are credulously swept up in these plays (such as the genocidal drama of us versus them). Endless chain reactions among these programs leave us the victims of history -- embedded in war and oppression, enveloped in mass delusions and cultural epidemics, mired in endless negative-sum conflict. If we understood these programs and the coordinated hallucinations they orchestrate in our minds, our species could awaken from the roles these programs assign to us. Yet this cannot happen if knowledge -- like quantum mechanics -- remains forever locked up in the minds of a few specialists, walled off by the years of study required to master it." Exactly. Well said.
    • The solution, then: much much better education; education that utilizes the best knowledge about transferring knowledge.
    • The authors propose video games; this is already being tested, see {859}

ref: work-0 tags: metacognition AI bootstrap machine learning Pitrat self-debugging date: 08-07-2010 04:36 gmt revision:7

Jacques Pitrat seems to have many of the same ideas that I've had (only better, and he's implemented them!)--

A Step toward an Artificial Scientist

  • The overall structure seems good - difficult problems are attacked by 4 different levels. The first level tries to solve the problem semi-directly, by writing a program to solve combinatorial problems (all problems here are constraint based; constraints are used to pare the tree of possible solutions; these trees are tested combinatorially); the second level monitors lower-level performance and decides which hypotheses to test (which branch to pursue on the tree) and/or which rules to apply to the tree; the third level directs the second level and restarts the whole process if a snag or inconsistency is found; the fourth level gauges the interest of a given problem and looks for new problems to solve within a family so as to improve the skill of the 3 lower levels.
    • This makes sense, but why 4? Seems like in humans we only need 2 - the actor and the critic, bootstrapping forever.
    • Also includes a "Zeus" module that periodically checks for infinite loops of the other programs, and recompiles with trace instructions if an infinite loop is found within a subroutine.
  • Author claims that the system is highly efficient - it codes constraints and expert knowledge using a higher level language/syntax that is then converted to hundreds of thousands of lines of C code. The active search program runs runtime-generated C programs to evaluate and find solutions, wow!
  • This must have taken a decade or more to create! Very impressive. (seems it took 2 decades, at least according to http://tunes.org/wiki/jacques_20pitrat.html)
    • Despite all this work, he is not nearly done: it has no "learning" module.
    • Quote: In this paper, I do not describe some parts of the system which still need to be developed. For instance, the system performs experiments, analyzes them and finds surprising results; from these results, it is possible to learn some improvements, but the learning module, which would be able to find them, is not yet written. In that case, only a part of the system has been implemented: on how to find interesting data, but still not on how to use them.
  • Only seems to deal with symbolic problems - e.g. magic squares, magic cubes, self-referential integer series. Alas, no statistical problems.
  • The whole CAIA system can effectively be used as a tool for finding problems of arbitrary difficulty with arbitrary number of solutions from a set of problem families or meta-families.
  • Has hypothesis based testing and backtracking; does not have problem reformulation or re-projection.
  • There is mention of ALICE, which is not the chatbot A.L.I.C.E but a constraint-satisfaction AI program from the '70s.
  • Has a C source version of MALICE (his version of ALICE) available on the website. Amazingly, there is no Makefile - just gcc *.c -rdynamic -ldl -o malice.
  • See also his 1995 Paper: AI Systems Are Dumb Because AI Researchers Are Too Clever images/815_1.pdf

Artificial beings - his book.

ref: work-0 tags: machine learning manifold detection subspace segregation linearization spectral clustering date: 10-29-2009 05:16 gmt revision:5

An interesting field in ML is nonlinear dimensionality reduction - data may appear to be in a high-dimensional space, but mostly lies along a nonlinear lower-dimensional subspace or manifold. (Linear subspaces are easily discovered with PCA or SVD(*)). Dimensionality reduction projects high-dimensional data into a low-dimensional space with minimum information loss -> maximal reconstruction accuracy; nonlinear dim reduction does this (surprise!) using nonlinear mappings. These techniques set out to find the manifold(s):

  • Spectral Clustering
  • Locally Linear Embedding
    • related: The manifold ways of perception
      • Would be interesting to run nonlinear dimensionality reduction algorithms on our data! What sort of space does the motor system inhabit? Would it help with prediction? Am quite sure people have looked at Kohonen maps for this purpose.
    • Random irrelevant thought: I haven't been watching TV lately, but when I do, I find it difficult to recognize otherwise recognizable actors. In real life, I find no difficulty recognizing people, even some whom I don't know personally - is this a data thing (little training data), or mapping thing (not enough time training my TV-not-eyes facial recognition).
  • A Global Geometric Framework for Nonlinear Dimensionality Reduction method:
    • map the points into a graph by connecting each point with a certain number of its neighbors or all neighbors within a certain radius.
    • estimate geodesic distances between all points in the graph by finding the shortest graph connection distance
    • use MDS (multidimensional scaling) to embed the original data into a lower-dimensional Euclidean space while preserving as much of the original geometry as possible.
      • Doesn't look like a terribly fast algorithm!
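A small pure-Python sketch of the three Isomap steps listed above (radius-neighbor graph, Floyd-Warshall shortest paths, classical MDS), embedding into one dimension with power iteration to stay dependency-free. The 1-D target and the fixed start vector are simplifying assumptions; real use would keep the top few eigenvectors. The O(n³) shortest-path step is also why the method is not fast.

```python
import math


def isomap_1d(points, radius, iters=100):
    """Isomap sketch: radius graph -> geodesic distances -> classical MDS (1-D)."""
    n = len(points)
    inf = math.inf
    # 1. Neighborhood graph: connect every pair of points within `radius`.
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(points[i], points[j])
            if dist <= radius:
                d[i][j] = d[j][i] = dist
    # 2. Geodesic distances = shortest paths through the graph (Floyd-Warshall).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    if any(x == inf for row in d for x in row):
        raise ValueError("neighborhood graph is disconnected; increase radius")
    # 3. Classical MDS: double-center the squared distances, B = -1/2 J D^2 J.
    sq = [[x * x for x in row] for row in d]
    mean = [sum(row) / n for row in sq]  # row means equal column means (symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)] for i in range(n)]
    # Dominant eigenvector of B by power iteration (deterministic start vector).
    vec = [float(i) for i in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        vec = [x / norm for x in w]
    lam = sum(vec[i] * sum(b[i][j] * vec[j] for j in range(n)) for i in range(n))
    return [math.sqrt(max(lam, 0.0)) * x for x in vec]
```

For points along a semicircle, the embedding should "unroll" the arc: neighbors end up evenly spaced by the chord length, in order.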

(*) SVD maps into 'concept space', an interesting interpretation as per Leskovec's lecture presentation.

ref: work-0 tags: machine learning reinforcement genetic algorithms date: 10-26-2009 04:49 gmt revision:1

I just had dinner with Jesse, and we had a good/productive discussion/brainstorm about algorithms, learning, and neurobio. Two things worth repeating, one simpler than the other:

1. Gradient descent / Newton-Raphson-like techniques should be tried with genetic algorithms. As of my current understanding, genetic algorithms perform a semi-directed search, randomly exploring the space of solutions with natural selection exerting a pressure to improve. What if you took the partial derivative of fitness with respect to each of the organism's genes, and used that to direct mutation, rather than randomly selecting the mutated element? What if you looked before mating and crossover? Seems like this would speed up the algorithm greatly (though it might get it stuck in local minima, too). Not sure if this has been done before - if it has, edit this to indicate where!
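One hypothetical reading of idea 1, sketched in Python: a genetic algorithm whose mutation step follows the negative gradient of the loss (plus a little Gaussian noise) instead of perturbing a randomly chosen gene. Population size, learning rate, noise level, and the quadratic test loss are all assumptions, not a tested recipe.

```python
import random


def gradient_guided_ga(loss, grad, dim, pop_size=20, generations=30, lr=0.1, noise=0.05, seed=0):
    """GA where mutation is a gradient step plus small Gaussian noise (hypothetical variant)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    history = [min(loss(p) for p in pop)]
    for _ in range(generations):
        pop.sort(key=loss)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            g = grad(child)
            # Gradient-directed mutation: move every gene downhill, then jitter.
            child = [c - lr * gi + rng.gauss(0.0, noise) for c, gi in zip(child, g)]
            children.append(child)
        pop = parents + children
        history.append(min(loss(p) for p in pop))
    return min(pop, key=loss), history
```

Keeping the parents guarantees the best loss never gets worse; the noise term is what keeps some of the GA's exploration, which matters given the local-minima concern above.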

2. Most supervised machine learning algorithms seem to rely on one single, externally applied objective function which they then attempt to optimize. (Rather this is what convex programming is. Unsupervised learning of course exists, like PCA, ICA, and other means of learning correlative structure) There are a great many ways to do optimization, but all are exactly that - optimization, search through a space for some set of weights / set of rules / decision tree that maximizes or minimizes an objective function. What Jesse and I have arrived at is that there is no real utility function in the world, (Corollary #1: life is not an optimization problem (**)) -- we generate these utility functions, just as we generate our own behavior. What would happen if an algorithm iteratively estimated, checked, cross-validated its utility function based on the small rewards actually found in the world / its synthetic environment? Would we get generative behavior greater than the complexity of the inputs? (Jesse and I also had an in-depth talk about information generation / destruction in non-linear systems.)

Put another way, perhaps part of learning is to structure internal valuation / utility functions to set up reinforcement learning problems where the reinforcement signal comes according to satisfaction of sub-goals (= local utility functions). Or, the gradient signal comes by evaluating partial derivatives of actions wrt [...]. Creating these goals is natural but not always easy, which is one reason (of very many!) why sports are so great - the utility function is clean, external, and immutable. The recursive, introspective creation of valuation / utility functions is what drives a lot of my internal monologues, mixed with a hefty dose of taking partial derivatives (see {780}) based on models of the world. (Stated this way, they seem so similar that perhaps they are the same thing?)

To my limited knowledge, there has been some work as of recent in the creation of sub-goals in reinforcement learning. One paper I read used a system to look for states that had a high ratio of ultimately rewarded paths to unrewarded paths, and selected these as subgoals (e.g. rewarded the agent when this state was reached.) I'm not talking about these sorts of sub-goals. In these systems, there is an ultimate goal that the researcher wants the agent to achieve, and it is the algorithm's (or s') task to make a policy for generating/selecting behavior. Rather, I'm interested in even more unstructured tasks - make a utility function, and a behavioral policy, based on small continuous (possibly irrelevant?) rewards in the environment.

Why would I want to do this? The pet project I have in mind is a 'cognitive' PCB part placement / layout / routing algorithm to add to my pet project, kicadocaml, to finally get some people to use it (the attention economy :-) In the course of thinking about how to do this, I've realized that a substantial problem is simply determining what board layouts are good, and what are not. I have a rough aesthetic idea + some heuristics that I learned from my dad + some heuristics I've learned through practice of what is good layout and what is not - but, how to code these up? And what if these aren't the best rules, anyway? If I just code up the rules I've internalized as utility functions, then the board layout will be pretty much as I do it - boring!

Well, I've stated my sub-goal in the form of a problem statement and some criteria to meet. Now, to go and search for a decent solution to it. (Have to keep this blog m8ta!) (Or, realistically, to go back and see if the problem statement is sensible).

(**) Corollary #2 - There is no god. nod, Dawkins.

ref: -0 tags: chess evolution machine learning 2004 partial derivative date: 10-26-2009 04:07 gmt revision:2

A Self-learning Evolutionary Chess Program

  • The evolved program is able to perform at near master level!
  • Used object networks (neural networks that can be moved about according to the symmetries of the problem space). Paul Werbos apparently invented these, too.
  • Approached the problem by assigning values to having pieces at particular places on the board (PVT, positional value tables). The value of a move was the value of the resulting global valuation (sum of the values of one's pieces minus the values of the opponent's pieces) + PVT. They used these valuations to look a set number of moves into the future, using an alpha-beta search.
    • Used 4 plies (search depth) during normal genetic evolution; 6 when pawns would be promoted.
  • The neural networks looked at the first 2 rows, the last two rows, and a 4x4 square in the middle of the board - areas known to matter in real games. (The main author is a master-level chess player and chess teacher).
  • The outputs of the three neural networks were added to the material and PVT values to assess a hypothetical board position.
  • Genetic selection operated on the PVT values, neural network weights, piece valuation, and biases of the neural networks. These were initialized semi-randomly; PVT values were initialized based on open-source programs.
  • Performed 50 generations of 20 players each. The top 10 players from each generation survived.
  • Garry Kasparov was consulted in this research. Cool!
  • I wonder what would happen if you allowed the program to propose (genetically or otherwise) alternate algorithmic structures. What they describe is purely a search through weight space - what about a genetic search through algorithmic structure space? Too difficult of a search?
  • I mean, that's what humans (the authors) did while they were designing this program/algorithm. The lead author, as mentioned, is already a very good chess player, and hence he could imbue the initial program with a lot of good 'filters', 'kernels', or 'glasses' for looking at the chess board. And how did he arrive at these ideas? Practice (raw data) and communication (other people's kernels extracted from more raw data, and validated). And how does he play? By using his experience and knowledge to predict probable moves into the future, evaluating their value, and selecting the best. And how does he evaluate his algorithm? The same way! By using his knowledge of both chess and computer science to simulate hypothetical designs in his head, seeing how he thinks they will perform, and selecting the best one.
  • The problem with present algorithms is that they have no sense of artistic beauty - no love of symmetry, whether it be simple geometric symmetry (beautiful people have symmetric faces) or more fractal (fractional-dimensioned) symmetry, e.g. music, fractals (duh), human art. I think symmetry can enormously cut down the dimension of the search space in learning, hence is frequently worthy of its own search.
    • Algorithms do presently have a good sense of parsimony, at least, through the AIC / regularization / SVD / bayes net's priors / etc. Parsimony can be beauty, too.
  • Another notable discrepancy is that humans can reason in a concrete way - they actively search for the thing that is causing the problem, the thing that is contributing greatly to either good or bad results. They do this by the scientific method, sorta - hold all other things constant, perturb some section of the system, measure the output. This is the same as taking a partial derivative. Such derivatives are used heavily/exclusively in training neural networks - weights are changed based on the partial derivative of the output-referenced error wrt that weight. So reasoning is similar to non-parallel backprop? Or a really slow way of taking partial derivatives? Maybe. The goal of both is to assign valuation/causation to a given weight/subsystem.
  • Human reasoning involves dual valuation pathways - internal, based on a model of the world, and external, which of course involves experimentation and memory (and perhaps scholarly journal papers etc). The mammalian cortex-basal ganglia-thalamus loop seems designed for running these sorts of simulations because it is the dual of the problem of selecting appropriate behaviors. (there! I said it!) In internal simulation, you take world state, apply forward transform with perturbation, then evaluate the result - see if your perturbation (partial derivative) yields information. In motor behavior, you take the body state, apply forward transformation with perturbation (muscle contraction), and evaluate the result. Same thing. Of course you don't have to do this too much, as the cortex will remember the input-perturbation-result.
  • Understanding seems to be related to this input-transform-evaluate cycle, too, except here what is changing is the forward transform, and the output is compared to known output - does a given kernel (concept) predict the output/observed data?
  • Now what would happen if you applied this input-transform-evaluate cycle to itself, i.e. you allowed the system to evaluate itself? Nothing? Recursion? (Recursion is a very beautiful concept.) Some degree of awareness?
  • Surely someone has thought of this before, and tried to simulate it on a computer. Wasn't AI research all about this in the 70's-80's? People have said that their big problem was that AI was then entirely/mostly symbolic and insufficiently probabilistic or data-intensive; the 90's-21st century seems to have solved that. This field is unfamiliar to me, it'll take some sussing about before I can grok the academic landscape.
    • Even more surely, someone is doing it right now! This is the way the world advances. Same thing happened to me with GPGPU stuff, which I was doing in 2003. Now everyone is up to that shiznit.
  • It seems that machine-learning is transitioning from informing my personal philosophy, to becoming my philosophy. Good/bad? Feel free to edit this entry!
  • It's getting late and I'm tired -> rant ends.
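The "hold everything else constant, perturb one piece, measure the output" loop is a finite-difference partial derivative. A minimal sketch, assuming a toy single linear unit with squared error (the unit, its inputs, and all function names are hypothetical), comparing perturbation-based derivatives against the chain-rule gradient that backprop would compute:

```python
# Toy model (hypothetical): one linear unit y = w . x with squared error E = (y - t)^2.
X = (1.0, 2.0)
T = 1.0


def squared_error(w):
    y = sum(wi * xi for wi, xi in zip(w, X))
    return (y - T) ** 2


def analytic_gradient(w):
    # chain rule, as backprop would compute it: dE/dw_i = 2 (y - t) x_i
    y = sum(wi * xi for wi, xi in zip(w, X))
    return [2.0 * (y - T) * xi for xi in X]


def perturbation_gradient(f, w, eps=1e-6):
    """Estimate each partial derivative by nudging one weight while holding the rest fixed."""
    grad = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2.0 * eps))  # central difference
    return grad


w = [0.5, -0.25]
print(analytic_gradient(w))                     # [-2.0, -4.0]
print(perturbation_gradient(squared_error, w))  # approximately [-2.0, -4.0]
```

Each weight costs two evaluations of the whole system, one weight at a time, which is the "really slow, serial" version of what backprop gets in a single backward pass.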

ref: work-0 tags: Ng computational leaning theory machine date: 10-25-2009 19:14 gmt revision:0 [head]

Andrew Ng's notes on learning theory

  • goes over the bias / variance tradeoff.
    • variance = error from fitting spurious patterns in a small, finite training set (overfitting): the model has low training error but a large testing / generalization error.
    • bias = the expected generalization error even if the model is fit to a very large training set.
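  • proves that, with a sufficiently large training set, the training error of every hypothesis will (with high probability) be close to its generalization error (uniform convergence).
    • also gives an upper bound on the generalization error of the hypothesis picked by minimizing training error, in terms of the best achievable generalization error and the (discrete, finite) number of hypotheses k.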
    • this bound is only logarithmic in k, the number of hypotheses.
  • the training size m that a certain method or algorithm requires in order to achieve a certain level of performance is the algorithm's sample complexity.
  • shows that with an infinite hypothesis space, the number of training examples needed is linear in the VC dimension of H, which for most models is roughly linear in the number of parameters.
  • goes over the Vapnik-Chervonenkis dimension = the size of the largest set that is shattered by a hypothesis space. = VC(H)
    • A hypothesis space H shatters a set S if it can realize any labeling (binary) of the points in S. see his diagram.
    • In order to prove that VC(H) is at least d, you only need to show that there's at least one set of size d that H can shatter.
  • There are more notes in the containing directory - http://www.stanford.edu/class/cs229/notes/
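Two of the results above can be made concrete. A minimal sketch, assuming the finite-H form m >= (1/(2 gamma^2)) log(2k/delta), which guarantees with probability at least 1 - delta that all k hypotheses have training error within gamma of generalization error; and assuming 1-D threshold stumps (either polarity) as a toy hypothesis class for the shattering check. Function names are hypothetical. Stumps shatter a 2-point set, so VC >= 2 by the "one set of size d" argument; the 3-point check only tests one set, so it illustrates (rather than proves) that no 3 points can be shattered, since the alternating labeling +,-,+ would need two thresholds.

```python
import math


def finite_h_sample_complexity(k, gamma, delta):
    """Smallest integer m with m >= log(2k/delta) / (2 gamma^2) (Hoeffding + union bound)."""
    return math.ceil(math.log(2 * k / delta) / (2 * gamma ** 2))


def stump_labelings(points):
    """All +/-1 labelings of `points` realizable by 1-D stumps h(x) = pol if x > t else -pol."""
    xs = sorted(points)
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    return {tuple(pol if x > t else -pol for x in points)
            for t in thresholds for pol in (1, -1)}


def shatters(points):
    """True if stumps realize all 2^n labelings of the (distinct) points."""
    return len(stump_labelings(points)) == 2 ** len(points)


print(finite_h_sample_complexity(1000, 0.05, 0.05))       # 2120
print(finite_h_sample_complexity(1_000_000, 0.05, 0.05))  # 3501
print(shatters([1.0, 2.0]), shatters([1.0, 2.0, 3.0]))     # True False
```

A thousandfold increase in k raises the required m by well under a factor of two, which is the "only logarithmic in k" point.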

ref: work-0 tags: Cohen Singer SLIPPER machine learning hypothesis generation date: 10-25-2009 18:42 gmt revision:2 [1] [0] [head]


  • "One disadvantage of boosting is that improvements in accuracy are often obtained at the expense of comprehensibility."
  • SLIPPER = simple learner with iterative pruning to produce error reduction.
  • Inner loop: the weak learner splits the training data, grows a single rule using one subset of the data, and then prunes the rule using the other subset.
  • They use a confidence-rated boosting algorithm, which allows the weak hypothesis to abstain on examples not covered by the rule.
    • the sign of h(x) - the weak learner's hypothesis - is interpreted as the predicted label, and the magnitude |h(x)| as the confidence in the prediction.
  • SLIPPER only handles two-class problems for now, but can be extended.
  • Performs better than c5rules (a commercial version of Quinlan's decision tree algorithms), though not dramatically so.
  • see also the excellent overview at http://www.cs.princeton.edu/~schapire/uncompress-papers.cgi/msri.ps
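A minimal sketch of one confidence-rated boosting round with a single rule that abstains, assuming the smoothed confidence C_R = 1/2 ln((W+ + 1/(2n)) / (W- + 1/(2n))), where W+ and W- are the boosting weights of covered positive and negative examples. I recall this as the SLIPPER form, but treat the exact smoothing constant as an assumption. Rule growing and pruning are not shown; `covered` is a hypothetical precomputed mask of which examples the rule fires on.

```python
import math


def rule_confidence(w, y, covered):
    """Smoothed confidence of a rule from the boosting weights of the examples it covers."""
    n = len(w)
    w_pos = sum(wi for wi, yi, c in zip(w, y, covered) if c and yi == 1)
    w_neg = sum(wi for wi, yi, c in zip(w, y, covered) if c and yi == -1)
    smooth = 1.0 / (2 * n)  # keeps the log finite when W+ or W- is zero (assumed constant)
    return 0.5 * math.log((w_pos + smooth) / (w_neg + smooth))


def boost_round(w, y, covered):
    """One round: h(x) = C_R where the rule fires, 0 (abstain) elsewhere; reweight, renormalize."""
    c_r = rule_confidence(w, y, covered)
    h = [c_r if c else 0.0 for c in covered]
    # sign(h) is the predicted label, |h| the confidence; abstentions (h = 0) keep their
    # weight unchanged before renormalization
    new_w = [wi * math.exp(-yi * hi) for wi, yi, hi in zip(w, y, h)]
    z = sum(new_w)
    return c_r, h, [wi / z for wi in new_w]


y = [1, 1, 1, -1, -1, -1]
covered = [True, True, False, True, False, False]
c_r, h, w = boost_round([1 / 6] * 6, y, covered)
print(round(c_r, 4))  # 0.5 * ln(5/3), about 0.2554
```

Covered examples the rule gets right lose weight, covered mistakes gain weight, and uncovered examples are simply left alone, which is what lets each rule stay small and readable.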

ref: life-0 tags: IQ intelligence Flynn effect genetics facebook social utopia data machine learning date: 10-02-2009 14:19 gmt revision:1 [0] [head]


My theory on the Flynn effect - human intelligence IS increasing, and this is NOT stopping. Look at it from an ML perspective: there is more free time to gather data; the data (and the world) has almost unlimited complexity; the data is of much higher quality and much easier to get (the vast internet, and world travel!); and there is (hopefully) more fuel to process that data (food!). Therefore, we are getting more complex, sophisticated, and intelligent. Also, the idea that less-intelligent people having more kids will somehow 'dilute' our genetic IQ is bullshit - intelligence is mostly a product of environment and education, and is tailored to the tasks we need to do; it is not (or only very weakly, except at the extremes) tied to the wetware. Besides, things are changing far too fast for genetics to follow.

Regarding social media like Facebook and others, you could posit that social intelligence is increasing, along similar lines to the argument above: social data is seemingly more prevalent and more available, and people spend more time examining it. Yet this feels like a weaker argument, as people have always been socializing, talking, etc., and I'm not sure any of these social media have really increased that. Regardless, people enjoy it - that's the important part.

My utopia for today :-)

ref: -0 tags: alopex machine learning artificial neural networks date: 03-09-2009 22:12 gmt revision:0 [head]

Alopex: A Correlation-Based Learning Algorithm for Feed-Forward and Recurrent Neural Networks (1994)

  • read the abstract! Rather than using the error gradient as backpropagation does, it uses the correlation between changes in network weights and changes in the error, plus Gaussian noise.
    • backpropagation requires calculation of the derivatives of the transfer function from one neuron to the output. This is very non-local information.
    • one alternative is somewhat empirical: compute the derivatives wrt the weights through perturbations.
    • all these algorithms are solutions to the optimization problem: minimize an error measure, E, wrt the network weights.
  • all network weights are updated synchronously.
  • can be used to train both feedforward and recurrent networks.
  • algorithm apparently has a long history, especially in visual research.
  • the algorithm is quite simple! easy to understand.
    • use stochastic weight changes with an annealing schedule.
  • this is pre-pub: tables and figures at the end.
  • looks like it has comparable or faster convergence than backpropagation.
  • not sure how it will scale to problems with hundreds of neurons; though, they looked at an encoding task with 32 outputs.

ref: -0 tags: differential dynamic programming machine learning date: 09-24-2008 23:39 gmt revision:2 [1] [0] [head]

excellent bibliography.

  • Jacobson, D. and Mayne, D., Differential Dynamic Programming, Elsevier, New York, 1970. in Perkins library.
  • Bertsekas, Dimitri P. Dynamic Programming and Optimal Control. Ford Library.
  • Receding horizon differential dynamic programming
    • good for high-dimensional problems. for this paper, they demonstrate control of a swimming robot.
    • webpage, including an animated GIF of the swimmer
    • above is a quote from the conclusions -- very interesting!

ref: bookmark-0 tags: book information_theory machine_learning bayes probability neural_networks mackay date: 0-0-2007 0:0 revision:0 [head]

http://www.inference.phy.cam.ac.uk/mackay/itila/book.html -- free! (but i liked the book, so I bought it :)

ref: bookmark-0 tags: machine_learning todorov motor_control date: 0-0-2007 0:0 revision:0 [head]

Iterative Linear Quadratic regulator design for nonlinear biological movement systems

  • paper for an international conference on informatics in control/automation/robotics

ref: bookmark-0 tags: Unscented sigma_pint kalman filter speech processing machine_learning SDRE control UKF date: 0-0-2007 0:0 revision:0 [head]

ref: bookmark-0 tags: machine_learning algorithm meta_algorithm date: 0-0-2006 0:0 revision:0 [head]

Boost learning, or AdaBoost - the idea is to update the discrete distribution (weights over the training points) used in training any algorithm to emphasize the points that were misclassified by the previous fit of a classifier. Sensitive to outliers, but relatively resistant to overfitting.
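A minimal sketch of discrete AdaBoost, assuming 1-D threshold stumps as the weak learner (the note says "any algorithm"; stumps are just a convenient choice here). All function names are hypothetical. The weight list `w` is the discrete distribution being updated: after each round, misclassified points gain mass and correctly classified ones lose it.

```python
import math


def stump_predict(x, t, pol):
    return pol if x > t else -pol


def fit_stump(xs, ys, w):
    """Weak learner: the 1-D threshold stump with the lowest weighted error."""
    pts = sorted(set(xs))
    thresholds = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    best = None
    for t in thresholds:
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump_predict(xi, t, pol) != yi)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best


def adaboost(xs, ys, rounds=10):
    n = len(xs)
    w = [1.0 / n] * n  # the discrete distribution over training points
    ensemble = []
    for _ in range(rounds):
        err, t, pol = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # misclassified points (y*h = -1) are up-weighted, correct ones down-weighted
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, t, pol))
             for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, pol) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single stump fits this
ens = adaboost(xs, ys, rounds=3)
print([predict(ens, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

The best single stump misclassifies a third of these points; three reweighted rounds combine into an interval that fits all of them. The same exponential reweighting is why a mislabeled outlier keeps accumulating weight.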

ref: bookmark-0 tags: neural_networks machine_learning matlab toolbox supervised_learning PCA perceptron SOM EM date: 0-0-2006 0:0 revision:0 [head]

http://www.ncrg.aston.ac.uk/netlab/index.php n.b. kinda old. (or does that just mean well established?)

ref: bookmark-0 tags: machine_learning date: 0-0-2006 0:0 revision:0 [head]


A related machine learning classifier, the relevance vector machine (RVM), has recently been introduced, which, unlike SVM, incorporates probabilistic output (probability of membership) through Bayesian inference. Its decision function depends on fewer input variables than SVM, possibly allowing better classification for small data sets with high dimensionality.

  • input data here is a number of glaucoma-correlated parameters.
  • "SVM is a machine classification method that directly minimizes the classification error without requiring a statistical data model. SVM uses a kernel function to find a hyperplane that maximizes the distance (margin) between two classes (or more?). The resultant model is sparse, depending only on a few training samples (support vectors)."
  • The RVM has the same functional form as the SVM within a Bayesian framework. This classifier is a sparse Bayesian model that provides probabilistic predictions (e.g. probability of glaucoma based on the training samples) through Bayesian inference.
    • RVM outputs probabilities of membership rather than point estimates like SVM
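The "same functional form" point can be sketched as a kernel expansion y(x) = sum_i w_i K(x, x_i) + b over a sparse set of retained training points. An SVM thresholds it to get a label; in the usual RVM classification setup it is passed through a logistic sigmoid to give a probability of membership. A minimal sketch, assuming an RBF kernel; the centers, weights, and bias are hand-picked hypothetical values (real SVM weights come from margin maximization, RVM weights from sparse Bayesian learning, neither of which is shown).

```python
import math


def rbf(a, b, gamma=1.0):
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kernel_expansion(x, centers, weights, bias, gamma=1.0):
    # shared form: sum over the few retained training points (support / relevance vectors)
    return sum(w * rbf(x, c, gamma) for w, c in zip(weights, centers)) + bias


def svm_label(x, centers, weights, bias):
    # SVM: point estimate, just the sign of the expansion
    return 1 if kernel_expansion(x, centers, weights, bias) >= 0 else -1


def rvm_probability(x, centers, weights, bias):
    # RVM classification: logistic sigmoid of the same expansion = P(class +1 | x)
    return 1.0 / (1.0 + math.exp(-kernel_expansion(x, centers, weights, bias)))


centers = [(0.0, 0.0), (2.0, 2.0)]  # hypothetical retained training points
weights = [-1.5, 1.5]               # hypothetical weights
print(svm_label((2.0, 2.0), centers, weights, 0.0))        # 1
print(rvm_probability((2.0, 2.0), centers, weights, 0.0))  # about 0.82
print(rvm_probability((1.0, 1.0), centers, weights, 0.0))  # 0.5, right on the boundary
```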

ref: bookmark-0 tags: wire bond machine date: 0-0-2006 0:0 revision:0 [head]

wire bonding whitepaper

ref: bookmark-0 tags: smith predictor motor control wolpert cerebellum machine_learning prediction date: 0-0-2006 0:0 revision:0 [head]


  • quote in reference to models in which the cerebellum works as a Smith predictor, e.g. feedforward prediction of the behavior of the limbs, eyes, trunk: Motor performance based on the use of such internal models would be degraded if the model was unavailable or inaccurate. These theories could therefore account for dysmetria, tremor, and dyssynergia, and perhaps also for increased reaction times.
  • note the difference between an inverse model (transforms the end target into a motor plan) and a forward model (used on-line in a tight feedback loop).
  • The difficulty becomes one of detecting mismatches between a rapid prediction of the outcome of a movement and the real feedback that arrives later in time (duh! :)
  • good set of notes on simple simulated smith predictor performance.
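A minimal Smith predictor sketch, assuming a discrete first-order plant x[k+1] = a x[k] + b u[k] whose output is observed `delay` steps late, a proportional controller, and a perfect internal forward model (same a, b). The plant, gains, and function name are hypothetical. The controller is fed the model's undelayed prediction, corrected by the mismatch between the late measurement and the model's equally delayed output; with a perfect model that correction is exactly zero, so the loop behaves like the undelayed loop, only shifted by `delay` steps.

```python
from collections import deque


def simulate(a, b, kp, r, delay, steps, smith=True):
    x = 0.0   # true plant state
    xm = 0.0  # internal forward-model state (assumed perfect: same a, b)
    plant_buf = deque([0.0] * delay)  # plant outputs still "in transit"
    model_buf = deque([0.0] * delay)  # model outputs delayed by the same amount
    observed = []
    for _ in range(steps):
        plant_buf.append(x)
        model_buf.append(xm)
        y = plant_buf.popleft()        # measurement arriving `delay` steps late
        xm_late = model_buf.popleft()  # model's prediction of that late measurement
        if smith:
            feedback = xm + (y - xm_late)  # fast prediction + correction for model mismatch
        else:
            feedback = y                   # plain delayed feedback
        u = kp * (r - feedback)            # proportional controller
        observed.append(y)
        x = a * x + b * u
        xm = a * xm + b * u
    return observed


base = simulate(0.9, 0.1, 2.0, 1.0, delay=0, steps=200)
sp = simulate(0.9, 0.1, 2.0, 1.0, delay=5, steps=200)
print(sp[5:] == base[:-5])  # True: same response, just 5 steps later
print(round(base[-1], 4))   # 0.6667 (P control leaves a steady-state offset)
```

If the model's a or b is made wrong, the `y - xm_late` term becomes nonzero; that mismatch between fast prediction and late feedback is exactly the quantity the notes describe having to detect.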

ref: bookmark-0 tags: machine_learning classification entropy information date: 0-0-2006 0:0 revision:0 [head]

http://iridia.ulb.ac.be/~lazy/ -- Lazy Learning.