m8ta
{868} is owned by tlh24.
{723}
ref: notes-0 tags: data effectiveness Norvig google statistics machine learning date: 12-06-2011 07:15 gmt revision:1 [0] [head]

The unreasonable effectiveness of data.

  • counterpoint to Eugene Wigner's "The Unreasonable effectiveness of mathematics in the natural sciences"
    • that is, math is not effective with people.
    • we should not look for elegant theories, rather embrace complexity and make use of extensive data. (google's mantra!!)
  • in 2006 Google released a trillion-word corpus with frequency counts for all word sequences (n-grams) up to 5 words long.
  • document translation and voice transcription are successful mostly because people need the services - there is demand.
    • Traditional natural language processing does not have such demand as of yet. Furthermore, it has required human-annotated data, which is expensive to produce.
  • simple models with a lot of data beat more elaborate models based on less data.
    • for translation and any other application of ML to web data, n-gram models or linear classifiers work better than elaborate models that try to discover general rules.
  • much web data consists of individually rare but collectively frequent events.
  • because of a huge shared cognitive and cultural context, linguistic expression can be highly ambiguous and still often be understood correctly.
  • mentions Project Halo, where encoding a chemistry textbook cost $10,000 per page. (funded by DARPA)
  • ultimately suggests that there is so much to explore now - just use unlabeled data with an unsupervised learning algorithm.
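
A minimal sketch of the "simple model, lots of data" idea from the notes above: count every word sequence up to n words long, then estimate the next word from raw counts. The function names (`ngram_counts`, `next_word_probability`) and the toy sentence are made up for illustration and are not from the paper; a real system would use a web-scale corpus and some form of smoothing.

```python
from collections import Counter


def ngram_counts(tokens, max_n=5):
    """Count every contiguous word sequence of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def next_word_probability(counts, context, word):
    """Unsmoothed estimate of P(word | context) = count(context + word) / count(context)."""
    context = tuple(context)
    seen = counts[context]
    if seen == 0:
        return 0.0  # context never observed, so there is no estimate
    return counts[context + (word,)] / seen


# Toy corpus (hypothetical); the model is nothing more than a table of counts.
tokens = "the cat sat on the mat and the cat slept".split()
counts = ngram_counts(tokens, max_n=3)

print(counts[("the", "cat")])                         # "the cat" occurs twice
print(next_word_probability(counts, ["the"], "cat"))  # 2 of the 3 uses of "the"
```

Nothing here tries to discover a general rule; the estimate only improves because more text adds more counts, which is the point the notes make about rare-but-collectively-frequent events.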

{722}
ref: notes-0 tags: programming excellence norvig 10 years date: 04-07-2009 20:26 gmt revision:0 [head]

Teach yourself programming in 10 years

  • points out that, in order to be excellent at any difficult skill/art, you must practice for 10 years or 10,000 hours, and this practice must be focused and deliberate.
    • quote: "have shown it takes about ten years to develop expertise in any of a wide variety of areas, including chess playing, music composition, telegraph operation, painting, piano playing, swimming, tennis, and research in neuropsychology and topology"
    • possibly this is partially due to competition - most other people drop out after 10 years!
    • Or this is due to the fact that, for general-purpose behaviors, we are really no better than present gradient descent & reinforcement learning algorithms, which require repeated presentation of patterns and behaviors. Where humans do outperform gradient descent/RL is where evolution has supplied us with hardware or 'prior assumptions' that bias toward a correct solution / correct solution space. These prior assumptions are (part of) what makes the study of the brain interesting!
  • "Life is short, [the] craft long, opportunity fleeting, experiment treacherous, judgment difficult." -- Hippocrates.