Thinking about a paper "A Refinement Approach to Handling Model Misfit in Text Categorization"
2006-11-30 19:01
In this paper, I think the most interesting question is why there is no overfitting. Overfitting is inevitable when a classifier is trained too hard on the training examples: the decision line (surface) comes to fit properties that belong only to those particular examples, and generality is lost. The authors improve classifier performance by refining the classifier (Simon Haykin's book <<Neural Networks: A Comprehensive Foundation>>, sec. 7.5, shows a similar example, boosting by filtering; but note the difference: this paper's method does not vote, so it is not boosting). Viewed in terms of the decision line, whatever form the refinement takes, its effect must be to adjust the decision line so that it fits the training examples better. Surprisingly, the experimental results show no overfitting, and the authors simply report this fact without explaining it.
How could this be? I have noticed that the decision line this method generates is different from a traditional one. You can picture this "line" as follows: the root classifier corresponds to a main line, its child classifiers correspond to two smaller lines near the main line, and so on... It is similar to a fractal! I guess these decision lines correctly express the STRUCTURE of the target (real) decision boundary, because I believe fractals are the nature of nature. But the problem remains: OK, structure, so what? My guesses:
1 A structure is more general than a single line. Although this structure is learned from the training examples, it still generalizes better than a line, so overfitting is alleviated.
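2 Seeing the global from the local is the characteristic property of a fractal: self-similarity. The decision structure learned from the training examples is similar to the structure over all examples, even though each region may contain an arbitrary mix of + and - examples.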
3 The classifier in this method does not vote: the classification is determined only by the leaf nodes, and the intermediate nodes only do the dispatch job, routing an input to a leaf node. This behavior contradicts intuition and is hard to explain. Leaf nodes should be specialized versions of intermediate nodes, yet a leaf replaces its parents' judgments without much error.
Maybe each input is a specific input, so a correct classification at an intermediate node cannot be viewed as a "perfect classification"; it is just an "estimate of imperfection".
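The refinement structure described above (a root classifier, child classifiers refining the regions near its decision line, leaves deciding, no vote) can be sketched roughly as follows. This is a minimal sketch under my own assumptions, not the paper's algorithm: the base learner is a hypothetical nearest-centroid rule chosen only for brevity (the paper deals with text classifiers), and the rule that a child is trained on the examples its parent assigned to one label, and only when that group still mixes + and -, is also an assumption. `CentroidClassifier`, `RefinementNode`, `n_correct`, `max_depth` and `min_size` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


class CentroidClassifier:
    """Hypothetical base learner: predict the label whose class centroid is nearest."""

    def __init__(self, xs: Sequence[Point], ys: Sequence[int]) -> None:
        self.centroids: Dict[int, Point] = {}
        for label in set(ys):
            pts = [x for x, y in zip(xs, ys) if y == label]
            dim = len(pts[0])
            self.centroids[label] = tuple(
                sum(p[i] for p in pts) / len(pts) for i in range(dim)
            )

    def predict(self, x: Point) -> int:
        def sq_dist(label: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))

        return min(self.centroids, key=sq_dist)


class RefinementNode:
    """Tree of classifiers: each child refines one region of its parent's output."""

    def __init__(self, xs, ys, depth=0, max_depth=3, min_size=2):
        self.clf = CentroidClassifier(xs, ys)
        self.children: Dict[int, "RefinementNode"] = {}
        if depth >= max_depth:
            return
        # Split the training examples by the label this node predicts for them.
        groups: Dict[int, List[Tuple[Point, int]]] = {}
        for x, y in zip(xs, ys):
            groups.setdefault(self.clf.predict(x), []).append((x, y))
        for pred, members in groups.items():
            # Assumed stopping rule: refine only regions that still mix + and -.
            if len({y for _, y in members}) > 1 and len(members) >= min_size:
                sub_xs = [x for x, _ in members]
                sub_ys = [y for _, y in members]
                self.children[pred] = RefinementNode(
                    sub_xs, sub_ys, depth + 1, max_depth, min_size
                )

    def predict(self, x: Point) -> int:
        pred = self.clf.predict(x)
        child = self.children.get(pred)
        # No voting: an intermediate node only dispatches; the leaf's answer wins.
        return pred if child is None else child.predict(x)


def n_correct(model, xs, ys) -> int:
    """Count how many examples the model labels correctly."""
    return sum(model.predict(x) == y for x, y in zip(xs, ys))


if __name__ == "__main__":
    # Toy 1-D data that a single cut cannot separate.
    xs = [(0.0,), (1.0,), (2.0,), (8.0,), (9.0,), (10.0,)]
    ys = [1, 1, -1, -1, -1, 1]
    root = CentroidClassifier(xs, ys)
    tree = RefinementNode(xs, ys)
    print(n_correct(root, xs, ys), n_correct(tree, xs, ys))  # 4 6
    print(root.predict((1.8,)), tree.predict((1.8,)))  # 1 -1
```

On this toy data the root alone labels 4 of the 6 training points correctly and the tree labels all 6. For x = 1.8 the root says +1, but the input is dispatched to a child whose smaller "line" sits near the main one, and the child's -1 replaces the root's answer, which is the leaf-overrides-parent behavior from point 3.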
Further work: find out how to take advantage of this property of the "decision line", if my guess is right.