This work was prepared for the 2010 AMICUS Conference in Vienna, October 21. A work in progress, AutoPropp proposes an algorithm to automatically assign the semantics of probable Proppian functions to candidate text segments in unannotated formulaic narratives (in this case, Russian magic tales), given training data marked up in PFTML.
- AutoPropp: Toward the Automatic Markup, Classification, and Annotation of Russian Magic Tales: Current Draft of Paper
- Slides
- AfanEng.tar.gz : Afanas'ev in Norbert Guterman's English translation
- AfanPFTMLCorpus.xml.tar.gz : Afanas'ev coded in PFTML format
- AfanSources.tar.gz : a subset of Afanas'ev's collection Russkie narodnye skazki in the Russian originals
- Afan_Magnus_Leonard_trans_1916.tar.gz : another subset of Afanas'ev's collection of Russkie narodnye skazki in Leonard A. Magnus's 1916 translation into English
- Weka data coded from Appendix III of Propp's Morphology (1928) in .arff format
- AutoPropp9.R : AutoPropp adventure
- HMM.R : experiments with Hidden Markov Models from Propp's Data/Afanas'ev (includes observed data encoded from .arff file above)
- LSA.R : experiments with Latent Semantic Analysis
- LDA.R : experiments with Latent Dirichlet Allocation and Topic Models
- openNLP.R : POS tagging example using the openNLP package in R
- XML.R : XML tree parsing
- Aggressive Russian Stemmer in Java
- Light Russian Stemmer in Java
- R language : R language and programming environment
- Weka Data Mining : open source data mining
- Latent Semantic Analysis (LSA) [wikipedia]
- Annotated Topic Models Bibliography
- Paper and Presentation
- Data Sets
- R code snippets and AutoPropp drafts
- External Resources
- More on LSA, LDA, etc.