New options.

2015-04-12 11:05:52 +02:00
commit b36dcfc56a
parent 8ba0aba2f8
1 changed file with 7 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -12,8 +12,8 @@ The script is invoked with a Wikipedia dump file as an argument.
 The output is stored in a number of files of similar size in a chosen directory.
 Each file will contains several documents in this [document format](http://medialab.di.unipi.it/wiki/Document_Format).

-This is a beta version that performs template expansion by preprocesssng the whole dump and
-extracting template definitions.
+This is a beta version that performs template expansion by preprocesssng the
+whole dump and extracting template definitions.

    Usage:
     WikiExtractor.py [options] xml-dump-file
@@ -30,11 +30,15 @@ extracting template definitions.
      -ns ns1,ns2, --namespaces ns1,ns2
                            accepted namespaces
      -q, --quiet           suppress reporting progress info
+      --debug               print debug info
      -s, --sections        preserve sections
      -a, --article         analyze a file containing a single article
      --templates TEMPLATES
                            use or create file containing templates
+      --no-templates        Do not expand templates
+      --threads THREADS     Number of threads to use (default 8)
      -v, --version         print program version

-Saving templates to a file will speed up performing extraction the next time, assuming template definitions have not changed.
+Saving templates to a file will speed up performing extraction the next time,
+assuming template definitions have not changed.
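
The options this commit documents (`--debug`, `--no-templates`, `--threads`) could be declared roughly as in the sketch below. It is not taken from WikiExtractor.py. The use of `argparse`, the `build_parser` helper, and the `expand_templates` destination name are assumptions for illustration; only the flag names, help strings, and the default of 8 threads come from the usage text above.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the usage text; not the script's actual code.
    parser = argparse.ArgumentParser(prog="WikiExtractor.py")
    parser.add_argument("input", metavar="xml-dump-file")
    parser.add_argument("--debug", action="store_true",
                        help="print debug info")
    parser.add_argument("--templates",
                        help="use or create file containing templates")
    # store_false makes expansion the default and lets the flag switch it off.
    # The dest name "expand_templates" is an assumption.
    parser.add_argument("--no-templates", action="store_false",
                        dest="expand_templates",
                        help="Do not expand templates")
    parser.add_argument("--threads", type=int, default=8,
                        help="Number of threads to use (default 8)")
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs standalone.
args = build_parser().parse_args(["--no-templates", "--threads", "4", "dump.xml"])
print(args.expand_templates, args.threads)  # prints: False 4
```

Using `store_false` with a separate destination keeps template expansion on by default, which matches the README's description of `--no-templates` as an opt-out.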