New options.
parent 8ba0aba2f8
commit b36dcfc56a
10
README.md
@@ -12,8 +12,8 @@ The script is invoked with a Wikipedia dump file as an argument.
The output is stored in a number of files of similar size in a chosen directory.
Each file will contains several documents in this [document format](http://medialab.di.unipi.it/wiki/Document_Format).

This is a beta version that performs template expansion by preprocesssng the whole dump and
extracting template definitions.
This is a beta version that performs template expansion by preprocesssng the
whole dump and extracting template definitions.

Usage:
WikiExtractor.py [options] xml-dump-file
@@ -30,11 +30,15 @@ extracting template definitions.
  -ns ns1,ns2, --namespaces ns1,ns2
                        accepted namespaces
  -q, --quiet           suppress reporting progress info
  --debug               print debug info
  -s, --sections        preserve sections
  -a, --article         analyze a file containing a single article
  --templates TEMPLATES
                        use or create file containing templates
  --no-templates        Do not expand templates
  --threads THREADS     Number of threads to use (default 8)
  -v, --version         print program version

Saving templates to a file will speed up performing extraction the next time, assuming template definitions have not changed.
Saving templates to a file will speed up performing extraction the next time,
assuming template definitions have not changed.

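For readers who want to see how the option list in this hunk could map onto a parser, here is a minimal sketch using Python's standard `argparse` module. It is not taken from WikiExtractor.py: the function name `build_parser`, the `dest` name `expand_templates`, the placeholder version string, and the choice of `store_true`/`store_false` actions are all assumptions made for illustration. Only the flag names, help texts, and the default of 8 threads come from the usage text above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the usage text; not the script's real code."""
    parser = argparse.ArgumentParser(prog="WikiExtractor.py")
    parser.add_argument("input", metavar="xml-dump-file",
                        help="Wikipedia XML dump file")
    parser.add_argument("-ns", "--namespaces", default="", metavar="ns1,ns2",
                        help="accepted namespaces")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="suppress reporting progress info")
    parser.add_argument("--debug", action="store_true",
                        help="print debug info")
    parser.add_argument("-s", "--sections", action="store_true",
                        help="preserve sections")
    parser.add_argument("-a", "--article", action="store_true",
                        help="analyze a file containing a single article")
    parser.add_argument("--templates",
                        help="use or create file containing templates")
    # Assumed dest name: the flag switches template expansion off.
    parser.add_argument("--no-templates", dest="expand_templates",
                        action="store_false", help="Do not expand templates")
    parser.add_argument("--threads", type=int, default=8,
                        help="Number of threads to use (default 8)")
    # Placeholder version string; the real version is not shown in this diff.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s (version not shown here)")
    return parser


if __name__ == "__main__":
    # Example invocation with a hypothetical dump file name.
    args = build_parser().parse_args(
        ["--threads", "4", "--no-templates", "-ns", "0,1", "dump.xml"])
    print(args)
```

With this sketch, omitting `--threads` leaves `args.threads` at 8, and omitting `--no-templates` leaves `args.expand_templates` set to `True`.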