New options.

Giuseppe Attardi 2015-04-12 11:05:52 +02:00
parent 8ba0aba2f8
commit b36dcfc56a

@@ -12,8 +12,8 @@ The script is invoked with a Wikipedia dump file as an argument.
The output is stored in a number of files of similar size in a chosen directory.
Each file will contain several documents in this [document format](http://medialab.di.unipi.it/wiki/Document_Format).
-This is a beta version that performs template expansion by preprocessing the whole dump and
-extracting template definitions.
+This is a beta version that performs template expansion by preprocessing the
+whole dump and extracting template definitions.
Usage:
WikiExtractor.py [options] xml-dump-file
@@ -30,11 +30,15 @@ extracting template definitions.
-ns ns1,ns2, --namespaces ns1,ns2
accepted namespaces
-q, --quiet suppress reporting progress info
--debug print debug info
-s, --sections preserve sections
-a, --article analyze a file containing a single article
--templates TEMPLATES
use or create file containing templates
--no-templates Do not expand templates
--threads THREADS Number of threads to use (default 8)
-v, --version print program version
-Saving templates to a file will speed up performing extraction the next time, assuming template definitions have not changed.
+Saving templates to a file will speed up performing extraction the next time,
+assuming template definitions have not changed.
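
The "use or create" behaviour described for `--templates` can be sketched as follows. This is a minimal illustration and not the script's actual code: the names `load_or_build_templates` and `scan_dump`, and the choice of JSON as the on-disk format, are assumptions made for the example.

```python
import json
import os


def load_or_build_templates(path, scan_dump):
    """Return template definitions, reusing a saved file when one exists.

    `scan_dump` is a hypothetical callable that performs the expensive
    pass over the dump and returns a dict mapping template name to body.
    """
    if os.path.exists(path):
        # Fast path: a previous run already saved the definitions.
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    # Slow path: preprocess the whole dump, then save the result so the
    # next run can skip this step.
    templates = scan_dump()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(templates, f)
    return templates
```

In this sketch the saved file is trusted as-is, which matches the caveat above: it is only valid while the template definitions have not changed, so the file would need to be deleted to force a fresh scan.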