template<class TREE, class OUT_STREAM>
void process_tree(const TREE &rT, double *props, char *prop_set, OUT_STREAM &out_lab_file) noexcept
Process a single tree in a treebank.

template<class OUT_STREAM>
void output_tree_type_header(OUT_STREAM &out_lab_file) const noexcept
Output the header for the tree types.

template<class OUT_STREAM>
void output_syndepstruct_type_header(OUT_STREAM &out_lab_file) const noexcept
Output the header for the syntactic dependency tree types.

template<class TREE_TYPE, class OUT_STREAM>
void output_tree_type_values(TREE_TYPE &t, OUT_STREAM &out_lab_file) const noexcept
Output the values for the tree types.

template<class TREE_TYPE, class OUT_STREAM>
void output_syndepstruct_type_values(const TREE_TYPE &t, uint64_t C, OUT_STREAM &out_lab_file) const noexcept
Output the values for the syntactic dependency tree types.

Automatic processing of treebank files.
This class, whose objects are referred to as "processors", eases the processing of a whole treebank file and produces data for a fixed set of features available in the library. See the enumeration lal::io::treebank_feature for details on the available features.
This class is meant to process a single treebank file only (see Treebank for further details on treebank files).
Every processor must be initialised before processing the treebank file. This is done via method init, which requires the path to the treebank file and the name of the output file (the treebank result) where the results are to be stored. It also requires a Boolean value indicating whether all (or none) of the features should be used. Processing a treebank file with this class produces a single file with as many columns as features added to the processor. The columns are separated by a separator character (see method set_separator), and the file contains a header only if method set_output_header has been called with true. Progress and error messages can be controlled via method set_verbosity.
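The layout of the result file described above (one column per feature, a configurable separator, an optional header row) can be sketched as follows. This is only an illustration under stated assumptions: join_row and render_result are hypothetical helpers written for this example and are not part of the library, and the cell values used with them are placeholders rather than real results.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the library): join the cells of one
// row, placing the separator character between consecutive cells.
std::string join_row(const std::vector<std::string>& cells, char sep) {
    std::string line;
    for (std::size_t i = 0; i < cells.size(); ++i) {
        if (i > 0) { line += sep; }
        line += cells[i];
    }
    return line;
}

// Hypothetical helper: build the text of a result file with one column
// per feature and one row per tree. The header row with the feature
// names is written only when `with_header` is true.
std::string render_result(
    const std::vector<std::string>& feature_names,
    const std::vector<std::vector<std::string>>& rows,
    char sep,
    bool with_header)
{
    std::string out;
    if (with_header) { out += join_row(feature_names, sep) + '\n'; }
    for (const auto& row : rows) { out += join_row(row, sep) + '\n'; }
    return out;
}
```

In this sketch the separator and the header flag play the roles that set_separator and set_output_header play for a processor; how the library itself writes the file is not shown here.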
Once initialised, features can be added to or removed from the processor. When only a few features are needed, the processor can be initialised with no features and the required ones added via method add_feature. Conversely, when most but not all features are needed, the processor can be initialised with all features and the unwanted ones removed via method remove_feature.
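The two initialisation strategies can be illustrated with a small stand-in. Everything in this sketch is an assumption made for the example: the enumeration feature and the class feature_selection are hypothetical and do not reproduce the library's own types; only the names num_crossings and var_num_crossings are borrowed from this page, and other_feature is a placeholder.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical stand-in for a feature enumeration (not the library's enum).
enum class feature : std::size_t {
    num_crossings,
    var_num_crossings,
    other_feature,   // placeholder for any further feature
    count_           // number of features; not a feature itself
};

// Hypothetical stand-in for the feature bookkeeping of a processor.
class feature_selection {
public:
    // Start with all features (true) or with none (false).
    explicit feature_selection(bool all) {
        if (all) { m_on.set(); }
    }
    void add(feature f) { m_on.set(index(f)); }
    void remove(feature f) { m_on.reset(index(f)); }
    bool has(feature f) const { return m_on.test(index(f)); }
    // Number of selected features, i.e. number of output columns.
    std::size_t size() const { return m_on.count(); }

private:
    static std::size_t index(feature f) {
        return static_cast<std::size_t>(f);
    }
    std::bitset<static_cast<std::size_t>(feature::count_)> m_on;
};
```

Starting from an empty selection and adding one feature yields a single column, while starting from the full selection and removing one feature yields all columns but that one. Both routes end in the same kind of object, so the choice only affects how many calls are needed.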
Finally, the treebank file is processed via method process. This method returns a value of the enumeration treebank_error.
The usage of this class is a lot simpler than that of class treebank_collection_reader. For example:
tbproc.init(treebank_input_file, result_filename, "Book_1");
void add_feature(const treebank_feature &fs) noexcept
Adds a feature to the processor.
treebank_error process() noexcept
Process the treebank file.
treebank_error init(const std::string &treebank_input_file, const std::string &output_file, const std::string &treebank_id="") noexcept
Initialise the processor with a new treebank file.
num_crossings
Number of edge crossings.
var_num_crossings
Variance of the number of edge crossings.