|
treebank_file_error | init (const std::string &treebank_input_file, const std::string &output_file, const std::string &treebank_id="") noexcept |
| Initialize the processor with a new treebank file.
|
|
treebank_file_error | process () noexcept |
| Process the treebank file.
|
|
void | add_feature (const treebank_feature_type &fs) noexcept |
| Adds a feature to the processor.
|
|
void | remove_feature (const treebank_feature_type &fs) noexcept |
| Removes a feature from the processor.
|
|
void | set_check_before_process (const bool v) noexcept |
| Should the treebank file or collection be checked for errors prior to processing?
|
|
void | clear_features () noexcept |
| Clear the features in the processor.
|
|
void | set_separator (const char c) noexcept |
| Sets the separator character.
|
|
void | set_verbosity (const int k) noexcept |
| Sets the level of verbosity of the process methods.
|
|
void | set_output_header (const bool h) noexcept |
| Output a header for the treebank result file.
|
|
void | set_column_name (const treebank_feature_type &tf, const std::string &name) noexcept |
| Sets a custom name for the column corresponding to a given feature.
|
|
bool | has_feature (const treebank_feature_type &fs) const noexcept |
| Is a given feature to be calculated?
|
|
|
template<class TREE , class OUT_STREAM > |
void | process_tree (const TREE &rT, double *const props, char *const prop_set, OUT_STREAM &out_lab_file) noexcept |
| Process a single tree in a treebank.
|
|
template<class OUT_STREAM > |
void | output_tree_type_header (OUT_STREAM &out_lab_file) const noexcept |
| Output the header for the tree types.
|
|
template<class OUT_STREAM > |
void | output_syndepstruct_type_header (OUT_STREAM &out_lab_file) const noexcept |
| Output the header for the syntactic dependency tree types.
|
|
template<class TREE_TYPE , class OUT_STREAM > |
void | output_tree_type_values (TREE_TYPE &t, OUT_STREAM &out_lab_file) const noexcept |
| Output the values for the tree types.
|
|
template<class TREE_TYPE , class OUT_STREAM > |
void | output_syndepstruct_type_values (const TREE_TYPE &t, const uint64_t C, OUT_STREAM &out_lab_file) const noexcept |
| Output the values for the syntactic dependency tree types.
|
|
Automatic processing of treebank files.
This class, whose objects are referred to as "processors", eases the processing of an entire treebank file and produces data for a fixed set of features available in the library. See the enumeration lal::io::treebank_feature_type for details on the features available.
This class is meant to process a single treebank file only (see Treebank for further details on treebank files).
Every processor must be initialized prior to processing the treebank file. This is done via method init, which requires the path to the treebank file and the name of the output file (the treebank result) where the results are going to be stored. It also requires a Boolean value indicating whether all (or none) of the features should be used. Processing a treebank file with this class produces a single file with as many columns as features added to the processor. The columns are separated by a separator character (see method set_separator), and the file contains a header only if method set_output_header has been called with true. Progress and error messages can be controlled via method set_verbosity.
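To make the layout of the result file concrete, the following self-contained sketch builds a small table with an optional header row and a configurable separator. It does not use the library: write_row, make_result, and all column names and values passed to them are hypothetical stand-ins, and the actual file written by a processor may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of the library: writes the cells of one
// row joined by the separator character, then ends the line.
template <class OUT_STREAM>
void write_row(
    OUT_STREAM& out, const std::vector<std::string>& cells, const char sep)
{
    assert(!cells.empty()); // a result file has at least one column
    for (std::size_t i = 0; i < cells.size(); ++i) {
        if (i > 0) { out << sep; }
        out << cells[i];
    }
    out << '\n';
}

// Hypothetical helper: builds the text of a small result file, that is,
// an optional header followed by one row per tree.
std::string make_result(
    const std::vector<std::string>& column_names,
    const std::vector<std::vector<std::string>>& rows,
    const char sep,
    const bool with_header)
{
    std::ostringstream out;
    if (with_header) { write_row(out, column_names, sep); }
    for (const auto& r : rows) { write_row(out, r, sep); }
    return out.str();
}
```

Calling make_result with two column names, two rows, a tab separator, and with_header set to true yields three lines: the header and one line per tree. With with_header set to false, only the two data lines are produced.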
Once initialized, features can be added to or removed from the processor. When only a few features are to be calculated, the processor can be initialized with no features and the desired ones added via method add_feature. Conversely, when many features are needed, but not all of them, a processor can be initialized with all features and the unneeded ones removed via method remove_feature.
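The two workflows can be illustrated with a small stand-in for the feature bookkeeping. This is only a sketch under stated assumptions: the enumeration feature, its sentinel last_value, and the class feature_set are hypothetical and are not the library's types; only the method names mirror those listed on this page.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in enumeration. Only the first two names appear on this page;
// last_value is a hypothetical sentinel used to size the flag array.
enum class feature : std::size_t {
    num_crossings = 0,
    var_num_crossings,
    last_value
};

// Hypothetical stand-in for the feature bookkeeping of a processor.
class feature_set {
    static constexpr std::size_t count =
        static_cast<std::size_t>(feature::last_value);

    // one flag per feature: true if the feature is to be calculated
    std::array<bool, count> m_on{};

    static std::size_t index(const feature f) noexcept {
        const auto i = static_cast<std::size_t>(f);
        assert(i < count); // the sentinel is not a valid feature
        return i;
    }

public:
    // Start with all features enabled (true) or with none (false).
    explicit feature_set(const bool all) noexcept { m_on.fill(all); }

    void add_feature(const feature f) noexcept { m_on[index(f)] = true; }
    void remove_feature(const feature f) noexcept { m_on[index(f)] = false; }
    void clear_features() noexcept { m_on.fill(false); }
    bool has_feature(const feature f) const noexcept { return m_on[index(f)]; }
};
```

Starting from feature_set(false) and calling add_feature suits the case of few features; starting from feature_set(true) and calling remove_feature suits the case where most features are wanted.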
Finally, the treebank file is processed via method process. This method returns an error, if any, via lal::io::treebank_file_error.
The usage of this class is a lot simpler than that of class treebank_collection_reader. For example:
treebank_processor tbproc;
tbproc.init(treebank_input_file, result_filename, "Book_1");
tbproc.add_feature(treebank_feature_type::num_crossings);
tbproc.add_feature(treebank_feature_type::var_num_crossings);
tbproc.process();