|
void | set_join_files (bool v) noexcept |
| Join the resulting files into a single file. More...
|
|
void | set_number_threads (std::size_t n_threads) noexcept |
| Set the number of threads.
|
|
std::size_t | get_num_errors () const noexcept |
| Returns the number of errors that arose during processing.
|
|
const treebank_error & | get_error_type (std::size_t i) const noexcept |
| Get the ith error. More...
|
|
const std::string & | get_error_treebank_filename (std::size_t i) const noexcept |
| Get the treebank's file name where the ith error happened. More...
|
|
const std::string & | get_error_treebank_name (std::size_t i) const noexcept |
| Get the name of the treebank where the ith error happened. More...
|
|
void | set_join_to_file_name (const std::string &join_to) noexcept |
| Sets the name of the file where all values are going to be stored. More...
|
|
void | set_treebank_column_name (const std::string &name) noexcept |
| Sets the name of the column used to group lines according to the treebank.
|
|
treebank_error | init (const std::string &main_file, const std::string &output_directory) noexcept |
| Initialise the processor with a new collection. More...
|
|
treebank_error | process () noexcept |
| Process the treebank collection. More...
|
|
void | add_feature (const treebank_feature &fs) noexcept |
| Adds a feature to the processor. More...
|
|
void | remove_feature (const treebank_feature &fs) noexcept |
| Removes a feature from the processor. More...
|
|
void | set_check_before_process (bool v) noexcept |
| Should the treebank file or collection be checked for errors prior to processing?
|
|
void | clear_features () noexcept |
| Clear the features in the processor.
|
|
void | set_separator (char c) noexcept |
| Sets the separator character. More...
|
|
void | set_verbosity (int k) noexcept |
| Sets the level of verbosity of the process methods. More...
|
|
void | set_output_header (bool h) noexcept |
| Output a header for the treebank result file. More...
|
|
void | set_column_name (const treebank_feature &tf, const std::string &name) noexcept |
| Sets a custom name for the column corresponding to a given feature. More...
|
|
bool | has_feature (const treebank_feature &fs) const noexcept |
| Is a given feature to be calculated? More...
|
|
|
std::vector< std::string > | m_all_individual_treebank_ids |
| The list of names of the treebanks.
|
|
std::string | m_join_to_file = "" |
| The name of the file that joins all result files.
|
|
bool | m_join_files = true |
| Join the files into a single file.
|
|
std::string | m_treebank_column_name = "treebank" |
| Name of the column that identifies each treebank.
|
|
std::size_t | m_num_threads = 1 |
| Number of threads to use.
|
|
std::string | m_column_join_name = "" |
| The name of the column in the join file.
|
|
std::vector< std::tuple< treebank_error, std::string, std::string > > | m_errors_from_processing |
| Set of errors resulting from processing the treebank collection.
|
|
std::string | m_out_dir = "none" |
| Output directory.
|
|
std::string | m_main_file = "none" |
| File containing the list of languages and their treebanks.
|
|
Automatic processing of treebank collections.
This class, whose objects will be referred to as "processors", is meant to ease the processing of a whole treebank collection and to produce data for a fixed set of features available in the library. See the enumeration lal::io::treebank_feature for details on the available features, and see Treebank Collection and Treebank for further details on treebank collections and treebanks.
Every processor must be initialised prior to processing the collection. This is done via method init, which requires the path to the main file and the output directory where the results are going to be stored. The number of threads used to parallelise the processing of the files is set separately via method set_number_threads.
Once initialised, a processor can have features added or removed. When only a few features are needed, the processor can start with no features (see method clear_features) and the desired ones can be added via method add_feature. Conversely, when many but not all features are needed, the processor can start with all features and the unneeded ones can be removed via method remove_feature.
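The add-or-remove workflow described above can be pictured with a small stand-in. This is only a sketch, not the library's implementation: the enumeration `feature`, the class `feature_set`, the helper functions, and the all-enabled starting state are assumptions made for illustration. Only the method names mirror those listed for the processor.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for lal::io::treebank_feature; the real enumeration
// has many more values.
enum class feature : std::size_t {
    num_crossings,
    var_num_crossings,
    other_feature, // placeholder, not a real LAL feature
    count_         // number of features, must be last
};

// Hypothetical model of the set of features kept by a processor.
class feature_set {
public:
    // Assumption of this sketch: every feature starts enabled.
    feature_set() noexcept { m_enabled.fill(true); }

    void add_feature(feature f) noexcept { m_enabled[index(f)] = true; }
    void remove_feature(feature f) noexcept { m_enabled[index(f)] = false; }
    void clear_features() noexcept { m_enabled.fill(false); }
    bool has_feature(feature f) const noexcept { return m_enabled[index(f)]; }

private:
    static std::size_t index(feature f) noexcept {
        return static_cast<std::size_t>(f);
    }
    std::array<bool, static_cast<std::size_t>(feature::count_)> m_enabled;
};

// Few features wanted: start empty, then add.
inline feature_set only_crossings() noexcept {
    feature_set fs;
    fs.clear_features();
    fs.add_feature(feature::num_crossings);
    return fs;
}

// Most features wanted: start full, then remove.
inline feature_set all_but_variance() noexcept {
    feature_set fs;
    fs.remove_feature(feature::var_num_crossings);
    return fs;
}
```

Both helpers end with a well-defined set regardless of how many features exist, which is why picking the shorter of the two routes is purely a matter of convenience.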
Processing a treebank collection with this class will produce a file for every treebank in the collection. These files can be merged together by indicating so via method set_join_files. In that case a new file is created, regardless of the number of treebanks in the collection. See method set_join_to_file_name to set the name of the file that merges the results. Said file will contain an extra column (besides the columns corresponding to the features computed) indicating the treebank each line pertains to. The name of this column can be changed using method set_treebank_column_name.
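To make the shape of the joined file concrete, here is a minimal sketch of the merge step. It is not the library's code: the function `join_results`, the in-memory representation of each result file, the tab separator, and the position of the extra column (last) are all assumptions of this example. The default column name "treebank" is the one documented for m_treebank_column_name.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Joins per-treebank result tables into a single table.
// Each input is (treebank name, table contents); every table is assumed to
// start with the same header line. Placing the extra column last is an
// assumption of this sketch.
inline std::string join_results(
    const std::vector<std::pair<std::string, std::string>>& tables,
    const std::string& treebank_column_name = "treebank",
    char separator = '\t')
{
    std::ostringstream joined;
    bool header_written = false;
    for (const auto& table : tables) {
        const std::string& name = table.first;
        std::istringstream in(table.second);
        std::string line;
        bool first_line = true;
        while (std::getline(in, line)) {
            if (first_line) {
                first_line = false;
                // Keep the header only once, extended with the new column.
                if (!header_written) {
                    joined << line << separator << treebank_column_name << '\n';
                    header_written = true;
                }
                continue;
            }
            // Tag every data line with the treebank it came from.
            joined << line << separator << name << '\n';
        }
    }
    return joined.str();
}
```

With two input tables that share the header `n<TAB>C`, the result has a single header `n<TAB>C<TAB>treebank` followed by the data lines of both tables, each ending with its treebank's name.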
Finally, the treebank collection is processed via method process. Said method returns a value of the enumeration treebank_error. Errors that arose while processing individual treebanks can be inspected via methods get_num_errors, get_error_type, get_error_treebank_filename and get_error_treebank_name.
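The error accessors lend themselves to a simple index-based loop. The sketch below models that loop with a stand-in, since it does not link against the library: `error_kind`, `error_log` and `describe_errors` are hypothetical names, and the order of the two strings in each tuple (file name, then treebank name) is an assumption. The tuple shape itself follows the documented type of m_errors_from_processing.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for lal::io::treebank_error.
enum class error_kind { no_error, file_could_not_be_opened, malformed_tree };

// Same shape as m_errors_from_processing: (error, file name, treebank name).
// The order of the two strings is an assumption of this sketch.
using error_list =
    std::vector<std::tuple<error_kind, std::string, std::string>>;

// Hypothetical wrapper exposing accessors named after the processor's.
class error_log {
public:
    explicit error_log(error_list errors) : m_errors(std::move(errors)) {}

    std::size_t get_num_errors() const noexcept { return m_errors.size(); }
    error_kind get_error_type(std::size_t i) const noexcept {
        return std::get<0>(m_errors[i]);
    }
    const std::string& get_error_treebank_filename(std::size_t i) const noexcept {
        return std::get<1>(m_errors[i]);
    }
    const std::string& get_error_treebank_name(std::size_t i) const noexcept {
        return std::get<2>(m_errors[i]);
    }

private:
    error_list m_errors;
};

// Collects one line per error, as a caller might do after process().
inline std::vector<std::string> describe_errors(const error_log& log) {
    std::vector<std::string> lines;
    for (std::size_t i = 0; i < log.get_num_errors(); ++i) {
        lines.push_back(
            log.get_error_treebank_name(i) + " (" +
            log.get_error_treebank_filename(i) + ")");
    }
    return lines;
}
```

Against the real class, the same loop would call the accessors on the processor itself after process() returns.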
The usage of this class is a lot simpler than that of class treebank_collection_reader. For example:
    treebank_collection_processor tbproc;
    // initialise the processor (remember to check the returned treebank_error)
    tbproc.init(main_file, output_dir);
    // use 4 threads for faster processing
    tbproc.set_number_threads(4);
    tbproc.add_feature(treebank_feature::num_crossings);
    tbproc.add_feature(treebank_feature::var_num_crossings);
    tbproc.process();
    // it is advisable to check for errors