LAL: Linear Arrangement Library 24.10.00
A library focused on algorithms on linear arrangements of graphs.
lal::io::treebank_collection_processor Class Reference

Automatic processing of treebank collections.

#include <treebank_collection_processor.hpp>

Inheritance diagram for lal::io::treebank_collection_processor:
lal::io::_treebank_processor_base

Public Member Functions

void set_join_files (const bool v) noexcept
 Join the resulting files into a single file.
 
void set_number_threads (const std::size_t n_threads) noexcept
 Set the number of threads.
 
std::size_t get_num_errors () const noexcept
 Returns the number of errors that arose during processing.
 
const treebank_file_error & get_error_type (const std::size_t i) const noexcept
 Get the i-th error.
 
const std::string & get_error_treebank_filename (const std::size_t i) const noexcept
 Get the name of the treebank file where the i-th error happened.
 
const std::string & get_error_treebank_name (const std::size_t i) const noexcept
 Get the name of the treebank where the i-th error happened.
 
void set_join_to_file_name (const std::string &join_to) noexcept
 Sets the name of the file where all values are going to be stored.
 
void set_treebank_column_name (const std::string &name) noexcept
 Sets the name of the column used to group lines according to the treebank.
 
treebank_file_error init (const std::string &main_file, const std::string &output_directory) noexcept
 Initialize the processor with a new collection.
 
treebank_file_error process () noexcept
 Process the treebank collection.
 
void add_feature (const treebank_feature_type &fs) noexcept
 Adds a feature to the processor.
 
void remove_feature (const treebank_feature_type &fs) noexcept
 Removes a feature from the processor.
 
void set_check_before_process (const bool v) noexcept
 Should the treebank file or collection be checked for errors prior to processing?
 
void clear_features () noexcept
 Clear the features in the processor.
 
void set_separator (const char c) noexcept
 Sets the separator character.
 
void set_verbosity (const int k) noexcept
 Sets the level of verbosity of the process methods.
 
void set_output_header (const bool h) noexcept
 Output a header for the treebank result file.
 
void set_column_name (const treebank_feature_type &tf, const std::string &name) noexcept
 Sets a custom name for the column corresponding to a given feature.
 
bool has_feature (const treebank_feature_type &fs) const noexcept
 Is a given feature to be calculated?
 

Protected Member Functions

void initialize_column_names () noexcept
 Initializes column names m_column_names.
 

Protected Attributes

std::array< std::string, __treebank_feature_size > m_column_names
 String for each column.
 
std::array< bool, __treebank_feature_size > m_what_fs
 The list of features to be computed.
 
bool m_check_before_process = true
 Check the treebank file or collection for errors prior to processing.
 
char m_separator = '\t'
 Character used as separator.
 
bool m_output_header = true
 Output a header for each file.
 
int m_be_verbose = 0
 The verbosity of the processor.
 

Private Member Functions

treebank_file_error join_all_files () const noexcept
 Joins all resulting files into a single file.
 

Private Attributes

std::vector< std::string > m_all_individual_treebank_ids
 The list of names of the treebanks.
 
std::string m_join_to_file = ""
 The name of the file that joins all result files.
 
bool m_join_files = true
 Join the files into a single file.
 
std::string m_treebank_column_name = "treebank"
 Name of the column that identifies each treebank.
 
std::size_t m_num_threads = 1
 Number of threads to use.
 
std::string m_column_join_name = ""
 The name of the column in the join file.
 
std::vector< std::tuple< treebank_file_error, std::string, std::string > > m_errors_from_processing
 Set of errors resulting from processing the treebank collection.
 
std::string m_out_dir = "none"
 Output directory.
 
std::string m_main_file = "none"
 File containing the list of languages and their treebanks.
 

Detailed Description

Automatic processing of treebank collections.

This class, the objects of which will be referred to as "processors", eases the processing of a whole treebank collection and produces data for a fixed set of features available in the library. See the enumeration lal::io::treebank_feature_type for details on the features available, and see Treebank Collection and Treebank for further details on treebank collections and treebanks.

Every processor must be initialized prior to processing the collection. This is done via method init, which requires the path to the main file and the output directory where the results are going to be stored. The number of threads used to parallelise the processing of the files is set separately via method set_number_threads.

Once initialized, features can be added to or removed from a processor. When only a few features are to be calculated, the processor can be emptied via method clear_features and the desired features added via method add_feature. Conversely, when most (but not all) features are needed, the unneeded ones can be dropped from a processor that has all features via method remove_feature.
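
The bookkeeping behind adding and removing features can be pictured as one Boolean flag per feature, in the spirit of the attribute m_what_fs. The following is only a minimal sketch under that assumption: the enumeration `feature` and the class `feature_set` are hypothetical stand-ins, not the library's types, and the real enumeration lal::io::treebank_feature_type has many more values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for lal::io::treebank_feature_type.
enum class feature : std::size_t {
    num_crossings,
    var_num_crossings,
    last_value_ // sentinel: its value equals the number of features
};

constexpr std::size_t feature_count =
    static_cast<std::size_t>(feature::last_value_);

// Hypothetical sketch: one Boolean flag per feature.
class feature_set {
public:
    void add_feature(const feature f) noexcept { m_flags[index(f)] = true; }
    void remove_feature(const feature f) noexcept { m_flags[index(f)] = false; }
    void clear_features() noexcept { m_flags.fill(false); }
    bool has_feature(const feature f) const noexcept { return m_flags[index(f)]; }

private:
    // Converts a feature into a position of the array of flags.
    static std::size_t index(const feature f) noexcept {
        const std::size_t i = static_cast<std::size_t>(f);
        assert(i < feature_count);
        return i;
    }

    // Value-initialized: every flag starts as false in this sketch.
    std::array<bool, feature_count> m_flags{};
};
```

In this sketch a set starts empty; whether a real processor starts with all or none of its features is not stated here, so calling clear_features before adding a few features is the conservative choice.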

Processing a treebank collection with this class will produce a file for every treebank in the collection. These files can be merged together by indicating so via method set_join_files. A new file will be created, regardless of the number of treebanks in the collection. See method set_join_to_file_name to set the name of the file that merges the results. Said file will contain an extra column (besides the columns corresponding to the features computed) indicating the treebank each line pertains to. The name of this column can be changed using method set_treebank_column_name.
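
The shape of the merged file can be illustrated with a small, self-contained sketch. It is not the library's implementation: the type `treebank_result` and the function `join_results` are hypothetical, and placing the treebank column first is an assumption made only for the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical type: the id of a treebank and the data lines of its
// result file (header excluded).
struct treebank_result {
    std::string id;
    std::vector<std::string> rows;
};

// Builds the lines of a joined file.
// Assumption: the treebank column is the first column.
std::vector<std::string> join_results(
    const std::vector<treebank_result>& results,
    const std::string& feature_header,
    const std::string& treebank_column_name = "treebank",
    const char separator = '\t',
    const bool output_header = true)
{
    std::vector<std::string> joined;
    if (output_header) {
        // The header gains one extra column naming the treebank.
        joined.push_back(treebank_column_name + separator + feature_header);
    }
    for (const treebank_result& r : results) {
        for (const std::string& row : r.rows) {
            // Every data line is tagged with the treebank it comes from.
            joined.push_back(r.id + separator + row);
        }
    }
    return joined;
}
```

The default arguments mirror the defaults listed for m_treebank_column_name, m_separator and m_output_header.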

Finally, the treebank collection is processed via method process. This method returns an error, if any, via lal::io::treebank_file_error. Further errors can be checked via methods get_num_errors, get_error_type, get_error_treebank_filename, get_error_treebank_name.
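
Retrieving the per-treebank errors amounts to a loop over the getters named above. The sketch below is written as a template so that it compiles without the library: `describe_errors` and `stub_processor` are hypothetical names, and only get_num_errors, get_error_treebank_name and get_error_treebank_filename are taken from this class. With a real processor the loop is meaningful only when process returned lal::io::treebank_file_error_type::some_treebank_file_failed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collects one line per error from any processor-like object that
// exposes the three getters used below.
template <class Processor>
std::vector<std::string> describe_errors(const Processor& proc)
{
    std::vector<std::string> report;
    for (std::size_t i = 0; i < proc.get_num_errors(); ++i) {
        report.push_back(
            proc.get_error_treebank_name(i) + ": " +
            proc.get_error_treebank_filename(i)
        );
    }
    return report;
}

// Hypothetical stub used to exercise the sketch without the library.
struct stub_processor {
    std::vector<std::string> names;
    std::vector<std::string> files;

    std::size_t get_num_errors() const noexcept { return names.size(); }
    const std::string& get_error_treebank_name(const std::size_t i) const noexcept {
        assert(i < names.size());
        return names[i];
    }
    const std::string& get_error_treebank_filename(const std::size_t i) const noexcept {
        assert(i < files.size());
        return files[i];
    }
};
```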

The usage of this class is a lot simpler than that of class treebank_collection_reader. For example:

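treebank_collection_processor tbproc;
// initialize the processor (remember to check for errors)
const auto init_err = tbproc.init(main_file, output_dir);
// start without features and use 4 threads for faster processing.
tbproc.clear_features();
tbproc.set_number_threads(4);
tbproc.add_feature(treebank_feature_type::num_crossings);
tbproc.add_feature(treebank_feature_type::var_num_crossings);
const auto process_err = tbproc.process();
// it is advisable to check for errors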

Member Function Documentation

◆ add_feature()

void lal::io::_treebank_processor_base::add_feature ( const treebank_feature_type & fs)
inline noexcept inherited

Adds a feature to the processor.

Parameters
fs: Feature to be added.

◆ get_error_treebank_filename()

const std::string & lal::io::treebank_collection_processor::get_error_treebank_filename ( const std::size_t i) const
inline nodiscard noexcept

Get the name of the treebank file where the i-th error happened.

Parameters
i: The index of the error, an unsigned integer.
Returns
The name of the treebank file where the i-th error happened as a string.
Precondition
Can only be checked when process returns lal::io::treebank_file_error_type::some_treebank_file_failed.

◆ get_error_treebank_name()

const std::string & lal::io::treebank_collection_processor::get_error_treebank_name ( const std::size_t i) const
inline nodiscard noexcept

Get the name of the treebank where the i-th error happened.

Parameters
i: The index of the error, an unsigned integer.
Returns
The name of the treebank where the i-th error happened as a string.
Precondition
Can only be checked when process returns lal::io::treebank_file_error_type::some_treebank_file_failed.

◆ get_error_type()

const treebank_file_error & lal::io::treebank_collection_processor::get_error_type ( const std::size_t i) const
inline nodiscard noexcept

Get the i-th error.

Parameters
i: The index of the error, an unsigned integer.
Returns
An instance of lal::io::treebank_file_error describing the i-th error.
Precondition
Can only be checked when process returns lal::io::treebank_file_error_type::some_treebank_file_failed.

◆ get_num_errors()

std::size_t lal::io::treebank_collection_processor::get_num_errors ( ) const
inline nodiscard noexcept

Returns the number of errors that arose during processing.

Precondition
Can only be checked when process returns lal::io::treebank_file_error_type::some_treebank_file_failed.

◆ has_feature()

bool lal::io::_treebank_processor_base::has_feature ( const treebank_feature_type & fs) const
inline nodiscard noexcept inherited

Is a given feature to be calculated?

Parameters
fs: The feature being queried.
Returns
True or False depending on whether the feature was added or removed.

◆ init()

treebank_file_error lal::io::treebank_collection_processor::init ( const std::string & main_file,
const std::string & output_directory )
nodiscard noexcept

Initialize the processor with a new collection.

Parameters
main_file: File listing all the treebanks.
output_directory: Directory where the result files are to be stored.
Returns
The type of the error, if any. The list of errors that this method can return is:

◆ join_all_files()

treebank_file_error lal::io::treebank_collection_processor::join_all_files ( ) const
nodiscard private noexcept

Joins all resulting files into a single file.

Returns
An error code, if any.

◆ process()

treebank_file_error lal::io::treebank_collection_processor::process ( )
nodiscard noexcept

Process the treebank collection.

This method produces the information as explained in this class' description. However, it may fail to do so. In this case it will return a value different from lal::io::treebank_file_error_type::no_error.

This function uses attributes m_separator and m_output_header to format the output data. It also outputs the current progress if m_be_verbose is greater than or equal to 1.

Moreover, it gathers the errors that may have occurred during processing. If there are any, see methods get_num_errors, get_error_type, get_error_treebank_filename, get_error_treebank_name.

Returns
The type of the error, if any. The list of errors that this method can return is:

See methods get_num_errors, get_error_treebank_filename, get_error_treebank_name to know how to retrieve these errors.

Precondition
Initialisation did not return any errors.

◆ remove_feature()

void lal::io::_treebank_processor_base::remove_feature ( const treebank_feature_type & fs)
inline noexcept inherited

Removes a feature from the processor.

Parameters
fs: Feature to be removed.

◆ set_column_name()

void lal::io::_treebank_processor_base::set_column_name ( const treebank_feature_type & tf,
const std::string & name )
inline noexcept inherited

Sets a custom name for the column corresponding to a given feature.

This does not work for features

◆ set_join_files()

void lal::io::treebank_collection_processor::set_join_files ( const bool v)
inline noexcept

Join the resulting files into a single file.

Parameters
v: A Boolean value.

◆ set_join_to_file_name()

void lal::io::treebank_collection_processor::set_join_to_file_name ( const std::string & join_to)
inline noexcept

Sets the name of the file where all values are going to be stored.

If this name is not set, the class will construct one using the name of the main file and appending the string "_full" to it. If, on the contrary, a path is given and it is not a full path, then it is considered to be relative to the working directory.

Parameters
join_to: String with the name of the file.

◆ set_output_header()

void lal::io::_treebank_processor_base::set_output_header ( const bool h)
inline noexcept inherited

Output a header for the treebank result file.

Default is true.

Parameters
h: Output header or not.

◆ set_separator()

void lal::io::_treebank_processor_base::set_separator ( const char c)
inline noexcept inherited

Sets the separator character.

The default separator is a tabulator character '\t'.

Parameters
c: The separator character.

◆ set_verbosity()

void lal::io::_treebank_processor_base::set_verbosity ( const int k)
inline noexcept inherited

Sets the level of verbosity of the process methods.

Default is 0 (i.e., no verbosity at all). Verbosity is classified by levels:

  • Level 1: outputs progress messages.
  • Level 2: outputs error messages.

Parameters
k: Verbosity level.

Member Data Documentation

◆ m_be_verbose

int lal::io::_treebank_processor_base::m_be_verbose = 0
protected inherited

The verbosity of the processor.

When set to a value greater than or equal to 1, method process will output progress messages.

◆ m_errors_from_processing

std::vector<std::tuple<treebank_file_error, std::string, std::string> > lal::io::treebank_collection_processor::m_errors_from_processing
private

Set of errors resulting from processing the treebank collection.

  • First value: an instance of lal::io::treebank_file_error.
  • Second value: full path to the treebank file.
  • Third value: id of the treebank where this happened (retrieved from the main file).

Errors are pushed to this member only when processing could actually be initiated.
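
The relation between this container and the public getters can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `file_error` is a hypothetical stand-in for lal::io::treebank_file_error, `error_log` is a hypothetical class, and the mapping of each getter to a tuple position simply follows the order of the values listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for lal::io::treebank_file_error.
struct file_error {
    std::string message;
};

// Hypothetical sketch of an error log whose entries are tuples
// (error, full path to the treebank file, treebank id).
class error_log {
public:
    void push(file_error err, std::string path, std::string id) {
        m_errors.emplace_back(std::move(err), std::move(path), std::move(id));
    }

    std::size_t get_num_errors() const noexcept { return m_errors.size(); }

    // First value of the i-th tuple.
    const file_error& get_error_type(const std::size_t i) const noexcept {
        assert(i < m_errors.size());
        return std::get<0>(m_errors[i]);
    }
    // Second value of the i-th tuple.
    const std::string& get_error_treebank_filename(const std::size_t i) const noexcept {
        assert(i < m_errors.size());
        return std::get<1>(m_errors[i]);
    }
    // Third value of the i-th tuple.
    const std::string& get_error_treebank_name(const std::size_t i) const noexcept {
        assert(i < m_errors.size());
        return std::get<2>(m_errors[i]);
    }

private:
    std::vector<std::tuple<file_error, std::string, std::string>> m_errors;
};
```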

