LAL: Linear Arrangement Library 23.01.00
A library focused on algorithms on linear arrangements of graphs.
lal::io::treebank_processor Class Reference

Automatic processing of treebank files.

#include <treebank_processor.hpp>

Inheritance diagram for lal::io::treebank_processor:
lal::io::_process_treebank_base

Public Member Functions

treebank_error init (const std::string &treebank_input_file, const std::string &output_file, const std::string &treebank_id="") noexcept
 Initialise the processor with a new treebank file.
 
treebank_error process () noexcept
 Process the treebank file.
 
void add_feature (const treebank_feature &fs) noexcept
 Adds a feature to the processor.
 
void remove_feature (const treebank_feature &fs) noexcept
 Removes a feature from the processor.
 
void set_check_before_process (bool v) noexcept
 Should the treebank file or collection be checked for errors prior to processing?
 
void clear_features () noexcept
 Clear the features in the processor.
 
void set_separator (char c) noexcept
 Sets the separator character.
 
void set_verbosity (int k) noexcept
 Sets the level of verbosity of the process methods.
 
void set_output_header (bool h) noexcept
 Output a header for the treebank result file.
 
void set_column_name (const treebank_feature &tf, const std::string &name) noexcept
 Sets a custom name for the column corresponding to a given feature.
 
bool has_feature (const treebank_feature &fs) const noexcept
 Is a given feature to be calculated?
 

Protected Member Functions

void initialise_column_names () noexcept
 Initialises column names m_column_names.
 

Protected Attributes

std::array< std::string, __treebank_feature_size > m_column_names
 String for each column.
 
std::array< bool, __treebank_feature_size > m_what_fs
 The list of features to be computed.
 
bool m_check_before_process = true
 Check the treebank file or collection for errors prior to processing.
 
char m_separator = '\t'
 Character used as separator.
 
bool m_output_header = true
 Output a header for each file.
 
int m_be_verbose = 0
 The verbosity of the processor.
 

Private Member Functions

template<class TREE , class OUT_STREAM >
void process_tree (const TREE &rT, double *props, char *prop_set, OUT_STREAM &out_lab_file) noexcept
 Process a single tree in a treebank.
 
template<class OUT_STREAM >
void output_tree_type_header (OUT_STREAM &out_lab_file) const noexcept
 Output the header for the tree types.
 
template<class OUT_STREAM >
void output_syndepstruct_type_header (OUT_STREAM &out_lab_file) const noexcept
 Output the header for the syntactic dependency tree types.
 
template<class TREE_TYPE , class OUT_STREAM >
void output_tree_type_values (TREE_TYPE &t, OUT_STREAM &out_lab_file) const noexcept
 Output the values for the tree types.
 
template<class TREE_TYPE , class OUT_STREAM >
void output_syndepstruct_type_values (const TREE_TYPE &t, uint64_t C, OUT_STREAM &out_lab_file) const noexcept
 Output the values for the syntactic dependency tree types.
 

Private Attributes

std::string m_treebank_filename = "none"
 Treebank file to be processed.
 
std::string m_output_file = "none"
 Output file, where the results are stored.
 
std::string m_treebank_id = ""
 Treebank identifier.
 

Detailed Description

Automatic processing of treebank files.

This class, the objects of which will be referred to as "processors", aims to ease the processing of a treebank file and to produce data for a fixed set of features available in the library. See the enumeration lal::io::treebank_feature for details on the features available.

This class is meant to process a single treebank file only (see Treebank for further details on treebank files).

Every processor must be initialised prior to processing the treebank file. This is done via method init, which requires the path to the treebank file and the name of the output file (the treebank result) where the results are going to be stored. It also accepts an optional identifier for the treebank. Processing a treebank file with this class will produce a single file, with as many columns as features added to the processor. The columns are separated with a separating character (see method set_separator); this file will contain a header unless method set_output_header has been called with false (the default is true). Progress and error messages can be controlled via method set_verbosity.

Once initialised, features can be added to or removed from the processor. When only a few features are needed, the processor can be left with no features (see method clear_features) and the required ones added via method add_feature. Conversely, when many features are needed, but not all of them, the unwanted ones can be removed from a processor that holds all features via method remove_feature.
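The add/remove workflow described above can be pictured with a small, self-contained sketch. It is not LAL's implementation: the enumeration feature and the class feature_set are hypothetical stand-ins, and the only detail taken from this page is that the processor keeps one Boolean flag per feature in a std::array (see attribute m_what_fs).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical feature enumeration, for illustration only.
// LAL's real enumeration is lal::io::treebank_feature.
enum class feature : std::size_t {
    num_nodes,
    num_crossings,
    var_num_crossings,
    size_ // number of features; must stay last
};

constexpr std::size_t feature_count = static_cast<std::size_t>(feature::size_);

// Hypothetical set of features stored as one flag per enumerator.
class feature_set {
public:
    void add(feature f) noexcept { m_flags[index(f)] = true; }
    void remove(feature f) noexcept { m_flags[index(f)] = false; }
    bool has(feature f) const noexcept { return m_flags[index(f)]; }
    void clear() noexcept { m_flags.fill(false); }
    void add_all() noexcept { m_flags.fill(true); }

    // Number of features currently selected.
    std::size_t count() const noexcept {
        std::size_t n = 0;
        for (const bool b : m_flags) { n += b ? 1 : 0; }
        return n;
    }

private:
    static std::size_t index(feature f) noexcept {
        const std::size_t i = static_cast<std::size_t>(f);
        assert(i < feature_count); // size_ is a count, not a feature
        return i;
    }

    std::array<bool, feature_count> m_flags{}; // value-initialised: all false
};
```

With such a layout, starting from an empty set and adding two features costs two assignments, and starting from a full set and removing one costs a single assignment, which is why both directions are convenient.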

Finally, the treebank file is processed via method process. This method returns a value of the enumeration treebank_error.

The usage of this class is a lot simpler than that of class treebank_collection_reader. For example:

lal::io::treebank_processor tbproc;
// initialise the processor without features (remember to check for errors)
tbproc.init(treebank_input_file, result_filename, "Book_1");
tbproc.add_feature(lal::io::treebank_feature::num_crossings);
tbproc.add_feature(lal::io::treebank_feature::var_num_crossings);
tbproc.process();
// it is advisable to check for errors
Member Function Documentation

◆ add_feature()

void lal::io::_process_treebank_base::add_feature ( const treebank_feature &  fs )
inline noexcept inherited

Adds a feature to the processor.

Parameters
fs  Feature to be added.

◆ has_feature()

bool lal::io::_process_treebank_base::has_feature ( const treebank_feature &  fs ) const
inline noexcept inherited

Is a given feature to be calculated?

Parameters
fs  The feature being queried.
Returns
True if the feature is currently in the processor (it will be calculated); false if it is not.

◆ init()

treebank_error lal::io::treebank_processor::init ( const std::string &  treebank_input_file,
const std::string &  output_file,
const std::string &  treebank_id = "" 
)
noexcept

Initialise the processor with a new treebank file.

Parameters
treebank_input_file  The treebank file to be processed.
output_file  File where the results are to be stored.
treebank_id  A nickname for this treebank (for example, an ISO code).
Returns
The type of the error, if any. The list of errors that this method can return is:

◆ process()

treebank_error lal::io::treebank_processor::process ( )
noexcept

Process the treebank file.

This method produces the information as explained in this class' description. However, it may fail to do so. In this case it will return a value different from lal::io::treebank_error_type::no_error.

This function uses attributes m_separator and m_output_header to format the output data. It also outputs the current progress if m_be_verbose is greater than or equal to 1.

Returns
The type of the error, if any. The list of errors that this method can return is:
Precondition
Initialisation did not return any errors.
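The precondition above amounts to "check the value returned by init before calling process". The following self-contained sketch shows one way to structure that check. All names in it (error_kind, step_result, init_then_process) are hypothetical and are not part of LAL; in real code the two steps would be the calls to init and process, and the returned treebank_error would be inspected instead.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical error value, for illustration only (not LAL's treebank_error).
enum class error_kind { no_error, input_not_found, output_not_writable };

struct step_result {
    error_kind kind;
    std::string message;

    bool failed() const noexcept { return kind != error_kind::no_error; }

    static step_result ok() { return step_result{error_kind::no_error, ""}; }

    static step_result failure(error_kind k, std::string msg) {
        assert(k != error_kind::no_error); // a failure needs a real error kind
        return step_result{k, std::move(msg)};
    }
};

// Runs init_step first and runs process_step only if initialisation
// succeeded, so the precondition of the processing step always holds.
template <class InitStep, class ProcessStep>
step_result init_then_process(InitStep init_step, ProcessStep process_step) {
    step_result r = init_step();
    if (r.failed()) {
        return r; // report the initialisation error; do not process
    }
    return process_step();
}
```

Returning the first error unchanged keeps its message available to the caller, who can then decide whether to print it or abort.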

◆ remove_feature()

void lal::io::_process_treebank_base::remove_feature ( const treebank_feature &  fs )
inline noexcept inherited

Removes a feature from the processor.

Parameters
fs  Feature to be removed.

◆ set_column_name()

void lal::io::_process_treebank_base::set_column_name ( const treebank_feature &  tf,
const std::string &  name 
)
inline noexcept inherited

Sets a custom name for the column corresponding to a given feature.

This does not work for features

◆ set_output_header()

void lal::io::_process_treebank_base::set_output_header ( bool  h )
inline noexcept inherited

Output a header for the treebank result file.

Default is true.

Parameters
h  Whether to output the header.

◆ set_separator()

void lal::io::_process_treebank_base::set_separator ( char  c )
inline noexcept inherited

Sets the separator character.

The default separator is a tabulator character '\t'.

Parameters
c  The separator character.
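To make the effect of the separator and of the header option concrete, here is a minimal, self-contained sketch of how a result file with one column per feature could be laid out. The functions join_fields and write_table and the column names used with them are hypothetical; the exact layout of the file written by LAL may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Joins the fields of one row, placing the separator between them.
std::string join_fields(const std::vector<std::string>& fields, char sep) {
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) { line += sep; }
        line += fields[i];
    }
    return line;
}

// Writes an optional header line followed by one line per row.
// The defaults mirror the ones documented on this page: '\t' and true.
std::string write_table(
    const std::vector<std::string>& column_names,
    const std::vector<std::vector<std::string>>& rows,
    char sep = '\t',
    bool output_header = true)
{
    std::ostringstream out;
    if (output_header) {
        out << join_fields(column_names, sep) << '\n';
    }
    for (const auto& row : rows) {
        assert(row.size() == column_names.size()); // one value per column
        out << join_fields(row, sep) << '\n';
    }
    return out.str();
}
```

A separator that never occurs inside a value (a tabulator is a common choice for numeric data) keeps the file trivially parseable by spreadsheet and statistics software.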

◆ set_verbosity()

void lal::io::_process_treebank_base::set_verbosity ( int  k )
inline noexcept inherited

Sets the level of verbosity of the process methods.

Default is 0 (i.e., no verbosity at all). Verbosity is classified by levels:

  • Level 1: outputs progress messages.
  • Level 2: outputs error messages.

Parameters
k  Verbosity level.
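The levels above can be read as thresholds on the value passed to set_verbosity. The sketch below is self-contained and assumes that the levels are cumulative (level 2 prints progress messages as well as error messages), which this page does not state explicitly. The types message_kind and message and the functions is_shown and render_log are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical message categories, matching the two levels listed above.
enum class message_kind { progress, error };

struct message {
    message_kind kind;
    std::string text;
};

// Assumption: levels are cumulative, so each kind has a minimum level.
bool is_shown(message_kind kind, int verbosity) noexcept {
    switch (kind) {
    case message_kind::progress: return verbosity >= 1;
    case message_kind::error:    return verbosity >= 2;
    }
    assert(false && "unknown message kind");
    return false;
}

// Returns the text that would be printed at the given verbosity level.
std::string render_log(const std::vector<message>& messages, int verbosity) {
    std::ostringstream out;
    for (const message& m : messages) {
        if (is_shown(m.kind, verbosity)) {
            out << m.text << '\n';
        }
    }
    return out.str();
}
```

With the default level 0 nothing is printed, so a caller that wants diagnostics has to raise the level before processing.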

Member Data Documentation

◆ m_be_verbose

int lal::io::_process_treebank_base::m_be_verbose = 0
protected inherited

The verbosity of the processor.

When set to a value greater than or equal to 1, method process will output progress messages.


The documentation for this class was generated from the following file: