NAME

Text::CSV::Hash - hash based CSV file handing, including comparisons


SYNOPSIS

Text::CSV::Hash provides simple hash based methods for loading, saving, and comparing CSV files. Will handle standard quoting and entries with newlines.


DESCRIPTION

Methods

$csv->config()
Allows various configuration options to be set. Should be passed a hash with any of the following:
lookupscores
hash reference to default lookup fields and scores - see lookup() for more details. Load will fail if any are missing.

mapfields
reference to hash that maps column names on load. Mapping happens before column names are checked against $lookupscores.

max_data_rows
maximum number of data rows to read. Useful for quickly reading the start of a csv file.

name_row_offset
offset to row containing column names

quote
character used to quote fields. Defaults to '``'.

separator
field separator. Defaults to ','.

skip_blank_lines
true/false - are blank (containing no non-whitespace) lines in the file ignored

trim
true/false - trim whitespace from start and end of each field.

$csv->dump($row, $row2)
If called with one csv row reference displays each field label and its values. If called with two, displays the second's value where it differs from the first.

$csv->labels()
Return the current set of standard labels for current csv as an array.

$csv->labels_add(@labels)
Add @labels to the set of standard labels for all csv rows. The standard labels are used by dump(), save() etc.

$csv->labels_set(@labels)
Set the standard labels for all csv rows to @labels. The standard labels are used by dump(), save() etc.

$csv->linenolabel
The name of the field containing the file linenumber for each row. This is set by $csv->read automatically. It defaults to 'lineno', but if that conflicts with a label in the file it will prepend '_' until a unique name is found.

$csv->load($file, $conf);
Load csv file '$file'. If $conf is present it can specify any of the options valid for $csv->config.

The first line of the file is expected to contain column labels, except where overridden by name_row_offset or skip_blank_lines.

Returns undef on error and error value can be displayed with $csv->error().

$csv->lookup($row, $minscore, $lookupscores);
Lookup row $row in csv. Other parameters are optional, but if specified:
$minscore - Minimum score for matching fields. Defaults to 1.
$lookupscores - Fields and scores for lookups. Defaults to $lookupscores passed to load().

lookup() is intended to handle the case where some fields may mismatch, and multiple match() calls may be needed to find a suitable row. If it is unable to match all fields in $lookupscores it will cycle through all possible combnations of fields in $lookupscores starting with the highest possible score and working down. The first combination that matches will be used. Any match with a score lower than $minscore will be rejected. For example, if the csv file has the fields 'first name', 'last name', 'country' and you are willing to accept a mismatch of either 'first name' or 'country', you could use

        $lookupscores = { 'last name' => 2,
                          'first name' => 1,
                          'country' => 1 };
        $minscore = 3;

Lookup returns a list of three items ($match, $ambig, $mismatch);

$match - reference to exact match if only one match found.
$ambig - reference to array of ambiguous matches if more than one match found.
$mismatch - reference to array of field names that mismatched if not exact match.

If you do not care about details of ambiguous matches or mismatched fields you can treat them as boolean or just ignore them, as in ($match) = $csv->lookup($row);

$csv->match($row, @fields);
Return a list of references to rows in $csv which match $row for all fields in @fields. Typically $row would be from a different csv object. match() builds a hash cache internally to speedup repeated lookups on the same csv object.

If the returned list has one entry it is an exact match, more than one indicates an ambiguous match, and zero no matches.

For more sophisticated matching see lookup().

new Text::CSV::Hash;
Create a new $csv object. If passed a reference to an array of hashes, set the labels to the keys in the first entry, and the rows to the contents of the array.

$csv->rows()
Returns the reference to an array of all rows in the csv. Each row is a hash of label => value. If passed a reference to an array of hashes will set the rows to the contents of the array.

$csv->save($file, $conf);
Save csv to file. If $map was specified on load, saved column names are the values after mapping. If $file is not specified it will default to the file used in load().

Returns undef on error and error value can be displayed with $csv->error().

If present, $conf can specify characteristics of the file. Note these will default to those set when the file was loaded.

quote - charater used to quote fields. Defaults to '``'.
separator - field separator. Defaults to ','.


EXAMPLE1

    # Sample script to load file.csv, double the 'cost' columns, and save
    my($csv);
    $csv = new Text::CSV::Hash;
    $csv->load('file.csv') || die $csv->error();
    foreach my $row (@{$csv->rows()})
        { $row->{cost} *= 2; }
    $csv->save() || die $csv->error();


EXAMPLE2

    # Sample script to produce merged versions of file1.csv and file2.csv
    #   - matching on 'first name', 'last name', and 'country' columns
    #   - add two columns 'results', and 'mismatches'
    #   - allowing for any one of the three fields to mismatch
    #   - trimming spaces from each field.
    my($csv1, $csv2, %merge_match);
    $csv1 = new Text::CSV::Hash;
    $csv2 = new Text::CSV::Hash;
    %merge_match = ('first name' => 1, 'last name' => 1, 'country' => 1);
    $csv1->load('file1.csv', {trim => 1}, \%lookups) || die $csv1->error();
    $csv2->load('file2.csv', {trim => 1}, \%lookups) || die $csv2->error();
    $csv2->labels_add('results', 'mismatches');
    foreach my $row2 (@{$csv2->rows()})
        {
        my($match1, $ambig, $mismatches, $resfield, $misfield);
        ($match1, $ambig, $mismatches) = $csv1->lookup($row2, 2);
        if ($ambig)
            { $resfield = 'Ambiguous'; }
        elsif (!$match1)
            { $resfield = 'Not Found'; }
        elsif ($mismatches)
            {
            $resfield = "Mismatch ".join(' + ', @${mismatches});
            foreach my $mis (@${mismatches})
                { $misfield .= qq#("$row2->{$mis}" - "$match1->{$mis}") #; }
            chop $misfield;
            }
        else
            { $resfield = 'Match all'; }
        $row2->{results} = $resfield;
        $row2->{mismatches} = $misfield;
        if ($match1)
            {
            # Merge fields both ways
            foreach($csv1->labels())
                { $row2->{$_} ||= $match1->{$_}; }
            foreach($csv2->labels())
                { $match1->{$_} ||= $csv2->{$_}; }
            }
        }
    $csv1->labels_add($csv2->labels());
    $csv1->save('merged_'.$csv1->{filename}) || die $csv1->error();
    $csv2->labels_add($csv1->labels());
    $csv2->save('merged_'.$csv2->{filename}) || die $csv2->error();