cca/koha_qa

Koha Records QA

Various bibliographic metadata tools.

Setup

To use summon.py, obtain a Summon API key from the Ex Libris Developer Network (see their documentation). To use summon_update.py, add our Summon SFTP credentials to the .env file.

Prefix python commands with uv run to use the project's virtual environment.

uv sync
cp example.env .env
vim .env # edit in secret values

break.py

Split MARC files into smaller subsets named like records-1.mrc, records-2.mrc, etc. This does the same thing as MarcEdit's MARCSplit feature, if you would prefer not to use the command line. Koha can only process so many records at once without failing, so we tend to import records in batches of 500 or 1000.

# break file.mrc into batches of 500 records
uv run python break.py 500 file.mrc
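The batching idea can be sketched with the standard library alone. This is a minimal illustration, not the code in break.py: it assumes well-formed ISO 2709 input, where every record ends with the 0x1D record terminator, and the function names are hypothetical.

```python
from pathlib import Path

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte


def split_records(data: bytes) -> list[bytes]:
    """Return each raw MARC record with its terminator restored."""
    parts = data.split(RECORD_TERMINATOR)
    return [part + RECORD_TERMINATOR for part in parts if part.strip()]


def batch(records: list[bytes], size: int) -> list[list[bytes]]:
    """Group records into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [records[i : i + size] for i in range(0, len(records), size)]


def write_batches(source: Path, size: int, stem: str = "records") -> list[Path]:
    """Write records-1.mrc, records-2.mrc, ... next to the source file."""
    records = split_records(source.read_bytes())
    paths = []
    for number, group in enumerate(batch(records, size), start=1):
        path = source.with_name(f"{stem}-{number}.mrc")
        path.write_bytes(b"".join(group))
        paths.append(path)
    return paths
```

Calling `write_batches(Path("file.mrc"), 500)` would produce one output file per 500 records, with any remainder in the last file.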

comics_plus.py

Add our proxy server prefix to Comics Plus MARC records and warn if there are any corrected or deleted records. See our wiki page on Comics Plus for more information and why we cannot accomplish this with Koha's MARC modification templates.

usage: comics_plus.py [-h] <file.mrc> [<output.mrc>]

Process Comics Plus MARC records

positional arguments:
  <file.mrc>    MARC file to process
  <output.mrc>  Output filename, defaults to YYYY-MM-DD-comicsplus.mrc

options:
  -h, --help    show this help message and exit
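As a rough sketch of the two behaviors described above (prefixing URLs and the dated default filename), the helpers below show one way they might look. The proxy prefix is a placeholder, the function names are hypothetical, and the real script may handle URLs differently.

```python
from datetime import date
from typing import Optional

# Hypothetical prefix: the real proxy server URL is not given in this README.
PROXY_PREFIX = "https://proxy.example.edu/login?url="


def add_proxy_prefix(url: str, prefix: str = PROXY_PREFIX) -> str:
    """Prepend the proxy prefix unless the URL is empty or already has it."""
    url = url.strip()
    if not url or url.startswith(prefix):
        return url
    return prefix + url


def default_output_name(today: Optional[date] = None) -> str:
    """Build the default output filename, YYYY-MM-DD-comicsplus.mrc."""
    today = today or date.today()
    return f"{today.isoformat()}-comicsplus.mrc"
```

Checking for an existing prefix keeps the operation safe to run twice on the same file.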

dupes.py

Find duplicate MARC records based on the 001 control field. We could extend this to detect duplicates by other criteria as well.

uv run python dupes.py file.mrc
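The 001-based check can be illustrated without any MARC library by reading the ISO 2709 leader and directory directly. This is a sketch under the assumption of well-formed records, not the implementation in dupes.py, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional

FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def control_number(record: bytes) -> Optional[str]:
    """Return the 001 value from one raw MARC record, or None if absent."""
    base = int(record[12:17])  # leader positions 12-16: base address of data
    directory = record[24 : base - 1]  # 12-byte entries after the 24-byte leader
    for i in range(0, len(directory), 12):
        entry = directory[i : i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        if tag == b"001":
            field = record[base + start : base + start + length]
            return field.rstrip(FIELD_TERMINATOR).decode("utf-8", "replace").strip()
    return None


def find_duplicates(data: bytes) -> dict[str, int]:
    """Map each 001 value seen more than once to its number of occurrences."""
    records = [r for r in data.split(RECORD_TERMINATOR) if r.strip()]
    counts = Counter(control_number(r) for r in records)
    return {cn: n for cn, n in counts.items() if cn is not None and n > 1}
```

Records with no 001 field are ignored here rather than grouped together as duplicates of each other.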

link_check.py

Check URLs in Koha 856$u fields. See the readme for details.

summon.py

Check if MARC record(s) are in CCA's Summon index.

# search for single title with detailed search results output
> uv run python summon.py --debug "the color purple"
https://cca.summon.serialssolutions.com/search?s.q=%22the+color+purple%22&s.fvf=SourceType%2CLibrary+Catalog%2Cf
Search Results: 10
Title: The color purple
Authors: Walker, Alice
Publication Date: 1983.
ISBNs: 0671668781, 9780671668785, 0156031825, 9780156031820, 0671617028, 9780671617028, 9780151191543, 0151191549
Type: Book
Summon Link: https://cca.summon.serialssolutions.com/search?bookMark=...
# more search results printed...

# iterate over MARC records with summary output and CSV of missing records
> uv run python summon.py --missing missing.csv file.mrc

Total Records:      50
Had Search Results: 50
Had ISBN:           46
ISBN Matches:       45

A record is considered "missing" if it has an ISBN and no Summon search result matches it; records without ISBNs are not considered missing. The Summon search is a title search, so records with short, generic titles like "Art Now" can be falsely reported as "missing" when the record with the matching ISBN isn't in the first page of 10 search results returned.
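The "missing" rule can be expressed as a small predicate. The sketch below is illustrative only: the function names are hypothetical, the normalization is an assumption, and it does not convert between ISBN-10 and ISBN-13, which the real script may or may not do.

```python
from typing import Iterable


def normalize_isbn(raw: str) -> str:
    """Keep the first token, drop hyphens, and uppercase a trailing 'x'."""
    parts = raw.split()
    return parts[0].replace("-", "").upper() if parts else ""


def is_missing(record_isbns: Iterable[str], result_isbns: Iterable[str]) -> bool:
    """True when the record has ISBNs and none appear in the search results."""
    wanted = {normalize_isbn(i) for i in record_isbns} - {""}
    if not wanted:
        return False  # no ISBN on the record, so it is never counted as missing
    found = {normalize_isbn(i) for i in result_isbns}
    return wanted.isdisjoint(found)
```

Here `result_isbns` stands for every ISBN collected from the first page of results, so a match that only appears on a later page still counts as missing.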

summon_update.py

Update our Summon index with a file of MARC records. The script can upload files of updated or deleted records. A "full" update requires contacting support, but this script can still upload the file. To export records from Koha, go to the staff side > Cataloging > Export data.

Usage: summon_update.py [OPTIONS] FILE_PATH

  Puts a file to the Summon SFTP server.

Options:
  -h, --help                      Show this message and exit.
  -t, --type [updates|deletes|full]
                                  type of update
  -d, --debug                     enable SFTP debug logging

split_lang_codes.py

One-time fix for outdated language codes and for all our language codes being stuffed into a single $a subfield, e.g. 041 1_ $aengjap -> 041 1_ $aeng$ajpn. Usage:

uv run python split_lang_codes.py test # run test suite
uv run python split_lang_codes.py fix --debug input.mrc # print changes to console, no file changes
uv run python split_lang_codes.py fix input.mrc output.mrc # write changes to output.mrc
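The core transformation can be sketched as follows. Only the jap -> jpn replacement comes from the example above; the mapping table and function name are hypothetical, and the real script presumably covers more codes and works on MARC fields rather than plain strings.

```python
# Only jap -> jpn is taken from the README example; a real table would be longer.
OUTDATED_CODES = {"jap": "jpn"}


def split_lang_codes(value: str) -> list[str]:
    """Split a run of 3-letter codes and replace any outdated ones."""
    value = value.strip().lower()
    if len(value) % 3 != 0:
        raise ValueError(f"not a multiple of 3 characters: {value!r}")
    codes = [value[i : i + 3] for i in range(0, len(value), 3)]
    return [OUTDATED_CODES.get(code, code) for code in codes]
```

Each returned code would then become its own $a subfield in the 041 field.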

LICENSE

ECL Version 2.0

Code from Summon API Toolkit may come with a different license but none was stated in their GitHub repo.
