User Manual

Indexing Documents

New documents may be indexed via the TYPO3 command line interface (CLI).

Index single document

The command kitodo:index is used for indexing a single document:

./vendor/bin/typo3 kitodo:index -d http://example.com/path/mets.xml -p 123 -s dlfCore1

-d|--doc (required)
This may be the UID of an existing document in tx_dlf_documents or the URL of a METS XML file. If the URL is already known as the location of a document in tx_dlf_documents, the file is processed anyway and the records in the database and Solr index are updated.
Hint: Do not encode the URL! If the path contains spaces, enclose the URL in quotation marks.
Example: 123 or http://example.com/path/mets.xml

-p|--pid (required)
The page UID of the Kitodo.Presentation data folder. This folder holds all records of documents, metadata, structures, Solr cores, etc.
Example: 123

-s|--solr (required)
This may be the UID of the Solr core record in tx_dlf_solrcores or, alternatively, the index name of the Solr core.
The Solr core must exist in the table tx_dlf_solrcores on page "pid". Otherwise an error is shown and processing does not start.
Example: 123 or 'dlfCore1'

-o|--owner (optional)
This may be the UID of the library record in tx_dlf_libraries which should be set as the owner of the document. If omitted, the default is to try to read the ownership from the metadata field "owner".
Example: 123

--dry-run (optional)
Nothing is written to the database or index. The Solr settings are checked and the document's location URL is shown.

-q|--quiet (optional)
Do not output any messages. Useful when the command is run from a wrapper script, which can check the return value of the CLI job instead: this is always 0 on success and 1 on failure (see the sketch after this table).

-v|--verbose (optional)
Show the processed document's UID and location together with the indexing parameters.
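
Since the return value is the only feedback in quiet mode, a wrapper script can simply evaluate the exit code. A minimal sketch; the document URL, page UID and Solr core name are placeholders:

#!/bin/bash
# Index a document quietly and report the result based on the exit code
# (0 on success, 1 on failure).
if ./vendor/bin/typo3 kitodo:index -q -d "http://example.com/path/mets.xml" -p 123 -s dlfCore1; then
    echo "Indexing succeeded."
else
    echo "Indexing failed." >&2
    exit 1
fi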

Reindex collections

With the command kitodo:reindex it is possible to reindex one or more collections or even all documents on the given page:

# reindex collection with uid 1 on page 123 with solr core 'dlfCore1'
# short notation
./vendor/bin/typo3 kitodo:reindex -c 1 -p 123 -s dlfCore1
# long notation
./vendor/bin/typo3 kitodo:reindex --coll 1 --pid 123 --solr dlfCore1

# reindex collection with uid 1 on page 123 with solr core 'dlfCore1' in the given range
# short notation
./vendor/bin/typo3 kitodo:reindex -c 1 -l 1000 -b 0 -p 123 -s dlfCore1
./vendor/bin/typo3 kitodo:reindex -c 1 -l 1000 -b 1000 -p 123 -s dlfCore1
# long notation
./vendor/bin/typo3 kitodo:reindex --coll 1 --index-limit=1000 --index-begin=0 --pid 123 --solr dlfCore1
./vendor/bin/typo3 kitodo:reindex --coll 1 --index-limit=1000 --index-begin=1000 --pid 123 --solr dlfCore1

# reindex collections with uids 1 and 4 on page 123 with solr core 'dlfCore1'
# short notation
./vendor/bin/typo3 kitodo:reindex -c 1,4 -p 123 -s dlfCore1
# long notation
./vendor/bin/typo3 kitodo:reindex --coll 1,4 --pid 123 --solr dlfCore1

# reindex collections with uids 1 and 4 on page 123 with solr core 'dlfCore1' in the given range
# short notation
./vendor/bin/typo3 kitodo:reindex -c 1,4 -l 1000 -b 0 -p 123 -s dlfCore1
./vendor/bin/typo3 kitodo:reindex -c 1,4 -l 1000 -b 1000 -p 123 -s dlfCore1
# long notation
./vendor/bin/typo3 kitodo:reindex --coll 1,4 --index-limit=1000 --index-begin=0 --pid 123 --solr dlfCore1
./vendor/bin/typo3 kitodo:reindex --coll 1,4 --index-limit=1000 --index-begin=1000 --pid 123 --solr dlfCore1

# reindex all documents on page 123 with solr core 'dlfCore1' (caution: may cause memory problems for a large number of documents)
# short notation
./vendor/bin/typo3 kitodo:reindex -a -p 123 -s dlfCore1
# long notation
./vendor/bin/typo3 kitodo:reindex --all --pid 123 --solr dlfCore1

# reindex all documents on page 123 with solr core 'dlfCore1' in the given range
# short notation
./vendor/bin/typo3 kitodo:reindex -a -l 1000 -b 0 -p 123 -s dlfCore1
./vendor/bin/typo3 kitodo:reindex -a -l 1000 -b 1000 -p 123 -s dlfCore1
# long notation
./vendor/bin/typo3 kitodo:reindex --all --index-limit=1000 --index-begin=0 --pid 123 --solr dlfCore1
./vendor/bin/typo3 kitodo:reindex --all --index-limit=1000 --index-begin=1000 --pid 123 --solr dlfCore1

-a|--all (optional)
With this option, all documents of the given page are reindexed.

-c|--coll (optional)
This may be a single collection UID or a comma-separated list of UIDs to reindex.
Example: 1 or 1,2,3

-p|--pid (required)
The page UID of the Kitodo.Presentation data folder. This folder holds all records of documents, metadata, structures, Solr cores, etc.
Example: 123

-s|--solr (required)
This may be the UID of the Solr core record in tx_dlf_solrcores or, alternatively, the index name of the Solr core.
The Solr core must exist in the table tx_dlf_solrcores on page "pid". Otherwise an error is shown and processing does not start.
Example: 123 or 'dlfCore1'

-o|--owner (optional)
This may be the UID of the library record in tx_dlf_libraries which should be set as the owner of the documents. If omitted, the default is to try to read the ownership from the metadata field "owner".
Example: 123

-l|--index-limit (optional)
With this option, at most the given number of documents of the given page are reindexed per run.
Used when memory problems are expected due to a large number of documents (see the batch sketch after this table).
Example: 1000

-b|--index-begin (optional)
With this option, reindexing starts at the given offset within the documents of the given page.
Used when memory problems are expected due to a large number of documents (see the batch sketch after this table).
Example: 1000

--dry-run (optional)
Nothing is written to the database or index. All documents that would be processed in a real run are listed.

-q|--quiet (optional)
Do not output any messages. Useful when the command is run from a wrapper script, which can check the return value of the CLI job instead: this is always 0 on success and 1 on failure.

-v|--verbose (optional)
Show each processed document's UID and location together with a timestamp and the number of processed documents out of the total.
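
For very large document sets, --index-limit and --index-begin can be combined in a loop that reindexes in batches. A minimal sketch, assuming collection UID 1 on page 123 with core 'dlfCore1'; the total of 5000 documents and the batch size of 1000 are placeholders:

#!/bin/bash
# Reindex collection 1 in batches to avoid memory problems.
TOTAL=5000   # placeholder: (estimated) number of documents to process
LIMIT=1000   # batch size per run
for ((BEGIN=0; BEGIN<TOTAL; BEGIN+=LIMIT)); do
    ./vendor/bin/typo3 kitodo:reindex -c 1 -l "$LIMIT" -b "$BEGIN" -p 123 -s dlfCore1
done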

Harvest OAI-PMH interface

With the command kitodo:harvest it is possible to harvest an OAI-PMH interface and index all fetched records:

# example
./vendor/bin/typo3 kitodo:harvest --lib=<UID> --pid=<PID> --solr=<CORE> --from=<timestamp> --until=<timestamp> --set=<set>

In order to use the command, you first have to configure a library in the backend, setting at least a label and oai_base. The latter should be a valid OAI-PMH base URL (e.g. https://digital.slub-dresden.de/oai/).

-l|--lib (required)
This is the UID of the library record with the OAI interface that should be harvested. This library is also automatically set as the owner of the documents.
Example: 123

-p|--pid (required)
This is the page UID of the library record and therefore the page the documents are added to.
Example: 123

-s|--solr (required)
This may be the UID of the Solr core record in tx_dlf_solrcores or, alternatively, the index name of the Solr core.
The Solr core must exist in the table tx_dlf_solrcores on page "pid". Otherwise an error is shown and processing does not start.
Example: 123 or 'dlfCore1'

--from (optional)
This is a date in the format YYYY-MM-DD. The parameters from and until limit harvesting to the given period, e.g. for incremental updates (see the sketch after this table).
Example: 2021-01-01

--until (optional)
This is a date in the format YYYY-MM-DD. The parameters from and until limit harvesting to the given period, e.g. for incremental updates.
Example: 2021-06-30

--set (optional)
This is the name of an OAI set. The parameter limits harvesting to the given set.
Example: 'vd18'

--dry-run (optional)
Nothing is written to the database or index. All documents that would be processed in a real run are listed.

-q|--quiet (optional)
Do not output any messages. Useful when the command is run from a wrapper script, which can check the return value of the CLI job instead: this is always 0 on success and 1 on failure.

-v|--verbose (optional)
Show each processed document's UID and location together with a timestamp and the number of processed documents out of the total.
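
For incremental updates, --from can be derived from the current date, e.g. in a nightly job that only harvests records changed since the previous day. A minimal sketch using GNU date; the library UID, page UID and Solr core name are placeholders:

# harvest only records changed since yesterday
./vendor/bin/typo3 kitodo:harvest --lib=1 --pid=123 --solr=dlfCore1 --from=$(date -d "yesterday" +%Y-%m-%d)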

Delete single document

The command kitodo:delete is used for deleting a single document:

./vendor/bin/typo3 kitodo:delete -d http://example.com/path/mets.xml -p 123 -s dlfCore1

-d|--doc (required)
This may be the UID of an existing document in tx_dlf_documents or the URL of a METS XML file.
Hint: Do not encode the URL! If the path contains spaces, enclose the URL in quotation marks.
Example: 123 or http://example.com/path/mets.xml

-p|--pid (required)
The page UID of the Kitodo.Presentation data folder. This folder holds all records of documents, metadata, structures, Solr cores, etc.
Example: 123

-s|--solr (required)
This may be the UID of the Solr core record in tx_dlf_solrcores or, alternatively, the index name of the Solr core.
The Solr core must exist in the table tx_dlf_solrcores on page "pid". Otherwise an error is shown and processing does not start.
Example: 123 or 'dlfCore1'

-v|--verbose (optional)
Show the processed document's UID and location together with the deletion parameters.
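
As with kitodo:index, the document can also be addressed by its UID in tx_dlf_documents instead of the METS URL:

./vendor/bin/typo3 kitodo:delete -d 123 -p 123 -s dlfCore1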

Commit and/or optimize index

With the command kitodo:optimize it is possible to hard commit documents to the index and/or optimize the index:

# example
./vendor/bin/typo3 kitodo:optimize --solr=<CORE> --commit --optimize

-s|--solr (required)
This may be the UID of the Solr core record in tx_dlf_solrcores or, alternatively, the index name of the Solr core.
The Solr core must exist in the table tx_dlf_solrcores on page "pid". Otherwise an error is shown and processing does not start.
Example: 123 or 'dlfCore1'

--commit (optional)
Hard commit the documents to the index.

--optimize (optional)
Optimize the index.

--dry-run (optional)
Nothing is written to the database or index. All documents that would be processed in a real run are listed.

-q|--quiet (optional)
Do not output any messages. Useful when the command is run from a wrapper script, which can check the return value of the CLI job instead: this is always 0 on success and 1 on failure.

-v|--verbose (optional)
Show each processed document's UID and location together with a timestamp and the number of processed documents out of the total.
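
A common pattern is to schedule the command periodically, e.g. via cron. A sketch of a crontab entry; the installation path /var/www/typo3 and the core name are placeholders:

# hard commit and optimize the core 'dlfCore1' every night at 03:00
0 3 * * * cd /var/www/typo3 && ./vendor/bin/typo3 kitodo:optimize --solr=dlfCore1 --commit --optimize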