Documentation
Installation
The following sections describe the different ways to install pycottas.
PyPI
PyPI is the fastest way to install pycottas:
pip install pycottas
We recommend installing pycottas in a virtual environment.
From Source
You can also grab the latest source code from the GitHub repository.
API Reference
General Functions
rdf2cottas
pycottas.rdf2cottas(rdf_file_path, cottas_file_path, index='spo', disk=False)
Compress an RDF file in a plain text format into COTTAS. The compressed file is indexed for efficient querying.
Parameters:
-
rdf_file_path : str
Path to the input RDF file. Supported formats: N-Triples, N-Quads, Turtle, TriG, N3, and RDF/XML.
-
cottas_file_path : str
Path to the output COTTAS file.
-
index : {‘spo’, ‘sop’, ‘pso’, ‘pos’, ‘osp’, ‘ops’}, default ‘spo’
Computed index for the compressed COTTAS file. For RDF datasets, index permutations include g, e.g., spog.
-
disk : bool, default False
Whether to use on-disk storage.
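The index parameter above can be checked before starting a long compression run. The following is a minimal sketch, not part of pycottas: is_valid_index and compress are hypothetical helper names, and accepting upper-case input is an assumption based on the upper-case examples shown in the command line help.

```python
def is_valid_index(index: str) -> bool:
    """Return True if `index` is a permutation of 'spo' (triples) or 'spog' (quads)."""
    letters = sorted(index.lower())  # assumption: case is not significant
    return letters in (sorted("spo"), sorted("spog"))


def compress(rdf_file_path: str, cottas_file_path: str, index: str = "spo") -> None:
    """Validate the index locally, then delegate to pycottas.rdf2cottas."""
    if not is_valid_index(index):
        raise ValueError(f"not a permutation of 'spo' or 'spog': {index!r}")
    import pycottas  # imported lazily so is_valid_index works without pycottas

    pycottas.rdf2cottas(rdf_file_path, cottas_file_path, index=index)
```

Failing early on a malformed index avoids parsing a large RDF file only to discover the problem at write time.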
cottas2rdf
pycottas.cottas2rdf(cottas_file_path, rdf_file_path)
Uncompress a COTTAS file to RDF in N-Triples format.
Parameters:
-
cottas_file_path : str
Path to the input COTTAS file.
-
rdf_file_path : str
Path to the output RDF file (N-Triples).
search
pycottas.search(cottas_file_path, triple_pattern)
Evaluate a triple pattern over a COTTAS file.
Parameters:
-
cottas_file_path : str
Path to the COTTAS file.
-
triple_pattern : str, list or tuple
The triple pattern can be a string, or a list or tuple of RDFLib terms in (subject, predicate, object) order, with variables given by None. When querying an RDF dataset, the pattern can be a quad pattern.
import pycottas
from rdflib import URIRef
res = pycottas.search('my_file.cottas', '?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o') # (1)
res = pycottas.search('my_file.cottas', (None, URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), None)) # (2)
print(res)
(1) Triple pattern as a string.
(2) Triple pattern as a tuple of RDFLib terms (None for variables).
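To make the relationship between the two pattern forms concrete, the sketch below turns a (subject, predicate, object) sequence with None placeholders into the string form. It is an illustration only: to_pattern_string is a hypothetical helper, the variable names ?s, ?p and ?o are an arbitrary choice, and the terms are assumed to be already serialized in N3.

```python
from typing import Optional


def to_pattern_string(s: Optional[str], p: Optional[str], o: Optional[str]) -> str:
    """Join three N3-serialized terms, substituting ?s/?p/?o for None."""
    names = ("?s", "?p", "?o")
    return " ".join(
        term if term is not None else name
        for term, name in zip((s, p, o), names)
    )


RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
pattern = to_pattern_string(None, RDF_TYPE, None)
print(pattern)  # ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o
```

The resulting string has the same shape as the string pattern used in the example above.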
cat
pycottas.cat(cottas_file_paths, cottas_cat_file_path, index='spo', remove_input_files=False)
Merge multiple COTTAS files into one.
Parameters:
-
cottas_file_paths : list of str
Paths of the input COTTAS files to merge, e.g., ['file1.cottas', 'file2.cottas'].
-
cottas_cat_file_path : str
Path to the output COTTAS file.
-
index : {‘spo’, ‘sop’, ‘pso’, ‘pos’, ‘osp’, ‘ops’}, default ‘spo’
Computed index for the merged COTTAS file. For RDF datasets, index permutations include g, e.g., spog.
-
remove_input_files : bool, default False
Whether to remove the input COTTAS files after merging.
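When the files to merge live in one directory, the list of input paths can be built with the standard library before calling cat. This is a minimal sketch under stated assumptions: collect_cottas_files and merge_directory are hypothetical helpers, and the sorted order is only there to make runs reproducible, not something pycottas is known to require.

```python
import glob
import os
from typing import List


def collect_cottas_files(directory: str, exclude: str = "") -> List[str]:
    """Return the .cottas files in `directory`, sorted for a reproducible order.

    `exclude` lets the caller skip the merge output if it lives in the same directory.
    """
    paths = sorted(glob.glob(os.path.join(directory, "*.cottas")))
    excluded = os.path.abspath(exclude) if exclude else None
    return [p for p in paths if os.path.abspath(p) != excluded]


def merge_directory(directory: str, cottas_cat_file_path: str) -> None:
    """Merge every COTTAS file found in `directory` into one output file."""
    paths = collect_cottas_files(directory, exclude=cottas_cat_file_path)
    if not paths:
        raise FileNotFoundError(f"no .cottas files found in {directory!r}")
    import pycottas  # imported lazily so the helper above works without pycottas

    pycottas.cat(paths, cottas_cat_file_path, index="spo", remove_input_files=False)
```

Excluding the output path keeps a previous merge result from being fed back in when the function is run twice on the same directory.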
diff
pycottas.diff(cottas_file_1_path, cottas_file_2_path, cottas_diff_file_path, index='spo', remove_input_files=False)
Subtract a COTTAS file from another.
Parameters:
-
cottas_file_1_path : str
Path to the initial COTTAS file.
-
cottas_file_2_path : str
Path to the COTTAS file to subtract.
-
cottas_diff_file_path : str
Path to the output COTTAS file.
-
index : {‘spo’, ‘sop’, ‘pso’, ‘pos’, ‘osp’, ‘ops’}, default ‘spo’
Computed index for the resulting COTTAS file. For RDF datasets, index permutations include g, e.g., spog.
-
remove_input_files : bool, default False
Whether to remove the input COTTAS files after subtracting.
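The intent of cat and diff can be pictured with plain Python sets of triples. The sketch below is a conceptual model only and says nothing about how pycottas implements either operation: it assumes that merging collapses duplicate triples and that subtracting behaves like a set difference, as RDF graph semantics would suggest. cat_model and diff_model are hypothetical names.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as N3 strings


def cat_model(*graphs: Iterable[Triple]) -> Set[Triple]:
    """Model of merging: the union of all input graphs (assumed semantics)."""
    return set().union(*graphs)


def diff_model(graph_1: Iterable[Triple], graph_2: Iterable[Triple]) -> Set[Triple]:
    """Model of subtracting: triples in graph_1 that are not in graph_2 (assumed semantics)."""
    return set(graph_1) - set(graph_2)


a = {("<s1>", "<p>", "<o1>"), ("<s2>", "<p>", "<o2>")}
b = {("<s2>", "<p>", "<o2>"), ("<s3>", "<p>", "<o3>")}

merged = cat_model(a, b)      # three distinct triples
remaining = diff_model(a, b)  # only the <s1> triple is left
```

Under this model, diff is not symmetric: swapping the two inputs leaves the <s3> triple instead.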
info
pycottas.info(cottas_file_path)
Query COTTAS file metadata.
Parameters:
-
cottas_file_path : str
Path to the COTTAS file.
verify
pycottas.verify(cottas_file_path)
Check whether a COTTAS file is valid.
Parameters:
-
cottas_file_path : str
Path to the COTTAS file.
COTTASDocument
pycottas.COTTASDocument(path)
Class for evaluating triple patterns over COTTAS files.
Parameters:
-
path : str
Path to the COTTAS file.
from pycottas import COTTASDocument
from rdflib import Graph, URIRef
cottas_doc = COTTASDocument('my_file.cottas')
res = cottas_doc.search('?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o') # (1)
res = cottas_doc.search((None, URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), None)) # (2)
# limit and offset
res = cottas_doc.search('?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o', limit=10, offset=20) # (3)
# RDF terms in the result set can be in N3 (default) or as RDFLib terms
res = cottas_doc.search('?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o', results_in_n3=False) # (4)
print(res)
(1) Triple pattern as a string.
(2) Triple pattern as a tuple of RDFLib terms (None for variables).
(3) Limit and offset can be optionally provided for pagination.
(4) The RDF terms in the result set are RDFLib terms.
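The limit and offset arguments shown above can be combined into a simple pagination loop. The following is a minimal sketch, assuming that search returns a sequence that supports len(); iter_pages is a hypothetical helper and takes the search callable as an argument, so it does not depend on pycottas itself.

```python
from typing import Callable, Iterator, Sequence


def iter_pages(search: Callable[..., Sequence], pattern: str,
               page_size: int = 10) -> Iterator[Sequence]:
    """Yield successive result pages by advancing `offset` in steps of `page_size`."""
    offset = 0
    while True:
        page = search(pattern, limit=page_size, offset=offset)
        if len(page) == 0:
            return  # nothing left to read
        yield page
        if len(page) < page_size:
            return  # a short page means this was the last one
        offset += page_size
```

With a COTTASDocument instance, the loop would be driven as `for page in iter_pages(cottas_doc.search, '?s ?p ?o'): ...`.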
COTTASStore
pycottas.COTTASStore(path)
RDFLib store backend to query COTTAS files with SPARQL.
Parameters:
-
path : str
Path to the COTTAS file.
from pycottas import COTTASStore
from rdflib import Graph
store = COTTASStore('my_file.cottas')
graph = Graph(store) # (1)
res = graph.query('''
SELECT DISTINCT ?s ?o WHERE {
?s a ?o .
} LIMIT 10''')
for row in res:
    print(row)
(1) From here, graph can be used as a read-only RDFLib store.
Command Line
To execute COTTAS from the command line, call pycottas, specify the operation to perform (rdf2cottas, search, cat, etc.), and provide the required parameters.
As an alternative, python3 -m pycottas is also supported.
$ pycottas -h
usage: pycottas {rdf2cottas,cottas2rdf,search,info,verify,cat,diff} ...
positional arguments:
{rdf2cottas,cottas2rdf,search,info,verify,cat,diff}
subcommand help
rdf2cottas Compress an RDF file into COTTAS format
cottas2rdf Decompress a COTTAS file to RDF (N-Triples)
search Evaluate a triple pattern
info Get the metadata of a COTTAS file
verify Check whether a file is a valid COTTAS file
cat Merge multiple COTTAS files
diff Subtract the triples in a COTTAS files from another
rdf2cottas
$ pycottas rdf2cottas -h
usage: pycottas rdf2cottas -r RDF_FILE -c COTTAS_FILE [-i INDEX]
options:
-h, --help show this help message and exit
-r RDF_FILE, --rdf_file RDF_FILE
Path to RDF file
-c COTTAS_FILE, --cottas_file COTTAS_FILE
Path to COTTAS file
-i INDEX, --index INDEX
Zonemap index, e.g.: `SPO`, `PSO`, `GPOS`
-d DISK, --disk DISK Whether to use on-disk storage
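The rdf2cottas subcommand can also be driven from a Python script through subprocess. This is a minimal sketch that uses only the -r, -c and -i flags listed in the help output above; build_rdf2cottas_cmd and run_rdf2cottas are hypothetical helper names, and the sketch assumes the pycottas executable is on the PATH.

```python
import subprocess
from typing import List, Optional


def build_rdf2cottas_cmd(rdf_file: str, cottas_file: str,
                         index: Optional[str] = None) -> List[str]:
    """Build the argument list for `pycottas rdf2cottas`."""
    cmd = ["pycottas", "rdf2cottas", "-r", rdf_file, "-c", cottas_file]
    if index is not None:
        cmd += ["-i", index]  # optional index, e.g. SPO
    return cmd


def run_rdf2cottas(rdf_file: str, cottas_file: str,
                   index: Optional[str] = None) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_rdf2cottas_cmd(rdf_file, cottas_file, index), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file paths that contain spaces.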
cottas2rdf
$ pycottas cottas2rdf -h
usage: pycottas cottas2rdf -c COTTAS_FILE -r RDF_FILE
options:
-c COTTAS_FILE, --cottas_file COTTAS_FILE
Path to COTTAS file
-r RDF_FILE, --rdf_file RDF_FILE
Path to RDF file (N-Triples)
search
$ pycottas search -h
usage: pycottas search -c COTTAS_FILE -t TRIPLE_PATTERN [-r {table,tuples,to_csv}]
options:
-c COTTAS_FILE, --cottas_file COTTAS_FILE
Path to COTTAS file
-t TRIPLE_PATTERN, --triple_pattern TRIPLE_PATTERN
Triple pattern, e.g., `?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o`
-r {table,tuples,to_csv}, --result_option {table,tuples,to_csv}
What to do with the result set
info
$ pycottas info -h
usage: pycottas info -c COTTAS_FILE
options:
-c COTTAS_FILE, --cottas_file COTTAS_FILE
Path to COTTAS file
verify
$ pycottas verify -h
usage: pycottas verify -c COTTAS_FILE
options:
-c COTTAS_FILE, --cottas_file COTTAS_FILE
Path to COTTAS file
cat
$ pycottas cat -h
usage: pycottas cat --input_cottas_files INPUT_COTTAS_FILES [INPUT_COTTAS_FILES ...] --output_cottas_file OUTPUT_COTTAS_FILE [-i INDEX] [-r REMOVE_INPUT_FILES]
options:
--input_cottas_files INPUT_COTTAS_FILES [INPUT_COTTAS_FILES ...]
Path of the input COTTAS files
--output_cottas_file OUTPUT_COTTAS_FILE
Path of the output COTTAS file
-i INDEX, --index INDEX
Zonemap index, e.g.: `SPO`, `PSO`, `GPOS`
-r REMOVE_INPUT_FILES, --remove_input_files REMOVE_INPUT_FILES
Whether to remove input COTTAS files after merging
diff
$ pycottas diff -h
usage: pycottas diff -c COTTAS_FILE -s SUBTRACT_COTTAS_FILE -o OUTPUT_COTTAS_FILE [-i INDEX] [-r REMOVE_INPUT_FILES]
options:
-c COTTAS_FILE, --cottas_file COTTAS_FILE
Path to the COTTAS file
-s SUBTRACT_COTTAS_FILE, --subtract_cottas_file SUBTRACT_COTTAS_FILE
Path to the COTTAS file to subtract
-o OUTPUT_COTTAS_FILE, --output_cottas_file OUTPUT_COTTAS_FILE
Path to the output COTTAS file
-i INDEX, --index INDEX
Zonemap index, e.g.: `SPO`, `PSO`, `GPOS`
-r REMOVE_INPUT_FILES, --remove_input_files REMOVE_INPUT_FILES
Whether to remove the input COTTAS files after merging
Tricks
Multiple Files
Multiple COTTAS files can be accessed simultaneously by passing a list of paths, a glob pattern, or a list of glob patterns. Check here the syntax for glob patterns.
from pycottas import COTTASDocument
cottas_doc = COTTASDocument(['file1.cottas', 'file2.cottas', 'file3.cottas']) # (1)
cottas_doc = COTTASDocument('some_dir/*.cottas') # (2)
cottas_doc = COTTASDocument(['dir1/*.cottas', 'dir2/*.cottas']) # (3)
res = cottas_doc.search('?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o')
(1) Query three COTTAS files and treat them as a single one.
(2) Query all COTTAS files that match the glob pattern.
(3) Query all COTTAS files from two specific directories.
Remote Files
COTTAS files can be read over HTTP.
from pycottas import COTTASDocument
cottas_doc = COTTASDocument('https://some.url/my_file.cottas')
res = cottas_doc.search('?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o')
