csa_config.ExtractionConfig
- class csa_config.ExtractionConfig(data_directory, data_prefix, actions, filters, extraction_batch_size, post_extraction_batch_size)[source]
Configuration settings for the data-extraction pipeline.
- Parameters:
data_directory (Path) – Directory under which all raw and intermediate extraction outputs will be stored. Subdirectories (e.g. “structures/”, “csv/”) are created automatically.
data_prefix (str) – Prefix used when naming output files, for example "{data_prefix}_refcode_families.csv".
actions (Dict[str, bool]) – Flags to enable or skip individual extraction substeps:
  - get_refcode_families
  - cluster_refcode_families
  - get_unique_structures
  - get_structure_data
  - post_extraction_process
filters (Dict[str, Any]) – Criteria for filtering CSD entries, for example:
  - elements (List[str]): only structures containing these elements
  - min_resolution (float): only structures with resolution ≤ this value
  - space_groups (List[str]): only structures in these space groups
extraction_batch_size (int) – Number of structures or refcode families to process per batch during extraction.
post_extraction_batch_size (int) – Number of structures to process per batch during post-extraction.
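To make the shape of these parameters concrete, the following is a minimal sketch that uses a stand-in dataclass with the same field names as the documented constructor. `ExtractionConfigSketch` is hypothetical and is not the library's class. All values (directory, prefix, elements, thresholds, batch sizes) are illustrative assumptions, not defaults taken from csa_config.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass
class ExtractionConfigSketch:
    """Hypothetical stand-in mirroring the documented constructor signature."""

    data_directory: Path
    data_prefix: str
    actions: Dict[str, bool]
    filters: Dict[str, Any]
    extraction_batch_size: int
    post_extraction_batch_size: int


config = ExtractionConfigSketch(
    data_directory=Path("data"),  # illustrative path
    data_prefix="example",  # illustrative prefix
    actions={
        # One flag per documented substep; False skips that substep.
        "get_refcode_families": True,
        "cluster_refcode_families": True,
        "get_unique_structures": True,
        "get_structure_data": True,
        "post_extraction_process": False,
    },
    filters={
        # Illustrative values only; the keys follow the documented examples.
        "elements": ["C", "H", "O"],
        "min_resolution": 0.05,
        "space_groups": ["P21/c"],
    },
    extraction_batch_size=500,
    post_extraction_batch_size=100,
)

# File name built from the documented "{data_prefix}_refcode_families.csv" pattern.
families_csv_name = f"{config.data_prefix}_refcode_families.csv"
```

With the real class, the same keyword arguments would presumably be passed to `csa_config.ExtractionConfig(...)` directly.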
- from_json(cls, json_path)[source]
Load and validate fields from the “extraction” section of a JSON file.
- __init__(data_directory, data_prefix, actions, filters, extraction_batch_size, post_extraction_batch_size)
Methods
__init__(data_directory, data_prefix, ...)
from_json(json_path) – Load an ExtractionConfig from a JSON file.
- classmethod from_json(json_path)[source]
Load an ExtractionConfig from a JSON file.
- Parameters:
json_path (Union[str, Path]) – Path to the JSON configuration file.
- Returns:
Instance populated from the “extraction” section.
- Return type:
ExtractionConfig
- Raises:
FileNotFoundError – If the file does not exist.
KeyError – If the “extraction” section is missing.
json.JSONDecodeError – If the file contains invalid JSON.
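The three documented failure modes can be reproduced with the standard library alone. The sketch below is not the implementation of `from_json`; `load_extraction_section` and `describe_failure` are hypothetical helpers that only read the “extraction” section and report which of the documented exceptions occurs. The file names and the JSON contents are illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def load_extraction_section(json_path: Union[str, Path]) -> Dict[str, Any]:
    """Hypothetical helper, not part of csa_config: read the "extraction" section."""
    # open() raises FileNotFoundError when the path does not exist.
    with open(json_path, "r", encoding="utf-8") as handle:
        # json.load() raises json.JSONDecodeError on malformed JSON.
        data = json.load(handle)
    # Indexing raises KeyError when the "extraction" section is absent.
    return data["extraction"]


def describe_failure(json_path: Union[str, Path]) -> str:
    """Return the name of the exception raised for json_path, or "ok"."""
    try:
        load_extraction_section(json_path)
    except FileNotFoundError:
        return "FileNotFoundError"
    except json.JSONDecodeError:
        return "JSONDecodeError"
    except KeyError:
        return "KeyError"
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)

    # A file with an "extraction" section loads without error.
    valid = tmp_dir / "valid.json"
    valid.write_text(
        json.dumps({"extraction": {"data_prefix": "example"}}), encoding="utf-8"
    )

    # Valid JSON, but the "extraction" section is missing.
    no_section = tmp_dir / "no_section.json"
    no_section.write_text(json.dumps({"other": {}}), encoding="utf-8")

    # Not valid JSON at all.
    malformed = tmp_dir / "malformed.json"
    malformed.write_text("{not valid json", encoding="utf-8")

    outcomes = {
        "valid": describe_failure(valid),
        "missing_file": describe_failure(tmp_dir / "absent.json"),
        "no_section": describe_failure(no_section),
        "malformed": describe_failure(malformed),
    }
```

Callers of the real `from_json` could wrap the call in the same `try`/`except` structure to distinguish a missing file from a malformed or incomplete one.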