structure_post_extraction_processor.StructurePostExtractionProcessor

class structure_post_extraction_processor.StructurePostExtractionProcessor(hdf5_path, batch_size, device=None)

Orchestrates post-extraction computation of derived structure features.

Reads raw data from an HDF5 file, processes structures in GPU-accelerated batches to compute geometric, topological, and contact-based features, and writes both raw and computed data to a new processed HDF5 file.

__init__(hdf5_path, batch_size, device=None)

Initialize the processor.

Parameters:
  • hdf5_path (Path) – Path to the raw HDF5 file containing extracted structure data.

  • batch_size (int) – Number of structures to process per GPU batch.

  • device (str or torch.device, optional) – Device specifier (e.g., 'cuda', 'cpu'). If None, CUDA is selected when available.
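
To illustrate what batch_size controls, here is a minimal sketch of splitting a set of structures into consecutive batches. The helper name batch_ranges and the contiguous-index scheme are assumptions made for illustration; the processor's internal batching is not documented on this page and may differ.

```python
def batch_ranges(n_structures, batch_size):
    """Yield (start, stop) index pairs covering n_structures in order.

    Hypothetical helper: it only illustrates the meaning of batch_size,
    it is not part of the documented API.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, n_structures, batch_size):
        # The final batch is shorter when batch_size does not divide evenly.
        yield start, min(start + batch_size, n_structures)


# Example: 10 structures with batch_size=4 give two full batches and one of 2.
ranges = list(batch_ranges(10, 4))
```

A larger batch_size means fewer batches but more memory per batch on the selected device, so the right value likely depends on structure size and available GPU memory.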

Methods

__init__(hdf5_path, batch_size[, device])

Initialize the processor.

run()

Execute the full post-extraction processing pipeline.

run()

Execute the full post-extraction processing pipeline.

Removes any existing processed file, reads raw data, initializes output datasets, processes structures in batches, and writes both raw and computed data to the output HDF5 file.
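
A typical call sequence might look like the sketch below. The wrapper process_raw_file, its default batch_size of 32, and the up-front file check are illustrative assumptions; only the constructor signature and run() come from this page. The import path assumes the module is importable at top level under the name shown in the page title, which may differ in a packaged install.

```python
from pathlib import Path


def process_raw_file(raw_path, batch_size=32, device=None):
    """Hypothetical wrapper: construct the processor and run the pipeline."""
    raw_path = Path(raw_path)
    # Fail early with a clear message instead of deep inside the pipeline.
    if not raw_path.is_file():
        raise FileNotFoundError(f"raw HDF5 file not found: {raw_path}")

    # Imported lazily; the module path is an assumption (see above).
    from structure_post_extraction_processor import (
        StructurePostExtractionProcessor,
    )

    # device=None leaves device selection to the processor.
    processor = StructurePostExtractionProcessor(
        raw_path, batch_size, device=device
    )
    processor.run()
```

Because run() removes any existing processed file before writing, calling it again on the same input replaces earlier output, so keep a copy of any processed file you still need.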