process_xml

A module for extracting metadata, median beats, and raw waveforms from ECG XML files, with optional XML schema validation.

This module provides an API through a reader class, which maps ECG data from XML files to class attributes. These attributes can be programmatically accessed and further processed by downstream ECGprocess modules or external programs leveraging the API.

class ecgprocess.process_xml.ECGXMLReader(_as_array: bool = True, augment_leads: bool = False, resample_500: bool = True)[source]

Processes an XML file containing ECG data and extracts the metadata, median beats, and raw waveforms.

Parameters:
  • augment_leads (bool, default False) – Whether the augmented leads should be calculated if these are not already available in the source file.

  • resample_500 (bool, default True) – Whether to resample the ECG to a frequency of 500 Hertz. Note that this internally calculates the ECG duration in seconds; for the duration to be in seconds, the sampling frequency must be given in Hertz (samples per second), not in samples per millisecond.
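The duration and resampling steps described above can be sketched as follows. This is a minimal illustration that assumes linear interpolation with numpy.interp; ECGprocess may use a different resampling method, and the helper name resample_to_500 is hypothetical. It shows why the frequency must be in Hertz: the duration is the sample count divided by the sampling frequency.

```python
import numpy as np


def resample_to_500(
    signal: np.ndarray, sampling_frequency: float, target_frequency: float = 500.0
) -> tuple[np.ndarray, float]:
    """Linearly resample a 1D signal (hypothetical helper, not the library code).

    sampling_frequency must be in Hz (samples per second).
    """
    n_samples = signal.shape[0]
    # Duration in seconds; only correct when the frequency is in Hz.
    duration_s = n_samples / sampling_frequency
    n_target = int(round(duration_s * target_frequency))
    # Time stamps (in seconds) of the original and the target samples.
    t_orig = np.arange(n_samples) / sampling_frequency
    t_new = np.arange(n_target) / target_frequency
    # Linear interpolation; values beyond the last original sample are clamped.
    resampled = np.interp(t_new, t_orig, signal)
    return resampled, duration_s
```

For example, a 10 second recording sampled at 250 Hz (2500 samples) becomes 5000 samples at 500 Hz.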

augment_leads

Whether the augmented leads were calculated if these were unavailable.

Type:

bool

resample

Whether the ECG was resampled to a 500 Hertz frequency.

Type:

bool

extract(config, skip_empty, parse_numeric, **kwargs)[source]

Processes the XML file content applying optional lead augmentation and resampling. The XML content will be mapped to class attributes.

__call__(path: str, schema: str | None = None, verbose: bool = False, **kwargs: Any | None) Self[source]

Reads an .xml file containing ECG readings, optionally validates it against an .xsd schema, and maps the XML file to a flat dictionary.

Parameters:
  • path (str) – The path to the .xml file.

  • schema (str, default NoneType) – A path to an XML schema (.xsd) against which the XML file will be validated.

  • verbose (bool, default False) – Whether warnings and process info should be printed.

  • **kwargs (any) – keyword arguments passed to flatten_dict.

tags

A list of strings with parsed tags matching the raw_data keys.

Type:

list [str]

raw_data

The raw parsed data.

Type:

dict [str, any]

Returns:

self – Returns the class instance with updated attributes including the extracted XML data.

Return type:

ECGXMLReader instance

Raises:

XMLValidationError – If the XML file is not valid based on the supplied schema.
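To make "a flat dictionary" concrete, the sketch below flattens an XML document into path-style keys using only the standard library. It is not ECGprocess's flatten_dict: the helper flatten_xml, the dot separator, and the numeric suffix for repeated tags are all assumptions, and XML attributes and schema validation are ignored.

```python
import xml.etree.ElementTree as ET
from typing import Any


def flatten_xml(xml_text: str, sep: str = ".") -> dict[str, Any]:
    """Map an XML document to a flat dict keyed by tag paths (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    flat: dict[str, Any] = {}

    def walk(element: ET.Element, prefix: str) -> None:
        key = f"{prefix}{sep}{element.tag}" if prefix else element.tag
        children = list(element)
        if children:
            # Internal node: recurse, extending the key path.
            for child in children:
                walk(child, key)
            return
        # Leaf node: store its stripped text; repeated tags get a numeric
        # suffix so that no earlier value is overwritten.
        final_key, counter = key, 1
        while final_key in flat:
            final_key = f"{key}_{counter}"
            counter += 1
        flat[final_key] = (element.text or "").strip()

    walk(root, "")
    return flat


if __name__ == "__main__":
    example = "<ECG><Patient><Age>54</Age></Patient><Lead>I</Lead><Lead>II</Lead></ECG>"
    print(flatten_xml(example))
    # {'ECG.Patient.Age': '54', 'ECG.Lead': 'I', 'ECG.Lead_1': 'II'}
```

In this sketch the keys of the returned dictionary play the role of the tags attribute, and the dictionary itself that of raw_data.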

__eq__(other)

Return self==value.

__hash__ = None
__init__(_as_array: bool = True, augment_leads: bool = False, resample_500: bool = True) None

Initialises slots entries to None.

__post_init__()[source]

Validates the inputs.

__repr__()

Return repr(self).

__weakref__

list of weak references to the object

extract(config: ConfigParser, bits: dtype | None = None, skip_empty: bool = True, parse_numeric: bool = True, **kwargs: Any | None) Self[source]

Processes the raw ECG data and assigns it to class attributes, performing resampling and lead augmentation if requested.

Parameters:
  • config (ConfigParser) – A parsed configuration file instance that maps the XML content to class attributes. It should include the dictionary attributes MetaData, WaveForms, MedianBeats, and OtherData. MetaData includes some privileged keys holding essential information to describe an ECG instance, as well as non-privileged information. OtherData differs from MetaData in how it is processed by other functions and methods: OtherData is processed without strong checks on its content. WaveForms and MedianBeats simply contain the lead mappings. Please refer to the constants.CoreData class for the specifics.

  • parse_numeric (bool, default True) – Whether to check for numeric data accidentally recorded as strings and try to parse these to int or float, depending on the presence of a decimal separator.

  • skip_empty (bool, default True) – Whether empty tags should be skipped; if False, an error is raised.

  • bits (np.dtype, default None) – The bit depth, passed as the dtype argument of numpy.array.

  • **kwargs – Keyword arguments passed to reader_tools.get_ecg_data. For the waveforms and median beats, as_array and bits are hard-coded, so supplying them as kwargs will raise an error.
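The parse_numeric behaviour can be illustrated with a small, hypothetical helper: strings containing a decimal separator are tried as float, all other strings as int, and anything that fails to parse is returned unchanged. This is a sketch of the rule described above, not the library's implementation.

```python
from typing import Any


def parse_numeric_string(value: Any, dec: str = ".") -> Any:
    """Try to turn a numeric string into int or float (hypothetical helper)."""
    if not isinstance(value, str):
        # Non-strings are returned as is.
        return value
    text = value.strip()
    try:
        if dec in text:
            # A decimal separator is present: parse as float.
            return float(text.replace(dec, "."))
        return int(text)
    except ValueError:
        # Not a number after all: keep the original string.
        return value
```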

MetaData

ECG metadata.

Type:

dict [str, any]

Waveforms

The lead specific ECG waveforms.

Type:

dict [str, np.array]

MedianBeats

The lead specific ECG median beats.

Type:

dict [str, np.array]

OtherData

Other data.

Type:

dict [str, any]

Returns:

self – Returns the class instance with updated attributes including the extracted XML data.

Return type:

ECGXMLReader instance

process_dicom

A module for extracting metadata, median beats, and raw waveforms from ECG DICOM files.

This module provides an API through a reader class, which maps ECG data from DICOM files to class attributes. These attributes can be programmatically accessed and further processed by downstream ECGprocess modules or external programs leveraging the API.

class ecgprocess.process_dicom.ECGDICOMReader(_as_array: bool = True, augment_leads: bool = False, resample_500: bool = True)[source]

Processes a DICOM file containing ECG data and extracts the metadata, median beats, and raw waveforms.

Parameters:
  • augment_leads (bool, default False) – Whether the augmented leads should be calculated if these are not already available in the source file.

  • resample_500 (bool, default True) – Whether to resample the ECG to a frequency of 500 Hertz.
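When augment_leads is True, missing augmented leads can be derived from leads I and II. The sketch below uses the standard Einthoven and Goldberger relations (III = II - I, aVR = -(I + II)/2, aVL = I - II/2, aVF = II - I/2); the function name augment_limb_leads and the lead keys are assumptions, and ECGprocess's own implementation may differ in detail.

```python
import numpy as np


def augment_limb_leads(leads: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Fill in III, aVR, aVL and aVF from leads I and II (hypothetical helper)."""
    lead_i = np.asarray(leads["I"], dtype=float)
    lead_ii = np.asarray(leads["II"], dtype=float)
    derived = {
        "III": lead_ii - lead_i,          # Einthoven's law: I + III = II
        "aVR": -(lead_i + lead_ii) / 2,   # Goldberger augmented leads
        "aVL": lead_i - lead_ii / 2,
        "aVF": lead_ii - lead_i / 2,
    }
    out = dict(leads)  # do not modify the caller's dictionary
    for name, signal in derived.items():
        if out.get(name) is None:  # only add leads that are absent
            out[name] = signal
    return out
```

A quick sanity check on the result: aVR + aVL + aVF is zero at every sample.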

augment_leads

Whether the augmented leads were calculated if these were unavailable.

Type:

bool

resample

Whether the ECG was resampled to a 500 Hertz frequency.

Type:

bool

extract(config, skip_empty, parse_numeric, **kwargs)[source]

Processes the DICOM file content applying optional lead augmentation and resampling. The DICOM content will be mapped to class attributes.

__call__(path: str, verbose: bool = False, **kwargs: Any | None) Self[source]

Reads a .dcm file containing ECG readings.

Parameters:
  • path (str) – The path to a .dcm file.

  • verbose (bool, default False) – Whether warnings and process info should be printed.

  • **kwargs (any) – keyword arguments passed to flatten_dict.

tags

A list of strings with parsed tags matching the raw_data keys.

Type:

list [str]

raw_data

The raw parsed data.

Type:

dict [str, any]

Returns:

self – Returns the class instance with updated attributes including the extracted DICOM data.

Return type:

ECGDICOMReader instance

__eq__(other)

Return self==value.

__hash__ = None
__init__(_as_array: bool = True, augment_leads: bool = False, resample_500: bool = True) None

Initialises slots entries to None.

__post_init__()[source]

Validates the inputs.

__repr__()

Return repr(self).

extract(config: ConfigParser, bits: dtype | None = None, skip_empty: bool = True, parse_numeric: bool = True, pattern: dict[str, str] | None = None, substitute: tuple[str, str] | None = ('_[0-9]{1,2}\\.*', ' '), character_trim: int = 0, **kwargs: Any | None) Self[source]

Processes the raw ECG data and assigns it to class attributes, performing resampling and lead augmentation if requested.

Parameters:
  • config (ConfigParser) – A parsed configuration file instance that maps the DICOM content to class attributes. It should include the dictionary attributes MetaData, WaveForms, MedianBeats, and OtherData. MetaData includes some privileged keys holding essential information to describe an ECG instance, as well as non-privileged information. OtherData differs from MetaData in how it is processed by other functions and methods: OtherData is processed without strong checks on its content. WaveForms and MedianBeats simply contain the lead mappings. Please refer to the constants.CoreData class for the specifics.

  • parse_numeric (bool, default True) – Whether to check for numeric data accidentally recorded as strings and try to parse these to int or float, depending on the presence of a decimal separator.

  • skip_empty (bool, default True) – Whether empty tags should be skipped; if False, an error is raised.

  • bits (np.dtype, default None) – The bit depth, passed as the dtype argument of numpy.array.

  • pattern (dict [str, str], default NoneType) – Extracts a subset of items from MetaData based on the pattern key and adds a unique name as a prefix to the keys of the selected subset. The unique name is based on the value of the key that matches the pattern value.

  • substitute (tuple [str, str] or None, default (r"_[0-9]{1,2}\.*", " ")) – A tuple containing a regular expression pattern and a replacement string. The substitution is applied to the remaining portion of the data key after the matching prefix has been removed.

  • character_trim (int, default 0) – The number of characters to remove from the right-hand side of data keys that did not match the pattern key.

  • **kwargs – Keyword arguments passed to reader_tools.get_ecg_data. For the waveforms and median beats, as_array and bits are hard-coded, so supplying them as kwargs will raise an error.

MetaData

ECG metadata.

Type:

dict [str, any]

Waveforms

The lead specific ECG waveforms.

Type:

dict [str, np.array]

MedianBeats

The lead specific ECG median beats.

Type:

dict [str, np.array]

OtherData

Other data.

Type:

dict [str, any]

Returns:

self – Returns the class instance with updated attributes including the extracted DICOM data.

Return type:

ECGDICOMReader instance

class ecgprocess.process_dicom.FixedDICOMReader(augment_leads: bool = False, resample_500: bool = True, retain_raw: bool = False, METADATA: dict = <factory>, WAVE_FORMS: dict = <factory>, MEDIAN_BEATS: dict = <factory>, ECG_TRAIT_DICT: dict = <factory>)[source]

Takes an ECG DICOM file and extracts metadata, median beats (if available) and raw waveforms.

Parameters:
  • augment_leads (bool, default False) – Whether the augmented leads should be calculated if these are not available in the DICOM.

  • resample_500 (bool, default True) – Whether to resample the ECG to a frequency of 500 Hertz.

  • retain_raw (bool, default False) – Whether the raw pydicom instance and raw waveforms should be retained. Set to False to decrease memory usage. Set to True to explore the original pydicom instance, for example on a few files to identify non-standard information to extract.

augment_leads

Whether the augmented leads were calculated if these were unavailable.

Type:

bool

resample

Whether the ECG was resampled to a 500 Hertz frequency.

Type:

bool

retain_raw

Whether the raw pydicom data was retained.

Type:

bool

METADATA

A dictionary describing the metadata to extract from a DICOM. The dictionary keys represent the target (new) names and the dictionary values the source (old) names.

Type:

dict [str, str]

ECG_TRAIT_DICT

A dictionary whose keys are the desired names of the ECG traits. Each key has a list of strings as its value. These strings are compared, case-insensitively, to the names in the WaveformAnnotationSequence attribute. If a key has multiple matching strings, the algorithm checks whether the extracted values are all the same; if not, multiple entries are returned so the user can decide how to proceed. The extracted ECG traits are included with the extracted METADATA.

Type:

dict [str, list[str]]
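A minimal sketch of the ECG_TRAIT_DICT matching described above, assuming the annotations have already been reduced to (name, value) pairs in place of the WaveformAnnotationSequence items. The function match_traits is hypothetical, and returning a list when the values disagree is an assumption about the format of the "multiple entries".

```python
from typing import Any


def match_traits(
    annotations: list[tuple[str, Any]], trait_dict: dict[str, list[str]]
) -> dict[str, Any]:
    """Case-insensitive trait lookup (hypothetical helper)."""
    results: dict[str, Any] = {}
    for trait, candidates in trait_dict.items():
        wanted = {candidate.lower() for candidate in candidates}
        values = [value for name, value in annotations if name.lower() in wanted]
        if not values:
            continue  # trait not present in this file
        # Deduplicate while preserving order.
        distinct = list(dict.fromkeys(values))
        # One agreed value is returned as a scalar; disagreements are kept
        # together so the user can decide what to do next.
        results[trait] = distinct[0] if len(distinct) == 1 else distinct
    return results
```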

get_metadata(path, skip_empty)[source]

Extracts the DICOM metadata.

make_leadvoltages(waveform_array, lead_info, augment_leads)

Extracts the voltages from a DICOM file. Will automatically extract the limb leads if missing.

Notes

While the type of information extracted by this class is relatively extensive and can be expanded, for example through METADATA, the pydicom attributes from which these data are extracted are fixed. This therefore provides a less flexible solution than ECGDICOMReader.

__call__(path: str, skip_empty: bool = True, verbose: bool = False) Self[source]

Reads a .dcm DICOM file and extracts metadata, raw waveforms, and median beats.

See constants.DICOMTags for the METADATA, WAVE_FORMS, and MEDIAN_BEATS tags that are searched for.

Parameters:
  • path (str) – The path to the .dcm file.

  • skip_empty (bool, default True) – Whether empty tags should be skipped or throw an error.

  • verbose (bool, default False) – Prints missing tags if skip_empty is set to True.

GeneralInfo

A list of attributes extracted by dcmread.

Type:

list [str]

Waveforms

The lead specific ECG waveforms.

Type:

dict [str, np.array]

MedianWaveforms

The lead specific ECG median beats.

Type:

dict [str, np.array]

Returns:

self – Returns the class instance with updated attributes extracted from dcmread.

Return type:

FixedDICOMReader instance

__eq__(other)

Return self==value.

__hash__ = None
__init__(augment_leads: bool = False, resample_500: bool = True, retain_raw: bool = False, METADATA: dict = <factory>, WAVE_FORMS: dict = <factory>, MEDIAN_BEATS: dict = <factory>, ECG_TRAIT_DICT: dict = <factory>) None

Initialises slots entries to None.

__repr__()

Return repr(self).

__weakref__

list of weak references to the object

get_median_beats(path: str | None = None, dicom_instance: FileDataset | None = None, skip_empty: bool = True) tuple[FileDataset, dict[str, Any], list[str]][source]

Takes a DICOM file and extracts the median beats and their metadata.

Parameters:
  • path (str, default NoneType.) – The path to the .dcm file.

  • dicom_instance (DCM_Class, default NoneType.) – A DCM_Class instance.

Returns:

results

  • A DCM_Class instance.

  • A dictionary with extracted metadata.

  • A list of missing DCM_Class attribute names.

Return type:

tuple [DCM_Class, dict [str, any], list [str]]

Notes

Supply either a path to a DICOM file or a DCM_Class instance.

get_metadata(path: str | None = None, dicom_instance: FileDataset | None = None, skip_empty: bool = True) tuple[FileDataset, dict[str, Any], list[str]][source]

Takes a DICOM file and extracts its metadata.

Parameters:
  • path (str, default NoneType.) – The path to the .dcm file.

  • dicom_instance (DCM_Class, default NoneType.) – A DCM_Class instance.

Returns:

  • A DCM_Class instance.

  • A dictionary with extracted metadata.

  • A list of missing DCM_Class attribute names.

Return type:

tuple [DCM_Class, dict [str, any], list [str]]

Notes

Supply either a path to a DICOM file or a DCM_Class instance.

get_waveforms(path: str | None = None, dicom_instance: FileDataset | None = None, skip_empty: bool = True) tuple[FileDataset, dict[str, Any], list[str]][source]

Takes a DICOM file and extracts the waveforms and waveform metadata.

Parameters:
  • path (str, default NoneType.) – The path to the .dcm file.

  • dicom_instance (DCM_Class, default NoneType.) – A DCM_Class instance.

Returns:

results

  • A DCM_Class instance.

  • A dictionary with extracted metadata.

  • A list of missing DCM_Class attribute names.

Return type:

tuple [DCM_Class, dict [str, any], list [str]]

Notes

Supply either a path to a DICOM file or a DCM_Class instance.

tabular

A module to process ECG signal data and metadata into tabular form. This also includes 2D figures, which, strictly speaking, are tables of pixels.

The module leverages the existing process class instances and based on their class attributes extracts the requested data.

class ecgprocess.tabular.ECGTable(ecgreader: ~ecgprocess.utils.reader_tools.BaseReader, path_list: list[str], extract_meta: bool = True, extract_wave: bool = True, extract_median: bool = True, engineer_meta: ~typing.Callable = <function metadata_identity>, engineer_wave: ~typing.Callable = <function signal_identity>, engineer_median: ~typing.Callable = <function signal_identity>, schema: str | None = None)[source]

Takes a BaseReader instance, loops over a supplied list of files containing ECG data, and extracts the relevant information using a Processing instance and a Configuration instance. The extracted information can be mapped to a pandas.DataFrame or saved to disk.

Parameters:
  • ecgreader (BaseReader) – An instance of the BaseReader data class.

  • path_list (list [str]) – A list of paths to one or more files containing ECG data.

  • extract_meta (bool, default True) – Whether to extract the metadata.

  • extract_wave (bool, default True) – Whether to extract the raw waveforms.

  • extract_median (bool, default True) – Whether to extract the median beats.

  • engineer_meta (Callable, default metadata_identity) – A function applied to the internal meta_dict object. Please ensure the function accepts the parameter meta_dict.

  • engineer_wave (Callable, default signal_identity) – A function applied to the internal wave_dict object. Please ensure the function accepts the parameters wave_dict and **kwargs.

  • engineer_median (Callable, default signal_identity) – A function applied to the internal median_dict object. Please ensure the function accepts the parameters median_dict and **kwargs.

  • schema (str or NoneType, default NoneType) – The path to an optional XSD schema to validate XML files. Set to None to ignore.

raw_path_list

A list of file paths.

Type:

list [str]

get_table(unique, **kwargs)[source]

Extracts ECG data and maps it to class attributes.

write_ecg(chunk, target_tar, target_path, tar_mode, file_type, tab_sep, tab_compression, tab_append, unique, write_failed, write_chunk_record, kwargs_reader, kwargs_tab)

Writes processed ECG data to a single file or multiple files.

Notes

The engineering parameters can be used to supply functions that are applied separately to the dictionaries of each processed file before these dictionaries are combined and written to disk. The functions are applied to the internal objects in the following order: meta_dict (for metadata), wave_dict (for waveform signals), median_dict (for median beat signals). Please ensure the waveform and median beat functions include a **kwargs parameter. The kwargs are used internally to pass the metadata; this ensures, for example, that the engineer_wave function has access to the metadata.
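As an example of an engineering function, the hypothetical remove_baseline below subtracts each lead's median from its signal. Following the guidance above, it takes wave_dict and accepts **kwargs, so the metadata passed internally does not cause a TypeError. Whether ECGprocess passes the signals positionally or by keyword is not stated here, so treat the parameter name as an assumption.

```python
from typing import Any

import numpy as np


def remove_baseline(wave_dict: dict[str, np.ndarray], **kwargs: Any) -> dict[str, np.ndarray]:
    """Subtract each lead's median (hypothetical engineering function).

    **kwargs absorbs the metadata that is passed in internally; it is unused here.
    """
    return {lead: signal - np.median(signal) for lead, signal in wave_dict.items()}
```

It could then be supplied as ECGTable(..., engineer_wave=remove_baseline).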

__call__(ignore_permission: bool = True, ignore_data: bool = False, ignore_invalid: bool = False, verbose: bool = False) Self[source]

Takes the supplied BaseReader instance, loops over the list of file paths, and confirms that the files exist and have appropriate read permission.

Parameters:
  • ignore_permission (bool, default True) – Skips file permission errors. The failed file names will be recorded for review.

  • ignore_data (bool, default False) – Whether files with missing MetaData, WaveForm, or MedianBeat attributes should be skipped. The file names of these failures will be recorded. Note that, to limit I/O calls, this step is not conducted during __call__ and is instead applied when processing data using any method that utilises _loop_table.

  • ignore_invalid (bool, default False) – Whether XML files that fail validation against the XSD schema should be skipped. Note that, to limit I/O calls, schema validation is not conducted during __call__ and is instead applied when processing data using any method that utilises _loop_table.

  • verbose (bool, default False) – Whether to print warnings.

failed_path_list

File paths which are either absent or without read permission.

Type:

list [str]

curated_path_list

File paths which are readable.

Type:

list [str]

Returns:

Returns the class instance with updated attributes.

Return type:

ECGTable instance

__init__(ecgreader: ~ecgprocess.utils.reader_tools.BaseReader, path_list: list[str], extract_meta: bool = True, extract_wave: bool = True, extract_median: bool = True, engineer_meta: ~typing.Callable = <function metadata_identity>, engineer_wave: ~typing.Callable = <function signal_identity>, engineer_median: ~typing.Callable = <function signal_identity>, schema: str | None = None) None[source]

Initialises a new instance of ECGTable.

__repr__()[source]

Return repr(self).

__str__()[source]

Return str(self).

__weakref__

list of weak references to the object

get_table(parsed_config: ConfigParser, unique: bool = True, kwargs_reader_call: dict[Any, Any] | None = None, kwargs_reader_extract: dict[Any, Any] | None = None) Self[source]

Extracts the ECG data and stores it as pandas.DataFrame attributes.

Parameters:
  • parsed_config (ConfigParser) – A parsed configuration file which was mapped using ConfigParser.map.

  • unique (bool, default True) – Ensures the UID metadata items are unique between files. Please ensure the UID key-value pair is appropriately set in the config file. If set to False, a file-specific integer key will be assigned instead.

  • kwargs_reader_call (dict [any, any] or None, default NoneType) – Keyword arguments passed to the BaseReader __call__ method.

  • kwargs_reader_extract (dict [any, any] or None, default NoneType) – Keyword arguments passed to the BaseReader extract method.

MetaDataTable

A table of the metadata.

Type:

pandas.DataFrame

WaveFormsTable

A long-formatted table with the waveform signals.

Type:

pandas.DataFrame

MedianBeatsTable

A long-formatted table with the median beat signals.

Type:

pandas.DataFrame

Returns:

Returns the class instance with updated attributes.

Return type:

ECGTable instance

Notes

To keep track of multiple ECG files, the function will try to use the privileged variable UID. If this is not found in the supplied MetaData, the function will instead assign an integer, starting from 1, as a unique key.
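The UID fallback and the long format can be illustrated without pandas. The hypothetical to_long_rows below emits one (uid, lead, sample_index, value) row per sample and falls back to a counter starting at 1 when no UID is available; the column layout is an assumption, and the actual WaveFormsTable columns may differ.

```python
from typing import Any

import numpy as np


def to_long_rows(
    records: list[tuple[dict[str, Any], dict[str, np.ndarray]]],
    unique: bool = True,
) -> list[tuple[Any, str, int, float]]:
    """Flatten (metadata, signals) pairs into long-format rows (hypothetical helper)."""
    rows: list[tuple[Any, str, int, float]] = []
    seen: set[Any] = set()
    for counter, (metadata, signals) in enumerate(records, start=1):
        uid = metadata.get("UID") if unique else None
        if uid is None:
            uid = counter  # fall back to an integer key, starting from 1
        elif uid in seen:
            raise ValueError(f"UID {uid!r} is not unique between files")
        seen.add(uid)
        # One row per lead per sample: this is what "long format" means here.
        for lead, signal in signals.items():
            for sample_index, value in enumerate(signal):
                rows.append((uid, lead, sample_index, float(value)))
    return rows
```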

write_ecg(parsed_config=<class 'ecgprocess.utils.config_tools.ConfigParser'>, chunk: int | None = None, target_tar: None | str = None, target_path: str = '.', tar_mode: str = 'w:gz', file_type: ~typing.Literal['table', 'numpy', 'tensorflow'] = 'table', tab_sep: str = '\t', tab_compression: str | None = 'gzip', tab_append: bool = True, unique: bool = True, write_failed: bool = True, write_chunk_record: bool = False, kwargs_reader_call: dict[~typing.Any, ~typing.Any] | None = None, kwargs_reader_extract: dict[~typing.Any, ~typing.Any] | None = None, kwargs_tab: None | dict[str, ~typing.Any] = None) Self[source]

Extracts chunks of ECG data, and writes these to a single or multiple target files which can be optionally tar compressed.

Parameters:
  • parsed_config (ConfigParser) – A parsed configuration file which was mapped using ConfigParser.map.

  • chunk (int, default NoneType) – The number of source files written to a single processed file. For file_type=‘table’, set this to 1 with tab_append=True to minimise the memory footprint. Set to NoneType to combine all the ECG data into a single target file.

  • target_tar (str, default NoneType) – The name of an optional tar file to which the individual files will be written. The target_tar will be concatenated to target_path. Depending on tar_mode, this directory will be compressed, for example as tar.gz. Set target_tar to NoneType to simply add the files directly to target_path. Note that this will overwrite any existing directory or file with the provided name.

  • target_path (str, default ‘.’) – The full path where the files should be written to. If provided target_tar will be created underneath this path, otherwise the files will be directly written to the target_path terminal directory (assuming this is writable).

  • file_type ({‘table’, ‘numpy’, ‘tensorflow’}, default table) – Whether to write the files to tsv using pandas.DataFrame, to npz using numpy.savez, or to tfrecord using tensorflow.io.TFRecordWriter.

  • tar_mode (str, default w:gz) – The tarfile.open mode.

  • tab_sep (str, default ‘\t’) – The field separator, passed to pandas.DataFrame.to_csv.

  • tab_compression (str, default gzip) – The file compression passed to pandas.DataFrame.to_csv.

  • tab_append (bool, default True) – Whether individual chunks should be appended to the tsv file.

  • unique (bool, default True) – Ensures the UID metadata items are unique between files. Please ensure the UID metadata key-value pair is appropriately set in the config file. If set to False, a file-specific integer key will be assigned instead.

  • write_failed (bool, default True) – Whether to write a text file to disk containing the failed file names with some information on why these failed.

  • write_chunk_record (bool, default False) – Whether to write a record mapping each chunk indicator to the files included in that chunk.

  • kwargs_reader_call (dict [any, any] or None, default NoneType) – Keyword arguments passed to the BaseReader __call__ method.

  • kwargs_reader_extract (dict [any, any] or None, default NoneType) – Keyword arguments passed to the BaseReader extract method.

  • kwargs_tab (dict [str, any] or None, default None) – Keyword arguments passed to pandas.DataFrame.to_csv.

target_path

The directory or tar file path where the files are written.

Type:

str

Returns:

The class instance with updated attributes.

Return type:

ECGTable instance

Raises:

NotADirectoryError or PermissionError – If the target directory does not exist or is not writable.

Notes

While file_type=‘table’ can store any kind of information, numpy and tfrecord are best used to store numerical (float) data. Currently, non-numerical data are therefore dropped from the metadata for these file types.

For numpy and tfrecord, the signal data will be automatically zero-padded to the length of the longest signal. Missing signals will be represented as np.nan. The array columns match the canonical ECG lead order; please refer to ecg_tools.signal_dicts_to_numpy_array for the exact order.
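The padding rule can be sketched with numpy. The lead order below is the conventional 12-lead order and is only an assumption; the authoritative order is defined by ecg_tools.signal_dicts_to_numpy_array, and signals_to_array is a hypothetical name. Rows are samples and columns are leads; present leads are zero-padded to the longest signal, and absent leads are filled with np.nan.

```python
from typing import Any

import numpy as np

# Assumed canonical order; see ecg_tools.signal_dicts_to_numpy_array for the real one.
LEAD_ORDER = ("I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6")


def signals_to_array(
    signals: dict[str, Any], lead_order: tuple[str, ...] = LEAD_ORDER
) -> np.ndarray:
    """Stack lead signals into a (samples, leads) array (hypothetical helper)."""
    present = {
        lead: np.asarray(signals[lead], dtype=float)
        for lead in lead_order
        if signals.get(lead) is not None
    }
    max_len = max((s.shape[0] for s in present.values()), default=0)
    # Start with NaN everywhere: absent leads stay NaN.
    array = np.full((max_len, len(lead_order)), np.nan)
    for col, lead in enumerate(lead_order):
        if lead in present:
            signal = present[lead]
            array[:, col] = 0.0  # zero-pad present leads to the longest signal
            array[: signal.shape[0], col] = signal
    return array
```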

ecgprocess.tabular.metadata_identity(metadata: dict[str, Any], verbose: bool = False, **kwargs) dict[str, Any][source]

A placeholder identity function that simply returns the input data unchanged.

ecgprocess.tabular.signal_identity(signals: dict[str, ndarray], verbose: bool = False, **kwargs) dict[str, Any][source]

A placeholder identity function that simply returns the input data unchanged.

utils.general

The general utils module

class ecgprocess.utils.general.ManagedProperty(name: str, types: tuple[type] | type | None = None)[source]

A generic property factory defining setters and getters, with optional type validation.

Parameters:
  • name (str) – The name of the setters and getters

  • types (Type, default NoneType) – Either a single type, or a tuple of types to test against.

enable_setter()[source]

Enables the setter for the property, allowing attribute assignment.

disable_setter()[source]

Disables the setter for the property, making the property read-only.

set_with_setter(instance, value)[source]

Enables the setter, sets the property value, and then disables the setter, ensuring controlled updates.

Returns:

A property object with getter and setter.

Return type:

property

disable_setter()[source]

Disable the setter for the property.

enable_setter()[source]

Enable the setter for the property.

set_with_setter(instance, value)[source]

Enable the setter, set the property value, and then disable the setter.

Parameters:
  • instance (object) – The instance on which the property is being set.

  • value (any) – The value to assign to the property.

ecgprocess.utils.general.assign_empty_default(arguments: list[Any], empty_object: Callable[[], Any]) list[Any][source]

Takes a list of arguments, checks whether these are NoneType, and if so assigns them ‘empty_object’.

Parameters:
  • arguments (list of arguments) – A list of arguments which may be set to NoneType.

  • empty_object (Callable that returns a mutable object) – Examples include a list or a dict.

Returns:

new_arguments – List with NoneType replaced by empty mutable object.

Return type:

list

Examples

>>> assign_empty_default(['hi', None, 'hello'], empty_object=list)
['hi', [], 'hello']

Notes

This function helps deal with the pitfall of assigning an empty mutable object as a default function argument, which would persist through multiple function calls, leading to unexpected/undesired behaviours.

ecgprocess.utils.general.chunk_list(lst: list[Any], size: int) Generator[list[Any], None, None][source]

Splits a given list into chunks of a specified size.

Parameters:
  • lst (list [any]) – A list of arbitrary length.

  • size (int) – The size of the chunks, should be larger than 0.

Yields:

list – A chunk of the input list of length size. The final chunk may be shorter if there aren’t enough elements left.

Examples

>>> data = list(range(10))
>>> gen = chunk_list(data, 3)
>>> next(gen)
[0, 1, 2]
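A generator with the behaviour shown in the example can be written with list slicing. chunk_list_sketch is a hypothetical stand-in, not the library's code.

```python
from typing import Any, Generator


def chunk_list_sketch(lst: list[Any], size: int) -> Generator[list[Any], None, None]:
    """Yield consecutive slices of length size; the last may be shorter."""
    if size <= 0:
        # Raised on first iteration, because this is a generator function.
        raise ValueError("size should be larger than 0")
    for start in range(0, len(lst), size):
        yield lst[start:start + size]
```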
ecgprocess.utils.general.list_tar(path: str, mode: str = 'r:gz') list[str][source]

Lists the content of a tar file and returns this as a list.

Parameters:
  • path (str) – The path to the tar file.

  • mode (str, default r:gz) – The tarfile open mode.

Returns:

A list of filenames.

Return type:

list

ecgprocess.utils.general.parse_number(string: Any, sep: str = ',', dec: str = '.') list[float | int] | Any[source]

Checks whether a string represents numbers and, if so, maps the string to a list of floats or ints.

Parameters:
  • string (any) – Strings, and lists containing a single string, will be checked to see whether they represent numbers; other objects will be returned as is.

  • sep (str, default ‘,’) – The character used to separate values in a string.

  • dec (str, default ‘.’) – The character used as a decimal point.

Returns:

A list of parsed integers or floats or the original input.

Return type:

list [int | float] or any

Examples

>>> gen_utils.parse_number("1;2;3,5", sep=";", dec=",")
[1, 2, 3.5]
>>> gen_utils.parse_number(["1,2.5,3"])
[1, 2.5, 3]
>>> gen_utils.parse_number(['1,2.5,3', '2'])
['1,2.5,3', '2']
>>> parse_number(123)
123
ecgprocess.utils.general.replace_with_tar(old_dir: str, new_tar: str, mode: str = 'w:gz') None[source]

Moves the old_dir to a tar file, removing the old_dir.

Parameters:
  • old_dir (str) – The path to the old directory.

  • new_tar (str) – The path to the new tar file.

  • mode (str, default w:gz) – The tarfile.open mode.

Notes

The function does not return anything
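Both tar helpers map closely onto the standard tarfile module. The sketch below is an assumption about how list_tar and replace_with_tar could work (the _sketch names are hypothetical); in particular, storing the directory under its base name inside the archive is a choice made here.

```python
import os
import shutil
import tarfile


def replace_with_tar_sketch(old_dir: str, new_tar: str, mode: str = "w:gz") -> None:
    """Archive old_dir into new_tar, then delete old_dir (hypothetical helper)."""
    with tarfile.open(new_tar, mode) as tar:
        # Store the directory under its own base name inside the archive.
        tar.add(old_dir, arcname=os.path.basename(os.path.normpath(old_dir)))
    shutil.rmtree(old_dir)  # only reached if archiving succeeded


def list_tar_sketch(path: str, mode: str = "r:gz") -> list[str]:
    """Return the member names of a tar file (hypothetical helper)."""
    with tarfile.open(path, mode) as tar:
        return tar.getnames()
```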

ecgprocess.utils.general.string_concat(old: str, new: str, sep: str = ', ') list[str][source]

Concatenates two strings, checking if the old string might be NaN.

Parameters:
  • old (str or np.nan) – The original string.

  • new (str) – A new string.

  • sep (str, default `, `) – The string separator.

Returns:

A concatenated string

Return type:

str

Notes

In general, NaN is considered a float, and missing string information is better represented by NoneType. Nevertheless, strings are sometimes set to NaN, which is what this function deals with.
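A minimal sketch of the NaN-aware concatenation, assuming both None and float NaN count as "no previous string". string_concat_sketch is a hypothetical name, and treating None this way is an assumption about the real function.

```python
import math
from typing import Any


def string_concat_sketch(old: Any, new: str, sep: str = ", ") -> str:
    """Concatenate old and new, ignoring a missing old value (hypothetical helper)."""
    # np.nan is a Python float, so the isinstance check also covers it.
    if old is None or (isinstance(old, float) and math.isnan(old)):
        return new
    return f"{old}{sep}{new}"
```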

utils.config_tools

class ecgprocess.utils.config_tools.ConfigParser(path: str)[source]

Parses configuration files into structured data and optionally assigns this to a user supplied mapper instance.

Parameters:

path (str) – Path to the configuration file to be parsed.

get_section(section_name: Literal['MetaData', 'WaveForms', 'MedianBeats', 'OtherData']) dict[str, str][source]

Retrieve the parsed data for a specific section of the configuration file.

Parameters:

section_name ({'MetaData', 'WaveForms', 'MedianBeats', 'OtherData'}) – Name of the section to retrieve.

Returns:

A dictionary of key-value pairs for the requested section.

Return type:

dict

map(mapper: DataMap) None[source]

Maps the parsed data to a supplied DataMap instance.

mapper

An instance of the DataMap class, which is expected to have attributes corresponding to the parsed data sections (e.g., headers or types). These attributes will be updated with the data parsed by this class.

Type:

DataMap

Notes

The method does not return anything and simply creates the mapper attribute.

class ecgprocess.utils.config_tools.DataMap(WaveForms: dict = <factory>, MedianBeats: dict = <factory>, MetaData: dict = <factory>, OtherData: dict | None = None)[source]

A class to manage and map metadata, waveforms, and median beats, and optionally, other data attributes.

WaveForms

A config dict mapping the twelve ECG leads.

Type:

dict [str, str]

MedianBeats

A config dict mapping the twelve ECG leads.

Type:

dict [str, str]

MetaData

A dictionary containing metadata attributes and their values.

Type:

dict [str, str]

OtherData

An optional dictionary for additional data mappings.

Type:

dict [str, str], default NoneType

get_attributes(all_)[source]

Returns the names of attributes with non-None values.

keys(attr_name)[source]

Returns the keys of the specified dictionary (e.g. ‘MetaData’ or ‘OtherData’), or of all dictionaries if none is specified.

items(attr_name)[source]

Returns the key-value pairs of the specified dictionary (e.g. ‘MetaData’ or ‘OtherData’), or of all dictionaries if none is specified.

Examples

>>> data_map = DataMap()
>>> data_map.get_attributes()
['WaveForms', 'MedianBeats', 'MetaData']

Notes

The class is intended to work in conjunction with a config file processed using ConfigParser.map(DataMap). DataMap ensures that any privileged attributes omitted from the config are added and set to None, and prevents the config from being altered after processing.

property VALIDATE_ATTR: tuple

Exposes _VALIDATE_ATTR as a read-only property.

get_attributes(all_: bool = False) list[str][source]

Returns the names of attributes with non-None values.

all_

Whether to return all keys irrespective of whether their values are equal to None.

Type:

bool, default False

Returns:

A list of attribute names where the value is not None.

Return type:

list [str]

static get_leads()[source]

Initialises the lead names to None.

items(attr_name: str | None = None) dict[str, dict[str, str]][source]

Return the key-value pairs for WaveForms, MedianBeats, MetaData, OtherData, or all dictionaries.

Parameters:

attr_name (str, default NoneType) – Specify the type of attribute to limit the result to that dictionary. If None, items for all dictionaries are returned.

Returns:

A dictionary where keys are attribute names and values are the key-value pairs of the dictionary.

Return type:

dict [str, dict [str, str]]

keys(attr_name: str | None = None) list[str][source]

Return the keys for WaveForms, MedianBeats, MetaData, OtherData, or all dictionaries.

Parameters:

attr_name (str, default NoneType) – The attribute name to limit the result to. If None, keys for all dictionaries are returned without removing duplicates.

Returns:

A list with dictionary keys.

Return type:

list [str]
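The attribute-selection rules described for DataMap can be sketched with a plain dataclass. The class name DataMapSketch and its private helper are hypothetical; only the four attribute names and the documented behaviour of get_attributes, keys and items come from this page. The read-only protection mentioned in the Notes is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataMapSketch:
    """Hypothetical stand-in for DataMap; not the ecgprocess implementation."""

    WaveForms: dict[str, str] = field(default_factory=dict)
    MedianBeats: dict[str, str] = field(default_factory=dict)
    MetaData: dict[str, str] = field(default_factory=dict)
    OtherData: dict[str, str] | None = None

    # Plain class attribute (no annotation), so dataclass does not treat it as a field.
    _NAMES = ("WaveForms", "MedianBeats", "MetaData", "OtherData")

    def get_attributes(self, all_: bool = False) -> list[str]:
        # Keep attributes whose value is not None, unless all_ is requested.
        return [n for n in self._NAMES if all_ or getattr(self, n) is not None]

    def _selected(self, attr_name: str | None) -> list[str]:
        # A single named dictionary, or every non-None dictionary.
        return [attr_name] if attr_name is not None else self.get_attributes()

    def keys(self, attr_name: str | None = None) -> list[str]:
        # Concatenate keys across dictionaries without removing duplicates.
        return [k for n in self._selected(attr_name) for k in (getattr(self, n) or {})]

    def items(self, attr_name: str | None = None) -> dict[str, dict[str, str]]:
        # Map each selected attribute name to a copy of its dictionary.
        return {n: dict(getattr(self, n) or {}) for n in self._selected(attr_name)}
```

With the defaults, DataMapSketch().get_attributes() returns ['WaveForms', 'MedianBeats', 'MetaData'], matching the example above, because OtherData is None.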

class ecgprocess.utils.config_tools.OtherData(_data: dict[str, str] | None = <factory>)[source]

A class to map internal attribute names to DICOM/XML tags and collect data based on optional attributes.

update_values(**kwargs)[source]

Update the values of existing dictionary keys.

to_dict()

Returns the current dictionary.

keys()

Returns the dictionary keys.

update_values(**kwargs: Any | None) None[source]

Update the dictionary keys and values.

Parameters:

**kwargs (dict [str, str]) – Key-value pairs mapping DICOM/XML tags (values) to attribute names (keys).

class ecgprocess.utils.config_tools.PrivilegedData(_data: dict[str, str] = <factory>)[source]

The core metadata dictionary. Its keys cannot be changed because downstream programs expect them; the values, however, can be user-defined.

update_values(**kwargs)

Update the values of existing dictionary keys.

to_dict()

Returns the current dictionary.

keys()

Returns the dictionary keys.

The values are initialised to None and should be set to the relevant XML/DICOM tags or attributes.

>>> required = PrivilegedData()
>>> print(*required.keys(), sep='\n')
unique identifier
sampling frequency (original)
sampling number (waveforms)
sampling number (medianbeats)
acquisition date
study date
channel number
units (waveforms)
units (medianbeats)

keys() list[str][source]

Returns the required keys.

to_dict() dict[str, str][source]

Returns the required data as a dictionary.

update_values(**kwargs: Any | None) None[source]

Update the values of existing immutable keys.

Parameters:

**kwargs (dict [str, str]) – Key-value pairs where the key is a required data field and the value is the updated DICOM/XML tag.

Raises:

KeyError – If a key in kwargs is not a valid required data field.
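A minimal sketch of the fixed-key behaviour described for PrivilegedData: the keys are set once, values start as None, and update_values rejects unknown keys with a KeyError. The class name FixedKeyMapSketch is hypothetical, the key list is taken from the example output above, and validating all keys before applying any update is a design choice of this sketch rather than documented library behaviour.

```python
from __future__ import annotations


class FixedKeyMapSketch:
    """Hypothetical stand-in for PrivilegedData: keys are fixed, values are editable."""

    _KEYS = (
        "unique identifier",
        "sampling frequency (original)",
        "sampling number (waveforms)",
        "sampling number (medianbeats)",
        "acquisition date",
        "study date",
        "channel number",
        "units (waveforms)",
        "units (medianbeats)",
    )

    def __init__(self) -> None:
        # Every required key starts out mapped to None.
        self._data: dict[str, str | None] = dict.fromkeys(self._KEYS)

    def keys(self) -> list[str]:
        return list(self._data)

    def to_dict(self) -> dict[str, str | None]:
        # Return a copy so callers cannot add keys to the internal dictionary.
        return dict(self._data)

    def update_values(self, **kwargs: str) -> None:
        # Validate every key before changing anything, so a bad call has no effect.
        unknown = sorted(set(kwargs) - set(self._data))
        if unknown:
            raise KeyError(f"Not a required data field: {unknown}")
        self._data.update(kwargs)
```

Because the keys contain spaces, they cannot be written as plain keyword arguments; pass them by unpacking a dictionary, for example update_values(**{"unique identifier": "PatientID"}).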

utils.reader_tools

Tools to help read and process ECG files.

Functions and classes are predominantly aimed at mapping ECG files to native Python objects, such as dictionaries, and processing these systematically. Functions specifically focussed on ECG signals, such as calculating the limb leads, are collected in ecgprocess.utils.ecg_tools.

class ecgprocess.utils.reader_tools.BaseReader(waveforms=None, medianbeats=None, metadata=None, otherdata=None, raw=None)[source]

A base class for ECGDICOMReader that uses __slots__ for the waveform arrays for efficiency, while still retaining __dict__ for dynamic attribute creation.

ecgprocess.utils.reader_tools.dicom_to_dict(ds: Dataset) dict[str, Any][source]

Turn a pydicom Dataset into a dict with keys derived from the Element names.

Parameters:

ds (pydicom.dataset.Dataset) – The DICOM dataset to convert.

Returns:

A dictionary representation of the dataset.

Return type:

dict [str, any]

ecgprocess.utils.reader_tools.flatten_dict(d: dict[str, Any], parent_prefix: str = '', sep: str = '.', skip_root: bool = True) dict[str, Any][source]

Recursively flatten a nested dictionary, optionally skipping the root element.

Parameters:
  • d (dict) – The dictionary to flatten.

  • parent_prefix (str, default '') – The base string added as a prefix to all keys during recursion. Useful for maintaining context or indicating a higher-level structure.

  • sep (str, default '.') – The separator used to join nested keys.

  • skip_root (bool, default True) – If True, skips the first level (root) key.

Returns:

A flattened dictionary where nested keys are concatenated into a single key.

Return type:

dict

Examples

>>> nested_dict = {
...     'a': {
...         'b': 1,
...         'c': {
...             'd': 2
...         }
...     }
... }
>>> flatten_dict(nested_dict)
{'b': 1, 'c.d': 2}
>>> flatten_dict(nested_dict, skip_root=False)
{'a.b': 1, 'a.c.d': 2}
>>> flatten_dict(nested_dict, sep='_', skip_root=False)
{'a_b': 1, 'a_c_d': 2}
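
A recursive sketch that reproduces the three documented flatten_dict examples. The name flatten_dict_sketch is hypothetical, and how the library treats lists or non-dictionary values at the root level is not documented here; this sketch simply keeps such values under their own key.

```python
from __future__ import annotations

from typing import Any


def flatten_dict_sketch(
    d: dict[str, Any],
    parent_prefix: str = "",
    sep: str = ".",
    skip_root: bool = True,
) -> dict[str, Any]:
    """Hypothetical re-implementation of the documented flatten_dict behaviour."""
    flat: dict[str, Any] = {}
    for key, value in d.items():
        if skip_root and isinstance(value, dict):
            # Skip the root key: flatten its children without using it as a prefix.
            flat.update(flatten_dict_sketch(value, parent_prefix, sep, skip_root=False))
            continue
        new_key = f"{parent_prefix}{sep}{key}" if parent_prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the joined key down as the new prefix.
            flat.update(flatten_dict_sketch(value, new_key, sep, skip_root=False))
        else:
            flat[new_key] = value
    return flat
```
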
ecgprocess.utils.reader_tools.get_ecg_data(data_dict: dict[str, Any], config: dict[str, str], parse_numeric: bool = True, as_array: bool = False, bits: dtype | None = None, skip_empty: bool = True, **kwargs: Any | None) tuple[dict[str, Any], list[str]][source]

Extracts metadata or signal data from a data_dict based on a supplied config dictionary.

Parameters:
  • data_dict (dict [str, any]) – A dictionary with keys and values matching the config object

  • config (dict [str, str]) – A dictionary whose values match keys in data_dict and whose keys are the names under which the extracted values will be stored.

  • parse_numeric (bool, default True) – Whether to check for numbers stored as strings and parse these to numbers.

  • as_array (bool, default False) – Whether data should be mapped to np.array using a direct map: np.array(., dtype=bits).

  • bits (np.dtype, default None) – The data type passed to the dtype argument of numpy.array.

  • skip_empty (bool, default True) – Whether to skip config values not matching data_dict keys.

  • **kwargs – keyword arguments to parse_number.

Returns:

  • dict – A dictionary with the extracted data, stored as numpy.ndarray when as_array is True.

  • list – A list with config values which did not match data_dict keys.
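The config-driven lookup described above can be sketched as a loop over config entries that collects unmatched source keys. Everything here is a hypothetical illustration: the helper names are invented, the numeric parsing is a simple float/int conversion standing in for the library's parse_number, and keeping unmatched names as None when skip_empty is False is an assumption, not documented behaviour.

```python
from __future__ import annotations

from typing import Any

import numpy as np


def _maybe_number(value: Any) -> Any:
    # Hypothetical stand-in for parse_number: parse strings such as "72" or "0.41".
    if isinstance(value, str):
        try:
            number = float(value)
        except ValueError:
            return value
        return int(number) if number.is_integer() else number
    return value


def get_ecg_data_sketch(
    data_dict: dict[str, Any],
    config: dict[str, str],
    parse_numeric: bool = True,
    as_array: bool = False,
    bits: Any = None,
    skip_empty: bool = True,
) -> tuple[dict[str, Any], list[str]]:
    """Hypothetical sketch of config-driven extraction; not the ecgprocess code."""
    extracted: dict[str, Any] = {}
    unmatched: list[str] = []
    for name, source_key in config.items():
        if source_key not in data_dict:
            unmatched.append(source_key)
            if not skip_empty:
                extracted[name] = None  # assumption: keep the name, mark it missing
            continue
        value = data_dict[source_key]
        if parse_numeric:
            value = _maybe_number(value)
        # Direct map to an array with the requested dtype when as_array is set.
        extracted[name] = np.array(value, dtype=bits) if as_array else value
    return extracted, unmatched
```

Returning the unmatched keys alongside the data, as the documented tuple return does, lets a caller report config entries that a given file does not contain.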

ecgprocess.utils.reader_tools.subset_dict(data: dict[str, Any], pattern: dict[str, str], substitute: tuple[str, str] | None = ('_[0-9]{1,2}\\.*', ' '), character_trim: int = 0, verbose: bool = True, skip_empty: bool = True) dict[str, Any][source]

Identifies a subset of data items whose keys start with a pattern key. Within each subset, the pattern value is used to find the single entry whose key contains that value; the value of this entry is a unique name, which is added as a prefix to the remaining subset keys.

Parameters:
  • data (dict [str, any]) – The dictionary to be subsetted and transformed.

  • pattern (dict [str, str]) – A dictionary where each key is a prefix to match against the keys of data, and each value defines the suffix to search within the matching keys in data. The value found in data corresponding to this suffix is used as the prefix for the resulting dictionary keys. The pattern keys will be matched to the data keys based on a startswith, while the pattern values will be matched to the data keys using: value in key.

  • substitute (tuple [str, str] or None, default (r"_[0-9]{1,2}\.*", " ")) – A tuple containing a regular expression pattern and a replacement string. This substitution is applied to the remaining portion of the data key after removing the matching prefix.

  • character_trim (int, default 0) – The number of characters to remove from the right-hand end of the part of the data key that did not match the pattern key.

  • verbose (bool, default True) – Whether warnings should be issued.

Returns:

A dictionary with keys grouped and transformed based on the pattern and substitute. The keys are prefixed with values derived from data.

Return type:

dict [str, any]

Examples

>>> data = {
...     "Sequence_0.Referenced Waveform Channels_0": "Channel 1",
...     "Sequence_0.Referenced Waveform Channels_1": "Channel 2",
...     "Sequence_0.Annotation Group Number": 1,
...     "Sequence_0.Unformatted Text Value": "Event A",
...     "Sequence_8.Measurement Units Code Sequence_0.Code Value": "bpm",
...     "Sequence_8.Measurement Units Code Sequence_0.Code Meaning": "Heart Rate",
...     "Sequence_15.Measurement Units Code Sequence_0.Code Meaning": "Temperature",
...     "Sequence_15.Referenced Waveform Channels_1": "Channel 10",
...     "Sequence_15.Numeric Value": 36.7,
...     "Other Annotation Sequence_1.Some Value": "Other Data",
...     "Other Annotation Sequence_1.Code Meaning": "Other Code Meaning",
... }
>>> pattern = {
...     "Sequence_15": "Code Meaning",
...     "Sequence_8": "Code Meaning",
...     "Sequence_16": "Code Meaning",
...     "Sequence_11": "Code Meaning",
...     "Other Annotation Sequence_1": "Code Meaning",
... }
>>> subset_dict(data, pattern)
{'Temperature (Referenced Waveform Channels)': 'Channel 10',
 'Temperature (Numeric Value)': 36.7,
 'Heart Rate (Measurement Units Code SequenceCode Value)': 'bpm',
 'Other Code Meaning (Some Value)': 'Other Data'}
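
A simplified sketch of the three steps subset_dict performs: subset by prefix, find the unique name entry, then prefix the remaining keys. The function name is hypothetical, and character_trim, verbose and skip_empty are omitted. It matches the Temperature and Other Code Meaning entries of the example above, but the exact key text produced for nested sequence keys depends on details not documented here, so it should not be taken as the library's implementation.

```python
from __future__ import annotations

import re
from typing import Any


def subset_dict_sketch(
    data: dict[str, Any],
    pattern: dict[str, str],
    substitute: tuple[str, str] | None = (r"_[0-9]{1,2}\.*", " "),
) -> dict[str, Any]:
    """Hypothetical, simplified sketch of subset_dict (no trimming or warnings)."""
    result: dict[str, Any] = {}
    for prefix, name_marker in pattern.items():
        # 1. Subset: entries whose key starts with the pattern key.
        subset = {k: v for k, v in data.items() if k.startswith(prefix)}
        # 2. The single entry whose key contains the pattern value supplies the name.
        names = [v for k, v in subset.items() if name_marker in k]
        if len(names) != 1:
            continue  # no unique name: skip this group (the real function may warn)
        name = names[0]
        # 3. Prefix the remaining keys with that name.
        for key, value in subset.items():
            if name_marker in key:
                continue
            rest = key[len(prefix):].lstrip(".")
            if substitute is not None:
                rest = re.sub(substitute[0], substitute[1], rest).strip()
            result[f"{name} ({rest})"] = value
    return result
```

In this sketch a plain startswith means a pattern key such as "Sequence_1" would also match keys beginning with "Sequence_15", so pattern keys should be chosen with that in mind.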
ecgprocess.utils.reader_tools.validate_xml(xml_path: str | Path, xsd_path: str | Path, strict: bool = True, verbose: bool = True) _ElementTree[source]

Validates an XML file against an XSD schema.

Parameters:
  • xml_path (str or Path) – Path to the XML file.

  • xsd_path (str or Path) – Path to the XSD file.

  • strict (bool, default True) – If False, ignores elements in the XML that are not in the XSD.

Returns:

The parsed XML document.

Return type:

etree._ElementTree

Raises:

XMLValidationError – Raised if the XSD and XML are incompatible.

ecgprocess.utils.reader_tools.xml_to_dict(xml_doc: _ElementTree, encoding: str = 'utf-8') dict[str, Any][source]

Converts an lxml ElementTree document to a dictionary.

Parameters:
  • xml_doc (etree._ElementTree) – A validated XML document.

  • encoding (str, default utf-8)

Returns:

A dictionary representation of the XML data.

Return type:

dict [str, any]

utils.ecg_tools

A collection of established tools for ECG derivation and cleaning.

ecgprocess.utils.ecg_tools.get_limb_leads(signals: dict[str, array], lead_I: str = 'I', lead_II: str = 'II') dict[str, array][source]

Calculate the derived limb leads (III, aVR, aVL, aVF) from leads I and II.

Parameters:
  • signals (dict [str, np.array]) – A dictionary with the lead names as string keys and the signals as a 1D np.array.

  • lead_I (str, default ‘I’) – The key name for lead I in signals

  • lead_II (str, default ‘II’) – The key name for lead II in signals

Returns:

A dictionary including limb lead signals.

Return type:

dict

Notes

Please see this url for the relevant explanation of the relationships between leads I and II and the limb leads.
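The derivation rests on the standard Einthoven and Goldberger relationships: III = II − I, aVR = −(I + II)/2, aVL = I − II/2 and aVF = II − I/2. The sketch below applies them to NumPy arrays; the function name is hypothetical, and returning the input leads merged with the derived ones is an assumption about the shape of the result.

```python
from __future__ import annotations

import numpy as np


def limb_leads_sketch(
    signals: dict[str, np.ndarray], lead_I: str = "I", lead_II: str = "II"
) -> dict[str, np.ndarray]:
    """Derive III, aVR, aVL and aVF from leads I and II (hypothetical helper)."""
    i = np.asarray(signals[lead_I], dtype=float)
    ii = np.asarray(signals[lead_II], dtype=float)
    derived = {
        "III": ii - i,            # Einthoven's law: I + III = II
        "aVR": -(i + ii) / 2,     # Goldberger augmented leads
        "aVL": i - ii / 2,
        "aVF": ii - i / 2,
    }
    # Return the input leads together with the derived ones.
    return {**signals, **derived}
```

A quick sanity check on any output is that aVR + aVL + aVF sums to zero sample by sample.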

ecgprocess.utils.ecg_tools.resampling_500hz(signals: dict[str, array], duration: int | float | None = None, median: bool = False) dict[str, array][source]

Resample an ECG signal to 500 Hz.

Parameters:
  • signals (dict [str, np.array]) – A dictionary with the lead names as string keys and the signals as a 1D np.array.

  • duration (int or float, default None) – The duration of the ECG in seconds, calculated as the number of samples divided by the sampling frequency (in samples per second). For raw waveforms, the duration determines the number of samples needed for a 500 Hz signal: duration × 500.

  • median (bool, default False) – Set to True to resample a median beat ECG to 500 Hz. The duration of a median beat signal is 1.2 seconds, hence the number of samples is fixed at 1.2 × 500 = 600.
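The sample-count arithmetic above can be illustrated as follows: a 10-second recording sampled at 250 Hz has 2500 samples, so duration = 2500 / 250 = 10 and the target length is 10 × 500 = 5000 samples. The sketch uses linear interpolation via numpy.interp purely as an assumption; the library may use a different method (a Fourier-based alternative such as scipy.signal.resample could be swapped in), and the function name is hypothetical.

```python
from __future__ import annotations

import numpy as np


def resample_500hz_sketch(
    signals: dict[str, np.ndarray],
    duration: float | None = None,
    median: bool = False,
) -> dict[str, np.ndarray]:
    """Hypothetical sketch: resample each lead to 500 Hz by linear interpolation."""
    if median:
        n_target = 600  # median beat: 1.2 s x 500 Hz
    elif duration is not None:
        n_target = int(round(duration * 500))  # raw waveform: duration x 500
    else:
        raise ValueError("duration is required unless median=True")
    resampled = {}
    for lead, values in signals.items():
        values = np.asarray(values, dtype=float)
        # Place old and new samples on the same normalised time axis [0, 1].
        old_t = np.linspace(0.0, 1.0, num=len(values))
        new_t = np.linspace(0.0, 1.0, num=n_target)
        resampled[lead] = np.interp(new_t, old_t, values)
    return resampled
```
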

ecgprocess.utils.ecg_tools.signal_calibration(signal: ndarray, correctionfactor: float, baseline: float) ndarray[source]

Adjusts the ECG signal by subtracting the channel baseline from the signal and then multiplying the adjusted signal by the channel correction factor.

Parameters:
  • signal (np.ndarray) – The lead-specific ECG signal.

  • correctionfactor (float) – The channel correction factor.

  • baseline (float) – The channel baseline.

Returns:

The recalibrated signal.

Return type:

np.ndarray
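The order of operations matters: with baseline 100 and correction factor 2.5, a raw value of 110 becomes (110 − 100) × 2.5 = 25, whereas multiplying first would give 110 × 2.5 − 100 = 175. A minimal sketch of the documented order, using a hypothetical function name:

```python
from __future__ import annotations

import numpy as np


def calibrate_sketch(
    signal: np.ndarray, correctionfactor: float, baseline: float
) -> np.ndarray:
    """Hypothetical sketch of the calibration described above."""
    # Subtract the channel baseline first, then apply the correction factor.
    return (np.asarray(signal, dtype=float) - baseline) * correctionfactor
```
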

ecgprocess.utils.ecg_tools.signal_dicts_to_numpy_array(signals: list[dict[str, ndarray]], leads: list[str] | None = ['I', 'II', 'III', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'aVF', 'aVL', 'aVR'], padding: bool = True) ndarray[source]

Convert a list of ECG signal dictionaries to a 3D NumPy array suitable for deep learning.

Parameters:
  • signals (list [dict [str, np.ndarray]]) – List where each dictionary represents an ECG sample with lead names as keys and numpy arrays as values.

  • leads (list [str] or None) – List of lead names to include and their order. If None, all unique leads across samples are used in sorted order.

  • padding (bool, default True) – Whether to pad shorter signals to the length of the longest signal. If False, all signals must have the same length.

Returns:

3D NumPy array with shape (num_samples, num_leads, signal_length) containing the ECG data.

Return type:

np.ndarray

Raises:

ValueError – If signals is empty, if any sample is missing leads specified in leads, or if padding is False and signals have varying lengths.

Notes

The order of the leads along the second axis of the returned array matches the order of leads.
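A sketch of how per-sample lead dictionaries can be stacked into a (num_samples, num_leads, signal_length) array. The function name is hypothetical, leads defaults to None here for brevity (the documented default is a fixed twelve-lead list), and zero-padding at the end of shorter signals is an assumption about how the library pads.

```python
from __future__ import annotations

import numpy as np


def dicts_to_array_sketch(
    signals: list[dict[str, np.ndarray]],
    leads: list[str] | None = None,
    padding: bool = True,
) -> np.ndarray:
    """Stack per-sample lead dictionaries into (num_samples, num_leads, signal_length)."""
    if not signals:
        raise ValueError("signals is empty")
    if leads is None:
        # All unique leads across samples, in sorted order.
        leads = sorted({lead for sample in signals for lead in sample})
    for idx, sample in enumerate(signals):
        missing = [lead for lead in leads if lead not in sample]
        if missing:
            raise ValueError(f"sample {idx} is missing leads {missing}")
    lengths = {len(sample[lead]) for sample in signals for lead in leads}
    if not padding and len(lengths) > 1:
        raise ValueError("signals have varying lengths and padding is False")
    out = np.zeros((len(signals), len(leads), max(lengths, default=0)))
    for i, sample in enumerate(signals):
        for j, lead in enumerate(leads):
            values = np.asarray(sample[lead], dtype=float)
            out[i, j, : len(values)] = values  # zero-pad at the end (assumption)
    return out
```
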

ecgprocess.utils.ecg_tools.signal_resolution(signal: ndarray, resolution_current: float, resolution_target: float) ndarray[source]

Adjust the amplitude scale of an ECG signal to match a desired resolution.

Parameters:
  • signal (np.ndarray) – The lead-specific ECG signal.

  • resolution_current (float) – The current resolution.

  • resolution_target (float) – The target resolution.

Returns:

The rescaled signal.

Return type:

np.ndarray

Example

>>> import numpy as np
>>> ecg_signal = np.array([10, 20, 30, 40, 50])
>>> current_res = 2.0  # each digital unit equals 2 μV
>>> new_signal = signal_resolution(
...     ecg_signal,
...     resolution_current=current_res,
...     resolution_target=5
... )
>>> print(new_signal)
[ 25.  50.  75. 100. 125.]