Readers

process_xml

A module for extracting metadata, median beats, and raw waveforms from ECG XML files, allowing for XML validation.

This module provides an API through a reader class, which maps ECG data from XML files to class attributes. These attributes can be programmatically accessed and further processed by downstream ECGprocess modules or external programs leveraging the API.

class ecgprocess.process_xml.ECGXMLReader(_as_array=True, augment_leads=False, resample_500=True)[source]

Bases: BaseReader

Processes an XML file containing ECG data and extracts the metadata, median beats, and raw waveforms.

Parameters:
  • augment_leads (bool, default False) – Whether the augmented leads should be calculated if these are not already available in the source file.

  • resample_500 (bool, default True) – Whether to resample the ECG to a frequency of 500 Hertz. Note that this internally calculates the ECG duration in seconds, so the sampling frequency must be expressed in Hertz (samples per second), not in samples per millisecond.
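As a rough illustration of the note on units, the duration and the resampled length can be worked out as below. This is a sketch only: resampled_length is a hypothetical helper, not part of ecgprocess, and the library's own resampling routine may differ.

```python
def resampled_length(n_samples, sampling_rate_hz, target_hz=500.0):
    """Return (duration in seconds, sample count after resampling).

    Hypothetical helper for illustration; not an ecgprocess function.
    """
    duration_s = n_samples / sampling_rate_hz
    return duration_s, round(duration_s * target_hz)


# 2,500 samples recorded at 250 Hz span 10 seconds.
duration, n_out = resampled_length(2500, 250.0)

# Supplying the rate per millisecond (0.25) inflates the duration 1,000-fold.
wrong_duration, _ = resampled_length(2500, 0.25)
```

The second call shows why the unit matters: the same recording appears to last 10,000 seconds when the rate is given per millisecond.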

augment_leads

Whether the augmented leads were calculated if these were unavailable.

Type:

bool

resample_500

Whether the ECG was resampled to a 500 Hertz frequency.

Type:

bool

extract(config, skip_empty, parse_numeric, **kwargs)[source]

Processes the XML file content applying optional lead augmentation and resampling. The XML content will be mapped to class attributes.

Return type:

Self

tags

A generic property factory defining setters and getters, with optional type validation.

Properties are read-only by default. Use set_with_setter to write a value; this temporarily unlocks the property on the specific instance using a per-instance lock key stored in the instance’s __dict__, avoiding the shared-state bug that arises when the lock flag is stored on the descriptor object itself (which is shared across all instances).

Parameters:
  • name (str) – The name of the setters and getters

  • types (Type, default NoneType) – Either a single type, or a tuple of types to test against.

set_with_setter(instance, value)

Temporarily unlocks the property on instance, sets the value, then re-locks it.

Returns:

property – A property object with getter and setter.
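The per-instance locking described above can be sketched with a plain data descriptor. This is a minimal illustration under stated assumptions, not the ecgprocess implementation: LockedProperty and DemoReader are hypothetical names, and the key names used in the instance __dict__ are invented for the example.

```python
class LockedProperty:
    """Read-only descriptor whose unlock flag lives on each instance.

    Hypothetical sketch; the class and key names are assumptions.
    """

    def __init__(self, name, types=None):
        self.name = name
        self.types = types
        self._value_key = f"_{name}"
        self._lock_key = f"_{name}_unlocked"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class: expose the descriptor
        return instance.__dict__.get(self._value_key)

    def __set__(self, instance, value):
        # The flag is looked up on the instance, so unlocking one
        # instance never unlocks another.
        if not instance.__dict__.get(self._lock_key, False):
            raise AttributeError(f"'{self.name}' is read-only")
        if self.types is not None and not isinstance(value, self.types):
            raise TypeError(f"'{self.name}' expects {self.types}")
        instance.__dict__[self._value_key] = value

    def set_with_setter(self, instance, value):
        instance.__dict__[self._lock_key] = True
        try:
            self.__set__(instance, value)
        finally:
            instance.__dict__[self._lock_key] = False  # always re-lock


class DemoReader:
    tags = LockedProperty("tags", types=list)


first, second = DemoReader(), DemoReader()
DemoReader.tags.set_with_setter(first, ["PatientID"])
```

Had the flag been stored on the descriptor itself, unlocking first would also have unlocked second for the duration of the call, because both instances share the single descriptor object defined on the class.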

raw_data
augment_leads: bool = False
resample_500: bool = True
__call__(path, schema=None, verbose=False, **kwargs)[source]

Reads an .xml file containing ECG readings, optionally validates it against an .xsd schema, and maps the XML file to a flat dictionary.

Parameters:
  • path (str) – The path to the .xml file.

  • schema (str, default NoneType) – A path to an XSD schema against which the XML file will be validated.

  • verbose (bool, default False) – Whether warnings and process info should be printed.

  • **kwargs (any) – keyword arguments passed to flatten_dict.

Parameters:
  • path (str)

  • schema (str | None)

  • verbose (bool)

  • kwargs (Any | None)

Return type:

Self

tags

A list of strings with parsed tags matching the raw_data keys.

Type:

list [str]

raw_data

The raw parsed data.

Type:

dict [str, any]

Returns:

self (ECGXMLReader instance) – Returns the class instance with updated attributes including the extracted XML data.

Raises:

XMLValidationError – If the XML file is not valid based on the supplied schema.
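To make the idea of mapping an XML file to a flat dictionary concrete, the sketch below flattens a small document with the standard library. It is an assumption-laden illustration: flatten_xml is a hypothetical function, and the key format (dot-separated tags with a numeric suffix for repeated siblings) is invented here and is not necessarily what flatten_dict produces.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten_xml(element, key=None):
    """Map an element tree to {dotted.key: text}. Hypothetical sketch."""
    key = element.tag if key is None else key
    children = list(element)
    if not children:
        return {key: (element.text or "").strip()}
    totals = Counter(child.tag for child in children)
    seen = Counter()
    flat = {}
    for child in children:
        seen[child.tag] += 1
        # Number repeated sibling tags so their keys stay unique.
        suffix = f"_{seen[child.tag]}" if totals[child.tag] > 1 else ""
        flat.update(flatten_xml(child, f"{key}.{child.tag}{suffix}"))
    return flat


document = (
    "<ECG><PatientID>42</PatientID>"
    "<Lead><ID>I</ID></Lead><Lead><ID>II</ID></Lead></ECG>"
)
raw_data = flatten_xml(ET.fromstring(document))
tags = list(raw_data)
```

Here tags and raw_data mirror the attribute names documented above: the tags are simply the keys of the flat dictionary.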

extract(config, bits=None, skip_empty=True, parse_numeric=True, **kwargs)[source]

Processes the raw ECG data and assigns it to class attributes, performing resampling and lead augmentation if requested.

Parameters:
  • config (ConfigParser) – A class instance of a parsed configuration file, mapping the XML content to class attributes. Specifically, this should include the dictionary attributes MetaData, WaveForms, MedianBeats, and OtherData. MetaData holds both privileged keys, which carry the essential information describing an ECG instance, and non-privileged information. OtherData differs from MetaData in how it is handled by other functions or methods: OtherData is processed without strong checks on its content. WaveForms and MedianBeats simply include the lead mappings. Please refer to the constants.CoreData class for the specifics.

  • parse_numeric (bool, default True) – Whether to check for numeric data accidentally recorded as string and try to parse these to int or float depending on the presence of a decimal separator.

  • skip_empty (bool, default True) – Whether empty tags should be skipped or throw an error.

  • bits (np.dtype, default None) – The data type passed to the dtype argument of numpy.array.

  • **kwargs – The keyword arguments for reader_tools.get_ecg_data. For the waveforms and median beats, as_array and bits are hard coded, so supplying these as kwargs will raise an error.
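The parse_numeric behaviour can be pictured with the following sketch. It assumes that the decimal separator is a full stop and that unparseable strings are returned unchanged; parse_numeric_string is a hypothetical name and the library's actual rules may differ.

```python
def parse_numeric_string(value):
    """Convert numeric strings to int or float; leave other values alone.

    Hypothetical sketch assuming '.' is the decimal separator.
    """
    if not isinstance(value, str):
        return value
    text = value.strip()
    try:
        return float(text) if "." in text else int(text)
    except ValueError:
        return value  # not numeric, e.g. a lead name


sampling_rate = parse_numeric_string("500")
minor_step = parse_numeric_string("0.04")
lead_name = parse_numeric_string("aVR")
```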

Return type:

Self

MetaData

ECG metadata.

Type:

dict [str, any]

WaveForms

The lead specific ECG waveforms.

Type:

dict [str, np.array]

MedianBeats

The lead specific ECG median beats.

Type:

dict [str, np.array]

OtherData

Other data.

Type:

dict [str, any]

Returns:

self (ECGXMLReader instance) – Returns the class instance with updated attributes including the extracted XML data.

WaveForms
MedianBeats
MetaData
OtherData
__init__(_as_array=True, augment_leads=False, resample_500=True)

Initialises slots entries to None.

Return type:

None

process_dicom

A module for extracting metadata, median beats, and raw waveforms from ECG DICOM files.

This module provides an API through a reader class, which maps ECG data from DICOM files to class attributes. These attributes can be programmatically accessed and further processed by downstream ECGprocess modules or external programs leveraging the API.

class ecgprocess.process_dicom.ECGDICOMReader(_as_array=True, augment_leads=False, resample_500=True)[source]

Bases: ECGXMLReader

Processes a DICOM file containing ECG data and extracts the metadata, median beats, and raw waveforms.

Parameters:
  • augment_leads (bool, default False) – Whether the augmented leads should be calculated if these are not already available in the source file.

  • resample_500 (bool, default True) – Whether to resample the ECG to a frequency of 500 Hertz.
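For context on augment_leads, the augmented limb leads follow from leads I and II through the standard Goldberger relations: aVR = -(I + II) / 2, aVL = I - II / 2, and aVF = II - I / 2. The sketch below applies these sample by sample. The function name is hypothetical, plain lists stand in for NumPy arrays to keep the example dependency-free, and the library's own implementation may be organised differently.

```python
def compute_augmented_leads(lead_i, lead_ii):
    """Derive aVR, aVL and aVF from leads I and II (Goldberger relations).

    Hypothetical helper; inputs are equal-length sequences of samples.
    """
    pairs = list(zip(lead_i, lead_ii))
    return {
        "aVR": [-(i + ii) / 2 for i, ii in pairs],
        "aVL": [i - ii / 2 for i, ii in pairs],
        "aVF": [ii - i / 2 for i, ii in pairs],
    }


augmented = compute_augmented_leads([1.0, 2.0], [2.0, 4.0])
```

A useful sanity check is that the three augmented leads sum to zero at every sample.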

augment_leads

Whether the augmented leads were calculated if these were unavailable.

Type:

bool

resample_500

Whether the ECG was resampled to a 500 Hertz frequency.

Type:

bool

extract(config, skip_empty, parse_numeric, **kwargs)[source]

Processes the DICOM file content applying optional lead augmentation and resampling. The DICOM content will be mapped to class attributes.

Return type:

Self

tags

A generic property factory defining setters and getters, with optional type validation.

Properties are read-only by default. Use set_with_setter to write a value; this temporarily unlocks the property on the specific instance using a per-instance lock key stored in the instance’s __dict__, avoiding the shared-state bug that arises when the lock flag is stored on the descriptor object itself (which is shared across all instances).

Parameters:
  • name (str) – The name of the setters and getters

  • types (Type, default NoneType) – Either a single type, or a tuple of types to test against.

set_with_setter(instance, value)

Temporarily unlocks the property on instance, sets the value, then re-locks it.

Returns:

property – A property object with getter and setter.

raw_data
augment_leads: bool = False
resample_500: bool = True
__call__(path, verbose=False, **kwargs)[source]

Reads a .dcm file containing ECG readings.

Parameters:
  • path (str) – The path to a .dcm file.

  • verbose (bool, default False) – Whether warnings and process info should be printed.

  • **kwargs (any) – keyword arguments passed to flatten_dict.

Return type:

Self

tags

A list of strings with parsed tags matching the raw_data keys.

Type:

list [str]

raw_data

The raw parsed data.

Type:

dict [str, any]

Returns:

self (ECGDICOMReader instance) – Returns the class instance with updated attributes including the extracted DICOM data.

extract(config, bits=None, skip_empty=True, parse_numeric=True, pattern=None, substitute=('_[0-9]{1,2}\\.*', ' '), character_trim=0, **kwargs)[source]

Processes the raw ECG data and assigns it to class attributes, performing resampling and lead augmentation if requested.

Parameters:
  • config (ConfigParser) – A class instance of a parsed configuration file, mapping the DICOM content to class attributes. Specifically, this should include the dictionary attributes MetaData, WaveForms, MedianBeats, and OtherData. MetaData holds both privileged keys, which carry the essential information describing an ECG instance, and non-privileged information. OtherData differs from MetaData in how it is handled by other functions or methods: OtherData is processed without strong checks on its content. WaveForms and MedianBeats simply include the lead mappings. Please refer to the constants.CoreData class for the specifics.

  • parse_numeric (bool, default True) – Whether to check for numeric data accidentally recorded as string and try to parse these to int or float depending on the presence of a decimal separator.

  • skip_empty (bool, default True) – Whether empty tags should be skipped or throw an error.

  • bits (np.dtype, default None) – The data type passed to the dtype argument of numpy.array.

  • pattern (dict [str, str], default NoneType) – Use this to extract a subset of items from MetaData based on the pattern key, and to add a unique name as a prefix to the keys of the selected subset. The unique name will be based on the value of the key which matches the pattern value.

  • substitute (tuple [str, str] or None, default (r"_[0-9]{1,2}\.*", " ")) – A tuple containing a regular expression pattern and replacement string. This substitution is applied to the remaining portion of the data key after removing the matching prefix.

  • character_trim (int, default 0) – The number of characters which should be removed from the right-hand side of the data key which did not match the pattern key.

  • **kwargs – The keyword arguments for reader_tools.get_ecg_data. For the waveforms and median beats, as_array and bits are hard coded, so supplying these as kwargs will raise an error.
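The effect of the default substitute pattern and of character_trim on a key can be tried out with the standard re module. This is a sketch under assumptions: clean_key is a hypothetical helper, the example keys are invented, and applying the trim before the substitution is a guess rather than documented behaviour.

```python
import re

# The documented default: a regular expression and its replacement.
DEFAULT_SUBSTITUTE = (r"_[0-9]{1,2}\.*", " ")


def clean_key(key, substitute=DEFAULT_SUBSTITUTE, character_trim=0):
    """Trim characters from the right, then apply the substitution.

    Hypothetical helper; the order of the two steps is an assumption.
    """
    if character_trim:
        key = key[:-character_trim]
    if substitute is not None:
        pattern, replacement = substitute
        key = re.sub(pattern, replacement, key)
    return key


cleaned = clean_key("Channel_3.Sensitivity")
trimmed = clean_key("Channel_12.Units", character_trim=1)
untouched = clean_key("Channel_3.Sensitivity", substitute=None)
```

The default pattern removes an underscore followed by one or two digits and any trailing full stops, replacing the match with a single space.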

Return type:

Self

MetaData

ECG metadata.

Type:

dict [str, any]

WaveForms

The lead specific ECG waveforms.

Type:

dict [str, np.array]

MedianBeats

The lead specific ECG median beats.

Type:

dict [str, np.array]

OtherData

Other data.

Type:

dict [str, any]

Returns:

self (ECGDICOMReader instance) – Returns the class instance with updated attributes including the extracted DICOM data.

WaveForms
MedianBeats
MetaData
OtherData
__init__(_as_array=True, augment_leads=False, resample_500=True)

Initialises slots entries to None.

Return type:

None

Tabular

tabular

A module to process ECG signal data and metadata to tabular form. This also includes 2D figures - which are strictly speaking tables of pixels.

The module leverages the existing process class instances and extracts the requested data from their class attributes.

ecgprocess.tabular.metadata_identity(metadata, verbose=False, **kwargs)[source]

A placeholder identity function that simply returns its input data unchanged.

Return type:

dict[str, Any]

ecgprocess.tabular.signal_identity(signals, verbose=False, **kwargs)[source]

A placeholder identity function that simply returns its input data unchanged.

Return type:

dict[str, Any]

class ecgprocess.tabular.ECGTable(ecgreader, path_list, extract_meta=True, extract_wave=True, extract_median=True, engineer_meta=<function metadata_identity>, engineer_wave=<function signal_identity>, engineer_median=<function signal_identity>, schema=None, signal_length_w=None, signal_length_m=None, pad_value=0.0)[source]

Takes a BaseReader instance, loops over a supplied list of files containing ECG data, and extracts the relevant information using a Processing instance and a Configuration instance. The extracted information can be mapped to a pandas.DataFrame or saved to disk.

Parameters:
  • ecgreader (BaseReader) – An instance of the BaseReader data class.

  • path_list (list [str]) – A list of paths to one or more files containing ECG data.

  • extract_meta (bool, default True) – Whether to extract the metadata.

  • extract_wave (bool, default True) – Whether to extract the raw waveforms.

  • extract_median (bool, default True) – Whether to extract the median beats.

  • engineer_meta (Callable, default metadata_identity) – A function applied to the internal meta_dict object. Please ensure the function includes parameter: meta_dict.

  • engineer_wave (Callable, default signal_identity) – A function applied to the internal wave_dict object. Please ensure the function includes parameters: wave_dict and **kwargs.

  • engineer_median (Callable, default signal_identity) – A function applied to the internal median_dict object. Please ensure the function includes parameters: median_dict and **kwargs.

  • schema (str or NoneType, default NoneType) – The path to an optional XSD schema to validate XML files. Set to None to ignore.

  • signal_length_w (int or None, default: None) – Target sample count for the per-record WaveForms signals. When set, each lead is right-padded with pad_value or right-truncated to this length inside write_ecg’s per-record loop, before engineer_wave runs. None disables waveform pad/truncate. The output column 'sampling number padded (waveforms)' records this target value (or None); it does not confirm that padding actually changed the array length. The signal length equals the duration (seconds) multiplied by the sampling rate (Hz). For a standard 10 second ECG sampled at 500 Hz the signal length is 5,000 samples.

  • signal_length_m (int or None, default: None) – Target sample count for the per-record MedianBeats signals; same semantics as signal_length_w for medians. The output column 'sampling number padded (medianbeats)' records this target value (or None). For a 1.2 second median beat sampled at 500 Hz the expected signal length would be 600 samples.

  • pad_value (int or float, default: 0.0) – Fill value used when right-padding shorter signals. The data type of the original array is preserved.
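The pad/truncate semantics of signal_length_w and signal_length_m can be sketched as follows. Both functions are hypothetical illustrations rather than ecgprocess code, and plain lists stand in for NumPy arrays so the example has no dependencies.

```python
def expected_length(duration_s, sampling_rate_hz):
    """Signal length in samples: duration (s) times sampling rate (Hz)."""
    return round(duration_s * sampling_rate_hz)


def pad_or_truncate(signal, target_length, pad_value=0.0):
    """Right-pad or right-truncate a signal. Hypothetical sketch.

    A target_length of None returns an unchanged copy.
    """
    if target_length is None:
        return list(signal)
    trimmed = list(signal[:target_length])
    return trimmed + [pad_value] * (target_length - len(trimmed))


waveform_target = expected_length(10, 500)
median_target = expected_length(1.2, 500)
padded = pad_or_truncate([1.0, 2.0, 3.0], 5)
truncated = pad_or_truncate([1.0, 2.0, 3.0], 2)
```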

raw_path_list

A list of file paths.

Type:

list [str]

get_table(unique, **kwargs)[source]

Extracts ECG data and maps it to class attributes.

Return type:

Self

write_ecg(chunk, target_tar, target_path, tar_mode, file_type, tab_sep, tab_compression, tab_append, unique, write_failed, write_chunk_record, kwargs_reader, kwargs_tab)

Writes processed ECG data to a single or multiple files.

Notes

The engineering parameters can be used to supply functions that are applied separately to the dictionaries of each processed file before these dictionaries are combined and written to disk. The functions are applied in the following order to the internal objects: meta_dict (for metadata), wave_dict (for waveform signals), median_dict (for median beat signals). Please ensure the waveform and median beat functions include a **kwargs parameter. The kwargs are used internally to pass the metadata, which ensures, for example, that the engineer_wave function has access to the metadata.
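A pair of engineering functions might look like the sketch below. The function names are hypothetical, the first parameter names follow the guidance given for engineer_meta and engineer_wave, and the content of **kwargs is deliberately left unused because the exact keys passed internally are not documented here.

```python
def drop_empty_metadata(meta_dict, **kwargs):
    """Example engineer_meta function: remove entries without a value."""
    return {key: value for key, value in meta_dict.items() if value is not None}


def centre_waveforms(wave_dict, **kwargs):
    """Example engineer_wave function: subtract each lead's mean.

    **kwargs is accepted so that metadata can be passed in internally.
    """
    centred = {}
    for lead, signal in wave_dict.items():
        mean = sum(signal) / len(signal) if len(signal) else 0.0
        centred[lead] = [sample - mean for sample in signal]
    return centred


meta = drop_empty_metadata({"UID": "abc", "Site": None})
waves = centre_waveforms({"I": [1.0, 2.0, 3.0]}, extra="ignored")
```

Functions of this shape would be supplied through the engineer_meta, engineer_wave, or engineer_median arguments of ECGTable.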

__init__(ecgreader, path_list, extract_meta=True, extract_wave=True, extract_median=True, engineer_meta=<function metadata_identity>, engineer_wave=<function signal_identity>, engineer_median=<function signal_identity>, schema=None, signal_length_w=None, signal_length_m=None, pad_value=0.0)[source]

Initialises a new instance of ECGTable.

Return type:

None

__call__(ignore_permission=True, ignore_data=False, ignore_invalid=False, confirm_meta=False, confirm_wave=False, confirm_median=False, verbose=False)[source]

Takes a BaseReader instance, loops over a list of file paths, and confirms that the files exist and have appropriate read permission.

Parameters:
  • ignore_permission (bool, default True) – Skips file permission errors. The failed file names will be recorded for review.

  • ignore_data (bool, default False) – Whether files with missing MetaData, WaveForms, or MedianBeats attributes should be skipped. The file names for these failures will be recorded. Note, to limit I/O calls this step will not be conducted during __call__ and instead will be applied when processing data using any method which utilises _loop_table.

  • ignore_invalid (bool, default False) – Whether XML files which fail XSD schema validation should be skipped. Note, to limit I/O calls schema validation will not be conducted during __call__ and instead will be applied when processing data using any method which utilises _loop_table.

  • confirm_meta, confirm_wave, confirm_median (bool, default False) – Whether to skip extraction of the entire file when the indicated attribute is empty.

  • verbose (bool, default False) – Whether to print warnings.

Parameters:
  • ignore_permission (bool)

  • ignore_data (bool)

  • ignore_invalid (bool)

  • confirm_meta (bool)

  • confirm_wave (bool)

  • verbose (bool)

Return type:

Self

failed_path_list

File paths which are either absent or without read permission.

Type:

list [str]

curated_path_list

File paths which are readable.

Type:

list [str]

Returns:

ECGTable instance – Returns the class instance with updated attributes.

get_table(parsed_config, unique=True, kwargs_reader_call=None, kwargs_reader_extract=None)[source]

Returns ECG data as pandas.DataFrames.

Parameters:
  • parsed_config (ConfigParser) – A parsed configuration file which was mapped using ConfigParser.map.

  • unique (bool, default True) – Ensures the UID metadata items are unique between files. Please ensure the UID key-value pair is appropriately set in the config file. If set to False, a file-specific integer key will be assigned instead.

  • kwargs_reader_call (dict [any, any] or None, default NoneType) – passed to the kwargs of the BaseReader call method.

  • kwargs_reader_extract (dict [any, any] or None, default NoneType) – passed to the BaseReader extract method.

Return type:

Self

MetaDataTable

A table of the metadata.

Type:

pandas.DataFrame

WaveFormsTable

A long-formatted table with waveforms signals.

Type:

pandas.DataFrame

MedianBeatsTable

A long-formatted table with the median beat signals.

Type:

pandas.DataFrame

Returns:

ECGTable instance – Returns the class instance with updated attributes.

Notes

To keep track of multiple ECG files the function will try to use the privileged variable UID. If this is not found in the supplied MetaData the function will instead assign an integer starting from 1 as a unique key.

write_ecg(parsed_config=<class 'ecgprocess.utils.config_tools.ConfigParser'>, chunk=None, target_tar=None, target_path='.', tar_mode='w:gz', file_type='table', tab_sep='\t', tab_compression='gzip', tab_append=True, unique=True, write_failed=True, write_chunk_record=False, kwargs_reader_call=None, kwargs_reader_extract=None, kwargs_tab=None)[source]

Extracts chunks of ECG data, and writes these to a single or multiple target files which can be optionally tar compressed.

Parameters:
  • parsed_config (ConfigParser) – A parsed configuration file which was mapped using ConfigParser.map.

  • chunk (int, default NoneType) – The number of source files written to a single processed file. For file_type=table, set this to 1 with tab_append=True to minimise the memory footprint. Set to NoneType to combine all the ECG data into a single target file.

  • target_tar (str, default NoneType) – The name of an optional tarfile where the individual files will be written to. The target_tar will be concatenated to target_path. Depending on the mode this directory will be tar.gz compressed for example. Set target_tar to NoneType to simply add the files directly to target_path. Note this will overwrite any existing directory or file with the provided name.

  • target_path (str, default '.') – The full path where the files should be written to. If provided target_tar will be created underneath this path, otherwise the files will be directly written to the target_path terminal directory (assuming this is writable).

  • file_type ({'table', 'numpy', 'tensorflow'}, default table) – Whether to write the files to tsv using pandas.DataFrame, to npz using numpy.savez, or to tfrecord using tensorflow.io.TFRecordWriter.

  • tar_mode (str, default w:gz) – The tarfile.open mode.

  • tab_sep (str, default '\t') – The field separator, which will be passed to pandas.DataFrame.to_csv.

  • tab_compression (str, default gzip) – The file compression passed to pandas.DataFrame.to_csv.

  • tab_append (bool, default True) – Whether individual chunks should be appended to the tsv file.

  • unique (bool, default True) – Ensures the UID metadata items are unique between files. Please ensure the UID metadata key-value pair is appropriately set in the config file. If set to False, a file-specific integer key will be assigned instead.

  • write_failed (bool, default True) – Whether to write a text file to disk containing the failed file names with some information on why these failed.

  • write_chunk_record (bool, default False) – Whether to include a record matching the chunk indicator to the files included in each chunk.

  • kwargs_reader_call (dict [any, any] or None, default NoneType) – passed to the kwargs of the BaseReader call method.

  • kwargs_reader_extract (dict [any, any] or None, default NoneType) – passed to the BaseReader extract method.

  • kwargs_tab (dict [str, any] or None, default None) – Keyword argument for pd.DataFrame.to_csv.
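The role of chunk can be illustrated by splitting a list of source paths into groups, each of which would end up in one processed file. This is a sketch only: chunk_paths is a hypothetical helper, the file names are invented, and the library's internal chunking may be implemented differently.

```python
def chunk_paths(paths, chunk=None):
    """Group source paths into lists of at most `chunk` items.

    Hypothetical sketch: None places every path in a single group.
    """
    if chunk is None:
        return [list(paths)]
    if chunk < 1:
        raise ValueError("chunk must be a positive integer or None")
    return [list(paths[i:i + chunk]) for i in range(0, len(paths), chunk)]


paths = ["ecg_a.xml", "ecg_b.xml", "ecg_c.xml"]
single_target = chunk_paths(paths)
pairs = chunk_paths(paths, chunk=2)
one_by_one = chunk_paths(paths, chunk=1)
```

With chunk=1 only one source file is held in memory at a time, which is why this setting is recommended together with tab_append=True.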

Parameters:
  • chunk (int | None)

  • target_tar (None | str)

  • target_path (str)

  • tar_mode (str)

  • file_type (Literal['table', 'numpy', 'tensorflow'])

  • tab_sep (str)

  • tab_compression (str | None)

  • tab_append (bool)

  • unique (bool)

  • write_failed (bool)

  • write_chunk_record (bool)

  • kwargs_reader_call (dict[Any, Any] | None)

  • kwargs_reader_extract (dict[Any, Any] | None)

  • kwargs_tab (None | dict[str, Any])

Return type:

Self

target_path

The directory or tar file path where the files are written to.

Type:

str

Returns:

ECGTable instance – The class instance with updated attributes.

Raises:

NotADirectoryError – If the target directory does not exist or is not writable.

Notes

While file_type='table' can store any kind of information, numpy and tfrecord are best used to store numerical/float data. Non-numerical data are therefore currently dropped from the metadata for these file types.

For numpy and tfrecord the signal data will be automatically zero-padded to the longest signal. Missing signals will be presented as np.nan. The array columns match the canonical ECG lead order, please refer to ecg_tools.signal_dicts_to_numpy_array for the exact order.
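The padding rule for the numpy and tfrecord outputs can be pictured with the sketch below, which builds a samples-by-leads table from a dictionary of signals. It rests on several assumptions: signals_to_table is a hypothetical function, nested lists stand in for a NumPy array, and the lead order is passed in explicitly because the canonical order is defined in ecg_tools.signal_dicts_to_numpy_array and is not reproduced here.

```python
import math


def signals_to_table(signals, lead_order, pad_value=0.0):
    """Return rows of samples with one column per lead. Hypothetical sketch.

    Shorter signals are right-padded to the longest one; leads that are
    absent become columns of NaN.
    """
    longest = max((len(signal) for signal in signals.values()), default=0)
    columns = []
    for lead in lead_order:
        if lead in signals:
            signal = list(signals[lead])
            columns.append(signal + [pad_value] * (longest - len(signal)))
        else:
            columns.append([math.nan] * longest)
    # Transpose so that each row is one sample and each column one lead.
    return [list(row) for row in zip(*columns)]


table = signals_to_table(
    {"I": [1.0, 2.0, 3.0], "II": [4.0]},
    lead_order=("I", "II", "III"),
)
```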

Plotting

plot_ecgs

Tools to plot ECG signals.

ECGDrawing takes a called reader instance (ECGXMLReader or ECGDICOMReader) and renders the lead-specific ECG signals on a GridSpec figure with clinical ECG paper scaling.

class ecgprocess.plot_ecgs.ECGStyle(figsize=(20.0, 10.0), dpi=600, background_colour='#ffd6d6', major_grid_colour='#ff6666', minor_grid_colour='#ffaaaa', major_grid_linewidth=0.3, minor_grid_linewidth=0.2, trace_colour='black', trace_linewidth=0.4, paper_speed=25.0, mm_per_mv=10.0, clip_on_trace=False, hspace=0.0, wspace=0.0, x_lim=None, y_lim=(-2.0, 2.0), label_coordinates=(0.02, 0.95))[source]

Style configuration for ECGDrawing.

figsize

Figure width and height in inches.

Type:

tuple [float, float], default (20.0, 10.0)

dpi

Figure resolution in dots per inch.

Type:

int, default 600

background_colour

Axes background fill colour (ECG paper pink).

Type:

str, default '#ffd6d6'

major_grid_colour

Colour of the major grid lines.

Type:

str, default '#ff6666'

minor_grid_colour

Colour of the minor grid lines.

Type:

str, default '#ffaaaa'

major_grid_linewidth

Line width of the major grid lines in points.

Type:

float, default 0.3

minor_grid_linewidth

Line width of the minor grid lines in points.

Type:

float, default 0.2

trace_colour

ECG trace line colour.

Type:

str, default 'black'

trace_linewidth

ECG trace line width in points.

Type:

float, default 0.4

paper_speed

Paper speed in mm/s (standard clinical: 25 mm/s).

Type:

float, default 25.0

mm_per_mv

Voltage scaling in mm/mV (standard clinical: 10 mm/mV).

Type:

float, default 10.0

clip_on_trace

Whether each ECG trace line is clipped to its axes boundary. False replicates the physical ECG paper look.

Type:

bool, default False

hspace

Vertical spacing between GridSpec rows. Zero mimics continuous ECG paper; increase to visually separate panels.

Type:

float, default 0.0

wspace

Horizontal spacing between GridSpec columns. Zero mimics continuous ECG paper; increase to visually separate panels.

Type:

float, default 0.0

x_lim

X-axis limits in seconds. None auto-computes from n_samples / fs.

Type:

tuple [float, float] | None, default None

y_lim

Y-axis limits applied to every lead panel. The default suits signals in mV; for μV, multiply the limits by 1,000.

Type:

tuple [float, float], default (-2.0, 2.0)

label_coordinates

Lead label position as (x, y) in axes coordinates (0–1).

Type:

tuple [float, float], default (0.02, 0.95)

Notes

Paper speed and mm_per_mv drive the minor/major grid step sizes using the standard clinical ECG paper mapping:

  • x minor = 1 / paper_speed (s) e.g. 0.04 s at 25 mm/s

  • x major = 5 / paper_speed (s) e.g. 0.20 s at 25 mm/s

  • y minor = 1 / mm_per_mv (mV) e.g. 0.1 mV at 10 mm/mV

  • y major = 5 / mm_per_mv (mV) e.g. 0.5 mV at 10 mm/mV
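The grid mapping listed above amounts to four divisions, shown here as a small sketch. grid_steps is a hypothetical helper written for illustration; the attribute names it returns mirror those documented on ECGDrawing.

```python
def grid_steps(paper_speed=25.0, mm_per_mv=10.0):
    """Minor and major grid steps: 1 mm and 5 mm of ECG paper.

    Hypothetical helper; x steps are in seconds, y steps in mV.
    """
    return {
        "minor_step_x": 1 / paper_speed,
        "major_step_x": 5 / paper_speed,
        "minor_step_y": 1 / mm_per_mv,
        "major_step_y": 5 / mm_per_mv,
    }


standard = grid_steps()
fast_paper = grid_steps(paper_speed=50.0)
```

Doubling the paper speed to 50 mm/s halves the time covered by each grid square, while the voltage steps are unaffected.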

Settings not covered here can be applied after rendering via drawing.figure / drawing.axes (standard matplotlib), or passed through kwargs_fig to ECGDrawing.__call__, which forwards them to plt.figure().

figsize: tuple[float, float]

Alias for field number 0

dpi: int

Alias for field number 1

background_colour: str

Alias for field number 2

major_grid_colour: str

Alias for field number 3

minor_grid_colour: str

Alias for field number 4

major_grid_linewidth: float

Alias for field number 5

minor_grid_linewidth: float

Alias for field number 6

trace_colour: str

Alias for field number 7

trace_linewidth: float

Alias for field number 8

paper_speed: float

Alias for field number 9

mm_per_mv: float

Alias for field number 10

clip_on_trace: bool

Alias for field number 11

hspace: float

Alias for field number 12

wspace: float

Alias for field number 13

x_lim: tuple[float, float] | None

Alias for field number 14

y_lim: tuple[float, float]

Alias for field number 15

label_coordinates: tuple[float, float]

Alias for field number 16

class ecgprocess.plot_ecgs.ECGDrawing(leads=None)[source]

Takes a called reader instance and plots lead-specific ECG signals.

After calling, the instance exposes the rendered figure and per-lead axes so the caller can apply further matplotlib customisation directly.

Parameters:

leads (list [str] | None, default None) – Lead names to plot by default. Uses the standard 12-lead list when None.

Parameters:

leads (list[str] | None)

figure

The matplotlib figure.

Type:

plt.Figure

axes

Dict mapping lead name to its Axes.

Type:

dict[str, plt.Axes]

layout_mapper

The resolved mapper dict used for this render.

Type:

dict[str, tuple[int, slice]]

style

The ECGStyle instance used.

Type:

ECGStyle

minor_step_x

Minor x-axis step in seconds (1 / paper_speed).

Type:

float

major_step_x

Major x-axis step in seconds (5 / paper_speed).

Type:

float

minor_step_y

Minor y-axis step in mV (1 / mm_per_mv).

Type:

float

major_step_y

Major y-axis step in mV (5 / mm_per_mv).

Type:

float

paper_speed

Resolved paper speed in mm/s.

Type:

float

mm_per_mv

Resolved voltage scaling in mm/mV.

Type:

float

default_leads

Lead names plotted by default. Set at init; overridable per call.

Type:

list[str]

tile_x_axis(xlims)[source]

Tiles each column’s x-axis to an equal slice of the full signal, mimicking continuous ECG paper. Per-lead overrides via xlims.

Parameters:

xlims (dict[str, tuple[float, float]] | None)

Return type:

Self

to_numpy(crop, close)[source]

Renders the figure to a 3-D numpy array (RGBA).

Return type:

ndarray

SOURCE_MAP: dict[str, str] = {'median': 'MedianBeats', 'waveforms': 'WaveForms'}
__init__(leads=None)[source]

Initialises a new instance of ECGDrawing.

Parameters:

leads (list[str] | None)

Return type:

None

__call__(reader, mapper, source='waveforms', style=None, show_axes=None, sampling_rate=None, show_lead_labels=False, cal_pulse=None, cal_leads=None, kwargs_fig=None, kwargs_gridspec=None, kwargs_plot=None, kwargs_grid=None, kwargs_ticks=None, kwargs_background=None, kwargs_label=None)[source]

Creates an ECG figure from a called reader instance.

Parameters:
  • reader (Any) – A called ECGXMLReader or ECGDICOMReader instance.

  • mapper (dict [str, tuple [int, slice]]) – Layout dict mapping lead names to (row_index, col_slice). Dict insertion order determines plotting order; grid dimensions are derived from the mapper values. Mismatch warnings (missing or unrequested leads) are emitted at WARNING level via the ecgprocess.plot_ecgs logger.

  • source ({'waveforms', 'median'}, default 'waveforms') – Whether to plot WaveForms or MedianBeats signals. Can be modified to use a non-standard source name by updating self.SOURCE_MAP.

  • style (ECGStyle | None, optional) – Style configuration. Uses defaults when None.

  • show_axes (dict [str, str | None] | str | None, optional) – Which axis ticks and labels to show per lead. A string value applies uniformly: 'x' x-axis only, 'y' y-axis only, 'b' both, None hides all. A dict maps individual lead names to these same string values, leaving unlisted leads hidden.

  • sampling_rate (float | None, optional) – Sampling frequency in Hz. When None, read from reader.MetaData[CoreData.MetaData.SF]. Raises InputValidationError if neither source is available.

  • show_lead_labels (bool, default False) – When True, each panel is annotated with its lead name in the top-left corner using the default label style.

  • cal_pulse ({'square', 'line'} | None, default None) – Calibration pulse style to prepend to each lead signal before rendering. None disables the pulse. The pulse height follows the clinical 10 mm paper convention and is computed as 10.0 / style.mm_per_mv in the signal’s own units, so it always spans 10 mm of plotted paper regardless of unit choice (mV, μV, …). With the default mm_per_mv=10 this gives the standard 1 mV pulse for mV signals; for μV signals using e.g. mm_per_mv=10/1000 it gives 1000 signal units (a 1 mV-equivalent 10 mm pulse). 'square' draws the full rectangular pulse: 0.1 s baseline, then 0.2 s at the computed amplitude, then 0.1 s baseline. 'line' draws only a single vertical spike: 0.1 s baseline, then a 3-sample rise/peak/fall, then 0.1 s baseline, matching the convention used by some thermal-paper ECG machines. The caller’s data is not mutated.

  • cal_leads (list [str] | None, default None) – Restricts which leads receive the calibration pulse. When None, every lead in the signal data receives the pulse (provided cal_pulse is not None). Lead names not present in the signal data are ignored. Has no effect when cal_pulse is None.

  • kwargs_fig (dict | None, optional) – Extra keyword arguments forwarded to plt.figure(). Style fields (figsize, dpi) are used as defaults and are overridden by any matching keys here.

  • kwargs_gridspec (dict | None, optional) – Extra keyword arguments forwarded to GridSpec(). Style fields (hspace, wspace) are used as defaults and are overridden by any matching keys here.

  • kwargs_plot (dict | dict [str, dict] | None, optional) – Extra keyword arguments forwarded to ax.plot(). Pass a flat dict to apply uniformly, or a dict of dicts for per-lead control.

  • kwargs_grid (dict | dict [str, dict] | None, optional) – Extra keyword arguments forwarded to ax.grid(). Same uniform/per-lead convention as kwargs_plot.

  • kwargs_ticks (dict | dict [str, dict] | None, optional) – Extra keyword arguments forwarded to ax.tick_params(). Same uniform/per-lead convention as kwargs_plot.

  • kwargs_background (dict | dict [str, dict] | None, optional) – Extra keyword arguments forwarded to ax.patch.set(). Same uniform/per-lead convention as kwargs_plot.

  • kwargs_label (dict | dict [str, dict] | None, optional) – Extra keyword arguments forwarded to ax.text() for lead labels. Only used when show_lead_labels=True. Same uniform/per-lead convention as kwargs_plot. Caller-supplied values override the defaults (fontsize=7, fontweight='bold', va='top', ha='left').
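
The arithmetic behind the 'square' option of cal_pulse can be sketched as follows. This is a minimal illustration of the stated rule (height = 10.0 / mm_per_mv, with 0.1 s, 0.2 s and 0.1 s segments), not the library's implementation; the helper names square_cal_pulse and prepend_pulse and the use of plain lists are assumptions made for the example.

```python
def square_cal_pulse(sampling_rate, mm_per_mv=10.0):
    """Build a square calibration pulse as a list of samples.

    Hypothetical helper: 0.1 s baseline, 0.2 s at the pulse height,
    0.1 s baseline, with height = 10.0 / mm_per_mv in signal units.
    """
    height = 10.0 / mm_per_mv
    n_base = int(round(0.1 * sampling_rate))
    n_high = int(round(0.2 * sampling_rate))
    return [0.0] * n_base + [height] * n_high + [0.0] * n_base


def prepend_pulse(signal, pulse):
    """Return a new list; the caller's signal is left untouched."""
    return list(pulse) + list(signal)


pulse_mv = square_cal_pulse(500)                        # mV signal, 10 mm/mV
pulse_uv = square_cal_pulse(500, mm_per_mv=10 / 1000)   # μV signal
lead = [0.1, 0.2, 0.1]
with_pulse = prepend_pulse(lead, pulse_mv)
```

At 500 Hz the pulse is 200 samples long, so the plotted signal is shifted by 0.4 s relative to the raw data.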

Returns:

self (ECGDrawing) – Returns the instance with populated figure, axes, and calibration attributes.

Return type:

Self

tile_x_axis(xlims=None)[source]

Tile the x-axis so each column shows an equal slice of the total signal duration, mimicking continuous ECG paper.

The column count is derived from the mapper used at render time. Each column’s x window runs from x_min + col * segment to x_min + (col + 1) * segment, where segment = (x_max - x_min) / n_cols. Multi-column spans (e.g. a full-width rhythm strip) receive the proportionally wider window. Each row resets to zero independently.

Signal data outside a column’s x window is removed from the line artists rather than clipped. This prevents traces from bleeding horizontally into adjacent panels while leaving clip_on_trace=False intact so tall deflections can still cross row boundaries.

Parameters:

xlims (dict [str, tuple [float, float]] | None, optional) – Per-lead x-axis overrides. Keys are lead names; values are (x_min, x_max) in seconds. Leads not listed fall back to the auto-tiled column window.

Returns:

self (ECGDrawing) – Returns the instance for method chaining.

Raises:
  • NotCalledError – If __call__ has not been invoked yet.

  • InputValidationError – If xlims is not a dict or contains non-tuple values.

Parameters:

xlims (dict[str, tuple[float, float]] | None)

Return type:

Self
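
The column windows used by tile_x_axis follow directly from the formula in its description. The sketch below reproduces that arithmetic and the "remove rather than clip" behaviour on plain lists; column_window and trim_to_window are hypothetical helpers written for this illustration and are not part of ECGDrawing.

```python
def column_window(x_min, x_max, n_cols, col_slice):
    """Return the (start, end) x window for a panel spanning col_slice."""
    segment = (x_max - x_min) / n_cols
    return (x_min + col_slice.start * segment,
            x_min + col_slice.stop * segment)


def trim_to_window(times, values, window):
    """Drop samples outside the window instead of clipping them."""
    start, end = window
    kept = [(t, v) for t, v in zip(times, values) if start <= t <= end]
    return [t for t, _ in kept], [v for _, v in kept]


# A 10 s recording laid out in 4 columns.
windows = [column_window(0.0, 10.0, 4, slice(c, c + 1)) for c in range(4)]
rhythm = column_window(0.0, 10.0, 4, slice(0, 4))   # full-width strip
t, v = trim_to_window([0.0, 1.0, 3.0], [5, 6, 7], windows[0])
```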

to_numpy(crop=False, close=True)[source]

Maps the matplotlib figure to a numpy array.

Parameters:
  • crop (bool, default False) – When True, applies tight layout before conversion, removing excess whitespace around the plot area.

  • close (bool, default True) – Whether to close the figure after extraction.

Returns:

array (np.ndarray) – 3-dimensional array of shape (height, width, 4) in RGBA format.

Raises:

NotCalledError – If __call__ has not been invoked yet.

Return type:

ndarray

figure
axes
layout_mapper
style
minor_step_x
major_step_x
minor_step_y
major_step_y
paper_speed
mm_per_mv

Utilities

utils.general

The general utils module

ecgprocess.utils.general.replace_with_tar(old_dir, new_tar, mode='w:gz')[source]

Moves the old_dir to a tar file, removing the old_dir.

Parameters:
  • old_dir (str) – The path to the old directory.

  • new_tar (str) – The path to the new tar file.

  • mode (str, default w:gz) – The tarfile.open mode.

Return type:

None

Notes

The function does not return anything
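
A directory can be swapped for an archive with the standard library alone. The following is a sketch of the behaviour described for replace_with_tar, assuming the directory is archived under its own base name; replace_with_tar_sketch is a hypothetical stand-in, not the library function.

```python
import os
import shutil
import tarfile
import tempfile


def replace_with_tar_sketch(old_dir, new_tar, mode="w:gz"):
    """Archive old_dir into new_tar, then delete old_dir."""
    with tarfile.open(new_tar, mode) as tar:
        # arcname keeps only the directory name inside the archive.
        tar.add(old_dir, arcname=os.path.basename(old_dir))
    shutil.rmtree(old_dir)


# Usage on a throwaway directory.
work = tempfile.mkdtemp()
old_dir = os.path.join(work, "ecgs")
os.mkdir(old_dir)
with open(os.path.join(old_dir, "a.xml"), "w") as fh:
    fh.write("<ecg/>")
new_tar = os.path.join(work, "ecgs.tar.gz")
replace_with_tar_sketch(old_dir, new_tar)
```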

ecgprocess.utils.general.list_tar(path, mode='r:gz')[source]

Reads the contents of a tar file and returns the member names as a list.

Parameters:
  • path (str) – The path to the tar file.

  • mode (str, default 'r:gz') – The tarfile.open mode.

Returns:

list – A list of filenames.

Return type:

list[str]
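
Listing an archive without unpacking it to disk can be done with tarfile.getnames(). The sketch below shows one way to obtain such a list of filenames; list_tar_sketch is a hypothetical stand-in and the archive it reads is built on the fly purely for the example.

```python
import io
import os
import tarfile
import tempfile


def list_tar_sketch(path, mode="r:gz"):
    """Return the member names stored in a tar archive."""
    with tarfile.open(path, mode) as tar:
        return tar.getnames()


# Build a small archive in a temporary directory, then list it.
tar_path = os.path.join(tempfile.mkdtemp(), "demo.tar.gz")
with tarfile.open(tar_path, "w:gz") as tar:
    payload = b"<ecg/>"
    info = tarfile.TarInfo(name="ecgs/a.xml")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

names = list_tar_sketch(tar_path)
```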

ecgprocess.utils.general.assign_empty_default(arguments, empty_object)[source]

Takes a list of arguments, checks if these are NoneType and if so assigns them ‘empty_object’.

Parameters:
  • arguments (list of arguments) – A list of arguments which may be set to NoneType.

  • empty_object (Callable that returns a mutable object) – Examples include a list or a dict.

Returns:

new_arguments (list) – List with NoneType replaced by empty mutable object.

Return type:

list[Any]

Examples

>>> assign_empty_default(['hi', None, 'hello'], empty_object=list)
['hi', [], 'hello']

Notes

This function helps deal with the pitfall of assigning an empty mutable object as a default function argument, which would persist through multiple function calls, leading to unexpected/undesired behaviours.
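
The pitfall mentioned in the note is easy to reproduce. The two functions below are illustrative only (they are not part of ecgprocess): the first shares one list across calls, the second uses None as the default and creates a fresh list each time, which is the pattern assign_empty_default supports.

```python
def append_shared(item, bucket=[]):
    """Pitfall: the default list is created once and reused."""
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    """Safer: None is the default and a new list is made per call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_shared = append_shared(1)
second_shared = append_shared(2)   # same list object as first_shared
first_fresh = append_fresh(1)
second_fresh = append_fresh(2)
```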

class ecgprocess.utils.general.ManagedProperty(name, types=None)[source]

A generic property factory defining setters and getters, with optional type validation.

Properties are read-only by default. Use set_with_setter to write a value; this temporarily unlocks the property on the specific instance using a per-instance lock key stored in the instance’s __dict__, avoiding the shared-state bug that arises when the lock flag is stored on the descriptor object itself (which is shared across all instances).

Parameters:
  • name (str) – The name of the setters and getters

  • types (Type, default NoneType) – Either a single type, or a tuple of types to test against.

set_with_setter(instance, value)[source]

Temporarily unlocks the property on instance, sets the value, then re-locks it.

Returns:

property – A property object with getter and setter.

set_with_setter(instance, value)[source]

Unlock the property on instance, set the value, then re-lock it.

Parameters:
  • instance (object) – The instance on which the property is being set.

  • value (any) – The value to assign to the property.
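
The per-instance locking idea can be illustrated with a small descriptor. This is a sketch of the general technique under stated assumptions, not the ManagedProperty source: the class name LockedProperty, the key names stored in the instance's __dict__, and the error messages are all invented for the example.

```python
class LockedProperty:
    """Hypothetical read-only descriptor with a per-instance lock."""

    def __init__(self, name, types=None):
        self._value_key = "_" + name
        self._lock_key = "_" + name + "_unlocked"
        self._types = types

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self._value_key)

    def __set__(self, instance, value):
        # Writes are refused unless this instance was unlocked.
        if not instance.__dict__.get(self._lock_key, False):
            raise AttributeError("property is read-only")
        if self._types is not None and not isinstance(value, self._types):
            raise TypeError("unexpected type: " + type(value).__name__)
        instance.__dict__[self._value_key] = value

    def set_with_setter(self, instance, value):
        # The flag lives on the instance, so other instances stay locked.
        instance.__dict__[self._lock_key] = True
        try:
            self.__set__(instance, value)
        finally:
            instance.__dict__[self._lock_key] = False


class Reader:
    path = LockedProperty("path", types=str)


a, b = Reader(), Reader()
Reader.path.set_with_setter(a, "ecg.xml")
```

Because the unlock flag is stored on a and not on the descriptor, b remains locked while a is being written.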

ecgprocess.utils.general.parse_number(string, sep=',', dec='.')[source]

Checks whether a string represents one or more numbers and, if so, maps it to a list of floats or ints.

Parameters:
  • string (any) – Strings, and lists containing a single string, are checked for whether they represent numbers; any other object is returned as is.

  • sep (str, default ',') – The character used to separate values in a string.

  • dec (str, default '.') – The character used as a decimal point.

Returns:

list [int | float] or any – A list of parsed integers or floats or the original input.

Return type:

list[float | int] | Any

Examples

>>> gen_utils.parse_number("1;2;3,5", sep=";", dec=",")
[1, 2, 3.5]
>>> gen_utils.parse_number(["1,2.5,3"])
[1, 2.5, 3]
>>> gen_utils.parse_number(['1,2.5,3', '2'])
['1,2.5,3', '2']
>>> parse_number(123)
123
ecgprocess.utils.general.string_concat(old, new, sep=', ')[source]

Concatenates two strings, checking if the old string might be NaN.

Parameters:
  • old (str or np.nan) – The original string.

  • new (str) – A new string.

  • sep (str, default ', ') – The string separator.

Returns:

str – A concatenated string

Return type:

list[str]

Notes

In general NaN is considered a float and missing string information is better reflected by NoneType. Nevertheless one does find strings which are set to NaN which is what this function deals with.
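
A NaN check of this kind can be written with the standard library. The sketch below is an assumption about the intended behaviour, namely that a NaN old value is treated as missing and only new is returned; string_concat_sketch is a hypothetical stand-in, not the library function.

```python
import math


def string_concat_sketch(old, new, sep=", "):
    """Join old and new, treating a NaN old value as missing."""
    # Assumption: when old is NaN there is nothing to join, so return new.
    if isinstance(old, float) and math.isnan(old):
        return new
    return old + sep + new


joined = string_concat_sketch("sinus rhythm", "LBBB")
from_nan = string_concat_sketch(float("nan"), "LBBB")
```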

ecgprocess.utils.general.chunk_list(lst, size)[source]

Splits a given list into chunks of a specified size.

Parameters:
  • lst (list [any]) – A list of arbitrary length.

  • size (int) – The size of the chunks, should be larger than 0.

Yields:

list – A chunk of the input list of length size. The final chunk may be shorter if there aren’t enough elements left.

Return type:

Generator[list[Any], None, None]

Examples

>>> data = list(range(10))
>>> gen = chunk_list(data, 3)
>>> next(gen)
[0, 1, 2]
ecgprocess.utils.general.update_dict_with_warning(d1, d2, verbose=True)[source]

Dictionary update while raising a warning for key duplication.

Parameters:
  • d1 (dict) – The dictionary with old data.

  • d2 (dict) – The dictionary with new data.

  • verbose (bool, default True) – Whether to print a warning when there are duplicated keys with distinct values.

Return type:

dict

Notes

The value from d2 overwrites the value from d1 in the final result. The warning will only be raised when the values are distinct.
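
The rule in the note (d2 wins, and a warning is issued only for duplicated keys with distinct values) can be sketched as below. The function update_with_warning_sketch and its warning text are hypothetical; the choice to return a copy instead of modifying d1 in place is an assumption of the example.

```python
import warnings


def update_with_warning_sketch(d1, d2, verbose=True):
    """Return d1 updated with d2, warning on conflicting duplicates."""
    clashes = [k for k in d2 if k in d1 and d1[k] != d2[k]]
    if verbose and clashes:
        warnings.warn("Overwriting keys: " + ", ".join(map(str, clashes)))
    merged = dict(d1)   # copy, so the inputs are not modified here
    merged.update(d2)
    return merged


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    merged = update_with_warning_sketch({"a": 1, "b": 2}, {"b": 3, "c": 4})
    same = update_with_warning_sketch({"a": 1}, {"a": 1})
```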

utils.config_tools

class ecgprocess.utils.config_tools.PrivilegedData(_data=<factory>)[source]

The core metadata dictionary, where the keys cannot be changed because they are expected by downstream programs. The values can, however, be user defined.

update_values(**kwargs)

Update the values of existing dictionary keys.

to_dict()

Returns the current dictionary.

keys()

Returns the dictionary keys.

The values are initialised to NoneType and should be set to the relevant XML/DICOM tags/attributes.

>>> required = PrivilegedData()
>>> print(*required.keys(), sep='\n')
unique identifier
sampling frequency (original)
sampling number (waveforms)
sampling number (medianbeats)
acquisition date
study date
channel number
units (waveforms)
units (medianbeats)

Parameters:

_data (dict[str, str])

update_values(**kwargs)[source]

Update the values of existing immutable keys.

Parameters:

**kwargs (dict [str, str]) – Key-value pairs where the key is a required data field and the value is the updated DICOM/XML tag.

Raises:

KeyError – If a key in kwargs is not a valid required data field.

Parameters:

kwargs (Any | None)

Return type:

None

keys()[source]

Returns the required keys.

Return type:

list[str]

to_dict()[source]

Returns the required data as a dictionary.

Return type:

dict[str, str]

class ecgprocess.utils.config_tools.OtherData(_data=<factory>)[source]

A class to map internal attribute names to DICOM/XML tags and collect data based on optional attributes.

Parameters:

_data (dict[str, str] | None)

update_values(**kwargs)[source]

Update the values of existing dictionary keys.

Parameters:

kwargs (Any | None)

Return type:

None

to_dict()

Returns the current dictionary.

keys()

Returns the dictionary keys.

update_values(**kwargs)[source]

Update the dictionary keys and values.

Parameters:

**kwargs (dict [str, str]) – Key-value pairs mapping DICOM/XML tags (values) to attribute names (keys).

Parameters:

kwargs (Any | None)

Return type:

None

class ecgprocess.utils.config_tools.DataMap(WaveForms=<factory>, MedianBeats=<factory>, MetaData=<factory>, OtherData=None)[source]

A class to manage and map metadata, waveforms, and median beats, and optionally, other data attributes.

Parameters:
WaveForms

A config dict mapping the twelve ECG leads.

Type:

dict [str, str]

MedianBeats

A config dict mapping the twelve ECG leads.

Type:

dict [str, str]

MetaData

A dictionary containing metadata attributes and their values.

Type:

dict [str, str]

OtherData

An optional dictionary for additional data mappings.

Type:

dict [str, str], default NoneType

get_attributes(all_)[source]

Returns the names of attributes with non-None values.

Parameters:

all_ (bool)

Return type:

list[str]

keys(attr_name)[source]

Returns the keys of the specified dictionary (‘MetaData’ or ‘OtherData’), or both if no specific dictionary is specified.

Parameters:

attr_name (str | None)

Return type:

list[str]

items(attr_name)[source]

Returns the key-value pairs of the specified dictionary (‘MetaData’ or ‘OtherData’), or both if no specific dictionary is specified.

Parameters:

attr_name (str | None)

Return type:

dict[str, dict[str, str]]

Examples

>>> data_map = DataMap()
>>> data_map.get_attributes()
['WaveForms', 'MedianBeats', 'MetaData']

Notes

The class is intended to work in conjunction with a config file processed using ConfigParser.map(DataMap). DataMap ensures any privileged attributes omitted from the config are added and set to None, and prevents the config from being altered after processing.

WaveForms: dict
MedianBeats: dict
MetaData: dict
OtherData: dict | None = None
static get_leads()[source]

Initialises the lead names to None.

property VALIDATE_ATTR: tuple

Exposes _VALIDATE_ATTR as a read-only property.

get_attributes(all_=False)[source]

Returns the names of attributes with non-None values.

all_

Whether to return all keys irrespective of whether their values are equal to None.

Type:

bool, default False

Returns:

list [str] – A list of attribute names where the value is not None.

Parameters:

all_ (bool)

Return type:

list[str]

keys(attr_name=None)[source]

Return the keys for WaveForms, MedianBeats, MetaData, OtherData, or all dictionaries.

Parameters:

attr_name (str, default NoneType) – Specify an attribute name to limit the result to that dictionary. If None, the keys of all dictionaries are returned without removing duplicates.

Returns:

list [str] – A list with dictionary keys.

Parameters:

attr_name (str | None)

Return type:

list[str]

items(attr_name=None)[source]

Return the key-value pairs for WaveForms, MedianBeats, MetaData, OtherData, or all dictionaries.

Parameters:

attr_name (str, default NoneType) – Specify the type of attribute to limit the result to that dictionary. If None, items for all dictionaries are returned.

Returns:

dict [str, dict [str, str]] – A dictionary where keys are attribute names and values are the key-value pairs of the dictionary.

Parameters:

attr_name (str | None)

Return type:

dict[str, dict[str, str]]

class ecgprocess.utils.config_tools.ConfigParser(path)[source]

Parses configuration files into structured data and optionally assigns this to a user supplied mapper instance.

Parameters:

path (str) – Path to the configuration file to be parsed.

Parameters:

path (str)

Notes

Required call sequence:

parser = ConfigParser(path)()   # parse the file
parser.map(mapper)              # map sections to a DataMap
parser.get_section(name)        # retrieve a section

Calling get_section() before map() raises NotCalledError.

path

map(mapper)[source]

Maps the parsed data to a supplied DataMap instance.

Parameters:

mapper (DataMap)

Return type:

None

mapper

An instance of the DataMap class, which is expected to have attributes corresponding to the parsed data sections (e.g., headers or types). These attributes will be updated with the data parsed by this class.

Type:

DataMap

Notes

The method does not return anything and simply creates the mapper attribute.

get_section(section_name)[source]

Retrieve the parsed data for a specific section of the configuration file.

Parameters:

section_name ({'MetaData', 'WaveForms', 'MedianBeats', 'OtherData'}) – Name of the section to retrieve.

Returns:

section (dict [str, str]) – A dictionary of key-value pairs for the requested section.

Raises:
  • NotCalledError – If map() has not been called before this method.

  • ValueError – If section_name is not one of the allowed section names.

  • KeyError – If the requested section has no data.

Parameters:

section_name (Literal['MetaData', 'WaveForms', 'MedianBeats', 'OtherData'])

Return type:

dict[str, str]

utils.reader_tools

Tools to help read and process ECG files.

Functions and classes are predominantly aimed at mapping ECG files to native Python objects, such as dictionaries, and processing these systematically. Functions specifically focussed on ECG signals, such as calculating the limb leads, are collected in ecgprocess.utils.ecg_tools.

class ecgprocess.utils.reader_tools.BaseReader(waveforms=None, medianbeats=None, metadata=None, otherdata=None, raw=None)[source]

Bases: object

A base class for ECG readers such as ECGDICOMReader, using the more efficient __slots__ for the waveform arrays while still retaining __dict__ for dynamic attribute creation.

WaveForms
MedianBeats
MetaData
OtherData
raw_data
ecgprocess.utils.reader_tools.validate_xml(xml_path, xsd_path, strict=True, verbose=True)[source]

Validates an XML file against an XSD schema.

Parameters:
  • xml_path (str or Path) – Path to the XML file.

  • xsd_path (str or Path) – Path to the XSD file.

  • strict (bool, default True) – If False, ignores elements in the XML that are not in the XSD.

Returns:

etree._ElementTree – The parsed XML document.

Raises:

XMLValidationError – Raised if the XSD and XML are incompatible.

Return type:

_ElementTree

ecgprocess.utils.reader_tools.xml_to_dict(xml_doc, encoding='utf-8')[source]

Converts an lxml ElementTree document to a dictionary.

Parameters:
  • xml_doc (etree._ElementTree) – A validated XML document.

  • encoding (str, default utf-8)

Returns:

dict [str, any] – A dictionary representation of the XML data.

Parameters:
  • xml_doc (_ElementTree)

  • encoding (str)

Return type:

dict[str, Any]

ecgprocess.utils.reader_tools.dicom_to_dict(ds)[source]

Turn a pydicom Dataset into a dict with keys derived from the Element names.

Parameters:

ds (pydicom.dataset.Dataset) – The DICOM dataset to convert.

Returns:

dict [str, any] – A dictionary representation of the dataset.

Parameters:

ds (pydicom.dataset.Dataset)

Return type:

dict[str, Any]

ecgprocess.utils.reader_tools.flatten_dict(d, parent_prefix='', sep='.', skip_root=True)[source]

Recursively flatten a nested dictionary, optionally skipping the root element.

Parameters:
  • d (dict) – The dictionary to flatten.

  • parent_prefix (str, default '') – The base string added as a prefix to all keys during recursion. Useful for maintaining context or indicating a higher-level structure.

  • sep (str, default '.') – The key separator

  • skip_root (bool, default True) – If True, skips the first level (root) key.

Returns:

dict – A flattened dictionary where nested keys are concatenated into a single key.

Return type:

dict[str, Any]

Examples

>>> nested_dict = {
...     'a': {
...         'b': 1,
...         'c': {
...             'd': 2
...         }
...     }
... }
>>> flatten_dict(nested_dict)
{'b': 1, 'c.d': 2}
>>> flatten_dict(nested_dict, skip_root=False)
{'a.b': 1, 'a.c.d': 2}
>>> flatten_dict(nested_dict, sep='_', skip_root=False)
{'a_b': 1, 'a_c_d': 2}
ecgprocess.utils.reader_tools.get_ecg_data(data_dict, config, parse_numeric=True, as_array=False, bits=None, skip_empty=True, **kwargs)[source]

Extracts metadata or signal data from a data_dict based on a supplied config dictionary.

Parameters:
  • data_dict (dict [str, any]) – A dictionary with keys and values matching the config object.

  • config (dict [str, str]) – A dictionary where the values match keys in data_dict and the keys are the names under which the extracted data will be stored.

  • parse_numeric (bool, default True) – Whether to check if numbers are presented as strings and, if so, parse these to numbers.

  • as_array (bool, default False) – Whether data should be mapped to np.array using a direct map: np.array(., dtype=bits).

  • bits (np.dtype, default None) – The dtype passed to np.array.

  • skip_empty (bool, default True) – Whether to skip config values not matching data_dict keys.

  • **kwargs – Keyword arguments passed to parse_number.

Returns:

  • dict – A dictionary with the extracted signal data or metadata.

  • list – A list with config values which did not match data_dict keys.

Return type:

tuple[dict[str, Any], list[str]]

Notes

Whenever a config value starts with “[STARTSWITH]”, the algorithm gathers all relevant values in data_dict that share the specific prefix remaining after stripping the quoted text. These values are concatenated into a single string, where the original breaks between the values are indicated by [DELIM].
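
The prefix-gathering rule in the note can be sketched as follows. This is an illustration under assumptions, not the get_ecg_data source: it assumes the prefix is matched against data_dict keys with str.startswith, that values are joined in dictionary order, and the helper name collect_prefixed and the example keys are invented.

```python
STARTSWITH = "[STARTSWITH]"
DELIM = "[DELIM]"


def collect_prefixed(data_dict, config_value):
    """Join every value whose key starts with the configured prefix."""
    prefix = config_value[len(STARTSWITH):]
    matches = [str(v) for k, v in data_dict.items() if k.startswith(prefix)]
    return DELIM.join(matches)


# Hypothetical flattened data with two statements under one prefix.
data = {
    "Diagnosis.Statement_0": "Sinus rhythm",
    "Diagnosis.Statement_1": "Normal ECG",
    "PatientID": "123",
}
joined = collect_prefixed(data, "[STARTSWITH]Diagnosis.Statement")
```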

ecgprocess.utils.reader_tools.subset_dict(data, pattern, substitute=('_[0-9]{1,2}\\.*', ' '), character_trim=0, verbose=True, skip_empty=True)[source]

Identifies a subset of data items whose keys start with a pattern key. Based on the pattern values, the function then identifies the single entry in the subset whose value contains a unique name, which is added as a prefix to the subset keys.

Parameters:
  • data (dict [str, any]) – The dictionary to be subsetted and transformed.

  • pattern (dict [str, str]) – A dictionary where each key is a prefix to match against the keys of data, and each value defines the suffix to search within the matching keys in data. The value found in data corresponding to this suffix is used as the prefix for the resulting dictionary keys. The pattern keys will be matched to the data keys based on a startswith, while the pattern values will be matched to the data keys using: value in key.

  • substitute (tuple [str, str] or None, default (r"_[0-9]{1,2}\.*", " ")) – A tuple containing a regular expression pattern and replacement string. This substitution is applied to the remaining portion of the data key after removing the matching prefix.

  • character_trim (int, default 0) – The number of characters which should be removed from the right-hand side of the data key which did not match the pattern key.

  • verbose (bool, default True) – Whether warnings should be issued.

Returns:

dict [str, any] – A dictionary with keys grouped and transformed based on the pattern and substitute. The keys are prefixed with values derived from data.

Return type:

dict[str, Any]

Examples

>>> data = {
...     "Sequence_0.Referenced Waveform Channels_0": "Channel 1",
...     "Sequence_0.Referenced Waveform Channels_1": "Channel 2",
...     "Sequence_0.Annotation Group Number": 1,
...     "Sequence_0.Unformatted Text Value": "Event A",
...     "Sequence_8.Measurement Units Code Sequence_0.Code Value": "bpm",
...     "Sequence_8.Measurement Units Code Sequence_0.Code Meaning": "Heart Rate",
...     "Sequence_15.Measurement Units Code Sequence_0.Code Meaning": "Temperature",
...     "Sequence_15.Referenced Waveform Channels_1": "Channel 10",
...     "Sequence_15.Numeric Value": 36.7,
...     "Other Annotation Sequence_1.Some Value": "Other Data",
...     "Other Annotation Sequence_1.Code Meaning": "Other Code Meaning",
... }
>>> pattern = {
...     "Sequence_15": "Code Meaning",
...     "Sequence_8": "Code Meaning",
...     "Sequence_16": "Code Meaning",
...     "Sequence_11": "Code Meaning",
...     "Other Annotation Sequence_1": "Code Meaning",
... }
>>> subset_dict(data, pattern)
{'Temperature (Referenced Waveform Channels)': 'Channel 10',
 'Temperature (Numeric Value)': 36.7,
 'Heart Rate (Measurement Units Code SequenceCode Value)': 'bpm',
 'Other Code Meaning (Some Value)': 'Other Data'}

utils.ecg_tools

Collecting established tools for ECG derivation or cleaning.

ecgprocess.utils.ecg_tools.resampling_500hz(signals, duration=None, median=False)[source]

Re-sample an ECG signal to 500 Hz.

Parameters:
  • signals (dict [str, np.array]) – A dictionary with the lead names as string keys and the signals as a 1D np.array.

  • duration (int or float) – The duration of the ECG in seconds, calculated as the number of samples divided by the sampling frequency (in samples per second). For raw waveforms, duration determines the number of samples needed to get a 500 Hz signal: duration times 500.

  • median (bool, default False) – Set to True to resample a median beat ECG to 500 Hz. The duration of a median beat signal is 1.2 seconds, hence the number of samples is fixed at: 1.2 times 500 = 600.

Return type:

dict[str, array]
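
The number of output samples implied by the duration and median parameters is simple arithmetic. The sketch below only computes that target length and does not perform the resampling itself; target_samples is a hypothetical helper and the rounding to the nearest integer is an assumption.

```python
def target_samples(n_samples=None, sampling_rate=None, median=False):
    """Number of samples after resampling to 500 Hz."""
    if median:
        # Median beats are treated as 1.2 s long: 1.2 * 500 = 600.
        return 600
    duration = n_samples / sampling_rate   # seconds, with the rate in Hz
    return int(round(duration * 500))


raw_250 = target_samples(n_samples=2500, sampling_rate=250)      # 10 s
raw_1000 = target_samples(n_samples=10000, sampling_rate=1000)   # 10 s
median_beat = target_samples(median=True)
```

A sampling rate supplied per millisecond instead of per second would make duration, and hence the target length, wrong by a factor of 1000.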

ecgprocess.utils.ecg_tools.get_limb_leads(signals, lead_I='I', lead_II='II')[source]

Calculate the derived limb leads (III, aVR, aVL, aVF) from leads I and II.

Parameters:
  • signals (dict [str, np.array]) – A dictionary with the lead names as string keys and the signals as a 1D np.array.

  • lead_I (str, default 'I') – The key name for lead I in signals

  • lead_II (str, default 'II') – The key name for lead II in signals

Returns:

dict – A dictionary including limb lead signals.

Return type:

dict[str, array]

Notes

Please see this url for the relevant explanation about the relationships between leads I and II and the limb leads.
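
The standard relationships are III = II − I, aVR = −(I + II)/2, aVL = I − II/2 and aVF = II − I/2. A minimal sketch on plain lists is given below; limb_leads_sketch is a hypothetical stand-in for illustration, and whether the library overwrites leads that are already present is not covered here.

```python
def limb_leads_sketch(signals, lead_I="I", lead_II="II"):
    """Return a copy of signals with III, aVR, aVL and aVF added."""
    lead_1, lead_2 = signals[lead_I], signals[lead_II]
    out = dict(signals)
    out["III"] = [b - a for a, b in zip(lead_1, lead_2)]
    out["aVR"] = [-(a + b) / 2 for a, b in zip(lead_1, lead_2)]
    out["aVL"] = [a - b / 2 for a, b in zip(lead_1, lead_2)]
    out["aVF"] = [b - a / 2 for a, b in zip(lead_1, lead_2)]
    return out


leads = limb_leads_sketch({"I": [1.0, 2.0], "II": [3.0, 2.0]})
```

A quick sanity check is that aVR + aVL + aVF sums to zero at every sample.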

ecgprocess.utils.ecg_tools.signal_dicts_to_numpy_array(signals, leads=('I', 'II', 'III', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'aVF', 'aVL', 'aVR'), padding=True)[source]

Convert a list of ECG signal dictionaries to a 3D NumPy array suitable for deep learning.

Parameters:
  • signals (list [dict [str, np.ndarray]]) – List where each dictionary represents an ECG sample with lead names as keys and numpy arrays as values.

  • leads (list [str] or tuple [str, ] or None, default _STANDARD_LEADS) – Lead names to include and their order. Defaults to the standard 12-lead set defined in _STANDARD_LEADS (an immutable tuple). Pass None to collect all unique leads found across samples in sorted order.

  • padding (bool, default True) – Whether to pad shorter signals to the length of the longest signal. If False, all signals must have the same length. Default is True.

Returns:

np.ndarray – 3D NumPy array with shape (num_samples, num_leads, signal_length) containing the ECG data.

Raises:

ValueError – If signals is empty, if any sample is missing leads specified in leads, or if padding is False and signals have varying lengths.

Return type:

ndarray

Notes

The order of the leads axis (the second axis) of the numpy array matches the order of the supplied leads.

ecgprocess.utils.ecg_tools.signal_calibration(signal, correctionfactor, baseline)[source]

Adjusts the ECG signal by subtracting the channel baseline from the signal, followed by multiplying the adjusted signal by the channel correction factor.

Parameters:
  • signal (np.ndarray) – The lead-specific ECG signal.

  • correctionfactor (float) – The channel correction factor.

  • baseline (float) – The channel baseline.

Returns:

np.ndarray – The recalibrated signal.

Return type:

ndarray
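
The calibration described above amounts to (signal − baseline) × correctionfactor. A minimal sketch on a plain list, with a hypothetical helper name and made-up values:

```python
def calibrate_sketch(signal, correctionfactor, baseline):
    """Subtract the baseline, then scale by the correction factor."""
    return [(sample - baseline) * correctionfactor for sample in signal]


calibrated = calibrate_sketch([10.0, 12.0, 8.0],
                              correctionfactor=0.5, baseline=10.0)
```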

ecgprocess.utils.ecg_tools.signal_resolution(signal, resolution_current, resolution_target)[source]

Adjust the amplitude scale of an ECG signal to match a desired resolution.

Parameters:
  • signal (np.ndarray) – The lead-specific ECG signal.

  • resolution_current (float) – The current resolution.

  • resolution_target (float) – The target resolution.

Returns:

np.ndarray – The rescaled signal.

Return type:

ndarray

Example

>>> import numpy as np
>>> ecg_signal = np.array([10, 20, 30, 40, 50])
>>> current_res = 2.0  # each digital unit equals 2 μV
>>> new_signal = signal_resolution(
...     ecg_signal,
...     resolution_current=current_res,
...     resolution_target=5
... )
>>> print(new_signal)
[ 25.  50.  75. 100. 125.]

utils.engineering_tools

A module containing a collection of functions and classes which can be used as engineering functions in the Tabular module.

The listed programs are meant to provide an idea of potentially relevant solutions. Users can use these functions out of the box, adapt them, or simply write their own custom code.

When writing your own engineering solution, remember that the first argument will take the metadata, waveforms, or median beats. The waveforms and metadata functions should have a kwargs argument, which Tabular uses internally to pass meta_dict (the metadata of the file being processed) to the function environment, making it available to alter the signal data.
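
A custom signal function following this convention might look like the sketch below. It is illustrative only: subtract_offset, the offset_name parameter and the "offset" metadata key are invented, and the keyword name meta_dict is assumed to be the one under which the file metadata is passed.

```python
def subtract_offset(signals, offset_name="offset", **kwargs):
    """Hypothetical signal function that reads a value from meta_dict."""
    meta_dict = kwargs["meta_dict"]          # supplied by the caller
    offset = meta_dict[offset_name]
    return {lead: [s - offset for s in samples]
            for lead, samples in signals.items()}


# Stand-in for the call that would be made for a single file.
result = subtract_offset({"I": [5, 6]}, meta_dict={"offset": 5})
```

Accepting **kwargs keeps the function callable even when extra keyword arguments are passed alongside the metadata.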

ecgprocess.utils.engineering_tools.metadata_checkversion(metadata, expected_version=['1.02 SP03', 'MUSE_9.0.9.18167'], expected_manufacturer='GE Healthcare', expected_model='MV360', version_name='Softwave version', manufacturer_name='Manufacturer', model_name='Model name', **kwargs)[source]

Validates the DICOM file against the specified software version, manufacturer, and model. If any of these do not match, a FileValidationError is raised.

Parameters:
  • metadata (dict [str, any]) –

    A dictionary containing the metadata for a DICOM file. It must include:
    • “Softwave version”: The software version associated with the file.

    • “Manufacturer”: The manufacturer of the device.

    • “Model name”: The model name of the device.

  • expected_version (str or list [str], default ['1.02 SP03', 'MUSE_9.0.9.18167']) – The expected software version(s).

  • expected_manufacturer (str or list [str], default 'GE Healthcare') – The expected manufacturer(s).

  • expected_model (str or list [str], default 'MV360') – The expected model(s).

  • version_name (str) – The key name for the version in metadata.

  • manufacturer_name (str) – The key name for the manufacturer in metadata.

  • model_name (str) – The key name for the model in metadata.

Returns:

dict [str, any] – The input metadata dictionary if validation is successful.

Raises:

FileValidationError – If the DICOM metadata’s software version, manufacturer, or model does not match the respective expected values.

Return type:

dict[str, Any]

Notes

Depending on the ignore_invalid parameter of Tabular the failed filenames will be added to the invalid_list attribute.

ecgprocess.utils.engineering_tools.signal_correction(signals, baseline_name='wave_channel_baseline_', correctionfactor_name='wave_channel_correctionfactor_', **kwargs)[source]

Adjusts the signals by subtracting the channel baseline multiplied by the channel correction factor. These parameters must be provided in kwargs[TabNames.META_DICT].

Parameters:
  • signals (dict [str, np.ndarray]) – A dictionary mapping channel names (strings) to waveform arrays.

  • baseline_name (str) – The dictionary key name for the channel baseline. Will internally add a numeric suffix ranging from 0 to 11 (inclusive).

  • correctionfactor_name (str) – The dictionary key name for the channel correctionfactor. Will internally add a numeric suffix ranging from 0 to 11 (inclusive).

  • **kwargs –

    Additional keyword arguments, which must include a dictionary under the key TabNames.META_DICT. This dictionary should contain:
    • wave_channel_correctionfactor_i (float): Correction factor for channel i.

    • wave_channel_baseline_i (float): Baseline offset for channel i.

Returns:

dict [str, np.ndarray] – The input signals dictionary with corrected signals.

Raises:

KeyError – If TabNames.META_DICT is not found in **kwargs.

Return type:

dict[str, ndarray]

ecgprocess.utils.engineering_tools.signal_standardise_res(signals, resolution_name='wave_channel_sens_', target_resolution=5.0, **kwargs)[source]

Standardises the signal resolution by rescaling the amplitude by the ratio of the source and target resolutions.

Parameters:
  • signals (dict [str, np.ndarray]) – A dictionary mapping channel names (strings) to waveform arrays.

  • resolution_name (str) – The dictionary key name for the channel sensitivity/resolution. Will internally add a numeric suffix ranging from 0 to 11 (inclusive).

  • target_resolution (float, default 5) – The target resolution.

  • **kwargs –

    Additional keyword arguments, which must include a dictionary under the key TabNames.META_DICT. This dictionary should contain:
    • wave_channel_sens_i (float): The wave channel sensitivity/resolution for channel i.

Returns:

dict [str, np.ndarray] – The input signals dictionary with corrected signals.

Raises:

KeyError – If TabNames.META_DICT is not found in **kwargs. If resolution_name+str(i) is not found in TabNames.META_DICT.

Return type:

dict[str, ndarray]

Notes

The function will apply a scaling factor of source_resolution/target_resolution to ensure the returned signal has the desired target uV.

class ecgprocess.utils.engineering_tools.LeadMapper(accepted_mappings)[source]

Normalise ECG lead ordering when a device writes channels out of sequence.

Some DICOM devices store channels in a non-standard order (e.g. Lead II data in channel-0 slot). LeadMapper reads signal name X entries from meta_dict, builds an actual-label-to-key mapping, then reassigns each canonical key in signals to the correct array.

Parameters:

accepted_mappings (dict [str, list [str]]) – Maps each canonical lead key (e.g. 'I') to a list of device label strings that are acceptable for that lead (e.g. ['I', 'Lead I', 'Lead I (Einthoven)']).

Parameters:

accepted_mappings (dict[str, list[str]])

accepted_mappings

The accepted lead label mappings supplied at initialisation.

Type:

dict [str, list [str]]

__call__(signals, **kwargs)[source]

Reassign signal arrays to their canonical lead keys.

Raises:

InputValidationError – If accepted_mappings is not a dict, or if any value is not a list.

Parameters:

accepted_mappings (dict[str, list[str]])

Notes

The callable works for both engineer_wave and engineer_median call sites in tabular.py since both share the same signature (signals, meta_dict=meta_temp).
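
The reassignment idea can be sketched without the library. Everything below is an assumption made for illustration: the helper remap_leads, the channel_keys argument, and the exact metadata key format "signal name X" with X counted from 0; the actual LeadMapper may differ in all of these.

```python
def remap_leads(signals, meta_dict, accepted_mappings, channel_keys):
    """Reassign arrays so each canonical key holds its own lead.

    channel_keys lists the keys of signals in channel order; channel X is
    assumed to be labelled by meta_dict["signal name X"].
    """
    label_to_key = {}
    for x, key in enumerate(channel_keys):
        label_to_key[meta_dict["signal name " + str(x)]] = key
    remapped = {}
    for canonical, labels in accepted_mappings.items():
        for label in labels:
            if label in label_to_key:
                remapped[canonical] = signals[label_to_key[label]]
                break
    return remapped


# The device wrote Lead II into the first slot and Lead I into the second.
signals = {"I": [2, 2], "II": [1, 1]}
meta = {"signal name 0": "Lead II", "signal name 1": "Lead I"}
mappings = {"I": ["I", "Lead I"], "II": ["II", "Lead II"]}
fixed = remap_leads(signals, meta, mappings, channel_keys=["I", "II"])
```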