Engineering callables in ECGProcess

ecgprocess.utils.engineering_tools provides ready-made callable objects and functions that can be passed directly to ECGTable via the engineer_meta, engineer_wave, and engineer_median parameters. Each callable receives the relevant data structure (a metadata dict or a signals dict) as its first positional argument, and the waveform/median callables also receive the current file’s metadata through a meta_dict keyword argument.

The callables shown here are not exhaustive; users are encouraged to write (and contribute) their own callables for their specific problems.
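
As a rough illustration of the calling convention described above, the sketch below defines a hypothetical custom callable, clip_signals, which is not part of ecgprocess. It assumes the signals dict maps lead names to NumPy arrays, and it accepts meta_dict by keyword while absorbing any other keywords through **kwargs.

```python
import numpy as np


def clip_signals(signals, limit=5000.0, meta_dict=None, **kwargs):
    """Hypothetical engineer_wave-style callable: clip each lead to +/- limit."""
    # meta_dict is accepted (and ignored here) because waveform/median
    # callables are described as receiving it as a keyword argument.
    return {lead: np.clip(arr, -limit, limit) for lead, arr in signals.items()}


demo = clip_signals(
    {'I': np.array([-6000.0, 0.0, 7000.0])},
    limit=5000.0,
    meta_dict={},
)
print(demo['I'])
```

Returning a new dict rather than mutating the input keeps the original signals available for comparison.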

[1]:

import numpy as np
from ecgprocess.utils.engineering_tools import (
    metadata_checkversion,
    signal_correction,
    signal_standardise_res,
    LeadMapper,
)
from ecgprocess.example_data.examples import (
    parsed_config,
    list_dicom_paths,
)
from ecgprocess.process_dicom import ECGDICOMReader
from ecgprocess.tabular import ECGTable
from ecgprocess.errors import FileValidationError
# Relevant paths
dicom_paths = [
    list_dicom_paths()['example_dicom_1'],
    list_dicom_paths()['example_dicom_2'],
]
# Parsed config
parser_dicom = parsed_config()['parsed_dicom1']

metadata_checkversion

metadata_checkversion validates three fields in the metadata dict — software version, manufacturer, and model name — against expected values supplied at call time. When validation passes the original dict is returned unchanged; a FileValidationError is raised on any mismatch.

The function is designed to be used as the engineer_meta callable in ECGTable so that files from unexpected devices are rejected before signal processing begins.
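
The return-or-raise pattern described above can be sketched in isolation. This is not the library's implementation: MetaCheckError and require_field are hypothetical names introduced only for illustration, with MetaCheckError standing in for FileValidationError.

```python
class MetaCheckError(ValueError):
    """Hypothetical stand-in for the library's FileValidationError."""


def require_field(meta, key, accepted):
    """Return meta unchanged if meta[key] is in accepted, otherwise raise."""
    value = meta.get(key)
    if value not in accepted:
        raise MetaCheckError(
            f"{key} `{value}` failed to validate; expected one of {list(accepted)}."
        )
    return meta


meta = {'Manufacturer': 'GE Healthcare'}
same = require_field(meta, 'Manufacturer', ['GE Healthcare'])

try:
    require_field({'Manufacturer': 'GE'}, 'Manufacturer', ['GE Healthcare'])
    rejected = False
except MetaCheckError:
    rejected = True

print(same is meta, rejected)
```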

[2]:

# Direct call — passing validation
meta_valid = {
    'Softwave version': '1.02 SP03',
    'Manufacturer': 'GE Healthcare',
    'Model name': 'MV360',
}
result = metadata_checkversion(
    meta_valid,
    expected_version=['1.02 SP03', 'MUSE_9.0.9.18167'],
    expected_manufacturer='GE Healthcare',
    expected_model='MV360',
)
print('Validation passed; returned dict is same object:', result is meta_valid)

Validation passed; returned dict is same object: True

[3]:

# Direct call: failing validation (manufacturer and model name both mismatch)
meta_invalid = {
    'Softwave version': '1.02 SP03',
    'Manufacturer': 'GE',
    'Model name': 'MAC55',
}
try:
    metadata_checkversion(
        meta_invalid,
        expected_version=['1.02 SP03', 'MUSE_9.0.9.18167'],
        expected_manufacturer='GE Healthcare',
        expected_model='MV360',
    )
except FileValidationError as exc:
    print('FileValidationError:', exc)

FileValidationError: Manufacturer `GE` failed to validate; expected one of ['GE Healthcare'].

signal_correction

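signal_correction adjusts each signal array on a per-channel basis by subtracting the channel baseline and then multiplying by the channel correction factor, that is (signal − baseline) × correctionfactor; this is the form that reproduces the output printed in the example below. The per-channel baseline and correction factor values are read from meta_dict, which ECGTable passes automatically as a keyword argument.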

The expected meta_dict keys follow the naming convention wave_channel_baseline_<i> and wave_channel_correctionfactor_<i> where i is the zero-based channel index.

[4]:

# Direct call — non-zero baseline and correction factor
synthetic_signals = {
    'I':  np.array([0.0, 10.0, 20.0]),
    'II': np.array([5.0, 15.0, 25.0]),
}
meta_with_correction = {
    'wave_channel_baseline_0': 10.0,
    'wave_channel_correctionfactor_0': 2.0,
    'wave_channel_baseline_1': 5.0,
    'wave_channel_correctionfactor_1': 1.0,
}
corrected = signal_correction(
    synthetic_signals,
    meta_dict=meta_with_correction,
)
# Lead I: [(0-10)*2, (10-10)*2, (20-10)*2] = [-20, 0, 20]
# Lead II: [(5-5)*1, (15-5)*1, (25-5)*1]   = [0, 10, 20]
print('Corrected I:', corrected['I'])
print('Corrected II:', corrected['II'])

Corrected I: [-20.   0.  20.]
Corrected II: [ 0. 10. 20.]
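
The printed values are consistent with (signal - baseline) * correctionfactor applied per channel. The stand-alone sketch below reproduces them; correct_sketch is a hypothetical name, not the library function, and it assumes that channel index i follows the insertion order of the signals dict.

```python
import numpy as np


def correct_sketch(signals, meta_dict):
    """Illustrative only: (signal - baseline) * correction factor per channel."""
    out = {}
    # Assumption: channel index i follows the insertion order of the dict.
    for i, (lead, arr) in enumerate(signals.items()):
        baseline = meta_dict[f'wave_channel_baseline_{i}']
        factor = meta_dict[f'wave_channel_correctionfactor_{i}']
        out[lead] = (arr - baseline) * factor
    return out


sketch = correct_sketch(
    {'I': np.array([0.0, 10.0, 20.0]), 'II': np.array([5.0, 15.0, 25.0])},
    {
        'wave_channel_baseline_0': 10.0,
        'wave_channel_correctionfactor_0': 2.0,
        'wave_channel_baseline_1': 5.0,
        'wave_channel_correctionfactor_1': 1.0,
    },
)
print(sketch['I'], sketch['II'])
```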

[5]:

# Via ECGTable — both example files have zero baselines so correction
# leaves the signals unchanged; verifies the pipeline wiring
reader = ECGDICOMReader()
ecgtable = ECGTable(
    reader,
    path_list=dicom_paths,
    engineer_wave=signal_correction,
    engineer_median=signal_correction,
)
table = ecgtable().get_table(parsed_config=parser_dicom)
table.WaveForms.head()

[5]:

	key	Sampling sequence	Lead	Voltage	Signal type
0	2.25.269796857626990821315969488216511468638	0	I	0.00	WaveForms
1	2.25.269796857626990821315969488216511468638	1	I	0.00	WaveForms
2	2.25.269796857626990821315969488216511468638	2	I	29.28	WaveForms
3	2.25.269796857626990821315969488216511468638	3	I	43.92	WaveForms
4	2.25.269796857626990821315969488216511468638	4	I	48.80	WaveForms

signal_standardise_res

signal_standardise_res rescales each signal array so that its amplitude is expressed at target_resolution µV per unit. The per-channel source resolution is read from meta_dict under the key wave_channel_sens_<i>. Each sample is multiplied by target / source, which is the factor that maps 4.88 to 5.0 in the direct-call example below.

The two example DICOM files have different native resolutions (4.88 µV and 5.0 µV respectively), making them a convenient real-world demonstration.

[6]:

# Direct call — synthetic signal at 4.88 uV rescaled to 5.0 uV
signal_488 = {'I': np.array([0.0, 4.88, 9.76])}
meta_488 = {'wave_channel_sens_0': 4.88}
rescaled = signal_standardise_res(
    signal_488,
    target_resolution=5.0,
    meta_dict=meta_488,
)
print('Input  I (4.88 uV/unit):', signal_488['I'])
print('Output I (5.0 uV/unit): ', rescaled['I'])

Input  I (4.88 uV/unit): [0.   4.88 9.76]
Output I (5.0 uV/unit):  [ 0.  5. 10.]
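
The values printed above are reproduced by multiplying each sample by target_resolution divided by the channel's source resolution. A minimal sketch of that arithmetic follows; standardise_sketch is a hypothetical name, and the mapping from dict order to channel index is an assumption.

```python
import numpy as np


def standardise_sketch(signals, meta_dict, target_resolution=5.0):
    """Illustrative only: scale each channel by target / source resolution."""
    out = {}
    # Assumption: channel index i follows the insertion order of the dict.
    for i, (lead, arr) in enumerate(signals.items()):
        source = meta_dict[f'wave_channel_sens_{i}']
        out[lead] = arr * (target_resolution / source)
    return out


rescaled_sketch = standardise_sketch(
    {'I': np.array([0.0, 4.88, 9.76])},
    {'wave_channel_sens_0': 4.88},
    target_resolution=5.0,
)
print(rescaled_sketch['I'])
```

A channel whose source resolution already equals the target gets a factor of 1 and is left unchanged.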

[7]:

# Via ECGTable — example_dicom_1 (4.88 uV) is rescaled;
# example_dicom_2 (5.0 uV) is unchanged
reader = ECGDICOMReader()
ecgtable_raw = ECGTable(reader, path_list=dicom_paths)
ecgtable_res = ECGTable(
    reader,
    path_list=dicom_paths,
    engineer_wave=signal_standardise_res,
    engineer_median=signal_standardise_res,
)
table_raw = ecgtable_raw().get_table(parsed_config=parser_dicom)
table_res = ecgtable_res().get_table(parsed_config=parser_dicom)
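# Compare Lead I, first sample for each file.
# Note: the first sample of the 4.88 uV file is 0.0, so rescaling cannot
# change it; compare a non-zero sample to see the effect.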
print('Before standardisation:')
print(table_raw.WaveForms.groupby('key').first()[['Lead', 'Voltage']])
print('\nAfter standardisation:')
print(table_res.WaveForms.groupby('key').first()[['Lead', 'Voltage']])

Before standardisation:
                                                   Lead  Voltage
key
1.3.6.1.4.1.40744.65.22183544986928027811655533...    I   -105.0
2.25.269796857626990821315969488216511468638          I      0.0

After standardisation:
                                                   Lead  Voltage
key
1.3.6.1.4.1.40744.65.22183544986928027811655533...    I   -105.0
2.25.269796857626990821315969488216511468638          I      0.0

LeadMapper

Some DICOM devices write ECG channels in a non-standard order — for example, Lead II data may be stored in the channel-0 slot. LeadMapper corrects this by reading signal name <i> entries from meta_dict, building an actual-device-label → canonical-key mapping, and reassigning the signal arrays accordingly.

LeadMapper is instantiated once with an accepted_mappings dict that maps each canonical key (e.g. 'I') to the list of device labels that are acceptable for that lead. The instance is then used as engineer_wave and/or engineer_median.

[8]:

# Construction and string representations
mapper = LeadMapper({
    'I':  ['Lead I',  'I',  'Lead I (Einthoven)'],
    'II': ['Lead II', 'II', 'Lead II (Einthoven)'],
})
print(repr(mapper))
print(str(mapper))

LeadMapper(accepted_mappings={'I': ['Lead I', 'I', 'Lead I (Einthoven)'], 'II': ['Lead II', 'II', 'Lead II (Einthoven)']})
LeadMapper with 2 canonical leads: ['I', 'II']

[9]:

# Synthetic remapping demo
# The device wrote Lead II data into channel-0 (key 'I') and
# Lead I data into channel-1 (key 'II').
lead_II_array = np.array([1.0, 2.0, 3.0])
lead_I_array  = np.array([4.0, 5.0, 6.0])
signals_swapped = {'I': lead_II_array, 'II': lead_I_array}
# meta_dict tells us what the device actually stored in each channel
meta_swapped = {
    'signal name 0': 'Lead II',
    'signal name 1': 'Lead I',
}
corrected_signals = mapper(
    signals_swapped,
    meta_dict=meta_swapped,
)
print('After remapping:')
print('  signals["I"]  (should be Lead I  [4,5,6]):', corrected_signals['I'])
print('  signals["II"] (should be Lead II [1,2,3]):', corrected_signals['II'])

After remapping:
  signals["I"]  (should be Lead I  [4,5,6]): [4. 5. 6.]
  signals["II"] (should be Lead II [1,2,3]): [1. 2. 3.]
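
The remapping logic can be approximated in a few lines. The sketch below is not LeadMapper's implementation: remap_sketch is a hypothetical function that inverts the accepted-labels dict and reassigns arrays, assuming that channel i is the i-th entry of the signals dict.

```python
import numpy as np


def remap_sketch(signals, accepted_mappings, meta_dict):
    """Illustrative only: reassign arrays to canonical keys by device label."""
    # Invert canonical -> [labels] into label -> canonical.
    label_to_key = {
        label: key
        for key, labels in accepted_mappings.items()
        for label in labels
    }
    out = {}
    # Assumption: channel i is the i-th entry of the signals dict.
    for i, arr in enumerate(signals.values()):
        device_label = meta_dict[f'signal name {i}']
        out[label_to_key[device_label]] = arr
    return out


remapped = remap_sketch(
    {'I': np.array([1.0, 2.0, 3.0]), 'II': np.array([4.0, 5.0, 6.0])},
    {'I': ['Lead I', 'I'], 'II': ['Lead II', 'II']},
    {'signal name 0': 'Lead II', 'signal name 1': 'Lead I'},
)
print(remapped['I'], remapped['II'])
```

An unrecognised device label raises KeyError in this sketch; how LeadMapper itself handles that case is not shown here.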

Chaining callables

Multiple engineering steps can be composed into a single wrapper function and passed as one callable. The wrapper below applies baseline correction followed by resolution standardisation in a single pass.

[10]:

def correct_and_standardise(
    signals: dict,
    target_resolution: float = 5.0,
    verbose: bool = False,
    **kwargs,
) -> dict:
    """Apply baseline correction then resolution standardisation."""
    signals = signal_correction(signals, verbose=verbose, **kwargs)
    return signal_standardise_res(
        signals,
        target_resolution=target_resolution,
        verbose=verbose,
        **kwargs,
    )


reader = ECGDICOMReader()
ecgtable = ECGTable(
    reader,
    path_list=dicom_paths,
    engineer_wave=correct_and_standardise,
    engineer_median=correct_and_standardise,
)
table = ecgtable().get_table(parsed_config=parser_dicom)
table.WaveForms.head()

[10]:

	key	Sampling sequence	Lead	Voltage	Signal type
0	2.25.269796857626990821315969488216511468638	0	I	0.0	WaveForms
1	2.25.269796857626990821315969488216511468638	1	I	0.0	WaveForms
2	2.25.269796857626990821315969488216511468638	2	I	30.0	WaveForms
3	2.25.269796857626990821315969488216511468638	3	I	45.0	WaveForms
4	2.25.269796857626990821315969488216511468638	4	I	50.0	WaveForms
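
When more than two steps are involved, a generic composition helper may be more convenient than a hand-written wrapper. The sketch below uses a hypothetical chain_callables helper and two toy steps (add_one, double), none of which are part of ecgprocess. It forwards the same keyword arguments to every step, so each step must accept them, for example through **kwargs.

```python
import numpy as np


def chain_callables(*steps):
    """Hypothetical helper: apply each step in order, forwarding keywords."""
    def chained(signals, **kwargs):
        for step in steps:
            signals = step(signals, **kwargs)
        return signals
    return chained


def add_one(signals, **kwargs):
    # Toy step: add 1 to every sample of every lead.
    return {lead: arr + 1 for lead, arr in signals.items()}


def double(signals, **kwargs):
    # Toy step: multiply every sample of every lead by 2.
    return {lead: arr * 2 for lead, arr in signals.items()}


pipeline = chain_callables(add_one, double)
chained_out = pipeline({'I': np.array([1.0, 2.0])}, meta_dict={})
print(chained_out['I'])
```

The order of the arguments to chain_callables is the order of application, so add_one runs before double here.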