Data Extraction
In this tutorial we’ll extract data from the MIMIC-IV Waveform Database.
Our objectives are to:
Extract signals from one segment of a record.
Limit the extracted data to the required duration of the relevant signals (i.e. 10 minutes of photoplethysmography and blood pressure signals).
Context: In the Data Exploration tutorial we learnt how to identify segments of waveform data which are suitable for a particular research study (i.e. which have the required duration of the required signals). We extracted metadata for such a segment, providing high-level details of what is contained in the segment (e.g. which signals, their sampling frequency, and their duration). Now we will go a step further to extract signals for analysis.
Setup
Resource: These steps are taken from the Data Exploration tutorial.
Specify the required Python packages
import sys
from pathlib import Path
Install and import the WFDB toolbox
!pip install wfdb==4.0.0
import wfdb
Specify the settings for the MIMIC-IV database
# The name of the MIMIC-IV Waveform Database on PhysioNet
database_name = 'mimic4wdb/0.1.0'
Provide a list of segments which meet the requirements for the study (NB: these are copied from the end of the Data Exploration tutorial).
segment_names = ['83404654_0005']
segment_dirs = ['mimic4wdb/0.1.0/waves/p100/p10020306/83404654']
Specify a segment from which to extract data
rel_segment_no = 0
rel_segment_name = segment_names[rel_segment_no]
rel_segment_dir = segment_dirs[rel_segment_no]
print(f"Specified segment '{rel_segment_name}' in directory: '{rel_segment_dir}'")
Specified segment '83404654_0005' in directory: 'mimic4wdb/0.1.0/waves/p100/p10020306/83404654'
Extension: Have a look at the files which make up this record here (NB: you will need to scroll to the bottom of the page).
Extract data for this segment
Use the rdrecord function from the WFDB toolbox to read the data for this segment.
segment_data = wfdb.rdrecord(record_name=rel_segment_name, pn_dir=rel_segment_dir)
print(f"Data loaded from segment: {rel_segment_name}")
Data loaded from segment: 83404654_0005
Look at the class type of the object in which the data are stored:
print(f"Data stored in class of type: {type(segment_data)}")
Data stored in class of type: <class 'wfdb.io.record.Record'>
Resource: You can find out more about the class representing single-segment WFDB records here.
Find out about the signals which have been extracted
print(f"This segment contains waveform data for the following {segment_data.n_sig} signals: {segment_data.sig_name}")
print(f"The signals are sampled at a base rate of {segment_data.fs} Hz (and some are sampled at multiples of this)")
print(f"They last for {segment_data.sig_len/(60*segment_data.fs):.1f} minutes")
This segment contains waveform data for the following 6 signals: ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']
The signals are sampled at a base rate of 62.4725 Hz (and some are sampled at multiples of this)
They last for 52.4 minutes
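The values printed above are enough to plan the second objective (limiting the data to 10 minutes of the relevant signals). The following is a minimal sketch, not part of the WFDB toolbox: plan_window is a hypothetical helper, and the signal names, base sampling frequency and segment length are simply copied from the output above. It checks that the required signals are present and returns start and end sample indices expressed at the base sampling frequency.

```python
import math


def plan_window(sig_names, fs, sig_len, required_sigs, start_seconds, duration_seconds):
    """Hypothetical helper: return (sampfrom, sampto) for a window of a segment.

    Indices are expressed in samples at the base sampling frequency `fs`.
    """
    # Check that every required signal is present in the segment
    missing = [sig for sig in required_sigs if sig not in sig_names]
    if missing:
        raise ValueError(f"Segment lacks required signals: {missing}")

    # Convert times in seconds to whole sample indices (rounding down)
    sampfrom = math.floor(start_seconds * fs)
    sampto = sampfrom + math.floor(duration_seconds * fs)

    # Check that the segment is long enough for the requested window
    if sampto > sig_len:
        raise ValueError("Segment is too short for the requested window")
    return sampfrom, sampto


# Values copied from the output printed above
sig_names = ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']
fs = 62.4725
sig_len = 196480

# Assumption: a 10-minute window starting at the beginning of the segment
sampfrom, sampto = plan_window(
    sig_names, fs, sig_len,
    required_sigs=['Pleth', 'ABP'],
    start_seconds=0,
    duration_seconds=10 * 60,
)
print(f"Window spans samples {sampfrom} to {sampto}")
```

Indices such as these could, for instance, be passed to wfdb.rdrecord through its sampfrom and sampto arguments so that only the window of interest is read.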
Question: Can you find out which signals are sampled at multiples of the base sampling frequency by looking at the following contents of the 'segment_data' variable?
from pprint import pprint
pprint(vars(segment_data))
{'adc_gain': [200.0, 200.0, 200.0, 16.0, 4096.0, 4093.0],
'adc_res': [14, 14, 14, 13, 12, 12],
'adc_zero': [8192, 8192, 8192, 4096, 2048, 2048],
'base_counter': 10219520.0,
'base_date': None,
'base_time': None,
'baseline': [8192, 8192, 8192, 800, 0, 2],
'block_size': [0, 0, 0, 0, 0, 0],
'byte_offset': [None, None, None, None, None, None],
'checksum': [10167, 1300, 56956, 35887, 29987, 21750],
'comments': ['signal 0 (II): channel=0 bandpass=[0.5,35]',
'signal 1 (V): channel=1 bandpass=[0.5,35]',
'signal 2 (aVR): channel=2 bandpass=[0.5,35]'],
'counter_freq': 999.56,
'd_signal': None,
'e_d_signal': None,
'e_p_signal': None,
'file_name': ['83404654_0005e.dat',
'83404654_0005e.dat',
'83404654_0005e.dat',
'83404654_0005p.dat',
'83404654_0005p.dat',
'83404654_0005r.dat'],
'fmt': ['516', '516', '516', '516', '516', '516'],
'fs': 62.4725,
'init_value': [0, 0, 0, 0, 0, 0],
'n_sig': 6,
'p_signal': array([[ 0.00000000e+00, -6.50000000e-02, -5.00000000e-03,
nan, 5.02929688e-01, 1.56120205e-01],
[ 5.00000000e-03, -4.50000000e-02, -5.00000000e-03,
nan, 5.02929688e-01, 1.56853164e-01],
[ 1.50000000e-02, -2.50000000e-02, 5.00000000e-03,
nan, 5.02929688e-01, 1.57097484e-01],
...,
[-1.50000000e-02, 7.00000000e-02, -4.00000000e-02,
7.25000000e+01, 5.74951172e-01, 3.57683850e-01],
[-1.50000000e-02, 5.50000000e-02, -4.50000000e-02,
7.25000000e+01, 5.70800781e-01, 3.61104324e-01],
[ 0.00000000e+00, 9.00000000e-02, -5.50000000e-02,
7.25000000e+01, 5.62255859e-01, 3.63791840e-01]]),
'record_name': '83404654_0005',
'samps_per_frame': [4, 4, 4, 2, 2, 1],
'sig_len': 196480,
'sig_name': ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp'],
'skew': [None, None, None, None, None, None],
'units': ['mV', 'mV', 'mV', 'mmHg', 'NU', 'Ohm']}
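One possible way to approach the question above is to read 'fs' as the base (frame) rate and 'samps_per_frame' as the number of samples stored per frame for each signal, so that a signal's own sampling frequency is the product of the two. In the same spirit, the WFDB convention relates stored digital values to physical units as (digital - baseline) / adc_gain. The sketch below illustrates both with values copied from the output above; the helper names are hypothetical, and the digital value used for ABP is an illustrative number rather than one read from the record.

```python
def signal_sampling_frequencies(sig_names, fs, samps_per_frame):
    """Hypothetical helper: sampling frequency of each signal in Hz."""
    return {name: fs * spf for name, spf in zip(sig_names, samps_per_frame)}


def digital_to_physical(digital_value, adc_gain, baseline):
    """Convert a digital sample value to physical units (WFDB convention)."""
    return (digital_value - baseline) / adc_gain


# Values copied from the contents of 'segment_data' shown above
sig_name = ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']
samps_per_frame = [4, 4, 4, 2, 2, 1]
fs = 62.4725

freqs = signal_sampling_frequencies(sig_name, fs, samps_per_frame)
for name, freq in freqs.items():
    print(f"{name}: {freq:.2f} Hz")

# ABP is listed with adc_gain 16.0 and baseline 800.
# Assumption: 1960 is an illustrative digital value, not taken from the record.
abp_mmhg = digital_to_physical(1960, adc_gain=16.0, baseline=800)
print(f"ABP digital value 1960 corresponds to {abp_mmhg} mmHg")
```

Under this reading, the ECG signals (II, V, aVR) are sampled at four times the base rate, ABP and Pleth at twice the base rate, and Resp at the base rate.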