Data Extraction


In this tutorial we’ll extract data from the MIMIC-IV Waveform Database.

Our objectives are to:

  • Extract signals from one segment of a record.

  • Limit the segment to only the required duration of relevant signals (i.e. 10 min of photoplethysmography and blood pressure signals).

Context: In the Data Exploration tutorial we learnt how to identify segments of waveform data which are suitable for a particular research study (i.e. which have the required duration of the required signals). We extracted metadata for such a segment, providing high-level details of what is contained in the segment (e.g. which signals, their sampling frequency, and their duration). Now we will go a step further to extract signals for analysis.
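The selection criterion described above (the required signals, for at least the required duration) can be written as a simple check on segment metadata. The sketch below is a hypothetical helper, not the exact code used in the Data Exploration tutorial. It assumes the required signals are `ABP` and `Pleth` and that the required duration is 10 minutes, matching the objectives above. The example values are those reported for segment 83404654_0005 later in this tutorial.

```python
# Hypothetical helper (not part of the WFDB toolbox): checks whether a
# segment's metadata meets the study requirements.
def meets_requirements(sig_names, sig_len, fs, required_sigs, req_duration_min):
    """Return True if every required signal is present and the segment
    lasts at least req_duration_min minutes.

    sig_len is the number of samples at the base sampling frequency fs (Hz).
    """
    # All required signals must appear among the segment's signal names
    has_signals = set(required_sigs).issubset(sig_names)
    # Duration in minutes = number of samples / samples per minute
    duration_min = sig_len / (60 * fs)
    return has_signals and duration_min >= req_duration_min


# Metadata values for segment 83404654_0005, as printed later in this tutorial
sig_names = ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']
print(meets_requirements(sig_names, 196480, 62.4725, ['ABP', 'Pleth'], 10))
```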


Setup

Resource: These steps are taken from the Data Exploration tutorial.

  • Specify the required Python packages

import sys
from pathlib import Path
  • Install and import the WFDB toolbox

!pip install wfdb==4.0.0
import wfdb
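Because the tutorial pins `wfdb==4.0.0`, it can help to confirm which version is actually installed before running the later cells. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


wfdb_version = installed_version("wfdb")
# The tutorial installs wfdb 4.0.0; other versions may behave differently
if wfdb_version != "4.0.0":
    print(f"Note: expected wfdb 4.0.0, found {wfdb_version}")
```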
  • Specify the settings for the MIMIC-IV database

# The name of the MIMIC-IV Waveform Database on PhysioNet
database_name = 'mimic4wdb/0.1.0'
  • Provide a list of segments that meet the requirements for the study (NB: these are copied from the end of the Data Exploration tutorial).

segment_names = ['83404654_0005']
segment_dirs = ['mimic4wdb/0.1.0/waves/p100/p10020306/83404654']
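The directory and segment names above follow a regular pattern. The sketch below rebuilds them from their parts. **Assumption:** the layout `<database>/waves/p<first three digits of subject ID>/p<subject ID>/<record ID>` and the `<record ID>_<4-digit segment number>` naming are inferred only from the single example path in this tutorial. Both helpers are hypothetical and are not part of the WFDB toolbox.

```python
import posixpath


def make_segment_dir(database_name, subject_id, record_id):
    """Build a PhysioNet directory path for a waveform record.

    ASSUMPTION: layout inferred from the one example path in this tutorial.
    """
    subject_id = str(subject_id)
    group_dir = f"p{subject_id[:3]}"  # e.g. 'p100' for subject '10020306'
    return posixpath.join(database_name, "waves", group_dir, f"p{subject_id}", str(record_id))


def make_segment_name(record_id, segment_no):
    """ASSUMPTION: segment names are '<record_id>_<4-digit segment number>'."""
    return f"{record_id}_{segment_no:04d}"


print(make_segment_dir("mimic4wdb/0.1.0", "10020306", "83404654"))
print(make_segment_name("83404654", 5))
```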
  • Specify a segment from which to extract data

rel_segment_no = 0
rel_segment_name = segment_names[rel_segment_no]
rel_segment_dir = segment_dirs[rel_segment_no]
print(f"Specified segment '{rel_segment_name}' in directory: '{rel_segment_dir}'")
Specified segment '83404654_0005' in directory: 'mimic4wdb/0.1.0/waves/p100/p10020306/83404654'

Extension: Have a look at the files which make up this record here (NB: you will need to scroll to the bottom of the page).


Extract data for this segment

  • Use the rdrecord function from the WFDB toolbox to read the data for this segment.

segment_data = wfdb.rdrecord(record_name=rel_segment_name, pn_dir=rel_segment_dir) 
print(f"Data loaded from segment: {rel_segment_name}")
Data loaded from segment: 83404654_0005
  • Look at the class type of the object in which the data are stored:

print(f"Data stored in class of type: {type(segment_data)}")
Data stored in class of type: <class 'wfdb.io.record.Record'>

Resource: You can find out more about the class representing single segment WFDB records here.

  • Find out about the signals which have been extracted

print(f"This segment contains waveform data for the following {segment_data.n_sig} signals: {segment_data.sig_name}")
print(f"The signals are sampled at a base rate of {segment_data.fs} Hz (and some are sampled at multiples of this)")
print(f"They last for {segment_data.sig_len/(60*segment_data.fs):.1f} minutes")
This segment contains waveform data for the following 6 signals: ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']
The signals are sampled at a base rate of 62.4725 Hz (and some are sampled at multiples of this)
They last for 52.4 minutes
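With the signal names, base sampling frequency and segment length known, the second objective (10 minutes of `ABP` and `Pleth`) can be translated into channel indices and sample numbers. The sketch below is a hypothetical helper, not the tutorial's own method. It assumes `rdrecord`'s `channels`, `sampfrom` and `sampto` parameters. It also assumes that `sampfrom`/`sampto` count samples at the base rate `fs` and that reading stops just before `sampto`.

```python
def extraction_window(sig_names, fs, sig_len, required_sigs, duration_s, start_s=0.0):
    """Return keyword arguments for wfdb.rdrecord restricting the read to the
    required signals over a window of duration_s seconds starting at start_s.

    Sample numbers are expressed at the base sampling frequency fs (Hz).
    """
    # Channel indices of the required signals (list.index raises ValueError if one is missing)
    channels = [sig_names.index(name) for name in required_sigs]
    sampfrom = round(start_s * fs)
    sampto = sampfrom + round(duration_s * fs)
    if sampto > sig_len:
        raise ValueError(f"Window ends at sample {sampto}, beyond segment length {sig_len}")
    return {"channels": channels, "sampfrom": sampfrom, "sampto": sampto}


kwargs = extraction_window(
    sig_names=['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp'],
    fs=62.4725,
    sig_len=196480,
    required_sigs=['ABP', 'Pleth'],
    duration_s=10 * 60,
)
print(kwargs)
# The window could then be read with, for example:
# segment_data = wfdb.rdrecord(record_name=rel_segment_name, pn_dir=rel_segment_dir, **kwargs)
```

The `start_s` argument allows a later window to be chosen instead of the start of the segment.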

Question: Can you find out which signals are sampled at multiples of the base sampling frequency by looking at the following contents of the 'segment_data' variable?

from pprint import pprint
pprint(vars(segment_data))
{'adc_gain': [200.0, 200.0, 200.0, 16.0, 4096.0, 4093.0],
 'adc_res': [14, 14, 14, 13, 12, 12],
 'adc_zero': [8192, 8192, 8192, 4096, 2048, 2048],
 'base_counter': 10219520.0,
 'base_date': None,
 'base_time': None,
 'baseline': [8192, 8192, 8192, 800, 0, 2],
 'block_size': [0, 0, 0, 0, 0, 0],
 'byte_offset': [None, None, None, None, None, None],
 'checksum': [10167, 1300, 56956, 35887, 29987, 21750],
 'comments': ['signal 0 (II): channel=0 bandpass=[0.5,35]',
              'signal 1 (V): channel=1 bandpass=[0.5,35]',
              'signal 2 (aVR): channel=2 bandpass=[0.5,35]'],
 'counter_freq': 999.56,
 'd_signal': None,
 'e_d_signal': None,
 'e_p_signal': None,
 'file_name': ['83404654_0005e.dat',
               '83404654_0005e.dat',
               '83404654_0005e.dat',
               '83404654_0005p.dat',
               '83404654_0005p.dat',
               '83404654_0005r.dat'],
 'fmt': ['516', '516', '516', '516', '516', '516'],
 'fs': 62.4725,
 'init_value': [0, 0, 0, 0, 0, 0],
 'n_sig': 6,
 'p_signal': array([[ 0.00000000e+00, -6.50000000e-02, -5.00000000e-03,
                    nan,  5.02929688e-01,  1.56120205e-01],
       [ 5.00000000e-03, -4.50000000e-02, -5.00000000e-03,
                    nan,  5.02929688e-01,  1.56853164e-01],
       [ 1.50000000e-02, -2.50000000e-02,  5.00000000e-03,
                    nan,  5.02929688e-01,  1.57097484e-01],
       ...,
       [-1.50000000e-02,  7.00000000e-02, -4.00000000e-02,
         7.25000000e+01,  5.74951172e-01,  3.57683850e-01],
       [-1.50000000e-02,  5.50000000e-02, -4.50000000e-02,
         7.25000000e+01,  5.70800781e-01,  3.61104324e-01],
       [ 0.00000000e+00,  9.00000000e-02, -5.50000000e-02,
         7.25000000e+01,  5.62255859e-01,  3.63791840e-01]]),
 'record_name': '83404654_0005',
 'samps_per_frame': [4, 4, 4, 2, 2, 1],
 'sig_len': 196480,
 'sig_name': ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp'],
 'skew': [None, None, None, None, None, None],
 'units': ['mV', 'mV', 'mV', 'mmHg', 'NU', 'Ohm']}
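One way to check your answer is to combine `fs` with `samps_per_frame`. Each frame lasts `1/fs` seconds and holds `samps_per_frame[i]` samples of signal `i`, so that signal is sampled at `fs * samps_per_frame[i]` Hz. The helpers below are hypothetical. In the notebook you would pass `segment_data.sig_name`, `segment_data.samps_per_frame` and `segment_data.fs` rather than the literal values copied from the output above.

```python
def signal_frequencies(sig_names, samps_per_frame, fs):
    """Map each signal name to its sampling frequency in Hz (fs * samples per frame)."""
    return {name: fs * spf for name, spf in zip(sig_names, samps_per_frame)}


def multi_rate_signals(sig_names, samps_per_frame):
    """Names of signals with more than one sample per frame, i.e. sampled at a multiple of fs."""
    return [name for name, spf in zip(sig_names, samps_per_frame) if spf > 1]


# Values copied from the 'segment_data' contents above
sig_names = ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']
samps_per_frame = [4, 4, 4, 2, 2, 1]
fs = 62.4725

for name, freq in signal_frequencies(sig_names, samps_per_frame, fs).items():
    print(f"{name}: {freq:g} Hz")
print("Sampled at multiples of the base rate:", multi_rate_signals(sig_names, samps_per_frame))
```

As we understand the WFDB documentation, `rdrecord`'s default `smooth_frames=True` averages the samples within each frame, which is why `p_signal` holds one row per frame. Passing `smooth_frames=False` should instead return every sample in `e_p_signal`, as one array per signal. Check the documentation for your installed version.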