# Data Extraction

In this tutorial we'll extract data from the MIMIC-IV Waveform Database.

Our **objectives** are to:
- Extract signals from one segment of a record.
- Limit the segment to only the required duration of relevant signals (_i.e._ 10 min of photoplethysmography and blood pressure signals).

<div class="alert alert-block alert-warning">
<p><b>Context:</b>
    In the <a href="https://wfdb.io/mimic_wfdb_tutorials/tutorial/notebooks/data-exploration.html">Data Exploration</a> tutorial we learnt how to identify segments of waveform data which are suitable for a particular research study (i.e. which have the required duration of the required signals). We extracted metadata for such a segment, providing high-level details of what is contained in the segment (e.g. which signals, their sampling frequency, and their duration). Now we will go a step further to extract signals for analysis.</p>
</div>

---
## Setup
<div class="alert alert-block alert-warning">
<p><b>Resource:</b> These steps are taken from the <a href="https://wfdb.io/mimic_wfdb_tutorials/tutorial/notebooks/data-exploration.html">Data Exploration</a> tutorial.</p></div>

- Import the required Python packages

In [3]:
import sys
from pathlib import Path

- Install and import the WFDB toolbox

In [4]:
!pip install wfdb==4.0.0
import wfdb

- Specify the settings for the MIMIC-IV database

In [5]:
# The name of the MIMIC-IV Waveform Database on PhysioNet
database_name = 'mimic4wdb/0.1.0'

- Provide a list of segments which meet the requirements for the study (NB: these are copied from the end of the [Data Exploration Tutorial](https://wfdb.io/mimic_wfdb_tutorials/tutorial/notebooks/data-exploration.html)).

In [6]:
segment_names = ['83404654_0005']
segment_dirs = ['mimic4wdb/0.1.0/waves/p100/p10020306/83404654']

- Specify a segment from which to extract data

In [9]:
rel_segment_no = 0
rel_segment_name = segment_names[rel_segment_no]
rel_segment_dir = segment_dirs[rel_segment_no]
print(f"Specified segment '{rel_segment_name}' in directory: '{rel_segment_dir}'")

Specified segment '83404654_0005' in directory: 'mimic4wdb/0.1.0/waves/p100/p10020306/83404654'


<div class="alert alert-block alert-info">
<p><b>Extension:</b> Have a look at the files which make up this record <a href="https://physionet.org/content/mimic4wdb/0.1.0/waves/p100/p10020306/">here</a> (NB: you will need to scroll to the bottom of the page).</p>
</div>

---
## Extract data for this segment

- Use the [`rdrecord`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.rdrecord) function from the WFDB toolbox to read the data for this segment.

In [8]:
segment_data = wfdb.rdrecord(record_name=rel_segment_name, pn_dir=rel_segment_dir) 
print(f"Data loaded from segment: {rel_segment_name}")

Data loaded from segment: 83404654_0005


- Look at the class type of the object in which the data are stored:

In [10]:
print(f"Data stored in class of type: {type(segment_data)}")

Data stored in class of type: <class 'wfdb.io.record.Record'>


<div class="alert alert-block alert-warning">
<p><b>Resource:</b> You can find out more about the class representing single-segment WFDB records <a href="https://wfdb.readthedocs.io/en/stable/io.html?highlight=class#wfdb.io.Record">here</a>.</p>
</div>

- Find out about the signals which have been extracted

In [13]:
print(f"This segment contains waveform data for the following {segment_data.n_sig} signals: {segment_data.sig_name}")
print(f"The signals are sampled at a base rate of {segment_data.fs} Hz (and some are sampled at multiples of this)")
print(f"They last for {segment_data.sig_len/(60*segment_data.fs):.1f} minutes")

This segment contains waveform data for the following 6 signals: ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']
The signals are sampled at a base rate of 62.4725 Hz (and some are sampled at multiples of this)
They last for 52.4 minutes
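
The segment is considerably longer than the 10 minutes named in the objectives, so it helps to know how a duration maps onto sample indices. The following is a minimal sketch, not part of the original tutorial: the helper `duration_to_sample_range` is a hypothetical name introduced here for illustration, and the calculation assumes the base sampling frequency printed above.

```python
def duration_to_sample_range(fs, start_minutes, duration_minutes):
    """Convert a start time and a duration (both in minutes) into sample indices.

    Hypothetical helper for illustration; indices are counted at the base
    sampling frequency `fs` (in Hz) and truncated to whole samples.
    """
    sampfrom = int(start_minutes * 60 * fs)
    sampto = sampfrom + int(duration_minutes * 60 * fs)
    return sampfrom, sampto


# Base sampling frequency reported for this segment (Hz)
fs = 62.4725

# Sample range covering the first 10 minutes of the segment
sampfrom, sampto = duration_to_sample_range(fs, start_minutes=0, duration_minutes=10)
print(f"A 10-minute window spans samples {sampfrom} to {sampto}")
```

Values like these could then be passed as the `sampfrom` and `sampto` arguments of `wfdb.rdrecord` to read only part of a segment, rather than the whole 52.4 minutes.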


<div class="alert alert-block alert-info">
<p><b>Question:</b> Can you find out which signals are sampled at multiples of the base sampling frequency by looking at the following contents of the 'segment_data' variable?</p>
</div>

In [16]:
from pprint import pprint
pprint(vars(segment_data))

{'adc_gain': [200.0, 200.0, 200.0, 16.0, 4096.0, 4093.0],
 'adc_res': [14, 14, 14, 13, 12, 12],
 'adc_zero': [8192, 8192, 8192, 4096, 2048, 2048],
 'base_counter': 10219520.0,
 'base_date': None,
 'base_time': None,
 'baseline': [8192, 8192, 8192, 800, 0, 2],
 'block_size': [0, 0, 0, 0, 0, 0],
 'byte_offset': [None, None, None, None, None, None],
 'checksum': [10167, 1300, 56956, 35887, 29987, 21750],
 'comments': ['signal 0 (II): channel=0 bandpass=[0.5,35]',
              'signal 1 (V): channel=1 bandpass=[0.5,35]',
              'signal 2 (aVR): channel=2 bandpass=[0.5,35]'],
 'counter_freq': 999.56,
 'd_signal': None,
 'e_d_signal': None,
 'e_p_signal': None,
 'file_name': ['83404654_0005e.dat',
               '83404654_0005e.dat',
               '83404654_0005e.dat',
               '83404654_0005p.dat',
               '83404654_0005p.dat',
               '83404654_0005r.dat'],
 'fmt': ['516', '516', '516', '516', '516', '516'],
 'fs': 62.4725,
 'init_value': [0, 0, 0, 0, 0, 0],
 '