{ "cells": [ { "cell_type": "markdown", "id": "5d037743", "metadata": { "id": "5d037743" }, "source": [ "# Data Extraction\n", "\n", "In this tutorial we'll extract data from the MIMIC-IV Waveform Database.\n", "\n", "Our **objectives** are to:\n", "- Extract signals from one segment of a record.\n", "- Limit the segment to only the required duration of relevant signals (_i.e._ 10 min of photoplethysmography and blood pressure signals)." ] }, { "cell_type": "markdown", "id": "fe20dd08", "metadata": { "id": "fe20dd08" }, "source": [ "
\n", "

Context:\n", " In the Data Exploration tutorial we learnt how to identify segments of waveform data which are suitable for a particular research study (i.e. which have the required duration of the required signals). We extracted metadata for such a segment, providing high-level details of what is contained in the segment (e.g. which signals, their sampling frequency, and their duration). Now we will go a step further to extract signals for analysis.

\n", "
" ] }, { "cell_type": "markdown", "id": "fd8a0055", "metadata": { "id": "fd8a0055" }, "source": [ "---\n", "## Setup\n", "
\n", "

Resource: These steps are taken from the Data Exploration tutorial.

" ] }, { "cell_type": "markdown", "id": "f4e37777", "metadata": { "id": "f4e37777" }, "source": [ "- Specify the required Python packages" ] }, { "cell_type": "code", "execution_count": 3, "id": "10fdf08b", "metadata": { "id": "10fdf08b" }, "outputs": [], "source": [ "import sys\n", "from pathlib import Path" ] }, { "cell_type": "markdown", "id": "ccce3426", "metadata": { "id": "ccce3426" }, "source": [ "- Install and import the WFDB toolbox" ] }, { "cell_type": "code", "execution_count": 4, "id": "06c8cc1f", "metadata": { "id": "06c8cc1f", "outputId": "747c5f42-e691-4981-fb53-c6f38007e456", "colab": { "base_uri": "https://localhost:8080/" } }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", "Requirement already satisfied: wfdb==4.0.0 in /usr/local/lib/python3.7/dist-packages (4.0.0)\n", "Requirement already satisfied: SoundFile<0.12.0,>=0.10.0 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (0.10.3.post1)\n", "Requirement already satisfied: requests<3.0.0,>=2.8.1 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (2.23.0)\n", "Requirement already satisfied: pandas<2.0.0,>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (1.3.5)\n", "Requirement already satisfied: scipy<2.0.0,>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (1.4.1)\n", "Requirement already satisfied: matplotlib<4.0.0,>=3.2.2 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (3.2.2)\n", "Requirement already satisfied: numpy<2.0.0,>=1.10.1 in /usr/local/lib/python3.7/dist-packages (from wfdb==4.0.0) (1.21.6)\n", "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (2.8.2)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (1.4.3)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (3.0.9)\n", "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (0.11.0)\n", "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from kiwisolver>=1.0.1->matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (4.1.1)\n", "Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas<2.0.0,>=1.0.0->wfdb==4.0.0) (2022.1)\n", "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib<4.0.0,>=3.2.2->wfdb==4.0.0) (1.15.0)\n", "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.8.1->wfdb==4.0.0) (2022.6.15)\n", "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.8.1->wfdb==4.0.0) (1.24.3)\n", "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.8.1->wfdb==4.0.0) (3.0.4)\n", "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.8.1->wfdb==4.0.0) (2.10)\n", "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.7/dist-packages (from SoundFile<0.12.0,>=0.10.0->wfdb==4.0.0) (1.15.0)\n", "Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from cffi>=1.0->SoundFile<0.12.0,>=0.10.0->wfdb==4.0.0) (2.21)\n" ] } ], "source": [ "!pip install wfdb==4.0.0\n", "import wfdb" ] }, { "cell_type": "markdown", "id": "524ed046", "metadata": { "id": "524ed046" }, "source": [ "- Specify the settings for the MIMIC-IV database" ] }, { "cell_type": "code", "execution_count": 5, "id": "2915e121", "metadata": { "id": "2915e121" }, "outputs": [], "source": [ "# The name of the MIMIC-IV Waveform Database on PhysioNet\n", "database_name = 'mimic4wdb/0.1.0'" ] }, { "cell_type": "markdown", "id": "3ea79319", "metadata": { "id": "3ea79319" }, "source": [ "- Provide a list of segments which meet the requirements for the study (NB: these are copied from the end of the [Data Exploration Tutorial](https://wfdb.io/mimic_wfdb_tutorials/tutorial/notebooks/data-exploration.html))." ] }, { "cell_type": "code", "execution_count": 6, "id": "0ee58931", "metadata": { "id": "0ee58931" }, "outputs": [], "source": [ "segment_names = ['83404654_0005']\n", "segment_dirs = ['mimic4wdb/0.1.0/waves/p100/p10020306/83404654']" ] }, { "cell_type": "markdown", "id": "0e90110a", "metadata": { "id": "0e90110a" }, "source": [ "- Specify a segment from which to extract data" ] }, { "cell_type": "code", "execution_count": 9, "id": "05fb68d0", "metadata": { "id": "05fb68d0", "outputId": "776068c7-a586-4f0a-a6b3-3154bde5459a", "colab": { "base_uri": "https://localhost:8080/" } }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Specified segment '83404654_0005' in directory: 'mimic4wdb/0.1.0/waves/p100/p10020306/83404654'\n" ] } ], "source": [ "rel_segment_no = 0\n", "rel_segment_name = segment_names[rel_segment_no]\n", "rel_segment_dir = segment_dirs[rel_segment_no]\n", "print(f\"Specified segment '{rel_segment_name}' in directory: '{rel_segment_dir}'\")" ] }, { "cell_type": "markdown", "id": "d00513bd", "metadata": { "id": "d00513bd" }, "source": [ "
\n", "

Extension: Have a look at the files which make up this record here (NB: you will need to scroll to the bottom of the page).

\n", "
" ] }, { "cell_type": "markdown", "id": "3b2e6adb", "metadata": { "id": "3b2e6adb" }, "source": [ "---\n", "## Extract data for this segment" ] }, { "cell_type": "markdown", "id": "e8810358", "metadata": { "id": "e8810358" }, "source": [ "- Use the [`rdrecord`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.rdrecord) function from the WFDB toolbox to read the data for this segment." ] }, { "cell_type": "code", "execution_count": 8, "id": "8626ebac", "metadata": { "id": "8626ebac", "outputId": "8c0d3d8e-fb01-4a3a-e75f-6502941fad70", "colab": { "base_uri": "https://localhost:8080/" } }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Data loaded from segment: 83404654_0005\n" ] } ], "source": [ "segment_data = wfdb.rdrecord(record_name=rel_segment_name, pn_dir=rel_segment_dir) \n", "print(f\"Data loaded from segment: {rel_segment_name}\")" ] }, { "cell_type": "markdown", "id": "5032d6c4", "metadata": { "id": "5032d6c4" }, "source": [ "- Look at class type of the object in which the data are stored:" ] }, { "cell_type": "code", "execution_count": 10, "id": "967fa4ef", "metadata": { "id": "967fa4ef", "outputId": "9e9b7857-dfd6-470c-9722-9a3a196687c3", "colab": { "base_uri": "https://localhost:8080/" } }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Data stored in class of type: \n" ] } ], "source": [ "print(f\"Data stored in class of type: {type(segment_data)}\")" ] }, { "cell_type": "markdown", "id": "cf2d5ed7", "metadata": { "id": "cf2d5ed7" }, "source": [ "
\n", "

Resource: You can find out more about the class representing single segment WFDB records here.

\n", "
" ] }, { "cell_type": "markdown", "id": "85a0d656", "metadata": { "id": "85a0d656" }, "source": [ "- Find out about the signals which have been extracted" ] }, { "cell_type": "code", "execution_count": 13, "id": "6d5416b6", "metadata": { "id": "6d5416b6", "outputId": "e72883c0-0675-4e32-f814-a2bc84e03259", "colab": { "base_uri": "https://localhost:8080/" } }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "This segment contains waveform data for the following 6 signals: ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']\n", "The signals are sampled at a base rate of 62.4725 Hz (and some are sampled at multiples of this)\n", "They last for 52.4 minutes\n" ] } ], "source": [ "print(f\"This segment contains waveform data for the following {segment_data.n_sig} signals: {segment_data.sig_name}\")\n", "print(f\"The signals are sampled at a base rate of {segment_data.fs} Hz (and some are sampled at multiples of this)\")\n", "print(f\"They last for {segment_data.sig_len/(60*segment_data.fs):.1f} minutes\")" ] }, { "cell_type": "markdown", "id": "0d40fab4", "metadata": { "id": "0d40fab4" }, "source": [ "
\n", "

Question: Can you find out which signals are sampled at multiples of the base sampling frequency by looking at the following contents of the 'segment_data' variable?

\n", "
" ] }, { "cell_type": "code", "execution_count": 16, "id": "b0903fcf", "metadata": { "id": "b0903fcf", "outputId": "cee5ffd3-aaf5-4e46-9b3e-af94a844a9da", "colab": { "base_uri": "https://localhost:8080/" } }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "{'adc_gain': [200.0, 200.0, 200.0, 16.0, 4096.0, 4093.0],\n", " 'adc_res': [14, 14, 14, 13, 12, 12],\n", " 'adc_zero': [8192, 8192, 8192, 4096, 2048, 2048],\n", " 'base_counter': 10219520.0,\n", " 'base_date': None,\n", " 'base_time': None,\n", " 'baseline': [8192, 8192, 8192, 800, 0, 2],\n", " 'block_size': [0, 0, 0, 0, 0, 0],\n", " 'byte_offset': [None, None, None, None, None, None],\n", " 'checksum': [10167, 1300, 56956, 35887, 29987, 21750],\n", " 'comments': ['signal 0 (II): channel=0 bandpass=[0.5,35]',\n", " 'signal 1 (V): channel=1 bandpass=[0.5,35]',\n", " 'signal 2 (aVR): channel=2 bandpass=[0.5,35]'],\n", " 'counter_freq': 999.56,\n", " 'd_signal': None,\n", " 'e_d_signal': None,\n", " 'e_p_signal': None,\n", " 'file_name': ['83404654_0005e.dat',\n", " '83404654_0005e.dat',\n", " '83404654_0005e.dat',\n", " '83404654_0005p.dat',\n", " '83404654_0005p.dat',\n", " '83404654_0005r.dat'],\n", " 'fmt': ['516', '516', '516', '516', '516', '516'],\n", " 'fs': 62.4725,\n", " 'init_value': [0, 0, 0, 0, 0, 0],\n", " 'n_sig': 6,\n", " 'p_signal': array([[ 0.00000000e+00, -6.50000000e-02, -5.00000000e-03,\n", " nan, 5.02929688e-01, 1.56120205e-01],\n", " [ 5.00000000e-03, -4.50000000e-02, -5.00000000e-03,\n", " nan, 5.02929688e-01, 1.56853164e-01],\n", " [ 1.50000000e-02, -2.50000000e-02, 5.00000000e-03,\n", " nan, 5.02929688e-01, 1.57097484e-01],\n", " ...,\n", " [-1.50000000e-02, 7.00000000e-02, -4.00000000e-02,\n", " 7.25000000e+01, 5.74951172e-01, 3.57683850e-01],\n", " [-1.50000000e-02, 5.50000000e-02, -4.50000000e-02,\n", " 7.25000000e+01, 5.70800781e-01, 3.61104324e-01],\n", " [ 0.00000000e+00, 9.00000000e-02, -5.50000000e-02,\n", " 7.25000000e+01, 5.62255859e-01, 3.63791840e-01]]),\n", " 'record_name': '83404654_0005',\n", " 'samps_per_frame': [4, 4, 4, 2, 2, 1],\n", " 'sig_len': 196480,\n", " 'sig_name': ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp'],\n", " 'skew': [None, None, None, None, None, None],\n", " 'units': ['mV', 'mV', 'mV', 'mmHg', 'NU', 'Ohm']}\n" ] } ], "source": [ "from pprint import pprint\n", "pprint(vars(segment_data))" ] }, { "cell_type": "code", "source": [ "" ], "metadata": { "id": "gKtupgmahzpt" }, "id": "gKtupgmahzpt", "execution_count": null, "outputs": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "colab": { "name": "data-extraction.ipynb", "provenance": [] } }, "nbformat": 4, "nbformat_minor": 5 }