{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "5d037743",
      "metadata": {
        "id": "5d037743"
      },
      "source": [
        "# Data Exploration"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "fbae8e9b",
      "metadata": {
        "id": "fbae8e9b"
      },
      "source": [
        "Let's begin by exploring data in the MIMIC Waveform Database.\n",
        "\n",
        "Our **objectives** are to:\n",
        "- Review the structure of the MIMIC Waveform Database (considering subjects, studies, records, and segments).\n",
        "- Load waveforms using the WFDB toolbox.\n",
        "- Find out which signals are present in selected records and segments, and how long the signals last.\n",
        "- Search for records that contain signals of interest."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0b240726",
      "metadata": {
        "id": "0b240726"
      },
      "source": [
        "<div class=\"alert alert-block alert-warning\">\n",
        "<p><b>Resource:</b> You can find out more about the MIMIC Waveform Database <a href=\"https://physionet.org/content/mimic4wdb/0.1.0/\">here</a>.</p>\n",
        "</div>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "28b8e213",
      "metadata": {
        "id": "28b8e213"
      },
      "source": [
        "---\n",
        "## Setup"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "5dac032e",
      "metadata": {
        "id": "5dac032e"
      },
      "source": [
        "### Specify the required Python packages\n",
        "We'll import the following:\n",
        "- _sys_: an essential python package\n",
        "- _pathlib_ (well a particular function from _pathlib_, called _Path_)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "ce3cdfde",
      "metadata": {
        "id": "ce3cdfde"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "from pathlib import Path"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9976c5e4",
      "metadata": {
        "id": "9976c5e4"
      },
      "source": [
        "### Specify a particular version of the WFDB Toolbox"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "6533154b",
      "metadata": {
        "id": "6533154b"
      },
      "source": [
        "- _wfdb_: For this workshop we will be using version 4 of the WaveForm DataBase (WFDB) Toolbox package. The package contains tools for processing waveform data such as those found in MIMIC:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "5fdfa989",
      "metadata": {
        "id": "5fdfa989"
      },
      "outputs": [],
      "source": [
        "!pip install wfdb==4.0.0\n",
        "import wfdb"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "e11ce5b6",
      "metadata": {
        "id": "e11ce5b6"
      },
      "source": [
        "<div class=\"alert alert-block alert-warning\">\n",
        "<p><b>Resource:</b> You can find out more about the WFDB package <a href=\"https://physionet.org/content/wfdb-python/3.4.1/\">here</a>.</p>\n",
        "</div>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "d492e49f",
      "metadata": {
        "id": "d492e49f"
      },
      "source": [
        "Now that we have imported these packages (_i.e._ toolboxes) we have a set of tools (functions) ready to use."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "e7d38297",
      "metadata": {
        "id": "e7d38297"
      },
      "source": [
        "### Specify the name of the MIMIC Waveform Database"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "68491718",
      "metadata": {
        "id": "68491718"
      },
      "source": [
        "- Specify the name of the MIMIC IV Waveform Database on Physionet, which comes from the URL: https://physionet.org/content/mimic4wdb/0.1.0/"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "982b8154",
      "metadata": {
        "id": "982b8154"
      },
      "outputs": [],
      "source": [
        "database_name = 'mimic4wdb/0.1.0'"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "e49196a6",
      "metadata": {
        "id": "e49196a6"
      },
      "source": [
        "---\n",
        "## Identify the records in the database"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "b476f9b7",
      "metadata": {
        "id": "b476f9b7"
      },
      "source": [
        "### Get a list of records\n",
        "\n",
        "- Use the [`get_record_list`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.get_record_list) function from the WFDB toolbox to get a list of records in the database."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "d91aa6a7",
      "metadata": {
        "id": "d91aa6a7",
        "outputId": "db8e3169-76ac-4bdd-bbaa-91cf626c1a6b",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The 'mimic4wdb/0.1.0' database contains data from 198 subjects\n"
          ]
        }
      ],
      "source": [
        "# each subject may be associated with multiple records\n",
        "subjects = wfdb.get_record_list(database_name)\n",
        "print(f\"The '{database_name}' database contains data from {len(subjects)} subjects\")\n",
        "\n",
        "# set max number of records to load\n",
        "max_records_to_load = 200"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# iterate the subjects to get a list of records\n",
        "records = []\n",
        "for subject in subjects:\n",
        "    studies = wfdb.get_record_list(f'{database_name}/{subject}')\n",
        "    for study in studies:\n",
        "        records.append(Path(f'{subject}{study}'))\n",
        "        # stop if we've loaded enough records\n",
        "        if len(records) >= max_records_to_load:\n",
        "            print(\"Reached maximum required number of records.\")\n",
        "            break\n",
        "\n",
        "print(f\"Loaded {len(records)} records from the '{database_name}' database.\")"
      ],
      "metadata": {
        "id": "0RzQmqjiQ9LD",
        "outputId": "31eb6067-de92-4424-b32b-f292623215a5",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "id": "0RzQmqjiQ9LD",
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Reached maximum required number of records.\n",
            "Loaded 200 records from the 'mimic4wdb/0.1.0' database.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "id": "fc82d67e",
      "metadata": {
        "id": "fc82d67e"
      },
      "source": [
        "### Look at the records"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "29552f5a",
      "metadata": {
        "id": "29552f5a"
      },
      "source": [
        "- Display the first few records"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "bb5745a7",
      "metadata": {
        "id": "bb5745a7",
        "outputId": "8fe32e59-c542-4a40-bd06-0c04fdcfbbfe",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "First five records: \n",
            " - waves/p100/p10014354/81739927/81739927\n",
            " - waves/p100/p10019003/87033314/87033314\n",
            " - waves/p100/p10020306/83404654/83404654\n",
            " - waves/p100/p10039708/83411188/83411188\n",
            " - waves/p100/p10039708/85583557/85583557\n",
            "\n",
            "Note the formatting of these records:\n",
            " - intermediate directory ('p100' in this case)\n",
            " - subject identifier (e.g. 'p10014354')\n",
            " - record identifier (e.g. '81739927'\n",
            " \n"
          ]
        }
      ],
      "source": [
        "# format and print first five records\n",
        "first_five_records = [str(x) for x in records[0:5]]\n",
        "first_five_records = \"\\n - \".join(first_five_records)\n",
        "print(f\"First five records: \\n - {first_five_records}\")\n",
        "\n",
        "print(\"\"\"\n",
        "Note the formatting of these records:\n",
        " - intermediate directory ('p100' in this case)\n",
        " - subject identifier (e.g. 'p10014354')\n",
        " - record identifier (e.g. '81739927'\n",
        " \"\"\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "b56c29d5",
      "metadata": {
        "id": "b56c29d5"
      },
      "source": [
        "<div class=\"alert alert-block alert-info\">\n",
        "<p><b>Q:</b> Can you print the names of the last five records? <br> <b>Hint:</b> in Python, the last five elements can be specified using '[-5:]'</p>\n",
        "</div>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "cb21a93b",
      "metadata": {
        "id": "cb21a93b"
      },
      "source": [
        "---\n",
        "## Extract metadata for a record"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "c39dc9f3",
      "metadata": {
        "id": "c39dc9f3"
      },
      "source": [
        "Each record contains metadata stored in a header file, named \"`<record name>.hea`\""
      ]
    },
    {
      "cell_type": "markdown",
      "id": "3b2e6adb",
      "metadata": {
        "id": "3b2e6adb"
      },
      "source": [
        "### Specify the online directory containing a record's data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "86eed39f",
      "metadata": {
        "id": "86eed39f",
        "outputId": "5cfa40d0-b4d4-4605-b677-164d9b603f90",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "PhysioNet directory specified for record: mimic4wdb/0.1.0/waves/p100/p10039708/83411188\n"
          ]
        }
      ],
      "source": [
        "# Specify the 4th record (note, in Python indexing begins at 0)\n",
        "idx = 3\n",
        "record = records[idx]\n",
        "record_dir = f'{database_name}/{record.parent}'\n",
        "print(\"PhysioNet directory specified for record: {}\".format(record_dir))"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "b5220ad3",
      "metadata": {
        "id": "b5220ad3"
      },
      "source": [
        "### Specify the subject identifier"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "d7a5bbef",
      "metadata": {
        "id": "d7a5bbef"
      },
      "source": [
        "Extract the record name (e.g. '83411188') from the record (e.g. 'p100/p10039708/83411188/83411188'):"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "b4bc247b",
      "metadata": {
        "id": "b4bc247b",
        "outputId": "a74ca902-ca05-496a-fd5d-2dbb0d95f998",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Record name: 83411188\n"
          ]
        }
      ],
      "source": [
        "record_name = record.name\n",
        "print(\"Record name: {}\".format(record_name))"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "742071da",
      "metadata": {
        "id": "742071da"
      },
      "source": [
        "### Load the metadata for this record\n",
        "- Use the [`rdheader`](https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.rdheader) function from the WFDB toolbox to load metadata from the record header file"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "c5a0afc5",
      "metadata": {
        "id": "c5a0afc5",
        "outputId": "13b3dfa2-d489-4a77-c07d-a5116d67b4ec",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Done: metadata loaded for record '83411188' from the header file at:\n",
            "https://physionet.org/content/mimic4wdb/0.1.0/waves/p100/p10039708/83411188/83411188.hea\n"
          ]
        }
      ],
      "source": [
        "record_data = wfdb.rdheader(record_name, pn_dir=record_dir, rd_segments=True)\n",
        "remote_url = \"https://physionet.org/content/\" + record_dir + \"/\" + record_name + \".hea\"\n",
        "print(f\"Done: metadata loaded for record '{record_name}' from the header file at:\\n{remote_url}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "f7a4d25d",
      "metadata": {
        "id": "f7a4d25d"
      },
      "source": [
        "---\n",
        "## Inspect details of physiological signals recorded in this record\n",
        "- Printing a few details of the signals from the extracted metadata"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "58630149",
      "metadata": {
        "id": "58630149",
        "outputId": "e19d66b1-690c-4cc5-c754-c4b5d1b16d38",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "- Number of signals: 6\n",
            "- Duration: 14.2 hours\n",
            "- Base sampling frequency: 62.4725 Hz\n"
          ]
        }
      ],
      "source": [
        "print(f\"- Number of signals: {record_data.n_sig}\".format())\n",
        "print(f\"- Duration: {record_data.sig_len/(record_data.fs*60*60):.1f} hours\") \n",
        "print(f\"- Base sampling frequency: {record_data.fs} Hz\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "7b3da17f",
      "metadata": {
        "id": "7b3da17f"
      },
      "source": [
        "---\n",
        "## Inspect the segments making up a record\n",
        "Each record is typically made up of several segments"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "b127c857",
      "metadata": {
        "id": "b127c857",
        "outputId": "4fe5a2b3-b95b-4bbe-db18-fabb199f0584",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The 6 segments from record 83411188 are:\n",
            "['83411188_0000', '83411188_0001', '83411188_0002', '83411188_0003', '83411188_0004', '83411188_0005']\n"
          ]
        }
      ],
      "source": [
        "segments = record_data.seg_name\n",
        "print(f\"The {len(segments)} segments from record {record_name} are:\\n{segments}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "b379eaaf",
      "metadata": {
        "id": "b379eaaf"
      },
      "source": [
        "The format of filename for each segment is: `record directory, \"_\", segment number`"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "f19d231b",
      "metadata": {
        "id": "f19d231b"
      },
      "source": [
        "---\n",
        "## Inspect an individual segment\n",
        "### Read the metadata for this segment\n",
        "- Read the metadata from the header file"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7f70d34f",
      "metadata": {
        "id": "7f70d34f",
        "outputId": "d1bd96de-09d9-4cf2-fa35-1bbcb5ddced4",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Header metadata loaded for: \n",
            "- the segment '83411188_0001'\n",
            "- in record '83411188'\n",
            "- for subject 'p10039708'\n",
            "\n"
          ]
        }
      ],
      "source": [
        "segment_metadata = wfdb.rdheader(record_name=segments[2], pn_dir=record_dir)\n",
        "\n",
        "print(f\"\"\"Header metadata loaded for: \n",
        "- the segment '{segments[2]}'\n",
        "- in record '{record_name}'\n",
        "- for subject '{str(Path(record_dir).parent.parts[-1])}'\n",
        "\"\"\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "d28771ac",
      "metadata": {
        "id": "d28771ac"
      },
      "source": [
        "### Find out what signals are present"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "324727df",
      "metadata": {
        "id": "324727df",
        "outputId": "223bdb49-5023-453d-f2b7-a016a603fec9",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "This segment contains the following signals: ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']\n",
            "The signals are measured in units of: ['mV', 'mV', 'mV', 'mmHg', 'NU', 'Ohm']\n"
          ]
        }
      ],
      "source": [
        "print(f\"This segment contains the following signals: {segment_metadata.sig_name}\")\n",
        "print(f\"The signals are measured in units of: {segment_metadata.units}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "f09b3f37",
      "metadata": {
        "id": "f09b3f37"
      },
      "source": [
        "See [here](https://archive.physionet.org/mimic2/mimic2_waveform_overview.shtml#signals-125-samplessecond) for definitions of signal abbreviations."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "3f56dd61",
      "metadata": {
        "id": "3f56dd61"
      },
      "source": [
        "<div class=\"alert alert-block alert-info\">\n",
        "<p><b>Q:</b> Which of these signals is no longer present in segment '83411188_0005'?</p>\n",
        "</div>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9f921f27",
      "metadata": {
        "id": "9f921f27"
      },
      "source": [
        "### Find out how long each signal lasts"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "d217b764",
      "metadata": {
        "id": "d217b764"
      },
      "source": [
        "All signals in a segment are time-aligned, measured at the same sampling frequency, and last the same duration:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "c44f00a7",
      "metadata": {
        "id": "c44f00a7",
        "outputId": "1cfa789e-b66b-4c8e-805b-4197c663ba18",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The signals have a base sampling frequency of 62.5 Hz\n",
            "and they last for 0.9 minutes\n"
          ]
        }
      ],
      "source": [
        "print(f\"The signals have a base sampling frequency of {segment_metadata.fs:.1f} Hz\")\n",
        "print(f\"and they last for {segment_metadata.sig_len/(segment_metadata.fs*60):.1f} minutes\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "d2a80895",
      "metadata": {
        "id": "d2a80895"
      },
      "source": [
        "## Identify records suitable for analysis"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "1a3218d3",
      "metadata": {
        "id": "1a3218d3"
      },
      "source": [
        "- The signals and their durations vary from one record (and segment) to the next. \n",
        "- Since most studies require specific types of signals (e.g. blood pressure and photoplethysmography signals), we need to be able to identify which records (or segments) contain the required signals and duration."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "b02c0b4e",
      "metadata": {
        "id": "b02c0b4e"
      },
      "source": [
        "### Setup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "5bb47556",
      "metadata": {
        "id": "5bb47556"
      },
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "from pprint import pprint"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "95181681",
      "metadata": {
        "id": "95181681",
        "outputId": "544c69db-59d9-432c-ee6c-10e1b0f54318",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Earlier, we loaded 200 records from the 'mimic4wdb/0.1.0' database.\n"
          ]
        }
      ],
      "source": [
        "print(f\"Earlier, we loaded {len(records)} records from the '{database_name}' database.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "7f2b5955",
      "metadata": {
        "id": "7f2b5955"
      },
      "source": [
        "### Specify requirements"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "83f8611c",
      "metadata": {
        "id": "83f8611c"
      },
      "source": [
        "- Required signals"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "3d1505ab",
      "metadata": {
        "id": "3d1505ab"
      },
      "outputs": [],
      "source": [
        "required_sigs = ['ABP', 'Pleth']"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "03920810",
      "metadata": {
        "id": "03920810"
      },
      "source": [
        "- Required duration"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "568a93c1",
      "metadata": {
        "id": "568a93c1"
      },
      "outputs": [],
      "source": [
        "# convert from minutes to seconds\n",
        "req_seg_duration = 10*60 "
      ]
    },
    {
      "cell_type": "markdown",
      "id": "d49187cd",
      "metadata": {
        "id": "d49187cd"
      },
      "source": [
        "### Find out how many records meet the requirements"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "65f2cdce",
      "metadata": {
        "id": "65f2cdce"
      },
      "source": [
        "_NB: This step may take a while. The results are copied below to save running it yourself._"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "015b47d3",
      "metadata": {
        "id": "015b47d3"
      },
      "outputs": [],
      "source": [
        "matching_recs = {'dir':[], 'seg_name':[], 'length':[]}\n",
        "\n",
        "for record in records:\n",
        "    print('Record: {}'.format(record), end=\"\", flush=True)\n",
        "    record_dir = f'{database_name}/{record.parent}'\n",
        "    record_name = record.name\n",
        "    print(' (reading data)')\n",
        "    record_data = wfdb.rdheader(record_name,\n",
        "                                pn_dir=record_dir,\n",
        "                                rd_segments=True)\n",
        "\n",
        "    # Check whether the required signals are present in the record\n",
        "    sigs_present = record_data.sig_name\n",
        "    if not all(x in sigs_present for x in required_sigs):\n",
        "        print('   (missing signals)')\n",
        "        continue\n",
        "\n",
        "    # Get the segments for the record\n",
        "    segments = record_data.seg_name\n",
        "\n",
        "    # Check to see if the segment is 10 min long\n",
        "    # If not, move to the next one\n",
        "    gen = (segment for segment in segments if segment != '~')\n",
        "    for segment in gen:\n",
        "        print(' - Segment: {}'.format(segment), end=\"\", flush=True)\n",
        "        segment_metadata = wfdb.rdheader(record_name=segment,\n",
        "                                         pn_dir=record_dir)\n",
        "        seg_length = segment_metadata.sig_len/(segment_metadata.fs)\n",
        "\n",
        "        if seg_length < req_seg_duration:\n",
        "            print(f' (too short at {seg_length/60:.1f} mins)')\n",
        "            continue\n",
        "\n",
        "        # Next check that all required signals are present in the segment\n",
        "        sigs_present = segment_metadata.sig_name\n",
        "        \n",
        "        if all(x in sigs_present for x in required_sigs):\n",
        "            matching_recs['dir'].append(record_dir)\n",
        "            matching_recs['seg_name'].append(segment)\n",
        "            matching_recs['length'].append(seg_length)\n",
        "            print(' (met requirements)')\n",
        "            # Since we only need one segment per record break out of loop\n",
        "            break\n",
        "        else:\n",
        "            print(' (long enough, but missing signal(s))')\n",
        "\n",
        "print(f\"A total of {len(matching_recs['dir'])} records met the requirements:\")\n",
        "\n",
        "#df_matching_recs = pd.DataFrame(data=matching_recs)\n",
        "#df_matching_recs.to_csv('matching_records.csv', index=False)\n",
        "#p=1"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "75ec15f4",
      "metadata": {
        "id": "75ec15f4",
        "outputId": "3ea832cd-4a4b-4265-bc2b-275d0f6c1802",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "A total of 52 out of 200 records met the requirements.\n",
            "\n",
            "The relevant segment names are:\n",
            " - 83404654_0005\n",
            " - 82924339_0007\n",
            " - 84248019_0005\n",
            " - 82439920_0004\n",
            " - 82800131_0002\n",
            " - 84304393_0001\n",
            " - 89464742_0001\n",
            " - 88958796_0004\n",
            " - 88995377_0001\n",
            " - 85230771_0004\n",
            " - 86643930_0004\n",
            " - 81250824_0005\n",
            " - 87706224_0003\n",
            " - 83058614_0005\n",
            " - 82803505_0017\n",
            " - 88574629_0001\n",
            " - 87867111_0012\n",
            " - 84560969_0001\n",
            " - 87562386_0001\n",
            " - 88685937_0001\n",
            " - 86120311_0001\n",
            " - 89866183_0014\n",
            " - 89068160_0002\n",
            " - 86380383_0001\n",
            " - 85078610_0008\n",
            " - 87702634_0007\n",
            " - 84686667_0002\n",
            " - 84802706_0002\n",
            " - 81811182_0004\n",
            " - 84421559_0005\n",
            " - 88221516_0007\n",
            " - 80057524_0005\n",
            " - 84209926_0018\n",
            " - 83959636_0010\n",
            " - 89989722_0016\n",
            " - 89225487_0007\n",
            " - 84391267_0001\n",
            " - 80889556_0002\n",
            " - 85250558_0011\n",
            " - 84567505_0005\n",
            " - 85814172_0007\n",
            " - 88884866_0005\n",
            " - 80497954_0012\n",
            " - 80666640_0014\n",
            " - 84939605_0004\n",
            " - 82141753_0018\n",
            " - 86874920_0014\n",
            " - 84505262_0010\n",
            " - 86288257_0001\n",
            " - 89699401_0001\n",
            " - 88537698_0013\n",
            " - 83958172_0001\n",
            "\n",
            "The corresponding directories are: \n",
            " - mimic4wdb/0.1.0/waves/p100/p10020306/83404654\n",
            " - mimic4wdb/0.1.0/waves/p101/p10126957/82924339\n",
            " - mimic4wdb/0.1.0/waves/p102/p10209410/84248019\n",
            " - mimic4wdb/0.1.0/waves/p109/p10952189/82439920\n",
            " - mimic4wdb/0.1.0/waves/p111/p11109975/82800131\n",
            " - mimic4wdb/0.1.0/waves/p113/p11392990/84304393\n",
            " - mimic4wdb/0.1.0/waves/p121/p12168037/89464742\n",
            " - mimic4wdb/0.1.0/waves/p121/p12173569/88958796\n",
            " - mimic4wdb/0.1.0/waves/p121/p12188288/88995377\n",
            " - mimic4wdb/0.1.0/waves/p128/p12872596/85230771\n",
            " - mimic4wdb/0.1.0/waves/p129/p12933208/86643930\n",
            " - mimic4wdb/0.1.0/waves/p130/p13016481/81250824\n",
            " - mimic4wdb/0.1.0/waves/p132/p13240081/87706224\n",
            " - mimic4wdb/0.1.0/waves/p136/p13624686/83058614\n",
            " - mimic4wdb/0.1.0/waves/p137/p13791821/82803505\n",
            " - mimic4wdb/0.1.0/waves/p141/p14191565/88574629\n",
            " - mimic4wdb/0.1.0/waves/p142/p14285792/87867111\n",
            " - mimic4wdb/0.1.0/waves/p143/p14356077/84560969\n",
            " - mimic4wdb/0.1.0/waves/p143/p14363499/87562386\n",
            " - mimic4wdb/0.1.0/waves/p146/p14695840/88685937\n",
            " - mimic4wdb/0.1.0/waves/p149/p14931547/86120311\n",
            " - mimic4wdb/0.1.0/waves/p151/p15174162/89866183\n",
            " - mimic4wdb/0.1.0/waves/p153/p15312343/89068160\n",
            " - mimic4wdb/0.1.0/waves/p153/p15342703/86380383\n",
            " - mimic4wdb/0.1.0/waves/p155/p15552902/85078610\n",
            " - mimic4wdb/0.1.0/waves/p156/p15649186/87702634\n",
            " - mimic4wdb/0.1.0/waves/p158/p15857793/84686667\n",
            " - mimic4wdb/0.1.0/waves/p158/p15865327/84802706\n",
            " - mimic4wdb/0.1.0/waves/p158/p15896656/81811182\n",
            " - mimic4wdb/0.1.0/waves/p159/p15920699/84421559\n",
            " - mimic4wdb/0.1.0/waves/p160/p16034243/88221516\n",
            " - mimic4wdb/0.1.0/waves/p165/p16566444/80057524\n",
            " - mimic4wdb/0.1.0/waves/p166/p16644640/84209926\n",
            " - mimic4wdb/0.1.0/waves/p167/p16709726/83959636\n",
            " - mimic4wdb/0.1.0/waves/p167/p16715341/89989722\n",
            " - mimic4wdb/0.1.0/waves/p168/p16818396/89225487\n",
            " - mimic4wdb/0.1.0/waves/p170/p17032851/84391267\n",
            " - mimic4wdb/0.1.0/waves/p172/p17229504/80889556\n",
            " - mimic4wdb/0.1.0/waves/p173/p17301721/85250558\n",
            " - mimic4wdb/0.1.0/waves/p173/p17325001/84567505\n",
            " - mimic4wdb/0.1.0/waves/p174/p17490822/85814172\n",
            " - mimic4wdb/0.1.0/waves/p177/p17738824/88884866\n",
            " - mimic4wdb/0.1.0/waves/p177/p17744715/80497954\n",
            " - mimic4wdb/0.1.0/waves/p179/p17957832/80666640\n",
            " - mimic4wdb/0.1.0/waves/p180/p18080257/84939605\n",
            " - mimic4wdb/0.1.0/waves/p181/p18109577/82141753\n",
            " - mimic4wdb/0.1.0/waves/p183/p18324626/86874920\n",
            " - mimic4wdb/0.1.0/waves/p187/p18742074/84505262\n",
            " - mimic4wdb/0.1.0/waves/p188/p18824975/86288257\n",
            " - mimic4wdb/0.1.0/waves/p191/p19126489/89699401\n",
            " - mimic4wdb/0.1.0/waves/p193/p19313794/88537698\n",
            " - mimic4wdb/0.1.0/waves/p196/p19619764/83958172\n"
          ]
        }
      ],
      "source": [
        "print(f\"A total of {len(matching_recs['dir'])} out of {len(records)} records met the requirements.\")\n",
        "\n",
        "relevant_segments_names = \"\\n - \".join(matching_recs['seg_name'])\n",
        "print(f\"\\nThe relevant segment names are:\\n - {relevant_segments_names}\")\n",
        "\n",
        "relevant_dirs = \"\\n - \".join(matching_recs['dir'])\n",
        "print(f\"\\nThe corresponding directories are: \\n - {relevant_dirs}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "719f20f8",
      "metadata": {
        "id": "719f20f8"
      },
      "source": [
        "<div class=\"alert alert-block alert-info\">\n",
        "<p><b>Question:</b> Is this enough data for a study? Consider different types of studies, e.g. assessing the performance of a previously proposed algorithm to estimate BP from the PPG signal, vs. developing a deep learning approach to estimate BP from the PPG.</p>\n",
        "</div>"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "6fccda20",
      "metadata": {
        "id": "6fccda20"
      },
      "outputs": [],
      "source": [
        ""
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.8"
    },
    "toc": {
      "base_numbering": 1,
      "nav_menu": {},
      "number_sections": true,
      "sideBar": true,
      "skip_h1_title": true,
      "title_cell": "Table of Contents",
      "title_sidebar": "Contents",
      "toc_cell": false,
      "toc_position": {
        "height": "calc(100% - 180px)",
        "left": "10px",
        "top": "150px",
        "width": "306px"
      },
      "toc_section_display": true,
      "toc_window_display": false
    },
    "colab": {
      "name": "data-exploration.ipynb",
      "provenance": []
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}