Data Exploration#

Let’s begin by exploring data in the MIMIC Waveform Database.

Our objectives are to:

  • Review the structure of the MIMIC Waveform Database (considering subjects, studies, records, and segments).

  • Load waveforms using the WFDB toolbox.

  • Find out which signals are present in selected records and segments, and how long the signals last.

  • Search for records that contain signals of interest.

Resource: You can find out more about the MIMIC Waveform Database here.


Setup#

Specify the required Python packages#

We’ll import the following:

  • sys: provides access to system-specific parameters and functions

  • pathlib: we only need one class from this package, called Path, for handling file paths

import sys
from pathlib import Path

Specify a particular version of the WFDB Toolbox#

  • wfdb: For this workshop we will be using version 4 of the WaveForm DataBase (WFDB) Toolbox package. The package contains tools for processing waveform data such as those found in MIMIC:

!pip install wfdb==4.0.0
import wfdb

Resource: You can find out more about the WFDB package here.

Now that we have imported these packages (i.e. toolboxes) we have a set of tools (functions) ready to use.

Specify the name of the MIMIC Waveform Database#

database_name = 'mimic4wdb/0.1.0'

Identify the records in the database#

Get a list of records#

  • Use the get_record_list function from the WFDB toolbox to get a list of records in the database.

# each subject may be associated with multiple records
subjects = wfdb.get_record_list(database_name)
print(f"The '{database_name}' database contains data from {len(subjects)} subjects")
The 'mimic4wdb/0.1.0' database contains data from 198 subjects

# set max number of records to load
max_records_to_load = 200
# iterate the subjects to get a list of records
records = []
for subject in subjects:
    studies = wfdb.get_record_list(f'{database_name}/{subject}')
    for study in studies:
        records.append(Path(f'{subject}{study}'))
        # stop if we've loaded enough records
        if len(records) >= max_records_to_load:
            print("Reached maximum required number of records.")
            break
    # the inner break only exits the studies loop, so stop the outer loop too
    if len(records) >= max_records_to_load:
        break

print(f"Loaded {len(records)} records from the '{database_name}' database.")
Reached maximum required number of records.
Loaded 200 records from the 'mimic4wdb/0.1.0' database.

Look at the records#

  • Display the first few records

# format and print first five records
first_five_records = [str(x) for x in records[0:5]]
first_five_records = "\n - ".join(first_five_records)
print(f"First five records: \n - {first_five_records}")

print("""
Note the formatting of these records:
 - intermediate directory ('p100' in this case)
 - subject identifier (e.g. 'p10014354')
 - record identifier (e.g. '81739927')
 """)
First five records: 
 - waves/p100/p10014354/81739927/81739927
 - waves/p100/p10019003/87033314/87033314
 - waves/p100/p10020306/83404654/83404654
 - waves/p100/p10039708/83411188/83411188
 - waves/p100/p10039708/85583557/85583557

Note the formatting of these records:
 - intermediate directory ('p100' in this case)
 - subject identifier (e.g. 'p10014354')
 - record identifier (e.g. '81739927')
 

Q: Can you print the names of the last five records?
Hint: in Python, the last five elements can be specified using '[-5:]'
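If you'd like to check your answer, here is the slicing idea applied to a stand-in list (slicing works the same way on any Python list; the stand-in is named differently so it doesn't clobber the real `records` list):

```python
# Stand-in list of record names, purely for illustration
example_records = [f'record_{i:02d}' for i in range(10)]

# Negative indices count from the end, so [-5:] selects the last five elements
last_five = "\n - ".join(str(x) for x in example_records[-5:])
print(f"Last five records: \n - {last_five}")
```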


Extract metadata for a record#

Each record contains metadata stored in a header file, named “<record name>.hea”.

Specify the online directory containing a record’s data#

# Specify the 4th record (note, in Python indexing begins at 0)
idx = 3
record = records[idx]
record_dir = f'{database_name}/{record.parent}'
print("PhysioNet directory specified for record: {}".format(record_dir))
PhysioNet directory specified for record: mimic4wdb/0.1.0/waves/p100/p10039708/83411188

Specify the record name#

Extract the record name (e.g. ‘83411188’) from the record path (e.g. ‘waves/p100/p10039708/83411188/83411188’):

record_name = record.name
print("Record name: {}".format(record_name))
Record name: 83411188
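The other path components can be extracted the same way with pathlib. A small sketch using this record's path:

```python
from pathlib import Path

# The path of the 4th record, as printed earlier
record = Path('waves/p100/p10039708/83411188/83411188')

intermediate_dir = record.parts[1]   # 'p100'
subject_id = record.parts[2]         # 'p10039708'
record_name = record.name            # '83411188'
print(intermediate_dir, subject_id, record_name)
```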

Load the metadata for this record#

  • Use the rdheader function from the WFDB toolbox to load metadata from the record header file

record_data = wfdb.rdheader(record_name, pn_dir=record_dir, rd_segments=True)
remote_url = "https://physionet.org/content/" + record_dir + "/" + record_name + ".hea"
print(f"Done: metadata loaded for record '{record_name}' from the header file at:\n{remote_url}")
Done: metadata loaded for record '83411188' from the header file at:
https://physionet.org/content/mimic4wdb/0.1.0/waves/p100/p10039708/83411188/83411188.hea

Inspect details of physiological signals recorded in this record#

  • Printing a few details of the signals from the extracted metadata

print(f"- Number of signals: {record_data.n_sig}")
print(f"- Duration: {record_data.sig_len/(record_data.fs*60*60):.1f} hours") 
print(f"- Base sampling frequency: {record_data.fs} Hz")
- Number of signals: 6
- Duration: 14.2 hours
- Base sampling frequency: 62.4725 Hz
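The duration above is simply the number of samples divided by the sampling frequency. Here is the calculation as a sketch, using a hypothetical sample count (in practice the real value comes from `record_data.sig_len`):

```python
fs = 62.4725           # base sampling frequency (Hz), as printed above
sig_len = 3_195_000    # hypothetical number of samples, for illustration only

# samples ÷ (samples per second × seconds per hour) = hours
duration_hours = sig_len / (fs * 60 * 60)
print(f"- Duration: {duration_hours:.1f} hours")
```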

Inspect the segments making up a record#

Each record is typically made up of several segments:

segments = record_data.seg_name
print(f"The {len(segments)} segments from record {record_name} are:\n{segments}")
The 6 segments from record 83411188 are:
['83411188_0000', '83411188_0001', '83411188_0002', '83411188_0003', '83411188_0004', '83411188_0005']

The filename format for each segment is: record name, "_", segment number (e.g. '83411188_0000').
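This naming convention means the segment names can be reconstructed from the record name alone. A minimal sketch:

```python
record_name = '83411188'
n_segments = 6

# Segment numbers are zero-padded to four digits
segments = [f'{record_name}_{i:04d}' for i in range(n_segments)]
print(segments)
```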


Inspect an individual segment#

Read the metadata for this segment#

  • Read the metadata from the header file

segment_metadata = wfdb.rdheader(record_name=segments[2], pn_dir=record_dir)

print(f"""Header metadata loaded for: 
- the segment '{segments[2]}'
- in record '{record_name}'
- for subject '{str(Path(record_dir).parent.parts[-1])}'
""")
Header metadata loaded for: 
- the segment '83411188_0002'
- in record '83411188'
- for subject 'p10039708'

Find out what signals are present#

print(f"This segment contains the following signals: {segment_metadata.sig_name}")
print(f"The signals are measured in units of: {segment_metadata.units}")
This segment contains the following signals: ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']
The signals are measured in units of: ['mV', 'mV', 'mV', 'mmHg', 'NU', 'Ohm']
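Pairing each signal name with its unit of measurement makes the listing easier to read. A small sketch using the values printed above:

```python
# Signal names and units from the segment metadata printed above
sig_name = ['II', 'V', 'aVR', 'ABP', 'Pleth', 'Resp']
units = ['mV', 'mV', 'mV', 'mmHg', 'NU', 'Ohm']

# zip pairs each signal name with its corresponding unit
for name, unit in zip(sig_name, units):
    print(f" - {name}: measured in {unit}")
```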

See here for definitions of signal abbreviations.

Q: Which of these signals is no longer present in segment '83411188_0005'?

Find out how long each signal lasts#

All signals in a segment are time-aligned, measured at the same sampling frequency, and last the same duration:

print(f"The signals have a base sampling frequency of {segment_metadata.fs:.1f} Hz")
print(f"and they last for {segment_metadata.sig_len/(segment_metadata.fs*60):.1f} minutes")
The signals have a base sampling frequency of 62.5 Hz
and they last for 0.9 minutes

Identify records suitable for analysis#

  • The signals and their durations vary from one record (and segment) to the next.

  • Since most studies require specific types of signals (e.g. blood pressure and photoplethysmography signals), we need to be able to identify which records (or segments) contain the required signals and duration.

Setup#

import pandas as pd
from pprint import pprint
print(f"Earlier, we loaded {len(records)} records from the '{database_name}' database.")
Earlier, we loaded 200 records from the 'mimic4wdb/0.1.0' database.

Specify requirements#

  • Required signals

required_sigs = ['ABP', 'Pleth']
  • Required duration

# convert from minutes to seconds
req_seg_duration = 10*60 
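The search below checks each segment against these two requirements. Those checks can be sketched as a small helper function (the function name and signature are ours, not part of the WFDB toolbox):

```python
def segment_meets_requirements(sig_names, sig_len, fs,
                               required_sigs=('ABP', 'Pleth'),
                               req_seg_duration=10 * 60):
    """Return True if a segment contains all required signals
    and lasts at least the required duration (in seconds)."""
    long_enough = (sig_len / fs) >= req_seg_duration
    has_required = all(sig in sig_names for sig in required_sigs)
    return long_enough and has_required

# Example: a 15-minute segment sampled at 62.5 Hz with the required signals
print(segment_meets_requirements(['ABP', 'Pleth', 'Resp'], 15 * 60 * 62.5, 62.5))
```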

Find out how many records meet the requirements#

NB: This step may take a while. The results are copied below to save you from running it yourself.

matching_recs = {'dir':[], 'seg_name':[], 'length':[]}

for record in records:
    print('Record: {}'.format(record), end="", flush=True)
    record_dir = f'{database_name}/{record.parent}'
    record_name = record.name
    print(' (reading data)')
    record_data = wfdb.rdheader(record_name,
                                pn_dir=record_dir,
                                rd_segments=True)

    # Check whether the required signals are present in the record
    sigs_present = record_data.sig_name
    if not all(x in sigs_present for x in required_sigs):
        print('   (missing signals)')
        continue

    # Get the segments for the record
    segments = record_data.seg_name

    # Check to see if the segment is 10 min long
    # If not, move to the next one
    gen = (segment for segment in segments if segment != '~')
    for segment in gen:
        print(' - Segment: {}'.format(segment), end="", flush=True)
        segment_metadata = wfdb.rdheader(record_name=segment,
                                         pn_dir=record_dir)
        seg_length = segment_metadata.sig_len/(segment_metadata.fs)

        if seg_length < req_seg_duration:
            print(f' (too short at {seg_length/60:.1f} mins)')
            continue

        # Next check that all required signals are present in the segment
        sigs_present = segment_metadata.sig_name
        
        if all(x in sigs_present for x in required_sigs):
            matching_recs['dir'].append(record_dir)
            matching_recs['seg_name'].append(segment)
            matching_recs['length'].append(seg_length)
            print(' (met requirements)')
            # Since we only need one segment per record break out of loop
            break
        else:
            print(' (long enough, but missing signal(s))')

print(f"A total of {len(matching_recs['dir'])} records met the requirements:")

#df_matching_recs = pd.DataFrame(data=matching_recs)
#df_matching_recs.to_csv('matching_records.csv', index=False)
Record: waves/p100/p10014354/81739927/81739927
 (reading data)
   (missing signals)
Record: waves/p100/p10019003/87033314/87033314
 (reading data)
   (missing signals)
Record: waves/p100/p10020306/83404654/83404654
 (reading data)
 - Segment: 83404654_0000
 (too short at 0.0 mins)
 - Segment: 83404654_0001
 (long enough, but missing signal(s))
 - Segment: 83404654_0002
 (too short at 0.1 mins)
 - Segment: 83404654_0003
 (too short at 0.3 mins)
 - Segment: 83404654_0004
 (long enough, but missing signal(s))
 - Segment: 83404654_0005
 (met requirements)
Record: waves/p100/p10039708/83411188/83411188
 (reading data)
 - Segment: 83411188_0000
 (too short at 0.0 mins)
 - Segment: 83411188_0001
 (too short at 0.1 mins)
 - Segment: 83411188_0002
 (too short at 0.9 mins)
 - Segment: 83411188_0003
 (too short at 0.3 mins)
 - Segment: 83411188_0004
 (too short at 0.3 mins)
 - Segment: 83411188_0005
 (long enough, but missing signal(s))
Record: waves/p100/p10039708/85583557/85583557
 (reading data)
   (missing signals)
Record: waves/p100/p10079700/85594648/85594648
 (reading data)
   (missing signals)
Record: waves/p100/p10082591/84050536/84050536
 (reading data)
   (missing signals)
Record: waves/p101/p10100546/83268087/83268087
 (reading data)
   (missing signals)
Record: waves/p101/p10112163/88501826/88501826
 (reading data)
   (missing signals)
Record: waves/p101/p10126957/82924339/82924339
 (reading data)
 - Segment: 82924339_0000
 (too short at 0.0 mins)
 - Segment: 82924339_0001
 (too short at 0.2 mins)
 - Segment: 82924339_0002
 (too short at 0.1 mins)
 - Segment: 82924339_0003
 (too short at 0.4 mins)
 - Segment: 82924339_0004
 (too short at 0.1 mins)
 - Segment: 82924339_0005
 (too short at 0.0 mins)
 - Segment: 82924339_0006
 (too short at 5.3 mins)
 - Segment: 82924339_0007
 (met requirements)
Record: waves/p102/p10209410/84248019/84248019
 (reading data)
 - Segment: 84248019_0000
 (too short at 0.0 mins)
 - Segment: 84248019_0001
 (too short at 0.1 mins)
 - Segment: 84248019_0002
 (too short at 4.8 mins)
 - Segment: 84248019_0003
 (too short at 0.2 mins)
 - Segment: 84248019_0004
 (too short at 1.0 mins)
 - Segment: 84248019_0005
 (met requirements)
Record: waves/p103/p10303080/88399302/88399302
 (reading data)
   (missing signals)
Record: waves/p104/p10494990/88374538/88374538
 (reading data)
   (missing signals)
Record: waves/p105/p10560354/81105139/81105139
 (reading data)
   (missing signals)
Record: waves/p106/p10680081/86426168/86426168
 (reading data)
   (missing signals)
Record: waves/p108/p10882818/81826943/81826943
 (reading data)
   (missing signals)
Record: waves/p109/p10952189/82439920/82439920
 (reading data)
 - Segment: 82439920_0000
 (too short at 0.0 mins)
 - Segment: 82439920_0001
 (too short at 0.1 mins)
 - Segment: 82439920_0002
 (too short at 0.0 mins)
 - Segment: 82439920_0003
 (too short at 0.1 mins)
 - Segment: 82439920_0004
 (met requirements)
Record: waves/p110/p11013146/82432904/82432904
 (reading data)
   (missing signals)
Record: waves/p111/p11109975/82800131/82800131
 (reading data)
 - Segment: 82800131_0000
 (too short at 0.0 mins)
 - Segment: 82800131_0001
 (too short at 0.1 mins)
 - Segment: 82800131_0002
 (met requirements)
Record: waves/p113/p11320864/81312415/81312415
 (reading data)
   (missing signals)
Record: waves/p113/p11392990/84304393/84304393
 (reading data)
 - Segment: 84304393_0000
 (too short at 0.0 mins)
 - Segment: 84304393_0001
 (met requirements)
Record: waves/p115/p11552552/82650378/82650378
 (reading data)
...
print(f"A total of {len(matching_recs['dir'])} out of {len(records)} records met the requirements.")

relevant_segments_names = "\n - ".join(matching_recs['seg_name'])
print(f"\nThe relevant segment names are:\n - {relevant_segments_names}")

relevant_dirs = "\n - ".join(matching_recs['dir'])
print(f"\nThe corresponding directories are: \n - {relevant_dirs}")
A total of 52 out of 200 records met the requirements.

The relevant segment names are:
 - 83404654_0005
 - 82924339_0007
 - 84248019_0005
 - 82439920_0004
 - 82800131_0002
 - 84304393_0001
 - 89464742_0001
 - 88958796_0004
 - 88995377_0001
 - 85230771_0004
 - 86643930_0004
 - 81250824_0005
 - 87706224_0003
 - 83058614_0005
 - 82803505_0017
 - 88574629_0001
 - 87867111_0012
 - 84560969_0001
 - 87562386_0001
 - 88685937_0001
 - 86120311_0001
 - 89866183_0014
 - 89068160_0002
 - 86380383_0001
 - 85078610_0008
 - 87702634_0007
 - 84686667_0002
 - 84802706_0002
 - 81811182_0004
 - 84421559_0005
 - 88221516_0007
 - 80057524_0005
 - 84209926_0018
 - 83959636_0010
 - 89989722_0016
 - 89225487_0007
 - 84391267_0001
 - 80889556_0002
 - 85250558_0011
 - 84567505_0005
 - 85814172_0007
 - 88884866_0005
 - 80497954_0012
 - 80666640_0014
 - 84939605_0004
 - 82141753_0018
 - 86874920_0014
 - 84505262_0010
 - 86288257_0001
 - 89699401_0001
 - 88537698_0013
 - 83958172_0001

The corresponding directories are: 
 - mimic4wdb/0.1.0/waves/p100/p10020306/83404654
 - mimic4wdb/0.1.0/waves/p101/p10126957/82924339
 - mimic4wdb/0.1.0/waves/p102/p10209410/84248019
 - mimic4wdb/0.1.0/waves/p109/p10952189/82439920
 - mimic4wdb/0.1.0/waves/p111/p11109975/82800131
 - mimic4wdb/0.1.0/waves/p113/p11392990/84304393
 - mimic4wdb/0.1.0/waves/p121/p12168037/89464742
 - mimic4wdb/0.1.0/waves/p121/p12173569/88958796
 - mimic4wdb/0.1.0/waves/p121/p12188288/88995377
 - mimic4wdb/0.1.0/waves/p128/p12872596/85230771
 - mimic4wdb/0.1.0/waves/p129/p12933208/86643930
 - mimic4wdb/0.1.0/waves/p130/p13016481/81250824
 - mimic4wdb/0.1.0/waves/p132/p13240081/87706224
 - mimic4wdb/0.1.0/waves/p136/p13624686/83058614
 - mimic4wdb/0.1.0/waves/p137/p13791821/82803505
 - mimic4wdb/0.1.0/waves/p141/p14191565/88574629
 - mimic4wdb/0.1.0/waves/p142/p14285792/87867111
 - mimic4wdb/0.1.0/waves/p143/p14356077/84560969
 - mimic4wdb/0.1.0/waves/p143/p14363499/87562386
 - mimic4wdb/0.1.0/waves/p146/p14695840/88685937
 - mimic4wdb/0.1.0/waves/p149/p14931547/86120311
 - mimic4wdb/0.1.0/waves/p151/p15174162/89866183
 - mimic4wdb/0.1.0/waves/p153/p15312343/89068160
 - mimic4wdb/0.1.0/waves/p153/p15342703/86380383
 - mimic4wdb/0.1.0/waves/p155/p15552902/85078610
 - mimic4wdb/0.1.0/waves/p156/p15649186/87702634
 - mimic4wdb/0.1.0/waves/p158/p15857793/84686667
 - mimic4wdb/0.1.0/waves/p158/p15865327/84802706
 - mimic4wdb/0.1.0/waves/p158/p15896656/81811182
 - mimic4wdb/0.1.0/waves/p159/p15920699/84421559
 - mimic4wdb/0.1.0/waves/p160/p16034243/88221516
 - mimic4wdb/0.1.0/waves/p165/p16566444/80057524
 - mimic4wdb/0.1.0/waves/p166/p16644640/84209926
 - mimic4wdb/0.1.0/waves/p167/p16709726/83959636
 - mimic4wdb/0.1.0/waves/p167/p16715341/89989722
 - mimic4wdb/0.1.0/waves/p168/p16818396/89225487
 - mimic4wdb/0.1.0/waves/p170/p17032851/84391267
 - mimic4wdb/0.1.0/waves/p172/p17229504/80889556
 - mimic4wdb/0.1.0/waves/p173/p17301721/85250558
 - mimic4wdb/0.1.0/waves/p173/p17325001/84567505
 - mimic4wdb/0.1.0/waves/p174/p17490822/85814172
 - mimic4wdb/0.1.0/waves/p177/p17738824/88884866
 - mimic4wdb/0.1.0/waves/p177/p17744715/80497954
 - mimic4wdb/0.1.0/waves/p179/p17957832/80666640
 - mimic4wdb/0.1.0/waves/p180/p18080257/84939605
 - mimic4wdb/0.1.0/waves/p181/p18109577/82141753
 - mimic4wdb/0.1.0/waves/p183/p18324626/86874920
 - mimic4wdb/0.1.0/waves/p187/p18742074/84505262
 - mimic4wdb/0.1.0/waves/p188/p18824975/86288257
 - mimic4wdb/0.1.0/waves/p191/p19126489/89699401
 - mimic4wdb/0.1.0/waves/p193/p19313794/88537698
 - mimic4wdb/0.1.0/waves/p196/p19619764/83958172

Question: Is this enough data for a study? Consider different types of studies, e.g. assessing the performance of a previously proposed algorithm to estimate BP from the PPG signal, vs. developing a deep learning approach to estimate BP from the PPG.