From Idea to IRB: Proving Out Data-Intensive Medical Devices

Data in healthcare is becoming increasingly prominent, with many companies focusing on data insights as a primary value driver. With new IoT developments and technologies, physicians can close the gap between in-office observations and a detailed, complete picture of a patient’s health. Examples of data-driven solutions around preventative health, personalized medicine, and telehealth include:

  • Monitoring and diagnosing health issues with IoT sensors, computer vision, and/or machine learning for consumers, patients, and/or doctors
  • Enabling a device to identify health issues without requiring patients to go to a doctor for initial screening
  • Reducing the costs and load on healthcare in places where universal health care isn’t available

Building a business that depends on intelligently connected data takes a significant amount of engineering. Before a device can classify or detect a health issue from incoming data, a machine learning training dataset has to be collected. This sounds simple in theory, but well-planned and well-executed logistics can expedite data collection (especially for the initial dataset) and speed up your time to market.

The general workflow of activities is as follows:

  1. Submit clinical trial application to the Institutional Review Board for approval
  2. Develop hardware prototypes for data collection
  3. Deploy prototypes to collect data for initial data set
  4. Conduct data processing, labeling, and analyses
  5. Develop algorithms based on data to identify health issues
  6. Revise prototypes as required based on insights from data
  7. Repeat collection and refine algorithms with new data



In this post, we’ll provide an overview of how you can implement your intelligent connected device business idea and start collecting data for the development of your proprietary machine learning algorithms to build out your value proposition.


The Institutional Review Board

Before collecting data, you must first gain approval from the Institutional Review Board (IRB). The IRB is an administrative body designed to protect the rights and welfare of people who are recruited to participate in research activities in a clinical trial. The IRB’s main goal is to identify all possible risks in the study (including risks from the hardware developed for the study) and to confirm how those risks will be mitigated, so that the rights and welfare of human research subjects are protected.

Data security is of utmost importance throughout IoT engineering in order to maintain patient confidentiality. If the data contains any patient-identifying information (e.g. a picture of the person’s face), it must be stored in a storage solution that complies with the Health Insurance Portability and Accountability Act (HIPAA). If it does not, the storage solution must still be secure, but the requirements are less stringent.
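As a rough illustration of the storage decision above, the sketch below picks a storage target for a record depending on whether it carries identifying fields. The field names in IDENTIFYING_FIELDS and the two target labels are hypothetical. What actually counts as identifying information in a study is determined by HIPAA and your IRB protocol, not by this list.

```python
# Hypothetical field names; the real list comes from HIPAA guidance and the IRB protocol.
IDENTIFYING_FIELDS = {"name", "face_image", "date_of_birth", "address"}


def contains_identifiers(record: dict) -> bool:
    """Return True if any identifying field is present and non-empty."""
    return any(record.get(field) for field in IDENTIFYING_FIELDS)


def choose_storage(record: dict) -> str:
    """Pick a storage target label (placeholder labels, not real services)."""
    if contains_identifiers(record):
        return "hipaa_compliant_store"
    return "secure_general_store"
```

A check like this is only a routing aid. It does not make a storage solution compliant on its own.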

Once the study has been accepted by the IRB, the device can be deployed for data collection.


Collecting Data

The device will need to collect and store data, and the data will need to end up in a data warehouse for processing, labeling, and algorithm building. The workflow will depend on whether the device is offline and standalone, or if cloud connectivity is available.

Non-Cloud Connected Devices
If a cloud connection is not an option, the device needs to store data locally and the data must be exported from the device to a host computer for final transfer to the data warehouse.

This method works at small volumes, but it relies on research assistants and/or clinicians to transfer the data manually. While this avoids the development cost of adding cloud connectivity, it adds operational overhead to running the study, makes data availability sporadic, and introduces the potential for user error. If the device malfunctions before its data has been transferred, data recovery is difficult. Storage can also run out if data isn’t transferred in a timely manner, since clinicians have other priorities and may not be able to upload data as frequently as desired.
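Two of these risks, user error during manual transfer and storage running out, can be reduced with small host-side checks. The sketch below is one possible approach, not a prescribed tool: it compares checksums between the device export folder and the host copy, and flags a volume that is close to full. The function names and the 90% threshold are assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large recordings do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        str(p.relative_to(data_dir)): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def verify_copy(device_dir: Path, host_dir: Path) -> list:
    """Return the relative paths that are missing or differ on the host."""
    source = build_manifest(device_dir)
    copied = build_manifest(host_dir)
    return [name for name, checksum in source.items() if copied.get(name) != checksum]


def storage_nearly_full(data_dir: Path, threshold: float = 0.9) -> bool:
    """True once used space on the volume reaches the (assumed) threshold."""
    usage = shutil.disk_usage(data_dir)
    return usage.used / usage.total >= threshold
```

Running verify_copy before clearing the device means files are only deleted once an identical copy exists on the host.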

Cloud Connected Devices
If cloud connectivity is an option, many of the above issues are mitigated. This comes with an additional development cost, but it can be well worth it, especially if the study will continue for an extended period.

[Image: IoT ecosystems have integrated themselves into our everyday lives]

With a cloud connected device, data can be continually uploaded, monitored, and processed. For example, if image data is being collected, the images can be monitored to confirm that quality and fidelity are high and that the devices are operating correctly. Cloud connectivity also enables control of the devices from the cloud, so device parameters can be modified and software fixes can be applied over the air with minimal disruption to the study and data collection.
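When parameters are changed from the cloud, the device should not blindly accept whatever it receives. A minimal sketch of one way to handle this, assuming a hypothetical set of parameter names and bounds, is to apply only known parameters whose values fall inside an allowed range:

```python
# Hypothetical parameters and (min, max) bounds; a real device defines its own.
ALLOWED_PARAMETERS = {
    "capture_interval_s": (1, 3600),
    "image_width_px": (320, 4096),
}


def apply_remote_config(current: dict, desired: dict) -> dict:
    """Return a new config with valid desired values applied; ignore the rest."""
    updated = dict(current)
    for name, value in desired.items():
        bounds = ALLOWED_PARAMETERS.get(name)
        if bounds is None or not isinstance(value, int):
            continue  # unknown parameter or wrong type: keep the current value
        low, high = bounds
        if low <= value <= high:
            updated[name] = value
    return updated
```

Returning a new dictionary instead of mutating the current one keeps the last known-good configuration available if the update has to be rolled back.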

In terms of cloud providers, Azure and AWS are leading contenders; both have APIs and SDKs for popular embedded platforms and microcontrollers. With either provider, the data flows from the device through the provider’s IoT hub and into the desired data warehouse, with an optional stream processing step in between.
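The shape of that flow can be sketched without committing to either provider. The example below is an in-memory stand-in, not the Azure or AWS SDK: make_message plays the role of the device, run_pipeline plays the role of the hub, a plain list plays the role of the warehouse, and drop_out_of_range is one hypothetical stream processing step. All names and the message fields are assumptions for illustration.

```python
import json
from typing import Callable, Iterable, Iterator, List, Optional


def make_message(device_id: str, sequence: int, reading: float) -> str:
    """Serialize one reading the way a device might before sending it."""
    return json.dumps({"device_id": device_id, "seq": sequence, "reading": reading})


def drop_out_of_range(messages: Iterable[dict], low: float, high: float) -> Iterator[dict]:
    """Example stream step: discard readings outside a plausible range."""
    for message in messages:
        if low <= message["reading"] <= high:
            yield message


def run_pipeline(
    raw_messages: Iterable[str],
    warehouse: List[dict],
    stream_step: Optional[Callable[[Iterator[dict]], Iterable[dict]]] = None,
) -> int:
    """Decode messages, optionally process them, store them, and return the count stored."""
    decoded = (json.loads(raw) for raw in raw_messages)
    processed = stream_step(decoded) if stream_step else decoded
    stored = 0
    for message in processed:
        warehouse.append(message)
        stored += 1
    return stored
```

Because the stream step is optional and passed in as a function, the same pipeline can start as a straight pass-through and gain filtering or aggregation later.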

Cloud storage is also inexpensive; the total cost of ownership is typically lower for cloud solutions than for self-hosted solutions. There are no upfront costs for hardware that needs to be purchased, provisioned, maintained, and replaced. Capacity and storage performance can be scaled up or down on demand, which gives your business the most flexibility. You pay only for the storage you actually use, and retention policies can be put in place so that less frequently accessed data is automatically moved to lower cost tiers. Additionally, a centralized storage solution in the cloud can enable lifecycle management policies, where data is automatically locked down in support of compliance requirements.

Many cloud storage options are available, depending on your storage and access requirements. Two common options are general object storage (Amazon S3, Azure Blob Storage) and larger, more flexible, and more powerful data lake storage (available from both Amazon and Azure). A brief overview is below:

Blob Storage

  • Great for images and other forms of data
  • Text or binary data
  • Inexpensive
  • Scalable, tiered
  • Pricing varies depending on required access (e.g. “hot” vs “cold”)
  • Pricing in Azure (first 50 TB): $0.01 per GB (cold), $0.0184 per GB (hot)



Data Lake Storage

  • Optimized for big data analytics workloads
  • Streaming analytics, machine learning data such as log files, IoT telemetry, click streams, etc.
  • Great for event streaming or IoT, can persist large amounts of relational and nonrelational data without transformation or schema definition
  • Built to handle high volumes of small writes at low latency, optimized for massive throughput
  • No size limits
  • More expensive, less configurable
  • Pricing in Azure (first 100 TB): $0.039 per GB
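To compare the options for a given study, the per-GB rates quoted above can be turned into a quick estimate. This is a back-of-the-envelope sketch: it assumes the rates are charged per billing period (taken here to be a month), that the data volume stays inside the first pricing tier, and that no transaction or egress fees apply. Check current provider pricing before budgeting.

```python
# Per-GB rates as quoted in the overview above; they may be out of date.
RATES_PER_GB = {
    "blob_cold": 0.01,
    "blob_hot": 0.0184,
    "data_lake": 0.039,
}


def storage_cost(gigabytes: float, tier: str) -> float:
    """Estimated storage cost in dollars for one billing period (assumed monthly)."""
    return round(gigabytes * RATES_PER_GB[tier], 2)


# Example: a study that has accumulated 2,000 GB of data.
for tier in RATES_PER_GB:
    print(tier, storage_cost(2000, tier))
```

At these rates, 2,000 GB comes to roughly $20 in cold blob storage, $36.80 in hot blob storage, and $78 in data lake storage per billing period.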



With any type of data warehousing, a few key challenges exist. At high volumes, it may be difficult to guarantee the quality of the incoming data without monitoring in place. Data storage by itself doesn’t provide an integrated or holistic view of the data or the insights in it, and the storage can ultimately become a dumping ground for data that is never actually analyzed or used for actionable insights. However, these pain points can be addressed with proper planning and implementation.
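Quality monitoring does not have to be elaborate to be useful. The sketch below checks each incoming record against a small, hypothetical schema (the field names in REQUIRED_FIELDS are assumptions) and summarizes how many records passed and which problems were most common:

```python
from collections import Counter

# Hypothetical schema; replace with the fields your device actually reports.
REQUIRED_FIELDS = ("device_id", "timestamp", "reading")


def check_record(record: dict) -> list:
    """Return the problems found in one record (an empty list means it passed)."""
    problems = [f"missing:{name}" for name in REQUIRED_FIELDS if record.get(name) is None]
    reading = record.get("reading")
    if reading is not None and not isinstance(reading, (int, float)):
        problems.append("reading:not_numeric")
    return problems


def quality_report(records) -> dict:
    """Count total and passing records and tally each kind of problem."""
    problem_counts = Counter()
    total = 0
    passed = 0
    for record in records:
        total += 1
        problems = check_record(record)
        if problems:
            problem_counts.update(problems)
        else:
            passed += 1
    return {"total": total, "passed": passed, "problems": dict(problem_counts)}
```

Tracking a report like this over time makes it easier to notice when a device or a deployment site starts producing unusable data.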


Analyzing Data

Once the data is collected (or while it is being collected), it must be processed and analyzed, and models must be trained on it to enable the automatic detection of health issues (or another application). Engineers may opt to perform this development locally at first because prototyping is faster. Later on, a processing pipeline can be built in the cloud to run automatically on incoming data.

Two main groups of cloud services exist. Azure Cognitive Services provides pre-trained, black-box models, which can enable faster analyses since less development effort is required to analyze data with intelligence. For more flexibility, Azure Machine Learning provides a platform on which you build your own models. For maximum flexibility, there is also the option of simply using a virtual machine with specifications customized to match your processing needs. A brief overview is provided below:

Azure Cognitive Services

  • A good fit if your application falls into one of its use case categories, you need to get out the door fast, and you are satisfied with the results it returns
  • Provides machine learning outputs without requiring machine learning expertise
  • Example applications: Language processing, speech recognition, computer vision content analysis, face detection, etc.


Azure Machine Learning

  • Build, train, deploy, automate, manage, and track machine learning models
  • Options to write Python or R code, or work with no-code/low-code options
  • Quick to get to a point where results can be served via an API and fed into a custom application (e.g. Power BI, a custom dashboard, or other insights)
  • Good for quickly testing an ML-driven product idea and getting product viability results fast
  • Becomes more complicated to use for more sophisticated work such as deep learning
  • Faster development since computations are done on the cloud, reducing the time spent on intensive tasks like cross-validation or model training
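Cross-validation, mentioned in the last bullet, is time-intensive because the model is retrained once per fold. As a minimal sketch of what is being repeated, the function below produces the train and validation index sets for k folds. It is a plain illustration, not the Azure Machine Learning API, and it neither shuffles nor groups samples; in a clinical dataset you would likely also want all samples from one patient kept in the same fold.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size
```

Each yielded pair corresponds to one full training run, which is why the work parallelizes well across cloud compute.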

At this stage, our clients typically take over. Their value proposition is centered on the intelligent analysis of the data, and our work is to get the data from the collection source to the analysis pipeline.



Data holds great promise for the future of healthcare and medical treatment. The collection of high quality data is vital to improving patient outcomes, and it will enable care providers to identify symptoms and diagnose issues more quickly and easily than before. Integrating connected devices and real-time data will give the medical industry broader insights into health and disease.

We have extensive experience collaborating with companies in the connected health space, and our expertise in leveraging emerging technologies will allow us to create solutions to help patients receive better care. Email us at [email protected] to explore how we can help you on this journey.

