Scaling Clinical Data Sharing for a Global CRO

The client is a global contract research organization (CRO) that supports pharmaceutical and biotech companies with clinical trial operations and data-driven insights. As sponsors pushed for faster, cleaner access to trial data and greater flexibility in how they consumed it, the CRO’s Azure Synapse–based Centralized Master Data Management (CMDM) platform became a bottleneck.

Pipelines ran slowly, data quality issues surfaced in sponsor deliveries, and onboarding new sponsors demanded heavy custom engineering. The monolithic Synapse setup drove high total cost of ownership, while rigid sharing options made it hard to support modern data consumers. Sponsors increasingly expected compatibility with platforms such as Snowflake and Databricks, but the existing architecture was not built for that reality.

To control cost and improve flexibility, the client also needed to decommission the Azure Synapse environment, which had effectively become a bronze layer in the landscape. The CMDM sitting above it as the silver and gold layers carried its own baggage: recurring data quality issues, latency in feeds to sponsors and limited options for how data could be exposed. At the same time, many sponsors were standardizing on Snowflake and wanted their clinical datasets delivered natively there. Any modernized solution had to address both challenges: retiring the costly Synapse estate and building a Snowflake-ready, sponsor-centric foundation.

Sponsors wanted clinical data that arrived faster, cleaner and in the tools they already trust. Over time, delays in data delivery extended trial timelines and strained sponsor confidence. Without rethinking the data platform, the CRO risked slower growth, higher costs and eroding credibility as a trusted data partner.

To move from a constrained CMDM to a sponsor-ready ecosystem, the CRO needed more than a lift-and-shift—it needed a new data spine designed for sharing and scale.

Persistent partnered with the client to modernize its data ecosystem into a multi-cloud, share-ready lakehouse capable of scaling across sponsors, platforms and geographies.

Re-platforming the CMDM core: Migrated legacy Synapse-based CMDM workloads to a Databricks lakehouse designed for governed, high-performance clinical data processing.
End-to-end pipeline orchestration: Standardized ingestion and transformation using Databricks Workflows, replacing fragmented scripts with observable, automated flows.
Multimodal data sharing: Enabled Snowflake Data Sharing, Delta Sharing on Databricks and Denodo/SFTP to support sponsors regardless of their preferred data platform.
Security and governance built in: Leveraged Snowflake’s HIPAA-compliant environment, Azure AD–based identity and role-based controls to secure sponsor access across clouds.

Persistent Team Innovations

Beyond the core migration and sharing layers, the Persistent team piloted targeted innovations to demonstrate the full potential of the new stack in practice.

Agentic ETL and Conversational Interface

Built an automated ETL pipeline using Cortex Agents and Streamlit, with SCD Type 2 implemented on the gold layer.
Leveraged Snowflake Document AI to extract structured data from unstructured STTM sheets.
Developed an NLP-powered conversational interface using Snowflake Cortex Intelligence enabling clinical teams to query datasets and gain real-time insights.
Showcased the solution at a Snowflake hackathon, where it received strong appreciation.
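
For readers unfamiliar with SCD Type 2, the pattern named above keeps history by closing the current version of a record when a tracked attribute changes and opening a new version, rather than overwriting in place. The following is a minimal, in-memory Python sketch of that idea only. It is not the team's pipeline: the `SiteVersion` record, its fields and the `apply_scd2` helper are hypothetical names chosen for illustration, and a production gold layer would typically express the same logic as a warehouse merge.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SiteVersion:
    """One version of a record (hypothetical schema for illustration)."""
    site_id: str                      # business key
    status: str                       # tracked attribute
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(history: List[SiteVersion],
               incoming: Dict[str, str],
               as_of: date) -> List[SiteVersion]:
    """Return a new history after applying incoming {site_id: status} values."""
    result: List[SiteVersion] = []
    seen = set()
    for row in history:
        new_status = incoming.get(row.site_id)
        if row.is_current and new_status is not None:
            seen.add(row.site_id)
            if new_status != row.status:
                # Close the old version and open a new one from the same date.
                result.append(replace(row, valid_to=as_of))
                result.append(SiteVersion(row.site_id, new_status, as_of))
                continue
        # Closed versions and unchanged current versions are kept as they are.
        result.append(row)
    # Keys without a current version receive a fresh version.
    for site_id, status in incoming.items():
        if site_id not in seen:
            result.append(SiteVersion(site_id, status, as_of))
    return result


history = [SiteVersion("S-001", "active", date(2024, 1, 1))]
updated = apply_scd2(history, {"S-001": "closed", "S-002": "active"},
                     date(2024, 6, 1))
for version in updated:
    print(version)
```

In this sketch, reapplying an unchanged value leaves the history untouched, which is the property that makes repeated loads safe.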

Study Onboarding and Offboarding Automation

Developed a Streamlit app for self-service study onboarding and offboarding, eliminating day-to-day developer dependency.
Maintained detailed audit logs and a user-friendly interface to improve operational efficiency and transparency.
Demonstrated the app to client leadership, highlighting the step-change in process automation and usability.

With the new lakehouse foundation in place, focus shifted to making the journey from raw ingestion to sponsor-ready datasets predictable, governed and reusable across engagements.

In the modernized architecture, clinical trial data flows through four governed stages:

Inbound integration: Data from Veeva CTMS and other trial systems is ingested into Databricks, where pipelines manage cleansing, enrichment and standardization.
Curation and reconciliation: Within Databricks, automated transformations prepare curated datasets, while Persistent’s iAURA Reconciliation Accelerator compares source and target tables, schemas and CSV files to ensure outbound data accurately reflects what was originally captured.
Outbound sharing and collaboration: The platform supports multiple modes of secure delivery:
- Snowflake Data Sharing for sponsors on Snowflake, enabling frictionless, zero-movement sharing.
- Delta Sharing via Databricks for near real-time collaboration across diverse environments.
- Denodo APIs and SFTP for sponsors relying on file-based or API-driven integration.
Security and access control: Role-based access control, single sign-on, IP whitelisting and network policies define clear security boundaries for cross-cloud and cross-region exchanges.
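
To make the reconciliation stage above more concrete, the sketch below shows one common way to compare a source extract with a target extract: check column sets, then compare per-row digests keyed on a business identifier. This is an illustrative assumption, not a description of how the iAURA Reconciliation Accelerator works internally, which the case study does not detail. The function names, the `subject_id` key and the sample CSV content are all hypothetical.

```python
import csv
import hashlib
import io


def read_csv(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def row_digest(row, columns):
    """Hash the values of the given columns in a fixed order."""
    joined = "\x1f".join(str(row.get(col, "")) for col in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key):
    """Report schema and row-level differences between two extracts."""
    source_cols = sorted({col for row in source_rows for col in row})
    target_cols = sorted({col for row in target_rows for col in row})
    # Row content is compared on shared columns only; schema drift is
    # reported separately so it does not mask value mismatches.
    shared = [col for col in source_cols if col in target_cols]
    src = {row[key]: row_digest(row, shared) for row in source_rows}
    tgt = {row[key]: row_digest(row, shared) for row in target_rows}
    return {
        "columns_only_in_source": [c for c in source_cols if c not in target_cols],
        "columns_only_in_target": [c for c in target_cols if c not in source_cols],
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Hypothetical sample data: one row is missing and one differs in the target.
source = read_csv(
    "subject_id,visit,status\n101,V1,complete\n102,V1,pending\n103,V1,complete\n"
)
target = read_csv(
    "subject_id,visit,status\n101,V1,complete\n102,V1,complete\n"
)
report = reconcile(source, target, key="subject_id")
print(report)
```

A report like this could gate an outbound share: if any list is non-empty, the delivery is held until the difference is explained.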

A single governed lakehouse now powers multiple sponsor-ready views, without rebuilding pipelines every time. Automated Azure DevOps pipelines cut release cycles from weeks to hours while keeping Databricks and Snowflake environments consistent and compliant.

Business Impact

From modernized pipelines to multimodal sharing, the transformation delivered tangible benefits across cost, agility and sponsor satisfaction:

25% reduction in total cost of ownership through optimized compute and storage.
Faster onboarding of 16–20 sponsors, with turnaround time reduced by more than 40%.
Higher data quality and transparency enabled by iAURA-driven reconciliation and validation.
Greater flexibility, with sponsors now able to switch between Snowflake, Databricks, or API-based access without re-engineering integrations.
Enhanced sponsor experience, driven by timely, self-service access to clinical data and clearer visibility into ongoing trials.

Turning each sponsor feed into a reusable asset, not another one-off build.

These outcomes align with broader industry trends.

McKinsey research indicates that holistic transformation in drug development can accelerate medicines to market hundreds of days faster while reducing development costs by up to 25 percent at a portfolio level.

Behind these results was a deliberate blend of data engineering, platform strategy and domain insight—exactly what the client sought in a long-term partner.

Why Persistent And What Comes Next

Industry research indicates that CROs are accelerating efforts to unify clinical operations and data as a foundation for future growth. The client chose Persistent for its ability to blend cloud-native data engineering with deep healthcare and life sciences experience.

Cloud-native data engineering: Cross-platform expertise across Databricks, Snowflake, Denodo and Azure DevOps to design a lakehouse that can evolve with sponsor preferences.
Trustable data through iAURA: Use of the iAURA Reconciliation Accelerator to validate that outbound clinical datasets faithfully match source systems.
Healthcare and CRO context: Deep familiarity with clinical workflows, regulatory requirements and sponsor expectations around data timeliness and quality.

With the modernized data spine in place, Persistent and the client are now positioned to unlock advanced analytics and AI-driven use cases on top of a stable, share-ready foundation.

Modernizing the data spine today creates a pathway for AI-powered insights tomorrow and turns clinical data sharing from a constraint into a competitive advantage.

Turn clinical data sharing into a growth lever, not a bottleneck. Talk with Persistent.
