Finding value in The Signal Room? Leave a 30-second review on Apple Podcasts — it is the single best way to help more healthcare AI leaders find the show.
Healthcare AI Fails at the Data Layer: Privacy, Governance & Trust | Sid Dutta
with Sid Dutta
Sid Dutta joins the show to examine why healthcare AI initiatives stall before models are ever deployed, what privacy-preserving infrastructure actually looks like in practice, and the runtime trust controls that have to live at the data layer for clinical AI to work at scale.
Most healthcare AI conversations focus on models. The pilots that fail almost never fail there. They fail at the data layer — where access, governance, and runtime control either exist or they do not. In this episode of The Signal Room, Chris Hutchins sits down with Sid Dutta, a 24-year cybersecurity veteran and Founder & CEO of Privaclave AI, to examine why healthcare data governance is the bottleneck for clinical AI implementation, what privacy-preserving AI looks like when it actually works, and how organizations can collaborate on sensitive patient data without exposing it.
Sid Dutta is the Founder & CEO of Privaclave AI and a 24-year cybersecurity veteran with more than a decade leading data protection and privacy engineering at scale. Before Privaclave, he was Vice President of Data Protection & Privacy Engineering at Activision Blizzard (Microsoft Gaming), Product Head of Voltage SecureData at OpenText (formerly Micro Focus), Vice President & Global Head of Data Protection & Applied Cryptography at Worldpay, and Director of Cryptographic Utilities & Services at American Express. He holds nine issued patents in cryptography, blockchain, and tokenization, with one additional patent pending, and has served on multiple vendor and cybersecurity advisory boards.
Chris Hutchins: Today's conversation is with Sid Dutta, a cybersecurity veteran who has been a data protection executive across large corporations such as American Express, Worldpay, and Activision Blizzard King, and was Product Head of Voltage SecureData at Micro Focus, which is now OpenText. He is now the Founder and CEO of Privaclave AI, which is solving some of the deep-rooted challenges in protecting sensitive data, especially now in the age of AI. Sid's work focuses on one of the most strained layers in AI data flows. While much of the market is focused on models and algorithms, his perspective centers on the runtime visibility, governance, and protection that bake in privacy and trust by default and by design. He works on enabling secure, privacy-preserving collaboration, making it possible to analyze and process sensitive data within healthcare, financial services, and other industry sectors, across legacy and AI workflows, without exposing it or putting customer or patient privacy at risk. That shift has implications not just for innovation, but for how institutions think about risk, partnerships, and accountability. This conversation focuses on where AI efforts stall before they begin, what is limiting progress today, and what needs to change for healthcare data to be used safely and effectively at scale. Sid, I want to start with the layer that gets the least amount of attention. Before I do, I want to say thank you for being here, and welcome to The Signal Room.
Sid Dutta: Thank you for having me, Chris.
Chris Hutchins: AI gets most of the focus in healthcare right now, but a lot of initiatives stall long before models are ever deployed. Where do organizations actually hit barriers when it comes to data? And why is healthcare data so difficult to access and collaborate around compared to other industries?
Sid Dutta: Yeah, it's a loaded question, so I'm going to break it apart a little bit. But I think that's a great place to start the conversation, because honestly, the biggest bottleneck in healthcare AI isn't the models. It's the data, and not just in healthcare, in general. Most organizations aren't struggling anymore with building AI; they're struggling with using their data safely. And healthcare data is uniquely hard for a few reasons. First, it's highly sensitive. You're dealing with PHI, protected health information regulated under HIPAA, and the risk isn't just financial, it's patient trust and safety. So that makes organizations extremely cautious. Second, the data is fragmented. It sits across so many places: electronic health record systems, electronic medical record systems, imaging systems, labs, billing platforms, all in different formats, with different owners, and often with limited interoperability. And then the controls typically implemented at these platforms, particularly with the
Chris Hutchins: Right. Right.
Sid Dutta: systems like Epic and Cerner, are very well maintained as long as the data is sitting within those platforms, but they don't travel with the data. The moment data leaves the EHR boundary, whether it's going to an AI model, a co-pilot accessing it, an agentic AI framework where an agent acts as an MCP client talking to an MCP server to access the EHR as a tool or resource, or even patient analytics being sent to a third party or into a cloud collaboration, all those native protections are gone. At that point it's the wild, wild west. That's typically where the initiatives stall, because now security and compliance teams are concerned: we don't really have visibility and control. And we also don't understand the intent. With the non-deterministic side of AI identities, what we call non-human identities, we know who is accessing the data and what they're allowed to access, but what they're actually going to do with that data, the intent behind the access, is not something our traditional security systems and guardrails understand. So projects get delayed, and typically you don't move past pilots and POCs. And compared to other industries, healthcare has a much lower tolerance for risk, yet it's still relying on security models that were designed for static systems, not dynamic AI workflows. That gap between data access and data protection is what's slowing things down.
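To make Sid's point concrete, here is a minimal sketch of a control that travels with the data: a runtime gate sitting between an AI agent and an EHR export that decides, per caller and per purpose, which fields are released. All the names here (Policy, release_record, the field names) are illustrative, not any particular product's API.

```python
# Minimal sketch, assuming a hypothetical policy model: a runtime gate that
# enforces purpose and field-level limits after data leaves the EHR boundary.
from dataclasses import dataclass

@dataclass
class Policy:
    allowed_purposes: set  # e.g. {"scheduling", "billing"}
    allowed_fields: set    # fields this caller may ever see

def release_record(record: dict, caller: str, purpose: str, policy: Policy) -> dict:
    """Return only the fields this caller may see for this purpose; log the rest."""
    if purpose not in policy.allowed_purposes:
        raise PermissionError(f"{caller}: purpose '{purpose}' not permitted")
    released = {k: v for k, v in record.items() if k in policy.allowed_fields}
    withheld = set(record) - set(released)
    print(f"audit: caller={caller} purpose={purpose} "
          f"released={sorted(released)} withheld={sorted(withheld)}")
    return released

policy = Policy(allowed_purposes={"scheduling"},
                allowed_fields={"patient_id", "visit_time"})
record = {"patient_id": "tok_8f3a", "visit_time": "2024-05-01T09:00",
          "ssn": "123-45-6789"}
print(release_record(record, caller="scheduling-agent",
                     purpose="scheduling", policy=policy))
```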
Chris Hutchins: Yeah. That's an important point you just made, because healthcare by definition has never been static, but we've always measured things like it is. At the pace things are moving now, it's really imperative that we bring what you're talking about to the forefront and have these conversations, so people understand how urgent it is to start thinking about this differently. It's not static. The practice of medicine is always called a practice, and unfortunately we keep discovering new things, partly because some of them are self-imposed, with the health conditions that seem to pop up every so often. With COVID-19, I think we'd all heard about SARS, but that was a whole other level we'd never seen before. And I don't think there's any shortage of those things in the future, not only epidemics or pandemics, but new conditions arising from all the things we think are improving our lives, whether it's chemicals or whatever else goes into food processing. But getting to some of the challenges from an organizational standpoint: when you talk about interoperability, it's always been a complaint of the physicians I've talked to. But there are a lot of legitimate reasons organizations need to collaborate and share data. What are some of the concerns that you see pop up first?
Sid Dutta: Yeah, I think the first concern that pops up is: what happens if that data gets exposed? Typically, when the collaboration is happening, there's a data sharing agreement in place, so as the data is shared with other entities, the duty of protecting it lies with them. But at the end of the day, compliance, security, and patient privacy can't really rest on documentation and contracts alone when the security controls are not in place. So the concerns are typically twofold. One aspect is: I shall protect my data that I'm sharing, or as we're collaborating on it. The other part is: do I have runtime, real-time security controls implemented to ensure my data doesn't get leaked, resulting in a breach scenario and then loss of patient trust?
Chris Hutchins: Yeah, there's this other aspect too. You talked about the approvals for what data movement and collaboration is going to happen. There's also this movement now, from a regulatory standpoint, toward transparency and explainability, which adds a whole other layer of complexity to the idea of informed consent. Because if you can't speak plainly about it, in terms a non-technical person is going to understand, they can't truly be considered informed. So there are definitely additional layers of challenge on top of that. That gets to some of these privacy regulations we're hearing about around transparency. California was the first one, at least that I've heard of, that was mandating this explainability and transparency. But how do these regulations, institutional risk, and governance structures shape what is possible? It seems like there are limitations, or at least it feels like there are more of them than is comfortable, for me anyway. I'd love to hear your perspective, because I think there are real constraints. Data accessibility versus technical capability: those are two different things.
Sid Dutta: Well, that's true, because there are obviously a lot of privacy-enhancing technologies, and they existed even before AI was a real concern. We've actually tried a lot of these capabilities: homomorphic encryption, differential privacy, federated learning, and trusted execution environments, what we call confidential computing or secure enclaves, where all data is accessed within that enclave. The idea at the bottom of all of them is that you can get the value from the data without exposing the data itself. But in a lot of cases these haven't been practical or enterprise-grade, and haven't been successfully put into practice, so the usability of the data, and the performance of the processes operating under those privacy-enhancing technologies, have been under question. There are other methods that have proved more effective: data de-identification or data desensitization, where the data is obfuscated and persistently protected as it moves, as it's stored, accessed, or analyzed. In a lot of cases those have proven to be a much better alternative, but each of them carries its own effort, engineering, and complexity, and in a lot of cases you either lack the necessary resources or the funding to implement it. And when it becomes an after-the-fact effort, it gets even more invasive, complex, and expensive. So you tend to relax your controls to something very default and very traditional, platform-level security, data encryption at rest, which potentially puts the data at risk. Some of these other techniques have been very hard to implement, and that's where I think some of the gaps actually reside.
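As a toy illustration of one technique Sid names, differential privacy: the sketch below publishes an aggregate (a cohort count) with calibrated Laplace noise, so the presence or absence of any single patient barely changes the published number. The epsilon value here is illustrative, not a tuned production privacy budget.

```python
# Toy differential-privacy sketch: noisy release of a cohort count.
import random

def laplace_noise(scale: float) -> float:
    # The difference of two exponential draws is Laplace-distributed.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def private_count(true_count: int, epsilon: float = 1.0) -> float:
    # A count query has sensitivity 1, so the noise scale is 1/epsilon.
    return true_count + laplace_noise(1.0 / epsilon)

print(private_count(1042))  # e.g. 1040.7: useful for analytics, protective of any one record
```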
Chris Hutchins: Yeah, there are definitely some things that are unique when you talk about security and all the regulatory and policy-driven things we have to deal with. Having an evolving model, particularly with AI, is a whole different ballgame. What you got approvals for a month ago, if you don't revisit it, may mean something completely different very shortly. And I think that's one of the challenges on top of the technology, which is why I have so much respect for the work you and your company are doing, because there are so many other areas people have to worry about. This is one where they need professionals paying attention to it. And that's a perfect lead-in to a concept I've heard you use that I'd really love to understand a little more. You've talked about privacy-preserving AI. In practical terms, what does that actually mean?
Sid Dutta: Yeah, I think I kind of alluded to it in your previous question. Privacy-preserving AI, at its core, means you can still use the data without exposing the data itself. That's where some of these privacy-enhancing technologies come into play.
Chris Hutchins: Right. Is this more the de-identification process, or tokens, or tokenization? How do you accomplish it?
Sid Dutta: It could be a lot of things; it's a combination of various techniques, and in a lot of cases you're either picking one over another or using them in combination. The shift is from moving data to the AI to bringing the AI to controlled data environments. That's kind of where it all started: how do you collaborate on the data while that data stays within a very secure, controlled environment? It's like the concept of a secure enclave: everything operates within that enclave, so you bring your AI and your processing into it, rather than taking the data and sending it out to an AI model for processing. Because that's what typically happens these days: you're sending the data to the models for inference and prediction. In a lot of cases, the guardrails you put in operate at runtime. You analyze the data you're about to feed the AI and make sure the pieces that are sensitive, the PHI and PII the model doesn't actually need, are handled, while everything else around that data is what the model analyzes and infers on. So how do you ensure at runtime that those elements are de-identified? Sometimes data minimization also comes into practice: only expose what's absolutely necessary. And anything that still needs to be sent to the AI model, make sure it's masked or tokenized. One thing you have to ensure is that the model can still do its job and not hallucinate, because if you desensitize something in a way that makes no sense to the model, it will obviously give you results that aren't favorable for your process. So it's a very interesting balance that needs to be struck: what you protect and how much you protect gives you the privacy, without making the business process do something that wasn't expected. And of course, some of the other technologies I alluded to, homomorphic encryption, trusted execution, and so on, have made it very hard to keep the data in place and bring the AI to that secure, controlled environment. So typically the most frequent, most practiced model has been keeping the data and the context secured while you're sending the data to the AI. Because in a lot of cases these AI models operate externally to the institution; hospitals don't have their own models built in-house in most cases. You're taking advantage of a hyperscaler's foundation model that they host, and even though by contract, by segregation, or by logical tenanting you're keeping things within your particular tenant, at the end of the day you're still sending data outside your own environment.
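A minimal sketch of the runtime pattern Sid describes, under stated assumptions: minimize first, then tokenize whatever must still travel, before a prompt ever reaches an external model. The regexes are simplistic stand-ins for real PHI detection, and the keyed hash stands in for the kind of vaulted, reversible token service he mentions.

```python
# Sketch: de-identify a prompt at runtime before any external model call.
import hashlib
import re

PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
}

def tokenize(value: str, secret: str = "demo-secret") -> str:
    # Deterministic surrogate: the same input always yields the same token.
    return "tok_" + hashlib.sha256((secret + value).encode()).hexdigest()[:10]

def deidentify(prompt: str) -> str:
    for pattern in PHI_PATTERNS.values():
        prompt = pattern.sub(lambda m: tokenize(m.group()), prompt)
    return prompt

safe = deidentify("Summarize the visit for MRN: 84412973, SSN 123-45-6789.")
print(safe)           # identifiers replaced by stable surrogates
# llm.complete(safe)  # hypothetical call: only now would anything leave the boundary
```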
Chris Hutchins: Yeah. I think when you talk about this, basically having the minimum data access you need to actually accomplish something, that's definitely a challenge. But think about what we can do from a speed and processing standpoint now with AI models. We've talked about this a little, so I know this is one of the areas you're excited about. For somebody who's all about protecting privacy, you're also excited about what's possible, and that's one of the things I really appreciate about what you bring to this conversation and what your leadership means across the industry. So when these models work, what kinds of analysis or research become possible that weren't feasible before? Just pause and think back a couple of years to when we suddenly had COVID vaccines. There's a speed involved there that was never possible before. I'd love to hear what you're thinking about in terms of what we can now consider that we couldn't even think about before.
Sid Dutta: Yeah, the opportunities are so many. Even on our side, when we build products, put in guardrails, and take advantage of these models, there are things that would have taken a significant amount of engineering effort and cycles to achieve that we're now getting from these models. Now, in an enterprise sector like healthcare, it fundamentally changes what healthcare can do with its data. Today, a lot of high-value use cases are either slowed down or never attempted, not because they're technically hard, but because they're too risky from a data perspective. If privacy-preserving approaches are put in place, at runtime, in a contextual, context-aware, internalized fashion, so that certain decisions are made as data moves to these AI models, you unlock things like cross-institution research at scale: hospitals and research centers can collaborate on patient cohorts without actually sharing raw PHI. That means better clinical insights and faster validation with more diverse datasets. Sometimes real-time clinical decision support: AI can safely assist physicians using sensitive patient data in the moment without exposing that data outside the control boundary. And then there are the back-office and productivity capabilities, where billing or other departments use AI co-pilots and agents, and clinicians and administrative staff can interact with AI systems without worrying that sensitive data is leaking into external models. I was actually talking to someone who founded a health tech company, and they were solving appointment scheduling, which has been a big problem within hospitals. They're trying to solve it through AI, analyzing years of historical data on patient visits, appointments, and staffing across each hour of the day, each day of the week, each month, and so on, so that patients get care much faster rather than waiting two months for a doctor to be available. And of course, the secondary uses of data, population health, drug discovery, operational analytics, become far more accessible because the data can be used in a desensitized, controlled fashion.
Chris Hutchins: Right. Yeah, to me this is one of the more exciting things, because when I was working in New York we had a dermatologist who was studying some pretty rare conditions because of his practice. When he was able to get his hands on some de-identified data sets at a larger scale, he was able to assemble a cohort that was statistically significant enough to come to some really remarkable revelations about rare conditions he just didn't have enough context for as a clinician to treat. That's just one example, but to me it makes the work you're doing so much more important, because if we can actually pull a single thread and understand the whole end-to-end life cycle of a patient's record, really understand what's happening there, and put it into a larger context, there's just no limit from what I can see. I'm excited about that. There are some unfortunate ways people think about concepts of privacy and governance in particular. What you're talking about is a little different: you're talking about using them as an enabler, not a roadblock. Talk to me about the misconceptions people have about privacy-preserving technologies today.
Sid Dutta: Yep. Yeah, I think this has existed for a long, long time. My background has been predominantly financial services, and even in those sectors, whenever it came to implementing data-centric security or privacy-preserving capabilities, there was this notion of losing the utility of the data. Typically, data is de-identified or desensitized with cryptographic measures, encryption and tokenization, which are reversible operations, or sometimes a one-way hash operation, where you anonymize the data. And sometimes you do basic masking and redaction. These are all techniques we've used for decades to keep data protected and prevent its misuse. But for legitimate use cases, when processes are running, the concern is: I'll lose the ability to use the data, and my business processes are going to break, if the data is masked or distorted to a point where it's no longer useful. And that used to be true in a lot of cases where very rudimentary obfuscation was done, like masking everything to X-X-X. You lose the referential integrity of the data once the characters have been replaced, or once you've blanked out, redacted, a patient's Social Security number or credit card number. How do you operate on a structured data set then? Even with probabilistic encryption methods, you get different ciphertext outputs from encrypting the same data with the same key, because every run generates a different output, so you can't search on it unless you implement some sort of static tag or hash to search on. But deterministic encryption, format-preserving encryption, tokenization: these capabilities let you obfuscate the data, replace the real value with a token or a surrogate, and then use that across your workflows. You can still search on the data and join different tables, using it as a primary key or foreign key, all the database operations you need. And with some of the additional privacy-enhancing capabilities, there's a belief that the approaches are too complex and too academic. Federated learning and homomorphic encryption are hard to optimize, and in a lot of cases that's true; there are certain very niche use cases where they apply well, but they don't apply broadly. Trusted execution environments, where everything runs in encrypted memory inside a secure enclave, so that even if someone took a dump of that memory in a leak, they would still find the data encrypted, operate within the confines of certain parameters. But the moment the data leaves, you don't have a homogeneous ecosystem where everything stays encrypted as it moves and is still usable on the other side. So I think that's where a lot of the concern resides: I want to innovate, but security and privacy are holding me back; they're a barrier to innovation. And in that case, people try to get around the guardrails, and that's where you find a lot of the shadow AI happening as well.
So security and privacy should be looked at as enablers, but that also has to be proven: that they are indeed enablers, that you can still do your job and do more because they're baked in by design and by default, so you don't have to implement something yourself. That's one of the frictions innovators typically feel: now I also have to implement something to take care of privacy and security, and that drags them away from the innovation and the actual business impact they're trying to make. These things have to operate automatically at runtime, so innovators don't feel the pain and the weight of being the ones who have to implement data privacy and security. And if something just works, they won't bypass it. Then you have less shadow AI happening, which is what puts enterprises more at risk.
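Sid's earlier point about deterministic encryption and tokenization is easy to see in miniature: because the same value always maps to the same surrogate, tokenized tables still join on it, so referential integrity and database operations survive de-identification. The keyed HMAC below is a stand-in for a real vaulted token service or format-preserving encryption, and the key handling is deliberately simplified.

```python
# Sketch: deterministic surrogates preserve joins across tokenized tables.
import hashlib
import hmac

KEY = b"demo-key"  # in practice managed in a KMS/HSM, never hard-coded

def surrogate(patient_id: str) -> str:
    return hmac.new(KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:12]

visits = [{"patient": surrogate("P-1001"), "visit": "2024-05-01"}]
labs = [{"patient": surrogate("P-1001"), "result": "HbA1c 6.1%"}]

# Join on the token exactly as you would on the raw ID: referential integrity survives.
joined = [(v, l) for v in visits for l in labs if v["patient"] == l["patient"]]
print(joined)
```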
Chris Hutchins: Right. Yeah, shadow AI is a whole crazy topic we could probably speak on for a few days, because every organization's got some level of it, probably more than they think. I'm sure you've seen the same kind of thing in your work as well. We're definitely in a place where
Sid Dutta: Yeah.
Chris Hutchins: there's a growing need. I think the pandemic exposed some things in this regard: there's a need for collaboration across institutions, which has typically been a little difficult. Why do you think that's been so difficult?
Sid Dutta: Yeah, I think it's something that has been hard to execute because of data gravity combined with the risk associated with it. Every institution holds sensitive patient data, but the moment data leaves its boundary, you lose control of it. And once control is lost, the accountability doesn't go away; it actually increases. You no longer have direct authority or control over that data, or over the safeguards in place. So typically what happens is you err on the side of caution: don't move the data. So in that case,
Chris Hutchins: Right. That's what we've all seen most often, probably.
Sid Dutta: Yeah, you put up that guardrail, because if I can't control the data once it's shared, I might as well not share it at all. And then you have fragmentation: different EHR systems, different data standards, different governance models. If two institutions want to collaborate, aligning on how data is accessed, shared, and protected becomes a project in itself, because each institution has a very different level of maturity in its privacy, data governance, or data security program. You may have a gold standard on one side, but the moment the data is shared, the other side may be far less mature. So that's where some of the challenges are. And obviously there's privacy and regulatory exposure: a HIPAA violation can mean heavy penalties and long-term scrutiny. Data breaches are happening every day; it's almost unfortunate that we're becoming immune to the news unless it's something massive that impacts every one of us. And then of course there's reputational damage. So even a well-intentioned collaboration gets slowed down because of these real concerns, given that either the practices or the maturity across the
Chris Hutchins: Yeah, it's true.
Sid Dutta: collaboration are not at par, or the parties can't all align on the right way to share data securely and collaborate without exposing it. And even with the right intentions, that kind of model is technically hard to implement. Economically, financially, operationally, it sometimes becomes a very big challenge, depending on how big you are, how many entities you're collaborating with, and how much data you're talking about.
Chris Hutchins: I think this sets up beautifully one of the things I was really excited to get you to talk about, because this is really complicated stuff, and I've lived around this risk space for quite some time myself. But for people who might not be familiar: when I say tokenization, for example, I'm not sure everyone understands what that means from a technology or methodology standpoint. There are a number of different ways you're building solutions to preserve privacy within the infrastructure, and that will change how partnerships can be approached. Really, though, the role you and your colleagues are playing is a massive advantage for people, if they can understand enough to be comfortable and confident that you've got a way for them to accelerate in ways they never thought possible. So how should this cause us to think differently about the way we approach partnerships?
Sid Dutta: No, you're right. This was another conversation I was having, with the two co-founders of a health tech startup. They were saying: we don't get the kind of data from hospitals that we wish we did, and that data would have enabled our platforms and our models to do a much better job at the service we offer them. And the only reason the hospitals weren't sharing that data is that they simply didn't have a mechanism to send something they could keep control over, without the concern of the data getting exposed after they'd shared it. So it should be a win-win. I'm a hospital, I'm leveraging a third party to provide me a service, and I'm sharing certain data for them to analyze so they can provide that service. But I'm not sharing enough for them to do a superior job; they're constrained to whatever they're getting. On their side, they're saying: we could do a lot more if you only sent us these other data elements. And the hospital says: I can't really send you that, because then I'd be risking patient privacy, and I don't have a way to do it safely. So instead of asking, how do we securely share data, when it comes to privacy-preserving infrastructure, you shift to: how do we collaborate without exposing data? And that's a very different model. Sometimes you enable the data to stay within its home environment, with controlled, policy-driven access at runtime, and based on context you make decisions about what gets revealed and what doesn't. In other cases, as you share, you decide what gets shared, how much is needed, and what the context behind the sharing is. Because not all data sharing is the same. Understanding the context and the intent behind a given share, and deciding which data to obfuscate and which to leave as is, is a runtime decision. Products and solutions that can make those runtime decisions are going to strengthen these partnerships. Today, these partnerships are based entirely on contracts: this is how we agree it needs to be, but you have no control after the fact. It's more of a legal document than an implemented security control. With privacy-preserving infrastructure and these technologies and solutions, partnerships become less about legal complexity and data transfer agreements and more about controlled interoperability. And the impact is huge: you can move from months of negotiation and hesitation to enabling joint research, shared analytics, and insights much faster.
Chris Hutchins: You mentioned infrastructure, and I think this is an important thing too. There's a ton of focus on algorithms and models and the many ways those could go sideways; we could tee that up as another topic we could talk about for days. But let's talk about the infrastructure itself. To support what you're describing, how important is the infrastructure?
Sid Dutta: So I think today a lot more focus is given to the models: what model am I using, and how good is that model compared to something else? But in a real-world deployment, infrastructure is the difference between a demo and something that actually runs in production. You can have the best model in the world, but if your data layer isn't secure, observable, controlled, and clean, it doesn't matter. Data hygiene has been one of the biggest problems, because it's a garbage-in, garbage-out thing: put garbage into something and it's going to cough out garbage, regardless of the model. Without a data layer that is hygienic, controlled, secure, and observable, you simply can't operate at scale, particularly in healthcare. So I think the underestimation has been done
Chris Hutchins: Unfortunately, that's the way it works.
Sid Dutta: on the data layer. We've actually put too much focus on the models. And it's not just about storing or accessing data anymore. It's about understanding the data: what is this data that's being accessed, what's the context and intent behind how it's being used, and then enforcing protection dynamically as data flows across these systems and co-pilots and agents. Existing infrastructure wasn't designed for that. It was built for static applications, for the deterministic world I was talking about: you knew how the application was developed, what it does, which data elements it takes in, what it does with that data, and where it shares or stores it. We've moved to a non-deterministic AI world where data is constantly moving, being recombined, generated, and interpreted in unpredictable ways, and autonomous decisions are being made, because you haven't pre-wired an agent to do a certain thing; the agent is making decisions on its own. As this big shift happened, our infrastructure couldn't evolve at that speed. That's where some of the biggest challenges are. Technologically we can do a lot: build powerful models, process massive data sets, generate insights in seconds. But operationally, when it comes to data privacy and security, the question gets asked: can we do this safely without losing control of our data? And right now, in a lot of cases, the answer is uncertain; we're not sure. So it's not really the innovation; it's the confidence and control at the data layer that organizations don't have. That's where we see the challenge, and how AI has outpaced the way enterprises have built their infrastructure and governance models.
Chris Hutchins: Yeah. Yeah, I mean, that's obviously a gap that existed long before AI: the expectations of what we can do versus what we can actually implement. I think everyone hopes it will be easy, but that's just not the way it works, unfortunately.
Sid Dutta: Yeah, we've always had these challenges to a point, but AI has amplified them many times over. That's where I think the biggest pain point is, because we've always tried to catch up with digital transformation as things mature and evolve. But this leap has been very significant, and that's where we find ourselves standing at the bottom of the cliff while the technology is way up there.
Chris Hutchins: Right. There are just so many different angles to this. I want to talk a little more about governance, trust, and stewardship; we haven't really dug into trust a whole lot yet. People come at these things with different levels of understanding of what they are. Trust is an area where we have big challenges in almost every part of life, because of the erosion of trust in human relationships. But here's what we need to figure out in this kind of environment: when we really do have to share sensitive information across organizations, how do you even go about structuring the governance and accountability to enable it, in a way that does our duty to protect and preserve that information?
Sid Dutta: Yeah, this is a tough one, easier said than done. The evolution has to be from ownership to shared accountability, with very clear boundaries. At a minimum, you need a few layers working together. One is clarity on the policies: everyone agrees what data can be used for what purpose under what conditions. Two is technical enforcement: those policies aren't just documentation; they're enforced in real time as data is accessed and used. And in a lot of cases, that's the biggest pain point. People can write policies; I've written standards and policies myself, and the technical implementation was always the biggest challenge. Typically, people would file exceptions to those policies and standards and move on. So technical enforcement means you don't just have a policy you can show an auditor; the policy is actually implemented. And the third layer is traceability and auditability: every interaction with sensitive data is logged, visible, and attributable. Because in a collaborative environment, trust doesn't come from contracts alone; it comes from verifiable control. Those are some of the foundational things that have to happen, and, as I said, easier said than done. It takes quite a bit of time to get them in place.
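The three layers Sid lists, declared policy, runtime enforcement, and auditability, can be sketched in a few lines. Everything here (entity names, datasets, purposes, the log format) is invented for illustration; a production system would back this with a tamper-evident store and a real policy engine.

```python
# Sketch: policy + runtime enforcement + an attributable audit trail.
import json
import time

POLICY = {"research-partner": {"dataset": "oncology-cohort",
                               "purpose": "cohort-analysis"}}
AUDIT_LOG = []  # in production: an append-only, tamper-evident store

def access(entity: str, dataset: str, purpose: str) -> str:
    rule = POLICY.get(entity)
    allowed = bool(rule) and rule["dataset"] == dataset and rule["purpose"] == purpose
    AUDIT_LOG.append({"ts": time.time(), "entity": entity, "dataset": dataset,
                      "purpose": purpose, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{entity} denied for {dataset}/{purpose}")
    return f"handle:{dataset}"  # a governed handle, never the raw data itself

access("research-partner", "oncology-cohort", "cohort-analysis")
print(json.dumps(AUDIT_LOG, indent=2))  # every interaction logged and attributable
```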
Chris Hutchins: Right. I think that's an area where a lot of folks out there really aren't sure where they stand. What are some of the signals leaders should be looking for that suggest an organization may not be ready to collaborate safely?
Sid Dutta: I come across a lot of these when I talk to security professionals or data governance professionals about their risks. You'd be surprised how often they still have a lot of reliance, over-reliance really, on static controls that have existed for decades, and that's what's perceived to be good enough in the new world. You come to a point where you wonder: is this an educational moment for them? And you don't want to attack something they personally believe in, or a decision that was their own.
Chris Hutchins: Right.
Sid Dutta: So how do you turn someone into a believer when they've been a non-believer? Here, governance is mostly role-based access control, policies, contracts. There's no runtime enforcement. And particularly when data is stored or shared, there are two checkboxes they'll check, as long as they can demonstrate them to their auditors and the auditors are fine with it: is your data encrypted at rest? Is your data encrypted in transit? For transit, you have TLS and HTTPS operations happening, which checks that box, and anything you put in a database or cloud storage is encrypted at rest by default, which technically doesn't do much for you from a security standpoint. If you're talking to someone and these are what they describe as their controls, that's a red flag. Then there's lack of visibility into data flows. If they can't clearly answer where sensitive data is going, especially with AI workflows, they're not ready. Today, tools like DSPM, data security posture management, scan different data stores, so you get point-in-time visibility of data at rest. But the moment data leaves those systems, DSPM tools can't really tell you where it's going. If you don't have visibility into data flows, you know they're not ready. And then there's one of the biggest challenges I had as a security practitioner, whether at American Express or Worldpay or Activision Blizzard or Micro Focus: ownership of risk is very unclear in a lot of enterprises, and the bigger you grow, the more you struggle with it. Take a database that's a shared data mart or data warehouse. A lot of teams are pumping data into it, and a lot of teams are pulling information out of it, and it's very hard in many cases to understand who owns that data store, and who owns the risk if it gets attacked and a breach happens. What often happens is that one person on the IT side who manages the server is approving access requests to that database, and he doesn't even own the data. For him, the easiest route is to hit the approve button rather than decline and go through a series of conversations about why the requester needs access and whether the risk is legitimate. People take the easy route. But he won't be the steward of that data, and he won't own the risk if that database gets hacked; effectively, you don't even know who has ownership of that data. So that's one of the big readiness issues: do you have very clear ownership of the risk and of the assets? And then, of course...
Chris Hutchins: Yeah. Yeah, I think even the answer to that question may not be sufficient if the answer is what you just described: it's just one person. You have a risk.
Sid Dutta: Right. Yeah, exactly. And that person will just say, I was told this is the system where everything lands, and since nobody actually knew who the data owner was, it became: well, you own the server, you're the IT person, so you're it. It's very unfortunate, and particularly in complicated, matrixed organizations, these things are very common. And last but not least, the block-first mentality. If the default approach is to restrict everything, because that's easier than enabling things safely, that's a clear signal the infrastructure, the paradigm, and the governance model aren't there yet for adoption. Those are probably some of the main ones I'd name.
Chris Hutchins: That's a really important point. I don't think I've been in a health system in 30 years that didn't have those kinds of challenges at some point. As a person who's responsible for creating analytics, when someone needs data that I can't go get, it's a little frustrating, because I have to explain why we can't meet the objective. And it's really uncomfortable, because organizationally we ought to have our act together so we can make those kinds of decisions when it's really meaningful and important to do so. To your point, if these are the things you're hearing, this is something that really needs attention, and you'll have a very good reason to have my friend here on standby and pick up the phone. But let's talk about the future, because I know we're coming up on time. As these technologies evolve, how will the healthcare data ecosystem change?
Sid Dutta: Well, I think there will probably be a shift from an institution-centric to a network-centric healthcare data ecosystem, assuming all the dominoes fall into place. Today, data is siloed: every hospital, payer, and research organization operates within its own boundaries. In the future, data will still live in those environments, but it will be accessible in a controlled, interoperable way across a network. Instead of asking who owns the data, we'll start asking: how can this data be safely used across the ecosystem? When it comes to collaboration, my hunch is that a few things will become much more common. Federated, distributed research networks, where multiple institutions contribute to shared insights without pooling raw data. AI-enabled clinical ecosystems, where co-pilots and agents can operate across systems, but with strict context-aware controls on what they can access and reveal, and with a human in the loop in a lot of cases, particularly for critical processes. And data clean rooms and secure collaboration environments where pharma, providers, and payers work together on analytics, trials, and population health. I also think we'll move to a more context-aware, purpose-driven data access model. Rather than role-based or attribute-based access control, it will be intent-based, dynamic access control, which basically means the same user could see different data depending on what they're trying to do. Sid is accessing the same data store, but the prompt Sid is providing has a very different intent than the
Chris Hutchins: Yes, makes sense.
Sid Dutta: last time's. Even though last time we got some portion of that data, this time the intent would be analyzed dynamically to say: no, maybe this is not something I should be exposing right now, even though Sid's role already grants access to that database. So I think we'll see a lot more of that getting tightened, with more context-driven, dynamic, runtime decisions.
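A sketch of the shift Sid is describing, from purely role-based to intent-aware access: the same user passes the role check, but the declared intent of the request narrows what is actually revealed. The grants and the record are invented for illustration, and a real system would infer intent from the prompt and its context rather than take a label at face value.

```python
# Sketch: intent can only narrow what a role allows, never widen it.
RECORD = {"patient_id": "tok_8f3a", "visit_time": "2024-05-01", "diagnosis": "E11.9"}

ROLE_GRANTS = {"analyst": {"patient_id", "visit_time", "diagnosis"}}
INTENT_GRANTS = {
    "scheduling": {"patient_id", "visit_time"},
    "clinical-review": {"patient_id", "visit_time", "diagnosis"},
}

def fetch(role: str, intent: str) -> dict:
    # Intersect role and intent grants: the stricter of the two wins.
    visible = ROLE_GRANTS.get(role, set()) & INTENT_GRANTS.get(intent, set())
    return {k: v for k, v in RECORD.items() if k in visible}

print(fetch("analyst", "scheduling"))       # no diagnosis, despite the role allowing it
print(fetch("analyst", "clinical-review"))  # fuller view for a clinical purpose
```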
Chris Hutchins: Yeah. And I think there's definitely some noise out there about what needs to be done for governance and controls, but it seems like there are other things people may not be aware of. From where you stand, what signals suggest the industry might be entering a new phase? Because it seems like it might be.
Sid Dutta: Yeah, I think we're getting past the question of can we use AI. We're moving from that to: how do we use it safely at scale? There's a growing realization that data protection can't be an afterthought, which unfortunately, in a lot of cases, it still is. It has to be built into how the AI systems operate, and the only way that can happen is when it runs at runtime and gets into the data pipelines and AI pipelines seamlessly, in a very frictionless manner. Because you can't rewire these AI agents and co-pilots any longer; they don't operate within a well-defined scope, and they will make their own decisions. So your guardrails have to live outside them, particularly at the data layer and the data pipeline layer. And you're starting to see early adopters shift from blocking data to enabling controlled data usage, which is a very different mindset. Our typical approach, particularly in healthcare and other regulated industries, has been either monitoring mode or blocking mode, and the moment you're in blocking mode, you're stopping innovation and creating business disruption. The shift is to ask: how do I allow this pipeline and this process to go through without exposing my data? Then you're enabling controlled data usage and controlled data movement. Those are the signals that tell me we're actually moving into that new phase.
Chris Hutchins: Yeah, it's definitely not an if, or a should we. The sudden leap we all learned about a month or two ago, with what OpenAI and Anthropic are doing, kind of lights a fire. There's a ticking clock on us: we have the opportunity to get some things right, or we're going to be really far behind, and it's going to be detrimental. Even before that, time was of the essence. So as we close out: if leaders want to unlock the value of their data responsibly, what should they be paying attention to right now that they may be overlooking?
Sid Dutta: Yeah, I don't know that there's just one thing; that's one of the challenges for security leaders and data leaders in institutions like healthcare. But if they want to unlock the value of their data responsibly, I'd say: stop thinking about data access and data protection as separate problems. We've operated in that model for a long time. There's a data governance function concerned with what data we have and who has access to it, and data protection has not historically been part of that. In some cases these teams work together or sit under the same leadership; in most cases they don't. And because data protection tends to sit in blocking mode, one party comes to see the other as an adversary rather than a partner they should be working with. That separation is exactly what's slowing organizations down. Most teams still operate in a model where one side is trying to enable access and innovation and the other side is trying to restrict and reduce risk. That creates friction, and where there's friction, bypasses happen, and then you're actually introducing more risk into the enterprise. What's often overlooked is the need for a unified runtime approach, where access and protection happen together, in real time, based on context and intent. Those functions and approaches have to come together. That's one very important thing. The other is: don't over-index on models. The real differentiator won't be which model you're using; it will be how confidently you can use your data with that model. At the end of the day, models are fed with data, models deal with data, and their outcomes are based on data. Being able to use your data confidently with a model comes down to visibility, control, and the ability to enforce policies dynamically as the data moves. That's what boosts the confidence: regardless of which model I choose, I can use my data.
Chris Hutchins: Yeah. You know, as we're wrapping up, I just want to encourage the listeners: we're in a period where we need to find ways to enable data to be used in a responsible manner. And I want to say something that might be a little controversial. I've been part of the healthcare provider sector for the larger part of my life, and we often tend to be a little ambitious and think we're prepared to just build things on our own. But this field has gotten so ridiculously specialized, and there's so much time and energy being spent on both sides of the equation: one side trying to advance the capability, the other trying to stay ahead of those attempting to penetrate it. The escalation there is constant. This is not something you want to be a hobbyist trying to solve, or a well-meaning person who just thinks they can do it. You really need solid partners, and this is where Sid and the team at Privaclave come in, from my perspective. I'll have a whole bunch of information for you in the show notes, so please take the opportunity to look at it. If you're in a place where you're feeling that pressure to enable responsible use, this is a conversation you want to engage in, so take a look and reach out to Sid and his team. You have to have solid partners, and this isn't a scenario where you can stitch together 25 small, use-case-driven players; this is the foundational layer he's been talking about. So, Sid, I really appreciate you taking the time. The takeaways for me are many, but you talked about contextual awareness, which was really important. And what I hear from you is not stop and roadblock; it's let me help you accelerate, and let me help you do it in a way that protects people and their privacy. I love that. As we wrap, to close it out: is there anything you're thinking about over the next year or so that could be a real game changer, something you've got your eye on? This is one I'm always curious about, because you see things through a different lens; you're dealing with things under the covers that most of us don't even understand.
Sid Dutta: Yeah, just to underscore the point I was making: at the end of the day, AI doesn't create value from models; it creates value from data. Whoever can use data safely, in real time, will win. And for that, we'll have to pivot from perimeter-based controls, from building bigger walls around the protect surfaces, to focusing on the data itself. And focusing on the data means not just its security but its hygiene as well. So I think a lot more focus will land on the data rather than on the systems storing it, their native controls, and the networks those systems sit in. At the end of the day, whoever is doing a better job, whoever is winning, or in a lot of cases simply succeeding, in every sector, will be the ones with the most cleansed data, using AI effectively by maximizing how much of that data they can bring into those AI models, because they've handled the privacy and security aspects well and can confidently push far more than they typically can now. That's probably my closing thought. We tend to see these as two different things, but it all has to come together if the adoption and the value we want to generate out of AI are what our success measures and KPIs will be based on.
Chris Hutchins: Well, I didn't hear you say that an algorithm or AI is going to be the superstar, so I think that's heartening for a lot of folks, because there are a lot of fears out there. You're certainly making me feel a lot better, so I appreciate that.
Sid Dutta: Yeah. Yes, of course, there are too many debates and too many schools of thought on that. AGI and all of those are great topics. But my focus at this point is pretty grounded: fix the fundamentals, which have been broken for a long, long time. And you can't expect different results by doing the same things.
Chris Hutchins: Right. Yes. There's a term we have for people like that; I think I'll just leave it there, people know what it is. But Sid, thank you so much for coming on the show. This has been a fascinating conversation for me; I've learned a lot, and I'm excited for people to hear from you. I'm looking forward to continuing to follow your successes. I'm quite impressed with what I've seen so far, and we only met a few months ago. I really can't thank you enough. This has been fantastic. Thank you so much.
Sid Dutta: Thanks. Well, thank you. It was a great opportunity for me to come on your show and have this dialogue, so I really enjoyed the time, and I'm looking forward to more collaboration opportunities. And same thing: we'll have to see how this whole space plays out over the next few years.
Chris Hutchins: Thank you. Okay.