They explore how schema changes, inconsistent definitions (like “customer”), and weak governance can break both your analytics and your ML models, and what companies can do to get their data AI-ready, from metadata management to observability.
Collate is a semantic intelligence platform built on a semantic metadata graph for discovery, governance, and AI observability across your data ecosystem.
Connect with Harsha on LinkedIn.
Congrats to user buttonsrtoys, who won a Famous Question badge for their question “Possible to edit PDF without embedded font installed?”.
TRANSCRIPT
[Intro Music]
Ryan Donovan: Hello, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm Ryan Donovan, your host, and today we are talking about how, with AI, the rubber hits the road when it tries to dig into production data: it can't always handle all that structured data. So, we're gonna talk about what the issues and potential solutions are with the co-founder and CTO at Collate, Harsha Chintalapani. So, welcome to the show, Harsha.
Harsha Chintalapani: Thanks, man.
Ryan Donovan: Of course. So, before we get into it, we'd like to get to know our guest a little bit. So, tell us a little bit about how you got into software and technology.
Harsha Chintalapani: Yeah, so my career started back at Yahoo, all the way back in 2007, when I started working on search engines in product search: how do we rank, and everything else. So, that's how I started, and that's where I started exploring big data, how you index large amounts of data. And thankfully, Yahoo, at that time, had a research wing that was working on productionizing MapReduce, which today is Hadoop, and I got my hands on it to actually scale our indexing system for product search. So, that's how I got into data, and that's where my journey started in tech.
Ryan Donovan: We've been told one of AI's sort of very good use cases is with processing and understanding data, but definitely have heard some issues where the data was not processed correctly or not understood. And today, we're talking about specifically the issues with production data, structured real-time data. What are the issues around that?
Harsha Chintalapani: Yeah, I think I can go into a few examples of where we are coming from. So again, going back to the Yahoo days, you don't really think of real-time data itself becoming a big thing back in 2008. So, it started with Yahoo. Google had published MapReduce, and Yahoo open-sourced an implementation of it in the form of Hadoop. It helped web indexing and product search indexing scale to the huge amounts of data that we get across the web: how do we actually index it in real time, at scale, and put it into a search engine so that users can search through the app? So, from there, my co-founder, who is one of the core engineers behind Hadoop, went on to co-found Hortonworks, which open-sourced all of the technologies that we were building in Hadoop at Yahoo and made them a supportable, enterprise-ready, deployable solution. So, I got to meet Suresh at Hortonworks, and I became a committer and PMC member on Apache Kafka and Apache Spark, which deal with real-time data and indexing in distributed systems. So, our focus during the early stage of the big data movement was: how do we actually scale and deploy systems that can accept the data that organizations are storing, process it efficiently, and make it available for analytics, right? That's where your entire business intelligence comes from, your ML models, and everything else. So, during our journey, what we noticed in user and company behavior was, okay, the infrastructure is there, and everything else is great. And we see the movement towards cloud solutions themselves. What we noticed is that distributed systems, the complexity of processing, were becoming a solved problem, because you have amazing systems coming through the cloud providers across Amazon, Google, Azure, and whatnot. And we asked: if data is exploding and data can be processed efficiently, is that data being useful to the organizations? Is it actually moving the needle? Are they getting the right business decisions and everything else? So, at that point we thought, hey, what are the companies that are kind of [inaudible] of the challenges and solutions, in fact, right? With that, we both joined Uber to work on data. Like, how do you analyze all of this data efficiently and make decisions, help riders and drivers make interactions in real time, and all of that, right? So, looking from the outside in, Uber looked like an amazing place, because there's amazing engineering work going on there. We thought, hey, we're gonna learn a lot about how to operate data at scale, not just store and process it, but understand the data at scale. But when we went there, the problems were of a different scope. The problems were not so much in processing the data itself, but in understanding the data. Like, when the schema changes, what breaks downstream? If you ask about a business concept, such as location, depending on which team, which user, which engineer you ask, you get a different definition. So, location is such a core concept at Uber, and we were not able to understand it efficiently and universally. Then the data discovery problems, right? When data infrastructure became self-service, like we did at Uber, everyone wants to use it, because everyone wants to run some experiments and understand the data. So, if you have a trips table, there might be hundreds of different trips tables, named slightly differently.
Now, if someone is building a quarterly report and they wanna show that, 'in this quarter we made X amount of trips,' which table should they go to? This is a famous incident that happened: we underreported the number of trips taken because an analyst accidentally found a table that was not kept up to date. So, how do you analyze data that is fresh and complete? Then come the lineage problems, then come the GDPR problems. When the GDPR mandate came in, I think 200 of us manually classified all the data across the board. So, how do you manage the data at this scale? To your question: is this problem new to AI just because LLMs are coming into the picture? No, it's a problem that has existed. Now, it has become amplified that much more, because now you're just throwing an entire data ecosystem at the LLM, and it can't figure out what you can't as a human. So, that's the challenge that we're facing.
Ryan Donovan: It's interesting. Uber seems to have pioneered solutions to a lot of these problems in real time. We've talked to a bunch of other companies that have spun out of managing the massive service ecosystem there, and also the data there. So, talking about these problems, are these solely problems at Uber scale, or do these affect smaller companies, too?
Harsha Chintalapani: Yeah, so our initial reaction was, hey, this is probably very unique to Uber because of the amounts of data, the amount of people, and all the combinations that can come into play. But then we went and talked to our network; we had been building data infrastructure since the Yahoo days, and Hortonworks has tons of customers of varying capabilities, right? You have Fortune 500 companies all the way to mid-tier organizations. But beyond a certain, very small scale, data becomes that much harder a problem to maintain, because of your incentives. You wanna set data engineers free, in the sense that you don't wanna gatekeep them on how to use the data, because you wanna see how they analyze the data and how they come up with ideas, and there's the explosion of ML models too, right? You're feeding the data that is coming through into ML models to do A/B testing. You wanna analyze, for example, in the case of Uber, what the ETA is across restaurants, and everything in between. So, there are a lot of cool things that happen because of the data that is coming in. So, it became a problem even at a middle scale, or even a small scale. I would say, when a company starts as a startup, everyone is a data engineer to an extent. That's part of the problem, right? As soon as you start building a data team, or you start saying, 'hey, we have a product, we need to analyze the data that is coming in,' this becomes a problem. Even at Collate, we are a small company right now. We ask, hey, what stage is this customer at, and who is the customer? Are they happy with our product? All of these things; we're doing our own dogfooding in terms of data. The data compared to Uber is extremely small, right? We don't even want to compare to that. But there are inherent challenges even with small data that come down to understanding the data. What is a customer? What does 'customer' mean? What are the different measurements, the metrics, you wanna measure against? All of that, beyond, let's say, 10 people who can talk to each other, becomes that hard to maintain.
Ryan Donovan: A lot of companies split their production database from their analytics database. Is this something that comes from that, where you're taking trips or whatever and aggregating them as something else? Or is this a sort of standardization-of-names problem, or some secret third thing?
Harsha Chintalapani: Just to address the first point: so, you have your traditional databases where your applications are running against your users, your apps and APIs. The need for big data, the need for the data warehouse, came because you don't wanna run a long-running analysis on a transactional database. They're not meant for that, and by running that huge query, you may be impacting your user traffic. That negatively affects the applications and the APIs that you build. And secondly, when we build those transactional databases, they're not necessarily designed for time-series data, right? You wanna measure how many times Harsha comes to, let's say, Amazon. You can extrapolate that into cities, zip codes, regions, types of categories, a lot of dimensions, and everything else. On top of that, there are logs coming into your servers, and you want to analyze all of that. So, it became efficient to store that in a data warehouse across different dimensions. You get raw data, then you split that into dimension tables. Then you have fact tables, which speak to what exactly a user is doing at a given time. So, there are different modeling concepts that came into place that allow you to derive this business intelligence. That's the reason we moved analytics away from production databases, you know, MySQL and Postgres, to Snowflake, or Databricks, or similar data warehouses.
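To make the dimension-and-fact-table split concrete, here's a minimal star-schema sketch using SQLite; the tables, columns, and query are invented for illustration, not anything from Uber or Collate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: descriptive attributes you slice and dice by.
cur.execute("""
CREATE TABLE dim_user (
    user_id INTEGER PRIMARY KEY,
    city    TEXT,
    zip     TEXT
)""")

# Fact table: one row per event, keyed to its dimensions.
cur.execute("""
CREATE TABLE fact_visit (
    visit_id   INTEGER PRIMARY KEY,
    user_id    INTEGER REFERENCES dim_user(user_id),
    visited_at TEXT,   -- ISO timestamp
    category   TEXT
)""")

cur.executemany("INSERT INTO dim_user VALUES (?, ?, ?)",
                [(1, "Seattle", "98101"), (2, "Austin", "78701")])
cur.executemany("INSERT INTO fact_visit VALUES (?, ?, ?, ?)",
                [(10, 1, "2024-05-01T09:00:00", "electronics"),
                 (11, 1, "2024-05-02T10:30:00", "books"),
                 (12, 2, "2024-05-02T11:00:00", "electronics")])

# The analytical query runs against the warehouse-style schema,
# not the production database serving live traffic.
for row in cur.execute("""
    SELECT d.city, COUNT(*) AS visits
    FROM fact_visit f
    JOIN dim_user d USING (user_id)
    GROUP BY d.city
"""):
    print(row)
```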
Ryan Donovan: With AI, I've heard a promise that it doesn't need structure in the data. You could just dump a bunch of unstructured data on it, and it would figure it out. But I think what you're saying is that once it hits these real schemas, these real metrics, real joins, it falters a little bit. Why does AI falter when it comes to structured data?
Harsha Chintalapani: Yeah, so this is where we're talking about semantics, right? What exactly are semantics? Right now, 'context' is the word everybody's throwing around in AI, and it is an important thing. So, you and I agree on what annual recurring revenue is. That's great, but we didn't document it anywhere. We just spoke to each other. Now, from two people it became 20 people; now we need to go educate 20 people. From 20 to 100, it becomes that much harder. And now you throw LLMs into it. How do they know what Harsha and Ryan discussed? What exactly does 'ARR' mean? Now, it becomes a lot more of an organizational challenge. That's where we're talking about how it's a human problem, and it becomes that much more extrapolated with LLMs thrown into the mix. One of the examples I quote is, 'what is a customer?' Across an organization: if you talk to a marketing person, a customer might be anyone coming to a website, clicking, and spending some time. If you ask sales, they might say that anyone who has talked to us about buying our product is a potential customer. And if you come to product engineering, like us, they'll say anyone who is in production is a customer. So, the definition of a customer varies with the person you speak to and the domain they're in, right? That is one challenge. Then, how is customer 'health' maintained? What data are we collecting? What telemetry are we collecting? It's probably in different places. You have tickets coming in, you have product usage. You may have meetings with them to understand how they're using the product, or whatnot. So, all of those different signals now get into your data infrastructure through various channels. You've created all of these tables. But if I, as a product guy, come in and ask, 'hey, show me our customer health report,' how is an LLM able to find the table with the data, analyze it, and give you the response back? That is the challenge we're in right now. If you flip it around, this is a discovery problem in itself. Take away the LLM, and ask me how I find the customer data: I need to go search in different places; that is one problem. If I do find it, there might be a lot of duplicated data, so I need a good understanding of which is the right table to choose. If I do find the right table, I need to have access. If I do get access, I need to trust that the pipeline responsible for that particular table is actually functioning, that we have good data quality and data freshness, that I'm not looking at stale data from 90 days ago but actually getting the latest data. So, it's a human problem, and throwing in the LLM makes it as complicated as it can get, because the LLM doesn't have any context. It doesn't have the semantics either: how is a customer defined? What is a customer health metric? How do we calculate it, and what signals and scores do we associate with the different services they're interacting with? So, that becomes a semantics problem in the AI world.
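As a small illustration of why an undocumented term trips up both new hires and LLMs, here's a sketch of a tiny business glossary that could be rendered into a model's prompt; the terms and definitions are invented for the example.

```python
# A minimal, illustrative business glossary: each term gets one
# agreed definition per domain, instead of living in people's heads.
GLOSSARY = {
    "customer": {
        "marketing":   "Any visitor who clicks and spends time on the site.",
        "sales":       "Anyone who has talked to us about buying the product.",
        "engineering": "Any account running our product in production.",
    },
    "ARR": {
        "finance": "Annual recurring revenue: sum of active subscription "
                   "values normalized to a 12-month period.",
    },
}

def context_for(term: str) -> str:
    """Render every domain's definition of a term so an LLM prompt
    can disambiguate instead of guessing."""
    defs = GLOSSARY.get(term, {})
    if not defs:
        return f"No documented definition for '{term}'."
    return "\n".join(f"[{domain}] {term}: {text}"
                     for domain, text in defs.items())

print(context_for("customer"))
```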
Ryan Donovan: A lot of these semantics are, like you said, you and I talking and figuring out what a customer means for one of us, for product, or sales, or whatever. But this is about making that sort of tacit assumption explicit, and figuring out what you're talking about when you're talking about your business. How do you go about discovering that in a business? That seems like a huge problem outside of the data. It's just a business-person problem.
Harsha Chintalapani: Yeah. I think the goal for us is to start with the metadata itself, not the data, right? Because if you go into the data, that's a huge number of records to make sense of. So, we treat it as a first-order problem; this is exactly what we did at Uber as well. One of the examples we saw there: when an analyst joins Uber, it takes them almost three to four months to even start to become productive, because someone says, 'hey, go figure out what the top order items in Mexico City are.' And from there, they need to reverse engineer where the data comes from, and they go ask the Hadoop team, and the Hadoop team says, hey, [inaudible], Uber Eats is the owner of it. At Uber Eats, one engineer leaves, and they don't know the context of it. Finally, they find the table, but they don't know if the data quality is being kept up to date. They need to jump into a bunch of other tools to figure that out. Data access is a problem, and once they finally get there, the columns don't make any sense, because the data came from application data and nobody actually spent time to document it. That's the problem, and when you extrapolate it into the AI world, it becomes that much harder. So, when we started at Uber, it was, hey, how do we actually understand all the systems coming through? So, [inaudible]. So, we brought in all the metadata. Metadata itself is the first-order problem. At the time, before Open Metadata, at Uber there was no standard for capturing metadata. By that, we mean what the structure for a table's metadata should look like. It should have a name, it should have a service. It belongs to a database, it belongs to a schema. It must have ownership, because data without ownership is like code without ownership, right? It must have some level of data quality and observability signals, so that the user who's using it knows, 'hey, this data is actually old,' or, 'this is kept up to date.' So, we went and defined the metadata schemas and brought in automation to every part of it, to go scan and gather the metadata itself. Because if you ask people to do the scan manually, that never gets done. So, we automated the hard problem of getting the metadata. What does that do? It gives you the ability to understand: if I go and search for customer data, I can actually see that this customer data lives in this region, and it's owned by the customer support team, a certain part of the organization. Now, as an analyst, I can not only discover the data, I can go to the right person to ask questions about it. That itself solves one part of the problem, right? The same thing we brought into Open Metadata. We understood this problem well because this is the third iteration of the product that we're building: we built Apache Atlas, then I worked at Google, and now Open Metadata. So, we went and created schemas that capture every part of the data, including some parts of unstructured data as well, and how these things relate to each other. A customers table has 'address' classified as PII-sensitive. All of these are automatically created for you when you connect to your Snowflake, your Hadoop, your MySQL, your Postgres; all the way from the apps, where the data enters through a network request and lands in traditional databases. From there, you may be using Airflow or Kafka Connect to move the data into your data warehouse.
From there, to your [inaudible] for business intelligence dashboards. So, the end-to-end data movement is captured and addressed systematically in a schema-first way. And all of them have an RDF store, an associated RDF relationship, which is: 'customer table has ownership from accounting,' 'customer belongs to the finance domain,' 'customer has PII-sensitive data.' So, all of these are actually expressed in a knowledge graph. When you connect all of these systems, it's automatically doing all of these things. And while we were doing this, we said, 'humans should not be the ones making the data ready for AI. AI should actually make the data ready for itself.' That's where we brought in agents to document the data and to classify the data. So, once you scan all of these things, you have an end-to-end understanding of how the data flows, who owns what part of the data, what kind of schemas it has, and what a schema actually means, in a description or in documentation. Then, you have data quality tests automatically added for you based on the semantics we understand of how the data is being used, because we also look at all the query logs for how the data is used. So, essentially, it becomes a map for your data.
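As a hedged sketch of what a schema-first table entity and its knowledge-graph relationships might look like, here's a toy version in Python; the fields, tag names, and triples are simplified illustrations, not Open Metadata's actual schemas.

```python
from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    """The structural facts described above: name, service, database,
    schema, ownership, plus quality/observability signals."""
    name: str
    service: str
    database: str
    schema: str
    owner: str
    freshness_hours: float | None = None  # None = no signal collected yet
    tags: list[str] = field(default_factory=list)

customers = TableMetadata(
    name="customers", service="snowflake", database="prod",
    schema="finance", owner="team:accounting",
    freshness_hours=6.0, tags=["PII.Sensitive"],
)

# The same facts expressed as subject-predicate-object relationships,
# the shape an RDF store / knowledge graph would hold:
TRIPLES = [
    ("table:customers", "ownedBy",  "team:accounting"),
    ("table:customers", "inDomain", "domain:finance"),
    ("table:customers", "hasTag",   "tag:PII.Sensitive"),
]

print(customers.owner)   # team:accounting
print(TRIPLES[0])        # ('table:customers', 'ownedBy', 'team:accounting')
```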
Ryan Donovan: It sounds like you're tying the data to the larger org and the org's processes, right? Which, I know, is a complicated thing to figure out: who owns what. Why is that so complicated? Why do engineering systems and processes get so complicated that you don't know who owns the data?
Harsha Chintalapani: So, that's a great point, right? This is what we call a cultural problem. As an industry, when we put a service into production, we've understood how to do it well, right? Data is lagging behind. What I mean by that is: in a typical organization, you go through a design document for why we're building a service and what problems we're solving. And when we actually productionize it, we bring in all the necessary parties, stakeholders, DevOps, and whatnot, saying, 'hey, this product is going into production. This affects these other services. These are the SLA guarantees. These are our runbooks. If something goes wrong, here is the on-call rotation that will service the call.' Whereas when data comes in, we don't do any of that, right? Everyone is doing self-service. They're saying, 'oh, I'm gonna create customer_xyz because I want to run some experiments, and I don't wanna touch the main customer table.' But a user who's oblivious to all of this, a business user, for example, says, 'I'm gonna look for customer data,' and customer_xyz pops up first. 'Oh, it makes sense. It looks like customer data, and I see the schema. I'm gonna start using it.' Whereas it's one person's experimental table, right? So, we do not have the same cultural rigor towards data that we have in engineering. This is one of the aspects we brought from our engineering culture back into data. What we do when bringing this into data is walk backwards from business analytics. Take the most important dashboards that all the business users are using; you can identify them based on usage, and you can get anecdotal evidence from people. From there, you walk up the chain, which is the lineage, right? That's where, when we bring in the metadata and do column-level lineage, we walk up the chain saying, 'this dashboard depends on this data. If this is the most important dashboard, then this data must be important.' Then we apply what we call 'tiering,' the same tiering concept services have, right? An example would be Kafka at Uber being tier zero, because if Kafka goes down, everything stalls, all our applications and everything. So, people like me, as a Kafka expert, will jump in and go fix it. But we don't have the same implications for the data, and we wanna bring that. So, if this is a tier-one report that we're sending to external parties, or a tier-one dashboard that every exec could be looking at in the morning, it must be maintained in a tier-one fashion. Which means we walk up the chain of the lineage and mark everything as tier one, as in the sketch below. When you mark something as tier one, we bring certain requirements to it: for tier one, the data must be owned, it must be documented. It must have data quality tests. It must have observability tests. It must have an on-call rotation as well. So, that's exactly what you're talking about. We know how to do this in engineering, but not for data. It's more of a cultural thing, but we brought in some of the concepts that were successful from the engineering point of view into data.
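Here's a sketch of that walk-up-the-lineage idea: start from a critical dashboard and propagate its tier to everything upstream. The lineage edges and asset names are made up for illustration.

```python
from collections import deque

# Lineage as "asset -> upstream dependencies" edges (illustrative).
upstream = {
    "dashboard:exec_daily":  ["table:trips_summary"],
    "table:trips_summary":   ["table:trips_raw", "table:cities"],
    "table:trips_raw":       ["pipeline:ingest_trips"],
    "table:cities":          [],
    "pipeline:ingest_trips": [],
}

def propagate_tier(start: str, tier: str = "Tier1") -> dict[str, str]:
    """Walk the lineage upstream from a critical asset, marking every
    dependency with the same tier (BFS, so shared parents are visited once)."""
    tiers: dict[str, str] = {}
    queue = deque([start])
    while queue:
        asset = queue.popleft()
        if asset in tiers:
            continue
        tiers[asset] = tier
        queue.extend(upstream.get(asset, []))
    return tiers

marked = propagate_tier("dashboard:exec_daily")
for asset, tier in marked.items():
    # Tier-one assets then pick up the obligations listed above:
    # ownership, documentation, quality tests, observability, on-call.
    print(asset, "->", tier)
```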
Ryan Donovan: Yeah. It does seem like in the last few years, I don't know if it's because of AI or maybe in parallel, that data engineering has started to get the same kind of tooling, the same kind of processes, the same kind of treatment as the actual engineering. How did we survive without it?
Harsha Chintalapani: That's the reason for the firefights that we used to have. An example would be tips: how do we process tips? Because when you get off your ride, you don't get an immediate notification. Probably three hours later, or a day later, it'll pop up saying, 'how was your ride?' Users tend to add the tip hours or days later for the driver. That means, as a pipeline, we need to process what the trip cost so that drivers get payments, but tips get processed as they come in, right? It may not be in the exact pay period, but they'll get approved into the next pay period or whatnot. And the pipeline responsible for that failed, and nobody noticed. That's because the pipeline didn't have any ownership. Nobody properly understood that this was a most important pipeline affecting an extremely important part of the product, which is the driver. And what is the outcome of that? The outcome is bad press for Uber: 'you guys are not processing the payments. Something bad is going on.' But in fact, a pipeline went down. So, six months later, someone notices, 'oh, this is a huge accounting error.' Then you rush backwards and do all of these things. To your question, the answer is backfills, right? The data processing had happened, but the [inaudible] had already happened. Now, think of this for an ML model. We spent huge amounts of money, I don't wanna quote the number, but it's a huge number, where we targeted the wrong demographic for ads and incentives, because the data came in wrong. The model says, 'hey, this demographic, if you advertise with coupons, they tend to buy more Uber Eats, more Uber rides,' or whatnot, right? So, those are the implications of not doing the data right.
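As an illustration of the kind of safeguard that would have caught that silent pipeline failure, here's a minimal freshness-check sketch; the table name, timestamp column, and alerting threshold are assumptions for the example.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def check_freshness(conn: sqlite3.Connection, table: str,
                    ts_column: str, max_age: timedelta) -> bool:
    """Return True if the newest row in `table` is recent enough.
    A failed upstream pipeline shows up here as stale data."""
    latest = conn.execute(
        f"SELECT MAX({ts_column}) FROM {table}").fetchone()[0]
    if latest is None:
        return False  # empty table: definitely not fresh
    age = datetime.now(timezone.utc) - datetime.fromisoformat(latest)
    return age <= max_age

# Hypothetical usage: page the owning team instead of noticing six months later.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tips (paid_at TEXT)")
conn.execute("INSERT INTO tips VALUES (?)",
             (datetime.now(timezone.utc).isoformat(),))

if not check_freshness(conn, "tips", "paid_at", timedelta(hours=6)):
    print("ALERT: tips pipeline looks stale, notify the on-call owner")
else:
    print("tips table is fresh")
```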
Ryan Donovan: It sounds like back in the day we backfilled whenever mistakes happened, and I assume that a lot of people are restructuring their data systems to make them more understandable. When do you think is the right point in a company's lifecycle to start thinking about the metadata and start thinking about the connections?
Harsha Chintalapani: As soon as you start a data team itself, where you've said, 'okay, there is value in our data. That means we're gonna invest and create a data team.' I've observed two shapes this takes: a centralized data team, or a decentralized data team that sits within each department. As the team scales, there are different routes they take, but either way, once you start doing that, it means as a business you've said to yourself, hey, there's definitely value in data, we need to start investing in it through people. Just putting people on it without safeguards is what gets you into all the kinds of trouble we just talked about: the dashboards not giving you the right things, the ML model not working, the [inaudible] not being correct. It's not, 'only once I have hundreds of columns and tables and petabytes of data does governance become a problem.' The reason this happens backwards is that we always rush towards the most important problem right now, or what we think is the most important problem. Sometimes I'm guilty of not writing enough tests because I wanna fix a bug quickly, but that's gonna backfire on you, if not in a day, then a month or two from now. That's essentially the culture we ran into: 'oh, let's get engineers, let's get an insights person. We'll figure it all out on the go.' But you end up paying dearly for that, because by then your data grows exponentially, and it doesn't make sense to anyone what data you have. Then, you retract and think, oh, we need a governance solution. We need a metadata solution.
Ryan Donovan: So, let's assume we're talking to some company that didn't take your advice to start early. They went and chased product-market fit. They've got a product now, and now they're like, 'oh no, we gotta figure out the semantics of our data.' What can they start doing? What's the first, highest-value thing they can do? And then, how do they do the rest of it?
Harsha Chintalapani: We hit this at Uber when we were trying to build this; we built up a checklist, but this is more about the metadata platform that we're building. We looked at what was available in open source, and there weren't that many options. Nowadays, there are pretty good options that anyone, small, medium, or large, can start with. Why is that necessary? You have automated connectors that can pull metadata as it changes. You don't need to do anything: just connect, put in the credentials, and it's up and running. That at least gives you an understanding of what data you have. We have a lot of companies coming into the open-source community channels right now saying, 'oh, we were able to tell we have 16,000 dashboards.' My question is, how did you get to 16,000 dashboards? That's crazy.
Ryan Donovan: How do you read 16,000 dashboards?
Harsha Chintalapani: Exactly. Who created them? And that's an amazing thing: they were able to understand the usage patterns expressed in Open Metadata. How many are stale? How many are not being used anymore? And they were able to clear out some of the data they no longer need. So, I would start with that. Then, the second important thing is, how do you get to the semantics? Understand what your metrics are, what your business concepts are. That's where the glossaries and the metrics catalog that we have within Open Metadata help you collaborate across your organization to get an understanding of it. The benefit of that is, with AI coming into play, AI can now connect through MCP. We expose the semantics, all the relations, and the knowledge graph right in open source; you don't need to go to any other solutions there. Then text-to-SQL, just asking questions, becomes the most important thing that comes after, which completely makes sense. The traditional process is that business users want to find certain metrics, and they have to come sit with a data engineer. Data engineers need to create a pipeline, create a table. Then a business analyst comes in and creates a dashboard, and everything. That whole process is going away, or being squeezed short, because of AI. But AI should know the exact context and semantics of the data you're using. What is the right data? So, when you say, 'hey, find me customer ARR for the last week,' it should know how customer ARR is defined. It'll look at the metrics catalog, if you've defined it, and understand who the customer is from the glossary. And from there, through those relations, it'll find the right tables. Even if you have huge amounts of data, it'll find the right table to get the right data for you. Open Metadata is not just a metadata platform. We brought in discovery, governance, data quality, and observability, right in open source, so that you can actually deploy data quality at scale and get the signals into Open Metadata. One of the interesting things, even for us when we do our own dogfooding with this platform: it'll tell you that the customer table, this last week, had 20% of its rows missing. That is most important to you as a user. If I'm taking this report and saying, 'oh, this is great, I'm gonna take this to my CEO, my C-suite, whoever wants to hear this,' it needs to come with a certain level of trust in how the data is looking. So, that is what I would recommend, because you want this 360-degree view in one platform, rather than 10 different tools you have to mash together.
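To sketch how a metrics catalog keeps text-to-SQL honest, here's a toy example of resolving a metric definition before generating a query; the catalog format, table names, and SQL are all invented for illustration, not Open Metadata's MCP interface.

```python
# Toy metrics catalog: each metric points at its source table,
# the expression that computes it, and the glossary term it uses.
METRICS = {
    "customer_arr": {
        "table": "finance.subscriptions",
        "expression": "SUM(monthly_value * 12)",
        "entity": "customer",      # resolved against the glossary
        "time_column": "updated_at",
    },
}

def build_query(metric: str, days: int) -> str:
    """Resolve a metric definition from the catalog and emit SQL,
    instead of letting the model guess tables and formulas."""
    m = METRICS[metric]
    return (
        f"SELECT {m['expression']} AS {metric}\n"
        f"FROM {m['table']}\n"
        f"WHERE {m['time_column']} >= CURRENT_DATE - INTERVAL '{days} days'"
    )

print(build_query("customer_arr", days=7))
# The generated SQL is then delegated to the warehouse (Snowflake,
# Databricks, etc.), which already knows how to optimize and run it.
```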
Ryan Donovan: So, how is what you're doing with Open Metadata in general, and the semantic understanding of data, evolving?
Harsha Chintalapani: There are [inaudible] doing everything from spreadsheet management all the way to proper tooling, a metadata platform, with metadata becoming the foundational principle behind how the AI can reason, right? That's the relationship between the metadata and the data itself. To take an example, there was the Semantic Web movement that came in, if I remember, from the late nineties to the early 2000s, and it was a great idea. Implementation became that much harder because everyone had to adopt the Semantic Web's approach: 'Harsha has an address, the address has this tagging.' Forcing the entire web to adapt to that was a losing battle, to an extent. But now, the RDF knowledge graph is coming back with a vengeance, because that's where your relations are stored, that's where your understanding of the data is stored. It starts with the metadata: what does an account mean? Now, an account has these users. A user means this. All of the relations can be driven through this RDF, which helps LLMs reason. And again, if you try to apply this to the entirety of your data, the actual rows and columns, it's gonna be, again, a losing battle. That's where you take the middle ground: 'I'm gonna reason with the metadata, I'm gonna create relations with the metadata.' Once you have that, you can actually give the metadata to the LLMs so that they can understand customer ARR, and how to create a query for customer ARR. Then, you delegate that query to, obviously, the Snowflakes of the world, who are miles ahead in terms of how to optimize queries and how to run queries. So, you're not reinventing a wheel that is already solved. What you're trying to do is get to the right query for the right data, based on the platform you're on. That's where metadata is becoming that much more important for the teams and organizations who are trying to ramp up on AI right now.
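As a rough illustration of that middle ground, here's a toy example of RDF-style triples over metadata only, with a couple of hops of reasoning to find the right table; the entities and predicates are invented for the sketch, not Collate's actual graph.

```python
# Illustrative RDF-style triples: relations over metadata, not over
# the raw rows and columns themselves.
TRIPLES = [
    ("metric:customer_arr",         "definedOnTerm", "term:customer"),
    ("metric:customer_arr",         "computedFrom",  "table:finance.subscriptions"),
    ("term:customer",               "inDomain",      "domain:finance"),
    ("table:finance.subscriptions", "ownedBy",       "team:accounting"),
]

def follow(subject: str, predicate: str) -> list[str]:
    """Follow one edge type out of an entity in the graph."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

# Multi-hop reasoning an LLM (or agent) could do over the graph:
# which table answers a question about customer ARR, and who owns it?
for table in follow("metric:customer_arr", "computedFrom"):
    print(table, "owned by", follow(table, "ownedBy"))
```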
Ryan Donovan: It is that time of the show where we shout out somebody who dropped by Stack Overflow, shared some knowledge, shared some curiosity, and earned themselves a badge. So, today we're shouting out a Famous Question badge winner, somebody who dropped a question that got 10,000 views. So, congrats to @buttonsrtoys for asking, 'Possible to edit PDF without embedded font installed?' If you're curious about that, we'll have the answer for you in the show notes. I'm Ryan Donovan. I edit the blog and host the podcast here at Stack Overflow. If you have questions, concerns, comments, or topics to cover, please email me at podcast@stackoverflow.com, and if you want to reach out to me directly, you can find me on LinkedIn.
Harsha Chintalapani: Hi, I'm Harsha Chintalapani, co-founder and CTO of Collate, and co-creator of Open Metadata, an open-source metadata platform. If you want to reach out to us through the open-source channel or at Collate, you can touch base with us at open-metadata.org, or getcollate.io for Collate. Thank you so much for having me.
Ryan Donovan: Thank you for listening, everyone, and we'll talk to you next time.
