Transcript: CSPS Data Demo Week: Using Automation to Find and Manage Information
[The CSPS logo appears on screen.]
[John Medcof appears in video chat panel.]
John Medcof, Canada School of Public Service: Good day and a virtual welcome to the Canada School of Public Service. My name is John Medcof, lead faculty here at the school. And I'm happy to be with you for today's event as a part of Data Demo Week.
Before we begin, I would like to acknowledge that the land from which I'm joining you is the unceded territory of the Algonquin Anishinaabe people. And I know that many of you are likely joining us today from other parts of the country, and I encourage you to take a moment to recognize and acknowledge the traditional Indigenous territory from which you are participating. Thank you.
A few things to share in terms of logistics. So, today's event is going to be in English, but with simultaneous interpretation available. We also welcome questions from participants as part of the demonstration. So, if you would like to ask our panelists something, you can use the raise-hand function at the side of your screen. Finally, we recommend that you log off your VPN to help you experience the event to the fullest. If you do happen to experience any technical issues, we recommend you just relaunch the webcast link that we provided to you. But with that, let's jump into today's event.
Maybe I'd start by saying that leading technologies that draw on the power of data, artificial intelligence, and machine learning can help the public service innovate in providing the best possible services, support, and leadership to Canadians. And as these technologies process numbers, summarize data, and manage digital transactions, I think they can provide new opportunities and avenues for us as public servants to focus on the strategic and creative elements of our work in new and potentially fruitful ways. And so, throughout Data Demo Week, we're exploring how organizations across the Government of Canada are using the power of data, artificial intelligence, and machine learning to help the public service innovate and leverage these technologies moving forward. And I should also acknowledge that this learning series is presented in collaboration with Innovation, Science and Economic Development Canada, as well as with our colleagues in the GC Data Community. So, big thanks to both of them.
[Three panelists join the chat.]
And with that, I'm pleased to welcome our three guests today. Our first is Sonya Read, who is the acting assistant secretary of Digital Policy and Services with the Office of the Chief Information Officer at the Treasury Board of Canada Secretariat. Sonya previously held the position of executive director of digital policy, where she led policy initiatives to support the digital transformation of government. Welcome, Sonya. It's great to see you.
I'm also pleased to introduce Jason Cassidy, who's the founder and chief executive officer of Shinydocs Corporation. Jason has been solving enterprise technology challenges for over 20 years as a new product developer and entrepreneur. His award-winning solutions are helping transform the digital landscape from traditional ECM to modern content services.
And joining him is Norm Friend, who's a solutions consultant at Shinydocs. Norm has worked in the enterprise information management space, also for over 20 years, specializing in implementing records and information management solutions in organizations across many industries. Thank you so much, Jason and Norm, for joining us here today. And welcome to you both.
Jason Cassidy: Yeah. It's great to be here. Thanks a lot, John.
Norm Friend: Thank you, John.
John Medcof: Great. So, in terms of the rollout of the event, I'm first going to invite Sonya to provide some opening remarks and reflections about the information management landscape in the Government of Canada, from the perspective of the OCIO. We're then going to have Jason and Norm provide an overview and a demonstration of the Shinydocs application as an example of some of the leading work being done in the area of using automation to find and manage information. Very excited about that. Following that, we're going to do a Q&A period where we'll have a chance to discuss the demonstration with them. And with that, Sonya, over to you.
Sonya Read, Treasury Board of Canada Secretariat: Thanks so much, John and everybody, for joining us today. I am super excited to be here today as part of this discussion, as part of the Canada School of Public Service Data Demo Week. I am very happy to be here today to talk about the importance of managing data and information. And as I'm joining you from my home, I'm hoping desperately no one rings the doorbell and my dog doesn't start freaking out. So, we'll see how that goes for the next few minutes.
Over the past few years, as the ADM responsible for both information management and access to information policy in the sector of Treasury Board that is also responsible for leading digital transformation, the application of automated tools and systems to support good information management is something I have become keenly interested in. And the importance of those tools is something I am keenly aware of. Information and data within the Government of Canada are strategic assets that we must leverage. We use them to make our policy and operational decisions. We need them to understand how our policies and programs work. We need them to deliver better services and just to do better generally.
If we aren't managing our information and data properly, these things can be really difficult to do efficiently. If you can't find something, you can't use it. And if you can find it, but it's not fit for purpose, you still can't use it. Or at least you can't use it very easily. The Policy on Service and Digital, which came into effect last year, sets out objectives and requirements so that information and data are managed as strategic assets throughout their entire life cycle, so that they are increasingly interoperable to enable ongoing use, so that we can support our Government of Canada priorities of openness and transparency, and so that privacy and security are fully built into how we manage our information and data.
This includes a series of responsibilities for senior officials such as the chief information officer. We have to identify what our information and data of business value are within our organization. And we have to know how long we need to retain them before we eventually dispose of them. But it also includes obligations for managers and employees, so everybody who's on this call today. We have to understand our responsibilities for information management, to document our decisions, and to make sure that information is placed in the repositories where we, or others, can find it later. The amount of information and data is increasing exponentially on a daily basis.
Our ability to manage these assets manually does not work, especially as we've been in a remote work environment for an extended period of time. At the same time, information and data are more important than ever as strategic organizational assets. And not just for your individual organization, but for the Government of Canada as a whole.
It's critical for achieving digital transformation, service improvement, and supporting transparency through access to information and other open government initiatives. The expansion of disruptive technologies like artificial intelligence, machine learning, and cloud computing is further challenging the old paradigms and the ways that we've been doing things for many years. CIOs are responsible for ensuring that digital systems are the preferred means of creating, capturing, and managing information. But information management must continue to reflect the requirements of the digital age.
We need to increasingly leverage digital tools and technologies to automate and digitize manual IM processes as much as possible so that we can better understand our information and derive consistent value from it. We need to use these tools so that the average employee does not have to be an expert on information management.
So, the average person who is creating a document should not have to make expert decisions about what metadata they need to apply to that information and how that information is going to be disposed of at the end of its life cycle. We have to figure out how to make the right thing to do the easy thing to do for our employees. And we have to take the guesswork out of it. While also making it easier for the IM specialists and the other information and data experts to do their jobs, whether that's managing or using the information and data that we have. Information is one of our most valuable assets. It must be maintained and managed so that it can be located and used by those who need it.
And this is where the potential truly lies for information management, to transform this asset into a business enabler, as the applications for information and the opportunities for its reuse are literally boundless within the Government of Canada. By harnessing the power of automation and artificial intelligence, and removing the reliance on end users to manage information, IM can demonstrate incredible business value to organizations and across the Government of Canada.
So, this is a potential that we really, really need to tap into, not just so that we can find it when we're responding to an access to information request, but so that we can find it when someone asks us to produce a note that may have already been produced four or five times within the life cycle of a department. Modern digital tools and technologies will help us get there.
To this end, at TBS we are reviewing, updating, and simplifying our own IM policy instruments. We want to ensure that we have the right rules in place for managing information and data in the digital age, and that we're enabling the right tools to do this, tools and systems that will support employees in quickly and easily finding the information and data they need, when they need it.
So, I want to thank the CSPS for organizing the data demo events this week. Throughout all of this week, we have been seeing demonstrations of different tools to support data and information management. I hope the demos this week are helping to inspire practitioners across the GC about the art of the possible, so that we can continue to make the most of our critical information and data holdings. Thank you very much.
John Medcof: Thank you so much, Sonya. I think that gives us a great frame for today's demonstration. And I think you've really set out the challenges and the opportunities in the information management space. So, thank you so much for being here with us. With that, I'm going to turn the floor over to Jason and Norm to walk us through the demonstration.
[A screen is shared, showing a slide reading "Shinydocs. Using Automation to Find and Manage Information."]
Jason Cassidy: That's great. Thanks a lot, John. And especially thanks, Sonya. Sonya, I wish that we could have you do the intro to every one of our presentations, because it sets the tone for the importance of information governance. And I like that you touched on the fact that we have technology now, so it shouldn't be the individual public servant's problem to do effective information management. We do have tools, and we're excited to show you some of those tools and the utilization that we've been doing with them.
[As Jason speaks, slides summarize his words.]
So, I want to make this really relevant to a public servant so that you understand why you should care and how this should affect you. The important information, as Sonya was saying, needs to be recalled, probably by people who aren't you in the future to do decision making, comply with laws, ensure safety. There's all sorts of really great reasons why we need to have information management and information governance.
The issue that we come across, though, especially now with more work-at-home and hybrid environments, is that information takes so many different forms. So, I'm going to define it today. We're not going to worry about the stuff that's in really formal databases and already organized data. I want to think more about the ad-hoc communications that you have: text, email, formal documents, informal documents, stuff that you might just whip together or stuff that you might use from a template, rich media. Like, this presentation itself is going to have transcripts, and there's going to be value in here that might need to be recalled in the future. Or tabular data that might be embedded right in a document. So, we use open government data for a lot of our data sources, and quite often, right in that PDF document or that Excel file, there's all sorts of important tabular data, which tells more of a story than it acts as a database.
So, we need this information. We need to be able to recover it. And we need to be able to use it as part of our future decision making, instead of just as exhaust that comes out of the back end of a business process. And the complexity associated with just saying to somebody, "put it in a different system so that we can do information management" only really works if it's not actually hampering the ecosystem that it's already part of.
So, there are people across Canada who are draughtspeople who work in CAD drawings and those types of complicated tools. You can't just tell them to move to OneDrive. They need their tool set in order to do the work that's important for them to do. There are people who have linked Excel spreadsheets and these types of things that might break. I know from talking to the CRA, they might have as many as 10,000 of these documents with macros that depend upon the circumstance that they're already in.
We don't want to break that ecosystem just for the benefit of recallability or information governance in the future. We have to complement that ecosystem, yet still comply with the laws and the requirements that we have. So, these are the kind of higher-level goals that we're going to be talking about today as we get into the technology.
So, we're going to show you how massive organizations are already doing this. I'll give an example of Bruce Power, who we're in year three now doing this type of technology with. When we first talked to them, they wanted to be in a situation where the only limiter that they had was the number of people that they hired. But that wasn't the limiter that they had. The limiter was this: whenever something needed to be done at this very safe nuclear facility that's one of the largest on the planet, with 4,000 employees, 6,000 contractors, and then 10,000 people in their supply chain network, they needed to research before they could press that work order button.
[A new slide reads "During today's data demo, Shinydocs will show you how massive organizations are rethinking data, and what public servants can expect in the future from the data they create every day."]
And often the research might be 40 or 60 hours of very important people's time to gather up the right information. So, we came to Bruce Power saying, why don't you pre-calculate all that so we can bring you back the right information in seconds? So, you can say, for any work order, for any procurement document, for any drawing, you want to make sure that you have the right control information so that people are using the latest information and making sure that their employees are safe.
These are the types of things that the Government of Canada needs to do as well. You don't want to have to search for 40 hours only to say, I hope I have the right information. Wouldn't it be neat if we used technology to pre-calculate and find a lot of that information, so that you get more of a Best Buy shopping experience or a Google-like experience for all of your information, rather than having to know where it is and then go actively find it? And even worse, being forced to do that metadata work up front just so that somebody later on down the line can find it. So, let's make it real. How does this journey happen?
[A new slide reads "How does this automation journey happen? Act 1: Discover your business potential. Act 2: Enrich and connect to your business. Act 3: Act with confidence."]
How does somebody go from a little bit of fog, maybe a little bit of ignorance about where all their data is and what all their data is, to confidence and certainty in being able to act, like Bruce Power and our other customers?
Well, it starts off with what we call the hero's journey. There's an act one, there's an act two, and there's an act three. And Norm Friend here, a solutions consultant with an amazing background in this technology and more, is going to be doing some demonstration for you. But think about this just like the hero's journey: you need to discover who you are and what you're capable of before you can act with confidence.
And in act one, we'll show how to discover the business potential, how to unlock the things that are easily unlocked in the data. In act two, we're going to show how it gets enriched and connected to your business in an automated way. Because, quite frankly, you have billions and billions of documents. No humans are going to go through that between now and infinity and then align it with your business. You need machines to do that. And we'll show you how those machines do it.
And act three is all about you. It is now that you act with confidence. Maybe you want to move all of your data into OneDrive and M365, or your records into GCdocs. The thing that's holding you back from that is not the idea that migration itself is hard. It's that you can't even get started, because you don't know what information should go where and how to do the metadata. Or maybe those workloads need to stay on shared drives, and those workloads need to stay in email. You can't make those decisions without going through a good act one and act two. And you never started act one and act two because you never knew how, and you never had the technology. And we're changing that today.
So, let's have a look at the hero's journey, the first piece, which is act one. In act one, as Norm's going to show you right now, we're going to find redundant, obsolete, trivial information. We're going to find personally identifiable information and align it with certain obvious business assets. And then we're going to talk about what's possible. Making sure that you realize there are going to be two parts to this. There's the cool, behind-the-scenes, complicated "here's all of your data" with visualizations in it. Don't worry about that as a public servant; just know that there's a magic trick happening, using AI and other things, that Norm's going to show you.
And then separately, there's a cool interface. He's going to show you a very Google-like way of just ploughing through and finding the right piece of information at the right time, in seconds instead of hours. So, with that, I'm going to stop sharing and let's go over to Norm.
[Jason's screenshare ends. Norm shares his screen, showing three file finders, two of which are in internet tabs, and one is native to the computer.]
Norm Friend: Thanks, Jason. So, let's start with discovery. This is a typical customer that we work with, where we will have a number of repositories. So, at the top, you can see we have some file shares. On the right, you see we have GCdocs. And on the bottom, we could have SharePoint Online. All of these are different repositories storing information. Exchange is another one. We have a whole bunch of these.
And so, what we want to do, as we said, is take inventory and look at creating a platform where we can start doing some discovery. So, the first thing is to correlate this into a single area.
[He shows complex pie charts correlated with bar graphs.]
So, by doing that, we end up getting a visualization like this, where I can now view all of my content, again, across all of those different repositories, in a single space. And I can start getting some basic information about this particular set of information or my repository.
And I can filter. I can say, well, I only want to look at what's in SharePoint Online. But the idea is I now have a good sense of where the information is coming from and what kind of information is there. Now, as part of this first discovery phase, I also want to do some cleansing, right? What we call ROT, which is redundant, obsolete, and trivial. And so, again, we run a number of rules and processes over top of our content.
And then we can start looking at, all right, well, what kind of information do I have that's redundant? So, these are all my duplicate copies of information. Maybe there are email copies sitting out there. Maybe I have files that have been unused for seven years. There might be trivial files, like files that have no extension, zero-byte files. Just a lot of that noise within our organizations. And you would be surprised at how much information falls into this ROT category. In many of our organizations, we can see up to 40 to 50% of the content within those repositories made up of redundant, obsolete, and trivial information.
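[The ROT rules Norm describes can be illustrated with a minimal sketch. This is not Shinydocs' actual implementation, just a hypothetical Python pass over a file share that applies the three rules mentioned: redundant (duplicate content), obsolete (untouched for roughly seven years), and trivial (zero-byte or extensionless files). The function name and seven-year cutoff are illustrative assumptions.]

```python
import hashlib
import os
import time

SEVEN_YEARS = 7 * 365 * 24 * 3600  # rough "obsolete" cutoff, in seconds

def classify_rot(root):
    """Label each file under root as redundant, obsolete, trivial, or keep."""
    seen_hashes = {}
    labels = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Trivial: zero-byte files, or files with no extension.
            if os.path.getsize(path) == 0 or not os.path.splitext(name)[1]:
                labels[path] = "trivial"
                continue
            # Obsolete: not modified in roughly seven years.
            if time.time() - os.path.getmtime(path) > SEVEN_YEARS:
                labels[path] = "obsolete"
                continue
            # Redundant: identical content already seen elsewhere.
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest in seen_hashes:
                labels[path] = "redundant"
            else:
                seen_hashes[digest] = path
                labels[path] = "keep"
    return labels
```

[A real crawler would hash large files incrementally and run these rules across every repository, not just one file share, but the classification logic is the same idea.]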
The next stage, and this is again part of that discovery, is to start applying some general information that we would want across the enterprise. Breaking things down to, like, who owns this information, right? So, is this owned by sales, or is this owned by engineering? Or is it owned by a different group?
I also want to pull out general entities. What if I could grab every single date out of a document, every single piece of currency? In this case, everybody's name in an actual document. And so, by doing this, again, across all of those repositories, we now have a very good view of what we have in these repositories.
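[The entity extraction Norm mentions, pulling every date and currency amount out of a document, can be sketched with simple pattern matching. This is an illustrative assumption, not the product's method; real systems use trained entity recognizers, especially for people's names, which plain patterns handle poorly. The function name and the two date formats covered are hypothetical.]

```python
import re

# Toy patterns for two common date formats and dollar amounts.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{4}\b")
MONEY_RE = re.compile(r"\$\s?\d[\d,]*(?:\.\d{2})?")

def extract_entities(text):
    """Pull dates and currency amounts out of raw document text."""
    return {
        "dates": DATE_RE.findall(text),
        "money": MONEY_RE.findall(text),
    }
```

[Run over every document in every repository, results like these become searchable metadata, which is what makes the faceted search shown later possible.]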
And by building these indices, now we can start looking at some of the outcomes that we get. So, I'm going to build on these as we go through the enrichment and we go through the action, the different acts. But in this case, if I look here, I know that I've got a bunch of information by a person named Josie. Okay? So, I want to actually look up Josie's name.
[In a Shinydocs tab, Norm types "Josie" into a search bar. A results page lists files, shows a preview and presents filter options in a sidebar.]
And so, I can say, all right, I want to find anything that contains her name. And again, I can choose: maybe I just want GCdocs, maybe there's a specific repository, or maybe I want to find this across my entire enterprise.
And so, now what happens is I have a very, very easy-to-use, Google-like search experience. We're going to get into things like some of the facets that you can use. But now I can clearly see the documents that have that particular piece of information now that I've enriched. And again, it doesn't matter where this information sits. It will bring back the information within all of those repositories.
So, that sets the stage of that first act of getting that landscape, that discovery, the inventory, and getting it prepared for the next act, which is the understanding.
[The screenshare toggles back to Jason's slideshow. It summarizes his key points.]
Jason Cassidy: Thank you, Norm. And so, to recap there, what you saw was the initial enrichment steps. You saw how we understand the data, find the duplication, find people's names, find dates, find money, find all of the obvious things that are easy to do for an organization. And that is step one.
So, step two now is when we really start connecting it to your business. This requires some data stewardship, or some people that know your business really well, but it is automated. This is 10,000 documents at a time, or hundreds of thousands of documents at a time. So, what Norm's going to show you is how to use the intelligence that's already baked into your organization in order to enrich the data. Because, as you saw, the last thing he showed was this cool, easy search interface.
How do you get to easy while you do the enrichment? You connect it to the data. You make it so that anything that you need to think of is already pre-calculated, and therefore appears to you as a checkbox or a slider, instead of you having to think about where the content is and go and find it. So, I'm going to stop sharing here and then let's get into the enrichment, Norm.
[Norm shares a form and a spreadsheet full of metadata.]
Norm Friend: For sure. Thanks. So, the first bullet point there is super important. And we use a term unified information model as a framework where we have information sitting in all of these applications that we have. And that information is related to a number of documents that we saw in those repositories. So, in this scenario, I want to pick on a purchase order.
So, on one side, I have an actual document; it's a PDF of a purchase order that was provided from, in this case, Adams Crooks and Low. And on the other side, I have my ERP system that has a whole bunch of information around the company name, the short name, the city they came from, the province, all of this structured information. And sure enough, I'm going to have information like the PO number and the vendor number.
That information is no different than the information we see in the actual document. So, part of this unified information model is starting to bring these concepts and this data entity into a single environment. And the result is something like this.
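[The unified information model Norm describes, joining structured ERP records to the unstructured documents that mention them, can be sketched as a simple match on PO numbers. The records, file paths, and function below are hypothetical stand-ins invented for illustration; a real pipeline would match many identifier types against a full-text index, not loop over raw strings.]

```python
# Hypothetical rows standing in for an ERP export.
erp_rows = [
    {"po_number": "PO-4711", "vendor": "Adams Crooks and Low", "city": "Ottawa"},
    {"po_number": "PO-4712", "vendor": "Auer and Walter", "city": "Toronto"},
]

# Hypothetical crawled documents with their extracted text.
documents = [
    {"path": "//share/finance/po_4711.pdf", "text": "Purchase Order PO-4711 ..."},
    {"path": "//share/finance/misc_note.docx", "text": "Meeting notes ..."},
]

def enrich_documents(docs, erp):
    """Tag each document with the ERP record whose PO number appears in its text."""
    by_po = {row["po_number"]: row for row in erp}
    for doc in docs:
        doc["erp"] = next(
            (row for po, row in by_po.items() if po in doc["text"]), None
        )
    return docs
```

[Once a document carries its ERP record, every structured field, vendor, city, amount, becomes a facet you can filter on, which is the payoff shown in the next search demo.]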
[Norm shows more pie charts and bar graphs.]
So, as we do some further enrichment, I can now look at documents that I've tagged with my, in this case, my purchase order information so I know which ones are small, medium, large ones, which ones have tax and which ones don't have tax. I could see my year-over-year growth based on all of those different purchase orders.
And I could even see these are the purchase orders that I have available. And so, I can see here's my [inaudible] and Crooks. I can drill down into each one of these. But what we've done, again, across all of those different repositories, is start doing some enrichment. And one of the things is, again, taking all of those line-of-business applications and bringing them over into our index, to be able to then do further analysis and further discovery. Now that I've done that, let's go back to Discovery search.
[Norm navigates to the Shinydocs Discovery search page and uses the search bar.]
So, now, I'm looking for... I'm in finance and I'm looking for, who do I want to search for? Auer and Walter. So, now I'm looking for a purchase order for this particular group. And again, maybe I want the exact phrase of that thing. Again, I can pick a particular area, but I just want to search anything that has Auer and Walter in it.
And so, the result set, again, comes back. But now that I've done additional enrichment, I can start doing things like, well, I know it's a purchase order. So, I don't care about some of those sales proposals that are out there that we worked on. I just want the purchase orders related to that.
[He filters results.]
Or, what if I'm looking for a purchase order total amount? So, now I want to find that purchase order where the total is between, say, 300K and, let's say, 500K or 600K. Because I've done that enrichment, now, through things like our search, you have the ability to leverage that just like you would on any other website, where I'm very quickly filtering my results to find that specific document I need.
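[The faceted filtering in this step, narrowing results by document type and a total-amount range, can be sketched in a few lines. This is a toy in-memory filter, not how the product's search index works; the field names and sample documents are invented for illustration.]

```python
def facet_filter(docs, doc_type=None, total_range=None):
    """Narrow a result set by document type and a total-amount range."""
    results = docs
    if doc_type is not None:
        results = [d for d in results if d.get("type") == doc_type]
    if total_range is not None:
        lo, hi = total_range
        results = [d for d in results if lo <= d.get("total", 0) <= hi]
    return results

# Hypothetical enriched search results.
sample_docs = [
    {"type": "purchase_order", "total": 450_000},
    {"type": "purchase_order", "total": 120_000},
    {"type": "sales_proposal", "total": 400_000},
]
```

[In a real deployment these facets would be range and term filters against a search index, but the user experience is the same: each pre-calculated field becomes a checkbox or slider.]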
And then further enrichment. What if I could do things like this? And we did this for a particular leasing company on the side here where I now want to look at all my lease agreements based on their geographical location.
[Norm shows maps with yellow and red dots on them.]
Or I want to take a look at my entire index, and I want to find where each document is geographically related to. Again, across all of those repositories, by simply taking that starting enrichment of act one, and then adding to it all of these different enrichments, depending again on those data domains that we're looking to enrich.
Jason Cassidy: That's great, Norm. And what we find with our customers is this: there's no one act three that rules them all. It is the idea that now you can search the information with the ease of a Google-like experience; all the hard work that was done on the public web, you've now got done inside of your organization and ecosystem. It's better than just searching for keywords, as you saw, because it knows geolocation information. It knows temporal information, like time-based information, how old something might be, and how deprecated it might be from an interest point of view.
And most importantly, it knows how to line up with your business assets. If you have case numbers, if you have, as Norm showed, purchase order numbers, if you have vendor IDs, people's names, staff names. It does the pre-calculating of all this information. So, now these become filters or facets for what you want to do. So, it allows you to really understand all of your unstructured data and make the right decisions.
For example, there's one thing that we talk about a lot when we talk with the Government of Canada, which is how to modernize simple things like data storage, how to start utilizing the tools that they've already purchased, like GCdocs and Microsoft 365 and all of the cool elements that come along with those. And the biggest limiter is our point three here: how do you actually move to something unless you know what you have in the first place?
So, this takes you from, in some cases, abject ignorance to complete confidence. It allows you to avoid the embarrassment of, perhaps a data leak, or a hack, or that type of thing where you don't even know what got lost because you didn't even know people were looking at it. And now you go to the point where every document on my file system, in my email, on GCdocs, on SharePoint, it doesn't matter where it is, we know exactly what it is. We can secure it the way it's supposed to be. Move it exactly to the place it's supposed to be.
And most importantly, we don't disrupt anybody's workflow, because we'll know for sure. If we go back to our friends, the draughtspeople who are working in CAD, we know that we're not going to move any of their content until they are ready to move. We know that we're not going to break any linked Excel spreadsheets, because we're not going to move them until we have a solution. In the old world that I used to work in, we'd move somebody's linked Excel into a web-based system like GCdocs, and then we'd have to do link fixing across 10,000 documents. It's almost like breaking your legs and then selling you a crutch. It's silly. Don't break anything. Act with confidence. And it's going to make your life easier as a public servant and make you best informed in order to make these decisions.
And I hope that, despite the fact we're not showing anybody how to do a save-as in Microsoft Word, PowerPoint, Excel, or in CAD, or anything like that, one of the underlying cool secrets of this entire act one, act two, act three is that it really complements the fact that you're going to work the way that you work. So, we want to complement that. So, Norm, why don't we just go through a couple more use cases? Because we can't be comprehensive with all the different cool act threes, the things that you can do now that you're the hero, but maybe walk through a few more examples of what we can do.
Norm Friend: Absolutely. And as mentioned, once we've actually got to understanding, the action part is going to be different for every single use case within these repositories, because I now have the ability to action these things. In this case, I'm disposing. It could be migrations. Or, since we're in Data Demo Week, I now need to give this to a data catalogue to do further analytics in any other analytics system that I want. Because we've now created structure within our unstructured data.
And to the point of confidence, again, I'm going back to search. Because we've done this enrichment, what if I knew that exact purchase order that I need to find?
[Norm uses the advanced search feature of the Shinydocs Discovery search bar. Only one result comes back.]
And now, when I know that exact purchase order, what are the chances of finding that one document? In this case, it happens to be in GCdocs, but it doesn't matter where it is. We've done the enrichment. And now we have the ability to do actions like finding that content super quickly when we need it.
Jason Cassidy: That's awesome. Thank you, Norm. So, what we'll do is go through a couple more background items, how we got here and the type of utilization of this technology and things to think of as a public servant. And then let's get back together with John and we'll answer some questions as questions are pumping into our system here.
So, just to give a background of where this came from. Something really interesting happened between 2007 and 2009. And that is, everybody on this call figured out how to create digital content and nobody was ready for it. And I'm not saying anything bad about GCdocs or SharePoint or file shares or anything. It just happened to us.
[In his slideshow, Jason shows a graph of data amounts over time. At about 2007, the data begins trending exponentially upward. Two points are highlighted: one bar at about 2005, marked "largest SharePoint"; the other at about 2007, marked "Colonial Pipeline."]
And here's an example. This is one of our customers' data. I'm not going to tell you whose. But in this case, it's 760 terabytes of data of just file shares. It's 1,200 file shares. And when you think about the types of things that are happening in these organizations... For example, this isn't the Colonial Pipeline data. We're just using this as an example. This little blue area here is a hundred gigs of data. A very tiny amount of data, one seventieth or less of the overall content. That's how much was hacked at Colonial Pipeline, which brought down fuel flow to the Eastern Seaboard of the United States.
So, think about this company here with literally hundreds of times that data now. And we call out the idea that this isn't specifically the largest SharePoint. Just what's in green here is representative of the largest SharePoint instances on the planet, which tend to be around 100 to 200 terabytes of information.
In order for this organization to put everything into M-365 or put everything into GCdocs, they would have to have seven or eight of the largest instances of these things on the planet. That dog won't hunt. It's not a real thing. And their data is increasing as you can see. So, the only way to deal with this is to automate the understanding of it. And then automate putting the right things in the right place. Of course, we want things like GCdocs for records management. We need it. Of course, we want things like M365 for collaboration and a hundred other uses. Of course. And I'm not saying we're not doing that. What I'm suggesting is there's a step zero. There is a prerequisite step, which is understanding what your data is, how people are using it, automate that, and then automate the activity of bringing it to the right place when the users are ready.
And I hope that this visualization really reinforces the fact that anybody who says our strategy is moving it somewhere, that's not really a strategy. The strategy has to be: I'm going to understand what's going on, and then I'm going to decide the right place to move it based on that. And this just reinforces it.
And again, this is not an anti-any-platform speech. And please, I don't want anybody to take it that way, because even at the head of the call here, one of our colleagues was literally working in GCdocs and they were saying they have a love-hate relationship with it. They get the value, but they understand how it might limit the ecosystem that they're working in. And in my opinion, that's because there hasn't been really good strategy, guidance, and technology to get people there, as opposed to any issue with the actual utilization of the systems themselves.
So, I put it in all caps: I'd like everybody to demand complementing the way that you work. Please don't ask me to change until we understand what impact the change is going to have on us. And then leverage systems like what Shinydocs is showing here in order to get people there. And I hope we can count on people to be vocal on that. So, we'll find the data wherever it is.
I know that Facebook isn't the best example of maybe social conscience lately, but it is a wonderful example of how to package up huge amounts of information and find the right piece of data. And obviously, they do this to you, not for you, but Shinydocs is doing it for you. We connect to every file share, we connect to every SharePoint, every GCdocs, every email. Know where it is, know what it is, and how it can move. But we're not saying put it into our Shinydocs. There's no such thing as that. This is just an understanding and a network across everything that you can ask questions of. And that is the modern solution.
[A slide shows a long list of pros to adopting an enriched data platform such as "The modern solution requires no migration or performs a migration so seamlessly that users will never notice it."]
So, maybe take a screenshot of this. All sorts of good points that I'm not going to go through point by point as to the value. But the last thing I will say is some people ask us, well, okay, why now? What's different now? And it is a great question. Now is the first time in the history of mankind that we actually have the computing capacity, the network capacity to do this at computational scale. We can crawl your information fast enough. We can enrich your information fast enough. These activities, 20 years ago, would've taken 10 years; now they take 10 months or 10 weeks. So, we can be very thankful that the entire industry has put us on the shoulders of giants that allow us now to apply these technologies to the specific data management problem.
And if you're truly interested in this as a decision maker, we do have a standing offer for everything that you've seen here today. And keep in mind this is just as much a strategy as it is software. But of course, we are a software company. So, we're happy to talk to you about strategy and software as we go here. And back to John. Thanks so much for giving us the opportunity to talk.
John Medcof: Yeah. Thank you so much, Jason. Thank you so much, Norm, for this tour of Shinydocs as an example of the exciting change that is really happening right now, as you said, in the information management domain. This series is about demonstrating the power of these technologies, and I think you've really given us a view into the value of being able to find the right information at the right time. Very, very cool.
And I felt when Norm pulled up that screen in act one, that was my desktop this morning. And a lot of the challenges you highlighted about the kinds of data each of us are sitting on really rang true with me. So, we do have some questions that have started to come in from the audience, and maybe I will turn to some of those. So, first of all, the question is: how does Shinydocs identify filters on unstructured data? Does it find patterns by itself through machine learning, or is it some kind of manual process?
[The screen share ends.]
Jason Cassidy: Yeah. That's a great place to start, John. It's definitely not manual, as we suggested. Our customers have billions of documents. Bruce Power alone had a trillion data points, going back to 1968. You just can't do anything manually. What's very cool is, a lot of times, organizations already know a lot about their data. As Norm showed there, he showed an Excel spreadsheet that was from an ERP system.
Quite often, we'll have spreadsheets of some structured information. What are the purchase order numbers? What are the cases that people are working on? These types of things. And the first move is to pre-calculate: where do we find this across all the data? And then you can make some pretty easy and obvious inferences. You can use machine learning and classification to identify, okay, what is a work order? What is this type of report from whatever department you're in?
We can use those types of things. But then you can triangulate that with other information. Well, if it always contains this type of title and it always contains this type of content and information, it's usually this type of document. And you can make some inferences from that. And it only works if you can do the massive-scale search across the information and discovery. And it only works if you have some information to seed it with these business concepts.
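As a rough illustration of that seeding idea (this is my own sketch, not Shinydocs' actual implementation; the purchase-order numbers, patterns, and documents below are all invented), you can pre-calculate where known structured identifiers appear across unstructured files and use the hits as an initial classification:

```python
import re

# Hypothetical seed list of known purchase-order numbers,
# e.g. exported from an ERP spreadsheet.
KNOWN_PO_NUMBERS = {"PO-10041", "PO-10377", "PO-20912"}

# Assumed identifier format for this sketch.
PO_PATTERN = re.compile(r"PO-\d{5}")

def seed_classify(doc_text):
    """Initial inference for one document based on seeded identifiers.

    If the text mentions a known purchase-order number, we can infer
    it is purchase-order-related, whatever repository it lives in.
    """
    hits = set(PO_PATTERN.findall(doc_text)) & KNOWN_PO_NUMBERS
    if hits:
        return {"type": "purchase_order", "po_numbers": sorted(hits)}
    return {"type": "unknown", "po_numbers": []}

# Invented example documents standing in for crawled file-share content.
docs = {
    "invoice_a.txt": "Re: payment against PO-10041, net 30 days.",
    "minutes.txt": "Weekly team meeting notes, no action items.",
}

enriched = {name: seed_classify(text) for name, text in docs.items()}
```

In practice this initial pass would be combined with machine-learning classification for the documents the seeds don't reach, as described above.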
So, what we find is about 70 or 80% of the data gets that initial inference. You pretty much know what it is just from the structured information that you make available. And then there's a long tail from there. The last 20% sometimes requires proprietary work, but it's never manual, because we draw the line at manual. That's yesterday's work and it's never worked.
[John speaks silently.]
Oh. Looks like we lost your mic there, John. And I'm looking at the Zoom Chat right now. You can just paste the question into the Zoom Chat if you'd like as well.
John Medcof: Jason, I had to reconnect my headphones there.
Jason Cassidy: Oh, there we go.
John Medcof: Sure. Sure, if you can hear me now.
Jason Cassidy: You're good to go, John, you sound great.
John Medcof: Okay. Here's the second question. [inaudible] or taken during a breach, how would that work?
Jason Cassidy: Yeah. So, the issue that we see customers encounter is when you have a data breach, you'll generally know the area of interest that was attacked. It'll be the following file shares, or it might be this web application, an area inside of SharePoint or something like that. And the issue that people run into is, okay, how do we know for sure? Sometimes there is analytical information like audit histories that can be brought into our analytics engine to check that specifically.
More often than not, people don't have that foresight. So, you just have to be able to use a tool like ours to say, well, what are the concepts in the 10 million documents that may have been compromised? Because you're not going to go through 10 million documents line by line and say, this is the personal information that might have been exploited, or this type of thing. You can only know the concepts.
Well, we return the concepts instantly. The concepts come back really fast, and we can say for sure: this is the type of PII that was compromised or not. This is the type of proprietary information that was compromised or not. These are the types of things where people can have some certainty. Even better, if we can bring the audit into the analytics, then it's really good.
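A toy version of that concept reporting (again my own sketch with invented, deliberately simplistic patterns; a real engine would use much richer models) scans the potentially compromised file set and reports only the categories of PII-like content found, rather than every matching line:

```python
import re

# Hypothetical PII-style patterns for illustration only. The point is
# the shape of the answer: concepts, not raw document lines.
CONCEPT_PATTERNS = {
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone_number": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "sin_like_number": re.compile(r"\b\d{3}-\d{3}-\d{3}\b"),
}

def concepts_in(doc_texts):
    """Return the set of concept categories found anywhere in scope."""
    found = set()
    for text in doc_texts:
        for concept, pattern in CONCEPT_PATTERNS.items():
            if pattern.search(text):
                found.add(concept)
    return found

# Invented documents standing in for the potentially compromised share.
breach_scope = [
    "Contact jane.doe@example.ca re: contract renewal.",
    "Supplier hotline: 613-555-0142.",
]

compromised_concepts = concepts_in(breach_scope)
```

The answer to "what was taken?" then becomes a short list of concept categories that a response team can act on, instead of a line-by-line review of millions of documents.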
John Medcof: That is great. Thank you so much. And the questions keep coming in. So, I'm going to go to what viewers are asking. One of them is asking, could you expand on how enrichment is done, and how much time it takes to set that up at the outset?
Jason Cassidy: Yeah. There are two different types, and that's why we break it into act one and act two. Act one happens very quickly. It's as fast as the machines can consume it, but it is, in a sense, generic: people's names are people's names, and that's the way natural language figures it out. And obviously, Canada's a wonderful place for this because we have a plurality of cultures. So, there's no such thing as just one database of people's names.
But the way content is used in English and in French and in other languages makes it obvious when names are used. And that's what Norm was showing to find that initial enrichment. So, terabytes of data, hundreds of terabytes of data, for that act one, can happen in a few weeks or less, depending on the volume of data that you have.
Act two, for example, with a big organization like Bruce Power that has hundreds of terabytes of information, a lot of it is very proprietary, it's very specific to their level of asset management, their level of controlled nuclear information and the work that they do with the Canadian security establishment and WANO and these other things. There's no AI tool on the planet that can just cherry pick that.
What you need to do then is bring in all those business concepts and then automate the business concepts. And in that case, it can take a few months or it might even take a year. But the beauty is, once you've done that work, then you have those models forever. Every new document that shows up from now until the end of time gets automatically treated with the right treatment for your information governance. Every change that you need to make is now small and incremental. And it's not a human that has to learn everything about this document to make sure it gets into the right spot.
So, the payoff is huge for what, quite frankly, isn't... It's about the same amount of effort as somebody manually dragging and dropping a few thousand documents. If you do that same effort, automating it, now you have it forever.
John Medcof: Okay. Thank you. That's really helpful. And one thing that really struck me as you were talking, and as Norm was doing the demonstration, is the degree to which there's been a shift to focus on the end user and how we use these tools every day. And I think Sonya talked about that average public servant at the beginning of her remarks. And I very much feel like that average public servant. So, could you maybe speak a little bit to how there's been an evolution, and perhaps a greater focus now in tools like these, toward the end user and the average user within the organization, and not just the people enabling them behind the scenes?
Jason Cassidy: Yeah. Certainly, John. Technology used to just be technology. And you'd hire IT and then they'd do technology things. And that would get all of the investment if it ever worked. If it ever just nailed it and it made your life easier, then all the money would still be going there. The problem is, it always technically worked, but it never worked for you.
And that's why the switch is happening right now, is that organizations, governments, private organizations and everything in between are realizing, I'm only going to put my money, I'm only going to put my bets on things that are proving to have positive outcomes.
So, when you're doing a digital transformation, and that's a really loaded word, that can be a lot of things from robotics to the type of work that we're doing here and others, but when you're doing a digital transformation, I suggest aligning it with one of four things, the product that you provide to your customers, the service that you provide to your customers, the customer experience of staff or your customers, or your core business processes like Bruce Power and others where they just made their business processes so much more powerful.
As soon as you are doing that, it's no longer an IT project. It's all about the outcome and experience for other people. Because, I think you'd agree that, by 2040, there's going to be no such thing as paper processes. We're not going to go down to our local canada.ca office to do anything with paper anymore.
Well, how do we get there? We get there by looking at the customer experience and the staff experience, and making sure that those are digitized and automated so that we can focus on the highest-value knowledge work rather than worrying about the minutiae of making sure I save the document into the right spot.
John Medcof: Okay. That is great. And that really speaks to me as an individual user. Thank you. Another question that's come in from one of our viewers today is they're curious to hear more about the record-keeping features. Where is document disposition performed? Does it remain in the repository level or can Shinydocs centralize it? And does this help with the transfer of documents to archives?
Jason Cassidy: Absolutely. And that is actually the pedigree we came from. We were the OpenText Technology Partner of the Year for three out of our first four years of existence. And it was because we complemented that process. And the reason we got away from it is just that the volume was too small. We found that only 1% or half a percent of content was in things like GCdocs. We're like, no, we want to hit bigger. We want to hit everybody.
Which does kind of beg these exact questions: records are everywhere. How do you find those records everywhere? You need something like this analytics. I'll give a very specific example. Let's say that you have a control document that says everybody must use this newest version. This is an important record. It is in GCdocs.
How do you make sure that Norm's not using the version that he keeps in his email? How do you make sure that somebody's not using the file share? Well, our analytics tool, through simple things like duplicate detection, just goes through and finds every copy of it everywhere in your entire network. So that when somebody searches, we can put a big red flag behind all the deprecated versions and say, only use this record.
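A minimal version of that duplicate detection (my own sketch under simple assumptions; a production crawler hashes file bytes across repositories at much larger scale) fingerprints each copy by content hash, so every copy of the same document can be flagged wherever it lives:

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Content hash: identical copies get identical fingerprints,
    regardless of filename or repository."""
    return hashlib.sha256(content).hexdigest()

# Invented copies scattered across hypothetical repositories; the
# path schemes here are illustrative, not real connector syntax.
files = {
    "gcdocs://records/control-doc-v3.docx": b"Control document v3 (official)",
    "file://shares/ops/control-doc.docx": b"Control document v3 (official)",
    "mail://norm/attachments/control-doc-old.docx": b"Control document v2",
}

# Group paths by fingerprint to find every copy of the same content.
by_hash = {}
for path, content in files.items():
    by_hash.setdefault(fingerprint(content), []).append(path)

duplicate_groups = [paths for paths in by_hash.values() if len(paths) > 1]
```

Once the duplicate groups are known, the search layer can mark every copy outside the official record as deprecated, which is the behaviour described above.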
We don't necessarily care if it's in GCdocs. We can help you get it there. And we have migration tools and all those types of things. So, it's kind of two parts. Making sure that people make the right decisions with their records. And then, as Norm showed, he did briefly show a dashboard of, here's all this stuff that is ripe for disposition based upon its temporal information and classification. Here it is. That's cool.
And sometimes things like GCdocs have built-in record keeping. It'll just do it perfectly. We're not worried about that. SharePoint, you can add additive solutions for it. We're not worried about that. What we are worried about is making sure that our system, on file shares or email, is giving the document the right treatment. And we have done lots of work there. And happy to talk to public servants about how we can help them with the other 90% of the data that's just out there.
John Medcof: Yeah. Thank you. Norm's example, really again, hit home for me with, I feel there is probably some redundancy of documentation between some of my spaces and I probably wouldn't even know which one is the one I should be using. So, appreciate that. We've got a couple more questions that have come in. We're running out of time, but a couple really good ones.
To Jason, you mentioned in step two that data governance is required. And no matter the tool used, this is one of the first issues faced by records managers and other information data management specialists. And without agreed-upon governance, any tool will fail. So, how can Shinydocs, both the tool and the team help us in automating governance?
Jason Cassidy: Governance itself... Myself, I'm part of the records management working group with the Standards Council of Canada, so I know a lot about records management. I'm also part of the Artificial Intelligence Working Group for the Standards Council. So, I get that there is a rigorous way of doing business for record keeping. And then it elevates to a bigger umbrella of information governance.
And I think that you, as a records professional, not you, John, specifically, but the people asking the questions are experts. Like if you go and start at ISO 15489, the records management base specification, and then work your way up from that, everything's there, from automated metadata and everything else.
What we believe is missing, what we're certain is missing now... This was an experiment three years ago. The experiment's over. What we're certain is missing is how do you bring a petabyte of data to that party? How do you bring 10 petabytes of data to that party? There's no standardization for that.
And that's where Shinydocs comes in, saying: once you know what a document is, using either our classification engine or the automated analytics that Norm was showing, once you know its classification, then you can take over as the data steward or the records manager. You've got that. And we don't want to get in your way.
Your issue, as far as we can tell, is the fact that you're having a hard time finding this information across all these systems. So, what you do is you say, John, stop doing what you're doing. You need to put it in my place before I can get what I need. And sorry, John's not going to do that for you. So, we need to bring that data to the party. And that's the part that we do.
And I agree our tool requires just as much information governance, and I guess you'll have to take my word for it, with seven minutes left, that we have a whole bunch of things built in for that. But we're happy to talk to people individually about how that works.
John Medcof: Okay, great. Thank you, Jason. We're down to-
Jason Cassidy: Sorry if we're throwing you under the bus.
John Medcof: No, that's... It's not entirely off the mark. So, I'm happy to be our example person here. But look, we do have just a few minutes left. And I want to maybe cast our look forward a bit. And speaking from my own perspective as maybe the average public servant that Sonya mentioned, to me, it just seems that the evolution of enterprise content management tools has been exponential over the last few years.
And applications like what you showed us today bring amazing information processing capabilities with those ease-of-use features I was talking about before many of us, or at least me, we couldn't even imagine this, let's say five years ago. So, looking forward, what do you see as the next area of opportunity for modern content management services?
Jason Cassidy: Yeah. I like that you phrased it that way because to me it's not about a technology. Our technology's going to get better. So, is everybody else's. The thing to me that's always been missing is this unified strategy as to how we are going to go about making public servants successful. How are you going to get the right information at the right time so you're not waiting 60 hours to fill that request?
And those questions... It's more like asking who's good at regional and international transportation. Is it governments? Is it Boeing? Is it a car manufacturer? It's everybody. And there are all sorts of things that need to come together so that we can have a safe way of getting stuff from point A to point B. And the same thing goes for information. It is our software, but it doesn't start there. It starts with the strategy of what we need out of this stuff. How do we make it safe for everybody? And then, how do discrete technologies come in?
So, I think if we're very successful, in 10 years from now, people won't be saying words like, I have a SharePoint strategy, or I have a Shinydocs strategy. People are going to be saying, I have an information strategy by which I know what people need to do their jobs. And then I'm going to go find technology to support that. And once we get there, when people abandon the idea of technology-first and go to strategy-first, then we win. And I'm going to say we've still got a bit of a hangover of technology-first right now.
John Medcof: Wow. What a great point to end on. Thank you very much, Norm and Jason. I appreciate you both being here with us today. I found the discussion really interesting. I had a little audio issue at one point there that I'll apologize for, but it was really exciting to see the potential of these applications and these technologies in the government of Canada. And I think your demo really gave us a sense of the power and of the possibility.
And thank you as well to our participants from across the country who joined us for today's virtual event. I hope the demonstration of how leading technologies can optimize our information management practices provided a useful and inspiring learning opportunity for all of you. And maybe the last thing I'd say is I'd invite anyone who may be interested in learning more about these topics to join the data management community, or else to tune in and see what they have coming up in terms of future data demo events. So, thank you, Norm. Thank you, Jason. I greatly appreciate your being with us here again. And have a great day.
Jason Cassidy: Thank you so much, John.
Norm Friend: Take care, everyone.
[The video chat fades to CSPS logo.]