Risk assessment of documents with AI - OpenText Magellan Risk Guard

Application example of how to create a connection to OpenText's Magellan Risk Guard API.
In a previous blog article we showed how to connect to the OpenText Risk Guard REST API using Synesty's native features, because no ready-to-use Synesty add-on existed yet.
Now we have created a new add-on for this connection. In this article we show how to use this add-on to build an interface with Synesty and to check documents automatically with Risk Guard.
OpenText Magellan Risk Guard
Thanks to ChatGPT, AI services are on everyone’s lips right now. The information management provider OpenText uses AI in many of its products.
The product Magellan Risk Guard caught our eye because it helps with a topic we constantly face in the e-commerce and IT world: data protection and compliance.
Today, the amount of information stored in enterprise systems is increasing exponentially. Unfortunately, this also increases the risks for your company and its reputation due to inappropriate or confidential content and information. A manual classification of this content is time-consuming or even impossible due to the amount of data and is also prone to misjudgments caused by human error.
What is the Magellan Risk Guard interface capable of?
OpenText Magellan Risk Guard uses machine learning to recognize content of documents. This can be used for text recognition and classification, for example to uncover harmful, sensitive and inappropriate texts, images, video and audio files in corporate content. Available as an AI product and an information risk management API service, Magellan Risk Guard enables organizations to act on detected risky content to increase compliance and improve data governance.
The Magellan Risk Guard REST Service is a stateless API that extracts information from documents using Magellan Text Mining. It can be used, for example, to:
- Detect Personally Identifiable Information (PII, such as name or email) and Personal Secure Information (PSI, such as medical diagnoses or access data)
- Classify images to detect threats and risks (violence, alcohol, weapons), or detect hate speech
- Detect retention periods for specific documents
- Recognize data on invoice documents, such as country codes, e.g. for tax classification
Who needs it or what is it for?
There are various reasons why companies should or must automatically scan their data for certain content, such as:
- Owners of essential platforms are obliged to check the content posted by their users. (The European Union's Digital Services Act (DSA) stipulates that any platform used by more than 10 percent of citizens is considered essential.)
- A company was fined a large sum for a data breach that included log-in and payment information of nearly 400,000 people.
- A well-known fast-food chain went through a public relations scandal related to nude photos of employees found on the company’s servers. (Source: https://blogs.opentext.com/introducing-opentext-magellan-risk-guard/)
Current case: ChatGPT and privacy
To prevent sensitive data from being inadvertently transmitted to external services such as ChatGPT (e.g. if you use our ChatGPT add-on), you could have your requests checked by the Magellan Risk Guard API for personal or sensitive data before transmission, and submit only data that is considered non-critical.
The banning of the AI-based chatbot ChatGPT by Italy’s data protection authority shows that this is a problem.
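The pre-check described above can be pictured as a small gate in front of the external service. The following Python sketch is only an illustration of that idea: `check_risks` and `send` are hypothetical callables standing in for a Risk Guard check and for the call to the external service, and the two `fake_*` functions are stand-ins invented for the example.

```python
from typing import Callable, Iterable, Optional


def send_if_non_critical(
    text: str,
    check_risks: Callable[[str], Iterable[str]],
    send: Callable[[str], str],
) -> Optional[str]:
    """Forward `text` only when the risk check reports no findings."""
    findings = list(check_risks(text))
    if findings:
        # Something sensitive was detected: do not transmit the request.
        return None
    return send(text)


# Stand-ins for illustration only (not the real services).
def fake_check(text: str) -> list:
    # Pretend that anything containing "@" is an email address.
    return ["email"] if "@" in text else []


def fake_send(text: str) -> str:
    return f"sent: {text}"
```

In a real flow, the check would be the Risk Guard request and the decision could be based on specific categories rather than on any finding at all.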
Tutorial - Connecting Synesty to Magellan Risk Guard REST API to process documents
In the following section we show step by step how to create a connection with Synesty.
In our example use case, we want to detect documents that contain personal data but normally should not. An email alert must then be sent to a specific person to draw attention to the document and its contents.
OpenText Developer Backend
First of all, an OpenText Developer test account must be created in order to use Magellan Risk Guard. A new tenant and a new app are then created in its backend. Later we use their API keys to send requests to the Magellan Risk Guard REST API.
Detailed instructions can be found in our documentation.
Set up in Synesty Studio
Under my connection, a new OpenText - Risk Guard account must be created to establish the connection with the OpenText API.
Detailed instructions can be found in our documentation.
Content analysis via the Risk Guard API
To check a document for risks with Risk Guard, we create a Flow.
The Flow initially consists of the steps URLDownload, OpenText-RiskGuard-Check-Documents, JSONReaderVisual and Mapper.
The URLDownload step (1) downloads an example document.
The OpenText-RiskGuard-Check-Documents step (2) then uploads the file via the Risk Guard API and returns the response.
A click on the preview button shows a spreadsheet of the response, listing which categories of records were found.
In our case, Magellan Risk Guard found financial and tax information in the category personal information.
The details column contains a JSON string, where every dataset that was found is listed:
Now we parse the details column with the JSONReaderVisual Step (3).
The result is a spreadsheet of what was found in the document:
- Risk categories such as violence, alcohol, guns, pornography and more
- Personally Identifiable Information (PII) such as names, credit card numbers or insurance numbers
- Sensitive Personal Information (SPII) such as performance data, financial/tax data or political opinions
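Conceptually, parsing the details column turns a JSON string into one spreadsheet row per record found. The following Python sketch illustrates that idea only; it is not how the JSONReaderVisual step is implemented. The sample payload is invented for the example, and the real Risk Guard response may be shaped differently. Nested objects are flattened into dotted column names, which is one plausible way a column such as `Subterm.value` could arise.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, name))
        else:
            row[name] = value
    return row


def details_to_rows(details_json):
    """Parse a JSON array string and return one flat row per entry."""
    return [flatten(entry) for entry in json.loads(details_json)]


# Hypothetical sample; IDs and structure are assumptions for illustration.
sample = json.dumps([
    {"CartridgeID": "PII-EMAIL", "Frequency": 2,
     "Subterm": {"value": "jane@example.com"}},
    {"CartridgeID": "PII-SSN", "Frequency": 1,
     "Subterm": {"value": "000-00-0000"}},
])
rows = details_to_rows(sample)
```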
With the Mapper step, the spreadsheet can now be edited as required.
To improve the overview, the column titles can be shortened or simplified:
With a Mappingset you can make the CartridgeIDs (entity type of the data record) readable.
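A mapping set is essentially a lookup table from a technical value to a readable label. As a minimal Python sketch of that idea (the IDs and labels below are hypothetical and not taken from the Risk Guard documentation):

```python
# Hypothetical CartridgeIDs and labels, for illustration only.
CARTRIDGE_LABELS = {
    "PII-EMAIL": "Email address",
    "PII-SSN": "Social Security Number",
}


def readable_cartridge(cartridge_id, labels=CARTRIDGE_LABELS):
    """Return the readable label, or the raw ID if no mapping exists."""
    return labels.get(cartridge_id, cartridge_id)
```

Falling back to the raw ID keeps unmapped entity types visible instead of silently dropping them.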
The most important columns and their meaning:
- ConfidenceScore = How confident Risk Guard is that it has assigned the data record to the correct entity type.
- RelevancyScore = How important the entity or classification is to a file. For example, if a “phone number” is found three times in a document, it is more relevant than a number found only once.
- Frequency = How many occurrences of a single record were found.
Columns in which the values of the data records are output:
- Subterm.value = Value of the extracted data record. In this column, a value was returned for each record during our testing.
- nfinderNormalized = Value of the extracted data record. This column returned a value for most records during our testing, but some cells were missing values.
- ClientNormalized = Value of the extracted data record. Only the values of the Social Security Numbers were output in this column. The remaining cells remained empty.
Values are only output sporadically in the remaining columns:
- MainTermValue = Value of the extracted data record
- Subterm.value_1 = Value of the extracted data record
- Subterm.value_2 = Value of the extracted data record
Overview of the spreadsheet:
What can we do with the obtained data?
The raw API data can now be processed further. To do this, we edit the spreadsheet (remove columns, group by CartridgeIDs) so that we can see how often each type of record was found in our document:
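The grouping can be pictured as summing the Frequency column per CartridgeID. A minimal Python sketch, assuming each row is a dict with `CartridgeID` and `Frequency` keys (the sample rows and IDs are invented for the example):

```python
from collections import Counter


def frequency_by_cartridge(rows):
    """Sum the Frequency values of all rows that share a CartridgeID."""
    totals = Counter()
    for row in rows:
        # Rows without a Frequency value are counted as one occurrence.
        totals[row["CartridgeID"]] += int(row.get("Frequency", 1))
    return dict(totals)


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"CartridgeID": "PII-EMAIL", "Frequency": 2},
    {"CartridgeID": "PII-SSN", "Frequency": 1},
    {"CartridgeID": "PII-EMAIL", "Frequency": 3},
]
totals = frequency_by_cartridge(sample_rows)
```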
We now expand the flow to send an alarm email if personal data is found in a document.
If no personal data was found, the flow can be stopped at this point. You can do this with the StopFlowIf step.
Otherwise it should be continued.
Now the relevant data records are categorized with several filter steps:
Filter 1: Filter - Personally Identifiable Information
filter condition:
(For data records which are only considered personal in combination with others, conditions with AND can also be created here.)
Filter 2: Filter - Personally Identifiable Financial information
filter condition:
Filter 3: Filter - Access Data
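The logic of these filter steps, including an AND condition and the early stop, can be sketched in Python as follows. This is an illustration under assumptions, not the actual filter configuration: the entity-type IDs, the grouping into categories and the rule that a user name only counts together with a password are all hypothetical.

```python
# Hypothetical entity-type IDs per category, for illustration only.
PII_IDS = {"NAME", "EMAIL", "PHONE"}
FINANCIAL_IDS = {"CREDIT_CARD", "IBAN"}
ACCESS_IDS = {"USERNAME", "PASSWORD"}


def keep(rows, ids):
    """Filter step: keep rows whose CartridgeID is in `ids`."""
    return [row for row in rows if row["CartridgeID"] in ids]


def keep_if_all_present(rows, ids):
    """AND condition: keep matching rows only if every ID in `ids` occurs."""
    found = {row["CartridgeID"] for row in rows}
    return keep(rows, ids) if ids <= found else []


rows = [
    {"CartridgeID": "EMAIL", "Frequency": 2},
    {"CartridgeID": "IBAN", "Frequency": 1},
    {"CartridgeID": "USERNAME", "Frequency": 1},
]

pii = keep(rows, PII_IDS)
financial = keep(rows, FINANCIAL_IDS)
# Assumed rule: a user name alone is not access data; a password must also occur.
access = keep_if_all_present(rows, ACCESS_IDS)

# Counterpart of the StopFlowIf step: stop when no category has any rows.
stop_flow = not (pii or financial or access)
```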
After filtering, the alarm email can now be generated and sent. To do this, we use the EmailSend step.
We choose a recipient and enter the subject.
In the Message field we write our message and list the data records found with their frequency according to our categories. The email message can be created using Freemarker scripting. It contains the results of the previous filter steps.
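In the flow itself the message is written with Freemarker scripting in the EmailSend step. Purely to illustrate the structure of such a message, here is a Python sketch that lists the records found per category with their frequency. The function name, the recipient address, the subject wording and the sample findings are all assumptions made for the example.

```python
from email.message import EmailMessage


def build_alert(recipient, document_name, findings):
    """Build an alert email listing the found records per category."""
    lines = [f"Personal data was found in {document_name}:", ""]
    for category, records in findings.items():
        if not records:
            continue  # skip categories without any findings
        lines.append(f"{category}:")
        for label, count in sorted(records.items()):
            lines.append(f"  - {label}: {count}x")
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"Risk Guard alert: {document_name}"
    msg.set_content("\n".join(lines))
    return msg


# Hypothetical findings, keyed by the filter categories.
alert = build_alert(
    "privacy@example.com",
    "example.pdf",
    {
        "Personally Identifiable Information": {"Email address": 2},
        "Personally Identifiable Financial information": {"IBAN": 1},
        "Access Data": {},
    },
)
```

The sketch only builds the message; sending it is left out on purpose.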
Result
This is what it looks like when we click on “Step-Preview”:
…and this is how the email looks after the flow execution:
Conclusion
Due to the constantly growing amount of data and content, the risks for companies are increasing. A manual evaluation of content is time-consuming due to the amount of data and prone to misjudgments caused by human error. Suitable software tools can be used to identify and react to these risks.
With our test example, we have shown how the OpenText Magellan Risk Guard API can be connected in a short time in order to automatically analyze your own content.
If you want to test this yourself now, take a look at our flow template.
You can find out more about the possibilities of Magellan Risk Guard in this video:
Feel free to contact us if you have any further questions.
Further information
- Flow template to try out the example. All you need is a free test account
- Synesty OpenText-RiskGuard addon documentation
- Magellan Risk Guard API documentation
- Introducing OpenText Magellan Risk Guard
Last updated 2023-06-12