Risk assessment of documents with AI - OpenText Magellan Risk Guard

Application example of how to create a connection to OpenText's Magellan Risk Guard API.

In our tutorials, we regularly present application examples, APIs and connectors that can be integrated using Synesty's native features, even when there is no ready-to-use Synesty add-on for them yet.

In this article, we take a look at the Magellan Risk Guard REST API by OpenText and show how to build an interface with Synesty that uses it to check documents automatically.

OpenText Magellan Risk Guard

Thanks to ChatGPT, AI services are on everyone's lips right now. The information management provider OpenText uses AI in many of its products.

The product Magellan Risk Guard caught our eye because it helps with an important topic we constantly face in the e-commerce and IT world: data protection and compliance.

Today, the amount of information stored in enterprise systems is growing exponentially. Unfortunately, this also increases the risks that inappropriate or confidential content poses to your company and its reputation. Classifying this content manually is time-consuming, or even impossible given the volume of data, and it is prone to misjudgments caused by human error.

What is the Magellan Risk Guard interface capable of?

OpenText Magellan Risk Guard uses machine learning to recognize the content of documents. This can be used for text recognition and classification, for example to uncover harmful, sensitive and inappropriate texts, images, video and audio files in corporate content. Available both as an AI product and as an information risk management API service, Magellan Risk Guard enables organizations to act on detected risky content, increasing compliance and improving data governance.

The Magellan Risk Guard REST Service is a stateless API that can extract information from documents using Magellan Text Mining. It can be used, for example, to:

Detect PII (Personally Identifiable Information, such as names or email addresses) and Personal Secure Information (PSI, such as medical diagnoses or access data)
Classify images to detect threats or risks (violence, alcohol, weapons), or detect hate speech
Detect retention periods for specific documents
Recognize data on invoice documents, such as country codes, e.g. for tax classification

Who needs it or what is it for?

There are various reasons why companies should or must automatically scan their data for certain content, such as:

Owners of essential platforms are obliged to check the content posted by their users. (The European Union Digital Services Act (DSA) stipulates that any platform used by more than 10 percent of citizens is considered essential.)
A company was fined a large sum for a data breach that included log-in and payment information of nearly 400,000 people.
A well-known fast-food chain went through a public relations scandal related to nude photos of employees found on the company's servers. (Source: https://blogs.opentext.com/introducing-opentext-magellan-risk-guard/)

Current case: ChatGPT and privacy

To prevent sensitive data from being inadvertently transmitted to external services such as ChatGPT (e.g. if you use our ChatGPT add-on), you could have the Magellan Risk Guard API check each request for personal or sensitive data before transmission, and only submit data that is considered non-critical.

The ban on the AI-based chatbot ChatGPT by Italy's data protection authority shows that this is a real problem.

Tutorial - Connecting Synesty to Magellan Risk Guard REST API to process documents

In the following section we show step by step how to create a connection with Synesty.

In our example use case, we want to detect documents that contain personal data even though they normally should not. An email alert must then be sent to a specific person to draw attention to the document and its contents.

OpenText Developer Backend

First, to use Magellan Risk Guard, you need to create an OpenText Developer test account. In its backend, you then create a new tenant and a new app. Later, we use the app's API keys to send requests to the Magellan Risk Guard REST API.

Set up API access

The Risk Guard API uses OAuth 2.0 client credentials authentication, which can be configured in Synesty as an HTTP account. To do this, create an HTTP account with the following data:

Type: OAuth 2.0
baseURL: https://na-1-dev.api.opentext.com/mtm-riskguard/api/v1/process
Grant type: Client Credentials
Client ID: (located in ot2_client_details_{YOUR APP NAME}_confidential.json, which was downloaded at the end of the app creation)
Client Secret: (located in the same ot2_client_details_{YOUR APP NAME}_confidential.json file)
Token URL: https://na-1-dev.api.opentext.com/tenants/{MY TENANT ID}/oauth2/token
- The tenant ID is also located in ot2_client_details_{YOUR APP NAME}_confidential.json.
Header Prefix: Bearer

Clicking "Start Configuration" then fetches and fills in the access token that the API calls need.
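For readers who want to see what the HTTP account does behind the scenes, here is a minimal Python sketch of the OAuth 2.0 client credentials flow using only the standard library. It assumes the token endpoint follows RFC 6749 and accepts HTTP Basic client authentication; the tenant ID, client ID and client secret are placeholders for the values from the JSON file, and the function names are our own, not part of any OpenText SDK.

```python
import base64
import json
import urllib.parse
import urllib.request

# Token URL from the HTTP account settings; {tenant_id} comes from the
# ot2_client_details_..._confidential.json file.
TOKEN_URL = "https://na-1-dev.api.opentext.com/tenants/{tenant_id}/oauth2/token"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request.

    Assumption: the endpoint accepts HTTP Basic client authentication
    (RFC 6749, section 2.3.1), so ID and secret are form-encoded and then
    Base64-encoded into the Authorization header.
    """
    user = urllib.parse.quote_plus(client_id)
    password = urllib.parse.quote_plus(client_secret)
    basic = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL.format(tenant_id=tenant_id),
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def parse_access_token(response_body: bytes) -> str:
    """Read access_token from a standard OAuth 2.0 JSON token response."""
    return json.loads(response_body)["access_token"]


def bearer_header(access_token: str) -> dict:
    """Authorization header for API calls, matching 'Header Prefix: Bearer'."""
    return {"Authorization": f"Bearer {access_token}"}
```

Sending the request with `urllib.request.urlopen(req)` and passing the response body to `parse_access_token` would yield the token that Synesty stores in the HTTP account.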

Content analysis via the Risk Guard API

To have Risk Guard check a document for risks, we create a Flow.

The Flow initially consists of the steps URLDownload, JSONReaderVisual and Mapper.

With the URLDownload step (1), an example document is uploaded to the API. The response is a JSON file containing the data records found.
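The upload performed by the URLDownload step can be approximated in Python as a multipart/form-data POST to the process endpoint. This is a sketch under assumptions: the form field name `file`, the multipart encoding and the `Accept: application/json` header are our guesses, not taken from the Risk Guard documentation, so check the API reference before relying on them. The request is only built here, not sent.

```python
import urllib.request
import uuid

PROCESS_URL = "https://na-1-dev.api.opentext.com/mtm-riskguard/api/v1/process"


def build_upload_request(access_token: str, filename: str, content: bytes,
                         field_name: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST that uploads one document.

    Hypothetical: the field name "file" and the multipart format are
    assumptions, not confirmed details of the Risk Guard API.
    """
    boundary = uuid.uuid4().hex  # random boundary that will not occur in normal text
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        PROCESS_URL,
        data=head + content + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Accept": "application/json",
        },
    )
```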

The JSONReaderVisual (2) then reads the file and extracts the data fields we need.

In the JSONReaderVisual, we extract two levels of the response: the categories of records found and every single record found.

The result is a spreadsheet listing what was found in the document:

Risk categories such as violence, alcohol, guns, pornography and more
Personally Identifiable Information (PII) such as name, credit card numbers, or insurance numbers
Sensitive Personal Information (SPII) such as performance data, financial/tax data or political opinions
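Conceptually, turning the JSON response into a spreadsheet means flattening nested objects into columns. The sketch below works on a hypothetical record shape (the real response may differ) and uses dotted column names, with later list entries suffixed `_1`, `_2`, to mimic column titles such as `Subterm.value` and `Subterm.value_1`. It is an illustration, not how the JSONReaderVisual step is actually implemented.

```python
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted column names.

    {"Subterm": {"value": "x"}} becomes {"Subterm.value": "x"}. For a list of
    dicts, the first item keeps the plain column name and later items get the
    suffixes _1, _2, ... (assumed naming, mirroring the spreadsheet columns).
    """
    row: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, f"{name}."))
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            for index, item in enumerate(value):
                suffix = "" if index == 0 else f"_{index}"
                for sub_key, sub_value in flatten(item).items():
                    row[f"{name}.{sub_key}{suffix}"] = sub_value
        else:
            row[name] = value  # scalars and other lists are kept as-is
    return row


def to_rows(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """One spreadsheet row per found record."""
    return [flatten(record) for record in records]
```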

With the Mapper step, you can then adapt the spreadsheet to your own needs.

To make the spreadsheet easier to read, the column titles can be shortened or simplified.

With a Mappingset, you can translate the CartridgeIDs (the entity type of each data record) into readable names.
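In plain Python, shortening column titles and applying a Mappingset amount to two dictionary lookups. The CartridgeID values below (`PII_EMAIL` etc.) are hypothetical placeholders, not confirmed Risk Guard identifiers; unknown IDs fall back to their raw value so no record is lost.

```python
from typing import Dict

Row = Dict[str, object]

# Hypothetical mapping set: the keys are placeholders, not confirmed
# Risk Guard CartridgeIDs. Replace them with the IDs from your own results.
CARTRIDGE_LABELS: Dict[str, str] = {
    "PII_EMAIL": "Email address",
    "PII_PHONE": "Phone number",
    "PII_SSN": "Social Security Number",
}


def rename_columns(row: Row, renames: Dict[str, str]) -> Row:
    """Shorten column titles; columns without an entry keep their name."""
    return {renames.get(column, column): value for column, value in row.items()}


def label_cartridge(row: Row, labels: Dict[str, str] = CARTRIDGE_LABELS) -> Row:
    """Replace the CartridgeID with a readable label, falling back to the raw ID."""
    out = dict(row)
    cartridge = out.get("CartridgeID")
    out["CartridgeID"] = labels.get(cartridge, cartridge)
    return out
```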

The most important columns and their meaning:

ConfidenceScore = How confident Risk Guard is that it has assigned the data record to the correct entity type.
RelevancyScore = How important the entity or classification is to a file, e.g. if a "phone number" is found three times in a document, it is more relevant than a number found only once.
Frequency = How many times a single record occurs in the document.

Columns that contain the values of the data records:

Subterm.value = Value of extracted data record
- In this column, a value was returned for each record during our testing.
nfinderNormalized = Value of extracted data record
- This column returned a value for most records during our testing, but some cells were missing values.
ClientNormalized = Value of extracted data record
- In our testing, only Social Security Numbers were output in this column; the remaining cells stayed empty.

Values are only output sporadically in the remaining columns:

MainTermValue = Value of extracted data record
Subterm.value_1 = Value of extracted data record
Subterm.value_2 = Value of extracted data record
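Because the extracted value is spread over several columns that are filled with varying reliability, it can be convenient to merge them into one. A minimal sketch, assuming each row is a dict keyed by the column names listed above; the order follows how reliably each column was filled in our testing. If you prefer normalized values, move `nfinderNormalized` to the front.

```python
from typing import Dict, Optional, Sequence

# Ordered by how reliably each column was filled in our testing:
# Subterm.value always, nfinderNormalized mostly, the others only sporadically.
VALUE_COLUMNS = (
    "Subterm.value",
    "nfinderNormalized",
    "ClientNormalized",
    "MainTermValue",
    "Subterm.value_1",
    "Subterm.value_2",
)


def extracted_value(row: Dict[str, object],
                    columns: Sequence[str] = VALUE_COLUMNS) -> Optional[str]:
    """Return the first non-empty value among the value columns, or None."""
    for column in columns:
        value = row.get(column)
        if value not in (None, ""):
            return str(value)
    return None
```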

Overview of the spreadsheet:

What can we do with the obtained data?

The raw API data can now be processed a little further. To do this, we edit the spreadsheet (remove columns, group by CartridgeID) so that we can see how often each type of record was found in our document.
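The grouping can be sketched as follows, assuming each row is one distinct record with a `CartridgeID` and a `Frequency` column. Per CartridgeID, it returns both the number of distinct records and the total number of occurrences; which of the two your Synesty grouping shows depends on how you aggregate.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_by_cartridge(rows: Iterable[Dict[str, object]]) -> Dict[str, Tuple[int, int]]:
    """Group records by CartridgeID and return (distinct records, total Frequency).

    Assumption: each row is one distinct record and its Frequency column holds
    the number of occurrences in the document (missing values count as 0).
    """
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        cartridge = str(row["CartridgeID"])
        counts[cartridge] += 1
        totals[cartridge] += int(row.get("Frequency") or 0)
    return {cartridge: (counts[cartridge], totals[cartridge]) for cartridge in counts}
```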

Next, we extend the flow to send an alarm email if personal data is found in a document.

If no personal data was found, the flow can be stopped at this point. You can do this with the StopFlowIf step.

Otherwise, the flow continues.

Now the relevant data records are categorized with several filter steps:

Filter 1: Filter - Personally Identifiable Information


(For data records that are only considered personal in combination with others, conditions can also be combined with AND here.)

Filter 2: Filter - Personally Identifiable Financial information


Filter 3: Filter - Access Data
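The three filters and the StopFlowIf decision can be sketched as set-membership checks on the CartridgeID. The ID sets are hypothetical placeholders for your own filter conditions, and `name_is_personal` is an illustrative AND condition (a name only counts together with an email address). Here the flow would stop when no record falls into any category, which matches the StopFlowIf check as long as the categories cover all the personal data you care about.

```python
from typing import Dict, Iterable, List, Set

Row = Dict[str, object]

# Hypothetical CartridgeID sets standing in for the three filter conditions;
# replace them with the IDs that appear in your own Risk Guard results.
CATEGORIES: Dict[str, Set[str]] = {
    "Personally Identifiable Information": {"PII_NAME", "PII_EMAIL", "PII_PHONE", "PII_SSN"},
    "Personally Identifiable Financial Information": {"PII_CREDIT_CARD", "PII_IBAN"},
    "Access Data": {"PSI_PASSWORD", "PSI_USERNAME"},
}


def name_is_personal(found_ids: Set[object]) -> bool:
    """Illustrative AND condition: a name is only flagged together with an email address."""
    return "PII_NAME" in found_ids and "PII_EMAIL" in found_ids


def categorize(rows: Iterable[Row]) -> Dict[str, List[Row]]:
    """Put each record into every category whose ID set contains its CartridgeID."""
    rows = list(rows)
    found_ids = {row.get("CartridgeID") for row in rows}
    buckets: Dict[str, List[Row]] = {name: [] for name in CATEGORIES}
    for row in rows:
        cartridge = row.get("CartridgeID")
        if cartridge == "PII_NAME" and not name_is_personal(found_ids):
            continue  # AND condition not met: a name alone is not flagged
        for name, ids in CATEGORIES.items():
            if cartridge in ids:
                buckets[name].append(row)
    return buckets


def should_stop(buckets: Dict[str, List[Row]]) -> bool:
    """StopFlowIf equivalent: stop when no record matched any category."""
    return all(not records for records in buckets.values())
```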

After filtering, the alarm email can now be generated and sent. To do this, we use the EmailSend step.

We choose a recipient and enter the subject.

In the Message field, we write our message and list the data records found, with their frequency, grouped by our categories. The message can be built with Freemarker scripting and contains the results of the previous filter steps.
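Outside Synesty, the same alarm email could be composed with Python's `email.message.EmailMessage`. This sketch assumes the filtered records arrive as a dict mapping category names to lists of rows with `CartridgeID`, `Subterm.value` and `Frequency` columns; the addresses and the subject format are placeholders. It only builds the message: sending is left to the EmailSend step, or to `smtplib.SMTP.send_message` outside Synesty.

```python
from email.message import EmailMessage
from typing import Dict, List


def build_alert(buckets: Dict[str, List[Dict[str, object]]], recipient: str,
                sender: str, document_name: str) -> EmailMessage:
    """Compose the alarm email: one section per non-empty category, listing
    each record's type, value and frequency (placeholder column names)."""
    lines = [f"Personal data was detected in the document {document_name}.", ""]
    for category, records in buckets.items():
        if not records:
            continue  # skip categories without hits
        lines.append(f"{category}:")
        for record in records:
            lines.append(
                f"- {record.get('CartridgeID')}: {record.get('Subterm.value')} "
                f"(found {record.get('Frequency')}x)"
            )
        lines.append("")
    msg = EmailMessage()
    msg["Subject"] = f"Risk Guard alert: {document_name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines))
    return msg
```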

Result

This is what it looks like when we click on "Step-Preview":

...and this is how the e-mail looks after the flow execution:

Conclusion

Due to the constantly growing amount of data and content, the risks for companies are increasing. A manual evaluation of content is time-consuming due to the amount of data and prone to misjudgments caused by human error. Suitable software tools can be used to identify and react to these risks.

With our test example, we have shown how the OpenText Magellan Risk Guard API can be connected in a short time in order to automatically analyze your own content.

If you want to test this yourself now, take a look at our flow template.

You can find out more about the possibilities of Magellan Risk Guard in this video:


Feel free to contact us if you have any further questions.