Risk assessment of documents with AI - OpenText Magellan Risk Guard

Application example of how to create a connection to OpenText's Magellan Risk Guard API.

In a previous blog article we showed how to connect to the OpenText Risk Guard REST API using only the native features of Synesty, because a ready-to-use Synesty add-on did not exist yet.

We have now created an add-on for this connection. In this article we show how to use this add-on to build an interface with Synesty and how to use it to check documents with Risk Guard automatically.

OpenText Magellan Risk Guard

Thanks to ChatGPT, AI services are on everyone’s lips right now. The information management provider OpenText uses AI in many of its products.

The product Magellan Risk Guard caught our eye because it helps with an important topic that we constantly face in the e-commerce and IT world: data protection and compliance.

Today, the amount of information stored in enterprise systems is increasing exponentially. Unfortunately, this also increases the risks for your company and its reputation due to inappropriate or confidential content and information. A manual classification of this content is time-consuming or even impossible due to the amount of data and is also prone to misjudgments caused by human error.

What is the Magellan Risk Guard interface capable of?

OpenText Magellan Risk Guard uses machine learning to recognize content of documents. This can be used for text recognition and classification, for example to uncover harmful, sensitive and inappropriate texts, images, video and audio files in corporate content. Available as an AI product and an information risk management API service, Magellan Risk Guard enables organizations to act on detected risky content to increase compliance and improve data governance.

The Magellan Risk Guard REST Service is a stateless API that can extract information from documents using Magellan Text Mining. It can be used, for example, to:

  • Detect PII (Personally Identifiable Information, such as name or email) and Personal Secure Information (PSI, such as medical diagnoses or access data)
  • Classify images to detect threats/risks (violence, alcohol, weapons) or detect hate speech
  • Detect retention periods for specific documents
  • Recognize data on invoice documents, such as country codes, e.g. for tax classification

Who needs it or what is it for?

There are various reasons why companies should or must automatically scan their data for certain content, such as:

  • Owners of essential platforms are obliged to check the content posted by their users (The European Union Digital Services Act (DSA) stipulated that any platform that is used by more than 10 percent of citizens is considered essential.)
  • A company was fined a large sum for a data breach that included log-in and payment information of nearly 400,000 people.
  • A well-known fast-food chain went through a public relations scandal related to nude photos of employees found on the company’s servers. (Source: https://blogs.opentext.com/introducing-opentext-magellan-risk-guard/)

Current case: ChatGPT and privacy

To prevent sensitive data from being inadvertently transmitted to external services such as ChatGPT (e.g. if you use our ChatGPT add-on), you could have your requests checked by the Magellan Risk Guard API for personal or sensitive data before transmission, and submit only the data that is considered non-critical.

The ban of the AI-based chatbot ChatGPT by Italy’s data protection authority shows that this is a real problem.
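The idea of checking requests before transmission can be sketched as a simple gate. This is only an illustration under assumptions: `find_categories` stands in for whatever call returns the categories detected by Risk Guard, and the category labels `PII` and `SPII` are hypothetical placeholders, not the labels of the real API response.

```python
from typing import Callable, Iterable, List

# Hypothetical category labels; the real labels come from the Risk Guard response.
BLOCKED_CATEGORIES = {"PII", "SPII"}


def is_safe_to_send(text: str, find_categories: Callable[[str], Iterable[str]]) -> bool:
    """Return True only if the checker reports no blocked category for the text."""
    found = set(find_categories(text))
    return found.isdisjoint(BLOCKED_CATEGORIES)


def filter_requests(requests: List[str], find_categories: Callable[[str], Iterable[str]]) -> List[str]:
    """Keep only the requests that are considered non-critical."""
    return [r for r in requests if is_safe_to_send(r, find_categories)]


def fake_checker(text: str) -> List[str]:
    """Stand-in for the real check (assumption): flags anything containing '@' as PII."""
    return ["PII"] if "@" in text else []


safe = filter_requests(["Summarize this article", "Write to jane@example.com"], fake_checker)
```

Only the requests in `safe` would then be passed on to the external service.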

Tutorial - Connecting Synesty to Magellan Risk Guard REST API to process documents

In the following section we show step by step how to create a connection with Synesty.

In our example use case we want to detect documents that contain personal data although they normally should not. An email alert is then sent to a specific person to draw attention to the document and its contents.

OpenText Developer Backend

First of all, to use Magellan Risk Guard, an OpenText Developer test account must be created. A new tenant and a new app are created in its backend. Later we use their API keys to send requests to the Magellan Risk Guard REST API.

Detailed instructions for this can be found in our documentation.

Set up in Synesty Studio

Under my connection, a new OpenText - Risk Guard account must be created to establish the connection with the OpenText API.

Detailed instructions for this can be found in our documentation.

Content analysis via the Risk Guard API

To have Risk Guard check a document for risks, we create a Flow.

The Flow initially consists of the steps URLDownload, OpenText-RiskGuard-Check-Documents, JSONReaderVisual and Mapper.

The URLDownload step (1) downloads an example document.

The step OpenText-RiskGuard-Check-Documents (2) then uploads the file via the Risk Guard API and returns the response.

Magellan Risk Guard - Risk assessment of documents with AI Picture3

Magellan Risk Guard - Risk assessment of documents with AI Picture4

Clicking the preview button displays the response as a spreadsheet that shows which categories of records were found.

In our case, Magellan Risk Guard found financial and tax information in the category personal information.

Magellan Risk Guard - Risk assessment of documents with AI Picture5

Magellan Risk Guard - Risk assessment of documents with AI Picture6

The details column contains a JSON string in which every data record that was found is listed:

Magellan Risk Guard - Risk assessment of documents with AI Picture7

Now we parse the details column with the JSONReaderVisual Step (3).

Magellan Risk Guard - Risk assessment of documents with AI Picture8

The result is a spreadsheet of the items found in the document:

  • Risk categories such as violence, alcohol, guns, pornography and more
  • Personally Identifiable Information (PII) such as name, credit card numbers, or insurance numbers
  • Sensitive Personal Information (SPII) such as performance data, financial/tax data or political opinions
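Outside of Synesty, what the JSONReaderVisual step does with the details column can be sketched roughly as follows. The structure of the sample JSON is an assumption made for illustration. Only the column names (`CartridgeID`, `Frequency`, `Subterm.value`) are taken from this article. The real response may be nested differently.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names, e.g. 'Subterm.value'."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def details_to_rows(details_json):
    """Parse the JSON string of the details column into a list of flat rows."""
    records = json.loads(details_json)
    return [flatten(record) for record in records]


# Hypothetical sample of a details value (assumption, not real API output).
sample = '[{"CartridgeID": "PhoneNumber", "Frequency": 3, "Subterm": {"value": "555-0100"}}]'
rows = details_to_rows(sample)
```

Each element of `rows` corresponds to one spreadsheet row, with nested fields turned into dotted column names.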

With the Mapper step, the spreadsheet can now be edited as needed.

To improve the overview, the column titles can be shortened/simplified:

Magellan Risk Guard - Risk assessment of documents with AI Picture9

Magellan Risk Guard - Risk assessment of documents with AI Picture10

With a Mappingset you can make the CartridgeIDs (entity type of the data record) readable.

Magellan Risk Guard - Risk assessment of documents with AI Picture11
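Conceptually, such a Mappingset is a lookup table. A minimal sketch, assuming made-up CartridgeIDs and labels (the real IDs are whatever Risk Guard returns in your response):

```python
# Hypothetical CartridgeIDs and labels, for illustration only.
CARTRIDGE_LABELS = {
    "PN": "Person name",
    "SSN": "Social Security Number",
}


def readable_cartridge(cartridge_id, labels=CARTRIDGE_LABELS):
    """Return the readable label, falling back to the raw ID if it is not mapped."""
    return labels.get(cartridge_id, cartridge_id)
```

Falling back to the raw ID keeps unmapped entity types visible instead of silently dropping them.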

The most important columns and their meaning:

  • ConfidenceScore = How confident Risk Guard is that it has assigned the data record to the correct entity type.
  • RelevancyScore = How important the entity or classification is to a file. For example, if a “phone number” is found three times in a document, it is more relevant than a number found only once.
  • Frequency = How many occurrences of a single record were found?

Columns in which the values of the data records are output:

  • Subterm.value = Value of extracted data record
    • In this column, a value was returned for each record during our testing.
  • nfinderNormalized = Value of extracted data record
    • This column returned a value for most records during our testing, but some cells were missing values.
  • ClientNormalized = Value of extracted data record
    • Only the values of the Social Security Numbers were output in this column. The remaining cells remained empty.

Values are only output sporadically in the remaining columns:

  • MainTermValue = Value of extracted data record
  • Subterm.value_1 = Value of extracted data record
  • Subterm.value_2 = Value of extracted data record

Overview of the spreadsheet:

Magellan Risk Guard - Risk assessment of documents with AI Picture12

What can we do with the obtained data?

The raw data of the API can now be processed a little further. To do this, we edit the spreadsheet (remove columns, group by CartridgeIDs) so that we can see how often each type of record was found in our document:

Magellan Risk Guard - Risk assessment of documents with AI Picture13
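The grouping can be sketched in a few lines. The sketch assumes that each row carries a `CartridgeID` and a numeric `Frequency` column and that the frequencies of rows with the same CartridgeID should be added up. The sample rows are invented.

```python
from collections import Counter


def count_by_cartridge(rows):
    """Sum the Frequency of all rows that share the same CartridgeID."""
    counts = Counter()
    for row in rows:
        # A missing Frequency is counted as a single occurrence (assumption).
        counts[row["CartridgeID"]] += int(row.get("Frequency", 1))
    return dict(counts)


# Invented sample rows, for illustration only.
sample_rows = [
    {"CartridgeID": "PhoneNumber", "Frequency": 3},
    {"CartridgeID": "PhoneNumber", "Frequency": 1},
    {"CartridgeID": "Email", "Frequency": 2},
]
counts = count_by_cartridge(sample_rows)
```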

We now extend the flow to send an alarm email if personal data is found in a document.

If no personal data was found, the flow can be stopped at this point. You can do this with the StopFlowIf step.

Magellan Risk Guard - Risk assessment of documents with AI Picture14

Otherwise, the flow continues.
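The stop condition itself boils down to a simple check. A sketch, assuming the grouped counts are available as a mapping from CartridgeID to frequency and that the set of IDs regarded as personal data is defined by you (the IDs shown are hypothetical):

```python
def should_stop_flow(counts, personal_ids):
    """Return True when none of the personal record types occurs in the counts."""
    return all(counts.get(cartridge_id, 0) == 0 for cartridge_id in personal_ids)


# Hypothetical IDs regarded as personal data.
PERSONAL_IDS = {"Email", "PhoneNumber"}
```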

Now the relevant data records are categorized with several filter steps:

Magellan Risk Guard - Risk assessment of documents with AI Picture15

Filter 1: Filter - Personally Identifiable Information

filter condition:

Magellan Risk Guard - Risk assessment of documents with AI Picture16

(For data records which are only considered personal in combination with others, conditions with AND can also be created here.)
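Such a filter condition, including an AND combination, might look like this in code. All CartridgeIDs here are hypothetical. Which records count as personal on their own and which only in combination is a decision for your own use case.

```python
# Hypothetical IDs that are considered personal on their own.
PII_IDS = {"PersonName", "PhoneNumber", "Email"}
# Hypothetical IDs that are considered personal only when all of them occur together.
COMBINED_IDS = ("Birthdate", "PostalCode")


def filter_pii(counts):
    """Return the subset of counts (CartridgeID -> frequency) regarded as PII."""
    result = {cid: n for cid, n in counts.items() if cid in PII_IDS}
    # AND condition: include the combined IDs only if every one of them was found.
    if all(cid in counts for cid in COMBINED_IDS):
        result.update({cid: counts[cid] for cid in COMBINED_IDS})
    return result
```

A birthdate alone would be ignored here, while a birthdate together with a postal code would be reported.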

Filter 2: Filter - Personally Identifiable Financial Information

filter condition:

Magellan Risk Guard - Risk assessment of documents with AI Picture17

Filter 3: Filter - Access Data

Magellan Risk Guard - Risk assessment of documents with AI Picture18

After filtering, the alarm email can now be generated and sent. To do this, we use the EmailSend step.

We choose a recipient and enter the subject.

Magellan Risk Guard - Risk assessment of documents with AI Picture19

In the Message field we write our message and list the data records found with their frequency according to our categories. The email message can be created using Freemarker scripting. It contains the results of the previous filter steps.

Magellan Risk Guard - Risk assessment of documents with AI Picture19
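In Synesty the message is written with Freemarker. To illustrate only the logic, here is a rough equivalent in Python. The function name, the wording of the message and the sample data are assumptions. The category titles follow the three filters above.

```python
def build_alert_message(document_name, categories):
    """Build the message text from a mapping: category title -> {record type: frequency}."""
    lines = [f"Personal data was found in document: {document_name}", ""]
    for title, counts in categories.items():
        if not counts:
            continue  # skip categories whose filter returned nothing
        lines.append(f"{title}:")
        for record_type, frequency in sorted(counts.items()):
            lines.append(f"  - {record_type}: {frequency}")
        lines.append("")
    return "\n".join(lines).rstrip()


# Invented sample data, for illustration only.
message = build_alert_message(
    "invoice.pdf",
    {
        "Personally Identifiable Information": {"Email": 2, "PhoneNumber": 4},
        "Access Data": {},
    },
)
```

Categories without hits are left out so that the recipient only sees what was actually found.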

Result

This is what it looks like when we click on “Step-Preview”:

Magellan Risk Guard - Risk assessment of documents with AI Picture21

…and this is how the email looks after the flow execution:

Magellan Risk Guard - Risk assessment of documents with AI Picture21

Conclusion

Due to the constantly growing amount of data and content, the risks for companies are increasing. A manual evaluation of content is time-consuming due to the amount of data and prone to misjudgments caused by human error. Suitable software tools can be used to identify and react to these risks.

With our test example, we have shown how the OpenText Magellan Risk Guard API can be connected in a short time in order to automatically analyze your own content.

If you want to test this yourself now, take a look at our flow template.

You can find out more about the possibilities of Magellan Risk Guard in this video:

The video is loaded from YouTube and played when you click on it. Your browser then connects to the YouTube servers. The privacy policy of Google / YouTube applies.

Feel free to contact us if you have any further questions.

Further information

Our whitepaper for Makers: No Code Integration & Automation


Last updated 2023-06-12