Inspect Google Cloud storage and databases for sensitive data

Properly managing sensitive data stored in a storage repository starts with storage classification: identifying where in the repository your sensitive data resides, what type of sensitive data it is, and how it is being used. This knowledge can help you properly set access control and sharing permissions, and it can be part of an ongoing monitoring plan.

Sensitive Data Protection can detect and classify sensitive data stored in a Cloud Storage location, a Datastore kind, or a BigQuery table. When scanning files in Cloud Storage locations, Sensitive Data Protection supports scanning binary, text, image, Microsoft Word, Microsoft Excel, Microsoft PowerPoint, PDF, and Apache Avro files. Files of unrecognized types are scanned as binary files. For more information about supported file types, see Supported file types.

To inspect storage and databases for sensitive data, you specify the location of the data and the type of sensitive data that Sensitive Data Protection should look for. Sensitive Data Protection starts a job that inspects the data at the given location, and then makes available details about the infoTypes found in the content, their likelihood values, and more.

You can set up inspection of storage and databases using Sensitive Data Protection in the Google Cloud console, via the RESTful DLP API, or programmatically in one of several languages using a Sensitive Data Protection client library.

This topic includes:

  • Best practices for setting up scans of Google Cloud storage repositories and databases.
  • Instructions for setting up an inspection scan using Sensitive Data Protection in the Google Cloud console and, optionally, for scheduling periodic inspection scans.
  • Code and JSON examples for each type of Google Cloud storage repository: Cloud Storage, Firestore in Datastore mode (Datastore), and BigQuery.
  • A detailed overview of the configuration options for scan jobs.
  • Instructions on how to retrieve scan results and how to manage the scan jobs created by each successful request.

Best practices

Identify and prioritize scanning

It's important to evaluate your assets first and specify which ones have the highest priority for scanning. When you are just getting started, you may have a large backlog of data to classify, and it will be impossible to scan it all immediately. Initially choose the data that poses the highest potential risk, for example, data that is frequently accessed, widely accessible, or unknown.

Ensure that Sensitive Data Protection can access your data

Sensitive Data Protection must be able to access the data to be scanned. Make sure that the Sensitive Data Protection service account can read your resources.
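
For Cloud Storage, one way to grant that read access is through the bucket's IAM policy. The following is a minimal Python sketch, assuming the google-cloud-storage library; the project number and bucket name are placeholders, and the service agent address follows the service-[PROJECT_NUMBER]@dlp-api.iam.gserviceaccount.com pattern, which you should verify for your project.

from google.cloud import storage

def grant_dlp_read_access(project_number: str, bucket_name: str) -> None:
    """Grants the Sensitive Data Protection service agent read access to a bucket."""
    # Assumed service agent address pattern; confirm it for your project.
    member = f"serviceAccount:service-{project_number}@dlp-api.iam.gserviceaccount.com"
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    # objectViewer is sufficient for read-only inspection scans.
    policy.bindings.append({"role": "roles/storage.objectViewer", "members": {member}})
    bucket.set_iam_policy(policy)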

Limit the scope of your first scans

For best results, limit the scope of your first jobs instead of scanning all of your data. Start with one table, one bucket, or a few files, and use sampling. By limiting the scope of your first scans, you can better determine which detectors to enable and which exclusion rules might be needed to reduce false positives, so that your findings are more meaningful. Avoid turning on all infoTypes if you don't need them all, as false positives or unusable findings can make it harder to assess your risk. While useful in certain scenarios, infoTypes such as DATE, TIME, DOMAIN_NAME, and URL match a broad range of findings and may not be useful to turn on for large data scans.

When sampling a structured file, such as a CSV, TSV, or Avro file, make sure that the sample size is large enough to cover the file's full header and one row of data. For more information, see Scanning structured files in structured parsing mode.
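
As a sketch of what a narrowly scoped first scan might look like, the following Python fragment builds a storage_config and inspect_config for the DLP API that sample a hypothetical bucket and enable only two detectors. The bucket name is a placeholder; the sampling fields (bytesLimitPerFile, filesLimitPercent, sampleMethod) are part of CloudStorageOptions in the DLP API.

# Sample roughly 10% of files, and at most 1 MB per file, in a hypothetical bucket.
storage_config = {
    "cloud_storage_options": {
        "file_set": {"url": "gs://example-bucket/**"},
        "bytes_limit_per_file": 1048576,   # scan at most 1 MB of each file
        "files_limit_percent": 10,         # scan only 10% of matching files
        "sample_method": "RANDOM_START",   # start sampling at a random offset
    }
}

# Enable a small, explicit set of detectors instead of all infoTypes.
inspect_config = {
    "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "CREDIT_CARD_NUMBER"}],
    "min_likelihood": "POSSIBLE",
}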

Schedule your scans

Use Sensitive Data Protection job triggers to run scans automatically and generate findings daily, weekly, or quarterly. These scans can also be configured to inspect only data that has changed since the last scan, which can save time and reduce costs. Running scans on a regular basis can help you identify trends or anomalies in your findings.
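
A minimal sketch of such a trigger, using the Python client library with placeholder project and bucket names, might look like the following. The enable_auto_population_of_timespan_config flag is what limits each run to data that changed since the previous scan.

import google.cloud.dlp

def create_weekly_trigger(project: str, bucket: str) -> None:
    """Creates a job trigger that scans a bucket once a week."""
    dlp = google.cloud.dlp_v2.DlpServiceClient()
    parent = f"projects/{project}/locations/global"
    job_trigger = {
        "display_name": "weekly-gcs-scan",
        "status": google.cloud.dlp_v2.JobTrigger.Status.HEALTHY,
        # Run once every 7 days, expressed as a duration in seconds.
        "triggers": [
            {"schedule": {"recurrence_period_duration": {"seconds": 7 * 24 * 60 * 60}}}
        ],
        "inspect_job": {
            "storage_config": {
                "cloud_storage_options": {"file_set": {"url": f"gs://{bucket}/**"}},
                # Only inspect data that changed since the last run.
                "timespan_config": {"enable_auto_population_of_timespan_config": True},
            },
            "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
        },
    }
    trigger = dlp.create_job_trigger(request={"parent": parent, "job_trigger": job_trigger})
    print(f"Created trigger: {trigger.name}")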

Job latency

No service level objectives (SLOs) are guaranteed for jobs and job triggers. Latency is affected by several factors, including the amount of data to scan, the storage repository being scanned, the type and number of infoTypes you are scanning for, the region where the job is processed, and the computing resources available in that region. Therefore, the latency of inspection jobs can't be determined in advance.

To help reduce job latency, you can try the following:

  • If sampling is available for your job or job trigger, enable it.
  • Avoid turning on infoTypes that you don't need. While the following are useful in certain scenarios, these infoTypes can make requests run significantly more slowly than requests that don't include them:

    • PERSON_NAME
    • FEMALE_NAME
    • MALE_NAME
    • FIRST_NAME
    • LAST_NAME
    • DATE_OF_BIRTH
    • LOCATION
    • STREET_ADDRESS
    • ORGANIZATION_NAME
  • Always specify infoTypes explicitly. Do not use an empty infoType list.

  • If possible, use a different processing region (see the sketch after this list).
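
The sketch below, with placeholder project, region, and bucket names, shows how these suggestions translate into a job request with the Python client library: the job is created under a regional parent rather than the global one, and the infoType list is explicit.

import google.cloud.dlp

dlp = google.cloud.dlp_v2.DlpServiceClient()

# Create the job under a specific processing region ("us-west1" is illustrative).
parent = "projects/my-project/locations/us-west1"

job = dlp.create_dlp_job(
    request={
        "parent": parent,
        "inspect_job": {
            "storage_config": {
                "cloud_storage_options": {"file_set": {"url": "gs://example-bucket/**"}}
            },
            # An explicit infoType list; an empty list would enable many
            # detectors and slow the job down.
            "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
        },
    }
)
print(f"Created job: {job.name}")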

If latency issues with jobs persist after you try these techniques, consider using content.inspect or content.deidentify requests instead of jobs. These methods are covered under the Service Level Agreement. For more information, see the Sensitive Data Protection Service Level Agreement.
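
For comparison, a minimal content.inspect call with the Python client library looks like the following; it inspects an in-memory string synchronously instead of creating a job. The project name and infoType are placeholders.

import google.cloud.dlp

def inspect_text(project: str, text: str) -> None:
    """Inspects a string synchronously with content.inspect instead of a job."""
    dlp = google.cloud.dlp_v2.DlpServiceClient()
    parent = f"projects/{project}/locations/global"
    response = dlp.inspect_content(
        request={
            "parent": parent,
            "inspect_config": {
                "info_types": [{"name": "PHONE_NUMBER"}],
                "include_quote": True,
            },
            "item": {"value": text},
        }
    )
    for finding in response.result.findings:
        print(f"{finding.info_type.name} ({finding.likelihood.name}): {finding.quote}")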

Before you begin

The instructions provided in this topic assume the following:

  • Storage classification requires the following OAuth scope: https://www.googleapis.com/auth/cloud-platform. For more information, see Authentication in the DLP API.
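
If you use Application Default Credentials, you can request that scope explicitly. The following Python sketch only demonstrates requesting the scope above and printing the project the credentials resolve to.

import google.auth

# Request credentials with the scope required for storage classification.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
print(f"Authenticated against project: {project_id}")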

Inspect a Cloud Storage location

You can set up a Sensitive Data Protection inspection of a Cloud Storage location using the Google Cloud console, the DLP API via REST or RPC requests, or programmatically in several languages using a client library. For information about the parameters included in the following code and JSON samples, see "Configure storage inspection" later in this topic.

Sensitive Data Protection relies on file extensions and media (MIME) types to identify the types of the files to be scanned and the scanning modes to apply. For example, Sensitive Data Protection scans a .txt file in plain text mode, even if the file is structured as a CSV file, which is normally scanned in structured parsing mode.
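
If you want to narrow the default behavior of scanning every supported type, CloudStorageOptions also lets you restrict a job to specific file groups. A short sketch with a placeholder bucket:

# Limit the scan to CSV and plain-text files only; other files are skipped.
storage_config = {
    "cloud_storage_options": {
        "file_set": {"url": "gs://example-bucket/**"},
        "file_types": ["CSV", "TEXT_FILE"],  # FileType enum values in the DLP API
    }
}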

To set up a scan job of a Cloud Storage bucket using Sensitive Data Protection:

Console

This section describes how to inspect a Cloud Storage bucket or folder. If you also want Sensitive Data Protection to create a de-identified copy of your data, see De-identify sensitive data stored in Cloud Storage using the Google Cloud console.

  1. In the Sensitive Data Protection section of the Google Cloud console, go to the Create job or job trigger page.

    Go to Create job or job trigger

  2. Enter the Sensitive Data Protection job information and click Continue to complete each step:

    • For Step 1: Choose input data, name the job by entering a value in the Name field. For Location, choose Cloud Storage from the Storage type menu, and then enter the location of the data to scan. The Sampling section is preconfigured to run a sample scan against your data. You can adjust the Percentage of objects scanned within the bucket field to save resources if you have a large amount of data. For more details, see Choose input data.

    • (Optional) For Step 2: Configure detection, you can configure what types of data to look for, called "infoTypes". You can choose from the list of predefined infoTypes, or you can select a template if one exists. For more details, see Configure detection.

    • (Optional) For Step 3: Add actions, make sure Notify by email is enabled.

      Enable Save to BigQuery to publish your Sensitive Data Protection findings to a BigQuery table. Provide the following:

      • For Project ID, enter the project ID where your results are stored.
      • For Dataset ID, enter the name of the dataset that stores your results.
      • (Optional) For Table ID, enter the name of the table that stores your results. If no table ID is specified, a new table is assigned a default name similar to the following: dlp_googleapis_[DATE]_1234567890, where [DATE] represents the date the scan is run. If you specify an existing table, findings are appended to it.
      • (Optional) Enable Include quote to include the strings that matched an infoType detector. Quotes are potentially sensitive, so by default, Sensitive Data Protection doesn't include them in findings.

      When data is written to a BigQuery table, the billing and quota usage are applied to the project that contains the destination table.

      If you want to create a de-identified copy of your data, enable Make a de-identified copy. For more information, see De-identify sensitive data stored in Cloud Storage using the Google Cloud console.

      You can also save results to Pub/Sub, Security Command Center, Data Catalog, and Cloud Monitoring. For more details, see Add actions.

    • (Optional) For Step 4: Schedule, to run the scan one time only, leave the menu set to None. To schedule scans to run periodically, click Create a trigger to run the job on a periodic schedule. For more details, see Schedule.

  3. Click Create.

  4. When the Sensitive Data Protection job completes, the job details page is displayed, and you're notified by email. You can view the results of the inspection on the job details page.

  5. (Optional) If you chose to publish Sensitive Data Protection findings to BigQuery, on the Job details page, click View Findings in BigQuery to open the table in the BigQuery web UI. You can then query the table and analyze your findings. For more information on querying your results in BigQuery, see Querying Sensitive Data Protection findings in BigQuery.
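
As an illustration, the following Python sketch aggregates findings by infoType and likelihood with the BigQuery client library. The table name is a placeholder for the one you configured in the Save to BigQuery action, and the column names reflect the usual schema of DLP findings tables; verify them against your own table.

from google.cloud import bigquery

client = bigquery.Client()
# Placeholder table; substitute the project, dataset, and table you configured.
query = """
    SELECT info_type.name AS info_type, likelihood, COUNT(*) AS findings
    FROM `my-project.dlp_results.dlp_googleapis_findings`
    GROUP BY info_type, likelihood
    ORDER BY findings DESC
"""
for row in client.query(query).result():
    print(f"{row.info_type} ({row.likelihood}): {row.findings}")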

Protocol

Following is sample JSON that can be sent in a POST request to the specified Sensitive Data Protection REST endpoint. This example JSON demonstrates how to use the DLP API to inspect Cloud Storage buckets. For information about the parameters included in the request, see "Configure storage inspection" later in this topic.

You can quickly try this out in the APIs Explorer on the reference page for dlpJobs.create:

Go to APIs Explorer

Keep in mind that a successful request, even in the APIs Explorer, will create a new scan job. For information about how to control scan jobs, see "Retrieve inspection results" later in this topic. For general information about using JSON to send requests to the DLP API, see the JSON quickstart.

JSON input:

POST https://dlp.googleapis.com/v2/projects/[PROJECT-ID]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "cloudStorageOptions":{
        "fileSet":{
          "url":"gs://[BUCKET-NAME]/*"
        },
        "bytesLimitPerFile":"1073741824"
      },
      "timespanConfig":{
        "startTime":"2017-11-13T12:34:29.965633345Z",
        "endTime":"2018-01-05T04:45:04.240912125Z"
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT-ID]",
              "datasetId":"[DATASET-ID]"
            }
          }
        }
      }
    ]
  }
}

JSON output:

{   "name":"projects/[PROJECT-ID]/dlpJobs/[JOB-ID]",   "type":"INSPECT_JOB",   "state":"PENDING",   "inspectDetails":{     "requestedOptions":{       "snapshotInspectTemplate":{        },       "jobConfig":{         "storageConfig":{           "cloudStorageOptions":{             "fileSet":{               "url":"gs://[BUCKET-NAME]/*"             },             "bytesLimitPerFile":"1073741824"           },           "timespanConfig":{             "startTime":"2017-11-13T12:34:29.965633345Z",             "endTime":"2018-01-05T04:45:04.240912125Z"           }         },         "inspectConfig":{           "infoTypes":[             {               "name":"PHONE_NUMBER"             }           ],           "minLikelihood":"LIKELY",           "limits":{            },           "includeQuote":true         },         "actions":[           {             "saveFindings":{               "outputConfig":{                 "table":{                   "projectId":"[PROJECT-ID]",                   "datasetId":"[DATASET-ID]",                   "tableId":"[NEW-TABLE-ID]"                 }               }             }           }         ]       }     }   },   "createTime":"2018-11-07T18:01:14.225Z" } 

Java

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

 import com.google.api.core.SettableApiFuture; import com.google.cloud.dlp.v2.DlpServiceClient; import com.google.cloud.pubsub.v1.AckReplyConsumer; import com.google.cloud.pubsub.v1.MessageReceiver; import com.google.cloud.pubsub.v1.Subscriber; import com.google.privacy.dlp.v2.Action; import com.google.privacy.dlp.v2.CloudStorageOptions; import com.google.privacy.dlp.v2.CloudStorageOptions.FileSet; import com.google.privacy.dlp.v2.CreateDlpJobRequest; import com.google.privacy.dlp.v2.DlpJob; import com.google.privacy.dlp.v2.GetDlpJobRequest; import com.google.privacy.dlp.v2.InfoType; import com.google.privacy.dlp.v2.InfoTypeStats; import com.google.privacy.dlp.v2.InspectConfig; import com.google.privacy.dlp.v2.InspectDataSourceDetails; import com.google.privacy.dlp.v2.InspectJobConfig; import com.google.privacy.dlp.v2.LocationName; import com.google.privacy.dlp.v2.StorageConfig; import com.google.pubsub.v1.ProjectSubscriptionName; import com.google.pubsub.v1.PubsubMessage; import java.io.IOException; import java.util.List; import java.util.concurrent.ExecutionException; import java.util.concurrent.TimeUnit; import java.util.concurrent.TimeoutException; import java.util.stream.Collectors; import java.util.stream.Stream;  public class InspectGcsFile {    public static void main(String[] args) throws Exception {     // TODO(developer): Replace these variables before running the sample.     String projectId = "your-project-id";     String gcsUri = "gs://" + "your-bucket-name" + "/path/to/your/file.txt";     String topicId = "your-pubsub-topic-id";     String subscriptionId = "your-pubsub-subscription-id";     inspectGcsFile(projectId, gcsUri, topicId, subscriptionId);   }    // Inspects a file in a Google Cloud Storage Bucket.   public static void inspectGcsFile(       String projectId, String gcsUri, String topicId, String subscriptionId)       throws ExecutionException, InterruptedException, IOException {     // Initialize client that will be used to send requests. This client only needs to be created     // once, and can be reused for multiple requests. After completing all of your requests, call     // the "close" method on the client to safely clean up any remaining background resources.     try (DlpServiceClient dlp = DlpServiceClient.create()) {       // Specify the GCS file to be inspected.       CloudStorageOptions cloudStorageOptions =           CloudStorageOptions.newBuilder().setFileSet(FileSet.newBuilder().setUrl(gcsUri)).build();        StorageConfig storageConfig =           StorageConfig.newBuilder().setCloudStorageOptions(cloudStorageOptions).build();        // Specify the type of info the inspection will look for.       // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types       List<InfoType> infoTypes =           Stream.of("PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER")               .map(it -> InfoType.newBuilder().setName(it).build())               .collect(Collectors.toList());        // Specify how the content should be inspected.       InspectConfig inspectConfig =           InspectConfig.newBuilder().addAllInfoTypes(infoTypes).setIncludeQuote(true).build();        // Specify the action that is triggered when the job completes.       
String pubSubTopic = String.format("projects/%s/topics/%s", projectId, topicId);       Action.PublishToPubSub publishToPubSub =           Action.PublishToPubSub.newBuilder().setTopic(pubSubTopic).build();       Action action = Action.newBuilder().setPubSub(publishToPubSub).build();        // Configure the long running job we want the service to perform.       InspectJobConfig inspectJobConfig =           InspectJobConfig.newBuilder()               .setStorageConfig(storageConfig)               .setInspectConfig(inspectConfig)               .addActions(action)               .build();        // Create the request for the job configured above.       CreateDlpJobRequest createDlpJobRequest =           CreateDlpJobRequest.newBuilder()               .setParent(LocationName.of(projectId, "global").toString())               .setInspectJob(inspectJobConfig)               .build();        // Use the client to send the request.       final DlpJob dlpJob = dlp.createDlpJob(createDlpJobRequest);       System.out.println("Job created: " + dlpJob.getName());        // Set up a Pub/Sub subscriber to listen on the job completion status       final SettableApiFuture<Boolean> done = SettableApiFuture.create();        ProjectSubscriptionName subscriptionName =           ProjectSubscriptionName.of(projectId, subscriptionId);        MessageReceiver messageHandler =           (PubsubMessage pubsubMessage, AckReplyConsumer ackReplyConsumer) -> {             handleMessage(dlpJob, done, pubsubMessage, ackReplyConsumer);           };       Subscriber subscriber = Subscriber.newBuilder(subscriptionName, messageHandler).build();       subscriber.startAsync();        // Wait for job completion semi-synchronously       // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions       try {         done.get(15, TimeUnit.MINUTES);       } catch (TimeoutException e) {         System.out.println("Job was not completed after 15 minutes.");         return;       } finally {         subscriber.stopAsync();         subscriber.awaitTerminated();       }        // Get the latest state of the job from the service       GetDlpJobRequest request = GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build();       DlpJob completedJob = dlp.getDlpJob(request);        // Parse the response and process results.       System.out.println("Job status: " + completedJob.getState());       System.out.println("Job name: " + dlpJob.getName());       InspectDataSourceDetails.Result result = completedJob.getInspectDetails().getResult();       System.out.println("Findings: ");       for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {         System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());         System.out.println("\tCount: " + infoTypeStat.getCount());       }     }   }    // handleMessage injects the job and settableFuture into the message reciever interface   private static void handleMessage(       DlpJob job,       SettableApiFuture<Boolean> done,       PubsubMessage pubsubMessage,       AckReplyConsumer ackReplyConsumer) {     String messageAttribute = pubsubMessage.getAttributesMap().get("DlpJobName");     if (job.getName().equals(messageAttribute)) {       done.set(true);       ackReplyConsumer.ack();     } else {       ackReplyConsumer.nack();     }   } }

Node.js

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Import the Google Cloud client libraries const DLP = require('@google-cloud/dlp'); const {PubSub} = require('@google-cloud/pubsub');  // Instantiates clients const dlp = new DLP.DlpServiceClient(); const pubsub = new PubSub();  // The project ID to run the API call under // const projectId = 'my-project';  // The name of the bucket where the file resides. // const bucketName = 'YOUR-BUCKET';  // The path to the file within the bucket to inspect. // Can contain wildcards, e.g. "my-image.*" // const fileName = 'my-image.png';  // The minimum likelihood required before returning a match // const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';  // The maximum number of findings to report per request (0 = server maximum) // const maxFindings = 0;  // The infoTypes of information to match // const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];  // The customInfoTypes of information to match // const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}}, //   { infoType: { name: 'REGEX_TYPE' }, regex: {pattern: '\\(\\d{3}\\) \\d{3}-\\d{4}'}}];  // The name of the Pub/Sub topic to notify once the job completes // TODO(developer): create a Pub/Sub topic to use for this // const topicId = 'MY-PUBSUB-TOPIC'  // The name of the Pub/Sub subscription to use when listening for job // completion notifications // TODO(developer): create a Pub/Sub subscription to use for this // const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'  async function inspectGCSFile() {   // Get reference to the file to be inspected   const storageItem = {     cloudStorageOptions: {       fileSet: {url: `gs://${bucketName}/${fileName}`},     },   };    // Construct request for creating an inspect job   const request = {     parent: `projects/${projectId}/locations/global`,     inspectJob: {       inspectConfig: {         infoTypes: infoTypes,         customInfoTypes: customInfoTypes,         minLikelihood: minLikelihood,         limits: {           maxFindingsPerRequest: maxFindings,         },       },       storageConfig: storageItem,       actions: [         {           pubSub: {             topic: `projects/${projectId}/topics/${topicId}`,           },         },       ],     },   };    // Create a GCS File inspection job and wait for it to complete   const [topicResponse] = await pubsub.topic(topicId).get();   // Verify the Pub/Sub topic and listen for job notifications via an   // existing subscription.   
const subscription = await topicResponse.subscription(subscriptionId);   const [jobsResponse] = await dlp.createDlpJob(request);   // Get the job's ID   const jobName = jobsResponse.name;   // Watch the Pub/Sub topic until the DLP job finishes   await new Promise((resolve, reject) => {     const messageHandler = message => {       if (message.attributes && message.attributes.DlpJobName === jobName) {         message.ack();         subscription.removeListener('message', messageHandler);         subscription.removeListener('error', errorHandler);         resolve(jobName);       } else {         message.nack();       }     };      const errorHandler = err => {       subscription.removeListener('message', messageHandler);       subscription.removeListener('error', errorHandler);       reject(err);     };      subscription.on('message', messageHandler);     subscription.on('error', errorHandler);   });    setTimeout(() => {     console.log('Waiting for DLP job to fully complete');   }, 500);   const [job] = await dlp.getDlpJob({name: jobName});   console.log(`Job ${job.name} status: ${job.state}`);    const infoTypeStats = job.inspectDetails.result.infoTypeStats;   if (infoTypeStats.length > 0) {     infoTypeStats.forEach(infoTypeStat => {       console.log(         `  Found ${infoTypeStat.count} instance(s) of infoType ${infoTypeStat.infoType.name}.`       );     });   } else {     console.log('No findings.');   } } await inspectGCSFile();

Python

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import threading from typing import List, Optional  import google.cloud.dlp import google.cloud.pubsub   def inspect_gcs_file(     project: str,     bucket: str,     filename: str,     topic_id: str,     subscription_id: str,     info_types: List[str],     custom_dictionaries: List[str] = None,     custom_regexes: List[str] = None,     min_likelihood: Optional[str] = None,     max_findings: Optional[int] = None,     timeout: int = 300, ) -> None:     """Uses the Data Loss Prevention API to analyze a file on GCS.     Args:         project: The Google Cloud project id to use as a parent resource.         bucket: The name of the GCS bucket containing the file, as a string.         filename: The name of the file in the bucket, including the path, as a             string; e.g. 'images/myfile.png'.         topic_id: The id of the Cloud Pub/Sub topic to which the API will             broadcast job completion. The topic must already exist.         subscription_id: The id of the Cloud Pub/Sub subscription to listen on             while waiting for job completion. The subscription must already             exist and be subscribed to the topic.         info_types: A list of strings representing info types to look for.             A full list of info type categories can be fetched from the API.         min_likelihood: A string representing the minimum likelihood threshold             that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',             'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.         max_findings: The maximum number of findings to report; 0 = no maximum.         timeout: The number of seconds to wait for a response from the API.     Returns:         None; the response from the API is printed to the terminal.     """      # Instantiate a client.     dlp = google.cloud.dlp_v2.DlpServiceClient()      # Prepare info_types by converting the list of strings into a list of     # dictionaries (protos are also accepted).     if not info_types:         info_types = ["FIRST_NAME", "LAST_NAME", "EMAIL_ADDRESS"]     info_types = [{"name": info_type} for info_type in info_types]      # Prepare custom_info_types by parsing the dictionary word lists and     # regex patterns.     if custom_dictionaries is None:         custom_dictionaries = []     dictionaries = [         {             "info_type": {"name": f"CUSTOM_DICTIONARY_{i}"},             "dictionary": {"word_list": {"words": custom_dict.split(",")}},         }         for i, custom_dict in enumerate(custom_dictionaries)     ]     if custom_regexes is None:         custom_regexes = []     regexes = [         {             "info_type": {"name": f"CUSTOM_REGEX_{i}"},             "regex": {"pattern": custom_regex},         }         for i, custom_regex in enumerate(custom_regexes)     ]     custom_info_types = dictionaries + regexes      # Construct the configuration dictionary. Keys which are None may     # optionally be omitted entirely.     inspect_config = {         "info_types": info_types,         "custom_info_types": custom_info_types,         "min_likelihood": min_likelihood,         "limits": {"max_findings_per_request": max_findings},     }      # Construct a storage_config containing the file's URL.     url = f"gs://{bucket}/{filename}"     storage_config = {"cloud_storage_options": {"file_set": {"url": url}}}      # Convert the project id into full resource ids.     
topic = google.cloud.pubsub.PublisherClient.topic_path(project, topic_id)     parent = f"projects/{project}/locations/global"      # Tell the API where to send a notification when the job is complete.     actions = [{"pub_sub": {"topic": topic}}]      # Construct the inspect_job, which defines the entire inspect content task.     inspect_job = {         "inspect_config": inspect_config,         "storage_config": storage_config,         "actions": actions,     }      operation = dlp.create_dlp_job(         request={"parent": parent, "inspect_job": inspect_job}     )     print(f"Inspection operation started: {operation.name}")      # Create a Pub/Sub client and find the subscription. The subscription is     # expected to already be listening to the topic.     subscriber = google.cloud.pubsub.SubscriberClient()     subscription_path = subscriber.subscription_path(project, subscription_id)      # Set up a callback to acknowledge a message. This closes around an event     # so that it can signal that it is done and the main thread can continue.     job_done = threading.Event()      def callback(message: google.cloud.pubsub_v1.subscriber.message.Message) -> None:         try:             if message.attributes["DlpJobName"] == operation.name:                 # This is the message we're looking for, so acknowledge it.                 message.ack()                  # Now that the job is done, fetch the results and print them.                 job = dlp.get_dlp_job(request={"name": operation.name})                 print(f"Job name: {job.name}")                 if job.inspect_details.result.info_type_stats:                     for finding in job.inspect_details.result.info_type_stats:                         print(                             f"Info type: {finding.info_type.name}; Count: {finding.count}"                         )                 else:                     print("No findings.")                  # Signal to the main thread that we can exit.                 job_done.set()             else:                 # This is not the message we're looking for.                 message.drop()         except Exception as e:             # Because this is executing in a thread, an exception won't be             # noted unless we print it manually.             print(e)             raise      subscriber.subscribe(subscription_path, callback=callback)     finished = job_done.wait(timeout=timeout)     if not finished:         print(             "No event received before the timeout. Please verify that the "             "subscription provided is subscribed to the topic provided."         )  

Go

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import ( 	"context" 	"fmt" 	"io" 	"strings" 	"time"  	dlp "cloud.google.com/go/dlp/apiv2" 	"cloud.google.com/go/dlp/apiv2/dlppb" 	"cloud.google.com/go/pubsub" )  // inspectGCSFile searches for the given info types in the given file. func inspectGCSFile(w io.Writer, projectID string, infoTypeNames []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, bucketName, fileName string) error { 	// projectID := "my-project-id" 	// infoTypeNames := []string{"US_SOCIAL_SECURITY_NUMBER"} 	// customDictionaries := []string{...} 	// customRegexes := []string{...} 	// pubSubTopic := "dlp-risk-sample-topic" 	// pubSubSub := "dlp-risk-sample-sub" 	// bucketName := "my-bucket" 	// fileName := "my-file.txt"  	ctx := context.Background() 	client, err := dlp.NewClient(ctx) 	if err != nil { 		return fmt.Errorf("dlp.NewClient: %w", err) 	}  	// Convert the info type strings to a list of InfoTypes. 	var infoTypes []*dlppb.InfoType 	for _, it := range infoTypeNames { 		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it}) 	} 	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes. 	var customInfoTypes []*dlppb.CustomInfoType 	for idx, it := range customDictionaries { 		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{ 			InfoType: &dlppb.InfoType{ 				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx), 			}, 			Type: &dlppb.CustomInfoType_Dictionary_{ 				Dictionary: &dlppb.CustomInfoType_Dictionary{ 					Source: &dlppb.CustomInfoType_Dictionary_WordList_{ 						WordList: &dlppb.CustomInfoType_Dictionary_WordList{ 							Words: strings.Split(it, ","), 						}, 					}, 				}, 			}, 		}) 	} 	for idx, it := range customRegexes { 		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{ 			InfoType: &dlppb.InfoType{ 				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx), 			}, 			Type: &dlppb.CustomInfoType_Regex_{ 				Regex: &dlppb.CustomInfoType_Regex{ 					Pattern: it, 				}, 			}, 		}) 	}  	// Create a PubSub Client used to listen for when the inspect job finishes. 	pubsubClient, err := pubsub.NewClient(ctx, projectID) 	if err != nil { 		return fmt.Errorf("pubsub.NewClient: %w", err) 	} 	defer pubsubClient.Close()  	// Create a PubSub subscription we can use to listen for messages. 	// Create the Topic if it doesn't exist. 	t := pubsubClient.Topic(pubSubTopic) 	if exists, err := t.Exists(ctx); err != nil { 		return fmt.Errorf("t.Exists: %w", err) 	} else if !exists { 		if t, err = pubsubClient.CreateTopic(ctx, pubSubTopic); err != nil { 			return fmt.Errorf("CreateTopic: %w", err) 		} 	}  	// Create the Subscription if it doesn't exist. 	s := pubsubClient.Subscription(pubSubSub) 	if exists, err := s.Exists(ctx); err != nil { 		return fmt.Errorf("s.Exists: %w", err) 	} else if !exists { 		if s, err = pubsubClient.CreateSubscription(ctx, pubSubSub, pubsub.SubscriptionConfig{Topic: t}); err != nil { 			return fmt.Errorf("CreateSubscription: %w", err) 		} 	}  	// topic is the PubSub topic string where messages should be sent. 	topic := "projects/" + projectID + "/topics/" + pubSubTopic  	// Create a configured request. 	req := &dlppb.CreateDlpJobRequest{ 		Parent: fmt.Sprintf("projects/%s/locations/global", projectID), 		Job: &dlppb.CreateDlpJobRequest_InspectJob{ 			InspectJob: &dlppb.InspectJobConfig{ 				// StorageConfig describes where to find the data. 				
StorageConfig: &dlppb.StorageConfig{ 					Type: &dlppb.StorageConfig_CloudStorageOptions{ 						CloudStorageOptions: &dlppb.CloudStorageOptions{ 							FileSet: &dlppb.CloudStorageOptions_FileSet{ 								Url: "gs://" + bucketName + "/" + fileName, 							}, 						}, 					}, 				}, 				// InspectConfig describes what fields to look for. 				InspectConfig: &dlppb.InspectConfig{ 					InfoTypes:       infoTypes, 					CustomInfoTypes: customInfoTypes, 					MinLikelihood:   dlppb.Likelihood_POSSIBLE, 					Limits: &dlppb.InspectConfig_FindingLimits{ 						MaxFindingsPerRequest: 10, 					}, 					IncludeQuote: true, 				}, 				// Send a message to PubSub using Actions. 				Actions: []*dlppb.Action{ 					{ 						Action: &dlppb.Action_PubSub{ 							PubSub: &dlppb.Action_PublishToPubSub{ 								Topic: topic, 							}, 						}, 					}, 				}, 			}, 		}, 	} 	// Create the inspect job. 	j, err := client.CreateDlpJob(ctx, req) 	if err != nil { 		return fmt.Errorf("CreateDlpJob: %w", err) 	} 	fmt.Fprintf(w, "Created job: %v\n", j.GetName())  	// Wait for the inspect job to finish by waiting for a PubSub message. 	// This only waits for 10 minutes. For long jobs, consider using a truly 	// asynchronous execution model such as Cloud Functions. 	ctx, cancel := context.WithTimeout(ctx, 10*time.Minute) 	defer cancel() 	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) { 		// If this is the wrong job, do not process the result. 		if msg.Attributes["DlpJobName"] != j.GetName() { 			msg.Nack() 			return 		} 		msg.Ack()  		// Stop listening for more messages. 		defer cancel()  		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{ 			Name: j.GetName(), 		}) 		if err != nil { 			fmt.Fprintf(w, "Cloud not get job: %v", err) 			return 		} 		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats() 		if len(r) == 0 { 			fmt.Fprintf(w, "No results") 		} 		for _, s := range r { 			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName()) 		} 	}) 	if err != nil { 		return fmt.Errorf("Receive: %w", err) 	} 	return nil } 

PHP

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

use Google\Cloud\Dlp\V2\Action; use Google\Cloud\Dlp\V2\Action\PublishToPubSub; use Google\Cloud\Dlp\V2\Client\DlpServiceClient; use Google\Cloud\Dlp\V2\CloudStorageOptions; use Google\Cloud\Dlp\V2\CloudStorageOptions\FileSet; use Google\Cloud\Dlp\V2\CreateDlpJobRequest; use Google\Cloud\Dlp\V2\DlpJob\JobState; use Google\Cloud\Dlp\V2\GetDlpJobRequest; use Google\Cloud\Dlp\V2\InfoType; use Google\Cloud\Dlp\V2\InspectConfig; use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits; use Google\Cloud\Dlp\V2\InspectJobConfig; use Google\Cloud\Dlp\V2\Likelihood; use Google\Cloud\Dlp\V2\StorageConfig; use Google\Cloud\PubSub\PubSubClient;  /**  * Inspect a file stored on Google Cloud Storage , using Pub/Sub for job status notifications.  *  * @param string $callingProjectId  The project ID to run the API call under  * @param string $topicId           The name of the Pub/Sub topic to notify once the job completes  * @param string $subscriptionId    The name of the Pub/Sub subscription to use when listening for job  * @param string $bucketId          The name of the bucket where the file resides  * @param string $file              The path to the file within the bucket to inspect. Can contain wildcards e.g. "my-image.*"  * @param int    $maxFindings       (Optional) The maximum number of findings to report per request (0 = server maximum)  */ function inspect_gcs(     string $callingProjectId,     string $topicId,     string $subscriptionId,     string $bucketId,     string $file,     int $maxFindings = 0 ): void {     // Instantiate a client.     $dlp = new DlpServiceClient();     $pubsub = new PubSubClient();     $topic = $pubsub->topic($topicId);      // The infoTypes of information to match     $personNameInfoType = (new InfoType())         ->setName('PERSON_NAME');     $creditCardNumberInfoType = (new InfoType())         ->setName('CREDIT_CARD_NUMBER');     $infoTypes = [$personNameInfoType, $creditCardNumberInfoType];      // The minimum likelihood required before returning a match     $minLikelihood = likelihood::LIKELIHOOD_UNSPECIFIED;      // Specify finding limits     $limits = (new FindingLimits())         ->setMaxFindingsPerRequest($maxFindings);      // Construct items to be inspected     $fileSet = (new FileSet())         ->setUrl('gs://' . $bucketId . '/' . $file);      $cloudStorageOptions = (new CloudStorageOptions())         ->setFileSet($fileSet);      $storageConfig = (new StorageConfig())         ->setCloudStorageOptions($cloudStorageOptions);      // Construct the inspect config object     $inspectConfig = (new InspectConfig())         ->setMinLikelihood($minLikelihood)         ->setLimits($limits)         ->setInfoTypes($infoTypes);      // Construct the action to run when job completes     $pubSubAction = (new PublishToPubSub())         ->setTopic($topic->name());      $action = (new Action())         ->setPubSub($pubSubAction);      // Construct inspect job config to run     $inspectJob = (new InspectJobConfig())         ->setInspectConfig($inspectConfig)         ->setStorageConfig($storageConfig)         ->setActions([$action]);      // Listen for job notifications via an existing topic/subscription.     
$subscription = $topic->subscription($subscriptionId);      // Submit request     $parent = "projects/$callingProjectId/locations/global";     $createDlpJobRequest = (new CreateDlpJobRequest())         ->setParent($parent)         ->setInspectJob($inspectJob);     $job = $dlp->createDlpJob($createDlpJobRequest);      // Poll Pub/Sub using exponential backoff until job finishes     // Consider using an asynchronous execution model such as Cloud Functions     $attempt = 1;     $startTime = time();     do {         foreach ($subscription->pull() as $message) {             if (                 isset($message->attributes()['DlpJobName']) &&                 $message->attributes()['DlpJobName'] === $job->getName()             ) {                 $subscription->acknowledge($message);                 // Get the updated job. Loop to avoid race condition with DLP API.                 do {                     $getDlpJobRequest = (new GetDlpJobRequest())                         ->setName($job->getName());                     $job = $dlp->getDlpJob($getDlpJobRequest);                 } while ($job->getState() == JobState::RUNNING);                 break 2; // break from parent do while             }         }         print('Waiting for job to complete' . PHP_EOL);         // Exponential backoff with max delay of 60 seconds         sleep(min(60, pow(2, ++$attempt)));     } while (time() - $startTime < 600); // 10 minute timeout      // Print finding counts     printf('Job %s status: %s' . PHP_EOL, $job->getName(), JobState::name($job->getState()));     switch ($job->getState()) {         case JobState::DONE:             $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();             if (count($infoTypeStats) === 0) {                 print('No findings.' . PHP_EOL);             } else {                 foreach ($infoTypeStats as $infoTypeStat) {                     printf('  Found %s instance(s) of infoType %s' . PHP_EOL, $infoTypeStat->getCount(), $infoTypeStat->getInfoType()->getName());                 }             }             break;         case JobState::FAILED:             printf('Job %s had errors:' . PHP_EOL, $job->getName());             $errors = $job->getErrors();             foreach ($errors as $error) {                 var_dump($error->getDetails());             }             break;         case JobState::PENDING:             print('Job has not completed. Consider a longer timeout or an asynchronous execution model' . PHP_EOL);             break;         default:             print('Unexpected job state. Most likely, the job is either running or has not yet started.');     } }

C#

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

 using Google.Api.Gax.ResourceNames; using Google.Cloud.Dlp.V2; using Google.Cloud.PubSub.V1; using System; using System.Collections.Generic; using System.Threading; using System.Threading.Tasks; using static Google.Cloud.Dlp.V2.InspectConfig.Types;  public class InspectGoogleCloudStorage {     public static DlpJob InspectGCS(         string projectId,         Likelihood minLikelihood,         int maxFindings,         bool includeQuote,         IEnumerable<InfoType> infoTypes,         IEnumerable<CustomInfoType> customInfoTypes,         string bucketName,         string topicId,         string subscriptionId)     {         var inspectJob = new InspectJobConfig         {             StorageConfig = new StorageConfig             {                 CloudStorageOptions = new CloudStorageOptions                 {                     FileSet = new CloudStorageOptions.Types.FileSet { Url = $"gs://{bucketName}/*.txt" },                     BytesLimitPerFile = 1073741824                 },             },             InspectConfig = new InspectConfig             {                 InfoTypes = { infoTypes },                 CustomInfoTypes = { customInfoTypes },                 ExcludeInfoTypes = false,                 IncludeQuote = includeQuote,                 Limits = new FindingLimits                 {                     MaxFindingsPerRequest = maxFindings                 },                 MinLikelihood = minLikelihood             },             Actions =                 {                     new Google.Cloud.Dlp.V2.Action                     {                         // Send results to Pub/Sub topic                         PubSub = new Google.Cloud.Dlp.V2.Action.Types.PublishToPubSub                         {                             Topic = topicId,                         }                     }                 }         };          // Issue Create Dlp Job Request         var client = DlpServiceClient.Create();         var request = new CreateDlpJobRequest         {             InspectJob = inspectJob,             Parent = new LocationName(projectId, "global").ToString(),         };          // We need created job name         var dlpJob = client.CreateDlpJob(request);          // Get a pub/sub subscription and listen for DLP results         var fireEvent = new ManualResetEventSlim();          var subscriptionName = new SubscriptionName(projectId, subscriptionId);         var subscriber = SubscriberClient.CreateAsync(subscriptionName).Result;         subscriber.StartAsync(             (pubSubMessage, cancellationToken) =>             {                 // Given a message that we receive on this subscription, we should either acknowledge or decline it                 if (pubSubMessage.Attributes["DlpJobName"] == dlpJob.Name)                 {                     fireEvent.Set();                     return Task.FromResult(SubscriberClient.Reply.Ack);                 }                  return Task.FromResult(SubscriberClient.Reply.Nack);             });          // We block here until receiving a signal from a separate thread that is waiting on a message indicating receiving a result of Dlp job         if (fireEvent.Wait(TimeSpan.FromMinutes(1)))         {             // Stop the thread that is listening to messages as a result of StartAsync call earlier             subscriber.StopAsync(CancellationToken.None).Wait();              // Now we can inspect full job results             var job = client.GetDlpJob(new GetDlpJobRequest { DlpJobName = new DlpJobName(projectId, dlpJob.Name) });          
    // Inspect Job details             Console.WriteLine($"Processed bytes: {job.InspectDetails.Result.ProcessedBytes}");             Console.WriteLine($"Total estimated bytes: {job.InspectDetails.Result.TotalEstimatedBytes}");             var stats = job.InspectDetails.Result.InfoTypeStats;             Console.WriteLine("Found stats:");             foreach (var stat in stats)             {                 Console.WriteLine($"{stat.InfoType.Name}");             }              return job;         }          throw new InvalidOperationException("The wait failed on timeout");     } }

Inspect a Datastore kind

You can set up an inspection of a Datastore kind using the Google Cloud console, the DLP API via REST or RPC requests, or programmatically in several languages using a client library.

To set up a scan job of a Datastore kind using Sensitive Data Protection:

Console

  1. In the Sensitive Data Protection section of the Google Cloud console, go to the Create job or job trigger page.

    Go to Create job or job trigger

  2. Enter the Sensitive Data Protection job information and click Continue to complete each step:

    • For Step 1: Choose input data, enter the identifiers for the project, namespace (optional), and kind that you want to scan. For more details, see Choose input data.

    • (Optional) For Step 2: Configure detection, you can configure what types of data to look for, called "infoTypes". You can choose from the list of predefined infoTypes, or you can select a template if one exists. For more details, see Configure detection.

    • (Optional) For Step 3: Add actions, make sure Notify by email is enabled.

      Enable Save to BigQuery to publish your Sensitive Data Protection findings to a BigQuery table. Provide the following:

      • For Project ID, enter the project ID where your results are stored.
      • For Dataset ID, enter the name of the dataset that stores your results.
      • (Optional) For Table ID, enter the name of the table that stores your results. If no table ID is specified, a new table is assigned a default name similar to the following: dlp_googleapis_[DATE]_1234567890. If you specify an existing table, findings are appended to it.

      When data is written to a BigQuery table, the billing and quota usage are applied to the project that contains the destination table.

      For more information about the other actions listed, see Add actions.

    • (Optional) For Step 4: Schedule, configure a time span or a periodic schedule by selecting Specify time span or Create a trigger to run the job on a periodic schedule. For more information, see Schedule.

  3. Click Create.

  4. When the Sensitive Data Protection job completes, the job details page is displayed, and you're notified by email. You can view the results of the inspection on the job details page.

  5. (Optional) If you chose to publish Sensitive Data Protection findings to BigQuery, on the Job details page, click View Findings in BigQuery to open the table in the BigQuery web UI. You can then query the table and analyze your findings. For more information on querying your results in BigQuery, see Querying Sensitive Data Protection findings in BigQuery.

Protocol

Following is sample JSON that can be sent in a POST request to the specified DLP API REST endpoint. This example JSON demonstrates how to use the DLP API to inspect Datastore kinds. For information about the parameters included in the request, see "Configure storage inspection" later in this topic.

You can quickly try this out in the APIs Explorer on the reference page for dlpJobs.create:

Go to APIs Explorer

Keep in mind that a successful request, even in the APIs Explorer, will create a new scan job. For information about how to control scan jobs, see "Retrieve inspection results" later in this topic. For general information about using JSON to send requests to the DLP API, see the JSON quickstart.

JSON input:

POST https://dlp.googleapis.com/v2/projects/[PROJECT-ID]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "datastoreOptions":{
        "kind":{
          "name":"Example-Kind"
        },
        "partitionId":{
          "namespaceId":"[NAMESPACE-ID]",
          "projectId":"[PROJECT-ID]"
        }
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT-ID]",
              "datasetId":"[BIGQUERY-DATASET-NAME]",
              "tableId":"[BIGQUERY-TABLE-NAME]"
            }
          }
        }
      }
    ]
  }
}

Java

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.core.SettableApiFuture;
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.privacy.dlp.v2.Action;
import com.google.privacy.dlp.v2.CreateDlpJobRequest;
import com.google.privacy.dlp.v2.DatastoreOptions;
import com.google.privacy.dlp.v2.DlpJob;
import com.google.privacy.dlp.v2.GetDlpJobRequest;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeStats;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectDataSourceDetails;
import com.google.privacy.dlp.v2.InspectJobConfig;
import com.google.privacy.dlp.v2.KindExpression;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PartitionId;
import com.google.privacy.dlp.v2.StorageConfig;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InspectDatastoreEntity {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String datastoreNamespace = "your-datastore-namespace";
    String datastoreKind = "your-datastore-kind";
    String topicId = "your-pubsub-topic-id";
    String subscriptionId = "your-pubsub-subscription-id";
    inspectDatastoreEntity(projectId, datastoreNamespace, datastoreKind, topicId, subscriptionId);
  }

  // Inspects a Datastore Entity.
  public static void inspectDatastoreEntity(
      String projectId,
      String datastoreNamespace,
      String datastoreKind,
      String topicId,
      String subscriptionId)
      throws ExecutionException, InterruptedException, IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the Datastore entity to be inspected.
      PartitionId partitionId =
          PartitionId.newBuilder()
              .setProjectId(projectId)
              .setNamespaceId(datastoreNamespace)
              .build();
      KindExpression kindExpression = KindExpression.newBuilder().setName(datastoreKind).build();

      DatastoreOptions datastoreOptions =
          DatastoreOptions.newBuilder().setKind(kindExpression).setPartitionId(partitionId).build();

      StorageConfig storageConfig =
          StorageConfig.newBuilder().setDatastoreOptions(datastoreOptions).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      List<InfoType> infoTypes =
          Stream.of("PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER")
              .map(it -> InfoType.newBuilder().setName(it).build())
              .collect(Collectors.toList());

      // Specify how the content should be inspected.
      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addAllInfoTypes(infoTypes).setIncludeQuote(true).build();

      // Specify the action that is triggered when the job completes.
      String pubSubTopic = String.format("projects/%s/topics/%s", projectId, topicId);
      Action.PublishToPubSub publishToPubSub =
          Action.PublishToPubSub.newBuilder().setTopic(pubSubTopic).build();
      Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

      // Configure the long running job we want the service to perform.
      InspectJobConfig inspectJobConfig =
          InspectJobConfig.newBuilder()
              .setStorageConfig(storageConfig)
              .setInspectConfig(inspectConfig)
              .addActions(action)
              .build();

      // Create the request for the job configured above.
      CreateDlpJobRequest createDlpJobRequest =
          CreateDlpJobRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setInspectJob(inspectJobConfig)
              .build();

      // Use the client to send the request.
      final DlpJob dlpJob = dlp.createDlpJob(createDlpJobRequest);
      System.out.println("Job created: " + dlpJob.getName());

      // Set up a Pub/Sub subscriber to listen on the job completion status
      final SettableApiFuture<Boolean> done = SettableApiFuture.create();

      ProjectSubscriptionName subscriptionName =
          ProjectSubscriptionName.of(projectId, subscriptionId);

      MessageReceiver messageHandler =
          (PubsubMessage pubsubMessage, AckReplyConsumer ackReplyConsumer) -> {
            handleMessage(dlpJob, done, pubsubMessage, ackReplyConsumer);
          };
      Subscriber subscriber = Subscriber.newBuilder(subscriptionName, messageHandler).build();
      subscriber.startAsync();

      // Wait for job completion semi-synchronously
      // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
      try {
        done.get(15, TimeUnit.MINUTES);
      } catch (TimeoutException e) {
        System.out.println("Job was not completed after 15 minutes.");
        return;
      } finally {
        subscriber.stopAsync();
        subscriber.awaitTerminated();
      }

      // Get the latest state of the job from the service
      GetDlpJobRequest request = GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build();
      DlpJob completedJob = dlp.getDlpJob(request);

      // Parse the response and process results.
      System.out.println("Job status: " + completedJob.getState());
      System.out.println("Job name: " + dlpJob.getName());
      InspectDataSourceDetails.Result result = completedJob.getInspectDetails().getResult();
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    }
  }

  // handleMessage injects the job and settableFuture into the message receiver interface
  private static void handleMessage(
      DlpJob job,
      SettableApiFuture<Boolean> done,
      PubsubMessage pubsubMessage,
      AckReplyConsumer ackReplyConsumer) {
    String messageAttribute = pubsubMessage.getAttributesMap().get("DlpJobName");
    if (job.getName().equals(messageAttribute)) {
      done.set(true);
      ackReplyConsumer.ack();
    } else {
      ackReplyConsumer.nack();
    }
  }
}

Node.js

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const projectId = 'my-project';

// The project ID the target Datastore is stored under
// This may or may not equal the calling project ID
// const dataProjectId = 'my-project';

// (Optional) The ID namespace of the Datastore document to inspect.
// To ignore Datastore namespaces, set this to an empty string ('')
// const namespaceId = '';

// The kind of the Datastore entity to inspect.
// const kind = 'Person';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { infoType: { name: 'REGEX_TYPE' }, regex: {pattern: '\\(\\d{3}\\) \\d{3}-\\d{4}'}}];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

async function inspectDatastore() {
  // Construct items to be inspected
  const storageItems = {
    datastoreOptions: {
      partitionId: {
        projectId: dataProjectId,
        namespaceId: namespaceId,
      },
      kind: {
        name: kind,
      },
    },
  };

  // Construct request for creating an inspect job
  const request = {
    parent: `projects/${projectId}/locations/global`,
    inspectJob: {
      inspectConfig: {
        infoTypes: infoTypes,
        customInfoTypes: customInfoTypes,
        minLikelihood: minLikelihood,
        limits: {
          maxFindingsPerRequest: maxFindings,
        },
      },
      storageConfig: storageItems,
      actions: [
        {
          pubSub: {
            topic: `projects/${projectId}/topics/${topicId}`,
          },
        },
      ],
    },
  };

  // Run inspect-job creation request
  const [topicResponse] = await pubsub.topic(topicId).get();
  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const subscription = await topicResponse.subscription(subscriptionId);
  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });
  // Wait for DLP job to fully complete
  setTimeout(() => {
    console.log('Waiting for DLP job to fully complete');
  }, 500);
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${infoTypeStat.infoType.name}.`
      );
    });
  } else {
    console.log('No findings.');
  }
}

await inspectDatastore();

Python

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import threading
from typing import List, Optional

import google.cloud.dlp
import google.cloud.pubsub


def inspect_datastore(
    project: str,
    datastore_project: str,
    kind: str,
    topic_id: str,
    subscription_id: str,
    info_types: List[str],
    custom_dictionaries: List[str] = None,
    custom_regexes: List[str] = None,
    namespace_id: str = None,
    min_likelihood: Optional[int] = None,
    max_findings: Optional[int] = None,
    timeout: int = 300,
) -> None:
    """Uses the Data Loss Prevention API to analyze Datastore data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        datastore_project: The Google Cloud project id of the target Datastore.
        kind: The kind of the Datastore entity to inspect, e.g. 'Person'.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        namespace_id: The namespace of the Datastore document, if applicable.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ["FIRST_NAME", "LAST_NAME", "EMAIL_ADDRESS"]
    info_types = [{"name": info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [
        {
            "info_type": {"name": f"CUSTOM_DICTIONARY_{i}"},
            "dictionary": {"word_list": {"words": custom_dict.split(",")}},
        }
        for i, custom_dict in enumerate(custom_dictionaries)
    ]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [
        {
            "info_type": {"name": f"CUSTOM_REGEX_{i}"},
            "regex": {"pattern": custom_regex},
        }
        for i, custom_regex in enumerate(custom_regexes)
    ]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        "info_types": info_types,
        "custom_info_types": custom_info_types,
        "min_likelihood": min_likelihood,
        "limits": {"max_findings_per_request": max_findings},
    }

    # Construct a storage_config containing the target Datastore info.
    storage_config = {
        "datastore_options": {
            "partition_id": {
                "project_id": datastore_project,
                "namespace_id": namespace_id,
            },
            "kind": {"name": kind},
        }
    }

    # Convert the project id into full resource ids.
    topic = google.cloud.pubsub.PublisherClient.topic_path(project, topic_id)
    parent = f"projects/{project}/locations/global"

    # Tell the API where to send a notification when the job is complete.
    actions = [{"pub_sub": {"topic": topic}}]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        "inspect_config": inspect_config,
        "storage_config": storage_config,
        "actions": actions,
    }

    operation = dlp.create_dlp_job(
        request={"parent": parent, "inspect_job": inspect_job}
    )
    print(f"Inspection operation started: {operation.name}")

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(project, subscription_id)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message: google.cloud.pubsub_v1.subscriber.message.Message) -> None:
        try:
            if message.attributes["DlpJobName"] == operation.name:
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(request={"name": operation.name})
                print(f"Job name: {job.name}")
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print(
                            f"Info type: {finding.info_type.name}; Count: {finding.count}"
                        )
                else:
                    print("No findings.")

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    # Register the callback and wait on the event.
    subscriber.subscribe(subscription_path, callback=callback)

    finished = job_done.wait(timeout=timeout)
    if not finished:
        print(
            "No event received before the timeout. Please verify that the "
            "subscription provided is subscribed to the topic provided."
        )

Go

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"
	"strings"
	"time"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
	"cloud.google.com/go/pubsub"
)

// inspectDatastore searches for the given info types in the given dataset kind.
func inspectDatastore(w io.Writer, projectID string, infoTypeNames []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, dataProject, namespaceID, kind string) error {
	// projectID := "my-project-id"
	// infoTypeNames := []string{"US_SOCIAL_SECURITY_NUMBER"}
	// customDictionaries := []string{...}
	// customRegexes := []string{...}
	// pubSubTopic := "dlp-risk-sample-topic"
	// pubSubSub := "dlp-risk-sample-sub"
	// namespaceID := "namespace-id"
	// kind := "MyKind"

	ctx := context.Background()
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %w", err)
	}

	// Convert the info type strings to a list of InfoTypes.
	var infoTypes []*dlppb.InfoType
	for _, it := range infoTypeNames {
		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it})
	}
	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes.
	var customInfoTypes []*dlppb.CustomInfoType
	for idx, it := range customDictionaries {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Dictionary_{
				Dictionary: &dlppb.CustomInfoType_Dictionary{
					Source: &dlppb.CustomInfoType_Dictionary_WordList_{
						WordList: &dlppb.CustomInfoType_Dictionary_WordList{
							Words: strings.Split(it, ","),
						},
					},
				},
			},
		})
	}
	for idx, it := range customRegexes {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Regex_{
				Regex: &dlppb.CustomInfoType_Regex{
					Pattern: it,
				},
			},
		})
	}

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pubsubClient, err := pubsub.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("pubsub.NewClient: %w", err)
	}
	defer pubsubClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	// Create the Topic if it doesn't exist.
	t := pubsubClient.Topic(pubSubTopic)
	if exists, err := t.Exists(ctx); err != nil {
		return fmt.Errorf("t.Exists: %w", err)
	} else if !exists {
		if t, err = pubsubClient.CreateTopic(ctx, pubSubTopic); err != nil {
			return fmt.Errorf("CreateTopic: %w", err)
		}
	}

	// Create the Subscription if it doesn't exist.
	s := pubsubClient.Subscription(pubSubSub)
	if exists, err := s.Exists(ctx); err != nil {
		return fmt.Errorf("s.Exists: %w", err)
	} else if !exists {
		if s, err = pubsubClient.CreateSubscription(ctx, pubSubSub, pubsub.SubscriptionConfig{Topic: t}); err != nil {
			return fmt.Errorf("CreateSubscription: %w", err)
		}
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + projectID + "/topics/" + pubSubTopic

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: &dlppb.InspectJobConfig{
				// StorageConfig describes where to find the data.
				StorageConfig: &dlppb.StorageConfig{
					Type: &dlppb.StorageConfig_DatastoreOptions{
						DatastoreOptions: &dlppb.DatastoreOptions{
							PartitionId: &dlppb.PartitionId{
								ProjectId:   dataProject,
								NamespaceId: namespaceID,
							},
							Kind: &dlppb.KindExpression{
								Name: kind,
							},
						},
					},
				},
				// InspectConfig describes what fields to look for.
				InspectConfig: &dlppb.InspectConfig{
					InfoTypes:       infoTypes,
					CustomInfoTypes: customInfoTypes,
					MinLikelihood:   dlppb.Likelihood_POSSIBLE,
					Limits: &dlppb.InspectConfig_FindingLimits{
						MaxFindingsPerRequest: 10,
					},
					IncludeQuote: true,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the inspect job.
	j, err := client.CreateDlpJob(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDlpJob: %w", err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	// This only waits for 10 minutes. For long jobs, consider using a truly
	// asynchronous execution model such as Cloud Functions.
	ctx, cancel := context.WithTimeout(ctx, 10*time.Minute)
	defer cancel()
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()

		// Stop listening for more messages.
		defer cancel()

		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			fmt.Fprintf(w, "Error getting completed job: %v\n", err)
			return
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results")
			return
		}
		for _, s := range r {
			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
	})
	if err != nil {
		return fmt.Errorf("Receive: %w", err)
	}
	return nil
}

PHP

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CreateDlpJobRequest;
use Google\Cloud\Dlp\V2\DatastoreOptions;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\GetDlpJobRequest;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\Dlp\V2\KindExpression;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\PartitionId;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect Datastore, using Pub/Sub for job status notifications.
 *
 * @param string $callingProjectId  The project ID to run the API call under
 * @param string $dataProjectId     The project ID containing the target Datastore
 * @param string $topicId           The name of the Pub/Sub topic to notify once the job completes
 * @param string $subscriptionId    The name of the Pub/Sub subscription to use when listening for job completion notifications
 * @param string $kind              The datastore kind to inspect
 * @param string $namespaceId       The ID namespace of the Datastore document to inspect
 * @param int    $maxFindings       (Optional) The maximum number of findings to report per request (0 = server maximum)
 */
function inspect_datastore(
    string $callingProjectId,
    string $dataProjectId,
    string $topicId,
    string $subscriptionId,
    string $kind,
    string $namespaceId,
    int $maxFindings = 0
): void {
    // Instantiate clients
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // The infoTypes of information to match
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $phoneNumberInfoType = (new InfoType())
        ->setName('PHONE_NUMBER');
    $infoTypes = [$personNameInfoType, $phoneNumberInfoType];

    // The minimum likelihood required before returning a match
    $minLikelihood = Likelihood::LIKELIHOOD_UNSPECIFIED;

    // Specify finding limits
    $limits = (new FindingLimits())
        ->setMaxFindingsPerRequest($maxFindings);

    // Construct items to be inspected
    $partitionId = (new PartitionId())
        ->setProjectId($dataProjectId)
        ->setNamespaceId($namespaceId);

    $kindExpression = (new KindExpression())
        ->setName($kind);

    $datastoreOptions = (new DatastoreOptions())
        ->setPartitionId($partitionId)
        ->setKind($kindExpression);

    // Construct the inspect config object
    $inspectConfig = (new InspectConfig())
        ->setInfoTypes($infoTypes)
        ->setMinLikelihood($minLikelihood)
        ->setLimits($limits);

    // Construct the storage config object
    $storageConfig = (new StorageConfig())
        ->setDatastoreOptions($datastoreOptions);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct inspect job config to run
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = "projects/$callingProjectId/locations/global";
    $createDlpJobRequest = (new CreateDlpJobRequest())
        ->setParent($parent)
        ->setInspectJob($inspectJob);
    $job = $dlp->createDlpJob($createDlpJobRequest);

    // Poll Pub/Sub using exponential backoff until job finishes
    // Consider using an asynchronous execution model such as Cloud Functions
    $attempt = 1;
    $startTime = time();
    do {
        foreach ($subscription->pull() as $message) {
            if (
                isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()
            ) {
                $subscription->acknowledge($message);
                // Get the updated job. Loop to avoid race condition with DLP API.
                do {
                    $getDlpJobRequest = (new GetDlpJobRequest())
                        ->setName($job->getName());
                    $job = $dlp->getDlpJob($getDlpJobRequest);
                } while ($job->getState() == JobState::RUNNING);
                break 2; // break from parent do while
            }
        }
        print('Waiting for job to complete' . PHP_EOL);
        // Exponential backoff with max delay of 60 seconds
        sleep(min(60, pow(2, ++$attempt)));
    } while (time() - $startTime < 600); // 10 minute timeout

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), JobState::name($job->getState()));
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                print('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf('  Found %s instance(s) of infoType %s' . PHP_EOL, $infoTypeStat->getCount(), $infoTypeStat->getInfoType()->getName());
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        case JobState::PENDING:
            print('Job has not completed. Consider a longer timeout or an asynchronous execution model' . PHP_EOL);
            break;
        default:
            print('Unexpected job state.');
    }
}

C#

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Api.Gax.ResourceNames;
using Google.Cloud.BigQuery.V2;
using Google.Cloud.Dlp.V2;
using Google.Protobuf.WellKnownTypes;
using System;
using System.Collections.Generic;
using System.Threading;
using static Google.Cloud.Dlp.V2.InspectConfig.Types;

public class InspectCloudDataStore
{
    public static object Inspect(
        string projectId,
        Likelihood minLikelihood,
        int maxFindings,
        bool includeQuote,
        string kindName,
        string namespaceId,
        IEnumerable<InfoType> infoTypes,
        IEnumerable<CustomInfoType> customInfoTypes,
        string datasetId,
        string tableId)
    {
        var inspectJob = new InspectJobConfig
        {
            StorageConfig = new StorageConfig
            {
                DatastoreOptions = new DatastoreOptions
                {
                    Kind = new KindExpression { Name = kindName },
                    PartitionId = new PartitionId
                    {
                        NamespaceId = namespaceId,
                        ProjectId = projectId,
                    }
                },
                TimespanConfig = new StorageConfig.Types.TimespanConfig
                {
                    StartTime = Timestamp.FromDateTime(System.DateTime.UtcNow.AddYears(-1)),
                    EndTime = Timestamp.FromDateTime(System.DateTime.UtcNow)
                }
            },

            InspectConfig = new InspectConfig
            {
                InfoTypes = { infoTypes },
                CustomInfoTypes = { customInfoTypes },
                Limits = new FindingLimits
                {
                    MaxFindingsPerRequest = maxFindings
                },
                ExcludeInfoTypes = false,
                IncludeQuote = includeQuote,
                MinLikelihood = minLikelihood
            },
            Actions =
            {
                new Google.Cloud.Dlp.V2.Action
                {
                    // Save results in BigQuery Table
                    SaveFindings = new Google.Cloud.Dlp.V2.Action.Types.SaveFindings
                    {
                        OutputConfig = new OutputStorageConfig
                        {
                            Table = new Google.Cloud.Dlp.V2.BigQueryTable
                            {
                                ProjectId = projectId,
                                DatasetId = datasetId,
                                TableId = tableId
                            }
                        }
                    },
                }
            }
        };

        // Issue Create Dlp Job Request
        var client = DlpServiceClient.Create();
        var request = new CreateDlpJobRequest
        {
            InspectJob = inspectJob,
            Parent = new LocationName(projectId, "global").ToString(),
        };

        // We need created job name
        var dlpJob = client.CreateDlpJob(request);
        var jobName = dlpJob.Name;

        // Make sure the job finishes before inspecting the results.
        // Alternatively, we can inspect results opportunistically, but
        // for testing purposes, we want consistent outcome
        var finishedJob = EnsureJobFinishes(jobName);
        var bigQueryClient = BigQueryClient.Create(projectId);
        var table = bigQueryClient.GetTable(datasetId, tableId);

        // Return only first page of 10 rows
        Console.WriteLine("DLP v2 Results:");
        var firstPage = table.ListRows(new ListRowsOptions { StartIndex = 0, PageSize = 10 });
        foreach (var item in firstPage)
        {
            Console.WriteLine($"\t {item[0]}");
        }

        return finishedJob;
    }

    private static DlpJob EnsureJobFinishes(string jobName)
    {
        var client = DlpServiceClient.Create();
        // jobName is the full resource name returned by CreateDlpJob.
        var request = new GetDlpJobRequest
        {
            DlpJobName = DlpJobName.Parse(jobName),
        };

        // Simple logic that gives the job 5*30 sec at most to complete - for testing purposes only
        var numOfAttempts = 5;
        do
        {
            var dlpJob = client.GetDlpJob(request);
            numOfAttempts--;
            if (dlpJob.State != DlpJob.Types.JobState.Running)
            {
                return dlpJob;
            }

            Thread.Sleep(TimeSpan.FromSeconds(30));
        } while (numOfAttempts > 0);

        throw new InvalidOperationException("Job did not complete in time");
    }
}

Inspect a BigQuery table

You can set up an inspection of a BigQuery table using Sensitive Data Protection through REST requests, or programmatically in several languages using a client library.

To set up a scan job of a BigQuery table using Sensitive Data Protection:

Console

  1. In the Sensitive Data Protection section of the Google Cloud console, go to the Create job or job trigger page.

    Go to Create job or job trigger

  2. Enter the Sensitive Data Protection job information and click Continue to complete each step:

    • For Step 1: Choose input data, name the job by entering a value in the Name field. In Location, choose BigQuery from the Storage type menu, and then enter the information for the table that you want to scan.

      The Sampling section is preconfigured to run a sample scan against your data. If you have a large amount of data, you can adjust the Limit rows by and Maximum number of rows fields to save resources. For more details, see Choose input data. (A hedged JSON sketch of these sampling options appears after these steps.)

    • (Optional) If you want to be able to link each finding to the row that contains it, set the Identifying fields field.

      Enter the names of the columns that uniquely identify each row within the table. If necessary, use dot notation to specify nested fields. You can add as many fields as you want.

      You must also turn on the Save to BigQuery action to export the findings to BigQuery. When the findings are exported to BigQuery, each finding contains the respective values of the identifying fields. For more information, see identifyingFields.

    • (Optional) For Step 2: Configure detection, you can configure what types of data to look for, called infoTypes. You can select from the list of predefined infoTypes, or you can select a template if one exists. For more details, see Configure detection.

    • (Optional) For Step 3: Add actions, make sure Notify by email is turned on.

      Turn on Save to BigQuery to publish your Sensitive Data Protection findings to a BigQuery table. Provide the following:

      • For Project ID, enter the project ID where your results will be stored.
      • For Dataset ID, enter the name of the dataset that will store your results.
      • (Optional) For Table ID, enter the name of the table that will store your results. If no table ID is specified, a default name is assigned to a new table, similar to the following: dlp_googleapis_[DATE]_1234567890. If you specify an existing table, findings are appended to it.

      When data is written to a BigQuery table, the billing and quota usage are applied to the project that contains the destination table.

      You can also save results to Pub/Sub, Security Command Center, and Data Catalog. For more details, see Add actions.

    • (Optional) For Step 4: Schedule, to run the scan only one time, leave the menu set to None. To schedule scans to run periodically, click Create a trigger to run the job on a periodic schedule. For more details, see Schedule.

  3. Click Create.

  4. When the Sensitive Data Protection job completes, you are taken to the job details page, and you're notified by email. You can view the results of the inspection on the job details page.

  5. (Optional) If you chose to publish Sensitive Data Protection findings to BigQuery, on the Job details page, click View Findings in BigQuery to open the table in the BigQuery web UI. You can then query the table and analyze your findings. For more information on querying your results in BigQuery, see Querying Sensitive Data Protection findings in BigQuery.
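To see how the console's Sampling and Identifying fields settings map onto the API, the following is a minimal JSON sketch of the corresponding bigQueryOptions fields. The rowsLimit, sampleMethod, and identifyingFields fields are part of the DLP API's BigQueryOptions message; the values shown here, including the dotted person.contact.email path, are hypothetical placeholders:

"bigQueryOptions":{
  "tableReference":{
    "projectId":"[PROJECT-ID]",
    "datasetId":"[BIGQUERY-DATASET-NAME]",
    "tableId":"[BIGQUERY-TABLE-NAME]"
  },
  "rowsLimit":"1000",
  "sampleMethod":"RANDOM_START",
  "identifyingFields":[
    {
      "name":"id"
    },
    {
      "name":"person.contact.email"
    }
  ]
}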

Protocol

Following is sample JSON that can be sent in a POST request to the specified DLP API REST endpoint. This example JSON demonstrates how to use the DLP API to inspect BigQuery tables. For information about the parameters included with the request, see "Configure storage inspection," later in this topic.

You can quickly try this out in the API Explorer on the reference page for dlpJobs.create:

Go to API Explorer

Keep in mind that a successful request, even in the API Explorer, creates a new scan job. For information about how to control scan jobs, see "Retrieve inspection results," later in this topic. For general information about using JSON to send requests to the DLP API, see the JSON quickstart.

JSON input:

POST https://dlp.googleapis.com/v2/projects/[PROJECT-ID]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"[PROJECT-ID]",
          "datasetId":"[BIGQUERY-DATASET-NAME]",
          "tableId":"[BIGQUERY-TABLE-NAME]"
        },
        "identifyingFields":[
          {
            "name":"id"
          }
        ]
      },
      "timespanConfig":{
        "startTime":"2017-11-13T12:34:29.965633345Z",
        "endTime":"2018-01-05T04:45:04.240912125Z"
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PHONE_NUMBER"
        }
      ],
      "excludeInfoTypes":false,
      "includeQuote":true,
      "minLikelihood":"LIKELY"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT-ID]",
              "datasetId":"[BIGQUERY-DATASET-NAME]",
              "tableId":"[BIGQUERY-TABLE-NAME]"
            },
            "outputSchema": "BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}
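
A successful dlpJobs.create request returns the new job's resource name in the response's name field. As a minimal sketch, assuming the service returned a job ID of [JOB-ID] (the projects.dlpJobs.get method is part of the DLP API; the bracketed values are placeholders), you can poll the job to check its state:

GET https://dlp.googleapis.com/v2/projects/[PROJECT-ID]/dlpJobs/[JOB-ID]?key={YOUR_API_KEY}

When the returned state is DONE, the response's inspectDetails.result field carries the per-infoType finding counts, the same information that the client-library samples below print.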

Java

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.core.SettableApiFuture;
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.privacy.dlp.v2.Action;
import com.google.privacy.dlp.v2.BigQueryOptions;
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CreateDlpJobRequest;
import com.google.privacy.dlp.v2.DlpJob;
import com.google.privacy.dlp.v2.GetDlpJobRequest;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeStats;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectDataSourceDetails;
import com.google.privacy.dlp.v2.InspectJobConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.StorageConfig;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InspectBigQueryTable {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String bigQueryDatasetId = "your-bigquery-dataset-id";
    String bigQueryTableId = "your-bigquery-table-id";
    String topicId = "your-pubsub-topic-id";
    String subscriptionId = "your-pubsub-subscription-id";
    inspectBigQueryTable(projectId, bigQueryDatasetId, bigQueryTableId, topicId, subscriptionId);
  }

  // Inspects a BigQuery Table
  public static void inspectBigQueryTable(
      String projectId,
      String bigQueryDatasetId,
      String bigQueryTableId,
      String topicId,
      String subscriptionId)
      throws ExecutionException, InterruptedException, IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the BigQuery table to be inspected.
      BigQueryTable tableReference =
          BigQueryTable.newBuilder()
              .setProjectId(projectId)
              .setDatasetId(bigQueryDatasetId)
              .setTableId(bigQueryTableId)
              .build();

      BigQueryOptions bigQueryOptions =
          BigQueryOptions.newBuilder().setTableReference(tableReference).build();

      StorageConfig storageConfig =
          StorageConfig.newBuilder().setBigQueryOptions(bigQueryOptions).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      List<InfoType> infoTypes =
          Stream.of("PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER")
              .map(it -> InfoType.newBuilder().setName(it).build())
              .collect(Collectors.toList());

      // Specify how the content should be inspected.
      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addAllInfoTypes(infoTypes).setIncludeQuote(true).build();

      // Specify the action that is triggered when the job completes.
      String pubSubTopic = String.format("projects/%s/topics/%s", projectId, topicId);
      Action.PublishToPubSub publishToPubSub =
          Action.PublishToPubSub.newBuilder().setTopic(pubSubTopic).build();
      Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

      // Configure the long running job we want the service to perform.
      InspectJobConfig inspectJobConfig =
          InspectJobConfig.newBuilder()
              .setStorageConfig(storageConfig)
              .setInspectConfig(inspectConfig)
              .addActions(action)
              .build();

      // Create the request for the job configured above.
      CreateDlpJobRequest createDlpJobRequest =
          CreateDlpJobRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setInspectJob(inspectJobConfig)
              .build();

      // Use the client to send the request.
      final DlpJob dlpJob = dlp.createDlpJob(createDlpJobRequest);
      System.out.println("Job created: " + dlpJob.getName());

      // Set up a Pub/Sub subscriber to listen on the job completion status
      final SettableApiFuture<Boolean> done = SettableApiFuture.create();

      ProjectSubscriptionName subscriptionName =
          ProjectSubscriptionName.of(projectId, subscriptionId);

      MessageReceiver messageHandler =
          (PubsubMessage pubsubMessage, AckReplyConsumer ackReplyConsumer) -> {
            handleMessage(dlpJob, done, pubsubMessage, ackReplyConsumer);
          };
      Subscriber subscriber = Subscriber.newBuilder(subscriptionName, messageHandler).build();
      subscriber.startAsync();

      // Wait for job completion semi-synchronously
      // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
      try {
        done.get(15, TimeUnit.MINUTES);
      } catch (TimeoutException e) {
        System.out.println("Job was not completed after 15 minutes.");
        return;
      } finally {
        subscriber.stopAsync();
        subscriber.awaitTerminated();
      }

      // Get the latest state of the job from the service
      GetDlpJobRequest request = GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build();
      DlpJob completedJob = dlp.getDlpJob(request);

      // Parse the response and process results.
      System.out.println("Job status: " + completedJob.getState());
      System.out.println("Job name: " + dlpJob.getName());
      InspectDataSourceDetails.Result result = completedJob.getInspectDetails().getResult();
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    }
  }

  // handleMessage injects the job and settableFuture into the message receiver interface
  private static void handleMessage(
      DlpJob job,
      SettableApiFuture<Boolean> done,
      PubsubMessage pubsubMessage,
      AckReplyConsumer ackReplyConsumer) {
    String messageAttribute = pubsubMessage.getAttributesMap().get("DlpJobName");
    if (job.getName().equals(messageAttribute)) {
      done.set(true);
      ackReplyConsumer.ack();
    } else {
      ackReplyConsumer.nack();
    }
  }
}

Node.js

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const projectId = 'my-project';

// The project ID the table is stored under
// This may or (for public datasets) may not equal the calling project ID
// const dataProjectId = 'my-project';

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { infoType: { name: 'REGEX_TYPE' }, regex: {pattern: '\\(\\d{3}\\) \\d{3}-\\d{4}'}}];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

async function inspectBigquery() {
  // Construct item to be inspected
  const storageItem = {
    bigQueryOptions: {
      tableReference: {
        projectId: dataProjectId,
        datasetId: datasetId,
        tableId: tableId,
      },
    },
  };

  // Construct request for creating an inspect job
  const request = {
    parent: `projects/${projectId}/locations/global`,
    inspectJob: {
      inspectConfig: {
        infoTypes: infoTypes,
        customInfoTypes: customInfoTypes,
        minLikelihood: minLikelihood,
        limits: {
          maxFindingsPerRequest: maxFindings,
        },
      },
      storageConfig: storageItem,
      actions: [
        {
          pubSub: {
            topic: `projects/${projectId}/topics/${topicId}`,
          },
        },
      ],
    },
  };

  // Run inspect-job creation request
  const [topicResponse] = await pubsub.topic(topicId).get();
  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const subscription = await topicResponse.subscription(subscriptionId);
  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });
  // Wait for DLP job to fully complete
  setTimeout(() => {
    console.log('Waiting for DLP job to fully complete');
  }, 500);
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${infoTypeStat.infoType.name}.`
      );
    });
  } else {
    console.log('No findings.');
  }
}

await inspectBigquery();

Python

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import threading
from typing import List, Optional

import google.cloud.dlp
import google.cloud.pubsub


def inspect_bigquery(
    project: str,
    bigquery_project: str,
    dataset_id: str,
    table_id: str,
    topic_id: str,
    subscription_id: str,
    info_types: List[str],
    custom_dictionaries: List[str] = None,
    custom_regexes: List[str] = None,
    min_likelihood: Optional[int] = None,
    max_findings: Optional[int] = None,
    timeout: int = 500,
) -> None:
    """Uses the Data Loss Prevention API to analyze BigQuery data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        bigquery_project: The Google Cloud project id of the target table.
        dataset_id: The id of the target BigQuery dataset.
        table_id: The id of the target BigQuery table.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ["FIRST_NAME", "LAST_NAME", "EMAIL_ADDRESS"]
    info_types = [{"name": info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [
        {
            "info_type": {"name": f"CUSTOM_DICTIONARY_{i}"},
            "dictionary": {"word_list": {"words": custom_dict.split(",")}},
        }
        for i, custom_dict in enumerate(custom_dictionaries)
    ]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [
        {
            "info_type": {"name": f"CUSTOM_REGEX_{i}"},
            "regex": {"pattern": custom_regex},
        }
        for i, custom_regex in enumerate(custom_regexes)
    ]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        "info_types": info_types,
        "custom_info_types": custom_info_types,
        "min_likelihood": min_likelihood,
        "limits": {"max_findings_per_request": max_findings},
    }

    # Construct a storage_config containing the target BigQuery info.
    storage_config = {
        "big_query_options": {
            "table_reference": {
                "project_id": bigquery_project,
                "dataset_id": dataset_id,
                "table_id": table_id,
            }
        }
    }

    # Convert the project id into full resource ids.
    topic = google.cloud.pubsub.PublisherClient.topic_path(project, topic_id)
    parent = f"projects/{project}/locations/global"

    # Tell the API where to send a notification when the job is complete.
    actions = [{"pub_sub": {"topic": topic}}]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        "inspect_config": inspect_config,
        "storage_config": storage_config,
        "actions": actions,
    }

    operation = dlp.create_dlp_job(
        request={"parent": parent, "inspect_job": inspect_job}
    )
    print(f"Inspection operation started: {operation.name}")

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(project, subscription_id)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message: google.cloud.pubsub_v1.subscriber.message.Message) -> None:
        try:
            if message.attributes["DlpJobName"] == operation.name:
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(request={"name": operation.name})
                print(f"Job name: {job.name}")
                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print(
                            "Info type: {}; Count: {}".format(
                                finding.info_type.name, finding.count
                            )
                        )
                else:
                    print("No findings.")

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    # Register the callback and wait on the event.
    subscriber.subscribe(subscription_path, callback=callback)
    finished = job_done.wait(timeout=timeout)
    if not finished:
        print(
            "No event received before the timeout. Please verify that the "
            "subscription provided is subscribed to the topic provided."
        )

Go

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"
	"strings"
	"time"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
	"cloud.google.com/go/pubsub"
)

// inspectBigquery searches for the given info types in the given BigQuery dataset table.
func inspectBigquery(w io.Writer, projectID string, infoTypeNames []string, customDictionaries []string, customRegexes []string, pubSubTopic, pubSubSub, dataProject, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// infoTypeNames := []string{"US_SOCIAL_SECURITY_NUMBER"}
	// customDictionaries := []string{...}
	// customRegexes := []string{...}
	// pubSubTopic := "dlp-risk-sample-topic"
	// pubSubSub := "dlp-risk-sample-sub"
	// dataProject := "my-data-project-ID"
	// datasetID := "my_dataset"
	// tableID := "mytable"

	ctx := context.Background()

	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %w", err)
	}
	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Convert the info type strings to a list of InfoTypes.
	var infoTypes []*dlppb.InfoType
	for _, it := range infoTypeNames {
		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it})
	}
	// Convert the custom dictionary word lists and custom regexes to a list of CustomInfoTypes.
	var customInfoTypes []*dlppb.CustomInfoType
	for idx, it := range customDictionaries {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_DICTIONARY_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Dictionary_{
				Dictionary: &dlppb.CustomInfoType_Dictionary{
					Source: &dlppb.CustomInfoType_Dictionary_WordList_{
						WordList: &dlppb.CustomInfoType_Dictionary_WordList{
							Words: strings.Split(it, ","),
						},
					},
				},
			},
		})
	}
	for idx, it := range customRegexes {
		customInfoTypes = append(customInfoTypes, &dlppb.CustomInfoType{
			InfoType: &dlppb.InfoType{
				Name: fmt.Sprintf("CUSTOM_REGEX_%d", idx),
			},
			Type: &dlppb.CustomInfoType_Regex_{
				Regex: &dlppb.CustomInfoType_Regex{
					Pattern: it,
				},
			},
		})
	}

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pubsubClient, err := pubsub.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("pubsub.NewClient: %w", err)
	}
	defer pubsubClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	// Create the Topic if it doesn't exist.
	t := pubsubClient.Topic(pubSubTopic)
	if exists, err := t.Exists(ctx); err != nil {
		return fmt.Errorf("t.Exists: %w", err)
	} else if !exists {
		if t, err = pubsubClient.CreateTopic(ctx, pubSubTopic); err != nil {
			return fmt.Errorf("CreateTopic: %w", err)
		}
	}

	// Create the Subscription if it doesn't exist.
	s := pubsubClient.Subscription(pubSubSub)
	if exists, err := s.Exists(ctx); err != nil {
		return fmt.Errorf("s.Exists: %w", err)
	} else if !exists {
		if s, err = pubsubClient.CreateSubscription(ctx, pubSubSub, pubsub.SubscriptionConfig{Topic: t}); err != nil {
			return fmt.Errorf("CreateSubscription: %w", err)
		}
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + projectID + "/topics/" + pubSubTopic

	// Create a configured request.
	req := &dlppb.CreateDlpJobRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: &dlppb.InspectJobConfig{
				// StorageConfig describes where to find the data.
				StorageConfig: &dlppb.StorageConfig{
					Type: &dlppb.StorageConfig_BigQueryOptions{
						BigQueryOptions: &dlppb.BigQueryOptions{
							TableReference: &dlppb.BigQueryTable{
								ProjectId: dataProject,
								DatasetId: datasetID,
								TableId:   tableID,
							},
						},
					},
				},
				// InspectConfig describes what fields to look for.
				InspectConfig: &dlppb.InspectConfig{
					InfoTypes:       infoTypes,
					CustomInfoTypes: customInfoTypes,
					MinLikelihood:   dlppb.Likelihood_POSSIBLE,
					Limits: &dlppb.InspectConfig_FindingLimits{
						MaxFindingsPerRequest: 10,
					},
					IncludeQuote: true,
				},
				// Send a message to PubSub using Actions.
				Actions: []*dlppb.Action{
					{
						Action: &dlppb.Action_PubSub{
							PubSub: &dlppb.Action_PublishToPubSub{
								Topic: topic,
							},
						},
					},
				},
			},
		},
	}
	// Create the inspect job.
	j, err := client.CreateDlpJob(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDlpJob: %w", err)
	}
	fmt.Fprintf(w, "Created job: %v\n", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	// This only waits for 10 minutes. For long jobs, consider using a truly
	// asynchronous execution model such as Cloud Functions.
	ctx, cancel := context.WithTimeout(ctx, 10*time.Minute)
	defer cancel()
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()

		// Stop listening for more messages.
		defer cancel()

		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			fmt.Fprintf(w, "Error getting completed job: %v\n", err)
			return
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results")
			return
		}
		for _, s := range r {
			fmt.Fprintf(w, "  Found %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
	})
	if err != nil {
		return fmt.Errorf("Receive: %w", err)
	}
	return nil
}

PHP

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\BigQueryOptions;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CreateDlpJobRequest;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\GetDlpJobRequest;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectConfig\FindingLimits;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect a BigQuery table, using Pub/Sub for job status notifications.
 *
 * @param string $callingProjectId  The project ID to run the API call under
 * @param string $dataProjectId     The project ID containing the target BigQuery table
 * @param string $topicId           The name of the Pub/Sub topic to notify once the job completes
 * @param string $subscriptionId    The name of the Pub/Sub subscription to use when listening for job completion notifications
 * @param string $datasetId         The ID of the dataset to inspect
 * @param string $tableId           The ID of the table to inspect
 * @param int    $maxFindings       (Optional) The maximum number of findings to report per request (0 = server maximum)
 */
function inspect_bigquery(
    string $callingProjectId,
    string $dataProjectId,
    string $topicId,
    string $subscriptionId,
    string $datasetId,
    string $tableId,
    int $maxFindings = 0
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // The infoTypes of information to match.
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $creditCardNumberInfoType = (new InfoType())
        ->setName('CREDIT_CARD_NUMBER');
    $infoTypes = [$personNameInfoType, $creditCardNumberInfoType];

    // The minimum likelihood required before returning a match.
    $minLikelihood = Likelihood::LIKELIHOOD_UNSPECIFIED;

    // Specify finding limits.
    $limits = (new FindingLimits())
        ->setMaxFindingsPerRequest($maxFindings);

    // Construct items to be inspected.
    $bigqueryTable = (new BigQueryTable())
        ->setProjectId($dataProjectId)
        ->setDatasetId($datasetId)
        ->setTableId($tableId);

    $bigQueryOptions = (new BigQueryOptions())
        ->setTableReference($bigqueryTable);

    $storageConfig = (new StorageConfig())
        ->setBigQueryOptions($bigQueryOptions);

    // Construct the inspect config object.
    $inspectConfig = (new InspectConfig())
        ->setMinLikelihood($minLikelihood)
        ->setLimits($limits)
        ->setInfoTypes($infoTypes);

    // Construct the action to run when job completes.
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct inspect job config to run.
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request.
    $parent = "projects/$callingProjectId/locations/global";
    $createDlpJobRequest = (new CreateDlpJobRequest())
        ->setParent($parent)
        ->setInspectJob($inspectJob);
    $job = $dlp->createDlpJob($createDlpJobRequest);

    // Poll Pub/Sub using exponential backoff until job finishes.
    // Consider using an asynchronous execution model such as Cloud Functions.
    $attempt = 1;
    $startTime = time();
    do {
        foreach ($subscription->pull() as $message) {
            if (isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()) {
                $subscription->acknowledge($message);
                // Get the updated job. Loop to avoid race condition with DLP API.
                do {
                    $getDlpJobRequest = (new GetDlpJobRequest())
                        ->setName($job->getName());
                    $job = $dlp->getDlpJob($getDlpJobRequest);
                } while ($job->getState() == JobState::RUNNING);
                break 2; // break from parent do while
            }
        }
        print('Waiting for job to complete' . PHP_EOL);
        // Exponential backoff with max delay of 60 seconds.
        sleep(min(60, pow(2, ++$attempt)));
    } while (time() - $startTime < 600); // 10 minute timeout

    // Print finding counts.
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), JobState::name($job->getState()));
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                print('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf(
                        '  Found %s instance(s) of infoType %s' . PHP_EOL,
                        $infoTypeStat->getCount(),
                        $infoTypeStat->getInfoType()->getName()
                    );
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        case JobState::PENDING:
            print('Job has not completed. Consider a longer timeout or an asynchronous execution model' . PHP_EOL);
            break;
        default:
            print('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

C#

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Api.Gax.ResourceNames;
using Google.Cloud.BigQuery.V2;
using Google.Cloud.Dlp.V2;
using Google.Protobuf.WellKnownTypes;
using System;
using System.Collections.Generic;
using System.Threading;
using static Google.Cloud.Dlp.V2.InspectConfig.Types;

public class InspectBigQuery
{
    public static object Inspect(
        string projectId,
        Likelihood minLikelihood,
        int maxFindings,
        bool includeQuote,
        IEnumerable<FieldId> identifyingFields,
        IEnumerable<InfoType> infoTypes,
        IEnumerable<CustomInfoType> customInfoTypes,
        string datasetId,
        string tableId)
    {
        var inspectJob = new InspectJobConfig
        {
            StorageConfig = new StorageConfig
            {
                BigQueryOptions = new BigQueryOptions
                {
                    TableReference = new Google.Cloud.Dlp.V2.BigQueryTable
                    {
                        ProjectId = projectId,
                        DatasetId = datasetId,
                        TableId = tableId,
                    },
                    IdentifyingFields =
                    {
                        identifyingFields
                    }
                },

                TimespanConfig = new StorageConfig.Types.TimespanConfig
                {
                    StartTime = Timestamp.FromDateTime(System.DateTime.UtcNow.AddYears(-1)),
                    EndTime = Timestamp.FromDateTime(System.DateTime.UtcNow)
                }
            },

            InspectConfig = new InspectConfig
            {
                InfoTypes = { infoTypes },
                CustomInfoTypes = { customInfoTypes },
                Limits = new FindingLimits
                {
                    MaxFindingsPerRequest = maxFindings
                },
                ExcludeInfoTypes = false,
                IncludeQuote = includeQuote,
                MinLikelihood = minLikelihood
            },
            Actions =
            {
                new Google.Cloud.Dlp.V2.Action
                {
                    // Save results in BigQuery Table
                    SaveFindings = new Google.Cloud.Dlp.V2.Action.Types.SaveFindings
                    {
                        OutputConfig = new OutputStorageConfig
                        {
                            Table = new Google.Cloud.Dlp.V2.BigQueryTable
                            {
                                ProjectId = projectId,
                                DatasetId = datasetId,
                                TableId = tableId
                            }
                        }
                    },
                }
            }
        };

        // Issue Create Dlp Job Request
        var client = DlpServiceClient.Create();
        var request = new CreateDlpJobRequest
        {
            InspectJob = inspectJob,
            Parent = new LocationName(projectId, "global").ToString(),
        };

        // We need the created job name.
        var dlpJob = client.CreateDlpJob(request);
        var jobName = dlpJob.Name;

        // Make sure the job finishes before inspecting the results.
        // Alternatively, we can inspect results opportunistically, but
        // for testing purposes, we want a consistent outcome.
        var finishedJob = EnsureJobFinishes(projectId, jobName);
        var bigQueryClient = BigQueryClient.Create(projectId);
        var table = bigQueryClient.GetTable(datasetId, tableId);

        // Return only the first page of 10 rows.
        Console.WriteLine("DLP v2 Results:");
        var firstPage = table.ListRows(new ListRowsOptions { StartIndex = 0, PageSize = 10 });
        foreach (var item in firstPage)
        {
            Console.WriteLine($"\t {item[""]}");
        }

        return finishedJob;
    }

    private static DlpJob EnsureJobFinishes(string projectId, string jobName)
    {
        var client = DlpServiceClient.Create();
        var request = new GetDlpJobRequest
        {
            DlpJobName = new DlpJobName(projectId, jobName),
        };

        // Simple logic that gives the job 5 * 30 sec at most to complete - for testing purposes only
        var numOfAttempts = 5;
        do
        {
            var dlpJob = client.GetDlpJob(request);
            numOfAttempts--;
            if (dlpJob.State != DlpJob.Types.JobState.Running)
            {
                return dlpJob;
            }

            Thread.Sleep(TimeSpan.FromSeconds(30));
        } while (numOfAttempts > 0);

        throw new InvalidOperationException("Job did not complete in time");
    }
}

Configure storage inspection

To inspect a Cloud Storage location, Datastore kind, or BigQuery table, you send a request to the projects.dlpJobs.create method of the DLP API that contains, at a minimum, the location of the data to scan and what to look for. Beyond these required parameters, you can also specify where to write the scan results, size and likelihood thresholds, and more. A successful request results in the creation of a DlpJob object instance, which is discussed in "Retrieve the inspection results."
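
Distilled to just those required parameters, the request can be as small as the following Python sketch; the project and bucket names here are placeholders, and the full per-language samples on this page build on the same structure.

import google.cloud.dlp

# Minimal inspection job: where the data lives, and what to look for.
# "example-project" and "example-bucket" are placeholder names.
dlp = google.cloud.dlp_v2.DlpServiceClient()

inspect_job = {
    "storage_config": {
        "cloud_storage_options": {"file_set": {"url": "gs://example-bucket/*"}}
    },
    "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
}

job = dlp.create_dlp_job(
    request={
        "parent": "projects/example-project/locations/global",
        "inspect_job": inspect_job,
    }
)
print(f"Started job: {job.name}")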

The available configuration options are summarized below; a minimal sketch of how they nest together follows the list.

  • InspectJobConfig object: Contains the configuration information for the inspection job. Note that the InspectJobConfig object is also used by the JobTriggers object for scheduling the creation of DlpJobs. This object includes:

    • StorageConfig object: Required. Contains details about the storage repository to scan:

      • Depending on the type of storage repository being scanned, one of the following must be included in the StorageConfig object:

      • CloudStorageOptions object: Contains information about the Cloud Storage bucket to scan.

      • DatastoreOptions object: Contains information about the Datastore data set to scan.

      • BigQueryOptions object: Contains information about the BigQuery table (and, optionally, identifying fields) to scan. This object also enables results sampling. For more information, see Enabling results sampling below.

      • TimespanConfig object: Optional. Specifies the timespan of the items to include in the scan.

    • InspectConfig object: Required. Specifies what to scan for, such as infoTypes and likelihood values.

      • InfoType objects: Required. One or more infoType values to scan for.
      • Likelihood enumeration: Optional. When set, Sensitive Data Protection only returns findings equal to or above this likelihood threshold. If this enum is omitted, the default value is POSSIBLE.
      • FindingLimits object: Optional. When set, this object enables you to specify a limit for the number of findings returned.
      • includeQuote parameter: Optional. Defaults to false. When set to true, each finding includes a contextual quote of the data that triggered it.
      • excludeInfoTypes parameter: Optional. Defaults to false. When set to true, scan results exclude type information for the findings.
      • CustomInfoType objects: One or more custom, user-created infoTypes. For more information about creating custom infoTypes, see Creating custom infoType detectors.
    • inspectTemplateName string: Optional. Specifies a template to use to populate default values in the InspectConfig object. If you have already specified InspectConfig, the template values are merged into it.

    • Action objects: Optional. One or more actions to execute at the completion of the job. Each action is executed in the order in which it is listed. This is where you specify where to write results, or whether to publish a notification to a Pub/Sub topic.

  • jobId: Optional. An identifier for the job returned by Sensitive Data Protection. If jobId is omitted or empty, the system creates an ID for the job. If specified, the job is assigned this ID value. The job ID must be unique, and may contain uppercase and lowercase letters, numbers, and hyphens; that is, it must match the following regular expression: [a-zA-Z\\d-]+.
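
As a reference for how these objects fit together, here is a sketch of a request body expressed as a Python dictionary, assuming a BigQuery source; every resource name and value below is a placeholder, not a recommended default.

# Sketch of an InspectJobConfig with most of the options above set.
# All names and values are placeholders for illustration only.
inspect_job = {
    "storage_config": {
        # Exactly one of cloud_storage_options, datastore_options, or
        # big_query_options, plus an optional timespan_config.
        "big_query_options": {
            "table_reference": {
                "project_id": "example-project",
                "dataset_id": "example_dataset",
                "table_id": "example_table",
            }
        },
    },
    "inspect_config": {
        "info_types": [{"name": "PERSON_NAME"}],
        "custom_info_types": [
            {
                "info_type": {"name": "CUSTOM_DICTIONARY_0"},
                "dictionary": {"word_list": {"words": ["example"]}},
            }
        ],
        "min_likelihood": "POSSIBLE",
        "limits": {"max_findings_per_request": 100},
        "include_quote": True,
        "exclude_info_types": False,
    },
    # Alternatively, reference a template instead of a full inspect_config:
    # "inspect_template_name": "projects/example-project/inspectTemplates/example-template",
    "actions": [
        {"pub_sub": {"topic": "projects/example-project/topics/example-topic"}}
    ],
}

request = {
    "parent": "projects/example-project/locations/global",
    "inspect_job": inspect_job,
    "job_id": "example-scan-1",  # optional; must match [a-zA-Z\d-]+
}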

Limit the amount of content inspected

If you scan BigQuery tables or Cloud Storage buckets, Sensitive Data Protection includes a way to scan a subset of the dataset. This provides a sampling of scan results without incurring the potential costs of scanning an entire dataset.

The following sections contain information about limiting the size of Cloud Storage scans and BigQuery scans.

Limit Cloud Storage scans

You can enable sampling in Cloud Storage by limiting the amount of data that is scanned. You can instruct the DLP API to scan only files under a certain size, only certain file types, and only a certain percentage of the total number of files in the input file set. To do so, specify the following optional fields within CloudStorageOptions:

  • bytesLimitPerFile: Sets the maximum number of bytes to scan from a file. If a scanned file's size is larger than this value, the rest of the bytes are omitted. Setting this field has no effect on certain file types. For more information, see Limits on bytes scanned per file.
  • fileTypes[]: Lists the FileTypes to include in the scan. This can be set to one or more of the enumerated types.
  • filesLimitPercent: Limits the number of files to scan to the specified percentage of the input FileSet. Specifying either 0 or 100 here indicates there is no limit.
  • sampleMethod: How to sample bytes if not all bytes are scanned. Specifying this value is meaningful only when used in conjunction with bytesLimitPerFile. If not specified, scanning starts from the top. This field can be set to one of two values:
    • TOP: Scanning starts from the top.
    • RANDOM_START: For each file larger than the size specified in bytesLimitPerFile, randomly pick the offset at which to start scanning. The scanned bytes are contiguous.

The following examples demonstrate using the DLP API to scan a 90% subset of a Cloud Storage bucket for person names. The scan starts from a random location in the dataset and only includes text files under 200 bytes.

C#

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Cloud.PubSub.V1;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class InspectStorageWithSampling
{
    public static async Task<DlpJob> InspectAsync(
        string projectId,
        string gcsUri,
        string topicId,
        string subId,
        Likelihood minLikelihood = Likelihood.Possible,
        IEnumerable<InfoType> infoTypes = null)
    {
        // Instantiate the dlp client.
        var dlp = DlpServiceClient.Create();

        // Construct Storage config by specifying the GCS file to be inspected
        // and sample method.
        var storageConfig = new StorageConfig
        {
            CloudStorageOptions = new CloudStorageOptions
            {
                FileSet = new CloudStorageOptions.Types.FileSet
                {
                    Url = gcsUri
                },
                BytesLimitPerFile = 200,
                FileTypes = { new FileType[] { FileType.Csv } },
                FilesLimitPercent = 90,
                SampleMethod = CloudStorageOptions.Types.SampleMethod.RandomStart
            }
        };

        // Construct the Inspect Config and specify the type of info the inspection
        // will look for.
        var inspectConfig = new InspectConfig
        {
            InfoTypes =
            {
                infoTypes ?? new InfoType[] { new InfoType { Name = "PERSON_NAME" } }
            },
            IncludeQuote = true,
            MinLikelihood = minLikelihood
        };

        // Construct the pubsub action.
        var actions = new Action[]
        {
            new Action
            {
                PubSub = new Action.Types.PublishToPubSub
                {
                    Topic = $"projects/{projectId}/topics/{topicId}"
                }
            }
        };

        // Construct the inspect job config using the objects created above.
        var inspectJob = new InspectJobConfig
        {
            StorageConfig = storageConfig,
            InspectConfig = inspectConfig,
            Actions = { actions }
        };

        // Issue Create Dlp Job Request
        var request = new CreateDlpJobRequest
        {
            InspectJob = inspectJob,
            ParentAsLocationName = new LocationName(projectId, "global"),
        };

        // We keep the name of the job that we just created.
        var dlpJob = dlp.CreateDlpJob(request);
        var jobName = dlpJob.Name;

        // Listen to pub/sub for the job.
        var subscriptionName = new SubscriptionName(projectId, subId);
        var subscriber = await SubscriberClient.CreateAsync(
            subscriptionName);

        await subscriber.StartAsync((PubsubMessage message, CancellationToken cancel) =>
        {
            if (message.Attributes["DlpJobName"] == jobName)
            {
                subscriber.StopAsync(cancel);
                return Task.FromResult(SubscriberClient.Reply.Ack);
            }
            else
            {
                return Task.FromResult(SubscriberClient.Reply.Nack);
            }
        });

        // Get the latest state of the job from the service.
        var resultJob = dlp.GetDlpJob(new GetDlpJobRequest
        {
            DlpJobName = DlpJobName.Parse(jobName)
        });

        // Parse the response and process results.
        System.Console.WriteLine($"Job status: {resultJob.State}");
        System.Console.WriteLine($"Job Name: {resultJob.Name}");

        var result = resultJob.InspectDetails.Result;
        foreach (var infoType in result.InfoTypeStats)
        {
            System.Console.WriteLine($"Info Type: {infoType.InfoType.Name}");
            System.Console.WriteLine($"Count: {infoType.Count}");
        }
        return resultJob;
    }
}

Go

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"
	"time"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
	"cloud.google.com/go/pubsub"
)

// inspectGcsFileWithSampling inspects a storage location with sampling enabled.
func inspectGcsFileWithSampling(w io.Writer, projectID, gcsUri, topicID, subscriptionId string) error {
	// projectID := "your-project-id"
	// gcsUri := "gs://" + "your-bucket-name" + "/path/to/your/file.txt"
	// topicID := "your-pubsub-topic-id"
	// subscriptionId := "your-pubsub-subscription-id"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}
	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the GCS file to be inspected and sampling configuration.
	var cloudStorageOptions = &dlppb.CloudStorageOptions{
		FileSet: &dlppb.CloudStorageOptions_FileSet{
			Url: gcsUri,
		},
		BytesLimitPerFile: int64(200),
		FileTypes: []dlppb.FileType{
			dlppb.FileType_TEXT_FILE,
		},
		FilesLimitPercent: int32(90),
		SampleMethod:      dlppb.CloudStorageOptions_RANDOM_START,
	}

	var storageConfig = &dlppb.StorageConfig{
		Type: &dlppb.StorageConfig_CloudStorageOptions{
			CloudStorageOptions: cloudStorageOptions,
		},
	}

	// Specify the type of info the inspection will look for.
	// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
	// Specify how the content should be inspected.
	var inspectConfig = &dlppb.InspectConfig{
		InfoTypes: []*dlppb.InfoType{
			{Name: "PERSON_NAME"},
		},
		ExcludeInfoTypes: true,
		IncludeQuote:     true,
		MinLikelihood:    dlppb.Likelihood_POSSIBLE,
	}

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pubsubClient, err := pubsub.NewClient(ctx, projectID)
	if err != nil {
		return err
	}
	defer pubsubClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	// Create the Topic if it doesn't exist.
	t := pubsubClient.Topic(topicID)
	if exists, err := t.Exists(ctx); err != nil {
		return err
	} else if !exists {
		if t, err = pubsubClient.CreateTopic(ctx, topicID); err != nil {
			return err
		}
	}

	// Create the Subscription if it doesn't exist.
	s := pubsubClient.Subscription(subscriptionId)
	if exists, err := s.Exists(ctx); err != nil {
		return err
	} else if !exists {
		if s, err = pubsubClient.CreateSubscription(ctx, subscriptionId, pubsub.SubscriptionConfig{Topic: t}); err != nil {
			return err
		}
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := "projects/" + projectID + "/topics/" + topicID

	var action = &dlppb.Action{
		Action: &dlppb.Action_PubSub{
			PubSub: &dlppb.Action_PublishToPubSub{
				Topic: topic,
			},
		},
	}

	// Configure the long running job we want the service to perform.
	var inspectJobConfig = &dlppb.InspectJobConfig{
		StorageConfig: storageConfig,
		InspectConfig: inspectConfig,
		Actions: []*dlppb.Action{
			action,
		},
	}

	// Create the request for the job configured above.
	req := &dlppb.CreateDlpJobRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: inspectJobConfig,
		},
	}

	// Use the client to send the request.
	j, err := client.CreateDlpJob(ctx, req)
	if err != nil {
		return err
	}
	fmt.Fprintf(w, "Job Created: %v", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	// This only waits for 10 minutes. For long jobs, consider using a truly
	// asynchronous execution model such as Cloud Functions.
	ctx, cancel := context.WithTimeout(ctx, 10*time.Minute)
	defer cancel()
	err = s.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()

		// Stop listening for more messages.
		defer cancel()

		resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
			Name: j.GetName(),
		})
		if err != nil {
			fmt.Fprintf(w, "Error getting completed job: %v\n", err)
			return
		}
		r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
		if len(r) == 0 {
			fmt.Fprintf(w, "No results")
			return
		}
		for _, s := range r {
			fmt.Fprintf(w, "\nFound %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
		}
	})
	if err != nil {
		return err
	}
	return nil
}

Java

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.core.SettableApiFuture;
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.privacy.dlp.v2.Action;
import com.google.privacy.dlp.v2.CloudStorageOptions;
import com.google.privacy.dlp.v2.CloudStorageOptions.FileSet;
import com.google.privacy.dlp.v2.CloudStorageOptions.SampleMethod;
import com.google.privacy.dlp.v2.CreateDlpJobRequest;
import com.google.privacy.dlp.v2.DlpJob;
import com.google.privacy.dlp.v2.FileType;
import com.google.privacy.dlp.v2.GetDlpJobRequest;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeStats;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectDataSourceDetails;
import com.google.privacy.dlp.v2.InspectJobConfig;
import com.google.privacy.dlp.v2.Likelihood;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.StorageConfig;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InspectGcsFileWithSampling {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String gcsUri = "gs://" + "your-bucket-name" + "/path/to/your/file.txt";
    String topicId = "your-pubsub-topic-id";
    String subscriptionId = "your-pubsub-subscription-id";
    inspectGcsFileWithSampling(projectId, gcsUri, topicId, subscriptionId);
  }

  // Inspects a file in a Google Cloud Storage Bucket.
  public static void inspectGcsFileWithSampling(
      String projectId, String gcsUri, String topicId, String subscriptionId)
      throws ExecutionException, InterruptedException, IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the GCS file to be inspected and sampling configuration.
      CloudStorageOptions cloudStorageOptions =
          CloudStorageOptions.newBuilder()
              .setFileSet(FileSet.newBuilder().setUrl(gcsUri))
              .setBytesLimitPerFile(200)
              .addFileTypes(FileType.TEXT_FILE)
              .setFilesLimitPercent(90)
              .setSampleMethod(SampleMethod.RANDOM_START)
              .build();

      StorageConfig storageConfig =
          StorageConfig.newBuilder().setCloudStorageOptions(cloudStorageOptions).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("PERSON_NAME").build();

      // Specify how the content should be inspected.
      InspectConfig inspectConfig =
          InspectConfig.newBuilder()
              .addInfoTypes(infoType)
              .setExcludeInfoTypes(true)
              .setIncludeQuote(true)
              .setMinLikelihood(Likelihood.POSSIBLE)
              .build();

      // Specify the action that is triggered when the job completes.
      String pubSubTopic = String.format("projects/%s/topics/%s", projectId, topicId);
      Action.PublishToPubSub publishToPubSub =
          Action.PublishToPubSub.newBuilder().setTopic(pubSubTopic).build();
      Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

      // Configure the long running job we want the service to perform.
      InspectJobConfig inspectJobConfig =
          InspectJobConfig.newBuilder()
              .setStorageConfig(storageConfig)
              .setInspectConfig(inspectConfig)
              .addActions(action)
              .build();

      // Create the request for the job configured above.
      CreateDlpJobRequest createDlpJobRequest =
          CreateDlpJobRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setInspectJob(inspectJobConfig)
              .build();

      // Use the client to send the request.
      final DlpJob dlpJob = dlp.createDlpJob(createDlpJobRequest);
      System.out.println("Job created: " + dlpJob.getName());

      // Set up a Pub/Sub subscriber to listen on the job completion status.
      final SettableApiFuture<Boolean> done = SettableApiFuture.create();

      ProjectSubscriptionName subscriptionName =
          ProjectSubscriptionName.of(projectId, subscriptionId);

      MessageReceiver messageHandler =
          (PubsubMessage pubsubMessage, AckReplyConsumer ackReplyConsumer) -> {
            handleMessage(dlpJob, done, pubsubMessage, ackReplyConsumer);
          };
      Subscriber subscriber = Subscriber.newBuilder(subscriptionName, messageHandler).build();
      subscriber.startAsync();

      // Wait for job completion semi-synchronously.
      // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions.
      try {
        done.get(15, TimeUnit.MINUTES);
      } catch (TimeoutException e) {
        System.out.println("Job was not completed after 15 minutes.");
        return;
      } finally {
        subscriber.stopAsync();
        subscriber.awaitTerminated();
      }

      // Get the latest state of the job from the service.
      GetDlpJobRequest request = GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build();
      DlpJob completedJob = dlp.getDlpJob(request);

      // Parse the response and process results.
      System.out.println("Job status: " + completedJob.getState());
      System.out.println("Job name: " + dlpJob.getName());
      InspectDataSourceDetails.Result result = completedJob.getInspectDetails().getResult();
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    }
  }

  // handleMessage injects the job and settableFuture into the message receiver interface.
  private static void handleMessage(
      DlpJob job,
      SettableApiFuture<Boolean> done,
      PubsubMessage pubsubMessage,
      AckReplyConsumer ackReplyConsumer) {
    String messageAttribute = pubsubMessage.getAttributesMap().get("DlpJobName");
    if (job.getName().equals(messageAttribute)) {
      done.set(true);
      ackReplyConsumer.ack();
    } else {
      ackReplyConsumer.nack();
    }
  }
}

Node.js

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const projectId = 'my-project';

// The gcs file path
// const gcsUri = 'gs://" + "your-bucket-name" + "/path/to/your/file.txt';

// Specify the type of info the inspection will look for.
// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
// const infoTypes = [{ name: 'PERSON_NAME' }];

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// DLP Job max time (in milliseconds)
const DLP_JOB_WAIT_TIME = 15 * 1000 * 60;

async function inspectGcsFileSampling() {
  // Specify the GCS file to be inspected and sampling configuration
  const storageItemConfig = {
    cloudStorageOptions: {
      fileSet: {url: gcsUri},
      bytesLimitPerFile: 200,
      filesLimitPercent: 90,
      fileTypes: [DLP.protos.google.privacy.dlp.v2.FileType.TEXT_FILE],
      sampleMethod:
        DLP.protos.google.privacy.dlp.v2.CloudStorageOptions.SampleMethod
          .RANDOM_START,
    },
  };

  // Specify how the content should be inspected.
  const inspectConfig = {
    infoTypes: infoTypes,
    minLikelihood: DLP.protos.google.privacy.dlp.v2.Likelihood.POSSIBLE,
    includeQuote: true,
    excludeInfoTypes: true,
  };

  // Specify the action that is triggered when the job completes.
  const actions = [
    {
      pubSub: {
        topic: `projects/${projectId}/topics/${topicId}`,
      },
    },
  ];

  // Create the request for the job configured above.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    inspectJob: {
      inspectConfig: inspectConfig,
      storageConfig: storageItemConfig,
      actions: actions,
    },
  };

  // Use the client to send the request.
  const [topicResponse] = await pubsub.topic(topicId).get();

  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const subscription = await topicResponse.subscription(subscriptionId);

  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;
  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    // Set up the timeout
    const timer = setTimeout(() => {
      reject(new Error('Timeout'));
    }, DLP_JOB_WAIT_TIME);

    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        clearTimeout(timer);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      clearTimeout(timer);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${infoTypeStat.infoType.name}.`
      );
    });
  } else {
    console.log('No findings.');
  }
}

await inspectGcsFileSampling();

PHP

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStorageOptions;
use Google\Cloud\Dlp\V2\CloudStorageOptions\FileSet;
use Google\Cloud\Dlp\V2\CloudStorageOptions\SampleMethod;
use Google\Cloud\Dlp\V2\CreateDlpJobRequest;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\GetDlpJobRequest;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect storage with sampling.
 * The following examples demonstrate using the Cloud DLP API to scan a 90% subset of a
 * Cloud Storage bucket for person names. The scan starts from a random location in the dataset
 * and only includes text files under 200 bytes.
 *
 * @param string $callingProjectId  The project ID to run the API call under.
 * @param string $gcsUri            Google Cloud Storage file url.
 * @param string $topicId           The ID of the Pub/Sub topic to notify once the job completes.
 * @param string $subscriptionId    The ID of the Pub/Sub subscription to use when listening for job completion notifications.
 */
function inspect_gcs_with_sampling(
    // TODO(developer): Replace sample parameters before running the code.
    string $callingProjectId,
    string $gcsUri = 'gs://GOOGLE_STORAGE_BUCKET_NAME/dlp_sample.csv',
    string $topicId = 'dlp-pubsub-topic',
    string $subscriptionId = 'dlp_subscription'
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // Construct the items to be inspected.
    $cloudStorageOptions = (new CloudStorageOptions())
        ->setFileSet((new FileSet())
            ->setUrl($gcsUri))
        ->setBytesLimitPerFile(200)
        ->setFilesLimitPercent(90)
        ->setSampleMethod(SampleMethod::RANDOM_START);

    $storageConfig = (new StorageConfig())
        ->setCloudStorageOptions($cloudStorageOptions);

    // Specify the type of info the inspection will look for.
    $phoneNumberInfoType = (new InfoType())
        ->setName('PHONE_NUMBER');
    $emailAddressInfoType = (new InfoType())
        ->setName('EMAIL_ADDRESS');
    $cardNumberInfoType = (new InfoType())
        ->setName('CREDIT_CARD_NUMBER');
    $infoTypes = [$phoneNumberInfoType, $emailAddressInfoType, $cardNumberInfoType];

    // Specify how the content should be inspected.
    $inspectConfig = (new InspectConfig())
        ->setInfoTypes($infoTypes)
        ->setIncludeQuote(true);

    // Construct the action to run when job completes.
    $action = (new Action())
        ->setPubSub((new PublishToPubSub())
            ->setTopic($topic->name()));

    // Construct inspect job config to run.
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request.
    $parent = "projects/$callingProjectId/locations/global";
    $createDlpJobRequest = (new CreateDlpJobRequest())
        ->setParent($parent)
        ->setInspectJob($inspectJob);
    $job = $dlp->createDlpJob($createDlpJobRequest);

    // Poll Pub/Sub using exponential backoff until job finishes.
    // Consider using an asynchronous execution model such as Cloud Functions.
    $attempt = 1;
    $startTime = time();
    do {
        foreach ($subscription->pull() as $message) {
            if (
                isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()
            ) {
                $subscription->acknowledge($message);
                // Get the updated job. Loop to avoid race condition with DLP API.
                do {
                    $getDlpJobRequest = (new GetDlpJobRequest())
                        ->setName($job->getName());
                    $job = $dlp->getDlpJob($getDlpJobRequest);
                } while ($job->getState() == JobState::RUNNING);
                break 2; // break from parent do while.
            }
        }
        printf('Waiting for job to complete' . PHP_EOL);
        // Exponential backoff with max delay of 60 seconds.
        sleep(min(60, pow(2, ++$attempt)));
    } while (time() - $startTime < 600); // 10 minute timeout.

    // Print finding counts.
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), JobState::name($job->getState()));
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                printf('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf(
                        '  Found %s instance(s) of infoType %s' . PHP_EOL,
                        $infoTypeStat->getCount(),
                        $infoTypeStat->getInfoType()->getName()
                    );
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        case JobState::PENDING:
            printf('Job has not completed. Consider a longer timeout or an asynchronous execution model' . PHP_EOL);
            break;
        default:
            printf('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

Python

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import threading
from typing import List

import google.cloud.dlp
import google.cloud.pubsub


def inspect_gcs_with_sampling(
    project: str,
    bucket: str,
    topic_id: str,
    subscription_id: str,
    info_types: List[str] = None,
    file_types: List[str] = None,
    min_likelihood: str = None,
    max_findings: int = None,
    timeout: int = 300,
) -> None:
    """Uses the Data Loss Prevention API to analyze files in GCS by
    limiting the amount of data to be scanned.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        bucket: The name of the GCS bucket containing the file, as a string.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
        info_types: A list of strings representing infoTypes to look for.
            A full list of info type categories can be fetched from the API.
        file_types: Type of files in gcs bucket where the inspection would happen.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries.
    if not info_types:
        info_types = ["FIRST_NAME", "LAST_NAME", "EMAIL_ADDRESS"]
    info_types = [{"name": info_type} for info_type in info_types]

    # Specify how the content should be inspected. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        "info_types": info_types,
        "exclude_info_types": True,
        "include_quote": True,
        "min_likelihood": min_likelihood,
        "limits": {"max_findings_per_request": max_findings},
    }

    # Setting default file types as CSV files.
    if not file_types:
        file_types = ["CSV"]

    # Construct a cloud_storage_options dictionary with the bucket's URL.
    url = f"gs://{bucket}/*"
    storage_config = {
        "cloud_storage_options": {
            "file_set": {"url": url},
            "bytes_limit_per_file": 200,
            "file_types": file_types,
            "files_limit_percent": 90,
            "sample_method": "RANDOM_START",
        }
    }

    # Tell the API where to send a notification when the job is complete.
    topic = google.cloud.pubsub.PublisherClient.topic_path(project, topic_id)
    actions = [{"pub_sub": {"topic": topic}}]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        "inspect_config": inspect_config,
        "storage_config": storage_config,
        "actions": actions,
    }

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Call the API.
    operation = dlp.create_dlp_job(
        request={"parent": parent, "inspect_job": inspect_job}
    )
    print(f"Inspection operation started: {operation.name}")

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(project, subscription_id)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message):
        try:
            if message.attributes["DlpJobName"] == operation.name:
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(request={"name": operation.name})
                print(f"Job name: {job.name}")
                if job.inspect_details.result.info_type_stats:
                    print("Findings:")
                    for finding in job.inspect_details.result.info_type_stats:
                        print(
                            f"Info type: {finding.info_type.name}; Count: {finding.count}"
                        )
                else:
                    print("No findings.")

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()
        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    # Register the callback and wait on the event.
    subscriber.subscribe(subscription_path, callback=callback)
    finished = job_done.wait(timeout=timeout)
    if not finished:
        print(
            "No event received before the timeout. Please verify that the "
            "subscription provided is subscribed to the topic provided."
        )

REST

Input JSON:

POST https://dlp.googleapis.com/v2/projects/[PROJECT-ID]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "cloudStorageOptions":{
        "fileSet":{
          "url":"gs://[BUCKET-NAME]/*"
        },
        "bytesLimitPerFile":"200",
        "fileTypes":[
          "TEXT_FILE"
        ],
        "filesLimitPercent":90,
        "sampleMethod":"RANDOM_START"
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"PERSON_NAME"
        }
      ],
      "excludeInfoTypes":true,
      "includeQuote":true,
      "minLikelihood":"POSSIBLE"
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT-ID]",
              "datasetId":"testingdlp"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}

After you send the JSON input in a POST request to the specified endpoint, a Sensitive Data Protection job is created, and the API sends the following response.

Output JSON:

{   "name":"projects/[PROJECT-ID]/dlpJobs/[JOB-ID]",   "type":"INSPECT_JOB",   "state":"PENDING",   "inspectDetails":{     "requestedOptions":{       "snapshotInspectTemplate":{        },       "jobConfig":{         "storageConfig":{           "cloudStorageOptions":{             "fileSet":{               "url":"gs://[BUCKET_NAME]/*"             },             "bytesLimitPerFile":"200",             "fileTypes":[               "TEXT_FILE"             ],             "sampleMethod":"TOP",             "filesLimitPercent":90           }         },         "inspectConfig":{           "infoTypes":[             {               "name":"PERSON_NAME"             }           ],           "minLikelihood":"POSSIBLE",           "limits":{            },           "includeQuote":true,           "excludeInfoTypes":true         },         "actions":[           {             "saveFindings":{               "outputConfig":{                 "table":{                   "projectId":"[PROJECT-ID]",                   "datasetId":"[DATASET-ID]",                   "tableId":"[TABLE-ID]"                 },                 "outputSchema":"BASIC_COLUMNS"               }             }           }         ]       }     }   },   "createTime":"2018-05-30T22:22:08.279Z" } 

Limit BigQuery scans

To enable sampling in BigQuery by limiting the amount of data that is scanned, specify the following optional fields within BigQueryOptions:

  • rowsLimit: The maximum number of rows to scan. If the table has more rows than this value, the rest of the rows are omitted. If not set, or if set to 0, all rows are scanned.
  • rowsLimitPercent: The maximum percentage of rows to scan (between 0 and 100). The remaining rows are omitted. Setting this value to either 0 or 100 means no limit. Defaults to 0. Only one of rowsLimit and rowsLimitPercent can be specified.
  • sampleMethod: How to sample rows if not all rows are scanned. If not specified, scanning starts from the top. This field can be set to one of two values:
    • TOP: Scanning starts from the top.
    • RANDOM_START: Scanning starts from a randomly selected row.
  • excludedFields: Table fields that uniquely identify columns to exclude from being read. This can help reduce the amount of data scanned and lower the overall cost of an inspection job.
  • includedFields: Table fields that uniquely identify specific rows within the table to scan. (A sketch combining several of these sampling fields follows this list.)
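For illustration, here is a minimal sketch, in the dictionary style of the Python samples below, that combines rowsLimitPercent, sampleMethod, and excludedFields. The table reference and column names are placeholders, not a real dataset:

# Hypothetical example: scan at most 10% of the rows, starting at a random
# row, and skip two columns assumed not to contain sensitive data.
storage_config = {
    "big_query_options": {
        "table_reference": {
            "project_id": "your-project-id",
            "dataset_id": "your_dataset",
            "table_id": "your_table",
        },
        "rows_limit_percent": 10,
        "sample_method": "RANDOM_START",
        "excluded_fields": [{"name": "created_at"}, {"name": "row_checksum"}],
    }
}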

Another feature that's useful for limiting the data scanned, particularly when scanning partitioned tables, is TimespanConfig. TimespanConfig lets you filter BigQuery table rows by providing start and end time values to define a time span. Sensitive Data Protection then scans only the rows that contain a timestamp within that time span.
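As a sketch, a storage_config with a TimespanConfig might look like the following in the Python dictionary style. The column name and timestamp values are placeholder assumptions; timestampField must name a date or timestamp column in your table:

# Hypothetical example: only scan rows whose transaction_time falls within
# January 2023. Timestamps are expressed as Unix epoch seconds.
storage_config = {
    "big_query_options": {
        "table_reference": {
            "project_id": "your-project-id",
            "dataset_id": "your_dataset",
            "table_id": "your_partitioned_table",
        },
    },
    "timespan_config": {
        "start_time": {"seconds": 1672531200},  # 2023-01-01T00:00:00Z
        "end_time": {"seconds": 1675209600},  # 2023-02-01T00:00:00Z
        "timestamp_field": {"name": "transaction_time"},
    },
}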

The following examples show how to use the DLP API to scan a 1,000-row subset of a BigQuery table. The scan starts from a random row.

Go

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"
	"time"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
	"cloud.google.com/go/pubsub"
)

// inspectBigQueryTableWithSampling inspects a BigQuery table for sensitive data with sampling.
func inspectBigQueryTableWithSampling(w io.Writer, projectID, topicID, subscriptionID string) error {
	// projectID := "your-project-id"
	// topicID := "your-pubsub-topic-id"
	// or provide a topicID name to create one
	// subscriptionID := "your-pubsub-subscription-id"
	// or provide a subscription name to create one

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the BigQuery table to be inspected.
	tableReference := &dlppb.BigQueryTable{
		ProjectId: "bigquery-public-data",
		DatasetId: "usa_names",
		TableId:   "usa_1910_current",
	}

	bigQueryOptions := &dlppb.BigQueryOptions{
		TableReference: tableReference,
		RowsLimit:      int64(10000),
		SampleMethod:   dlppb.BigQueryOptions_RANDOM_START,
		IdentifyingFields: []*dlppb.FieldId{
			{Name: "name"},
		},
	}

	// Provide storage config with BigQueryOptions.
	storageConfig := &dlppb.StorageConfig{
		Type: &dlppb.StorageConfig_BigQueryOptions{
			BigQueryOptions: bigQueryOptions,
		},
	}

	// Specify the type of info the inspection will look for.
	// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
	infoTypes := []*dlppb.InfoType{
		{Name: "PERSON_NAME"},
	}

	// Specify how the content should be inspected.
	inspectConfig := &dlppb.InspectConfig{
		InfoTypes:    infoTypes,
		IncludeQuote: true,
	}

	// Create a PubSub Client used to listen for when the inspect job finishes.
	pubsubClient, err := pubsub.NewClient(ctx, projectID)
	if err != nil {
		return err
	}
	defer pubsubClient.Close()

	// Create a PubSub subscription we can use to listen for messages.
	// Create the Topic if it doesn't exist.
	t := pubsubClient.Topic(topicID)
	if exists, err := t.Exists(ctx); err != nil {
		return err
	} else if !exists {
		if t, err = pubsubClient.CreateTopic(ctx, topicID); err != nil {
			return err
		}
	}

	// Create the Subscription if it doesn't exist.
	s := pubsubClient.Subscription(subscriptionID)
	if exists, err := s.Exists(ctx); err != nil {
		return err
	} else if !exists {
		if s, err = pubsubClient.CreateSubscription(ctx, subscriptionID, pubsub.SubscriptionConfig{Topic: t}); err != nil {
			return err
		}
	}

	// topic is the PubSub topic string where messages should be sent.
	topic := fmt.Sprintf("projects/%s/topics/%s", projectID, topicID)

	action := &dlppb.Action{
		Action: &dlppb.Action_PubSub{
			PubSub: &dlppb.Action_PublishToPubSub{
				Topic: topic,
			},
		},
	}

	// Configure the long running job we want the service to perform.
	inspectJobConfig := &dlppb.InspectJobConfig{
		StorageConfig: storageConfig,
		InspectConfig: inspectConfig,
		Actions: []*dlppb.Action{
			action,
		},
	}

	// Create the request for the job configured above.
	req := &dlppb.CreateDlpJobRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: inspectJobConfig,
		},
	}

	// Use the client to send the request.
	j, err := client.CreateDlpJob(ctx, req)
	if err != nil {
		return err
	}
	fmt.Fprintf(w, "Job Created: %v", j.GetName())

	// Wait for the inspect job to finish by waiting for a PubSub message.
	// This only waits for 10 minutes. For long jobs, consider using a truly
	// asynchronous execution model such as Cloud Functions.
	c, cancel := context.WithTimeout(ctx, 10*time.Minute)
	defer cancel()
	err = s.Receive(c, func(ctx context.Context, msg *pubsub.Message) {
		// If this is the wrong job, do not process the result.
		if msg.Attributes["DlpJobName"] != j.GetName() {
			msg.Nack()
			return
		}
		msg.Ack()

		// Stop listening for more messages.
		defer cancel()
	})
	if err != nil {
		return err
	}

	resp, err := client.GetDlpJob(ctx, &dlppb.GetDlpJobRequest{
		Name: j.GetName(),
	})
	if err != nil {
		return err
	}
	r := resp.GetInspectDetails().GetResult().GetInfoTypeStats()
	if len(r) == 0 {
		fmt.Fprintf(w, "No results")
		return err
	}
	for _, s := range r {
		fmt.Fprintf(w, "\nFound %v instances of infoType %v\n", s.GetCount(), s.GetInfoType().GetName())
	}
	return nil
}

Java

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.core.SettableApiFuture;
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.privacy.dlp.v2.Action;
import com.google.privacy.dlp.v2.BigQueryOptions;
import com.google.privacy.dlp.v2.BigQueryOptions.SampleMethod;
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CreateDlpJobRequest;
import com.google.privacy.dlp.v2.DlpJob;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.GetDlpJobRequest;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeStats;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectDataSourceDetails;
import com.google.privacy.dlp.v2.InspectJobConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.StorageConfig;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InspectBigQueryTableWithSampling {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String topicId = "your-pubsub-topic-id";
    String subscriptionId = "your-pubsub-subscription-id";
    inspectBigQueryTableWithSampling(projectId, topicId, subscriptionId);
  }

  // Inspects a BigQuery Table
  public static void inspectBigQueryTableWithSampling(
      String projectId, String topicId, String subscriptionId)
      throws ExecutionException, InterruptedException, IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the BigQuery table to be inspected.
      BigQueryTable tableReference =
          BigQueryTable.newBuilder()
              .setProjectId("bigquery-public-data")
              .setDatasetId("usa_names")
              .setTableId("usa_1910_current")
              .build();

      BigQueryOptions bigQueryOptions =
          BigQueryOptions.newBuilder()
              .setTableReference(tableReference)
              .setRowsLimit(1000)
              .setSampleMethod(SampleMethod.RANDOM_START)
              .addIdentifyingFields(FieldId.newBuilder().setName("name"))
              .build();

      StorageConfig storageConfig =
          StorageConfig.newBuilder().setBigQueryOptions(bigQueryOptions).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("PERSON_NAME").build();

      // Specify how the content should be inspected.
      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addInfoTypes(infoType).setIncludeQuote(true).build();

      // Specify the action that is triggered when the job completes.
      String pubSubTopic = String.format("projects/%s/topics/%s", projectId, topicId);
      Action.PublishToPubSub publishToPubSub =
          Action.PublishToPubSub.newBuilder().setTopic(pubSubTopic).build();
      Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

      // Configure the long running job we want the service to perform.
      InspectJobConfig inspectJobConfig =
          InspectJobConfig.newBuilder()
              .setStorageConfig(storageConfig)
              .setInspectConfig(inspectConfig)
              .addActions(action)
              .build();

      // Create the request for the job configured above.
      CreateDlpJobRequest createDlpJobRequest =
          CreateDlpJobRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setInspectJob(inspectJobConfig)
              .build();

      // Use the client to send the request.
      final DlpJob dlpJob = dlp.createDlpJob(createDlpJobRequest);
      System.out.println("Job created: " + dlpJob.getName());

      // Set up a Pub/Sub subscriber to listen on the job completion status
      final SettableApiFuture<Boolean> done = SettableApiFuture.create();

      ProjectSubscriptionName subscriptionName =
          ProjectSubscriptionName.of(projectId, subscriptionId);

      MessageReceiver messageHandler =
          (PubsubMessage pubsubMessage, AckReplyConsumer ackReplyConsumer) -> {
            handleMessage(dlpJob, done, pubsubMessage, ackReplyConsumer);
          };
      Subscriber subscriber = Subscriber.newBuilder(subscriptionName, messageHandler).build();
      subscriber.startAsync();

      // Wait for job completion semi-synchronously
      // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
      try {
        done.get(15, TimeUnit.MINUTES);
      } catch (TimeoutException e) {
        System.out.println("Job was not completed after 15 minutes.");
        return;
      } finally {
        subscriber.stopAsync();
        subscriber.awaitTerminated();
      }

      // Get the latest state of the job from the service
      GetDlpJobRequest request = GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build();
      DlpJob completedJob = dlp.getDlpJob(request);

      // Parse the response and process results.
      System.out.println("Job status: " + completedJob.getState());
      System.out.println("Job name: " + dlpJob.getName());
      InspectDataSourceDetails.Result result = completedJob.getInspectDetails().getResult();
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    }
  }

  // handleMessage injects the job and settableFuture into the message receiver interface
  private static void handleMessage(
      DlpJob job,
      SettableApiFuture<Boolean> done,
      PubsubMessage pubsubMessage,
      AckReplyConsumer ackReplyConsumer) {
    String messageAttribute = pubsubMessage.getAttributesMap().get("DlpJobName");
    if (job.getName().equals(messageAttribute)) {
      done.set(true);
      ackReplyConsumer.ack();
    } else {
      ackReplyConsumer.nack();
    }
  }
}

Node.js

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const projectId = 'my-project';

// The project ID the table is stored under
// This may or (for public datasets) may not equal the calling project ID
// const dataProjectId = 'my-project';

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// DLP Job max time (in milliseconds)
const DLP_JOB_WAIT_TIME = 15 * 1000 * 60;

async function inspectBigqueryWithSampling() {
  // Specify the type of info the inspection will look for.
  // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
  const infoTypes = [{name: 'PERSON_NAME'}];

  // Specify the BigQuery options required for inspection.
  const storageItem = {
    bigQueryOptions: {
      tableReference: {
        projectId: dataProjectId,
        datasetId: datasetId,
        tableId: tableId,
      },
      rowsLimit: 1000,
      sampleMethod:
        DLP.protos.google.privacy.dlp.v2.BigQueryOptions.SampleMethod
          .RANDOM_START,
      includedFields: [{name: 'name'}],
    },
  };

  // Specify the action that is triggered when the job completes.
  const actions = [
    {
      pubSub: {
        topic: `projects/${projectId}/topics/${topicId}`,
      },
    },
  ];

  // Construct request for creating an inspect job
  const request = {
    parent: `projects/${projectId}/locations/global`,
    inspectJob: {
      inspectConfig: {
        infoTypes: infoTypes,
        includeQuote: true,
      },
      storageConfig: storageItem,
      actions: actions,
    },
  };

  // Use the client to send the request.
  const [topicResponse] = await pubsub.topic(topicId).get();

  // Verify the Pub/Sub topic and listen for job notifications via an
  // existing subscription.
  const subscription = await topicResponse.subscription(subscriptionId);

  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;

  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    // Set up the timeout
    const timer = setTimeout(() => {
      reject(new Error('Timeout'));
    }, DLP_JOB_WAIT_TIME);

    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        clearTimeout(timer);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      clearTimeout(timer);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });
  const [job] = await dlp.getDlpJob({name: jobName});
  console.log(`Job ${job.name} status: ${job.state}`);

  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${infoTypeStat.infoType.name}.`
      );
    });
  } else {
    console.log('No findings.');
  }
}

await inspectBigqueryWithSampling();

PHP

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\BigQueryOptions;
use Google\Cloud\Dlp\V2\BigQueryOptions\SampleMethod;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CreateDlpJobRequest;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\FieldId;
use Google\Cloud\Dlp\V2\GetDlpJobRequest;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Inspect BigQuery for sensitive data with sampling.
 * The following examples demonstrate using the Cloud Data Loss Prevention
 * API to scan a 1000-row subset of a BigQuery table. The scan starts from
 * a random row.
 *
 * @param string $callingProjectId  The project ID to run the API call under.
 * @param string $topicId           The Pub/Sub topic ID to notify once the job is completed.
 * @param string $subscriptionId    The Pub/Sub subscription ID to use when listening for job
 *                                  completion notifications.
 * @param string $projectId         The Google Cloud Project ID.
 * @param string $datasetId         The BigQuery Dataset ID.
 * @param string $tableId           The BigQuery Table ID to be inspected.
 */
function inspect_bigquery_with_sampling(
    string $callingProjectId,
    string $topicId,
    string $subscriptionId,
    string $projectId,
    string $datasetId,
    string $tableId
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // Specify the BigQuery table to be inspected.
    $bigqueryTable = (new BigQueryTable())
        ->setProjectId($projectId)
        ->setDatasetId($datasetId)
        ->setTableId($tableId);

    $bigQueryOptions = (new BigQueryOptions())
        ->setTableReference($bigqueryTable)
        ->setRowsLimit(1000)
        ->setSampleMethod(SampleMethod::RANDOM_START)
        ->setIdentifyingFields([
            (new FieldId())
                ->setName('name')
        ]);

    $storageConfig = (new StorageConfig())
        ->setBigQueryOptions($bigQueryOptions);

    // Specify the type of info the inspection will look for.
    // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $infoTypes = [$personNameInfoType];

    // Specify how the content should be inspected.
    $inspectConfig = (new InspectConfig())
        ->setInfoTypes($infoTypes)
        ->setIncludeQuote(true);

    // Specify the action that is triggered when the job completes.
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Configure the long running job we want the service to perform.
    $inspectJob = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = "projects/$callingProjectId/locations/global";
    $createDlpJobRequest = (new CreateDlpJobRequest())
        ->setParent($parent)
        ->setInspectJob($inspectJob);
    $job = $dlp->createDlpJob($createDlpJobRequest);

    // Poll Pub/Sub using exponential backoff until job finishes
    // Consider using an asynchronous execution model such as Cloud Functions
    $attempt = 1;
    $startTime = time();
    do {
        foreach ($subscription->pull() as $message) {
            if (
                isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()
            ) {
                $subscription->acknowledge($message);
                // Get the updated job. Loop to avoid race condition with DLP API.
                do {
                    $getDlpJobRequest = (new GetDlpJobRequest())
                        ->setName($job->getName());
                    $job = $dlp->getDlpJob($getDlpJobRequest);
                } while ($job->getState() == JobState::RUNNING);
                break 2; // break from parent do while
            }
        }
        printf('Waiting for job to complete' . PHP_EOL);
        // Exponential backoff with max delay of 60 seconds
        sleep(min(60, pow(2, ++$attempt)));
    } while (time() - $startTime < 600); // 10 minute timeout

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), JobState::name($job->getState()));
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                printf('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf(
                        '  Found %s instance(s) of infoType %s' . PHP_EOL,
                        $infoTypeStat->getCount(),
                        $infoTypeStat->getInfoType()->getName()
                    );
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        case JobState::PENDING:
            printf('Job has not completed. Consider a longer timeout or an asynchronous execution model' . PHP_EOL);
            break;
        default:
            printf('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

Python

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import threading

import google.cloud.dlp
import google.cloud.pubsub


def inspect_bigquery_table_with_sampling(
    project: str,
    topic_id: str,
    subscription_id: str,
    min_likelihood: str = None,
    max_findings: int = None,
    timeout: int = 300,
) -> None:
    """Uses the Data Loss Prevention API to analyze BigQuery data by limiting
    the amount of data to be scanned.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        topic_id: The id of the Cloud Pub/Sub topic to which the API will
            broadcast job completion. The topic must already exist.
        subscription_id: The id of the Cloud Pub/Sub subscription to listen on
            while waiting for job completion. The subscription must already
            exist and be subscribed to the topic.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        timeout: The number of seconds to wait for a response from the API.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Specify how the content should be inspected. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        "info_types": [{"name": "PERSON_NAME"}],
        "min_likelihood": min_likelihood,
        "limits": {"max_findings_per_request": max_findings},
        "include_quote": True,
    }

    # Specify the BigQuery table to be inspected.
    # Here we are using public bigquery table.
    table_reference = {
        "project_id": "bigquery-public-data",
        "dataset_id": "usa_names",
        "table_id": "usa_1910_current",
    }

    # Construct a storage_config containing the target BigQuery info.
    storage_config = {
        "big_query_options": {
            "table_reference": table_reference,
            "rows_limit": 1000,
            "sample_method": "RANDOM_START",
            "identifying_fields": [{"name": "name"}],
        }
    }

    # Tell the API where to send a notification when the job is complete.
    topic = google.cloud.pubsub.PublisherClient.topic_path(project, topic_id)
    actions = [{"pub_sub": {"topic": topic}}]

    # Construct the inspect_job, which defines the entire inspect content task.
    inspect_job = {
        "inspect_config": inspect_config,
        "storage_config": storage_config,
        "actions": actions,
    }

    # Convert the project id into full resource ids.
    parent = f"projects/{project}/locations/global"

    # Call the API
    operation = dlp.create_dlp_job(
        request={"parent": parent, "inspect_job": inspect_job}
    )
    print(f"Inspection operation started: {operation.name}")

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(project, subscription_id)

    # Set up a callback to acknowledge a message. This closes around an event
    # so that it can signal that it is done and the main thread can continue.
    job_done = threading.Event()

    def callback(message: google.cloud.pubsub_v1.subscriber.message.Message) -> None:
        try:
            if message.attributes["DlpJobName"] == operation.name:
                # This is the message we're looking for, so acknowledge it.
                message.ack()

                # Now that the job is done, fetch the results and print them.
                job = dlp.get_dlp_job(request={"name": operation.name})
                print(f"Job name: {job.name}")

                if job.inspect_details.result.info_type_stats:
                    for finding in job.inspect_details.result.info_type_stats:
                        print(
                            f"Info type: {finding.info_type.name}; Count: {finding.count}"
                        )
                else:
                    print("No findings.")

                # Signal to the main thread that we can exit.
                job_done.set()
            else:
                # This is not the message we're looking for.
                message.drop()

        except Exception as e:
            # Because this is executing in a thread, an exception won't be
            # noted unless we print it manually.
            print(e)
            raise

    # Register the callback and wait on the event.
    subscriber.subscribe(subscription_path, callback=callback)
    finished = job_done.wait(timeout=timeout)
    if not finished:
        print(
            "No event received before the timeout. Please verify that the "
            "subscription provided is subscribed to the topic provided."
        )

C#

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Cloud.PubSub.V1;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using static Google.Cloud.Dlp.V2.InspectConfig.Types;

public class InspectBigQueryWithSampling
{
    public static async Task<DlpJob> InspectAsync(
        string projectId,
        int maxFindings,
        bool includeQuote,
        string topicId,
        string subId,
        Likelihood minLikelihood = Likelihood.Possible,
        IEnumerable<FieldId> identifyingFields = null,
        IEnumerable<InfoType> infoTypes = null)
    {
        // Instantiate the dlp client.
        var dlp = DlpServiceClient.Create();

        // Construct Storage config.
        var storageConfig = new StorageConfig
        {
            BigQueryOptions = new BigQueryOptions
            {
                TableReference = new BigQueryTable
                {
                    ProjectId = "bigquery-public-data",
                    DatasetId = "usa_names",
                    TableId = "usa_1910_current",
                },
                IdentifyingFields =
                {
                    identifyingFields ?? new FieldId[] { new FieldId { Name = "name" } }
                },
                RowsLimit = 100,
                SampleMethod = BigQueryOptions.Types.SampleMethod.RandomStart
            }
        };

        // Construct the inspect config.
        var inspectConfig = new InspectConfig
        {
            InfoTypes = { infoTypes ?? new InfoType[] { new InfoType { Name = "PERSON_NAME" } } },
            Limits = new FindingLimits
            {
                MaxFindingsPerRequest = maxFindings,
            },
            IncludeQuote = includeQuote,
            MinLikelihood = minLikelihood
        };

        // Construct the pubsub action.
        var actions = new Action[]
        {
            new Action
            {
                PubSub = new Action.Types.PublishToPubSub
                {
                    Topic = $"projects/{projectId}/topics/{topicId}"
                }
            }
        };

        // Construct the inspect job config using the actions.
        var inspectJob = new InspectJobConfig
        {
            StorageConfig = storageConfig,
            InspectConfig = inspectConfig,
            Actions = { actions }
        };

        // Issue Create Dlp Job Request.
        var request = new CreateDlpJobRequest
        {
            InspectJob = inspectJob,
            ParentAsLocationName = new LocationName(projectId, "global"),
        };

        // We keep the name of the job that we just created.
        var dlpJob = dlp.CreateDlpJob(request);
        var jobName = dlpJob.Name;

        // Listen to pub/sub for the job.
        var subscriptionName = new SubscriptionName(projectId, subId);
        var subscriber = await SubscriberClient.CreateAsync(
            subscriptionName);

        // SimpleSubscriber runs your message handler function on multiple threads to maximize throughput.
        await subscriber.StartAsync((PubsubMessage message, CancellationToken cancel) =>
        {
            if (message.Attributes["DlpJobName"] == jobName)
            {
                subscriber.StopAsync(cancel);
                return Task.FromResult(SubscriberClient.Reply.Ack);
            }
            else
            {
                return Task.FromResult(SubscriberClient.Reply.Nack);
            }
        });

        // Get the latest state of the job from the service.
        var resultJob = dlp.GetDlpJob(new GetDlpJobRequest
        {
            DlpJobName = DlpJobName.Parse(jobName)
        });

        // Parse the response and process results.
        System.Console.WriteLine($"Job status: {resultJob.State}");
        System.Console.WriteLine($"Job Name: {resultJob.Name}");
        var result = resultJob.InspectDetails.Result;
        foreach (var infoType in result.InfoTypeStats)
        {
            System.Console.WriteLine($"Info Type: {infoType.InfoType.Name}");
            System.Console.WriteLine($"Count: {infoType.Count}");
        }
        return resultJob;
    }
}

REST

Input JSON:

POST https://dlp.googleapis.com/v2/projects/[PROJECT-ID]/dlpJobs?key={YOUR_API_KEY}

{
  "inspectJob":{
    "storageConfig":{
      "bigQueryOptions":{
        "tableReference":{
          "projectId":"bigquery-public-data",
          "datasetId":"usa_names",
          "tableId":"usa_1910_current"
        },
        "rowsLimit":"1000",
        "sampleMethod":"RANDOM_START",
        "includedFields":[
          {
            "name":"name"
          }
        ]
      }
    },
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"FIRST_NAME"
        }
      ],
      "includeQuote":true
    },
    "actions":[
      {
        "saveFindings":{
          "outputConfig":{
            "table":{
              "projectId":"[PROJECT-ID]",
              "datasetId":"testingdlp",
              "tableId":"bqsample3"
            },
            "outputSchema":"BASIC_COLUMNS"
          }
        }
      }
    ]
  }
}

After you send the JSON input in a POST request to the specified endpoint, a Sensitive Data Protection job is created, and the API sends the following response.

Output JSON:

{   "name": "projects/[PROJECT-ID]/dlpJobs/[JOB-ID]",   "type": "INSPECT_JOB",   "state": "PENDING",   "inspectDetails": {     "requestedOptions": {       "snapshotInspectTemplate": {},       "jobConfig": {         "storageConfig": {           "bigQueryOptions": {             "tableReference": {               "projectId": "bigquery-public-data",               "datasetId": "usa_names",               "tableId": "usa_1910_current"             },             "rowsLimit": "1000",             "sampleMethod": "RANDOM_START",             "includedFields": [               {                 "name": "name"               }             ]           }         },         "inspectConfig": {           "infoTypes": [             {               "name": "FIRST_NAME"             }           ],           "limits": {},           "includeQuote": true         },         "actions": [           {             "saveFindings": {               "outputConfig": {                 "table": {                   "projectId": "[PROJECT-ID]",                   "datasetId": "[DATASET-ID]",                   "tableId": "bqsample"                 },                 "outputSchema": "BASIC_COLUMNS"               }             }           }         ]       }     },     "result": {}   },   "createTime": "2022-11-04T18:53:48.350Z" } 

After the inspection job finishes running and its results have been processed by BigQuery, the scan results are available in the BigQuery output table you specified. For more information about retrieving inspection results, see the next section.

Retrieve inspection results

You can retrieve a summary of a DlpJob by using the projects.dlpJobs.get method. The returned DlpJob includes its InspectDataSourceDetails object, which contains both a summary of the job's configuration (RequestedOptions) and a summary of the job's outcome (Result). The result summary includes the following fields; the short sketch after this list shows how to read them:

  • processedBytes: The total size, in bytes, that has been processed.
  • totalEstimatedBytes: An estimate of the number of bytes remaining to process.
  • InfoTypeStatistics object: Statistics on how many instances of each infoType were found during the inspection job.
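As a minimal sketch with the Python client, fetching a job and reading these summary fields back might look like the following; the job name is a placeholder:

import google.cloud.dlp

# Fetch a job summary (projects.dlpJobs.get) with the Python client.
# The job name below is a placeholder, not a real job.
dlp = google.cloud.dlp_v2.DlpServiceClient()
job = dlp.get_dlp_job(
    request={"name": "projects/your-project-id/dlpJobs/i-1234567890"}
)

result = job.inspect_details.result
print(f"State: {job.state.name}")
print(f"Processed bytes: {result.processed_bytes}")
print(f"Total estimated bytes: {result.total_estimated_bytes}")
for stats in result.info_type_stats:
    print(f"{stats.info_type.name}: {stats.count} finding(s)")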

For the complete inspection job results, you have several options. Depending on the Action you chose, inspection job results are:

  • Saved to BigQuery (the SaveFindings object) in the table you specified. Before viewing or analyzing the results, first ensure that the job has completed by using the projects.dlpJobs.get method, described below. Note that you can specify a schema for storing findings by using the OutputSchema object.
  • Published to a Pub/Sub topic (the PublishToPubSub object). The topic must have granted publish access rights to the Sensitive Data Protection service account that sends the DlpJob notifications.
  • Published to Security Command Center.
  • Published to Data Catalog.
  • Published to Cloud Monitoring.

To analyze large amounts of data generated by Sensitive Data Protection, you can use built-in BigQuery tools to run rich SQL analytics, or tools such as Looker Studio to generate reports. For more information, see Analyzing and reporting on Sensitive Data Protection findings. For some sample queries, see Querying findings in BigQuery.
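For example, assuming findings were saved with the saveFindings action, a summary query run through the BigQuery client library might look like this sketch. The table path is a placeholder, and the info_type.name column assumes the standard findings output schema; adjust the query to match your own output table:

from google.cloud import bigquery

# Hypothetical example: count findings per infoType in a saveFindings table.
client = bigquery.Client()
query = """
    SELECT info_type.name AS info_type, COUNT(*) AS findings
    FROM `your-project-id.your_dataset.your_findings_table`
    GROUP BY info_type.name
    ORDER BY findings DESC
"""
for row in client.query(query).result():
    print(f"{row.info_type}: {row.findings}")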

Sending a storage repository inspection request to Sensitive Data Protection creates and runs a DlpJob object instance in response. These jobs can take seconds, minutes, or hours to run, depending on the size of your data and the configuration you specify. If you choose to publish to a Pub/Sub topic (by specifying PublishToPubSub in your Action), notifications are automatically sent to the topic with the specified name when the job's status changes. The Pub/Sub topic name is specified in the form projects/[PROJECT-ID]/topics/[PUBSUB-TOPIC-NAME].

You have full control over the jobs you create, including the following management methods (a short sketch follows this list):

  • projects.dlpJobs.cancel method: Stops a job that is currently in progress. The server makes a best effort to cancel the job, but success is not guaranteed. The job and its configuration persist until you delete them (with the delete method described next).
  • projects.dlpJobs.delete method: Deletes a job and its configuration.
  • projects.dlpJobs.get method: Retrieves a single job and returns its status, its configuration, and, if the job is done, summary results.
  • projects.dlpJobs.list method: Retrieves a list of all jobs, and includes the ability to filter results.
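Here is a minimal sketch of these management methods with the Python client; the resource names are placeholders:

import google.cloud.dlp

# Hypothetical example: list finished inspect jobs, then cancel and delete one.
dlp = google.cloud.dlp_v2.DlpServiceClient()
parent = "projects/your-project-id/locations/global"

# projects.dlpJobs.list: enumerate jobs, optionally filtered by state.
for job in dlp.list_dlp_jobs(
    request={"parent": parent, "filter": "state=DONE", "type_": "INSPECT_JOB"}
):
    print(job.name, job.state.name)

job_name = "projects/your-project-id/dlpJobs/i-1234567890"

# projects.dlpJobs.cancel: best-effort cancellation of a running job.
dlp.cancel_dlp_job(request={"name": job_name})

# projects.dlpJobs.delete: remove the job and its configuration.
dlp.delete_dlp_job(request={"name": job_name})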

What's next