Cloud Storage Security Help Docs
Retro Scanning for Pre-Existing Files


Last updated 1 year ago

Retro Scanning is the scanning of existing objects. Whether you want to scan all of your existing data when you first set up the product, or you have compliance requirements that dictate a regular cadence of data scanning, Retro Scanning allows you to scan and re-scan your existing S3 objects. Unlike event driven scanning, which looks at "new" objects as they arrive in buckets and is triggered by events, retro scanning leverages S3 crawling to determine the work set.

The first time you enable a bucket for event scanning, you'll be prompted to scan existing data. The scan defaults to a time window from the beginning of time for S3 (March 14, 2006) through the current time; it won't continue past the time of activation, as event scanning will take care of everything going forward. The default time window can be changed. If you trigger retro scanning without a bucket activation, you will be prompted to pick a time slice for scanning. This can still be all time, or a window of your choosing.
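The default window described above can be sketched as a simple timestamp filter (a hypothetical illustration only; the product's actual crawler logic is internal):

```python
from datetime import datetime, timezone

# Retro scanning's default window opens at the S3 launch date.
S3_BEGINNING_OF_TIME = datetime(2006, 3, 14, tzinfo=timezone.utc)

def in_retro_window(last_modified, start=S3_BEGINNING_OF_TIME, end=None):
    """Return True if an object's LastModified timestamp falls inside
    the retro scan window. The window never extends past 'now', since
    event driven scanning covers everything from activation forward."""
    end = end or datetime.now(timezone.utc)
    return start <= last_modified <= end
```

Narrowing `start` or `end` corresponds to picking a custom time slice in the console.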

The process looks as follows:

  1. You have existing objects within your AWS account(s)

  2. Via the console, you trigger a scan for existing objects (on-demand or through a schedule). This can be all objects within the bucket(s) or a subset based on a time window.

  3. Antivirus for Amazon S3 spins up one or more Fargate Run Tasks for each on-demand request or schedule. These tasks trigger the crawling process, which crawls all objects within the bucket(s) and adds entries to a temporary Retro SQS Queue created uniquely for the job.

    Amazon S3 doesn't allow for simple searching and segmenting of the objects to process, so all objects must be crawled. Only the objects that match the time window will be added to the queue for processing. If you have 1 million objects in a bucket and the time window dictates that only 50,000 match, all 1 million will be crawled, but only the 50,000 will be scanned.

    Antivirus for Amazon S3 will spin up a number of Run Task scanning agents, aiming to complete the work in ~1 hour. Depending on the volume of the job (number of buckets, number of objects, size of buckets), this could be a "lot" of agents. Ultimately, this has no effect on cost: running 1 agent for 100 hours to complete the job costs the same as running 100 agents for 1 hour.

    You may hit service quota limits when running large jobs or multiple jobs at the same time. The default quota for these Fargate Run Tasks is 1,000 in most accounts. If a job needs more than 1,000 tasks, the process will not stop or break; it will simply take longer than it would if more tasks could be spun up. If you know your workloads will typically require more tasks than that, you can request a quota increase from AWS.

  4. Once crawling has completed for a job, a new set of Fargate Run Tasks is spun up to perform the scanning. Antivirus for Amazon S3 will automatically spin up the number of scanning agents needed to complete the scan.

    The retro agents are the same agents as the event agents, but run as Fargate Run Tasks. These agents terminate themselves when the job has completed.

  5. Entries are pulled from the queue, identifying each object to scan. The object is then retrieved and scanned.
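The sizing and cost reasoning in step 3 above can be sketched numerically (the throughput figure below is a made-up assumption for illustration; real agent throughput depends on object sizes and scan settings):

```python
import math

# Hypothetical throughput; actual values vary with object size and settings.
OBJECTS_PER_AGENT_HOUR = 50_000
FARGATE_TASK_QUOTA = 1_000  # default Run Task quota in most accounts

def agents_for_job(total_objects, target_hours=1.0):
    """Choose enough scanning agents to finish in ~target_hours,
    capped at the Fargate Run Task service quota."""
    needed = math.ceil(total_objects / (OBJECTS_PER_AGENT_HOUR * target_hours))
    return min(needed, FARGATE_TASK_QUOTA)

def agent_hours(total_objects):
    """Total compute (and therefore cost) depends on agent-hours,
    not on how many agents share the work."""
    return total_objects / OBJECTS_PER_AGENT_HOUR
```

Because cost tracks agent-hours, 1 agent for 100 hours and 100 agents for 1 hour cost the same; only the wall-clock time differs.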

On-Demand Scanning

On-Demand Scanning is a user-initiated scan from within the management console GUI. Triggered from the Bucket Protection page, a user can, on a one-off basis, select one or many buckets and a time window to scan objects. This scan will be treated and tracked as a "job" on the Jobs page under the Monitoring section of the console.

Scheduled Scanning

Scheduled Scanning allows you to process new files or existing files based on a schedule. Instead of processing new files as they come in, your workflow may allow them to be scanned once per day. For compliance reasons you may be required to scan all of your files on a quarterly basis, and Scheduled Scanning allows you to do that. Learn more about schedule based scanning. Scheduled scans will be treated and tracked as a "job" on the Jobs page under the Monitoring section of the console.
