
Scanning Overview

Object Flow

Objects can be placed into buckets in a number of ways: direct upload, the CLI and, most commonly, applications that provide interfaces and workflows for employees, customers and partners. However the objects arrive, Cloud Storage Security sees three main mechanisms for interacting with them: event driven, API driven and retro driven (looking back at objects that already exist). Currently, Antivirus for Amazon S3 delivers both event driven scanning and retro scanning.

Event Driven Scanning

Event driven scanning relies on a bucket event, in this case the All object create event, so that any time an object is created or modified within the bucket an event is raised. Antivirus for Amazon S3 places an event destination (handler) on each protected bucket that listens for these events and triggers scanning. This allows Antivirus for Amazon S3 to plug into any existing workflow without modification.
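For illustration, the All object create event corresponds to the S3 event type `s3:ObjectCreated:*`. The minimal sketch below builds a bucket notification configuration that sends those events to an SNS topic, in the shape boto3's `put_bucket_notification_configuration` accepts as its `NotificationConfiguration` argument. The topic ARN and the `Id` value are hypothetical, and this is not necessarily how Antivirus for Amazon S3 installs its own handler.

```python
import json


def build_notification_config(topic_arn: str) -> dict:
    """Notification configuration that publishes every object-create event
    in a bucket to one SNS topic."""
    return {
        "TopicConfigurations": [
            {
                "Id": "av-scan-object-created",  # hypothetical identifier
                "TopicArn": topic_arn,
                "Events": ["s3:ObjectCreated:*"],  # the "All object create" event
            }
        ]
    }


# Hypothetical topic ARN, for illustration only.
config = build_notification_config("arn:aws:sns:us-east-1:123456789012:example-av-topic")
print(json.dumps(config, indent=2))
```

Note that `put_bucket_notification_configuration` replaces a bucket's entire notification configuration, so any real installer has to take a bucket's existing destinations into account rather than overwrite them.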

The event driven flow looks as follows:

  1. An object is added to a protected bucket
  2. An event is raised and sent to an SNS Topic
  3. Antivirus for Amazon S3 provides an SQS queue that subscribes to the topic
  4. One or more Antivirus for Amazon S3 Agents are monitoring the queue
  5. Entries identifying the objects to scan are pulled from the queue. Each object is retrieved and scanned
  6. Objects are handled according to the Scan Settings you have configured
    a. All objects are tagged
    b. Infected files are moved to a quarantine bucket (default behavior)
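To make steps 2 through 6 concrete, the sketch below shows how an agent might turn one SQS message into objects to scan. It assumes the standard S3 event notification format wrapped in an SNS envelope (SNS raw message delivery turned off), and it uses only the Python standard library. `plan_actions` and the `scan-result` tag name are hypothetical and only mirror the default behavior described in step 6; they are not the product's actual settings or tags.

```python
from __future__ import annotations

import json
from urllib.parse import unquote_plus


def objects_from_sqs_body(body: str) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from one SQS message body that carries an
    SNS notification wrapping an S3 event notification."""
    envelope = json.loads(body)              # SNS envelope (assumes raw message delivery is off)
    event = json.loads(envelope["Message"])  # the S3 event arrives as a JSON string
    pairs = []
    for record in event.get("Records", []):  # e.g. s3:TestEvent messages carry no Records
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def plan_actions(verdict: str, quarantine_infected: bool = True) -> dict:
    """Hypothetical post-scan plan mirroring step 6: every object is tagged,
    and infected objects are moved to quarantine when that (default) setting is on."""
    return {
        "tags": {"scan-result": verdict},  # hypothetical tag name
        "move_to_quarantine": verdict == "Infected" and quarantine_infected,
    }


# Usage with a hand-built sample message (bucket and key are made up).
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "reports/q1+summary.pdf"},
            },
        }
    ]
}
body = json.dumps({"Type": "Notification", "Message": json.dumps(sample_event)})
print(objects_from_sqs_body(body))  # [('example-bucket', 'reports/q1 summary.pdf')]
print(plan_actions("Infected"))
```

Decoding the key before fetching the object matters: a key containing spaces or other special characters would otherwise not be found.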


This flow and behavior are the same in every region. Amazon S3 presents buckets in a global view, but each bucket resides in a specific region, so this flow runs locally in each region where you enable buckets for scanning.

Retro Scanning

Retro Scanning is the scanning of existing objects. Whether you want to scan all of your existing data the first time you set up the product, or you have compliance requirements that dictate a regular cadence of data scanning, Retro Scanning allows you to scan and re-scan your existing S3 objects. Unlike event driven scanning, which looks at all "new" objects coming into buckets and is triggered by events, retro scanning crawls S3 to determine the work set. The first time you enable a bucket for event scanning, you'll be prompted to scan existing data; the time window defaults to the beginning of time (March 14, 2006, when Amazon S3 launched) through the current time. The window stops at the time of activation because event scanning handles everything from that point on. You can change the default time window. If you trigger retro scanning without a bucket activation, you will be prompted to pick a time slice for scanning, which can still be all time or a window of your choosing.
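The default window described above can be sketched as follows. The function names are hypothetical, and treating both ends of the window as inclusive is an assumption chosen so that no object on a boundary is skipped.

```python
from __future__ import annotations

from datetime import datetime, timezone

# The "beginning of time" used as the default retro window start.
S3_LAUNCH = datetime(2006, 3, 14, tzinfo=timezone.utc)


def default_retro_window(activated_at: datetime) -> tuple[datetime, datetime]:
    """Default window offered when a bucket is first enabled for event scanning:
    from S3's launch up to the moment of activation. Event scanning covers the rest."""
    return S3_LAUNCH, activated_at


def in_window(last_modified: datetime, window: tuple[datetime, datetime]) -> bool:
    """Inclusive on both ends, so an object at the boundary is never skipped
    (at worst it is scanned by both retro and event scanning)."""
    start, end = window
    return start <= last_modified <= end


activated = datetime(2021, 2, 4, 12, 0, tzinfo=timezone.utc)  # example activation time
window = default_retro_window(activated)
print(in_window(datetime(2015, 6, 1, tzinfo=timezone.utc), window))  # True
print(in_window(datetime(2021, 2, 5, tzinfo=timezone.utc), window))  # False
```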

The retro scanning flow looks as follows:

  1. You have existing objects within your AWS account(s)
  2. Via the console, you trigger a scan for existing objects

    This can be all objects within the bucket(s) or a subset based on a time window

  3. The Antivirus for Amazon S3 crawling process will crawl all objects within the bucket(s) and add entries to a Retro SQS Queue

    Amazon S3 doesn't provide a simple way to search or segment the objects to process, so all objects must be crawled. Only the objects that fall within the time window are added to the queue for processing. For example, if a bucket holds 1 million objects and only 50,000 of them fall within the time window, all 1 million are crawled but only the 50,000 are scanned.

  4. One or more Antivirus for Amazon S3 Agents are monitoring the queue

    Retro agents are the same agent as the event agents but run in a separate cluster service. Unlike event agents, retro agents do not run all the time: they spin up when entries are added to the queue and spin down completely once the queue is cleared.

  5. Entries identifying the objects to scan are pulled from the queue. Each object is retrieved and scanned
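The crawl-then-filter behavior in step 3 can be sketched in a few lines. S3 listings can be narrowed only by key, not by date, so every object is visited and the time window is applied client-side. The pages below are simulated dictionaries shaped like `list_objects_v2` responses (a `Contents` list of objects with `Key` and `LastModified`); with boto3 they could come from `s3.get_paginator("list_objects_v2").paginate(Bucket=...)`. The function names are hypothetical, and a real crawler would send each matching key to the Retro SQS Queue instead of collecting it in a list.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator


def crawl(pages: Iterable[dict]) -> Iterator[dict]:
    """Visit every object in a bucket listing, page by page."""
    for page in pages:
        yield from page.get("Contents", [])


def retro_work_set(pages: Iterable[dict], start: datetime, end: datetime) -> tuple[int, list[str]]:
    """Return how many objects were crawled and the keys that fall inside
    the time window (the entries that would go onto the retro queue)."""
    crawled = 0
    queued: list[str] = []
    for obj in crawl(pages):
        crawled += 1
        if start <= obj["LastModified"] <= end:
            queued.append(obj["Key"])
    return crawled, queued


# Simulated listing: 1,000 objects, one per day, in pages of 100 keys.
base = datetime(2020, 1, 1, tzinfo=timezone.utc)
objects = [{"Key": f"obj-{i:04d}", "LastModified": base + timedelta(days=i)} for i in range(1000)]
pages = [{"Contents": objects[i:i + 100]} for i in range(0, 1000, 100)]

crawled, queued = retro_work_set(pages, base + timedelta(days=950), base + timedelta(days=999))
print(crawled, len(queued))  # 1000 50
```

This is the 1 million / 50,000 example above at a smaller scale: the crawl cost tracks the size of the bucket, while the scan cost tracks only the objects inside the window.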

Scheduled Scanning

Scheduled Scanning lets you scan new or existing files on a schedule. Instead of scanning new files as they arrive, your workflow may only need them scanned once per day. Or, for compliance reasons, you may be required to scan all of your files quarterly; Scheduled Scanning supports that as well. Learn more about schedule based scanning.
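As a rough illustration of the two schedules mentioned above, the sketch below computes the window a once-per-day scan of new files would cover, and the start date of the next calendar quarter for a quarterly full scan. Both functions are hypothetical helpers; they assume calendar quarters and a 24-hour lookback, and they do not reflect how the product's scheduler is actually configured.

```python
from datetime import date, datetime, timedelta, timezone


def daily_window(run_at: datetime) -> tuple:
    """Window for a once-per-day scan of new files: the 24 hours before this run."""
    return run_at - timedelta(days=1), run_at


def next_quarter_start(today: date) -> date:
    """First day of the next calendar quarter (Jan 1, Apr 1, Jul 1 or Oct 1)."""
    quarter_start_month = ((today.month - 1) // 3) * 3 + 1
    next_month = quarter_start_month + 3
    if next_month > 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, next_month, 1)


print(next_quarter_start(date(2021, 2, 4)))  # 2021-04-01
print(daily_window(datetime(2021, 2, 4, 2, 0, tzinfo=timezone.utc)))
```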

API Driven Scanning

API driven scanning provides a true real-time scan for workflows that demand a verdict at the time of upload. This is often necessary in applications where users wait to be told that the upload succeeded and the file was accepted. APIs allow the application to hand the file directly to the scanning agent.

Coming soon!
If you are interested in this approach, please Contact Us.

Last update: February 4, 2021