Scanning Overview
Object Flow
There are many ways for objects to be placed into buckets: direct upload, the CLI and, most commonly, applications that provide interfaces and workflows for employees, customers and partners. However the objects arrive, Cloud Storage Security sees three main mechanisms for interacting with them: event driven, API driven and retro driven (looking back at objects already stored). Currently, Antivirus for Amazon S3 delivers both event driven scanning and retro scanning.
Event Driven Scanning
Event driven scanning uses an event configured on the bucket, in this case the All object create events, so that an event is raised any time an object is created or modified within the bucket. Antivirus for Amazon S3 places an event destination (handler) on each protected bucket, which listens for these events and triggers scanning. This allows Antivirus for Amazon S3 to plug into any existing workflow without modifications.
The flow looks as follows:
- An object is added to a protected bucket
- An event is raised and sent to an SNS Topic
- Antivirus for Amazon S3 provides an SQS queue that subscribes to the topic
- One or more Antivirus for Amazon S3 Agents monitor the queue
- Entries identifying the objects to scan are pulled from the queue; each object is retrieved and scanned
- Objects are handled according to the Scan Settings you have configured
a. All objects are tagged
b. Infected files are moved to a quarantine bucket (default behavior)
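To make the queue step concrete, here is a minimal Python sketch of what a consumer of this queue has to do with each message: unwrap the SNS envelope, read the S3 event records and decode each object key. It assumes the standard S3 event notification format delivered through SNS to SQS (without raw message delivery, though it tolerates both shapes). The function name `objects_from_sqs_body` and the bucket and key in the example are hypothetical; this is not the agent's actual implementation.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def objects_from_sqs_body(body: str) -> List[Tuple[str, str]]:
    """Return the (bucket, key) pairs to scan from one SQS message body."""
    envelope = json.loads(body)
    # Via SNS, the S3 event is a JSON string in the envelope's "Message" field.
    # With raw message delivery the body is already the S3 event itself.
    event = json.loads(envelope["Message"]) if "Message" in envelope else envelope
    targets = []
    # S3 test events (sent when a notification is configured) have no "Records".
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        targets.append((bucket, key))
    return targets


if __name__ == "__main__":
    s3_event = {"Records": [{"eventName": "ObjectCreated:Put",
                             "s3": {"bucket": {"name": "my-protected-bucket"},
                                    "object": {"key": "invoices/march+2024.pdf"}}}]}
    sns_envelope = {"Type": "Notification", "Message": json.dumps(s3_event)}
    print(objects_from_sqs_body(json.dumps(sns_envelope)))
    # [('my-protected-bucket', 'invoices/march 2024.pdf')]
```

Handling both envelope shapes keeps the sketch working whether or not raw message delivery is enabled on the subscription.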
Note
This flow and behavior are the same in every region. Amazon S3 buckets share a global namespace, but each bucket resides in a single region. The flow is performed locally in each region where you enable buckets for scanning.
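The last step of the event driven flow (tag every object, move infected files to a quarantine bucket by default) can be pictured as a small decision function. This is a hypothetical Python sketch: the `ScanSettings` fields, the verdict strings, the tag name and the quarantine bucket name are illustrative assumptions, not the product's actual settings or tag schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScanSettings:
    # Hypothetical stand-in for the product's Scan Settings.
    quarantine_infected: bool = True               # default: move infected files
    quarantine_bucket: str = "example-quarantine"  # hypothetical bucket name


def plan_actions(bucket: str, key: str, verdict: str,
                 settings: ScanSettings) -> List[Tuple[str, str]]:
    """Return the post-scan actions for one object as (action, detail) pairs."""
    # a. Every scanned object is tagged with its result (tag name is illustrative).
    actions = [("tag", f"scan-result={verdict}")]
    # b. Infected objects are moved to the quarantine bucket when enabled.
    if verdict == "infected" and settings.quarantine_infected:
        actions.append(("move", f"s3://{bucket}/{key} -> "
                                f"s3://{settings.quarantine_bucket}/{key}"))
    return actions


if __name__ == "__main__":
    print(plan_actions("my-protected-bucket", "a.exe", "infected", ScanSettings()))
    # [('tag', 'scan-result=infected'), ('move', 's3://my-protected-bucket/a.exe -> s3://example-quarantine/a.exe')]
```

Returning a plan instead of calling S3 directly keeps the policy easy to test separately from the storage calls.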
Retro Scanning
Retro Scanning is the scanning of existing objects. Whether you want to scan all your existing data the first time you set up the product, or you have compliance requirements that dictate a regular cadence of data scanning, Retro Scanning allows you to scan and re-scan your existing S3 objects. Unlike event driven scanning, which looks at all "new" objects coming into buckets and is triggered by events, retro scanning leverages S3 crawling to determine the work set. If it is the first time you are enabling a bucket for event scanning, you'll be prompted to scan existing data, with the time window defaulting to the beginning of time (March 14, 2006, the launch of Amazon S3) through the current time. The scan doesn't continue past the time of activation because event scanning takes care of everything going forward. The default time window can be changed. If you trigger retro scanning without a bucket activation, you will be prompted to pick a time slice for scanning. This can still be all time or a window of your choosing.
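The time window logic above can be sketched in a few lines of Python. This is a minimal illustration, assuming each crawled object is a dict with `Key` and `LastModified` fields (the shape of `Contents` entries returned by S3's ListObjectsV2); the function names are hypothetical. It shows why every object is visited while only those inside the window join the work set.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# Amazon S3 launched on March 14, 2006, so no object can be older than this.
S3_LAUNCH = datetime(2006, 3, 14, tzinfo=timezone.utc)


def default_window(activated_at: datetime) -> Tuple[datetime, datetime]:
    """Default retro window: from S3's launch up to the bucket activation time."""
    return (S3_LAUNCH, activated_at)


def crawl(objects: Iterable[Dict], window: Tuple[datetime, datetime]) -> Tuple[int, List[str]]:
    """Visit every object (S3 can't pre-filter by date) and keep only the
    keys whose LastModified falls inside the window (inclusive)."""
    start, end = window
    crawled = 0
    work_set = []
    for obj in objects:
        crawled += 1
        if start <= obj["LastModified"] <= end:
            work_set.append(obj["Key"])
    return crawled, work_set


if __name__ == "__main__":
    activated = datetime(2024, 1, 1, tzinfo=timezone.utc)
    objs = [{"Key": f"obj-{i}",
             "LastModified": datetime(2023, 1 + i % 12, 1, tzinfo=timezone.utc)}
            for i in range(24)]
    crawled, work = crawl(objs, (datetime(2023, 7, 1, tzinfo=timezone.utc), activated))
    print(crawled, len(work))  # 24 12
```

With `default_window(activated)` every object in this example would be scanned; narrowing the window shrinks the work set but not the crawl.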
It looks as follows:
- You have existing objects within your AWS account(s)
- Via the console, you trigger a scan for existing objects
This can be all objects within the bucket(s) or a subset based on a time window
- The Antivirus for Amazon S3 crawling process crawls all objects within the bucket(s) and adds entries to a Retro SQS queue
Amazon S3 doesn't provide a simple way to search or segment objects by date, so all objects must be crawled. Only the objects that fall within the time window are added to the queue for processing. If a bucket holds 1 million objects and only 50,000 of them fall within the time window, all 1 million will be crawled, but only those 50,000 will be scanned.
- One or more Antivirus for Amazon S3 Agents monitor the queue
The retro agents are the same agent as the event agents, but run in a separate cluster service. Retro agents do not run all the time like the event agents do. Rather, retro agents spin up when entries are added to the queue and spin down completely once the queue is cleared.
- Entries identifying the objects to scan are pulled from the queue; each object is retrieved and scanned
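The spin-up and spin-down behavior of the retro agents can be expressed as a scale-to-zero rule driven by queue depth. The Python sketch below is hypothetical: the per-agent throughput and the agent cap are made-up parameters, and the queue depth would in practice come from something like the SQS `ApproximateNumberOfMessages` attribute. It is not the product's actual scaling policy.

```python
import math


def desired_retro_agents(queue_depth: int,
                         messages_per_agent: int = 100,  # hypothetical throughput
                         max_agents: int = 10) -> int:   # hypothetical cap
    """Zero agents while the Retro queue is empty; otherwise enough agents
    to work through the backlog, capped at max_agents."""
    if queue_depth <= 0:
        return 0  # queue cleared: spin down completely
    return min(max_agents, math.ceil(queue_depth / messages_per_agent))


if __name__ == "__main__":
    for depth in (0, 1, 250, 1_000_000):
        print(depth, desired_retro_agents(depth))
```

Scaling to zero is what distinguishes the retro service from the always-on event agents: no compute runs between retro scans.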
Scheduled Scanning
Scheduled Scanning allows you to process new or existing files on a schedule. Instead of processing new files as they arrive, your workflow may only require them to be scanned once per day. For compliance reasons, you may be required to scan all of your files on a quarterly basis; Scheduled Scanning allows you to do that. Learn more about schedule based scanning.
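The two schedules mentioned above, a daily scan of new files and a quarterly scan of everything, boil down to simple date arithmetic. These are hypothetical Python helpers for illustration only; they do not reflect how the product's schedules are configured.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Tuple


def daily_window(run_at: datetime) -> Tuple[datetime, datetime]:
    """Objects created in the 24 hours before this run are the daily work set."""
    return (run_at - timedelta(days=1), run_at)


def next_quarter_start(today: date) -> date:
    """First day of the next calendar quarter (Jan 1, Apr 1, Jul 1 or Oct 1)."""
    quarter_index = (today.month - 1) // 3  # 0..3
    next_month = quarter_index * 3 + 4      # 4, 7, 10 or 13
    if next_month > 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, next_month, 1)


if __name__ == "__main__":
    print(next_quarter_start(date(2024, 2, 10)))  # 2024-04-01
    print(daily_window(datetime(2024, 5, 2, tzinfo=timezone.utc)))
```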
API Driven Scanning
API driven scanning provides true real-time scanning. You have a workflow that demands a verdict at the time of upload. This is often necessary in applications where users wait to be told that the upload succeeded and the file was accepted. APIs allow the application to hand the file directly to the scanning agent.
Coming soon!
If you are interested in this approach, please Contact Us.
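Because API driven scanning is not yet available, the following is purely a hypothetical Python illustration of the synchronous pattern it describes: the application hands the file to a scanner and waits for the verdict before telling the user the upload was accepted. `toy_scan`, `accept_upload` and `UploadRejected` are invented names; `toy_scan` only looks for the marker text of the EICAR anti-malware test file and stands in for a real scanning agent.

```python
from typing import Callable


class UploadRejected(Exception):
    """Raised when the scanner does not return a clean verdict."""


def toy_scan(data: bytes) -> str:
    # Toy stand-in for a real scanning agent: flags only the EICAR marker text.
    return "infected" if b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE" in data else "clean"


def accept_upload(data: bytes, scan: Callable[[bytes], str] = toy_scan) -> str:
    # Synchronous handoff: the caller blocks here until a verdict exists,
    # so "accepted" is only reported after the file has been scanned.
    verdict = scan(data)
    if verdict != "clean":
        raise UploadRejected(f"upload rejected (verdict: {verdict})")
    return "accepted"


if __name__ == "__main__":
    print(accept_upload(b"quarterly report"))  # accepted
```

The trade-off versus event driven scanning is latency: the user waits for the scan, but no unscanned file is ever reported as accepted.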