Scan Settings

There are four configuration adjustments you can make to the Scanning/Classification Agents:

  1. Tag Name changes

  2. Infected File handling (for AV) and Matching File handling (for DC)

  3. Scan Engine choice

  4. Decisions around which objects to scan (Scan list / Skip list).

These Scan Behavior Settings modifications apply to all agents currently running and to new agents that spin up going forward, regardless of region. Changes made here do not require an agent reboot, but they can take up to 30 seconds to take effect. Examples include custom naming the tags we apply to each object and providing scan list and skip list functionality for the buckets.

These are global changes; it is not currently possible to create different behavior by region or group.

Object Tag Keys

Every file the S3 scanner touches has an AWS Tag applied to it. You can change the default key names if required or desired, but not the values.

Key | Description

scan-result

Identifies whether a file was found to be clean or to have issues. Possible values: Clean, Infected, Unscannable, Error, InfectedAllowed.

  • Clean = no issues found with the file

  • Infected = malware found

  • InfectedAllowed = a file you have moved from a quarantine bucket back into its originating bucket; could be a "false positive"

  • Unscannable = the object is password protected or greater than the maximum size allowed (20MB for CrowdStrike, 2GB for ClamAV, 5TB for Sophos)

  • Error = issues accessing the object: KMS permissions, cross-account permissions, a blocking bucket policy, or other causes

date-scanned

The date and time the object was scanned

message

A description of what has been identified in the file. Only populated for Error or Unscannable results.

virus-name

Name of the virus identified

uploaded-by

AWS user identifier

AWS allows an object to have only 10 tags applied to it. At most we will add 5 tags (for infected files) and only 2 tags for clean files. If your object already has a number of tags, we will trim the number of tags we add to ensure none of the existing tags are dropped. If only 1 tag slot is available, for example, we will write only the scan-result tag onto the object.
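If downstream code needs to act on these tags, it can read them back with the standard S3 API. The snippet below is an illustrative sketch only; the bucket and key names are placeholders, and it assumes the default scan-result tag key.

import boto3

s3 = boto3.client('s3')

def get_scan_result(bucket, key):
    """Return the value of the scan-result tag, or None if the object is not tagged yet."""
    tags = s3.get_object_tagging(Bucket=bucket, Key=key)['TagSet']
    return next((t['Value'] for t in tags if t['Key'] == 'scan-result'), None)

# Example: only hand a file to downstream processing once it is tagged Clean
if get_scan_result('my-protected-bucket', 'uploads/report.pdf') == 'Clean':
    print('Safe to process')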

Action for Infected Files

There are 3 main actions you can take with an infected file: Move (default), Delete, and Keep.

Move directs the scanner to take the file and place it (copy, then delete the original) in a quarantine bucket. The console creates a quarantine bucket in each region where you enable buckets for scanning. The bucket is named uniquely, with the ApplicationID and the region it was created in appended to the name. This is the default behavior.

Delete directs the scanner to remove the file entirely.

Keep directs the scanner to leave the file in the bucket where it was found. The scanner will still tag the object appropriately.

Restore Quarantined Files

Using the default Move action places files identified as infected into a quarantine bucket. You may need to restore a file from the quarantine bucket back into its originating bucket. You can do this on a one-time basis or permanently. Check out the Allow Suspect Files documentation for more information.

Anecdotally

Most customers continue with the default value of Move, but we have seen a number go with Keep. They keep the objects in place and leverage a bucket policy that makes access to them available only when they are tagged with scan-result=Clean.

Sample bucket policy to do this. The Deny takes effect only when the requesting principal is not one of the listed console/agent roles and the object's scan-result tag is not one of the allowed values, so untagged or infected objects are blocked for everyone else.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "BlockAccessExceptClean",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::<staging bucket or any bucket>/*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:PrincipalArn": [
                            "arn:aws:iam::<account-number>:role/CloudStorageSecConsoleRole-<applicationID>",
                            "arn:aws:iam::<account-number>:role/CloudStorageSecAgentRole-<applicationID>",
                            "<account-number>"
                        ],
                        "s3:ExistingObjectTag/scan-result": [
                            "Clean",
                            "Unscannable",
                            "InfectedAllowed"
                        ]
                    }
                }
            }
        ]
    }

A number of customers also use a "2 Bucket System" for more of a physical separation of the object storage. They utilize a staging/dirty bucket where all files are first placed; this is where all scanning takes place. Once a file is found to be clean, it is copied/moved to the production/usable bucket(s). At that point, downstream users and applications become aware of the files and can use them. This system uses a combination of our standard event-based scanning (on the staging bucket) and a lambda triggered by our real-time notifications, as sketched below. For more information on how to set this up, check out the 2 Bucket System write-up here.
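For illustration, the promotion lambda could look something like the sketch below. This is a minimal example, not the write-up's exact code: the notification field names and the production bucket name are assumptions, and the function re-checks the scan-result tag before copying.

import json
import boto3

s3 = boto3.client('s3')
PRODUCTION_BUCKET = 'my-production-bucket'  # hypothetical destination bucket

def lambda_handler(event, context):
    # Assumed payload shape: an SNS notification whose message carries the
    # scanned object's bucket and key. Adjust the parsing to the real payload.
    for record in event.get('Records', []):
        message = json.loads(record['Sns']['Message'])
        bucket, key = message['bucketName'], message['objectKey']

        # Re-check the tag so only objects marked Clean are promoted
        tags = s3.get_object_tagging(Bucket=bucket, Key=key)['TagSet']
        result = next((t['Value'] for t in tags if t['Key'] == 'scan-result'), None)
        if result != 'Clean':
            continue

        s3.copy_object(
            Bucket=PRODUCTION_BUCKET,
            Key=key,
            CopySource={'Bucket': bucket, 'Key': key}
        )
        s3.delete_object(Bucket=bucket, Key=key)  # remove the original from the staging bucket
    return {'statusCode': 200}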

Quarantine Buckets

If you leverage the Move action above, the quarantine bucket and its settings come into play. By default, we place a quarantine bucket in each account (installed and linked accounts) that has buckets protected by event-based scanning or on-demand/scheduled scanning. This is the default behavior because we don't want to pull files across accounts in order to store them. The quarantine buckets are also created per region, so we aren't pulling files across regions to store them either. Theoretically, you could have 20+ buckets added to your S3 if you are protecting data in that many regions.

A number of customers have determined they do not want to quarantine in the linked accounts. They would prefer to have all infected files quarantined to their centralized security account (the account in which the Antivirus for Amazon S3 solution is installed). There is an option here in the settings to Move to Main Account. Simply flip the toggle and we'll change how the application quarantines files, storing all infected files in a centralized quarantine bucket.

A Lifecycle Policy is added to the quarantine bucket. The default policy keeps files indefinitely, but you can change it so quarantined files are deleted after a certain number of days. Pick a value that gives you time to work through whatever workflows you have in place for examining quarantined files.
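For reference, the expiration behavior amounts to a standard S3 lifecycle rule. The snippet below is a sketch of setting one yourself, assuming a placeholder bucket name and a 30-day retention period; normally you would change this from the console setting instead.

import boto3

s3 = boto3.client('s3')
QUARANTINE_BUCKET = '<quarantine-bucket-name>'  # placeholder for your quarantine bucket

s3.put_bucket_lifecycle_configuration(
    Bucket=QUARANTINE_BUCKET,
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'ExpireQuarantinedFiles',
                'Filter': {'Prefix': ''},      # apply to every object in the bucket
                'Status': 'Enabled',
                'Expiration': {'Days': 30}     # delete quarantined files after 30 days
            }
        ]
    }
)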

Scanning Engine

Antivirus for Amazon S3 has been designed to work with multiple AV engines. Currently, there are three engines available: Sophos, CrowdStrike, and ClamAV. You can choose any engine and easily switch later, but there are some distinctions that could weigh the decision in one direction or the other.

  • The choice comes down to cost versus large-file support, with a bit of brand name mixed in as well.

  • In large environments, the additional scan cost of the Sophos engine can be partly offset by its performance gains and the reduced infrastructure you need to run.

  • CrowdStrike is an excellent ML-based engine; however, it comes with limitations on the maximum file size and the file types it can scan.

  • Look to the table below for the key differentiators.

Engine | Included in Base Scan Cost | Maximum File Size | Performance
Sophos | No - add $0.10 per GB | Up to 5 TB | Better
CrowdStrike | No - add $0.10 per GB | Up to 20 MB | Best for ML-based scan
ClamAV | Yes | Up to 2 GB | Good

Please note, if you choose to use both Sophos and CrowdStrike you will receive a discounted premium rate of $0.15 per GB.

Multi-engine Scanning

Multi-engine scanning provides two options to use multiple engines for scanning. The two choices for how to use multi-engine scanning are: All Files and By File Size.

  • All Files indicates every file that is event-based scanned or on-demand/schedule scanned will be processed by the enabled engines.

  • By File Size indicates that smaller files will be scanned only by ClamAV (under 2GB) or only by CrowdStrike (under 20MB), since those engines cannot scan above 2GB and 20MB respectively, while larger files will be scanned only by Sophos. This lets you take advantage of the scan cost savings offered by ClamAV, or the ML-based scanning offered by CrowdStrike, for part of your files, while still scanning the files that are too big for those engines. A small sketch of this routing follows the table below.

Multi-Engine Selection | Files (<20 MB) | Files (<2 GB) | Larger Files (>2 GB)
All Files | All three engines | Sophos or ClamAV | Sophos-only
By File Size | CrowdStrike or ClamAV | ClamAV-only | Sophos-only
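To make the By File Size routing concrete, here is a small illustrative sketch of the decision using the thresholds from the table above. It assumes Sophos is paired with either ClamAV or CrowdStrike; the function is not the product's actual logic.

CROWDSTRIKE_LIMIT = 20 * 1024 ** 2   # 20 MB
CLAMAV_LIMIT = 2 * 1024 ** 3         # 2 GB

def engine_for(size_bytes, crowdstrike_enabled=False):
    """Illustrative By File Size routing: small files go to ClamAV or CrowdStrike, big files to Sophos."""
    if crowdstrike_enabled and size_bytes < CROWDSTRIKE_LIMIT:
        return 'CrowdStrike'
    if not crowdstrike_enabled and size_bytes < CLAMAV_LIMIT:
        return 'ClamAV'
    return 'Sophos'  # files too large for the other engine fall through to Sophos

print(engine_for(5 * 1024 ** 2, crowdstrike_enabled=True))   # CrowdStrike
print(engine_for(500 * 1024 ** 2))                           # ClamAV
print(engine_for(10 * 1024 ** 3))                            # Sophos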

Bucket Protection Method

As of v7.00.000, we automatically use EventBridge to resolve any bucket conflicts.

If Protect with EventBridge is enabled globally, we will protect all selected buckets with EventBridge without acknowledgment.

If Protect with EventBridge is not enabled, we will protect buckets using the best choice: if a bucket can be protected with an S3 Event Notification we will do so, but if there is a conflict we will fail over to EventBridge.

You can learn more about how EventBridge works with protected buckets here.

Private Mirror - Local Signature Updates

In certain situations, you may prefer that the scanning agents retrieve signature updates locally rather than reaching outside of your account. Some customers have requirements where applications that touch data cannot go out to the public internet. Local updates allow you to better control, and potentially eliminate, outbound access for the VPCs hosting the scanning agents. This option lets you specify an Amazon S3 bucket in your account for the scanning agents to look to for signature updates. You can get the updates into this bucket however you see fit (a sample lambda is provided below). Each time a scanning agent boots it grabs the latest definitions; a running agent checks for new signature definitions every hour for ClamAV (every 15 minutes for Sophos).

If you do not use local updates, ClamAV is set up to reach out over the internet to download the updates directly. This happens at the agent, so any and all agents will need to reach the public internet (outbound only) to retrieve the updates. Sophos is updated differently: Cloud Storage Security hosts an S3 bucket in our account that we maintain and update with the Sophos update files. When running the Sophos engine, your agents retrieve updates directly from our bucket.

For both engines you can choose to have your agents look to your local bucket. We provide a sample, fully working lambda for each engine: the ClamAV lambda pulls the updates down from the internet, and the Sophos lambda copies the update files from our bucket to yours.

Note

If you can't wait the 1 hour (or 15 minutes for Sophos) after a new signature update comes out, simply reboot all your agents and they will pick up the new updates immediately.

This is for signature updates only. Engine updates are done as part of our build process and will be rolled out with the next release.

If you choose to use Private Mirror with multi-engine scanning, you will need to set up Private Mirror for both sets of updates. You can, and must, use the same bucket for both.

Sample Lambdas

We provide a lambda function for Sophos. Sophos requires authentication to download their updates, so we host them in a bucket where your account will be added to the permissions list. This happens automatically when you switch on Local Updates.

Sophos Lambda

Because these updates already take place inside of AWS, you may not consider local updates as necessary for Sophos as for ClamAV, whose updates are downloaded directly from the internet. But if you still want updates coming from your own bucket, read on.

Sophos updates are already delivered from a bucket: one we host and have granted your application account access to, from which the agents simply pull the updates. For local updates with the Sophos engine, all you really need to do is copy the update files from our bucket to yours. As we do for ClamAV updates, we have provided some sample code you can turn into a lambda to pull the updates over.

Lambda Code

import boto3
import botocore
from datetime import datetime, timezone

# Source bucket hosted by Cloud Storage Security and the Sophos update file names
CSS_SOPHOS_BUCKET = 'css-sophos-updates'
VDL_FILE_NAME = 'vdl.zip'
IDE_FILE_NAME = 'ide.zip'
# Your private mirror bucket
DESTINATION_SOPHOS_BUCKET = '<enter-your-bucket-name-here>'
# Marker object that records when each update file was last copied
LAST_UPDATED_FILE_NAME = 'def_files_last_updated.txt'

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    vdlLastModified = datetime(2006, 3, 14)
    ideLastModified = datetime(2006, 3, 14)
    # Read the previously recorded timestamps from the marker file, if it exists
    try:
        lastUpdatedObj = s3.get_object(Bucket=DESTINATION_SOPHOS_BUCKET, Key=LAST_UPDATED_FILE_NAME)
        lastUpdated = lastUpdatedObj['Body'].read().decode('utf-8').split('|')
        vdlLastModified = datetime.strptime(lastUpdated[0], '%Y-%m-%dT%H:%M:%S%z')
        ideLastModified = datetime.strptime(lastUpdated[1], '%Y-%m-%dT%H:%M:%S%z')
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] not in ['404', 'NoSuchKey']:
            raise
    filesWereUpdated = False
    # Copy the VDL file only if the source object is newer than our last copy
    try:
        s3.copy_object(
            Bucket=DESTINATION_SOPHOS_BUCKET,
            Key=VDL_FILE_NAME,
            CopySource={'Bucket': CSS_SOPHOS_BUCKET, 'Key': VDL_FILE_NAME},
            CopySourceIfModifiedSince=vdlLastModified,
            ACL='bucket-owner-full-control'
        )
        vdlLastModified = datetime.now(timezone.utc)
        filesWereUpdated = True
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] != 'PreconditionFailed':
            raise
    # Copy the IDE file only if the source object is newer than our last copy
    try:
        s3.copy_object(
            Bucket=DESTINATION_SOPHOS_BUCKET,
            Key=IDE_FILE_NAME,
            CopySource={'Bucket': CSS_SOPHOS_BUCKET, 'Key': IDE_FILE_NAME},
            CopySourceIfModifiedSince=ideLastModified,
            ACL='bucket-owner-full-control'
        )
        ideLastModified = datetime.now(timezone.utc)
        filesWereUpdated = True
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] != 'PreconditionFailed':
            raise
    # Record the new timestamps so the next run only copies files that changed
    if filesWereUpdated:
        marker = boto3.resource('s3').Object(DESTINATION_SOPHOS_BUCKET, LAST_UPDATED_FILE_NAME)
        marker.put(Body=f'{vdlLastModified.strftime("%Y-%m-%dT%H:%M:%SZ")}|{ideLastModified.strftime("%Y-%m-%dT%H:%M:%SZ")}')
    return {
        'statusCode': 200,
        'body': 'Sophos VDL and/or IDE file(s) were updated.' if filesWereUpdated else 'Sophos VDL and IDE files are already up to date.'
    }

Permissions to add to Lambda

{
    "Sid": "GetSophosFilesFromCSS",
    "Effect": "Allow",
    "Action": [
        "s3:GetObject",
        "s3:GetObjectTagging"
    ],
    "Resource": [
        "arn:aws:s3:::css-sophos-updates/*",
        "arn:aws:s3:::css-sophos-updates"
    ]
},
{
    "Sid": "PlaceSophosFilesInMirrorBucket",
    "Effect": "Allow",
    "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:PutObjectTagging",
        "s3:PutObjectAcl"
    ],
    "Resource": [
        "arn:aws:s3:::<enter-your-bucket-name-here>/*",
        "arn:aws:s3:::<enter-your-bucket-name-here>"
    ]
}

Detailed Steps

Please refer to the detailed steps for creating a Lambda to retrieve updates under the ClamAV lambda Detailed Steps. The steps are exactly the same with the exception of Step 3 and Step 4.3, which are covered below.

  1. Sub for Step 3 above - Provide Code for Lambda

    Copy Lambda code in the section above . . .

  2. And paste it into the function .py file

  3. Deploy the code

  4. Sub for Step 4.3 above - Add Permissions to Lambda Role

    Copy the policy section above . . .

  5. And paste it into the role JSON

Remember the comma on the preceding bracket!

Make sure to follow all other steps for creating the trigger and so forth and you should be all set.
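If you want to script the trigger rather than click through the console, a scheduled EventBridge rule that invokes the mirror lambda can be created along the lines below. The function and rule names are hypothetical, and the hourly rate is just an example.

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

FUNCTION_NAME = 'sophos-private-mirror-sync'   # hypothetical lambda name
RULE_NAME = 'sophos-mirror-hourly'

# Create (or update) an hourly schedule
rule_arn = events.put_rule(
    Name=RULE_NAME,
    ScheduleExpression='rate(1 hour)',
    State='ENABLED'
)['RuleArn']

# Allow EventBridge to invoke the lambda
lambda_client.add_permission(
    FunctionName=FUNCTION_NAME,
    StatementId='AllowEventBridgeInvoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule_arn
)

# Point the rule at the lambda
function_arn = lambda_client.get_function(FunctionName=FUNCTION_NAME)['Configuration']['FunctionArn']
events.put_targets(Rule=RULE_NAME, Targets=[{'Id': 'mirror-lambda', 'Arn': function_arn}])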

Extra Large File Scanning

Most of our customers are scanning files under the Fargate disk cap of 200GB. But there are customers in many industries (life sciences, media, etc.) that do have files larger than 200GB, some much larger. Antivirus for Amazon S3 supports scanning files up to the Amazon S3 maximum object size (5TB).

This is done by bypassing the internal disk limitations of Fargate and leveraging EC2 instances for such files. Extra large file scanning doesn't have to be reserved for only truly huge files; it can be used to scan any file larger than the disk size you have assigned to the standard scanning agents. For example, say you occasionally need to process 50GB files, but it isn't worth keeping a larger disk attached to every scanning agent. You keep the default disk size of 20GB and switch the Extra Large File Scanning toggle on. Any file 15GB and smaller will be processed by the scanning agent, but any file greater than 15GB will be scanned by the extra large file scanning process. This ensures that no file is ever skipped due to size, and if large files are rare in your system you don't have to sit on the expense of a larger default disk.

Extra large file scanning can be triggered by event-based scanning, retro scanning, and even API-based scanning (scan existing API only). When a scanning agent picks up a file that is too large to scan (based on the disk size assigned under Agent Settings or API Agent Settings) and the Extra Large File Scanning toggle is on, a job is created. The job will be picked up and kicked off within 10 minutes. A temporary EC2 instance is spun up with an EBS volume of the size defined in the Disk Size field, and that instance picks up the file and scans it. Because it is a job, it is monitored on the Jobs page, where you can watch it move through "Not Started" (while waiting for the EC2 instance to start up), "Scanning", and "Completed". Each large file is treated as its own job: if 50 large files come in, 50 jobs are kicked off, each lasting the duration it takes to scan that individual file.

You must have either the Sophos engine selected, or multi-engine scanning enabled with By File Size selected, if you intend to use CrowdStrike and need extra large files scanned.

Sample scenarios and scanning outcomes (note: subtract 5GB of overhead from the disk sizes for both the agents and Extra Large File Scanning):

Scenario | Scanning Outcome
File is smaller than the defined scanning agent disk size | Scanning agent picks the file up to scan. Scan Result is whatever the outcome of the file is.
File is larger than the defined scanning agent disk size; Extra Large File Scanning is off | Scanning agent rejects the file and does not attempt to scan it. Scan Result is set to Unscannable.
File is larger than the defined scanning agent disk size; Extra Large File Scanning is on | Scanning agent creates an Extra Large File Scan Job and moves on to the next file. The job shows up on the Jobs page and is kicked off within 10 minutes of creation. Scan Result is whatever the outcome of the file is.
File is larger than the Extra Large File Scanning disk size; Extra Large File Scanning is on | Scan Result is set to Unscannable.

Scan and Skip Lists

Buckets themselves are inherently skipped because they start out turned off. When you enable a bucket for scanning, you are explicitly marking it to be scanned. If you do this and nothing else, all objects in the bucket will be scanned, no matter the path inside the bucket. That portion is handled from the Bucket Protection page.

The Scan List and Skip List here on the Agent Configuration page take this concept a step further by applying it to paths (folders) within the buckets. Scan listing and skip listing are opposites of one another.

Scan listing a path explicitly marks that specific path (or paths) within the bucket to be scanned; all other paths within the bucket are ignored. You can list as many paths within the bucket as you'd like. For example, if you have 5 different paths within a bucket and want to scan only 3 of them, simply add those 3 to the Scan List. There is no need to add the other 2 paths to the Skip List, as they will automatically be skipped.

Skip listing a path stops that defined path (or paths) from being scanned, but leaves all others to be scanned. Depending on how many paths you want to include versus exclude, you can choose which list to leverage. From the previous example, you could instead have skip listed the 2 paths you didn't want scanned, leaving the other 3 to be scanned. With numbers that close you could go either way, but with a much larger number of paths, one choice may become more self-evident.

Special Scan / Skip List Capabilities

The root of the bucket is also a "path", which we define in the settings as an empty path. So if you'd like to scan list or skip list the root along with your paths, you can do so.

You can use a wild card (*) in the path. This is useful in a repeated sub-path structure where you need to skip a certain folder amongst all those top level folders.

For example, you have a path structure of Year/Month/scanMe and Year/Month/skipMe, where Month reflects each month of the year, creating 12 unique paths. Underneath each Month folder you have folders you want scanned or skipped. You can create a path one time to set up scanning in every Month folder, like this: Year/*/scanMe. That path will scan the scanMe folder under every month in the bucket without you having to add 12 entries.

This can also be leveraged not just in the path, but down to the object name as well. If you only wanted to scan a certain file type in a given bucket/folder, you could add a path ending in *.jpg.

✏️ Note

Although it wasn't spelled out above, the * does not mean everything at the level where it is placed and below. If you want everything underneath Year, use the path /Year/, which covers absolutely everything below Year. The wildcard represents only the single level at which it is placed.
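To illustrate the single-level behavior described above, here is a small sketch of how such a pattern could be evaluated. The matching logic is an illustration of the documented rule, not the product's actual implementation.

import re

def path_matches(pattern, key):
    """Treat * as 'any characters within a single path level' (it never crosses a /)."""
    regex = '^' + re.escape(pattern).replace(r'\*', '[^/]*')
    return re.match(regex, key) is not None

print(path_matches('Year/*/scanMe', 'Year/January/scanMe/report.pdf'))    # True
print(path_matches('Year/*/scanMe', 'Year/January/archive/scanMe/file'))  # False - * stays in one level
print(path_matches('Year/*/scanMe', 'Year/February/scanMe/photo.jpg'))    # True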

Similar to wildcards, and often used with them, you can specify a global entry on your Scan and Skip Lists. If you have a repeated path structure across all your buckets and want a single scan or skip entry that applies to all of them, select the _GLOBAL_ option in place of a bucket and define the entry across all buckets.

Let's take a look at an example. The following is an S3 bucket with 4 paths (folders) in it. With the default settings, meaning no paths defined, the root and all 4 folders will be scanned.

The first thing you need to do is enter the <bucket name> in the Scan List or Skip List field and click Add Scan List.

You'll get:

Next, you'll add the path you want to scan and click the Add Entry button. *This can be the full multi-depth path.

That's it. You've now identified that the only thing you will scan in the css-protect-01 bucket will be the scan-me folder and nothing else. The steps above can be used for skip listing as well. Look below at the examples.

You probably wouldn't have a scan list and a skip list entry for the same bucket as we see in this example, but it is possible and I'm sure a scenario could be found to support it.

AV Two-Bucket System Configuration

With the Two-Bucket System you can move objects from a source bucket or region to a different bucket and/or prefix after they have been successfully scanned and tagged as Clean.

All other scan result types (Infected, Error, Unscannable) remain in the protected source bucket.

When using this method you no longer need to add Lambda functions to move your files, as outlined here. With the AV Two-Bucket System Configuration, the agent task itself promotes clean files as part of the scanning process.

This feature will not protect the bucket(s) configured within the setting. You will still need to ensure the bucket(s) are protected on the Bucket Protection page.

There are two options for configuring the Two-Bucket System:

  1. By Region - When using this setting, any protected bucket within the chosen region will have its clean files delivered to the destination bucket.

    1. Choose a region from the Add Region selection

    2. Choose a target destination bucket

    3. Optionally, choose a desired prefix to place objects in.

  2. By Bucket

    1. Choose a bucket from the Add Bucket selection

    2. Choose a target destination bucket

    3. Optionally, choose a desired prefix to place objects in.

If you have very long object paths, specifying a prefix for either By Region or By Bucket could cause you to exceed the maximum key length and impact file delivery.
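S3 object keys are limited to 1,024 bytes of UTF-8, so a quick pre-check like the sketch below can flag keys that would overflow once a prefix is added. The bucket name and prefix are placeholders.

import boto3

MAX_KEY_BYTES = 1024          # S3 object key limit (UTF-8 encoded)
PREFIX = 'clean/'             # hypothetical destination prefix
SOURCE_BUCKET = 'my-staging-bucket'

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get('Contents', []):
        new_key = PREFIX + obj['Key']
        if len(new_key.encode('utf-8')) > MAX_KEY_BYTES:
            print(f"Would exceed the key limit after prefixing: {obj['Key']}")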

Delete any region or bucket by clicking the delete icon next to its source.

Be sure to click the save button to apply your changes.
