The run_profiler.sh script enables processing and profiling of image datasets for the Visual Layer platform. It supports multiple data sources, including local directories, S3 buckets, HTTP/HTTPS URLs, and file lists in .txt, .csv, or .parquet formats. With flexible execution modes and broad data compatibility, the script adapts to a wide range of environments and workflows.

The rest of this article outlines script usage syntax and provides detailed explanations for each command-line parameter.

Usage

Following is the command-line syntax for running the script:

./run_profiler.sh [-h] [-p <path> -n <dataset_name>] [-e compose|local] [-r]

Parameters

Following are the command-line parameters supported by the script:


Dataset Path (-p, required)

Specifies the source location of your dataset. Multiple formats are supported:

-p (string, required)

  • Local directory: ~/data/images
  • S3 bucket: s3://mybucket/images
  • HTTP/HTTPS URL: https://example.com/dataset
  • File list: path to a .txt, .csv, or .parquet file containing image paths

Dataset Name (-n, required)

A human-readable name for your dataset. This name will be used for identification in logs and results.

-n (string, required)

Execution Mode (-e)

Determines how the processing pipeline is executed.

-e (string, default: compose)

  • compose: uses Docker Compose and the Visual Layer API (recommended for most users; the default)
  • local: runs directly on your local machine in a Python virtual environment (advanced)

Reduce Disk Space (-r)

Enables reduced disk space consumption mode. Useful for large datasets or environments with limited storage.

-r (boolean, default: false)

When enabled, activates serve_mode=reduce_disk_space for optimized storage usage.

Help (-h)

Displays usage information and example commands.

-h (boolean)
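A minimal sketch of how these flags could be parsed with POSIX getopts — an illustration of the interface above, not the script's actual source; the variable names (DATASET_PATH, EXEC_MODE, and so on) are assumptions:

```shell
# Illustrative only: a getopts loop matching the documented interface.
parse_args() {
  OPTIND=1
  DATASET_PATH="" DATASET_NAME="" EXEC_MODE="compose" REDUCE_DISK="false"
  while getopts "hp:n:e:r" opt; do
    case "$opt" in
      p) DATASET_PATH="$OPTARG" ;;
      n) DATASET_NAME="$OPTARG" ;;
      e) EXEC_MODE="$OPTARG" ;;
      r) REDUCE_DISK="true" ;;
      h) echo "Usage: ./run_profiler.sh [-h] [-p <path> -n <dataset_name>] [-e compose|local] [-r]"
         return 0 ;;
    esac
  done
}

parse_args -p /data/images -n dataset1 -r
echo "$DATASET_PATH $DATASET_NAME $EXEC_MODE $REDUCE_DISK"
# -> /data/images dataset1 compose true
```

Note how -e and -r fall back to their documented defaults (compose and false) when omitted.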

Examples

./run_profiler.sh -p ~/data/ds1 -n 'dataset1'

Execution Modes

The script supports two execution modes for processing datasets:

Compose Mode (Default)

Recommended for most users. Uses Docker Compose to run the processing pipeline with the Visual Layer API.

  • Processing method: sends datasets to an HTTP API endpoint for processing
  • Supported data sources: local, S3, HTTP/HTTPS
  • Path handling: automatically URL-encodes paths for remote sources
  • Integration: works with the full Visual Layer service stack

API Endpoint: POST http://localhost:2080/api/v1/process

Parameters sent to API:

  • path: dataset source path (URL-encoded for remote sources)
  • name: dataset name
  • serve_mode: set to reduce_disk_space when the -r flag is used
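As an illustration, the URL-encoding step can be reproduced with Python 3 (already one of the script's dependencies). Only the endpoint and parameter names come from the table above; the exact request body format shown in the comment is an assumption:

```shell
DATASET_PATH="s3://mybucket/images"

# URL-encode the remote path using Python 3's urllib.parse.quote.
ENCODED_PATH=$(python3 -c "import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=''))" "$DATASET_PATH")
echo "$ENCODED_PATH"   # -> s3%3A%2F%2Fmybucket%2Fimages

# Hypothetical request shape (not executed here; body format is assumed):
#   curl -X POST "http://localhost:2080/api/v1/process" \
#        -d "path=$ENCODED_PATH" -d "name=dataset1" -d "serve_mode=reduce_disk_space"
```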

Local Mode

Advanced users only. Runs the pipeline directly in a local Python virtual environment.

  • Executes pipeline.controller.controller module directly
  • Requires local Python virtual environment setup
  • Uses manual flow configuration (MANUAL_FLOW=yes)
  • Configures device settings based on hardware type (CPU/GPU)

Environment Variables Set:

  • MANUAL_FLOW=yes
  • FASTDUP_PRODUCTION=1
  • PREFECT_LOGGING_SETTINGS_PATH=./.vl/prefect-logging.yaml
  • Device-specific settings for CPU mode
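The variables above can be reproduced in a plain shell session; the final invocation line is shown as a comment because it is an assumption based on the module name and requires the local virtual environment:

```shell
# Environment local mode sets before running the pipeline.
export MANUAL_FLOW=yes
export FASTDUP_PRODUCTION=1
export PREFECT_LOGGING_SETTINGS_PATH=./.vl/prefect-logging.yaml
echo "$MANUAL_FLOW $FASTDUP_PRODUCTION"   # -> yes 1

# Assumed entry point, run from inside ./venv_local/:
#   python -m pipeline.controller.controller
```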

Data Source Support

Local Paths

  • Converts relative paths to absolute paths using realpath
  • For file inputs (lists), copies to .vl/ directory for container access
  • Validates path existence before processing
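The resolution and validation steps above can be sketched as follows (variable names and the demo directory are illustrative):

```shell
# Resolve a relative dataset path to an absolute one, then validate it.
mkdir -p ./demo_dataset                  # placeholder dataset directory
ABS_PATH=$(realpath ./demo_dataset)
if [ -d "$ABS_PATH" ]; then
  echo "valid dataset path: $ABS_PATH"
else
  echo "error: dataset path does not exist" >&2
  exit 1
fi
```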

Remote Paths

  • S3: s3://bucket/path
  • HTTP/HTTPS: http:// or https:// URLs
  • File detection: file-list inputs are recognized by extension (.txt, .csv, .parquet)
  • URL encoding: remote paths are URL-encoded for API transmission
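Extension-based file-list detection can be sketched with a simple case statement (illustrative, not the script's actual source):

```shell
# Classify an input path: file list vs. directory/URL prefix.
is_file_list() {
  case "$1" in
    *.txt|*.csv|*.parquet) return 0 ;;
    *) return 1 ;;
  esac
}

is_file_list "s3://bucket/lists/images.parquet" && echo "file list"
is_file_list "s3://bucket/images" || echo "directory or URL prefix"
```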

Supported File Formats

  • Image Directories: Any directory containing image files
  • File Lists:
    • .txt: Plain text file with one file path per line
    • .csv: CSV file with file paths
    • .parquet: Parquet file containing file path data
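For example, a minimal .txt file list can be created like this (the image paths are placeholders):

```shell
# Write a two-line file list; each line is one image path.
cat > images.txt <<'EOF'
/data/images/img_001.jpg
/data/images/img_002.jpg
EOF

# It can then be passed to the script as a dataset source:
#   ./run_profiler.sh -p images.txt -n 'my-file-list'
```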

Error Handling

The script includes comprehensive error handling:

  • Missing arguments: displays usage information and exits
  • Invalid paths: validates that local paths exist before processing
  • Invalid execution mode: ensures the mode is either compose or local
  • API failures: captures and displays API error responses
  • Pipeline failures: handles local pipeline execution errors
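The execution-mode check, for instance, can be sketched as follows (illustrative, not the script's actual source):

```shell
# Validate the -e argument before running anything.
MODE="local"    # value that would come from -e
case "$MODE" in
  compose|local) echo "mode ok: $MODE" ;;
  *) echo "Error: -e must be 'compose' or 'local'" >&2; exit 1 ;;
esac
```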

Dependencies

  • Bash: shell environment
  • curl: API communication (compose mode)
  • Python 3: local execution and URL encoding
  • Docker Compose: compose mode execution
  • Virtual environment (./venv_local/): local mode Python environment

Output

Compose Mode

  • Success: Displays dataset processing confirmation with response
  • Failure: Shows API error message in red text

Local Mode

  • Runs pipeline with full logging output
  • Returns to original shell environment on completion

Performance Considerations

Reduced Disk Space Mode (-r)

  • Activates serve_mode=reduce_disk_space parameter
  • Optimizes storage usage during processing
  • Recommended for large datasets or limited storage environments

Hardware Configuration

  • CPU Mode: Automatically configures all processing devices to use CPU
  • GPU Mode: Uses default GPU acceleration when available

Integration

This script integrates with the Visual Layer platform’s core components:

  • Pipeline Controller: orchestrates the complete processing workflow
  • Database: stores processed dataset metadata and results
  • API Service: provides a RESTful interface for dataset operations
  • Storage Systems: supports the local filesystem, S3, and HTTP sources

Troubleshooting