The `run_profiler.sh` script enables processing and profiling of image datasets for the Visual Layer platform. It supports multiple data sources, including local directories, S3 buckets, HTTP/HTTPS URLs, and file lists in `.txt`, `.csv`, or `.parquet` formats. With flexible execution modes and broad data compatibility, the script adapts to a wide range of environments and workflows.
The rest of this article outlines script usage syntax and provides detailed explanations for each command-line parameter.
`-p` (required): the dataset source path. Supported source types:

Source Type | Example/Details |
---|---|
Local directory | `~/data/images` |
S3 bucket | `s3://mybucket/images` |
HTTP/HTTPS URL | `https://example.com/dataset` |
File list | Path to a `.txt`, `.csv`, or `.parquet` file containing image paths |
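
The four source types can be told apart from the shape of the `-p` value alone. The following is a minimal sketch of such a check, not the script's actual logic: the function name `classify_source` and the returned labels are hypothetical, and the ordering (remote schemes tested before file extensions) is an assumption.

```bash
#!/usr/bin/env bash

# Hypothetical helper: map a -p value to one of the four source types.
# Remote schemes are checked first, so a URL ending in .csv is still
# reported as a remote source (an assumption, not documented behavior).
classify_source() {
  local path="$1"
  case "$path" in
    s3://*)                 echo "s3" ;;
    http://*|https://*)     echo "http" ;;
    *.txt|*.csv|*.parquet)  echo "file_list" ;;
    *)                      echo "local_dir" ;;
  esac
}
```

For example, `classify_source s3://mybucket/images` prints `s3`, and `classify_source images.parquet` prints `file_list`.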
`-n` (required): the dataset name.

`-e`: the execution mode. Default: `compose`.

Mode | Description |
---|---|
compose | Uses Docker Compose and the Visual Layer API (recommended for most users, default mode) |
local | Runs directly on your local machine using a Python virtual environment (advanced) |
`-r`: reduce disk space. Default: `false`. When enabled, activates `serve_mode=reduce_disk_space` for optimized storage usage.
`-h`: shows help.

Compose mode:

Feature | Description |
---|---|
Processing Method | Processes datasets through HTTP API endpoint |
Supported Data Sources | Local, S3, HTTP/HTTPS |
Path Handling | Automatically handles path encoding for remote sources |
Integration | Integrates with the full Visual Layer service stack |
POST http://localhost:2080/api/v1/process
Parameters sent to API:
Parameter | Description |
---|---|
path | Dataset source path (URL-encoded for remote sources) |
name | Dataset name |
serve_mode | Set to reduce_disk_space when -r flag is used |
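
To illustrate how these three parameters could be assembled, here is a hedged sketch. It assumes the parameters travel as a URL query string, which the article does not state, and it percent-encodes every value, whereas the article mentions encoding only for remote sources. The helper names `urlencode` and `build_process_query` are hypothetical, and the encoder is written in pure Bash for self-containment even though the dependency table attributes URL encoding to Python 3.

```bash
#!/usr/bin/env bash

# Percent-encode everything except RFC 3986 unreserved characters.
urlencode() {
  local LC_ALL=C                 # work byte by byte
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build "path=...&name=...[&serve_mode=reduce_disk_space]".
# The third argument mirrors the -r flag ("true" or "false").
build_process_query() {
  local path="$1" name="$2" reduce="${3:-false}"
  local q="path=$(urlencode "$path")&name=$(urlencode "$name")"
  if [ "$reduce" = "true" ]; then
    q+="&serve_mode=reduce_disk_space"
  fi
  printf '%s\n' "$q"
}

# Possible use (assumes query-string transport; not executed here):
#   curl -X POST "http://localhost:2080/api/v1/process?$(build_process_query "$path" "$name" true)"
```

With these helpers, `urlencode 's3://bucket/path'` prints `s3%3A%2F%2Fbucket%2Fpath`.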
Local mode runs the `pipeline.controller.controller` module directly (`MANUAL_FLOW=yes`). Environment variables:

- `MANUAL_FLOW=yes`
- `FASTDUP_PRODUCTION=1`
- `PREFECT_LOGGING_SETTINGS_PATH=./.vl/prefect-logging.yaml`

Local paths involve `realpath` and the `.vl/` directory for container access.

Source Type | Example/Details |
---|---|
S3 | s3://bucket/path |
HTTP/HTTPS | http:// or https:// URLs |
File Detection | Automatically detects file extensions (`.txt`, `.csv`, `.parquet`) |
URL Encoding | Applies proper encoding for API transmission |

Supported file list formats:

- `.txt`: Plain text file with one file path per line
- `.csv`: CSV file with file paths
- `.parquet`: Parquet file containing file path data

Error Type | Description |
---|---|
Missing Arguments | Displays usage information and exits |
Invalid Paths | Validates local path existence |
Invalid Execution Mode | Ensures mode is either compose or local |
API Failures | Captures and displays API error responses |
Pipeline Failures | Handles local pipeline execution errors |
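
The first three checks in the table above can be sketched with `getopts`. This is an illustrative outline under stated assumptions, not the script's real code: the function names `usage` and `check_args`, the usage text, the return codes, and the treatment of `-r` as a value-less switch are all hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical usage text; the real script's wording is not shown in this article.
usage() {
  echo "Usage: run_profiler.sh -p <dataset_path> -n <dataset_name> [-e compose|local] [-r] [-h]" >&2
}

# Returns 0 if the arguments pass validation, otherwise:
#   2 = missing or malformed arguments, 3 = invalid mode, 4 = local path not found.
check_args() {
  local OPTIND=1 opt path="" name="" mode="compose" reduce="false"
  while getopts ":p:n:e:rh" opt; do
    case "$opt" in
      p) path="$OPTARG" ;;
      n) name="$OPTARG" ;;
      e) mode="$OPTARG" ;;
      r) reduce="true" ;;        # assumed to be a switch with no value
      h) usage; return 0 ;;
      *) usage; return 2 ;;      # unknown flag or missing flag value
    esac
  done

  # Missing arguments: show usage and stop.
  if [ -z "$path" ] || [ -z "$name" ]; then
    usage
    return 2
  fi

  # Invalid execution mode: only compose and local are accepted.
  case "$mode" in
    compose|local) ;;
    *) echo "Invalid execution mode: $mode" >&2; return 3 ;;
  esac

  # Invalid paths: only local paths can be checked on disk.
  case "$path" in
    s3://*|http://*|https://*) ;;
    *) [ -e "$path" ] || { echo "Path not found: $path" >&2; return 4; } ;;
  esac
  return 0
}
```

A caller could run `check_args "$@" || exit $?` before doing any work, so that invalid input never reaches the API or the pipeline.
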
Dependency | Purpose/Usage |
---|---|
Bash | Shell environment |
curl | API communication (compose mode) |
Python 3 | Local execution and URL encoding |
Docker Compose | Compose mode execution |
Virtual Environment (`./venv_local/`) | Local mode Python environment |
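
Before running the script, it can help to confirm that the tools listed above are on the `PATH`. The sketch below uses the standard `command -v` builtin; the function name `require_cmds` is hypothetical, and the exact command names to check (for instance `docker` for Docker Compose) are assumptions that depend on the installation.

```bash
#!/usr/bin/env bash

# Report every command that cannot be found; return 1 if any is missing.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Missing dependency: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible checks per mode (command names are assumptions):
#   compose mode: require_cmds curl python3 docker
#   local mode:   require_cmds python3 && [ -d ./venv_local ]
```
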

The `-r` flag sets the `serve_mode=reduce_disk_space` parameter.

Component | Description |
---|---|
Pipeline Controller | Orchestrates the complete processing workflow |
Database | Stores processed dataset metadata and results |
API Service | Provides RESTful interface for dataset operations |
Storage Systems | Supports local filesystem, S3, and HTTP sources |
Common Issues