
Configuration

S2ST Translator uses a layered configuration system with the following priority (highest to lowest):

  1. Command-line flags
  2. Environment variables
  3. config/default.yaml
  4. Built-in defaults
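
The priority order above can be sketched as a lookup that checks each layer in turn. This is a minimal illustration, not the application's actual loader: the `resolve` function, the `BUILTIN_DEFAULTS` values, and the flat key names are assumptions made for the example; only the environment variable names come from this page.

```python
import os

# Hypothetical built-in defaults; the real values live inside the application.
BUILTIN_DEFAULTS = {"device": "auto", "model_size": "medium", "num_workers": "1"}

# Environment variable names as documented on this page.
ENV_NAMES = {
    "device": "SEAMLESS_DEVICE",
    "model_size": "SEAMLESS_MODEL_SIZE",
    "num_workers": "NUM_WORKERS",
}


def resolve(key, cli_flags, file_config, environ=None):
    """Return the first value found, checking layers from highest to lowest priority."""
    environ = os.environ if environ is None else environ
    if cli_flags.get(key) is not None:        # 1. command-line flag
        return cli_flags[key]
    env_name = ENV_NAMES.get(key)
    if env_name and env_name in environ:      # 2. environment variable
        return environ[env_name]
    if key in file_config:                    # 3. config/default.yaml
        return file_config[key]
    return BUILTIN_DEFAULTS[key]              # 4. built-in default
```

With this sketch, `resolve("device", {}, {"device": "mps"}, {"SEAMLESS_DEVICE": "cuda"})` yields `"cuda"`, because the environment variable outranks the file value.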

The main configuration file is config/default.yaml:

# Model settings
model:
  size: "large" # small, medium, large
  device: "auto" # auto, cuda, mps, cpu
  dtype: "float16" # float16, float32
  num_beams: 5 # Beam search beams
  temperature: 1.0 # Sampling temperature

# Translation defaults
translation:
  source_lang: "auto" # auto-detect or language code
  target_lang: "eng" # Default target language
  task: "s2st" # Speech-to-speech translation

# Audio processing
audio:
  target_sample_rate: 16000 # SeamlessM4T requirement
  normalize: true
  to_mono: true

# File paths
paths:
  input_dir: "./input"
  output_dir: "./output"
  translated_subdir: "translated"
  logs_subdir: "logs"

# Logging
logging:
  level: "INFO" # DEBUG, INFO, WARNING, ERROR
  console:
    enabled: true
  file:
    enabled: true
    max_bytes: 10485760 # 10MB
    backup_count: 5

# Processing
processing:
  num_workers: 1
  batch_size: 1
  resume_from_checkpoint: true

# Security (optional)
security:
  api_key: null # Set to require API key
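
The comments in the configuration file list the accepted values for several keys. As a rough sketch, a loaded configuration (represented here as a nested dict) could be checked against those values before startup. The `validate` function and the `ALLOWED` table are hypothetical helpers for illustration; the application may validate differently or not at all.

```python
# Accepted values, copied from the comments in config/default.yaml.
ALLOWED = {
    ("model", "size"): {"small", "medium", "large"},
    ("model", "device"): {"auto", "cuda", "mps", "cpu"},
    ("model", "dtype"): {"float16", "float32"},
    ("logging", "level"): {"DEBUG", "INFO", "WARNING", "ERROR"},
}


def validate(config):
    """Return a list of problems; an empty list means the checked keys are valid."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        value = config.get(section, {}).get(key)
        # Missing keys are skipped so that lower-priority defaults can apply.
        if value is not None and value not in allowed:
            problems.append(
                f"{section}.{key}: {value!r} is not one of {sorted(allowed)}"
            )
    return problems
```

Only keys with a documented set of values are checked; free-form settings such as paths are left alone.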
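
Environment variables override the values in config/default.yaml:

| Variable | Description | Default |
|---|---|---|
| SEAMLESS_DEVICE | Force device: auto, cuda, mps, cpu | auto |
| SEAMLESS_MODEL_SIZE | Model size: small, medium, large | medium |
| LOG_LEVEL | Logging level | INFO |
| API_KEY | API authentication key | None |
| NUM_WORKERS | Number of worker processes | 1 |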

Example:

export SEAMLESS_DEVICE=cuda
export SEAMLESS_MODEL_SIZE=large
./start.sh

| Size | Model ID | VRAM | Quality | Speed |
|---|---|---|---|---|
| small | seamless-m4t-medium (v1) | ~4GB | Good | Fast |
| medium | seamless-m4t-medium (v1) | ~8GB | Good | Medium |
| large | seamless-m4t-v2-large | ~16GB | Best | Slower |

Note: The small and medium sizes use the same v1 model. The large size uses the v2 model, which offers improved quality.
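
To illustrate how the table might guide a choice, the sketch below picks the largest size whose approximate VRAM figure fits the memory available. The `pick_size` helper is hypothetical and the figures are the rough estimates from the table, so treat the result as a starting point rather than a guarantee.

```python
# Size -> (model ID, approximate VRAM in GB), copied from the table above.
MODEL_SIZES = {
    "small": ("seamless-m4t-medium", 4),
    "medium": ("seamless-m4t-medium", 8),
    "large": ("seamless-m4t-v2-large", 16),
}


def pick_size(available_vram_gb):
    """Return the largest size whose approximate VRAM need fits, or None."""
    fitting = [
        (need, size)
        for size, (_, need) in MODEL_SIZES.items()
        if need <= available_vram_gb
    ]
    # Tuples compare by VRAM need first, so max() selects the largest fitting size.
    return max(fitting)[1] if fitting else None
```

For example, `pick_size(10)` returns `"medium"`, which could then be exported as `SEAMLESS_MODEL_SIZE`.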

S2ST Translator supports these language codes:

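| Code | Language |
|---|---|
| eng | English |
| deu | German |
| fra | French |
| spa | Spanish |
| ita | Italian |
| por | Portuguese |
| nld | Dutch |
| pol | Polish |
| rus | Russian |
| ukr | Ukrainian |
| cmn | Chinese (Mandarin) |
| jpn | Japanese |
| kor | Korean |
| arb | Arabic |
| hin | Hindi |
| tur | Turkish |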

Full list of supported codes:

afr, amh, arb, ary, arz, asm, azj, bel, ben, bos, bul, cat, ceb, ces, ckb,
cmn, cym, dan, deu, ell, eng, est, eus, fin, fra, fuv, gaz, gle, glg, guj,
hau, heb, hin, hrv, hun, hye, ibo, ind, isl, ita, jav, jpn, kam, kan, kat,
kaz, kea, khk, khm, kir, kor, lao, lit, ltz, lug, luo, lvs, mai, mal, mar,
mkd, mlt, mni, mya, nld, nno, nob, npi, nya, oci, ory, pan, pbt, pes, pol,
por, ron, rus, slk, slv, sna, snd, som, spa, srp, swe, swh, tam, tel, tgk,
tgl, tha, tur, ukr, urd, uzn, vie, xho, yor, zho, zul
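
A client may want to check a language code before submitting a job. The following is a small sketch under stated assumptions: `check_lang` is a hypothetical helper, `COMMON_CODES` holds only a subset of the codes listed above, and the trimming and lowercasing are conveniences of this example, not documented behavior of S2ST Translator.

```python
# A subset of the codes listed above, kept short for brevity.
COMMON_CODES = {
    "eng", "deu", "fra", "spa", "ita", "por", "nld", "pol",
    "rus", "ukr", "cmn", "jpn", "kor", "arb", "hin", "tur",
}


def check_lang(code, allow_auto=False):
    """Normalize a language code and raise ValueError if it is not recognized."""
    normalized = code.strip().lower()
    # "auto" is only meaningful for source_lang, so it must be enabled explicitly.
    if allow_auto and normalized == "auto":
        return normalized
    if normalized not in COMMON_CODES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized
```

Pass `allow_auto=True` when checking `source_lang`, since the default configuration uses "auto" there.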

To require API key authentication:

# config/default.yaml
security:
  api_key: "your-secret-key-here"

Or via environment:

export API_KEY="your-secret-key-here"
./start.sh

Clients must include the key in requests:

curl -H "X-API-KEY: your-secret-key-here" http://localhost:8000/api/jobs
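
The same request can be built from Python using only the standard library. This sketch mirrors the curl call above; the `build_jobs_request` helper name is an assumption, and the response format of `/api/jobs` is not documented here, so the example stops at constructing the request.

```python
import urllib.request


def build_jobs_request(api_key, base_url="http://localhost:8000"):
    """Build a GET request for /api/jobs that carries the API key header."""
    req = urllib.request.Request(f"{base_url}/api/jobs", method="GET")
    # urllib stores the name as "X-api-key"; HTTP header names are
    # case-insensitive, so this is equivalent to the X-API-KEY header used with curl.
    req.add_header("X-API-KEY", api_key)
    return req
```

Passing the returned object to `urllib.request.urlopen` sends the request to a running server.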

For limited VRAM, use the medium model:

model:
  size: "medium"
  dtype: "float16"

For high throughput with multiple GPUs:

processing:
  num_workers: 2

Or via environment:

NUM_WORKERS=2 ./start.sh

Adjust the audio chunk duration to trade memory use against quality:

audio:
  chunk_duration: 10.0 # seconds
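
To give a sense of what the setting means in samples: at the 16000 Hz target sample rate, a 10-second chunk holds 160000 samples. The sketch below splits an audio length into fixed-size chunks on that basis. It assumes simple non-overlapping chunks; `chunk_bounds` is a hypothetical helper, and the translator's actual chunking strategy may differ.

```python
SAMPLE_RATE = 16000  # target_sample_rate from the default configuration


def chunk_bounds(total_samples, chunk_duration=10.0, sample_rate=SAMPLE_RATE):
    """Return (start, end) sample index pairs covering the audio in fixed-length chunks."""
    chunk_len = int(chunk_duration * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_duration must be positive")
    # The final chunk is shorter when the audio length is not a multiple of chunk_len.
    return [
        (start, min(start + chunk_len, total_samples))
        for start in range(0, total_samples, chunk_len)
    ]
```

In this sketch, a 25-second clip (400000 samples) splits into two full chunks and one 5-second remainder.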