ℹ️
There’s an in-app tool for measuring transcription and summarization speed.

To contribute to this page, send your own hardware config and performance data to contact@watchlightstudio.com

Summarization performance (TPS)

In-app summarization performance is measured in tokens per second (TPS). On average, a token is ~3/4 of a word, and faster is better. More efficient hardware gives higher TPS, which reduces the time required to run LLM tasks; less time spent generating means lower energy use, and thus cheaper compute.
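
For a concrete sense of the unit, here is a minimal sketch of the TPS calculation in Python. The `generate` callable is a hypothetical stand-in for whatever LLM backend is in use; this is not Nucleate’s internal code.

```python
import time

def measure_tps(generate, prompt: str) -> float:
    """Time one generation call and return tokens per second.

    `generate` is a hypothetical stand-in for any LLM backend;
    it is assumed to return (output_text, n_output_tokens).
    """
    start = time.perf_counter()
    _, n_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# At ~3/4 of a word per token, 100 TPS is roughly 75 words per second.
```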

In general, running local models is less expensive than using OpenAI’s API. If Nucleate is configured to use OpenAI, the default model is GPT-3.5-turbo, which is a good balance of quality vs. cost. OpenAI charges differently for input vs. output tokens, but the approximate cost is ~0.57M tokens/dollar (as of July 2025). The cost of running locally varies, but estimates are provided below.

Running locally, assuming a 250 W device and $0.21/kWh:
100 TPS => 6.86M tokens/dollar  
50 TPS => 3.43M tokens/dollar  
25 TPS => 1.71M tokens/dollar  
8 TPS => 0.57M tokens/dollar (same as OpenAI pricing)
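
As a sanity check, the conversion behind these numbers as a short Python sketch (the wattage and electricity rate are the assumptions stated above):

```python
def tokens_per_dollar(tps: float, watts: float = 250.0,
                      usd_per_kwh: float = 0.21) -> float:
    """Tokens generated per dollar of electricity at a given TPS."""
    tokens_per_hour = tps * 3600                     # tokens/s -> tokens/h
    dollars_per_hour = (watts / 1000) * usd_per_kwh  # cost of one compute-hour
    return tokens_per_hour / dollars_per_hour

print(f"{tokens_per_dollar(100) / 1e6:.2f}M tokens/dollar")  # ~6.86M
```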

Example performance

Local LLM Summarization using an NVIDIA 3090 FE GPU (24GB VRAM):

|               | Mistral  | +Overdrive |
| ------------- | -------- | ---------- |
| Run 1         | 86       | 132        |
| Run 2         | 88       | 153        |
| Run 3         | 98       | 142        |
| Run 4         | 92       | 146        |
| Run 5         | 86       | 140        |
| Average (TPS) | 90 ± 5.1 | 142 ± 7.7  |

Local LLM Summarization using an NVIDIA 3060 Laptop GPU (6GB VRAM):

Transcription performance (speed multiplier)

In-app transcription performance is measured as the ratio of audio length to transcription time. If it takes 18s to transcribe a 180s audio file, the multiplier is 10x. As with TPS, more efficient hardware gives higher multipliers, which reduces the time, energy, and cost required to perform analysis.

There are two possible backends for transcription: Whisper and Faster-Whisper. Whisper runs natively on both Mac and Windows and supports hardware acceleration on both. Faster-Whisper is optimized for Windows, offering slight speed improvements with GPU acceleration and substantial improvements for CPU-only transcription.

ℹ️
Whisper requires FFmpeg to be installed, per the Python openai-whisper dependency. I cannot ship FFmpeg as part of the app, but it is easily downloaded following these instructions.

There are several models available for both Whisper and Faster-Whisper, including base, small, medium, large-v3, and large-v3-turbo. In general there is a tradeoff between model size and accuracy: larger models are more accurate but slower. Because Nucleate performs summarization on transcripts, minor transcription errors are typically ignored or ‘washed out’ by the LLM, so medium, large-v3, and large-v3-turbo all produce similar-quality results. The default model is set to “medium” for Nucleate Pro users if adequate VRAM or RAM is detected.
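
For reference, here is a minimal sketch of how the speed multiplier can be measured with the faster-whisper Python package. The audio file name is a placeholder, and this is not Nucleate’s internal benchmarking code.

```python
import time
from faster_whisper import WhisperModel

# "medium" is the Nucleate Pro default; device/compute_type depend on hardware.
model = WhisperModel("medium", device="cuda", compute_type="float16")

start = time.perf_counter()
segments, info = model.transcribe("example.wav")  # placeholder audio file
text = " ".join(seg.text for seg in segments)     # segments is lazy; consuming it runs the transcription
elapsed = time.perf_counter() - start

print(f"Speed multiplier: {info.duration / elapsed:.1f}x")  # audio length / wall time
```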

Running local transcription is far less expensive than transcribing with OpenAI’s API. OpenAI charges per minute of transcribed audio, at a rate of ~2.8h/dollar (as of July 2025). The cost of running locally varies, but estimates are provided below.

Running locally, assuming a 250 W device and $0.21/kWh:
50x multiplier => 952h/dollar  
25x multiplier => 476h/dollar  
10x multiplier => 190h/dollar  
1x multiplier => 19h/dollar  
0.15x multiplier => 2.8h/dollar (same as OpenAI pricing)
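
The same conversion as a sketch, with the multiplier in place of TPS (same assumed wattage and electricity rate):

```python
def audio_hours_per_dollar(multiplier: float, watts: float = 250.0,
                           usd_per_kwh: float = 0.21) -> float:
    """Hours of audio transcribed per dollar of electricity."""
    dollars_per_hour = (watts / 1000) * usd_per_kwh  # cost of one compute-hour
    return multiplier / dollars_per_hour             # multiplier = audio-hours per compute-hour

print(f"{audio_hours_per_dollar(50):.0f}h/dollar")  # ~952
```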

Example performance

Local transcription using an NVIDIA 3090 FE GPU (24GB VRAM):

| Whisper    | Base       | Small      | Medium     | Large-v3  | Large-v3-turbo |
| ---------- | ---------- | ---------- | ---------- | --------- | -------------- |
| Model size | 0.7 GB     | 1.8 GB     | 4.7 GB     | 9.7 GB    | 9.7 GB         |
| Run 1      | 27.10      | 17.79      | 10.25      | 6.24      | 4.73           |
| Run 2      | 21.68      | 17.91      | 9.58       | 5.41      | 6.13           |
| Run 3      | 30.09      | 18.34      | 10.47      | 6.33      | 5.36           |
| Run 4      | 28.85      | 19.18      | 10.17      | 5.60      | 4.35           |
| Average    | 26.9 ± 3.7 | 18.3 ± 0.6 | 10.1 ± 0.4 | 5.9 ± 0.5 | 5.1 ± 0.8      |

| Faster-Whisper | Base       | Small      | Medium     | Large-v3   | Large-v3-turbo |
| -------------- | ---------- | ---------- | ---------- | ---------- | -------------- |
| Model size     | 0.5 GB     | 0.9 GB     | 2.0 GB     | 3.7 GB     | 1.9 GB         |
| Run 1          | 36.74      | 27.66      | 17.64      | 12.99      | 24.57          |
| Run 2          | 31.88      | 28.12      | 18.21      | 10.74      | 27.25          |
| Run 3          | 35.93      | 28.30      | 18.23      | 14.09      | 27.49          |
| Run 4          | 27.82      | 24.78      | 16.14      | 12.47      | 27.63          |
| Average        | 33.1 ± 4.1 | 27.2 ± 1.6 | 17.6 ± 1.0 | 12.6 ± 1.4 | 26.7 ± 1.5     |

Local transcription using a Ryzen 7900 CPU (12-core):

| Whisper    | Base       | Small     | Medium    | Large-v3 | Large-v3-turbo |
| ---------- | ---------- | --------- | --------- | -------- | -------------- |
| Model size | 1.0 GB     | 2.0 GB    | 5.6 GB    | 9.6 GB   | 5.1 GB         |
| Run 1      | 12.34      | 5.49      | 2.34      | 0.61     | 1.72           |
| Run 2      | 12.57      | 5.87      | 2.49      | -        | -              |
| Run 3      | 12.64      | 5.93      | 2.43      | -        | -              |
| Average    | 12.5 ± 0.2 | 5.8 ± 0.2 | 2.4 ± 0.1 | 0.6 ± NA | 1.7 ± NA       |

| Faster-Whisper | Base       | Small     | Medium    | Large-v3 | Large-v3-turbo |
| -------------- | ---------- | --------- | --------- | -------- | -------------- |
| Model size     | 0.5 GB     | 0.8 GB    | 1.5 GB    | 2.4 GB   | 1.7 GB         |
| Run 1          | 16.07      | 7.10      | 2.66      | 1.60     | 2.32           |
| Run 2          | 19.87      | 6.79      | 3.16      | -        | -              |
| Run 3          | 16.89      | 8.39      | 2.58      | -        | -              |
| Average        | 17.6 ± 2.0 | 7.4 ± 0.8 | 2.8 ± 0.3 | 1.6 ± NA | 2.3 ± NA       |

Local transcription using a 2020 MacBook Pro (Intel x86, 8GB):

| Faster-Whisper | Base       | Small     | Medium    | Large-v3 | Large-v3-turbo |
| -------------- | ---------- | --------- | --------- | -------- | -------------- |
| Model size     | -          | -         | -         | -        | -              |
| Run 1          | 11.25      | 3.99      | 1.46      | -        | -              |
| Run 2          | 11.28      | 4.06      | 1.49      | -        | -              |
| Run 3          | 11.19      | 4.10      | 1.48      | -        | -              |
| Average        | 11.2 ± 0.1 | 4.1 ± 0.1 | 1.5 ± 0.1 | -        | -              |