transformers-bnlang
onnxruntime-bnlang-এর উপরে Bnlang transformers — pipeline API। দ্রুত LLM generation-এর জন্য optionally onnxruntime-genai-bnlang।
v0.1.0-তে text generation সমর্থিত।
এটি একটি বেসরকারি তৃতীয় পক্ষীয় বাইন্ডিং। Hugging Face বা Microsoft-এর সাথে সম্পর্কিত নয়।
Read this in English — README.en.md
দ্রুত শুরু
import "transformers-bnlang" as bt;
ধরি জেন = bt.পাইপলাইন("text-generation", "./models/Qwen2.5-0.5B-Instruct");
ধরি ফল = জেন.চালান("Hello", {
max_new_tokens: 100,
temperature: 0.7,
top_k: 50,
do_sample: true
});
লিখুন(ফল["generated_text"]);
জেন.বন্ধ_করুন();দুটি engine, একই public API
// Default: নিজেদের লেখা native loop। যেকোনো decoder-only ONNX model
// (Qwen, Llama, Mistral, Phi, …) চলবে। স্ট্যান্ডার্ড HF লেআউট পড়ে
// (config.json, tokenizer.json, onnx/model.onnx)।
bt.পাইপলাইন("text-generation", "./models/Qwen2.5-0.5B-Instruct");
// Fast path: onnxruntime-genai-এর মধ্য দিয়ে (Microsoft-এর specialized LLM
// runtime)। ~১০× দ্রুত, কিন্তু genai_config.json-সহ একটি model dir চাই।
// একই run/chat/close API।
bt.পাইপলাইন("text-generation", "./models/qwen-0.5b-dml-int4",
{ engine: "onnxruntime-genai" });Export-গুলো
| নাম | ধরন |
|---|---|
সংস্করণ |
string |
পাইপলাইন(task, model_dir, options) |
factory → pipeline object |
অটো_টোকেনাইজার.মডেল_থেকে(dir) |
factory → tokenizer |
অটো_টোকেনাইজার.ফাইল_থেকে(path) |
factory → tokenizer |
tok.এনকোড_করুন(text) |
function → list of ids |
tok.ডিকোড_করুন(ids) |
function → string |
tok.বিশেষ_আইডি(name) |
function → int |
tok.বন্ধ_করুন() |
function |
gen.চালান(text, opts) |
function → result map |
gen.আলাপ_করুন(messages, opts) |
function → result map |
gen.বন্ধ_করুন() |
function |
স্থাপত্য_নিবন্ধন(name, descriptor) |
function — নতুন architecture যোগ |
Pipeline option
| Option | ধরন | Default | অর্থ |
|---|---|---|---|
model |
string | onnx/model.onnx |
ONNX file-এর path (relative অথবা absolute) |
config |
string | config.json |
model config-এর path |
tokenizer |
string | tokenizer.json |
tokenizer config-এর path |
architecture |
map | (config থেকে পড়ে) | per-arch descriptor: layer, KV head, EOS, chat template, … |
engine |
string | (our loop) | "onnxruntime-genai" হলে fast path |
execution_providers |
list | ["CPU"] |
ORT EP-গুলোর priority list, যেমন ["DML", "CPU"] |
log_severity_level |
int | 3 |
ORT log verbosity (0=verbose … 4=fatal) |
Run-time option (gen.চালান(text, opts))
| Option | Default | অর্থ |
|---|---|---|
max_new_tokens |
32 | prompt-এর পরে কতগুলো token generate হবে |
do_sample |
false |
true হলে temperature / top-k / top-p sampling চালু |
temperature |
1.0 |
softmax temperature |
top_k |
0 (off) |
শীর্ষ-K candidate-এ সীমাবদ্ধ |
top_p |
1.0 (off) |
nucleus filter |
seed |
0 |
non-zero দিলে sampler-কে reseed |
চ্যাট
ধরি ফল = জেন.আলাপ_করুন([
{ role: "system", content: "তুমি একজন সহায়ক assistant।" },
{ role: "user", content: "বাংলাদেশের রাজধানী কী?" }
], { max_new_tokens: 64 });
লিখুন(ফল["new_text"]);Chat template architecture descriptor থেকে আসে (Qwen-এর জন্য chatml, Llama/Mistral-এর জন্য llama2)। কাস্টম template 0.2-এ আসবে।
সমর্থিত architecture (config.json model_type → descriptor)
model_type |
Layer | KV head | Head dim | EOS | Template |
|---|---|---|---|---|---|
qwen2 |
24 | 2 | 64 | 151643, 151645 | chatml |
llama |
32 | 8 | 128 | 2 | llama2 |
mistral |
32 | 8 | 128 | 2 | llama2 |
নতুন model family যোগ করতে একটি entry — দেখুন lib/architectures.bnl এবং স্থাপত্য_নিবন্ধন(name, descriptor) escape hatch।
লোকাল বিল্ড
# Windows
.\build.ps1 # cmake configure + build -> build/windows-x64/transformers-bnlang.dll# macOS / Linux
./build.shলেআউট
bnl.json manifest (main + targets ম্যাপ)
CMakeLists.txt build config
CMakePresets.json প্রতি platform-এ একটি preset
lib/
index.bnl public API (ইংরেজি + বাংলা re-export)
pipeline.bnl pipeline("text-generation", ...) dispatch + engine switch
tokenizer.bnl AutoTokenizer.from_pretrained / from_file
architectures.bnl Qwen2, Llama, Mistral-এর descriptor (extensible)
chat_template.bnl chatml + llama2 template
generation.bnl KV-cache loop + sampling ("our loop" engine)
src/ C++ source (publish-এ আসে না)
bnl/plugin.h C ABI
main.cpp bnl_load + argmax_last + sample_last native
bpe.{h,cpp} byte-level BPE tokenizer (~700 LOC)
external/json.hpp vendored nlohmann/json single-header
build/<triple>/ cmake output (gitignored)
test/
smoke.bnl dtype + tokenizer round-tripলাইসেন্স
MIT. ORT এবং GenAI prebuilt গুলো Microsoft-এর MIT লাইসেন্সে। তৃতীয় পক্ষীয় attribution NOTICES.md-এ — বিশেষত nlohmann/json statically embed করা আছে।
transformers-bnlang
Bnlang transformers — pipeline API on top of onnxruntime-bnlang (and optionally onnxruntime-genai-bnlang for fast LLM generation).
v0.1.0 supports text generation.
Unofficial third-party binding. Not affiliated with Hugging Face or Microsoft.
এই README-এর বাংলা সংস্করণ — README.md
Quick start
import "transformers-bnlang" as bt;
var gen = bt.pipeline("text-generation", "./models/Qwen2.5-0.5B-Instruct");
var out = gen.run("Hello", {
max_new_tokens: 100,
temperature: 0.7,
top_k: 50,
do_sample: true
});
print(out["generated_text"]);
gen.close();Two engines, one public API
// Default: our hand-rolled native loop. Runs ANY decoder-only ONNX
// model (Qwen, Llama, Mistral, Phi, …). Reads standard HF layout
// (config.json, tokenizer.json, onnx/model.onnx).
bt.pipeline("text-generation", "./models/Qwen2.5-0.5B-Instruct");
// Fast path: routes through onnxruntime-genai (Microsoft's
// specialized LLM runtime). ~10× faster, requires a model dir
// with genai_config.json. Same run/chat/close API.
bt.pipeline("text-generation", "./models/qwen-0.5b-dml-int4",
{ engine: "onnxruntime-genai" });Exports
| Name | Kind |
|---|---|
version |
string |
pipeline(task, model_dir, options) |
factory → pipeline object |
AutoTokenizer.from_pretrained(dir) |
factory → tokenizer |
AutoTokenizer.from_file(path) |
factory → tokenizer |
tok.encode(text) |
function → list of ids |
tok.decode(ids) |
function → string |
tok.special_id(name) |
function → int |
tok.close() |
function |
gen.run(text, opts) |
function → result map |
gen.chat(messages, opts) |
function → result map |
gen.close() |
function |
register_architecture(name, descriptor) |
function — add a new architecture |
Pipeline options
| Option | Type | Default | Meaning |
|---|---|---|---|
model |
string | onnx/model.onnx |
path to the ONNX file (relative or absolute) |
config |
string | config.json |
path to the model config |
tokenizer |
string | tokenizer.json |
path to the tokenizer config |
architecture |
map | (read from config) | per-arch descriptor: layers, KV heads, EOS, chat template, … |
engine |
string | (our loop) | "onnxruntime-genai" to use the fast path |
execution_providers |
list | ["CPU"] |
ORT EPs to try in order, e.g. ["DML", "CPU"] |
log_severity_level |
int | 3 |
ORT log verbosity (0=verbose … 4=fatal) |
Run-time options (gen.run(text, opts))
| Option | Default | Meaning |
|---|---|---|
max_new_tokens |
32 | tokens to generate after the prompt |
do_sample |
false |
true enables temperature / top-k / top-p sampling |
temperature |
1.0 |
softmax temperature |
top_k |
0 (off) |
restrict to top-K candidates |
top_p |
1.0 (off) |
nucleus filter |
seed |
0 |
non-zero reseeds the sampler |
Chat
var out = gen.chat([
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "What is the capital of Bangladesh?" }
], { max_new_tokens: 64 });
print(out["new_text"]);Chat templates are picked from the architecture descriptor (chatml for Qwen, llama2 for Llama / Mistral). Custom templates land in 0.2.
Supported architectures (config.json model_type → descriptor)
model_type |
Layers | KV heads | Head dim | EOS | Template |
|---|---|---|---|---|---|
qwen2 |
24 | 2 | 64 | 151643, 151645 | chatml |
llama |
32 | 8 | 128 | 2 | llama2 |
mistral |
32 | 8 | 128 | 2 | llama2 |
Add a new model family with one entry — see lib/architectures.bnl and the register_architecture(name, descriptor) escape hatch.
Build (local)
# Windows
.\build.ps1 # cmake configure + build -> build/windows-x64/transformers-bnlang.dll# macOS / Linux
./build.shLayout
bnl.json manifest (main + targets map)
CMakeLists.txt build config
CMakePresets.json one preset per platform
lib/
index.bnl public API (English + Bangla re-exports)
pipeline.bnl pipeline("text-generation", ...) dispatch + engine switch
tokenizer.bnl AutoTokenizer.from_pretrained / from_file
architectures.bnl descriptors for Qwen2, Llama, Mistral (extensible)
chat_template.bnl chatml + llama2 templates
generation.bnl KV-cache loop + sampling (the "our loop" engine)
src/ C++ — BPE tokenizer + argmax/sample (excluded from published tarball)
bnl/plugin.h C ABI contract
main.cpp bnl_load + argmax_last + sample_last natives
bpe.{h,cpp} byte-level BPE tokenizer (~700 LOC)
external/json.hpp vendored nlohmann/json single-header
build/<triple>/ cmake output (gitignored)
test/
smoke.bnl dtype + tokenizer round-tripLicense
MIT. Underlying ORT and GenAI prebuilts are MIT-licensed by Microsoft. See NOTICES.md for third-party attribution — in particular nlohmann/json, which is statically embedded.
MIT License
Copyright (c) 2026 Bnlang | Mamun
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to furnish persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Third-party notices — transformers-bnlang
This package contains code from third parties. Two distinct relationships:
Statically embedded — code compiled into our binary
.dll/.so/.dylib. The original license requires we ship the copyright + permission notice alongside the binary. That's the full text below for nlohmann/json.Transitive runtime dependency — code loaded at runtime via the onnxruntime-bnlang plugin (which itself dynamically links to ONNX Runtime). Attribution + upstream pointer is below.
nlohmann/json — statically embedded in the binary
- Project: JSON for Modern C++
- Upstream: https://github.com/nlohmann/json
- Version vendored: 3.11.3 (at
src/external/json.hppin our source tree) - License: MIT
- Copyright: Copyright (c) 2013–2023 Niels Lohmann
- SPDX: MIT
Used by the BPE tokenizer to parse tokenizer.json. Because the library
is header-only, its compiled code ends up inside our shipped binary.
Full license text
MIT License
Copyright (c) 2013-2023 Niels Lohmann
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to furnish persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.ONNX Runtime — transitive (via onnxruntime-bnlang)
- Project: Microsoft ONNX Runtime
- Upstream: https://github.com/microsoft/onnxruntime
- License: MIT
- Copyright: Copyright (c) Microsoft Corporation
This package depends on onnxruntime-bnlang, which downloads and dynamically
links to ORT's prebuilt binaries. We do not bundle ORT itself; see
onnxruntime-bnlang/NOTICES.md for the full attribution.
The full MIT license text is available at https://github.com/microsoft/onnxruntime/blob/main/LICENSE