torch-bnlang
Bnlang binding for libtorch (PyTorch C++).
This is an unofficial third-party binding. It is not affiliated with Meta or the PyTorch Foundation. It uses the official libtorch C++ prebuilt.
Read this in English — README.en.md
Quick start
Run a TorchScript .pt model
import "torch-bnlang" as tch;
লিখুন(tch.সংস্করণ);
লিখুন(tch.ডিভাইসসমূহ());
ধরি মডেল = tch.জিট_মডিউল.খুলুন("model.pt", { device: "cpu" });
ধরি আউট = মডেল.চালান({
"input_ids": { dtype: "int64", shape: [1, 5], data: [...] },
"attention_mask": { dtype: "int64", shape: [1, 5], data: [...] }
});
// আউট["output"] is { dtype, shape, handle }; materialize it with টেনসর_ডেটা:
ধরি ফলাফল = tch.টেনসর_ডেটা(আউট["output"]["handle"]);
tch.টেনসর_বন্ধ_করুন(আউট["output"]["handle"]);
মডেল.বন্ধ_করুন();
Load .safetensors directly (no conversion needed)
import "torch-bnlang" as tch;
ধরি ওজন = tch.load_safetensors("model.safetensors");
// ওজন = { "model.embed_tokens.weight": { dtype, shape, handle }, ... }
ধরি ids = tch.view(tch.arange(0, 7, 1, "int64"), [1, 7]);
ধরি h = tch.embedding(ওজন["model.embed_tokens.weight"], ids);
// ... followed by rms_norm, attention_block, swiglu_mlp, etc.
Full forward-pass examples are in
transformers-bnlang/lib/architectures/qwen2.bnl and llama.bnl.
Exports
Session / module
| Name | Kind |
|---|---|
| `সংস্করণ` | string |
| `ডিভাইসসমূহ()` | function → list of device strings (`"cpu"`, …) |
| `module_load(path, opts)` | function → handle |
| `module_close(handle)` | function |
| `module_info(handle)` | function → schema map |
| `module_run(handle, feeds)` | function → outputs map |
| `জিট_মডিউল.খুলুন(path, opts)` | factory → module object |
Tensor handles
| Name | Kind |
|---|---|
| `tensor_from(spec)` | function → handle map |
| `টেনসর_ডেটা(handle)` | function → list |
| `টেনসর_বন্ধ_করুন(handle)` | function |
| `tensor_argmax_last(handle, vocab)` | function → int |
| `tensor_sample_last(handle, vocab, temp, top_k, top_p, seed)` | function → int |
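The handle pattern in the table above can be illustrated with a small Python sketch (Python rather than Bnlang, purely for illustration). `HandleTable`, `is_open` and the flat-list logits layout are hypothetical names and assumptions, not the plugin's actual internals; `argmax_last` assumes the final `vocab` entries of a flattened logits buffer belong to the last position.

```python
class HandleTable:
    """Hypothetical handle table: integer ids stand in for native tensors."""

    def __init__(self):
        self._next = 1
        self._items = {}

    def put(self, value):
        # Store the value and hand back an opaque integer handle.
        handle = self._next
        self._next += 1
        self._items[handle] = value
        return handle

    def get(self, handle):
        return self._items[handle]

    def is_open(self, handle):
        return handle in self._items

    def close(self, handle):
        # Releasing a handle drops the table's reference to the value.
        self._items.pop(handle, None)


def argmax_last(flat_logits, vocab):
    """Index of the largest logit among the last `vocab` entries (assumed layout)."""
    last = flat_logits[-vocab:]
    return max(range(vocab), key=lambda i: last[i])


table = HandleTable()
h = table.put([0.1, 0.9, 0.0, 0.2, 0.1, 0.7])  # two positions, vocab of 3
next_token = argmax_last(table.get(h), vocab=3)
table.close(h)
```

Keeping data behind a handle means a script can pass tensors between calls without copying them into script-level lists, and only materializes them when it explicitly asks for the data.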
.safetensors loader
| Name | Kind |
|---|---|
| `load_safetensors(path)` | function → { name: handle_map } |
NN ops (for Path B)
| Name | What it does |
|---|---|
| `embedding(weight, indices)` | token lookup |
| `linear(input, weight, bias_or_null)` | matmul + optional bias |
| `matmul(a, b)` | generic matmul |
| `rms_norm(input, weight, eps)` | RMS normalization (fp32-internal) |
| `silu(x)` / `softmax(x, dim)` | activations |
| `scaled_dot_product_attention(q, k, v, mask, is_causal)` | torch's fused SDPA |
| `apply_rotary(q, k, cos, sin)` | RoPE (Llama/Qwen split-half) |
| `cat(tensors, dim)` / `slice(t, dim, s, e)` / `transpose(t, d0, d1)` / `view(t, shape)` | shape ops |
| `repeat_kv(kv, n_rep)` | GQA expansion |
| `add(a, b)` / `mul(a, b)` / `to_dtype(t, dtype)` | elementwise + dtype |
| `arange(start, end, step, dtype)` / `ones(shape, dtype)` / `zeros(shape, dtype)` | tensor creation |
| `build_rope_cache(positions, head_dim, theta)` | precompute cos/sin |
| `attention_block(...)` | fused QKV→reshape→RoPE→KV concat→GQA expand→SDPA→out proj |
| `swiglu_mlp(x, gate_w, up_w, down_w)` | fused SwiGLU MLP |
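To make the "split-half" RoPE convention concrete, here is a minimal pure-Python sketch of what `build_rope_cache` and `apply_rotary` compute mathematically. It follows the common Llama-style formulation (`x * cos + rotate_half(x) * sin`); the helper names `rotate_half` and `apply_rotary_vec`, the per-vector signature and the default `theta` are assumptions for illustration, not the plugin's implementation.

```python
import math


def build_rope_cache(positions, head_dim, theta=10000.0):
    """Return (cos, sin) rows, one per position, each of length head_dim."""
    half = head_dim // 2
    inv_freq = [theta ** (-2.0 * i / head_dim) for i in range(half)]
    cos, sin = [], []
    for p in positions:
        angles = [p * f for f in inv_freq]
        # Split-half layout: the half-width angles are repeated for both halves.
        cos.append([math.cos(a) for a in angles] * 2)
        sin.append([math.sin(a) for a in angles] * 2)
    return cos, sin


def rotate_half(x):
    """Map [x1, x2] (two halves) to [-x2, x1]."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rotary_vec(x, cos_row, sin_row):
    """Rotate one head vector for one position."""
    rot = rotate_half(x)
    return [xv * c + rv * s for xv, rv, c, s in zip(x, rot, cos_row, sin_row)]


cos, sin = build_rope_cache([0, 5], head_dim=4)
q = [1.0, 2.0, 3.0, 4.0]
q_at_0 = apply_rotary_vec(q, cos[0], sin[0])  # position 0 leaves q unchanged
q_at_5 = apply_rotary_vec(q, cos[1], sin[1])  # same length, rotated pairs
```

Each pair `(x[i], x[i + head_dim/2])` is rotated by a position-dependent angle, so the vector's norm is preserved while relative position becomes visible to the attention dot product.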
Threading + process control
| Name | What it does |
|---|---|
| `set_num_threads(n)` / `get_num_threads()` | intra-op parallelism tuning |
| `stdout_write(text)` | stdout write without a trailing newline (equivalent to Node's process.stdout.write) |
| `exit(code)` | immediate process exit (avoids a libtorch thread-pool hang) |
Status (v1.0.0)
Working:
- TorchScript `.pt` loading + forward pass
- Direct `.safetensors` loading (no conversion step)
- Tensor handle table (zero-copy KV-cache reuse for downstream LLM loops)
- Fused `attention_block` + `swiglu_mlp` ops
- bf16 ↔ fp32 auto cast (at load time; faster on CPU)
- Architecture-aware vocab (any vocab size works)
- InferenceMode globally enabled + CPU thread tuning
Coming next:
- More device backends: CUDA, MPS (CMake flag flip + CUDA libtorch)
- More architecture forward passes (Mistral, Gemma, Phi-3, GPT-2)
- Image-gen ops: conv2d, group_norm, interpolate (for SD / SDXL support)
- mmap-based safetensors loading (lower memory for large models)
Local build
# Windows
bnl script/install.bnl   # downloads the libtorch CPU prebuilt into deps/windows-x64/ (~250 MB)
.\build.ps1              # cmake configure + build -> build/windows-x64/*.dll
# macOS / Linux
bnl script/install.bnl
./build.sh
For CUDA support:
# Place a CUDA libtorch into deps/windows-x64/ manually, then:
cmake --preset windows-x64 -DUSE_CUDA=ON
cmake --build build/windows-x64 --config Release
Cross-platform
The plugin is built separately for each platform. At import time the bnl runtime
looks up the artifact path in bnl.json:
| Triple | Build artifact |
|---|---|
| windows-x64 | build/windows-x64/torch-bnlang.dll |
| linux-x64 | build/linux-x64/torch-bnlang.so |
| darwin-arm64 | build/darwin-arm64/torch-bnlang.dylib |
| darwin-x64 | build/darwin-x64/torch-bnlang.dylib |
The runtime DLLs/SOs that ship with libtorch (torch_cpu, c10, fbgemm,
libiomp5md, etc.) are copied next to the plugin; bnl core's
LOAD_WITH_ALTERED_SEARCH_PATH searches those first.
Layout
bnl.json                 manifest (main + native path)
CMakeLists.txt           build config
CMakePresets.json        one preset per platform
lib/
  index.bnl              public API (English + Bangla re-exports)
  jit_module.bnl         JitModule / জিট_মডিউল
src/                     C++ source (not included in the published package)
  bnl/plugin.h           C ABI
  json.hpp               vendored nlohmann/json (single-header)
  main.cpp               bnl_load entry + devices / stdout_write / exit
  module.{h,cpp}         torch::jit::script::Module handle table + forward routing
  tensor.{h,cpp}         bnl_value <-> torch::Tensor + handle table + argmax/sample
  safetensors.{h,cpp}    .safetensors parser
  ops.{h,cpp}            NN op surface (embedding, linear, rms_norm,
                         attention_block, swiglu_mlp, ...)
deps/<triple>/           libtorch prebuilt (gitignored; populated by install)
build/<triple>/          cmake output (gitignored)
script/
  install.bnl            fetch + extract libtorch prebuilt
  install-metadata.bnl   URLs + sha256 per platform
License
MIT. libtorch (BSD-3-Clause), nlohmann/json (MIT); third-party
attribution is in NOTICES.md.
torch-bnlang
Bnlang binding for libtorch (PyTorch C++).
Unofficial third-party binding. Not affiliated with Meta or the PyTorch Foundation. Consumes the official libtorch C++ prebuilt.
Bangla version of this README: README.md
Quick start
Run a TorchScript .pt model
import "torch-bnlang" as tch;
print(tch.version);
print(tch.devices());
var mod = tch.JitModule.open("model.pt", { device: "cpu" });
var out = mod.run({
"input_ids": { dtype: "int64", shape: [1, 5], data: [...] },
"attention_mask": { dtype: "int64", shape: [1, 5], data: [...] }
});
// out["output"] is { dtype, shape, handle } — materialize with tensor_to_data:
var result = tch.tensor_to_data(out["output"]["handle"]);
tch.tensor_close(out["output"]["handle"]);
mod.close();
Load .safetensors directly (no conversion step)
import "torch-bnlang" as tch;
var weights = tch.load_safetensors("model.safetensors");
// weights = { "model.embed_tokens.weight": { dtype, shape, handle }, ... }
var ids = tch.view(tch.arange(0, 7, 1, "int64"), [1, 7]);
var h = tch.embedding(weights["model.embed_tokens.weight"], ids);
// ... continue with rms_norm, attention_block, swiglu_mlp, etc.
Full forward-pass examples live in
transformers-bnlang/lib/architectures/qwen2.bnl and llama.bnl.
Exports
Session / module
| Name | Kind |
|---|---|
| `version` | string |
| `devices()` | function → list of device strings (`"cpu"`, …) |
| `module_load(path, opts)` | function → handle |
| `module_close(handle)` | function |
| `module_info(handle)` | function → schema map |
| `module_run(handle, feeds)` | function → outputs map |
| `JitModule.open(path, opts)` | factory → module object |
Tensor handles
| Name | Kind |
|---|---|
| `tensor_from(spec)` | function → handle map |
| `tensor_to_data(handle)` | function → list |
| `tensor_close(handle)` | function |
| `tensor_argmax_last(handle, vocab)` | function → int |
| `tensor_sample_last(handle, vocab, temp, top_k, top_p, seed)` | function → int |
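The parameters of `tensor_sample_last` (`temp`, `top_k`, `top_p`, `seed`) correspond to a standard sampling pipeline. The Python sketch below shows one common way to combine them; the order of the filters, the treatment of `temp <= 0` as greedy decoding and the seeding scheme are assumptions and may differ from what the plugin does.

```python
import math
import random


def sample_last(logits, temp=1.0, top_k=0, top_p=1.0, seed=0):
    """Sample one token id from the logits of the last position."""
    if temp <= 0:
        # Assumed convention: non-positive temperature means greedy argmax.
        return max(range(len(logits)), key=lambda i: logits[i])

    scaled = [value / temp for value in logits]
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]  # keep only the k most likely tokens

    # Numerically stable softmax over the surviving candidates.
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)

    # Nucleus (top-p) filter: keep the smallest prefix whose mass reaches top_p.
    ids, probs, cumulative = [], [], 0.0
    for i, w in zip(order, weights):
        p = w / total
        ids.append(i)
        probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.choices(ids, weights=probs, k=1)[0]


greedy = sample_last([0.1, 5.0, 0.2], temp=0.0)
sampled = sample_last([1.0, 2.0, 3.0], temp=0.8, top_k=2, top_p=0.9, seed=42)
```

Lower `temp` sharpens the distribution, `top_k` bounds the candidate count and `top_p` bounds the candidate probability mass; passing the same `seed` yields the same token for the same logits.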
.safetensors loader
| Name | Kind |
|---|---|
| `load_safetensors(path)` | function → { name: handle_map } |
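For readers unfamiliar with the file format, a `.safetensors` file is an 8-byte little-endian header length, a JSON header mapping tensor names to `dtype`, `shape` and `data_offsets`, and then the raw tensor bytes. The Python sketch below builds and parses such a blob in memory; `build_safetensors` and `parse_safetensors` are hypothetical helper names, and the sketch illustrates the format only, not the plugin's C++ parser.

```python
import json
import struct


def build_safetensors(tensors):
    """Serialize {name: (dtype, shape, raw_bytes)} into safetensors-style bytes."""
    header, payload = {}, b""
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            # Offsets are relative to the start of the data section.
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def parse_safetensors(blob):
    """Return {name: {dtype, shape, data}} from safetensors-style bytes."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data_start = 8 + header_len
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        begin, end = info["data_offsets"]
        tensors[name] = {
            "dtype": info["dtype"],
            "shape": info["shape"],
            "data": blob[data_start + begin:data_start + end],
        }
    return tensors


raw = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
blob = build_safetensors({"model.embed_tokens.weight": ("F32", [2, 2], raw)})
parsed = parse_safetensors(blob)
```

Because the header fully describes every tensor's byte range, a loader can hand out one handle per tensor name without any conversion step.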
NN ops (for the Path B "torch-native" path)
| Name | What it does |
|---|---|
| `embedding(weight, indices)` | token lookup |
| `linear(input, weight, bias_or_null)` | matmul + optional bias |
| `matmul(a, b)` | generic matmul |
| `rms_norm(input, weight, eps)` | RMS normalization (fp32-internal) |
| `silu(x)` / `softmax(x, dim)` | activations |
| `scaled_dot_product_attention(q, k, v, mask, is_causal)` | torch's fused SDPA |
| `apply_rotary(q, k, cos, sin)` | RoPE (Llama/Qwen split-half) |
| `cat(tensors, dim)` / `slice(t, dim, s, e)` / `transpose(t, d0, d1)` / `view(t, shape)` | shape ops |
| `repeat_kv(kv, n_rep)` | GQA expansion |
| `add(a, b)` / `mul(a, b)` / `to_dtype(t, dtype)` | elementwise + dtype |
| `arange(start, end, step, dtype)` / `ones(shape, dtype)` / `zeros(shape, dtype)` | tensor creation |
| `build_rope_cache(positions, head_dim, theta)` | precompute cos/sin |
| `attention_block(...)` | fused QKV→reshape→RoPE→KV concat→GQA expand→SDPA→out proj |
| `swiglu_mlp(x, gate_w, up_w, down_w)` | fused SwiGLU MLP |
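As a reference for what `rms_norm` and `swiglu_mlp` compute, here is a pure-Python sketch of the usual Llama-style definitions on a single vector. The weight layout (`[out_features][in_features]`, as in torch's `linear`) and the `matvec` helper are assumptions for illustration; the plugin's fused kernels operate on torch tensors, not Python lists.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """y_i = x_i / sqrt(mean(x^2) + eps) * weight_i"""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    """Multiply a [out][in] weight matrix by a vector of length `in`."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_mlp(x, gate_w, up_w, down_w):
    """down( silu(gate(x)) * up(x) ), the SwiGLU feed-forward block."""
    gate = [silu(g) for g in matvec(gate_w, x)]
    up = matvec(up_w, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(down_w, hidden)


identity = [[1.0, 0.0], [0.0, 1.0]]
normed = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
mlp_out = swiglu_mlp([1.0, 2.0], identity, identity, identity)
```

With identity weights the MLP reduces to `silu(x_i) * x_i` per element, which makes the gating structure easy to check by hand.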
Threading + process control
| Name | What it does |
|---|---|
| `set_num_threads(n)` / `get_num_threads()` | intra-op parallelism tuning |
| `stdout_write(text)` | unbuffered no-newline stdout write (Node's process.stdout.write equivalent) |
| `exit(code)` | immediate process exit (avoids libtorch thread-pool hang on Windows) |
Status (v1.0.0)
Working:
- TorchScript `.pt` loading + forward pass
- Direct `.safetensors` loading (no conversion step)
- Tensor handle table (zero-copy KV-cache reuse for downstream LLM loops)
- Fused `attention_block` + `swiglu_mlp` ops
- Auto bf16 → fp32 cast at load (faster CPU matmul on x86 without AVX-512-BF16)
- Architecture-aware vocab size (any vocab works)
- InferenceMode globally enabled + CPU thread tuning
Coming next:
- More device backends: CUDA, MPS (CMake flag flip + CUDA libtorch)
- More architecture forwards (Mistral, Gemma, Phi-3, GPT-2)
- Image-gen ops: conv2d, group_norm, interpolate (for SD / SDXL)
- mmap-based safetensors loading (lower memory for large models)
Build (local)
# Windows
bnl script/install.bnl # download libtorch CPU prebuilt into deps/windows-x64/ (~250 MB)
.\build.ps1              # cmake configure + build -> build/windows-x64/*.dll
# macOS / Linux
bnl script/install.bnl
./build.sh
For CUDA support:
# Drop a CUDA libtorch into deps/windows-x64/ manually, then:
cmake --preset windows-x64 -DUSE_CUDA=ON
cmake --build build/windows-x64 --config Release
Cross-platform packaging
Built per platform; the bnl runtime picks the right artifact from
bnl.json at import time:
| Triple | Build artifact |
|---|---|
| windows-x64 | build/windows-x64/torch-bnlang.dll |
| linux-x64 | build/linux-x64/torch-bnlang.so |
| darwin-arm64 | build/darwin-arm64/torch-bnlang.dylib |
| darwin-x64 | build/darwin-x64/torch-bnlang.dylib |
The libtorch runtime libraries (torch_cpu, c10, fbgemm,
libiomp5md, …) ship alongside the plugin and are loaded via bnl core's
LOAD_WITH_ALTERED_SEARCH_PATH (plugin dir searched first).
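The triple-to-artifact mapping can be pictured with the following Python sketch. It only illustrates the idea of selecting an artifact from the host OS and CPU architecture; `triple_for`, `artifact_for` and the alias tables are hypothetical, and the real lookup is performed by the bnl runtime from bnl.json, whose schema is not shown here.

```python
import platform

# Artifact paths copied from the table above.
ARTIFACTS = {
    "windows-x64": "build/windows-x64/torch-bnlang.dll",
    "linux-x64": "build/linux-x64/torch-bnlang.so",
    "darwin-arm64": "build/darwin-arm64/torch-bnlang.dylib",
    "darwin-x64": "build/darwin-x64/torch-bnlang.dylib",
}

# Assumed aliases for the values Python reports on common hosts.
OS_NAMES = {"Windows": "windows", "Linux": "linux", "Darwin": "darwin"}
ARCH_NAMES = {"amd64": "x64", "x86_64": "x64", "arm64": "arm64", "aarch64": "arm64"}


def triple_for(system, machine):
    """Build an '<os>-<arch>' triple from platform-style strings."""
    os_name = OS_NAMES.get(system)
    arch = ARCH_NAMES.get(machine.lower())
    if os_name is None or arch is None:
        raise ValueError(f"unsupported host: {system} {machine}")
    return f"{os_name}-{arch}"


def artifact_for(system=None, machine=None):
    """Return the artifact path for a host, defaulting to the current one."""
    triple = triple_for(system or platform.system(), machine or platform.machine())
    if triple not in ARTIFACTS:
        raise ValueError(f"no prebuilt artifact for {triple}")
    return ARTIFACTS[triple]


windows_path = artifact_for("Windows", "AMD64")
mac_path = artifact_for("Darwin", "arm64")
```

A host without an entry in the table (for example Linux on arm64) raises an error in this sketch, mirroring the fact that only the four listed triples have build artifacts.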
Layout
bnl.json                 manifest (main + native path)
CMakeLists.txt           build config
CMakePresets.json        one preset per platform
lib/
  index.bnl              public API (English + Bangla re-exports)
  jit_module.bnl         JitModule + session-instance methods
src/                     C++ binding (excluded from published tarball)
  bnl/plugin.h           C ABI contract
  json.hpp               vendored nlohmann/json (single-header)
  main.cpp               bnl_load entry + devices / stdout_write / exit
  module.{h,cpp}         torch::jit::script::Module handle table + forward routing
  tensor.{h,cpp}         bnl_value <-> torch::Tensor + handle table + argmax/sample
  safetensors.{h,cpp}    .safetensors parser
  ops.{h,cpp}            NN op surface (embedding, linear, rms_norm,
                         attention_block, swiglu_mlp, ...)
deps/<triple>/           libtorch prebuilt (gitignored; install populates)
build/<triple>/          cmake output (gitignored)
script/
  install.bnl            download + extract libtorch prebuilt
  install-metadata.bnl   URLs + sha256 per platform
License
MIT. libtorch is BSD-3-Clause (PyTorch project); nlohmann/json is MIT.
See NOTICES.md for third-party attribution.
MIT License
Copyright (c) 2026 Bnlang | Mamun
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to furnish persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Third-party notices — torch-bnlang
This package's binary plugin dynamically links to and consumes prebuilt
binaries from the projects below. They are downloaded at install time by
script/install.bnl from their official distribution channels and remain
governed by their original licenses.
libtorch (PyTorch C++ runtime)
- Project: PyTorch
- Upstream: https://github.com/pytorch/pytorch
- License: BSD 3-Clause
- Copyright: Copyright (c) 2016- Facebook, Inc (Adam Paszke); Copyright (c) 2014- Facebook, Inc (Soumith Chintala); Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert); Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu); Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu); Copyright (c) 2011-2013 NYU (Clement Farabet); Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston); Copyright (c) 2006 Idiap Research Institute (Samy Bengio); Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Used as the inference engine. We download the official libtorch C++
prebuilt (torch_cpu.dll, c10.dll, torch.dll, fbgemm.dll,
libiomp5md.dll, ...) at install time from
https://pytorch.org/get-started/locally/ (the C++/Java tab) and
dynamically link to it from our plugin. We do not redistribute the
libtorch binaries; users fetch them from PyTorch's official download
servers.
The full BSD 3-Clause license text is available at https://github.com/pytorch/pytorch/blob/main/LICENSE
nlohmann/json
- Project: JSON for Modern C++
- Upstream: https://github.com/nlohmann/json
- License: MIT
- Copyright: Copyright (c) 2013-2025 Niels Lohmann
Vendored as src/json.hpp (single-header) and used by safetensors.cpp
to parse the JSON header of .safetensors files. Not redistributed as a
separate binary — compiled inline into torch-bnlang.dll.
The full MIT license text is available at https://github.com/nlohmann/json/blob/develop/LICENSE.MIT
safetensors format
- Project: safetensors (file format specification)
- Upstream: https://github.com/huggingface/safetensors
- License: Apache 2.0 (the reference Rust implementation)
- Copyright: Copyright (c) Hugging Face
Our src/safetensors.cpp is a clean-room re-implementation of the
file-format parser in C++ — no code copied from Hugging Face's Rust
crate. The format itself is an open specification.
Intel OpenMP runtime
- Project: Intel OpenMP (`libiomp5md.dll`)
- Upstream: ships with libtorch's Windows prebuilt
- License: Intel Simplified Software License (as redistributed by PyTorch)
Loaded by libtorch for intra-op parallelism on CPU. Not redistributed by this package directly — it arrives as part of the libtorch zip.