torch-bnlang

Bnlang binding for libtorch (PyTorch C++).

This is an unofficial third-party binding. It is not affiliated with Meta or the PyTorch Foundation. It uses the official libtorch C++ prebuilt.

Read this in English — README.en.md

Quick start

Run a TorchScript `.pt` model

import "torch-bnlang" as tch;

লিখুন(tch.সংস্করণ);
লিখুন(tch.ডিভাইসসমূহ());

ধরি মডেল = tch.জিট_মডিউল.খুলুন("model.pt", { device: "cpu" });

ধরি আউট = মডেল.চালান({
    "input_ids":      { dtype: "int64", shape: [1, 5], data: [...] },
    "attention_mask": { dtype: "int64", shape: [1, 5], data: [...] }
});
// আউট["output"] is { dtype, shape, handle }; materialize it with টেনসর_ডেটা:
ধরি ফলাফল = tch.টেনসর_ডেটা(আউট["output"]["handle"]);
tch.টেনসর_বন্ধ_করুন(আউট["output"]["handle"]);

মডেল.বন্ধ_করুন();

Load `.safetensors` directly (no conversion needed)

import "torch-bnlang" as tch;

ধরি ওজন = tch.load_safetensors("model.safetensors");
// ওজন = { "model.embed_tokens.weight": { dtype, shape, handle }, ... }

ধরি ids = tch.view(tch.arange(0, 7, 1, "int64"), [1, 7]);
ধরি h   = tch.embedding(ওজন["model.embed_tokens.weight"], ids);
// ... followed by rms_norm, attention_block, swiglu_mlp, etc.

Full forward-pass examples are in transformers-bnlang/lib/architectures/qwen2.bnl and llama.bnl.

Exports

Session / module

Name	Kind
`সংস্করণ`	string
`ডিভাইসসমূহ()`	function → list of `"cpu"`
`module_load(path, opts)`	function → handle
`module_close(handle)`	function
`module_info(handle)`	function → schema map
`module_run(handle, feeds)`	function → outputs map
`জিট_মডিউল.খুলুন(path, opts)`	factory → module object

Tensor handles

Name	Kind
`tensor_from(spec)`	function → handle map
`টেনসর_ডেটা(handle)`	function → list
`টেনসর_বন্ধ_করুন(handle)`	function
`tensor_argmax_last(handle, vocab)`	function → int
`tensor_sample_last(handle, vocab, temp, top_k, top_p, seed)`	function → int

`.safetensors` loader

Name	Kind
`load_safetensors(path)`	function → `{ name: handle_map }`

NN ops (for Path B)

Name	What it does
`embedding(weight, indices)`	token lookup
`linear(input, weight, bias_or_null)`	matmul + optional bias
`matmul(a, b)`	generic matmul
`rms_norm(input, weight, eps)`	RMS normalization (fp32-internal)
`silu(x)` / `softmax(x, dim)`	activations
`scaled_dot_product_attention(q, k, v, mask, is_causal)`	torch's fused SDPA
`apply_rotary(q, k, cos, sin)`	RoPE (Llama/Qwen split-half)
`cat(tensors, dim)` / `slice(t, dim, s, e)` / `transpose(t, d0, d1)` / `view(t, shape)`	shape ops
`repeat_kv(kv, n_rep)`	GQA expansion
`add(a, b)` / `mul(a, b)` / `to_dtype(t, dtype)`	elementwise + dtype
`arange(start, end, step, dtype)` / `ones(shape, dtype)` / `zeros(shape, dtype)`	tensor creation
`build_rope_cache(positions, head_dim, theta)`	precompute cos/sin
`attention_block(...)`	fused QKV→reshape→RoPE→KV concat→GQA expand→SDPA→out proj
`swiglu_mlp(x, gate_w, up_w, down_w)`	fused SwiGLU MLP

Threading + process control

Name	What it does
`set_num_threads(n)` / `get_num_threads()`	intra-op parallelism tuning
`stdout_write(text)`	write to stdout without a newline (equivalent to Node's `process.stdout.write`)
`exit(code)`	immediate process exit (avoids the libtorch thread-pool hang)

Status (v1.0.0)

Working:

TorchScript .pt loading + forward pass
Direct .safetensors loading (no conversion step)
Tensor handle table (zero-copy KV-cache reuse for the downstream LLM loop)
Fused attention_block + swiglu_mlp ops
bf16 ↔ fp32 auto cast (at load time; faster on CPU)
Architecture-aware vocab (any vocab size works)
InferenceMode globally enabled + CPU thread tuning

Coming next:

More device backends: CUDA, MPS (CMake flag flip + CUDA libtorch)
More architecture forward passes (Mistral, Gemma, Phi-3, GPT-2)
Image-gen ops: conv2d, group_norm, interpolate (for SD / SDXL support)
mmap-based safetensors loading (lower memory for large models)

Local build

# Windows
bnl script/install.bnl    # downloads the libtorch CPU prebuilt into deps/windows-x64/ (~250 MB)
.\build.ps1                # cmake configure + build  ->  build/windows-x64/*.dll

# macOS / Linux
bnl script/install.bnl
./build.sh

For CUDA support:

# Place a CUDA libtorch into deps/windows-x64/ manually, then:
cmake --preset windows-x64 -DUSE_CUDA=ON
cmake --build build/windows-x64 --config Release

Cross-platform

Each platform is built separately. At import time, the bnl runtime looks up the artifact path in bnl.json:

Triple	Build artifact
`windows-x64`	`build/windows-x64/torch-bnlang.dll`
`linux-x64`	`build/linux-x64/torch-bnlang.so`
`darwin-arm64`	`build/darwin-arm64/torch-bnlang.dylib`
`darwin-x64`	`build/darwin-x64/torch-bnlang.dylib`

The runtime DLLs/SOs that ship with libtorch (torch_cpu, c10, fbgemm, libiomp5md, etc.) are copied next to the plugin; bnl core's LOAD_WITH_ALTERED_SEARCH_PATH looks for them there first.

Layout

bnl.json                 manifest (main + native path)
CMakeLists.txt           build config
CMakePresets.json        one preset per platform

lib/
  index.bnl              public API (English + Bangla re-exports)
  jit_module.bnl         JitModule / জিট_মডিউল

src/                     C++ source (not included when published)
  bnl/plugin.h           C ABI
  json.hpp               vendored nlohmann/json (single-header)
  main.cpp               bnl_load entry + devices / stdout_write / exit
  module.{h,cpp}         torch::jit::script::Module handle table + forward routing
  tensor.{h,cpp}         bnl_value <-> torch::Tensor + handle table + argmax/sample
  safetensors.{h,cpp}    .safetensors parser
  ops.{h,cpp}            NN op surface (embedding, linear, rms_norm,
                         attention_block, swiglu_mlp, ...)

deps/<triple>/           libtorch prebuilt (gitignored; populated by install)
build/<triple>/          cmake output (gitignored)

script/
  install.bnl            libtorch prebuilt fetch + extract
  install-metadata.bnl   URL + sha256 per platform

License

MIT. libtorch (BSD-3-Clause), nlohmann/json (MIT). Third-party attribution is in NOTICES.md.

torch-bnlang

Bnlang binding for libtorch (PyTorch C++).

Unofficial third-party binding. Not affiliated with Meta or the PyTorch Foundation. Consumes the official libtorch C++ prebuilt.

Bangla version of this README: README.md

Quick start

Run a TorchScript `.pt` model

import "torch-bnlang" as tch;

print(tch.version);
print(tch.devices());

var mod = tch.JitModule.open("model.pt", { device: "cpu" });

var out = mod.run({
    "input_ids":      { dtype: "int64", shape: [1, 5], data: [...] },
    "attention_mask": { dtype: "int64", shape: [1, 5], data: [...] }
});
// out["output"] is { dtype, shape, handle } — materialize with tensor_to_data:
var result = tch.tensor_to_data(out["output"]["handle"]);
tch.tensor_close(out["output"]["handle"]);

mod.close();

Load `.safetensors` directly — no conversion step

import "torch-bnlang" as tch;

var weights = tch.load_safetensors("model.safetensors");
// weights = { "model.embed_tokens.weight": { dtype, shape, handle }, ... }

var ids = tch.view(tch.arange(0, 7, 1, "int64"), [1, 7]);
var h   = tch.embedding(weights["model.embed_tokens.weight"], ids);
// ... continue with rms_norm, attention_block, swiglu_mlp, etc.

Full forward-pass examples live in transformers-bnlang/lib/architectures/qwen2.bnl and llama.bnl.

Exports

Session / module

Name	Kind
`version`	string
`devices()`	function → list of `"cpu"`
`module_load(path, opts)`	function → handle
`module_close(handle)`	function
`module_info(handle)`	function → schema map
`module_run(handle, feeds)`	function → outputs map
`JitModule.open(path, opts)`	factory → module object

Tensor handles

Name	Kind
`tensor_from(spec)`	function → handle map
`tensor_to_data(handle)`	function → list
`tensor_close(handle)`	function
`tensor_argmax_last(handle, vocab)`	function → int
`tensor_sample_last(handle, vocab, temp, top_k, top_p, seed)`	function → int
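
The table does not say how `tensor_sample_last` combines `temp`, `top_k`, and `top_p`. One common recipe is temperature scaling, then top-k, then top-p, then a seeded draw. The Python sketch below illustrates that recipe on a plain list of logits; `sample_last` is a hypothetical helper, it omits the `vocab` argument, and the plugin's actual order of operations may differ.

```python
import math
import random


def sample_last(logits, temp=1.0, top_k=0, top_p=1.0, seed=None):
    """Pick a token id from one row of logits (hypothetical, not the plugin's code)."""
    if temp <= 0.0:
        # Assumption: a non-positive temperature means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top-k: keep only the k most likely ids (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Draw from the surviving ids, weighted by their probabilities.
    rng = random.Random(seed)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Passing the same `seed` with the same logits gives the same token, which is useful when comparing runs.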

`.safetensors` loader

Name	Kind
`load_safetensors(path)`	function → `{ name: handle_map }`
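
For readers unfamiliar with the file format: a `.safetensors` file starts with an 8-byte little-endian header length, followed by a JSON header that maps each tensor name to its `dtype`, `shape`, and `data_offsets`, followed by the raw tensor bytes. The Python sketch below shows that layout only; the function names are hypothetical and this is not the C++ parser in src/safetensors.cpp.

```python
import json
import struct


def read_safetensors_header(blob):
    """Return (header dict, offset where tensor bytes start) for raw file bytes."""
    # First 8 bytes: header length as an unsigned little-endian 64-bit integer.
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header, 8 + header_len


def tensor_bytes(blob, header, data_start, name):
    """Slice one tensor's raw bytes; data_offsets are relative to data_start."""
    begin, end = header[name]["data_offsets"]
    return blob[data_start + begin:data_start + end]
```

Because the offsets are relative to the end of the header, a loader can read the header first and then fetch only the tensors it needs.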

NN ops (for the Path B "torch-native" path)

Name	What it does
`embedding(weight, indices)`	token lookup
`linear(input, weight, bias_or_null)`	matmul + optional bias
`matmul(a, b)`	generic matmul
`rms_norm(input, weight, eps)`	RMS normalization (fp32-internal)
`silu(x)` / `softmax(x, dim)`	activations
`scaled_dot_product_attention(q, k, v, mask, is_causal)`	torch's fused SDPA
`apply_rotary(q, k, cos, sin)`	RoPE (Llama/Qwen split-half)
`cat(tensors, dim)` / `slice(t, dim, s, e)` / `transpose(t, d0, d1)` / `view(t, shape)`	shape ops
`repeat_kv(kv, n_rep)`	GQA expansion
`add(a, b)` / `mul(a, b)` / `to_dtype(t, dtype)`	elementwise + dtype
`arange(start, end, step, dtype)` / `ones(shape, dtype)` / `zeros(shape, dtype)`	tensor creation
`build_rope_cache(positions, head_dim, theta)`	precompute cos/sin
`attention_block(...)`	fused QKV→reshape→RoPE→KV concat→GQA expand→SDPA→out proj
`swiglu_mlp(x, gate_w, up_w, down_w)`	fused SwiGLU MLP
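
The "split-half" RoPE convention named in the table pairs element i with element i + head_dim/2, rather than pairing adjacent elements. The Python sketch below shows the arithmetic on a single head vector using plain lists. It borrows the names `build_rope_cache` and `apply_rotary` from the table for readability, but the signatures are simplified assumptions (one vector instead of `q` and `k` tensors) and this is not the plugin's implementation.

```python
import math


def build_rope_cache(positions, head_dim, theta=10000.0):
    """Return (cos, sin), each a list of rows of length head_dim."""
    half = head_dim // 2
    # One inverse frequency per rotated pair: theta^(-2i / head_dim).
    inv_freq = [theta ** (-2.0 * i / head_dim) for i in range(half)]
    cos, sin = [], []
    for pos in positions:
        angles = [pos * f for f in inv_freq]
        # Split-half layout: the same angles cover both halves of the vector.
        cos.append([math.cos(a) for a in angles] * 2)
        sin.append([math.sin(a) for a in angles] * 2)
    return cos, sin


def rotate_half(x):
    """Map [x1, x2] to [-x2, x1], where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rotary(x, cos_row, sin_row):
    """Rotate one head vector: x * cos + rotate_half(x) * sin."""
    rotated = rotate_half(x)
    return [a * c + b * s for a, b, c, s in zip(x, rotated, cos_row, sin_row)]
```

Each pair is rotated by a position-dependent angle, so the vector's length is unchanged and position 0 leaves the vector as it was.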

Threading + process control

Name	What it does
`set_num_threads(n)` / `get_num_threads()`	intra-op parallelism tuning
`stdout_write(text)`	unbuffered no-newline stdout write (Node's `process.stdout.write` equivalent)
`exit(code)`	immediate process exit (avoids libtorch thread-pool hang on Windows)

Status (v1.0.0)

Working:

TorchScript .pt loading + forward pass
Direct .safetensors loading (no conversion step)
Tensor handle table (zero-copy KV-cache reuse for downstream LLM loops)
Fused attention_block + swiglu_mlp ops
Auto bf16 → fp32 cast at load (faster CPU matmul on x86 without AVX-512-BF16)
Architecture-aware vocab size (any vocab works)
InferenceMode globally enabled + CPU thread tuning
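
The "tensor handle table" item can be pictured as a map from small integer handles to live objects: the script side passes handles around, and the native side resolves them without copying tensor data. The Python sketch below shows that general pattern; `HandleTable` and its methods are hypothetical and do not describe the actual C++ table in src/tensor.cpp.

```python
import itertools


class HandleTable:
    """Maps small integer handles to live objects (hypothetical sketch)."""

    def __init__(self):
        self._next = itertools.count(1)  # 0 is reserved as "no handle"
        self._live = {}

    def put(self, obj):
        """Store obj and return a fresh handle for it."""
        handle = next(self._next)
        self._live[handle] = obj
        return handle

    def get(self, handle):
        """Return the stored object itself; no copy is made."""
        return self._live[handle]

    def close(self, handle):
        """Drop the table's reference so the object can be freed."""
        self._live.pop(handle, None)

    def __len__(self):
        return len(self._live)
```

In a generation loop, a KV-cache entry would stay in the table between steps and be closed once it is no longer needed, which is why the quick-start example closes each output handle explicitly.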

Coming next:

More device backends: CUDA, MPS (CMake flag flip + CUDA libtorch)
More architecture forwards (Mistral, Gemma, Phi-3, GPT-2)
Image-gen ops: conv2d, group_norm, interpolate (for SD / SDXL)
mmap-based safetensors loading (lower memory for large models)

Build (local)

# Windows
bnl script/install.bnl    # download libtorch CPU prebuilt into deps/windows-x64/ (~250 MB)
.\build.ps1                # cmake configure + build  ->  build/windows-x64/*.dll

# macOS / Linux
bnl script/install.bnl
./build.sh

For CUDA support:

# Drop a CUDA libtorch into deps/windows-x64/ manually, then:
cmake --preset windows-x64 -DUSE_CUDA=ON
cmake --build build/windows-x64 --config Release

Cross-platform packaging

Built per platform; the bnl runtime picks the right artifact from bnl.json at import time:

Triple	Build artifact
`windows-x64`	`build/windows-x64/torch-bnlang.dll`
`linux-x64`	`build/linux-x64/torch-bnlang.so`
`darwin-arm64`	`build/darwin-arm64/torch-bnlang.dylib`
`darwin-x64`	`build/darwin-x64/torch-bnlang.dylib`

The libtorch runtime libraries (torch_cpu, c10, fbgemm, libiomp5md, …) ship alongside the plugin and are loaded via bnl core's LOAD_WITH_ALTERED_SEARCH_PATH (plugin dir searched first).
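
How the bnl runtime derives the triple is not described here. As an illustration only, the Python sketch below maps an OS name and CPU architecture to the triples in the table above; `triple_for` and `artifact_for` are hypothetical names, and the real lookup goes through bnl.json.

```python
import platform

# Artifact paths copied from the table above.
ARTIFACTS = {
    "windows-x64": "build/windows-x64/torch-bnlang.dll",
    "linux-x64": "build/linux-x64/torch-bnlang.so",
    "darwin-arm64": "build/darwin-arm64/torch-bnlang.dylib",
    "darwin-x64": "build/darwin-x64/torch-bnlang.dylib",
}

# Common spellings of the two CPU architectures that appear in the table.
ARCH_ALIASES = {"amd64": "x64", "x86_64": "x64", "arm64": "arm64", "aarch64": "arm64"}


def triple_for(system, machine):
    """Build a triple from platform.system() / platform.machine() style strings."""
    arch = ARCH_ALIASES.get(machine.lower())
    if arch is None:
        raise ValueError(f"unsupported architecture: {machine}")
    return f"{system.lower()}-{arch}"


def artifact_for(system=None, machine=None):
    """Return the artifact path for the given (or current) platform."""
    triple = triple_for(system or platform.system(), machine or platform.machine())
    if triple not in ARTIFACTS:
        raise LookupError(f"no prebuilt artifact for {triple}")
    return ARTIFACTS[triple]
```

A platform outside the table, such as Linux on arm64, raises `LookupError` instead of falling back to a mismatched binary.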

Layout

bnl.json                 manifest (main + native path)
CMakeLists.txt           build config
CMakePresets.json        one preset per platform

lib/
  index.bnl              public API (English + Bangla re-exports)
  jit_module.bnl         JitModule + session-instance methods

src/                     C++ binding (excluded from published tarball)
  bnl/plugin.h           C ABI contract
  json.hpp               vendored nlohmann/json (single-header)
  main.cpp               bnl_load entry + devices / stdout_write / exit
  module.{h,cpp}         torch::jit::script::Module handle table + forward routing
  tensor.{h,cpp}         bnl_value <-> torch::Tensor + handle table + argmax/sample
  safetensors.{h,cpp}    .safetensors parser
  ops.{h,cpp}            NN op surface (embedding, linear, rms_norm,
                         attention_block, swiglu_mlp, ...)

deps/<triple>/           libtorch prebuilt (gitignored; install populates)
build/<triple>/          cmake output (gitignored)

script/
  install.bnl            download + extract libtorch prebuilt
  install-metadata.bnl   URLs + sha256 per platform
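
Since install-metadata.bnl records a sha256 per platform, the install step presumably checks the downloaded archive against it. The Python sketch below shows one way to do such a check; `sha256_of` and `verify_archive` are hypothetical names, and the actual logic in script/install.bnl is not shown in this README.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large archive never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_sha256):
    """Raise ValueError if the file's digest differs from the expected one."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"sha256 mismatch for {path}: got {actual}")
    return actual
```

Verifying before extraction means a truncated or tampered download fails early, before anything lands in deps/<triple>/.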

License

MIT. libtorch is BSD-3-Clause (PyTorch project); nlohmann/json is MIT. See NOTICES.md for third-party attribution.

MIT License

Copyright (c) 2026 Bnlang | Mamun

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to furnish persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Third-party notices — torch-bnlang

This package's binary plugin dynamically links to and consumes prebuilt binaries from the projects below. They are downloaded at install time by script/install.bnl from their official distribution channels and remain governed by their original licenses.

libtorch (PyTorch C++ runtime)

Project: PyTorch
Upstream: https://github.com/pytorch/pytorch
License: BSD 3-Clause
Copyright: Copyright (c) 2016- Facebook, Inc (Adam Paszke); Copyright (c) 2014- Facebook, Inc (Soumith Chintala); Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert); Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu); Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu); Copyright (c) 2011-2013 NYU (Clement Farabet); Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston); Copyright (c) 2006 Idiap Research Institute (Samy Bengio); Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)

Used as the inference engine. We download the official libtorch C++ prebuilt (torch_cpu.dll, c10.dll, torch.dll, fbgemm.dll, libiomp5md.dll, ...) at install time from https://pytorch.org/get-started/locally/ (the C++/Java tab) and dynamically link to it from our plugin. We do not redistribute the libtorch binaries; users fetch them from PyTorch's official download servers.

The full BSD 3-Clause license text is available at https://github.com/pytorch/pytorch/blob/main/LICENSE

nlohmann/json

Project: JSON for Modern C++
Upstream: https://github.com/nlohmann/json
License: MIT

Vendored as src/json.hpp (single-header) and used by safetensors.cpp to parse the JSON header of .safetensors files. Not redistributed as a separate binary — compiled inline into torch-bnlang.dll.

The full MIT license text is available at https://github.com/nlohmann/json/blob/develop/LICENSE.MIT

safetensors format

Project: safetensors (file format specification)
Upstream: https://github.com/huggingface/safetensors
License: Apache 2.0 (the reference Rust implementation)
Copyright: Copyright (c) Hugging Face

Our src/safetensors.cpp is a clean-room re-implementation of the file-format parser in C++ — no code copied from Hugging Face's Rust crate. The format itself is an open specification.

Intel OpenMP runtime

Project: Intel OpenMP (libiomp5md.dll)
Upstream: ships with libtorch's Windows prebuilt
License: Intel Simplified Software License (as redistributed by PyTorch)

Loaded by libtorch for intra-op parallelism on CPU. Not redistributed by this package directly — it arrives as part of the libtorch zip.
