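GPU Llama3 is one of the featured AI tools in this issue of AI Skill Hub. It has an overall score of 8.0, which indicates high overall quality. We strongly recommend adding it to your AI toolkit to improve your productivity.
GPU Llama3 is an open-source tool written in Java, focused on GPU, Java, and AI. As an open-source GitHub project, it has an active community and ongoing releases; its code is fully transparent and auditable, and it supports local deployment to protect data privacy. Whether for personal use or integration into enterprise workflows, it offers a stable and reliable solution.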
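
    # Clone the repository
    git clone https://github.com/beehive-lab/GPULlama3.java
    cd GPULlama3.java
    # Read the installation instructions
    cat README.md
    # Once the dependencies listed in the README are installed, the tool is ready to use
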

    # Show help
    ./llama-tornado --help
    # Basic run
    ./llama-tornado --model model.gguf --prompt "..."
    # For detailed usage, see the documentation
    # https://github.com/beehive-lab/GPULlama3.java


    # llama-tornado configuration
    # Settings are passed as command-line options; the full list is in the --help output
    # Common options (defaults shown)
    # --gpu-memory 7GB
    # --heap-min 20g
    # --heap-max 20g
    ./llama-tornado --gpu --gpu-memory 7GB --heap-min 20g --heap-max 20g --model model.gguf --prompt "..."

--interactive and --instruct modes.

A more detailed list of the transformer optimizations implemented in TornadoVM is linked from the project README.

The project roadmap is linked from the README as well.
-----------
Ensure you have the following installed and configured:
You can add GPULlama3.java directly to your Maven project by including the following dependency in your pom.xml:
JDK 21:

    <dependency>
        <groupId>io.github.beehive-lab</groupId>
        <artifactId>gpu-llama3</artifactId>
        <version>0.4.0</version>
    </dependency>

JDK 25:

    <dependency>
        <groupId>io.github.beehive-lab</groupId>
        <artifactId>gpu-llama3</artifactId>
        <version>0.4.0-jdk25</version>
    </dependency>

TORNADOVM_HOME environment variable set (see Setup section above)

| Model Size | Recommended GPU Memory |
|---|---|
| 1B models | 7GB (default) |
| 3-7B models | 15GB |
| 8B models | 20GB+ |
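
The table above can be turned into a small lookup when a launcher script has to choose a `--gpu-memory` value. The following is a minimal sketch, not part of GPULlama3.java: the class and method names are hypothetical, and two choices are assumptions because the table does not cover them. Sizes between 1B and 3B are rounded up to the 15GB tier, and "20GB+" is treated as a 20GB starting point.

```java
// Hypothetical helper: encodes the recommended GPU memory table as a lookup.
final class GpuMemoryHint {

    /** Returns a --gpu-memory value for a model with the given size in billions of parameters. */
    static String recommendedGpuMemory(double billionParams) {
        if (billionParams <= 0) {
            throw new IllegalArgumentException("model size must be positive");
        }
        if (billionParams <= 1.0) {
            return "7GB";   // 1B models (the launcher default)
        }
        if (billionParams <= 7.0) {
            return "15GB";  // 3-7B models; assumption: 1B-3B sizes round up to this tier
        }
        return "20GB";      // 8B models; the table says "20GB+", so this is a lower bound
    }

    public static void main(String[] args) {
        for (double size : new double[] {1.0, 3.0, 8.0}) {
            System.out.println(size + "B -> --gpu-memory " + recommendedGpuMemory(size));
        }
    }
}
```

The returned string can be passed unchanged to the `--gpu-memory` option of the launcher.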
Note: If you still encounter memory issues, try:
-----------
<table style="border: none;">
<tr style="border: none;">
<td style="width: 40%; vertical-align: middle; border: none;">
<img src="docs/ll.gif" >
</td>
<td style="vertical-align: middle; padding-left: 20px; border: none;">
<strong>Llama3</strong> models written in <strong>native Java</strong> automatically accelerated on GPUs with <a href="https://github.com/beehive-lab/TornadoVM" target="_blank"><strong>TornadoVM</strong></a>.
Runs Llama3 inference efficiently using TornadoVM's GPU acceleration.
<br><br>
Currently supports <strong>Llama3</strong>, <strong>Mistral</strong>, <strong>Devstral 2</strong>, <strong>Qwen2.5</strong>, <strong>Qwen3</strong>, <strong>Phi-3</strong>, <strong>IBM Granite 3.2+</strong> and <strong>IBM Granite 4.0</strong> models in the GGUF format.
It is also used as the GPU inference engine in <a href="https://docs.quarkiverse.io/quarkus-langchain4j/dev/gpullama3-chat-model.html" target="_blank">Quarkus</a> and <a href="https://docs.langchain4j.dev/integrations/language-models/gpullama3-java" target="_blank">LangChain4J</a>.
<br><br>
Builds on <a href="https://github.com/mukel/llama3.java">Llama3.java</a> by <a href="https://github.com/mukel">Alfonso² Peterssen</a>.
The previous integration of TornadoVM and Llama2 can be found in <a href="https://github.com/mikepapadim/llama2.tornadovm.java">llama2.tornadovm</a>.
</td>
</tr>
</table>
-----------
Use from catalog:
```bash
curl -Ls https://sh.jbang.dev | bash -s - app setup
jbang app install gpullama3@beehive-lab
gpullama3 -m model.gguf -p "Hello!"
```
Or run the local script:
```bash
jbang LlamaTornadoCli.java -m beehive-llama-3.2-1b-instruct-fp16.gguf --interactive
```
You can run GPULlama3.java fully containerized with GPU acceleration enabled via OpenCL or PTX using pre-built Docker images. More information, as well as examples of running with the containers, is available at docker-gpullama3.java.
| Backend | Docker Image | Pull Command |
|---|---|---|
| **OpenCL** | [beehivelab/gpullama3.java-nvidia-openjdk-opencl](https://hub.docker.com/r/beehivelab/gpullama3.java-nvidia-openjdk-opencl) | docker pull beehivelab/gpullama3.java-nvidia-openjdk-opencl |
| **PTX (CUDA)** | [beehivelab/gpullama3.java-nvidia-openjdk-ptx](https://hub.docker.com/r/beehivelab/gpullama3.java-nvidia-openjdk-ptx) | docker pull beehivelab/gpullama3.java-nvidia-openjdk-ptx |
```bash
docker run --rm -it --gpus all \
  -v "$PWD":/data \
  beehivelab/gpullama3.java-nvidia-openjdk-opencl \
  /gpullama3/GPULlama3.java/llama-tornado \
  --gpu --verbose-init \
  --opencl \
  --model /data/Llama-3.2-1B-Instruct.FP16.gguf \
  --prompt "Tell me a joke"
```
#### Basic Inference
Run a model with a text prompt:

    ./llama-tornado --gpu --verbose-init --opencl --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "Explain the benefits of GPU acceleration."

#### GPU Execution (FP16 Model)
Enable GPU acceleration with an FP16 model:

    ./llama-tornado --gpu --verbose-init --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "tell me a joke"

#### llamaTornado (Java 25 single-file script)
llamaTornado is a zero-dependency Java 25 single-file script that replaces the Python launcher. It requires Java 25+ on your PATH:

    ./llamaTornado --gpu --verbose-init --metal --model /Users/abien/work/workspaces/llms/Mistral-7B-Instruct-v0.3.Q8_0.gguf --prompt "what is java"

-----------
Supported command-line options include:

    cmd ➜ llama-tornado --help
    usage: llama-tornado [-h] --model MODEL_PATH [--prompt PROMPT] [-sp SYSTEM_PROMPT] [--temperature TEMPERATURE] [--top-p TOP_P] [--seed SEED] [-n MAX_TOKENS]
                         [--stream STREAM] [--echo ECHO] [-i] [--instruct] [--gpu] [--opencl] [--ptx] [--gpu-memory GPU_MEMORY] [--heap-min HEAP_MIN] [--heap-max HEAP_MAX]
                         [--debug] [--profiler] [--profiler-dump-dir PROFILER_DUMP_DIR] [--print-bytecodes] [--print-threads] [--print-kernel] [--full-dump]
                         [--show-command] [--execute-after-show] [--opencl-flags OPENCL_FLAGS] [--max-wait-events MAX_WAIT_EVENTS] [--verbose]

    GPU-accelerated LLaMA.java model runner using TornadoVM

    options:
      -h, --help            show this help message and exit
      --model MODEL_PATH    Path to the LLaMA model file (e.g., beehive-llama-3.2-8b-instruct-fp16.gguf) (default: None)

    LLaMA Configuration:
      --prompt PROMPT       Input prompt for the model (default: None)
      -sp SYSTEM_PROMPT, --system-prompt SYSTEM_PROMPT
                            System prompt for the model (default: None)
      --temperature TEMPERATURE
                            Sampling temperature (0.0 to 2.0) (default: 0.1)
      --top-p TOP_P         Top-p sampling parameter (default: 0.95)
      --seed SEED           Random seed (default: current timestamp) (default: None)
      -n MAX_TOKENS, --max-tokens MAX_TOKENS
                            Maximum number of tokens to generate (default: 512)
      --stream STREAM       Enable streaming output (default: True)
      --echo ECHO           Echo the input prompt (default: False)
      --suffix SUFFIX       Suffix for fill-in-the-middle request (Codestral) (default: None)

    Mode Selection:
      -i, --interactive     Run in interactive/chat mode (default: False)
      --instruct            Run in instruction mode (default) (default: True)

    Hardware Configuration:
      --gpu                 Enable GPU acceleration (default: False)
      --opencl              Use OpenCL backend (default) (default: None)
      --ptx                 Use PTX/CUDA backend (default: None)
      --gpu-memory GPU_MEMORY
                            GPU memory allocation (default: 7GB)
      --heap-min HEAP_MIN   Minimum JVM heap size (default: 20g)
      --heap-max HEAP_MAX   Maximum JVM heap size (default: 20g)

    Debug and Profiling:
      --debug               Enable debug output (default: False)
      --profiler            Enable TornadoVM profiler (default: False)
      --profiler-dump-dir PROFILER_DUMP_DIR
                            Directory for profiler output (default: /home/mikepapadim/repos/gpu-llama3.java/prof.json)

    TornadoVM Execution Verbose:
      --print-bytecodes     Print bytecodes (tornado.print.bytecodes=true) (default: False)
      --print-threads       Print thread information (tornado.threadInfo=true) (default: False)
      --print-kernel        Print kernel information (tornado.printKernel=true) (default: False)
      --full-dump           Enable full debug dump (tornado.fullDebug=true) (default: False)
      --verbose-init        Enable timers for TornadoVM initialization (llama.EnableTimingForTornadoVMInit=true) (default: False)

    Command Display Options:
      --show-command        Display the full Java command that will be executed (default: False)
      --execute-after-show  Execute the command after showing it (use with --show-command) (default: False)

    Advanced Options:
      --opencl-flags OPENCL_FLAGS
                            OpenCL compiler flags (default: -cl-denorms-are-zero -cl-no-signed-zeros -cl-finite-math-only)
      --max-wait-events MAX_WAIT_EVENTS
                            Maximum wait events for TornadoVM event pool (default: 32000)
      --verbose, -v         Verbose output (default: False)

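
The `--temperature` and `--top-p` options in the help output are sampling controls. The sketch below illustrates the conventional meaning of those two parameters (temperature-scaled softmax followed by nucleus filtering). It is an assumption-laden illustration, not code from GPULlama3.java: the class and method names are hypothetical, and it requires a temperature greater than zero.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative only: not taken from the GPULlama3.java sources.
final class SamplingSketch {

    /** Converts raw logits to probabilities after dividing them by the temperature. */
    static double[] softmax(double[] logits, double temperature) {
        if (temperature <= 0) {
            throw new IllegalArgumentException("temperature must be > 0 in this sketch");
        }
        double max = Arrays.stream(logits).max().orElseThrow();
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            // Subtracting the maximum keeps Math.exp from overflowing.
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    /** Keeps the most probable tokens until their cumulative mass reaches topP, then renormalizes. */
    static double[] nucleus(double[] probs, double topP) {
        Integer[] order = new Integer[probs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Token indices sorted from most to least probable.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> probs[i]).reversed());
        double[] kept = new double[probs.length];
        double cumulative = 0.0;
        for (int idx : order) {
            kept[idx] = probs[idx];
            cumulative += probs[idx];
            if (cumulative >= topP) {
                break;
            }
        }
        for (int i = 0; i < kept.length; i++) {
            kept[i] /= cumulative;
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.1};
        // A low temperature concentrates probability on the top token; a higher one spreads it out.
        System.out.println(Arrays.toString(nucleus(softmax(logits, 0.1), 0.95)));
        System.out.println(Arrays.toString(nucleus(softmax(logits, 1.0), 0.95)));
    }
}
```

Tokens outside the kept set receive probability zero, so a lower top-p value restricts sampling to fewer candidates.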
View TornadoVM's internal behavior:
```bash
./llama-tornado --gpu --model model.gguf --prompt "..." --print-threads --print-bytecodes --print-kernel
```
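
When the launcher is driven from a Java program rather than a shell, the same options can be assembled as an argument list. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, only flags that appear in the help output are used, and the launcher path is whatever applies on your machine. The sketch prints the command instead of starting it, so it runs without TornadoVM installed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles a llama-tornado invocation from documented flags.
final class LlamaTornadoCommand {

    static List<String> build(String launcher, String modelPath, String prompt,
                              boolean gpu, String gpuMemory, int maxTokens) {
        List<String> cmd = new ArrayList<>();
        cmd.add(launcher);
        if (gpu) {
            cmd.add("--gpu");
        }
        if (gpuMemory != null) {
            cmd.add("--gpu-memory");
            cmd.add(gpuMemory);
        }
        cmd.add("--model");
        cmd.add(modelPath);
        cmd.add("--prompt");
        cmd.add(prompt);
        cmd.add("--max-tokens");
        cmd.add(Integer.toString(maxTokens));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("./llama-tornado",
                "beehive-llama-3.2-1b-instruct-fp16.gguf", "tell me a joke", true, "7GB", 512);
        System.out.println(String.join(" ", cmd));
        // To launch for real: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces.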
Since LangChain4j v1.7.1, GPULlama3.java is officially supported as a model provider. This means you can use GPULlama3.java directly inside your LangChain4j applications, running on your GPU without extra glue code.
📖 Learn more: LangChain4j Documentation
Example agentic workflows with GPULlama3.java + LangChain4j 🚀
How to use:
```java
GPULlama3ChatModel model = GPULlama3ChatModel.builder()
        .modelPath(modelPath)
        .temperature(0.9)       // more creative
        .topP(0.9)              // more variety
        .maxTokens(2048)
        .onGPU(Boolean.TRUE)    // if false, runs on CPU through a lightweight implementation of llama3.java
        .build();
```
git clone https://github.com/beehive-lab/GPULlama3.java.git
#### Install the TornadoVM SDK on Linux or macOS
Ensure that your JAVA_HOME points to a supported JDK before using the SDK. Download an SDK package matching your OS, architecture, and accelerator backend (opencl, ptx).
TornadoVM is distributed through our [**official website**](https://www.tornadovm.org/downloads) and **SDKMAN!**. Install a version that matches your OS, architecture, and accelerator backend.
All TornadoVM SDKs are available on the [SDKMAN! TornadoVM page](https://sdkman.io/sdks/tornadovm/).
#### SDKMAN! Installation (Recommended)
##### Install SDKMAN! if not installed already

    curl -s "https://get.sdkman.io" | bash
    source "$HOME/.sdkman/bin/sdkman-init.sh"
    sdk version

##### Install TornadoVM via SDKMAN!

    sdk install tornadovm

#### Verify TornadoVM is Installed Correctly

    tornado --devices

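
Before running `tornado --devices`, it can help to confirm that the two environment variables this page mentions, JAVA_HOME and TORNADOVM_HOME, are set and point to directories. The following is a minimal sketch of such a preflight check. The class and method names are hypothetical, and the check only verifies that each directory exists, not that it holds a supported JDK or a working SDK.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical preflight check for the environment variables named in the setup steps.
final class TornadoEnvCheck {

    /** Returns one message per variable that is unset or does not point to a directory. */
    static List<String> findProblems(Map<String, String> env) {
        List<String> problems = new ArrayList<>();
        for (String name : List.of("JAVA_HOME", "TORNADOVM_HOME")) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                problems.add(name + " is not set");
            } else if (!Files.isDirectory(Path.of(value))) {
                problems.add(name + " does not point to a directory: " + value);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = findProblems(System.getenv());
        if (problems.isEmpty()) {
            System.out.println("Environment looks ready; next step: tornado --devices");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Taking the environment as a map, rather than reading `System.getenv()` inside the method, keeps the check easy to exercise with sample values.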
To integrate it into your codebase, IDE (e.g., IntelliJ), or custom build system (like Maven or Gradle), use the --show-command flag. This flag shows the exact Java command, with all JVM flags, that is invoked under the hood to enable execution on GPUs with TornadoVM, which makes it simple to replicate or embed those flags in any external tool or codebase.
llama-tornado --gpu --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "tell me a joke" --show-command
<details> <summary>📋 Click to see the JVM configuration </summary>
/home/mikepapadim/.sdkman/candidates/java/current/bin/java \
-server \
-XX:+UnlockExperimentalVMOptions \
-XX:+EnableJVMCI \
-Xms20g -Xmx20g \
--enable-preview \
-Djava.library.path=/home/mikepapadim/manchester/TornadoVM/bin/sdk/lib \
-Djdk.module.showModuleResolution=false \
--module-path .:/home/mikepapadim/manchester/TornadoVM/bin/sdk/share/java/tornado \
-Dtornado.load.api.implementation=uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph \
-Dtornado.load.runtime.implementation=uk.ac.manchester.tornado.runtime.TornadoCoreRuntime \
-Dtornado.load.tornado.implementation=uk.ac.manchester.tornado.runtime.common.Tornado \
-Dtornado.load.annotation.implementation=uk.ac.manchester.tornado.annotation.ASMClassVisitor \
-Dtornado.load.annotation.parallel=uk.ac.manchester.tornado.api.annotations.Parallel \
-Dtornado.tvm.maxbytecodesize=65536 \
-Duse.tornadovm=true \
-Dtornado.threadInfo=false \
-Dtornado.debug=false \
-Dtornado.fullDebug=false \
-Dtornado.printKernel=false \
-Dtornado.print.bytecodes=false \
-Dtornado.device.memory=7GB \
-Dtornado.profiler=false \
-Dtornado.log.profiler=false \
-Dtornado.profiler.dump.dir=/home/mikepapadim/repos/gpu-llama3.java/prof.json \
-Dtornado.enable.fastMathOptimizations=true \
-Dtornado.enable.mathOptimizations=false \
-Dtornado.enable.nativeFunctions=fast \
-Dtornado.loop.interchange=true \
-Dtornado.eventpool.maxwaitevents=32000 \
"-Dtornado.opencl.compiler.flags=-cl-denorms-are-zero -cl-no-signed-zeros -cl-finite-math-only" \
--upgrade-module-path /home/mikepapadim/manchester/TornadoVM/bin/sdk/share/java/graalJars \
@/home/mikepapadim/manchester/TornadoVM/bin/sdk/etc/exportLists/common-exports \
@/home/mikepapadim/manchester/TornadoVM/bin/sdk/etc/exportLists/opencl-exports \
--add-modules ALL-SYSTEM,tornado.runtime,tornado.annotation,tornado.drivers.common,tornado.drivers.opencl \
-cp /home/mikepapadim/repos/gpu-llama3.java/target/gpu-llama3-1.0-SNAPSHOT.jar \
org.beehive.gpullama3.LlamaApp \
-m beehive-llama-3.2-1b-instruct-fp16.gguf \
--temperature 0.1 \
--top-p 0.95 \
--seed 1746903566 \
--max-tokens 512 \
--stream true \
--echo false \
-p "tell me a joke" \
--instruct
</details>
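
To embed these flags in another tool, it is often enough to pull the `-D` system properties out of the printed command. The following is a minimal sketch of that step. The class and method names are hypothetical, and it assumes the command has already been split into individual arguments, with shell quoting resolved, so that a quoted flag such as the OpenCL compiler flags arrives as one string.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects -Dkey=value JVM arguments into an ordered map.
final class JvmFlagExtractor {

    static Map<String, String> systemProperties(List<String> jvmArgs) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String arg : jvmArgs) {
            if (!arg.startsWith("-D")) {
                continue; // not a system property (e.g. -server, --enable-preview)
            }
            int eq = arg.indexOf('=');
            // Split on the first '=' only, so values may themselves contain '='.
            String key = eq < 0 ? arg.substring(2) : arg.substring(2, eq);
            String value = eq < 0 ? "" : arg.substring(eq + 1);
            props.put(key, value);
        }
        return props;
    }

    public static void main(String[] args) {
        // A few arguments copied from the JVM configuration shown above.
        List<String> sample = List.of(
                "-server",
                "--enable-preview",
                "-Duse.tornadovm=true",
                "-Dtornado.device.memory=7GB",
                "-Dtornado.opencl.compiler.flags=-cl-denorms-are-zero -cl-no-signed-zeros -cl-finite-math-only");
        systemProperties(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The resulting map can then be written into, for example, a Maven or Gradle run configuration; the non-property flags, such as the module path and export lists, still need to be carried over separately.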
-----------
The above model can be swapped with one of the other models, such as beehive-llama-3.2-3b-instruct-fp16.gguf or beehive-llama-3.2-8b-instruct-fp16.gguf, depending on your needs. Check models below.
-----------
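A high-performance AI inference tool that is easy to integrate.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute a legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out any necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, requiring only that the copyright and license notice be retained.
Overall, GPU Llama3 performs solidly among AI tools and is of excellent quality. If you already have a clear use case, you can start using it right away; if you are still evaluating, we recommend comparing it with similar tools before deciding.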
| Field | Value |
|---|---|
| Original name | GPULlama3-java |
| Topics | GPU, Java, AI acceleration |
| GitHub | https://github.com/beehive-lab/GPULlama3.java |
| License | MIT |
| Language | Java |
Listed: 2026-06-09 · Updated: 2026-06-09 · License: MIT · AI Skill Hub does not legally endorse the accuracy of third-party content.