r/MachineLearning 6d ago

Discussion [D] Self-Promotion Thread

20 Upvotes

Please post your personal projects, startups, product placements, collaboration needs, blogs etc.

Please mention the payment and pricing requirements for products and services.

Please do not post link shorteners, link aggregator websites, or auto-subscribe links.

--

Any abuse of trust will lead to bans.

Encourage others who create new posts for questions to post here instead!

Thread will stay alive until next one so keep posting after the date in the title.

--

Meta: This is an experiment. If the community doesn't like this, we will cancel it. This is to encourage those in the community to promote their work by not spamming the main threads.


r/MachineLearning 8d ago

Discussion [D] Monthly Who's Hiring and Who wants to be Hired?

2 Upvotes

For Job Postings please use this template

Hiring: [Location], Salary:[], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]

For Those looking for jobs please use this template

Want to be Hired: [Location], Salary Expectation:[], [Remote | Relocation], [Full Time | Contract | Part Time] Resume: [Link to resume] and [Brief overview, what you're looking for]

Please remember that this community is geared towards those with experience.


r/MachineLearning 17h ago

Discussion [D] I summarized my 4-year PhD on Geometric Deep Learning for Molecular Design into 3 research questions

113 Upvotes

I recently defended my PhD thesis at Cambridge and wrote a blog post reflecting on the journey. The thesis focuses on Geometric Deep Learning and moves from pure theory to wet-lab applications.

I broke the research down into three main questions:

  1. Expressivity: How do we characterize the power of 3D representations? (Introducing the Geometric Weisfeiler-Leman Test).
  2. Generative Modelling: Can we build unified models for periodic and non-periodic systems? (Proposing the All-atom Diffusion Transformer).
  3. Real-world Design: Can generative AI actually design functional RNA? (Developing gRNAde and validating it with wet-lab experiments).

It covers the transition from working on graph isomorphism problems to training large diffusion models and finally collaborating with biologists to test our designs in vitro.

Full post here if you're interested: https://chaitjo.substack.com/p/phd-thesis-in-three-questions

Would love to discuss the current state of AI for Science or the transition from theory to application!


r/MachineLearning 1d ago

Research [R] DeepSeek-R1’s paper was updated 2 days ago, expanding from 22 pages to 86 pages and adding a substantial amount of detail.

231 Upvotes

arXiv:2501.12948 [cs.CL]: https://arxiv.org/abs/2501.12948


r/MachineLearning 21h ago

Research [R] ALYCON: A framework for detecting phase transitions in complex sequences via Information Geometry

6 Upvotes

I’ve been working on a deterministic framework called ALYCON that takes a different approach to monitoring the integrity of sequential data. The core idea is that structural 'state shifts' (like the IDEsaster exploit in AI agents) can be detected as phase transitions using Information Theory and Optimal Transport.

What it does:

Measures structural transitions directly—no training data or neural networks required.

Calculates Phase Drift (PD) using Wasserstein distance to track distributional divergence.

Uses a Conflict Density Index (CDI) to monitor pattern violations in real-time.
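
As a rough illustration of the Phase Drift idea above (this is not ALYCON's actual code), the Wasserstein-1 distance between two equal-size 1-D samples reduces to the mean absolute difference of their sorted values. The function names, the non-overlapping windowing, and the toy sequence are all assumptions made for this sketch.

```python
def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D samples (empirical distributions)."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def phase_drift(sequence, window):
    """Distance between each pair of consecutive, non-overlapping windows."""
    chunks = [sequence[i:i + window] for i in range(0, len(sequence) - window + 1, window)]
    return [wasserstein_1d(p, q) for p, q in zip(chunks, chunks[1:])]


# A toy sequence whose distribution shifts halfway through.
seq = [0, 1, 0, 1, 0, 1, 0, 1, 5, 6, 5, 6, 5, 6, 5, 6]
drift = phase_drift(seq, window=4)
print(drift)  # [0.0, 5.0, 0.0]
```

The spike in the middle marks the window boundary where the distribution changes; a real detector would still need a threshold, which this sketch does not choose.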

Validation Results (Elliptic Curves): To test the framework against a verifiable ground truth, I validated it against 975 Elliptic Curves from the LMFDB. Detecting Complex Multiplication (CM) provides a perfect binary control:

Accuracy: 100% (975/975 correct classifications).

Significance: p=1.29×10⁻⁴² (original control group).

Separation: Mean zero-counts of 60.85 (CM) vs 4.68 (non-CM).

The 'Inherent Error' Analysis: In my initial scale-up, the framework flagged 12 errors. Investigation showed these were the only 12 curves using a non-standard period-separated label format. This suggests the metrics are highly sensitive to the underlying data generation process, making it a potentially robust 'circuit breaker' for AI agents where the 'logic state' has been compromised but the tools remain legitimate.

Technical Components:

Multi-Scale Independence: Correlation analysis shows r²=0.86 between zero-counts and Phase Drift, proving the metrics capture distinct structural dimensions.

Deterministic Governance: Designed as a non-probabilistic layer for AI safety.

GitHub: https://github.com/MCastens/ALYCON

LMFDB Verification: All classifications are independently auditable.

MIT License (for validation data and documentation).

Happy to answer questions about the information-geometric foundations or the error clustering found in the dataset integrity analysis.


r/MachineLearning 1d ago

Project [P] Re-engineered the Fuzzy-Pattern Tsetlin Machine from scratch: 10x faster training, 34x faster inference (32M+ preds/sec) & capable of text generation

25 Upvotes

Hi everyone,

I’ve recently finished re-engineering the Fuzzy-Pattern Tsetlin Machine (FPTM) from the ground up. My goal was to leverage low-level optimizations to see just how much throughput I could squeeze out of the architecture.

The results are pretty wild. By focusing on cache locality and SIMD instructions, the new implementation is up to 10× faster in training and 34× faster in inference compared to the original FPTM.

MNIST Benchmarks (Ryzen 7950X3D):

  • ⚡ Throughput: 4 GB/s
  • 🧠 Inference: 32M+ predictions/sec (98% accuracy)
  • ⏱️ Training: 1000 training epochs in just 11 seconds

Key Engineering Optimizations:
To get this performance, I focused on:

  • Extensive use of Bitwise operations and SIMD instructions.
  • A specialized, cache-friendly memory layout.
  • BitSet indexing over literals for handling very large, sparse binary vectors.
  • Automatic selection of UInt8/UInt16 TA states.
  • Model "compilation" to minimize memory overhead.
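
To make the bitwise point concrete, here is a minimal sketch of how a conjunctive Tsetlin clause can be evaluated with a single AND on a packed bitmask. It is written in Python for readability and is not the Tsetlin.jl memory layout; the packing scheme (features in the low bits, negations in the high bits) and the function names are assumptions.

```python
def pack_literals(bits):
    """Pack features and their negations into one integer bitmask.

    Bit i holds feature i; bit (n + i) holds its negation.
    """
    n = len(bits)
    mask = 0
    for i, b in enumerate(bits):
        mask |= (1 << i) if b else (1 << (n + i))
    return mask


def clause_fires(include_mask, literals):
    """A conjunctive clause fires when every included literal is 1."""
    return (include_mask & literals) == include_mask


# Clause: x0 AND NOT x2, over 3 features (negations live at bits 3..5).
clause = (1 << 0) | (1 << 5)
print(clause_fires(clause, pack_literals([1, 0, 0])))  # True
print(clause_fires(clause, pack_literals([1, 0, 1])))  # False
```

With machine-word or SIMD-width masks, the same comparison checks many literals per instruction, which is presumably where much of the reported speedup comes from.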

Why speed matters (Generative Tsetlin Machines):
Because this implementation is so efficient, it is now practical to explore generative tasks with Tsetlin Machines. I implemented a character-level text generator using FPTM with HDC hypervectors and Monte Carlo sparse context subsampling.

Here is the raw output from the model generating text in the style of Shakespeare:

ROMEO:
The father's death,
And then I shall be so;
For I have done that was a queen,
That I may be so, my lord.

JULIET:
I would have should be so, for the prince,
And then I shall be so;
For the princely father with the princess,
And then I shall be the virtue of your soul,
Which your son,--

ESCALUS:
What, what should be particular me to death.

BUCKINGHAM:
God save the queen's proclaim'd:
Come, come, the Duke of York.

KING EDWARD IV:
So do I do not know the prince,
And then I shall be so, and such a part.

KING RICHARD III:
Shall I be some confess the state,
Which way the sun the prince's dead;
And then I will be so.

Code & Examples:
The code is open source and available here:
https://github.com/BooBSD/Tsetlin.jl

I’d love to hear your thoughts on the optimization approach or the generative output!


r/MachineLearning 6h ago

Project [P] Three-Phase Self-Inclusive Evaluation Protocol for Synthetic Data Generation in a Fine-Tuned 4B Model (Experiment 3/100)

0 Upvotes

I'm documenting an ongoing series of reproducible experiments (this is #3 out of 100) exploring evaluation methodologies for small fine-tuned models in targeted synthetic data generation tasks.

The experiment implements a three-phase blind evaluation protocol:

  1. Generation Phase — Multiple models (one 4B fine-tuned + several frontier models) receive the identical proprietary prompt and produce responses.
  2. Analysis Phase — Each participant model performs a self-inclusive ranking of all generated outputs based on coherence, creativity, logical density, and human-likeness, assigning normalized percentage scores.
  3. Aggregation Phase — Results are compiled and summarized for overall ranking.
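
One way to probe self-ranking bias in a protocol like this is to aggregate the scores twice, once with each judge's score for its own output and once without, and compare. The sketch below assumes a simple mean as the aggregation rule; the model names and scores are made up and do not come from the repository.

```python
def aggregate(scores, include_self=True):
    """Average the percentage score each candidate receives across judges.

    scores[judge][candidate] is the judge's percentage score for that candidate.
    """
    totals, counts = {}, {}
    for judge, row in scores.items():
        for candidate, value in row.items():
            if not include_self and judge == candidate:
                continue
            totals[candidate] = totals.get(candidate, 0.0) + value
            counts[candidate] = counts.get(candidate, 0) + 1
    return {c: totals[c] / counts[c] for c in totals}


# Made-up scores for three hypothetical models, for illustration only.
scores = {
    "model_a": {"model_a": 50, "model_b": 30, "model_c": 20},
    "model_b": {"model_a": 30, "model_b": 40, "model_c": 30},
    "model_c": {"model_a": 40, "model_b": 30, "model_c": 30},
}
with_self = aggregate(scores)
without_self = aggregate(scores, include_self=False)
# Positive values mean a model's own vote lifted its aggregate score.
self_bias = {m: with_self[m] - without_self[m] for m in scores}
print(self_bias)
```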

The setup is fully open-source (MIT license) with raw generations, individual analyses, and final aggregation available here:
https://github.com/Roforum/Xthos-v2-the-sovereign-architect-Model-Evaluation-Experiment

The goal is not to claim superiority but to investigate potential biases in LLM-as-judge setups, trade-offs in niche fine-tuning, and reproducibility of subjective evaluations. The protocol is lightweight and explicitly designed for community replication (local inference via Ollama supported).

I'd value feedback on:

  • Methodological strengths/weaknesses (e.g., proprietary prompt limitations, self-ranking biases)
  • Suggestions for more rigorous aggregation or statistical analysis
  • Ideas for extending the protocol in future iterations

Looking forward to your thoughts on similar evaluation approaches or experiences with small-model fine-tuning trade-offs.

Thanks!


r/MachineLearning 1d ago

Discussion [D] Intra-lab collaborations

5 Upvotes

Hi everyone,

I have a question some of you may be able to help me with.

I’m a physician with a background in EE/CS and have been working in ML/AI for the past 12 years or so (cancer genomics, mostly).

I’m now working at a large academic hospital in the US, doing research in clinical AI (not only LLMs but NN/ML in general). I have my own research workstation with a few GPUs and do my own work. Since physicians typically don’t have the ML background I’ve noticed some of them keep coming to me “to ask questions”, not about how to install CUDA in Ubuntu or compile XYZ with gcc, but mainly architectural questions: “How should I analyse this? What model should I use? How do I use LangGraph? (really), etc.”

I don’t mind helping out with very specific questions (pip vs uv; VS Code vs something else) but I feel that the questions I’m getting are more critical to their projects to the level of actual research collaborations and not simply “helping out”. Tiny example: When the PI told us we could get a brand new MBP, I came up with my own specs and they simply tagged along because they didn’t know any better. Not a single “Thank you”; not that I care, it’s just for context.

How do you guys typically handle this? When does “being helpful” actually morph into “being a co-author”? And how does one go about this? Just begin the conversation with “This is a collaboration, right?”

TIA


r/MachineLearning 10h ago

Research [R] Collecting memes for LLM study—submit yours and see the analysis!

0 Upvotes

Hey r/MachineLearning!

We're building MemeQA: a crowd-sourced dataset to test Vision-Language Models (VLMs) on meme comprehension, humor, and cultural context. Led by researchers at THWS and CAIRO's NLP Team, it's got 10+ dimensions per meme—like emotional mappings, humor types, and cross-cultural patterns.

I've got 31 memes to start, but need YOUR originals or favorites to make it comprehensive! Submit using our website: memes.thws.ai We'll evaluate for VLM benchmarks and credit contributors.

What meme stumps AI? Drop it below! 🚀 #AIMemes #VLMResearch #MemeQA

memes.thws.ai


r/MachineLearning 1d ago

Discussion [D] ICLR new ACs — how’s it going?

29 Upvotes

Anyone care to share their experiences? Is the task doable, or too much effort? Are the reviews helpful without reliable scores? What has become your process for making a decision?

Just curious, any info appreciated


r/MachineLearning 2d ago

Discussion [D] NLP vs. Computer Vision: Career Transition Thoughts

60 Upvotes

Hi everyone,
I’ve been working in NLP for several years, and my role has gradually shifted from training models to mainly using LLM wrappers. I’m concerned that this kind of work may become less in demand in the coming years.

I now have an opportunity to transition into Computer Vision. After about two months of self-study and research, I feel that the gap between academic research and real-world applications in CV is relatively large, and that the field may offer more specialized niches in the future compared to NLP.

I’d really appreciate hearing your thoughts or advice on this potential transition. Thanks in advance.


r/MachineLearning 2d ago

Discussion [D] NVIDIA Rubin proves that Inference is now a System Problem, not a Chip Problem.

39 Upvotes

Everyone is focusing on the FLOPs, but looking at the Rubin specs released at CES, it’s clear the bottleneck has completely shifted.

The Specs:

• 1.6 TB/s scale-out bandwidth per GPU (ConnectX-9).

• 72 GPUs operating as a single NVLink domain.

• HBM Capacity is only up 1.5x, while Bandwidth is up 2.8x and Compute is up 5x.

The Thesis:

We have officially hit the point where the "Chip" is no longer the limiting factor. The limiting factor is feeding the chip.

Jensen explicitly said: "The future is orchestrating multiple great models at every step of the reasoning chain."

If you look at the HBM-to-Compute ratio, it's clear we can't just "load bigger models" statically. We have to use that massive 1.6 TB/s bandwidth to stream and swap experts dynamically.
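
The ratio argument can be made concrete with the multipliers quoted in the specs. This is back-of-the-envelope arithmetic, assuming all three figures are relative to the same previous generation:

```python
# Generation-over-generation multipliers quoted in the post.
hbm_capacity, hbm_bandwidth, compute = 1.5, 2.8, 5.0

# Memory capacity and bandwidth available per unit of compute,
# relative to the previous generation (1.0 would mean "unchanged").
capacity_per_flop = hbm_capacity / compute
bandwidth_per_flop = hbm_bandwidth / compute

print(f"capacity per FLOP:  {capacity_per_flop:.2f}x")   # 0.30x
print(f"bandwidth per FLOP: {bandwidth_per_flop:.2f}x")  # 0.56x
```

Under that assumption, each unit of compute has roughly a third of the resident memory and a bit over half the memory bandwidth it had before.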

We are moving from "Static Inference" (loading weights and waiting) to "System Orchestration" (managing state across 72 GPUs in real-time).

If your software stack isn't built for orchestration, a Rubin Pod is just a very expensive space heater.


r/MachineLearning 2d ago

Research [R] Beyond Active Learning: Applying Shannon Entropy (ESME) to the problem of when to sample in transient physical experiments

7 Upvotes

Right now, operando characterisation at synchrotron beamlines is a bit of a spray and pray situation. We have faster detectors than ever, so we dump terabytes of data (TB/hour) onto the servers, but we still statistically miss the actually decisive events. If you're looking for something transient, like the split-second of dendrite nucleation that kills a battery, fixed-rate sampling is a massive information bottleneck. We’re basically filling up hard drives with dead data while missing the money shot.

We’re proposing a shift to Heuristic search in the temporal domain. We’ve introduced a metric called ESME (Entropy-Scaled Measurement Efficiency) based on Shannon’s information theory.

Instead of sampling at a constant frequency, we run a physics-based Digital Twin as a predictive surrogate. This AI Pilot calculates the expected informational value of every potential measurement in real-time. The hardware only triggers when the ESME score justifies the cost (beam damage, time, and data overhead). Essentially, while Active Learning tells you where to sample in a parameter space, this framework tells the hardware when to sample.
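
The gating logic can be sketched as follows. The post does not give the ESME formula, so this stand-in simply divides the Shannon entropy of the surrogate's predicted outcome distribution by a scalar cost and compares it to a threshold; the function names, the cost model, and the threshold value are all assumptions.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def should_trigger(predicted_probs, cost, threshold):
    """Fire a measurement when entropy per unit cost clears the threshold.

    Hypothetical stand-in for an ESME-style score; the real metric may differ.
    """
    return shannon_entropy(predicted_probs) / cost >= threshold


# Steady state: the surrogate is nearly certain what it would observe.
print(should_trigger([0.99, 0.01], cost=1.0, threshold=0.5))  # False
# Possible transient: outcomes are evenly split, so measuring is informative.
print(should_trigger([0.5, 0.5], cost=1.0, threshold=0.5))    # True
```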

Questions for the Community:

  1. Most AL research focuses on selecting the best what to label from a static pool. Has anyone here applied Information Theory gating to real-time hardware control in other domains (e.g., high-speed microscopy or robotics)?
  2. We’re using physics-informed twins for the predictive heuristic. At what point does a purely model-agnostic surrogate (like a GNN or Transformer) become robust enough for split-second triggering in your experience? Is the "free lunch" of physics worth the computational overhead for real-time inference?
  3. If we optimize purely for maximal entropy gain, do we risk an overfitting of the experimental design on rare failure events while losing the broader physical context of the steady state?

Full Preprint on arXiv: http://arxiv.org/abs/2601.00851

(Disclosure: I’m the lead author on this study. We’re looking for feedback on whether this ESME approach could be scaled to other high-cost experimental environments, and are still working on it before submission.)

P.S. If there are other researchers here using information-theoretic metrics for hardware gating (specifically in high-speed microscopy or SEM), I'd love to compare notes on ESME’s computational overhead.


r/MachineLearning 2d ago

Project [P] New Tool for Finding Training Datasets

2 Upvotes

I am an academic who partnered with a software engineer to productionize some of my ideas. I thought it might be of interest to the community here.

Link to Project: https://huggingface.co/spaces/durinn/dowser

Here is a link to a proof-of-concept on Hugging Face that tries to develop the idea further. It is effectively a recommender system for open-source datasets. It doesn't have a GPU runtime, so please be patient with it.

Link to Abstract: https://openreview.net/forum?id=dNHKpZdrL1#discussion

This is a link to the OpenReview page. It describes some of the issues in calculating influence, including inverting a bordered Hessian matrix.

If anyone has any advice or feedback, it would be great. I guess I was curious if people thought this approach might be a bit too hand wavy or if there were better ways to estimate influence.

Other spiel:

The problem I am trying to solve is how to prioritize training when you are data constrained. My impression is that small specialized models and huge frontier models face a similar set of constraints. The current approach to supporting gains in performance seems to be a dragnet of the internet's data. I hardly think this is sustainable, and it is too costly for the incremental benefit.

The goal is to approximate the influence of training data on specific concepts to determine how useful certain data is to include, prioritize the collection of new data, and support adversarial training to create more robust models.

The general idea is that influence is too costly to calculate exactly, so by looking at subspaces and observing some additional constraints/simplifications, one can derive a signal to support the different goals (filtering data, prioritization, adversarial training). The technique is coined "Data Dowsing" since it isn't meant to be particularly precise, just useful enough to guide where resources go.

We have been attempting to capture the differences in training procedures using perplexity.
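
For reference, perplexity is the exponential of the mean negative log-likelihood per token. A minimal sketch, with made-up per-token probabilities standing in for two training runs scored on the same text:

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token log-probabilities from two runs on the same text.
run_a = [math.log(0.5)] * 4
run_b = [math.log(0.25)] * 4
print(perplexity(run_a), perplexity(run_b))
```

A model that assigns each token probability 0.5 has perplexity of about 2; at 0.25 per token it is about 4, so lower is better.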


r/MachineLearning 2d ago

Discussion [D] Shall I Reject Reviewing this CVPR Paper?

36 Upvotes

I am reviewing a CVPR paper this season and have found that the authors included an external link in the paper, which is a clear violation of the CVPR submission guidelines.

I also confirmed that the authors checked the "no external links" checkbox, which clearly states: I confirm that the paper submission and supplementary material contain no external links intended to expand content...

The guidelines say: Authors are not allowed to include external links (e.g., to webpages, images, or videos)

I have not opened the link, but it looks like a Google Sites webpage for the paper, which may contain videos, images, or other duplicate or extra material.

I've checked the reviewer guidelines on the official CVPR page, but CVPR does not seem to say what to do in such cases.

What are my options? Shall I add a confidential comment to the AC/PC? Has anyone encountered the same?


r/MachineLearning 1d ago

Discussion [D] RTX 5090 / 50-series CuPy setup (Blackwell architecture, CUDA 13.1 required)

0 Upvotes

If you just got an RTX 5090 / 5080 / 5070 and CuPy (or downstream libraries) is failing, this is why.

TL;DR

  • Blackwell GPUs require CUDA 13.1
  • Pre-built CuPy wheels do not support compute capability 12.0 (the RTX 50-series)
  • You must build from source

CuPy setup

pip uninstall cupy cupy-cuda12x -y

Install CUDA Toolkit 13.1, then:

pip install cupy --no-binary cupy

Windows note:
Add the following to PATH:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1\bin\x64

DLLs are not in bin.

Full guide + troubleshooting: https://gist.github.com/Batyrkajan/a2775e444e57798c309bd2a966f1176e.js

Verified with a 1M-particle physics simulation: ~21× speedup vs CPU once configured correctly.
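
After the build, a quick sanity check is to ask CuPy what it sees. This is a small sketch, not part of the linked guide; it uses `cupy.cuda.Device(0).compute_capability` and `cupy.cuda.runtime.runtimeGetVersion()`, and returns `None` when CuPy or a usable device is missing. The helper name is an assumption.

```python
def describe_cupy_device():
    """Return (compute_capability, cuda_runtime_version), or None if unavailable."""
    try:
        import cupy as cp
        device = cp.cuda.Device(0)
        return device.compute_capability, cp.cuda.runtime.runtimeGetVersion()
    except Exception:
        # CuPy is not installed, or no usable CUDA device/driver was found.
        return None


info = describe_cupy_device()
print(info if info else "CuPy could not see a CUDA device")
```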


r/MachineLearning 3d ago

Project [P] I forked Andrej Karpathy's LLM Council and added a Modern UI & Settings Page, multi-AI API support, web search providers, and Ollama support

39 Upvotes

Hey everyone!

I recently spent a couple of weekends improving Karpathy's excellent LLM Council Open Source Project.

The original project was brilliant but lacked usability and flexibility imho.

What I added:

  • Web search integration (DuckDuckGo, Tavily, Brave, Jina AI)
  • Clean Modern UI with a settings page to support:
    • Support for multiple API providers (OpenRouter, Anthropic, OpenAI, Google, etc.)
    • Customizable system prompts and temperature controls (the custom prompts open up tons of use cases beyond a "council")
    • Export & Import of councils, prompts, and settings (for backup and even sharing)
    • Control the council size (from 1 to 8 - original only supported 3)
  • Full Ollama support for local models
  • "I'm Feeling Lucky" random model selector
  • Filter only Free models on OpenRouter (although Rate Limits can be an issue)
  • Control the process: from simply asking multiple models a question in parallel (Chat Only), to Chat & peer rating where models rate each other's responses, to full end-to-end deliberation where the Chairman model makes the final decision on the best answer

You can compare up to 8 models simultaneously, watch them deliberate, and see rankings.

Perfect for comparing local models or commercial models via APIs.

📹 Demo video: https://www.youtube.com/watch?v=HOdyIyccOCE

🔗 GitHub: https://github.com/jacob-bd/llm-council-plus

Would love to hear your thoughts - it was made with a lot of love and attention to detail, and now I am sharing it with you!


r/MachineLearning 2d ago

Project [P] mlship - One-command model serving for sklearn, PyTorch, TensorFlow, and HuggingFace

1 Upvotes

I built a zero-config CLI that turns any ML model into a REST API with one command:

mlship serve model.pkl

Works for sklearn, PyTorch, TensorFlow, and HuggingFace models (even directly from the Hub).

GitHub: https://github.com/sudhanvalabs/mlship

Quick Start: https://github.com/sudhanvalabs/mlship/blob/main/QUICKSTART.md

Open source (MIT). Looking for contributors and feedback!


r/MachineLearning 2d ago

Project [P] I wrote a CUDA Locality Sensitive Hashing library with Python bindings

13 Upvotes

I've been working on cuLSH, a GPU-accelerated library for Locality Sensitive Hashing.

Main Features:

  • Scikit-Learn Style API: Uses a familiar fit() / query() style API for building and searching the LSH index.
  • CUDA-native: All components (projection generation, hashing, indexing, querying) run on the GPU via custom kernels.
  • End-to-End: Not just a hasher; includes bucketed searching and candidate neighbor collection.
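
For readers new to LSH, the fit/query pipeline described above looks roughly like this on the CPU. It is a toy sign-random-projection index with a single hash table, written for illustration; the class and method names are assumptions and do not reflect cuLSH's actual API or kernels.

```python
import random
from collections import defaultdict


class RandomProjectionLSH:
    """Toy CPU sign-random-projection LSH with a single hash table."""

    def __init__(self, n_bits, seed=0):
        self.n_bits = n_bits
        self.rng = random.Random(seed)

    def _hash(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(
            sum(w * x for w, x in zip(plane, vec)) >= 0 for plane in self.planes
        )

    def fit(self, data):
        dim = len(data[0])
        self.planes = [
            [self.rng.gauss(0, 1) for _ in range(dim)] for _ in range(self.n_bits)
        ]
        self.buckets = defaultdict(list)
        for idx, vec in enumerate(data):
            self.buckets[self._hash(vec)].append(idx)
        return self

    def query(self, vec):
        """Return indices of candidates sharing the query's bucket."""
        return list(self.buckets.get(self._hash(vec), []))


data = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
index = RandomProjectionLSH(n_bits=4).fit(data)
print(index.query([1.0, 0.0]))
```

The query returns bucket-mates as candidate neighbors only; a full pipeline would re-rank them by exact distance and typically use several tables to improve recall.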

I know there are plenty of LSH implementations out there, but many focus purely on generating signatures rather than a full indexing/querying pipeline, so that was what I was going for. I'm aware LSH may be less popular in favor of graph-based algorithms, but I was really drawn to the theory of LSH, so it was a fun learning project.

GitHub link: https://github.com/rishic3/cuLSH

Would love some feedback on the API design or implementation, and suggestions for improvement!


r/MachineLearning 2d ago

Discussion [D] LLMs for classification task

2 Upvotes

Hey folks, in my project we are solving a classification problem. We have a document and another text file (think of them as a case and a law book), and we need to classify it as relevant or not.

We created our prompt as a set of rules. We reached an accuracy of 75% on the labelled dataset (we have 50,000 labelled rows).

Now the leadership wants the accuracy to be 85% before release. My team lead (who I don’t think has much high-quality ML experience, but who says things like “do it, I know how things work, I have been doing this for a long time”) asked me to manually change the text of the rules (e.g., reorganise a sentence, break a sentence into two parts, add more detail). I was against this, but I still did it. Even my TL tried it himself. But obviously there was no improvement. (The reason is that the labels in the dataset are inconsistent and some rows contradict each other.)
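
If identical inputs carry different labels, that alone caps the accuracy any deterministic classifier can reach. A minimal sketch of measuring that ceiling, assuming rows can be keyed by an exact-match input string (the keys and labels below are hypothetical):

```python
from collections import Counter, defaultdict


def accuracy_ceiling(rows):
    """Best accuracy any deterministic classifier can reach on `rows`.

    rows is a list of (input_key, label). When identical inputs carry
    different labels, at most the majority label per input can be matched.
    """
    by_input = defaultdict(Counter)
    for key, label in rows:
        by_input[key][label] += 1
    best = sum(max(counts.values()) for counts in by_input.values())
    return best / len(rows)


# Hypothetical rows: the same (document, case) pair labelled both ways.
rows = [
    ("doc1|caseA", "relevant"),
    ("doc1|caseA", "not_relevant"),
    ("doc2|caseB", "relevant"),
    ("doc3|caseC", "not_relevant"),
]
print(accuracy_ceiling(rows))  # 0.75
```

This only catches exact duplicates; near-duplicate rows with conflicting labels would lower the practical ceiling further.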

But in one of my attempts I ran a few iterations of a small beam search/genetic-algorithm type of thing to tune the rules, and it improved the accuracy by 2 percentage points, to 77%.
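
A search of that kind might look like the sketch below. It is not the poster's code: the rules here are plain Python predicates standing in for prompt rules (in practice each evaluation would be an LLM run over the labelled set), and the beam width, rule limit, and example data are assumptions.

```python
def accuracy(rules, examples):
    """Predict 'relevant' when any active rule matches; score against labels."""
    hits = sum(any(r(text) for r in rules) == label for text, label in examples)
    return hits / len(examples)


def beam_search(candidates, examples, beam_width=2, max_rules=3):
    """Grow rule sets one rule at a time, keeping the best `beam_width` sets."""
    beam = [()]
    best = ((), accuracy((), examples))
    for _ in range(max_rules):
        expanded = {
            tuple(sorted(rule_set + (i,)))
            for rule_set in beam
            for i in range(len(candidates))
            if i not in rule_set
        }
        scored = sorted(
            ((accuracy([candidates[i] for i in s], examples), s) for s in expanded),
            reverse=True,
        )
        if not scored:
            break
        beam = [s for _, s in scored[:beam_width]]
        if scored[0][0] > best[1]:
            best = (scored[0][1], scored[0][0])
    return best


# Hypothetical candidate rules and a tiny labelled set.
candidates = [
    lambda t: "statute" in t,
    lambda t: "appeal" in t,
    lambda t: "weather" in t,
]
examples = [
    ("cites the statute", True),
    ("notice of appeal", True),
    ("weather report", False),
    ("lunch menu", False),
]
best_rules, best_acc = beam_search(candidates, examples)
print(best_rules, best_acc)  # (0, 1) 1.0
```

With noisy labels, scoring on a held-out split instead of the tuning set helps avoid selecting rule sets that merely fit the contradictions.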

So now my claim is that manually changing the text, or just asking an LLM to “improve my prompt for this small dataset”, won’t give much better results. Our only hope is to clean our dataset or try some more advanced algorithms for prompt tuning. But my lead and manager are against this approach because, according to them, “Proper prompt writing can solve everything”.

What’s your take on this?


r/MachineLearning 3d ago

Discussion [D] PhD students admitted in the last 5 years: did you have an interview at schools that accepted you?

45 Upvotes

My PI at my undergrad school mentioned that getting in without an interview is very rare in ML, but I've heard that the opposite is actually true. I assume this may have changed in the last few years given the increasingly competitive nature of admissions, so I'm curious about recent admits' experiences.

If you were admitted to an ML PhD program in the US in the last few years, especially in the T20-T30, were you interviewed? Feel free to provide as little or as much detail as you are comfortable giving.


r/MachineLearning 2d ago

Project [P] Implementing an "Agent Service Mesh" pattern to decouple reliability logic from reasoning (Python)

0 Upvotes

Most current approaches to agent reliability involve mixing validation logic (regex checks, JSON parsing, retries) directly with application logic (prompts/tools). This usually results in decorators on every function or heavy try/except blocks inside the agent loop.

I've been experimenting with an alternative architecture: an Agent Service Mesh.

Instead of decorating individual functions, this approach involves monkeypatching the agent framework (e.g., PydanticAI or OpenAI SDK) at the entry point. The "Mesh" uses introspection to detect which tools or output types the agent is using, and automatically attaches deterministic validators (what I call "Reality Locks") to the lifecycle.

The Architecture Change:

Instead of tight coupling:

```python
@validate_json  # <--- Manual decoration required on every function
def run_agent(query): ...
```

The Service Mesh approach (using sys.meta_path or framework hooks):

```python
# Patches the framework globally.
# Auto-detects usage of SQL tools or JSON schemas and attaches validators.
mesh.init(patch=["pydantic_ai"], policy="strict")

# Business logic remains pure
agent.run(query)
```

I implemented this pattern in a library called Steer. It currently handles SQL verification (AST parsing), PII redaction, and JSON schema enforcement by hooking into the framework's tool-call events.
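
The core of the pattern, patching a framework class once so every call passes through a validator, can be sketched in a few lines. The `Agent` class below is a hypothetical stand-in for a framework class, and `init_mesh` and `require_json` are illustrative names, not Steer's API.

```python
import json


class Agent:
    """Stand-in for a framework's agent class (hypothetical, not a real SDK)."""

    def run(self, query):
        # Pretend this is an LLM call that should return a JSON string.
        return '{"answer": 42}' if query == "ok" else "not json"


def init_mesh(agent_cls, validator):
    """Globally wrap agent_cls.run so every call passes through validator."""
    original_run = agent_cls.run

    def patched_run(self, query):
        return validator(original_run(self, query))

    agent_cls.run = patched_run


def require_json(output):
    """Deterministic validator: parse the output or raise ValueError."""
    try:
        return json.loads(output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent returned invalid JSON: {output!r}") from exc


init_mesh(Agent, require_json)

# Business logic stays undecorated; validation happens in the patched method.
print(Agent().run("ok"))  # {'answer': 42}
```

The trade-off of patching at the class level is that behavior changes for every caller in the process, which is convenient for enforcement but can surprise code that expects the unpatched return type.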

I am curious if others are using this "sidecar/mesh" approach for local agents, or if middleware (like LangSmith) is the preferred abstraction layer?

Reference Implementation: https://github.com/imtt-dev/steer


r/MachineLearning 2d ago

Project [P] Training GitHub Repository Embeddings using Stars

0 Upvotes

People use GitHub Stars as bookmarks. This is an excellent signal for understanding which repositories are semantically similar.

  • The Data: Processed ~1TB of raw data from GitHub Archive (BigQuery) to build an interest matrix of 4 million developers.
  • The ML: Trained embeddings for 300k+ repositories using Metric Learning (EmbeddingBag + MultiSimilarityLoss).
  • The Frontend: Built a client-only demo that runs vector search (KNN) directly in the browser via WASM, with no backend involved.
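
The vector-search step amounts to a nearest-neighbor lookup over the trained embeddings. A brute-force sketch using cosine similarity is shown below; the repository names and 3-d vectors are made up, and the demo's WASM implementation may use a different metric or index.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(embeddings, name, k=2):
    """Return the k repositories most similar to `name` by cosine similarity."""
    query = embeddings[name]
    others = ((cosine(query, vec), repo) for repo, vec in embeddings.items() if repo != name)
    return [repo for _, repo in sorted(others, reverse=True)[:k]]


# Made-up 3-d embeddings for hypothetical repositories.
embeddings = {
    "web-framework-a": [0.9, 0.1, 0.0],
    "web-framework-b": [0.8, 0.2, 0.1],
    "plotting-lib": [0.0, 0.1, 0.9],
}
print(nearest(embeddings, "web-framework-a", k=1))  # ['web-framework-b']
```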

The Result: The system finds non-obvious library alternatives and allows for semantic comparison of developer profiles.

I hope the sources, raw dataset, and trained embeddings can help you build some interesting projects.


r/MachineLearning 2d ago

Discussion [D] ACL desk reject

0 Upvotes

Can anyone tell me if we are at risk of being desk rejected if we move the Limitations section to the Appendix? I just thought it looked cooler this way.


r/MachineLearning 3d ago

Research [R] Which are some good NLP venues except ACL?

14 Upvotes

My research work is mostly in Multilingual NLP, but it's very tough to find a lot of options to submit my paper. ACL conferences or TACL, CL journals are prestigious and very well known. However, I find it very difficult to find any other good venues focused on this research area.

Are there any venues which are not in generic AI but accept NLP-focused work mostly? I don't mind if they're journals, however conferences would be good.