Why are there so many English-first AI models from China? Are they not interested in serving their own population? Or is it that if they publish Chinese-first models it won't get publicity in the West?
whynotmaybe 2 hours ago [-]
Haven't we reached a situation where English is the de facto language of scientific research, especially AI benchmarks?
It's clearly impossible for me to try anything in Chinese, I'd need a translation.
chvid 2 hours ago [-]
All LLMs are trained on the same basic blob of data - mostly in English, mostly pirated books and stuff.
enlyth 2 hours ago [-]
I assume a large portion of high quality training material is in English
sigmoid10 2 hours ago [-]
You'd be correct. The largest portion of all languages in Common Crawl (aka the "whole open internet" training corpus) is English with 43%. No other language even reaches double digit percentages. The next biggest one is Russian at 6%, followed by German at 5%.
choutianxius 1 hours ago [-]
One reason is that there is no "good" search engine in China. The most popular one, Baidu, is garbage compared to Google Search. The most useful training data in Chinese would likely come from social media and video-sharing platforms, which I guess are much more difficult to crawl and clean up.
spacebanana7 21 minutes ago [-]
I wonder whether English text having fewer characters provides an advantage somehow.
jmole 12 minutes ago [-]
not really, since tokenization combines multiple characters
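A toy illustration of the point: subword tokenizers merge frequent character runs into single tokens, so raw character count doesn't map directly to token count. This is a greedy longest-match over a made-up vocabulary, not a real BPE tokenizer:

```python
# Toy longest-match tokenizer: it repeatedly grabs the longest vocabulary
# entry, so frequent multi-character strings collapse into single tokens.
VOCAB = {"hello", "world", "ing", "lo"}

def tokenize(text, vocab=VOCAB):
    tokens = []
    i = 0
    while i < len(text):
        # try the longest possible match first
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # unknown character becomes its own token
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("hello world"))  # ['hello', ' ', 'world'] -> 3 tokens for 11 characters
```

Real tokenizers learn their merges from the training corpus, which is why heavily represented languages tend to get more compact tokenizations.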
mensetmanusman 2 hours ago [-]
English won. Chinese youth now struggle to write by hand characters they can read. Typing favors English.
rahimnathwani 1 hours ago [-]
It's easy and fast to type Chinese sentences using a keyboard.
throwaway519 1 hours ago [-]
The pendulum has already swung back. The current generation under 20 grew up with touchscreens; that obsoletes keyboard input with pinyin, and many don't care if the device has no keyboard.
thenthenthen 10 minutes ago [-]
Input is so interesting in China, basically a sorta t9 but just single letters and picking the right characters, with common/frequently used characters first, using pinyin. For example to say “ How are you?” You just type “nhm” (Ni Hao Ma) and 你好吗 shows up as suggestion/autofill. You can make surprisingly long sentences using this method.
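The abbreviation lookup described above can be sketched like this (the phrase table and frequency scores are made up for illustration; real input methods use much larger dictionaries and usage-adaptive ranking):

```python
# Sketch of abbreviation-style pinyin input: each typed letter is the initial
# of one syllable, and candidates are ranked by a (hypothetical) frequency table.
PHRASES = {
    "你好吗": ("ni", "hao", "ma"),   # "How are you?"
    "你好":   ("ni", "hao"),         # "Hello"
    "牛海马": ("niu", "hai", "ma"),  # nonsense filler entry for the demo
}
FREQ = {"你好吗": 900, "你好": 950, "牛海马": 1}

def candidates(abbrev):
    """Return phrases whose syllable initials spell `abbrev`, most frequent first."""
    hits = [p for p, syls in PHRASES.items()
            if len(syls) == len(abbrev)
            and all(s.startswith(c) for s, c in zip(syls, abbrev))]
    return sorted(hits, key=lambda p: -FREQ[p])

print(candidates("nhm"))  # 你好吗 ranks first; the rare filler phrase comes last
```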
bilbo0s 2 hours ago [-]
The Mandarin language models obviously exist, but what would you do with them if they provided access? And what knowledge would be in them? What is the body of knowledge encoded in Mandarin? What does that look like?
The sad reality is that not many people outside of China have the facility with Mandarin to use those models. Even non-native Mandarin speakers who claim to be "fluent" often mess up intended meaning in text, or make literal translations that wind up making no sense.
Inside China, LLM use will be Mandarin-based. Outside, it seems to me English is the natural choice.
Irony of ironies: probably the best way for a non-Mandarin-speaking layman to test a Mandarin-based model would be to use another LLM to translate prompts into Mandarin.
It's a sad future we're looking at.
Or a brilliant one.
Time will tell.
johnla 2 hours ago [-]
For it to be brilliant, AI needs to be a benevolent tool all the time. It would take just a few malignant actors to turn our world upside down. I suspect it'll follow the same path as the Internet and social media: great at first, growing markets, bringing us together, and then taking a turn.
34679 2 hours ago [-]
Nearly everyone in the urban areas of China spoke some English when I visited way back in 1995. It's a bilingual society.
crazygringo 1 hours ago [-]
This is not true. I was in Beijing around then and never met a single person who spoke English if they hadn't learned it for professional reasons (they worked in tourism, international business, etc.).
It could not have been further from a bilingual society.
rahimnathwani 1 hours ago [-]
I lived in Beijing and Shanghai for 9 years (2010-2019) and this is NOT my impression at all.
rahimnathwani 1 hours ago [-]
When you guys use gguf files in ollama, do you normally create a modelfile to go with it, or just hope that whatever defaults ollama has work with the new model?
I’ll typically use the defaults initially and then use a Modelfile if it’s something I plan on using. I think you can dump the modelfile ollama uses to have a template to work with.
monkmartinez 36 minutes ago [-]
If you ollama pull <model> the modelfile will be downloaded along with the blob. To modify the model permanently, you can copypasta the modelfile into a text editor and then create a new model from the old modelfile with the changes you require/made.
Here is my workflow when using Open WebUI:
1. ollama show qwen3:30b-a3b-q8_0 --modelfile
2. Paste the contents of the modelfile into -> admin -> models -> OpenwebUI and rename qwen3:30b-a3b-q8_0-monkversion-1
3. Change parameters like num_gpu 90 to change layers... etc.
4. Keep | Delete old file
Pay attention to the modelfile, it will show you something like this: # To build a new Modelfile based on this, replace FROM with:
# FROM qwen3:30b-a3b-q8_0 and you need to make sure the paths are correct. I store my models on a large nvme drive that isn't default ollama as an example of why that matters.
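For reference, a minimal Modelfile of the kind discussed above might look like this (the path and parameter values are illustrative, not defaults):

```
# Minimal Modelfile sketch -- path and values are illustrative
FROM /mnt/nvme/models/qwen3-30b-a3b-q8_0.gguf

PARAMETER num_ctx 8192
PARAMETER num_gpu 90
PARAMETER temperature 0.6
```

You register it with `ollama create my-qwen3 -f ./Modelfile`; the FROM line can also point at an existing ollama model name instead of a raw GGUF path.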
EDIT TO ADD:
The 'modelfile' workflow is a pain in the booty. It's a dogwater pattern and I hate it. Some of these models are 30 to 60GB and copying the entire thing to change one parameter is just dumb.
However, ollama does a lot of things right and it makes it easy to get up and running. VLLM, SGLang, Mistral.rs and even llama.cpp require a lot more work to setup.
o11c 6 minutes ago [-]
Pretty sure the whole reason Ollama uses raw hashes everywhere is to avoid copying the whole NN gigabytes every time.
monkmartinez 1 minutes ago [-]
Maybe I am doing something wrong! When I change parameters on the modelfile, the whole thing is copied. You can't just edit the file as far as I know, you have to create another 38GB monster to change num_ctx to a reasonable number.
rahimnathwani 19 minutes ago [-]
Sorry, I should have been clearer.
I meant when you download a gguf file from huggingface, instead of using a model from ollama's library.
monkmartinez 5 minutes ago [-]
ollama pull hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q4_K_M and the modelfile comes with it. It may have errors in the template or parameters this way. It has to be converted to GGUF/GGML prior to using it this way. You can, of course, convert and create the specific ollama model from bf16 safetensors as well.
gizmodo59 1 hours ago [-]
It's funny to see benchmarks where they omit the top-performing models like o3 (which is the best model in many benchmarks currently) and Gemini Pro/Claude 3.7.
daveguy 56 minutes ago [-]
Those are much much larger models, and they are proprietary. Those model providers just don't have the distilled versions identified and available.
Notice most of the models they are comparing with are 7B models. The exception is also an open weights model (Qwen-2.5-32B-RL-Zero). Even with 32B parameters the MiMo-7B outperforms it.
vessenes 28 minutes ago [-]
Umm wow. Great benchmarks. I’m looking forward to chatting with this one.
A couple things stand out to me — first is that the 7B model is trained on 25T tokens(!). This is Meta-scale training; Llama 4 Maverick was trained on 22T or so. (Scout, the smaller model: 40T).
Second, this is an interesting path to take - not a distilled model or an RL layer to get reasoning out of another model, but a from-scratch RL model with reasoning baked in; the claims seem to indicate you get a lot of extra efficiency per-parameter doing this.
I don’t have experience with Xiaomi models, so I’m cautious about this one until I play with it, but it looks like a super viable local reasoning model from the stats.
Jotalea 4 hours ago [-]
I wonder if they will use this model for their AI assistant on their Xiaomi 15 series phones. They most likely will. I'm not really sure what to expect from it.
These benchmark numbers cannot be real for a 7b model
strangescript 4 hours ago [-]
The smaller models have been creeping upward. They don't make headlines because they aren't leapfrogging the mainline models from the big companies, but they are all very capable.
I loaded up a random 12B model on ollama the other day and couldn't believe how competent it seemed and how fast it was, given the machine I was on. A year or so ago, that would not have been the case.
apples_oranges 4 hours ago [-]
exactly, it seems to validate my assumption from some time ago, that we will mostly use local models for everyday tasks.
pzo 4 hours ago [-]
Yeah, especially since this simplifies things, e.g. building a mobile app as a 3rd-party developer: no extra cost, no need to set up a proxy server or monitor usage to detect abuse, no need to make a complicated per-usage subscription plan.
We just need Google or Apple to provide their own equivalent of both Ollama and OpenRouter, so users either use inference for free with local models, or bring their own key and pay themselves for tokens/electricity. We then just charge a smaller fee for renting or buying our cars.
mring33621 1 hours ago [-]
strong agree
my employer talks about spending 10s of millions on AI
but, even at this early stage, my experiments indicate that the smaller, locally-run models are just fine for a lot of tech and business tasks
this approach has definite privacy advantages and likely has cost advantages, vs pay-per-use LLM over API.
jillesvangurp 4 hours ago [-]
Including figuring out which more expensive models to use when needed instead of doing that by default. Early LLMs were not great at reasoning and not great at using tools. And also not great at reproducing knowledge. Small models are too small to reliably reproduce knowledge but when trained properly they are decent enough for simple reasoning tasks. Like deciding whether to use a smarter/slower/more expensive model.
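That routing decision can be sketched like this (the model names and the "hardness" heuristic are placeholders; in practice the small local model itself would be asked to judge the task):

```python
# Hypothetical router: a cheap check (standing in for a small local model)
# decides whether a prompt should be escalated to a larger remote model.
def small_model_judges_hard(prompt: str) -> bool:
    """Stand-in for asking a small local model 'is this task hard?'."""
    hard_markers = ("prove", "refactor", "multi-step", "legal")
    return len(prompt) > 200 or any(m in prompt.lower() for m in hard_markers)

def route(prompt: str) -> str:
    # Names are illustrative, not real endpoints.
    return "big-remote-model" if small_model_judges_hard(prompt) else "small-local-model"

print(route("what's 2+2?"))                    # small-local-model
print(route("Prove this invariant holds."))    # big-remote-model
```

The appeal is that the cheap path handles the common case, and you only pay for the expensive model when the router decides it's warranted.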
wg0 4 hours ago [-]
But who will keep them updated, and what incentive would they have? That's what I can't imagine. It's a bit vague.
ebiester 2 hours ago [-]
Eventually? Microsoft and Copilot, and Apple and Siri - even if they have to outsource their model making. It will be a challenge to desktop Linux.
WorldPeas 32 minutes ago [-]
I figure this will take the same shape as package distribution. If you have ever used a linux distribution you’ll always see a couple .edu domains serving you packages. Big tech might be able to have specialized models, but following the linux paradigm, it will likely have more cutting edge but temperamental models from university research
cruzcampo 4 hours ago [-]
Who keeps open source projects maintained and what incentive do they have?
jsheard 4 hours ago [-]
Most open source projects don't need the kinds of resources that ML development does. Access to huge GPU clusters is the obvious one, but it's easy to forget that the big players are also using huge amounts of soulcrushing human labor for data acquisition, cleaning, labeling and fine tuning, and begrudgingly paying for data they can't scrape. People coding in their free time won't get very far without that supporting infrastructure.
I think ML is more akin to open source hardware, in the sense that even when there are people with the relevant skills willing to donate their time for free, the cost of actually realizing their ideas is still so high that it's rarely feasible to keep up with commercial projects.
cruzcampo 4 hours ago [-]
That's a fair point. I think GPU clusters are the big one, the rest sounds like a good fit for volunteer work.
simiones 3 hours ago [-]
For the bigger open source projects, companies who use that code for making money. Such as Microsoft and Google and IBM (and many others) supporting Linux because they use it extensively. The same answer may end up applying to these models though - if they really become something that gets integrated into products and internal workflows, there will be a market for companies to collaborate on maintaining a good implementation rather than competing needlessly.
nickip 4 hours ago [-]
What model? I have been using api's mostly since ollama was too slow for me.
patates 3 hours ago [-]
I really like Gemma 3. Some quantized version of the 27B will be good enough for a lot of things. You can also take some abliterated version[0] with zero (like zero zero) guardrails and make it write you a very interesting crime story without having to deal with the infamous "sorry but I'm a friendly and safe model and cannot do that and also think about the children" response.
Qwen3 and some of the smaller gemma's are pretty good and fast. I have a gist with my benchmark #'s here on my m4 pro max (with a whole ton of ram, but most small models will fit on a well spec'ed dev mac.)
Last time I did that I was also impressed, for a start.
Problem was that of the top ten book recommendations, only the first 3 existed; the rest was a casually blended hallucination delivered in perfect English without skipping a beat.
"You like magic? Try reading the Harlew Porthouse series by JRR Marrow, following the orphan magicians adventures in Hogwesteros"
And the further it gets towards the context limit, the deeper it descends into creative derivative madness.
It's entertaining but limited in usefulness.
omnimus 4 hours ago [-]
LLMs are not search engines…
Philpax 3 hours ago [-]
An interesting development to look forward to will be hooking them up to search engines. The proprietary models already do this, and the open equivalents are not far behind; the recent Qwen models are not as great at knowledge, but are some of the best at agentic functionality. Exciting times ahead!
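A minimal sketch of such a search-tool loop, with both the model and the search engine stubbed out (the `SEARCH:`/`ANSWER:` protocol here is invented for illustration; real systems use structured tool-calling APIs):

```python
# Minimal agent loop: the model (stubbed) either answers or emits a search
# action, whose results are appended to the context for the next step.
def fake_model(context: str) -> str:
    # Stand-in for an LLM call; a real agent would hit an inference API here.
    if "RESULTS:" in context:
        return "ANSWER: based on the results, ..."
    return "SEARCH: xiaomi mimo 7b benchmarks"

def fake_search(query: str) -> str:
    # Stand-in for a search-engine API.
    return f"RESULTS: top hits for '{query}'"

def agent(question: str, max_steps: int = 3) -> str:
    context = question
    for _ in range(max_steps):
        out = fake_model(context)
        if out.startswith("ANSWER:"):
            return out
        query = out.removeprefix("SEARCH: ")
        context += "\n" + fake_search(query)
    return "gave up"

print(agent("How does MiMo-7B score on math benchmarks?"))
```

The point is that the model only needs to be good at deciding *when* to search and how to use the results, not at memorizing the facts itself.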
mirekrusin 3 hours ago [-]
Exactly. I think all those base models should be weeded out of this nonsense: Kardashian-like labyrinths of knowledge trivia that just make them dumber by taking up space and compute time. If you can google some nonsense news item, it should stay in search engines for retrieval. Models should be good at using search tools, not at trying to replicate their results. They should start from logic, math, programming, physics and so on, similar to what the education system is supposed to equip you with. IMHO small models give a speed advantage (faster to experiment, e.g. with parallel diverging results, ability to munch through more data, etc.). Stripped to this bare minimum, they can likely be much smaller with impressive results, tunable, and allow for huge context.
justlikereddit 3 hours ago [-]
They are generalists, being search engines is a subset of that.
bearjaws 3 hours ago [-]
My guess is that it is over fitted to the tests.
revel 2 hours ago [-]
They used RFT and there's only so many benchmarks out there, so I would be very surprised if they didn't train on the tests.
No, where can I try it? I saw a huggingface link but I wonder if they host it themselves somewhere, similar to how Alibaba does with Qwen chat.
yorwba 5 hours ago [-]
There is a HuggingFace space (probably not official) at: https://huggingface.co/spaces/orangewong/xiaomi-mimo-7b-rl You might have to wait a minute to get a response. Also, the space doesn't seem to have turn-taking implemented, so after giving the Assistant's response, it kept on generating the Human's next message and so on and so forth.
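The missing turn-taking is typically handled with stop sequences: the frontend truncates the raw completion at the next speaker tag. A minimal sketch (the specific tags are assumptions about this space's prompt template):

```python
# Enforce turn-taking by truncating a raw completion at the first speaker tag,
# the way a chat frontend applies stop sequences.
STOPS = ("\nHuman:", "\nUser:")

def truncate_at_stop(completion: str, stops=STOPS) -> str:
    cut = len(completion)
    for s in stops:
        idx = completion.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut].rstrip()

raw = "Sure, here is the answer.\nHuman: great, now write a poem"
print(truncate_at_stop(raw))  # 'Sure, here is the answer.'
```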
onefeduk21 2 hours ago [-]
[dead]
keepamovin 5 hours ago [-]
[flagged]
fredwu 3 hours ago [-]
Not sure why it would be "funny" as this is literally why they named the company Xiaomi.
For me, the funny part is that all the products are called "Rice-something", hahaha! :)
bojan 2 hours ago [-]
Not that different from "apple" something.
thijson 1 hours ago [-]
I was reading the Steve Jobs biography and thought it was interesting that the choice in the name "apple" came from him wanting something that came before Atari in the yellow pages, and also that he had spent time at a Hippie apple orchard in Oregon.
I was reading a Jack Tramiel biography recently, and read that early on the two Steve's sought to sell Apple to Commodore for under a million dollars.
ReptileMan 1 hours ago [-]
Not quite. "Rice-something" has been used for goods coming from East Asia, in both derogatory and non-derogatory ways depending on the quality of the goods. Like rice rockets, the Japanese ultra-high-performance sport bikes, for example.
est 1 hours ago [-]
Nah, Xiaomi literally means millet, which is also where the "Mi" prefix comes from.
amazingamazing 3 hours ago [-]
just as funny as an Apple, for sure.
keepamovin 3 hours ago [-]
That's a good point hahahaha! :)
cruzcampo 4 hours ago [-]
Does Xiaomi literally mean Little Rice? That's what my very limited mandarin would suggest
keepamovin 4 hours ago [-]
That is what my literally also rather limited Chinese would suggest. haha
But with many single characters in Chinese, a Chinese person will tell you, if you ask for what a single character means, something like, "Well it's not so easy to pin down the meaning of that one. Sometimes we use it like this, and sometimes like that."
Sure, some characters have an easy meaning (for me, I think the rice in Mi is one of them!) but there's plenty where you cannot get a Chinese person to easily tell you what a single character means. I guess it's a little like, but not the same as, asking an English person to tell you, what any given "morpheme" (word part, like fac-) means. Hahaha. Not a perfect analogy tho! :)
Wow, that's interesting. I guess that's like a US company being called "MRE". We would view that like a veteran-owned and -operated company. Interesting.
And all the products would be "MRE-Phone", "MRE-Pod", hehehe :)
est 1 hours ago [-]
That's just a fun coincidence, but in reality Lei Jun and 12 others from Kingsoft Corp founded Xiaomi after they had a bowl of millet gruel.
This is one of those things where everyone gets the reference, but it won't be good to admit it publicly. This quote is known to almost everyone born in that area, and it's the first thing that comes to mind when you hear the name.
fenprace 1 hours ago [-]
Mandarin speaker here. Literally, yes, Xiaomi means 'little rice'. But in reality when people say xiaomi, they always refer to another kind of crop, foxtail millet (https://en.wikipedia.org/wiki/Foxtail_millet). It is a traditional food and still very common in China and other places in Asia.
https://github.com/ollama/ollama/blob/main/docs%2Fmodelfile....
Also related reference https://en.wikipedia.org/wiki/Xiaomi#Name_etymology
[0]: https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated
https://gist.github.com/estsauver/a70c929398479f3166f3d69bce...
Go look at the benchmark numbers of qwen3-4B if you think these are unrealistic.
It will probably be released within a few hours.
But yeah, waiting is the easier option.
They could've called it Xiaomimo.
I think you meant Anthropic. OpenAI is "planning" to release an open weight model this year likely competing against the Llama models. [0]
I have not seen an open weight AI model ever being released by Anthropic at all.
[0] https://openai.com/open-model-feedback/
Source (Chinese): https://finance.sina.cn/tech/2020-11-26/detail-iiznctke33979...
Here's this list of morphemes I found just now thinking about this: https://www.fldoe.org/core/fileparse.php/16294/urlt/morpheme...
Seems like an incomplete list when you consider that English words are often composed of parts from ages past! :)
https://www.scmp.com/abacus/tech/article/3028654/documentary...
little rice
Yes.
But it's more complicated than that.
Here is the meaning of the name, described here: https://finance.sina.cn/tech/2020-11-26/detail-iiznctke33979...
在后来的讨论中,我突然想到了我最喜欢的一句话——“佛观一粒米,大如须弥山”。
Translated into English, it means:
“In the later discussions, I suddenly thought of one of my favorite sayings — ‘A Buddha sees a single grain of rice as vast as Mount Sumeru.’”
This expression emphasizes the idea that even something seemingly small (like a grain of rice) can hold immense significance or value when viewed from a different perspective.
Thanks to chatgpt for translating this