Skip Navigation

User banner
Posts
7
Comments
56
Joined
2 yr. ago

  • Be wary that their docs are so and so. Nanonets OCR, Mistral OCR and MinerU will also extract formulas and images.

    One other model I forgot to mention is Docling. This one is quite quick to set up in a docker container, and will have a web interface ready to go where you can upload documents. This sort of follows the PaddleOCR pipeline, but also allows you to use vLMs.

    Good luck!

  • If you find that OCR doesn't get you very far, maybe try a small vLM to parse PNGs of the pages. For example, Nanonets OCR will do this, although quite slow if you don't have a GPU. It will give you a Markdown version of the page, which you can then translate with another tool.

    PaddleOCR might also be useful, since it focuses on Chinese, but it's more difficult to set up. To add to this, some other options are MinerU and MistralOCR (this is paid, but you can test it for free if you upload it in Mistral's library).

  • All the ones I mentioned can be installed with pip or uv if I am not mistaken. It would probably be more finicky than containers that you can put behind a reverse proxy, but it is possible if you wish to go that route. Ollama will also run system-wide, so any project will be able to use its API without you having to create a separate environment and download the same model twice in order to use it.

  • Ollama for API, which you can integrate into Open WebUI. You can also integrate image generation with ComfyUI I believe.

    It's less of a hassle to use Docker for Open WebUI, but ollama works as a regular CLI tool.

  • You really think this is all him? Guy's got a full team running the media. The Heritage Foundation's got its grimy hands all over the country.

  • I heard that Poland is also cheering for some MAGA guy in the next election... Troubling times ahead.

    For Romania, there might still be a chance in the run-off. However, the difference between the two candidates was quite large (20% difference; 1.8 million votes). Similarly, the other candidates seemed to have voters that would rather vote for the nazi. Most likely all hope is lost, but that 1% chance is still there.

  • You're right! Sorry for the typo. The older nomic-embed-text model is often used in examples, but granite-embedding is a more recent one and smaller for English-only text (30M parameters). If your use case is multi-language, they also offer a bigger one (278M parameters) that can handle English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese (Simplified). I would test them out a bit to see what works best for you.

    Furthermore, if you're not dependent on MariaDB for something else in your system, there are also some other vector databases I would recommend. Qdrant also works quite well, and you can integrate it pretty easily in something like LangChain. It really depends on how much you want to push your RAG workflow, but let me know if you have any other questions.

  • Have a look at Ollama embeddings. Easy to set up and the models are much smaller than a typical LLM.

  • It was just announced that the EU is pausing sustainability requirements on smaller business (< 500 employees) for 2 years. This stems from fears related to the trade war, as they want to keep smaller businesses competitive. Nevertheless, I'm pretty sure that this won't be great for the environment.

  • For notes, I have moved to Joplin with the option to synchronize my data using a WebDAV server. It works really well, and it has both a mobile and desktop app. If you're interested in developing your project, maybe you can have a look at the options this provides. For example, I really like the ability to separate notes between groups, assign tags, create drawings, and the possibility to use Markdown.

    Good luck with your projects! To mirror @enemenemu's suggestion, I would also look into collaborating with the people trying to push the EU Docs alternative. Not sure if that will work, but it's worth a shot if you're interested :D

  • Thanks for the SherpaTTS suggestion. I really like the GLaDOS voice <3

    I am not sure which phone you use, but are you able to set FUTO Voice as the default "Voice input" in the Android settings? I played around with a few apps, which show up. However, FUTO is not an option here :(

  • Thanks for the suggestion! I gave this a try, but it seems that it won't register any voice 🤔 However, it seems like it shows up in my settings, so it's a good sign. I'll try to get it to work :D

  • Thanks! I was actually looking at this, but I gave up because I couldn't really figure out how to get a multilingual model running through Obtainium. I'll try again :D

  • Free and Open Source Software @beehaw.org

    Open Source Text-to-Speech and Speech-to-Text on Android?

    Permanently Deleted

    Jump
  • Mine's just one I got from a random kid name generator.

    A bit off-topic: not sure why, but I keep seeing posts here on Lemmy lately about Romanian women pulling the short end of the stick in terms of gender equality. I hope I'm not offending in any way with this question, but is Romania sticking to the traditional gender roles?

  • Ok, but there are laws involved here. In Romania, you can't be president if you are under 35 years old, or, among others, if you have a criminal record. The people that were stopped from running for president weren't barred because they went against the mainstream parties, but because they openly promoted personalities that were doing the equivalent of the Holocaust in Romania. This is punishable by law by up to 3 years in jail, and they're being actively investigated.

    The lady in this post was previously denied her run in the summer of last year, and she kept quiet about it until now because they probably told her they won't pursue it further if she steps back. She took the deal, probably because she realises that she'd rather keep grifting on Facebook than spend 3 years in jail.

  • How can you have democracy if you let people vote for a person that says he will remove all political parties? There must be checks and balances that stop you at some point. Also, Romanian law prohibits candidates with ties to fascist or extremist ideologies from participating in elections. That's in the law, introduced by people that were democratically elected.

    But lets be honest, it’s the not being hostile to Russia that did it. Can’t have that in a US colony where they plan to have the biggest base for their imperialist wars.

    Sure, the US that is now serving up its allies on a silver platter to Putin? His friend Trump is going to revert sanctions any day now for that sweet oil. For power in the Middle East, maybe, but the EU is hopefully going to wake up soon and kick all American bases ASAP.

    And who helped the openly fascists ukranian to power in 2014?

    Firstly, the Euromaidan protests didn't get hundreds of thousands of people attending just because they got brainwashed by the EU/US. Allegedly, Russia attempted to do the same thing in Romania with Georgescu, and only a few hundred people showed up to protest the decision to take him off the ballot. People in Ukraine felt betrayed when Yanukovych wanted to reject EU and get closer with Russia, a country that has had 146% voter turnout during one of its recent elections. Arguably, maybe the EU is not the best, but its system is way more decentralized than Russia's, allowing better representation of its population and reducing the chance of corruption. At least we don't hear people that are criticizing the government "randomly" falling out of windows here...

    Secondly, Poroshenko was openly fascist? Or whom exactly do you mean? If I'm not mistaken, Poroshenko assigned a Jewish person as his prime minister. Or you might be hinting at the Azov Brigade being integrated by him into the national army? What would you do when Russia starts invading your country, though? Either way, you might be right that it is in the benefit of the EU (and perhaps US) to have closer ties with Ukraine, but it goes both ways. Ukraine did not like what happened in Georgia, and wanted more security and pro-democracy allies. That does not mean that the EU made Ukraine into a Nazi puppet state to fight Russia.

  • No, democracy is when you are intolerant with the intolerant. Maybe read up on what she says/does before commenting.

  • Fooyin is also a solid choice.

  • It's a bit short-sighted to say that Trump is the one calling in shots here, specifically to weaken the US. It is pretty clear that he is following the plan put forward by the Heritage Foundation word by word. If I understood correctly, the idea is to make the American economy more resilient at the expense of all of its (poor) citizens. Once that is done, they can then leverage their safe zone to further influence policies in other countries. For example, get the EU to lower regulations, so American companies can extract more wealth.

    Here is a quote from the actual "Project 2025 Mandate for Leadership" PDF:

    Needed reforms

    [...]

    Increase allied conventional defense burden-sharing. U.S. allies must take far greater responsibility for their conventional defense. U.S. allies must play their part not only in dealing with China, but also in dealing with threats from Russia, Iran, and North Korea.

    1. Make burden-sharing a central part of U.S. defense strategy with the United States not just helping allies to step up, but strongly encouraging them to do so.
    2. Support greater spending and collaboration by Taiwan and allies in the Asia–Pacific like Japan and Australia to create a collective defense model.
    3. Transform NATO so that U.S. allies are capable of fielding the great majority of the conventional forces required to deter Russia while relying on the United States primarily for our nuclear deterrent, and select other capabilities while reducing the U.S. force posture in Europe.
    4. Sustain support for Israel even as America empowers Gulf partners to take responsibility for their own coastal, air, and missile defenses both individually and working collectively.
    5. Enable South Korea to take the lead in its conventional defense against North Korea.

    [...]

    They are engineering most of these situations that we've seen in the media specifically to make the ideas more digestible to the average population. See the Zelenskyy case: "This is going to be great television" - the guy is not even hiding it.

    On one hand, Taiwan is right to say that the US won't abandon them. The US does not produce enough chips locally to just let them get gobbled up by China. However, this sort of "theatrics" is not over, and they will come up with a reason to scare Taiwan into investing a lot more in defence, specifically to prepare them for a fight to destabilize China.

    It's truly sad that this administration is now in power to push these ideas. The average American is going to become much poorer and hateful due to all protections previously put in place being dismantled. Hopefully people wake up and kick them out of office, but the damage done to foreign relationships is already done.

  • Precisely. The only enemy that the US conservative party sees is China. Everyone else is a business partner that they must strong arm into favourable deals for the US.

  • Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ @lemmy.dbzer0.com

    Archiving papers using Zotero headless?

    Technology @lemmy.world

    Redox OS 0.9.0 - Redox - Your Next(Gen) OS

    Linux @lemmy.ml

    Poll: GUI framework for widgets/apps in Wayland

    Arch Linux @lemmy.ml

    Installing AUR packages after using archinstall

    Linux @lemmy.ml

    Jump from Arch to NixOS?

    Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ @lemmy.dbzer0.com

    Sites or Trackers for Exam Dumps