DeepSeek becomes the top-rated free application

  • LazyPooky
    ADMINISTRATOR
    Level 35 - Rockin' Poster
    • Oct 2007
    • 7178

    #1

    DeepSeek becomes the top-rated free application

    There is a ChatGPT competitor from China called DeepSeek, which is completely free and works as well as the paid AI chatbot services. DeepSeek has had a huge impact because it took only $6 million to train from scratch, while Western companies spent billions. It looks like the paid business models are now simply collapsing because of that. Nvidia lost about $400 billion, I heard. Why would you need expensive chips when you can do the same with cheaper alternatives? DeepSeek researchers wrote in a paper last month that DeepSeek-V3 used Nvidia's H800 chips (a less advanced alternative) for training.

    DeepSeek is the first to be praised by the US tech industry as matching or even surpassing the performance of cutting-edge US models.

    Unlike OpenAI, which charges $20 to $200 per month for its services, DeepSeek offers its platform for free to individual users and charges only $0.14 per million tokens for developers. This stark contrast has made DeepSeek popular with small businesses and developers.

    DeepSeek-R1 claims to rival OpenAI's o1 model in reasoning and mathematical problem-solving. The platform's ability to generate Python code more effectively than ChatGPT has been a highlight in discussions among tech enthusiasts on communities like Reddit.
    https://www.newsweek.com/what-deepse...-rival-2021208
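
For a rough sense of scale, the quoted API price can be turned into a token budget. This is only a back-of-the-envelope sketch: the $0.14 per million tokens and $20 per month figures come from the quoted article, the function names are made up for illustration, and a flat chat subscription and metered API usage are not really the same product.

```python
# Figures quoted in the article above; everything else is illustrative.
PRICE_PER_MILLION_TOKENS_USD = 0.14   # quoted developer price
SUBSCRIPTION_USD_PER_MONTH = 20.0     # lowest quoted subscription price


def api_cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Cost in USD of processing `tokens` tokens at a flat per-million price."""
    return tokens / 1_000_000 * price_per_million


def tokens_for_budget(budget_usd: float, price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Number of tokens a given budget buys at a flat per-million price."""
    return budget_usd / price_per_million * 1_000_000


if __name__ == "__main__":
    budget_tokens = tokens_for_budget(SUBSCRIPTION_USD_PER_MONTH)
    print(f"${SUBSCRIPTION_USD_PER_MONTH:.0f} buys about {budget_tokens / 1e6:.0f} million tokens")
```

At the quoted price, a $20 budget works out to roughly 143 million tokens.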


    DeepSeek, which overtook the rival AI platform ChatGPT to become the top-rated free application on Apple's App Store in the United States, says it uses lower-cost chips and less data, challenging a widespread bet in markets that AI will drive demand along a supply chain from chipmakers to data centres.

    DeepSeek's quick dominance raised questions over Microsoft, Meta Platforms, and Alphabet's hundreds of billions in planned spending on AI. The recently launched AI model is now seen as a strong rival to those from OpenAI and Meta Platforms Inc., as it is more cost-effective and operates on less powerful chips.

    The DeepSeek-V3 model was developed with an investment of $5.6 million, a fraction of the amounts the US tech giants spent. The open-source technology product has quickly risen to the top of Apple Inc.'s App Store rankings.

    According to experts, the DeepSeek crash erased nearly $1.5 trillion in stock market value.
    https://www.livemint.com/market/stoc...979056525.html
    Magnús: - I have fans of all ages and I don't think it's weird when older people like LazyTown. LazyTown appeals to people for many different reasons: dancing, acrobatics, etc.
  • possessor
    I like LazyTown.
    SPECIAL MEMBER
    Level 30 - Stepher
    • Oct 2021
    • 2925

    #2
    Please tell me this is better than Gemini.


    • chuft
      Stepher
      SPECIAL MEMBER
      MODERATOR
      Level 31 - Number 9
      • Dec 2007
      • 3287

      #3
      It's a very interesting story.

      I would not believe anything coming out of China however relating to costs etc. China is a totalitarian country and it lies constantly. Every analyst of China will tell you this, whether it's about business or anything else.

      One conclusion you can draw is the Chinese government has something even better for military purposes, or they would not have allowed DeepSeek into the open.

      A lot of media outlets are reporting DeepSeek is "open source" but it apparently isn't. I am not into AI, I dislike the concept intensely, but from what I read, there is a big difference between a "free model and open weights" vs "open source." Only the latter (including the code and data used to train it) would let you recreate the model itself.

      This model has political "programming" relating to the Chinese government.

      I will be interested to see where this goes. Nvidia is still absurdly overvalued by any sane metric. Most people panic selling Nvidia stock today do not understand much if anything about AI. They are just trend-following investors.
      l i t t l e s t e p h e r s


      • BRBFBI
        GETLAZY MEMBER
        Level 8 - Treehouse Builder
        • Oct 2023
        • 64

        #4
        Originally posted by chuft

        I would not believe anything coming out of China however relating to costs etc. China is a totalitarian country and it lies constantly. Every analyst of China will tell you this, whether it's about business or anything else.
        If the cost were wildly misrepresented we would know. DeepSeek published a research paper on their methods and why they needed far fewer chips. The shocking part of all of this isn't just that a small Chinese company created such a powerful AI, but the realization that any relatively small business with a few million dollars can do the same.

        From the NYT: “It has become very clear that other companies, not just someone like OpenAI, can build these kinds of systems,” said Tim Dettmers, a researcher at the Allen Institute for Artificial Intelligence in Seattle and a professor of computer science at Carnegie Mellon University who specializes in building efficient A.I. systems. “DeepSeek used methods that anyone can duplicate.”

        Originally posted by chuft

        A lot of media outlets are reporting DeepSeek is "open source" but it apparently isn't. I am not into AI, I dislike the concept intensely, but from what I read, there is a big difference between a "free model and open weights" vs "open source." Only the latter (including the code and data used to train it) would let you recreate the model itself.
        My understanding is that the code is open source and therefore anyone with the resources could train an LLM with it. Are you saying it "doesn't count" if they don't share the data they used to train it with?

        Originally posted by chuft

        This model has political "programming" relating to the Chinese government.
        For sure. I've seen people say they asked it about Tiananmen Square, etc., and it doesn't give a straight answer. I'm sure any of us could test it out if we made an account. All AIs have "fine tuning" that adjusts how they respond to certain topics, which is concerning when you think of the influence these models have. I know people who use ChatGPT for advice nearly every day. I know people who talk to it like it's their therapist. If a small number of companies have control over millions of people's conversation partner, I think that's bad for diversity of thought.


        • chuft
          Stepher
          SPECIAL MEMBER
          MODERATOR
          Level 31 - Number 9
          • Dec 2007
          • 3287

          #5
          I have to go to work and can't really respond to all of this in any depth. But there is interesting info here.

          https://stratechery.com/2025/deepseek-faq/

          For example

          "DeepSeek is clear that these costs are only for the final training run, and exclude all other expenses; from the V3 paper:
          Lastly, we emphasize again the economical training costs of DeepSeek-V3, summarized in Table 1, achieved through our optimized co-design of algorithms, frameworks, and hardware. During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pre-training stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.


          So no, you can’t replicate DeepSeek the company for $5.576 million."
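
The arithmetic in the quoted passage is easy to check. The sketch below only re-adds the numbers given in the excerpt (GPU hours per stage, the assumed $2 per GPU hour rental price, and the 2048-GPU cluster); the variable and function names are mine, and nothing here accounts for the research, ablation, or hardware costs that the paper says are excluded.

```python
# All figures are taken from the quoted DeepSeek-V3 paper excerpt.
PRETRAIN_GPU_HOURS = 2_664_000
CONTEXT_EXTENSION_GPU_HOURS = 119_000
POST_TRAINING_GPU_HOURS = 5_000
RENTAL_USD_PER_GPU_HOUR = 2.0            # the paper's own assumption
GPU_HOURS_PER_TRILLION_TOKENS = 180_000
CLUSTER_GPUS = 2048


def total_gpu_hours() -> int:
    """Sum of the three training stages listed in the excerpt."""
    return PRETRAIN_GPU_HOURS + CONTEXT_EXTENSION_GPU_HOURS + POST_TRAINING_GPU_HOURS


def training_cost_usd() -> float:
    """Final-run cost at the assumed rental price."""
    return total_gpu_hours() * RENTAL_USD_PER_GPU_HOUR


def days_per_trillion_tokens() -> float:
    """Wall-clock days to process one trillion tokens on the full cluster."""
    return GPU_HOURS_PER_TRILLION_TOKENS / CLUSTER_GPUS / 24


def implied_pretraining_tokens_trillions() -> float:
    """Pre-training token count implied by the two GPU-hour figures."""
    return PRETRAIN_GPU_HOURS / GPU_HOURS_PER_TRILLION_TOKENS


if __name__ == "__main__":
    print(f"total GPU hours: {total_gpu_hours():,}")
    print(f"final-run cost:  ${training_cost_usd():,.0f}")
    print(f"days per trillion tokens: {days_per_trillion_tokens():.1f}")
    print(f"implied pre-training tokens: {implied_pretraining_tokens_trillions():.1f} trillion")
```

The totals match the excerpt: 2.788M GPU hours, $5.576M, and about 3.7 days per trillion tokens. The headline figure is simply GPU hours multiplied by an assumed rental rate, which is why it says nothing about what the company as a whole has spent.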



          and


          "So V3 is a leading edge model?

          It’s definitely competitive with OpenAI’s 4o and Anthropic’s Sonnet-3.5, and appears to be better than Llama’s biggest model. What does seem likely is that DeepSeek was able to distill those models to give V3 high quality tokens to train on.

          What is distillation?

          Distillation is a means of extracting understanding from another model; you can send inputs to the teacher model and record the outputs, and use that to train the student model. This is how you get models like GPT-4 Turbo from GPT-4. Distillation is easier for a company to do on its own models, because they have full access, but you can still do distillation in a somewhat more unwieldy way via API, or even, if you get creative, via chat clients.

          Distillation obviously violates the terms of service of various models, but the only way to stop it is to actually cut off access, via IP banning, rate limiting, etc. It’s assumed to be widespread in terms of model training, and is why there are an ever-increasing number of models converging on GPT-4o quality. This doesn’t mean that we know for a fact that DeepSeek distilled 4o or Claude, but frankly, it would be odd if they didn’t.

          Distillation seems terrible for leading edge models.

          It is! On the positive side, OpenAI and Anthropic and Google are almost certainly using distillation to optimize the models they use for inference for their consumer-facing apps; on the negative side, they are effectively bearing the entire cost of training the leading edge, while everyone else is free-riding on their investment.

          Indeed, this is probably the core economic factor undergirding the slow divorce of Microsoft and OpenAI. Microsoft is interested in providing inference to its customers, but much less enthused about funding $100 billion data centers to train leading edge models that are likely to be commoditized long before that $100 billion is depreciated."
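
To make the "send inputs to the teacher and record the outputs" idea from the quoted FAQ concrete, here is a deliberately tiny toy. It is not how any lab distills a language model: the `teacher` is a made-up stand-in (a hidden linear function) for a model you can only query, and the "student" is a two-parameter line fitted by least squares to the recorded answers. The point is only the shape of the loop: query, record, train on the recordings.

```python
import random


def teacher(x: float) -> float:
    """Hypothetical black-box model; the student never sees these coefficients."""
    return 3.0 * x + 2.0


def collect_pairs(n_queries: int, seed: int = 0) -> list[tuple[float, float]]:
    """Query the teacher on random inputs and record (input, output) pairs."""
    rng = random.Random(seed)
    inputs = [rng.uniform(-1.0, 1.0) for _ in range(n_queries)]
    return [(x, teacher(x)) for x in inputs]


def fit_student(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Fit y = w * x + b to the recorded pairs by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def distill(n_queries: int = 50) -> tuple[float, float]:
    """Full loop: query the teacher, then train the student on its answers."""
    return fit_student(collect_pairs(n_queries))


if __name__ == "__main__":
    w, b = distill()
    print(f"student learned w={w:.3f}, b={b:.3f}")
```

The student recovers the teacher's behaviour without ever seeing its internals or its original training data, which is why the FAQ says the only real defence is cutting off access.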


          Lots more for those interested. Overall it actually seems like a gift to the tech industry, not a problem.
          l i t t l e s t e p h e r s


          • chuft
            Stepher
            SPECIAL MEMBER
            MODERATOR
            Level 31 - Number 9
            • Dec 2007
            • 3287

            #6
            Originally posted by BRBFBI
            I know people who use ChatGPT for advice nearly every day. I know people who talk to it like it's their therapist.

            That is truly disturbing.
            l i t t l e s t e p h e r s


            • LazyPooky
              ADMINISTRATOR
              Level 35 - Rockin' Poster
              • Oct 2007
              • 7178

              #7
              Last week there was an annual meeting in Davos: The World Economic Forum. There it was said that it's an AI battle between the USA and China, and the battle will be decided in the next two years. One of them will win. Once you are ahead, you can no longer be overtaken, because then you can also use AI to get further ahead. They thought the biggest chance was that America will win because of all the investments in AI. US: "we are going to invest a lot more and that will ensure a new golden age". That was before DeepSeek came along.

              Four things are needed to be able to win. You have to make sure you have the DATA. You need the CHIPS, used to make all the calculations. You need the MODELS that can do the thinking. And you need a lot of ENERGY to keep those chips running, because they use a lot of it.

              Those are important things in the battle for global dominance. China probably wins on data because they are a bit smarter with data processing and are less constrained by privacy. Computing power, meaning the quality and number of chips, probably goes to the United States because of those Nvidia chips. The models probably go to the United States too. Energy is not entirely clear.

              Now, one week later, the Americans are still going to invest a lot of extra money in computing power and in the models. But DeepSeek shows that you may need much less advanced computing power, and that if you simply build on the models that are already publicly available (open source), you can develop a new model much faster than anyone had previously thought. That is also much cheaper. This actually means that the thresholds to create these AI services, which are also very important for these companies to be able to make so much profit, are much lower than we first thought. That is of course a disadvantage for all those companies that have been working on this up to now. See how they lost a lot of money already.

              Then, perhaps no one will have that dominant position anymore. That's an advantage for the rest of the world. If you can suddenly make those models cheaper and easier and use less advanced chips, then other countries or continents might be able to make models themselves. Everyone can design models very quickly. Then the price will drop and we will make more use of them, with opportunities for the rest of the world as well.

              The political significance of this might be even greater than the economic one. The US immediately gave a very clear message here: "America, we are going to do everything we can to become the winner in AI". If you want to prevent China from winning this race, then the United States must ensure that China cannot gain access to these chips and models. At all costs. Even if that is economically disadvantageous. It is protectionism. Although, some of the people in the government in the United States will say: "Let's not do any protection now, because we have already proven that it doesn't work." DeepSeek (China) somehow did have access to the chips. That debate is completely open in that respect.

              China will then take political and economic countermeasures. That could also be favorable for other countries and continents, if the United States and China are fighting so hard politically and economically. I don't think it will come to that, though, but I do like where this is going.
              Magnús: - I have fans of all ages and I don't think it's weird when older people like LazyTown. LazyTown appeals to people for many different reasons: dancing, acrobatics, etc.


              • boredjedi
                Master
                SPECIAL MEMBER
                MODERATOR
                Level 35 - Rockin' Poster
                • Jun 2007
                • 7201

                #8
                Oppssss

                DeepSeek in Deep Trouble! (Internal database exposed)

                http://eighteenlightyearsago.ytmnd.com/


                • possessor
                  I like LazyTown.
                  SPECIAL MEMBER
                  Level 30 - Stepher
                  • Oct 2021
                  • 2925

                  #9
                  Someone ask the AI about what happened in June 1989


                  • chuft
                    Stepher
                    SPECIAL MEMBER
                    MODERATOR
                    Level 31 - Number 9
                    • Dec 2007
                    • 3287

                    #10
                    As I mentioned earlier, DeepSeek was trained using tokens from OpenAI's AI. This is called "distillation." It cheated. It was not trained entirely on raw data. It essentially asked ChatGPT millions of questions and got the answers and used those questions and answers to train DeepSeek. Training it from scratch would be much more costly. What they did violated the Terms of Service for ChatGPT.

                    https://www.inc.com/ben-sherry/opena...-work/91140698


                    So I would not make the mistake of thinking the Chinese found some amazing efficiency. The efficiency in this case was cheating and stealing, and then lying about it, as is so often the case with what China does. Always be skeptical of what the Chinese say.

                    Distillation is normally used inside a company to train newer models using older ones, with full access because it is internal. The question facing US AI companies is how to make their AIs available for use without opening them to distillation from competitors.


                    The irony here of course is that all AI is based on stealing, namely other people's copyrighted works from the internet. So some people are snickering about it.

                    https://www.msn.com/en-us/money/othe...ai/ar-AA1y4W8Y


                    l i t t l e s t e p h e r s


                    • LazyPooky
                      ADMINISTRATOR
                      Level 35 - Rockin' Poster
                      • Oct 2007
                      • 7178

                      #11
                      I read what you wrote about distillation, and I know what it is. I was just describing the situation in the days just before and just after DeepSeek's introduction, before it really became clear how they got these models and data 'distilled'.

                      Nothing is safe on the internet and no one can be trusted. I learned that years ago.
                      Magnús: - I have fans of all ages and I don't think it's weird when older people like LazyTown. LazyTown appeals to people for many different reasons: dancing, acrobatics, etc.


                      • chuft
                        Stepher
                        SPECIAL MEMBER
                        MODERATOR
                        Level 31 - Number 9
                        • Dec 2007
                        • 3287

                        #12
                        Another interesting article.

                        https://darioamodei.com/on-deepseek-and-export-controls

                        It does not take distillation into account but discusses a lot of other things relating to this insane race to the cliff.
                        l i t t l e s t e p h e r s


                        • LazyPooky
                          ADMINISTRATOR
                          Level 35 - Rockin' Poster
                          • Oct 2007
                          • 7178

                          #13
                          Originally posted by chuft
                          Another interesting article.

                          https://darioamodei.com/on-deepseek-and-export-controls

                          It does not take distillation into account but discusses a lot of other things relating to this insane race to the cliff.
                          It is good to see it from another angle. Interesting to read about shifting the curve. The article focuses a lot on export controls, which are a political decision. You cannot determine in advance whether they will work; as I described in my previous post, there are also opponents who will say that they do not. But according to this article the chips, mainly H800s, were not illegally obtained, so the discussion about export controls on those chips is not useful, and only applies to the chips that are not allowed to be exported to China. If done right, China will be stuck with the older H800 chips. I'm still hoping other countries and continents will join the 'race', because that would be a good opportunity now.

                          The more I read about it, the more I think that the rumor about distillation has been spread mainly by OpenAI, Microsoft and the White House. So OpenAI, a company that has been obtaining large amounts of data from all of humankind, largely in an unauthorized manner and, in some cases, in violation of the terms of service of those it has been taking from, is now complaining that it has been robbed of its data. The story is ironic, and the company is hypocritical. 😋
                          Magnús: - I have fans of all ages and I don't think it's weird when older people like LazyTown. LazyTown appeals to people for many different reasons: dancing, acrobatics, etc.


                          • chuft
                            Stepher
                            SPECIAL MEMBER
                            MODERATOR
                            Level 31 - Number 9
                            • Dec 2007
                            • 3287

                            #14
                            Well yes, if you recall I posted


                            Originally posted by chuft


                            The irony here of course is that all AI is based on stealing, namely other people's copyrighted works from the internet. So some people are snickering about it.


                            Oh Dear, Did Someone Steal Something From OpenAI?
                            l i t t l e s t e p h e r s


                            • LazyPooky
                              ADMINISTRATOR
                              Level 35 - Rockin' Poster
                              • Oct 2007
                              • 7178

                              #15
                              Originally posted by chuft
                              Well yes, if you recall I posted
                              Yes, I know you posted that. That doesn't mean you've claimed that, so I can't say it.
                              I made the sentence to use irony vs hypocrisy again, in the end.

                              Magnús: - I have fans of all ages and I don't think it's weird when older people like LazyTown. LazyTown appeals to people for many different reasons: dancing, acrobatics, etc.
