Hacker News | fantispug's comments

Yes, this seems to be a common capability - Anthropic and Mistral have something very similar as do resellers like AWS Bedrock.

I guess it lets them better utilise their hardware in quiet times throughout the day. It's interesting that they all picked a 50% discount.


Inference throughput scales really well with larger batch sizes (at the cost of latency) due to rising arithmetic intensity and the fact that decoding is almost always memory-bandwidth limited.
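To make that concrete, here's a back-of-envelope sketch in Python (the matrix dimensions and fp16 byte size are illustrative assumptions, not measurements) of the arithmetic intensity of a batched matrix-vector product, the core op in LLM decoding. Weight traffic dominates, so intensity grows roughly linearly with batch size until you hit the hardware's compute/bandwidth ratio:

```python
def arithmetic_intensity(batch: int, d_in: int, d_out: int,
                         bytes_per_elem: int = 2) -> float:
    """FLOPs per byte of memory traffic for a batched matvec (fp16 by default)."""
    flops = 2 * batch * d_in * d_out               # one multiply-accumulate per output element
    weight_bytes = d_in * d_out * bytes_per_elem   # weights are read once, shared across the batch
    act_bytes = batch * (d_in + d_out) * bytes_per_elem
    return flops / (weight_bytes + act_bytes)

for b in (1, 8, 64):
    print(b, round(arithmetic_intensity(b, 4096, 4096), 1))
```

At batch size 1 you do roughly one FLOP per byte moved, far below what modern accelerators can sustain, which is why batching idle off-peak capacity is nearly free for the provider.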


Bedrock has a batch mode, but only for Claude 3.5, which is about a year old at this point, so it isn't very useful.


50% is my personal threshold for a discount going from not worth it to worth it.


Meta's primary business is capturing attention and selling some of that attention to advertisers. They do this by distributing content to users in a way that maximizes attention. Content is a complement to their content distribution system.

LLMs, along with image and video generation models, are generators of very dynamic, engaging and personalised content. If OpenAI or anyone else wins a monopoly there, it could be terrible for Meta's business. Commoditizing it with Llama, while at the same time building internal capability and a community for their LLMs, was a solid strategy from Meta.


So, imagine a world where everyone but Meta has access to generative AI.

There are two products:

A) (Meta) Hey, here are all your family members and friends, you can keep up with them in our apps, message them, see what they're up to, etc...

B) (OpenAI and others) Hey, we generated some artificial friends for you, they will write messages to you every day, almost like a real human! They also look like this (cue AI-generated profile picture). We will post updates on the imaginary adventures we come up with, written by LLMs. We will simulate a whole existence around you, "age" like real humans, we might even get married between us and have imaginary babies. You could attend our virtual generated wedding online, using the latest technology, and you can send us gifts and money to celebrate these significant events.

And, presumably, people will prefer to use B?

MEGA lmao.


I find this style changes the way I think about and write code transformations. It's also in shell pipelines, R's magrittr, and Clojure's thread macros, and can be emulated in some OO languages with methods that return the transformed object itself.
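The OO emulation mentioned above is sometimes called a fluent interface: each method mutates (or copies) the object and returns it, so calls chain left to right like `x |> f |> g`. A minimal sketch in Python (the class and method names are invented for illustration):

```python
class Pipeline:
    """Toy fluent wrapper: each method returns self so calls chain left-to-right."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, fn):
        self.items = [fn(x) for x in self.items]
        return self  # returning self is what enables chaining

    def filter(self, pred):
        self.items = [x for x in self.items if pred(x)]
        return self

result = Pipeline(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0).items
print(result)  # even squares of 0..9: [0, 4, 16, 36, 64]
```

The transformation reads in execution order, like a shell pipeline, rather than inside-out as nested function calls would.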


> R's magrittr

These days, base R already includes a native pipe operator (and it is literally `|>`, rather than magrittr's `%>%`).


It also works with universal function call syntax, like in Nim. Though aesthetically I prefer `|>` for multi-line expressions.


Yeah, in Lua some libs are used just like this, using the `:` syntactic sugar, something like

  value = table:function1():function2():function3()


I have seen it work better than LSH.

Each time you embed a document you search for approximate nearest neighbours before adding it, so the whole pass is O(N), like MinHash. Vector indexes like HNSW and PQ have better performance/quality tradeoffs than SimHash LSH, which is the analogue of MinHash for cosine distance.

The quality depends on what you mean by near duplicate and the embedding model you use. Current models work well, and if you have labelled data you can fine tune them to be better.

The main drawback is the additional cost of embedding all the documents, especially for longer documents. But this cost has dropped really quickly with smaller models, better optimisations, and faster hardware.
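A minimal sketch of the embed-then-search loop described above, with a brute-force cosine scan standing in for a real ANN index like HNSW, and a bag-of-characters count standing in for a real embedding model (both substitutions are assumptions made for brevity):

```python
import numpy as np

def toy_embed(doc: str) -> np.ndarray:
    # Stand-in embedding: letter counts. A real model (and fine-tuning) goes here.
    v = np.zeros(26)
    for ch in doc.lower():
        if ch.isalpha():
            v[ord(ch) - ord('a')] += 1
    return v

index = []  # list of (doc, unit vector); a vector index like HNSW in practice

def add_if_novel(doc: str, threshold: float = 0.95) -> bool:
    e = toy_embed(doc)
    e = e / np.linalg.norm(e)
    for _, seen in index:  # an ANN query in practice, not a linear scan
        if float(e @ seen) > threshold:
            return False   # near duplicate of something already indexed
    index.append((doc, e))
    return True

print(add_if_novel("the quick brown fox"))       # True: first document
print(add_if_novel("The quick brown fox!"))      # False: near duplicate
print(add_if_novel("completely different text")) # True: novel
```

The `threshold` is where "what you mean by near duplicate" shows up: with a good embedding model it trades off false positives against false negatives, and labelled pairs let you tune it.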


If this is true, then why do you think that — as the OP article states — the developers of GPT3 chose to use non-ML-based techniques to deduplicate their dataset, when they would be the most equipped to use ML-based approaches?

Just the pure compute cost of needing to run an ML encoder over petabytes of data?

Or maybe because for their use-case — eliminating redundancy to reduce total dataset size and therefore training time — a non-domain-specific vectorization with a high-false-negative cluster-discovery rate was acceptable, because it just meant they'd "compress" the dataset slightly less well, and so get slightly more training time? (At the expense of increased bias in training toward the saliency of the features that weren't dedup'ed out; but that was going to happen regardless, and they likely already had a fully-general technique later in the pipeline for countering that.)


It covers a lot of the fundamentals in some detail (attention and transformers, decoding, transfer learning) that underlie current cutting-edge NLP; this is still a very good foundation, likely to remain useful for several more years.

What might be missing is in-context learning, prompt engineering, novel forms of attention, RLHF, and LoRA (though it covers adaptors), but this is still changing rapidly and the details may be irrelevant in another year. If you have a look at a recent course like Stanford CS224N 2023 there's a lot of overlap.


> Maybe my data structure has a hash table of items, as well as a direct link to the largest item. When I say: "write the function to insert a new item in the list, and remember to update the largest item if it is larger than the current one", would Copilot do the right thing? Each step is easy in itself (adding an element to a hash, comparing an item to another one).

In general it won't solve all your problems, but it's helpful for automating simple things like this (but you still need to test edge cases). With this prompt in Python (which I'm more familiar with):

  from dataclasses import dataclass
  from typing import TypeVar
  
  T = TypeVar('T')
  
  @dataclass
  class MaxDict:
      items: dict[T, float]
      max_value_item: T

      def add_item(

It completed:

    def add_item(self, item: T, value: float):
        if value > self.items[self.max_value_item]:
            self.max_value_item = item
        self.items[item] = value

This was my second attempt; first I called it `max_item` and the completion did something about comparing the key.
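For comparison, here's a version that handles the edge case the completion misses, where `items` is empty on the first insert and indexing with `max_value_item` would raise a `KeyError`. This is my own sketch, not Copilot output, and the defaults are assumptions:

```python
from dataclasses import dataclass, field
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

@dataclass
class MaxDict(Generic[T]):
    items: dict = field(default_factory=dict)
    max_value_item: Optional[T] = None

    def add_item(self, item: T, value: float) -> None:
        self.items[item] = value
        # Guard against the empty case: the completion above would KeyError here.
        if self.max_value_item is None or value > self.items[self.max_value_item]:
            self.max_value_item = item

d = MaxDict()
d.add_item("a", 1.0)
d.add_item("b", 3.0)
d.add_item("c", 2.0)
print(d.max_value_item)  # b
```

This is exactly the kind of edge case ("you still need to test edge cases") that a quick completion glosses over.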


With the prompt

  // Print the fibonnaci sequence, except the number 5.
  int main() {

I got the result:

    int a = 0;
    int b = 1;
    int c = 0;
    while (c < 100) {
        c = a + b;
        a = b;
        b = c;
        if (c != 5) {
            cout << c << endl;
        }
    }
    return 0;


This happened in mathematics about a decade ago when Wolfram Alpha came out. Lecturers started complaining that Wolfram Alpha could solve assignment problems with worked steps.

In both cases I think this is a real opportunity; we can let students get more quickly to bigger problems and systems thinking by leveraging these tools. It requires professors to start thinking innovatively about how to teach and assess these subjects.


And it does. Kids use Wolfram Alpha for a lot of the first- and second-year math courses in college now.


These issues are often interactions between packages. I use TRAMP daily without freezing, but only when I disable company mode in shell modes. VSCode certainly requires much less of this tweaking, even with a variety of extensions installed.


The killer feature of Racket is that it is very easy to make Domain Specific Languages (including the teaching languages) and related tooling. However, last time I looked, the library ecosystem didn't seem great; there were many libraries but few that were actively maintained.


One could also argue that DSLs are not always good, especially in large projects maintained by many devs.


I think the counterargument is that in large projects you will end up with DSLs whether you meant to or not. The philosophy of language-oriented programming (LOP), as I understand it, is that since you'll end up with them anyway, you might as well design and build your DSLs explicitly.


Another word for DSLs is functions (and objects).

DSLs are just better integrated into the base language so your source code doesn't look like:

    three = two.add(one)
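In Python, operator overloading gives the same kind of base-language integration. A toy sketch (the `Num` class is invented for illustration):

```python
class Num:
    def __init__(self, v: int):
        self.v = v

    def add(self, other: "Num") -> "Num":
        # method-call style: two.add(one)
        return Num(self.v + other.v)

    def __add__(self, other: "Num") -> "Num":
        # operator style: two + one reads like the base language
        return Num(self.v + other.v)

one, two = Num(1), Num(2)
three = two.add(one)    # reads like library calls
also_three = two + one  # reads like arithmetic
print(three.v, also_three.v)  # 3 3
```

Same function either way; the "DSL" is just choosing surface syntax the host language already understands.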

