Hacker News | shahbaby's comments

Seems like a solution looking for a problem.


Mad respect to the Germans for taking a stand on this.

The entire idea of measuring worker performance is not only dehumanizing but is particularly flawed when it comes to knowledge work.

It's like trying to determine the faster car by racing through rush hour traffic, while ignoring the fact that each car is on a different incline.

Knowing the right people and being in the right place at the right time can often make or break one's career.

Yet we are expected to treat these factors, which are for the most part not in our control, as a measurement of our own "performance". The unlucky get insult added to injury. The lucky get a dose of ego or fear (depending on their level of self-awareness).

It seems like corporate gaslighting to me.


Counterpoint: some people are just better at their jobs than others


Counterpoint: some people are simply faking their results, selling them as better than they are, or sacrificing necessary parts, which will bite the company months or years later.

This makes it hard, if not impossible, to get reliable or even fair results, especially when the review is subjective and lacks an objective metric as a base.


Personally, I think the addiction aspect of AI is no worse than what's already out there; it only gets attention right now because it's new.


Yeah, it kind of matters how you define AI. TikTok and other short-form video feeds are arguably AI addiction.


I tend to agree, with current products, at least the ones I've used. But companies developing AI products would do their investors a disservice if they did not tune their models to maximize engagement. We are in the honeymoon phase of some of these models, but dark times lie ahead.


Fully agree: I've found that LLMs aren't good at tasks that require evaluation.

Think about it: if they were good at evaluation, you could remove all humans from the loop and have recursively self-improving AGI.

Nice to see an article that makes a more concrete case.


Humans aren't good at validation either. We need tools, experiments, labs. Unproven ideas are a dime a dozen. Remember the hoopla about room temperature superconductivity? The real source of validation is external consequences.


Human experts set the benchmarks, and LLMs cannot match them in most (maybe any?) fields requiring sophisticated judgment.

They are very useful for some things, but sophisticated judgment is not one of them.


I think there's more nuance here, and the way I read the article is more "beware of these shortcomings" than "aren't good". LLM-based evaluation can be good. Several models have by now been trained with previous-gen models used for filtering data and validating RLHF data (pairwise or even more advanced). Llama 3 is a good example of this.

My take from this article is that there are plenty of gotchas along the way, and you need to be very careful in how you structure your data, and how you test your pipelines, and how you make sure your tests are keeping up with new models. But, like it or not, LLM based evaluation is here to stay. So explorations into this space are good, IMO.


Let's take a step back and think about how LLMs are trained.

Think about when ChatGPT gives you side-by-side answers and asks you to rate which is "better".

Now consider the consequence of this at scale with different humans with different needs all weighing in on what "better" looks like.

This is probably why LLM generated code tends to have excessive comments. Those comments would probably get it a higher rating but you as a developer may not want that. It also hints at why there's inconsistency in coding styles.

In my opinion, the most important skill for developers today is not in writing the code but in being able to critically evaluate it.


Always. Every interview exists to narrow down the applicant pool. If it's an automated interview, though, I would rather stay unemployed than take one of those.

The prep depends on the interview type. Most fall into 3 categories.

1. Algorithms - I remember a better time in life when I had the luxury to wait until I had an interview scheduled, and only then would I have the motivation to grind. I'm now grinding every day, since it takes time to make meaningful improvements. Although a part of me enjoys these types of questions, it does make me question my career choice. I guess the one silver lining is that I'm getting much better at solving them than when I was employed.

2. System design - For this type of interview I've found that it's all about your ability to guess what type of system they'll want you to build and what parts they'll be interested in focusing on.

3. Behavioural - This actually requires the most company-specific preparation: refining your behavioural stories to match what the company is looking for and who you're interviewing with (i.e., recruiter or C-suite level), thinking of meaningful questions to ask, practicing mock interviews. It all takes time.


Not sure what you mean by this. Remote devs are usually better by necessity.


I'll share a few.

1. Don't get tunnel-visioned on the problem you're solving. Prioritize maintaining your professional relationships, especially with your manager, over all else. Prioritize what your manager cares about over whatever you think is more important.

2. Async communication does not replace real-time communication. People cannot help but reveal more when communicating in real time. This is crucial because of the next point.

3. Do NOT assume good intentions. Sadly, the tech industry is full of backstabbers across organizations of all sizes. If people have an issue with how you're doing something, no matter how small or trivial, expect that it'll go directly to your manager and be kept hidden from you.


I'm in the job search right now, but I wrote my own little tool to scrape LinkedIn.

It's a simple automation which searches through 10 pages of job postings and stores the results in a log file.

This alone is helpful because it keeps me from getting distracted by all the non-job-related aspects of LinkedIn: clickbait headlines, cringe posts, etc.

In addition, it filters out jobs it has seen before. I found that the exact same job can be reposted with a different ID, so I use a hash of the job description as the ID instead.

I also have filters for certain phrases like "7+", which I assume means 7+ years of experience, which I don't have. It's not perfect, but it works well enough. I hate reading a job posting that looks decent only to find, hidden near the end, that they want someone with over a decade of experience.

After a few days of using this tool consistently, I reach a saturation point: after going through the first 10 pages, it finds no new jobs.

That's not a bad thing. That's actually the goal. It means that when an actual new job is posted, my tool will help me see it through all the clutter.

The saturation point gives me a goal to reach every day. Instead of some arbitrary goal like sending out 10 applications or spending 1 hour every day, my goal is always to keep going until I reach the saturation point. This encourages me to be consistent and gives me a reasonable stopping point.
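The dedup-and-filter step described above can be sketched roughly like this (the function names, the choice of SHA-256, and the exact filter terms are my own assumptions for illustration, not the author's actual tool):

```python
import hashlib


def job_id(description: str) -> str:
    """Derive a stable ID from the posting text, since the same job
    can be reposted under a different platform ID."""
    normalized = " ".join(description.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_new_jobs(postings, seen_ids, excluded_terms=("7+", "8+", "10+")):
    """Keep only unseen postings that don't mention excluded experience terms.

    postings: iterable of (title, description) pairs.
    seen_ids: a set of description hashes, mutated in place.
    An empty return value means the saturation point has been reached.
    """
    new_jobs = []
    for title, description in postings:
        # Skip postings that demand experience levels we filter out.
        if any(term in description for term in excluded_terms):
            continue
        jid = job_id(description)
        # Skip reposts: same description text, possibly a new platform ID.
        if jid in seen_ids:
            continue
        seen_ids.add(jid)
        new_jobs.append((title, description))
    return new_jobs
```

Hashing the normalized description rather than trusting the platform's job ID is what makes the repost detection work; the `seen_ids` set would be persisted to the log file between runs.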


Honestly? I'm working on one, and I'd be hesitant to share because it's such a competitive space. Finding the problem is not that hard, but creating a defensible moat is.


You're right; execution and defensibility are much more important than merely having an idea. One thing I’ve learned is that a good moat often comes from distribution, community, or even data feedback loops more than tech itself.

I now focus less on what no one has ever done and more on what people have done, but can be improved for a specific group. Niching down creates natural defensibility.

I would love to hear more about your approach to building a moat if you're ever up for a brainstorming session!

