Mad respect to the Germans for taking a stand on this.
The entire idea of measuring worker performance is not only dehumanizing but is particularly flawed when it comes to knowledge work.
It's like trying to determine which car is faster by racing them through rush-hour traffic, while ignoring the fact that each car is on a different incline.
Knowing the right people and being in the right place at the right time can often make or break one's career.
Yet we are expected to incorporate these factors, which are largely outside our control, into a measurement of our own "performance". The unlucky get insult added to injury. The lucky get a dose of ego or fear (depending on their level of self-awareness).
Counterpoint: some people are simply faking their results, selling them as better than they are, or sacrificing necessary parts in ways that will bite the company months or years later.
This makes it hard, if not impossible, to get reliable or even fair results, especially when the review is subjective and lacks an objective metric as a base.
I tend to agree, with current products, at least the ones I've used. But companies developing AI products would do their investors a disservice if they did not tune their models to maximize engagement. We are in the honeymoon phase of some of these models, but dark times lie ahead.
Humans aren't good at validation either. We need tools, experiments, labs. Unproven ideas are a dime a dozen. Remember the hoopla about room temperature superconductivity? The real source of validation is external consequences.
I think there's more nuance, and the way I read the article is more "beware of these shortcomings" than "aren't good". LLM-based evaluation can be good. Several models have by now been trained using previous-generation models to filter data and validate RLHF data (pairwise comparisons or even more advanced schemes). Llama 3 is a good example of this.
My take from this article is that there are plenty of gotchas along the way, and you need to be very careful in how you structure your data, how you test your pipelines, and how you make sure your tests keep up with new models. But, like it or not, LLM-based evaluation is here to stay. So explorations into this space are good, IMO.
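To make the pairwise idea concrete, here's a minimal sketch of one common gotcha and its mitigation: LLM judges are position-biased, so you query both orderings and only count a preference when the verdicts agree. `judge_fn` is a hypothetical stand-in for an actual model API call; the length-biased fake judge below exists only to make the sketch runnable.

```python
def pairwise_judge(question, answer_a, answer_b, judge_fn):
    """Compare two answers with an LLM judge, querying both orderings
    to mitigate position bias. judge_fn(question, first, second) must
    return "first", "second", or "tie"."""
    v1 = judge_fn(question, answer_a, answer_b)   # A shown first
    v2 = judge_fn(question, answer_b, answer_a)   # B shown first
    a_wins = (v1 == "first") + (v2 == "second")
    b_wins = (v1 == "second") + (v2 == "first")
    if a_wins > b_wins:
        return "A"
    if b_wins > a_wins:
        return "B"
    return "tie"

# Stand-in judge that naively prefers the longer answer (a known bias
# of real LLM judges); a real judge_fn would call a model API instead.
def length_biased_judge(question, first, second):
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"

print(pairwise_judge("What is 2+2?", "4", "The answer is 4.", length_biased_judge))  # → B
```

Note that a judge that always prefers whichever answer is shown first ends up voting for both answers once each, so the swap collapses pure position bias into a tie.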
Let's take a step back and think about how LLMs are trained.
Think about when ChatGPT gives you two side-by-side answers and asks you to rate which is "better".
Now consider the consequence of this at scale with different humans with different needs all weighing in on what "better" looks like.
This is probably why LLM-generated code tends to have excessive comments. Those comments probably earn it a higher rating, but you as a developer may not want them. It also hints at why there's inconsistency in coding styles.
In my opinion, the most important skill for developers today is not in writing the code but in being able to critically evaluate it.
Always. Every interview exists to narrow down the applicant pool. Unless it's an automated interview, I would rather stay unemployed than take one of those.
The prep depends on the interview type. Most fall into 3 categories.
1. Algorithms - I remember a better time in life when I had the luxury of waiting until I had an interview scheduled, and only then would I have the motivation to grind. I'm now grinding every day, since it takes time to make meaningful improvements. Although a part of me enjoys this type of question, it does make me question my career choice. I guess the one silver lining is that I'm getting much better at solving these questions than when I was employed.
2. System design - For this type of interview I've found that it's all about your ability to guess what type of system they'll want you to build and what parts they'll be interested in focusing on.
3. Behavioural - This actually requires the most company-specific preparation. Refining your behavioural stories to match what the company is looking for and whoever you're interviewing with (e.g., a recruiter vs. a C-suite executive). Thinking of meaningful questions to ask. Practicing mock interviews. It all takes time.
1. Don't get tunnel vision on the problem you're solving. Prioritize maintaining your professional relationships, especially with your manager, over all else. Prioritize what your manager cares about over whatever you think is more important.
2. Async communication does not replace real-time communication. People cannot help but reveal more when you're communicating in real time. This is crucial because of the next point.
3. Do NOT assume good intentions. Sadly, the tech industry is full of backstabbers, across organizations of all sizes. If people have an issue with how you're doing something, no matter how small, no matter how trivial, expect that it'll go directly to your manager and be kept hidden from you.
I'm in the job search right now, but I wrote my own little tool to scrape LinkedIn.
It's a simple automation which searches through 10 pages of job postings and stores the results in a log file.
This alone is helpful because it lets me avoid getting distracted by all the non-job-related aspects of LinkedIn: click-bait headlines, cringe posters, etc.
In addition, it filters out jobs it has seen before. I found that the exact same job can be posted again with a different ID, so I use a hash of the job description as the ID.
I also have filters for certain phrases like "7+", which I assume means 7+ years of experience, which I don't have. It's not perfect, but it works well enough. I hate reading a job posting that looks decent only to find, hidden near the end, that they want someone with over a decade of experience.
After a few days of using this tool consistently, I'll reach a saturation point, meaning that after going through the first 10 pages it won't find any new jobs.
That's not a bad thing. That's actually the goal. It means that when an actual new job is posted, my tool will help me see it through all the clutter.
The saturation point gives me a goal to reach every day. Instead of some arbitrary goal like sending out 10 applications or spending 1 hour every day, my goal is always to keep going until I reach the saturation point. This encourages me to be consistent and gives me a reasonable stopping point.
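The dedup-by-description-hash and keyword-filter ideas above can be sketched in a few lines. This is my own reconstruction, not the commenter's actual code: the function names and the exact skip pattern (matching "7+" through "19+" years) are illustrative assumptions.

```python
import hashlib
import re

# Illustrative filter: postings asking for 7-19+ years of experience.
SKIP_PATTERNS = [r"\b([7-9]|1[0-9])\+\s*years?"]

def job_id(description: str) -> str:
    """The same job often reappears under a new posting ID, so hash the
    whitespace-normalized description text and use that as the identity."""
    normalized = " ".join(description.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def is_new_and_relevant(description: str, seen: set) -> bool:
    """True only for postings we haven't seen and that pass the filters."""
    jid = job_id(description)
    if jid in seen:
        return False
    if any(re.search(p, description, re.IGNORECASE) for p in SKIP_PATTERNS):
        return False
    seen.add(jid)
    return True
```

Hashing the normalized description rather than the posting ID is what catches reposts: two postings with different IDs but identical text collapse to the same hash, so the second one is silently dropped.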
Honestly? I'm working on one, and I'd be hesitant to share because it's such a competitive space. Finding the problem is not that hard, but creating a defensible moat is.
You're right; execution and defensibility are much more important than merely having an idea. One thing I’ve learned is that a good moat often comes from distribution, community, or even data feedback loops more than tech itself.
I now focus less on what no one has ever done and more on what people have done, but can be improved for a specific group. Niching down creates natural defensibility.
I would love to hear more about your approach to building a moat if you're ever up for a brainstorming session!