I've found an interesting thread...
I'm going to share my thoughts here, because I rarely hear opinions around me about AI (LLM) systems that come close to my own.
When ChatGPT first took off, I was quite skeptical about AI systems (I still have my doubts). I understand the concepts behind the “expert systems” of the 20th century and why they failed. I also understand the AI of that earlier generation, that is, its realization through perceptron networks and their basic structure, and I know that it could not evolve any further.
It has been 30 to 40 years since then, and ChatGPT really does blow away the impression I had back then. Even while telling myself it might be an illusion, I sensed “something like consciousness” and felt surprise and a kind of fear. That said, once you kept playing with it for a while, you could see it was still just a toy system...
Not long after that, GPT-4 was released. The basic responses had improved a lot, and we were surprised and scared all over again. As for the fear, by this point I clearly understood that this was not so-called Skynet; what worried me instead was the prospect of the technology becoming the monopoly of a single company.
Of course, I also understand that running an AI system costs a lot of money (a large number of ultra-expensive GPU systems for training and for serving the prompt system), but I was worried about the future precisely because OpenAI had gotten off to such a strong start.
There are also other aspects to this, but now I'd like to return to the technical side of things.
After GPT-4 was released, people like me, who were interested in the limits of this AI system and wanted to explore them, began to appear. Among them was someone who held a very interesting session. This person is a mathematician well known in Japan for being able to write humorous documents.
https://gist.github.com/hyuki/5f6795852060fbf2ff6021b6856aa00a (in Japanese)
The session is about how ChatGPT, when given a state made contradictory by a combination of graph structures, can provide insight into what is causing the contradiction. For a while it looks like an ineffective AI system, but the impact when it finally finds the contradiction is striking.
I don't think this is a partial copy of some paper; it is a result that cannot be obtained unless the inference is guided correctly. Watching the session, I couldn't help but feel the potential of AI technology. At the same time, a kind of “prompt engineering” matters here too: you have to choose carefully what you say during the session in order to get the desired result.
(This is also true when discussing with other humans.)
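To make this concrete, here is a hypothetical example (not the actual problem from the linked gist) of the kind of contradictory state I mean: ordering constraints expressed as a directed graph, where a cycle means the constraints cannot all hold at once. The TypeScript sketch below just finds and reports one such cycle, which is the kind of "source of the contradiction" the session was trying to get ChatGPT to point at.

```typescript
// Hypothetical example of a contradictory graph state (not the problem
// from the linked gist): each edge means "the first node must come
// before the second". A cycle means the constraints contradict each other.
type Edge = [string, string];

// Find one cycle (the source of the contradiction) with a simple DFS.
function findContradiction(edges: Edge[]): string[] | null {
  const adj = new Map<string, string[]>();
  for (const [from, to] of edges) {
    if (!adj.has(from)) adj.set(from, []);
    adj.get(from)!.push(to);
  }
  const visiting = new Set<string>();
  const done = new Set<string>();
  const path: string[] = [];

  function dfs(node: string): string[] | null {
    if (visiting.has(node)) {
      // We walked back into our own path: report the offending cycle.
      return [...path.slice(path.indexOf(node)), node];
    }
    if (done.has(node)) return null;
    visiting.add(node);
    path.push(node);
    for (const next of adj.get(node) ?? []) {
      const cycle = dfs(next);
      if (cycle) return cycle;
    }
    path.pop();
    visiting.delete(node);
    done.add(node);
    return null;
  }

  for (const node of adj.keys()) {
    const cycle = dfs(node);
    if (cycle) return cycle;
  }
  return null;
}

// "A before B", "B before C", "C before A" — impossible to satisfy.
console.log(findContradiction([["A", "B"], ["B", "C"], ["C", "A"]]));
// -> ["A", "B", "C", "A"]
```

The point of the session, of course, was that ChatGPT reached this kind of diagnosis through conversation, without being handed an algorithm like this.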
This is a bit of a tangent, but as a personal opinion on the future of AI systems: long before any Skynet acquires a human-like ego, I think the world will be disrupted by unintentional information manipulation from AI systems simply being connected to social networking services... (maybe that will happen soon)
Having said that, as I continued to use GPT-4 myself, I eventually realized that it answers with high accuracy when a question is backed by a large amount of source material, but gives only garbage-like answers when it is not.
For example, I can read and write C/C++/C#/F#/TypeScript. When I ask it to write code for the same task in each of these languages, the F# output is obviously random garbage.
Similarly, if I ask it to write very narrowly constrained code, such as old x86 boot code (loaded from the 512-byte boot sector), it produces code that needs a lot of reworking and presents it as if it were perfect. And the more feedback I give it on the code, the more quickly it forgets the feedback and makes the same mistakes again.
I got fed up with this, so whenever I got an undesirable answer I would rewind, ask the question again, and have it answer more conservatively and carefully. It hardly needs saying that the amount of source material on “old x86 boot code” is so different from that on “TypeScript” that this result is probably unavoidable. Even during the session, it kept bringing up old Linux boot code.
More recently, the o1 model was released. This o1 model surprised me in yet another way: it appeared to iterate (repeat) the inference process in a hierarchical manner.
The iterations appeared to be a process of checking for inconsistencies so as to avoid giving a wrong answer as far as possible. The interesting part was that, for very complicated questions, the cycle of generating an answer --> realizing it's wrong --> generating another answer --> realizing it's wrong again seemed to repeat endlessly without converging. Eventually, somewhere in the middle of this process, something like a 'fluctuation' would occur in the decision-making, and that would become the breakthrough that led to the true answer.
If each individual reasoning step could be made 'smarter', the overall practicality would presumably increase. Moreover, this resembles two or three people discussing something back and forth, which may be ideal for making advanced decisions. At the same time, if a certain level of context and consensus (?) is not shared, exchanging conclusions seems meaningless. I assume these concerns were resolved, given that this method was actually adopted, but it is interesting.
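I obviously have no idea how o1 is implemented internally; the following is only a sketch of the loop I felt I was watching, with `propose` and `verify` as hypothetical stand-ins for "generate an answer" and "check it for inconsistencies".

```typescript
// A sketch of the generate -> check -> retry loop I *seemed* to observe.
// Not how o1 actually works; `propose` and `verify` are hypothetical.
async function solveWithRetries(
  question: string,
  propose: (q: string, attempt: number) => Promise<string>,
  verify: (q: string, answer: string) => Promise<boolean>,
  maxAttempts = 10,
): Promise<string | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Each attempt samples a slightly different answer; that randomness
    // is what I called the 'fluctuation' that sometimes escapes a dead end.
    const answer = await propose(question, attempt);
    if (await verify(question, answer)) {
      return answer; // a consistent answer was found
    }
    // "it's wrong" --> go around again
  }
  return null; // never converged within the budget
}
```

In this reading, the 'fluctuation' is simply the sampling randomness across attempts: most retries land in the same dead end, but occasionally one lands somewhere new and that becomes the breakthrough.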
I am interested in creating my own computer language, and I have been using the o1 model to ask questions about the type system of the implementation and to implement it. For relatively simple type systems, it outputs valid results for both the implementation and the corresponding test code, including the validity of the theory (I specified TypeScript just to be sure). However, as I gradually delve into more complex type systems, the answers start to go astray at a certain point.
The reason is that in type-system research, Haskell and OCaml are the mainstream languages, and the answers get pulled toward the topics (i.e. the papers) discussed in those languages.
In programming language theory, the surface syntax is replaced during parsing by an abstract data structure called an AST, so the discussion naturally departs from the familiar syntax. But the data structure the model produces ends up looking exactly like Haskell, the type-system discussion ends up concluding with deep insights into the OCaml type system, and it easily drifts away from the discussion of the “original” type system I am actually thinking about.
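As a toy illustration of what I mean by a "relatively simple type system" (this is not my actual language, just the smallest sketch I can write in the TypeScript I asked for): an AST for a tiny expression language and a checker over it. It is exactly this kind of data structure that the model keeps reshaping into something Haskell- or OCaml-flavored.

```typescript
// A toy AST and type checker for a tiny expression language.
// Illustrative only; not the language I am actually designing.
type Type = "number" | "boolean";

type Expr =
  | { kind: "num"; value: number }
  | { kind: "bool"; value: boolean }
  | { kind: "add"; left: Expr; right: Expr }
  | { kind: "if"; cond: Expr; then: Expr; else: Expr };

function typeOf(e: Expr): Type {
  switch (e.kind) {
    case "num":
      return "number";
    case "bool":
      return "boolean";
    case "add":
      if (typeOf(e.left) !== "number" || typeOf(e.right) !== "number") {
        throw new Error("add expects numbers");
      }
      return "number";
    case "if": {
      if (typeOf(e.cond) !== "boolean") {
        throw new Error("condition must be boolean");
      }
      const t = typeOf(e.then);
      if (t !== typeOf(e.else)) {
        throw new Error("branches must have the same type");
      }
      return t;
    }
  }
}

// if true then 1 + 2 else 3  ->  "number"
console.log(typeOf({
  kind: "if",
  cond: { kind: "bool", value: true },
  then: { kind: "add", left: { kind: "num", value: 1 }, right: { kind: "num", value: 2 } },
  else: { kind: "num", value: 3 },
}));
```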
For this reason, I have the feeling that ChatGPT and other AI systems are “quite” strongly influenced by the amount and quality of the material used as input for their learning. I feel they express results more fluently in TypeScript than in F#, and that considerable effort in how we phrase things is needed to draw out discussion of theories they have never heard of before.