Tau Development Update | September 2021 (Ep 34)
Transcript of the Tau monthly development update — September 2021 (Ep 34):
Thank you Fola and welcome everybody. On this month on the Agoras live side, Andrei has added the ability to search for public sessions and continues testing the service for getting ready for a final release fairly soon. He continued fixing a few minor issues and optimizations on the back end. Agoras live is getting really close to a potential release here. On the TML side, we’re also in release mode. We’re pretty much feature complete. We’re putting the final touches on all the features and we’re benchmarking and QA’ing all those features that will be put into TML 1.0.
In particular, Lucca continued working on the 2 CNF extraction. He revised the extraction function to extract two CNFs from BDDs while they’re being built and the ability to merge CNF’s back. Into BDDs while maintaining the canonicity of BDDs which is a very important property for BDDs. Murisi, as we talked about last month, did a great job, creating/implementing the proof tree extraction. This month he continued testing that and he optimized it and fixed a few bugs with it. He also added the flags to allow the user to specify what kind of proof they want.
He also implemented another major feature this month which is variable shifters and BDDs which are supposed to save a lot of memory usage for BDDs. Not necessarily to save execution time but save memory usage for BDDs. We went over that implementation and everybody both Ohad and Juan, reviewed his code on variable shifters. It’s being tested right now before inclusion in the TML 1.0 release. He also fixed the dead variable elimination in the presence of deletion rules which is one minor feature that he hadn’t gotten in the first implementation. He also updated the Eval predicate in the presence of proof of trees. Juan was very instrumental in assisting both Murisi and Umar in their tasks. But he also wrapped up the major implementation that he did with second order logic. He has also been running benchmarks for TML in preparation for the release and comparing them with the performance of other programming languages.
Finally, Tomas has been also testing the proof extraction feature that Murisi implemented last month. And he’s been working on integrating that into the TML IDE. He’s got some good ideas around that. He also continued his work on state blocks which should make TML easier to program with. And also, for the IDE, he’s working on making REPL work within the TML IDE, which makes the IDE itself much more useful to use. We’re hoping to get a TML 1.0 release for sure by the end of this year. But we’re very much concerned with performance and quality, basically the software. That’s our number one criteria not necessarily an expedient release.
I’d like to introduce you to a member of our team who’s been here But we haven’t been able to make him public. His name is Dr. Cramer and I’ll pass it over to Dr. Cramer introduced himself.
I have been participating in the meetings of the developer team now already for some months. One of the focus points recently has been trying to develop a logical language that users of the system can use in order to input information into the system. And the focus there was on the following problem. In this logical language, you should also be able, among many other things, to input information about how the system itself should function; how future versions of the system should function. It should be possible to make proposals about this, to debate proposed about this within the logical language of the system.
And one issue that we have encountered when trying to do this was that, well when you have a logical language that basically can talk about itself that’s what we can want here. It’s just a self-referential thing that within the language you can talk about how the system should work and the system itself uses that language. With this kind of self-referentiality there is a risk if one doesn’t set it up very carefully, that one allows for certain kinds of paradoxes that are kind of similar in nature to the famous liar paradox. Where one can say this sentence is false or the sentence is not true. And if you try to analyze it, if you think it’s true then you conclude it’s not true. If you think it’s not true you can do it, it’s true, so either way you get into a problem.
Basically, we have a bit of a similar issue about how we define the language that underlies the Tau system. Because it has to have certain capacities of self-reference in order for users to be able to talk about the Tau system within the Tau’s system. But at the same time, it has to either completely avoid the possibility of writing down paradoxical sentences or have some way of handling them in case people do write them down. We are still actually in the process of considering various options here on how to precisely handle this problem. There was a proposal by Ohad. There was a proposal by myself and well actually multiple proposals from each side. We’re constantly improving on this and have already got much more clarity on what precisely we require for the system. This has been one of these things where we realised we need to put quite a lot of careful thought into this in order to make this basic functioning of the system well defined.
The alternative we are looking into is of course functional programming and eventually a dependently typed infrastructure behind it. On the other side, as listeners here might know, I’m very much active in the development of an external theorem prover that’s also exploring concurrency and sequential theorem proving it’s a low base proof. I’ve made further progress here. In last month’s sessions, we discovered first a bug that was an interesting one. It was related to how to deal appropriately correctly with free variables in the Tableau.
So, the tableau reason I am building is actually a free variable tableau system. That means I have to take into account that I do maximum matchings on branches. Which I didn’t do in the previous version and that is fixed now. Today, we had a presentation in which I then showed the test example. We are looking at currently together also with the development group of the other provers in the TML side. I demonstrated the other models I can get. I also demonstrated how you can debug the tier improving systems in comparison to what the TPTP infrastructure provides namely a bunch of different tier improvement systems and model finders. In my theorem prover, for instance, I have provided an output that I can directly use as an input to those systems so that I can directly compare their responses to my own ones.
I have mentioned that this could be a general direction to set up a testing infrastructure for us. A big library of examples which runs over our own theorem provers and compares them usually against each other and in addition, comparing them with the external systems as available as TPTP. That could be interesting future work to automate that. What I do at the moment when debugging is often still by hand. I think we now have fruitful meetings in which we mutually inform each other about theorem proving processes and insights and the techniques there. I think this is much needed in order to early detect errors and less fruitful paths to go down.
To prevent let’s say erroneous programming from the beginning, communication here is very essential and I think with our weekly meetings we have a good setup there.
This month of September, we have been working on making changes to the arithmetic and basically non-relational raw terms for bit universe transformation. Previously, they were not being handled, now we have converted nearly all of the non-relation terms into relational terms in bit universe. So, that allows a concise, standard way and format of representing any raw program in bit universe. Secondly, we also added type checker for the fall terms with the first order logic terms.
We did have a type checker for the basic raw terms, but for structures like first order formulas that have raw terms at the leaf level. We needed to type check them as well. Having done so enables us to transform any raw first order formula into its corresponding bit universe transformation. We also added some basic tests for type checking and type inference. It was nice to have a standard representation for arithmetic terms in the bit universe. Now everything is actually a predicate there. There were also some discussions and design considerations of how to implement different options regarding bit universal publications for arithmetic terms and so on. How we transform and carry that information that we have at the type information level down to the BDDs in an efficient and concise way. We have different paths there and we’re just trying to figure out what is the best direction to take.
Finally, with Agoras live we reach the point where I am not adding any features, only a small feature of searching among public sessions. I concentrated all my efforts on finding and fixing all issues I may find. And on refactoring front-end and back-end code to be able to quickly find and resolve all issues that we plan to find with the quality assurance engineer, we’re looking to hire to help me on this side. I’m looking forward to the next month’s where I hopefully can along with this engineer wrap up the testing phase of Agoras live and release it to some beta testers prior to final release.
I made relevant progress on adding the second order logic feature. Where we are heading right now is to have this interval base representation for the second order variables or dual constraints-based representation for second order variables. They are roughly in place; we still have the issues to solve at the algebraic level and also on the implementation level, but the overall approach is at this point more or less confirmed. I think we’re on the right track there. We did some revision with the development team and Ohad quickly pointed out an aspect of the current implementation of current design that is not being considered. It relates to the the mutual constraint of second order variables where they have mutual dependencies.
That’s something I’m considering within the design, but I’m positive that we will also be able to handle that as well. The current implementation is still under design. It’s a very raw prototype, but we need to keep working on it. As I mentioned before, it is a possible path to the final solution and it’s not a major milestone that we accomplished this month. But at least it confirms much of what it has been said so far for the last couple of months. It is a decent way to attack the problem.
I have been testing TML on different aspects of first order logic and we got some interesting benchmarks related to batch transfers. These allow us to compare TML versus python implementation and C / C++ implementation. Showing some very interesting numbers for TML.
We observed that we were able to process a batch transfer operation faster than python. Of course, we are not faster than C but we were able to handle a data set with a database with 100 million entries. I think the most relevant points to highlight are the memory requirements for TML. While for C it’s a memory allocation of several hundreds of megabytes. In TML, we can represent that information in less than a megabyte. So, in terms of data compression TML is showing really powerful capabilities. This benchmark allowed us to identify some of our performance bottlenecks with regards to arithmetic features. So, I have been also shifting my attention into redesigning some very, let’s say deep details of TML in order to take advantage of our most advanced arithmetic features that should be able to really tackle this bottleneck in the short term.
We are focused on getting faster arithmetic in order to handle these huge databases in an efficient way.
This month I continued working on my implementation of the two CNF extraction. My last update I mentioned that I had laid out the main routines on how to do this. And during this month, I actually implemented it and I managed to finish the Merge function which essentially combines two CNFs coming from a high and a low note and a BDD node, combining them to the parent node so to speak. I was also able to finish the Add function which retrieves two CNFs from the BDDs during the builds of the BDD. Currently, I’m working on canonicity aspects when getting those specially represented BDD nodes in memory back to the BDD level because we have to preserve canonicity.
This has been a challenge and I’m making progress and hopefully I’ll get the feature finished within the next month. When this is done, we are looking into ways how we can make the best use of this abstraction and this will hopefully speed up TML even more and I’m very excited for this to come.
This month I improved the performance of proof trees by changing how they’re done internally. Now I’ve given some flags or some options to control the generation of proof trees. This will allow people to select whether they just want one tree, which is in fact one proof of how a package is derived. You can also ask for a forest which gives all the proofs. I also added some additional options about partial proofs where it tries to filter out some detail. Which might not be so desired by the user but you can also get a complete proof if you want the entire proof. Although it’s much larger and a bit harder to process.
I’ve also enabled negative facts to be proven. So, you can actually prove that this fact couldn’t have been derived at a particular point during the execution of the TML program. I have implemented variable shifting for BDDs and tested it and benchmarked it against the existing systemless implementation. This is still kind of in progress. I’ve given it off to the team and they’ve reviewed it and they’ve given me some feedback.
I hope to be addressing it during the coming month. But from what I’ve seen so far, it does reduce memory research and sometimes quite significantly like up to 30%. Although the performance does lag to different degrees, I’m sure that with further tuning I can also get the performance to improve on more cases. I’ve also prototyped a small resolution propositional theorem prover in TML. I’m using the proof of tree functionality which I was working on last month. Basically, it’s a prototype but essentially it just shows that you can use TML to prove various propositional theorems and using the proof of tree functionality will give you the sorts of rules that will be used to derive a particular contradiction.
I believe that this should be extendable to maybe quantify free formulas with predicates in them. Although, I’m seeing that if you’re trying to get the entire logic, you may run into some difficulties and some trade-offs. Which is similar to those encountered during implementation of eval. The other thing which I was working on this month was just fixing a bug with variable elimination. A bug when working on deletion rules but it’s now resolved. I was also updating eval to provide condensed outputs. I did this mostly by filtering out details which I believe were less relevant to users and just keeping the critical detail.
In doing this I was also able to yield shorter proof trees. I was able to yield shorter proof trees by simplifying the process by which eval works. Now it’s easy to kind of see how eval is deriving certain facts.
Aside from being involved in almost everything you hear here, I also continue with the implementation of the early parser which will replace the parser component in TML. Also, in the logical language that we are considering; to consider aspects of self-reference as explained before. How to turn those logics into programming languages or specification languages. The main missing component is how to treat side effects and also without breaking the nice logical opportunities that we managed to achieve.