Terence Tao makes progress on the Collatz conjecture (2019) (quantamagazine.org)
71 points by rzk on March 29, 2023 | 60 comments


Related: Veritasium did a video about the Collatz (3x+1) conjecture

https://www.veritasium.com/videos/2021/7/30/the-simplest-mat...


> Tao used this weighting technique to prove that almost all Collatz starting values — 99% or more — eventually reach a value that is quite close to 1. This allowed him to draw conclusions along the lines of 99% of starting values greater than 1 quadrillion eventually reach a value below 200.

If they reach a value below 200 then they surely will reach 1 (since all sub 200 sequences have been brute forced). What am I missing?


I don't think it's correct as stated in the article.

Looking at the paper, it basically says: the orbit starting with N dips to f(N) for "most" N, where f is a function that goes to infinity.

So, you can't pick f(N) = 200, but you can at least pick an f(N) that goes to infinity really slowly--a lot more slowly than previous results (f(N) = ~N^0.7924 is mentioned as a previous result).


What you're missing is that "99% or more" is not 100%. Tao's technique does not preclude the existence of a small cycle of large numbers that never get below 200 (or rather, below 2^64, since we have tried all of those).


I am commenting on the fact that these two statements are equivalent:

* This allowed him to draw conclusions along the lines of 99% of starting values greater than 1 quadrillion eventually reach a value below 200.

* This allowed him to draw conclusions along the lines of 99% of starting values greater than 1 quadrillion eventually reach 1.

yet the author used the first, which seems weird


The paper's "200" comes from plugging "1 quadrillion" into Terence Tao's new theorem directly, and not applying any other results. I think they wanted to communicate only the meaning of the theorem, not the context around it.


At some point I considered doing a computational search for cycles (perhaps using a SAT solver or something similar), but then I found this on Wikipedia:

> The length of a non-trivial cycle is known to be at least 186265759595.

This probably makes it very hard to find a cycle unless we know of some properties of the numbers involved in the cycle.


Common sense suggests that the length of the cycle can be greater than any chosen N.

The algorithm alternates between two steps: multiply by 3 and add 1, then divide by a power of 2 (with exponent equal to the number of trailing 0 bits; on average that power is 4). So on average you multiply by 3, but divide by 4.

This argument simultaneously 1) motivates the claim of the conjecture and 2) predicts that the cycle length is unlimited (but always finite).


That's neither common sense nor mathematically correct.


Take a random[1] odd[2] number and represent it in binary. Its lowest bit is 1. Multiply by 3. The first bit is 1, but the rest have a 50% chance of being 0 and a 50% chance of being 1[3]. Adding 1 (the rest of the 3n+1 step) will flip a sequence like [0111] to [1000]. The number will then be divided by 2^k, where k is the number of trailing zeroes. How many trailing zeroes do we expect? We always start with one. There's a 50% chance the next one will be 0, and if it is, another 50% chance the next one will be, etc. So the average is 1 + 1/2 + 1/4 + 1/8 + 1/16 + ... = 2 trailing zeroes. Dividing off two zeroes in binary is the same as dividing the number by 4.

So on average, we multiply a "random" odd number by 3, but divide it by 4.

So yes, it is "mathematically correct" in a very real sense. It's a bit weird that you felt the need to claim it's incorrect, without explaining why.

[1] Think of grouping the integers based on how many bits they take up. We're looking at patterns like [1----1] where each "-" has a 50/50 chance of being 0 or 1. For any given window size, this analysis holds, except for a small number of very pathological edge cases that become vanishingly unlikely for even modest window sizes.

[2] It's common to consider each step only applying to odd numbers, and always dividing until the result is odd again. It's clear that any analysis on this "simplified" version trivially extends to the evens as well.

[3] This shouldn't actually be taken as a given, since we've modified our original bit distribution by multiplying by 3, but it is true, which you'd probably agree with if you played around for a bit with multiplying random bits by 3. You can figure out the distributions of the resulting bits (include the carry bit as well). The highest bit can be wonky, but the probability that the run of zeroes reaches the highest bit vanishes as the window size grows.
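The "multiply by 3, divide by 4 on average" claim is easy to check empirically. Here's a quick simulation sketch (not a proof, and the sample size and bit width are arbitrary choices): take random odd numbers, apply one 3n+1 step, and measure how many trailing zero bits get stripped:

```python
import random

random.seed(0)

def trailing_zeros(n):
    # number of trailing zero bits of n (n > 0)
    return (n & -n).bit_length() - 1

samples = 100_000
total = 0
for _ in range(samples):
    n = random.getrandbits(64) | 1          # a random odd number
    total += trailing_zeros(3 * n + 1)      # zeros stripped after one 3n+1 step

print(total / samples)  # ~2.0: on average we divide by 2^2 = 4
```

The average lands right around 2 trailing zeroes, matching the 1 + 1/2 + 1/4 + ... = 2 computation above.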


What you are describing is the probability heuristic. That is mathematically correct. But what wasn't correct is that from this fact you can deduce that "the cycle length is unlimited (but always finite)". The whole point of the conjecture is to prove that only the trivial cycle exists. If you can deduce that the cycle length is finite but large, you've already disproven the conjecture.

Sorry for being terse in my original comment. I should have elaborated.


Your claim depends on the idea that the set of numbers {3, 9, 15, 21, 27, ...} = {3 * N | for all odd N} has an even distribution of 0s and 1s in its binary representation (ignoring the last digit, which is always a 1). But this claim does not seem like common sense; in fact, it's not clear to me that it's true. The algebra suggests to me that the bias in favor of 1 bits scales logarithmically (multiplying by 3 is the same as shifting the number to the left by 1 and adding the original number to it; since the original number had a bias in favor of 1 bits, that bias carries over when shifted to the left), and a simple Python script I wrote to test this hypothesis on the first billion integers shows just such a bias in favor of the 1 digits.

Neither of these are rigorous demonstrations, but without some kind of evidence to indicate the contrary, or a proof, I think OP was right to state that this claim is neither common sense nor mathematically correct.

Below is a script you can use to empirically test the bias for yourself (or identify a mistake that I've made):

    def count_bits(n):
        count = 0
        while n:
            count += n & 1
            n >>= 1
        return count

    def experiment(count):
        count_0_bits = 0
        count_1_bits = 0

        for i in range(1, count, 2):
            num = 3 * i
            num >>= 1  # Remove the least significant bit, which is always 1
            count_1_bits += count_bits(num)
            count_0_bits += num.bit_length() - count_bits(num)

        print(f"count: {count}")
        print(f"0 bits: {count_0_bits} {count_0_bits / (count_0_bits + count_1_bits)}")
        print(f"1 bits: {count_1_bits} {count_1_bits / (count_0_bits + count_1_bits)}")
        print("")


It's pointless to argue about what is or is not "common sense", which is why I didn't mention it. I'm only concerned about whether it's correct.

The obvious flaw I see in your code is that you're starting from 1 and not ignoring the topmost bit, which is also always 1. We know the Collatz conjecture is true up to something like 10^20, which is over 2^66. We've always only been talking about "sufficiently large" numbers. The probability of the top bit affecting the outcome of one step (3n+1, then divide until odd) is (1/2)^66; completely negligible. So starting at 1 will give you a clearly biased answer.

But you're right it needs to be proven. Here's a brief sketch:

To multiply a number N by 3, we can bit-shift once to get 2N, and add it back to the original.

        N =  1001101
       2N = 10011010
          + ________
       3N = 11100111
    carry = 0011000
Consider a bit somewhere in the middle of a number. It's 0 (p = 0.5) or 1 (p = 0.5). To multiply it by 3, we need to know the bit to its immediate right (to add, same distribution) and the carry bit (distribution not known yet). The 8 possibilities for (carry + bit + neighbor) = (nextCarry, result) are:

    (0 + 0 + 0) = (0, 0)  with p = 1/2 * 1/2 * p(carry = 0)
    (0 + 0 + 1) = (0, 1)  etc.
    (0 + 1 + 0) = (0, 1)  
    (0 + 1 + 1) = (1, 0)

    (1 + 0 + 0) = (0, 1)  with p = 1/2 * 1/2 * p(carry = 1)
    (1 + 0 + 1) = (1, 0)  etc.
    (1 + 1 + 0) = (1, 0)
    (1 + 1 + 1) = (1, 1)
Out of the top group of 4, there's a 50/50 chance the resulting bit is 0/1. Out of the bottom group of 4, there's a 50/50 chance the resulting bit is 0/1. So the distribution of the carry bit doesn't actually matter: the resulting bit has a 50% chance of being 1, and a 50% chance of being 0.
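The case analysis is small enough to verify exhaustively; a few lines of Python enumerate all eight (carry + bit + neighbor) combinations and confirm the 50/50 split within each carry group:

```python
from itertools import product

# Tally the resulting bit (0 or 1) for each incoming carry value,
# across the four (bit, neighbor) combinations.
counts = {0: [0, 0], 1: [0, 0]}   # carry -> [count of result 0, count of result 1]
for carry, bit, neighbor in product((0, 1), repeat=3):
    total = carry + bit + neighbor
    result = total & 1            # the output bit; total >> 1 would be the next carry
    counts[carry][result] += 1

print(counts)  # {0: [2, 2], 1: [2, 2]}: 50/50 regardless of the carry
```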

This makes some intuitive sense to me: writing a number in base B, then multiplying by any number M that is relatively prime to B, should "scramble" the digits: multiplication by M permutes the residues modulo B. Write out the digits up to B-1 and multiply each by M, and you get { 0, 1M, 2M, 3M, ... }, which does not repeat modulo B. The carry digit might complicate things for higher bases, but binary is simple enough that it doesn't matter. Keep in mind, the bias would have to be huge (something like 1.58:1) to overcome the difference between the x3 and /4.

> since the original number had a bias in favor of 1 bits, then that bias carries over when shifted to the left

After adding 1, the lowest 1-bit becomes a 0 (with a carry of 1, which we saw doesn't matter), so the result of 3n+1 is biased in favor of 0 on the low-end, which is why we end up dividing by 4 on average, and not 2.

The highest 1-bit does not matter. You seem to be thinking that a 10-bit number has a bias like "one extra '1' out of 10 bits", but it's much more like "a (1/2)^10 chance of an extra '1'". That bias very quickly becomes negligible for any of the numbers we're talking about.


I went down the same line of thought as you when this problem nerd sniped me 5 years ago (and periodically over the last 5 years). I actually now think you may be able to do a computer proof that there are no cycles other than the trivial one with a clever construction of large integers that lets you consider them in chunks, but I don't think there's any chance of finding a cycle with computation absent some weird number theory.

Edit: The cycle length seems to have gotten longer since I last checked.


The last time I wasted some time on this I had a similar notion: "start big" with numbers on the order of 1 million bits and try and demonstrate that different sections of those bits (prefix,middle,suffix) change according to some pattern that can then be generalised "down" to just 64 bits, which has been manually tested.


I think the cycles are something of a distraction. Ultimately the problem comes down to this.

Given a number N with prime factors (p1, p2, …, pn) and N+1 with prime factors (q1, q2, …, qm):

What is the relation between (p1, p2, …, pn) and (q1, q2, …, qm)?


I hate and fear the Collatz conjecture and Goldbach conjecture.

Unlike a lot of other unsolved math problems they are very simple to state and think about but then end up in a quagmire.

I have lost many nights of sleep and productive days getting nerd sniped by these problems.


There's lots more of these kinds of problems.

Fermat's Last Theorem used to be one of them.

Another one (that's actually solvable by mere mortals) is the 'firing squad synchronization problem':

> The name of the problem comes from an analogy with real-world firing squads: the goal is to design a system of rules according to which an officer can command an execution detail to fire so that its members fire their rifles simultaneously.

> More formally, the problem concerns cellular automata, arrays of finite state machines called "cells" arranged in a line, such that at each time step each machine transitions to a new state as a function of its previous state and the states of its two neighbors in the line. For the firing squad problem, the line consists of a finite number of cells, and the rule according to which each machine transitions to the next state should be the same for all of the cells interior to the line, but the transition functions of the two endpoints of the line are allowed to differ, as these two cells are each missing a neighbor on one of their two sides.

> The states of each cell include three distinct states: "active", "quiescent", and "firing", and the transition function must be such that a cell that is quiescent and whose neighbors are quiescent remains quiescent. Initially, at time t = 0, all states are quiescent except for the cell at the far left (the general), which is active. The goal is to design a set of states and a transition function such that, no matter how long the line of cells is, there exists a time t such that every cell transitions to the firing state at time t, and such that no cell belongs to the firing state prior to time t.

It's fun to work out.


John Conway showed a generalization of this problem is undecidable. I can't find a PDF but there's a blurb and citation on wikipedia: https://en.wikipedia.org/wiki/Collatz_conjecture#cite_ref-33

I don't know enough to say anything more intelligent about it, but I guess if you are reading through comments on this article you might find it interesting!


Any theories about massively large numbers and Collatz conjecture? Would Collatz conjecture reduce Graham's number to 1? Would these proofs get presented before a generic one?


I never thought one could actually do arithmetic on Graham-size numbers. If you didn't have the branch in Collatz, then it would be algorithmically computable. But then so would Collatz itself.


I don't think that's the issue. The Collatz conjecture has a non-branching form in binary (using weird operations), but that doesn't really help anything because the operations are hard to analyze as mathematical operators.

Here is the non-branching form:

0. Start with an odd number n

1. Apply n = 3n + 1

2. Count trailing zeroes of n

3. Right shift n by the trailing zero count

4. Return to (1), stopping at n = 1
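The steps above can be sketched in Python (`(n & -n).bit_length() - 1` is one common trailing-zero idiom; the loop's termination of course assumes the conjecture holds for the chosen starting value):

```python
def collatz_odd_step(n):
    # One step of the odd-to-odd Collatz map: 3n + 1, then strip all trailing zeros.
    n = 3 * n + 1
    tz = (n & -n).bit_length() - 1  # count of trailing zero bits
    return n >> tz

# Walk an orbit down to 1, e.g. starting from 27.
n = 27
while n != 1:
    n = collatz_odd_step(n)
print(n)  # 1
```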


I don't think 'non-branching' is something we can rigorously define.

I mean, lambda calculus doesn't have any built-in branching, but you can still write arbitrary programs in it. Same for the 'post correspondence problem' or SKI calculus.


I agree with you completely. People get hung up on the fact that the conjecture has a "2-way" decision, but if you redefine the problem, there are a ton of ways to make it have no decision points. The one I posed is a simple mutation, but there are Turing machines that compute the Collatz algorithm, you can turn it into a geometry problem, etc.

All of these can be defined as "branching" or "non-branching" however you want.


Yes.

'Branching' and 'non-branching' make sense in the context of specific architectures and/or languages. Just not really in an abstract algorithmic sense.


How do you "count trailing zeroes of n" without branching?


Here's how you can do a 'branchless' one, using Python syntax:

   f = lambda n: (n & 1) * n + ((n + 1) >> 1)
No need to count any trailing zeros.

Of course, the multiplication sort-of acts as a branch.


I don't know how you formulate it mathematically for unbounded numbers without branching. On a computer, there are a variety of ways (including a single assembly instruction), but pretty much all of them rely on n being bounded.

If you're trying to feed a SAT solver, you can bound the size of the numbers at some level, and then pass one of the closed-form formulas into the solver.


Yeah, it seems important that the formula support unbounded numbers, because the numbers can grow inside the sequence.

Naively, it seems like any starting value (except those for which you already know the full sequence) might lead to values that exceed any arbitrary bound.


From a theoretical standpoint, I think you could also attempt to approach this by thinking about a field where the "remove trailing zeros" step is built in. It may be a very weird field to do math on, and we may be unprepared for the math that is required, but it's a way to remove branching from the operations required for a proof.

There's nothing inherently special about the branchiness of "remove trailing zeros" as an operation if you drill down into it: addition, for example, also needs conditional execution (based on the carry into each section of a number).

Otherwise, if you're searching for a counterexample, bound n by some ridiculous number of bits (whatever your supercomputer supports) and run your solver with a simple closed form.


If a computational process could not complete before the heat death of the universe, does it have an output?


Are you talking about an abstract mathematical process, or a specific physical embodiment?


> Intuition might suggest that the number you start with affects the number you end up with

End up with...? That's a weird way to phrase it. The algorithm can run infinitely. The number you start with does affect the sequence. They should have omitted that paragraph altogether.


> The algorithm can run infinitely.

Really? Isn't the whole point of proving the conjecture attempting to answer that question? If you have an example of a starting integer that causes the algorithm to run infinitely, I'd be eager to hear it.


Ending up with an infinite repeating sequence and "running infinitely" are not the same thing.


Just use a slightly different rule that stops when it reaches 1.


Has the acyclic unbounded case been ruled out?


Man, if you think this level of precision is hard to disambiguate, I wonder how you handle the notational abuse common in mathematics.

It might not be the field for you.


I have a proof, would love HN’s perspective.

A cycle means the product of all of the increases (each of which is a scaling factor of (3x+1)/x) equals the product of the decreases (a power of 2). (3x+1)/x = 3 + 1/x, so except for trivially small x, each increase is just a bit larger than 3. The product of those increases is a small amount larger than a power of 3, and has to exactly equal a power of 2.

Larger cycles allow that little error above a power of 3 to compound, but larger cycles also necessitate larger values of x, which shrink the error.


Looking at the alleged proof on your GitHub page (https://github.com/dankearney/collatz/blob/2dd21babd5fbced73...):

> As the cycles get longer, the nearest power of 2 actually trends further away. Why? https://mathoverflow.net/a/116960 explains this nicely -- powers of 2 and 3 cannot be close to one another as powers get larger. We need a power of 2 to be close to a power of 3, but that is not possible.

You misinterpreted the MathOverflow answer. Quoting its result, "At least half of the powers of 2, in a density sense, have a distance from the powers of three of at least one fifth of their size." Notice that it does not set any bound that applies to all powers of 2. Therefore, your assertion remains unproven.

For an explicit counterexample, let's take your n = 1000000001. Let e = 10439860591 and o = 6586818670. Then, 2^e ≈ 3^o + 2.796·10^3142711177, but (3 + 1/n)^o ≈ 3^o + 2.206·10^3142711189. Therefore, 3^o ≤ 2^e ≤ (3 + 1/n)^o, contrary to the strong form of your assertion.

Also, you have not proven the nonexistence of sequences which diverge to infinity.


Right. The theory that the parent came up with does seem like a good idea. However, you can't really expect to have been the first one to come up with this for a problem with as many iq-eyeball-hours as the collatz. Indeed, some googling suggests that this approach of matching powers of 2s and powers of 3s is one existing path for collatz adventurers: https://terrytao.wordpress.com/2011/08/25/the-collatz-conjec...

Although I don't understand it, a skim of that Terry Tao post suggests that if the statement about powers of 2 and 3 getting farther apart is actually true, it would be a big step towards proving the (weak) Collatz.


The bit about powers of 2 and 3 getting further apart is not correct, as you mention, but it isn't actually all that important. My GitHub page is a WIP, so bear with me...

If a cycle does exist for some large-ish cycle length, let’s say 100 odd numbers, then it means that (3+1/x1)(3+1/x2)…(3+1/x100) exactly equals some power of 2.

Since (3+1/xn) rapidly approaches 3.0, for any large x1, the product of (3+1/x1)(3+1/x2) etc is going to be pretty close to a power of 3, but it needs to be much larger than that to equal a power of 2, since powers of 2 and 3 are always somewhat far apart. That means that x1, x2 etc must be relatively small so that the product of the (3+1/x) can actually be substantially larger than a power of 3. That puts a hard floor on the possible cycle values, and that floor actually trends smaller as the cycle gets larger since the distance between a power of 3 and the nearest power of 2 also trends larger.


> since powers of 2 and 3 are always somewhat far apart.

This is the part where your argument fails. On a logarithmic scale, powers of 2 and powers of 3 can become arbitrarily close together.

To see this, notice that for any ε > 0, we can create a rational approximation x/y such that 0 < ln 2/ln 3 - x/y ≤ ε. Multiplying by y ln 3, we have 0 < y ln 2 - x ln 3 ≤ εy ln 3, or equivalently, x ln 3 < y ln 2 ≤ (x + εy) ln 3. Finally, we take the exponential to get 3^x < 2^y ≤ 3^(x + εy) = 3^(x(1 + εy/x)). As a consequence of Dirichlet's approximation theorem, εy/x can become arbitrarily small, so our upper bound 3^(x(1 + εy/x)) can become arbitrarily close to 3^x on a logarithmic scale.


Unfortunately, the true property of getting farther apart would have to be more sophisticated than the version the parent used. You can easily find powers of 2 and 3 arbitrarily close on the logarithmic scale by generating the convergents of log 2/log 3.
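A rough sketch of that: generate continued-fraction convergents e/o of log2(3) (float precision limits how deep this goes reliably); each gives a pair with 2^e ≈ 3^o, and the log-scale distance |e - o·log2(3)| shrinks as the convergents grow:

```python
import math

def convergents(x, count):
    # First `count` continued-fraction convergents h/k of x.
    h0, k0 = 1, 0
    h1, k1 = int(x), 1
    frac = x - int(x)
    out = [(h1, k1)]
    while len(out) < count:
        frac = 1 / frac
        a = int(frac)
        frac -= a
        h0, h1 = h1, a * h1 + h0
        k0, k1 = k1, a * k1 + k0
        out.append((h1, k1))
    return out

for e, o in convergents(math.log2(3), 8):
    # |e - o*log2(3)| is the log-scale distance between 2^e and 3^o
    print(e, o, abs(e - o * math.log2(3)))
```

This recovers the familiar near-misses like 2^19 ≈ 3^12 and 2^84 ≈ 3^53, each closer on a log scale than the last.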


Got an example? From my exploration, the distance between a power of 3 and the nearest power of 2 tends to grow.


Did you not see the n = 1000000001, e = 10439860591, o = 6586818670 example in my earlier post? (I edited it in a while after initially posting it.)


Slight tangent: I wonder if there's a limit to how close they can get in an absolute sense (instead of in a logarithmic sense)?


Yes; see, e.g., the answer at https://mathoverflow.net/a/116852, adjacent to the one cited by the parent. This gives a hard lower bound of |2^x - 3^y| ≥ max(2^x, 3^y)/((e max(x, y))^821013301). (But it's not sufficient for the parent's argument, since (3 + 1/n₁)^o = 3^o·(1 + 1/(3n₁))^o grows larger than 3^o by an exponential factor, so we can always increase (e, o) until the interval is wide enough.)


That sounds like a heuristic, probabilistic argument for why we expect the conjecture to hold, but it doesn't seem like an airtight proof. (Not even a sketch of a proof.)


It’s not probabilistic — for any cycle length, you can compute the minimum possible x value that could possibly be the lowest odd number in the cycle. That value trends smaller as the cycle length grows.


Sorry, I still don't get it.

Could you explain it with fewer steps left out?

To be honest, I doubt that you will have come up with something that Terence Tao and other geniuses haven't, but it's interesting to figure out exactly where you went wrong. (Or, on the off chance, seeing how you might be right after all.)


Yes and thanks for giving me the time of day!

1: Derive that, for a cycle with o odds and e evens, the product of the increases due to odds (3 + 1/x) must exactly equal the product of the decreases by halving, which comes out to 2^e.

The trivial cycle meets this, since (3+1/1) = 2^2.

As another example, for the alternative rule where odd numbers go up by 3*n + 5, there are some cycles, like 19 => 62 => 31 => 98 => 49 => 152 => 76 => 38 => 19. The increases here come out to approximately 3.26, 3.16, and 3.10, the product of which is exactly 32, or 2^5.
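That 3n+5 example checks out exactly; a quick verification with exact rational arithmetic:

```python
from fractions import Fraction

odds = [19, 31, 49]   # the odd numbers in the 3n+5 cycle above
increases = [Fraction(3 * x + 5, x) for x in odds]

prod = Fraction(1)
for r in increases:
    prod *= r

print(prod)  # 32, i.e. exactly 2^5
```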

2: Observe that each (3 + 1/x) term (except for trivially small x) is greater than 3, but near it. For x=101, (3+1/x) is about 3.01. As a result, the product of the increases is going to be larger than a product of 3, but somewhat close to one.

3: Observe that we need the little error above the power of 3 to be large enough that the LHS exactly equals a power of 2. This is challenging because as the numbers in the cycle get larger, (3 + 1/x) gets closer to 3.0, which makes it harder for the product to be much larger than a power of 3. And, for large powers of 3, the nearest power of 2 is never particularly close — we actually need this error to compound to a product substantially larger than the power of 3. We can attempt to achieve this with longer cycles for more compounding error, or smaller values of x.

4: Observe that, for a given number of odd numbers in a cycle, we can find an upper bound for the lowest value in the cycle. For example, let’s use o=5. We need (3+1/x1)(3+1/x2)(3+1/x3)(3+1/x4)(3+1/x5) = 2^e. 3^5 is 243, so the easiest e value is 8, making the RHS 256.

If we define x1 as the smallest number in the cycle, we can see that x1 can’t be too large. x1=1001 implies that x2, x3 etc are at least 1003 and 1005 etc. Plugging these in, the LHS is 243.4 — not near 256. And making x2, x3 etc larger only makes it worse.

Okay, so try a smaller x1 - you can compute that the largest possible x1 that can possibly make the LHS equal 256 is 31.

However we know that 31 and every number below it does not have a cycle, so we know there are no cycles of length 5.

We can actually define the RHS as 2^e, where e is the exponent of the smallest power of 2 above 3^o: e = floor(o*log2(3)) + 1.

5: Repeat this process for larger cycle lengths and you see that the largest viable value does not trend larger, it trends smaller.
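Step 4 can be sketched as a computation (under the same assumption as above: when x1 is the smallest odd element of the cycle, every factor in the product is at most 3 + 1/x1; float precision limits this to modest cycle lengths):

```python
import math

def max_x1(o):
    # Exponent of the smallest power of 2 exceeding 3^o.
    e = math.floor(o * math.log2(3)) + 1
    target = 2 ** e
    # Largest odd x1 with (3 + 1/x1)^o >= 2^e; every factor in the cycle
    # product is at most 3 + 1/x1 when x1 is the smallest element.
    x1 = 1
    while (3 + 1 / (x1 + 2)) ** o >= target:
        x1 += 2
    return x1

print(max_x1(5))  # 31, matching the o = 5 example above
```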


> harder for the product to be much larger than a power of 3

Unfortunately plenty of things that are hard to do still exist. Take primes for example: every time you pass another prime there are more primes that could be factors of larger numbers, so it must be harder and harder for large numbers to be prime. Somehow they still manage.

Edit: I just quickly ran your prod(3 + 1/x) for x in {1001,1003,...,1001 + 2n} for various values of n, and empirically they are indistinguishable from uniformly distributed between powers of two on a log scale. You're going to need a better argument than "it's hard to hit a power of two with a product of (3 + 1/x)".

Edit 2: (re your point 4) your floor(log2(3^n)) + 1 also seems to be an incorrect upper bound: many of the products I calculated for my first edit are above it.

Edit 3: Even if that upper bound did hold, your point 5 doesn't stand up to a quick test: I followed your method for loop lengths up to 1000, and the upper bounds on x_1 bounce around a lot but tend to grow.


Poor choice of words perhaps, but the later parts explain it


See my edits for more substantive points, but more generally it is really useful to formalize your argument. Once you do that, it is much easier to see what parts of the argument are on a firm footing, and which ones are inadequately justified. Lots of plausible-sounding sentences about math turn out to be wrong.


Wow, thanks for the deep dive! This is so fascinating that someone would be willing to dive into my ideas. HN rules.

I clearly need a much clearer explanation of things — for example, the floor(log2(...)) part is being misinterpreted here. Thanks for diving in and giving me pointers


That doesn't sound complete; are there steps you haven't written out? You've given some true facts about cycles. Why does your observation about Collatz cycles imply the Collatz conjecture, which is that every sequence eventually reaches 1?


I’m proving that there are no cycles above the trivial cycle with this, and yeah this post is just a quick summary


It wasn't enough detail to spot the error in the proof.


I’m going to polish it up and will share with you; I explained it to eru in more detail



