Converting Floating Ratios To Int
Solution 1:
float.as_integer_ratio
:
In [1064]: f = .5
In [1065]: f.as_integer_ratio()
Out[1065]: (1, 2)
Solution 2:
sorry not a python coder but here is general approach (not bounded to a lib or language):
definitions
so you got 2 (or
N
) floatsa,b
and want to have 2 integersaa,bb
such that:a/b == aa/bb
approach
float numbers are just integer mantissas shifted by integer exponent of base 2 to left (or right if negative exponent) so:
a = sign(a)*mantisa(a)*2^exponent(a) = sign(a)*(mantisa(a)<<exponent(a)) b = sign(b)*mantisa(b)*2^exponent(b) = sign(b)*(mantisa(b)<<exponent(b))
so if we shift both
a,b
numbers so the msb (most significant bit) of mantissa of the bigger magnitude number will go to msb of some integer variable you transformed thea,b
into integers without changing their ratio (unless some bits of mantissa are cut of due to smaller bit-width of target variable data-type). Its like multiplying the numbers with the same constant.extract exponents from
a,b
that can be simply done by directly extract the exponent bits a s integer number and substract bias from it to make it signed or by using
log2()
math function.compute
shift
we need to bit shift the mantissas of
a,b
byshift
bits or multiplya,b
by2^shift
so the bigger magnitude number will be biggest which still fits into integer variable. So if I assume32
bit signed integer we want that msb of the bigger magnitude number will be30
(bits are numbered from0
and we want leave the last bit as is so we can still apply sign).the computation is simple:
shift=max( exponent(a), exponent(b) ); shift=30-shift; // shift-=_f32_man_bits; // this is just in case of bit-shifting
bitshift or multiply the
a,b
and construct resultso simply convert
a,b
to integer as described in previous bullet. After that you can divide the resultign integers by their GCD or shift them right until lsb ofa
orb
is nonzero (remove trailing zeros).Here small example in binary:
exponent(b)=2 exponent(a)=-3 | | | 0.0010101110b <- a 100.01101b <- b -------------------------------------------------------------------------- _f32_man_bits = 23 // 32 bit float has 24 bit mantisa but first one is implicit shift = 30 - max(exponent(b),exponent(a)) = 30 - 2 = 28 -------------------------------------------------------------------------- ????????????????????????????????.0000000000b <- 32 bit integer variable 00000010101110000000000000000000.0000000000b <- a * (1 << shift) = mantissa(a)|(1<<_f32_man_bits) << (shift + exponent(a) - _f32_man_bits) 01000110100000000000000000000000.0000000000b <- b * (1 << shift) = mantissa(b)|(1<<_f32_man_bits) << (shift + exponent(b) - _f32_man_bits) | msb is zero so sign can still be applied ...
Removing trailing zeros can be done like this:
// remove trailing zerosfor (;((aa|bb)&1)==0;) { aa>>=1; bb>>=1; }
the above example would change to:
0000001010111b 0100011010000b
The division by GCD can be done like this (after removing trailing zeros):
// divide by GCDfor (int d=3;(d<=aa)&&(d<=bb);d+=2) while ((aa%d)+(bb%d)==0) { aa/=d; bb/=d; }
At last apply sign.
Here C++ floating example (multiply):
voidf32_ratio0(int &aa,int &bb,float a,float b)// aa/bb = a/b{
// IEEE 754 constantsconst DWORD _f32_man_bits=23; // mantisa bits (without implicit one)// variablesint shift,d;
int expa,siga;
int expb,sigb;
// extract parts of a,b
siga=(a<0.0); a=fabs(a); sigb=(b<0.0); b=fabs(b);
expa=floor(log(a)/log(2.0)); expb=floor(log(b)/log(2.0));
// compute shift
shift=expa; if (shift<expb) shift=expb; // max(expa,expb)
shift=30-shift; // shift msb of bigger mantisa to 30th bit of integer// construct result
aa=float(a*pow(2.0,shift));
bb=float(b*pow(2.0,shift));
// remove trailing zerosfor (;((aa|bb)&1)==0;)
{
aa>>=1;
bb>>=1;
}
// divide by GCDfor (d=3;(d<=aa)&&(d<=bb);d+=2)
while ((aa%d)+(bb%d)==0)
{ aa/=d; bb/=d; }
// signif (siga) aa=-aa;
if (sigb) bb=-bb;
}
Here C++ integer example (shifting):
voidf32_ratio1(int &aa,int &bb,float a,float b)// aa/bb = a/b{
// IEEE 754 constantsconst DWORD _f32_sig =0x80000000; // signconst DWORD _f32_exp =0x7F800000; // exponentconst DWORD _f32_exp_sig=0x40000000; // exponent signconst DWORD _f32_exp_bia=0x3F800000; // exponent biasconst DWORD _f32_exp_lsb=0x00800000; // exponent LSBconst DWORD _f32_man =0x007FFFFF; // mantisaconst DWORD _f32_man_msb=0x00400000; // mantisa MSBconst DWORD _f32_man_bits=23; // mantisa bits (without implicit one)const DWORD _f32_exp_bias=127; // exponent bias// float bits accessunion
{
float f; // 32bit floating point
DWORD u; // 32 bit uint
} y;
// variablesint shift,d;
int mana,expa,siga;
int manb,expb,sigb;
// extract parts of a
y.f=a;
mana=(y.u&_f32_man)|_f32_exp_lsb;
expa=((y.u&_f32_exp)>>_f32_man_bits)-_f32_exp_bias;
siga=(y.u&_f32_sig);
// extract parts of b
y.f=b;
manb=(y.u&_f32_man)|_f32_exp_lsb;
expb=((y.u&_f32_exp)>>_f32_man_bits)-_f32_exp_bias;
sigb=(y.u&_f32_sig);
// compute shift
shift=expa; if (shift<expb) shift=expb; // max(expa,expb)
shift=(30-_f32_man_bits)-shift; // shift msb of bigger mantisa to 30th bit of integer// construct result
d=shift+expa; aa=mana; if (d<0) aa>>=-d; elseif (d>0) aa<<=d;
d=shift+expb; bb=manb; if (d<0) bb>>=-d; elseif (d>0) bb<<=d;
// remove trailing zerosfor (;((aa|bb)&1)==0;)
{
aa>>=1;
bb>>=1;
}
// divide by GCDfor (d=3;(d<=aa)&&(d<=bb);d+=2)
while ((aa%d)+(bb%d)==0)
{ aa/=d; bb/=d; }
// signif (siga) aa=-aa;
if (sigb) bb=-bb;
}
where DWORD
is any unsigned 32 bit data-type for example:
typedefunsigned __int32 DWORD;
The double
precision will be done in the same manner only the constants changes and 64bit
or 2x32bit
variables are needed to store the integer mantissas and results...
Accuracy is dependent on the relative distance of the exponents. If the numbers have too big difference the resulting numbers would not fit into target integers resulting in the smaller magnitude number converting to zero if:
abs( exponent(a) - exponent(b) ) >= 31
Again if bigger bit-widths are used for the integers the 31 will change accordingly ...
Now your examples:
// a b a/b 0.50000 / 1.00000 = 0.500000// floats// aa bb aa/bb 1 / 2 = 0.500000// ratio01 / 2 = 0.500000// ratio1// a b a/b 0.50000 / 0.60000 = 0.833333// floats// aa bb aa/bb 4194304 / 5033165 = 0.833333// ratio04194304 / 5033165 = 0.833333// ratio1
Note that 0.6
can not be represented by floats exactly hence big values of aa,bb
!!! To solve that you need to add rounding but for that you need to know the threshold that tels you what part of number to round... Without knowing the targeted range of floats or accuracy I can not implement this safely...
If you want to preserve ratio between more floats than simply add them to function.
Here floating C++ example for 3 variables:
voidf32_ratio0(int &aa,int &bb,int &cc,float a,float b,float c)// aa/bb/cc = a/b/c{
// IEEE 754 constantsconst DWORD _f32_man_bits=23; // mantisa bits (without implicit one)// variablesint shift,d;
int expa,siga;
int expb,sigb;
int expc,sigc;
// extract parts of a,b
siga=(a<0.0); a=fabs(a); sigb=(b<0.0); b=fabs(b); sigc=(c<0.0); c=fabs(c);
expa=floor(log(a)/log(2.0)); expb=floor(log(b)/log(2.0)); expc=floor(log(c)/log(2.0));
// compute shift
shift=expa; // max(expa,expb)if (shift<expb) shift=expb;
if (shift<expc) shift=expc;
shift=30-shift; // shift msb of bigger mantisa to 30th bit of integer// construct result
aa=float(a*pow(2.0,shift));
bb=float(b*pow(2.0,shift));
cc=float(c*pow(2.0,shift));
// remove trailing zerosfor (;((aa|bb|cc)&1)==0;)
{
aa>>=1;
bb>>=1;
cc>>=1;
}
// divide by GCDfor (d=3;(d<=aa)&&(d<=bb)&&(d<=cc);d+=2)
while ((aa%d)+(bb%d)+(cc%d)==0)
{ aa/=d; bb/=d; cc/=d; }
// signif (siga) aa=-aa;
if (sigb) bb=-bb;
if (sigc) cc=-cc;
}
and your example result:
// a b c0.50000 / 0.60000 / 1.00000// aa bb cc4194304 / 5033165 / 8388608
[Edit1] N
case algorithm
extract parts of the
N
floatsO(N)
so we have floats
a0,a1,a2,...,a(N-1)
and want integer exponentse0,e1,...
and mantissasm0,m1,...
and signss0,s1,...
. For 32 bit floats it would be (using // IEEE 754 constants from above examples):int i,m[N],e[N],s[N]; float a[N]={ ... your numbers here ... }; unsigned __int32 *u=(unsigned __int32*)a,i; for (i=0;i<N;i++) { m[i]=(u[i]&_f32_man)|_f32_exp_lsb; a[i]=((u[i]&_f32_exp)>>_f32_man_bits)-_f32_exp_bias; s[i]=(u[i]&_f32_sig); }
compute
shift
itsO(N)
so first compute max of
e[i]
O(N)
and than theshift
itselfO(1)
// shift = max(e[0...N-1])int shift; for (shift=e[0],i=1;i<N;i++) if (shift<e[i]) shift=e[i]; // shift shift=30-shift;
apply shift and construct result
O(N)
for (i=0;i<N;i++) { int d=shift+e[i]-_f32_man_bits; if (d<0) m[i]>>=-d; elseif (d>0) m[i]<<= d; if (s[i]) m[i]=-m[i]; }
the results are in
m[]
.
Solution 3:
The short answer, as heemayl mentions, is to use .as_integer_ratio()
.
The problem with it is that sometimes, because of reasons regarding floating point precision, it returns big integers that although harder to parse by human eye are more precise.
For example:
>>> 0.25.as_integer_ratio()
(1, 4)
>>> 0.4.as_integer_ratio()
(3602879701896397, 9007199254740992)
Great but we know that 2/5
is also 0.4
. To get around simplifying, you can use the Fraction
built-in .limit_denominator() method.
In the case that you do NOT want to use the Fraction
built-in (you really should use it), I found that this works, although I am not sure if it's equivalent to using the Fraction
built-in.
>>> tuple(reversed((1 / 0.4).as_integer_ratio()))
(2, 5)
Post a Comment for "Converting Floating Ratios To Int"