|
THE GENETIC SEQUENCE BINARY FACTOR GROUPING ROUTINES |
|||||||||||||
|
These routines were written to analyse raw data published by the Broad Institute (www.broad.mit.edu) under the title: MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. The routine transforms the mean numerical scan values column of a CEL file into a large contiguous binary field that is sequentially divided into arrays 16 to 12 bits long. These array bit patterns are searched for the maximum number of reoccurrences by left-to-right bit recombining (truncation) from 16 to 8 bits, resulting in an optimized variable-length order array that is checked for maximum reoccurrences. The same array is also used in a compression routine, making the redundancy of the factor groups as high as possible. Field groups are output to A-P symbol files, tested for long composite repeating symbol chains and stored, producing a series of proto-text definitions that represent the reoccurrences of the measured numerical values. |
|||||||||||||
|
GENE MAPS - GENOME SEQUENCE LISTS mouse.zip |
|||||||||||||
|
Genome (sub)sequences extracted with binary factor grouping from CEL file(s) published under the title:
Transformation from committed progenitor to leukemia stem cells initiated by MLL-AF9 Data>> The above listed raw data files scans gene text symbols dictionary at 23.11.2007
|
|||||||||||||
|
Genome (sub)sequences extracted with binary factor grouping from CEL file(s) published under the title:
Classification of Human Lung Carcinomas by mRNA Expression Profiling Reveals Distinct Adenocarcinoma Sub-classes Data(1)>> Data(2)>> |
|||||||||||||
|
Genome (sub)sequences extracted with binary factor grouping from CEL file(s) published under the title:
The molecular signature of mediastinal large B-cell lymphoma differs from that of other diffuse large B-cell lymphomas and shares features with classical Hodgkin lymphoma Data>> |
|||||||||||||
|
Genome (sub)sequences extracted with binary factor grouping from CEL file(s) published under the title:
MLL translocations specify a distinct gene expression
profile that distinguishes a unique leukemia. Data>> The above listed raw data files scans gene text symbols dictionary at 27.09.2007 |
|||||||||||||
|
Genome (sub)sequences extracted with binary factor grouping from CEL file(s) published under the title:
Gene Expression Correlates of Clinical Prostate Cancer Behavior Data>> |
|||||||||||||
|
Genome (sub)sequences extracted with binary factor grouping from CEL file(s) published under the title:
A zebrafish bmyb mutation causes genome instability and increased cancer susceptibility Data>> |
|||||||||||||
|
These raw CEL data files were published for the reasons stated in the published titles; here they were processed using this particular method, which certainly differs. |
|||||||||||||
|
M67 |
M66 |
M64 |
M56 |
|---|---|---|---|
|
!3:1,1,0,1 ADBF ACCE ACFB AACJ !3:1,0,1,1 ADDF ADGC AJON AACC !3:1,1,0,1 ADED ACHN ACDP AACP !3:1,1,0,1 ADDC ACCH ABGB AACF !3:1,1,0,1 ADMD ACKN AELF AACL !3:1,1,0,1 ADOC ACHJ ABMG AACH !4:1,1,1,1 ADGL ACLC AJJL AACK !3:1,1,0,1 ADHC ACKK AGKN AACI !3:1,1,0,1 ADPI ACJH ABEK AACC !3:0,1,1,1 AFLD ACMK AJAD AACI |
!3:1,1,1,0 ADKP ACAH AJAP AAMJ !3:1,1,1,0 ADGL ACNO AJPL AAKP !3:1,1,0,1 ADOO ACIN AMDI AACI !3:1,1,0,1 ADNO ACBP ABCH AACH !3:1,1,0,1 ADPP ACGG ABPF AACK !3:1,1,0,1 ADEO ACLM ABCB AACN !3:1,1,0,1 ADAD ACAH ACJB AACL !4:1,1,1,1 ADNO ACAM AJLF AACO !3:1,0,1,1 ADGP ABGI AJGD AACF !3:1,1,0,1 ADMG ACHD ACNF AACD !3:1,1,0,1 ADFH ACIJ ACBI AACJ !3:0,1,1,1 AEDM ACPP AJJC AACM !3:1,1,0,1 ADAP ACCG ACJH AACA !3:1,0,1,1 ADPG AGKH AJHD AACA !3:1,1,0,1 ADED ACHD ACMB AACG !3:1,1,0,1 ADBB ACMB ABFP AACF !3:1,1,0,1 ADCN ACCC ABFI AACH !3:1,1,0,1 ADDD ACIJ ADFP AACK !3:1,1,0,1 ADDG ACNC ABOF AACC !3:1,1,1,0 ADEM ACOB AJPJ AAFD |
!4:1,1,1,1 ADEO ACOI AJNM AACK !3:0,1,1,1 AGLG ACCO AJFN AACG !3:1,1,0,1 ADFF ACMK ADCA AACN !3:1,1,0,1 ADAB ACJN ANLI AACF !3:1,1,0,1 ADLD ACIA ACPB AACB !3:1,0,1,1 ADFK AMFJ AJNK AACL !3:1,1,0,1 ADGN ACJG ALPJ AACL !3:1,0,1,1 ADGN ABKI AJGA AACE !3:1,1,0,1 ADEJ ACFC ACMO AACL !3:1,1,0,1 ADKP ACIO ADGL AACI !3:1,1,0,1 ADPO ACJK ACHL AACN !3:1,1,0,1 ADAK ACPL ADNP AACN !3:1,1,0,1 ADCP ACOG ACKG AACE !3:1,1,0,1 ADBD ACDN ABDF AACO !3:1,1,0,1 ADEK ACBJ ADMI AACK !3:1,1,0,1 ADLO ACEN ABNF AACN |
!3:1,1,0,1 ADFD ACAD AIGB AACH !3:1,1,0,1 ADDD ACOG ABDP AACK !4:1,1,1,1 ADGD ACGJ AJHG AACI !3:1,1,0,1 ADIP ACGH ACHL AACP !3:1,1,0,1 ADCH ACFJ ACAO AACE !3:1,1,0,1 ADGD ACGN ADBN AACL !3:1,1,0,1 ADLJ ACHM ABKG AACH !3:1,1,0,1 ADIP ACKB AEDE AACH !3:1,1,0,1 ADAO ACGL ABID AACJ !3:0,1,1,1 ACKK ACMB AJEF AACJ !3:1,1,1,0 ADFH ACFB AJAL AAHD !3:1,1,0,1 ADBE ACFL ADEL AACC !3:1,1,0,1 ADEB ACGI AICL AACH |

All symbolic number sequences searched have the significant 4-bit group positions at 1 1 1 2. Comparisons against the pattern sequence at those positions produced the results. The left graph, built from the values found in M67, displays the factor matching position group values.
Raw data CEL file CL2001030501AA.CEL, published under: MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia, was searched for number groups (x6) having factor values B B B B C C at positions 1 1 1 1 1 (0-3), having 38 repetitions for the number sequence ABCH ABBJ ABBB ABGP ACEP ACHF. The search produced the following table of CEL file row, column, measured max and measured average values:
|
row |
column |
max |
average |
std |
|---|---|---|---|---|
|
2 29 34 67 139 185 218 239 272 413 500 592 623 639 10 61 109 131 141 487 576 14 69 103 147 302 494 568 265 267 636 62 104 215 224 |
401 401 401 401 401 401 401 401 401 401 401 401 401 401 402 402 402 402 402 402 402 403 403 403 403 403 403 403 404 404 404 405 405 405 405 |
1771 3718.3 1265.5 952.3 1082 1913.5 2409 1131 3865.8 1380 1138 1117.8 1411.3 1306 1523 870 1334.5 1718 3155.8 1111.5 1374 1248 1558 1627.8 1476.5 9283.3 1215.3 1113 1131.5 1028.5 1474.3 1182 995 942 1567 |
367.8 629.7 281 273 273.1 295.2 273.9 295.2 591.3 281.6 281.6 281.8 295.9 281.7 295.2 273.6 273.7 273.7 281.8 273.9 273.5 273.7 295.4 295.9 295.4 591.5 281.2 295.1 295.5 295 367.9 367.6 295.6 273.6 295.4 |
25 ABGP 367 20 ACHF 629 20 ABBJ 281 20 ABBB 273 25 ABBB 273 16 ABCH 295 25 ABBB 273 25 ABCH 295 20 ACEP 591 25 ABBJ 281 25 ABBJ 281 20 ABBJ 281 16 ABCH 295 25 ABBJ 281 25 ABCH 295 25 ABBB 273 20 ABBB 273 25 ABBB 273 20 ABBJ 281 16 ABBB 273 20 ABBB 273 20 ABBB 273 25 ABCH 295 20 ABCH 295 20 ABCH 295 16 ACEP 591 20 ABBJ 281 25 ABCH 295 20 ABCH 295 16 ABCH 295 20 ABGP 367 20 ABGP 367 25 ABCH 295 25 ABBB 273 25 ABCH 295 |
The rest of binary groups (eg ABEB ABEJ ABAG ABBP ACBO ACPN) and their values are:

|
A |
B |
C |
D |
E |
F |
|---|---|---|---|---|---|
|
389 327 402 402 296 321 321 257 271 271 313 373 301 355 267 258 295 338 317 257 277 295 295 279 499 261 261 260 397 425 275 290 334 317 420 333 421 286 |
270 313 291 291 312 329 329 398 396 396 389 505 266 388 262 276 281 273 263 272 403 339 339 289 499 354 354 425 437 421 369 290 295 367 264 378 480 360 |
341 465 282 282 302 262 262 308 266 266 290 361 319 357 259 367 273 362 262 286 275 322 322 412 267 401 401 433 449 293 320 294 285 337 278 314 290 364 |
501 393 302 302 291 287 287 283 274 274 363 257 405 261 315 381 367 278 369 268 354 284 432 306 429 457 457 299 291 421 261 267 295 267 379 345 263 283 |
645 645 549 549 544 542 542 545 623 623 737 529 612 665 585 587 591 587 645 595 745 667 569 514 569 555 555 533 633 615 627 535 537 619 516 561 658 613 |
562 663 578 578 612 765 765 527 738 738 673 557 596 678 578 631 629 568 513 586 600 601 667 701 653 693 693 734 662 580 687 521 609 540 731 698 645 555 |
These results display max column number values distribution for sequences extracted from CL2001031627AA.CEL (Classification of Human Lung Carcinomas by mRNA Expression Profiling Reveals Distinct Adenocarcinoma Sub-classes) extracted using the described routines.
[Charts of max measured values for mean column values 20, 24, 18 and 19]
The chosen numbers (subsequent numbers and number groups) then populate the values of a large array, as shown in the (partial) results listed below from CL2001031609AA.CEL, published under the above stated title, allowing gene identification.
|
row |
column |
max |
average |
std |
|---|---|---|---|---|
|
93 94 96 97 98 99 100 102 103 104 105 106 107 108 109 110 112 113 114 115 117 118 119 121 ... 228 229 230 231 232 233 234 235 236 237 238 240 241 242 243 ... |
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ... |
174 175.8 142.8 124.8 158 146 169.3 146.3 105.3 162.5 159.3 151.5 140.5 145 107 152.5 344 218.8 223 141 160 154 153.3 135 ... 152.8 170 156 298.3 316.3 202.3 176.5 113.3 116 112.8 139 190 217 141.3 328 ... |
25.9 35.9 17.4 25.3 17.8 17.8 21 16.1 20.3 19.4 23.4 23.3 16.6 16.6 16.8 23.7 45.3 24.2 18.7 27.9 16 19.2 19.3 23.5 ... 23.3 24.1 19.5 25.3 21.7 20.7 21.6 20.9 17.3 17.5 22.3 30.7 37.9 16.8 61.5 ... |
20 AABJ 25 20 AACD 35 20 AABB 17 20 AABJ 25 16 AABB 17 20 AABB 17 20 AABF 21 20 AABA 16 20 AABE 20 16 AABD 19 20 AABH 23 20 AABH 23 20 AABA 16 25 AABA 16 25 AABA 16 20 AABH 23 25 AACN 45 20 AABI 24 25 AABC 18 25 AABL 27 25 AABA 16 25 AABD 19 20 AABD 19 25 AABH 23 ... 20 AABH 23 20 AABI 24 16 AABD 19 20 AABJ 25 20 AABF 21 16 AABE 20 20 AABF 21 20 AABE 20 16 AABB 17 20 AABB 17 25 AABG 22 25 AABO 30 25 AACF 37 20 AABA 16 25 AADN 61 ... |
This is a chart of the square root mean (average) distances between group members of mean column values (red line) vs their actual measured mean column value (blue line); the blue line values are (partially) listed in the above table.
In this example the (average) distance between group members of mean column values reaches its maximum for number 17.
(Partial) grouping results from CL2001030501AA.CEL from MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia.
|
row |
column |
max |
average |
std |
|---|---|---|---|---|
|
336 337 338 339 340 341 342 344 345 346 347 348 ... 338 339 340 341 342 343 344 345 347 348 349 351 352 353 355 356 357 358 359 360 ... 326 327 328 329 330 331 332 333 334 335 336 337 338 340 341 342 343 344 345 346 347 348 350 351 352 353 353 354 356 357 358 359 360 361 362 363 364 365 366 367 369 ... |
3 3 3 3 3 3 3 3 3 3 3 3 ... 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 ... 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 ... |
923 1072 2108.3 1066 2148.3 2023.5 876 2498.8 1282 948 1358.8 1184 ... 3205.5 1143 3003 2536.3 984 967 1258.8 1354 1778.3 960 1026 1327 879 1132 979.5 852 996.5 964.3 955.3 973.8 ... 1528 1166.5 1010 1271.8 977 1069 923.5 995 878 888.8 1260 1002 891.8 1653 4049.8 1077.5 1093.5 1150.8 868.8 833 974.3 1161 1244 972.3 1077.8 845 845 1400 893.8 1290 1220 1121.8 1030 1166 1205 824 1164 1295.5 1384 1499 2106 ... |
257.3 203.9 403.3 211.2 300.7 326.4 264.9 397.9 409.8 191.6 347.7 246.6 ... 413.3 182.4 417.5 603.3 181.4 589.7 280.4 256.5 400.2 216.8 212.1 499.7 162.6 176.7 227.4 188.2 211.7 257.7 258.5 209.1 ... 316.7 293.3 173.2 260.6 417.7 226.1 193.9 204.2 219.8 200.6 257.4 278.7 231.8 339.9 445 264.5 227 221.3 201.1 217.6 254.4 232 226.5 176.7 280.3 311.2 311.2 294.5 270.2 318.3 290.4 320.7 214.6 243.7 231.1 169.9 249.2 298.1 276.7 350.7 307.5 ... |
25 ABAB 257 25 AAML 203 20 ABJD 403 25 AAND 211 20 ABCM 300 16 ABEG 326 20 ABAI 264 16 ABIN 397 20 ABJJ 409 20 AALP 191 16 ABFL 347 20 AAPG 246 ... 20 ABJN 413 25 AALG 182 25 ABKB 417 20 ACFL 603 25 AALF 181 25 ACEN 589 20 ABBI 280 25 ABAA 256 20 ABJA 400 25 AANI 216 25 AANE 212 25 ABPD 499 25 AAKC 162 16 AALA 176 20 AAOD 227 16 AALM 188 20 AAND 211 20 ABAB 257 16 ABAC 258 20 AANB 209 ... 25 ABDM 316 20 ABCF 293 25 AAKN 173 20 ABAE 260 25 ABKB 417 25 AAOC 226 20 AAMB 193 25 AAMM 204 25 AANL 219 20 AAMI 200 25 ABAB 257 25 ABBG 278 20 AAOH 231 20 ABFD 339 16 ABLN 445 20 ABAI 264 20 AAOD 227 16 AANN 221 20 AAMJ 201 20 AANJ 217 16 AAPO 254 20 AAOI 232 16 AAOC 226 20 AALA 176 20 ABBI 280 16 ABDH 311 16 ABDH 311 25 ABCG 294 20 ABAO 270 25 ABDO 318 25 ABCC 290 20 ABEA 320 25 AANG 214 25 AAPD 243 20 AAOH 231 25 AAKJ 169 25 AAPJ 249 20 ABCK 298 25 ABBE 276 25 ABFO 350 25 ABDD 307 ... |
This is a chart of the square root mean (average) distances between group members of mean column values (red line) vs their actual measured mean column value (blue line); the blue line values are (partially) listed in the above table.
In this example the (average) distance between group members of mean column values reaches its maximum for number 257.
This is the (partial) screening sequence from CL2001030509AA.CEL (leukemia datasets)
|
column |
row |
max |
average |
group |
group |
group |
|---|---|---|---|---|---|---|
|
11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 |
102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 |
702 673 728 620 641 631 643 708 562 651 669 689 533 552 634 712 765 706 794 769 |
136 AAII 143 AAIP 134 AAIG 132 AAIE 121 AAHJ 118 AAHG 148 AAJE 126 AAHO 142 AAIO 143 AAIP 139 AAIL 156 AAJM 75 AAEL 95 AAFP 121 AAHJ 160 AAKA 166 AAKG 96 AAGA 138 AAIK 123 AAHL |
AAII AAIP AAIG AAIE AAHJ AAHG AAJE AAHO AAIO AAIP AAIL AAJM AAKG AAGA AAIK AAHL |
AAII AAIP AAIG AAIE AAHJ |
AAII AAIP AAIG AAIE AAHJ AAHG |
Where averages and subgroups for the following group(s) are:
ABFN (349) ABGA (352) ABGB (353) ABGC (354) ABGD (355) ABGE (356) ABGF (357) ABGG (358) ABGH (359) ABGI (360) ABGJ (361) ABGL (363) ABGM (364) ABGN (365) ABGO (366) ABGP (367) ABHA (368) ABHC (370) ABHD (371) ABHE (372) ABHF (373) ABHG (374) ABHH (375) ABHI (376) ABHJ (377) ABHK (378) ABHL (379) ABHN (381) ABHO (382) ABHP (383) ABIA (384) ABIB (385) ABIC (386) ABID (387) ABIE (388) ABIF (389) ABIG (390) ABIH (391) ABII (392) ABIJ (393) ABIK (394) ABIL (395) ABIM (396) ABIN (397) ABIO (398) ABIP (399) ABJA (400) ABJB (401) ABJC (402) ABJD (403) ABJF (405) ABJG (406) ABJH (407) ABJI (408) ABJJ (409) ABJL (411) ABJM (412) ABJO (414)
ABEA (320) ABEB (321) ABEC (322) ABED (323) ABEE (324) ABEG (326) ABEH (327) ABEI (328) ABEJ (329) ABEK (330) ABEL (331) ABEN (333) ABEO (334) ABEP (335) ABFA (336) ABFB (337) ABFD (339) ABFE (340) ABFF (341) ABFH (343) ABFI (344) ABFJ (345) ABFK (346) ABFO (350)
ABJP (415) ABKA (416) ABKB (417) ABKC (418) ABKD (419) ABKE (420) ABKF (421) ABKG (422) ABKH (423) ABKI (424) ABKJ (425) ABKK (426) ABKN (429) ABKO (430) ABKP (431) ABLB (433) ABLD (435) ABLF (437) ABLG (438) ABLH (439)
ABCP (303) ABDA (304) ABDB (305) ABDE (308) ABDF (309) ABDG (310) ABDH (311) ABDI (312) ABDJ (313) ABDK (314) ABDL (315) ABDM (316) ABDN (317) ABDO (318) ABDP (319)
ABLI (440) ABLJ (441) ABLK (442) ABLL (443) ABLN (445) ABLO (446) ABLP (447) ABMA (448) ABMB (449) ABMC (450) ABMD (451) ABMF (453) ABMG (454) ABMH (455) ABMI (456) ABMJ (457)
ABCC (290) ABCD (291) ABCE (292) ABCG (294) ABCH (295) ABCI (296) ABCJ (297) ABCK (298) ABCL (299) ABCM (300) ABCN (301) ABCO (302)
ABMN (461) ABMP (463) ABNA (464) ABNB (465) ABNC (466) ABND (467) ABNG (470) ABNH (471)
ABBH (279) ABBI (280) ABBJ (281) ABBK (282) ABBL (283) ABBM (284) ABBN (285) ABBO (286) ABBP (287) ABCA (288) ABCB (289)
ABNJ (473) ABNK (474) ABNM (476) ABNN (477) ABOA (480) ABOB (481)
ABAN (269) ABAP (271) ABBA (272) ABBB (273) ABBC (274) ABBD (275) ABBE (276) ABBF (277) ABBG (278)
ABOF (485) ABOH (487) ABOI (488) ABOJ (489) ABOK (490) ABOL (491) ABOM (492) ABOO (494) ABOP (495)
ABAE (260) ABAF (261) ABAG (262) ABAH (263) ABAI (264) ABAJ (265) ABAK (266) ABAL (267) ABAM (268)
ABPD (499) ABPE (500) ABPG (502) ABPI (504) ABPJ (505)
ABAB (257) ABAB (257) ABAC (258) ABAD (259)
ABPK (506) ABPL (507) ABPM (508) ABPN (509) ABPP (511)
A search through the scanned CEL file (CL2001030511AA.CEL) column values, based on band limits and displayed as averaged (sub)group values, enables FASTA coding:
|
155 378 1024 139 AAIK 156 378 1368 147 AAJC 157 378 1836 486 ABOG 158 378 2340 260 ABAD 159 378 1355 209 AANB 160 378 1248 180 AALD 161 378 1770 1222 AEMD 164 378 2565 304 ABDE 165 378 1813 197 AAMH 166 378 2266 534 ACBG 167 378 4675 445 ABLF 168 378 2953 496 ABPC 169 378 1276 245 AAPH 170 378 2006 275 ABBH 171 378 3954 506 ABPI 172 378 8868 1101 AEEO 173 378 9304 1247 AEOB |
12 138-139 18 146-147 199 481-492 79 257-262 55 209-210 37 179-180 404 1212-1226 103 300-316 49 197-201 219 528-541 171 427-448 204 493-503 73 245-249 90 274-285 209 502-507 387 1098-1107 411 1245-1254 |
Where the following (sub)groups and their average values:
binary band memebers:6 : ABPJ ABPN ABPF ABPF ABPN ABPJ
binary band avg:5.050000e+02 upper limit:509 lower limit:501 binary band avg(int):505
binary band memebers:4 : ABOF ABON ABOE ABOH
binary band avg:4.872500e+02 upper limit:493 lower limit:484 binary band avg(int):487
binary band memebers:5 : ABML ABMK ABMH ABMJ ABMD
binary band avg:4.560000e+02 upper limit:459 lower limit:451 binary band avg(int):456
These correspond to the following data row in the file MLL_AF9.gct from www.broad.mit.edu:
1415913_at ribosomal protein S13 455.171 505.366 457.024 500.844 487.858 496.07
To Borce Dzinleski
These routines were written by
Dzinleski Jasenko jasenko@unet.com.mk
who is the author of C/C++ based routines for encryption/decryption, large number operations, the 123SQL database engine and the simplified mariaBasic interpreter, which are ongoing projects. This project is self-financed and any contributions are welcome.
This site is the result of years of support from Borce & Dusica Dzinleski and Nada Popstefanova and is devoted to them and especially to my daughter Maria Dzinleska. The author is currently seeking a developer's job and this is his cv.
IMPLEMENTATION
RIFF(WAV) COMPRESSION (principia example)
This is a binary compression implementation for the output file of a PDA recording device (16 bit, 8000 Hz, 128 kbps WAV). This functional example performs lossless WAV compression on PDA recording files of up to 350 sec, gaining an average of 35%.
15.05.2008 VRM 1.0.0 Download File mar70.zip
BINARY COMPRESSION 77
This is a binary compression based on binary shifting concatenation of 2-byte long data into dictionary entries that are left truncated (common in ASCII text files). Tested on large text files, it quickly produces an average gain of 40%.
04.06.2008 VRM 1.1.0 Download File mar77.zip
BINARY COMPRESSION 79
This is a binary compression based on right (low) bit truncation of 2-byte data into 8-bit dictionary entries, also applying the routine principle used in the routines listed below. It performs fast and efficient data storage.
06.06.2008 VRM 1.1.15 Download File mar79.zip
Binary Compression 79 at Brothersoft.com
THE BINARY COMPRESSION ROUTINE
Binary compression methods are widely used in communications, data storage and numeric analysis. Exploring the numeric sequences of genetic complexity employs such methods. Some of them are presented on this site together with command-line Win32 implementation(s) that demonstrate the compression of large ASCII data files and binary files and, slightly modified, the analysis of numeric data sequences.
This binary compression method is based on a numeric sequence generated by the following binary formula, presented in C/C++ syntax:
#define op_7(x,y)(((x+y)^y)|(((x&y)!=0)?(x&y)/y:0))
This numeric sequence represents all numbers from 0-255 (8-bit) for 0-127 (7-bit) arguments in an x-y matrix manner. When always x=y and x:0-127 it results in all 8-bit odd numbers. When applied to a 2-byte data sequence it results in an index 14 or fewer bits long. Combined with a 1-bit subtracting indicator it allows compression. Using these arguments as dictionary entries coded by hi/lo/length indicators, whose reoccurring indexes are stored instead of the input data, allows an average gain of 30% compression in large ASCII text files.
This numeric sequence formula was generated by another routine written for the purpose of exploring numeric sequence generation.
This is a Win32 command-line compression tool based on binary compression. This example demonstrates the speed and efficiency of this static compression method for large ASCII files.
Purchase
Binary Compression 1.3.3 released 04.09.2007 (Price 10$
,service Protexis.com)
04.09.2007 VRM 1.3.3 Download File mar.zip
THE BINARY FACTOR GROUPING COMPRESSION ROUTINE
This compression example uses binary pattern indexing by bit truncation of 2-byte sequences from 16 to 12 bits in order to gain the maximum number of dictionary reoccurrences. This compression method is a compromise between compression gain and unoptimized compression speed.
This example demonstrates the correctness of the genetic text complexity display routine, since its dictionary covers most of the numeric sequence occurrences. This compression example is still subject to further development.
21.09.2007
VRM 1.4.0 Download
File mar73.zip
SECOND IMPLEMENTATION - Binary Text Compression
This is a fast and efficient compression example that performs fast input data indexing and dictionary reoccurrence search based on binary 4x4-bit long data samples. Indexed sequences are checked against a variable data length buffer.
This compression method thus gains speed by using strict 4x4 (16) bit long dictionary patterns. This routine is subject to further development.
Purchase
Binary Text Compression 1.3.3 released 04.09.2007 (Price 10$
,service Protexis.com)
04.09.2007 VRM 1.3.3 Download File mar9.zip

THIRD IMPLEMENTATION - ASCII Text File Fast Sort/Indexing Routine
This is a fast sorting/indexing example that builds a file position sorting tree as a result of n-depth byte sorting of text file lines. The sorted sequence tree may expand to further depth levels; this routine uses a default depth of 6. It exhibits fast sorting of text files of up to 100K lines/rows.
E.g.: C:\msort -f "War and Peace NT.txt"
30.10.2007 VRM 1.3.1 Download File msort3.zip

THE RANDOM KEYS DISTRIBUTION ENCRYPTION ROUTINE
This is a strong encryption/decryption routine based on a random seed distribution hash of 4 number keys.
The command line switches to encrypt are
E.g.: C:\r7 -a <key1 number> -b <key2 number> -c <key3 number> -d <key4 number> -e "filename.txt"
and the command line switches to decrypt are
E.g.: C:\r7 -a <key1 number> -b <key2 number> -c <key3 number> -d <key4 number> -f "filename.txt"
The 4 key numbers following the -a -b -c -d switches should have values between 10000 and 99999. They are the entry seed values and are used instead of the common password protection method.
Ciphering strength is high due to the use of a hashed number table based on a 4-function random number distribution.
This routine was written out of the author's wish to improve message privacy while messages are sent across networks. Division remainder distributions are tested in the following 4 ways for number choice:
|
1.1(min)... if(n=0||l==0){n=rs[l][1];continue;} if(minmv>rs[l][2]){minmv=rs[l][2];minl=l;} } } |
1.2(max)... if(n=0||l==0){n=rs[l][1];continue;} if(maxmv<rs[l][2]){maxmv=rs[l][2];maxl=l;} } } |
|---|---|
|
2.1(min)... if(n=0||l==0){n=rs[l][1];continue;} n=rs[l][1]; }else{n=0;} } |
2.2(max)... if(n=0||l==0){n=rs[l][1];continue;} n=rs[l][1]; }else{n=0;} } |
(1) For each of the entered key numbers, the resultant distribution series (3-133)*(3-7) according to these criteria are written in a 4 column table.
(2) Each table is hashed according to the binary criteria listed below.
(3) The 4 resulting tables are then re-hashed using the same binary criteria.
#define op_A(w,x,y,z)(((((w&0x0000ffff)<<16)|x)&0xffff0000)|((((y&0x0000ffff)<<16)|z)&0x0000ffff))
#define op_B(w,x,y,z)(((((x&0x0000ffff)<<16)|w)&0xffff0000)|((((z&0x0000ffff)<<16)|y)&0x0000ffff))
#define op_E(w,x,y,z)(op_A(w,x,y,z)>op_B(w,x,y,z)?op_A(w,x,y,z):op_B(w,x,y,z))
One of the 4 functions running inside this encryption was used in the Game of life, which is listed for download, and it demonstrates the diversity of the random number distributions produced. 18.12.2007 VRM 1.3.3
Download
File r7.zip
Try looping this encryption in the following way:
Step 1.C:\r7 -a <key1 number> -b <key2 number> -c <key3 number> -d <key4 number> -e "filename.txt"
Step 2.C:\r7 -a <key5 number> -b <key6 number> -c <key7 number> -d <key8 number> -e "previous_output.mar"
...
...
Step n.
and repeat it in the same manner n times until the desired security level is gained.
Random Keys Distribution Encryption at Brothersoft.com
MARIAHASH THE ENCRYPTION ROUTINE
This is a fast encryption routine using a proprietary hashing method. Ciphering strength depends on a large hashing number and password length. Password text must be entered in a password.txt file and should have between 50 and 100 characters. This routine was written out of the author's wish to improve message privacy while messages are sent across networks.
09.06.2007 VRM 1.3.0
Download
File 79923.zip

THE 123SQL DATABASE ENGINE
This is an ongoing project aimed at constructing a small portable SQL database engine for PDAs. The current release is a functional browsing engine that contains data and sample browsing statements. Data may be imported together with table/column creation. Typically the source data is TAB-delimited column data exported from a spreadsheet. Database/table/column creation may be viewed in the included code following the -c switch. Table names, column names and field byte sizes should be specified, but column/field lengths may also vary in size row by row. The engine performs SQL keyword/syntax checking using the included syntax/keyword list files. Object name checks and object attribute reads are performed on the system database files named 123SQL_db_1.mar and 123SQL_db_2.mar. The database structure allows multiple object browsing. The sorting/searching routines require low machine resources, thus meeting most modern PDA specifications, and their sources were also published under different names.
This project was founded on the author's unique relational database engine structure design. The 123SQL engine requires the following command line syntax:
E.g.:
C:\910791 -d "Sample"
for attaching and browsing the included database, where Sample is the name of the included database. When run as
E.g.:
C:\910791 -c "import_data_file.txt"
the engine will create a database table and table columns as specified in the included create.txt syntax and import the data from the file named after the -c switch. The number of column definitions and TAB-delimited fields must match; if the specified column length is greater than the data length, space justification will occur. Supported SQL-like data browsing syntax is:
{select} {*|column_name|column_name_1,...column_name_n}
{from} {table_name|table_name_1,...table_name_n}
[where
|[column_name=string_litteral|column_name>string_litteral|column_name<string_litteral]
|[column_name>string_litteral and column_name<string_litteral]
|[column_name[>|<]string_litteral and column_name=string_litteral]
|[column_name=string_litteral or column_name=string_litteral or column_name=string_litteral]
|[column_name>string_litteral and column_name<string_litteral and column_name=string_litteral]
]
The MariaBasic Interpreter
For the purpose of implementing database methods the mariaBASIC Interpreter was developed; when embedded in the engine it will allow storing BASIC-like procedures in the database and executing more complex database and computing tasks. This interpreter allows BASIC-like syntax features such as nesting, statement loops and conditional execution, in nesting patterns such as:
... ... ...
[if [if | (computation(s)|print)]
[if [if | (computation(s)|print)] [while [if | (computation(s)|print)]] [if | (computation(s)|print)] ]
[if [if | (computation(s)|print)] [for [if | (computation(s)|print)]] [if | (computation(s)|print)] ]
[for [if | (computation(s)|print)] [for [if | (computation(s)|print)]] [if | (computation(s)|print)] ]
The ZIP archive ready for download includes a few .txt files, which are sample procedures demonstrating the supported nesting syntax. They are executed with the command line: E.g.: C:\9901 -e "sample.txt".
These (sample1...5.txt) example sources show the code structure necessary for program execution; the supported routine code syntax is:
variable
declarations:
{
[varname$="literal"]|[varname%=number|0]|[varname&=number|0]|[varname#=number|0]
}
[if|computations|print]
...
[if
([varname1=varname2]|[varname1>varname2]|[varname1<varname2]|[varname1>=varname2]|[varname1<=varname2])
then
...
[if|computations|print]
...
[for varname1=varname2|number to number
[if|computations|print]
...
[while([varname1=varname2]|[varname1>varname2]|[varname1<varname2]|[varname1>=varname2]|[varname1<=varname2])
[if|computations|print]
...
...
nested
block statement(s):
{end}
This interpreter, although functional, is subject to further development and changes will occur. This package includes only the standard BASIC built-in functions; more are going to be implemented. MariaBasic, when compiled with some PDA compilers, provides a simple but efficient PDA programming tool.
Purchase
mariaBasic Interpreter 1.3.7 released 09.07.2007 (Price 10$, service Protexis.com)
Here is a pair of routines written in mariaBasic:
rem
print " ",var1%;
if var6%=50 then
var7#=var1%/var2%
end if
var2%=var13%+1
end if
var10%=var10%+1
end if
var10%=var10%+1
end if
next var2%
print "."
end if
next var1%
rem
rem
rem mariaBasic Sample code
rem
rem example: simple prime check
rem
rem
var1%=0
var2%=0
var3%=0
var5%=0
var6%=0
var7#=0
var8&=0
var9%=0
var10%=0
var11%=0
var13%=299
print "start"
print " "
for var1%=100 to 299
var6%=var1%/2
var7#=var1%/2
var7#=var7#-var6%
var6%=100*var7#
var10%=0+0
for var2%=2 to 298
var8&=var1%/var2%
var7#=var7#-var8&
var8&=1000000*var7#
if var2%>=var1% then
if var8&>0 then
if var8&<0 then
var11%=var1%-2
if var10%=var11% then
print " "
print "end"
end
And here is a sample random generator code written in mariaBasic:
rem
rem
rem
rem mariaBasic interpreter sample code
rem
rem example: simple random number generator
rem
rem
var1%=1
var2%=1
var3%=11111
var4%=0
var51&=0
var52&=0
var11#=0
var12#=0
var4%=var3%/100
for var1%=3 to 133
for var2%=3 to 7
var11#=1000*var2%/var1%
var12#=var3%/var11#
var51&=var12#*var4%
var52&=var12#*var4%/1000
var52&=var52&*1000
var52&=var52&-var51&
var4%=var4%+1
if var52&>0 then
print " ",var52&;
end if
if var52&<0 then
var52&=-1*var52&
print " ",var52&;
end if
next var2%
rem reset of var4% (position reconstructed from the flattened listing)
var4%=var3%/100
next var1%
end
That is equivalent to the following C/C++ code:
//-----------------------------------------------------
//
// mariaRandom Generator
//
//
// copyright Dzinleski Jasenko 2007
//-----------------------------------------------------
#include <stdio.h>
#define seed 11111
#define outr 133
#define inr 7
#define rang 1000
int main()
{
int i,j,k,n;
double v11,v12;
long v51,v52;
printf("\n\n\nThe mariaRandom Generator\n");
printf("\nWritten by Dzinleski Jasenko July,2007\n");
printf("OS Win32 VRM 1.0.1\n\n");
k=seed/100;
for(i=3;i<outr;++i)
{
for(j=3;j<inr;++j)
{
// integer division, as in the original listing
v11=1000*j/i;
v12=seed/v11;
v51=v12*k;
v52=v12*k/rang;
v52=v52*rang;
v52=v52-v51;
++k;
if(v52>0)
{
printf(" %ld",v52);
}
if(v52<0)
{
// print the magnitude of negative results
v52*=-1;
printf(" %ld",v52);
}
}
// reset of k (position reconstructed from the flattened listing)
k=seed/100;
}
return(0);
}
And here is a game of life using this, yet improved, random number generator
Download a game of life VRM 1.2.1 at 17.07.2007 .
Game of life executes via the following command line switches, e.g.: r31 -s 31193 -g 50, where the number following -s is the random seed number (mostly over 10000) and the number following the -g switch is the number of generations produced (mostly over 5).
A similar but more sophisticated random key seed number distribution is used in THE RANDOM KEYS DISTRIBUTION ENCRYPTION ROUTINE, providing strong message file encryption.

And here is the same game of life using 100x100 cells that outputs the generations data in a graphics BMP file format.
Download a game of life VRM 1.3.1 at 17.07.2007
THE FAST (ASCII and Unicode) TEXT FILES SEARCH ROUTINE
This is a fast text search routine that allows single (or quoted composite) string search throughout ASCII or Unicode text (or text-containing) file(s). Unicode search also allows strings containing mixtures of different Unicode table(s).
E.g.:
1. (ASCII search) msearch3 <ASCII_input_filename.txt> <search_string>
2. (Unicode search) msearch3 <Unicode_input_filename.txt>
(search string in Unicode file uarg.txt and search results in Unicode file ures.txt)
03.07.2008 VRM 1.1.1
Download File msearch3.zip
THE FAST ASCII TEXT FILES SEARCH ROUTINE
This is a fast text search routine that allows a multi-string search (up to 10 search strings, each containing one or more words) throughout an ASCII text file. Each (quoted) search string may therefore have one or more words. The -s switch allows any match, while the -e switch allows only an exact match.
E.g.: C:\msearch -s(-e) "package install"+"media"+"component" -f "FreeBSD Handbook.htm"
E.g.: C:\msearch -s(-e) "network devices installation" -f "FreeBSD Handbook.htm"
E.g.: C:\msearch -s(-e) "trodes in his hands" -f "book_sd.txt"
E.g.: C:\msearch -s(-e) "Bezukhov and Natasha"+"Buonaparte Napoleon"+"Pierre" -f "War_and_Peace_NT.txt"
The program output displays all results along with their line number file positions, and the unique and composite sentence search phrase matches together with their total occurrence count.
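The difference between the two switches can be illustrated with a short sketch. It is not the msearch source. The function name count_matches is hypothetical, and reading "exact match" as "the phrase is not embedded in a longer word" is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Counts the occurrences of phrase in text.
   exact == 0: any match, the phrase may be part of a longer word (like -s).
   exact != 0: a hit counts only when no letter or digit touches it on
               either side (an assumed reading of the -e switch). */
int count_matches(const char *text, const char *phrase, int exact)
{
    size_t len = strlen(phrase);
    int count = 0;
    assert(len > 0);
    for (const char *p = strstr(text, phrase); p != NULL;
         p = strstr(p + 1, phrase)) {
        int starts_word = (p == text) || !isalnum((unsigned char)p[-1]);
        int ends_word = !isalnum((unsigned char)p[len]);
        if (!exact || (starts_word && ends_word))
            ++count;
    }
    return count;
}
```

With the search string "trodes in his hands", the any-match mode would also count a line containing "electrodes in his hands", while the exact mode would skip it.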
15.04.2008 VRM 1.3.3
Download File msearch.zip
Purchase msearch 1.3.3, released 15.04.2008 (price 10$, service Protexis.com)
msearch at Brothersoft.com
THE ASCII TEXT FILES SENTENCE CONTEXT SEARCH ROUTINE
This is a complex text file search routine that builds the text search on the context, that is, on the sentence words concerning a given subject. The search criteria are built automatically, depending on the sentence word contents and the user's choice. Sentence word files and their sentence links are built during the indexing phase for a given text file. After indexing, the routine displays all sentences for a chosen sentence subject (as listed in the words file) and allows a detailed context search and the display of all sentences concerning the chosen context.
15.04.2008 VRM 1.3.0
Download File r113.zip
For the indexing type: E.g.: C:\r113 -i "War_and_Peace_NT.txt"
For the context search type: E.g.: C:\r113 -s(e) "Bagration" -f "War_and_Peace_NT.txt"
The -s switch enables any-match search when d was chosen, and the -e switch enables exact word matching only.
The included files contain the example book, already indexed. Typically the search word is a name or a subject that is often described and attributed in the book text. After viewing/choosing the desired sentence/search combination, all text lines containing the chosen words are displayed. Viewing the book contents by the desired subject details thus takes less time.
THE FONT IMAGE RECOGNITION ROUTINE
This routine creates a vector shape sequence file (using the -i switch) out of a 100x100 pixel, 24-bit colour depth, black and white image representing a TrueType character (font) image or a freehand character drawing. Then, using the -c switch, the two index files derived from two different images are compared and a graphic matching result is displayed.
27.04.2007 VRM 1.0.1
Download File cr13.zip
For the indexing type:
E.g.: C:\cr13 -i "Drawing1.bmp" "Drawing1_Index.txt"
For the comparison of two different index files type:
E.g.: C:\cr13 -c "Drawing1_Index.txt" "Drawing2_Index.txt"
At present the routine builds shape vectors on black/white bitmaps only; it supports neither other resolutions nor other colours/colour depths.
But how does it work?
(1) indexing, which creates a vector txt file (that might be the meta character file) out of the bmp image file in the following manner:
- inverts the b/w file matrix (the way the human eye sees it),
- searches for quadrants (10x10 pixels in size) with a 40/60% b/w ratio, thus finding character image edges (up to 8 pairs in the same row),
- creates vectors out of each quadrant,
- shifts the quadrants UP by (only) a few pixels, since bmp edges do not always REALLY represent character ID curves, and repeats the vector creation...
and
(2) comparison of two vector files:
- shifts back all X-axis values by subtracting the absolute minX value from them,
- computes curve angles out of each quadrant's values,
- computes resultant angles out of quadrant pairs, building the most realistic character curves,
- compares the angle pairs of the two vector files,
- computes matching statistics.
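Two of the steps listed above can be sketched in a few lines: the edge quadrant test and the X-axis shift. This is a minimal illustration, not the cr13 source. The function names are hypothetical, the bitmap is assumed to be already decoded into one byte per pixel (nonzero = black), and the 40/60% ratio is read as "between 40 and 60 black pixels out of 100".

```c
#include <assert.h>

#define IMG_SIZE 100   /* the image is 100x100 pixels    */
#define QUAD_SIZE 10   /* the quadrants are 10x10 pixels */

/* Counts the black pixels in quadrant (qx, qy) of a row-major bitmap. */
int quadrant_black_count(const unsigned char *pixels, int qx, int qy)
{
    int black = 0;
    assert(qx >= 0 && qx < IMG_SIZE / QUAD_SIZE);
    assert(qy >= 0 && qy < IMG_SIZE / QUAD_SIZE);
    for (int y = qy * QUAD_SIZE; y < (qy + 1) * QUAD_SIZE; ++y)
        for (int x = qx * QUAD_SIZE; x < (qx + 1) * QUAD_SIZE; ++x)
            black += pixels[y * IMG_SIZE + x] != 0;
    return black;
}

/* Treats a quadrant as a character edge when 40 to 60 of its 100 pixels
   are black (an assumed reading of the 40/60% b/w ratio). */
int is_edge_quadrant(const unsigned char *pixels, int qx, int qy)
{
    int black = quadrant_black_count(pixels, qx, qy);
    return black >= 40 && black <= 60;
}

/* Shifts all X values so that the smallest one becomes 0, which makes two
   vector files comparable regardless of the horizontal character position. */
void normalize_x(int *xs, int n)
{
    int min_x;
    if (n <= 0)
        return;
    min_x = xs[0];
    for (int i = 1; i < n; ++i)
        if (xs[i] < min_x)
            min_x = xs[i];
    for (int i = 0; i < n; ++i)
        xs[i] -= min_x;
}
```

A quadrant that is entirely black or entirely white lies inside or outside the character stroke, so only the mixed quadrants are candidates for vector creation.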
This development is aimed at PDA users who need easier ways of text input.
To Maria Dzinleska
THE ROUTINE THAT GENERATES THE PRIME NUMBERS KEY PAIR OUT OF THEIR PRODUCT
These routines were written during and for the www.rsa.com prime key numbers contest, which requires finding the exact prime number key pair from a very large (256, 512 ... 1024 ... bits long) product number. The routines are written in Java and use the Java BigInteger class to compute the prime key pair.
The starting-point routine finds a prime number key pair, each number product_number_bit_length/2 bits long, whose product is as near as possible to the product number; the higher the precision, the more computing time is spent. The loop that computes the suggested starting prime number pair is therefore limited by the required number of equal product-target significant digits.
The remaining procedures then repeatedly subtract very long numbers of the form 111...*10^N (all 1's followed by trailing zeros) from the suggested key pair, measuring the distance (difference) from the target product number by subsequent multiplication checks. At the divergence point found, and at a certain precision (number of equal significant digits), a new key pair may be generated through the first routine. Then the process is repeated, gaining more and more equal product-target significant digits.
23.07.2006 Download File Welcome.zip
How do these computations compute a very similar or near prime key pair out of a large product key?
Examining the mariaBasic code listed below and its (partial) output shows a few number products appearing at large division loop distances and having a 0000 period between decimal remainder values. Testing those (listed) numbers might prove that most of them are prime numbers. Testing large (200 decimals or more) product keys in this way would take indefinite time. So the WelcomeQ routine uses a subtraction operation on a proposed prime keypair.
The routine that generates prime keypairs matching a given number of decimals of the target product number is based on modifying a binary field seed number, using only numbers that match the target as the starting point of the matching loop.
The subtraction number (having the decimal value of e.g. 1111111111000000000000000) shifts the 1111111111 period to the right, checking that the product of the prime keypair truncated in this way matches more and more decimals of the target product number. Actually, there are sets of prime keypairs that obtain a certain decimal matching. Usually it is necessary to switch between different pairs in order to increase the decimal matching of the product.
That is the main iteration of this method, which sometimes requires examining and rejecting a large number of prime keypairs in order to gain one or more matching decimals. Gaining a precision of 100 decimals on a common PC thus would not be hard to achieve.
These computations generate prime keys having a computable gain in decimal matching, or a complete match of the product number, compared to a given huge product number.
Brief order and explanation of execution steps:
(1) Generate prime keypairs whose product matches the known target number in 5 or more decimal places (the number of matching places and the number of generated pairs both depend on computing resources).
(2) Start subtracting a given decimal number of the form 1...1x10^X from each of the primes in a keypair and multiplying them, observing the gain or loss in decimal matching of the product number vs the target number. Observe the matching gain vs the number of 1's and the X in 10^X of the subtraction number. Thus the prime distribution at that number point becomes visible.
(3) Choose a prime probe as a base for generating new sets (depending on computing resources) of prime keypairs, which usually gain somewhat fewer matching decimal places of the product number vs the target number.
(4) Iterate through the previous steps, seeking a point in the prime distribution that indicates the existence of the absolutely matching keypair.
var1%=0
var2%=1234567
var3%=0
var5%=2
var6%=0
var7#=0
var8&=0
var9%=0
var10%=10000
var191%=0
var111#=0
var19%=10000
var11%=17317
var123%=91127
var13%=13009
var145%=98017
var15%=12251
var162%=98327
var17%=33757
var3%=var2%/2
while(var5%<var3%)
var7#=var10%*var2%/var5%
var8&=var2%/var5%
var8&=var7#-var8&*var10%
if var8&=0 then
print "=";
print var5%;
print "@";
print var7#;
print " ",var191%
var191%=0+0
end if
var191%=var191%+1
var5%=var5%+1
wend
end
=205759@6.000063e+04 1
=205760@6.000034e+04 1
=205761@6.000005e+04 1
=246909@5.000089e+04 41148
=246910@5.000069e+04 1
=246911@5.000049e+04 1
=246912@5.000028e+04 1
=246913@5.000008e+04 1
=308635@4.000087e+04 61722
=308636@4.000075e+04 1
=308637@4.000062e+04 1
=308638@4.000049e+04 1
=308639@4.000036e+04 1
=308640@4.000023e+04 1
=308641@4.000010e+04 1
=411509@3.000097e+04 102868
=411510@3.000090e+04 1
=411511@3.000083e+04 1
=411512@3.000075e+04 1
=411513@3.000068e+04 1
=411514@3.000061e+04 1
=411515@3.000053e+04 1
=411516@3.000046e+04 1
=411517@3.000039e+04 1
=411518@3.000032e+04 1
=411519@3.000024e+04 1
=411520@3.000017e+04 1
=411521@3.000010e+04 1
=411522@3.000002e+04 1
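For readers without the mariaBasic interpreter, the test performed by the listing above can be sketched in C. The sketch assumes that the interpreter truncates the floating point difference when storing it into the long variable, which is what the listed output suggests; the function names are hypothetical. Integer arithmetic is used here instead of floating point to keep the check exact.

```c
#include <assert.h>

/* True when the first four decimals of n/d are 0000,
   that is when 0 <= n/d - floor(n/d) < 0.0001. */
int has_zero_period(long long n, long long d)
{
    assert(n >= 0 && d > 0);
    return (10000 * n) / d - 10000 * (n / d) == 0;
}

/* Scans the divisors from..to (inclusive), stores up to max of the
   divisors that pass the test in hits, and returns how many passed. */
int scan_zero_period(long long n, long long from, long long to,
                     long long *hits, int max)
{
    int found = 0;
    for (long long d = from; d <= to; ++d) {
        if (has_zero_period(n, d)) {
            if (found < max)
                hits[found] = d;
            ++found;
        }
    }
    return found;
}
```

Scanning 1234567 over the divisors 205759 to 246913 finds the same eight divisors as the first eight output lines above, including the jump of 41148 between 205761 and 246909. An exact divisor such as 127 (1234567 = 127 x 9721) passes the test as well, since its remainder is zero.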
Dzinleski Jasenko - jasenko@unet.com.mk
Mailing
Address:
+38922770296
Dositej Obradovik 15/8
1000
Skopje Republic of Macedonia
All
published data, executables and sources from this site
described above apply to GNU General Public License and can be
used, copied, sold, redistributed or used in any other way only
by written permission of Jasenko Dzinleski. Copyright (C)
from 2001 and later by Jasenko Dzinleski
This program is free software;
you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your
option) any later version.
This program is distributed in
the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the
GNU General Public License along with this program; if not,
write to the Free Software Foundation, Inc., 51 Franklin
Street, Fifth Floor, Boston, MA 02110-1301, USA.